The LA Crime Data from 2020 to Present dataset contains 26 variables and over 1 million entries. This data is collected from the police databases of Los Angeles and compiled into a single comma-separated values (.csv) file. The entries themselves are taken from police reports, written by an officer either on the scene or after the fact. The variables record different pieces of information for each entry, detailing the sequence of events that took place.
These include:
- Vict Age - Documents the age of the victim.
- Vict Sex - Documents the sex of the victim. (F - Female, M - Male, X - Unknown)
- Vict Descent - Documents the descent of the victim. (A - Other Asian, B - Black, C - Chinese, D - Cambodian, etc.)
- Area Name (Police Station District) - Documents which police district took up the investigation of the crime.
- Crm Cd - Indicates the crime committed.
- CRM CD Desc - Describes the crime code.
- Premis Cd - Indicates the type of structure, vehicle, or location where the crime took place.
- Date Rptd - Date the crime was reported.
- Date OCC - Date the crime occurred.
- Dr_NO - Division of Records Number: the official file number, made up of a 2-digit year, the area ID, and 5 digits.
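The Dr_NO layout described in the last item can be illustrated by splitting a DR number into its parts. This is a minimal sketch that assumes the area ID is two digits, which makes the full number 9 digits (2 + 2 + 5). The list does not state the area ID's width, so treat that as an assumption. The helper names `DrNumber` and `parse_dr_no` and the sample number are hypothetical and do not come from the dataset.

```python
from typing import NamedTuple


class DrNumber(NamedTuple):
    """Parts of a Division of Records number (hypothetical helper type)."""
    year: int      # two-digit year, e.g. 20 for 2020
    area_id: int   # police area ID, assumed here to be two digits
    sequence: int  # five-digit file sequence


def parse_dr_no(dr_no) -> DrNumber:
    """Split a DR number into year, area ID, and sequence.

    Assumes a 2 + 2 + 5 digit layout (9 digits in total).
    Accepts either a string or an int.
    """
    digits = str(dr_no).strip()
    if len(digits) != 9 or not digits.isdigit():
        raise ValueError(f"expected a 9-digit DR number, got {dr_no!r}")
    return DrNumber(
        year=int(digits[0:2]),
        area_id=int(digits[2:4]),
        sequence=int(digits[4:9]),
    )


# Made-up DR number, for illustration only.
print(parse_dr_no("201304567"))
# DrNumber(year=20, area_id=13, sequence=4567)
```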
Our question is: How do crime rates affect the population of LA, and are some people more likely than others to become victims of crime?
And our hypothesis is: Although this Los Angeles crime dataset records a high volume of incidents, the raw totals may exaggerate the perceived danger.
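The hypothesis depends on the difference between raw totals and rates. Dividing by population and scaling to 100,000 residents makes areas of different sizes comparable. This minimal sketch assumes that population counts come from a source outside this dataset, because none of the variables listed above records population. The two areas and their figures are invented for illustration and are not Los Angeles data.

```python
def rate_per_100k(incidents: int, population: int) -> float:
    """Return incidents per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply before dividing so results that are whole numbers come out exact.
    return incidents * 100_000 / population


# Hypothetical areas: (incident count, population). Not real LA figures.
areas = {
    "Area A": (1_000, 50_000),
    "Area B": (3_000, 300_000),
}

for name, (incidents, population) in areas.items():
    rate = rate_per_100k(incidents, population)
    print(f"{name}: {incidents} incidents, {rate:.0f} per 100k")
# Area A: 1000 incidents, 2000 per 100k
# Area B: 3000 incidents, 1000 per 100k
```

In this made-up example, the area with the larger raw total has the lower rate. This is the kind of gap between totals and perceived danger that the hypothesis describes.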
For this project, we applied the deconstructive method to analyze the dataset. This method allowed us to identify which factors were present and, especially, which notable ones were absent. We consulted the website hosting the dataset, along with handbooks for interpreting the jargon in the entries, and used these to make sense of the data. Instead of understanding some of the basic variables, we were able to
To understand how this dataset works, we took smaller samples of the data and graphed them so that we could compare and contrast them. Using the entire dataset was not possible with the resources we had available; with over one million entries, it would have taken too long to process. We discovered that the data missing from many entries skewed the results of our graphs. For example, when we attempted to find crime rates for victim age groups, the "unspecified age" group made up a large majority. It became apparent that this data is difficult to use for finding trends in crime.
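The sampling and age-group breakdown described above can be sketched with Python's standard library. This is one possible approach and not necessarily how our graphs were produced. `reservoir_sample` draws a uniform random sample in a single pass (the reservoir-sampling technique known as Algorithm R), so the full file never has to sit in memory. `age_group` sorts values from the `Vict Age` column into buckets. The age bands are assumptions, and so is the choice to treat blank, non-numeric, or zero ages as unspecified. The function names and the tiny inline CSV are hypothetical.

```python
import csv
import io
import random
from collections import Counter


def reservoir_sample(rows, k, seed=None):
    """Uniformly sample up to k rows from an iterable in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = row  # keep row i with probability k / (i + 1)
    return sample


def age_group(raw_age):
    """Map a 'Vict Age' value to a bucket.

    Blank, non-numeric, and non-positive ages are treated as unspecified
    (an assumption, not a documented rule of the dataset).
    """
    try:
        age = int(raw_age)
    except (TypeError, ValueError):
        return "unspecified"
    if age <= 0:
        return "unspecified"
    if age < 18:
        return "under 18"
    if age < 35:
        return "18-34"
    if age < 55:
        return "35-54"
    return "55+"


def age_group_counts(rows):
    """Count sampled rows per age bucket."""
    return Counter(age_group(row.get("Vict Age")) for row in rows)


# Tiny made-up sample standing in for the real .csv file.
demo_csv = io.StringIO(
    "DR_NO,Vict Age\n"
    "1,34\n"
    "2,0\n"
    "3,\n"
    "4,61\n"
)
rows = list(csv.DictReader(demo_csv))
print(age_group_counts(reservoir_sample(rows, k=4, seed=1)))
```

On the real file, you could stream rows with `with open(path, newline="") as f:` and pass `csv.DictReader(f)` straight to `reservoir_sample`. Here `path` is a placeholder for wherever the .csv is stored. A large "unspecified" count in the output is the same effect we saw in our age-group graphs.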
Although each entry has several variables and the dataset contains a large number of entries, it is difficult to draw conclusions about crime rates from this data for several reasons. Many entries are missing crucial variables; those fields are left out of the entry entirely. These gaps make finding similarities between incidents much more complicated. The number of variables for each entry appears extensive enough to support conclusions, but we found that this is not the case: the data does not provide enough context to make definite inferences. The circumstances of each incident were so different from one another that it was challenging to determine whether reported crimes showed strong trends indicating that a particular demographic is at higher risk than others.
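A simple first check for how often crucial variables are missing is a per-column count of blank fields. This minimal sketch assumes that a blank cell means the value was not recorded. The optional `placeholders` argument also counts codes such as `X` (Unknown) in `Vict Sex` as missing, which is a judgment call. The function name and sample rows are hypothetical.

```python
import csv
import io


def missing_share(rows, columns, placeholders=None):
    """Return the fraction of rows in which each column is missing.

    A value counts as missing if it is blank, absent from the row, or
    listed for that column in `placeholders` (e.g. {"Vict Sex": {"X"}}).
    """
    placeholders = placeholders or {}
    missing = {col: 0 for col in columns}
    total = 0
    for row in rows:
        total += 1
        for col in columns:
            value = (row.get(col) or "").strip()
            if value == "" or value in placeholders.get(col, ()):
                missing[col] += 1
    if total == 0:
        return {col: 0.0 for col in columns}
    return {col: missing[col] / total for col in columns}


# Tiny made-up sample standing in for the real .csv file.
demo_csv = io.StringIO(
    "Vict Sex,Vict Descent,Premis Cd\n"
    "M,B,101\n"
    ",,101\n"
    "F,,502\n"
    "X,H,101\n"
)
rows = list(csv.DictReader(demo_csv))
result = missing_share(
    rows,
    ["Vict Sex", "Vict Descent", "Premis Cd"],
    placeholders={"Vict Sex": {"X"}},
)
print(result)
```

A high share for a column means that any chart built from it is dominated by missing values, as happened with victim age.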
Alyssa Dubois, Justin Sivard, Shayley Stirtz, Robin Tabor
This presentation is about the LA Crime Data from 2020 to Present dataset. We ask whether this data can be used to draw sound conclusions about crime in Los Angeles.