Difference between revisions of "Group 2 Overview"

From Visual Analytics and Applications

Revision as of 22:04, 24 September 2017

MOTIVATION

Crime is prevalent in any society. In Singapore, the Singapore Police Force[1] provides bi-annual and annual updates on crime statistics, whilst the Ministry of Home Affairs[2][3][4] provides annual statistics on overall crime cases, crime rates, major offences and victim profiles up to the year 2015. Based on this data, media sources such as The Straits Times[5], Channel NewsAsia[6] and Home Team[7] publish visualizations that provide a year-on-year comparison of crime rates by type of crime. In comparison, visualizations of crime statistics for the United States of America fare better, providing more information such as year-on-year comparisons of crime rate by type and state[8][9][10]. Nonetheless, these crime visualizations only provide an overview of crimes and are not interactive, so the user cannot obtain more details on the crimes such as neighborhoods, victim profiles and time. There are also no predictive visualization tools currently available for everyday use by members of the public and public agencies.

As such, this project serves the dual purpose of (i) improving upon current publicly available visualizations, and (ii) illustrating the benefits of releasing such data, in a bid to act as a driving force for similar action in Singapore. Such a move would be a great step forward for Singapore's Smart Nation Initiative, especially in further enhancing the law enforcement capabilities of our public agencies.

OBJECTIVES AND POTENTIAL BENEFITS TO USERS

Our analysis attempts to fill the gaps highlighted above through an exploratory model and a predictive model, with the following objectives:

(1) To provide a visualization platform for user-interactive data exploration of crime statistics. The dashboard will incorporate functionalities that allow for exploratory analysis of the dataset based on the variables of time, date (day, week, month or year), type of weapon, type of crime, victim profile and geographical location. The visualizations will allow users to make their own comparisons of crime, such as across locations and periods of time.

(2) To develop a predictive model for crime patterns, visualized on a geographical map for route planning. The model will predict the likelihood of a certain crime occurring on a certain day, at a certain time and geographical location, and affecting a particular victim profile.

The two models may be useful to both the public and law enforcement agencies as follows:

  • Members of the public: The public may use the information to make decisions such as the choice of schools and the purchase of homes in certain neighborhoods. Based on the predictive model, travelers can also plan their journeys to avoid routes that have a higher likelihood of certain crimes occurring at a particular time and affecting people who match certain victim profiles.
  • Law enforcement agencies: The models will facilitate more time- and location-appropriate deployment of patrol officers, in sufficient numbers and with suitable skills, allowing them to respond better to incidents. Using such information to plan manpower and patrol routes in particular not only allows swifter response times, but may also serve as a deterrent to potential crimes.

THE DATASET

The dataset provided by the Los Angeles Police Department (LAPD) comprises 1.59 million crime incidents in Los Angeles, California from year 2010 until 19 September 2017. The dataset is updated on a weekly basis and consists of 26 variables. The exploratory section of the project will involve all records in the dataset. For the purpose of prediction, the dataset will be split into a training and validation portion (incorporating records from year 2010 till mid-September 2016) and a testing portion (non-overlapping records from mid-September 2016 to mid-September 2017).
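The chronological split described above can be sketched in base R as follows. This is a minimal illustration on toy data, not the project's actual code: the column name `date_occurred` is hypothetical, and the cutoff of 15 September 2016 is an assumed reading of "mid-September 2016", since no exact day is given.

```r
# Toy stand-in for the LAPD data; the column name `date_occurred` is hypothetical.
crimes <- data.frame(
  date_occurred = c("03/14/2012", "09/01/2016", "09/20/2016", "06/30/2017"),
  crime_code    = c(510, 624, 330, 440),
  stringsAsFactors = FALSE
)

# Dates arrive as MM/DD/YYYY strings, so parse them before comparing.
crimes$date_occurred <- as.Date(crimes$date_occurred, format = "%m/%d/%Y")

# Assumed cutoff for "mid-September 2016"; the source gives no exact day.
cutoff <- as.Date("2016-09-15")

# Earlier records feed training and validation; later records are held out.
train_valid <- crimes[crimes$date_occurred <  cutoff, ]
test        <- crimes[crimes$date_occurred >= cutoff, ]
```

Splitting by date rather than at random keeps every test record strictly later than the training records, which mirrors how the model would be used to predict future incidents.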

The 26 variables in the raw dataset have been summarized in the table below:

Variable Name | Description
DR Number | Division of Records Number: Official file number made up of a 2-digit year, area ID, and 5 digits
Date Reported | MM/DD/YYYY
Date Occurred | MM/DD/YYYY
Time Occurred | In 24-hour military time.
Area ID | The LAPD has 21 Community Police Stations referred to as Geographic Areas within the department. These Geographic Areas are sequentially numbered from 1-21.
Area Name | The 21 Geographic Areas or Patrol Divisions are also given a name designation that references a landmark or the surrounding community that it is responsible for.
Reporting District | A four-digit code that represents a sub-area within a Geographic Area. All crime records reference the "RD" that it occurred in for statistical comparisons.
Crime Code | Indicates the crime committed. (Same as Crime Code 1)
Crime Code Description | Defines the Crime Code provided.
MO Codes | Modus Operandi: Activities associated with the suspect in commission of the crime.
Victim Age | Two-character numeric
Victim Sex | F - Female; M - Male; X - Unknown
Victim Descent | Descent Code: A - Other Asian; B - Black; C - Chinese; D - Cambodian; F - Filipino; G - Guamanian; H - Hispanic/Latin/Mexican; I - American Indian/Alaskan Native; J - Japanese; K - Korean; L - Laotian; O - Other; P - Pacific Islander; S - Samoan; U - Hawaiian; V - Vietnamese; W - White; X - Unknown; Z - Asian Indian
Premise Code | The type of structure, vehicle, or location where the crime took place.
Premise Description | Defines the Premise Code provided.
Weapon Used Code | The type of weapon used in the crime.
Weapon Description | Defines the Weapon Used Code provided.
Status Code | Status of the case. (IC is the default)
Status Description | Defines the Status Code provided.
Crime Code 1 | Indicates the crime committed. Crime Code 1 is the primary and most serious one. Crime Code 2, 3, and 4 have decreasing severity. Lower crime class numbers are more serious.
Crime Code 2 | May contain a code for an additional crime, less serious than Crime Code 1.
Crime Code 3 | May contain a code for an additional crime, less serious than Crime Code 1.
Crime Code 4 | May contain a code for an additional crime, less serious than Crime Code 1.
Address | Street address of crime incident rounded to the nearest hundred block to maintain anonymity.
Cross Street | Cross Street of rounded Address.
Location | The location where the crime incident occurred. Actual address is omitted for confidentiality. XY coordinates reflect the nearest 100 block.
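Several of the variables above need light parsing before they can be used for time-based analysis. The base R sketch below shows one possible way to do this on made-up values. The column names, the sample records, the time zone, and the assumption that the area ID inside the DR Number occupies exactly two digits are all illustrative and are not confirmed by the dataset documentation.

```r
# Hypothetical raw values in the formats described in the table.
raw <- data.frame(
  dr_number     = c("100100007", "172112345"),
  date_occurred = c("01/02/2010", "09/19/2017"),
  time_occurred = c(5L, 2330L),   # 0005 and 2330 in 24-hour time
  stringsAsFactors = FALSE
)

# Left-pad the time to four digits so that 5 becomes "0005".
hhmm <- sprintf("%04d", raw$time_occurred)

# Combine date and time into one timestamp (the time zone is an assumption).
raw$occurred_at <- as.POSIXct(
  paste(raw$date_occurred, hhmm),
  format = "%m/%d/%Y %H%M",
  tz = "America/Los_Angeles"
)

# Split the DR Number, assuming a 2-digit year, a 2-digit area ID and 5 digits.
raw$dr_year <- 2000L + as.integer(substr(raw$dr_number, 1, 2))
raw$dr_area <- as.integer(substr(raw$dr_number, 3, 4))

# Hour of day, useful later as a grouping or filtering variable.
raw$hour <- as.integer(format(raw$occurred_at, "%H"))
```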

VISUALISATION DELIVERABLES

The deliverables may be classified into two main parts: an exploratory section and a predictive section.

Exploratory Section

In accordance with Shneiderman’s ‘Overview first, zoom and filter, then details-on-demand’, the exploratory section will be delivered with interactive capabilities, allowing the user to visualize data at various levels and across different dimensions.

Overview based on the 5 exploratory variables

Different angles of visualization will be provided. These could include aggregate plots (bar charts, tree maps, sunburst diagrams) of the variables within a fixed time window (e.g. 12 hours, 24 hours or 1 week), and/or for a specific crime type, and/or at a specific location or street. Additionally, the variation in the frequencies of various crimes and their relevant details may be visualized over time through time-specific slider bars.

Time-based small multiples of geographical map

The use of the small multiples format for the geographical map will allow the user to compare crime occurrence across different time periods (different months in each year). A filtering feature will also be built in to allow filtering based on the type of crime and victim profile the user is interested in, as well as the type of weapon involved in the crime. This will allow the user to observe any trends in the occurrences of crime for each variable included in the filter, as well as any trends across time (i.e. all the months in one year).[11]
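As a rough illustration of the small multiples idea, the sketch below uses ggplot2's `facet_wrap()` to draw one panel per month after applying a crime-type filter. The data are randomly generated, the column names and coordinate ranges are placeholders, and no base map layer is drawn; it is only meant to show the faceting mechanism, not the final dashboard design.

```r
library(ggplot2)

# Synthetic incident locations; coordinates and column names are illustrative only.
set.seed(1)
n <- 240
incidents <- data.frame(
  lon   = runif(n, -118.6, -118.2),
  lat   = runif(n, 33.8, 34.3),
  month = factor(sample(month.abb, n, replace = TRUE), levels = month.abb),
  crime = sample(c("Burglary", "Robbery"), n, replace = TRUE)
)

# Filter step: keep only the crime type the user selected.
selected <- subset(incidents, crime == "Robbery")

# One panel per month; shared axes make the panels directly comparable.
p <- ggplot(selected, aes(x = lon, y = lat)) +
  geom_point(alpha = 0.6) +
  facet_wrap(~ month, ncol = 4, drop = FALSE) +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude", title = "Robbery incidents by month")
```

Printing `p` renders the grid of twelve monthly panels. Setting `drop = FALSE` keeps a panel for every month even if the filter leaves some months empty, so the layout stays stable as filters change.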

Parallel plots and parallel sets

Parallel plots and parallel sets will allow us to observe similarities among criminal occurrences by looking for points of convergence or similar values for each variable. These similarities may be noted for consideration when building our predictive model, as they may point to relationships between the different variables in our dataset.

Predictive Section

The predictive section involves predicting the likelihood of becoming a victim of crime during a planned journey between places in the city at a specific time, using a given route. The results will be visualized on a geographical map for route recommendation. The user can input the latitude and longitude of the start and end locations as well as the start time of the journey. Routes will be recommended based on Google Maps and will indicate the predicted occurrence of certain crimes, victim profiles and weapons used along each route.
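The proposal does not yet specify the predictive model. Purely to make the idea of scoring a route concrete, the sketch below uses a naive historical-frequency baseline: it counts past incidents per (area, hour) cell and sums the shares of the cells a candidate route passes through. The data, the `route_score` helper and the use of Geographic Area IDs as the spatial unit are all hypothetical, and this baseline should not be read as the project's intended method.

```r
# Hypothetical historical incidents: Geographic Area ID and hour of occurrence.
history <- data.frame(
  area = c(1, 1, 1, 2, 2, 3),
  hour = c(22, 22, 9, 22, 9, 9)
)

# Share of all past incidents falling in each (area, hour) cell.
cell_share <- prop.table(table(area = history$area, hour = history$hour))

# Score a candidate route as the summed share of the cells it passes through.
route_score <- function(areas, hour) {
  sum(cell_share[as.character(areas), as.character(hour)])
}

route_a <- c(1, 2)   # areas crossed by one candidate route
route_b <- c(3)      # areas crossed by an alternative route
scores <- c(a = route_score(route_a, 22), b = route_score(route_b, 22))

# The route with the lowest score would be recommended under this baseline.
best <- names(which.min(scores))
```

A real model would also need to handle cells with no history, differing route lengths, and the victim profile and weapon dimensions mentioned above.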

ANALYTICAL AND VISUALISATION PACKAGES

The following is a tentative list of packages* in R that are relevant to the scope of the project:

ggplot2

ggplot2 is a well-established graphical package that provides a more systematic means of plotting graphs by leveraging the grammar of graphics. Given the extensive involvement of visualization, this will be the central package used across the various parts of the project.

ggmap

ggmap, a separate package that builds upon the layering structure established in ggplot2, will give our application the ability to display the underlying map information and to access the Google Maps server. The latter functionality, provided through the route function, lets us obtain travel route options for the dashboard user based on the source and destination defined through the user's inputs. It supports driving, cycling, walking and transit as travel modes, and can return more than one route recommendation.

Shiny

The visualizations will be built as a web application using Shiny, which will allow for user interactivity. The interactive features include sliders, dropdown menus, date range inputs and zoom-ins.
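A minimal sketch of how these interactive features map onto Shiny input widgets is given below. The input IDs, labels and the list of crime types are placeholders, and the server simply echoes the current filter state instead of drawing plots. It is an outline of the wiring, not the project's application.

```r
library(shiny)

# Placeholder choices; a real app would derive these from the dataset.
crime_types <- c("Burglary", "Robbery", "Vehicle theft")

ui <- fluidPage(
  titlePanel("Crime explorer (sketch)"),
  sidebarLayout(
    sidebarPanel(
      selectInput("crime", "Type of crime", choices = crime_types),
      sliderInput("hours", "Hour of day", min = 0, max = 23, value = c(0, 23)),
      dateRangeInput("dates", "Date occurred",
                     start = "2010-01-01", end = "2017-09-19")
    ),
    mainPanel(textOutput("summary"))
  )
)

server <- function(input, output, session) {
  # Echo the current filter state; a real app would redraw plots here.
  output$summary <- renderText({
    paste0(input$crime, ", ", input$hours[1], ":00 to ", input$hours[2],
           ":00, ", input$dates[1], " to ", input$dates[2])
  })
}

app <- shinyApp(ui, server)  # call runApp(app) to launch it
```

Because `renderText()` is reactive, the summary updates whenever any of the three inputs changes; plots built with ggplot2 could be wired in the same way through `renderPlot()` and `plotOutput()`.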

*Note: This list will be updated accordingly as the project progresses, depending on the suitability and extent of use of various packages in practical implementation.

References

[1] https://www.police.gov.sg/news-and-publications/statistics
[2] https://data.gov.sg/dataset/victims-of-selected-major-selected-offences
[3] https://data.gov.sg/dataset/islandwide-cases-recorded-for-selected-major-offences
[4] https://data.gov.sg/dataset/overall-crime-cases-crime-rate
[5] http://www.straitstimes.com/singapore/courts-crime/spike-in-online-scams-but-overall-crime-rate-still-low
[6] http://www.channelnewsasia.com/news/singapore/crime-rate-down-in-2016-but-online-scams-remain-a-concern-7623920
[7] https://www.hometeam.sg/article.aspx?news_sid=20160212zTCp2vhJHNa0
[8] https://ucr.fbi.gov/crime-in-the-u.s/2011/crime-in-the-u.s.-2011/offenses-known-to-law-enforcement/standard-links/region
[9] http://www.ncpc.org/resources/enhancement-assets/charts-and-graphs/uscrimestatistics010708.jpg/view
[10] http://www.huffingtonpost.com/brian-beltz/crime-at-the-top-100-colleges-in-the-us_b_6432864.html
[11] http://www.juiceanalytics.com/writing/better-know-visualization-small-multiples/