Difference between revisions of "Group02 Report"

From Visual Analytics and Applications

Revision as of 22:36, 3 December 2017


Introduction

Environmental criminology studies the relationship between crime (including aspects such as victim characteristics and criminality) and spatial and behavioural factors. As crime data becomes increasingly available to the public, geo-spatial and temporal analysis of crime occurrence is maturing and yielding better insights. This improved understanding could contribute to enhanced law enforcement efforts and even urban management.

In our research, we take a step in this direction by examining how geographic and date-time variables interact with other variables to better understand crime occurrences in the city of Los Angeles (LA). Crime data, together with population by zip code, were obtained from the LA city official data repository for analysis and visualization. The research culminates in an interactive application built on R Shiny that allows a casual user to explore, analyse and model data to derive insights. R was chosen to build the web application because of its rich library of packages for statistical analysis and data visualization. With the data visualizations and intuitive user interface in this application, the user can easily filter and transform crime data to derive the insights he or she requires. Because R is a free software environment for statistical computing and graphics, the application is accessible to many users, which should further encourage the spread of such visual analytics initiatives to other fields.

This paper documents our analytical development efforts for the application and consists of 8 sections. The introduction is followed by the motivation and objectives of this research. Section 3 reviews previous work in the field. Section 4 describes the dataset and its preparation for modelling. Section 5 describes the design framework and visualization methodologies, and Section 6 presents the insights we derived while developing the application. Future work is outlined in Section 7, followed by an installation and user guide in Section 8.

Motivation and Objectives

Governmental agencies in Singapore, such as data.gov and the Ministry of Home Affairs, publish crime data reports on a bi-annual and annual basis that display trends, for instance by crime type and across years. Even so, these data provide only an overview of crimes; they contain no information on crime details (e.g. location, time), victim profiles or possible associations between the different crime variables. Our research aims to incorporate geo-spatial and temporal analytics for better insights on crime occurrence, modelled on the rich data on crime occurrences in Los Angeles, in a way that may be replicated as similar data become available in Singapore.
This research aims to:
(a) Create a user-friendly and interactive visualization platform for data exploration that supports both macro and micro views that can be potentially used by members of the public and law enforcement agencies alike
(b) Provide statistical analysis on crime occurrences with data on population and location spatial area
(c) Build a predictive model of crime occurrence based on geo-spatial temporal data, crime details and victim profiles

Previous Works

Given the large number of variables in the Los Angeles crime dataset, we expected a wide range of existing analyses and visualisations to be available. One example is CrimeMapping.com (https://www.crimemapping.com/Share/dd1a50e5fa4d4da4a41c8989c6ee791d), which plots crimes across the city of LA on a map, with options to filter the input dataset by type of crime, by location and search radius, or by date of occurrence. The map then drops pins for the selected crime types according to the user's input criteria (Figure 1).

Figure 1. Geographical map view of the crime sharing website



The same website also links to plots summarizing the data shown above, in which the crimes displayed on the map are aggregated into a stacked bar chart and a pie chart (Figure 2). Such plots are typically not ideal for representing this kind of data, as they do not allow the reader to deduce quickly and easily which crime type was the most prevalent during the period, which we surmise to be the aim of the plots in this section. Also, while the map with its built-in filters is highly useful, it does not make use of the whole dataset for statistical analysis, but focuses only on the frequency of occurrence. As such, through our project, we aim to build a more holistic view of the crimes occurring in LA by incorporating more features of each crime (such as victim profile and premise description), modelled with selected statistical analysis methodologies and visualized with more effective charts.

Figure 2. Charts available on crime-sharing website



Nolan III (2004) [1] established the relationship between crime rate and population size using crime and population data for the state of California. The author calculated the observed and expected crime rates of each jurisdiction in California, weighted by the population within each jurisdiction, with crime rates expressed as the frequency of crime per 100,000 inhabitants. Separately, there has been extensive research on disease mapping using Empirical Bayes estimates of relative risk (Clayton & Kaldor, 1987 [2]; Leyland & Davies, 2005 [3]). Our research combines the two approaches by computing an Empirical Bayes estimate of the posterior relative risk of crime occurrence in each Los Angeles Police Department (LAPD) reporting district, incorporating the population of each district.
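
To make the idea of a population-weighted relative risk more concrete, the following is a minimal sketch rather than the method actually used in the application. The report does not state which Empirical Bayes estimator was applied, so the sketch assumes a Poisson-gamma model whose prior mean and variance are estimated by the method of moments, which is one common global form. It is written in Python purely for illustration (the application itself is built in R), and the function names and the district counts and populations are hypothetical.

```python
def crime_rate_per_100k(count, population):
    """Crime frequency per 100,000 inhabitants."""
    return count * 100_000 / population


def eb_relative_risk(counts, populations):
    """Return (raw, smoothed) relative risks for each district.

    Assumption (not from the report): counts[i] ~ Poisson(expected[i] * theta[i])
    with a gamma prior on theta[i]; the prior mean and variance are estimated
    from the data by the method of moments.
    """
    # Expected count per district if every district shared the overall rate.
    overall_rate = sum(counts) / sum(populations)
    expected = [overall_rate * n for n in populations]

    # Raw relative risk: observed count divided by expected count.
    raw = [o / e for o, e in zip(counts, expected)]

    # Method-of-moments estimates of the prior mean and variance.
    total_expected = sum(expected)
    prior_mean = sum(counts) / total_expected
    mean_expected = total_expected / len(expected)
    weighted_var = sum(
        e * (r - prior_mean) ** 2 for e, r in zip(expected, raw)
    ) / total_expected
    prior_var = max(weighted_var - prior_mean / mean_expected, 0.0)

    # Posterior mean: shrink each raw risk toward the prior mean.
    # Districts with small expected counts get a small weight (more shrinkage).
    smoothed = []
    for e, r in zip(expected, raw):
        weight = prior_var / (prior_var + prior_mean / e)
        smoothed.append(prior_mean + weight * (r - prior_mean))
    return raw, smoothed


# Hypothetical reporting districts: crime counts and resident populations.
example_counts = [12, 150, 40, 3]
example_populations = [800, 12000, 5000, 100]

raw_risk, smoothed_risk = eb_relative_risk(example_counts, example_populations)
for count, pop, r, s in zip(example_counts, example_populations, raw_risk, smoothed_risk):
    rate = crime_rate_per_100k(count, pop)
    print(f"rate per 100k: {rate:.1f}  raw risk: {r:.3f}  smoothed risk: {s:.3f}")
```

The shrinkage matters most for sparsely populated districts: a handful of crimes in a district of 100 residents produces an extreme raw rate, and the smoothed estimate pulls it toward the city-wide average in proportion to how little data supports it.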

Dataset and Data Preparation

Design Framework and Visualisation Methodologies

Insights Derived

Future Works

Installation and User Guide