Group11 Report

From Visual Analytics and Applications

Abstract

Based on the UN’s Survey of Crime Trends published in 2006, England and Wales have one of the highest crime rates among OECD countries. We have developed CrimeModeler, a geospatial modelling tool to investigate the spatial variation of crime across districts in England and Wales, and the relationship between crime and the socio-economic characteristics of each district. As crime rates in neighbouring areas are commonly correlated, we compare geographically weighted regression (GWR) with a conventional (or global) multiple regression model to see whether GWR gives a better result. We also investigate whether certain variables have an impact on the crime rate in one area but not in another. Local governments may use this information to come up with better policies to tackle crime.

Introduction

Based on UN’s Survey of Crime Trends published in 2006, comparisons were made for OECD countries by CIVITAS Institute for the Study of Civil Society. Comparisons are based on six of the most serious crimes: intentional homicide, rape, robbery, assault causing serious bodily harm, burglary and vehicle theft.

Table 1: Ranking of different countries in UK for crime rate based on different crime types (England refers to both England and Wales)

As shown in Table 1, England and Wales have one of the highest crime rates among OECD countries for rape, robbery, assault and burglary. We therefore aim to investigate the spatial variation of crime across districts in England and Wales, and the relationship between crime and the socio-economic characteristics of each district. Crime mapping and spatial analysis of crime are essential elements of macro-level research in criminology (Block et al. 1995; LaVigne and Wartell 1998; Messner et al. 1998; Weisburd and McEwen 1998; Harries 1999; Anselin et al. 2000). Two important aspects of spatial crime analysis are: (i) identifying areas of high crime concentration or “hot spots” (Messner et al. 1998; Ratcliffe and McCullagh 1999; Anselin et al. 2000; Craglia et al. 2000) and (ii) exploring the relationship between the spatial pattern of crime and socioeconomic characteristics (Bowers and Hirschfield 1999; Ceccato et al. 2002).

England has 326 districts and Wales has 22. On average, the population in each district is about 160,000. There are local authorities overseeing each district and these local authorities are responsible for services such as housing, education, waste management and strategic planning. We feel that it is more meaningful to focus the analysis at the district level as local authorities overseeing the district can use the information to do comparison studies and improve their policies.

We will develop CrimeModeler, a geospatial modelling tool that compares crime rates across districts for different crime types and identifies hot spots where the crime rate is relatively high. The tool will fit a geographically weighted regression (GWR) and compare the results against a conventional (or global) multiple regression model. Lastly, for each independent variable (socioeconomic characteristic), the tool will allow us to compare its significance level across the districts.

Data

Crime Data

The crime data is obtained from data.police.uk. The raw data is individual crime data with details such as the longitude, latitude, Lower Layer Super Output Area (LSOA) code and name, and crime type. The time frame is from October 2014 to September 2017 (3 years). Crime data for England, Wales and Northern Ireland are available. However, the area code and name for Northern Ireland are not provided so we have excluded Northern Ireland from our analysis.

For the purpose of our investigation, we use data for 2016 (1 year) as we would like to calculate the crime rate per year. The LSOA is a very low-level area and there are nearly 35,000 LSOAs in England and Wales. As analysis at the LSOA level does not serve much purpose, we map each LSOA to the district that it is in. We then count the number of crimes (for each crime type) in each district. To obtain the crime rate, we divide the number of crimes in the district by the population of the district and multiply by 100,000. The crime rates shown subsequently are therefore per 100,000 persons.
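
The aggregation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the LSOA-to-district lookup and all figures are made up, and the authors' own processing code is not shown in this report.

```python
from collections import Counter

def district_crime_rates(crime_lsoas, lsoa_to_district, population):
    """Return crimes per 100,000 residents for each district."""
    # Map every crime record to its district and count records per district.
    counts = Counter(
        lsoa_to_district[lsoa]
        for lsoa in crime_lsoas
        if lsoa in lsoa_to_district  # skip records without a usable LSOA code
    )
    return {
        district: counts.get(district, 0) * 100_000 / pop
        for district, pop in population.items()
    }

# Invented toy data, not figures from the report.
lsoa_to_district = {"LSOA_A": "District X", "LSOA_B": "District X", "LSOA_C": "District Y"}
population = {"District X": 200_000, "District Y": 50_000}
crimes = ["LSOA_A", "LSOA_A", "LSOA_B", "LSOA_C", None]  # one LSOA code per crime record

rates = district_crime_rates(crimes, lsoa_to_district, population)
print(rates)
```

In this toy case District X has three crimes among 200,000 residents and District Y has one crime among 50,000 residents, so the smaller district ends up with the higher rate.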

Socioeconomic Data

Socioeconomic data is obtained from the Office for National Statistics. The data is provided at the district level, making it suitable for use in our analysis. We have used data for 2016 to tie in with the crime data. Some of the available data are population by gender and age, internal/international migration inflow and outflow, residential property sales by type of property, school data (number of students, number of primary and secondary schools, number of students eligible for free school meals, number of students by ethnic group, number of students by first language etc.), personal well-being rating (life satisfaction, worthwhile, happiness and anxiety) and household activity (working, mixed or workless).

Table 2: Six kernel functions; wij is the j-th element of the diagonal of the matrix of geographical weights W(ui; vi), and dij is the distance between observations i and j, and b is the bandwidth

Figure 1: Plot of the six kernel functions, with the bandwidth b = 1000, and where w is the weight, and d is the distance between two observations.

For discontinuous functions, bandwidths can be specified as a fixed distance or as a fixed number of local data (i.e., an adaptive distance). For continuous functions, bandwidths can be specified either as a fixed distance or as a “fixed quantity that reflects local sample size” (i.e., still an “adaptive” distance, but the actual local sample size is the full sample size, because a continuous kernel never drops to zero). A fixed bandwidth is suitable for fairly regular sample configurations, while an adaptive bandwidth is suitable for highly irregular sample configurations. Adaptive bandwidths ensure sufficient (and constant) local information for each local calibration of a given GW model (Gollini et al. 2015).

In this analysis, a Gaussian function is selected as the kernel (the weight at the regression point is unity and it decreases with increasing distance from the regression point), and the bandwidth is chosen to minimise the corrected Akaike Information Criterion (AICc). After running the global regression model and the GWR, we compare the adjusted R squared and AICc to determine whether the GWR model provides a better fit than the global OLS model.
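
To make the kernel and the two bandwidth types concrete, here is a minimal Python sketch. It uses the Gaussian form w = exp(-0.5 (d/b)^2) given in Gollini et al. (2015). The helper names and the distances are hypothetical, and treating the regression point itself as the first "neighbour" in the adaptive case is an assumption of this sketch.

```python
import math

def gaussian_weight(distance, bandwidth):
    """Gaussian kernel: weight is 1 at the regression point and decays with distance."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def adaptive_bandwidth(distances, k):
    """Adaptive bandwidth: the distance to the k-th nearest observation."""
    return sorted(distances)[k - 1]

# Hypothetical distances (in metres) from one regression point to five observations.
distances = [0.0, 500.0, 1000.0, 2000.0, 4000.0]

# Fixed bandwidth: the same b is used at every regression point.
fixed = [gaussian_weight(d, 1000.0) for d in distances]

# Adaptive bandwidth: b changes with the local density of observations.
b_adaptive = adaptive_bandwidth(distances, k=3)
adaptive = [gaussian_weight(d, b_adaptive) for d in distances]

print([round(w, 3) for w in fixed])
print(b_adaptive)
```

Because the Gaussian kernel is continuous, every observation keeps a small positive weight, however distant it is from the regression point.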

Development of CrimeModeler

Variable Selection with Regularised Generalised Linear Model (Gaussian)

Linear regression is a form of generalised linear model under the constraint of a Gaussian error distribution. The R package “glmnet” is used to perform the initial linear regression because it supports both cross-validation and regularisation. The initial global modelling is performed with a 10-fold cross-validated lasso penalty, which has the effect of variable selection while minimising both the mean squared error and model complexity. If no independent variable is returned after this process, another round of modelling is performed with a ridge penalty.
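
The reason a lasso penalty performs variable selection can be seen in a small coordinate-descent sketch. This is not the glmnet implementation and it omits the 10-fold cross-validation used to pick the penalty. It is a pure-Python illustration on invented, centred data, with a penalty value chosen by hand.

```python
def soft_threshold(value, penalty):
    """Shrink towards zero; anything within +/- penalty becomes exactly zero."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(X, y, penalty, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 (columns assumed centred)."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlate predictor j with the residual that ignores predictor j.
            rho = 0.0
            for i in range(n):
                fitted_without_j = sum(X[i][k] * coef[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fitted_without_j)
            scale = sum(X[i][j] ** 2 for i in range(n))
            coef[j] = soft_threshold(rho / n, penalty) / (scale / n)
    return coef

# Invented data: y depends strongly on the first column and only weakly on the second.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [3.1, -2.9, 2.9, -3.1]

coef = lasso_coordinate_descent(X, y, penalty=0.5)
print(coef)
```

The weak predictor's coefficient is set exactly to zero, which is how the lasso drops variables, while the strong predictor is kept but shrunk. A ridge penalty shrinks coefficients without zeroing them, so it would keep every variable in the fallback case described above.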

When the user subsequently adds or removes an independent variable, the multiple linear regression is refitted using the built-in R function “lm”.

Figure 2: Variable selection process.

Geographically Weighted Regression (GWR)

The R package “GWmodel” provides all the necessary functions to perform the GWR modelling as well as to calibrate the bandwidth. In addition, “GWmodel” can generate geographically weighted summary statistics that are localised to each individual region.
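
As an illustration of what a GWR calibration does at each regression point, the following Python sketch fits a weighted least-squares line with Gaussian weights centred on one location at a time. It is a simplified, single-predictor sketch on invented data, not the GWmodel code. The function names and the bandwidth of 1.5 are assumptions.

```python
import math

def gaussian_weight(distance, bandwidth):
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares for y = a + b*x, with weights centred on `target`."""
    w = [gaussian_weight(math.dist(target, c), bandwidth) for c in coords]
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Ten invented locations along a line; the true slope is 1 in the west and 3 in the east.
coords = [(float(i), 0.0) for i in range(10)]
x = [float(i % 5) for i in range(10)]
y = [xi * (1.0 if i < 5 else 3.0) for i, xi in enumerate(x)]

local_slopes = [local_fit(c, coords, x, y, bandwidth=1.5)[1] for c in coords]
# A very large bandwidth gives every observation the same weight, i.e. a global fit.
global_slope = local_fit(coords[0], coords, x, y, bandwidth=1e9)[1]

print([round(s, 2) for s in local_slopes])
print(round(global_slope, 2))
```

The global fit reports a single averaged slope, whereas the local fits recover a slope close to 1 at the western end and close to 3 at the eastern end. This is the kind of spatial variation in coefficients that GWR is designed to expose.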

Moran Test

The Moran test measures the amount of spatial autocorrelation in the data. Although “GWmodel” provides many GWR features, it does not include the Moran test, so the R package “spdep” is used instead.
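
For readers unfamiliar with the statistic, Moran's I can be computed directly from its definition, I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z are deviations from the mean and S0 is the sum of all weights. The Python sketch below computes the statistic only and does not run the significance test. The binary contiguity weights and the values are invented for illustration.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all spatial weights
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / sum(zi ** 2 for zi in z)

# Four invented regions in a row; each region neighbours the one next to it.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

trend = morans_i([1.0, 2.0, 3.0, 4.0], chain)            # similar values sit together
checkerboard = morans_i([1.0, -1.0, 1.0, -1.0], chain)   # neighbours are opposites

print(trend, checkerboard)
```

A positive value indicates that neighbouring regions tend to have similar values and a negative value indicates that they tend to differ.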

R Web Services

Plumber is selected as the R web service provider because it can turn an R function into a web API with only one additional line of decorator code above the function. Exposing functions directly as web APIs is an important consideration because the design of the application requires fine control over what is transmitted, in order to give a more responsive and interactive user experience. For example, spatial polygons are sent to the client only once, via a static GeoJSON file. To achieve this, spatial polygon information is deliberately removed from all modelling results before they are returned to the client as an HTTP response.
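
The idea of sending geometry once and attribute data on every request can be sketched as follows. The actual service is written in R with Plumber. This Python sketch only illustrates the payload design, and the field names, district codes and values are hypothetical.

```python
import json

def strip_geometry(features, id_field="district_code"):
    """Return model results keyed by district, without any polygon geometry."""
    results = {}
    for feature in features:
        attributes = dict(feature["properties"])  # copy so the input is untouched
        district = attributes.pop(id_field)
        results[district] = attributes
    return results

# Toy GeoJSON-like features; codes and values are invented for illustration.
features = [
    {"type": "Feature",
     "geometry": {"type": "Polygon", "coordinates": [[[0, 0], [0, 1], [1, 1], [0, 0]]]},
     "properties": {"district_code": "D001", "local_r2": 0.71}},
    {"type": "Feature",
     "geometry": {"type": "Polygon", "coordinates": [[[1, 0], [1, 1], [2, 1], [1, 0]]]},
     "properties": {"district_code": "D002", "local_r2": 0.52}},
]

payload = json.dumps(strip_geometry(features))
print(payload)
```

The client can then join each response to the polygons it already holds by using the district identifier, so repeated model runs transfer only a small attribute table.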

Client Design

The design of the client application consists of the communication layer, the map layer and the presentation layer.

Figure 3: Client-side application design

The communication layer provides the web API services that the other layers use to communicate with the server. It also downloads the GeoJSON file once and shares it across all other layers to prevent duplicate downloads. Lastly, it provides broadcasting services that instruct the other layers to perform specific tasks when an event occurs.

The map layer is the background of the user interface. It controls the zoom level, the scrolling and the tiling of the map. The application uses the Google Maps API to implement this layer.

Sitting on top of all the layers is the presentation layer that provides the user interface controls and draws the choropleth map on top of the map layer. It is the main layer that the user interacts with.

Front End Stack

The front-end JavaScript and CSS packages used are AngularJS, the Google Maps API and Bootstrap.

The most important of these is AngularJS, because it provides the interactivity and control of the entire web application. AngularJS is an open-source front-end web application platform maintained by Google. It provides functionality such as data binding, shared services, event broadcasting and directives. Used together, these features allow a modular approach to interface design and keep the layers independent of one another.

Spatial Pattern of Crime in England & Wales

To allow a visual examination of the spatial patterns of crime in England and Wales, Fig. 4 presents two choropleth maps showing crime rate across districts in England and Wales using raw crime rate and GW-mean crime rate (shown by quartiles). The former shows the “hot spots” for crime, indicated by districts coloured in dark blue. The latter shows a more prevalent pattern of spatial clustering as districts belonging to the same quartile are grouped together. A bandwidth of 20 is used.
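
The smoothing effect of a GW mean, and the quartile classification used for the maps, can be illustrated with a short Python sketch. The data are invented and a fixed Gaussian bandwidth of 1.0 is used purely for illustration. It is not the bandwidth of 20 quoted above, and the report does not state whether that bandwidth is a fixed distance or an adaptive neighbour count.

```python
import math
import statistics

def gw_mean(target, coords, values, bandwidth):
    """Geographically weighted mean: a kernel-weighted average around `target`."""
    weights = [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2) for c in coords]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def quartile_classes(values):
    """Assign each value to quartile 1 (lowest) to 4 (highest)."""
    cuts = statistics.quantiles(values, n=4)  # three cut points
    return [1 + sum(v > c for c in cuts) for v in values]

# Eight invented districts along a line, with one isolated high value at the end.
coords = [(float(i), 0.0) for i in range(8)]
raw = [10.0, 12.0, 11.0, 13.0, 50.0, 52.0, 51.0, 90.0]

smoothed = [gw_mean(c, coords, raw, bandwidth=1.0) for c in coords]

print(quartile_classes(raw))
print([round(v, 1) for v in smoothed])
```

Each smoothed value borrows information from nearby districts, so extreme raw values are pulled towards their neighbours. This is why a GW-mean map tends to show broader clusters than a map of raw rates.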

Figure 4: Choropleth maps illustrating the crime rate in England and Wales, by raw crime rate (left) and GW-mean crime rate (right), shown by quartiles

Geographically Weighted Regression

We conduct the GWR analysis for “all crimes”; that is, we look at the overall crime rate regardless of the type of crime. The correlogram for our selected independent variables is shown in Table 3. The strongest pairwise correlation is -0.60, which suggests that multicollinearity is not a major issue among our selected variables.
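
A multicollinearity screen of this kind amounts to finding the pair of independent variables with the largest absolute Pearson correlation. The Python sketch below shows one way to do that. The variable names echo those in this report but the values are invented, and the 0.8 cutoff is only a commonly used rule of thumb adopted here as an assumption.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

def strongest_pair(variables):
    """Return (name_1, name_2, r) for the pair with the largest absolute correlation."""
    pairs = [
        (n1, n2, pearson(variables[n1], variables[n2]))
        for n1, n2 in combinations(variables, 2)
    ]
    return max(pairs, key=lambda pair: abs(pair[2]))

# Invented district-level values, for illustration only.
variables = {
    "workless_pct": [10.0, 12.0, 15.0, 20.0, 25.0],
    "working_pct": [60.0, 58.0, 52.0, 50.0, 45.0],
    "male_pct": [49.0, 50.0, 48.0, 51.0, 49.0],
}

name_1, name_2, r = strongest_pair(variables)
flagged = abs(r) > 0.8  # assumed rule-of-thumb cutoff
print(name_1, name_2, round(r, 2), flagged)
```

In this toy data the two household variables are almost perfectly negatively correlated and would be flagged, whereas a strongest correlation of -0.60, as in Table 3, would pass the same check.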

Table 3: Correlogram of independent variables

The fit statistics for both global and GW models are shown in Table 4.

Table 4: Fit statistics from global and GW models

For “all crimes”, the adjusted R squared rises from 0.508 for the global regression model to 0.663 for the GWR model. The AICc is also lower for the GWR model (6,544) than for the global regression model (6,637). This suggests that the relationships are not spatially stationary and that using GWR improves the explanatory power of the model.

The coefficients and p-values of the independent variables from the global regression model are shown in Table 5.

Table 5: Parameter estimates of independent variables from global regression model

The GW model gives us localised parameter estimates and regression diagnostics. The localised R squared for each district is shown in Figure 5.

Figure 5: Localised R squared for each district

The local R squared is high for a cluster of districts in southeast England (highest at 0.93) and low for a cluster of districts in central England (lowest at 0.44).

As shown in Figure 6, the independent variable Workless Households Percent is not significant at the 10% level for many districts in northwest England and some parts of southwest England (shown in dark blue). This suggests that, in these districts, reducing the number of workless households may not be associated with a lower crime rate. The independent variable Working Households Percent is not significant at the 10% level for districts across the whole of northern England and parts of southwest England.

Figure 6: Significance level of independent variables Workless Households Percent (left) and Working Households Percent (right)

Comparing independent variables across crime types at the district level, we see that a given independent variable can be significant for one crime type but not for another. In Figure 7, the independent variable Male Population Percent is not significant at the 10% level in more parts of England for robbery than for violence and sexual offences (non-significant districts are shown in dark blue).


Figure 7: Significance level of independent variable Male Population Percent for robbery (left) and violence and sexual offences (right)

Future Work

A limitation that we encounter is the lack of socioeconomic data for the districts in England and Wales, which results in a fairly low R squared. In the school data from the Office for National Statistics, values are missing for about half of the districts. The missing values prevent us from using independent variables such as the proportion of students eligible for free school meals and the proportion of students by ethnic group, which serve as proxies for poverty level and minority population respectively.

Future work could be done in obtaining data for other socioeconomic variables to improve the explanatory power of the model. Independent variables such as poverty, single parent families, minority population, rental homes, residential tenure, education level, household income, unemployment rate and government transfer payments are likely to have an impact on the crime rate.

Conclusion

In spatial analysis, it is common for the dependent variable in one region to be affected not only by that region's own characteristics, but also by the characteristics of neighbouring regions. In such cases, a geographically weighted model can fit the data better than a global regression model. This is the case for the crime rate in England and Wales.

We also note that in England and Wales, an independent variable may be significant in one district but not in another. Policymakers can use this knowledge to design targeted policies to tackle crime in each district rather than adopting a one-size-fits-all policy.

Acknowledgements

The authors would like to thank Prof Kam Tin Seong for his insightful comments and suggestions, which have helped us greatly in our research.

References

Anselin, L., J. Cohen, D. Cook, W. Gorr, and G. Tita. 2000. Spatial analyses of crime. Criminal Justice 4: 213-262.

Block, C. R., M. Dabdoub, and S. Fregly. 1995. Crime Analysis Through Computer Mapping. Washington, D.C. Police Executive Research Forum.

Bowers, K. and A. Hirschfield. 1999. Exploring links between crime and disadvantage in North-West England: An analysis using Geographical Information Systems. International Journal of Geographical Information Science 13: 159-184.

Brunsdon, C., A. S. Fotheringham, and M. E. Charlton. 1996. Geographically weighted regression: A method for exploring spatial nonstationarity. Geographical Analysis 28: 281-298.

Ceccato, V., R. Haining, and P. Signoretta. 2002. Exploring Offence Statistics In Stockholm City Using Spatial Analysis Tools. Annals of the Association of American Geographers 92: 29-51.

Craglia, M., R. Haining, and P. Wiles. 2000. Comparative Evaluation of Approaches to Crime Pattern Analysis. Urban Studies 37(4): 711-729.

Fotheringham, A. S., M. E. Charlton, and C. Brunsdon. 2002. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. London, John Wiley & Sons, Ltd.

Gollini, I., B. Lu, M. Charlton, C. Brunsdon, and P. Harris. 2015. GWmodel: An R Package for Exploring Spatial Heterogeneity Using Geographically Weighted Models. Journal of Statistical Software 63(17): 1-50.

Harries, K. 1999. Mapping Crime: Principle and Practice, Washington DC, Crime Mapping Research Center, National Institute of Justice.

LaVigne, N. and J. Wartell. 1998. Crime Mapping Case Studies: Successes in the Field. Washington, D.C., Police Executive Research Forum.

Malczewski, J., A. Poetz, and L. Iannuzzi. 2004. Spatial Analysis of Residential Burglaries in London, Ontario. The Great Lakes Geographer 11(1): 15-27.

Messner S. F., L. Anselin, D. F. Hawkins, G. Deane, S. E. Tolnay, and R. D. Baller. 1998. An Atlas of the Spatial Patterning of County-Level Homicide, 1960-1990, The 50th Annual Meeting of the American Society of Criminology, November 11-14, 1998, Washington, D.C.

Ratcliffe, J. H., and M. J. McCullagh. 1999. Hotbeds of crime and the search for spatial accuracy. Journal of Geographical Systems 1: 385-398.

Weisburd D. and T. McEwen. 1998. Crime Mapping and Crime Prevention, Monsey, New York: Criminal Justice Press.