
Benchmark - Recommendation for Crime Reduction

Open Posted By: highheaven1 Date: 15/09/2020 High School Coursework Writing

 

Using the data from the Topic 7 assignment, conduct data analysis and create a crime reduction response of 1,250-1,500 words, to reduce crime near the police stations in the Lake Mendota area.

Create your Crime Reduction Response by doing the following:

  1. Formally state the patterns and findings from the crimes you analyzed in Topic 7.
  2. Are there any correlations between the crimes happening near the three police stations? For example, consider the time of day, day of the week, types of locations, etc.
  3. Explain any insights on what might be causing these crimes to occur in these areas.
  4. State any recommendations on how theft can be reduced in these geographic areas based on the crime patterns that you uncovered. Explain how the recommendations that you make are informed by the data analysis.
  5. Identify other agencies (human services, etc.) and their role in helping crime reduction.
  6. Suggest ways that the three departments can share resources to reduce theft near their stations.

Be sure to cite three to five relevant scholarly sources in support of your content. Use only sources found at government websites, or those provided in Topic Materials.

Category: Business & Management Subjects: Business Law Deadline: 12 Hours Budget: $150 - $300 Pages: 3-6 Pages (Medium Assignment)

Attachment 1

Computers, Environment and Urban Systems 39 (2013) 93–106

Contents lists available at SciVerse ScienceDirect

Computers, Environment and Urban Systems

journal homepage: www.elsevier.com/locate/compenvurbsys

Understanding the spatial distribution of crime based on its related variables using geospatial discriminative patterns

0198-9715/$ - see front matter © 2013 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.compenvurbsys.2013.01.008

⇑ Corresponding author. E-mail addresses: [email protected] (W. Ding), [email protected] (P. Chen).

Dawei Wang a, Wei Ding a,⇑, Henry Lo a, Melissa Morabito b, Ping Chen e, Josue Salazar c, Tomasz Stepinski d

a Department of Computer Science, University of Massachusetts Boston, United States
b Department of Criminology and Criminal Justice, University of Massachusetts Lowell, United States
c Department of Computer Science, Rice University, United States
d Department of Geography, University of Cincinnati, United States
e Computer and Mathematical Sciences Department, University of Houston-Downtown, Texas, United States

Article info

Article history:
Received 23 April 2012
Received in revised form 25 January 2013
Accepted 31 January 2013
Available online 9 April 2013

Keywords: Crime related variable; Geospatial Discriminative Pattern; Hotspot Optimization Tool; Footprint

Abstract

Crime tends to cluster geographically. This has led to the wide usage of hotspot analysis to identify and visualize crime. Accurately identified crime hotspots can greatly benefit the public by creating accurate threat visualizations, more efficiently allocating police resources, and predicting crime. Yet existing mapping methods usually identify hotspots without considering the underlying correlates of crime. In this study, we introduce a spatial data mining framework to study crime hotspots through their related variables. We use Geospatial Discriminative Patterns (GDPatterns) to capture the significant difference between two classes (hotspots and normal areas) in a geo-spatial dataset. Utilizing GDPatterns, we develop a novel model, the Hotspot Optimization Tool (HOT), to improve the identification of crime hotspots. Finally, based on a similarity measure, we group GDPattern clusters and visualize the distribution and characteristics of crime related variables. We evaluate our approach using a real world dataset collected from a northeast city in the United States.

© 2013 Elsevier Ltd. All rights reserved.

1. Introduction

Crime is understood to be related to the interaction of victims and offenders, and to the strength of guardianship (Cornish & Clarke, 1986). In practice, these concepts can be measured using a variety of socio-economic and crime opportunity variables, such as population density, economic investment, and arrest rate.

Geographical studies reveal that crime is often concentrated in clusters, which in the literature are called hotspots. Hotspot mapping techniques for crimes draw continuous attention from researchers and public safety agencies. This is because accurately identified and clearly visualized crime hotspots, and understanding their relation to underlying crime related variables, can significantly benefit crime analysis and police practices by providing a solid basis for threat visualization, police resource allocation, and crime prediction.

Existing hotspot mapping methods can be essentially divided into three main categories: point mapping, choropleth mapping, and kernel density estimation (KDE) (Eck, Chainey, Cameron, Leitner, & Wilson, 2005; Williamson, McGuire, Ross, Mollenkopf, & Goldsmith, 2001; Boba, 2005). Usually, these methods aggregate the density of a target crime, which results in a net loss of information (Van Patten, McKeldin-Coner, & Cox, 2009). For example, in choropleth mapping, incident-level data is first aggregated into arbitrary administrative or political boundary areas. During this step, spatial details within and across the thematic areas are lost. Second, when hotspots are generated based on aggregated data, there is a further decline of precision in the resulting map. Because traditional methods mainly rely on target crime density, particular areas with relatively less crime may be left out of hotspots, even though crime related variables indicate they are under similar risks as those hotspots.

A reasonable way to reduce this accuracy and precision loss in choropleth mapping is to use more related information in the mapping process. Crime related variables can be aggregated and used along with target crime data in the hotspot identification process. Information carried by these variables can provide clues on whether the relatively high crime rate in a certain area happens by chance. Compared to traditional methods, the utilization of related information in hotspot mapping can reduce information loss during analysis.

Additionally, such an approach can benefit further analysis on the characteristics of crime related variables. Instead of just evaluating crime by itself, recent studies also integrate crime related data into a unified framework that assists the analysis and exploration of crime hotspots (Maciejewski et al., 2010). Using related variables in hotspot mapping can additionally benefit such visualization and analysis processes by providing an intuitive linkage between target crime and its related data.

In this paper, we present a framework that uses spatial data mining concepts to map hotspots and investigate the relationship between socio-economic and criminal variables. Recently, spatial data mining has emerged as an active research area in studies of spatial relationships that try to answer questions like “why” and “where” (Ester, Kriegel, & Sander, 1997; Mu, Ding, Morabito, & Tao, 2011). It has been proven to be very powerful in identifying the linkage between target objects and their related factors. The components of our method are shown in Fig. 1. In particular, we:

• Introduce a spatial data mining concept, Geospatial Discriminative Patterns (GDPatterns), to study the relationship between target crime hotspots and their underlying related variables.
• Introduce a model, Hotspot Optimization Tool (HOT), to identify crime hotspots through their related variables.
• Use a similarity based method to cluster the crime related variables that contribute to hotspots into groups.
• Visualize the locations of those clusters in a rational way to assist domain scientists in further analysis, using the footprints of GDPatterns.

Fig. 1. The framework of our methods. With the help of GDPatterns, criminal hotspot maps are generated using HOT. By applying a similarity measure method, GDPatterns are clustered and visualized for domain scientists.

Utilizing the proposed framework, a case study is conducted using a 6-year crime dataset from a city in the northeastern United States. We compare our mapping tool with a widely used hotspot evaluation technique, the Gi* statistic (Getis & Ord, 2010), and demonstrate the potential in assisting crime analysis using related variable clusters.

The rest of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces the data representation and formal definition of the research problems. The HOT model and the implementation of the similarity measure are also presented in this section. Section 4 evaluates the proposed framework in a real-world case study. We conclude the paper in Section 5.


2. Related work

In this section we briefly present some literature related to criminology, spatial data mining, and hotspot mapping techniques. Additionally, we give a brief introduction to a choropleth mapping application, the Hotspot Analysis (HSA) tool implemented by Esri ArcGIS (ESRI, 2011).

Occurrence of crime has been linked to a number of different variables. Classic criminology theories, such as Routine Activities Theory (Cohen & Felson, 1979), conclude that three concepts contribute to crime: accessible and attractive targets, a pool of motivated offenders, and lack of guardianship (Brantingham & Brantingham, 1984; Cornish & Clarke, 1986). The concept of “disorder” (Skogan, 1992) explains why adjacent areas of crime hotspots are at higher risk. The probability of arrest or the social penalties for committing crime may be lower in crime hotspots than in other neighborhoods, which leads to the “contagion” of criminal activity in crime hotspots (Ludwig, Duncan, & Hirschfield, 2001; Sah, 1991; Sampson, Raudenbush, & Earls, 1997). Recent work done by Short, Bertozzi, and Brantingham (2010) also discusses how an area is affected by the activity scope of offenders. Criminology theories explain why crime is clustered in particular areas, and why certain victims are selected. They also help in deciding which variables are related to a certain type of crime.

Spatial data mining (Ester et al., 1997) is a knowledge discovery technique for “extraction of implicit knowledge, spatial relations, or other patterns not explicitly stored in spatial databases” (Koperski & Han, 1995). It has been proven to be very powerful and efficient for studying comprehensive relationships in large databases (Miller & Han, 2009; Ester et al., 1997; Qian, He, Chiew, & He, 2012). The GDPattern is an application of integrating spatial association rules (Agrawal et al., 1994; Koperski & Han, 1995) with emerging patterns (Dong & Li, 1999; Herrera, Carmona, González, & del Jesus, 2011; Yu, Ding, Simovici, & Wu, 2012). Applications using association rules have been developed to explore the spatial and temporal relationships among objects using census data (Malerba, Esposito, Lisi, & Appice, 2002). In the work of Mennis (2006) and Mennis and Liu (2005), association rule mining techniques have been used to explore the non-linear relationships among socioeconomic-vegetation variables. In the work of Lin (1998) the author presents a similarity measure method for summarizing large numbers of emerging patterns. Ding, Stepinski, and Salazar (2009) adopt the relative risk ratio as the measure of pattern emergence and use spatial data mining techniques in investigating vegetation remote sensing datasets. In our work GDPatterns are used as a tool to discover the statistically significant difference between target crime hotspots and normal areas spatially, with respect to the underlying related variables.

The Spatial and Temporal Analysis of Crime (STAC) program (Bates, 1987) is one of the earliest and most widely used hotspot mapping applications. Based on point mapping, STAC uses “standard deviational ellipses” to display crime hotspots on a map and does not pre-define any spatial boundaries. But some studies (Eck et al., 2005) show that STAC may be misleading because hotspots do not naturally follow the shape of ellipses. Another popular hotspot representation method is choropleth mapping, in which boundary areas (geographic boundaries like census blocks or uniform grids) are used as the basic mapping elements (Hirschfield, 2001). Unlike point mapping, choropleth mapping uses aggregate data, which removes spatial details within the thematic areas. Also, identified hotspots are restricted to the shape of these areas. The method of Kernel Density Estimation (KDE) (Wand & Jones, 1995) aggregates point data inside a user-specified search radius and generates a continuous surface representing the density of points. It overcomes the limitation of geometric shapes but still lacks statistical robustness that can be validated in the produced map. Reviews and comparative studies for the three methods have been done in the work of Chainey, Tompson, and Uhlig (2008), in which the authors introduce a “prediction accuracy index” to evaluate the accuracy of the different methods in the context of predicting where crime may occur.

Esri ArcGIS (ESRI, 2011) is the most widely used Geographic Information System (GIS) and its newest component, ArcMap 10.1, includes a Hotspot Analysis (HSA) toolbox, which implements the Gi* statistic (Getis & Ord, 2010) and gives users the ability to analyze the hotspots existing in the input spatial dataset (usually a polygon map with attributes of interest). In particular, HSA calculates the Gi* statistic and outputs z-scores and p-values for each spatial area (polygon) that indicate the statistical significance of the polygon as a hotspot. To be a statistically significant hotspot, a polygon will have a high value of the target attribute and be surrounded by other polygons with high values as well. The local sum of the attribute values for a polygon and its neighbors is compared proportionally to the sum of attribute values of all polygons. When the local sum is very different from the expected local sum (very high z-score), and that difference is too large to be the result of random chance (very small p-value), the polygon is considered a hotspot.
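The comparison of a local sum against its expected value can be illustrated with a small sketch of the standard Getis-Ord Gi* z-score. This is not the ArcGIS implementation; the function name `gi_star`, the binary weights, and the nine-cell strip of counts are assumptions made only for illustration.

```python
import math

def gi_star(values, weights_row):
    """Gi* z-score for one focal cell.

    values: attribute value x_j of every cell (n cells in total).
    weights_row: spatial weight w_ij between the focal cell i and every
    cell j, including the focal cell itself.
    Note: undefined (division by zero) if all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(x * x for x in values) / n - mean ** 2)
    sum_w = sum(weights_row)
    sum_w2 = sum(w * w for w in weights_row)
    local_sum = sum(w * x for w, x in zip(weights_row, values))
    numerator = local_sum - mean * sum_w  # local sum minus expected local sum
    denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
    return numerator / denominator

# Hypothetical strip of 9 cells: high counts on the left, low on the right.
counts = [10, 9, 8, 1, 1, 1, 0, 1, 1]
left_weights = [1, 1, 1, 0, 0, 0, 0, 0, 0]   # focal cell 1 and its neighbors
right_weights = [0, 0, 0, 0, 0, 0, 1, 1, 1]  # focal cell 7 and its neighbors
z_left = gi_star(counts, left_weights)
z_right = gi_star(counts, right_weights)
print(round(z_left, 2), round(z_right, 2))
```

A large positive z-score marks a cell surrounded by high values, and a negative one marks a cell surrounded by low values.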

3. Methodology

The key insight behind our methods is identifying hotspots by searching, utilizing, and presenting patterns in geographic space. By preprocessing the crime related data sets into a transaction-based geospatial dataset, we develop a model, called HOT, to map crime hotspots through the related variables. Then we introduce a similarity method to summarize the identified GDPatterns into clusters. Based on these clusters, a relevant report of crime hotspots and related variables is visually presented for domain experts.

3.1. Problem formulation and data representation

To discover GDPatterns from a target crime’s related variables, we first build a transaction-based geospatial database, which we refer to as the database or simply D. A widely used method for representing the spatial distribution of entities in D is grid mapping (Harries, 1999; Janeja & Palanisamy, 2012). Both the target crime and the related variables in the original spatial dataset can be plotted onto grid maps with the same dimensions. The value of each grid cell is the number of incidents falling into it. An illustrative example of D is shown on the top right of Fig. 1. Additionally, instead of using the original values directly, a fair way to represent all the variables in one pattern is to convert the original values into categories. Standard tools (Nguyen & Nguyen, 1998) such as the Jenks Optimization for Natural Breaks Classification (or Natural Breaks; Jenks, 1967), a method that is based on natural groupings inherent in the data, can be used in the categorization process.
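As a rough sketch of this preprocessing step, the snippet below counts incidents per grid cell and maps each count to a category. The fixed break values stand in for a real Jenks Natural Breaks classification, which is not implemented here; the function names, coordinates, and breaks are all hypothetical.

```python
from collections import Counter

def grid_counts(points, x_min, y_min, cell_size, n_cols, n_rows):
    """Count incident points per grid cell, keyed by (row, col)."""
    counts = Counter()
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        if 0 <= col < n_cols and 0 <= row < n_rows:  # ignore points off the grid
            counts[(row, col)] += 1
    return counts

def categorize(count, breaks=(1, 3)):
    """Map a cell count to a category using fixed breaks.

    Assumption: these breaks are a placeholder for class limits that a
    Jenks Natural Breaks classification would derive from the data.
    """
    if count < breaks[0]:
        return "low"
    if count < breaks[1]:
        return "medium"
    return "high"

# Hypothetical incidents on a 3 x 3 grid of 100 m cells.
incidents = [(10, 10), (50, 90), (150, 20), (160, 30), (170, 80), (250, 250)]
counts = grid_counts(incidents, x_min=0, y_min=0, cell_size=100, n_cols=3, n_rows=3)
categories = {
    (r, c): categorize(counts[(r, c)]) for r in range(3) for c in range(3)
}
print(categories[(0, 0)], categories[(0, 1)], categories[(1, 1)])
```

Running the same two steps for every related variable, on grids with identical dimensions, yields one categorical value per variable per cell.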

Definition 1 (Database object). An object in D is a tuple of the form {x, y, V1, V2, ..., Vn, C}, where x, y indicate the object’s spatial coordinates, V1, V2, ..., Vn are the values of the related variables, and C is the class label of the target crime.

Using C, objects in D can be labeled into different classes. For example, we say C is 0 if the area is not a hotspot (a normal area) and 1 if the area is a hotspot. Then the geospatial database can be divided into two parts: Dh (hotspots) if C = 1, or Dn (normal areas) if C = 0. Disregarding the location information (x, y) and the class label C, each object in D can be viewed as a transaction of n variable values. For example, in Table 1, T1, T2, T3, and T4 are transactions with three variable values.

Table 1. Examples of transactions, patterns and patterns’ supports. In the examples AR, POP and IC stand for arrest rate, population density and average income, respectively. Pattern X3 is not a closed pattern because X1, its immediate superset, has exactly the same support. X1 is a closed frequent pattern if we set the minimum support threshold q = 70%.

Transactions
T1: {AR = high, POP = low, IC = low}
T2: {AR = high, POP = low, IC = high}
T3: {AR = high, POP = low, IC = medium}
T4: {AR = medium, POP = low, IC = medium}

Pattern                        Support
X1: {AR = high, POP = low}     sup(X1) = 3/4 = 75% (T1, T2, T3)
X2: {AR = high, IC = high}     sup(X2) = 1/4 = 25% (T2)
X3: {AR = high}                sup(X3) = 3/4 = 75% (T1, T2, T3)


3.2. Geospatial Discriminative Patterns (GDPatterns)

The GDPatterns we are looking for should meet two requirements: (1) to significantly represent the situation or conditions of related variables in objects in database D; (2) to significantly distinguish hotspots Dh from normal areas Dn. GDPatterns are built upon closed frequent patterns. Here we give a brief introduction of relevant concepts.

Definition 2 (Pattern). Given a set of related variables, a pattern is a set of values for a subset of those related variables.

For example, Table 1 gives an example of a database that has 3 related variables AR, POP, and IC, which can take the values of low, medium, or high. In the examples AR, POP and IC stand for arrest rate, population density and average income, respectively. A combination of these variables and values constitutes a pattern; e.g., X1: {AR = high, POP = low}, or X3: {AR = high}.

Definition 3 (Support and support count (Agrawal et al., 1994)). A pattern is said to be supported by a transaction when it is a subset of the transaction. The support count of a pattern X is the number of times X appears in a database D.

$$\mathrm{supportcount}_D(X) = |\{T \in D \mid X \subseteq T\}| \tag{1}$$

where T represents transactions in D. The support of a pattern X is calculated as the support count of X divided by the total number of transactions in the database D.

$$\mathrm{support}_D(X) = \frac{\mathrm{supportcount}_D(X)}{|D|} \tag{2}$$

For example, in Table 1 pattern X1 = {AR = high, POP = low} is supported by transactions T1, T2 and T3, so the support count of X1 is 3 for the database. Since there are 4 transactions in total in this database, the support of X1 is 3/4 = 0.75.
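The support computation can be checked against Table 1 with a minimal sketch. Representing transactions and patterns as Python dictionaries is an illustrative choice, and the function names are not from the paper.

```python
# Transactions and patterns from Table 1.
TRANSACTIONS = [
    {"AR": "high", "POP": "low", "IC": "low"},       # T1
    {"AR": "high", "POP": "low", "IC": "high"},      # T2
    {"AR": "high", "POP": "low", "IC": "medium"},    # T3
    {"AR": "medium", "POP": "low", "IC": "medium"},  # T4
]
X1 = {"AR": "high", "POP": "low"}
X2 = {"AR": "high", "IC": "high"}
X3 = {"AR": "high"}

def is_supported(pattern, transaction):
    """A transaction supports a pattern when the pattern is a subset of it."""
    return all(transaction.get(var) == val for var, val in pattern.items())

def support_count(pattern, transactions):
    """Eq. (1): number of transactions that contain the pattern."""
    return sum(is_supported(pattern, t) for t in transactions)

def support(pattern, transactions):
    """Eq. (2): support count divided by the number of transactions."""
    return support_count(pattern, transactions) / len(transactions)

for name, pattern in (("X1", X1), ("X2", X2), ("X3", X3)):
    print(name, support_count(pattern, TRANSACTIONS), support(pattern, TRANSACTIONS))
```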

Definition 4 (Closed pattern (Pasquier, Bastide, Taouil, & Lakhal, 1999)). A pattern is closed if none of its supersets has exactly the same support.

For example, in Table 1 X1 is a closed pattern and X3 is not, because its immediate superset X1 has exactly the same support.

Note that if we consider only closed frequent patterns, we can deduce the support of non-closed frequent patterns from their corresponding closed patterns. To see why this is true, note that the supports of patterns exhibit a property called downward closure:

$$\text{If } X \subseteq X', \text{ then } \mathrm{support}_D(X) \geq \mathrm{support}_D(X')$$

By Definition 4, a pattern X that is not closed has a superset X′ with exactly the same support, so support_D(X) = support_D(X′); taking X′ to be the closed superset of X recovers the support of X from a closed pattern. The benefit of considering only closed patterns is a reduction in the set of considered patterns without losing information. In Table 1 both X3 and X1 are supported by T1, T2 and T3. In other words, both X3 and X1 carry information about the characteristics of these transactions. But X1 carries more information ({AR = high, POP = low}) than X3 ({AR = high}) does, and the information carried by X3 ({AR = high}) is fully represented by X1. There is no information loss if we only consider X1 in further analysis.

Definition 5 (Closed frequent pattern (Pasquier et al., 1999)). A closed pattern whose support is above a user-defined threshold is considered as a closed frequent pattern.
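One way to test Definitions 4 and 5 on the Table 1 data is sketched below. It relies on a standard equivalence: a pattern with non-zero support is closed exactly when it equals the intersection of all transactions that support it. Treating the threshold as inclusive (`>=`) is an assumption, as are the function names.

```python
def items(**values):
    """Build a pattern or transaction as a frozenset of (variable, value) pairs."""
    return frozenset(values.items())

TRANSACTIONS = [
    items(AR="high", POP="low", IC="low"),       # T1
    items(AR="high", POP="low", IC="high"),      # T2
    items(AR="high", POP="low", IC="medium"),    # T3
    items(AR="medium", POP="low", IC="medium"),  # T4
]

def is_closed(pattern, transactions):
    """True if no proper superset of the pattern has the same support."""
    supporting = [t for t in transactions if pattern <= t]
    if not supporting:
        return False
    # The closure is the set of items shared by every supporting transaction.
    closure = frozenset.intersection(*supporting)
    return closure == pattern

def is_closed_frequent(pattern, transactions, min_support):
    """Closed pattern whose support reaches the threshold (inclusive here)."""
    sup = sum(pattern <= t for t in transactions) / len(transactions)
    return sup >= min_support and is_closed(pattern, transactions)

X1 = items(AR="high", POP="low")
X2 = items(AR="high", IC="high")
X3 = items(AR="high")
for name, pattern in (("X1", X1), ("X2", X2), ("X3", X3)):
    print(name, is_closed(pattern, TRANSACTIONS),
          is_closed_frequent(pattern, TRANSACTIONS, 0.70))
```

With the 70% threshold from Table 1, only X1 passes both checks.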

Definition 6 (Growth ratio). Let the set {Dh, Dn} be an exhaustive partition of D. The growth ratio d of a pattern X is the ratio of X’s support in one partition Dh to its support in the other partition Dn.

$$d = \frac{\mathrm{support}_{D_h}(X)}{\mathrm{support}_{D_n}(X)} \tag{3}$$

Definition 7 (Geospatial Discriminative Pattern (GDPattern)). A closed frequent pattern X whose growth ratio exceeds a user-defined threshold is considered a GDPattern.

With a rational growth ratio threshold, the GDPatterns mined from D carry information that is significantly different between a subset and the remainder of D. For example, if the growth ratio threshold is set to 20, a closed frequent pattern is considered a GDPattern only when it is more than 20 times as frequent in hotspots as in normal areas. In other words, this pattern has a more than 95% (19/20) chance of being found in hotspots. So the locations out of which such a pattern is mined are more than 95% (or “significantly”) likely to be a hotspot.
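The growth ratio of Eq. (3) can be sketched as follows. The hotspot and normal-area transactions are made-up data, the threshold of 3 is hypothetical, and returning infinity when the pattern never occurs in normal areas is a convention chosen here rather than one stated in the paper.

```python
import math

def support(pattern, transactions):
    """Fraction of transactions that contain every (variable, value) of the pattern."""
    if not transactions:
        return 0.0
    hits = sum(
        all(t.get(var) == val for var, val in pattern.items()) for t in transactions
    )
    return hits / len(transactions)

def growth_ratio(pattern, hotspot_tx, normal_tx):
    """Eq. (3): support in the hotspot partition over support in the normal partition."""
    sup_h = support(pattern, hotspot_tx)
    sup_n = support(pattern, normal_tx)
    if sup_n == 0:
        # Assumed convention: unbounded ratio if the pattern never occurs in Dn.
        return math.inf if sup_h > 0 else 0.0
    return sup_h / sup_n

# Hypothetical partitions: 4 hotspot cells and 8 normal cells.
D_h = [{"AR": "high"}, {"AR": "high"}, {"AR": "low"}, {"AR": "medium"}]
D_n = [{"AR": "high"}] + [{"AR": "low"}] * 7
pattern = {"AR": "high"}

ratio = growth_ratio(pattern, D_h, D_n)
exceeds_threshold = ratio > 3  # hypothetical growth ratio threshold
print(ratio, exceeds_threshold)
```

A full GDPattern test would also require the pattern to be a closed frequent pattern, which this sketch does not check.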

Definition 8 (Footprint). The footprint of a GDPattern X is the set of objects that support X in database D. It is the set of cells in the grid map whose corresponding objects support X.

For example, in Fig. 2 a GDPattern {Commercial Burglary-“low”, Street Robbery-“Average”, Motor-Vehicle Larceny-“Average”} is selected from the case study (Section 4) and the hollow squares with slash lines are footprints of this GDPattern. These areas (the footprints) have similar characteristics of the related variables (low in commercial burglary rate and average in street robbery and motor-vehicle larceny rate). Using footprints provides a way to measure the spatial distribution of the corresponding patterns in the studied area.
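A footprint lookup can be sketched directly from Definition 8. The four objects below are invented; only the pattern mirrors the example GDPattern, and the variable abbreviations CB, SR and MV follow the case study.

```python
def footprint(pattern, objects):
    """Grid cells whose object supports the pattern.

    objects: iterable of (x, y, values, label) tuples, following the
    database object form {x, y, V1, ..., Vn, C} of Definition 1.
    """
    return {
        (x, y)
        for x, y, values, _label in objects
        if all(values.get(var) == val for var, val in pattern.items())
    }

# Hypothetical 2 x 2 grid of database objects.
objects = [
    (0, 0, {"CB": "low", "SR": "average", "MV": "average"}, 1),
    (0, 1, {"CB": "low", "SR": "average", "MV": "high"}, 0),
    (1, 0, {"CB": "low", "SR": "average", "MV": "average"}, 0),
    (1, 1, {"CB": "high", "SR": "low", "MV": "average"}, 0),
]
pattern = {"CB": "low", "SR": "average", "MV": "average"}
cells = footprint(pattern, objects)
print(sorted(cells))
```

Note that the footprint can include normal cells (label 0), which is what lets the pattern flag locations that resemble hotspots.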

3.3. Hotspot Optimization Tool

GDPatterns are capable of digging out the meaningful information underlying the spatial distribution of target crime hotspots. Utilizing the informative GDPatterns, here we develop a model, Hotspot Optimization Tool (HOT), to improve the identification of hotspots by optimizing user-specified hotspot boundaries.

The practicality of HOT is based on two concepts. First, a hotspot can be considered as the source of disorder of its adjacent blocks, which means the adjacent areas have the possibility of being affected by crimes happening in hotspots. Also, from the point of view of spatial correlations (Bailey & Gatrell, 1995), adjacent areas of a hotspot are more likely to fall into the active range of the same criminals. Therefore these cells can be considered as potential hotspots, especially those with a relatively high crime density. Second, according to the definition, GDPatterns are much more frequent in hotspots than in normal areas. Normal areas located in the footprints of GDPatterns are more likely to be hotspots because in these areas the values of related variables are the same.

Fig. 2. An example map of GDPattern footprints. By selecting Residential Burglary (RB) data as the target crime, nine other variables are used as related variables from the experiment dataset and GDPatterns are mined with a growth ratio of at least twenty (d ≥ 20). The hollow squares with slash lines are footprints of one example GDPattern (Commercial Burglary-“low”, Street Robbery-“Average”, Motor-Vehicle Larceny-“Average”) whose growth ratio is 67.0. The red areas are RB hotspots defined by a user-specified threshold. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)


In summary, after initializing hotspots of a target crime with a user-specified threshold, HOT considers a normal location as a hotspot if (1) it is adjacent to current hotspots; (2) its crime rate is relatively high compared to the user-specified hotspot threshold; and (3) it is inside the footprints of GDPatterns mined out of current hotspots. The detailed process of HOT is shown in Algorithm 1.

Algorithm 1. The Hotspot Optimization Tool.

This algorithm takes as input a geospatial dataset D, a hotspot threshold h, a hotspot candidate threshold h′, a support threshold q for closed frequent patterns, and a growth ratio threshold d, and returns a new set of hotspots Dh, a set of GDPatterns G, and their footprints w. It does the following:

• Identify areas with a relatively high crime density (Dh′, areas with high target crime density that are close to the density in hotspots, line 2).
• Mine GDPatterns based on current hotspot boundaries and draw the footprints of GDPatterns (lines 6 and 7).
• Generate candidate cells (lines 8–12): cells whose corresponding objects belong to Dh′ and are adjacent to some cell whose corresponding objects belong to Dh.
• Test the hypothesis for candidate cells (line 14): a candidate cell is inside the footprints of GDPatterns (w).
• If the hypothesis is true, the boundaries of the hotspot are modified by changing the current cell into a hotspot cell (moving its corresponding object from Dh′ to Dh) (line 15).
• Iterate until all hypothesis tests are false (lines 3 and 19).

When hotspot boundaries are changed, a new set of GDPatterns will be generated based on the modified hotspots, followed by the change of footprints. If in the current loop the set of GDPatterns is the same as in the former loop, it means there are no new footprints and there will be no “true” from the hypothesis test (lines 4–10 in Algorithm 1). HOT will then stop and a new optimized hotspot map is generated.
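The loop described above can be sketched in simplified form. This is not Algorithm 1 itself, whose listing is not reproduced in this text: the four-neighbor adjacency, the `simple_miner` stand-in (which treats whole hotspot transactions as patterns instead of mining closed frequent patterns), and all data values are assumptions for illustration.

```python
def neighbors(cell):
    """Rook (4-neighbor) adjacency; an assumed definition of 'adjacent'."""
    r, c = cell
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]

def simple_miner(delta):
    """Stand-in for GDPattern mining: keeps whole hotspot transactions whose
    growth ratio reaches delta. A real miner would enumerate closed frequent
    patterns and apply a support threshold as well."""
    def mine(hot_tx, normal_tx):
        patterns = set()
        for p in set(hot_tx):
            sup_h = sum(p <= t for t in hot_tx) / len(hot_tx)
            sup_n = sum(p <= t for t in normal_tx) / len(normal_tx) if normal_tx else 0.0
            if sup_n == 0 or sup_h / sup_n >= delta:
                patterns.add(p)
        return patterns
    return mine

def hot(density, values, theta, theta_cand, mine):
    """density: {cell: crime count}; values: {cell: frozenset of (var, category)}."""
    hotspots = {c for c, d in density.items() if d >= theta}
    pool = {c for c, d in density.items() if theta_cand <= d < theta}
    changed = True
    while changed:  # iterate until no candidate passes the test
        changed = False
        normal = set(density) - hotspots
        patterns = mine([values[c] for c in hotspots], [values[c] for c in normal])
        footprint = {c for c in density if any(p <= values[c] for p in patterns)}
        for cell in sorted(pool - hotspots):
            adjacent = any(n in hotspots for n in neighbors(cell))
            if adjacent and cell in footprint:
                hotspots.add(cell)  # move the cell into the hotspot set
                changed = True
    return hotspots

A = frozenset({("CB", "low")})
B = frozenset({("CB", "high")})
density = {(0, 0): 10, (0, 1): 6, (0, 2): 6, (0, 3): 1}
values = {(0, 0): A, (0, 1): A, (0, 2): B, (0, 3): B}
result = hot(density, values, theta=8, theta_cand=5, mine=simple_miner(delta=2))
print(sorted(result))
```

In this toy row of cells, the second cell is promoted because it is adjacent to the initial hotspot, dense enough to be a candidate, and inside the footprint; the third cell is dense and adjacent but outside the footprint, so it stays normal.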

3.4. Crime related variables demonstration

Hotspots of target crime extracted using GDPatterns carry a wealth of information. But the GDPattern mining process usually results in an explosive number of possible patterns (Han et al., 2000). It is desirable to organize these GDPatterns in a meaningful way in order to make the information usable to domain analysts. Here we present a pattern summarization method that can cluster GDPatterns into small groups which have similar structures.

Given two patterns X and Y that are mined out of m variables, the function to calculate similarity between X and Y is

$$s'(X, Y) = \frac{\sum_{i=1}^{m} s(X_i, Y_i)}{m} \tag{4}$$

where s′(X, Y) is the similarity between patterns X and Y; s(Xi, Yi) is the similarity between the ith variables of X and Y; m is the number of variables in each pattern. For example, s(Xi, Yi) = 1 if Xi and Yi are in the same category and 0 if they are not. We calculate the similarities between every pair of corresponding variables and take the mean of the m similarities as the overall similarity between the patterns.
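A minimal sketch of Eq. (4) with the simple match/no-match per-variable similarity from the example follows. It assumes both patterns specify all m variables; the two patterns and the function names are illustrative.

```python
def variable_similarity(a, b):
    """Simplest per-variable similarity: 1 if same category, else 0."""
    return 1.0 if a == b else 0.0

def pattern_similarity(x, y, variables):
    """Eq. (4): mean of the per-variable similarities over the m variables.

    Assumes every variable is present in both patterns.
    """
    total = sum(variable_similarity(x[v], y[v]) for v in variables)
    return total / len(variables)

variables = ["AR", "POP", "IC"]
X = {"AR": "high", "POP": "low", "IC": "low"}
Y = {"AR": "high", "POP": "low", "IC": "high"}
similarity = pattern_similarity(X, Y, variables)
print(similarity)
```

Here two of the three variables match, so the overall similarity is 2/3.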

The categories of the crime related variables can be presented using ordinal numbers. For example, the categories of population density can be presented using ordinal numbers: 1 (“low”), 2 (“medium”) and 3 (“high”). The similarity between two ordinal values of the ith variable s(Xi, Yi) can be measured by the ratio between the amount of information needed to state the commonality between Xi and Yi, and the information needed to fully describe both Xi and Yi. In practice when we calculate the similarity between patterns X and Y, the ith variable does not always exist in both patterns (Fig. 3). There are three cases according to the presence of Xi and Yi.

Case 1: Both Xi and Yi are in the pattern:

$$s(X_i, Y_i) = \frac{2 \times \log P(X_i \vee Z_1 \vee Z_2 \vee \cdots \vee Z_k \vee Y_i)}{\log P(X_i) + \log P(Y_i)} \tag{5}$$

where P() is the probability calculated using the known distribution of the values of the ith variable in D, and Z1, Z2, ..., Zk are the ordinal intervals delimited by Xi and Yi. For example, in Fig. 3 the ordinal interval between the first variables XA and YA is Z1 = 2.
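Eq. (5) can be sketched for ordinal categories coded 1, 2, 3. The probability table is hypothetical, and the code assumes every probability is strictly between 0 and 1 so that the logarithms in the denominator are non-zero.

```python
import math

def ordinal_similarity(a, b, prob):
    """Eq. (5) for two ordinal values a and b of one variable.

    prob: {ordinal value: probability of that value in D}.
    The numerator uses the probability of the whole ordinal range
    from min(a, b) to max(a, b), inclusive.
    """
    lo, hi = sorted((a, b))
    span = sum(prob[z] for z in range(lo, hi + 1))
    return 2 * math.log(span) / (math.log(prob[a]) + math.log(prob[b]))

# Hypothetical distribution of one variable: 1 = low, 2 = medium, 3 = high.
prob = {1: 0.5, 2: 0.3, 3: 0.2}
same = ordinal_similarity(1, 1, prob)
near = ordinal_similarity(1, 2, prob)
far = ordinal_similarity(1, 3, prob)
print(same, round(near, 3), far)
```

Identical values score 1, and values at opposite ends of the scale score 0 because the range between them covers the whole distribution.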

Case 2: Either Xi or Yi is absent (here we use the case that Xi is absent):

$$s(\varnothing, Y_i) = \sum_{k=1}^{n} P_X(Z_k)\, s(Z_k, Y_i) \tag{6}$$

where n is the number of different values that the ith variable has, and PX(Zk) is the probability of the ith variable having value Zk in all transactions that support pattern X. PX(Zk) = 0 if Zk does not exist in the footprint of X at all, and $\sum_{k=1}^{n} P_X(Z_k) = 1$. The similarity is a weighted average between Yi and all ordinal values of the ith variable present in the footprint of pattern X. An example is shown in Fig. 3, case 2.

Case 3: Neither Xi nor Yi is present:

$$s(\varnothing, \varnothing) = \sum_{l=1}^{n} \sum_{k=1}^{n} P_X(Z_l)\, P_Y(Z_k)\, s(Z_l, Z_k) \tag{7}$$

Fig. 3. An illustrative example showing the similarity measure approach between patterns X and Y.

In this case the probabilities of all ordinal values (Z1, Z2, ..., Zn) of the ith variable in the footprints of patterns X and Y are checked and a weighted average of the pairwise comparisons is calculated (case 3 in Fig. 3).
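The two weighted averages for absent variables can be sketched on top of the case 1 similarity. The global distribution `prob` and the footprint distributions `p_x` and `p_y` are hypothetical, and the function names are illustrative.

```python
import math

def ordinal_similarity(a, b, prob):
    """Case 1: both values present (ordinal codes, probabilities in (0, 1))."""
    lo, hi = sorted((a, b))
    span = sum(prob[z] for z in range(lo, hi + 1))
    return 2 * math.log(span) / (math.log(prob[a]) + math.log(prob[b]))

def similarity_one_absent(p_x, y, prob):
    """Case 2: the variable is absent from X.

    p_x: {value: share of that value among transactions in X's footprint}.
    """
    return sum(p * ordinal_similarity(z, y, prob) for z, p in p_x.items())

def similarity_both_absent(p_x, p_y, prob):
    """Case 3: the variable is absent from both patterns."""
    return sum(
        px * py * ordinal_similarity(zl, zk, prob)
        for zl, px in p_x.items()
        for zk, py in p_y.items()
    )

prob = {1: 0.5, 2: 0.3, 3: 0.2}  # hypothetical distribution in D
p_x = {1: 0.25, 2: 0.75}         # hypothetical values seen in X's footprint
p_y = {2: 0.5, 3: 0.5}           # hypothetical values seen in Y's footprint

case2 = similarity_one_absent(p_x, 2, prob)
case3 = similarity_both_absent(p_x, p_y, prob)
print(round(case2, 3), round(case3, 3))
```

If a footprint contains only one value of the variable, case 2 reduces to the case 1 similarity for that value.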

Using the similarity measurements, we can build an N × N distance matrix of GDPatterns using distance = 1 − similarity, where N is the number of GDPatterns. Standard clustering tools such as Hierarchical Agglomerative Clustering (HAC) can then be used to group the closest GDPatterns into clusters. HAC treats each GDPattern as a singleton cluster at the outset and then successively merges (or agglomerates) pairs of clusters according to their distance until all clusters have been merged into a single cluster that contains all GDPatterns.
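A naive agglomerative clustering over such a distance matrix can be sketched as follows. Average linkage and stopping at a requested number of clusters are choices made here for brevity (the text describes merging down to a single cluster), and the similarity matrix is invented.

```python
def to_distance(similarity):
    """distance = 1 - similarity, applied element-wise to an N x N matrix."""
    return [[1 - s for s in row] for row in similarity]

def hac(distance, n_clusters):
    """Naive average-linkage agglomerative clustering.

    Starts with singleton clusters and repeatedly merges the closest pair
    until n_clusters remain. Returns clusters as lists of pattern indices.
    """
    clusters = [[i] for i in range(len(distance))]

    def linkage(a, b):
        # Average pairwise distance between members of two clusters.
        return sum(distance[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        _, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters

# Hypothetical similarities between four GDPatterns.
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
groups = hac(to_distance(similarity), n_clusters=2)
print(groups)
```

For large numbers of GDPatterns a library implementation would be preferable, since this version recomputes every linkage on each merge.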

These clusters serve as compositions of crime related variables and carry rich information not only about relationships between variables, but also about their spatial distributions. Locations exhibiting certain socio-economic and crime-related characteristics that are significantly related with target crime hotspots can be drawn using the clusters’ footprints. In Section 4 we present a case study to show how these GDPattern clusters can assist domain experts in criminal studies.

4. Case study

Utilizing the proposed framework, a case study is conducted with real world data from a northeastern city in the United States. We first describe the data preprocessing in Section 4.1. Second, for the purpose of comparison, crime hotspot maps are drawn in Section 4.2 using HOT, HSA, and user-specified thresholds, respectively. The Kappa Index (Cohen et al., 1960; Rossiter, 2004) and cell statistics are used to compare the results, and the pros and cons of HOT are discussed. Finally, we cluster the GDPatterns using the similarity method (Section 3.4) and discuss the potential of utilizing GDPattern clusters in demonstrating the characteristics of crime related variables in Section 4.3.

4.1. Data preprocessing

The data in the case study includes reported crimes and associated variables in a northeastern city in the United States from 2004 to 2009. The size of the study area is 130.1 km² and the approximate population is 600,000. As one of the most frequently reported and resource-demanding crimes in the studied city (according to the city’s police department report), residential burglary (RB) is selected as the target crime (Fig. 4). In addition to RB, a total of eight social/criminal features (Table 2) are selected in this study as related variables with the help of a domain expert. Among those are:

� Commerc ial burglary (CB), street robbery (SR), and motor vehicle larceny (MV). These indicate the level of activity of related crimes, and also reflect the strength of guardianship in the area.


Table 2. Crime related variables for the case study.

Variables                        Number of incidents (2005–2009)
Residential burglary (RB)        18,321
Street robbery (SR)              12,020
Commercial burglary (CB)         4,438
Motor-vehicle larceny (MV)       29,685
Arrest (AR)                      254,309
Foreclosed houses (FC)           11,671
Population (POP)                 –
Number of housing units (HUs)    –
Distance to colleges (DCs)       –

• Arrests (AR). This helps indicate the size of the pool of offenders.

• Foreclosed homes (FC). A vacant house has a higher risk of being broken into than an inhabited one, and is also a sign of lack of guardianship.

• Population (POP) and housing density (HU). A hotspot of RB may simply be a location of high housing density because such areas have a potentially higher RB rate than areas with fewer houses.

• Distance to colleges (DC). The studied city is heavily populated by college students, which makes many properties easy targets for burglars during semester breaks. DC is calculated as the distance to the geographical center of a university or college.

Fig. 4. Residential burglary rates in the studied city. The top panel is the grid density map of RB; the bottom panel is a graph showing the frequency of cell values.


The original criminal dataset comes as vector maps (points and polygons). We first convert all the variable data into grid maps (Fig. 4). The grid cell size selected is 100 m × 100 m, which results in 12,984 cells in the study area. There are two considerations when choosing the cell size. First, the cell size (10,000 m²) is approximately half the average city block size (19,873 m²) in the studied city. According to domain experts, this is a good representation of reality and helpful in police practice. Second, at this size, the number of cells covering the study area is of the same order of magnitude as the number of RB incidents (Table 2), which minimizes the loss of spatial information during aggregation. On the other hand, HSA needs to be conducted using polygon maps instead of rasters. The raster of RB is converted into a fishnet map with the same dimension as the mask. Each polygon in the fishnet map has an attribute of “RB Counts” indicating the number of RB incidents in the area. To facilitate the discussion, we call the polygons in the fishnet map cells as well.
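The aggregation of point incidents into 100 m cells can be sketched as follows. This is only an illustration: it assumes coordinates are already in a projected system measured in metres, and the function name `grid_counts` and the sample points are hypothetical.

```python
from collections import Counter

CELL_SIZE_M = 100.0  # cell edge length used in the case study

def grid_counts(points, x_min, y_min, cell_size=CELL_SIZE_M):
    """Count incidents per (row, col) cell of a square grid."""
    counts = Counter()
    for x, y in points:
        col = int((x - x_min) // cell_size)
        row = int((y - y_min) // cell_size)
        counts[(row, col)] += 1
    return counts

# Hypothetical incident coordinates in metres (projected coordinates assumed).
incidents = [(12.0, 40.0), (95.0, 99.9), (130.0, 20.0), (250.0, 310.0)]
rb_counts = grid_counts(incidents, x_min=0.0, y_min=0.0)
```

Each resulting count plays the role of the “RB Counts” attribute of a fishnet cell; any spatial detail inside a cell is lost at this point, which is the limitation discussed for grid choropleth mapping.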

Since the related variables come from very different sources, the range of their values varies. As with most criminal activities, the counts of cells with the same values in each grid map follow a power-law distribution (Cook, Ormerod, & Cooper, 2004) (Fig. 4). Using Natural Breaks (Jenks, 1967), every variable is divided into six categories: 0 – “empty”, 1 – “lowest”, 2 – “low”, 3 – “average”, 4 – “high”, and 5 – “highest”. With the Natural Breaks method, the category breaks are chosen so that similar values are grouped together and the differences between categories are maximized.
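The idea behind natural breaks can be sketched with a small dynamic program that splits sorted values into contiguous groups with minimal within-group squared deviation. This is an illustrative sketch, not the GIS implementation used in the study; the function name `natural_breaks` and the sample counts are hypothetical, and zero-valued (“empty”) cells are assumed to be set aside beforehand.

```python
def natural_breaks(values, n_classes):
    """Split sorted values into n_classes contiguous groups that minimise
    the total within-group sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared deviation of any slice in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):
        # Sum of squared deviations from the mean for data[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting data[:j] into k groups.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    # Walk back through the stored cut points to recover the groups.
    groups, j = [], n
    for k in range(n_classes, 0, -1):
        i = cut[k][j]
        groups.append(data[i:j])
        j = i
    return groups[::-1]

# Hypothetical non-zero cell counts for one variable.
cell_counts = [1, 2, 2, 3, 9, 10, 11, 25, 27]
classes = natural_breaks(cell_counts, n_classes=3)
```

In the setting described above, five such groups over the non-empty cells would correspond to the categories “lowest” to “highest”.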

4.2. Hotspot mapping

An initial threshold of RB hotspots is needed to set the initial classes before utilizing HOT. According to Short et al. (2010), a house is at relatively higher risk if a burglary happened nearby within the past 4 months. Therefore, if three or more burglaries happened in the block in one year, the area is likely to be a burglary hotspot. Because the time span of our data is 6 years, we set an area (cell) to be a hotspot if there are eighteen or more burglary incidents (h ≥ 18). We use the threshold of 9 RB incidents (18 > h′ ≥ 9) to define the “potential hot” areas (Dh′). The growth ratio for GDPatterns is set at more than twenty (d > 20), which ensures at least a 95% confidence level (1:20) that these GDPatterns will reveal the difference between hotspots and normal areas. To test the tolerance of HOT, four different support thresholds (q = 0.001, 0.005, 0.01, 0.02) are used in the experiments.
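The threshold arithmetic above can be written out as a short sketch. Reading "three per year" as one burglary per 4-month risk window is an interpretation of the text, and the function name `initial_class` and the class labels are hypothetical.

```python
MONTHS_PER_YEAR = 12
RISK_WINDOW_MONTHS = 4    # elevated-risk window cited from Short et al. (2010)
YEARS_OF_DATA = 6         # time span stated in the case study

# One burglary per risk window gives three per year (an interpretation of the text).
BURGLARIES_PER_YEAR = MONTHS_PER_YEAR // RISK_WINDOW_MONTHS
HOT_THRESHOLD = BURGLARIES_PER_YEAR * YEARS_OF_DATA  # h >= 18
POTENTIAL_THRESHOLD = 9                               # 18 > h' >= 9, as stated

def initial_class(rb_count):
    """Assign a cell to its initial class from its RB incident count."""
    if rb_count >= HOT_THRESHOLD:
        return "hot"
    if rb_count >= POTENTIAL_THRESHOLD:
        return "potential hot"
    return "normal"

labels = [initial_class(c) for c in (0, 8, 9, 17, 18, 40)]
```

These initial classes are only the starting point; HOT then decides which of the "potential hot" cells are finally treated as hotspots.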

For comparison, hotspot maps generated by hard thresholds and the HSA method are presented. Three maps using the hard thresholds are generated. Two of them simply use the thresholds h ≥ 18 and h ≥ 9. The third one is generated using an initial threshold of h ≥ 18 and then locating cells with RB rate h ≥ 9 that are also adjacent to the h ≥ 18 cells.
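The third hard-threshold map can be sketched as below. The notion of adjacency is not specified in the text, so 8-neighbour adjacency is assumed here; the function name `ht18_9` and the toy counts are hypothetical.

```python
def ht18_9(counts, hot=18, potential=9):
    """Cells with count >= hot, plus cells with count >= potential that
    touch such a cell (8-neighbour adjacency is an assumption)."""
    core = {cell for cell, c in counts.items() if c >= hot}
    result = set(core)
    for (r, c), n in counts.items():
        if potential <= n < hot:
            neighbours = {(r + dr, c + dc)
                          for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                          if (dr, dc) != (0, 0)}
            if neighbours & core:
                result.add((r, c))
    return result

# Hypothetical RB counts keyed by (row, col).
counts = {(0, 0): 20, (0, 1): 10, (0, 3): 12, (2, 2): 9, (1, 1): 3}
hotspots = ht18_9(counts)
```

Only the cell at (0, 1) joins the core cell at (0, 0): the cells at (0, 3) and (2, 2) pass the lower threshold but have no neighbouring core cell.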

HSA takes the fishnet map (Section 4.1) as input and calculates a $G_i^*$ statistic (Formula (8)) for each polygon in the map. The $G_i^*$ statistic is considered as the z-score of the polygon. Then a p-value, the probability distribution of the z-scores, is calculated for each polygon. In summary, a polygon with a high z-score and a p-value less than or equal to 0.05 is considered as having a high enough attribute value to be statistically significant, and thus is considered a hotspot.

$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}$$

$$S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \left(\bar{X}\right)^2}$$

$$G_i^* = \frac{\sum_{j=1}^{n} w_{i,j} x_j - \bar{X} \sum_{j=1}^{n} w_{i,j}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{i,j}^2 - \left(\sum_{j=1}^{n} w_{i,j}\right)^2}{n-1}}} \qquad (8)$$

where $x_j$ is the value of the attribute (number of incidents) for spatial polygon j, $w_{i,j}$ is the spatial weight between polygons i and j, and n is the total number of polygons. In the case study we use inverse distances as the spatial weights (Deane, Beck, & Tolnay, 1998; Ratcliffe & Taniguchi, 2008; Tita & Greenbaum, 2009) and Euclidean distance as the distance method.
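Formula (8) can be evaluated directly, as in the following sketch. It is not the ArcGIS tool itself: the self-weight of 1 for each polygon is an assumption (an inverse distance of zero is undefined), and the function name `getis_ord_gi_star`, the positions, and the counts are hypothetical.

```python
import math

def getis_ord_gi_star(x, w):
    """Return the G_i^* z-score of Formula (8) for every polygon i.

    x: attribute values x_j; w: n-by-n spatial weight matrix w[i][j].
    """
    n = len(x)
    mean = sum(x) / n                                      # X-bar
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)   # S
    scores = []
    for i in range(n):
        sum_w = sum(w[i])
        sum_w2 = sum(v * v for v in w[i])
        numerator = sum(wij * xj for wij, xj in zip(w[i], x)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Four hypothetical cells on a line, 1 unit apart.
positions = [0.0, 1.0, 2.0, 3.0]
# Inverse-distance weights; the self-weight of 1.0 is an assumption.
weights = [[1.0 if i == j else 1.0 / abs(pi - pj)
            for j, pj in enumerate(positions)]
           for i, pi in enumerate(positions)]
rb_counts = [20, 18, 1, 0]
z_scores = getis_ord_gi_star(rb_counts, weights)
```

Cells surrounded by high counts receive positive scores and cells surrounded by low counts receive negative ones; the statistic only sees the counts, not what kind of land each cell covers.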

We name the maps generated using hard thresholds h ≥ 18 and h ≥ 9 HT18 and HT9, respectively. The map generated using h ≥ 18 cells and their adjacent cells with h ≥ 9 is called HT18_9. The HOT-produced maps using the support thresholds q = 0.001, q = 0.005, q = 0.01, and q = 0.02 are called HOT001, HOT005, HOT01, and HOT02, respectively. The map generated by HSA is named the HSA map. All these maps are shown in Fig. 5.

The standard Kappa Index k (Formula (9)) (Cohen et al., 1960; Rossiter, 2004) is used to compare the difference between hotspot maps (Table 3). The value of k is between −1 and 1, and two maps are considered more similar when the k between them is larger (closer to 1).

$$k = \frac{p_0 - p_c}{1 - p_c} \qquad (9)$$

where $p_0$ is the proportion of cells that are classified into the same class (agreed) by both maps, and $p_c$ is the proportion of cells for which agreement is expected by chance.
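Formula (9) can be sketched for two binary hotspot maps as follows. The function name `kappa_index` and the ten-cell maps are hypothetical, and the sketch assumes the two maps do not both assign every cell to the same class (otherwise the denominator is zero).

```python
def kappa_index(map_a, map_b):
    """Kappa Index for two equally sized binary hotspot maps (1 = hotspot)."""
    n = len(map_a)
    agree = sum(1 for a, b in zip(map_a, map_b) if a == b)
    p0 = agree / n                                  # observed agreement
    hot_a, hot_b = sum(map_a) / n, sum(map_b) / n
    # Chance agreement: both hot or both not hot, assuming independence.
    pc = hot_a * hot_b + (1 - hot_a) * (1 - hot_b)
    return (p0 - pc) / (1 - pc)

# Two hypothetical maps over ten cells.
map_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
map_b = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
k = kappa_index(map_a, map_b)
```

Here the maps agree on 8 of 10 cells (p0 = 0.8) while chance agreement is 0.56, so k = 0.24 / 0.44, roughly 0.55. Identical maps give k = 1.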

From Fig. 5 and Table 3 we can tell that, even with different support thresholds, the final HOT hotspot maps are very close to each other (the Kappa indices between them are all at least 0.94). Although different support thresholds will result in different sets of closed frequent patterns, by setting a relatively high growth ratio value, only the most significant patterns are selected as GDPatterns that contribute to hotspot mapping.

The HOT maps and the HT18_9 map are similar to each other (average Kappa Index 0.86) because they all contain the h ≥ 18 cells. On the other hand, there are in total 344 cells (the difference in hotspot cells between HT18 and HT18_9, Table 4) that have an RB rate h ≥ 9 and are adjacent to the h ≥ 18 cells, and around 69.4% of them are considered hotspots by HOT (calculated by dividing the average number of hotspot cells that differ between the HOT maps and the HT18 map by 344, Table 4). The difference between them (HOT maps and the HT18_9 map) can be considered as the information gained using HOT.
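The 344 and 69.4% figures can be reproduced from the totals in Table 4, as in this short sketch. It follows the calculation described in the text and relies on every HOT map containing all 301 HT18 cells (Table 3); the variable names are hypothetical.

```python
# Total hotspot cells per map, copied from Table 4.
total_cells = {"HT18": 301, "HT18_9": 645,
               "HOT001": 561, "HOT005": 523, "HOT01": 567, "HOT02": 508}

# Cells with h >= 9 that are adjacent to the h >= 18 cells.
adjacent_candidates = total_cells["HT18_9"] - total_cells["HT18"]

# Hotspot cells each HOT map adds on top of HT18.
hot_maps = ["HOT001", "HOT005", "HOT01", "HOT02"]
extra = [total_cells[m] - total_cells["HT18"] for m in hot_maps]
mean_extra = sum(extra) / len(extra)
share = mean_extra / adjacent_candidates
print(f"{adjacent_candidates} candidates, {share:.1%} kept on average")
```

The four HOT maps add 260, 222, 266, and 207 cells, an average of 238.75, which is 69.4% of the 344 candidates.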

A land cover map of the studied city is drawn (Fig. 6) with the purpose of evaluating the precision of our hotspot maps. In Table 4 we calculated the cell statistics for each map. The percentages of RB hotspot cells that are actually located in residential areas can be seen as the precisions of the maps (Column 3, Table 4).
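The precision figures can be recomputed from the cell counts in Table 4, as a small sketch with hypothetical variable names:

```python
# (total hotspot cells, cells in residential areas), copied from Table 4.
table4 = {
    "HT18": (301, 257), "HT9": (1245, 1056), "HSA": (1094, 901),
    "HOT001": (561, 484), "HOT005": (523, 451), "HOT01": (567, 488),
    "HOT02": (508, 435), "HT18_9": (645, 548),
}

# Precision: share of hotspot cells that lie in residential areas.
precision = {name: residential / total
             for name, (total, residential) in table4.items()}

hot_maps = ("HOT001", "HOT005", "HOT01", "HOT02")
hot_mean = sum(precision[m] for m in hot_maps) / len(hot_maps)
```

This gives about 85.4% for HT18, 82.4% for HSA, and 86% on average for the four HOT maps.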

All the hotspot maps we generated are based on grid choropleth mapping. There is an intrinsic defect when using grid choropleth mapping for hotspot identification. By converting points representing crime incidents into cells with crime counts, spatial details within and across the cell boundaries can be lost. In the case study, this limitation is reflected by the fact that cells in non-residential areas (Fig. 6) are classified as hotspots of residential burglary (RB) in all the hotspot maps. For example, after the aggregation process a certain cell may contain 20% non-residential areas, like roads or parks, and 80% residential areas. If during the hotspot analysis process the cell is classified as a residential burglary (RB) hotspot, then the precision of this hotspot is 80%.

The hotspot maps using the user-specified thresholds (HT18, HT9 and HT18_9) can be considered as benchmarks for the case study. In other words, using the current grid map (cell size 100 m × 100 m), the precision for describing residential areas in the studied city is around 85% (percentage of hotspot cells located in residential areas in the hard threshold hotspot maps; Table 4). HSA does not achieve this precision because, during the hotspot analysis (the $G_i^*$ statistic calculation) process, all the cells are only

Fig. 5. RB hotspot maps of the studied city. HT18 and HT9 are generated by the thresholds of h ≥ 18 and h ≥ 9, respectively. HSA is the hotspot map generated by the Hotspot Analysis tool in Esri ArcGIS. HOT001, HOT005, HOT01, and HOT02 are the HOT-generated hotspot maps with the support thresholds equal to 0.001, 0.005, 0.01, and 0.02, respectively. In the map of HT18_9, cells with RB rate h ≥ 18 and cells with RB rate h ≥ 9 that are also adjacent to the h ≥ 18 cells are considered as hotspots.

Table 3. Comparison results of the hotspot maps. The number in front of the brackets is the number of cells classified as hotspots in both maps. The number inside the brackets is the Kappa Index between the two maps.

          HT18       HT9         HSA        HOT001     HOT005     HOT01      HOT02      HT18_9
HT18      301(1.00)
HT9       301(0.38)  1245(1.00)
HSA       262(0.39)  668(0.74)   1094(1.0)
HOT001    301(0.69)  561(0.61)   456(0.61)  561(1.0)
HOT005    301(0.73)  523(0.58)   428(0.58)  509(0.95)  523(1.0)
HOT01     301(0.69)  567(0.62)   457(0.61)  546(0.98)  511(0.95)  567(1.0)
HOT02     301(0.74)  508(0.57)   416(0.57)  504(0.95)  487(0.96)  507(0.94)  508(1.0)
HT18_9    301(0.63)  645(0.67)   523(0.66)  496(0.87)  475(0.85)  501(0.88)  466(0.84)  645(1.0)


considered as areas with or without RB rates. There is not enough information for HSA to tell if a cell contains 80%, or only 20%, residential areas. This results in a further precision loss (82%). The HOT model outperforms HSA under the current parameter settings because not only the target crime rate, but also the related variables, have been taken into account in HOT. By using the informative GDPatterns, only the areas with a similar background (or similar characteristics of related variables) to the hard threshold hotspots are considered. The use of GDPatterns ensures that the precision of the HOT hotspot maps (86% on average, Table 4) is consistent with the original inputs.

To give an intuitive view of HOT’s performance, two of the hotspot maps, HT18 and HOT001 (Fig. 5), are projected onto satellite images of the studied city and a sample site is extracted (Fig. 7). Using an initial threshold (h ≥ 18) the red cells are classified into hotspots and cells in the same blocks (in the color of blue)

Table 4. Cell statistics of the hotspot maps. The number in front of the brackets is the number of cells located in the corresponding area. The number inside the brackets shows the percentage.

Map       Total hotspot cells   Cells in residential areas   Cells in non-residential areas
HT18      301                   257(85.4%)                   44(14.6%)
HT9       1245                  1056(84.8%)                  189(15.2%)
HSA       1094                  901(82.4%)                   192(17.6%)
HOT001    561                   484(86.3%)                   77(13.7%)
HOT005    523                   451(86.2%)                   72(13.8%)
HOT01     567                   488(86.1%)                   79(13.9%)
HOT02     508                   435(85.6%)                   73(14.4%)
HT18_9    645                   548(85.0%)                   97(15.0%)


have been left out. Understandably, houses in the same block are at similar risk of being broken into. Our optimization method successfully captures these cells. Rather than acting as a plain choropleth mapping tool, HOT performs a dasymetric mapping by modifying the hotspot boundaries rationally. Also, locations covered by natural land, parking lots, roads, and highways are identified and classified out of hotspots using our method (Fig. 7).

4.3. Demonstrating crime related variables

One thousand five hundred GDPatterns in the experiment satisfying a support threshold of 0.001 are selected for further analysis. These GDPatterns (H-GDPatterns) are sorted by growth ratios from high to low. All 1500 patterns have a growth ratio greater than 50 (d > 50). For comparison, a set of GDPatterns (N-GDPatterns) based on normal areas is also mined using HOT. Specifically, we set cells with h ≥ 18 as Dn, cells with 18 > h′ ≥ 9 as Dh′, and other cells as Dh (h < 9). To facilitate the comparative analysis, the top 1500 N-GDPatterns are selected after running HOT. The growth ratios of these N-GDPatterns are all larger than 30 (d > 30).

Fig. 6. A land cover map showing the residential areas in the studied city.

Using the similarity method discussed in Section 3.4, the distance between each pair of GDPatterns is calculated. We use the cluster heat map tool (Wilkinson & Friendly, 2009) to visualize the clusters in sorted distance matrices (Fig. 8). In sorted distance matrices, the value of a_ij represents the distance between GDPattern i and GDPattern j, where GDPattern j is the |i − j|th closest to GDPattern i by distance. The heat maps use different colors to represent the different values in the sorted distance matrices.

After locating all the clusters, the footprints of these clusters are drawn (Fig. 9), which demonstrate the spatial distribution of GDPatterns. Moreover, we use pie charts to explore the structure of GDPatterns in the same clusters (Fig. 10), in which the values of variables are shown using different colors.

A lot of information can be revealed from these figures. For example, when we look at the H-GDPattern clusters in the studied city,

• High residential burglary (RB) rates are associated with high population density only in areas with few foreclosures (FC), commercial burglaries (CB), motor-vehicle larcenies (MV), street robberies (SR), and very low arrest rates (AR) (Cluster 1). These areas also have high residential density (HU) and are close to universities or colleges (DC). Such locations are shown in the footprint map of H-GDPattern Cluster 1 in Fig. 9.

• High residential burglary (RB) rates are associated with very low foreclosure rates (FC) in most instances (Clusters 1–7). The only locations with many residential burglaries (RB) and a moderate number of foreclosures (FC) are shown in Fig. 9, H-GDPattern Cluster 8. These areas are usually far from universities or colleges, have average population and house density, and low to moderate arrest (AR), commercial burglary (CB), motor-vehicle larceny (MV) and street robbery (SR) rates (Cluster 8).

• Areas with high residential burglary rates and not close to any colleges or universities (low in DC) can be mainly considered in two categories (Clusters 4 and 7 in Fig. 10). One of them is characterized by high residential density (HU), as well as low motor-vehicle larceny (MV) and street robbery (SR) rates (Cluster 4). The other has low residential density and average MV and SR rates (Cluster 7). The locations of the two categories are shown in H-GDPattern Cluster 4 and H-GDPattern Cluster 7 of Fig. 9, respectively.

Fig. 7. An example of re-projected hotspots with satellite images. The blue cells are hotspots defined using a threshold of h ≥ 18 (HT18 in Fig. 5). Both the blue and red cells belong to the hotspots identified using HOT with a support threshold of 0.001 (HOT001 in Fig. 5). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 8. Heat maps for distance matrices of GDPatterns. On the left, a heat map based on the distance matrix of H-GDPatterns is drawn using a color ramp from blue to red representing distances between H-GDPatterns from small to great. GDPattern clusters identified using HAC (Section 3.4) are marked with white frames. On the right is the heat map for the distance matrix of N-GDPatterns, with a color ramp from black to white representing distances from small to great; GDPattern clusters are marked with blue frames.

Fig. 9. Footprint maps of GDPatterns’ clusters. Areas inside the blue circle are where most colleges are located in the studied city, and the green circle indicates the central park of the city.

Fig. 10. Pie charts of GDPatterns’ clusters. The values of each related variable are shown in different colors.

The information revealed by our approach has been verified by domain scientists. For example:

• Offenders are known to focus on neighborhoods with large proportions of college students living in off-campus residences (the blue circles in Fig. 9 show areas where most colleges are located; see Fig. 10, H-GDPattern Cluster 1, in which the value of DC is high).

• Where college students are less significantly represented, offenders take a different approach, and the FC rates become a more important indicator of RB offenders (Fig. 10, H-GDPattern Cluster 8, in which the value of FC is relatively high). This also explains why high RB is associated with low FC in most areas of the city.

• The footprint map of N-GDPattern Cluster 7 (green circle in Fig. 9) covers mostly non-residential areas like parks, because these areas have similar conditions and no RB incidents.

The case study and the comparison experiments have shown the potential of using crime related variables in hotspot mapping. Our method helps maintain the mapping precision during the hotspot representation process and also provides a comprehensive way for further analysis.

5. Conclusion

In this paper, we present a spatial data mining framework to study the spatial distribution of crimes through their related variables. To the best of our knowledge, it is the first attempt to use related variables in crime hotspot mapping. Spatial data mining is often said to “let the data speak for themselves”. But the data cannot tell stories unless appropriate questions are formulated and asked, and appropriate methods are needed to solicit the answers from the data. In the framework we address an iterative and inductive learning process to study the spatial properties of crime. Experiment results show that our HOT model outperforms HSA in precisely identifying crime hotspots. Additionally, by using a similarity measure method, we demonstrate the characteristics of the target crime’s related variables using GDPattern clusters and footprint maps, which help explain the variation of crime over space and deliver the knowledge in a quantitative, as well as comprehensive and systematic, manner.

Acknowledgement

The work was partially funded by the National Institute of Justice (No. 2009-DE-BX-K219).

References

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. In: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB (Vol. 1215, pp. 487–499).

Bailey, T., & Gatrell, A. (1995). Interactive spatial data analysis. Longman Scientific & Technical Essex.

Bates, S. (1987). Spatial and temporal analysis of crime. Research Bulletin, April.

Boba, R. (2005). Crime analysis and crime mapping. Sage Publications, Inc.

Brantingham, J., & Brantingham, L. (1984). Patterns in crime. New York: NCJ.

Chainey, S., Tompson, L., & Uhlig, S. (2008). The utility of hotspot mapping for predicting spatial patterns of crime. Security Journal, 21(1), 4–28.

Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46.

Cohen, L., & Felson, M. (1979). Social change and crime rate trends: A routine activity approach. American Sociological Review, 588–608.

Cook, W., Ormerod, P., & Cooper, E. (2004). Scaling behaviour in the number of criminal acts committed by individuals. Journal of Statistical Mechanics: Theory and Experiment, 2004, P07003.

Cornish, D., & Clarke, R. (1986). The reasoning criminal: Rational choice perspectives on offending. New York: Springer-Verlag.

Deane, G., Beck, E., & Tolnay, S. (1998). Incorporating space into social histories: How spatial processes operate and how we observe them. International Review of Social History, 43(S6), 57–80.

Ding, W., Stepinski, T., & Salazar, J. (2009). Discovery of geospatial discriminating patterns from remote sensing datasets. In: Proceedings of SIAM international conference on data mining.

Dong, G., & Li, J. (1999). Efficient mining of emerging patterns: Discovering trends and differences. In Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 43–52). ACM.

Eck, J., Chainey, S., Cameron, J., Leitner, M., & Wilson, R. (2005). Mapping crime: Understanding hot spots. National Institute of Justice.

ESRI (2011). Arcgis desktop: Release 10.

Ester, M., Kriegel, H., & Sander, J. (1997). Spatial data mining: A database approach. In Advances in spatial databases (pp. 47–66). Springer.

Getis, A., & Ord, J. (2010). The analysis of spatial association by use of distance statistics. Perspectives on Spatial Data Analysis, 127–145.

Han, J., Pei, J., Mortazavi-Asl, B., Chen, Q., Dayal, U., & Hsu, M. (2000). Freespan: Frequent pattern-projected sequential pattern mining. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 355–359). ACM.

Harries, K. (1999). Mapping crime: Principle and practice. US Dept. of Justice, Office of Justice Programs, National Institute of Justice, Crime Mapping Research Center.


Herrera, F., Carmona, C. J., González, P., & del Jesus, M. J. (2011). An overview on subgroup discovery: Foundations and applications. Knowledge and information systems, 29(3), 495–525.

Hirschfield, A. (2001). Mapping and analysing crime data: Lessons from research and practice. CRC.

Janeja, V. P., & Palanisamy, R. (2012). Multi-domain anomaly detection in spatial datasets. Knowledge and Information Systems, 1–40.

Jenks, G. (1967). The data model concept in statistical mapping. International Yearbook of Cartography, 7, 186–190.

Koperski, K., & Han, J. (1995). Discovery of spatial association rules in geographic information databases. In Advances in spatial databases (pp. 47–66). Springer.

Lin, D. (1998). An information-theoretic definition of similarity. In: Proceedings of the 15th international conference on machine learning, San Francisco (Vol. 1, pp. 296–304).

Ludwig, J., Duncan, G., & Hirschfield, P. (2001). Urban poverty and juvenile crime: Evidence from a randomized housing-mobility experiment. The Quarterly Journal of Economics, 116(2), 655–679.

Maciejewski, R., Rudolph, S., Hafen, R., Abusalah, A., Yakout, M., Ouzzani, M., et al. (2010). A visual analytics approach to understanding spatiotemporal hotspots. IEEE Transactions on Visualization and Computer Graphics, 16(2), 205–220.

Malerba, D., Esposito, F., Lisi, F., & Appice, A. (2002). Mining spatial association rules in census data. Research in Official Statistics, 5(1), 19–44.

Mennis, J. (2006). Socioeconomic-vegetation relationships in urban, residential land: The case of denver, colorado. Photogrammetric Engineering and Remote Sensing, 72(8), 933.

Mennis, J., & Liu, J. (2005). Mining association rules in spatio-temporal data: An analysis of urban socioeconomic and land cover change. Transactions in GIS, 9(1), 5–17.

Miller, H., & Han, J. (2009). Geographic data mining and knowledge discovery. CRC.

Mu, Y., Ding, W., Morabito, M., & Tao, D. (2011). Empirical discriminative tensor analysis for crime forecasting. Knowledge Science, Engineering and Management, 293–304.

Nguyen, H., & Nguyen, S. (1998). Discretization methods in data mining. Rough Sets in Knowledge Discovery, 1, 451–482.

Pasquier, N., Bastide, Y., Taouil, R., & Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. Database Theory ICDTT, 99, 398–416.

Qian, F., He, Q., Chiew, K., & He, J. (2012). Spatial co-location pattern discovery without thresholds. Knowledge and Information Systems, 1–27.

Ratcliffe, J., & Taniguchi, T. (2008). Is crime higher around drug-gang street corners? Two spatial approaches to the relationship between gang set spaces and local crime levels. Crime Patterns and Analysis, 1(1), 17–39.

Rossiter, D. (2004). Technical note: Statistical methods for accuracy assessment of classified thematic maps. Enschede (NL): International Institute for Geo-information Science & Earth Observation (ITC), 25(92), 107. <http://www.itc.nl/personal/rossiter/teach/R/R_ac.pdf>.

Sah, R. (1991). Social osmosis and patterns of crime: A dynamic economic analysis. Journal of political Economy, 99(6).

Sampson, R., Raudenbush, S., & Earls, F. (1997). Neighborhoods and violent crime: A multilevel study of collective efficacy. Science, 277(5328), 918–924.

Short, M., Bertozzi, A., & Brantingham, P. (2010). Nonlinear patterns in urban crime: Hotspots, bifurcations, and suppression. SIAM Journal on Applied Dynamical Systems, 9, 462.

Skogan, W. (1992). Disorder and decline: Crime and the spiral of decay in American neighborhoods. Univ. of California Pr.

Tita, G., & Greenbaum, R. (2009). Crime, neighborhoods, and units of analysis: Putting space in its place. Putting Crime in its Place, 145–170.

Van Patten, I., McKeldin-Coner, J., & Cox, D. (2009). A microspatial analysis of robbery: Prospective hot spotting in a small city. Crime Mapping: A Journal of Research and Practice, 1(1), 7–32.

Wand, M., & Jones, M. (1995). Kernel smoothing (Vol. 60). Chapman & Hall/CRC.

Wilkinson, L., & Friendly, M. (2009). The history of the cluster heat map. The American Statistician, 63(2), 179–184.

Williamson, D., McLafferty, S., McGuire, P., Ross, T., Mollenkopf, J., Goldsmith, V., et al. (2001). 9 tools in the spatial analysis of crime. Mapping and Analysing Crime Data: Lessons from Research and Practice, 187, CRC.

Yu, K., Ding, W., Simovici, D. A., & Wu, X. (2012). Mining emerging patterns by streaming feature selection. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 60-68). ACM.

  • Understanding the spatial distribution of crime based on its related variables using geospatial discriminative patterns
    • 1 Introduction
    • 2 Related work
    • 3 Methodology
      • 3.1 Problem formulation and data representation
      • 3.2 Geospatial Discriminative Patterns (GDPatterns)
      • 3.3 Hotspot Optimization Tool
      • 3.4 Crime related variables demonstration
    • 4 Case study
      • 4.1 Data preprocessing
      • 4.2 Hotspot mapping
      • 4.3 Demonstrating crime related variables
    • 5 Conclusion
    • Acknowledgement
    • References

Attachment 2

Computers, Environment and Urban Systems 36 (2012) 551–561

Contents lists available at SciVerse ScienceDirect

Computers, Environment and Urban Systems

journal homepage: www.elsevier.com/locate/compenvurbsys

Analysis of crime patterns through the integration of an agent-based model and a population microsimulation

Nick Malleson ⇑, Mark Birkin

School of Geography, University of Leeds, Leeds LS2 9JT, United Kingdom

Article info

Article history: Available online 15 June 2012

Keywords: Agent-based modelling; Crime simulation; Burglary; Microsimulation

Abstract

In recent years, criminologists have become interested in understanding crime variations at progressively finer spatial scales, right down to individual streets or even houses. To model at these fine spatial scales, and to better account for the dynamics of the crime system, agent-based models of crime are emerging. Generally, these have been more successful in representing the behaviour of criminals than their victims. In this paper it is suggested that individual representations of criminal behaviour can be enhanced by combining them with models of the criminal environment which are specified at a similar scale. In the case of burglary this means the identification of individual households as targets. We will show how this can be achieved using the complementary technique of microsimulation. The work is significant because it allows agent-based models of crime to be refined geographically (to allow, for example, individual households with varying wealth or occupancy measures) and leads to the identification of the characteristics of individual victims.

© 2012 Elsevier Ltd. All rights reserved.

0198-9715/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.compenvurbsys.2012.04.003

⇑ Corresponding author. Tel.: +44 113 343 6757. E-mail address: [email protected] (N. Malleson).

1. Introduction

An early contribution of environmental criminology has been to illustrate systematic variations in the profile of criminal activity between different area types, such as persistently lower levels of crime in rural communities than intensely urban neighbourhoods. In recent years, however, criminologists have become interested in understanding variations at progressively finer spatial scales, right down to individual streets or even houses. The analysis of crime at such a fine scale is also supported by recent developments in computer simulation such as agent-based modelling. In earlier work, the value in representing criminals and their behaviour as individuals has been demonstrated within a richly specified modelling framework. Ultimately such models exploit the fact that aggregate crime patterns are no more or less than the sum of a series of unique events, each bringing together a criminal and a victim in space.

To date, agent-based models of crime have been more successful in representing the behaviour of criminals than their victims. In a sense these models are hybrids which combine individual criminal actors with a less disaggregate view of the environments in which they operate.¹ In this paper it is suggested that individual representations of criminal behaviour can be enhanced by combining them with models of the criminal environment which are specified at a similar scale. In the case of burglary this means the identification of individual households as targets. We will show how this can be achieved for an agent-based model using the complementary technique of microsimulation.

¹ There is some similarity here to retail models in which the individual behaviour of retailer agents has been combined with a more coarsely-grained view of customer activity (Heppenstall et al., 2007).

The work is significant for a number of reasons. First, it allows agent-based models of crime to be refined to allow for the variable attractiveness of specific targets, for example households with high wealth or low occupancy. Second, by identifying individual victims we allow the possibility of including repeat victimisation itself as a major contributor to crime patterns. Studies have shown that repeat victimisation is the strongest risk factor for the victims of burglary (Tseloni, 2006). Third, we introduce the possibility of further disaggregation of victim characteristics and behaviours, such as the influence of age, ethnicity or household composition. Finally, we hope that this work will also be of interest to those who are looking to disaggregate models of individual behaviours in other sectors such as retailing, health or education.

In Section 2 of the paper the importance of spatial environments for crime modelling will be discussed. The individual level modelling techniques of agent-based modelling and microsimulation are reviewed. The way in which individual-based models have been implemented in the context of crime is described in Section 3, together with a discussion of the means for model validation. A method for integrating individual models of both criminal and victim is presented in Section 4 of the paper, before a discussion of some numerical experiments and results from the new model in Section 5. The paper concludes with some reflections, conclusions, and suggestions of the most immediate priorities for further work.

552 N. Malleson, M. Birkin / Computers, Environment and Urban Systems 36 (2012) 551–561

2. Background – modelling crime

Crime is inherently a human phenomenon; a single crime event is the result of the motivations and behaviour of the criminal, victim and other people who might be able to influence the event (Cohen and Felson, 1979) as well as their relationships with or attitudes to the surrounding environment (Brantingham and Brantingham, 1993). These complex human factors, coupled with vast environmental complexity, make crime very difficult to understand, predict and model. However, crime does not occur at random and a considerable body of literature has been developed in order to identify the underlying drivers required to model the ‘crime system’. This section will discuss the relevant theoretical and practical approaches for understanding and modelling crime. It will be shown how the individual-level modelling techniques employed – agent-based modelling and microsimulation – are ideally suited to capturing the dynamics of the crime system and offer additional insight through their integration.

2.1. Environmental criminology and ‘traditional’ crime modelling

Although the earliest examples of spatial crime analysis date back to the 19th century (e.g. Glyde, 1856), the term “environmental criminology” was not coined until 1971, when Jeffery (1971) called for the development of a new school to focus on the environment in which crime occurs (Andresen, 2009). As research has progressed, environmental criminology studies have focussed on the effects of the environment at progressively smaller scales, to the extent that “crime at places” (Eck and Weisburd, 1995) research now generally concentrates on individual streets or houses. This progression in quantitative research has been led by corresponding theoretical developments; the major theories associated with environmental criminology (Brantingham and Brantingham, 1981; Cohen and Felson, 1979; Clarke and Cornish, 1985) focus on the spatio-temporal behaviour of individual people and their immediate surrounding environment.

Models of crime have also followed the trend towards higher resolution geographies. The tradition of using spatially aggregated census data is being replaced by work that models at the level of the individual street (Johnson and Bowers, 2009) and house (Tseloni, 2006). In terms of methodologies employed, the ‘traditional’ regression approaches (multivariate, Poisson, negative binomial and logistic, etc.) are being advanced through the use of techniques that are common in other disciplines – such as a discrete spatial choice model to study target choice (Bernasco, 2004) and multi-level modelling to examine property crimes (Tseloni, 2006).

Regardless of the precise method employed, most crime models are generally linear. The crime system, on the other hand, is a complex system; it is made up of numerous interacting elements, exhibits non-linear behaviour and involves feedback. Although linear models are “computationally convenient”, they cannot capture the dynamics of such systems (Eck and Liu, 2008). Furthermore, statistical models aim to reduce the number of explanatory variables, which can make it more difficult to account for environmental complexity and the human–human or human–environment interactions that drive the system. Similarly, spatial realism is often compromised through the use of simple Euclidean distance measures that do not capture the richness of the physical environment (such as the presence of roads, parks, and rivers). These factors mean that although linear crime models have proven to be an essential tool for crime analysis they are flawed in terms of capturing the underlying dynamics that drive the system.

Individual-level models, on the other hand, focus on manipulating the individual units that drive the system – in this case criminals, victims, managers, households, etc. – and are thus much better suited to modelling the dynamics of complex systems.

2.2. Agent-based modelling (ABM)

In general, the central drawback to statistical crime models is that they are not able to capture the underlying processes that drive the crime system – one that is characterised by the behaviour of individual people who have their own unique psychology and interact in a rich social and physical environment. An alternative approach to modelling these types of systems, as opposed to controlling them from the ‘top down’ with an equation, is to simulate the behaviour of the individual actors that drive the system directly. This is the approach taken with ABM. Unlike statistical models, an agent-based model is composed of individual entities called agents, which are able to behave autonomously. Agents are placed in a virtual environment and they are able to interact with each other and with the environment. A model is executed over a number of iterations and at each iteration the agents have the ability to assess their situation and make a decision about their future actions. Realistic human behaviour can be built into the model through the means by which the agents make their decisions – behavioural complexity can range from simple rule-based systems (e.g. Schelling, 1969) through to advanced cognitive frameworks (Schmidt, 2000). Overall, creating models in this manner from the “bottom-up” (Epstein and Axtell, 1996) is a much more natural means of describing complex systems (Bonabeau, 2002).

Although there is great potential to be offered by agent-based models, there are inevitably some drawbacks. The advantage of being able to model human behaviour could itself be construed as a drawback because it is actually extremely difficult to simulate human psychology in a computer model. This encourages models to have minimal behavioural complexity (O’Sullivan and Haklay, 2000) which is not necessarily justified. Also, computation time is often a problem for models: their probabilistic nature means that they must often be run numerous times and the time required to process each individual agent inevitably increases with behavioural complexity. Fortunately, there are efforts to make high-performance computer hardware more readily available which can alleviate some of the computational problems.

Models must also face the equifinality problem, which is where many models might match a single set of calibration data. Therefore the number of variables that can be used in a model is limited where sufficient data are not available. Because agent-based models often work at the level of the individual person or household, obtaining large amounts of high-quality calibration data can be problematic. Similarly, data are often required in order to characterise the agents that make up the system or to describe aspects of the social or physical environment. For example, with the burglary model employed by this research (which is discussed in Section 3.2) it is necessary to create a virtual representation of all the potential victims of burglary (i.e. all households in the study area). Although the 2001 UK census provides sufficient socio-demographic information to describe neighbourhoods, a mechanism is required to identify or estimate the individual households themselves. Without these micro-level datasets it can be extremely difficult to initialise and to validate agent-based models. Fortunately, the technique of microsimulation can be used for this task.

2.3. Microsimulation

Microsimulation is a comparable technique to agent-based modelling because it also represents a population as a set of distinct entities rather than by groups. Typically it is seen as a means for applying well-defined rules to a wide variety of individual circumstances in order to achieve insights with real predictive or applied value. For example, microsimulation can be used to simulate processes such as birth, death, and migration at the level of the individual household to estimate household-level population change over time (Wu et al., 2008). Although there is no clear distinction between microsimulation and ABM, generally agent-based approaches focus on richer behavioural models and on the interactions between individuals and their environment whereas microsimulation is more suited to situations with clearly defined transition rules (Wu et al., 2010).
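The idea of applying clearly defined transition rules to each household can be illustrated with a minimal sketch. It is not the method of Wu et al. (2008): the transition probabilities, the `Household` class and the function names are all hypothetical and chosen only to show the mechanics of a dynamic microsimulation step.

```python
import random
from dataclasses import dataclass

# Hypothetical annual transition probabilities (illustrative values only).
P_DEATH = 0.01      # per resident
P_BIRTH = 0.02      # per household
P_MOVE_OUT = 0.05   # whole household leaves the study area


@dataclass
class Household:
    household_id: int
    size: int


def step_year(households, rng):
    """Apply the transition rules once to every household."""
    survivors = []
    for hh in households:
        if rng.random() < P_MOVE_OUT:
            continue  # out-migration: the household is removed
        deaths = sum(1 for _ in range(hh.size) if rng.random() < P_DEATH)
        births = 1 if rng.random() < P_BIRTH else 0
        new_size = hh.size - deaths + births
        if new_size > 0:
            survivors.append(Household(hh.household_id, new_size))
    return survivors


def simulate(households, years, seed=0):
    """Run the rules for a number of years and record the total population."""
    rng = random.Random(seed)
    totals = []
    for _ in range(years):
        households = step_year(households, rng)
        totals.append(sum(hh.size for hh in households))
    return households, totals
```

Because every rule is a fixed probability applied to an individual record, the sketch has no behavioural model at all, which is the contrast with agent-based approaches drawn above.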

As well as running individual-level simulations, microsimulation can be used as a means of disaggregating data. For example, microsimulation has been used to disaggregate the British Crime Survey and simulate the effects of various policy decisions on the local populations at whom they were targeted (Kongmuang, 2006). A commonly used source for models in the UK, which is unparalleled in its robustness and scope, is the decennial census. Although census data are released at relatively small geographical areas (an ‘output area’ usually contains only 100 houses) they cannot be used to seed an agent-based model unless they can be further disaggregated to the level of the individual household or person. Therefore, the coupling of an agent-based and microsimulation model offers considerable advantages. The following sections will outline the two models used here in more detail before identifying how they have been integrated and the benefits of doing so.

3. The population reconstruction and burglary models

It has been shown that in order to create accurate predictive/explanatory models of crime there is a substantial benefit to using a fine-scale geography (e.g. the level of the individual house) and to simulating the individual actors that are responsible for generating higher-level crime patterns. Hence the coupling of agent-based and microsimulation models offers modelling advantages on two fronts: using microsimulation it is possible to create high-quality individual-level data to characterise the actors; and with agent-based modelling it is possible to create realistic behavioural rules and an accurate virtual environment in which to simulate their behaviour. This section will introduce the two models employed in this research to explore the characteristics of burglary victims: the Population Reconstruction Model (Birkin et al., 2006), which can be used to disaggregate census data, and the BurgdSim model (Malleson et al., 2012), which is an advanced agent-based model of residential burglary.

3.1. The Population Reconstruction Model (PRM)

Although it is robust, comprehensive and accurate, the UK census fails to provide a spatially disaggregate representation of individual people and households. This high-resolution representation of individuals is essential for the modelling of complex systems such as crime. Therefore a microsimulation program called the Population Reconstruction Model (PRM) has been developed to use a combination of census Small Area Statistics (socio-demographic data released at the lowest level of spatial disaggregation) and the Sample of Anonymised Records (a set of anonymous, a-spatial individual census records) to provide synthetic lists of the entire population of any city or region in the country.

The PRM (Birkin et al., 2006) uses an iterative reweighting procedure to allocate synthetic households to small areas, using attributes ranging from age, marital status, ethnicity and gender to occupation and health, housing tenure and household composition. Each characteristic is weighted to the neighbourhood of any small area which is to be reconstructed. For example, in a multi-cultural area, ethnic minority groups will attract an increased weighting; in areas of social housing, privately owned accommodation will attract a reduced weighting; and so on. A variety of microsimulation techniques are applicable to the problem of synthetic reconstruction (Williamson et al., 1998).

Determining whether or not the synthetic population is accurate is non-trivial because there are no data that can be used to validate it directly (if individual-level data were available in the first place the PRM procedure would be unnecessary). The most common method of assessing validity is to compare the aggregate synthetic population to the original Small Area Statistics under the assumption that if the aggregate populations correspond then the synthetic population is a close representation of the real population. To this end, it has been shown that the PRM outputs have an extremely close match to the small area distributions from which they are derived and hence the PRM is accurate (Harland et al., 2012).

Table 1 illustrates the personal and household attributes that are currently available as output from the PRM; these will be used to characterise the individuals in the agent-based model.

Table 1. The individual and household attributes that are contained in the synthetic population output by the PRM.

Attribute | Description
House size | The number of people who live in the household
House type | The type of the house building. Can be one of: detached, semi-detached, terraced or flats
Age | The age of the individual in single year groups
Gender | The gender of the individual
Ethnicity | The person’s ethnicity. The census groups are aggregated, for simplicity, into the following four categories: white, asian, black, other
Marital Status | Whether the person is married or unmarried
Employment Status | The person’s employment. As with ethnicity, the census employment categories are grouped into one of: managerial, intermediate, manual or other. Employment status is also later used as a proxy for socioeconomic status

3.2. The BurgdSim model

3.2.1. The burglar agents

Environmental criminologists have emphasised the importance of addressing the intricacies of the physical or social environment (Brantingham and Brantingham, 1993; Eck and Weisburd, 1995) and the effects of individual people’s behaviour (Cohen and Felson, 1979; Clarke and Cornish, 1985) in order to construct accurate models of crime. The BurgdSim crime model is an advanced agent-based model of residential burglary that aims to capture these elements. Offenders in the model (virtual burglars) are represented as individual agents, who are able to navigate a realistic urban environment performing normal day-to-day behaviours. In its current form, these behaviours include sleeping, socialising and using substances which, although obviously a vast simplification of real human behaviour, have been identified as being the most important drivers for many burglars (Wright and Decker, 1996; Cromwell et al., 1991; Wiles and Costello, 2000).

To control the agents, the PECS (Physical Conditions, Emotional State, Cognitive Capabilities and Social Status) artificial intelligence framework (Schmidt, 2000) was used to equip the agents with realistic, dynamic behaviour (Malleson et al., 2010). With PECS, an agent’s behaviour is determined by the strength of different needs (socialising, using substances or sleeping in this case). It is the strongest of these needs that determines their current behaviour at any point in time. Committing burglary is a response to having to meet a need that requires money – socialising or using substances – because in the model the agents cannot gain wealth through legitimate employment. In this manner it is possible to build up city-wide burglary patterns by simulating the behaviour of the individuals who are ultimately responsible for the individual crimes. It is possible to create heterogeneous agent behaviour by varying key parameters that determine where agents will start looking for burglary targets and which houses they find the most attractive. For example, it would be possible to create a “professional” who was comfortable travelling larger distances than other agents in search of more lucrative targets. Varying these behaviours will have a substantial impact on the model outcomes, but a full exploration of heterogeneous offender behaviour and its effect on city-wide burglary patterns must be left for future work. For more information about behaviour validation, the interested reader can refer to Malleson et al. (2012).

Each agent is assigned a house as their home and they start the simulation there with low needs (i.e. they are satisfied and are not motivated to perform an action). Over time their needs increase and they become motivated to attempt to satisfy them. The simulation is configured so that on a typical day, an agent must sleep for 8 h, socialise for 2 h and purchase drugs once. The income gained from a single burglary is set to a constant amount which is sufficient to allow the agent to purchase drugs once and socialise for 2 h (hence, on average, an agent will need to burgle once per day). There is no law enforcement in the model, so agents base their burglary decision purely on their own internal needs and the attributes of the surrounding environment. However, on some days the agent might not find a suitable target, which will lead them to become more desperate and commit multiple burglaries on a later day. In this sense the agents are truly autonomous; the amount of time they spend performing different activities depends entirely on their own behaviour and there is no central control. See Malleson et al. (2012) for more details about the burglar agents.
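The needs-driven behaviour described above can be sketched in a few lines. This is a simplified illustration rather than the BurgdSim or PECS implementation: the growth rates, costs, the `BurglarAgent` class and the assumption that one simulated hour fully satisfies a need are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical hourly growth rates and costs (illustrative values only).
GROWTH_PER_HOUR = {"sleep": 1 / 16, "socialise": 1 / 24, "drugs": 1 / 24}
COST = {"sleep": 0.0, "socialise": 0.5, "drugs": 0.5}
BURGLARY_INCOME = 1.0  # funds one drug purchase plus one socialising session


@dataclass
class BurglarAgent:
    needs: dict = field(
        default_factory=lambda: {"sleep": 0.0, "socialise": 0.0, "drugs": 0.0}
    )
    wealth: float = 0.0

    def strongest_need(self) -> str:
        # The strongest need drives behaviour; ties fall back to dict order.
        return max(self.needs, key=self.needs.get)

    def step(self, target_available: bool = True) -> str:
        """Advance one simulated hour and return the action taken."""
        for name, rate in GROWTH_PER_HOUR.items():
            self.needs[name] += rate
        need = self.strongest_need()
        if self.wealth < COST[need]:
            # The need costs money the agent lacks, so it looks for a target.
            if target_available:
                self.wealth += BURGLARY_INCOME
                return "burgle"
            return "search"
        self.wealth -= COST[need]
        # Simplifying assumption: one hour fully satisfies the need.
        self.needs[need] = 0.0
        return need
```

Calling `step()` repeatedly on a fresh `BurglarAgent` yields a sequence of actions with no central scheduler. When `target_available` is false the unmet needs keep growing, which mirrors the "more desperate" behaviour described in the text.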

Another important agent characteristic is the cognitive map. Agents do not have global knowledge of their environment; instead they build up their awareness of houses as they pass them on routine travels. For example, an agent might become aware of a potential burglary target whilst on the way to socialise and later return to burgle the house. This is a powerful component of the model because it brings it much more closely in line with criminology theory (Brantingham and Brantingham, 1981; Clarke and Cornish, 1985) and means that the urban form of an area will have an influence on burglary patterns (houses that are situated in areas that the burglar agents are unlikely to have passed through will have lower risk).

3.2.2. The virtual environment

As with the representation of the agents, the virtual environment in the model has been designed to be as realistic as is necessary for a burglary simulation. To this end there are three distinct layers that make up the environment:

• The buildings layer contains physical buildings. These are the houses that the burglar agents can attempt to burgle and have been generated from MasterMap geographic data as depicted in Fig. 1. Each house is a unique object with different physical attributes that reflect current theoretical understanding of the crime system, e.g. ease of access to the house, its visibility to neighbours, etc.

• The transport layer is another physical layer and it makes up the transport network for the simulation area. Again using MasterMap data, there are distinct geographical objects to represent roads, rail networks and bus routes. Roads also have attributes that determine whether or not they are car or pedestrian accessible. Realistic routing behaviour is obtained by varying the speed that agents can drive along roads so that agents with cars are encouraged to drive on major roads rather than using minor ones.

• The remaining layer is the community layer which, unlike the buildings and transport layers, is used to account for the effects that other people will have on a potential crime occurrence. For example, high levels of community cohesion have been linked to low levels of violent crime because local people are more likely to intervene to prevent a crime from occurring (Sampson et al., 1997). Similarly, areas with large numbers of residents who are at home during the day can offer informal protection and reduce the burglary risk.

Fig. 1. An example of the Ordnance Survey MasterMap data that are used to create the virtual environment.

3.2.3. The need for a synthetic population

In terms of physical attributes, the model virtual environment is highly detailed and high resolution; it is able to represent individual roads and houses which modern environmental criminology research suggests are important. However, there are some drawbacks with the community layer that exist due to the absence of household-level population data. Currently, the layer includes (among others) the following two attributes:

• Occupancy. An estimate for whether or not a household is occupied at a particular time based on the employment statuses of the people who live in the area from the UK census. For example, student houses are more likely to be occupied during the day.

• Attractiveness. A measure of the affluence of houses in the area, also based on census data.

Although some factors (such as community cohesion) correspond to communities rather than individuals and should therefore be included at an aggregate level, the occupancy and attractiveness variables will not necessarily be homogeneous across an entire community. It would clearly be preferable to model them at the household level. Furthermore, environmental criminology research has shown that victim behaviour is an important determinant of household burglary risk so it is a major drawback that a model with such an accurate representation of the physical environment must aggregate certain key variables due to a lack of household-level demographic data.

Therefore this research will take advantage of the PRM microsimulation model to create estimates of occupancy and attractiveness for every individual household in the simulation area, rather than assuming all houses in an area are identical in this respect. Furthermore, by attaching additional information about the synthetic individuals to households (such as the residents’ gender and ethnicity) the research is able to perform illuminating post-simulation analysis of the victims of crime.

It should be noted that victims are still not represented as agents; households have heterogeneous levels of vulnerability but at this stage they do not ‘think’ in the way that burglar agents do (they will not react to a burglary). This is an obvious avenue for future research as it has been shown to have advantages in other work (Malleson et al., 2010). Nevertheless, the integration of the BurgdSim agent-based model with the PRM microsimulation model offers considerable advantages for the burglary simulation in terms of bringing the existing model in line with current criminological thinking.

3.3. Validating the simulation results

Before validating the simulation results (by comparing model results to known data) it is necessary to verify that the model is logically consistent – a process often termed “verification” (Castle and Crooks, 2006) or “inner validity” (Axelrod, 1997). To verify that the model had been implemented correctly, it was executed in three different types of environment: a ‘null’ environment in which each agent’s journeys took a set amount of time; a ‘grid’ environment in which roads and houses were situated on a regular grid; and finally a realistic ‘GIS’ environment that closely represented the real area under study. By varying the environmental complexity in this manner it was possible to ensure that changes in controlling factors had the expected influences on model outcomes in the absence of a complex, confounding geography. For full details and results of verification experiments, see Malleson et al. (2010).

Fig. 2. Buildings as generated from Ordnance Survey MasterMap Topographic Area data and their associated output area boundaries.

After verification, the model can be compared to observed data to determine how closely it reflects known system conditions. As with the validation of microsimulation models, validating agent-based models is a divisive subject as there is no established method that can be used across different research projects. The BurgdSim model was calibrated and validated by comparing the model’s output burglaries to known burglary data provided by the police. The model is stochastic so, during the process of calibration, it was run a sufficient number of times (usually 50–100) to ensure that the aggregate results were consistent (Malleson, 2010).

The process of comparing the simulated data to the expected data is non-trivial because the two datasets are made up of points in space. Therefore there are a multitude of ways to answer the question “how similar are these two point patterns?”. A common approach is to spatially aggregate the point data to some administrative boundary and then apply a traditional goodness-of-fit statistic such as R² or the Standardised Root Mean Square Error (SRMSE). However, aggregation to administratively-defined areas makes the approach highly susceptible to the modifiable areal unit problem (Openshaw, 1984).

To avoid these drawbacks here, a new method was developed, following Costanza (1989), to assess the difference between two point datasets. The method, which was first published in Malleson (2010), takes advantage of traditional goodness-of-fit statistics, but instead of aggregating to an administrative boundary it places a number of cellular grids of varying resolutions over the study area and counts the number of points in each grid cell. By using various grids the method is able to minimise the effects of the modifiable areal unit problem. Also, it is possible to give local estimates of difference using the relative percentage difference. For two cells, $y_i$ and $y'_i$, this is defined as the difference between the proportions that the cells contribute to the total observation count:

$$100 \times \left(\frac{y_i}{\sum y}\right) - 100 \times \left(\frac{y'_i}{\sum y'}\right) \qquad (1)$$

The advantage with using this method to calculate cell difference is that it is not influenced by the total number of points in the datasets. The results presented in Section 5 use this method to explore spatial differences.
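The grid-based comparison and Eq. (1) can be illustrated with a short sketch. It assumes square grids laid over a rectangular bounding box, with all points lying inside that box; the function names and the default resolutions are hypothetical and are not taken from Malleson (2010).

```python
def grid_counts(points, bounds, n_cells):
    """Count points in an n_cells x n_cells grid laid over bounds.

    bounds is (xmin, ymin, xmax, ymax); points are (x, y) pairs that are
    assumed to lie inside the bounds.
    """
    xmin, ymin, xmax, ymax = bounds
    counts = [[0] * n_cells for _ in range(n_cells)]
    for x, y in points:
        # Points on the upper edge are clamped into the last cell.
        col = min(int((x - xmin) / (xmax - xmin) * n_cells), n_cells - 1)
        row = min(int((y - ymin) / (ymax - ymin) * n_cells), n_cells - 1)
        counts[row][col] += 1
    return counts


def relative_percentage_difference(observed, simulated):
    """Apply Eq. (1) cell by cell to two grids of counts."""
    total_obs = sum(map(sum, observed))
    total_sim = sum(map(sum, simulated))
    return [
        [100 * o / total_obs - 100 * s / total_sim for o, s in zip(row_o, row_s)]
        for row_o, row_s in zip(observed, simulated)
    ]


def compare_point_patterns(obs_points, sim_points, bounds, resolutions=(2, 4, 8)):
    """Return one grid of relative percentage differences per resolution."""
    return {
        n: relative_percentage_difference(
            grid_counts(obs_points, bounds, n), grid_counts(sim_points, bounds, n)
        )
        for n in resolutions
    }
```

Because each count is divided by its own dataset total, doubling every simulated point leaves the result unchanged, which is the property noted above.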

4. Integrating the models

4.1. Data preparation

The first stage in the process of integration is preparing the data for input into the BurgdSim model. The model represents the environment with Ordnance Survey MasterMap Topographic Area data, which is a vector GIS dataset containing the individual boundaries of buildings. The PRM, however, uses census data that are published at the output area (OA) geography and hence the synthetic population created is spatially referenced to its associated OA, not to an individual house within an OA. Therefore the main challenge in terms of data preparation is disaggregating the synthetic population to the household level. Fortunately, the PRM is able to estimate the type of house that a synthetic household lives in (detached, semi-detached, terraced or flats) and this information is used to assign synthetic households to buildings. At present, each household is randomly assigned to a building of the correct type within the target OA. For illustrative purposes, Fig. 2 contrasts the OA and building geographies.

Table 2. Household occupancy behaviour as implemented in the model.

Group | Description | P (house occupied)
Family | The house has young children and someone will be at home during the day to look after them | Higher probability of the house being occupied during the day and in the evenings
Students | The household is made up of university students | Higher probability during the day but less in the evenings as the students socialise
Unemployed | No one in the household is employed | Higher probability of the house being occupied at all times
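The random allocation of synthetic households to buildings described in Section 4.1 might be sketched as follows. The record layout (dictionaries with `id`, `oa` and `house_type` keys) and the function name are hypothetical, and the sketch assumes one household per building, sampled without replacement, which the text does not specify.

```python
import random
from collections import defaultdict


def assign_households(households, buildings, seed=0):
    """Randomly assign each household to a building of matching type and OA.

    Returns a dict mapping household id to building id, or to None when no
    unused building of the right type remains in that output area.
    """
    rng = random.Random(seed)
    pool = defaultdict(list)
    for building in buildings:
        pool[(building["oa"], building["house_type"])].append(building["id"])
    for ids in pool.values():
        rng.shuffle(ids)  # random order, so pop() draws a random building

    assignment = {}
    for hh in households:
        candidates = pool[(hh["oa"], hh["house_type"])]
        assignment[hh["id"]] = candidates.pop() if candidates else None
    return assignment
```

A richer allocation, for example one that also matches household affluence to building size, would replace the shuffle with a weighted choice.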

4.2. Adapting the burglary model

The ‘key’ household burglary risk variables that could not be established without the integration of a microsimulation model are a measure of household affluence and the likelihood of the house being occupied. The affluence attribute is estimated from the employment of the head of household which can be one of four types: managerial, intermediate, manual and other (including unemployed, retired and students). Household affluence is estimated directly with managerial types being the most affluent and other the least.

In terms of estimating occupancy, to coincide with the BurgdSim model it is necessary to place each household into one of the groups illustrated in Table 2 (note that if a household does not fall into one of the specified groups then its members are assumed to have typical daytime jobs). This can be accomplished by examining the employment type of the head of household as well as the other people who live in the household. For example, if the household contains young children it is assumed to be of the ‘family’ type. Estimating students and unemployed people is less straightforward, as Section 4.3 will discuss. It should be noted that occupancy is a probability rather than a binary value. For example it is more probable that a house containing unemployed synthetic people will be occupied during the day than one where all residents work. When burglar agents make their decision about whether or not to burgle, this probability is considered along with other variables such as the apparent security of the house, the volume of pedestrian/vehicle traffic on the adjacent road, and the visibility of the house to neighbours.

Although occupancy and attractiveness are the only two household-level variables that the agent-based model requires, the microsimulation also provides a range of person- and household-level factors (these were outlined in Table 1). We will show that although these factors do not influence the outcome of the simulation, their post-simulation analysis is illuminating.

Fig. 3. A comparison of original (pre-integration) model errors (left) with the errors produced by deducting post-integration burglary counts from the original burglary counts (right). It is clear that although there are some differences between the pre- and post-integration models, the differences are very small in comparison with the original model errors.

[Figure 4: two bar charts, "House Size" (proportion by number of residents, 1 to 10) and "House Type" (proportion for detached, semi-detached, terraced and flats), each comparing the entire population with burglary victims.]

Fig. 4. Proportions of household attributes for all houses in the simulation area and the subset of burglary victims.

Table 3. Results of a linear regression of house size against other variables that might increase household burglary risk. Data are for the population of victims, not the population of the simulation area. R² = 0.3922.

4.3. Drawbacks with the integration approach

Sections 3.1 and 3.2 explained that the PRM and BurgdSim models have been thoroughly tested and calibrated and hence will produce minimal error. However, there are clearly a number of ways in which error can arise in the data preparation stages and immediate future work will explore how this process can be improved. The first drawback relates to disaggregating the synthetic population and the most obvious means of improving this would be to base the allocation of households to houses on more than simply house type. For example, the affluence or income of the synthetic household could be used to assign richer families to physically larger houses or those that are more expensive (assuming house price data are available).

The other drawback comes with estimating the type of the household in terms of occupancy. Although it is relatively simple to estimate families, there are no attributes in the population currently output by the PRM that can be used to determine whether the household is made up predominantly of students or unemployed people, and no means of estimating part-time workers (which is another attribute that the burglar agents can assess). Presently, it is assumed that if the head of the household is part of the ‘other’ employment group then they are unemployed, unless their age is between 18 and 24, in which case they are assumed to be a student. Therefore another obvious means of improving the integration process would be to generate a synthetic population with a richer set of attributes to represent employment type and income.
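The occupancy-type rule described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `household_type`, its argument names, the label strings, the inclusive reading of "between 18 and 24", and the choice to test for young children before the employment group are all hypothetical and are not taken from the PRM or BurgdSim code.

```python
def household_type(head_employment, head_age, has_young_children=False):
    """Assign an occupancy group to a synthetic household (illustrative only).

    head_employment: one of "managerial", "intermediate", "manual", "other".
    head_age: age of the head of household in years.
    has_young_children: True if the household contains young children.
    """
    # Households with young children are treated as 'family' households.
    # Checking this first is an assumption; the text does not give a precedence.
    if has_young_children:
        return "family"
    # Heads in the 'other' employment group are taken to be unemployed,
    # unless aged 18 to 24 (inclusive bounds assumed), when they are students.
    if head_employment == "other":
        return "student" if 18 <= head_age <= 24 else "unemployed"
    # Everything else falls back to the typical daytime job assumption.
    return "typical"
```

For example, `household_type("other", 20)` gives a student household, while the same employment group with a head aged 40 gives an unemployed household.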

Although there are drawbacks with the process of integration, these are ameliorated because (as Section 5 will show) the use of individual victim data does not substantially influence the aggregate burglary patterns. Therefore they do not impact on the important insights from the model that can be gained from an assessment of the characteristics of individual burglary victims. The reason for the similarity in the aggregate patterns is largely because each output area is relatively small (the mean square area of OAs within the simulation boundary is 0.02 km²). Therefore it is extremely likely that an offender agent will be aware of a vulnerable target even if it should, in reality, be located in a different building somewhere else in the output area. Hence the benefits of integrating the two models here are in terms of assessing which people have become victims, rather than accurately estimating in which houses the victimised people actually reside.

Variable         Estimate   Std. error   t value   Pr(>|t|)
(Intercept)        1.7028       0.0322     52.91     0.0000
Attractiveness    −0.7696       0.0175    −44.02     0.0000
Students           1.1052       0.0131     84.57     0.0000
Family             2.0085       0.0216     93.15     0.0000
Unemployed         1.4078       0.0329     42.81     0.0000
Accessibility      0.1177       0.0225      5.23     0.0000
Visibility         0.1089       0.0517      2.10     0.0353
Traffic volume    −0.1426       0.0393     −3.62     0.0003
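As a reading aid for the regression table, the t value of each coefficient in an ordinary linear regression is the estimate divided by its standard error. The short Python sketch below recomputes that ratio from the published figures; the names `TABLE_3`, `t_value`, `recomputed` and `largest_gap` are introduced here for illustration only. Small gaps between the recomputed and reported values are expected, since the published estimates and standard errors are rounded to four decimal places.

```python
# Rows of the regression table: (variable, estimate, standard error, reported t value).
TABLE_3 = [
    ("(Intercept)", 1.7028, 0.0322, 52.91),
    ("Attractiveness", -0.7696, 0.0175, -44.02),
    ("Students", 1.1052, 0.0131, 84.57),
    ("Family", 2.0085, 0.0216, 93.15),
    ("Unemployed", 1.4078, 0.0329, 42.81),
    ("Accessibility", 0.1177, 0.0225, 5.23),
    ("Visibility", 0.1089, 0.0517, 2.10),
    ("Traffic volume", -0.1426, 0.0393, -3.62),
]


def t_value(estimate, std_error):
    """t statistic of a regression coefficient: estimate / standard error."""
    return estimate / std_error


# Recompute every t value from the rounded published figures.
recomputed = {name: t_value(est, se) for name, est, se, _ in TABLE_3}

# Largest absolute gap between recomputed and reported t values.
largest_gap = max(abs(recomputed[name] - t) for name, _, _, t in TABLE_3)
```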

5. Experiments and results

5.1. The modelling scenario

The chosen scenario area is part of the city of Leeds, UK. In particular, an area of approximately 1700 hectares located to the east of the city centre was chosen because it has been identified as the site of a major urban regeneration project. Therefore the area represents a prime candidate for predictive modelling in order to estimate what the effects of the regeneration scheme will be on crime. Prior research has already simulated the effects of the urban regeneration and was able to show that the BurgdSim model has utility in predicting burglary patterns at the local (household) level (Malleson, 2010). However, the previous model had no information about the burglary victims because, as Section 3.2 discussed, individual-level data were not available. Hence the following experiments have two major advantages: the model is now able to take individual household occupancy and attractiveness measures into account (previously these measures were homogeneous for all houses in an area), which brings it closer in line with criminology theory; and it is now possible to analyse the victims of burglary to identify which households have the highest simulated burglary risk and why.

5.2. Comparing pre- and post-integration burglary patterns

The first stage in assessing how the integration of a microsimulation model has influenced the agent-based model is to explore the change in aggregate burglary patterns. Using the expanding cell algorithm, Fig. 3 plots two error distributions. The first is the original pre-integration model errors, which were calculated by comparing the simulated burglary rates to those found in real data. The second shows the difference in burglary patterns between the two (pre- and post-integration) models. Original model errors range from −2.1% to +0.8% per 0.19 km² cell whereas the difference between the pre- and post-integration models has a considerably smaller range between −0.1% and +0.3%. Therefore, although there are differences in the burglary patterns produced by the two models (as would be expected) these are insubstantial at an aggregate level. There are differences, however, which suggests that agents are choosing alternative houses or neighbourhoods, although they are not travelling to entirely different parts of the city.

Fig. 5. Age differences between the population of victims and all households in the simulation area (heads of households only). [Panels: Age of Burglary Victims (proportion by age); Age of All Heads of Household (proportion by age); Difference in Ages (difference in proportion by age).]

5.3. Analysis of the synthetic victims

As it has been shown that incorporating individual-level victims has a relatively small effect on overall burglary patterns, a lot of information can be gained by examining the burglary victims in more detail. This was not possible before the integration of a synthetic population generated by a microsimulation model. This section will compare the attributes of the individuals who became victims of burglary in the model to the entire population of synthetic individuals. It will begin by examining the properties of the houses themselves before moving on to examine the attributes of the synthetic people.

Fig. 6. Individual characteristics of the head of households for the set of victims compared to all synthetic individuals in the study area. [Panels: Ethnicity (White, Asian, Black, Other); Social Group (Managerial, Intermediate, Manual, Other); Gender (male and female, victims and population); Marital Status (Unmarried, Married); y-axis: proportion; series: burglary victims, entire population.]

5.3.1. Household characteristics

Fig. 4 compares the size (number of residents) and type (detached (1), semi-detached (2), terrace (3) or flat (4)) of all households in the simulation area to the subset of those that were victims of burglary (note that houses that were repeatedly victimised are represented multiple times). From the figure it becomes apparent that, in terms of house type, victims were chosen uniformly; there is no house type that has been burgled a substantially higher/lower number of times than would be expected from their proportions in the whole population. However, in terms of house size, it appears that single-occupancy houses have a higher proportion of burglaries than would be expected. This is, in itself, an interesting finding because burglar agents in the model do not take account of the number of people present in a house when making their burglary decision. It has been shown recently that households with a single adult and children have higher burglary risks (Flatley et al., 2010), but in the synthetic data these types of family unit will be represented as a two or more person household. It is worth noting that the number of people in a household will influence the probability of the house being occupied, but this relationship is not linear – if an unemployed person or student is the sole occupant of a dwelling then their occupancy probability increases compared to that of a typical daytime worker.

Therefore it is likely that there is a different variable that is correlated to single-occupancy dwellings which causes households to be more vulnerable. Table 3 provides the results of a linear regression model that compares house size to the other household variables that influence burglary risk: attractiveness (a proxy for the social class of the residents), occupancy (whether the house is occupied by students, families or unemployed people), accessibility (how easy it is to gain access to the house), visibility (how visible the house is to the outside) and traffic volume (the estimated volume of traffic on the adjacent road). The model demonstrates that there is no clear relationship between house type and the other household variables that might influence burglary risk.

As there is no clear explanation, in terms of model rules, for the higher proportion of single-occupancy dwellings that have been burgled, it is likely that this finding is a result of the spatial configuration of the simulated area. A risk that has not been measured thus far relates to how close a house is to a potential burglar agent and whether or not the agent is aware of the house in the first place. It is likely, therefore, that houses with single occupants happen to be in areas that are in the awareness spaces of many offenders. An advantage with the use of simulation is that the researcher has full knowledge of the system that they are experimenting with and, therefore, it is hypothetically possible to record how many times a particular building falls within the awareness space of the population of agents. This avenue of analysis is recommended for future work. In the meantime, the following section will explore some of the attributes of the individual people who have been victimised, rather than the households themselves.

Fig. 7. Comparing the densities of different household socio-economic statuses.

5.3.2. Characteristics of individual people

To begin with, Fig. 5 illustrates the distribution of people’s ages. It appears that there is relatively little difference in the age profile of the subset of burglary victims compared to that of all people in the study area. This is confirmed by the graph of the age difference; although there are differences in the proportions of ages, they appear to be random (there is no noticeable trend). The simulation area contains a large number of young households and also spikes in some of the more mature age categories, so on the whole there is a fairly broad mix of age groups. This might be quite different to the distribution of victims found in areas that are predominantly occupied by students or the more elderly affluent suburbs.
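The comparison behind the age-difference graph can be illustrated with a small Python sketch: compute the proportion of each age among victims and among the whole population, then subtract. The function names and the tiny age lists are hypothetical and are not taken from the simulation output; the sketch only shows the arithmetic of a difference in proportions.

```python
from collections import Counter


def proportions(values):
    """Share of each distinct value in a list (shares sum to 1)."""
    counts = Counter(values)
    total = len(values)
    return {key: count / total for key, count in counts.items()}


def proportion_difference(victims, population):
    """Victim proportion minus population proportion for every observed value."""
    p_vic = proportions(victims)
    p_pop = proportions(population)
    keys = sorted(set(p_vic) | set(p_pop))
    return {key: p_vic.get(key, 0.0) - p_pop.get(key, 0.0) for key in keys}


# Hypothetical ages of heads of household, for illustration only.
victim_ages = [25, 25, 40]
population_ages = [25, 40, 40, 60]
diff = proportion_difference(victim_ages, population_ages)
```

A positive entry in `diff` means that age is over-represented among victims relative to the population; values scattered around zero with no trend correspond to the "random" differences described above.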

As well as age, synthetic individuals have characteristics to represent their ethnicity, gender, social group and marital status. Fig. 6 compares these attributes for the victims and the entire population. In terms of ethnicity and gender there is little difference between the distributions of victims and non-victims. However, there are interesting variations in social group and marital status.

In terms of social group, it appears that ‘manual’ workers receive the highest proportion of burglaries – higher than would be expected if victims were distributed uniformly. This is surprising because burglar agents consider ‘managerial’ and ‘intermediate’ households to be more attractive than ‘manual’ ones. Therefore, as with the proportion of single-occupancy dwellings (discussed above), it appears that the location of ‘manual’ households adds more to their burglary risk than their low attractiveness takes away. An explanation for this finding, therefore, can be sought by mapping the locations of the different social groups as in Fig. 7. Although the distributions are not completely dissimilar at an aggregate level, there are some areas where the density of ‘manual’ households is distinct from other groups. However, these areas do not clearly correspond to a high or low burglary rate so it is difficult to draw any firm conclusions by observing the spatial distribution in this manner.

The situation with respect to marital status is similar to that of the social groups; it appears unmarried people are targeted more often in the simulation than would be expected. This is interesting because, unlike social group, marital status plays no part in the model rules; it is purely an artefact of where (un)married people live and the types of houses/areas that they live in. It is possible that part of this relationship is related to occupancy – it is more likely that a married couple will be part of a family and, hence, have greater occupancy – but further spatial analysis will be needed to explore this more fully.

6. Conclusions

This research has utilised two advanced computational techniques – agent-based modelling and microsimulation – in order to make progress towards an integrated micro-level crime simulation. At this stage in the research the results are promising. Environmental criminology and ‘crime at places’ research are highlighting the importance of analysing crime at extremely fine spatial scales (up to individual streets or households) and focussing more heavily on the behaviour of the victims of crime rather than purely the offenders. By integrating an agent-based model, that included an advanced offender behavioural framework, with detailed information about the potential victims of crime, this research has been able to produce a model that is more closely aligned with modern criminology thinking. In particular, the research was able to highlight some of the sociodemographic characteristics of the simulated victims of burglary at the household level which, before integration of the two modelling techniques, was not possible at such fine geographical scales.

A priority for ongoing work is to include greater detail in order to enhance the accuracy of the integration process. Another significant opportunity for research in the immediate future is an extended analysis of the results to look for patterns of repeat victimisation and the characteristics of the victims who are being repeatedly victimised. Research suggests that prior victimisation is the strongest determinant of future burglary victimisation – more so than any known social or demographic factors – and hence a comparison of simulated repeat victimisation to known victimisation rates will be illuminating. In particular, an analysis of repeat victimisation might shed light on the reasons for some of the phenomena that the research has not been able to fully explain, such as the tendency for houses with ‘manual’ occupation to exhibit a greater proportion of burglary even though model rules mean that their risk is reduced. The potential for this type of analysis is another advantage of the linking process that was not possible previously.

References

Andresen, M. A. (2009). The place of environmental criminology within criminological thought. In M. A. Andresen, P. J. Brantingham, & J. B. Kinney (Eds.), Classics in environmental criminology (pp. 5–28). Taylor & Francis.

Axelrod, R. (1997). Advancing the art of simulation in the social sciences. In R. Conte, R. Hegselmann, & P. Terna (Eds.), Simulating social phenomena (pp. 21–40). Berlin: Springer-Verlag.

Bernasco, W. (2004). How do residential burglars select target areas?: A new approach to the analysis of criminal location choice. British Journal of Criminology, 45(3), 296–315.

Birkin, M., Turner, A., & Wu, B. (2006). A synthetic demographic model of the UK population: Progress and problems. In 2nd International conference on e-social science. Manchester.

Bonabeau, E. (2002). Agent-based modeling: Methods and techniques for simulating human systems. Proceedings of the National Academy of Sciences, 99003(90), 7280–7287.

Brantingham, P. J., & Brantingham, P. (1981). Notes on the geometry of crime. In P. J. Brantingham & P. Brantingham (Eds.), Environmental criminology (pp. 27–54). Prospect Heights.

Brantingham, P. L., & Brantingham, P. J. (1993). Nodes, paths and edges: Considerations on the complexity of crime and the physical environment. Journal of Environmental Psychology, 13(1), 3–28.

Castle, C. J. E., & Crooks, A. T. (2006). Principles and concepts of agent-based modelling for developing geospatial simulations. <http://eprints.ucl.ac.uk/archive/00003342/01/3342.pdf> (UCL Working Papers Series, Paper 110, Centre For Advanced Spatial Analysis, University College London).

Clarke, R. V., & Cornish, D. B. (1985). Modeling offenders’ decisions: A framework for research and policy. Crime and Justice, 6, 147–185.

Cohen, L., & Felson, M. (1979). Social change and crime rate trends: A routine activity approach. American Sociological Review, 44, 588–608.

Costanza, R. (1989). Model goodness of fit: A multiple resolution procedure. Ecological Modelling, 47, 199–215.

Cromwell, P. F., Olson, J. N., & Avary, D. W. (1991). Breaking and entering: An ethnographic analysis of burglary. Studies in Crime, Law and Justice (Vol. 8). Newbury Park, London: Sage Publications.

Eck, J. E., & Liu, L. (2008). Contrasting simulated and empirical experiments in crime prevention. Journal of Experimental Criminology, 4(3), 195–213.

Eck, J., & Weisburd, D. (1995). Crime places in crime theory. Crime and place (Vol. 4). Newbury Park, London: Sage Publications.

Epstein, J., & Axtell, R. (1996). Growing artificial societies: Social science from the bottom up. Newbury Park, London: Criminal Justice Press.

Flatley, J., Kershaw, C., Smith, K., Chaplin, R., & Moon, D. (2010). Crime in England and Wales 2009/10. Findings from the British crime survey and police recorded crime (3rd ed.). London: Home Office.

Glyde, J. (1856). Localities of crime in Suffolk. Journal of the Statistical Society of London, 19, 102–106.

Harland, K., Heppenstall, A., Smith, D., & Birkin, M. (2012). Creating realistic synthetic populations at varying spatial scales: A comparative critique of microsimulation techniques. Journal of Artificial Societies and Social Simulation, 15(1).

Heppenstall, A. J., Evans, A., & Birkin, M. H. (2007). Genetic algorithm optimisation of an agent-based model for simulating a retail market. Environment and Planning B: Planning and Design, 34, 1051–1070.

Jeffery, C. R. (1971). Crime prevention through environmental design. Sage Publications.


Johnson, S. D., & Bowers, K. J. (2009). Permeability and burglary risk: Are cul-de-sacs safer? Journal of Quantitative Criminology, 26(1), 89–111.

Kongmuang, C. (2006). Modelling crime: A spatial microsimulation approach. PhD thesis, School of Geography, University of Leeds, LS2 9JT, UK.

Malleson, N. (2010). Agent-based modelling of burglary. PhD thesis, School of Geography, University of Leeds, UK.

Malleson, N., Heppenstall, A., Evans, A., & See, L. (2010). Evaluating an agent-based model of burglary. Working paper 10/1, School of Geography, University of Leeds, UK. <http://www.geog.leeds.ac.uk/fileadmin/downloads/school/research/wpapers/10_1.pdf>, January 2010.

Malleson, N., Heppenstall, A., & See, L. (2010). Crime reduction through simulation: An agent-based model of burglary. Computers, Environment and Urban Systems, 34(3), 236–250.

Malleson, N., See, L., Evans, A., & Heppenstall, A. (2012). Implementing comprehensive offender behaviour in a realistic agent-based model of burglary. Simulation: Transactions of the Society for Modeling and Simulation International, 88(1), 50–71.

Openshaw, S. (1984). The modifiable areal unit problem. Concepts and techniques in modern geography. Norwich: Geo Books.

O’Sullivan, D., & Haklay, M. (2000). Agent-based models and individualism: is the world agent-based? Environment and Planning A, 32(8), 1409–1425.

Sampson, R. J., Raudenbush, S. W., & Earls, F. (1997). Neighborhoods and violent crime: A multilevel study of collective efficacy. Science, 277, 918–924.

Schelling, T. (1969). Models of segregation. The American Economic Review, 59(2), 488–493.

Schmidt, B. (2000). The modelling of human behaviour. Ghent, Belgium: Society for Computer Simulation International.

Tseloni, A. (2006). Multilevel modelling of the number of property crimes: Household and area effects. Journal of the Royal Statistical Society A, 169, 205–233.

Wiles, P., & Costello, A. (2000). The ‘road to nowhere’: The evidence for travelling criminals. Home Office Research Study 207. London: Home Office.

Williamson, P., Birkin, M., & Rees, P. H. (1998). The estimation of population microdata by using data from small area statistics and samples of anonymised records. Environment and Planning A, 30, 785–816.

Wright, R., & Decker, S. (1996). Burglars on the job: Streetlife and residential break-ins. Boston: Northeastern University Press.

Wu, B., Birkin, M., & Rees, P. (2008). A spatial microsimulation model with student agents. Computers, Environment and Urban Systems, 32(6), 440–453.

Wu, B. M., Birkin, M. H., & Rees, P. H. (2010). A dynamic MSM with agent elements for spatial demographic forecasting. Social Science Computer Review, 29(1), 145–160.

  • Analysis of crime patterns through the integration of an agent-based model and a population microsimulation
    • 1 Introduction
    • 2 Background – modelling crime
      • 2.1 Environmental criminology and ‘traditional’ crime modelling
      • 2.2 Agent-based modelling (ABM)
      • 2.3 Microsimulation
    • 3 The population reconstruction and burglary models
      • 3.1 The Population Reconstruction Model (PRM)
      • 3.2 The BurgdSim model
        • 3.2.1 The burglar agents
        • 3.2.2 The virtual environment
        • 3.2.3 The need for a synthetic population
      • 3.3 Validating the simulation results
    • 4 Integrating the models
      • 4.1 Data preparation
      • 4.2 Adapting the burglary model
      • 4.3 Drawbacks with the integration approach
    • 5 Experiments and results
      • 5.1 The modelling scenario
      • 5.2 Comparing pre- and post-integration burglary patterns
      • 5.3 Analysis of the synthetic victims
        • 5.3.1 Household characteristics
        • 5.3.2 Characteristics of individual people
    • 6 Conclusions
    • References

Attachment 3


Criminal Geographical Profiling: Using FCA for Visualization and Analysis of Crime Data

Quist-Aphetsi Kester, MIEEE

Lecturer, Faculty of Informatics, Ghana Technology University College, Accra, Ghana. Email: [email protected] / [email protected]

Abstract: Fighting criminal activities in modern societies requires intelligent information systems that can analyze crime data geographically and enable new concepts to be deduced from it. These information systems should be able to create visualizations of such data and to give new insight when the data are updated, whilst maintaining the previously predicted patterns. This paper proposes the use of Formal Concept Analysis (FCA), or Galois lattices, a data analysis technique grounded in lattice theory and propositional calculus, for the visualization and analysis of crime data. The method considers the set of common and distinct attributes of crimes in such a way that categorization is done based on related crime types, geographical locations and the persons involved.

Keywords: Criminal Geographical Profiling, FCA, Visualization, Analysis, Galois Lattices

I. INTRODUCTION

To fight crime effectively and to understand criminal activities in the future, our information systems need to be capable of analyzing and extracting significant knowledge from data, based on predefined rules, with supervised or unsupervised learning techniques. Hence there is a need to use effective computational and mathematical models from data mining and machine learning in building artificial intelligence systems. These information systems should be capable of using economic, geographical, demographic and social networking data, among others, to analyze and predict the behavior of modern society, especially in the domain of peace and security, which is one of the crucial platforms for effective development. Modern society has been challenged by a rise in criminal activities, and crime rates vary enormously from one country to another and from one region to another [1]. With the documentation of criminal activities and the use of computerized systems to track crimes, computer data analysts have started helping law enforcement officers and others to understand crime patterns [2]. These systems should be capable of gathering and interpreting intelligence so as to help control criminal activities as well as influence effective decision making, as in figure 1 below [3]. Since criminal activities have become very complex, monitoring them with intelligent systems that include geographical components has become necessary.

Figure 1: Pattern analysis theory [Diagram nodes: Criminal environment; Intelligence/Analysis; Decision Maker. Arrow labels: Interpret; Influence; Impact.]

Fighting crime efficiently and effectively today requires geographical profiling. Criminal activities have become so complex that rapid monitoring can only be achieved by using intelligent systems with geographical components. Geographic profiling is a mathematical technique to derive information about a serial crime spree given the locations and times of previous crimes in a given crime series [4]. Geographic profilers have access to a collection of strategies for predicting various attributes of crime, such as a serial offender’s home location, possible groups, and relationships between events and crimes. These strategies range in complexity: some involve more calculations to implement than others, and the assumption often made is that more complex strategies will outperform simpler strategies [5]. Over the years there have been several developments and approaches in the analysis of crime data, such as the introduction of a graph-based dataset representation that allows one to mine a set of datasets for correlation [6], data mining techniques using a clustering algorithm to help detect crime patterns and speed up the process of solving crime [7], and a procedure for detecting changes over time in the spatial pattern of point events, combining the nearest neighbor statistic and cumulative sum methods [8].

Crime activities are geospatial phenomena and as such are geospatially, thematically and temporally correlated, and discovering these correlations allows a deeper insight into the complex nature of criminal behavior. This paper uses Formal Concept Analysis, or Galois lattices, a data analysis technique grounded in lattice theory and propositional calculus, to discover the patterns and concepts within criminal data. The method considers the set of common and distinct attributes of crime data. The paper is organized as follows: Section II describes how FCA was used to classify and analyze crime data, Section III presents the application of FCA to crime analysis and the results, and Section IV concludes the paper.

II. METHODOLOGY

In this paper, Formal Concept Analysis (FCA), or Galois lattices, a data analysis technique grounded in lattice theory and propositional calculus, was used for the visualization and analysis of crime data. The method considers the set of common and distinct attributes of crimes in such a way that categorization is done based on related crime types, geographical locations and the persons involved.

FCA is a method of data analysis with growing popularity across various domains. It analyzes data that describe the relationship between a particular set of objects and a particular set of attributes; such data commonly appear in many areas of human activity. FCA produces two kinds of output from the input data. The first is a concept lattice: a collection of formal concepts in the data which are hierarchically ordered by a subconcept-superconcept relation [9][10].

A formal context is a triple $(G, M, I)$, where $G$ is a set of objects, $M$ is a set of attributes and $I \subseteq G \times M$ is a relation between $G$ and $M$. $(g, m) \in I$, also written $gIm$, is read as “object $g$ has attribute $m$”. Informally, a formal concept is a pair $(A, B)$ with $A \subseteq G$ and $B \subseteq M$ such that every object in $A$ has every attribute in $B$; for every object in $G$ that is not in $A$, there is an attribute in $B$ that the object does not have; and for every attribute in $M$ that is not in $B$, there is an object in $A$ that does not have that attribute. $A$ is called the extent of the concept and $B$ is called the intent of the concept. If $g \in A$ and $m \in B$ then $(g, m) \in I$.

For $A \subseteq G$, we define $A' := \{ m \in M \mid \forall g \in A : (g, m) \in I \}$. For $B \subseteq M$, we define dually $B' := \{ g \in G \mid \forall m \in B : (g, m) \in I \}$.

For $A, A_1, A_2 \subseteq G$ the following hold: $A_1 \subseteq A_2 \Rightarrow A_2' \subseteq A_1'$; $A \subseteq A''$; $A' = A'''$.

For $B, B_1, B_2 \subseteq M$ the following hold: $B_1 \subseteq B_2 \Rightarrow B_2' \subseteq B_1'$; $B \subseteq B''$; $B' = B'''$.

Formally, a formal concept is a pair $(A, B)$ where $A$ is a set of objects (the extent of the concept), $B$ is a set of attributes (the intent of the concept), $A' = B$ and $B' = A$. The concept lattice of a formal context $(G, M, I)$ is the set of all formal concepts of $(G, M, I)$, together with the partial order $(A_1, B_1) \le (A_2, B_2) :\Leftrightarrow A_1 \subseteq A_2 \ (\Leftrightarrow B_1 \supseteq B_2)$. The concept lattice is denoted by $\underline{\mathfrak{B}}(G, M, I)$.

Theorem: The concept lattice is a lattice, i.e. for two concepts $(A_1, B_1)$ and $(A_2, B_2)$, there is always

• a greatest common subconcept: $(A_1 \cap A_2, (B_1 \cup B_2)'')$
• and a least common superconcept: $((A_1 \cup A_2)'', B_1 \cap B_2)$

More generally, it is even a complete lattice, i.e. the greatest common subconcept and the least common superconcept exist for all (finite and infinite) sets of concepts. Corollary: The set of all concept intents of a formal context is a closure system. The corresponding closure operator is $h(X) := X''$. An implication $X \to Y$ holds in a context if every object having all attributes in $X$ also has all attributes in $Y$. Def.: Let $X \subseteq M$. The attributes in $X$ are independent, if there are no trivial dependencies between them.
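The definitions in this section translate directly into code. The Python sketch below implements the two derivation operators, enumerates the formal concepts of a small context by brute force (closing every attribute subset), and checks whether an implication holds. It is an illustration under stated assumptions: the function names (`up`, `down`, `concepts`, `implication_holds`) and the three-person toy context are hypothetical and are not the paper's data or software, and the brute-force enumeration is only practical for very small attribute sets.

```python
from itertools import combinations


def up(objects, context, attributes):
    """A' : the attributes shared by every object in `objects`."""
    return frozenset(m for m in attributes
                     if all(m in context[g] for g in objects))


def down(attrs, context):
    """B' : the objects that have every attribute in `attrs`."""
    return frozenset(g for g, has in context.items() if attrs <= has)


def concepts(context, attributes):
    """All formal concepts (extent, intent), found by closing each attribute subset."""
    found = set()
    for size in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), size):
            extent = down(frozenset(subset), context)   # B'
            intent = up(extent, context, attributes)    # B''
            found.add((extent, intent))
    return found


def implication_holds(x, y, context):
    """X -> Y holds if every object with all of X also has all of Y."""
    return down(frozenset(x), context) <= down(frozenset(y), context)


# Hypothetical toy context: persons -> attributes (sex, crime type, location).
toy_context = {
    "P1": {"m", "c3", "g1"},
    "P2": {"m", "c1", "g1"},
    "P3": {"f", "c1", "g3"},
}
toy_attributes = {"m", "f", "c1", "c3", "g1", "g3"}
toy_concepts = concepts(toy_context, toy_attributes)
```

In this toy context, the extent {P1, P2} has the intent {m, g1}, so the pair is a formal concept; the implication {g3} → {f} holds, while {c1} → {m} does not, because P3 has c1 but not m.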

III. THE RESULTS AND ANALYSIS

In Table 1 below, the rows are persons and the columns are attributes: the age, the sex, the crime type committed by the person, and the geographical location of the crime. Table 2 consists of the geographical locations and the economic factors existing at these locations: income index, index of education and population index. Figure 2 below is the geographical map of the locations.

From Table 1: let a = ages < 18, b = ages < 40 and c = ages > 40. For the sex, let m = male and f = female. Let c1 = drugs, c2 = rape, c3 = burglary, c4 = robbery. Let the geographical locations be denoted by g1, g2, …, g5.

From Table 2: let the geographical locations be denoted by g1, g2, …, g5. For the index of income, let a be index ≤ 0.25, b be index ≤ 0.5, c be index ≤ 0.75, and d be index ≤ 1. For the index of education, let e be index ≤ 0.2, f be index ≤ 0.4, g be index ≤ 0.6, h be index ≤ 0.8, and i be index ≤ 1. Let the population index levels be {j, k, l, m, n} = {0.2, 0.4, 0.6, 0.8, 1}.
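The scaling of the numeric indices into single-letter attributes can be sketched as a threshold lookup. The Python below is a minimal illustration, assuming that each letter covers the interval up to and including its bound, that the third income bound is 0.75, and that the population levels are upper bounds in the same way; the names `encode`, `INCOME`, `EDUCATION` and `POPULATION` are hypothetical and do not come from the paper.

```python
from bisect import bisect_left

# (upper bounds, labels) for each index; each label covers values <= its bound.
INCOME = ([0.25, 0.5, 0.75, 1.0], ["a", "b", "c", "d"])      # 0.75 is assumed
EDUCATION = ([0.2, 0.4, 0.6, 0.8, 1.0], ["e", "f", "g", "h", "i"])
POPULATION = ([0.2, 0.4, 0.6, 0.8, 1.0], ["j", "k", "l", "m", "n"])


def encode(value, scale):
    """Return the label of the first bound that is >= value."""
    bounds, labels = scale
    if not 0.0 <= value <= bounds[-1]:
        raise ValueError(f"index value {value} is outside [0, {bounds[-1]}]")
    return labels[bisect_left(bounds, value)]


def encode_location(income, education, population):
    """Attribute set of one location, e.g. a row of a locations-by-factors table."""
    return {
        encode(income, INCOME),
        encode(education, EDUCATION),
        encode(population, POPULATION),
    }
```

For instance, `encode(0.25, INCOME)` falls in the first band, while `encode(0.26, INCOME)` falls in the second. Each location then receives exactly one attribute per index, which matches the three marks per row in the locations table.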

Figure 2: Geographical map indicating the locations (g1, g2, g3, g4, g5)

TABLE 1: PERSONS X CRIME DATA WITH GEO. LOCATION

Column groups: Age (a, b, c); Sex (m, f); Crime Type (c1, c2, c3, c4); Geographical Location (g1, g2, g3, g4, g5)

P1  x x x x x
P2  x x x x x
P3  x x x x x
P4  x x x x x
P5  x x x x x
P6  x x x x x
P7  x x x x
P8  x x x x
P9  x x x x x

TABLE 2: GEOGRAPHICAL LOCATIONS X ECONOMIC FACTORS

Column groups: Income Index (a, b, c, d); Index of Education (e, f, g, h, i); Population Index (j, k, l, m, n)

g1  x x x
g2  x x x
g3  x x x
g4  x x x
g5  x x x

Figure 3: Galois lattices of intents and extents from Table 1

Figure 4: Galois lattices of intents and extents from Table 2

TABLE 3: GEOGRAPHICAL LOCATIONS X CRIME TYPES

             CRIME TYPES
         c1   c2   c3   c4   Total
g1        2    2    2    1       7
g2        0    0    0    1       1
g3        2    1    0    1       4
g4        1    0    0    1       2
g5        1    0    1    0       2
Total     6    3    3    4      16

Figure 5: A graph of crime count against geographical locations

The concept lattice of the formal context (G, M, I) shown in Figure 3 is the Galois lattice of intents and extents from Table 1. This is the set of all formal concepts of (G, M, I), together with the partial order (A1, B1) ≤ (A2, B2) :⟺ A1 ⊆ A2 (⟺ B1 ⊇ B2). From the concepts, it can be observed that concepts 57, 61, 69, and 71 have most of the attribute g1 and most of the attributes of the crime types. This indicates the high occurrence of crime types in g1.

The concept lattice of the formal context (G, M, I) shown in Figure 4 is the Galois lattice of intents and extents from Table 2. This is the set of all formal concepts of (G, M, I), together with the partial order (A1, B1) ≤ (A2, B2) :⟺ A1 ⊆ A2 (⟺ B1 ⊇ B2). From the concepts, it can be observed that concepts 1 and 5 have most of the attributes with the least index. This indicates the high occurrence of crime types in g1. Table 3 consists of the matrix of geographical locations cross crime types, and Figure 5 is the graph of crime count versus geographical locations.
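The reading of Table 3 can be checked with a short tally. The sketch below is illustrative only: the counts are transcribed from Table 3, while the variable names (`counts`, `row_totals`, `col_totals`, `busiest`) are assumptions introduced here and do not come from the paper.

```python
# Crime counts per location and crime type, transcribed from Table 3.
counts = {
    "g1": {"c1": 2, "c2": 2, "c3": 2, "c4": 1},
    "g2": {"c1": 0, "c2": 0, "c3": 0, "c4": 1},
    "g3": {"c1": 2, "c2": 1, "c3": 0, "c4": 1},
    "g4": {"c1": 1, "c2": 0, "c3": 0, "c4": 1},
    "g5": {"c1": 1, "c2": 0, "c3": 1, "c4": 0},
}
crime_types = ["c1", "c2", "c3", "c4"]

# Row totals: crimes per geographical location (the quantity plotted in Figure 5).
row_totals = {g: sum(row.values()) for g, row in counts.items()}

# Column totals: crimes per crime type across all locations.
col_totals = {c: sum(row[c] for row in counts.values()) for c in crime_types}

# Location with the highest overall crime count.
busiest = max(row_totals, key=row_totals.get)

print(row_totals)
print(col_totals)
print(busiest)
```

The totals agree with the Total row and column of Table 3, and g1 has the highest count, which matches the observation drawn from the lattices.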

IV. CONCLUSION

Formal concept analysis was used to analyze and visualize crime data, and the relationships between the various crime types and different geographical areas were visualized. This method considered the set of common and distinct attributes of crime data in such a way that categorization was done based on related crime types and their geo-locations. This will help in building more defined and conceptual systems for the analysis of geographical crime data that can easily be visualized and intelligently analyzed by computer systems.

REFERENCES

[1] Kester, Quist-Aphetsi. "Computer Aided Investigation: Visualization and Analysis of data from Mobile communication devices using Formal Concept Analysis." arXiv preprint arXiv:1307.7788 (2013).

[2] Phillips, P.; Lee, I., "Mining top-k and bottom-k correlative crime patterns through graph representations," Intelligence and Security Informatics, 2009. ISI '09. IEEE International Conference on, pp. 25-30, 8-11 June 2009.


[3] Ratcliffe, Jerry H. Intelligence-led policing. Routledge, 2012.

[4] Alex Leone, Chris Raastad, and Igor Tolkov. Yet Another Approach to Geographic Profiling. Mathematical Contest in Modeling. Retrieved from: http://www.math.washington.edu/~morrow/mcm/.

[5] Snook, B., Zito, M., Bennell, C., & Taylor, P. J. (2005). On the complexity and accuracy of geographic profiling strategies. Journal of Quantitative Criminology, 21(1), 1-26.

[6] Phillips, P.; Lee, I., "Mining top-k and bottom-k correlative crime patterns through graph representations," Intelligence and Security Informatics, 2009. ISI '09. IEEE International Conference on, pp. 25-30, 8-11 June 2009.

[7] Shyam Varan Nath, "Crime Pattern Detection Using Data Mining," Web Intelligence and Intelligent Agent Technology Workshops, 2006. WI-IAT 2006 Workshops. 2006 IEEE/WIC/ACM International Conference on, pp. 41-44, Dec. 2006.

[8] P. Rogerson, Y. Sun. Spatial monitoring of geographic patterns: an application to crime analysis. Computers, Environment and Urban Systems, Volume 25, Issue 6, November 2001, Pages 539–556.

[9] Radim Bělohlávek. Introduction to Formal Concept Analysis. Olomouc, 2008.

[10] Kester, Quist-Aphetsi. "Visualization and Analysis of Geographical Crime Patterns Using Formal Concept Analysis." arXiv preprint arXiv:1307.8112.

Attachment 4

Newton and Felson Crime Science (2015) 4:11 DOI 10.1186/s40163-015-0025-6

EDITORIAL Open Access

Editorial: crime patterns in time and space: the dynamics of crime opportunities in urban areas

Andrew Newton1* and Marcus Felson2

Abstract

The routine activity approach and associated crime pattern theory emphasise how crime emerges from spatio-temporal routines. In order to understand this, crime should be studied in both space and time. However, the bulk of research into crime patterns and related activities has investigated the spatial distributions of crime, neglecting the temporal dimension. Specifically, disaggregation of crime by place and by time, for example hour of day, day of week, month of year, season, or school day versus non-school day, is extremely relevant to theory. Modern data make such spatio-temporal disaggregation increasingly feasible, as exemplified in this special issue. First, much larger data files allow disaggregation of crime data into temporal and spatial slices. Second, new forms of data are generated by modern technologies, allowing innovative and new forms of analyses. Crime pattern analyses and routine activity inquiries are now able to explore avenues not previously available. The unique collection of nine papers in this thematic issue specifically examines spatio-temporal patterns of crime to: demonstrate the value of this approach for advancing knowledge in the field; consider how this informs our theoretical understanding of the manifestations of crime in time and space; consider the prevention implications of this; and raise awareness of the need for further spatio-temporal research into crime events.

Keywords: Crime patterns; Spatio-temporal analysis; Crime opportunities; Routine activities; Dynamic hot spots

Introduction

The distribution of crime is not random in time and space. Explanations for this are grounded in routine activities theory (Cohen and Felson 1979) and crime pattern theory (Brantingham and Brantingham 1981). In 'simple' terms, the occurrence of a crime requires the juxtaposition of motivated offenders and suitable targets, a situation constrained in time and space. These constraints are defined by offenders' and victims' use of time and space, as their activities are bounded by the need to eat, sleep, work, or for recreational activity. Moreover, these activities can only occur at a finite number of locations and times, and the movement of offenders and victims is not compulsive, but structured, regulated by the daily routines of offenders and victims, and the social and physical environments within which they interact (Brantingham and Brantingham 2013). Indeed, "a limited number of sites, times, and situations constitute the space-time loci for the vast majority of offenses" (p540).

* Correspondence: [email protected] 1The Applied Criminology Centre, HHR2/10, The University of Huddersfield, Queensgate, Huddersfield, UK Full list of author information is available at the end of the article

© 2015 Newton and Felson. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.

The distribution of crime events

The past two decades have seen a major expansion in the analysis of the spatial distribution of crime, with small scale or micro level analysis emerging at the forefront of place based research (Sherman et al. 1989; Sherman 1995; Weisburd 2015). This trend has been driven by both the increased availability of spatially referenced crime data, and the technological advances of software products which promote the analysis of the spatial clustering of crime, or hot-spot analysis. However, this growth in spatial analysis is perhaps not reflected by similar advances in the temporal analysis of crime. Whilst a number of studies have examined the temporal patterns of crime (Ashby and Bowers 2013), these are not as prominent in the field as the spatial literature. As highlighted over 10 years ago, whilst the spatial analysis of crime has thrived, analysis of the temporal distribution of crime has failed to keep pace (Ratcliffe 2002). This is still true today; "the majority of studies linking potentially criminogenic places to elevated levels of crime across geographical units have been atemporal" (Haberman and Ratcliffe in press).


As a consequence of this, and perhaps compounded by the challenges of employing complex spatio-temporal analysis methods (Ratcliffe 2010), the inextricable link between space and time is often omitted from place-based or temporal-based crime research. With the exceptions perhaps of the near repeat victimisation literature (Johnson et al. 2007), animated visualisations of sequences of hotspots over the course of the day (Brunsdon et al. 2007; Townsley 2008), and some isolated studies now discussed, there is a paucity of research into the patterns and manifestations of crime events in both space and time.

Spatio-temporal crime analysis

Several researchers acting independently, using data on different crimes, and from different nations, have found that crime hotspots shift quickly in response to the structure of daily life. For example, major shifts have been found in robbery locations from afternoon to early morning, and weekday to weekend within the vicinity of schools, parks and late night business (Adams et al. 2015). Others have found high crime risk in some entertainment districts in the early evening, while other entertainment districts experience more crime problems after midnight. Crime near bars and pubs is significant on weekends, but such clustering may be barely noticeable on weekdays (Newton and Hirschfield 2009; Grubesic and Pridemore 2011). Crime on transit systems has been shown to be highly dynamic and related to surrounding environs with distinct patterns in both space and in time (Ceccato and Uittenbogaard 2014; Newton et al. 2014). Shiode et al. (2015) found that within high crime areas in Chicago, different micro-scale spatio-temporal crime patterns were evident for different types of crime; drugs, robbery, burglary, and vehicle crime all had their own unique spatio-temporal crime patterns. Haberman and Ratcliffe (in press) suggest that the criminogenic nature of places is influenced by a number of factors that include: the length of time facilities are open; the consistency of use during the day, for example facilities with a steady flow of people versus those with concentrations of people at peaks and sparse use at off peak times; and unofficial use of places when they are in effect closed or recently closed.

Many of the old ideas from Chicago in the 1930s and 1940s no longer hold. Areas identified as high crime parts of towns and cities experience low and moderate levels of crime during certain time periods across several of their streets/blocks. Some areas are prone to certain crime offences at particular times of the day, but rarely does analysis consider whether these areas suffer from other crime types, either simultaneously or at another time or day of the week. Moreover, little attention is afforded to explaining the dynamics of crime hot spots. Crime can shift rapidly over the course of a 168 hour period (1 week). Furthermore, and especially when there is mixed land-use, characteristics of these populations are likely to differ substantially from the residential population, making it difficult to calculate realistic crime rates. Are these changes driven solely by population dynamics, how is this influenced by the physical and social makeup of the environments within which these crimes occur, and what drives this change? The purpose of this thematic issue is to bring together a range of papers on crime patterns in time and space, to examine the dynamics of crime opportunities in urban areas.

Aims of the thematic series

This thematic series aims to assemble a unique collection of papers that specifically examine the spatio-temporal patterns of crime events. Some key goals are: to raise awareness of the need for more of this type of research; to promote the value of this in advancing knowledge in the field; to inform our theoretical understanding of the manifestation of crime in time and space; to investigate how opportunities for crime are constrained by the routines and movements of offenders and victims and the social and physical environments they interact with; and to consider the prevention implications of this spatio-temporal approach.

Article overview

The papers in this thematic issue are drawn from a number of different nations, and the focus is on empirical studies that better identify the space-time patterns evident, seek explanations for such observations, and consider the response implications of the findings and the challenges for prevention. This thematic issue contains nine papers from nine different cities, three each from Canada, the United States, and the United Kingdom. The table below summarises the papers and the key contribution each of these makes (Table 1).

The funnelling hypothesis?

Felson and Boivin set the scene for this thematic issue, investigating transportation data to determine how daily spatio-temporal shifts in population impact on the crime of a city. The premise here is that daily movements in a city will follow a funnelling hypothesis, and that visitors will have a greater impact than residents on crime. Whilst the data do not enable a micro-level breakdown of crime in time and place, the results reveal that daily visitors have a significant impact on violence and property crime distributions compared to residents. This suggests that daily spatio-temporal shifts have a greater influence than fixed residential factors in the distribution of crime opportunities over urban space.

Table 1 A summary of papers in the thematic issue

Authors: Summary and contribution

Daily shifts in population. The funnelling hypothesis

Felson and Boivin: Visitors funnelling crime risk into particular census tracts and away from others. Visitors found to impact property and violent crime more than residents. Canadian case study.

Space-time settings and the spatial-temporal landscape (school, bars and subway stations)

Herrmann: Examines timing of street robbery hot spots by hour of the day. Compares school days and non-school days. Finds two distinct patterns of robbery. School day 3:00 pm robbery hotspots adjacent to subway stations and schools; 1:00 am non-school day robbery hotspots close to bars and subway stations. US case study.

Irvin-Erickson and La Vigne: Examines crimes at subway stations. Finds multiple types of crime in highly dynamic settings, considering peak versus off-peak and daytime versus night-time. Evidence of stations acting spatio-temporally as crime attractors and crime generators. US case study.

Geoffrion, Sader, Ouellet and Boivin: Data gathered over a year for a large nightclub, with aggression disaggregated by hour of evening and location inside the bar. Spatio-temporal patterns of aggression evident within micro settings inside bar-room environment. Canadian case study.

Newton: Examines stability of crime hot spots around licensed premises. Considers how crime hot spots may change to different locations, to different times, or hot spots for different crime types may occur simultaneously, or at differing times of the day or days of the week at the same location. Examines disorder, criminal damage and violence. UK case study.

Understanding crime in time and space

Andresen and Malleson: Examines intra-week patterns of crime in time and space. Finds unique crime patterns for differing crime types in both time and space by different days of the week. For example on Saturdays, theft from vehicle increased in the downtown parks and recreational park areas, and assaults also increased in the bar districts. Canadian case study.

Tompson and Bowers: Seasonal patterns of street robbery by hour of day and season, taking into consideration weather, thermal comfort, and likelihood that people will go out. Identifies discretionary routines more likely to be influenced by weather. UK case study.

Malleson and Andresen: Examines risk of crime taking into account ambient population and hour of day. Identifies significant hotspots of risk based on dynamics of underlying population in both time and space. UK case study.

Preventing crime in time and space

Boba and Santos: Hot spots examined by hours and degree of repetition, discussing how police can respond with that information. US case study.

Micro spatio-temporal crime settings

Following on from this, four papers (Herrmann; Irvin-Erickson and La Vigne; Geoffrion, Sader, Ouellet and Boivin; and Newton) examine crime at particular facilities: public transportation, schools, bars and nightclubs. Each considers how these offer specific micro-loci settings for crime, around which opportunities for crime are constrained in both time and space.

Herrmann examines street robbery, using the New York Bronx as a case study area. This analysis uses the NNh clustering technique used by a number of law enforcement agencies. Two distinct spatial patterns are found in robbery hot spots when comparing school days and non-school days. The first set of robbery hot spots peak around 3:00 pm during school days. The second are observed during non-school days, peaking around 1:00 am. Examination of the location of these spatio-temporal hotspots reveals that the daytime robberies all cluster at places close to both schools and subway stations. The night-time robberies were found to concentrate adjacent to bars and mass transit stations. This suggests the spatio-temporal landscape is clearly influential in shaping the robbery hot spots observed.

Irvin-Erickson and La Vigne's paper analyses crime at Washington DC metro stations, a highly dynamic setting. They classify time using three temporal groupings: peak times; off-peak daytime; and off-peak night-time hours. Their spatio-temporal analysis reveals that stations may act as crime generators and/or crime attractors, and moreover, that this is time of day, location and crime type specific. Stations that are highly connected, busy, and have high levels of crime in general tend to have high crime rate ratios, indicating crime generator characteristics, those which create unplanned but favourable opportunities for offenders. During peak hours these stations are at greater risk of larcenies and disorderly conduct, and during non-peak day hours robbery is more prevalent. Stations that were more remote and less connected to the network were more at risk of larceny during peak hours and disorderly conduct during non-peak night-time hours. These tended to act as crime attractors, places offenders travel to due to known and expected opportunities for crime. Thus crime at subway stations was a function of a station's connectedness or remoteness, the density of people present which varied by time of day, the SES group of the area in which it is situated, and crime in its nearby surroundings. Moreover, there were distinct spatio-temporal patterns to crime at stations.

Geoffrion, Sader, Ouellet and Boivin investigate the spatio-temporal routine activities at perhaps the most micro-level, inside a single building, a bar. Whilst at first it may seem unusual to think about this environment as dynamic in time and space, the study found distinct spatio-temporal patterns of aggression present even within this localised and contained micro environment. Three distinct patterns for aggression were identified with hotspots shifting by target, in time and location: between patrons, towards bouncers, and towards barmaids. Specific hotspots were identified for each victim group, unique in both time and space, driven by their activities at different points of the evening/night-time. For example, aggressive incidents between patrons at the start of the evening occurred near the bar area, then from midnight to 2:00 am on the dance floors, from 2:00 am until closing were observed back at the bar area, and then at closing time in poorly lit areas near exits. Different spatio-temporal patterns were observed towards barmaids and bouncers. Thus, even within a bar environment during a single evening, crime opportunities are dynamic in time and space, driven by activities constrained both in time and in space.

Newton investigates how patterns of crime in space and time should not always be examined in the context of single crime types. Indeed, as the function and use of a place changes during the day and week, alternative types of crime may emerge. The paper examines crime around licensed premises, and analyses three crime types (criminal damage, violence and anti-social behaviour), as the research evidence has shown each of these to correlate with the locations of licensed premises. It is important for crime prevention to ascertain which places are hot spots for only one crime type at discrete times and places, and how hot spots for different crime types shift in location by time of day. When only one crime hot spot is present, hot spot analysis using one crime type is appropriate. However, when hot spots of different crime types are observed at different times of the day/days of the week in the same place, and hot spots of different crime types are found conterminously at the same time and place, it is necessary to consider using multiple crime type hot spot analysis. This is particularly pertinent when targeting sparse police resources in time and space.

Understanding crime in time and space

A second set of papers examines more aggregated time units to understand spatio-temporal crime patterns. Andresen and Malleson explore spatio-temporal patterns by time of day and day of week; Tompson and Bowers scrutinise how weather and seasonality influence the time and location of street robbery; and Malleson and Andresen address the issue of crime risk through the use of the ambient population. The underlying population at risk is itself dynamic, changing in time and place, and is not well represented through use of residential population as a crime denominator.

Andresen and Malleson explicitly explore how the day of week impacts on the spatial and temporal patterns of crime offences for the City of Vancouver, Canada. They investigate the intra-week patterns of a range of crime types and found, as expected, increased levels of crime at weekends in certain localities. For example on Saturdays, theft from vehicle increased in the downtown parks and recreational park areas, and assaults also increased in the bar districts. However, not all crime types revealed expected intra-week patterns. For example, increases in burglary in particular places were observed on Mondays. For robbery and sexual assault they did not find unique intra-week patterns. This may be due to different groups of offenders operating on different days. However, for most crime types examined there were distinctive temporal and spatial patterns observed for different days of the week.

Tompson and Bowers examine the impact of weather on the spatio-temporal patterns of street robbery. They tested two hypotheses. The first of these is that people's use of space will be influenced by extremes in weather; for example, excess heat and extreme cold might limit the use of outdoor space (an essential component of street robbery), whereas unexpectedly mild or favourable weather might encourage people to venture outside. They found that wind speed and temperature did affect robbery, the adverse impact of winter corresponding with a reduction in robbery, whereas an increase in temperature led to more robberies. However, these variables interacted, as despite increased temperature an increased wind speed in summer months resulted in a decrease in robberies. Thus both variables contributed to what the authors term a person's sense of thermal comfort. The authors move beyond this with their second hypothesis, to examine how weather might impact on discretionary activities, those a person pursues through choice, as opposed to obligatory routine activities they have to do. The hypothesis here is that weather will influence the spatio-temporal patterns of discretionary activities more than that of obligatory ones. Temperature, wind speed and humidity were significant predictors of robbery during the night-shift and at weekends, and rain was shown to have a negative relationship with robbery at the weekends. When travel behaviour is optional, people are less likely to venture outdoors when it is raining. Thus, weather exerts significant constraints on the space-time loci of crime opportunities, particularly outside of working hours during a person's time and space delineated discretionary routine activities.

Malleson and Andresen pose a different problem. A key component in the analysis of crime is identifying levels of risk, and crime rates are often used here. For example, identification of burglary risk on a street should take account of the number of properties. Violence at night time should consider the number of persons present in the night-time economy. Risk of assault on a train will depend on the number of passengers. Thus the denominators of crime (crime rates) are an essential component to aid our examination of crime risk. However, when considering crime in both place and time, it is problematic to identify crime rates. Calculating accurately the true population at risk in any given place and time is vital for identification of reliable crime rates. However, data on the movement of persons through areas are not routinely collected during the course of the day, and are not simple to capture. Residential populations identified through census and other surveys do not accurately reflect populations in business centres during the day-time, or residential areas when most people are out at work. This paper attempts to address this, and produce an examination of the ambient population. This shift in denominators changes how we think about exposure to risk of crime, bringing back to life the classic work of Sarah Boggs (1965). This paper makes use of new 'crowd sourced' data to create better estimates of populations at risk for crimes such as street robbery, which are referenced both in place and in time. From this, through the use of spatio-temporal cluster hunting techniques, crime hot spots are identified that are significant in time and space, after taking account of the size of the estimated ambient population in the area at the time of the crime.
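The effect of the choice of denominator can be illustrated with a small worked example. This is a sketch under stated assumptions, not the method of the paper: the two areas, all counts and populations, and the names `rate_per_1000` and `areas` are hypothetical and chosen only to show how a residential and an ambient denominator can rank the same areas differently.

```python
def rate_per_1000(crimes, population):
    """Crime rate per 1,000 persons at risk."""
    if population <= 0:
        raise ValueError("population at risk must be positive")
    return 1000.0 * crimes / population


# Hypothetical daytime figures for two areas (illustrative values only).
areas = {
    "business_centre": {"robberies": 30, "residential": 500, "ambient": 15000},
    "suburb": {"robberies": 6, "residential": 4000, "ambient": 1500},
}

rates = {
    name: {
        "residential_rate": rate_per_1000(a["robberies"], a["residential"]),
        "ambient_rate": rate_per_1000(a["robberies"], a["ambient"]),
    }
    for name, a in areas.items()
}

for name, r in rates.items():
    print(name, r["residential_rate"], r["ambient_rate"])
```

With these assumed numbers the business centre looks far riskier per resident (60.0 versus 1.5 per 1,000), but per person actually present the suburb has the higher rate (4.0 versus 2.0 per 1,000), which is the kind of reversal an ambient denominator is meant to expose.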

Preventing crime in time and space

The final paper in this thematic issue demonstrates how spatio-temporal elements of crime events are necessary for developing timely police interventions and responses. This paper examines the micro-time hot spot, a flare-up of crime defined as the emergence of several closely-related crimes within a few minutes' travel distance from one another, within a 1 to 2 week period. These are distinguished from longer term hot spots which remain stable over time. The authors find evidence that when responses to micro-time hot spots are rapid and consistent, for example over a period of 14 days, the number of subsequent crimes was reduced. The findings support the recommendation that police should act immediately when a micro-time hot spot is identified, and therefore spatio-temporal analysis should be routinely conducted as part of their operation analysis tools.
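One way to make the micro-time hot spot definition concrete is a simple screening rule. The sketch below is not the procedure used in the paper: the function name `micro_time_hot_spots`, the thresholds (500 m, 14 days, 3 crimes), the planar coordinates in metres, and the sample incidents are all assumptions for illustration. Straight-line distance stands in for travel distance.

```python
from datetime import date
from math import hypot


def micro_time_hot_spots(incidents, radius_m=500.0, window_days=14, min_crimes=3):
    """Flag seed incidents followed by a tight space-time cluster.

    An incident is flagged when at least `min_crimes` incidents
    (itself included) fall within `radius_m` of it and occur on the
    same day or up to `window_days` later.
    """
    flagged = []
    for seed in incidents:
        nearby = [
            other for other in incidents
            if hypot(other["x"] - seed["x"], other["y"] - seed["y"]) <= radius_m
            and 0 <= (other["day"] - seed["day"]).days <= window_days
        ]
        if len(nearby) >= min_crimes:
            flagged.append((seed["id"], [o["id"] for o in nearby]))
    return flagged


# Hypothetical incidents: x/y in metres on a local grid, plus the report date.
incidents = [
    {"id": "A", "x": 0.0, "y": 0.0, "day": date(2015, 6, 1)},
    {"id": "B", "x": 100.0, "y": 50.0, "day": date(2015, 6, 4)},
    {"id": "C", "x": 200.0, "y": 0.0, "day": date(2015, 6, 10)},
    {"id": "D", "x": 5000.0, "y": 5000.0, "day": date(2015, 6, 2)},
    {"id": "E", "x": 50.0, "y": 50.0, "day": date(2015, 7, 20)},
]

hot_spots = micro_time_hot_spots(incidents)
print(hot_spots)
```

Here incident A seeds a flagged cluster with B and C, while D is too far away and E falls outside the time window. In practice the thresholds would need to be set from local data and street-network travel times rather than the placeholder values used here.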

Competing interests

The authors declare they have no competing interests.

Authors' contributions

AN led on the writing of the editorial introduction, and editing the nine papers submitted to the thematic issue. MF supported both the writing and the editing process. Both authors read and approved the final manuscript.

Author details

1 The Applied Criminology Centre, HHR2/10, The University of Huddersfield, Queensgate, Huddersfield, UK.
2 109 Hill House, Hines Academic Center, Texas State University, 601 University Dr, San Marcos, TX 78666, USA.

Received: 4 June 2015 Accepted: 4 June 2015

References

Adams, W, Herrmann, C, & Felson, M. (2015). Crime, transportation and malignant mixes. In V Ceccato & A Newton (Eds.), Safety and security in transit environments: an interdisciplinary perspective (pp. 181–195). Basingstoke, Hampshire: Palgrave Macmillan.

Ashby, A, & Bowers, K. (2013). A comparison of methods for temporal analysis of aoristic crime. Crime Science, 2, 1.

Boggs, S. (1965). Urban crime patterns. American Sociological Review, 30(6), 899–908.

Brantingham, PL, & Brantingham, PJ. (1981). Notes on the geometry of crime. In PJ Brantingham & PL Brantingham (Eds.), Environmental criminology (pp. 27–54). Prospect Heights IL: Waveland Press.

Brantingham, PL, & Brantingham, PJ. (2013). The theory of target search. In F Cullen & P Wilcox (Eds.), The Oxford handbook of criminological theory (pp. 535–553). New York, NY: Oxford University Press.

Brunsdon, C, Corcoran, J, & Higgs, G. (2007). Visualising space and time in crime patterns: a comparison of methods. Computers, Environment and Urban Systems, 31, 52–75.

Ceccato, V, & Uittenbogaard, AC. (2014). Space-time dynamics of crime in transport nodes. Annals of the Association of American Geographers, 104(1), 131–150.

Cohen, LE, & Felson, M. (1979). Social change and crime rate trends: a routine activity approach. American Sociological Review, 44(4), 588–608.

Grubesic, TH, & Pridemore, WA. (2011). Alcohol outlets and clusters of violence. International Journal of Health Geographics, 10(1), Article 30.

Haberman, CP, & Ratcliffe, JH. (in press). Testing for temporally nuanced relationships among potentially criminogenic places and census block street robbery counts. Criminology.

Johnson, SD, Bernasco, W, Bowers, KJ, Elffers, H, Ratcliffe, J, Rengert, G, & Townsley, M. (2007). Space-time patterns of risk: a cross national assessment of residential burglary victimization. Journal of Quantitative Criminology, 23(3), 201–219.

Newton, A, & Hirschfield, A. (2009). Measuring violence in and around licensed premises: the need for a better evidence base. Crime Prevention and Community Safety: An International Journal, 11(3), 171–188.

Newton, A, Partridge, H, & Gill, A. (2014). Above and below: measuring crime risk in and around underground mass transit systems. Crime Science, 3(1), 1–14.

Ratcliffe, JH. (2002). Aoristic signatures and the spatio-temporal analysis of high volume crime patterns. Journal of Quantitative Criminology, 18(1), 23–43.

Ratcliffe, JH. (2010). Crime mapping: spatial and temporal challenges. In AR Piquero & D Weisburd (Eds.), Handbook of quantitative criminology (pp. 5–24). New York, NY: Springer.

Sherman, LW. (1995). Hot spots of crime and criminal careers of places. In JE Eck & D Weisburd (Eds.), Crime and place. Vol. 4 (pp. 35–52). Monsey, NY: Criminal Justice Press.

Sherman, LW, Gartin, PR, & Buerger, ME. (1989). Hot spots of predatory crime: routine activities and the criminology of place. Criminology, 27, 27.

Shiode, S, Shiode, N, Block, R, & Block, C. (2015). Space-time characteristics of micro-scale crime occurrences: an application of a network-based space-time search window technique for crime incidents in Chicago. International Journal of Geographical Information Science. http://www.tandfonline.com/doi/abs/10.1080/13658816.2014.968782?journalCode=tgis20.

Townsley, M. (2008). Visualising space time patterns in crime: the hotspot plot. Crime Patterns and Analysis, 1(1), 61–74.

Weisburd, D. (2015). The law of crime concentration and the criminology of place. Criminology, 53(2), 133–157.

