
May 18, 2020 - This tutorial is still being actively edited. Please wait for this message to disappear before completing or printing, if you'd like to view it in its final form.



This guide was created by the staff of the GIS/Data Center at Rice University and is to be used for individual educational purposes only. The steps outlined in this guide require access to ArcGIS Pro software and data that is available both online and at Fondren Library.

The following text styles are used throughout the guide:

Explanatory text appears in a regular font.
  1. Instruction text is numbered.
  2. Required actions are underlined.
  3. Objects of the actions are in bold.
Folder and file names are in italics.
Names of Programs, Windows, Panes, Views, or Buttons are Capitalized.
'Names of windows or entry fields are in single quotation marks.'
"Text to be typed appears in double quotation marks."

The following step-by-step instructions and screenshots are based on the Windows 10 operating system with the Windows Classic desktop theme and ArcGIS Pro 2.1.3 software. If your personal system configuration varies, you may experience minor differences from the instructions and screenshots.

PART I. Install the R-ArcGIS bridge

1. Download R and RStudio

First, download and install R and RStudio (RStudio is a free integrated development environment for R).

R (version 3.2.2 or later is recommended): https://cran.mtu.edu/bin/windows/base/

RStudio: https://rstudio.com/products/rstudio/download/

For both, accept all defaults in the installation wizard.

2. Prepare data on Houston crime statistics

You can find a Houston crime shapefile through a web search (https://cohgis-mycity.opendata.arcgis.com/datasets/hpd-nibrs-crime), but it contains only around 4,000 cases, which is far from complete.

Instead, download the data from Houston Crime Statistics (https://www.houstontx.gov/police/cs/Monthly_Crime_Data_by_Street_and_Police_Beat.htm). For example, you can focus on the data for 2020 through the end of May (see the following screenshot).

 

Note that the data is in .xlsx format. You need to save it as a .csv file and then geocode the addresses to make it a shapefile. Geocoding in ArcGIS Pro consumes credits. For convenience, I have already downloaded the data, lightly cleaned the addresses, and geocoded it into a shapefile. You can download the attached .rar file, houston-crime-sample.rar, and proceed.

For simplicity, I use only the crime cases from the first 10 days of 2020, that is, from January 1, 2020 to January 10, 2020. I chose 10 days because the Create Space Time Cube By Aggregating Points tool used in Part II requires at least 10 time step intervals (at least 10 days if the time step interval is set to 1 day).
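As a quick check of the 10-time-step requirement, the following Python sketch counts how many time steps a date range covers. The function name count_time_steps is hypothetical and is not part of ArcGIS Pro. The tool itself works on full date-time stamps and its own alignment setting, so treat this only as an approximation of the arithmetic.

```python
from datetime import date

def count_time_steps(start, end, step_days=1):
    """Count the time steps of `step_days` days needed to cover start..end, inclusive."""
    total_days = (end - start).days + 1  # include both the first and the last day
    # Ceiling division: a partial final step still counts as one step.
    return -(-total_days // step_days)

steps = count_time_steps(date(2020, 1, 1), date(2020, 1, 10))
print(steps)  # 10
```

With a 1-day interval the sample data yields exactly 10 steps. A longer interval over the same 10 days would yield fewer than 10.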

3. Create a new project in ArcGIS Pro

As mentioned above, download the houston-crime-sample.rar file.

Locate the downloaded file on your computer and extract its contents to a folder named houston-crime-sample in a location of your choice.

Open the houston-crime-sample folder and you will see houston-crime.gdb, the geodatabase which has crime data that you will add to a map.

Start ArcGIS Pro. Under New, click Map. In the Create a New Project window, for Name, type Houston Crime Analysis. For Location, browse to and choose your houston-crime-sample folder. Uncheck Create a new folder for this project. Click OK to create the project.

In the Catalog pane, on the Project tab, expand Folders and expand the houston-crime-sample folder. Expand the houston-crime.gdb geodatabase, right-click the Houston_Crimes_Sample feature class, and choose Add To Current Map.


As described above, this map shows locations where crimes occurred from January 1, 2020 through January 10, 2020 in the greater Houston area. This is only a small sample for crime analysis, but the general procedure is the same. You can apply this tutorial to larger crime data sets.

4. Install the R-ArcGIS bridge: automatic or manual method

The R-ArcGIS bridge lets you read and write data between ArcGIS Pro and R. Once it is installed, you can also run script tools that reference an R script. There are two ways to install the R-ArcGIS bridge: automatic or manual.

Automatic method:
On the ribbon, click the Project tab. Click Options.

In the Options pane, under Application, click Geoprocessing. In the R-ArcGIS Support section, select your desired R home directory. All versions of R that are installed on your computer appear in the list. Select R 3.2.2 or a later version. In the screenshot below, for example, I use R 4.0.1.

If you do not have the ArcGIS R integration package installed, next to Please install the ArcGIS R integration package, click Install package and choose Install package from the Internet. When asked to confirm the installation, click Yes, and when the installation is complete, click Close.

If you already have the ArcGIS R integration package installed, next to Installed 'arcgisbinding' package version, click the Check for updates button and choose Check package for updates to ensure that you have the latest version of the package.

In the Options window, click OK. Click the Back button to return to the map.


Manual method:

If installing the ArcGIS R integration package produces error messages, you can use the manual method described below.

Go to this website: https://github.com/R-ArcGIS/r-bridge-install (you can also refer to this tutorial for manual installation).

Download the repository r-bridge-install-master.zip by clicking Code and then Download ZIP.

Download the latest version of the arcgisbinding package from this website: https://github.com/R-ArcGIS/r-bridge/releases/tag/v1.0.1.239. As of writing, this is arcgisbinding_1.0.1.239.zip (the latest version as of the end of June 2020).

Copy both zip files to a directory of your choice. Extract the r-bridge-install-master.zip file. Place the arcgisbinding_1.0.1.239.zip file in the same directory as the R Integration Python toolbox.

Return to ArcGIS Pro. In the Catalog pane, on the Project tab, right-click Toolboxes, choose Add Toolbox, and navigate to the location of the R Integration Python toolbox. Open the toolbox, which should look like the following screenshot:


Run the Install R bindings script. You can then test that the bridge is able to see your R installation by running the Print R Version and R Installation Details tools.

PART II. Basic statistical analysis

1. (Optional) Project the shapefile

This step is optional because the shapefile I have prepared for you has already been projected. If you download the original data in .xlsx format and geocode the addresses yourself, you need to project the resulting shapefile after geocoding.

In the Geoprocessing pane, search for Project in the Find Tools box and click Project to open the tool. Change the following parameters:

  • For Input Dataset or Feature Class, choose Houston_Crimes_Sample (or the name you gave the output feature class when geocoding the addresses).
  • For Output Dataset or Feature Class, type a name such as Houston_Crimes_Sample_Projected.
  • For Output Coordinate System, click the globe icon and choose Projected coordinate system > World > WGS 1984 World Mercator.

Click OK. It should look like the following:

Click Run in the lower-right corner. The tool projects the data and generates the projected feature class.
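For readers curious about what this projection does to a coordinate, the following Python sketch applies the standard ellipsoidal Mercator formulas with the WGS 1984 ellipsoid constants. It is only an illustration and not the Project tool's implementation. The function name wgs84_to_world_mercator is hypothetical, and the sketch assumes a central meridian of 0 degrees with no false easting or northing.

```python
import math

# WGS 1984 ellipsoid constants
SEMI_MAJOR_AXIS = 6378137.0            # meters
FLATTENING = 1 / 298.257223563
ECCENTRICITY = math.sqrt(FLATTENING * (2 - FLATTENING))

def wgs84_to_world_mercator(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to ellipsoidal Mercator x/y in meters."""
    lon = math.radians(lon_deg)
    lat = math.radians(lat_deg)
    x = SEMI_MAJOR_AXIS * lon
    e_sin = ECCENTRICITY * math.sin(lat)
    # The second factor corrects the spherical formula for the ellipsoid's flattening.
    y = SEMI_MAJOR_AXIS * math.log(
        math.tan(math.pi / 4 + lat / 2)
        * ((1 - e_sin) / (1 + e_sin)) ** (ECCENTRICITY / 2)
    )
    return x, y

# Approximate coordinates for downtown Houston (illustrative values only)
x, y = wgs84_to_world_mercator(-95.37, 29.76)
print(round(x), round(y))
```

After projection the coordinates are in meters rather than degrees.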

2. Aggregate point data by counts within a defined location

This step helps us understand the data further. Before starting the analysis, you need to aggregate crime counts by space and time. Aggregation reveals the spatial and temporal relationships in your data that may not have been visible previously. Specifically, aggregating allows you to summarize your crime points in space-time bins that combine the crimes that have occurred into counts by space and time increments of your choice.

Open the Geoprocessing pane, search for Create Space Time Cube in the Find Tools box, and click Create Space Time Cube By Aggregating Points to open the tool. Change the following parameters:

  • For Input Features, choose Houston_Crimes_Sample.
  • For Output Space Time Cube, browse to your houston-crime-sample folder and name the output Houston_Crimes_Space_Time_Cube.nc.
  • For Time Field, choose Occurrence Date.
  • For Time Step Interval, type 1 and choose Days.
  • For Time Step Alignment, confirm that End time is chosen.
  • For Aggregation Shape Type, choose Hexagon grid.
  • For Distance Interval, type 1 and choose Miles.

It should look like the following screenshot:

These parameter values specify the size and shape of the space-time bins that you are creating. Because your data is for the first 10 days of year 2020, analyzing crimes by each day is a natural breaking point. Additionally, your department wants to analyze crimes at a local level, so you select a small distance interval value (here 1 mile). Hexagon bins are selected because they are preferable in analyses that include aspects of connectivity or movement paths.

Click Run. The Create Space Time Cube By Aggregating Points tool creates a netCDF file (.nc), which allows you to view spatial patterns and trends over time. The tool aggregated the 6,190 points in the Houston_Crimes_Sample layer into 762 hexagons (the polygon bins). The Distance Interval and Time Step Interval parameters affect the number of resulting bins and the size of each bin. These values can be chosen based on prior knowledge of the analysis area, or the tool will calculate values for you based on the spatial distribution of your data. You can confirm that the tool successfully created the file by checking the houston-crime-sample folder.
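The idea of a space-time bin can be sketched in a few lines of Python. This is a simplified illustration and not how the tool is implemented: it uses square cells instead of hexagons and counts days from the start date, whereas the tool in this tutorial is set to align to the end time. The function name bin_points and the sample coordinates are hypothetical.

```python
from collections import Counter
from datetime import datetime

MILE_IN_METERS = 1609.344

def bin_points(points, cell_size, start):
    """Count points per (column, row, day index) bin.

    points: iterable of (x, y, timestamp) in projected units (meters here).
    """
    counts = Counter()
    for x, y, when in points:
        col = int(x // cell_size)                  # square cell column
        row = int(y // cell_size)                  # square cell row
        day = (when.date() - start.date()).days    # 1-day time step index
        counts[(col, row, day)] += 1
    return counts

sample = [
    (100.0, 200.0, datetime(2020, 1, 1, 8, 30)),
    (900.0, 1500.0, datetime(2020, 1, 1, 23, 5)),
    (2000.0, 100.0, datetime(2020, 1, 2, 12, 0)),
]
cube = bin_points(sample, MILE_IN_METERS, datetime(2020, 1, 1))
print(cube[(0, 0, 0)])  # 2
```

The first two points fall in the same 1-mile cell on the same day, so that bin has a count of 2. The third point lands in a different cell on the next day.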

3. Analyze crime hot spots: One step forward

Next, you'll analyze where statistically significant clusters of crime are emerging and receding throughout the city. Your analysis will help the department anticipate problems and evaluate the effectiveness of resource allocation for their crime prevention measures.

In the Geoprocessing pane, click the Back button. Search for and open the Emerging Hot Spot Analysis tool. Change the following parameters:

  • For Input Space Time Cube, browse to and choose the Houston_Crimes_Space_Time_Cube.nc file.

  • For Analysis Variable, choose COUNT.

  • For Output Features, browse to your houston-crime-sample folder and name the output Houston_Crimes_Hot_Spots.shp.

It should look like the following screenshot:


By using the default value for Neighborhood Distance, you are letting the tool calculate a distance band for you based on the spatial distribution of your data. The Neighborhood Time Step value is set to one time step interval (one day in this case) by default. These settings are ideal for an exploratory analysis; however, if you knew the optimal distance band and time step interval for your analysis, you could set them.

Click Run. The tool runs and its results are added to the map. (A warning message informs you of the value that the tool used for the Neighborhood Distance parameter.)

In the Contents pane on the left, turn off the Houston_Crimes_Sample layer. You will see:


Trends in statistically significant hot and cold spots are shown on the map. Red areas indicate that over time there has been clustering of high numbers of crime, and blue areas indicate that over time there has been clustering of low numbers of crime. Each location is categorized based on the trends in clustering over time.

The dark red hexagon bins are persistent hot spots. These are locations that have been statistically significant hot spots for 90 percent of all of your time slices. However, these locations do not have a discernable increase or decrease in the intensity of clustering of crime counts over time.

In contrast, the light red with beige outlined hexagon bins are intensifying hot spots. These are locations that have been statistically significant hot spots for 90 percent of all of your time slices. In addition, these are locations where the intensity of clustering of crime counts is increasing over time, and that increase is statistically significant.

Conversely, the dark blue bins are persistent cold spots. These are areas where crime is statistically, and persistently, less prevalent. The light blue outlined bins are intensifying cold spots, the opposite of intensifying hot spots: clusters of low crime counts in these cells are becoming more intense over time. In other words, the cold spots are getting colder.
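The distinction between persistent and intensifying hot spots can be sketched in Python. Esri documents that the Emerging Hot Spot Analysis tool evaluates trends with the Mann-Kendall test. The code below is a simplified illustration under stated assumptions and not the tool's implementation: it ignores tied values in the variance, uses a hypothetical critical value of 1.645, and covers only the two hot spot categories described above. The function names are hypothetical.

```python
import math

def mann_kendall_z(values):
    """Mann-Kendall trend statistic as a z-score (no correction for ties)."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)   # sign of each later-minus-earlier pair
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0

def classify_hot_bin(hot_flags, z_scores, z_crit=1.645):
    """Label one bin from its per-time-step hot spot flags and clustering z-scores."""
    share_hot = sum(hot_flags) / len(hot_flags)
    if share_hot < 0.9:                     # hot in fewer than 90 percent of time steps
        return "other"
    if mann_kendall_z(z_scores) > z_crit:   # clustering intensity is trending upward
        return "intensifying hot spot"
    return "persistent hot spot"

rising = [2.0 + 0.1 * i for i in range(10)]
flat = [2.5] * 10
print(classify_hot_bin([True] * 10, rising))  # intensifying hot spot
print(classify_hot_bin([True] * 10, flat))    # persistent hot spot
```

Both example bins are hot in every time step. Only the one whose clustering scores rise steadily is labeled intensifying.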

The department needs to be especially concerned about the areas where crime is persistent or intensifying. They may move resources to these areas from the places where crime cold spots occur.

Save your project.

So far, you have installed the R-ArcGIS bridge, prepared your data for statistical analysis, and started using some of the available tools. Next, you'll add attributes to your dataset, allowing you to draw conclusions from your analysis about what factors likely influence the occurrence of crime.

PART III. Enhance the dataset with additional attributes

Previously, you installed the R-ArcGIS bridge and downloaded the data for your statistical analysis. Then, in ArcGIS, you aggregated your data based on areas and times of interest and began to explore temporal trends in your dataset. For the department to better understand what factors influence the prevalence of crime, you'll add additional information.

1. Add additional attributes to the original dataset

Now that you know where crime hot spots are emerging, you'll try to determine why they are emerging. In particular, you'll examine the relationship between an area's crime and its population. Statistical analysis can determine if the number of crimes occurring in a particular area is influenced by population. In addition, your department is interested in analyzing the presence of certain types of businesses, as well as the prevalence of parks, the amount of public land in a given area (hexagon bins), the median household income and home value, among other factors.

Currently, the hexagon bins in the space time cube layer contain no attribute information suitable for this kind of analysis. You'll run another geoprocessing tool to enrich the layer with relevant attribute information.

Open your Houston Crime Analysis project in ArcGIS Pro.

On the Analysis tab, in the Geoprocessing group, click Environments. In the Environments window, scroll down to the Fields section. Uncheck Maintain fully qualified field names.

Click OK.

Open the Geoprocessing pane, then search for and open the Enrich tool. Note that this step also requires ArcGIS service credits (approximately 50 at most). If you do not have sufficient credits, you can use the enriched feature class I have already prepared, which is named Houston_Crimes_Sample_Enrich. To use it, in the Catalog pane, on the Project tab, expand Folders and expand the houston-crime-sample folder. Expand the houston-crime.gdb geodatabase and copy the Houston_Crimes_Sample_Enrich feature class to the default project geodatabase, Houston Crime Analysis.gdb.

If you want to enrich the feature class yourself, change the following parameters:

  • For Input Features, choose Houston_Crimes_Hot_Spots.
  • For Output Feature Class, browse to the default project geodatabase Houston Crime Analysis.gdb and name the output feature class Houston_Crimes_Sample_Enrich.
  • For Variables, click the plus button. In the Add Variable window, search for and choose the following variables and click OK:
    • 2019 Total Population (Esri 2020)
    • 2020 Median Home Value
    • 2020 Median Household Income
    • 2020 Renter Occupied HUs
    • 2020 Food & Beverage Stores Bus (NAICS)
    • 2020 Food Service/Drinking Estab Bus (NAICS)

It should look like the following screenshot:

Note that demographic data is updated periodically, so the available variables and values may differ from those specified in this tutorial. If necessary, use the most recent data.

The specific variables you add are important because their names are used in the R script you'll run later in the tutorial. If your variable names are different from those shown in the example image, you'll need to edit them when you paste the R script or the line won't run.

While not an exhaustive list of the variables that could potentially be linked to crime rates, this list will provide a good start for your analysis.

Click Run. The tool runs and the result layer is added to the map.

In the Contents pane, right-click the Houston_Crimes_Sample_Enrich layer and choose Attribute Table.

Scroll to the right in the attribute table until you can see the fields with which you chose to enrich the layer.

The newly added enrichment fields display in the table with alias names that are more descriptive than the original field names. In the list below, each alias name is followed by the original field name in parentheses.

The result of the Enrich tool includes the following fields and values:

  • HasData - Indicates whether the Enrich tool found data for the given hexagon bin, with 0 meaning a hexagon had no available data for all of the attributes you selected and 1 meaning a hexagon bin had data for at least one of the attributes you selected. You can use this field to filter your data so that only features with relevant attribute information appear on the map.
  • 2019 Total Population (Esri 2020) (historicalpopulation_tspop10_cy) - Contains the population count per hexagon bin. Some hexagons have a population of 0. A hexagon bin may have a population of 0 because it is located in an industrial area or in a park. The first priority of your department is to reduce crimes in populated areas, so you'll focus only on populated locations.
  • 2020 Median Home Value (wealth_medval_cy) - Contains the median home value per hexagon bin.
  • 2020 Median Household Income (wealth_medhinc_cy) - Contains the median household income value per hexagon bin.
  • 2020 Renter Occupied HUs (ownerrenter_renter_cy) - Contains the number of renter occupied households per hexagon bin.
  • 2020 Food & Beverage Stores Bus (NAICS) (businesses_n13_bus) - Contains the count of food and beverage stores located within each hexagon bin.
  • 2020 Food Service/Drinking Estab Bus (NAICS) (businesses_n37_bus) - Contains the count of businesses that serve food, beverages, or both located within each hexagon bin.
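Because later script steps depend on these exact field names, it can help to confirm that they are all present before continuing. The following Python sketch does this for a plain list of field names. The mapping repeats the alias and field names from the list above. The function name missing_fields and the sample field list are hypothetical, and how you obtain the field list from your own layer is up to you.

```python
# Alias name -> original field name, as listed above
REQUIRED_FIELDS = {
    "2019 Total Population (Esri 2020)": "historicalpopulation_tspop10_cy",
    "2020 Median Home Value": "wealth_medval_cy",
    "2020 Median Household Income": "wealth_medhinc_cy",
    "2020 Renter Occupied HUs": "ownerrenter_renter_cy",
    "2020 Food & Beverage Stores Bus (NAICS)": "businesses_n13_bus",
    "2020 Food Service/Drinking Estab Bus (NAICS)": "businesses_n37_bus",
}

def missing_fields(table_fields):
    """Return the required field names that are absent, ignoring letter case."""
    present = {name.lower() for name in table_fields}
    return [field for field in REQUIRED_FIELDS.values() if field.lower() not in present]

# Hypothetical field list in which the median income field has a different name
fields_in_table = [
    "OBJECTID", "HasData", "historicalpopulation_tspop10_cy", "wealth_medval_cy",
    "ownerrenter_renter_cy", "businesses_n13_bus", "businesses_n37_bus",
]
print(missing_fields(fields_in_table))  # ['wealth_medhinc_cy']
```

An empty result means every expected field is available under the expected name.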

You've created a feature class that contains the information needed to perform your analysis, but you now have some data that is not pertinent to your analysis goals. Hexagon bins that do not have information for your attributes of interest do not add any value or new information to help you answer your questions. Additionally, areas that are not populated are not of high priority for your department at this time. As a result, you'll need to trim down your enriched dataset to contain only the information most useful to you.

Close the Attribute Table of the Houston_Crimes_Sample_Enrich layer.

2. Further prepare the dataset

Next, you'll select the data that is relevant to your analysis and make a subset with only that information. This way, you still have access to all your enriched data should you need it for further analyses, but you can continue your current analysis with only the necessary data.

In the Geoprocessing pane, click the Back button. Search for and open the Select Layer By Attribute tool.

For Input Rows, choose Houston_Crimes_Sample_Enrich. For Selection type, confirm that New selection is chosen.

Click New Expression. Create the expression Where HasData is equal to 0.

Click Add Clause and add the expression Or 2019 Total Population (Esri 2020) is Equal to 0.


Click Run. The tool runs and selects features that have no enriched data, or that have zero population. These may be industrial sites or parks.

Open the Attribute Table for the Houston_Crimes_Sample_Enrich layer. The table indicates that 18 of 762 rows are selected, meaning they have 0 values for the HasData or 2019 Total Population (Esri 2020) fields. You'll create a new dataset without these selected features so you can focus on features that have data relevant to your analysis.

In the attribute table, click the Switch button.

The button swaps the selection from the 18 rows that had no data or no population, to all of the other rows. You should have 744 of 762 rows selected, and you can now copy the enriched and populated data to its own layer.
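The selection and the switch can be expressed as a simple filter and its complement. The Python sketch below uses a few made-up rows to show the logic. The key names and values are hypothetical and do not come from the tutorial data.

```python
def has_no_data_or_population(row):
    """Mirror the expression: HasData is equal to 0 Or population is equal to 0."""
    return row["HasData"] == 0 or row["population"] == 0

rows = [
    {"id": 1, "HasData": 1, "population": 1200},
    {"id": 2, "HasData": 0, "population": 0},
    {"id": 3, "HasData": 1, "population": 0},
    {"id": 4, "HasData": 1, "population": 35},
]

selected = [r for r in rows if has_no_data_or_population(r)]       # the original selection
switched = [r for r in rows if not has_no_data_or_population(r)]   # after clicking Switch

print([r["id"] for r in selected])  # [2, 3]
print([r["id"] for r in switched])  # [1, 4]
```

Every row ends up in exactly one of the two lists. The switched rows are those that have enriched data and a nonzero population.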

In the Geoprocessing pane, click the Back button. Search for and open the Copy Features tool.

For Input Features, choose Houston_Crimes_Sample_Enrich. Notice that when you have specific rows selected, the Copy Features tool only copies those rows into your new feature class result.

For Output Feature Class, browse to Houston Crime Analysis.gdb and name the output feature class Houston_Crimes_Sample_Enrich_Subset.

Click Run. Now you have two layers, Houston_Crimes_Sample_Enrich and Houston_Crimes_Sample_Enrich_Subset. The former contains the full dataset, and the latter contains only the areas that have enriched attributes and people living in them.

Close the table. On the ribbon, on the Map tab, in the Selection group, click Clear.

Save your project.

Next, you'll learn how to analyze these attributes in R, and how they may influence the likelihood an area experiences crime.

PART IV. Conduct statistical analysis using R and ArcGIS Pro

Previously, you enriched your data with additional attributes, including attributes about population. Next, you'll calculate the crime rate for each location on your map. A crime rate expresses how many crimes occur relative to the population. This will allow you to better compare crime counts between areas with vastly different numbers of people, as well as determine how crime rate may be influenced by the other attributes you added to your data.

While you could use the attribute table's Field Calculator in ArcGIS to determine the number of crimes per 100,000 population, you want to ensure that the crime rates you calculate are statistically robust. You'll use functions in R to smooth your crime rate.
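For reference, the unsmoothed rate is a one-line calculation. The Python sketch below uses a hypothetical function name and made-up numbers, and shows why raw rates in sparsely populated bins can be unstable.

```python
def crime_rate_per_100k(crime_count, population):
    """Raw crime rate: crimes per 100,000 residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return crime_count * 100_000 / population

print(crime_rate_per_100k(3, 1500))  # 200.0
print(crime_rate_per_100k(1, 10))    # 10000.0
```

A single crime in a bin with only 10 residents produces a rate 50 times higher than three crimes among 1,500 residents. This is the kind of instability that smoothing is meant to reduce.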

For this analysis, you'll use the Empirical Bayes smoothing method. Empirical Bayes smoothing is a rate smoothing technique that uses the population in each of your bins as a measure of confidence in the data, with higher populations lending higher confidence. It then adjusts the rates of areas with lower confidence towards the mean. This technique gives the crime rates more stability.
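To make the idea concrete, the following Python sketch implements one common formulation, the global method-of-moments estimator. The tutorial does not state which variant its R script uses, so this choice is an assumption, and the function name and sample numbers are hypothetical. The sketch assumes every bin has a population greater than zero.

```python
def empirical_bayes_rates(counts, populations):
    """Shrink raw rates toward the global rate; small populations shrink the most."""
    total_pop = sum(populations)
    global_rate = sum(counts) / total_pop
    raw_rates = [c / p for c, p in zip(counts, populations)]
    mean_pop = total_pop / len(populations)

    # Population-weighted variance of the raw rates around the global rate
    weighted_var = sum(
        p * (r - global_rate) ** 2 for p, r in zip(populations, raw_rates)
    ) / total_pop
    # Estimated variance between areas, floored at zero
    between_var = max(weighted_var - global_rate / mean_pop, 0.0)

    smoothed = []
    for p, r in zip(populations, raw_rates):
        denom = between_var + global_rate / p
        weight = between_var / denom if denom > 0 else 0.0  # confidence in the raw rate
        smoothed.append(global_rate + weight * (r - global_rate))
    return smoothed

counts = [2, 50, 300]
populations = [20, 1000, 10000]
for pop, rate in zip(populations, empirical_bayes_rates(counts, populations)):
    print(pop, round(rate * 100_000, 1))
```

In this example the bin with 20 residents is pulled almost all the way to the overall rate, while the bin with 10,000 residents stays close to its raw rate.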

1. Bridge your data into R

Next, you'll work in RStudio to perform Empirical Bayes smoothing on your crime rates. Because you have the R-ArcGIS bridge, the data in your ArcGIS Pro project is connected to and accessible from RStudio.

Open your Houston Crime Analysis project in ArcGIS Pro. Open RStudio.

In RStudio, in the R console, type the following code and press Enter:

 

install.packages("arcgisbinding", repos="http://r-arcgis.github.io/r-bridge", type="win.binary")

 

 

 

2. Calculate smoothed crime rates



3. Continue analysis in ArcGIS Pro



4. Identify areas with unusually high crime rates



PART V. Identify attributes that influence crime

1. Create a correlation matrix in R to evaluate attribute relationships


