Hands-On Exercise 5A

Hands-On Exercise R rgdal sp spNetwork tmap

Today’s Adventure: Network Constrained Spatial Point Patterns Analysis (NetSPPA)! With the spNetwork package, let’s learn how to prepare our data for NetSPPA, perform and visualise NetKDE, and even perform a CSR test ✨

Megan Sim https://www.linkedin.com/in/megan-sim-tze-yen/
09-12-2021

1.0 Overview

In Hands-On Exercise 04, we learned how to how to conduct Spatial Point Pattern Analysis (SPPA) on our data. This exercise, we’ll be learning about Network-constrained Spatial Point Pattern Analysis (NetSPPA): analysis of spatial point events that occur on or alongside a network. For example, in today’s exercise, we’ll be analysing the spatial distribution of childcare centres with relation to the road network.

While researching for articles on NetSPPA, I chanced upon this in-depth NetSPPA study that analysed the relationship between the road networks in Nanjing with the distribution of healthcare facilities, with the aim of examining the reasonability of existing healthcare facility distribution and optimising the location of new ones. Using weighted Kernel Density Estimation (KDE), network K-function analysis and correlation analysis, they achieved insights into the spread of hospitals around street centralities, and even saw a difference in distribution between public hospitals and private ones. You can also download it here.

2.0 Setup

2.1 Packages Used

The R packages we’ll be introducing today are:

In addition, we’ll be using the packages from our last lesson:

packages = c('sp', 'rgdal', 'spNetwork', 'tmap')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

2.2 Data Used

The datasets used for this exercise are:

2.3 Importing Data

We’re used to using st_read() for importing geospatial data, but this time we’ll be using something new: readOGR(), which outputs a suitable Spatial vector object. In this case, we’ll be importing Punggol_St and Punggol_CC as SpatialLinesDataFrame and SpatialPointsDataFrame respectively.

# we use verbose=FALSE so that the progress won't be reported
network <- readOGR(dsn="data/geospatial", 
                   layer="Punggol_St",
                   verbose = FALSE)
childcare <- readOGR(dsn="data/geospatial",
                     layer="Punggol_CC",
                     verbose = FALSE)

2.4 Data Preparation

We can take a look at our data with str():

#str(network) commented out due to length of output
str(childcare)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 61 obs. of  1 variable:
  .. ..$ Name: chr [1:61] "kml_10" "kml_99" "kml_100" "kml_101" ...
  ..@ coords.nrs : num(0) 
  ..@ coords     : num [1:61, 1:3] 36174 36480 36619 36285 35415 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:3] "coords.x1" "coords.x2" "coords.x3"
  ..@ bbox       : num [1:3, 1:2] 34424 41504 0 37619 44686 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:3] "coords.x1" "coords.x2" "coords.x3"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +to"| __truncated__
  .. .. ..$ comment: chr "PROJCRS[\"SVY21 / Singapore TM\",\n    BASEGEOGCRS[\"SVY21\",\n        DATUM[\"SVY21\",\n            ELLIPSOID["| __truncated__

Before we move on into visulisation, we need to take care of one more step. spNetwork expects the geospatial data to contain complete CRS information, so - just like how we used to set the CRS of a dataframe with st_transform(), we’ll similarly set the for our Spatial vector objects with spTransform()! Like our previous geospatial data, the ESPG code for SVY21 is 3414.

childcare <-spTransform(childcare, CRS("+init=epsg:3414"))
network <- spTransform(network,CRS("+init=epsg:3414"))

2.5 Data Visualisation

Let’s visualise our geospatial data in both static and interactive forms to better help with our analysis:

plot(network)
plot(childcare,add=T,col='red',pch = 19)

tmap_mode('view')
tm_shape(childcare)+
  tm_dots() +
tm_shape(network)+
  tm_lines()
#return tmap mode to plot for future visualisations
tmap_mode('plot')

3.0 Network Constrained KDE (NetKDE) Analysis

3.1 Preparing the lixels objects

Before computing NetKDE, the SpatialLines object need to be cut into lixels with a specified minimal distance - which we’ll do with lixelize_lines():

lixels <- lixelize_lines(network,700,mindist = 350)

With this, we’ve set certain dimensions for our lixels objects:

3.2 Generating line centre points

Next, let us generate a SpatialPointsDataFrame with line centre points (points that are located at the center of the line, based on the line length). This will be performed with lines_center(), as seen below:

samples <- lines_center(lixels)

3.3 Performing NetKDE

Now, let’s get into the meat of the matter: NetKDE!

densities <- nkde(network, 
                  events = childcare,
                  w = rep(1,nrow(childcare)),
                  samples = samples,
                  kernel_name = "quartic",
                  bw = 300, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  #for faster calculation, aggregate events within a 5m radius
                  agg = 5, 
                  sparse = TRUE,
                  verbose = FALSE)

Whew, that’s a long code chunk! Let’s break it down a little:

More details on the other parameters can be found here; moreover, I would recommend reading this comprehensive guide introducing NKDE, the 3 methods to calculate it (simple, discontinuous and continuous) and most importantly spNetworks’s application of NKDE.

3.4 Visualising NetKDE

Before we can visualise the NetKDE values, we’ll need to insert our computed density values, densities, into the density field of our sample and lixel objects:

samples$density <- densities
lixels$density <- densities

You might’ve noticed from just now that the computed density values are very small - this is because the svy21 projection system is in meters. However, that isn’t an appropriate scale of measurement: saying that there are 0.0000005 chidcare centres in a meter just doesn’t make sense 😟 Let’s rescale that into events per kilometer instead:

samples$density <- samples$density*1000
lixels$density <- lixels$density*1000

Now, let’s try visualising it as a interactive map again:

tmap_mode('view')
tm_shape(lixels)+
  tm_lines(col="density")+
tm_shape(childcare)+
  tm_dots()
#return tmap mode to plot for future visualisations
tmap_mode('plot')

Try playing around with the map: you’re realise that road segments with a relatively higher density of childcare centres will appear darker in colour than those with a relatively lower density of childcare centres.

4.0 Network Constrained G- and K-Function Analysis

Now, let’s perform a complete spatial randomness (CSR) test: this is to test if the events within a given study area occur in completely random fashion. The main reason for doing so is for efficiency and targeted analysis: if our data exhibits signs of complete spatial randomness, then there is no underlying structure and therefore little to be gained from further analysis.

The null hypothesis is defined as:

H0: The observed spatial point events (i.e distribution of childcare centres) are uniformly distributed over a street network in Punggol Planning Area.

The CSR test is based on the assumption of the binomial point process, which implies the hypothesis that the childcare centres are randomly and independently distributed over the street network.

If this hypothesis is rejected, we may infer that the distribution of childcare centres are spatially interacting and dependent on each other; as a result, they may form nonrandom patterns.

kfun_childcare <- kfunctions(network, 
                             childcare,
                             start = 0, 
                             end = 1000, 
                             step = 50, 
                             width = 50, 
                             nsim = 50, 
                             resolution = 50,
                             verbose = FALSE, 
                             conf_int = 0.05)

Another long code chunk - let’s break it down! Here are the arguments used:

The output of kfunctions() is a list with the following values:

Since it’s a ggplot2 object, we can easily visaulise these! Let’s try visualising the k-function:

kfun_childcare$plotk

How do we interpret this? The blue line is the empirical network K-function of the childcare centres in Punggol planning area, while the gray envelope represents the results of the 50 simulations in the interval 2.5% - 97.5%. You might notice that the blue line is within the gray envelope for most part, but is below it between 250m-400m: from this, our inference is that childcare centres in the Punggol planning area resemble a regular pattern at the distance of 250m-400m.

5.0 End Notes

With that, we’ve learned the gist of performing NetSPPA! From how to prepare our data for NetSPPA, to performing NetKDE and visualising our results, to performing a CSR test and plotting our k- and g-functions, we’ve done it all! We’re slowly building up our geospatial analytics repertoire - and there’s still so much more to discover 😄 See you next exercise!