In-Class Exercise 5

In-Class Exercise R sf tidyverse tmap maptools spatstat raster

Today’s in-class exercise will bring us through a step-by-step of data preparation + wrangling - and then we’ll be testing out our newly-obtained NetSPPA skills 😄

Megan Sim https://www.linkedin.com/in/megan-sim-tze-yen/
09-13-2021

1.0 Overview

In Hands-On Exercise 5, we learned how to perform Network-constrained Spatial Point Patterns Analysis with spNetwork. As such, in this In-Class Exercise, we’ll be putting what we learned into use!

The first few sections (2.0 Setup to 3.0 Data Wrangling) are essentially the same as what we did in In-Class Exercise 4

2.0 Setup

2.1 Packages Used

The R packages we’ll be using today are all from our last exercise:

packages = c('maptools', 'sf', 'raster','spatstat', 'tmap', 'tidyverse', 'plotly', 'ggthemes')
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

2.2 Data Used

The datasets used for this exercise are:

2.3 Importing Data

2.3.1 Importing Geospatial Data

For our geospatial data, out output goal is tibbles sf object, so we’ll use st_read():

Compare against Hands-On Exercise 5 where we used readOGR() for a Spatial vector object output

mpsz_sf <- st_read(dsn = "data/shapefile", 
                layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\IS415\IS415_msty\_posts\2021-09-13-in-class-exercise-5\data\shapefile' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

As we can see, mpsz’s projection is in svy21.

2.3.2 Importing Aspatial Data

Here, we’re using read_rds() of the readr package rather than the readRDS() from base R - this is because we want our output to be a tibble object.

childcare <- read_rds("data/rds/childcare.rds")
CHAS <- read_rds("data/rds/CHAS.rds")

Note that there are some issues in the childcare dataframe: the Lat and Lng coordinate fields are in decimal degrees, and thus should be of numeric data type instead of the chr data type. The WGS84 referencing system is assumed.

3.0 Data Preparation + Wrangling

3.1 Converting from Aspatial to Geospatial

CHAS_sf <- st_as_sf(CHAS,
                    coords = c("X_COORDINATE",
                               "Y_COORDINATE"),
                    crs=3414)

Note: st_as_sf accepts coordinates in character data type. Personally, I believe it to be better practice to convert to numeric data type first, but as long as you check the data beforehand and you get no error messages, then there shouldn’t be a significant difference

# if you'd like, we can convert lat and lng to numeric data type first 
childcare$Lat <- as.numeric(childcare$Lat)
childcare$Lng <- as.numeric(childcare$Lng)

# transform in CRS WGS84 (code 4326) first, due to lat+lng decimal degrees
childcare_sf <- st_as_sf(childcare,
                    coords = c("Lng",
                               "Lat"),
                    crs=4326) %>%
  # afterwards, transform to CRS SVY21 (code 3414)
  st_transform(crs=3414)

At this stage, it’s good practice to do a plot to show the location of the data points. Remember to use the _sf versions - the originals are aspatial! Let’s have an interactive map :

tmap_mode("view")
tm_shape(childcare_sf) +
  tm_dots(alpha=0.4, #affects transparency of points
          col="blue",
          size=0.05) +
tm_shape(CHAS_sf) +
  tm_dots(alpha=0.4,
          col="red",
          size=0.05)

3.2 Converting from sf to Spatial* data frame

childcare <- as_Spatial(childcare_sf)
CHAS <- as_Spatial(CHAS_sf)
mpsz <- as_Spatial(mpsz_sf)

3.3 Converting from Spatial* data frame to Spatial* objects

childcare_sp <- as(childcare, "SpatialPoints")
CHAS_sp <- as(CHAS,"SpatialPoints")
mpsz_sp <- as(mpsz,"SpatialPolygons")

Note the difference between childcare and childcare_sp - one is a Spatial* data frame, while the other is a Spatial* object:

3.4 Converting from Spatial* objects to spatstat ppp format

childcare_ppp <- as(childcare_sp, "ppp")
CHAS_ppp <- as(CHAS_sp, "ppp")

Note: this uses the as.ppp() of maptools package - make sure it’s installed!

3.5 Removing duplicate points using jitter

childcare_ppp_jit <- rjitter(childcare_ppp,
                             retry=TRUE,
                             nsim=1, #number of simulations
                             drop=TRUE)
CHAS_ppp_jit <- rjitter(CHAS_ppp,
                             retry=TRUE,
                             nsim=1, #number of simulations
                             drop=TRUE)

#check for duplicates
any(duplicated(childcare_ppp_jit))
[1] FALSE
any(duplicated(CHAS_ppp_jit))
[1] FALSE

Note: This one requires the spatstat package. Also, _ppp_jit is neither simple features nor sp - it’s ppp format, so tmap won’t understand it. You’ll need to convert it back to simple features if you want to plot it.

4.0 Punggol Planning Area

4.1 Extracing Punggol Planning Area

pg <- mpsz[mpsz@data$PLN_AREA_N=="PUNGGOL",]

Note: for this particular function - remember that there’s a comma behind it! Without it, it won’t run :^(

4.2 Converting SpatialPolygonsDataFrame into SpatialPolygons object

pg_sp <- as(pg, "SpatialPolygons")

4.3 Converting SpatialPolygons objects into owin objects

An owin object is requied by spatstat if you want to define irregular windows.

pg_owin <- as(pg_sp, "owin")

4.4 Extracting spatial points window owin

childcare_pg <- childcare_ppp_jit[pg_owin]
CHAS_pg <- CHAS_ppp_jit[pg_owin]

Let’s take a quick look and visualise this:

plot(childcare_pg)

Note that it’s showing data points within our study area AKA Punggol. If you plot the jitter-only version, you’ll get all the points without any boundary:

plot(childcare_ppp_jit)

5.0 Analysis

5.1 L-function

L_childcare <- envelope(childcare_pg,
                        Lest,
                        nsim=99, 
                        rank=1, # go through 1 round of rankings
                        global=TRUE) # make global
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.

Done.
L_CHAS <- envelope(CHAS_pg,
                        Lest,
                        nsim=99, 
                        rank=1, # go through 1 round of rankings
                        global=TRUE) # make global
Generating 99 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98,  99.

Done.

5.1 Interactive L-function

title <- "Pairwise Distance: L function"

Lcsr_df <- as.data.frame(L_childcare)

colour=c("#0D657D","#ee770d","#D3D3D3")
csr_plot <- ggplot(Lcsr_df, aes(r, obs-r))+
  # plot observed value
  geom_line(colour=c("#4d4d4d"))+
  geom_line(aes(r,theo-r), colour="red", linetype = "dashed")+
  # plot simulation envelopes
  geom_ribbon(aes(ymin=lo-r,ymax=hi-r),alpha=0.1, colour=c("#91bfdb")) +
  xlab("Distance r (m)") +
  ylab("L(r)-r") +
  geom_rug(data=Lcsr_df[Lcsr_df$obs > Lcsr_df$hi,], sides="b", colour=colour[1])  +
  geom_rug(data=Lcsr_df[Lcsr_df$obs < Lcsr_df$lo,], sides="b", colour=colour[2]) +
  geom_rug(data=Lcsr_df[Lcsr_df$obs >= Lcsr_df$lo & Lcsr_df$obs <= Lcsr_df$hi,], sides="b", color=colour[3]) +
  theme_tufte()+
  ggtitle(title)

text1<-"Significant clustering"
text2<-"Significant segregation"
text3<-"Not significant clustering/segregation"

# the below conditional statement is required to ensure that the labels (text1/2/3) are assigned to the correct traces
if (nrow(Lcsr_df[Lcsr_df$obs > Lcsr_df$hi,])==0){ 
  if (nrow(Lcsr_df[Lcsr_df$obs < Lcsr_df$lo,])==0){ 
    ggplotly(csr_plot, dynamicTicks=T) %>%
      style(text = text3, traces = 4) %>%
      rangeslider() 
  }else if (nrow(Lcsr_df[Lcsr_df$obs >= Lcsr_df$lo & Lcsr_df$obs <= Lcsr_df$hi,])==0){ 
    ggplotly(csr_plot, dynamicTicks=T) %>%
      style(text = text2, traces = 4) %>%
      rangeslider() 
  }else {
    ggplotly(csr_plot, dynamicTicks=T) %>%
      style(text = text2, traces = 4) %>%
      style(text = text3, traces = 5) %>%
      rangeslider() 
  }
} else if (nrow(Lcsr_df[Lcsr_df$obs < Lcsr_df$lo,])==0){
  if (nrow(Lcsr_df[Lcsr_df$obs >= Lcsr_df$lo & Lcsr_df$obs <= Lcsr_df$hi,])==0){
    ggplotly(csr_plot, dynamicTicks=T) %>%
      style(text = text1, traces = 4) %>%
      rangeslider() 
  } else{
    ggplotly(csr_plot, dynamicTicks=T) %>%
      style(text = text1, traces = 4) %>%
      style(text = text3, traces = 5) %>%
      rangeslider()
  }
} else{
  ggplotly(csr_plot, dynamicTicks=T) %>%
    style(text = text1, traces = 4) %>%
    style(text = text2, traces = 5) %>%
    style(text = text3, traces = 6) %>%
    rangeslider()
  }