Today’s in-class exercise will bring us through a step-by-step of data preparation + wrangling - and then we’ll be testing out our newly-obtained NetSPPA skills 😄
In Hands-On Exercise 5, we learned how to perform Network-constrained Spatial Point Patterns Analysis with spNetwork. As such, in this In-Class Exercise, we’ll be putting what we learned into use!
The first few sections (2.0 Setup to 3.0 Data Wrangling) are essentially the same as what we did in In-Class Exercise 4
The R packages we’ll be using today are all from our last exercise:
packages = c('maptools', 'sf', 'raster','spatstat', 'tmap', 'tidyverse', 'plotly', 'ggthemes')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
The datasets used for this exercise are:
CHILDCARE
, a point feature data in rds format from data.gov.sg, providing both location and attribute information of childcare centresCHAS
, a point feature data in rds format that provides location and attribute information of CHAS clinicsMP14_SUBZONE_WEB_PL
, a polygon feature data in ESRI shapefile format from data.gov.sg, providing information of URA 2014 Master Plan Planning Subzone boundary dataPunggol_St
, a line features geospatial data which stores the road network within Punggol Planning Area (ESRI shapefile)For our geospatial data, out output goal is tibbles sf object, so we’ll use st_read():
Compare against Hands-On Exercise 5 where we used readOGR() for a Spatial vector object output
mpsz_sf <- st_read(dsn = "data/shapefile",
layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\IS415\IS415_msty\_posts\2021-09-13-in-class-exercise-5\data\shapefile'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
As we can see, mpsz
’s projection is in svy21.
Here, we’re using read_rds() of the readr package rather than the readRDS() from base R - this is because we want our output to be a tibble object.
childcare <- read_rds("data/rds/childcare.rds")
CHAS <- read_rds("data/rds/CHAS.rds")
Note that there are some issues in the childcare dataframe: the Lat
and Lng
coordinate fields are in decimal degrees, and thus should be of numeric data type instead of the chr data type. The WGS84
referencing system is assumed.
CHAS_sf <- st_as_sf(CHAS,
coords = c("X_COORDINATE",
"Y_COORDINATE"),
crs=3414)
Note: st_as_sf accepts coordinates in character data type. Personally, I believe it to be better practice to convert to numeric data type first, but as long as you check the data beforehand and you get no error messages, then there shouldn’t be a significant difference
# if you'd like, we can convert lat and lng to numeric data type first
childcare$Lat <- as.numeric(childcare$Lat)
childcare$Lng <- as.numeric(childcare$Lng)
# transform in CRS WGS84 (code 4326) first, due to lat+lng decimal degrees
childcare_sf <- st_as_sf(childcare,
coords = c("Lng",
"Lat"),
crs=4326) %>%
# afterwards, transform to CRS SVY21 (code 3414)
st_transform(crs=3414)
At this stage, it’s good practice to do a plot to show the location of the data points. Remember to use the _sf versions - the originals are aspatial! Let’s have an interactive map :
tmap_mode("view")
tm_shape(childcare_sf) +
tm_dots(alpha=0.4, #affects transparency of points
col="blue",
size=0.05) +
tm_shape(CHAS_sf) +
tm_dots(alpha=0.4,
col="red",
size=0.05)
childcare <- as_Spatial(childcare_sf)
CHAS <- as_Spatial(CHAS_sf)
mpsz <- as_Spatial(mpsz_sf)
childcare_sp <- as(childcare, "SpatialPoints")
CHAS_sp <- as(CHAS,"SpatialPoints")
mpsz_sp <- as(mpsz,"SpatialPolygons")
Note the difference between childcare
and childcare_sp
- one is a Spatial* data frame, while the other is a Spatial* object:
childcare_ppp <- as(childcare_sp, "ppp")
CHAS_ppp <- as(CHAS_sp, "ppp")
Note: this uses the as.ppp() of maptools package - make sure it’s installed!
childcare_ppp_jit <- rjitter(childcare_ppp,
retry=TRUE,
nsim=1, #number of simulations
drop=TRUE)
CHAS_ppp_jit <- rjitter(CHAS_ppp,
retry=TRUE,
nsim=1, #number of simulations
drop=TRUE)
#check for duplicates
any(duplicated(childcare_ppp_jit))
[1] FALSE
any(duplicated(CHAS_ppp_jit))
[1] FALSE
Note: This one requires the spatstat package. Also, _ppp_jit is neither simple features nor sp - it’s ppp format, so tmap won’t understand it. You’ll need to convert it back to simple features if you want to plot it.
pg <- mpsz[mpsz@data$PLN_AREA_N=="PUNGGOL",]
Note: for this particular function - remember that there’s a comma behind it! Without it, it won’t run :^(
pg_sp <- as(pg, "SpatialPolygons")
An owin object is requied by spatstat if you want to define irregular windows.
pg_owin <- as(pg_sp, "owin")
childcare_pg <- childcare_ppp_jit[pg_owin]
CHAS_pg <- CHAS_ppp_jit[pg_owin]
Let’s take a quick look and visualise this:
plot(childcare_pg)
Note that it’s showing data points within our study area AKA Punggol. If you plot the jitter-only version, you’ll get all the points without any boundary:
plot(childcare_ppp_jit)
L_childcare <- envelope(childcare_pg,
Lest,
nsim=99,
rank=1, # go through 1 round of rankings
global=TRUE) # make global
Generating 99 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
Done.
L_CHAS <- envelope(CHAS_pg,
Lest,
nsim=99,
rank=1, # go through 1 round of rankings
global=TRUE) # make global
Generating 99 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99.
Done.
title <- "Pairwise Distance: L function"
Lcsr_df <- as.data.frame(L_childcare)
colour=c("#0D657D","#ee770d","#D3D3D3")
csr_plot <- ggplot(Lcsr_df, aes(r, obs-r))+
# plot observed value
geom_line(colour=c("#4d4d4d"))+
geom_line(aes(r,theo-r), colour="red", linetype = "dashed")+
# plot simulation envelopes
geom_ribbon(aes(ymin=lo-r,ymax=hi-r),alpha=0.1, colour=c("#91bfdb")) +
xlab("Distance r (m)") +
ylab("L(r)-r") +
geom_rug(data=Lcsr_df[Lcsr_df$obs > Lcsr_df$hi,], sides="b", colour=colour[1]) +
geom_rug(data=Lcsr_df[Lcsr_df$obs < Lcsr_df$lo,], sides="b", colour=colour[2]) +
geom_rug(data=Lcsr_df[Lcsr_df$obs >= Lcsr_df$lo & Lcsr_df$obs <= Lcsr_df$hi,], sides="b", color=colour[3]) +
theme_tufte()+
ggtitle(title)
text1<-"Significant clustering"
text2<-"Significant segregation"
text3<-"Not significant clustering/segregation"
# the below conditional statement is required to ensure that the labels (text1/2/3) are assigned to the correct traces
if (nrow(Lcsr_df[Lcsr_df$obs > Lcsr_df$hi,])==0){
if (nrow(Lcsr_df[Lcsr_df$obs < Lcsr_df$lo,])==0){
ggplotly(csr_plot, dynamicTicks=T) %>%
style(text = text3, traces = 4) %>%
rangeslider()
}else if (nrow(Lcsr_df[Lcsr_df$obs >= Lcsr_df$lo & Lcsr_df$obs <= Lcsr_df$hi,])==0){
ggplotly(csr_plot, dynamicTicks=T) %>%
style(text = text2, traces = 4) %>%
rangeslider()
}else {
ggplotly(csr_plot, dynamicTicks=T) %>%
style(text = text2, traces = 4) %>%
style(text = text3, traces = 5) %>%
rangeslider()
}
} else if (nrow(Lcsr_df[Lcsr_df$obs < Lcsr_df$lo,])==0){
if (nrow(Lcsr_df[Lcsr_df$obs >= Lcsr_df$lo & Lcsr_df$obs <= Lcsr_df$hi,])==0){
ggplotly(csr_plot, dynamicTicks=T) %>%
style(text = text1, traces = 4) %>%
rangeslider()
} else{
ggplotly(csr_plot, dynamicTicks=T) %>%
style(text = text1, traces = 4) %>%
style(text = text3, traces = 5) %>%
rangeslider()
}
} else{
ggplotly(csr_plot, dynamicTicks=T) %>%
style(text = text1, traces = 4) %>%
style(text = text2, traces = 5) %>%
style(text = text3, traces = 6) %>%
rangeslider()
}