Take-Home Exercise 2: Spatial Point Patterns Analysis of Airbnb Listings in Singapore

Take-Home Exercise R

This analysis aims to investigate the distribution of Airbnb listings and how location factors affect it.

Author

Affiliation

Megan Sim

Published

Sept. 18, 2021

DOI

1.0 Overview

1.1 Background

The email that started Airbnb. Source

From a couple of air mattresses and a complimentary breakfast at $80/night to a billion-dollar international business, Airbnb stands as one of the greatest success stories of the sharing economy. Their peer-to-peer model of short-term home-sharing have found a foothold worldwide: from Washington DC to New Delhi, from Shanghai to Havana - convenient locations at low cost have garnered great support from hosts and travellers alike.

And how has Singapore accommodated this sector of the sharing revolution?

Turns out: it hasn’t. Short-term rentals of less than three months remain illegal in Singapore - 6 months if you’re renting out a HDB. In addition, there’s an occupancy limit for those looking to rent out their HDBs: 6 persons for a 3-room flat, and 9 persons for a 4-room flat or larger. An extensive public consultation spanning 4 years came back with the conclusion: you might be alright with renting out your home, but your neighbours aren’t. Security concerns, a loss of privacy, causing disturbances within the neighbourhood or even damaging common facilities are among the top concerns of private homeowners - and this sentiment comes across in the strict regulatory framework for renting out short-term accommodations.

Airbnb’s setbacks aren’t solely with government restrictions: with 2020 came the COVID-19 global pandemic, which caused significant upheaval in the lives of many - arguably, the lives of every. As this paper points out, Airbnb hosts were ‘spoiled by high returns and the prospect of the ever-increasing growth of the Airbnb market’ pre-COVID, only to be hit with cancelled bookings and overwhelming financial pressure that lead to (temporary) exodus from the short-term rental market, or transition into long-term rentals.

In the context of Singapore, how do these two play into Airbnb accommodations?

1.2 Problem Statement

There are two parts to our analysis that we want to investigate, namely:

[A] how the distribution of Airbnb listings are affected by location factors (such as MRTs, tourist locations etc.)
[B] the impact of COVID-19 on Airbnb listings in Singapore (a comparison of the pre- and post-COVID-19 listings)

2.0 Setup

2.1 Packages Used

The R packages we’ll use for this analysis are:

sf: used for importing, managing, and processing geospatial data
tidyverse: a collection of packages for data science tasks
tmap: used for creating thematic maps, such as choropleth and bubble maps
spatstat: used for point pattern analysis
raster: reads, writes, manipulates, analyses and models gridded spatial data (i.e. raster-based geographical data)
maptools: a set of tools for manipulating geographic data
rgdal: provides bindings to the Geospatial Data Analysis Library (GDAL) and used for projectoin/transforamtion operations
kableExtra: an extension of kable, used for table customisation
plotly: used for creating interactive web graphics, and can be used in conjunction with ggplot2 with the ggplotly() function
ggthemes: an extension of ggplot2, with more advanced themes for plotting
onemapsgapi: used to query Singapore-specific spatial data, alongside additional functionalities. Recommended readings: Vignette and Documentation
devtools: used for installing any R packages which is not available in RCRAN. In this context, it’ll be used to download the xaringanExtra package for panelsets

In addition, the following tidyverse packages will be used:

readr for importing delimited files (.csv)
tidyr for manipulating and tidying data
dplyr for wrangling and transforming data
ggplot2 for visualising data

# initialise a list of required packages
packages = c('sf', 'tidyverse', 'tmap', 'spatstat', 'raster', 'maptools', 'rgdal',
             'kableExtra', 'plotly', 'ggthemes', 'onemapsgapi', 'devtools')

# for each package, check if installed and if not, install it
for (p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p,character.only = T)
}

devtools::install_github("gadenbuie/xaringanExtra")
library(xaringanExtra)

2.2 Datasets Used

# initialise a dataframe of our geospatial and aspatial dataset details
datasets <- data.frame(
  Type=c("Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         "Geospatial",
         
         "Aspatial",
         "Aspatial",
         "Aspatial",
         "Aspatial"),
  
  Name=c("Singapore National Boundary `CostalOutline`",
         "Master Plan 2014 Subzone Boundary (Web) `MP14_SUBZONE_WEB_PL`",
         "MRT & LRT Locations Aug 2021 `MRTLRTStnPtt`",
         "Bus Stop Locations Aug 2021 `BusStop`",
         "Taxi Stand Locations Aug 2021 `TaxiStop`",
         "Historic Sites",
         "Moneychanger Locations",
         "Monuments",
         "Museums",
         
         "Singapore Airbnb Listings June 2019",
         "Singapore Airbnb Listings June 2021",
         "Tourism",
         "Hotels"),
  
  Format=c(".shp", 
           ".shp", 
           ".shp", 
           ".shp", 
           ".shp", 
           ".kml",
           ".kml",
           ".kml",
           ".kml",
           
           ".csv",
           ".csv",
           ".csv",
           ".csv"),
  
  Source=c("[data.gov.sg](https://data.gov.sg/dataset/national-map-polygon)",
           "[data.gov.sg](https://data.gov.sg/dataset/master-plan-2014-subzone-boundary-web)",
           "[LTA Data Mall](https://datamall.lta.gov.sg/content/datamall/en/search_datasets.html?searchText=mrt)",
           "[LTA Data Mall](https://datamall.lta.gov.sg/content/datamall/en/search_datasets.html?searchText=bus%20stop)",
           "[LTA Data Mall](https://datamall.lta.gov.sg/content/datamall/en/search_datasets.html?searchText=taxi)",
           "[data.gov.sg](https://data.gov.sg/dataset/historic-sites)",
           "[data.gov.sg](https://data.gov.sg/dataset/locations-of-money-changer)",
           "[data.gov.sg](https://data.gov.sg/dataset/monuments)",
           "[data.gov.sg](https://data.gov.sg/dataset/museums)",
           
           "[Inside Airbnb](http://insideairbnb.com/get-the-data.html)",
           "[Inside Airbnb](http://insideairbnb.com/get-the-data.html)",
           "[OneMap API](https://www.onemap.gov.sg/docs/)",
           "[OneMap API](https://www.onemap.gov.sg/docs/)")
  )

# with reference to this guide on kableExtra:
# https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
# kable_material is the name of the kable theme
# 'hover' for to highlight row when hovering, 'scale_down' to adjust table to fit page width
library(knitr)
library(kableExtra)
kable(datasets, caption="Datasets Used") %>%
  kable_material("hover", latex_options="scale_down")

Table 1: Datasets Used
Type	Name	Format	Source
Geospatial	Singapore National Boundary `CostalOutline`	.shp	data.gov.sg
Geospatial	Master Plan 2014 Subzone Boundary (Web) `MP14_SUBZONE_WEB_PL`	.shp	data.gov.sg
Geospatial	MRT & LRT Locations Aug 2021 `MRTLRTStnPtt`	.shp	LTA Data Mall
Geospatial	Bus Stop Locations Aug 2021 `BusStop`	.shp	LTA Data Mall
Geospatial	Taxi Stand Locations Aug 2021 `TaxiStop`	.shp	LTA Data Mall
Geospatial	Historic Sites	.kml	data.gov.sg
Geospatial	Moneychanger Locations	.kml	data.gov.sg
Geospatial	Monuments	.kml	data.gov.sg
Geospatial	Museums	.kml	data.gov.sg
Aspatial	Singapore Airbnb Listings June 2019	.csv	Inside Airbnb
Aspatial	Singapore Airbnb Listings June 2021	.csv	Inside Airbnb
Aspatial	Tourism	.csv	OneMap API
Aspatial	Hotels	.csv	OneMap API

Each source links to the respective dataset source, not the generic website - you can download and follow along 👍

Due to authorisation issues with the OneMap API, I chose to download the datasets from either Data.gov.sg or LTA Data Mall instead; However, they are all available as themes on OneMap. The list of available themes from the OneMap API was referenced from this document

Justification for Supplementary Datasets

In order to supplement the rail network of MRTs and LRTs, I’ve included bus stop and taxi stand locations - both of which contribute to our public transport, which, as this paper points out, adds to the total tourist experience (Duval, 2007) and may influence tourist satisfaction with the destination (Thompson & Schofield, 2007).

To encourage tourists to use public transport, a joint effort between the Land Transport Authority (LTA) and the Singapore Tourism Board (STB) lead to the official launch of the ‘Singapore Tourist Pass’ and a Public Transport Guide for Tourists (The “Travel with Ease” Guide). The former allows for unlimited bus and train rides for S$8/day, making public transport both cost-effective and convenient for tourists - the latter presents tourists with useful information on the various modes of public transport and gives travel directions to various places of interest. This is one of the contributors to Singapore taking top spot in McKinsey’s 2018 “Cities with Best Public Tranportation” Survey.

As such, I have reason to believe accessibility to the public transport system is a contributing factor to the attractiveness of Airbnb listings - and as we know, public transport extends past just the MRTs and LRTs! Thus my inclusion of bus stops (bus services are included as part of the Singapore Tourist Pass) and taxi stands (taxis are part of public transport, just not mass transport - and taxis are the fastest and most direct method to get to specific sightseeing areas).

Retrieved from Science Direct, Using Historical Heritage as a Factor in Tourism Development

In addition, historical and cultural objects, such as museums or monuments, are one of the main factors of tourism. As this paper, Using Historical Heritage as a Factor in Tourism Development puts it: the cultural capacity of the region, expressed through its historical heritage, plays a huge role in the development of tourism - something I believe could be showcased through this analysis. Thus my inclusion of museums, monuments and historical sites - some of which might overlap with the existing ‘tourism’ aspatial dataset, but we might be able to better determine the relationships

The inclusion of moneychangers is self-explanatory: at any point of the tourist’s journey, unexpected incidents that require extra funds might crop up, necessitating either using their credit card (which is known to incur significant fees, and might not be accepted in certain parts of a country or even the whole country) or a trip to the moneychanger.

3.0 Data Wrangling: Geospatial Data

Here’s a lil refresher on the import methods:

3.1 Importing Geospatial Data

Note: we could use either st_read() or readOGR() to read our geospatial data, but I prefer to read it in as a simple features dataframe first and then use as_Spatial() to convert it to a Spatial* object when necessary. This way, I can perform visualisations on both the simple features dataframe and Spatial* objects!

In addition, since we have .kml files - recall what we learned in our very first exercise, Hands-On Exercise 02:

# reads in geospatial data and stores into respective dataframes
sg_sf <- st_read(dsn = "data/geospatial", layer="CostalOutline")

Reading layer `CostalOutline' from data source 
  `C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 60 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 2663.926 ymin: 16357.98 xmax: 56047.79 ymax: 50244.03
Projected CRS: SVY21

mpsz_sf <- st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")

Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

rail_network_sf <- st_read(dsn="data/geospatial", layer="MRTLRTStnPtt")

Reading layer `MRTLRTStnPtt' from data source 
  `C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 171 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21

# self-sourced data for further visualisations
bus_sf <- st_read(dsn="data/geospatial/selfsourced", layer="BusStop")

Reading layer `BusStop' from data source 
  `C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced' 
  using driver `ESRI Shapefile'
Simple feature collection with 5156 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4427.938 ymin: 26482.1 xmax: 48282.5 ymax: 52983.82
Projected CRS: SVY21

taxi_sf <- st_read(dsn="data/geospatial/selfsourced", layer="TaxiStop")

Reading layer `TaxiStop' from data source 
  `C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced' 
  using driver `ESRI Shapefile'
Simple feature collection with 354 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6151.732 ymin: 27461.92 xmax: 45468.51 ymax: 48948.45
Projected CRS: SVY21

museums_sf <- st_read("data/geospatial/selfsourced/museums-kml.kml")

Reading layer `MUSEUM' from data source 
  `C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced\museums-kml.kml' 
  using driver `KML'
Simple feature collection with 54 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6768 ymin: 1.248349 xmax: 104.0165 ymax: 1.383068
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84

monuments_sf <- st_read("data/geospatial/selfsourced/monuments-kml.kml")

Reading layer `MONUMENTS' from data source 
  `C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced\monuments-kml.kml' 
  using driver `KML'
Simple feature collection with 72 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6831 ymin: 1.264685 xmax: 103.9696 ymax: 1.447378
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84

moneychangers_sf <- st_read("data/geospatial/selfsourced/locations-of-money-changer-kml.kml")

Reading layer `MONEYCHANGER' from data source 
  `C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced\locations-of-money-changer-kml.kml' 
  using driver `KML'
Simple feature collection with 313 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6781 ymin: 1.24506 xmax: 103.9897 ymax: 1.448227
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84

historic_sites_sf <- st_read("data/geospatial/selfsourced/historic-sites-kml.kml")

Reading layer `HISTORICSITES' from data source 
  `C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced\historic-sites-kml.kml' 
  using driver `KML'
Simple feature collection with 99 features and 2 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 103.6993 ymin: 1.252663 xmax: 103.9906 ymax: 1.448855
z_range:       zmin: 0 zmax: 0
Geodetic CRS:  WGS 84

Overall, most of the projected CRS are the ‘Singapore Projected Coordinate system’, SVY21 (ESPG Code 3414), whichi s appropriate for our analysis (which is Singapore-centric). However, the museums, monuments, moneychangers and historic_sites are using the ‘World Geodetic System 1984’, WGS84. We’ll address this and check on their CRS with st_crs() later on in Section 3.3.

Also, notice that museums, monuments, moneychangers and historic_sites have their dimensions listed as ‘XYZ’: it has a z-dimension, though as we can see from the z_range, both zmin and zmax are at 0. As it is irrelevant to our analysis, we’ll drop this with st_zm() in our pre-processing.

3.2 Data Pre-Processing

Before we even start visualising our data, we have to first check for two things: invalid geometries and missing values, which could impact future calculations and representations if not addressed. In addition, we have to drop the z-dimension from some of our dataframes.

Reference was taken from the senior sample submissions for the code for this section, with credit to Xiao Rong Wong’s Geographic Analysis of the Supply & Demand of Childcare Services in Singapore

3.2.1 Dropping Z-Dimension

As we noticed in the section above, certain dataframes have a Z-dimension. We’ll take care of with st_zm(), a function that drops Z (or M) dimensions from feature geometries and appropriate reset the classes.

# drops the Z-dimension from our dataframes
# due to the length of the output, I've opted to hide the results 
# reference for manipulating output messages: https://yihui.org/knitr/demo/output/
museums_sf <- st_zm(museums_sf)
monuments_sf <- st_zm(monuments_sf)
moneychangers_sf <- st_zm(moneychangers_sf)
historic_sites_sf <- st_zm(historic_sites_sf)

# once again, due to the length of output, I've opted to leave this as a non-evaluated line of code
# however, I've included a screenshot of the first portion of the output!
museums

3.2.2 Invalid Geometries

Let us first check for invalid geometries:

# function breakdown:
# the st_is_valid function checks whether a geometry is valid
# which returns the indices of certain values based on logical conditions
# length returns the length of data objects

# checks for the number of geometries that are NOT valid
length(which(st_is_valid(sg_sf) == FALSE))

[1] 1

length(which(st_is_valid(mpsz_sf) == FALSE))

[1] 9

length(which(st_is_valid(rail_network_sf) == FALSE))

[1] 0

length(which(st_is_valid(bus_sf) == FALSE))

[1] 0

length(which(st_is_valid(taxi_sf) == FALSE))

[1] 0

length(which(st_is_valid(museums_sf) == FALSE))

[1] 0

length(which(st_is_valid(monuments_sf) == FALSE))

[1] 0

length(which(st_is_valid(moneychangers_sf) == FALSE))

[1] 0

length(which(st_is_valid(historic_sites_sf) == FALSE))

[1] 0

# Alternative Method
# test <- st_is_valid(sg_sf,reason=TRUE)
# length(which(test!= "Valid Geometry"))
# credit to Rajiv Abraham Xavier https://rpubs.com/rax/Take_Home_Ex01

Note: even if you didn’t drop the Z co-ordinate for this section, our st_is_valid() function will automatically do it for you, like so:

As we can see from the output, sg has 1 invalid geometry while mpsz has 9 invalid geometries. With reference to this article on checking and creating validity, let’s address them and check again:

# st_make_valid takes in an invalid geometry and outputs a valid one with the lwgeom_makevalid method
sg_sf <- st_make_valid(sg_sf)
length(which(st_is_valid(sg_sf) == FALSE))

[1] 0

mpsz_sf <- st_make_valid(mpsz_sf)
length(which(st_is_valid(mpsz_sf) == FALSE))

[1] 0

Success! 🎉

3.2.3 Missing Values

Now, let us check for missing values:

# the rowSums(is.na(sg_sf))!=0 checks every row if there are NA values, returning TRUE or FALSE
# the sg_sf [] 'wrapper' prints said rows that contain NA values
sg_sf[rowSums(is.na(sg_sf))!=0,]

Simple feature collection with 0 features and 4 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] GDO_GID    MSLINK     MAPID      COSTAL_NAM geometry  
<0 rows> (or 0-length row.names)

mpsz_sf[rowSums(is.na(mpsz_sf))!=0,]

Simple feature collection with 0 features and 15 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
 [1] OBJECTID   SUBZONE_NO SUBZONE_N  SUBZONE_C  CA_IND     PLN_AREA_N
 [7] PLN_AREA_C REGION_N   REGION_C   INC_CRC    FMEL_UPD_D X_ADDR    
[13] Y_ADDR     SHAPE_Leng SHAPE_Area geometry  
<0 rows> (or 0-length row.names)

rail_network_sf[rowSums(is.na(rail_network_sf))!=0,]

Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO   geometry
<0 rows> (or 0-length row.names)

bus_sf[rowSums(is.na(bus_sf))!=0,]

Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 22616.75 ymin: 47793.68 xmax: 22616.75 ymax: 47793.68
Projected CRS: SVY21
    BUS_STOP_N BUS_ROOF_N LOC_DESC                  geometry
264      47201        UNK     <NA> POINT (22616.75 47793.68)

taxi_sf[rowSums(is.na(taxi_sf))!=0,]

Simple feature collection with 354 features and 2 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6151.732 ymin: 27461.92 xmax: 45468.51 ymax: 48948.45
Projected CRS: SVY21
First 10 features:
   TYPE_CD              TYPE_CD_DE                  geometry
1     <NA>               TAXI STOP POINT (30571.45 31254.88)
2     <NA>               TAXI STOP POINT (29420.41 30957.16)
3     <NA> TAXI PICK UP / DROP OFF POINT (30600.66 30530.26)
4     <NA>              TAXI STAND POINT (30952.44 30436.43)
5     <NA>              TAXI STAND POINT (30819.86 30858.43)
6     <NA>              TAXI STAND POINT (30914.61 30817.98)
7     <NA>              TAXI STAND POINT (30732.65 30446.24)
8     <NA>              TAXI STAND POINT (31035.63 30580.66)
9     <NA>              TAXI STAND POINT (30556.68 30247.45)
10    <NA>              TAXI STAND  POINT (30629.45 30577.2)

museums_sf[rowSums(is.na(museums_sf))!=0,]

Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
[1] Name        Description geometry   
<0 rows> (or 0-length row.names)

monuments_sf[rowSums(is.na(monuments_sf))!=0,]

Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
[1] Name        Description geometry   
<0 rows> (or 0-length row.names)

moneychangers_sf[rowSums(is.na(moneychangers_sf))!=0,]

Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
[1] Name        Description geometry   
<0 rows> (or 0-length row.names)

historic_sites_sf[rowSums(is.na(historic_sites_sf))!=0,]

Simple feature collection with 0 features and 2 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS:  WGS 84
[1] Name        Description geometry   
<0 rows> (or 0-length row.names)

There doesn’t seem to be any missing values for any of our provided geospatial datasets ✨ But our self-sourced ones have some missing values, so let’s address them. After a bit of EDA and testing, I’ve realised that all the observations for TYPE_CD in the taxi dataframe are NA: likely because there is a separate column, TYPE_CD_DE, that describes the taxi stand type (i.e. overlap of features). As such, we should drop that column first before re-testing for NA values.

# using select(), removes the TYPE_CD column from taxi_sf dataframe
# and assigns this new dataframe back to taxi
# reference: https://www.statology.org/remove-columns-in-r/
taxi_sf <- taxi_sf %>% select(-TYPE_CD)

# checking for NA values again
taxi_sf[rowSums(is.na(taxi_sf))!=0,]

Simple feature collection with 2 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6151.732 ymin: 35873.48 xmax: 6199.635 ymax: 35938.87
Projected CRS: SVY21
    TYPE_CD_DE                  geometry
325       <NA> POINT (6199.635 35938.87)
342       <NA> POINT (6151.732 35873.48)

Now, let’s move on to removing the NA observations:

# removes rows that have an NA value in the respective NA columns
# which is LOC_DESC and TYPE_CD_DE for bus_sf and taxi_sf respectively
bus_sf <- na.omit(bus_sf,c("LOC_DESC"))
taxi_sf <- na.omit(taxi_sf,c("TYPE_CD_DE"))

And let’s check for missing values one last time, just to be sure:

# the rowSums(is.na(bus))!=0 checks every row if there are NA values, returning TRUE or FALSE
# the bus 'wrapper' prints said rows that contain NA values
bus_sf[rowSums(is.na(bus_sf))!=0,]

Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC   geometry  
<0 rows> (or 0-length row.names)

taxi_sf[rowSums(is.na(taxi_sf))!=0,]

Simple feature collection with 0 features and 1 field
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] TYPE_CD_DE geometry  
<0 rows> (or 0-length row.names)

Alright, our geospatial data pre-processing is done! 🥳

3.3 Verifying + Transforming Coordinate System

When we imported the data, we made a mental note to verify the projected CRS - and we’ll do that now!

# using st_crs() function to check on the CRS and ESPG Codes
st_crs(sg_sf)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(mpsz_sf)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(rail_network_sf)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["SVY21[WGS84]",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(bus_sf)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(taxi_sf)

Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

st_crs(museums_sf)

Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

st_crs(monuments_sf)

Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

st_crs(moneychangers_sf)

Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

st_crs(historic_sites_sf)

Coordinate Reference System:
  User input: WGS 84 
  wkt:
GEOGCRS["WGS 84",
    DATUM["World Geodetic System 1984",
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4326]]

Hmm. That’s not right - our projected CRS should be SVY21 (ESPG Code 3414), but for our given data, the current ESPG Codes are 9001. In addition, some of our self-sourced datasets are in WG84 (ESPG Code 4326) as well. Let’s assign the correct ESPG Codes:

# with st_set_crs(), we can assign the appropriate ESPG Code
sg_sf <- st_set_crs(sg_sf, 3414)
mpsz_sf <- st_set_crs(mpsz_sf, 3414)
rail_network_sf <- st_set_crs(rail_network_sf, 3414)

bus_sf <- st_set_crs(bus_sf, 3414)
taxi_sf <- st_set_crs(taxi_sf, 3414)

# with st_transform(), we can change from one CRS to another
museums_sf <- st_transform(museums_sf, crs=3414)
monuments_sf <- st_transform(monuments_sf, crs=3414)
moneychangers_sf <- st_transform(moneychangers_sf, crs=3414)
historic_sites_sf <- st_transform(historic_sites_sf, crs=3414)

And now, let’s check if the CRS has been properly assigned:

# using st_crs() function to check on the CRS and ESPG Codes
st_crs(sg_sf)

Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

st_crs(mpsz_sf)

Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

st_crs(rail_network_sf)

Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

st_crs(bus_sf)

Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

st_crs(taxi_sf)

Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

st_crs(museums_sf)

Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

st_crs(monuments_sf)

Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

st_crs(moneychangers_sf)

Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

st_crs(historic_sites_sf)

Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

Whew, that’s a long output - but as we can see, it’s all good! 💡

3.4 Initial Visualisation

Now that we’ve finished our standard pre-processing, let’s try visualising our data:

# plots the geometry only - these are the 'base' maps
# alternatively, we can use plot(sg$geometry)
plot(st_geometry(sg_sf))

plot(st_geometry(mpsz_sf))

The main difference between sg and mpsz is that the former is a nationwide map, while the latter shows the subzones. Whichever we use depends on the scale of analysis: we might use sg for a general overview, while we’ll tap on the subzone divisions in mpsz if we want to look into specific subzones.

# switching to interactive map for better visualisation and to explore specific areas if needed
tmap_mode("view")
tm_shape(rail_network_sf) +
  tm_dots(col="purple", size=0.05)

# return tmap mode to plot for future visualisations
tmap_mode("plot")

These are the MRT/LRT train stations. There are 3 distinct clusters - the two in the upper area, (West and Northeast regions specifically) are due to the LRT lines, while the cluster in the Central region is due interconnections of different lines in the CBD and city areas for increased connectivity

This also matches up to the existing railway network map:

Retrieved from MRT.SG. Original work by Wikipedia user Seloloving.

Let’s also visualise our self-sourced Bus Stops & Taxi Stands:

tmap_mode("plot")
tm_shape(mpsz_sf) +
  tm_borders(alpha = 0.5) +
  tmap_options(check.and.fix = TRUE) +
tm_shape(bus_sf) +
  tm_dots(col="red", size=0.05) +
  tm_layout(main.title = "Bus Stops",
          main.title.position = "center",
          main.title.size = 1.2,
          frame = TRUE)

tmap_mode("plot")
tm_shape(mpsz_sf) +
  tm_borders(alpha = 0.5) +
  tmap_options(check.and.fix = TRUE) +
tm_shape(taxi_sf) +
  tm_dots(col="blue", size=0.05) +
  tm_layout(main.title = "Taxi Stands",
          main.title.position = "center",
          main.title.size = 1.2,
          frame = TRUE)

These will be good supplements to our railway_network - we can look at how the transport system and general connectivity of a place could play a role in the distribution of airbnb listings, especially since many tourists depend on the public transport system to get around a country due to its accessibility and cost, characteristics that are amplified in Singapore’s limited 728.6 km².

Let’s also take a look at the historical/cultural points of interest, namely museums, monuments, moneychangers and historical sites. As mentioned in 2.2’s Justification of Datasets used, tourists visit a country for a litany of reasons, but the top motivations are to understand the history and culture of the country, while moneychangers are crucial to their funds.

For this interactive graph, ‘warm’ colours (red, orange, yellow) represents places of historical/cultural interest (museums, monuments and historical sites) while ‘cool’ colours (blue) represents the moneychangers.

tmap_mode("view")
tm_shape(museums_sf) +
  tm_dots(alpha=0.4, #affects transparency of points
          col="red",
          size=0.05) +
tm_shape(monuments_sf) +
  tm_dots(alpha=0.4,
          col="orange",
          size=0.05) +
tm_shape(moneychangers_sf) +
  tm_dots(alpha=0.4,
          col="blue",
          size=0.05) +
tm_shape(historic_sites_sf) +
  tm_dots(alpha=0.4,
          col="yellow",
          size=0.05)

# return tmap mode to plot for future visualisations
tmap_mode("plot")

4.0 Data Wrangling: Aspatial Data

4.1 Importing Aspatial Data + EDA

Now that we have our geospatial data sorted out, let’s move on to our aspatial data! We’ll need to import the data first. You might remember that in our last take-home exercise, we discovered the existence of duplicate columns when performing EDA. It’s important to understand what we’re working with and to check for any discrepancies! As such, after importing, let’s take a look at our dataframes with glimpse():

# reads the the respective aspatial data and stores it as dataframes
# I've included show_col_types=FALSE so that there won't be a output of column types
# since in the next portion, we'll be using glimpse() to look at columns specifically
listings_2019 <- read_csv("data/aspatial/listings_30062019.csv", show_col_types=FALSE)
listings_2021 <- read_csv("data/aspatial/listings_29062021.csv", show_col_types=FALSE)
hotels <- read_csv("data/aspatial/hotels.csv", show_col_types=FALSE)
tourism <- read_csv("data/aspatial/tourism.csv", show_col_types=FALSE)

# gives an associated attribute information of the dataframe 
glimpse(listings_2019)

Rows: 8,293
Columns: 16
$ id                             <dbl> 49091, 50646, 56334, 71609, 7~
$ name                           <chr> "COZICOMFORT LONG TERM STAY R~
$ host_id                        <dbl> 266763, 227796, 266763, 36704~
$ host_name                      <chr> "Francesca", "Sujatha", "Fran~
$ neighbourhood_group            <chr> "North Region", "Central Regi~
$ neighbourhood                  <chr> "Woodlands", "Bukit Timah", "~
$ latitude                       <dbl> 1.44255, 1.33235, 1.44246, 1.~
$ longitude                      <dbl> 103.7958, 103.7852, 103.7967,~
$ room_type                      <chr> "Private room", "Private room~
$ price                          <dbl> 81, 80, 68, 200, 92, 102, 203~
$ minimum_nights                 <dbl> 180, 90, 6, 1, 1, 1, 1, 7, 30~
$ number_of_reviews              <dbl> 1, 18, 20, 12, 20, 35, 23, 15~
$ last_review                    <date> 2013-10-21, 2014-12-26, 2015~
$ reviews_per_month              <dbl> 0.01, 0.28, 0.21, 0.13, 0.21,~
$ calculated_host_listings_count <dbl> 2, 1, 2, 9, 9, 9, 9, 1, 4, 4,~
$ availability_365               <dbl> 365, 365, 365, 353, 353, 348,~

glimpse(listings_2021)

Rows: 4,238
Columns: 16
$ id                             <dbl> 49091, 50646, 56334, 71609, 7~
$ name                           <chr> "COZICOMFORT LONG TERM STAY R~
$ host_id                        <dbl> 266763, 227796, 266763, 36704~
$ host_name                      <chr> "Francesca", "Sujatha", "Fran~
$ neighbourhood_group            <chr> "North Region", "Central Regi~
$ neighbourhood                  <chr> "Woodlands", "Bukit Timah", "~
$ latitude                       <dbl> 1.44129, 1.33432, 1.44041, 1.~
$ longitude                      <dbl> 103.7957, 103.7852, 103.7967,~
$ room_type                      <chr> "Private room", "Private room~
$ price                          <dbl> 81, 80, 67, 177, 81, 81, 52, ~
$ minimum_nights                 <dbl> 180, 90, 6, 90, 90, 90, 14, 1~
$ number_of_reviews              <dbl> 1, 18, 20, 20, 24, 48, 20, 13~
$ last_review                    <date> 2013-10-21, 2014-07-08, 2015~
$ reviews_per_month              <dbl> 0.01, 0.22, 0.16, 0.29, 0.34,~
$ calculated_host_listings_count <dbl> 2, 1, 2, 4, 4, 4, 50, 50, 7, ~
$ availability_365               <dbl> 365, 365, 365, 365, 365, 365,~

glimpse(hotels)

Rows: 422
Columns: 9
$ NAME              <chr> "Jayleen Clarke Quay Hotel", "JEN Singapor~
$ ADDRESSPOSTALCODE <dbl> 59390, 238858, 249716, 399041, 238485, 399~
$ ADDRESSSTREETNAME <chr> "25 New Bridge Road", "277 Orchard Road , ~
$ HYPERLINK         <chr> "jayleenclarkequay@gmail.com", "singaporeo~
$ TOTALROOMS        <dbl> 20, 499, 565, 42, 81, 33, 17, 634, 56, 451~
$ KEEPERNAME        <chr> "James Federick Chong Kah Yean", "Kuok Oon~
$ Lat               <dbl> 1.288715, 1.300510, 1.304271, 1.311586, 1.~
$ Lng               <dbl> 103.8475, 103.8392, 103.8239, 103.8775, 10~
$ ICON_NAME         <chr> "hotel.gif", "hotel.gif", "hotel.gif", "ho~

glimpse(tourism)

Rows: 107
Columns: 17
$ NAME              <chr> "Chinatown Heritage Centre, Singapore", "T~
$ DESCRIPTION       <chr> "Experience how Singapore’s early Chinese ~
$ ADDRESSSTREETNAME <chr> "48 Pagoda Street", "158 Telok Ayer Street~
$ HYPERLINK         <chr> "http://www.singaporechinatown.com.sg/", "~
$ PHOTOURL          <chr> "www.yoursingapore.com/content/dam/desktop~
$ URL_PATH          <chr> "www.yoursingapore.com/en/see-do-singapore~
$ IMAGE_ALT_TEXT    <chr> "Learn more about local Chinese culture at~
$ PHOTOCREDITS      <chr> "Joel Chua DY", "Joel Chua DY", "Joel Chua~
$ LASTMODIFIED      <dttm> 2015-11-02 02:16:52, 2015-11-02 02:20:57,~
$ LATITUDE          <dbl> 1.283510, 1.280940, 1.310070, 1.277219, 1.~
$ LONGTITUDE        <dbl> 103.8444, 103.8476, 103.8994, 103.8373, 10~
$ META_DESCRIPTION  <chr> "At the Chinatown Heritage Centre, experie~
$ OPENING_HOURS     <chr> "Daily, 9am – 8pm,Last entry at 7pm.*China~
$ Lat               <dbl> 1.283510, 1.280940, 1.310070, 1.277219, 1.~
$ Lng               <dbl> 103.8444, 103.8476, 103.8994, 103.8373, 10~
$ ICON_NAME         <chr> "tourist_spot.gif", "tourist_spot.gif", "t~
$ ADDRESSPOSTALCODE <dbl> NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA,~

With this, we can observe that all our data have Longitude and Latitude features - though some of them use Latitude and Longtitude while others use Lat and Lng. Also, notice that for tourism, there are both Lat&Lng as well as Latitude&Longtitude - from the first few entries, they seem identical, but we’ll see which one we go with when checking for missing values.

This also indicates that their projected CRS is the World Geodetic System 1984 (WGS84), which is the most commonly used and favoured CRS. We’ll need to re-assign that to fit with our geospatial data, which is using SVY21.

4.2 Data Pre-Processing

4.2.1 Missing Values

Like with our geospatial data, we should check if there are missing values. We’re only concerned about the longitude and latitude specifically:

sum(is.na(listings_2019$latitude))

[1] 0

sum(is.na(listings_2019$longitude))

[1] 0

sum(is.na(listings_2021$latitude))

[1] 0

sum(is.na(listings_2021$longitude))

[1] 0

sum(is.na(hotels$Lat))

[1] 0

sum(is.na(hotels$Lng))

[1] 0

sum(is.na(tourism$Lat))

[1] 0

sum(is.na(tourism$Lng))

[1] 0

sum(is.na(tourism$LATITUDE))

[1] 1

sum(is.na(tourism$LONGTITUDE))

[1] 1

Note: PSST! If you’d like a way to show the number of NA values for columns with NA values only, I’d recommend checking this out.

As we can see, tourism’s Lat vs LATITUDE issue can now be resolved. There seems to be one missing value in the LATITUDE/LONGITUDE features as opposed to the Lat/Lng one (which has no missing values), so we’ll use Lat/Lng for tourism. Let’s also check the row with missing values:

tourism[(is.na(tourism$LONGTITUDE)),]

# A tibble: 1 x 17
  NAME   DESCRIPTION   ADDRESSSTREETNAME HYPERLINK PHOTOURL  URL_PATH 
  <chr>  <chr>         <chr>             <chr>     <chr>     <chr>    
1 Cruis~ Watch the Si~ <NA>              <NA>      www.your~ www.your~
# ... with 11 more variables: IMAGE_ALT_TEXT <chr>,
#   PHOTOCREDITS <chr>, LASTMODIFIED <dttm>, LATITUDE <dbl>,
#   LONGTITUDE <dbl>, META_DESCRIPTION <chr>, OPENING_HOURS <chr>,
#   Lat <dbl>, Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>

tourism[(is.na(tourism$LATITUDE)),]

# A tibble: 1 x 17
  NAME   DESCRIPTION   ADDRESSSTREETNAME HYPERLINK PHOTOURL  URL_PATH 
  <chr>  <chr>         <chr>             <chr>     <chr>     <chr>    
1 Cruis~ Watch the Si~ <NA>              <NA>      www.your~ www.your~
# ... with 11 more variables: IMAGE_ALT_TEXT <chr>,
#   PHOTOCREDITS <chr>, LASTMODIFIED <dttm>, LATITUDE <dbl>,
#   LONGTITUDE <dbl>, META_DESCRIPTION <chr>, OPENING_HOURS <chr>,
#   Lat <dbl>, Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>

As we can see, the observation with missing latitude/longitude is ‘Cruises from Singapore’, which makes sense considering that a cruise ship would be inside Singapore waters, or outside of Singapore’s geographical boundaries - both of which would be hard (if not nigh impossible) to represent on a map representing Singapore’s landmass. As such, let’s remove that observation:

tourism <- tourism[!(is.na(tourism$LONGTITUDE)), ]
tourism <- tourism[!(is.na(tourism$LATITUDE)), ]

And we’ll check again:

sum(is.na(tourism$LATITUDE))

[1] 0

sum(is.na(tourism$LONGTITUDE))

[1] 0

That’s done and dusted! 😄

4.2.2 Convert into sf objects + Transforming Coordinate System

Now, we have to transform our dataframes into sf objects, and then verify and transform our assigned CRS for our aspatial datasets. In fact, we already know that (just like some of our geospatial datasets above) these datasets are using WGS84 (ESPG Code 4326) on account of using Latitude and Longitude - so we can do these two things in one go:

# you can chose to reuse the variable name as-is, especially if you have no plans to revisit older variables
# however, i believe it to be better practice to rename variables so that we know what type of data we're working with at any point of time!\
# st_as_sf outputs a simple features data frame
listings_2019_sf <- st_as_sf(listings_2019, 
                          # our coordinates are the longitude and latitude
                          coords = c("longitude", 
                                     "latitude"), 
                          # the geographical features are in longitude & latitude, in decimals
                          # as such, WGS84 is the most appropriate coordinates system
                          crs = 4326) %>%
  #afterwards, we transform it to SVY21, our desired CRS
  st_transform(crs=3414)

listings_2021_sf <- st_as_sf(listings_2021, 
                          coords = c("longitude", 
                                     "latitude"), 
                          crs = 4326) %>%
  st_transform(crs=3414)

hotels_sf <- st_as_sf(hotels, 
                      coords = c("Lng", 
                                 "Lat"), 
                      crs = 4326) %>%
  st_transform(crs=3414)

tourism_sf <- st_as_sf(tourism, 
                       coords = c("Lng", 
                                  "Lat"), 
                       crs = 4326) %>%
  st_transform(crs=3414)

5.0 Combined Data Wrangling: Geospatial & Aspatial Data

As we’re doing spatial points analysis, we’ll need to convert both our geospatial and aspaital data into the appropriate formats.

5.1 Converting sf data frames to sp’s Spatial* class

Firstly, we’ll need to convert the simple features data frames into Spatial* classes if we can to do SPPA on it:

mpsz <- as_Spatial(mpsz_sf)
sg <- as_Spatial(sg_sf)
rail_network <-as_Spatial(rail_network_sf)

bus <- as_Spatial(bus_sf)
taxi <- as_Spatial(taxi_sf)
museums <- as_Spatial(museums_sf)
monuments <- as_Spatial(monuments_sf)
moneychangers <- as_Spatial(moneychangers_sf)
historic_sites <- as_Spatial(historic_sites_sf)

listings_2019 <- as_Spatial(listings_2019_sf)
listings_2021 <- as_Spatial(listings_2021_sf)
hotels <- as_Spatial(hotels_sf)
tourism <- as_Spatial(tourism_sf)

Let’s check the newly-converted Spatial* classes!

mpsz

class       : SpatialPolygonsDataFrame 
features    : 323 
extent      : 2667.538, 56396.44, 15748.72, 50256.33  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 15
names       : OBJECTID, SUBZONE_NO, SUBZONE_N, SUBZONE_C, CA_IND, PLN_AREA_N, PLN_AREA_C,       REGION_N, REGION_C,          INC_CRC, FMEL_UPD_D,     X_ADDR,     Y_ADDR,    SHAPE_Leng,    SHAPE_Area 
min values  :        1,          1, ADMIRALTY,    AMSZ01,      N, ANG MO KIO,         AM, CENTRAL REGION,       CR, 00F5E30B5C9B7AD8,      16409,  5092.8949,  19579.069, 871.554887798, 39437.9352703 
max values  :      323,         17,    YUNNAN,    YSSZ09,      Y,     YISHUN,         YS,    WEST REGION,       WR, FFCCF172717C2EAF,      16409, 50424.7923, 49552.7904, 68083.9364708,  69748298.792

sg

class       : SpatialPolygonsDataFrame 
features    : 60 
extent      : 2663.926, 56047.79, 16357.98, 50244.03  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 4
names       : GDO_GID, MSLINK, MAPID,              COSTAL_NAM 
min values  :       1,      1,     0,             ISLAND LINK 
max values  :      60,     67,     0, SINGAPORE - MAIN ISLAND

listings_2019

class       : SpatialPointsDataFrame 
features    : 8293 
extent      : 7215.566, 44098.31, 25166.35, 49226.35  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 14
names       :       id,                                              name,   host_id,                host_name, neighbourhood_group, neighbourhood,       room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365 
min values  :    49091,                                                 -,     23666, (Email hidden by Airbnb),      Central Region,    Ang Mo Kio, Entire home/apt,     0,              1,                 0,       15656,              0.01,                              1,                0 
max values  : 36053005, ZR2- NEW! Sunny & Modern Apt 4 mins to Orchard Rd, 271165196,                   Zuzana,         West Region,        Yishun,     Shared room, 13999,           1000,               308,       18072,             12.09,                            277,              365

listings_2021

class       : SpatialPointsDataFrame 
features    : 4238 
extent      : 7406.989, 43337.89, 25330, 48391.55  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 14
names       :       id,                                              name,   host_id,        host_name, neighbourhood_group, neighbourhood,       room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365 
min values  :    49091, ! BEST ! LOCATION private room at Central Orchard,     23666, <U+4F73><U+4F73>,      Central Region,    Ang Mo Kio, Entire home/apt,     0,              1,                 0,       15456,              0.01,                              1,                0 
max values  : 50718590,                Zimmer nah der MRT (Downtown Line), 404826677,             Zuzu,         West Region,        Yishun,     Shared room, 10286,           1000,               370,       18806,             19.74,                            165,              365

hotels

class       : SpatialPointsDataFrame 
features    : 422 
extent      : 5939.241, 45334.18, 25379.44, 44562.4  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 7
names       :                              NAME, ADDRESSPOSTALCODE,                                 ADDRESSSTREETNAME,         HYPERLINK, TOTALROOMS,     KEEPERNAME, ICON_NAME 
min values  :                      30 BENCOOLEN,             18956,                                 1 Bayfront Avenue, 96ytlim@gmail.com,          4,  Adel Aramouni, hotel.gif 
max values  : YotelAir Singapore Changi Airport,            819666, 99 IRRAWADDY ROAD, # 22-00 ROYAL SQUARE AT NOVENA, zubair@dam.com.sg,       2561, Zhang YuanQing, hotel.gif

tourism

class       : SpatialPointsDataFrame 
features    : 106 
extent      : 11380.23, 43659.54, 22869.34, 47596.73  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 15
names       :                                NAME,                                                                                                                                                  DESCRIPTION,                  ADDRESSSTREETNAME,                            HYPERLINK,                                                                                                                             PHOTOURL,                                                                                          URL_PATH,                                                                                                          IMAGE_ALT_TEXT,                 PHOTOCREDITS,   LASTMODIFIED,  LATITUDE, LONGTITUDE,                                                                                             META_DESCRIPTION,                                                                                                                                                                           OPENING_HOURS,        ICON_NAME, ADDRESSPOSTALCODE 
min values  : Adventure Cove Waterpark™ Singapore, A feat of engineering, an architectural statement and a sheer aesthetic triumph, Marina Bay Sands<sup>®</sup> has upped the ante for buildings in Singapore.,                       1 Beach Road,                   http://acm.org.sg/,                                  /content/dam/desktop/global/see-do-singapore/architecture/hajjah-fatimah-mosque-carousel01-rect.jpg, www.yoursingapore.com/en/see-do-singapore/architecture/historical/capitol-building-singapore.html,               Adults and kids of all ages who are not even science buffs will have fun at the Singapore Science Centre., ©Darren Soh/National Gallery, 1427691447.648, 1.2230965,  103.68398, A tranquil patch of imperial China in the west of Singapore is pleasant respite from the bustle of the city.,                                                                                                                                                 50th storey Skybridge, Daily, 9am –10pm, tourist_spot.gif,                 0 
max values  :          Victoria Theatre Singapore,                                                    With so many attractions packed into this 15-km stretch of beaches, you’ll never run out of things to do., Seng Poh Road and Tiong Bahru Road, https://www.pub.gov.sg/marinabarrage, www.yoursingapore.com/content/dam/desktop/global/see-do-singapore/recreation-leisure/universal-studios-singapore-carousel01-rect.jpg,      www.yoursingapore.com/en/see-do-singapore/recreation-leisure/viewpoints/singapore-flyer.html, Whether you prefer water sports, rollerblading or cycling, find a myriad of things to do at East Coast Park, Singapore.,  Wildlife Reserves Singapore, 1446544541.364,   1.44672,  103.97403,                                     With the Henderson Waves bridge, form meets function to stunning effect., Visits are by appointment only.Visitors must sign up in advance for heritage tours which fall on:Monday, 2pm – 3pm,Tuesday, 6.30pm – 7.30pm,Thursday, 10am – 11am,Saturday, 11am – 12pm, tourist_spot.gif,                 0

rail_network

class       : SpatialPointsDataFrame 
features    : 171 
extent      : 6138.311, 45254.86, 27555.06, 47854.2  (xmin, xmax, ymin, ymax)
crs         : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs 
variables   : 3
names       : OBJECTID,              STN_NAME, STN_NO 
min values  :        1, ADMIRALTY MRT STATION,    BP1 
max values  :      199,    YISHUN MRT STATION,    TE8

Due to the sheer length of output of our self-sourced data, I’ve opted to to make code evluation FALSE for this code chunk. Screenshots of the first section of the output are included instead!

# geospatial data (self-sourced)
bus
taxi
museums
monuments
moneychangers
historic_sites

5.2 Converting from Spatial* classes to sp format

spatstat requires the analytical data to be in ppp object form, but since there is no way to directly convert a Spatial* classes into ppp object, we’ll need to convert the Spatial* classes into a generic Spatial object first, then convert the generic sp object into ppp object form.

# convert into respective sp (in our case, either polygons or points)
mpsz_sp <- as(mpsz, "SpatialPolygons")
sg_sp <- as(sg, "SpatialPolygons")
rail_network_sp <-as(rail_network, "SpatialPoints")

listings_2019_sp <- as(listings_2019, "SpatialPoints")
listings_2021_sp <- as(listings_2021, "SpatialPoints")
hotels_sp <- as(hotels, "SpatialPoints")
tourism_sp <- as(tourism, "SpatialPoints")

bus_sp <- as(bus, "SpatialPoints")
taxi_sp <- as(taxi, "SpatialPoints")
museums_sp <- as(museums, "SpatialPoints")
monuments_sp <- as(monuments, "SpatialPoints")
moneychangers_sp <- as(moneychangers, "SpatialPoints")
historic_sites_sp <- as(historic_sites, "SpatialPoints")

5.3 Converting from sp format to spatstat ppp format

Note that there is no way of coercing SpatialPolygons to ppp format - nor is there any need to. As such, we won’t be including our ‘base maps’, mpsz and sg.

# from sp object, convert into ppp format
listings_2019_ppp <- as(listings_2019_sp, "ppp")
listings_2021_ppp <- as(listings_2021_sp, "ppp")
rail_network_ppp <-as(rail_network_sp, "ppp")
hotels_ppp <- as(hotels_sp, "ppp")
tourism_ppp <- as(tourism_sp, "ppp")

bus_ppp <- as(bus_sp, "ppp")
taxi_ppp <- as(taxi_sp, "ppp")
museums_ppp <- as(museums_sp, "ppp")
monuments_ppp <- as(monuments_sp, "ppp")
moneychangers_ppp <- as(moneychangers_sp, "ppp")
historic_sites_ppp <- as(historic_sites_sp, "ppp")

Now, let’s check the newly-converted ppp objects with summary():

# from sp object, convert into ppp format
summary(rail_network_ppp)

Planar point pattern:  171 points
Average intensity 2.153565e-07 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [6138.31, 45254.86] x [27555.06, 47854.2] units
                    (39120 x 20300 units)
Window area = 794032000 square units

summary(listings_2019_ppp)

Planar point pattern:  8293 points
Average intensity 9.345289e-06 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [7215.57, 44098.31] x [25166.35, 49226.35] units
                    (36880 x 24060 units)
Window area = 887399000 square units

summary(listings_2021_ppp)

Planar point pattern:  4238 points
Average intensity 5.114513e-06 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [7406.99, 43337.89] x [25330, 48391.55] units
                    (35930 x 23060 units)
Window area = 828622000 square units

summary(hotels_ppp)

Planar point pattern:  422 points
Average intensity 5.58414e-07 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [5939.24, 45334.18] x [25379.44, 44562.4] units
                    (39390 x 19180 units)
Window area = 755712000 square units

summary(tourism_ppp)

Planar point pattern:  106 points
Average intensity 1.328016e-07 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [11380.23, 43659.54] x [22869.34, 47596.73] units
                    (32280 x 24730 units)
Window area = 798183000 square units

# self-sourced data for further visualisations
summary(bus_ppp)

Planar point pattern:  5155 points
Average intensity 4.435473e-06 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [4427.94, 48282.5] x [26482.1, 52983.82] units
                    (43850 x 26500 units)
Window area = 1162220000 square units

summary(taxi_ppp)

Planar point pattern:  352 points
Average intensity 4.198058e-07 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [6444.85, 45468.51] x [27461.92, 48948.45] units
                    (39020 x 21490 units)
Window area = 838483000 square units

summary(museums_ppp)

Planar point pattern:  54 points
Average intensity 9.588485e-08 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [10582.71, 48388.12] x [25661.57, 40558.26] units
                    (37810 x 14900 units)
Window area = 563176000 square units

summary(monuments_ppp)

Planar point pattern:  72 points
Average intensity 1.11782e-07 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [11285, 43169.72] x [27467.96, 47669.2] units
                    (31880 x 20200 units)
Window area = 644111000 square units

summary(moneychangers_ppp)

Planar point pattern:  313 points
Average intensity 4.017631e-07 points per square unit

*Pattern contains duplicated points*

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [10725.31, 45404.24] x [25297.94, 47763.05] units
                    (34680 x 22470 units)
Window area = 779066000 square units

summary(historic_sites_ppp)

Planar point pattern:  99 points
Average intensity 1.407695e-07 points per square unit

Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units

Window: rectangle = [13080.65, 45498.71] x [26138.63, 47832.64] units
                    (32420 x 21690 units)
Window area = 703278000 square units

From our output, we can summarise:

ppp objects with duplicated points: listings_2019_ppp, listings_2021_ppp, hotels_ppp, tourism_ppp, taxi_ppp, moneychangers_ppp
ppp objects with no duplicated points: railway_network_ppp, bus_ppp, museums_ppp, monuments_ppp, historic_sites_ppp

5.4 Handling Duplicated Points + Jittering

Though we’ve found which ppp objects have duplicates, it’s always best to double-check and verify. We can use duplicated() for this:

# looks if there is any instance of duplicated points in the ppp object
# returns TRUE if there is even a sinngle set of duplicated points

# should be TRUE i.e. have duplicated
any(duplicated(listings_2019_ppp))

[1] TRUE

any(duplicated(listings_2021_ppp))

[1] TRUE

any(duplicated(hotels_ppp))

[1] TRUE

any(duplicated(tourism_ppp))

[1] TRUE

any(duplicated(taxi_ppp))

[1] TRUE

any(duplicated(moneychangers_ppp))

[1] TRUE

# should be FALSE i.e. no duplicates
any(duplicated(rail_network_ppp))

[1] FALSE

any(duplicated(bus_ppp))

[1] FALSE

any(duplicated(museums_ppp))

[1] FALSE

any(duplicated(monuments_ppp))

[1] FALSE

any(duplicated(historic_sites_ppp))

[1] FALSE

# we can also use multiplicity() to count the number of co-incidence points
# to get the number of locations with duplicated events, use this:
# sum(multiplicity(rail_network_ppp)>1)

Yup, this is congruent with what we discovered earlier. Our first instinct might be to delete said duplicated points - but deleting them, we might be ignoring repeat occurrences, which could be valuable data + insights. However, never fear - there’s a better method to handle these duplicated points called jittering, where we add a small perturbation to the duplicate points so that they do not occupy the exact same space.

listings_2019_ppp_jit <- rjitter(listings_2019_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)
listings_2021_ppp_jit <- rjitter(listings_2021_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)
hotels_ppp_jit <- rjitter(hotels_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)
tourism_ppp_jit <- rjitter(tourism_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)
taxi_ppp_jit <- rjitter(taxi_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)
moneychangers_ppp_jit <- rjitter(moneychangers_ppp, 
                             retry=TRUE, 
                             nsim=1, 
                             drop=TRUE)

Let’s check for duplicates again:

any(duplicated(listings_2019_ppp_jit))

[1] FALSE

any(duplicated(listings_2021_ppp_jit))

[1] FALSE

any(duplicated(hotels_ppp_jit))

[1] FALSE

any(duplicated(tourism_ppp_jit))

[1] FALSE

any(duplicated(taxi_ppp_jit))

[1] FALSE

any(duplicated(moneychangers_ppp_jit))

[1] FALSE

Awesome! ✨

5.5 Introducing the owin object

Usually, when analysing spatial point patterns, we’ll confine our analysis within a certain geographical area - such as the Singapore boundary. In spatstat, an object called owin is specially designed to represent this polygonal region. This is also why I mentioned in 5.3 that there was no need (nor existing functions) to convert our ‘base maps’ into ppp - because they’re meant to be converted into owin instead.

5.5.1 Creating the owin object

# using sg_sp as it is the CostalOutline of Singapore, i.e. the whole island's boundary
sg_owin <- as(sg_sp, "owin")
plot(sg_owin)

5.5.2 Combining point events object and owin object

Now, we’ll extract the relevant events that are located within the Singapore:

rail_network_ppp_sg = rail_network_ppp[sg_owin]
listings_2019_ppp_sg = listings_2019_ppp_jit[sg_owin]
listings_2021_ppp_sg = listings_2021_ppp_jit[sg_owin]
hotels_ppp_sg = hotels_ppp_jit[sg_owin]
tourism_ppp_sg = tourism_ppp_jit[sg_owin]

bus_ppp_sg = bus_ppp[sg_owin]
taxi_ppp_sg = taxi_ppp_jit[sg_owin]
museums_ppp_sg = museums_ppp[sg_owin]
monuments_ppp_sg = monuments_ppp[sg_owin]
moneychangers_ppp_sg = moneychangers_ppp_jit[sg_owin]
historic_sites_ppp_sg = historic_sites_ppp[sg_owin]

5.5.3 Visualisation of ppp objects within Singapore boundaries

Now, let’s take a look at the visualisation:

# in the format of par(mfrow(c(rows,columns))) to display multiple plots at once
# reference: https://www.statmethods.net/advgraphs/layout.html
par(mfrow=c(1,2))
plot(listings_2019_ppp_sg)
plot(listings_2021_ppp_sg)

# One figure in row 1 and two figures in row 2
# reference: https://stackoverflow.com/questions/31319942/change-the-size-of-a-plot-when-plotting-multiple-plots-in-r
layout(matrix(c(1,1,2,3), nrow = 2, ncol = 2, byrow = TRUE))
plot(rail_network_ppp_sg)
plot(bus_ppp_sg)
plot(taxi_ppp_sg)

plot(tourism_ppp_sg)

par(mfrow=c(2,2))
plot(museums_ppp_sg)
plot(monuments_ppp_sg)
plot(moneychangers_ppp_sg)
plot(historic_sites_ppp_sg)

The graphs are a little cluttered at the moment, but we’ll slowly sort out the relationships in our analysis! Another thing to note - it seems that for most part, places of interest (tourism, hotels, moneychangers etc.) are centered around the Central Region. That’s worth looking into when we’re narrowing down areas for analysis!

6.0 [A] Airbnb Distribution in 2019: Exploratory Spatial Data Analysis

6.1 Rescaling to kilometers

As we learned from our Hands-On Ex 5B, we’ll need to use rescale() of the spatstat package, so as to convert meters (which SVY21 uses) into kilometers (our desired unit of measurement).

# essentially, we're telling the function 'new unit length is 1000'
# function will divide the old value by 1000 to obtain values expressed in kilometers
# while this section isn't using listings_2021 yet, we'll prep it with the rest of our data :^)
rail_network_ppp_sg_km <- rescale(rail_network_ppp_sg, 1000, 'km')
listings_2019_ppp_sg_km <- rescale(listings_2019_ppp_sg, 1000, 'km')
listings_2021_ppp_sg_km <- rescale(listings_2021_ppp_sg, 1000, 'km')
hotels_ppp_sg_km <- rescale(hotels_ppp_sg, 1000, 'km')
tourism_ppp_sg_km <- rescale(tourism_ppp_sg, 1000, 'km')

bus_ppp_sg_km <- rescale(bus_ppp_sg, 1000, 'km')
taxi_ppp_sg_km <- rescale(taxi_ppp_sg, 1000, 'km')
museums_ppp_sg_km <- rescale(museums_ppp_sg, 1000, 'km')
monuments_ppp_sg_km <- rescale(monuments_ppp_sg, 1000, 'km')
moneychangers_ppp_sg_km <- rescale(moneychangers_ppp_sg, 1000, 'km')
historic_sites_ppp_sg_km <- rescale(historic_sites_ppp_sg, 1000, 'km')

6.2 Computing kernel density estimation using automatic bandwidth selection method

As we went through in Hands-On Ex 04, we have the choice of using either bw.ppl or bw.diggle - the former is recommended for patterns comprised primarily of tight clusters, while the latter is good for detecting a single tight cluster in the midst of random noise. You can find out more (with illustrated examples!) in this mapping guide.

I’ve decided to go with bw.diggle for my analysis: my justification being that from what I see of our prior visualisations, there is a significant cluster in the Central region, while other regions have a relatively more widespread distribution of location factors - as such, I do not believe that applying an algorithm more suited for patterns “comprised primarily of tight clusters” would be appropriate for our data.

# automatic bandwidth method: bw.diggle
# smoothing kernel: gaussian (default)
kde_rail_network_sg_bw <- density(rail_network_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_listings_2019_sg_bw <- density(listings_2019_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_listings_2021_sg_bw <- density(listings_2021_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_hotels_sg_bw <- density(hotels_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_tourism_sg_bw <- density(tourism_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")

kde_bus_sg_bw <- density(bus_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_taxi_sg_bw <- density(taxi_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_museums_sg_bw <- density(museums_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_monuments_sg_bw <- density(monuments_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_moneychangers_sg_bw <- density(moneychangers_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_historic_sites_sg_bw <- density(historic_sites_ppp_sg_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")

6.3 Plotting Kernel Density Estimation

plot(kde_listings_2019_sg_bw)

# One figure in row 1 and two figures in row 2
# reference: https://stackoverflow.com/questions/31319942/change-the-size-of-a-plot-when-plotting-multiple-plots-in-r
layout(matrix(c(1,1,2,3), nrow = 2, ncol = 2, byrow = TRUE))
plot(kde_rail_network_sg_bw)
plot(kde_bus_sg_bw)
plot(kde_taxi_sg_bw)

plot(kde_tourism_sg_bw)

par(mfrow=c(2,2))
plot(kde_museums_sg_bw)
plot(kde_monuments_sg_bw)
plot(kde_moneychangers_sg_bw)
plot(kde_historic_sites_sg_bw)

6.4 Converting KDE output into grid object into RasterLayer object

Now, we have to convert our KDE outputs into RasterLayer objects. Since we can’t do that directly, we’ll need to convert them into a SpatialGridDataFrame first, then convert the SpatialGridDataFrame into RasterLayer objects:

# convert into a SpatialGridDataFrame
# then converts gridded output into raster
kde_listings_2019_sg_bw_raster <- kde_listings_2019_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

kde_listings_2021_sg_bw_raster <- kde_listings_2021_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

kde_rail_network_sg_bw_raster <- kde_rail_network_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

kde_hotels_sg_bw_raster <- kde_hotels_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

kde_tourism_sg_bw_raster <- kde_tourism_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

# self-sourced 
kde_bus_sg_bw_raster <- kde_bus_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()
kde_taxi_sg_bw_raster <- kde_taxi_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()
kde_museums_sg_bw_raster <- kde_museums_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()
kde_monuments_sg_bw_raster <- kde_monuments_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()
kde_moneychangers_sg_bw_raster <- kde_moneychangers_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()
kde_historic_sites_sg_bw_raster <- kde_historic_sites_sg_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

Now, let’s check the properties of one of our RasterLayers:

kde_listings_2019_sg_bw_raster

class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.4170614, 0.2647348  (x, y)
extent     : 2.663926, 56.04779, 16.35798, 50.24403  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : v 
values     : -2.144823e-13, 1766.134  (min, max)

Note that the crs property is NA - let’s do something about that. We’ll assign projection systems again:

# assigns CRS as ESPG Code 3414 aka the SVY21 projection system
# while also specifying the units as km, since we rescaled it in the earlier section
projection(kde_listings_2019_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_listings_2021_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_rail_network_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_hotels_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_tourism_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")

projection(kde_bus_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_taxi_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_museums_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_monuments_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_moneychangers_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_historic_sites_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")

6.5 Kernel Density Maps on OpenStreetMap

Making a density map function

Now, as the assignment requirements has specified, we should plot our kernel density maps on OpenStreetMap, aiming to describe the spatial patterns revealed as well as highlight the advantage of kernel density map over point map. Since we’ll be plotting a lot of kernel density maps, let’s create a function:

density_map <- function(raster_object, map_title) {
  tm_basemap("OpenStreetMap") +
tm_shape(raster_object) +
  tm_raster("v", alpha=0.65) + 
  tm_layout(legend.position = c("right", "bottom"), 
            legend.height = 0.5, 
            legend.width = 0.4,
            main.title = map_title,
            main.title.position = 'center',
            main.title.size = 1,
            frame = FALSE)
  }

Plotting Density map

Now, let’s plot!

# input: a raster object and the map title
listings_2019_density_map <- density_map(kde_listings_2019_sg_bw_raster, map_title = "June 2019 SG Airbnb Listings Density Map")
listings_2021_density_map <- density_map(kde_listings_2021_sg_bw_raster, map_title = "June 2021 SG Airbnb Listings Density Map")
rail_network_density_map <- density_map(kde_rail_network_sg_bw_raster, map_title = "SG Rail Network Density Map")
hotels_density_map <- density_map(kde_hotels_sg_bw_raster, map_title = "SG Hotels Density Map")
tourism_density_map <- density_map(kde_tourism_sg_bw_raster, map_title = "SG Tourist Attractions Density Map")

#self-sourced
bus_density_map <- density_map(kde_bus_sg_bw_raster, map_title = "SG Bus Stops Density Map")
taxi_density_map <- density_map(kde_taxi_sg_bw_raster, map_title = "SG Taxi Stands Density Map")
museums_density_map <- density_map(kde_museums_sg_bw_raster, map_title = "SG Museums Density Map")
monuments_density_map <- density_map(kde_monuments_sg_bw_raster, map_title = "SG Monuments Density Map")
moneychangers_density_map <- density_map(kde_moneychangers_sg_bw_raster, map_title = "SG Moneychangers Density Map")
historic_sites_density_map <- density_map(kde_historic_sites_sg_bw_raster, map_title = "SG Historic Sites Density Map")

listings_2019_density_map

hotels_density_map

tourism_density_map

rail_network_density_map

tmap_arrange(rail_network_density_map,
             bus_density_map,
             taxi_density_map)

tmap_arrange(museums_density_map,
             monuments_density_map,
             moneychangers_density_map,
             historic_sites_density_map)

6.6 Kernel Density Maps Analysis

As we can see from our maps above, the airbnb listings in 2019 seem to be heavily concentrated in the central region - a trend that is mirrored in the ‘hotels’ and ‘tourism’ maps. This pattern also seems to be present in our self-sourced maps - which reflects the real world, where Singapore’s Central/Downtown Area is a hotspot for various points of interest for tourists, and thus related facilities (hotels + moneychangers) would be present as well.

In addition, you might be wondering, “why use a kernel density map over the point maps?”

There are a few advantages that Kernel Density Map has over Point Map. Let’s compare our the point map for the rail network (MRT & LRT stations) against our just-plotted density map:

tmap_mode("plot")
tm_shape(mpsz_sf) +
  tm_borders(alpha = 0.5) +
  tmap_options(check.and.fix = TRUE) +
tm_shape(rail_network_sf) +
  tm_dots(col="red", size=0.05) +
  tm_layout(main.title = "Rail Network Point Map",
          main.title.position = "center",
          main.title.size = 1.2,
          frame = TRUE)

rail_network_density_map

With the kernel density map, denser areas with a heavier distribution of MRTs are easily spotted - such as the LRT-heavy areas as well as the Central region. This is because the kernel density z-estimate helps to smooth out the points in a given area - a much larger ‘smoothing effect’ than point density, which in turn is helped by the visualisation that kernel density maps have: its range for distributions.concentrations is clearly indicated by the gradient of colour - in my case, ranging from yellow to green for the different bands. Compare this against the individual points in a point map, which at most shows concentration of certain areas, but makes it hard for the viewer to comprehensively compare the distributions of different regions.

7.0 [A] Central Region Specific KDE Maps

7.1 Creating owin object

From our KDE maps above, we’ve learned that the Central Region has a high distribution of listings and various location factors. However, we can’t glean information about the region itself from our maps, so let’s narrow down on the area and get to making new ones!

Now, we should use our planning subzone dataset, specifying that the REGION_N should be CENTRAL REGION. We can also choose to narrow down on specific planning areas with PLN_AREA_N:

central = mpsz_sf[mpsz_sf$REGION_N=='CENTRAL REGION',]
central_owin <- central %>%
    as('Spatial') %>%
    as('SpatialPolygons') %>%
    as('owin')
plot(central_owin)

7.2 Combining point events object and owin object

Now, we’ll extract the relevant events that are located within the central region:

rail_network_ppp_central = rail_network_ppp[central_owin]
listings_2019_ppp_central = listings_2019_ppp_jit[central_owin]
listings_2021_ppp_central = listings_2021_ppp_jit[central_owin]
hotels_ppp_central = hotels_ppp_jit[central_owin]
tourism_ppp_central = tourism_ppp_jit[central_owin]

bus_ppp_central = bus_ppp[central_owin]
taxi_ppp_central = taxi_ppp_jit[central_owin]
museums_ppp_central = museums_ppp[central_owin]
monuments_ppp_central = monuments_ppp[central_owin]
moneychangers_ppp_central = moneychangers_ppp_jit[central_owin]
historic_sites_ppp_central = historic_sites_ppp[central_owin]

7.3 Rescaling to kilometers

Once again, we’ll need to convert meters (which SVY21 uses) into kilometers (our desired unit of measurement).

# essentially, we're telling the function 'new unit length is 1000'
# function will divide the old value by 1000 to obtain values expressed in kilometers
# while this section isn't using listings_2021 yet, we'll prep it with the rest of our data :^)
rail_network_ppp_central_km <- rescale(rail_network_ppp_central, 1000, 'km')
listings_2019_ppp_central_km <- rescale(listings_2019_ppp_central, 1000, 'km')
listings_2021_ppp_central_km <- rescale(listings_2021_ppp_central, 1000, 'km')
hotels_ppp_central_km <- rescale(hotels_ppp_central, 1000, 'km')
tourism_ppp_central_km <- rescale(tourism_ppp_central, 1000, 'km')

bus_ppp_central_km <- rescale(bus_ppp_central, 1000, 'km')
taxi_ppp_central_km <- rescale(taxi_ppp_central, 1000, 'km')
museums_ppp_central_km <- rescale(museums_ppp_central, 1000, 'km')
monuments_ppp_central_km <- rescale(monuments_ppp_central, 1000, 'km')
moneychangers_ppp_central_km <- rescale(moneychangers_ppp_central, 1000, 'km')
historic_sites_ppp_central_km <- rescale(historic_sites_ppp_central, 1000, 'km')

7.4 Computing kernel density estimation using automatic bandwidth selection method

# automatic bandwidth method: bw.diggle
# smoothing kernel: gaussian (default)
kde_rail_network_central_bw <- density(rail_network_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_listings_2019_central_bw <- density(listings_2019_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_listings_2021_central_bw <- density(listings_2021_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_hotels_central_bw <- density(hotels_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_tourism_central_bw <- density(tourism_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")

kde_bus_central_bw <- density(bus_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_taxi_central_bw <- density(taxi_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_museums_central_bw <- density(museums_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_monuments_central_bw <- density(monuments_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_moneychangers_central_bw <- density(moneychangers_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")
kde_historic_sites_central_bw <- density(historic_sites_ppp_central_km,
                                   sigma=bw.diggle,
                                   edge=TRUE,
                                   kernel="gaussian")

7.5 Plotting Kernel Density Estimation

plot(kde_listings_2019_central_bw)

layout(matrix(c(1,1,2,3), nrow = 2, ncol = 2, byrow = TRUE))
plot(kde_rail_network_central_bw)
plot(kde_bus_central_bw)
plot(kde_taxi_central_bw)

plot(kde_tourism_central_bw)

par(mfrow=c(2,2))
plot(kde_museums_central_bw)
plot(kde_monuments_central_bw)
plot(kde_moneychangers_central_bw)
plot(kde_historic_sites_central_bw)

7.6 Converting KDE output into grid object into RasterLayer object

# convert into a SpatialGridDataFrame
# then converts gridded output into raster
kde_listings_2019_central_bw_raster <- kde_listings_2019_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

kde_listings_2021_central_bw_raster <- kde_listings_2021_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

kde_rail_network_central_bw_raster <- kde_rail_network_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

kde_hotels_central_bw_raster <- kde_hotels_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

kde_tourism_central_bw_raster <- kde_tourism_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

# self-sourced 
kde_bus_central_bw_raster <- kde_bus_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()
kde_taxi_central_bw_raster <- kde_taxi_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()
kde_museums_central_bw_raster <- kde_museums_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()
kde_monuments_central_bw_raster <- kde_monuments_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()
kde_moneychangers_central_bw_raster <- kde_moneychangers_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()
kde_historic_sites_central_bw_raster <- kde_historic_sites_central_bw %>% 
  as.SpatialGridDataFrame.im() %>% 
  raster()

Now, let’s check the properties of one of our RasterLayers:

kde_listings_2019_central_bw_raster

class      : RasterLayer 
dimensions : 128, 128, 16384  (nrow, ncol, ncell)
resolution : 0.1471859, 0.1344767  (x, y)
extent     : 18.7524, 37.5922, 21.67835, 38.89137  (xmin, xmax, ymin, ymax)
crs        : NA 
source     : memory
names      : v 
values     : -5.6727e-13, 3536.411  (min, max)

Note that the crs property is NA - let’s do something about that. We’ll assign projection systems again:

# assigns CRS as ESPG Code 3414 aka the SVY21 projection system
# while also specifying the units as km, since we rescaled it in the earlier section
projection(kde_listings_2019_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_listings_2021_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_rail_network_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_hotels_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_tourism_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")

projection(kde_bus_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_taxi_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_museums_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_monuments_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_moneychangers_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_historic_sites_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")

7.7 Kernel Density Maps on OpenStreetMap

# input: a raster object and the map title
listings_2019_density_map_central <- density_map(kde_listings_2019_central_bw_raster, map_title = "June 2019 SG Airbnb Listings Density Map")
listings_2021_density_map_central <- density_map(kde_listings_2021_central_bw_raster, map_title = "June 2021 SG Airbnb Listings Density Map")
rail_network_density_map_central <- density_map(kde_rail_network_central_bw_raster, map_title = "SG Rail Network Density Map")
hotels_density_map_central <- density_map(kde_hotels_central_bw_raster, map_title = "SG Hotels Density Map")
tourism_density_map_central <- density_map(kde_tourism_central_bw_raster, map_title = "SG Tourist Attractions Density Map")

#self-sourced
bus_density_map_central <- density_map(kde_bus_central_bw_raster, map_title = "SG Bus Stops Density Map")
taxi_density_map_central <- density_map(kde_taxi_central_bw_raster, map_title = "SG Taxi Stands Density Map")
museums_density_map_central <- density_map(kde_museums_central_bw_raster, map_title = "SG Museums Density Map")
monuments_density_map_central <- density_map(kde_monuments_central_bw_raster, map_title = "SG Monuments Density Map")
moneychangers_density_map_central <- density_map(kde_moneychangers_central_bw_raster, map_title = "SG Moneychangers Density Map")
historic_sites_density_map_central <- density_map(kde_historic_sites_central_bw_raster, map_title = "SG Historic Sites Density Map")

listings_2019_density_map_central

hotels_density_map_central

tourism_density_map_central

rail_network_density_map_central

tmap_arrange(rail_network_density_map_central,
             bus_density_map_central,
             taxi_density_map_central)

tmap_arrange(museums_density_map_central,
             monuments_density_map_central,
             moneychangers_density_map_central,
             historic_sites_density_map_central)

7.8 Kernel Density Maps Analysis

With the area focused on the central region, the distribution of the various location factors is a lot clearer! We can clearly see that the transport networks are centered right in the heart of the central region area - a quick comparison to an interactive Singapore planning subzone map or the SingStat Planning Areas/Subzones Map tells us that the planning areas with the highest density of location factors are in the “Central Area”, such as:

Downtown Core (City Hall, Raffles Place, Marina Centre)
Kallang (Kampong Bugis, Lavender)
and to a lesser extent the Rochor and Newton planning areas

In comparison, other Central Region planning areas like Kallang and Novena (that are actually adjacent to the Central Area) do not have as dense for these location factorss.

What about the tourist-specific location factors like museum and monuments? From a glance, we can see that they’re mostly centered in the Central Area - with a surprising mini-cluster of museums in Queenstown (far left).

However, compare all of this to the listings - what we can glean is that it actually seems, at least visually, “separate” from the transit-specific and tourist-specific location factors - with clusters appearing in Geylang and Kallang (alongside the usual Central Area cluster). This actually mirrors the hotels distribution, where similar clusters in the same area are visible. This unequal distribution of listings within Singapore and even within the central region itself could point towards there being less of a relationship between the listings and the location factors than initially thought. However, keep in mind that there is still an extremely high distribution of location factors in the central area where listings appeared, and in the real-world context, listings are also constrained by the existing residential areas around said location factors.

8.0 [A] Airbnb Distribution in 2019: Second-order Spatial Point Patterns Analysis

Now that we’ve analysed our spatial point patterns, we have to confirm our observations statistically - which is where hypothesis testing comes in. I’d like to specifically narrow down on my observations for the central region - especially since it’s where we’ve found a higher concentration of our location factors and listings.

H0: The distribution of the Airbnb Listings are randomly distributed
H1: The distribution of the Airbnb Listings are not randomly distributed
Confidence level : 99%
Significance level : 0.01 In light that 0.05 is the most common level of significance, I’ve decided to make it slightly stricter and use 0.01 - I believe a 1% risk of an incorrect hypothesis is good enough, especially considering the trade-offs between sensitivity and false positives for this hypothesis testing

The chosen planning subzones (PLN_AREA_N) for comparison are: - Downtown Core (higher concentration of both listings and location factors) - Kallang (higher concentration for listings, less so for location factors) - Bukit Timah (randomly chosen from the visually ‘lower-concentration’ listings planning areas)

8.1 Defining Study Areas

Like what we’ve done just now, we’ve got to make owin objects:

downtown_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "DOWNTOWN CORE",] %>%
    as('Spatial') %>%
    as('SpatialPolygons') %>%
    as('owin')

kallang_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "KALLANG",] %>%
    as('Spatial') %>%
    as('SpatialPolygons') %>%
    as('owin')

bukit_timah_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "BUKIT TIMAH",] %>%
    as('Spatial') %>%
    as('SpatialPolygons') %>%
    as('owin')

Then, we’ll need to combine our point events object and owin object:

downtown_2019_ppp <- listings_2019_ppp[downtown_owin]
kallang_2019_ppp <- listings_2019_ppp[kallang_owin]
bukit_timah_2019_ppp <- listings_2019_ppp[bukit_timah_owin]

Let’s check their visualisation:

par(mfrow=c(1,3))

plot(downtown_2019_ppp, main = "Downtown Core")
plot(kallang_2019_ppp, main = "Kallang")
plot(bukit_timah_2019_ppp , main = "Bukit Timah")

mtext("Airbnb Listings for selected planning areas in 2019", side = 3, line = -5, outer = TRUE)

8.2 G-function

The G function measures the distribution of the distances from an arbitrary event to its nearest event - we’ll use Gest() function for this, as well as perform the Monte Carlo Simulation test with envelope().

8.2.1 Downtown Core

H0: The distribution of the Airbnb Listings in Downtown Core are randomly distributed
H1: The distribution of the Airbnb Listings in Downtown Core are not randomly distributed
Confidence level : 99%
Significance level : 0.01
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

Firstly, we have to compute the g-function estimation:

G_Downtown = Gest(downtown_2019_ppp, correction = "border")
plot(G_Downtown)

Now, we can perform the spatial randomness test:

G_Downtown.csr <- envelope(downtown_2019_ppp, Gest, nsim=100)

Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.

plot(G_Downtown.csr)

Conclusion: The observed G(r) is far above the G(theo) as well as the envelope - indicating that Airbnb listings in the Downtown Core area are clustered. Hence, we reject the null hypothesis that Airbnb listings in the Downtown Core area are randomly distributed at 99% confident interval.

8.2.2 Kallang

H0: The distribution of the Airbnb Listings in Kallang are randomly distributed
H1: The distribution of the Airbnb Listings in Kallanga are not randomly distributed
Confidence level : 99%
Significance level : 0.01
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_Kallang = Gest(kallang_2019_ppp, correction = "border")
plot(G_Kallang)

Now, we can perform the spatial randomness test:

G_Kallang.csr <- envelope(kallang_2019_ppp, Gest, nsim=100)

Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.

plot(G_Kallang.csr)

Conclusion: The observed G(r) is far above the G(theo) as well as the envelope - indicating that Airbnb listings in the Kallang area are clustered. Hence, we reject the null hypothesis that Airbnb listings in the Kallang area are randomly distributed at 99% confident interval.

8.2.3 Bukit Timah

H0: The distribution of the Airbnb Listings in Bukit Timah are randomly distributed
H1: The distribution of the Airbnb Listings in Bukit Timah are not randomly distributed
Confidence level : 99%
Significance level : 0.01
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_Bukit_Timah = Gest(bukit_timah_2019_ppp, correction = "border")
plot(G_Bukit_Timah)

Now, we can perform the spatial randomness test:

G_Bukit_Timah.csr <- envelope(bukit_timah_2019_ppp, Gest, nsim=100)

Generating 100 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99,  100.

Done.

plot(G_Bukit_Timah.csr)

Conclusion: The observed G(r) is far above the G(theo) as well as the envelope - indicating that Airbnb listings in the Bukit Timah area are clustered. Hence, we reject the null hypothesis that Airbnb listings in the Bukit Timah area are randomly distributed at 99% confident interval.

8.2.4 Listings Conclusions

As we can see, for our selected planning areas in the Central Region, we can conclude with 99% confidence (and 1% risk of our conclusion being unsubstantiated/invalid) that Airbnb listings in said areas are not randomly distributed.

8.3 Singapore-wide Analysis

Before we move on to Section B, we should also check for the distribution of listings across the entire region.

H0: The distribution of the Airbnb Listings in Singapore are randomly distributed
H1: The distribution of the Airbnb Listings in Singapore are not randomly distributed
Confidence level : 99%
Significance level : 0.01
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01.

G_sg = Gest(listings_2019_ppp_sg, correction = "border")
plot(G_sg)

Now, we can perform the spatial randomness test:

# due to the sheer size of this, it's computationally expensive
# thus number of simulations have been set to 50 instead of the usual 100
G_sg.csr <- envelope(listings_2019_ppp_sg, Gest, nsim=50)

Generating 50 simulations of CSR  ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,  50.

Done.

plot(G_sg.csr)

Conclusion: The observed G(r) is far above the G(theo) as well as the envelope - indicating that Airbnb listings across the island are clustered. Hence, we reject the null hypothesis that Airbnb listings in Singapore are randomly distributed at 99% confident interval.

9.0 [B] Impact of COVID-19: Exploratory Spatial Data Analysis

Now that we have an analysis of pre-COVID-19, 2019 Airbnb listings in Singapore, we should compare at with post-COVID-19 2021 Airbnb listings so as to see the impact COVID-19 has had on the business. Our first task is to derive kernel density maps of all Airbnb listings and Airbnb by room type as at June 2019 and June 2021!

Something to note: do you still remember our Hands-On Ex 5B? Notice something we can apply from that analysis to this one?

Yup, you guessed it: our “room type” is the associated categorical measurement for our listing events! In other words, they’re marks - and we can perform marked point pattern analysis on them.

9.1 Interactive Map of 2019 and 2021 Listings

Let’s first take a look at the Airbnb listings by room type for both years to get a feel for our data:

# set to interactive mode
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(listings_2019_sf) +
  tm_dots(col = 'room_type', size = 0.02, title="Room Type" ,alpha=0.6,
          palette = c("#e76f51", "#e9c46a","#2a9d8f"))

# set to plot mode
tmap_mode("plot")
tm_shape(mpsz_sf) +
  tm_borders(alpha = 0.5) +
  tmap_options(check.and.fix = TRUE) +
tm_shape(listings_2019_sf) +
  tm_dots(col = 'room_type', size = 0.02, title="Room Type" ,alpha=0.6,
          palette = c("#e76f51", "#e9c46a","#2a9d8f")) +
  tm_facets(by="room_type") +
  tm_layout(main.title = "Airbnb Listings by Room Type (2019)",
          main.title.position = "left",
          main.title.size = 1.2,
          frame = TRUE)

Like what we discovered in our previous analysis from Section A, the Central Region has the highest concentration of Airbnb listings - and of those listings, the most frequently-occurring room type is the “Entire home/apt”. Yet outside of the Central Region, you might notice that majority of the listings are in the “Private room” category instead. “Shared room” is the least frequently listed. Let’s see if this trend holds for 2021:

# set to interactive mode
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(listings_2021_sf) +
  tm_dots(col = 'room_type', size = 0.02, title="Room Type" ,alpha=0.6,
          # as columns are in alphabetical order, i've added the hotel's blue colour #264653 as the second input. you can switch this out with your own!
          palette = c("#e76f51", "#264653", "#e9c46a","#2a9d8f"))

# set to plot mode
tmap_mode("plot")
tm_shape(mpsz_sf) +
  tm_borders(alpha = 0.5) +
  tmap_options(check.and.fix = TRUE) +
tm_shape(listings_2021_sf) +
  tm_dots(col = 'room_type', size = 0.02, title="Room Type" ,alpha=0.6,
          palette = c("#e76f51", "#264653", "#e9c46a","#2a9d8f")) +
  tm_facets(by="room_type") +
  tm_layout(main.title = "Airbnb Listings by Room Type (2021)",
          main.title.position = "left",
          main.title.size = 1.2,
          frame = TRUE)

It’s to no one’s surprise that there’s a significant drop in the volume of airbnb listings for 2021, in the context of COVID-19 and the concerns it raises with regards to home-sharing. Yet the loss of rooms comes with the rise of a new room type: the “Hotel room” category. A bit of research tells us that Airbnb pushed this initiative in Dec 2018, giving better visibility to hotel owners as part of its venture into the hotel sector. Additionally, in accordance with 2019 trends, it seems that rooms in Central Region are still the most frequently-listed.

9.2 Individual Room Type Visualisatoin

From our initial visualisation in 9.1, we note that there are 4 room types:

Private room
Shared room
Entire home/apt
Hotel room (2021 only)

We’d like to look at the distribution for each of the room types, so let’s go ahead and create dataframes for them:

private_2019 <- listings_2019_sf[listings_2019_sf$room_type == 'Private room',]
shared_2021 <- listings_2019_sf[listings_2019_sf$room_type == 'Shared room',]
apt_2019 <- listings_2019_sf[listings_2019_sf$room_type == 'Entire home/apt',]

private_2021 <- listings_2021_sf[listings_2021_sf$room_type == 'Private room',]
shared_2019 <- listings_2021_sf[listings_2021_sf$room_type == 'Shared room',]
apt_2021 <- listings_2021_sf[listings_2021_sf$room_type == 'Entire home/apt',]
hotel_2021 <- listings_2021_sf[listings_2021_sf$room_type == 'Hotel room',]

Now, let’s visualise them in a side-by-side comparison:

par(mfrow=c(1,2))

plot(mpsz)
plot(private_2019,add=T,col='red',pch = 3)

plot(mpsz)
plot(private_2021,add=T,col='red',pch = 3)

Private rooms, visually, almost seem on par with the “entire home/apt” room type - we can see a significant drop in the distribution of private rooms across the region, but there’s still a rather strong presence in the Central Region area.

par(mfrow=c(1,2))

plot(mpsz)
plot(shared_2019,add=T,col='green',pch = 3)

plot(mpsz)
plot(shared_2021,add=T,col='green',pch = 3)

For shared rooms, there’s a drop, but it’s less discernible than that of private rooms.

par(mfrow=c(1,2))

plot(mpsz)
plot(apt_2019,add=T,col='orange',pch = 3)

plot(mpsz)
plot(apt_2021,add=T,col='orange',pch = 3)

For entire homes/apartments, the densest of the four, there’s a significant decrease from 2019 to 2020 - and it’s noteworthy that the North/Northeast regions seem relatively clear of listings of this category.

plot(mpsz)
plot(hotel_2021,add=T,col='blue',pch = 3)

Lastly, while the hotel room has no 2019 comparison, we can glean that it’s mostly centered upon the Central Region again - most likely the Downtown Core.

9.3 Data Pre-Processing

Before we head on to our marked point pattern analysis, we have to check our marked data first:

# returns the various features and the data type the feature is in e.g. num, chr...
str(listings_2019)

Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 8293 obs. of  14 variables:
  .. ..$ id                            : num [1:8293] 49091 50646 56334 71609 71896 ...
  .. ..$ name                          : chr [1:8293] "COZICOMFORT LONG TERM STAY ROOM 2" "Pleasant Room along Bukit Timah" "COZICOMFORT" "Ensuite Room (Room 1 & 2) near EXPO" ...
  .. ..$ host_id                       : num [1:8293] 266763 227796 266763 367042 367042 ...
  .. ..$ host_name                     : chr [1:8293] "Francesca" "Sujatha" "Francesca" "Belinda" ...
  .. ..$ neighbourhood_group           : chr [1:8293] "North Region" "Central Region" "North Region" "East Region" ...
  .. ..$ neighbourhood                 : chr [1:8293] "Woodlands" "Bukit Timah" "Woodlands" "Tampines" ...
  .. ..$ room_type                     : chr [1:8293] "Private room" "Private room" "Private room" "Private room" ...
  .. ..$ price                         : num [1:8293] 81 80 68 200 92 102 203 120 51 58 ...
  .. ..$ minimum_nights                : num [1:8293] 180 90 6 1 1 1 1 7 30 30 ...
  .. ..$ number_of_reviews             : num [1:8293] 1 18 20 12 20 35 23 15 174 198 ...
  .. ..$ last_review                   : Date[1:8293], format: "2013-10-21" ...
  .. ..$ reviews_per_month             : num [1:8293] 0.01 0.28 0.21 0.13 0.21 0.35 0.23 0.16 1.92 2.13 ...
  .. ..$ calculated_host_listings_count: num [1:8293] 2 1 2 9 9 9 9 1 4 4 ...
  .. ..$ availability_365              : num [1:8293] 365 365 365 353 353 348 155 365 229 206 ...
  ..@ coords.nrs : num(0) 
  ..@ coords     : num [1:8293, 1:2] 23825 22646 23922 41778 42057 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  ..@ bbox       : num [1:2, 1:2] 7216 25166 44098 49226
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +to"| __truncated__
  .. .. ..$ comment: chr "PROJCRS[\"SVY21 / Singapore TM\",\n    BASEGEOGCRS[\"SVY21\",\n        DATUM[\"SVY21\",\n            ELLIPSOID["| __truncated__

str(listings_2021)

Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 4238 obs. of  14 variables:
  .. ..$ id                            : num [1:4238] 49091 50646 56334 71609 71896 ...
  .. ..$ name                          : chr [1:4238] "COZICOMFORT LONG TERM STAY ROOM 2" "Pleasant Room along Bukit Timah" "COZICOMFORT" "Ensuite Room (Room 1 & 2) near EXPO" ...
  .. ..$ host_id                       : num [1:4238] 266763 227796 266763 367042 367042 ...
  .. ..$ host_name                     : chr [1:4238] "Francesca" "Sujatha" "Francesca" "Belinda" ...
  .. ..$ neighbourhood_group           : chr [1:4238] "North Region" "Central Region" "North Region" "East Region" ...
  .. ..$ neighbourhood                 : chr [1:4238] "Woodlands" "Bukit Timah" "Woodlands" "Tampines" ...
  .. ..$ room_type                     : chr [1:4238] "Private room" "Private room" "Private room" "Private room" ...
  .. ..$ price                         : num [1:4238] 81 80 67 177 81 81 52 40 72 41 ...
  .. ..$ minimum_nights                : num [1:4238] 180 90 6 90 90 90 14 14 90 8 ...
  .. ..$ number_of_reviews             : num [1:4238] 1 18 20 20 24 48 20 13 133 105 ...
  .. ..$ last_review                   : Date[1:4238], format: "2013-10-21" ...
  .. ..$ reviews_per_month             : num [1:4238] 0.01 0.22 0.16 0.29 0.34 0.67 0.2 0.16 1.3 1.78 ...
  .. ..$ calculated_host_listings_count: num [1:4238] 2 1 2 4 4 4 50 50 7 1 ...
  .. ..$ availability_365              : num [1:4238] 365 365 365 365 365 365 364 364 365 89 ...
  ..@ coords.nrs : num(0) 
  ..@ coords     : num [1:4238, 1:2] 23815 22646 23924 41973 42052 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  ..@ bbox       : num [1:2, 1:2] 7407 25330 43338 48392
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +to"| __truncated__
  .. .. ..$ comment: chr "PROJCRS[\"SVY21 / Singapore TM\",\n    BASEGEOGCRS[\"SVY21\",\n        DATUM[\"SVY21\",\n            ELLIPSOID["| __truncated__

As we are working with marked data, and we know that the values are categorical (different room types), we need to ensure that the marked field is of factor data type. However, as seen from the output, our room_type field is of chr data type, not factor! Let’s rectify that with the as.factor() function:

listings_2019@data$room_type <-as.factor(listings_2019@data$room_type)
listings_2021@data$room_type <-as.factor(listings_2021@data$room_type)
str(listings_2019)

Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 8293 obs. of  14 variables:
  .. ..$ id                            : num [1:8293] 49091 50646 56334 71609 71896 ...
  .. ..$ name                          : chr [1:8293] "COZICOMFORT LONG TERM STAY ROOM 2" "Pleasant Room along Bukit Timah" "COZICOMFORT" "Ensuite Room (Room 1 & 2) near EXPO" ...
  .. ..$ host_id                       : num [1:8293] 266763 227796 266763 367042 367042 ...
  .. ..$ host_name                     : chr [1:8293] "Francesca" "Sujatha" "Francesca" "Belinda" ...
  .. ..$ neighbourhood_group           : chr [1:8293] "North Region" "Central Region" "North Region" "East Region" ...
  .. ..$ neighbourhood                 : chr [1:8293] "Woodlands" "Bukit Timah" "Woodlands" "Tampines" ...
  .. ..$ room_type                     : Factor w/ 3 levels "Entire home/apt",..: 2 2 2 2 2 2 2 2 2 2 ...
  .. ..$ price                         : num [1:8293] 81 80 68 200 92 102 203 120 51 58 ...
  .. ..$ minimum_nights                : num [1:8293] 180 90 6 1 1 1 1 7 30 30 ...
  .. ..$ number_of_reviews             : num [1:8293] 1 18 20 12 20 35 23 15 174 198 ...
  .. ..$ last_review                   : Date[1:8293], format: "2013-10-21" ...
  .. ..$ reviews_per_month             : num [1:8293] 0.01 0.28 0.21 0.13 0.21 0.35 0.23 0.16 1.92 2.13 ...
  .. ..$ calculated_host_listings_count: num [1:8293] 2 1 2 9 9 9 9 1 4 4 ...
  .. ..$ availability_365              : num [1:8293] 365 365 365 353 353 348 155 365 229 206 ...
  ..@ coords.nrs : num(0) 
  ..@ coords     : num [1:8293, 1:2] 23825 22646 23922 41778 42057 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  ..@ bbox       : num [1:2, 1:2] 7216 25166 44098 49226
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +to"| __truncated__
  .. .. ..$ comment: chr "PROJCRS[\"SVY21 / Singapore TM\",\n    BASEGEOGCRS[\"SVY21\",\n        DATUM[\"SVY21\",\n            ELLIPSOID["| __truncated__

str(listings_2021)

Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
  ..@ data       :'data.frame': 4238 obs. of  14 variables:
  .. ..$ id                            : num [1:4238] 49091 50646 56334 71609 71896 ...
  .. ..$ name                          : chr [1:4238] "COZICOMFORT LONG TERM STAY ROOM 2" "Pleasant Room along Bukit Timah" "COZICOMFORT" "Ensuite Room (Room 1 & 2) near EXPO" ...
  .. ..$ host_id                       : num [1:4238] 266763 227796 266763 367042 367042 ...
  .. ..$ host_name                     : chr [1:4238] "Francesca" "Sujatha" "Francesca" "Belinda" ...
  .. ..$ neighbourhood_group           : chr [1:4238] "North Region" "Central Region" "North Region" "East Region" ...
  .. ..$ neighbourhood                 : chr [1:4238] "Woodlands" "Bukit Timah" "Woodlands" "Tampines" ...
  .. ..$ room_type                     : Factor w/ 4 levels "Entire home/apt",..: 3 3 3 3 3 3 3 3 3 3 ...
  .. ..$ price                         : num [1:4238] 81 80 67 177 81 81 52 40 72 41 ...
  .. ..$ minimum_nights                : num [1:4238] 180 90 6 90 90 90 14 14 90 8 ...
  .. ..$ number_of_reviews             : num [1:4238] 1 18 20 20 24 48 20 13 133 105 ...
  .. ..$ last_review                   : Date[1:4238], format: "2013-10-21" ...
  .. ..$ reviews_per_month             : num [1:4238] 0.01 0.22 0.16 0.29 0.34 0.67 0.2 0.16 1.3 1.78 ...
  .. ..$ calculated_host_listings_count: num [1:4238] 2 1 2 4 4 4 50 50 7 1 ...
  .. ..$ availability_365              : num [1:4238] 365 365 365 365 365 365 364 364 365 89 ...
  ..@ coords.nrs : num(0) 
  ..@ coords     : num [1:4238, 1:2] 23815 22646 23924 41973 42052 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  ..@ bbox       : num [1:2, 1:2] 7407 25330 43338 48392
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
  .. .. ..$ : chr [1:2] "min" "max"
  ..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
  .. .. ..@ projargs: chr "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +to"| __truncated__
  .. .. ..$ comment: chr "PROJCRS[\"SVY21 / Singapore TM\",\n    BASEGEOGCRS[\"SVY21\",\n        DATUM[\"SVY21\",\n            ELLIPSOID["| __truncated__

Success!

9.4 Data Wrangling

9.3.1 Converting the SpatialPointsDataFrame into ppp format

Let’s convert our SpatialPointsDataFrame into ppp format with as.(x, “ppp”) or as.ppp(x). In doing so, the additional field in x data frame will become the marks of the point pattern z.

listings_2019_marked_ppp <- as(listings_2019, "ppp")
listings_2021_marked_ppp <- as(listings_2021, "ppp")

9.3.2 Avoiding duplicated spatial point event by using jittering method

To avoid duplicated spatial point event issues, we’ll use jittering:

listings_2019_marked_ppp_jit <- rjitter(listings_2019_marked_ppp, retry=TRUE, nsim=1, drop=TRUE)
listings_2021_marked_ppp_jit <- rjitter(listings_2021_marked_ppp, retry=TRUE, nsim=1, drop=TRUE)

Are there any duplicated spatial point events? Let’s check:

any(duplicated(listings_2019_marked_ppp_jit))

[1] FALSE

any(duplicated(listings_2021_marked_ppp_jit))

[1] FALSE

Yay! 😄 Our duplicated points issue has been resolved!

9.5 Choosing Planning Subzone

This time, we’ve learned from our initial visualisations that there are clusters for different room types in different regions. As such, unlike in Section A where the high density of location factors and listings in the Central Region encouraged us to focus mainly on planning subzones in the Central Region, in this Section B, our focus is wider. Here are the four main planning areas we will look at:

Downtown Core (high density of all types of room, especially entire room/apartments)
Jurong West (high density of private rooms)
Sembawang (small/middling density of shared room and private rooms )
Pasir Ris (small density of private and shared rooms, close to Changi Airport so might have insights)

9.6 Defining Study Area

Like what we’ve done just now, we’ve got to make owin objects:

# this one is a repeat, but I'm putting it here for reference!
downtown_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "DOWNTOWN CORE",] %>%
    as('Spatial') %>%
    as('SpatialPolygons') %>%
    as('owin')

jurong_west_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "JURONG WEST",] %>%
    as('Spatial') %>%
    as('SpatialPolygons') %>%
    as('owin')

sembawang_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "SEMBAWANG",] %>%
    as('Spatial') %>%
    as('SpatialPolygons') %>%
    as('owin')

pasir_ris_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "PASIR RIS",] %>%
    as('Spatial') %>%
    as('SpatialPolygons') %>%
    as('owin')

Then combine our point events object and owin object:

downtown_2019_marked_ppp <- listings_2019_marked_ppp_jit[downtown_owin]
jurong_west_2019_marked_ppp <- listings_2019_marked_ppp_jit[jurong_west_owin]
sembawang_2019_marked_ppp <- listings_2019_marked_ppp_jit[sembawang_owin]
pasir_ris_2019_marked_ppp <- listings_2019_marked_ppp_jit[pasir_ris_owin]

downtown_2021_marked_ppp <- listings_2021_marked_ppp_jit[downtown_owin]
jurong_west_2021_marked_ppp <- listings_2021_marked_ppp_jit[jurong_west_owin]
sembawang_2021_marked_ppp <- listings_2021_marked_ppp_jit[sembawang_owin]
pasir_ris_2021_marked_ppp <- listings_2021_marked_ppp_jit[pasir_ris_owin]

Let’s check their visualisation:

# reference for title placement:
# https://stackoverflow.com/questions/14660372/common-main-title-of-a-figure-panel-compiled-with-parmfrow
par(mfrow=c(2,2))

plot(downtown_2019_marked_ppp, main = "Downtown Core", which.marks = "room_type")
plot(jurong_west_2019_marked_ppp, main = "Jurong West", which.marks = "room_type")
plot(sembawang_2019_marked_ppp, main = "Sembawang", which.marks = "room_type")
plot(pasir_ris_2019_marked_ppp, main = "Pasir Ris", which.marks = "room_type")

mtext("Airbnb Listings by Room Type for selected Planning Areas in 2019", side = 3, line = -11, outer = TRUE)

par(mfrow=c(2,2))

plot(downtown_2021_marked_ppp, main = "Downtown Core", which.marks = "room_type")
plot(jurong_west_2021_marked_ppp, main = "Jurong West", which.marks = "room_type")
plot(sembawang_2021_marked_ppp, main = "Sembawang", which.marks = "room_type")
plot(pasir_ris_2021_marked_ppp, main = "Pasir Ris", which.marks = "room_type")

mtext("Airbnb Listings by Room Type for selected Planning Areas in 2021", side = 3, line = -11, outer = TRUE)

9.6 First-order Spatial Point Patterns Analysis (EDA)

Just like in Section A, let’s get the kernel density estimation. We need to use density() of the spatstat package to compute the kernel density objects, and then plot() it out. In addition, we’ll need to use rescale() to convert metres to kilometres (our desired unit of measurement). We can keep them as two separate lines of code, or we can combine them, like so:

2019 KDE Map + Analysis

# in this context, for rescale(), we are telling the function that the new unit length is 1000 meters (or 1km), 
#so the function will divide the old coordinate values by 1000 to obtain coordinates expressed in kilometers

plot((density(split(rescale(downtown_2019_marked_ppp, 1000)))), main = "Downtown")

Downtown: At first glance, it might seem like ‘private room’ has both a higher distribution and density than the rest due to its gradient, but we have to keep in mind the scale of the legend - indeed, entire homes/apartments reign supreme with their range of up to 600, which is easily 6 times that of private rooms and 40 times that of shared rooms. There does seems to be a rather ‘even’ distribution amongst the three types of listings: entire home/apartments: entire home and apartment takes the south end with private rooms, private rooms has a middling stake on the north end, and the rest of the north end has shared rooms.

plot((density(split(rescale(jurong_west_2019_marked_ppp, 1000)))), main = "Jurong West")

Jurong West: All room types are nestled to the East, though private rooms have a consistent presence throughout most of the area, and shared rooms has a significant (though smaller) cluster in the central area. Also - notice the legend, where private rooms reign supreme, with a gradient going up to 25 as compared to private room’s 7.

plot((density(split(rescale(sembawang_2019_marked_ppp, 1000)))), main = "Sembawang")

Sembawang: All room types are nestled southwards, with only entire homes/apartments having any presence in the central area. Once again, private rooms are the densest in this area, while entire apartments lose to even the shared rooms.

plot((density(split(rescale(pasir_ris_2019_marked_ppp, 1000)))), main = "Pasir Ris")

Pasir Ris: All room types complement each other, though there’s a stronger complementary relatinoship between apartments/shared rooms OR private rooms/shared rooms (since they cover each other’s unused areas).

2021 KDE Map + Analysis

plot((density(split(rescale(downtown_2021_marked_ppp, 1000)))), main = "Downtown")

Downtown: This is the only area with ‘Hotel’ presence, which is clustered to the North, competing with private rooms. There is evidently a niche area for each of the room types, and very few overlaps can be seen. In comparison to 2019, entire home/apartments are still reigning, but they stand at a scale of 250 instead of 600, showcasing a significant drop.

plot((density(split(rescale(jurong_west_2021_marked_ppp, 1000)))), main = "Jurong West")

Jurong West: An almost complete withdrawal of shared rooms in comparison to 2019, but otherwise the distribution of rooms stay the same. No hotel presence (thus the full-pink map).

plot((density(split(rescale(sembawang_2021_marked_ppp, 1000)))), main = "Sembawang")

Sembawang: While in 2019 entire homes/apartments still had some presence in the central area, in 2021 they have withdrawn and are now competing with the other two room types for the North area market. No hotel presence (thus the full-pink map).

plot((density(split(rescale(pasir_ris_2021_marked_ppp, 1000)))), main = "Pasir Ris")

Pasir Ris: Interestingly, private rooms are now smaller in scale compared to entire apartments, which is a reversal of their positions in 2019. At the same time, they seem to have achieved better distribution (though this could be attributed to the loss of listings = a seemingly more ‘even’ distribution for whatever listings are left). Both clusters of the share room type have dwindled to a faint presence. No hotel presence (thus the full-pink map).

9.7 Analysis & Conclusions

Across the board, for both 2019 and 2021, there are visible trends in which particular areas - complementary room types, or congregations toward the same areas (possibly due to the existence of tourist attractions or related landmarks/sites, as we discussed in Section A).

Something of note is the relationship between ‘entire home/apartments’ and ‘private rooms’ - at times they complement each other, covering areas the others are not in; other times they congregate in the same area and compete for market share. This could be useful to look into.

Another relationship to take note of is the apartments/shared rooms and private rooms/shared rooms - shared rooms generally seems to be on a small enough scale to appear in areas that neither of its bigger ‘competitors’ cover.

Evidently, as seen from the significant drop in scale across all areas for all room types, COVID-19 had significant impact on the listings: specifically, influencing island-wide withdrawal. Given the context of social distancing, safety measures, lockdown (which restricted travel and tourism that Airbnb listings are typically used for) and other governmental restrictions, this is to be expected.

10.0 [B] Impact of COVID-19: Second-order Spatial Point Patterns Analysis

10.1 Data Pre-processing: Assigning Marks

Before we proceed with our second order SPPA, we need to assign marks to the ppp objects + check the levels of the marks, like so:

# in this context, for rescale(), we are telling the function that the new unit length is 1000 meters (or 1km), 
#so the function will divide the old coordinate values by 1000 to obtain coordinates expressed in kilometers

plot((density(split(rescale(downtown_2019_marked_ppp, 1000)))), main = "Downtown")

marks(downtown_2019_marked_ppp) <- downtown_2019_marked_ppp$marks$room_type
marks(jurong_west_2019_marked_ppp) <- jurong_west_2019_marked_ppp$marks$room_type
marks(sembawang_2019_marked_ppp) <- sembawang_2019_marked_ppp$marks$room_type
marks(pasir_ris_2019_marked_ppp) <- pasir_ris_2019_marked_ppp$marks$room_type
levels(marks(downtown_2019_marked_ppp))

[1] "Entire home/apt" "Private room"    "Shared room"

levels(marks(jurong_west_2019_marked_ppp))

[1] "Entire home/apt" "Private room"    "Shared room"

levels(marks(sembawang_2019_marked_ppp))

[1] "Entire home/apt" "Private room"    "Shared room"

levels(marks(pasir_ris_2019_marked_ppp))

[1] "Entire home/apt" "Private room"    "Shared room"

marks(downtown_2021_marked_ppp) <- downtown_2021_marked_ppp$marks$room_type
marks(jurong_west_2021_marked_ppp) <- jurong_west_2021_marked_ppp$marks$room_type
marks(sembawang_2021_marked_ppp) <- sembawang_2021_marked_ppp$marks$room_type
marks(pasir_ris_2021_marked_ppp) <- pasir_ris_2021_marked_ppp$marks$room_type
levels(marks(downtown_2021_marked_ppp))

[1] "Entire home/apt" "Hotel room"      "Private room"   
[4] "Shared room"

levels(marks(jurong_west_2021_marked_ppp))

[1] "Entire home/apt" "Hotel room"      "Private room"   
[4] "Shared room"

levels(marks(sembawang_2021_marked_ppp))

[1] "Entire home/apt" "Hotel room"      "Private room"   
[4] "Shared room"

levels(marks(pasir_ris_2021_marked_ppp))

[1] "Entire home/apt" "Hotel room"      "Private room"   
[4] "Shared room"

10.2 Second-order Multi-type Point Patterns Analysis: Cross L-Function

Now that we’ve analysed our spatial point patterns (once again), we have to confirm our observations statistically with hypothesis testing. From our analysis above, there seems to be a strong relationship between entire homes/apartments and private rooms - in the sense that entire homes/apartments at the predominant player in the listings, but wherever they are scarcer, private rooms are more plentiful. Yet there are also times where they coexist in the same area of a subzone, implying a level of competition between them. In addition, it seems that this relationship is stronger in the post-COVID-19 (2021) dataset.

As such, we will use the Cross K-Function to look into this relationship.

H0: The distribution of the entire apartments and private rooms are spatially independent
H1: The distribution of the entire apartments and private rooms are spatially independent
Confidence level : 99%
Significance level : 0.01
The null hypothesis will be rejected if p-value is smaller than alpha value of 0.01 In light that 0.05 is the most common level of significance, I’ve decided to make it slightly stricter and use 0.01 - I believe a 1% risk of an incorrect hypothesis is good enough, especially considering the trade-offs between sensitivity and false positives for this hypothesis testing

The chosen planning subzones (PLN_AREA_N) for comparison are: - Downtown Core (high density of all types of room, especially entire homes/apartments) - Jurong West (high density of private rooms) - Pasir Ris (small density of private and shared rooms, close to Changi Airport so might have insights) Sembawang will be excluded on account of it having few ‘entire home/apt’ marks as compared to private rooms, which leads to errors when visualising.

10.3.1 Downtown Cross-K Function

downtown_2019_Lcross.csr <- envelope(downtown_2019_marked_ppp, 
                                 Lcross, 
                                 i="Entire home/apt", 
                                 j="Private room", 
                                 correction="border", 
                                 nsim=999)

Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.

Done.

plot(downtown_2019_Lcross.csr, xlab="distance(m)", xlim=c(0,500))

downtown_2021_Lcross.csr <- envelope(downtown_2021_marked_ppp, 
                                 Lcross, 
                                 i="Entire home/apt", 
                                 j="Private room", 
                                 correction="border", 
                                 nsim=999)

Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.

Done.

plot(downtown_2021_Lcross.csr, xlab="distance(m)", xlim=c(0,500))

Here, entire homes/apartments and private rooms are statistically independent at a distance of 250-255 and 290 (estimated) meters (2019), and 0-25 (estimated) and 175 onwards (estimated) meters (2021). Outside of those instances, they are not statistically independent as the empirical k-cross line is outside of the envelope of the 99% confidence level, and for that scale, we reject the null hypothesis. It is worth noting that up till around 200, most of the observed values (the black line) are above our theoretical poisson process (the red dotted line) - but henceafter, it starts dropping, meeting the threshold for statistical independence - which is where we do not reject the null hypothesis. This could imply that the greater the distance, the greater the probability of statistical independence.

10.3.2 Jurong West Cross-K Function

jurong_west_2019_Lcross.csr <- envelope(jurong_west_2019_marked_ppp, 
                                 Lcross, 
                                 i="Entire home/apt", 
                                 j="Private room", 
                                 correction="border", 
                                 nsim=999)

Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.

Done.

plot(jurong_west_2019_Lcross.csr, xlab="distance(m)", xlim=c(0,500))

jurong_west_2021_Lcross.csr <- envelope(jurong_west_2021_marked_ppp, 
                                 Lcross, 
                                 i="Entire home/apt", 
                                 j="Private room", 
                                 correction="border", 
                                 nsim=999)

Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.

Done.

plot(jurong_west_2021_Lcross.csr, xlab="distance(m)", xlim=c(0,500))

Here, entire homes/apartments and private rooms are statistically independent at a distance of 0-15 (estimated) and 180-210 (estimated) meters (2019), and mostly inconclusive results for 2021, but outside of that, they are not statistically independent as the empirical k-cross line is outside of the envelope of the 99% confidence level. Once again, it seems that the larger the value, the more likely it is to fall within the envelope.

10.3.3 Pasir Ris Cross-K Function

pasir_ris_2019_Lcross.csr <- envelope(pasir_ris_2019_marked_ppp, 
                                 Lcross, 
                                 i="Entire home/apt", 
                                 j="Private room", 
                                 correction="border", 
                                 nsim=999)

Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.

Done.

plot(pasir_ris_2019_Lcross.csr, xlab="distance(m)", xlim=c(0,500))

pasir_ris_2021_Lcross.csr <- envelope(pasir_ris_2021_marked_ppp, 
                                 Lcross, 
                                 i="Entire home/apt", 
                                 j="Private room", 
                                 correction="border", 
                                 nsim=999)

Generating 999 simulations of CSR  ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.

Done.

plot(pasir_ris_2021_Lcross.csr, xlab="distance(m)", xlim=c(0,500))

Here, entire homes/apartments and private rooms are statistically independent at a distance of 0-40 (estimated) meters (2019), and for 2021 it is statisically independent (within the envelope) up till the envelope’s project. For most part (especially for the 2021 results) it seems statistically independent as the empirical k-cross line is within the envelope of 99% confidence level.

10.4 Conclusions

As we can see, entire homes/apartments and private rooms are found to be largely not spatially independent of each other especially at smaller distances, which lends weight to our earlier theory of a competitive or complementary relationship. However, we also must acknowledge that from the trends we’ve seen in our graph, larger distances seem to increase the probability of them being spatially independent.

As analysed in our Section A earlier, the tourist attractions, train services, hotels as well as majority of our self-sourced location factors (e.g. moneychangers) tend to be located in the dense areas of listings, regardless of the year. A possible reason for clustering, outside of the relationship between these room types, could be because said residences are located close to these factors (rather than because they’re located close to each other).

11.0 Acknowledgements

Once again, thank you Prof. Kam for our IS415 Geospatial Analytics and Applications course materials & resources 😄