This analysis aims to investigate the distribution of Airbnb listings and how location factors affect it.
The email that started Airbnb. Source
From a couple of air mattresses and a complimentary breakfast at $80/night to a billion-dollar international business, Airbnb stands as one of the greatest success stories of the sharing economy. Their peer-to-peer model of short-term home-sharing have found a foothold worldwide: from Washington DC to New Delhi, from Shanghai to Havana - convenient locations at low cost have garnered great support from hosts and travellers alike.
And how has Singapore accommodated this sector of the sharing revolution?
Turns out: it hasn’t. Short-term rentals of less than three months remain illegal in Singapore - 6 months if you’re renting out a HDB. In addition, there’s an occupancy limit for those looking to rent out their HDBs: 6 persons for a 3-room flat, and 9 persons for a 4-room flat or larger. An extensive public consultation spanning 4 years came back with the conclusion: you might be alright with renting out your home, but your neighbours aren’t. Security concerns, a loss of privacy, causing disturbances within the neighbourhood or even damaging common facilities are among the top concerns of private homeowners - and this sentiment comes across in the strict regulatory framework for renting out short-term accommodations.
Airbnb’s setbacks aren’t solely with government restrictions: with 2020 came the COVID-19 global pandemic, which caused significant upheaval in the lives of many - arguably, the lives of every. As this paper points out, Airbnb hosts were ‘spoiled by high returns and the prospect of the ever-increasing growth of the Airbnb market’ pre-COVID, only to be hit with cancelled bookings and overwhelming financial pressure that lead to (temporary) exodus from the short-term rental market, or transition into long-term rentals.
In the context of Singapore, how do these two play into Airbnb accommodations?
There are two parts to our analysis that we want to investigate, namely:
The R packages we’ll use for this analysis are:
ggplotly()
functionIn addition, the following tidyverse packages will be used:
# initialise a list of required packages
packages = c('sf', 'tidyverse', 'tmap', 'spatstat', 'raster', 'maptools', 'rgdal',
'kableExtra', 'plotly', 'ggthemes', 'onemapsgapi', 'devtools')
# for each package, check if installed and if not, install it
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
devtools::install_github("gadenbuie/xaringanExtra")
library(xaringanExtra)
# initialise a dataframe of our geospatial and aspatial dataset details
datasets <- data.frame(
Type=c("Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Geospatial",
"Aspatial",
"Aspatial",
"Aspatial",
"Aspatial"),
Name=c("Singapore National Boundary `CostalOutline`",
"Master Plan 2014 Subzone Boundary (Web) `MP14_SUBZONE_WEB_PL`",
"MRT & LRT Locations Aug 2021 `MRTLRTStnPtt`",
"Bus Stop Locations Aug 2021 `BusStop`",
"Taxi Stand Locations Aug 2021 `TaxiStop`",
"Historic Sites",
"Moneychanger Locations",
"Monuments",
"Museums",
"Singapore Airbnb Listings June 2019",
"Singapore Airbnb Listings June 2021",
"Tourism",
"Hotels"),
Format=c(".shp",
".shp",
".shp",
".shp",
".shp",
".kml",
".kml",
".kml",
".kml",
".csv",
".csv",
".csv",
".csv"),
Source=c("[data.gov.sg](https://data.gov.sg/dataset/national-map-polygon)",
"[data.gov.sg](https://data.gov.sg/dataset/master-plan-2014-subzone-boundary-web)",
"[LTA Data Mall](https://datamall.lta.gov.sg/content/datamall/en/search_datasets.html?searchText=mrt)",
"[LTA Data Mall](https://datamall.lta.gov.sg/content/datamall/en/search_datasets.html?searchText=bus%20stop)",
"[LTA Data Mall](https://datamall.lta.gov.sg/content/datamall/en/search_datasets.html?searchText=taxi)",
"[data.gov.sg](https://data.gov.sg/dataset/historic-sites)",
"[data.gov.sg](https://data.gov.sg/dataset/locations-of-money-changer)",
"[data.gov.sg](https://data.gov.sg/dataset/monuments)",
"[data.gov.sg](https://data.gov.sg/dataset/museums)",
"[Inside Airbnb](http://insideairbnb.com/get-the-data.html)",
"[Inside Airbnb](http://insideairbnb.com/get-the-data.html)",
"[OneMap API](https://www.onemap.gov.sg/docs/)",
"[OneMap API](https://www.onemap.gov.sg/docs/)")
)
# with reference to this guide on kableExtra:
# https://cran.r-project.org/web/packages/kableExtra/vignettes/awesome_table_in_html.html
# kable_material is the name of the kable theme
# 'hover' for to highlight row when hovering, 'scale_down' to adjust table to fit page width
library(knitr)
library(kableExtra)
kable(datasets, caption="Datasets Used") %>%
kable_material("hover", latex_options="scale_down")
Type | Name | Format | Source |
---|---|---|---|
Geospatial |
Singapore National Boundary CostalOutline
|
.shp | data.gov.sg |
Geospatial |
Master Plan 2014 Subzone Boundary (Web) MP14_SUBZONE_WEB_PL
|
.shp | data.gov.sg |
Geospatial |
MRT & LRT Locations Aug 2021 MRTLRTStnPtt
|
.shp | LTA Data Mall |
Geospatial |
Bus Stop Locations Aug 2021 BusStop
|
.shp | LTA Data Mall |
Geospatial |
Taxi Stand Locations Aug 2021 TaxiStop
|
.shp | LTA Data Mall |
Geospatial | Historic Sites | .kml | data.gov.sg |
Geospatial | Moneychanger Locations | .kml | data.gov.sg |
Geospatial | Monuments | .kml | data.gov.sg |
Geospatial | Museums | .kml | data.gov.sg |
Aspatial | Singapore Airbnb Listings June 2019 | .csv | Inside Airbnb |
Aspatial | Singapore Airbnb Listings June 2021 | .csv | Inside Airbnb |
Aspatial | Tourism | .csv | OneMap API |
Aspatial | Hotels | .csv | OneMap API |
Each source links to the respective dataset source, not the generic website - you can download and follow along 👍
Due to authorisation issues with the OneMap API, I chose to download the datasets from either Data.gov.sg or LTA Data Mall instead; However, they are all available as themes on OneMap. The list of available themes from the OneMap API was referenced from this document
In order to supplement the rail network of MRTs and LRTs, I’ve included bus stop and taxi stand locations - both of which contribute to our public transport, which, as this paper points out, adds to the total tourist experience (Duval, 2007) and may influence tourist satisfaction with the destination (Thompson & Schofield, 2007).
To encourage tourists to use public transport, a joint effort between the Land Transport Authority (LTA) and the Singapore Tourism Board (STB) lead to the official launch of the ‘Singapore Tourist Pass’ and a Public Transport Guide for Tourists (The “Travel with Ease” Guide). The former allows for unlimited bus and train rides for S$8/day, making public transport both cost-effective and convenient for tourists - the latter presents tourists with useful information on the various modes of public transport and gives travel directions to various places of interest. This is one of the contributors to Singapore taking top spot in McKinsey’s 2018 “Cities with Best Public Tranportation” Survey.
As such, I have reason to believe accessibility to the public transport system is a contributing factor to the attractiveness of Airbnb listings - and as we know, public transport extends past just the MRTs and LRTs! Thus my inclusion of bus stops (bus services are included as part of the Singapore Tourist Pass) and taxi stands (taxis are part of public transport, just not mass transport - and taxis are the fastest and most direct method to get to specific sightseeing areas).
Retrieved from Science Direct, Using Historical Heritage as a Factor in Tourism Development
In addition, historical and cultural objects, such as museums or monuments, are one of the main factors of tourism. As this paper, Using Historical Heritage as a Factor in Tourism Development puts it: the cultural capacity of the region, expressed through its historical heritage, plays a huge role in the development of tourism - something I believe could be showcased through this analysis. Thus my inclusion of museums, monuments and historical sites - some of which might overlap with the existing ‘tourism’ aspatial dataset, but we might be able to better determine the relationships
The inclusion of moneychangers is self-explanatory: at any point of the tourist’s journey, unexpected incidents that require extra funds might crop up, necessitating either using their credit card (which is known to incur significant fees, and might not be accepted in certain parts of a country or even the whole country) or a trip to the moneychanger.
Here’s a lil refresher on the import methods:
Note: we could use either st_read() or readOGR() to read our geospatial data, but I prefer to read it in as a simple features dataframe first and then use as_Spatial() to convert it to a Spatial* object when necessary. This way, I can perform visualisations on both the simple features dataframe and Spatial* objects!
In addition, since we have .kml files - recall what we learned in our very first exercise, Hands-On Exercise 02:
# reads in geospatial data and stores into respective dataframes
sg_sf <- st_read(dsn = "data/geospatial", layer="CostalOutline")
Reading layer `CostalOutline' from data source
`C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 60 features and 4 fields
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 2663.926 ymin: 16357.98 xmax: 56047.79 ymax: 50244.03
Projected CRS: SVY21
mpsz_sf <- st_read(dsn = "data/geospatial", layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
rail_network_sf <- st_read(dsn="data/geospatial", layer="MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source
`C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 171 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
# self-sourced data for further visualisations
bus_sf <- st_read(dsn="data/geospatial/selfsourced", layer="BusStop")
Reading layer `BusStop' from data source
`C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced'
using driver `ESRI Shapefile'
Simple feature collection with 5156 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 4427.938 ymin: 26482.1 xmax: 48282.5 ymax: 52983.82
Projected CRS: SVY21
taxi_sf <- st_read(dsn="data/geospatial/selfsourced", layer="TaxiStop")
Reading layer `TaxiStop' from data source
`C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced'
using driver `ESRI Shapefile'
Simple feature collection with 354 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6151.732 ymin: 27461.92 xmax: 45468.51 ymax: 48948.45
Projected CRS: SVY21
museums_sf <- st_read("data/geospatial/selfsourced/museums-kml.kml")
Reading layer `MUSEUM' from data source
`C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced\museums-kml.kml'
using driver `KML'
Simple feature collection with 54 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6768 ymin: 1.248349 xmax: 104.0165 ymax: 1.383068
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
monuments_sf <- st_read("data/geospatial/selfsourced/monuments-kml.kml")
Reading layer `MONUMENTS' from data source
`C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced\monuments-kml.kml'
using driver `KML'
Simple feature collection with 72 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6831 ymin: 1.264685 xmax: 103.9696 ymax: 1.447378
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
moneychangers_sf <- st_read("data/geospatial/selfsourced/locations-of-money-changer-kml.kml")
Reading layer `MONEYCHANGER' from data source
`C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced\locations-of-money-changer-kml.kml'
using driver `KML'
Simple feature collection with 313 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6781 ymin: 1.24506 xmax: 103.9897 ymax: 1.448227
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
historic_sites_sf <- st_read("data/geospatial/selfsourced/historic-sites-kml.kml")
Reading layer `HISTORICSITES' from data source
`C:\IS415\IS415_msty\_posts\2021-09-19-take-home-exercise-2\data\geospatial\selfsourced\historic-sites-kml.kml'
using driver `KML'
Simple feature collection with 99 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6993 ymin: 1.252663 xmax: 103.9906 ymax: 1.448855
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
Overall, most of the projected CRS are the ‘Singapore Projected Coordinate system’, SVY21 (ESPG Code 3414), whichi s appropriate for our analysis (which is Singapore-centric). However, the museums
, monuments
, moneychangers
and historic_sites
are using the ‘World Geodetic System 1984’, WGS84. We’ll address this and check on their CRS with st_crs() later on in Section 3.3.
Also, notice that museums
, monuments
, moneychangers
and historic_sites
have their dimensions listed as ‘XYZ’: it has a z-dimension, though as we can see from the z_range, both zmin and zmax are at 0. As it is irrelevant to our analysis, we’ll drop this with st_zm() in our pre-processing.
Before we even start visualising our data, we have to first check for two things: invalid geometries and missing values, which could impact future calculations and representations if not addressed. In addition, we have to drop the z-dimension from some of our dataframes.
Reference was taken from the senior sample submissions for the code for this section, with credit to Xiao Rong Wong’s Geographic Analysis of the Supply & Demand of Childcare Services in Singapore
As we noticed in the section above, certain dataframes have a Z-dimension. We’ll take care of with st_zm(), a function that drops Z (or M) dimensions from feature geometries and appropriate reset the classes.
# drops the Z-dimension from our dataframes
# due to the length of the output, I've opted to hide the results
# reference for manipulating output messages: https://yihui.org/knitr/demo/output/
museums_sf <- st_zm(museums_sf)
monuments_sf <- st_zm(monuments_sf)
moneychangers_sf <- st_zm(moneychangers_sf)
historic_sites_sf <- st_zm(historic_sites_sf)
# once again, due to the length of output, I've opted to leave this as a non-evaluated line of code
# however, I've included a screenshot of the first portion of the output!
museums
Let us first check for invalid geometries:
# function breakdown:
# the st_is_valid function checks whether a geometry is valid
# which returns the indices of certain values based on logical conditions
# length returns the length of data objects
# checks for the number of geometries that are NOT valid
length(which(st_is_valid(sg_sf) == FALSE))
[1] 1
[1] 9
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
# Alternative Method
# test <- st_is_valid(sg_sf,reason=TRUE)
# length(which(test!= "Valid Geometry"))
# credit to Rajiv Abraham Xavier https://rpubs.com/rax/Take_Home_Ex01
Note: even if you didn’t drop the Z co-ordinate for this section, our st_is_valid() function will automatically do it for you, like so:
As we can see from the output, sg
has 1 invalid geometry while mpsz
has 9 invalid geometries. With reference to this article on checking and creating validity, let’s address them and check again:
# st_make_valid takes in an invalid geometry and outputs a valid one with the lwgeom_makevalid method
sg_sf <- st_make_valid(sg_sf)
length(which(st_is_valid(sg_sf) == FALSE))
[1] 0
[1] 0
Success! 🎉
Now, let us check for missing values:
# the rowSums(is.na(sg_sf))!=0 checks every row if there are NA values, returning TRUE or FALSE
# the sg_sf [] 'wrapper' prints said rows that contain NA values
sg_sf[rowSums(is.na(sg_sf))!=0,]
Simple feature collection with 0 features and 4 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] GDO_GID MSLINK MAPID COSTAL_NAM geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 15 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
[7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
[13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 22616.75 ymin: 47793.68 xmax: 22616.75 ymax: 47793.68
Projected CRS: SVY21
BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
264 47201 UNK <NA> POINT (22616.75 47793.68)
Simple feature collection with 354 features and 2 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6151.732 ymin: 27461.92 xmax: 45468.51 ymax: 48948.45
Projected CRS: SVY21
First 10 features:
TYPE_CD TYPE_CD_DE geometry
1 <NA> TAXI STOP POINT (30571.45 31254.88)
2 <NA> TAXI STOP POINT (29420.41 30957.16)
3 <NA> TAXI PICK UP / DROP OFF POINT (30600.66 30530.26)
4 <NA> TAXI STAND POINT (30952.44 30436.43)
5 <NA> TAXI STAND POINT (30819.86 30858.43)
6 <NA> TAXI STAND POINT (30914.61 30817.98)
7 <NA> TAXI STAND POINT (30732.65 30446.24)
8 <NA> TAXI STAND POINT (31035.63 30580.66)
9 <NA> TAXI STAND POINT (30556.68 30247.45)
10 <NA> TAXI STAND POINT (30629.45 30577.2)
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
[1] Name Description geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
[1] Name Description geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
[1] Name Description geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
[1] Name Description geometry
<0 rows> (or 0-length row.names)
There doesn’t seem to be any missing values for any of our provided geospatial datasets ✨ But our self-sourced ones have some missing values, so let’s address them. After a bit of EDA and testing, I’ve realised that all the observations for TYPE_CD
in the taxi dataframe are NA: likely because there is a separate column, TYPE_CD_DE
, that describes the taxi stand type (i.e. overlap of features). As such, we should drop that column first before re-testing for NA values.
# using select(), removes the TYPE_CD column from taxi_sf dataframe
# and assigns this new dataframe back to taxi
# reference: https://www.statology.org/remove-columns-in-r/
taxi_sf <- taxi_sf %>% select(-TYPE_CD)
# checking for NA values again
taxi_sf[rowSums(is.na(taxi_sf))!=0,]
Simple feature collection with 2 features and 1 field
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6151.732 ymin: 35873.48 xmax: 6199.635 ymax: 35938.87
Projected CRS: SVY21
TYPE_CD_DE geometry
325 <NA> POINT (6199.635 35938.87)
342 <NA> POINT (6151.732 35873.48)
Now, let’s move on to removing the NA observations:
And let’s check for missing values one last time, just to be sure:
# the rowSums(is.na(bus))!=0 checks every row if there are NA values, returning TRUE or FALSE
# the bus 'wrapper' prints said rows that contain NA values
bus_sf[rowSums(is.na(bus_sf))!=0,]
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 1 field
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] TYPE_CD_DE geometry
<0 rows> (or 0-length row.names)
Alright, our geospatial data pre-processing is done! 🥳
When we imported the data, we made a mental note to verify the projected CRS - and we’ll do that now!
# using st_crs() function to check on the CRS and ESPG Codes
st_crs(sg_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(mpsz_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(rail_network_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(bus_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(taxi_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(museums_sf)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(monuments_sf)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(moneychangers_sf)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(historic_sites_sf)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
Hmm. That’s not right - our projected CRS should be SVY21 (ESPG Code 3414), but for our given data, the current ESPG Codes are 9001. In addition, some of our self-sourced datasets are in WG84 (ESPG Code 4326) as well. Let’s assign the correct ESPG Codes:
# with st_set_crs(), we can assign the appropriate ESPG Code
sg_sf <- st_set_crs(sg_sf, 3414)
mpsz_sf <- st_set_crs(mpsz_sf, 3414)
rail_network_sf <- st_set_crs(rail_network_sf, 3414)
bus_sf <- st_set_crs(bus_sf, 3414)
taxi_sf <- st_set_crs(taxi_sf, 3414)
# with st_transform(), we can change from one CRS to another
museums_sf <- st_transform(museums_sf, crs=3414)
monuments_sf <- st_transform(monuments_sf, crs=3414)
moneychangers_sf <- st_transform(moneychangers_sf, crs=3414)
historic_sites_sf <- st_transform(historic_sites_sf, crs=3414)
And now, let’s check if the CRS has been properly assigned:
# using st_crs() function to check on the CRS and ESPG Codes
st_crs(sg_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(mpsz_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(rail_network_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(bus_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(taxi_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(museums_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(monuments_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(moneychangers_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(historic_sites_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
Whew, that’s a long output - but as we can see, it’s all good! 💡
Now that we’ve finished our standard pre-processing, let’s try visualising our data:
# plots the geometry only - these are the 'base' maps
# alternatively, we can use plot(sg$geometry)
plot(st_geometry(sg_sf))
plot(st_geometry(mpsz_sf))
The main difference between sg
and mpsz
is that the former is a nationwide map, while the latter shows the subzones. Whichever we use depends on the scale of analysis: we might use sg
for a general overview, while we’ll tap on the subzone divisions in mpsz
if we want to look into specific subzones.
# switching to interactive map for better visualisation and to explore specific areas if needed
tmap_mode("view")
tm_shape(rail_network_sf) +
tm_dots(col="purple", size=0.05)
# return tmap mode to plot for future visualisations
tmap_mode("plot")
These are the MRT/LRT train stations. There are 3 distinct clusters - the two in the upper area, (West and Northeast regions specifically) are due to the LRT lines, while the cluster in the Central region is due interconnections of different lines in the CBD and city areas for increased connectivity
This also matches up to the existing railway network map:
Retrieved from MRT.SG. Original work by Wikipedia user Seloloving.
Let’s also visualise our self-sourced Bus Stops & Taxi Stands:
tmap_mode("plot")
tm_shape(mpsz_sf) +
tm_borders(alpha = 0.5) +
tmap_options(check.and.fix = TRUE) +
tm_shape(bus_sf) +
tm_dots(col="red", size=0.05) +
tm_layout(main.title = "Bus Stops",
main.title.position = "center",
main.title.size = 1.2,
frame = TRUE)
tmap_mode("plot")
tm_shape(mpsz_sf) +
tm_borders(alpha = 0.5) +
tmap_options(check.and.fix = TRUE) +
tm_shape(taxi_sf) +
tm_dots(col="blue", size=0.05) +
tm_layout(main.title = "Taxi Stands",
main.title.position = "center",
main.title.size = 1.2,
frame = TRUE)
These will be good supplements to our railway_network
- we can look at how the transport system and general connectivity of a place could play a role in the distribution of airbnb listings, especially since many tourists depend on the public transport system to get around a country due to its accessibility and cost, characteristics that are amplified in Singapore’s limited 728.6 km².
Let’s also take a look at the historical/cultural points of interest, namely museums, monuments, moneychangers and historical sites. As mentioned in 2.2’s Justification of Datasets used, tourists visit a country for a litany of reasons, but the top motivations are to understand the history and culture of the country, while moneychangers are crucial to their funds.
For this interactive graph, ‘warm’ colours (red, orange, yellow) represents places of historical/cultural interest (museums, monuments and historical sites) while ‘cool’ colours (blue) represents the moneychangers.
tmap_mode("view")
tm_shape(museums_sf) +
tm_dots(alpha=0.4, #affects transparency of points
col="red",
size=0.05) +
tm_shape(monuments_sf) +
tm_dots(alpha=0.4,
col="orange",
size=0.05) +
tm_shape(moneychangers_sf) +
tm_dots(alpha=0.4,
col="blue",
size=0.05) +
tm_shape(historic_sites_sf) +
tm_dots(alpha=0.4,
col="yellow",
size=0.05)
# return tmap mode to plot for future visualisations
tmap_mode("plot")
Now that we have our geospatial data sorted out, let’s move on to our aspatial data! We’ll need to import the data first. You might remember that in our last take-home exercise, we discovered the existence of duplicate columns when performing EDA. It’s important to understand what we’re working with and to check for any discrepancies! As such, after importing, let’s take a look at our dataframes with glimpse():
# reads the the respective aspatial data and stores it as dataframes
# I've included show_col_types=FALSE so that there won't be a output of column types
# since in the next portion, we'll be using glimpse() to look at columns specifically
listings_2019 <- read_csv("data/aspatial/listings_30062019.csv", show_col_types=FALSE)
listings_2021 <- read_csv("data/aspatial/listings_29062021.csv", show_col_types=FALSE)
hotels <- read_csv("data/aspatial/hotels.csv", show_col_types=FALSE)
tourism <- read_csv("data/aspatial/tourism.csv", show_col_types=FALSE)
# gives an associated attribute information of the dataframe
glimpse(listings_2019)
Rows: 8,293
Columns: 16
$ id <dbl> 49091, 50646, 56334, 71609, 7~
$ name <chr> "COZICOMFORT LONG TERM STAY R~
$ host_id <dbl> 266763, 227796, 266763, 36704~
$ host_name <chr> "Francesca", "Sujatha", "Fran~
$ neighbourhood_group <chr> "North Region", "Central Regi~
$ neighbourhood <chr> "Woodlands", "Bukit Timah", "~
$ latitude <dbl> 1.44255, 1.33235, 1.44246, 1.~
$ longitude <dbl> 103.7958, 103.7852, 103.7967,~
$ room_type <chr> "Private room", "Private room~
$ price <dbl> 81, 80, 68, 200, 92, 102, 203~
$ minimum_nights <dbl> 180, 90, 6, 1, 1, 1, 1, 7, 30~
$ number_of_reviews <dbl> 1, 18, 20, 12, 20, 35, 23, 15~
$ last_review <date> 2013-10-21, 2014-12-26, 2015~
$ reviews_per_month <dbl> 0.01, 0.28, 0.21, 0.13, 0.21,~
$ calculated_host_listings_count <dbl> 2, 1, 2, 9, 9, 9, 9, 1, 4, 4,~
$ availability_365 <dbl> 365, 365, 365, 353, 353, 348,~
glimpse(listings_2021)
Rows: 4,238
Columns: 16
$ id <dbl> 49091, 50646, 56334, 71609, 7~
$ name <chr> "COZICOMFORT LONG TERM STAY R~
$ host_id <dbl> 266763, 227796, 266763, 36704~
$ host_name <chr> "Francesca", "Sujatha", "Fran~
$ neighbourhood_group <chr> "North Region", "Central Regi~
$ neighbourhood <chr> "Woodlands", "Bukit Timah", "~
$ latitude <dbl> 1.44129, 1.33432, 1.44041, 1.~
$ longitude <dbl> 103.7957, 103.7852, 103.7967,~
$ room_type <chr> "Private room", "Private room~
$ price <dbl> 81, 80, 67, 177, 81, 81, 52, ~
$ minimum_nights <dbl> 180, 90, 6, 90, 90, 90, 14, 1~
$ number_of_reviews <dbl> 1, 18, 20, 20, 24, 48, 20, 13~
$ last_review <date> 2013-10-21, 2014-07-08, 2015~
$ reviews_per_month <dbl> 0.01, 0.22, 0.16, 0.29, 0.34,~
$ calculated_host_listings_count <dbl> 2, 1, 2, 4, 4, 4, 50, 50, 7, ~
$ availability_365 <dbl> 365, 365, 365, 365, 365, 365,~
glimpse(hotels)
Rows: 422
Columns: 9
$ NAME <chr> "Jayleen Clarke Quay Hotel", "JEN Singapor~
$ ADDRESSPOSTALCODE <dbl> 59390, 238858, 249716, 399041, 238485, 399~
$ ADDRESSSTREETNAME <chr> "25 New Bridge Road", "277 Orchard Road , ~
$ HYPERLINK <chr> "jayleenclarkequay@gmail.com", "singaporeo~
$ TOTALROOMS <dbl> 20, 499, 565, 42, 81, 33, 17, 634, 56, 451~
$ KEEPERNAME <chr> "James Federick Chong Kah Yean", "Kuok Oon~
$ Lat <dbl> 1.288715, 1.300510, 1.304271, 1.311586, 1.~
$ Lng <dbl> 103.8475, 103.8392, 103.8239, 103.8775, 10~
$ ICON_NAME <chr> "hotel.gif", "hotel.gif", "hotel.gif", "ho~
glimpse(tourism)
Rows: 107
Columns: 17
$ NAME <chr> "Chinatown Heritage Centre, Singapore", "T~
$ DESCRIPTION <chr> "Experience how Singapore’s early Chinese ~
$ ADDRESSSTREETNAME <chr> "48 Pagoda Street", "158 Telok Ayer Street~
$ HYPERLINK <chr> "http://www.singaporechinatown.com.sg/", "~
$ PHOTOURL <chr> "www.yoursingapore.com/content/dam/desktop~
$ URL_PATH <chr> "www.yoursingapore.com/en/see-do-singapore~
$ IMAGE_ALT_TEXT <chr> "Learn more about local Chinese culture at~
$ PHOTOCREDITS <chr> "Joel Chua DY", "Joel Chua DY", "Joel Chua~
$ LASTMODIFIED <dttm> 2015-11-02 02:16:52, 2015-11-02 02:20:57,~
$ LATITUDE <dbl> 1.283510, 1.280940, 1.310070, 1.277219, 1.~
$ LONGTITUDE <dbl> 103.8444, 103.8476, 103.8994, 103.8373, 10~
$ META_DESCRIPTION <chr> "At the Chinatown Heritage Centre, experie~
$ OPENING_HOURS <chr> "Daily, 9am – 8pm,Last entry at 7pm.*China~
$ Lat <dbl> 1.283510, 1.280940, 1.310070, 1.277219, 1.~
$ Lng <dbl> 103.8444, 103.8476, 103.8994, 103.8373, 10~
$ ICON_NAME <chr> "tourist_spot.gif", "tourist_spot.gif", "t~
$ ADDRESSPOSTALCODE <dbl> NA, NA, NA, 0, NA, NA, NA, NA, NA, NA, NA,~
With this, we can observe that all our data have Longitude and Latitude features - though some of them use Latitude
and Longtitude
while others use Lat
and Lng
. Also, notice that for tourism
, there are both Lat
&Lng
as well as Latitude
&Longtitude
- from the first few entries, they seem identical, but we’ll see which one we go with when checking for missing values.
This also indicates that their projected CRS is the World Geodetic System 1984 (WGS84), which is the most commonly used and favoured CRS. We’ll need to re-assign that to fit with our geospatial data, which is using SVY21.
Like with our geospatial data, we should check if there are missing values. We’re only concerned about the longitude and latitude specifically:
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 0
[1] 1
[1] 1
Note: PSST! If you’d like a way to show the number of NA values for columns with NA values only, I’d recommend checking this out.
As we can see, tourism
’s Lat vs LATITUDE issue can now be resolved. There seems to be one missing value in the LATITUDE/LONGITUDE features as opposed to the Lat/Lng one (which has no missing values), so we’ll use Lat/Lng for tourism
. Let’s also check the row with missing values:
tourism[(is.na(tourism$LONGTITUDE)),]
# A tibble: 1 x 17
NAME DESCRIPTION ADDRESSSTREETNAME HYPERLINK PHOTOURL URL_PATH
<chr> <chr> <chr> <chr> <chr> <chr>
1 Cruis~ Watch the Si~ <NA> <NA> www.your~ www.your~
# ... with 11 more variables: IMAGE_ALT_TEXT <chr>,
# PHOTOCREDITS <chr>, LASTMODIFIED <dttm>, LATITUDE <dbl>,
# LONGTITUDE <dbl>, META_DESCRIPTION <chr>, OPENING_HOURS <chr>,
# Lat <dbl>, Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>
tourism[(is.na(tourism$LATITUDE)),]
# A tibble: 1 x 17
NAME DESCRIPTION ADDRESSSTREETNAME HYPERLINK PHOTOURL URL_PATH
<chr> <chr> <chr> <chr> <chr> <chr>
1 Cruis~ Watch the Si~ <NA> <NA> www.your~ www.your~
# ... with 11 more variables: IMAGE_ALT_TEXT <chr>,
# PHOTOCREDITS <chr>, LASTMODIFIED <dttm>, LATITUDE <dbl>,
# LONGTITUDE <dbl>, META_DESCRIPTION <chr>, OPENING_HOURS <chr>,
# Lat <dbl>, Lng <dbl>, ICON_NAME <chr>, ADDRESSPOSTALCODE <dbl>
As we can see, the observation with missing latitude/longitude is ‘Cruises from Singapore’, which makes sense considering that a cruise ship would be inside Singapore waters, or outside of Singapore’s geographical boundaries - both of which would be hard (if not nigh impossible) to represent on a map representing Singapore’s landmass. As such, let’s remove that observation:
And we’ll check again:
That’s done and dusted! 😄
Now, we have to transform our dataframes into sf objects, and then verify and transform our assigned CRS for our aspatial datasets. In fact, we already know that (just like some of our geospatial datasets above) these datasets are using WGS84 (ESPG Code 4326) on account of using Latitude and Longitude - so we can do these two things in one go:
# you can chose to reuse the variable name as-is, especially if you have no plans to revisit older variables
# however, i believe it to be better practice to rename variables so that we know what type of data we're working with at any point of time!\
# st_as_sf outputs a simple features data frame
listings_2019_sf <- st_as_sf(listings_2019,
# our coordinates are the longitude and latitude
coords = c("longitude",
"latitude"),
# the geographical features are in longitude & latitude, in decimals
# as such, WGS84 is the most appropriate coordinates system
crs = 4326) %>%
#afterwards, we transform it to SVY21, our desired CRS
st_transform(crs=3414)
listings_2021_sf <- st_as_sf(listings_2021,
coords = c("longitude",
"latitude"),
crs = 4326) %>%
st_transform(crs=3414)
hotels_sf <- st_as_sf(hotels,
coords = c("Lng",
"Lat"),
crs = 4326) %>%
st_transform(crs=3414)
tourism_sf <- st_as_sf(tourism,
coords = c("Lng",
"Lat"),
crs = 4326) %>%
st_transform(crs=3414)
As we’re doing spatial points analysis, we’ll need to convert both our geospatial and aspaital data into the appropriate formats.
Firstly, we’ll need to convert the simple features data frames into Spatial* classes if we can to do SPPA on it:
mpsz <- as_Spatial(mpsz_sf)
sg <- as_Spatial(sg_sf)
rail_network <-as_Spatial(rail_network_sf)
bus <- as_Spatial(bus_sf)
taxi <- as_Spatial(taxi_sf)
museums <- as_Spatial(museums_sf)
monuments <- as_Spatial(monuments_sf)
moneychangers <- as_Spatial(moneychangers_sf)
historic_sites <- as_Spatial(historic_sites_sf)
listings_2019 <- as_Spatial(listings_2019_sf)
listings_2021 <- as_Spatial(listings_2021_sf)
hotels <- as_Spatial(hotels_sf)
tourism <- as_Spatial(tourism_sf)
Let’s check the newly-converted Spatial* classes!
mpsz
class : SpatialPolygonsDataFrame
features : 323
extent : 2667.538, 56396.44, 15748.72, 50256.33 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 15
names : OBJECTID, SUBZONE_NO, SUBZONE_N, SUBZONE_C, CA_IND, PLN_AREA_N, PLN_AREA_C, REGION_N, REGION_C, INC_CRC, FMEL_UPD_D, X_ADDR, Y_ADDR, SHAPE_Leng, SHAPE_Area
min values : 1, 1, ADMIRALTY, AMSZ01, N, ANG MO KIO, AM, CENTRAL REGION, CR, 00F5E30B5C9B7AD8, 16409, 5092.8949, 19579.069, 871.554887798, 39437.9352703
max values : 323, 17, YUNNAN, YSSZ09, Y, YISHUN, YS, WEST REGION, WR, FFCCF172717C2EAF, 16409, 50424.7923, 49552.7904, 68083.9364708, 69748298.792
sg
class : SpatialPolygonsDataFrame
features : 60
extent : 2663.926, 56047.79, 16357.98, 50244.03 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 4
names : GDO_GID, MSLINK, MAPID, COSTAL_NAM
min values : 1, 1, 0, ISLAND LINK
max values : 60, 67, 0, SINGAPORE - MAIN ISLAND
listings_2019
class : SpatialPointsDataFrame
features : 8293
extent : 7215.566, 44098.31, 25166.35, 49226.35 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 14
names : id, name, host_id, host_name, neighbourhood_group, neighbourhood, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365
min values : 49091, -, 23666, (Email hidden by Airbnb), Central Region, Ang Mo Kio, Entire home/apt, 0, 1, 0, 15656, 0.01, 1, 0
max values : 36053005, ZR2- NEW! Sunny & Modern Apt 4 mins to Orchard Rd, 271165196, Zuzana, West Region, Yishun, Shared room, 13999, 1000, 308, 18072, 12.09, 277, 365
listings_2021
class : SpatialPointsDataFrame
features : 4238
extent : 7406.989, 43337.89, 25330, 48391.55 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 14
names : id, name, host_id, host_name, neighbourhood_group, neighbourhood, room_type, price, minimum_nights, number_of_reviews, last_review, reviews_per_month, calculated_host_listings_count, availability_365
min values : 49091, ! BEST ! LOCATION private room at Central Orchard, 23666, <U+4F73><U+4F73>, Central Region, Ang Mo Kio, Entire home/apt, 0, 1, 0, 15456, 0.01, 1, 0
max values : 50718590, Zimmer nah der MRT (Downtown Line), 404826677, Zuzu, West Region, Yishun, Shared room, 10286, 1000, 370, 18806, 19.74, 165, 365
hotels
class : SpatialPointsDataFrame
features : 422
extent : 5939.241, 45334.18, 25379.44, 44562.4 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 7
names : NAME, ADDRESSPOSTALCODE, ADDRESSSTREETNAME, HYPERLINK, TOTALROOMS, KEEPERNAME, ICON_NAME
min values : 30 BENCOOLEN, 18956, 1 Bayfront Avenue, 96ytlim@gmail.com, 4, Adel Aramouni, hotel.gif
max values : YotelAir Singapore Changi Airport, 819666, 99 IRRAWADDY ROAD, # 22-00 ROYAL SQUARE AT NOVENA, zubair@dam.com.sg, 2561, Zhang YuanQing, hotel.gif
tourism
class : SpatialPointsDataFrame
features : 106
extent : 11380.23, 43659.54, 22869.34, 47596.73 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 15
names : NAME, DESCRIPTION, ADDRESSSTREETNAME, HYPERLINK, PHOTOURL, URL_PATH, IMAGE_ALT_TEXT, PHOTOCREDITS, LASTMODIFIED, LATITUDE, LONGTITUDE, META_DESCRIPTION, OPENING_HOURS, ICON_NAME, ADDRESSPOSTALCODE
min values : Adventure Cove Waterpark™ Singapore, A feat of engineering, an architectural statement and a sheer aesthetic triumph, Marina Bay Sands<sup>®</sup> has upped the ante for buildings in Singapore., 1 Beach Road, http://acm.org.sg/, /content/dam/desktop/global/see-do-singapore/architecture/hajjah-fatimah-mosque-carousel01-rect.jpg, www.yoursingapore.com/en/see-do-singapore/architecture/historical/capitol-building-singapore.html, Adults and kids of all ages who are not even science buffs will have fun at the Singapore Science Centre., ©Darren Soh/National Gallery, 1427691447.648, 1.2230965, 103.68398, A tranquil patch of imperial China in the west of Singapore is pleasant respite from the bustle of the city., 50th storey Skybridge, Daily, 9am –10pm, tourist_spot.gif, 0
max values : Victoria Theatre Singapore, With so many attractions packed into this 15-km stretch of beaches, you’ll never run out of things to do., Seng Poh Road and Tiong Bahru Road, https://www.pub.gov.sg/marinabarrage, www.yoursingapore.com/content/dam/desktop/global/see-do-singapore/recreation-leisure/universal-studios-singapore-carousel01-rect.jpg, www.yoursingapore.com/en/see-do-singapore/recreation-leisure/viewpoints/singapore-flyer.html, Whether you prefer water sports, rollerblading or cycling, find a myriad of things to do at East Coast Park, Singapore., Wildlife Reserves Singapore, 1446544541.364, 1.44672, 103.97403, With the Henderson Waves bridge, form meets function to stunning effect., Visits are by appointment only.Visitors must sign up in advance for heritage tours which fall on:Monday, 2pm – 3pm,Tuesday, 6.30pm – 7.30pm,Thursday, 10am – 11am,Saturday, 11am – 12pm, tourist_spot.gif, 0
rail_network
class : SpatialPointsDataFrame
features : 171
extent : 6138.311, 45254.86, 27555.06, 47854.2 (xmin, xmax, ymin, ymax)
crs : +proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs
variables : 3
names : OBJECTID, STN_NAME, STN_NO
min values : 1, ADMIRALTY MRT STATION, BP1
max values : 199, YISHUN MRT STATION, TE8
Due to the sheer length of output of our self-sourced data, I’ve opted to to make code evluation FALSE for this code chunk. Screenshots of the first section of the output are included instead!
# geospatial data (self-sourced)
bus
taxi
museums
monuments
moneychangers
historic_sites
spatstat requires the analytical data to be in ppp object form, but since there is no way to directly convert a Spatial* classes into ppp object, we’ll need to convert the Spatial* classes into a generic Spatial object first, then convert the generic sp object into ppp object form.
# convert into respective sp (in our case, either polygons or points)
mpsz_sp <- as(mpsz, "SpatialPolygons")
sg_sp <- as(sg, "SpatialPolygons")
rail_network_sp <-as(rail_network, "SpatialPoints")
listings_2019_sp <- as(listings_2019, "SpatialPoints")
listings_2021_sp <- as(listings_2021, "SpatialPoints")
hotels_sp <- as(hotels, "SpatialPoints")
tourism_sp <- as(tourism, "SpatialPoints")
bus_sp <- as(bus, "SpatialPoints")
taxi_sp <- as(taxi, "SpatialPoints")
museums_sp <- as(museums, "SpatialPoints")
monuments_sp <- as(monuments, "SpatialPoints")
moneychangers_sp <- as(moneychangers, "SpatialPoints")
historic_sites_sp <- as(historic_sites, "SpatialPoints")
Note that there is no way of coercing SpatialPolygons to ppp format - nor is there any need to. As such, we won’t be including our ‘base maps’, mpsz and sg.
# from sp object, convert into ppp format
listings_2019_ppp <- as(listings_2019_sp, "ppp")
listings_2021_ppp <- as(listings_2021_sp, "ppp")
rail_network_ppp <-as(rail_network_sp, "ppp")
hotels_ppp <- as(hotels_sp, "ppp")
tourism_ppp <- as(tourism_sp, "ppp")
bus_ppp <- as(bus_sp, "ppp")
taxi_ppp <- as(taxi_sp, "ppp")
museums_ppp <- as(museums_sp, "ppp")
monuments_ppp <- as(monuments_sp, "ppp")
moneychangers_ppp <- as(moneychangers_sp, "ppp")
historic_sites_ppp <- as(historic_sites_sp, "ppp")
Now, let’s check the newly-converted ppp objects with summary():
# from sp object, convert into ppp format
summary(rail_network_ppp)
Planar point pattern: 171 points
Average intensity 2.153565e-07 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [6138.31, 45254.86] x [27555.06, 47854.2] units
(39120 x 20300 units)
Window area = 794032000 square units
summary(listings_2019_ppp)
Planar point pattern: 8293 points
Average intensity 9.345289e-06 points per square unit
*Pattern contains duplicated points*
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [7215.57, 44098.31] x [25166.35, 49226.35] units
(36880 x 24060 units)
Window area = 887399000 square units
summary(listings_2021_ppp)
Planar point pattern: 4238 points
Average intensity 5.114513e-06 points per square unit
*Pattern contains duplicated points*
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [7406.99, 43337.89] x [25330, 48391.55] units
(35930 x 23060 units)
Window area = 828622000 square units
summary(hotels_ppp)
Planar point pattern: 422 points
Average intensity 5.58414e-07 points per square unit
*Pattern contains duplicated points*
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [5939.24, 45334.18] x [25379.44, 44562.4] units
(39390 x 19180 units)
Window area = 755712000 square units
summary(tourism_ppp)
Planar point pattern: 106 points
Average intensity 1.328016e-07 points per square unit
*Pattern contains duplicated points*
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [11380.23, 43659.54] x [22869.34, 47596.73] units
(32280 x 24730 units)
Window area = 798183000 square units
# self-sourced data for further visualisations
summary(bus_ppp)
Planar point pattern: 5155 points
Average intensity 4.435473e-06 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [4427.94, 48282.5] x [26482.1, 52983.82] units
(43850 x 26500 units)
Window area = 1162220000 square units
summary(taxi_ppp)
Planar point pattern: 352 points
Average intensity 4.198058e-07 points per square unit
*Pattern contains duplicated points*
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [6444.85, 45468.51] x [27461.92, 48948.45] units
(39020 x 21490 units)
Window area = 838483000 square units
summary(museums_ppp)
Planar point pattern: 54 points
Average intensity 9.588485e-08 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [10582.71, 48388.12] x [25661.57, 40558.26] units
(37810 x 14900 units)
Window area = 563176000 square units
summary(monuments_ppp)
Planar point pattern: 72 points
Average intensity 1.11782e-07 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [11285, 43169.72] x [27467.96, 47669.2] units
(31880 x 20200 units)
Window area = 644111000 square units
summary(moneychangers_ppp)
Planar point pattern: 313 points
Average intensity 4.017631e-07 points per square unit
*Pattern contains duplicated points*
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [10725.31, 45404.24] x [25297.94, 47763.05] units
(34680 x 22470 units)
Window area = 779066000 square units
summary(historic_sites_ppp)
Planar point pattern: 99 points
Average intensity 1.407695e-07 points per square unit
Coordinates are given to 3 decimal places
i.e. rounded to the nearest multiple of 0.001 units
Window: rectangle = [13080.65, 45498.71] x [26138.63, 47832.64] units
(32420 x 21690 units)
Window area = 703278000 square units
From our output, we can summarise:
Though we’ve found which ppp objects have duplicates, it’s always best to double-check and verify. We can use duplicated() for this:
# looks if there is any instance of duplicated points in the ppp object
# returns TRUE if there is even a sinngle set of duplicated points
# should be TRUE i.e. have duplicated
any(duplicated(listings_2019_ppp))
[1] TRUE
any(duplicated(listings_2021_ppp))
[1] TRUE
any(duplicated(hotels_ppp))
[1] TRUE
any(duplicated(tourism_ppp))
[1] TRUE
any(duplicated(taxi_ppp))
[1] TRUE
any(duplicated(moneychangers_ppp))
[1] TRUE
# should be FALSE i.e. no duplicates
any(duplicated(rail_network_ppp))
[1] FALSE
any(duplicated(bus_ppp))
[1] FALSE
any(duplicated(museums_ppp))
[1] FALSE
any(duplicated(monuments_ppp))
[1] FALSE
any(duplicated(historic_sites_ppp))
[1] FALSE
# we can also use multiplicity() to count the number of co-incidence points
# to get the number of locations with duplicated events, use this:
# sum(multiplicity(rail_network_ppp)>1)
Yup, this is congruent with what we discovered earlier. Our first instinct might be to delete said duplicated points - but deleting them, we might be ignoring repeat occurrences, which could be valuable data + insights. However, never fear - there’s a better method to handle these duplicated points called jittering, where we add a small perturbation to the duplicate points so that they do not occupy the exact same space.
listings_2019_ppp_jit <- rjitter(listings_2019_ppp,
retry=TRUE,
nsim=1,
drop=TRUE)
listings_2021_ppp_jit <- rjitter(listings_2021_ppp,
retry=TRUE,
nsim=1,
drop=TRUE)
hotels_ppp_jit <- rjitter(hotels_ppp,
retry=TRUE,
nsim=1,
drop=TRUE)
tourism_ppp_jit <- rjitter(tourism_ppp,
retry=TRUE,
nsim=1,
drop=TRUE)
taxi_ppp_jit <- rjitter(taxi_ppp,
retry=TRUE,
nsim=1,
drop=TRUE)
moneychangers_ppp_jit <- rjitter(moneychangers_ppp,
retry=TRUE,
nsim=1,
drop=TRUE)
any(duplicated(listings_2019_ppp_jit))
[1] FALSE
any(duplicated(listings_2021_ppp_jit))
[1] FALSE
any(duplicated(hotels_ppp_jit))
[1] FALSE
any(duplicated(tourism_ppp_jit))
[1] FALSE
any(duplicated(taxi_ppp_jit))
[1] FALSE
any(duplicated(moneychangers_ppp_jit))
[1] FALSE
Awesome! ✨
Usually, when analysing spatial point patterns, we’ll confine our analysis within a certain geographical area - such as the Singapore boundary. In spatstat, an object called owin is specially designed to represent this polygonal region. This is also why I mentioned in 5.3 that there was no need (nor existing functions) to convert our ‘base maps’ into ppp - because they’re meant to be converted into owin instead.
# using sg_sp as it is the CostalOutline of Singapore, i.e. the whole island's boundary
sg_owin <- as(sg_sp, "owin")
plot(sg_owin)
Now, we’ll extract the relevant events that are located within the Singapore:
rail_network_ppp_sg = rail_network_ppp[sg_owin]
listings_2019_ppp_sg = listings_2019_ppp_jit[sg_owin]
listings_2021_ppp_sg = listings_2021_ppp_jit[sg_owin]
hotels_ppp_sg = hotels_ppp_jit[sg_owin]
tourism_ppp_sg = tourism_ppp_jit[sg_owin]
bus_ppp_sg = bus_ppp[sg_owin]
taxi_ppp_sg = taxi_ppp_jit[sg_owin]
museums_ppp_sg = museums_ppp[sg_owin]
monuments_ppp_sg = monuments_ppp[sg_owin]
moneychangers_ppp_sg = moneychangers_ppp_jit[sg_owin]
historic_sites_ppp_sg = historic_sites_ppp[sg_owin]
Now, let’s take a look at the visualisation:
The graphs are a little cluttered at the moment, but we’ll slowly sort out the relationships in our analysis! Another thing to note - it seems that for most part, places of interest (tourism, hotels, moneychangers etc.) are centered around the Central Region. That’s worth looking into when we’re narrowing down areas for analysis!
As we learned from our Hands-On Ex 5B, we’ll need to use rescale() of the spatstat package, so as to convert meters (which SVY21 uses) into kilometers (our desired unit of measurement).
# essentially, we're telling the function 'new unit length is 1000'
# function will divide the old value by 1000 to obtain values expressed in kilometers
# while this section isn't using listings_2021 yet, we'll prep it with the rest of our data :^)
rail_network_ppp_sg_km <- rescale(rail_network_ppp_sg, 1000, 'km')
listings_2019_ppp_sg_km <- rescale(listings_2019_ppp_sg, 1000, 'km')
listings_2021_ppp_sg_km <- rescale(listings_2021_ppp_sg, 1000, 'km')
hotels_ppp_sg_km <- rescale(hotels_ppp_sg, 1000, 'km')
tourism_ppp_sg_km <- rescale(tourism_ppp_sg, 1000, 'km')
bus_ppp_sg_km <- rescale(bus_ppp_sg, 1000, 'km')
taxi_ppp_sg_km <- rescale(taxi_ppp_sg, 1000, 'km')
museums_ppp_sg_km <- rescale(museums_ppp_sg, 1000, 'km')
monuments_ppp_sg_km <- rescale(monuments_ppp_sg, 1000, 'km')
moneychangers_ppp_sg_km <- rescale(moneychangers_ppp_sg, 1000, 'km')
historic_sites_ppp_sg_km <- rescale(historic_sites_ppp_sg, 1000, 'km')
As we went through in Hands-On Ex 04, we have the choice of using either bw.ppl or bw.diggle - the former is recommended for patterns comprised primarily of tight clusters, while the latter is good for detecting a single tight cluster in the midst of random noise. You can find out more (with illustrated examples!) in this mapping guide.
I’ve decided to go with bw.diggle for my analysis: my justification being that from what I see of our prior visualisations, there is a significant cluster in the Central region, while other regions have a relatively more widespread distribution of location factors - as such, I do not believe that applying an algorithm more suited for patterns “comprised primarily of tight clusters” would be appropriate for our data.
# automatic bandwidth method: bw.diggle
# smoothing kernel: gaussian (default)
kde_rail_network_sg_bw <- density(rail_network_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_listings_2019_sg_bw <- density(listings_2019_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_listings_2021_sg_bw <- density(listings_2021_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_hotels_sg_bw <- density(hotels_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_tourism_sg_bw <- density(tourism_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_bus_sg_bw <- density(bus_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_taxi_sg_bw <- density(taxi_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_museums_sg_bw <- density(museums_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_monuments_sg_bw <- density(monuments_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_moneychangers_sg_bw <- density(moneychangers_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_historic_sites_sg_bw <- density(historic_sites_ppp_sg_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
plot(kde_listings_2019_sg_bw)
plot(kde_tourism_sg_bw)
Now, we have to convert our KDE outputs into RasterLayer objects. Since we can’t do that directly, we’ll need to convert them into a SpatialGridDataFrame first, then convert the SpatialGridDataFrame into RasterLayer objects:
# convert into a SpatialGridDataFrame
# then converts gridded output into raster
kde_listings_2019_sg_bw_raster <- kde_listings_2019_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_listings_2021_sg_bw_raster <- kde_listings_2021_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_rail_network_sg_bw_raster <- kde_rail_network_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_hotels_sg_bw_raster <- kde_hotels_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_tourism_sg_bw_raster <- kde_tourism_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
# self-sourced
kde_bus_sg_bw_raster <- kde_bus_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_taxi_sg_bw_raster <- kde_taxi_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_museums_sg_bw_raster <- kde_museums_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_monuments_sg_bw_raster <- kde_monuments_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_moneychangers_sg_bw_raster <- kde_moneychangers_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_historic_sites_sg_bw_raster <- kde_historic_sites_sg_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
Now, let’s check the properties of one of our RasterLayers:
kde_listings_2019_sg_bw_raster
class : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.4170614, 0.2647348 (x, y)
extent : 2.663926, 56.04779, 16.35798, 50.24403 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : -2.144823e-13, 1766.134 (min, max)
Note that the crs
property is NA - let’s do something about that. We’ll assign projection systems again:
# assigns CRS as ESPG Code 3414 aka the SVY21 projection system
# while also specifying the units as km, since we rescaled it in the earlier section
projection(kde_listings_2019_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_listings_2021_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_rail_network_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_hotels_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_tourism_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_bus_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_taxi_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_museums_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_monuments_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_moneychangers_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_historic_sites_sg_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
Now, as the assignment requirements has specified, we should plot our kernel density maps on OpenStreetMap, aiming to describe the spatial patterns revealed as well as highlight the advantage of kernel density map over point map. Since we’ll be plotting a lot of kernel density maps, let’s create a function:
density_map <- function(raster_object, map_title) {
tm_basemap("OpenStreetMap") +
tm_shape(raster_object) +
tm_raster("v", alpha=0.65) +
tm_layout(legend.position = c("right", "bottom"),
legend.height = 0.5,
legend.width = 0.4,
main.title = map_title,
main.title.position = 'center',
main.title.size = 1,
frame = FALSE)
}
Now, let’s plot!
# input: a raster object and the map title
listings_2019_density_map <- density_map(kde_listings_2019_sg_bw_raster, map_title = "June 2019 SG Airbnb Listings Density Map")
listings_2021_density_map <- density_map(kde_listings_2021_sg_bw_raster, map_title = "June 2021 SG Airbnb Listings Density Map")
rail_network_density_map <- density_map(kde_rail_network_sg_bw_raster, map_title = "SG Rail Network Density Map")
hotels_density_map <- density_map(kde_hotels_sg_bw_raster, map_title = "SG Hotels Density Map")
tourism_density_map <- density_map(kde_tourism_sg_bw_raster, map_title = "SG Tourist Attractions Density Map")
#self-sourced
bus_density_map <- density_map(kde_bus_sg_bw_raster, map_title = "SG Bus Stops Density Map")
taxi_density_map <- density_map(kde_taxi_sg_bw_raster, map_title = "SG Taxi Stands Density Map")
museums_density_map <- density_map(kde_museums_sg_bw_raster, map_title = "SG Museums Density Map")
monuments_density_map <- density_map(kde_monuments_sg_bw_raster, map_title = "SG Monuments Density Map")
moneychangers_density_map <- density_map(kde_moneychangers_sg_bw_raster, map_title = "SG Moneychangers Density Map")
historic_sites_density_map <- density_map(kde_historic_sites_sg_bw_raster, map_title = "SG Historic Sites Density Map")
listings_2019_density_map
hotels_density_map
tourism_density_map
rail_network_density_map
tmap_arrange(rail_network_density_map,
bus_density_map,
taxi_density_map)
tmap_arrange(museums_density_map,
monuments_density_map,
moneychangers_density_map,
historic_sites_density_map)
As we can see from our maps above, the airbnb listings in 2019 seem to be heavily concentrated in the central region - a trend that is mirrored in the ‘hotels’ and ‘tourism’ maps. This pattern also seems to be present in our self-sourced maps - which reflects the real world, where Singapore’s Central/Downtown Area is a hotspot for various points of interest for tourists, and thus related facilities (hotels + moneychangers) would be present as well.
In addition, you might be wondering, “why use a kernel density map over the point maps?”
There are a few advantages that Kernel Density Map has over Point Map. Let’s compare our the point map for the rail network (MRT & LRT stations) against our just-plotted density map:
tmap_mode("plot")
tm_shape(mpsz_sf) +
tm_borders(alpha = 0.5) +
tmap_options(check.and.fix = TRUE) +
tm_shape(rail_network_sf) +
tm_dots(col="red", size=0.05) +
tm_layout(main.title = "Rail Network Point Map",
main.title.position = "center",
main.title.size = 1.2,
frame = TRUE)
rail_network_density_map
With the kernel density map, denser areas with a heavier distribution of MRTs are easily spotted - such as the LRT-heavy areas as well as the Central region. This is because the kernel density z-estimate helps to smooth out the points in a given area - a much larger ‘smoothing effect’ than point density, which in turn is helped by the visualisation that kernel density maps have: its range for distributions.concentrations is clearly indicated by the gradient of colour - in my case, ranging from yellow to green for the different bands. Compare this against the individual points in a point map, which at most shows concentration of certain areas, but makes it hard for the viewer to comprehensively compare the distributions of different regions.
From our KDE maps above, we’ve learned that the Central Region has a high distribution of listings and various location factors. However, we can’t glean information about the region itself from our maps, so let’s narrow down on the area and get to making new ones!
Now, we should use our planning subzone dataset, specifying that the REGION_N
should be CENTRAL REGION
. We can also choose to narrow down on specific planning areas with PLN_AREA_N
:
central = mpsz_sf[mpsz_sf$REGION_N=='CENTRAL REGION',]
central_owin <- central %>%
as('Spatial') %>%
as('SpatialPolygons') %>%
as('owin')
plot(central_owin)
Now, we’ll extract the relevant events that are located within the central region:
rail_network_ppp_central = rail_network_ppp[central_owin]
listings_2019_ppp_central = listings_2019_ppp_jit[central_owin]
listings_2021_ppp_central = listings_2021_ppp_jit[central_owin]
hotels_ppp_central = hotels_ppp_jit[central_owin]
tourism_ppp_central = tourism_ppp_jit[central_owin]
bus_ppp_central = bus_ppp[central_owin]
taxi_ppp_central = taxi_ppp_jit[central_owin]
museums_ppp_central = museums_ppp[central_owin]
monuments_ppp_central = monuments_ppp[central_owin]
moneychangers_ppp_central = moneychangers_ppp_jit[central_owin]
historic_sites_ppp_central = historic_sites_ppp[central_owin]
Once again, we’ll need to convert meters (which SVY21 uses) into kilometers (our desired unit of measurement).
# essentially, we're telling the function 'new unit length is 1000'
# function will divide the old value by 1000 to obtain values expressed in kilometers
# while this section isn't using listings_2021 yet, we'll prep it with the rest of our data :^)
rail_network_ppp_central_km <- rescale(rail_network_ppp_central, 1000, 'km')
listings_2019_ppp_central_km <- rescale(listings_2019_ppp_central, 1000, 'km')
listings_2021_ppp_central_km <- rescale(listings_2021_ppp_central, 1000, 'km')
hotels_ppp_central_km <- rescale(hotels_ppp_central, 1000, 'km')
tourism_ppp_central_km <- rescale(tourism_ppp_central, 1000, 'km')
bus_ppp_central_km <- rescale(bus_ppp_central, 1000, 'km')
taxi_ppp_central_km <- rescale(taxi_ppp_central, 1000, 'km')
museums_ppp_central_km <- rescale(museums_ppp_central, 1000, 'km')
monuments_ppp_central_km <- rescale(monuments_ppp_central, 1000, 'km')
moneychangers_ppp_central_km <- rescale(moneychangers_ppp_central, 1000, 'km')
historic_sites_ppp_central_km <- rescale(historic_sites_ppp_central, 1000, 'km')
# automatic bandwidth method: bw.diggle
# smoothing kernel: gaussian (default)
kde_rail_network_central_bw <- density(rail_network_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_listings_2019_central_bw <- density(listings_2019_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_listings_2021_central_bw <- density(listings_2021_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_hotels_central_bw <- density(hotels_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_tourism_central_bw <- density(tourism_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_bus_central_bw <- density(bus_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_taxi_central_bw <- density(taxi_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_museums_central_bw <- density(museums_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_monuments_central_bw <- density(monuments_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_moneychangers_central_bw <- density(moneychangers_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
kde_historic_sites_central_bw <- density(historic_sites_ppp_central_km,
sigma=bw.diggle,
edge=TRUE,
kernel="gaussian")
plot(kde_listings_2019_central_bw)
plot(kde_tourism_central_bw)
Now, we have to convert our KDE outputs into RasterLayer objects. Since we can’t do that directly, we’ll need to convert them into a SpatialGridDataFrame first, then convert the SpatialGridDataFrame into RasterLayer objects:
# convert into a SpatialGridDataFrame
# then converts gridded output into raster
kde_listings_2019_central_bw_raster <- kde_listings_2019_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_listings_2021_central_bw_raster <- kde_listings_2021_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_rail_network_central_bw_raster <- kde_rail_network_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_hotels_central_bw_raster <- kde_hotels_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_tourism_central_bw_raster <- kde_tourism_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
# self-sourced
kde_bus_central_bw_raster <- kde_bus_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_taxi_central_bw_raster <- kde_taxi_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_museums_central_bw_raster <- kde_museums_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_monuments_central_bw_raster <- kde_monuments_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_moneychangers_central_bw_raster <- kde_moneychangers_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
kde_historic_sites_central_bw_raster <- kde_historic_sites_central_bw %>%
as.SpatialGridDataFrame.im() %>%
raster()
Now, let’s check the properties of one of our RasterLayers:
kde_listings_2019_central_bw_raster
class : RasterLayer
dimensions : 128, 128, 16384 (nrow, ncol, ncell)
resolution : 0.1471859, 0.1344767 (x, y)
extent : 18.7524, 37.5922, 21.67835, 38.89137 (xmin, xmax, ymin, ymax)
crs : NA
source : memory
names : v
values : -5.6727e-13, 3536.411 (min, max)
Note that the crs
property is NA - let’s do something about that. We’ll assign projection systems again:
# assigns CRS as ESPG Code 3414 aka the SVY21 projection system
# while also specifying the units as km, since we rescaled it in the earlier section
projection(kde_listings_2019_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_listings_2021_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_rail_network_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_hotels_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_tourism_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_bus_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_taxi_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_museums_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_monuments_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_moneychangers_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
projection(kde_historic_sites_central_bw_raster) <- CRS("+init=EPSG:3414 +datum=WGS84 +units=km")
# input: a raster object and the map title
listings_2019_density_map_central <- density_map(kde_listings_2019_central_bw_raster, map_title = "June 2019 SG Airbnb Listings Density Map")
listings_2021_density_map_central <- density_map(kde_listings_2021_central_bw_raster, map_title = "June 2021 SG Airbnb Listings Density Map")
rail_network_density_map_central <- density_map(kde_rail_network_central_bw_raster, map_title = "SG Rail Network Density Map")
hotels_density_map_central <- density_map(kde_hotels_central_bw_raster, map_title = "SG Hotels Density Map")
tourism_density_map_central <- density_map(kde_tourism_central_bw_raster, map_title = "SG Tourist Attractions Density Map")
#self-sourced
bus_density_map_central <- density_map(kde_bus_central_bw_raster, map_title = "SG Bus Stops Density Map")
taxi_density_map_central <- density_map(kde_taxi_central_bw_raster, map_title = "SG Taxi Stands Density Map")
museums_density_map_central <- density_map(kde_museums_central_bw_raster, map_title = "SG Museums Density Map")
monuments_density_map_central <- density_map(kde_monuments_central_bw_raster, map_title = "SG Monuments Density Map")
moneychangers_density_map_central <- density_map(kde_moneychangers_central_bw_raster, map_title = "SG Moneychangers Density Map")
historic_sites_density_map_central <- density_map(kde_historic_sites_central_bw_raster, map_title = "SG Historic Sites Density Map")
listings_2019_density_map_central
hotels_density_map_central
tourism_density_map_central
rail_network_density_map_central
tmap_arrange(rail_network_density_map_central,
bus_density_map_central,
taxi_density_map_central)
tmap_arrange(museums_density_map_central,
monuments_density_map_central,
moneychangers_density_map_central,
historic_sites_density_map_central)
With the area focused on the central region, the distribution of the various location factors is a lot clearer! We can clearly see that the transport networks are centered right in the heart of the central region area - a quick comparison to an interactive Singapore planning subzone map or the SingStat Planning Areas/Subzones Map tells us that the planning areas with the highest density of location factors are in the “Central Area”, such as:
In comparison, other Central Region planning areas like Kallang and Novena (that are actually adjacent to the Central Area) do not have as dense for these location factorss.
What about the tourist-specific location factors like museum and monuments? From a glance, we can see that they’re mostly centered in the Central Area - with a surprising mini-cluster of museums in Queenstown (far left).
However, compare all of this to the listings - what we can glean is that it actually seems, at least visually, “separate” from the transit-specific and tourist-specific location factors - with clusters appearing in Geylang and Kallang (alongside the usual Central Area cluster). This actually mirrors the hotels distribution, where similar clusters in the same area are visible. This unequal distribution of listings within Singapore and even within the central region itself could point towards there being less of a relationship between the listings and the location factors than initially thought. However, keep in mind that there is still an extremely high distribution of location factors in the central area where listings appeared, and in the real-world context, listings are also constrained by the existing residential areas around said location factors.
Now that we’ve analysed our spatial point patterns, we have to confirm our observations statistically - which is where hypothesis testing comes in. I’d like to specifically narrow down on my observations for the central region - especially since it’s where we’ve found a higher concentration of our location factors and listings.
The chosen planning subzones (PLN_AREA_N
) for comparison are: - Downtown Core (higher concentration of both listings and location factors) - Kallang (higher concentration for listings, less so for location factors) - Bukit Timah (randomly chosen from the visually ‘lower-concentration’ listings planning areas)
Like what we’ve done just now, we’ve got to make owin objects:
downtown_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "DOWNTOWN CORE",] %>%
as('Spatial') %>%
as('SpatialPolygons') %>%
as('owin')
kallang_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "KALLANG",] %>%
as('Spatial') %>%
as('SpatialPolygons') %>%
as('owin')
bukit_timah_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "BUKIT TIMAH",] %>%
as('Spatial') %>%
as('SpatialPolygons') %>%
as('owin')
Then, we’ll need to combine our point events object and owin object:
downtown_2019_ppp <- listings_2019_ppp[downtown_owin]
kallang_2019_ppp <- listings_2019_ppp[kallang_owin]
bukit_timah_2019_ppp <- listings_2019_ppp[bukit_timah_owin]
Let’s check their visualisation:
The G function measures the distribution of the distances from an arbitrary event to its nearest event - we’ll use Gest() function for this, as well as perform the Monte Carlo Simulation test with envelope().
Firstly, we have to compute the g-function estimation:
G_Downtown = Gest(downtown_2019_ppp, correction = "border")
plot(G_Downtown)
Now, we can perform the spatial randomness test:
G_Downtown.csr <- envelope(downtown_2019_ppp, Gest, nsim=100)
Generating 100 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.
Done.
plot(G_Downtown.csr)
Conclusion: The observed G(r) is far above the G(theo) as well as the envelope - indicating that Airbnb listings in the Downtown Core area are clustered. Hence, we reject the null hypothesis that Airbnb listings in the Downtown Core area are randomly distributed at 99% confident interval.
G_Kallang = Gest(kallang_2019_ppp, correction = "border")
plot(G_Kallang)
Now, we can perform the spatial randomness test:
G_Kallang.csr <- envelope(kallang_2019_ppp, Gest, nsim=100)
Generating 100 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.
Done.
plot(G_Kallang.csr)
Conclusion: The observed G(r) is far above the G(theo) as well as the envelope - indicating that Airbnb listings in the Kallang area are clustered. Hence, we reject the null hypothesis that Airbnb listings in the Kallang area are randomly distributed at 99% confident interval.
G_Bukit_Timah = Gest(bukit_timah_2019_ppp, correction = "border")
plot(G_Bukit_Timah)
Now, we can perform the spatial randomness test:
G_Bukit_Timah.csr <- envelope(bukit_timah_2019_ppp, Gest, nsim=100)
Generating 100 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70,
71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100.
Done.
plot(G_Bukit_Timah.csr)
Conclusion: The observed G(r) is far above the G(theo) as well as the envelope - indicating that Airbnb listings in the Bukit Timah area are clustered. Hence, we reject the null hypothesis that Airbnb listings in the Bukit Timah area are randomly distributed at 99% confident interval.
As we can see, for our selected planning areas in the Central Region, we can conclude with 99% confidence (and 1% risk of our conclusion being unsubstantiated/invalid) that Airbnb listings in said areas are not randomly distributed.
Before we move on to Section B, we should also check for the distribution of listings across the entire region.
G_sg = Gest(listings_2019_ppp_sg, correction = "border")
plot(G_sg)
Now, we can perform the spatial randomness test:
# due to the sheer size of this, it's computationally expensive
# thus number of simulations have been set to 50 instead of the usual 100
G_sg.csr <- envelope(listings_2019_ppp_sg, Gest, nsim=50)
Generating 50 simulations of CSR ...
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50.
Done.
plot(G_sg.csr)
Conclusion: The observed G(r) is far above the G(theo) as well as the envelope - indicating that Airbnb listings across the island are clustered. Hence, we reject the null hypothesis that Airbnb listings in Singapore are randomly distributed at 99% confident interval.
Now that we have an analysis of pre-COVID-19, 2019 Airbnb listings in Singapore, we should compare at with post-COVID-19 2021 Airbnb listings so as to see the impact COVID-19 has had on the business. Our first task is to derive kernel density maps of all Airbnb listings and Airbnb by room type as at June 2019 and June 2021!
Something to note: do you still remember our Hands-On Ex 5B? Notice something we can apply from that analysis to this one?
Yup, you guessed it: our “room type” is the associated categorical measurement for our listing events! In other words, they’re marks - and we can perform marked point pattern analysis on them.
Let’s first take a look at the Airbnb listings by room type for both years to get a feel for our data:
# set to interactive mode
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(listings_2019_sf) +
tm_dots(col = 'room_type', size = 0.02, title="Room Type" ,alpha=0.6,
palette = c("#e76f51", "#e9c46a","#2a9d8f"))
# set to plot mode
tmap_mode("plot")
tm_shape(mpsz_sf) +
tm_borders(alpha = 0.5) +
tmap_options(check.and.fix = TRUE) +
tm_shape(listings_2019_sf) +
tm_dots(col = 'room_type', size = 0.02, title="Room Type" ,alpha=0.6,
palette = c("#e76f51", "#e9c46a","#2a9d8f")) +
tm_facets(by="room_type") +
tm_layout(main.title = "Airbnb Listings by Room Type (2019)",
main.title.position = "left",
main.title.size = 1.2,
frame = TRUE)
Like what we discovered in our previous analysis from Section A, the Central Region has the highest concentration of Airbnb listings - and of those listings, the most frequently-occurring room type is the “Entire home/apt”. Yet outside of the Central Region, you might notice that majority of the listings are in the “Private room” category instead. “Shared room” is the least frequently listed. Let’s see if this trend holds for 2021:
# set to interactive mode
tmap_mode("view")
tm_basemap("OpenStreetMap") +
tm_shape(listings_2021_sf) +
tm_dots(col = 'room_type', size = 0.02, title="Room Type" ,alpha=0.6,
# as columns are in alphabetical order, i've added the hotel's blue colour #264653 as the second input. you can switch this out with your own!
palette = c("#e76f51", "#264653", "#e9c46a","#2a9d8f"))
# set to plot mode
tmap_mode("plot")
tm_shape(mpsz_sf) +
tm_borders(alpha = 0.5) +
tmap_options(check.and.fix = TRUE) +
tm_shape(listings_2021_sf) +
tm_dots(col = 'room_type', size = 0.02, title="Room Type" ,alpha=0.6,
palette = c("#e76f51", "#264653", "#e9c46a","#2a9d8f")) +
tm_facets(by="room_type") +
tm_layout(main.title = "Airbnb Listings by Room Type (2021)",
main.title.position = "left",
main.title.size = 1.2,
frame = TRUE)
It’s to no one’s surprise that there’s a significant drop in the volume of airbnb listings for 2021, in the context of COVID-19 and the concerns it raises with regards to home-sharing. Yet the loss of rooms comes with the rise of a new room type: the “Hotel room” category. A bit of research tells us that Airbnb pushed this initiative in Dec 2018, giving better visibility to hotel owners as part of its venture into the hotel sector. Additionally, in accordance with 2019 trends, it seems that rooms in Central Region are still the most frequently-listed.
From our initial visualisation in 9.1, we note that there are 4 room types:
We’d like to look at the distribution for each of the room types, so let’s go ahead and create dataframes for them:
private_2019 <- listings_2019_sf[listings_2019_sf$room_type == 'Private room',]
shared_2021 <- listings_2019_sf[listings_2019_sf$room_type == 'Shared room',]
apt_2019 <- listings_2019_sf[listings_2019_sf$room_type == 'Entire home/apt',]
private_2021 <- listings_2021_sf[listings_2021_sf$room_type == 'Private room',]
shared_2019 <- listings_2021_sf[listings_2021_sf$room_type == 'Shared room',]
apt_2021 <- listings_2021_sf[listings_2021_sf$room_type == 'Entire home/apt',]
hotel_2021 <- listings_2021_sf[listings_2021_sf$room_type == 'Hotel room',]
Now, let’s visualise them in a side-by-side comparison:
Private rooms, visually, almost seem on par with the “entire home/apt” room type - we can see a significant drop in the distribution of private rooms across the region, but there’s still a rather strong presence in the Central Region area.
For entire homes/apartments, the densest of the four, there’s a significant decrease from 2019 to 2020 - and it’s noteworthy that the North/Northeast regions seem relatively clear of listings of this category.
Lastly, while the hotel room has no 2019 comparison, we can glean that it’s mostly centered upon the Central Region again - most likely the Downtown Core.
Before we head on to our marked point pattern analysis, we have to check our marked data first:
# returns the various features and the data type the feature is in e.g. num, chr...
str(listings_2019)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
..@ data :'data.frame': 8293 obs. of 14 variables:
.. ..$ id : num [1:8293] 49091 50646 56334 71609 71896 ...
.. ..$ name : chr [1:8293] "COZICOMFORT LONG TERM STAY ROOM 2" "Pleasant Room along Bukit Timah" "COZICOMFORT" "Ensuite Room (Room 1 & 2) near EXPO" ...
.. ..$ host_id : num [1:8293] 266763 227796 266763 367042 367042 ...
.. ..$ host_name : chr [1:8293] "Francesca" "Sujatha" "Francesca" "Belinda" ...
.. ..$ neighbourhood_group : chr [1:8293] "North Region" "Central Region" "North Region" "East Region" ...
.. ..$ neighbourhood : chr [1:8293] "Woodlands" "Bukit Timah" "Woodlands" "Tampines" ...
.. ..$ room_type : chr [1:8293] "Private room" "Private room" "Private room" "Private room" ...
.. ..$ price : num [1:8293] 81 80 68 200 92 102 203 120 51 58 ...
.. ..$ minimum_nights : num [1:8293] 180 90 6 1 1 1 1 7 30 30 ...
.. ..$ number_of_reviews : num [1:8293] 1 18 20 12 20 35 23 15 174 198 ...
.. ..$ last_review : Date[1:8293], format: "2013-10-21" ...
.. ..$ reviews_per_month : num [1:8293] 0.01 0.28 0.21 0.13 0.21 0.35 0.23 0.16 1.92 2.13 ...
.. ..$ calculated_host_listings_count: num [1:8293] 2 1 2 9 9 9 9 1 4 4 ...
.. ..$ availability_365 : num [1:8293] 365 365 365 353 353 348 155 365 229 206 ...
..@ coords.nrs : num(0)
..@ coords : num [1:8293, 1:2] 23825 22646 23922 41778 42057 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
..@ bbox : num [1:2, 1:2] 7216 25166 44098 49226
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
.. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +to"| __truncated__
.. .. ..$ comment: chr "PROJCRS[\"SVY21 / Singapore TM\",\n BASEGEOGCRS[\"SVY21\",\n DATUM[\"SVY21\",\n ELLIPSOID["| __truncated__
str(listings_2021)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
..@ data :'data.frame': 4238 obs. of 14 variables:
.. ..$ id : num [1:4238] 49091 50646 56334 71609 71896 ...
.. ..$ name : chr [1:4238] "COZICOMFORT LONG TERM STAY ROOM 2" "Pleasant Room along Bukit Timah" "COZICOMFORT" "Ensuite Room (Room 1 & 2) near EXPO" ...
.. ..$ host_id : num [1:4238] 266763 227796 266763 367042 367042 ...
.. ..$ host_name : chr [1:4238] "Francesca" "Sujatha" "Francesca" "Belinda" ...
.. ..$ neighbourhood_group : chr [1:4238] "North Region" "Central Region" "North Region" "East Region" ...
.. ..$ neighbourhood : chr [1:4238] "Woodlands" "Bukit Timah" "Woodlands" "Tampines" ...
.. ..$ room_type : chr [1:4238] "Private room" "Private room" "Private room" "Private room" ...
.. ..$ price : num [1:4238] 81 80 67 177 81 81 52 40 72 41 ...
.. ..$ minimum_nights : num [1:4238] 180 90 6 90 90 90 14 14 90 8 ...
.. ..$ number_of_reviews : num [1:4238] 1 18 20 20 24 48 20 13 133 105 ...
.. ..$ last_review : Date[1:4238], format: "2013-10-21" ...
.. ..$ reviews_per_month : num [1:4238] 0.01 0.22 0.16 0.29 0.34 0.67 0.2 0.16 1.3 1.78 ...
.. ..$ calculated_host_listings_count: num [1:4238] 2 1 2 4 4 4 50 50 7 1 ...
.. ..$ availability_365 : num [1:4238] 365 365 365 365 365 365 364 364 365 89 ...
..@ coords.nrs : num(0)
..@ coords : num [1:4238, 1:2] 23815 22646 23924 41973 42052 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
..@ bbox : num [1:2, 1:2] 7407 25330 43338 48392
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
.. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +to"| __truncated__
.. .. ..$ comment: chr "PROJCRS[\"SVY21 / Singapore TM\",\n BASEGEOGCRS[\"SVY21\",\n DATUM[\"SVY21\",\n ELLIPSOID["| __truncated__
As we are working with marked data, and we know that the values are categorical (different room types), we need to ensure that the marked field is of factor data type. However, as seen from the output, our room_type
field is of chr data type, not factor! Let’s rectify that with the as.factor() function:
listings_2019@data$room_type <-as.factor(listings_2019@data$room_type)
listings_2021@data$room_type <-as.factor(listings_2021@data$room_type)
str(listings_2019)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
..@ data :'data.frame': 8293 obs. of 14 variables:
.. ..$ id : num [1:8293] 49091 50646 56334 71609 71896 ...
.. ..$ name : chr [1:8293] "COZICOMFORT LONG TERM STAY ROOM 2" "Pleasant Room along Bukit Timah" "COZICOMFORT" "Ensuite Room (Room 1 & 2) near EXPO" ...
.. ..$ host_id : num [1:8293] 266763 227796 266763 367042 367042 ...
.. ..$ host_name : chr [1:8293] "Francesca" "Sujatha" "Francesca" "Belinda" ...
.. ..$ neighbourhood_group : chr [1:8293] "North Region" "Central Region" "North Region" "East Region" ...
.. ..$ neighbourhood : chr [1:8293] "Woodlands" "Bukit Timah" "Woodlands" "Tampines" ...
.. ..$ room_type : Factor w/ 3 levels "Entire home/apt",..: 2 2 2 2 2 2 2 2 2 2 ...
.. ..$ price : num [1:8293] 81 80 68 200 92 102 203 120 51 58 ...
.. ..$ minimum_nights : num [1:8293] 180 90 6 1 1 1 1 7 30 30 ...
.. ..$ number_of_reviews : num [1:8293] 1 18 20 12 20 35 23 15 174 198 ...
.. ..$ last_review : Date[1:8293], format: "2013-10-21" ...
.. ..$ reviews_per_month : num [1:8293] 0.01 0.28 0.21 0.13 0.21 0.35 0.23 0.16 1.92 2.13 ...
.. ..$ calculated_host_listings_count: num [1:8293] 2 1 2 9 9 9 9 1 4 4 ...
.. ..$ availability_365 : num [1:8293] 365 365 365 353 353 348 155 365 229 206 ...
..@ coords.nrs : num(0)
..@ coords : num [1:8293, 1:2] 23825 22646 23922 41778 42057 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
..@ bbox : num [1:2, 1:2] 7216 25166 44098 49226
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
.. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +to"| __truncated__
.. .. ..$ comment: chr "PROJCRS[\"SVY21 / Singapore TM\",\n BASEGEOGCRS[\"SVY21\",\n DATUM[\"SVY21\",\n ELLIPSOID["| __truncated__
str(listings_2021)
Formal class 'SpatialPointsDataFrame' [package "sp"] with 5 slots
..@ data :'data.frame': 4238 obs. of 14 variables:
.. ..$ id : num [1:4238] 49091 50646 56334 71609 71896 ...
.. ..$ name : chr [1:4238] "COZICOMFORT LONG TERM STAY ROOM 2" "Pleasant Room along Bukit Timah" "COZICOMFORT" "Ensuite Room (Room 1 & 2) near EXPO" ...
.. ..$ host_id : num [1:4238] 266763 227796 266763 367042 367042 ...
.. ..$ host_name : chr [1:4238] "Francesca" "Sujatha" "Francesca" "Belinda" ...
.. ..$ neighbourhood_group : chr [1:4238] "North Region" "Central Region" "North Region" "East Region" ...
.. ..$ neighbourhood : chr [1:4238] "Woodlands" "Bukit Timah" "Woodlands" "Tampines" ...
.. ..$ room_type : Factor w/ 4 levels "Entire home/apt",..: 3 3 3 3 3 3 3 3 3 3 ...
.. ..$ price : num [1:4238] 81 80 67 177 81 81 52 40 72 41 ...
.. ..$ minimum_nights : num [1:4238] 180 90 6 90 90 90 14 14 90 8 ...
.. ..$ number_of_reviews : num [1:4238] 1 18 20 20 24 48 20 13 133 105 ...
.. ..$ last_review : Date[1:4238], format: "2013-10-21" ...
.. ..$ reviews_per_month : num [1:4238] 0.01 0.22 0.16 0.29 0.34 0.67 0.2 0.16 1.3 1.78 ...
.. ..$ calculated_host_listings_count: num [1:4238] 2 1 2 4 4 4 50 50 7 1 ...
.. ..$ availability_365 : num [1:4238] 365 365 365 365 365 365 364 364 365 89 ...
..@ coords.nrs : num(0)
..@ coords : num [1:4238, 1:2] 23815 22646 23924 41973 42052 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : NULL
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
..@ bbox : num [1:2, 1:2] 7407 25330 43338 48392
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:2] "coords.x1" "coords.x2"
.. .. ..$ : chr [1:2] "min" "max"
..@ proj4string:Formal class 'CRS' [package "sp"] with 1 slot
.. .. ..@ projargs: chr "+proj=tmerc +lat_0=1.36666666666667 +lon_0=103.833333333333 +k=1 +x_0=28001.642 +y_0=38744.572 +ellps=WGS84 +to"| __truncated__
.. .. ..$ comment: chr "PROJCRS[\"SVY21 / Singapore TM\",\n BASEGEOGCRS[\"SVY21\",\n DATUM[\"SVY21\",\n ELLIPSOID["| __truncated__
Success!
Let’s convert our SpatialPointsDataFrame into ppp format with as.(x, “ppp”) or as.ppp(x). In doing so, the additional field in x data frame will become the marks of the point pattern z.
listings_2019_marked_ppp <- as(listings_2019, "ppp")
listings_2021_marked_ppp <- as(listings_2021, "ppp")
To avoid duplicated spatial point event issues, we’ll use jittering:
listings_2019_marked_ppp_jit <- rjitter(listings_2019_marked_ppp, retry=TRUE, nsim=1, drop=TRUE)
listings_2021_marked_ppp_jit <- rjitter(listings_2021_marked_ppp, retry=TRUE, nsim=1, drop=TRUE)
Are there any duplicated spatial point events? Let’s check:
any(duplicated(listings_2019_marked_ppp_jit))
[1] FALSE
any(duplicated(listings_2021_marked_ppp_jit))
[1] FALSE
Yay! 😄 Our duplicated points issue has been resolved!
This time, we’ve learned from our initial visualisations that there are clusters for different room types in different regions. As such, unlike in Section A where the high density of location factors and listings in the Central Region encouraged us to focus mainly on planning subzones in the Central Region, in this Section B, our focus is wider. Here are the four main planning areas we will look at:
Like what we’ve done just now, we’ve got to make owin objects:
# this one is a repeat, but I'm putting it here for reference!
downtown_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "DOWNTOWN CORE",] %>%
as('Spatial') %>%
as('SpatialPolygons') %>%
as('owin')
jurong_west_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "JURONG WEST",] %>%
as('Spatial') %>%
as('SpatialPolygons') %>%
as('owin')
sembawang_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "SEMBAWANG",] %>%
as('Spatial') %>%
as('SpatialPolygons') %>%
as('owin')
pasir_ris_owin <- mpsz_sf[mpsz_sf$PLN_AREA_N == "PASIR RIS",] %>%
as('Spatial') %>%
as('SpatialPolygons') %>%
as('owin')
Then combine our point events object and owin object:
downtown_2019_marked_ppp <- listings_2019_marked_ppp_jit[downtown_owin]
jurong_west_2019_marked_ppp <- listings_2019_marked_ppp_jit[jurong_west_owin]
sembawang_2019_marked_ppp <- listings_2019_marked_ppp_jit[sembawang_owin]
pasir_ris_2019_marked_ppp <- listings_2019_marked_ppp_jit[pasir_ris_owin]
downtown_2021_marked_ppp <- listings_2021_marked_ppp_jit[downtown_owin]
jurong_west_2021_marked_ppp <- listings_2021_marked_ppp_jit[jurong_west_owin]
sembawang_2021_marked_ppp <- listings_2021_marked_ppp_jit[sembawang_owin]
pasir_ris_2021_marked_ppp <- listings_2021_marked_ppp_jit[pasir_ris_owin]
Let’s check their visualisation:
# reference for title placement:
# https://stackoverflow.com/questions/14660372/common-main-title-of-a-figure-panel-compiled-with-parmfrow
par(mfrow=c(2,2))
plot(downtown_2019_marked_ppp, main = "Downtown Core", which.marks = "room_type")
plot(jurong_west_2019_marked_ppp, main = "Jurong West", which.marks = "room_type")
plot(sembawang_2019_marked_ppp, main = "Sembawang", which.marks = "room_type")
plot(pasir_ris_2019_marked_ppp, main = "Pasir Ris", which.marks = "room_type")
mtext("Airbnb Listings by Room Type for selected Planning Areas in 2019", side = 3, line = -11, outer = TRUE)
par(mfrow=c(2,2))
plot(downtown_2021_marked_ppp, main = "Downtown Core", which.marks = "room_type")
plot(jurong_west_2021_marked_ppp, main = "Jurong West", which.marks = "room_type")
plot(sembawang_2021_marked_ppp, main = "Sembawang", which.marks = "room_type")
plot(pasir_ris_2021_marked_ppp, main = "Pasir Ris", which.marks = "room_type")
mtext("Airbnb Listings by Room Type for selected Planning Areas in 2021", side = 3, line = -11, outer = TRUE)
Just like in Section A, let’s get the kernel density estimation. We need to use density() of the spatstat package to compute the kernel density objects, and then plot() it out. In addition, we’ll need to use rescale() to convert metres to kilometres (our desired unit of measurement). We can keep them as two separate lines of code, or we can combine them, like so:
# in this context, for rescale(), we are telling the function that the new unit length is 1000 meters (or 1km),
#so the function will divide the old coordinate values by 1000 to obtain coordinates expressed in kilometers
plot((density(split(rescale(downtown_2019_marked_ppp, 1000)))), main = "Downtown")
Downtown: At first glance, it might seem like ‘private room’ has both a higher distribution and density than the rest due to its gradient, but we have to keep in mind the scale of the legend - indeed, entire homes/apartments reign supreme with their range of up to 600, which is easily 6 times that of private rooms and 40 times that of shared rooms. There does seems to be a rather ‘even’ distribution amongst the three types of listings: entire home/apartments: entire home and apartment takes the south end with private rooms, private rooms has a middling stake on the north end, and the rest of the north end has shared rooms.
Jurong West: All room types are nestled to the East, though private rooms have a consistent presence throughout most of the area, and shared rooms has a significant (though smaller) cluster in the central area. Also - notice the legend, where private rooms reign supreme, with a gradient going up to 25 as compared to private room’s 7.
Sembawang: All room types are nestled southwards, with only entire homes/apartments having any presence in the central area. Once again, private rooms are the densest in this area, while entire apartments lose to even the shared rooms.
Pasir Ris: All room types complement each other, though there’s a stronger complementary relatinoship between apartments/shared rooms OR private rooms/shared rooms (since they cover each other’s unused areas).
Downtown: This is the only area with ‘Hotel’ presence, which is clustered to the North, competing with private rooms. There is evidently a niche area for each of the room types, and very few overlaps can be seen. In comparison to 2019, entire home/apartments are still reigning, but they stand at a scale of 250 instead of 600, showcasing a significant drop.
Jurong West: An almost complete withdrawal of shared rooms in comparison to 2019, but otherwise the distribution of rooms stay the same. No hotel presence (thus the full-pink map).
Sembawang: While in 2019 entire homes/apartments still had some presence in the central area, in 2021 they have withdrawn and are now competing with the other two room types for the North area market. No hotel presence (thus the full-pink map).
Pasir Ris: Interestingly, private rooms are now smaller in scale compared to entire apartments, which is a reversal of their positions in 2019. At the same time, they seem to have achieved better distribution (though this could be attributed to the loss of listings = a seemingly more ‘even’ distribution for whatever listings are left). Both clusters of the share room type have dwindled to a faint presence. No hotel presence (thus the full-pink map).
Across the board, for both 2019 and 2021, there are visible trends in which particular areas - complementary room types, or congregations toward the same areas (possibly due to the existence of tourist attractions or related landmarks/sites, as we discussed in Section A).
Something of note is the relationship between ‘entire home/apartments’ and ‘private rooms’ - at times they complement each other, covering areas the others are not in; other times they congregate in the same area and compete for market share. This could be useful to look into.
Another relationship to take note of is the apartments/shared rooms and private rooms/shared rooms - shared rooms generally seems to be on a small enough scale to appear in areas that neither of its bigger ‘competitors’ cover.
Evidently, as seen from the significant drop in scale across all areas for all room types, COVID-19 had significant impact on the listings: specifically, influencing island-wide withdrawal. Given the context of social distancing, safety measures, lockdown (which restricted travel and tourism that Airbnb listings are typically used for) and other governmental restrictions, this is to be expected.
Before we proceed with our second order SPPA, we need to assign marks to the ppp objects + check the levels of the marks, like so:
# in this context, for rescale(), we are telling the function that the new unit length is 1000 meters (or 1km),
#so the function will divide the old coordinate values by 1000 to obtain coordinates expressed in kilometers
plot((density(split(rescale(downtown_2019_marked_ppp, 1000)))), main = "Downtown")
Downtown: At first glance, it might seem like ‘private room’ has both a higher distribution and density than the rest due to its gradient, but we have to keep in mind the scale of the legend - indeed, entire homes/apartments reign supreme with their range of up to 600, which is easily 6 times that of private rooms and 40 times that of shared rooms. There does seems to be a rather ‘even’ distribution amongst the three types of listings: entire home/apartments: entire home and apartment takes the south end with private rooms, private rooms has a middling stake on the north end, and the rest of the north end has shared rooms.
marks(downtown_2019_marked_ppp) <- downtown_2019_marked_ppp$marks$room_type
marks(jurong_west_2019_marked_ppp) <- jurong_west_2019_marked_ppp$marks$room_type
marks(sembawang_2019_marked_ppp) <- sembawang_2019_marked_ppp$marks$room_type
marks(pasir_ris_2019_marked_ppp) <- pasir_ris_2019_marked_ppp$marks$room_type
levels(marks(downtown_2019_marked_ppp))
[1] "Entire home/apt" "Private room" "Shared room"
levels(marks(jurong_west_2019_marked_ppp))
[1] "Entire home/apt" "Private room" "Shared room"
levels(marks(sembawang_2019_marked_ppp))
[1] "Entire home/apt" "Private room" "Shared room"
levels(marks(pasir_ris_2019_marked_ppp))
[1] "Entire home/apt" "Private room" "Shared room"
marks(downtown_2021_marked_ppp) <- downtown_2021_marked_ppp$marks$room_type
marks(jurong_west_2021_marked_ppp) <- jurong_west_2021_marked_ppp$marks$room_type
marks(sembawang_2021_marked_ppp) <- sembawang_2021_marked_ppp$marks$room_type
marks(pasir_ris_2021_marked_ppp) <- pasir_ris_2021_marked_ppp$marks$room_type
levels(marks(downtown_2021_marked_ppp))
[1] "Entire home/apt" "Hotel room" "Private room"
[4] "Shared room"
levels(marks(jurong_west_2021_marked_ppp))
[1] "Entire home/apt" "Hotel room" "Private room"
[4] "Shared room"
levels(marks(sembawang_2021_marked_ppp))
[1] "Entire home/apt" "Hotel room" "Private room"
[4] "Shared room"
levels(marks(pasir_ris_2021_marked_ppp))
[1] "Entire home/apt" "Hotel room" "Private room"
[4] "Shared room"
Now that we’ve analysed our spatial point patterns (once again), we have to confirm our observations statistically with hypothesis testing. From our analysis above, there seems to be a strong relationship between entire homes/apartments and private rooms - in the sense that entire homes/apartments at the predominant player in the listings, but wherever they are scarcer, private rooms are more plentiful. Yet there are also times where they coexist in the same area of a subzone, implying a level of competition between them. In addition, it seems that this relationship is stronger in the post-COVID-19 (2021) dataset.
As such, we will use the Cross K-Function to look into this relationship.
The chosen planning subzones (PLN_AREA_N
) for comparison are: - Downtown Core (high density of all types of room, especially entire homes/apartments) - Jurong West (high density of private rooms) - Pasir Ris (small density of private and shared rooms, close to Changi Airport so might have insights) Sembawang will be excluded on account of it having few ‘entire home/apt’ marks as compared to private rooms, which leads to errors when visualising.
downtown_2019_Lcross.csr <- envelope(downtown_2019_marked_ppp,
Lcross,
i="Entire home/apt",
j="Private room",
correction="border",
nsim=999)
Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.
Done.
downtown_2021_Lcross.csr <- envelope(downtown_2021_marked_ppp,
Lcross,
i="Entire home/apt",
j="Private room",
correction="border",
nsim=999)
Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.
Done.
Here, entire homes/apartments and private rooms are statistically independent at a distance of 250-255 and 290 (estimated) meters (2019), and 0-25 (estimated) and 175 onwards (estimated) meters (2021). Outside of those instances, they are not statistically independent as the empirical k-cross line is outside of the envelope of the 99% confidence level, and for that scale, we reject the null hypothesis. It is worth noting that up till around 200, most of the observed values (the black line) are above our theoretical poisson process (the red dotted line) - but henceafter, it starts dropping, meeting the threshold for statistical independence - which is where we do not reject the null hypothesis. This could imply that the greater the distance, the greater the probability of statistical independence.
jurong_west_2019_Lcross.csr <- envelope(jurong_west_2019_marked_ppp,
Lcross,
i="Entire home/apt",
j="Private room",
correction="border",
nsim=999)
Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.
Done.
jurong_west_2021_Lcross.csr <- envelope(jurong_west_2021_marked_ppp,
Lcross,
i="Entire home/apt",
j="Private room",
correction="border",
nsim=999)
Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.
Done.
Here, entire homes/apartments and private rooms are statistically independent at a distance of 0-15 (estimated) and 180-210 (estimated) meters (2019), and mostly inconclusive results for 2021, but outside of that, they are not statistically independent as the empirical k-cross line is outside of the envelope of the 99% confidence level. Once again, it seems that the larger the value, the more likely it is to fall within the envelope.
pasir_ris_2019_Lcross.csr <- envelope(pasir_ris_2019_marked_ppp,
Lcross,
i="Entire home/apt",
j="Private room",
correction="border",
nsim=999)
Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.
Done.
pasir_ris_2021_Lcross.csr <- envelope(pasir_ris_2021_marked_ppp,
Lcross,
i="Entire home/apt",
j="Private room",
correction="border",
nsim=999)
Generating 999 simulations of CSR ...
1, 2, 3, ......10.........20.........30.........40.........50.........60
.........70.........80.........90.........100.........110.........120
.........130.........140.........150.........160.........170.........180
.........190.........200.........210.........220.........230.........240
.........250.........260.........270.........280.........290.........300
.........310.........320.........330.........340.........350.........360
.........370.........380.........390.........400.........410.........420
.........430.........440.........450.........460.........470.........480
.........490.........500.........510.........520.........530.........540
.........550.........560.........570.........580.........590.........600
.........610.........620.........630.........640.........650.........660
.........670.........680.........690.........700.........710.........720
.........730.........740.........750.........760.........770.........780
.........790.........800.........810.........820.........830.........840
.........850.........860.........870.........880.........890.........900
.........910.........920.........930.........940.........950.........960
.........970.........980.........990........ 999.
Done.
Here, entire homes/apartments and private rooms are statistically independent at a distance of 0-40 (estimated) meters (2019), and for 2021 it is statisically independent (within the envelope) up till the envelope’s project. For most part (especially for the 2021 results) it seems statistically independent as the empirical k-cross line is within the envelope of 99% confidence level.
As we can see, entire homes/apartments and private rooms are found to be largely not spatially independent of each other especially at smaller distances, which lends weight to our earlier theory of a competitive or complementary relationship. However, we also must acknowledge that from the trends we’ve seen in our graph, larger distances seem to increase the probability of them being spatially independent.
As analysed in our Section A earlier, the tourist attractions, train services, hotels as well as majority of our self-sourced location factors (e.g. moneychangers) tend to be located in the dense areas of listings, regardless of the year. A possible reason for clustering, outside of the relationship between these room types, could be because said residences are located close to these factors (rather than because they’re located close to each other).
Once again, thank you Prof. Kam for our IS415 Geospatial Analytics and Applications course materials & resources 😄