Today’s Adventure: Geospatial Data Wrangling with R! Using the sf package, let’s learn the ropes of handling geospatial data in R: from importing to projection, we’ve got you covered 💪 You might even learn a geoprocessing tip or two 😄
Before we even start, you might be wondering: why geospatial analytics? Why are we learning about it, and how is it relevant to a business or a governmental/academic institution?
A quick Google search brings up thousands upon thousands of answers, articles, and research papers: I recommend Deloitte’s 3-Minute Guide on why geospatial analytics matters and the value it brings to a business and its data.
In today’s Hands-On Exercise, we’ll be doing an introduction to geospatial analytics in R:
The R packages we’ll be introducing today are:
- sf: used for importing, managing, and processing geospatial data
- tidyverse: used for importing, wrangling and visualising data (and other data science tasks!)
Note: Tidyverse consists of a family of R packages, such as readr, tidyr, and dplyr - these are used for the different steps of the data wrangling + visualisation process!
Here’s a handy code chunk that we’ll likely be putting at the start of every file: it (a) creates a list of packages, (b) checks if they have been installed (and installs them for us if they haven’t), and lastly (c) launches them in the RStudio environment.
packages = c('sf', 'tidyverse')  # (a) the list of packages we need
for (p in packages){
  # (b) install the package if it hasn't been installed yet
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  # (c) launch the package in the RStudio environment
  library(p, character.only = T)
}
The datasets used for this exercise are:
A good practice would be to put the data sets in a ‘data’ folder, sorted accordingly into ‘geospatial’ and ‘aspatial’ folders.
Now that we’ve got our data, let’s take a closer look at the formats they’re in and how to import them into R. The geospatial data we have are:
- MP14_SUBZONE_WEB_PL, a polygon feature layer in ESRI shapefile format,
- CyclingPath, a line feature layer in ESRI shapefile format, and
- PreSchool, a point feature layer in kml file format.

To import, we’ll be using a handy function called st_read() from the sf package. The arguments it takes in depend on the file format. For the shapefile format, two arguments are provided: dsn to define the data path, and layer to provide the shapefile name. Bonus: we don’t need to specify the file extension for shapefiles 😄 On the other hand, for the kml file format, our argument is the complete path with the kml file extension.
Dataset used: MP14_SUBZONE_WEB_PL
File format: shapefile
Data frame type: polygon feature
mpsz = st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\IS415\IS415_msty\_posts\2021-08-30-hands-on-exercise-2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
From the output message, we can see that our mpsz simple feature data frame contains 323 multipolygon features and 15 fields, and is in the svy21 projected coordinate system.
Dataset used: CyclingPath
File format: shapefile
Data frame type: line feature
cyclingpath = st_read(dsn = "data/geospatial",
layer = "CyclingPath")
Reading layer `CyclingPath' from data source
`C:\IS415\IS415_msty\_posts\2021-08-30-hands-on-exercise-2\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 3336 features and 2 fields
Geometry type: MULTILINESTRING
Dimension: XY
Bounding box: xmin: 12831.45 ymin: 28347.98 xmax: 42799.89 ymax: 48948.15
Projected CRS: SVY21
From the output message, we can see that our cyclingpath linestring feature data frame contains 3336 multilinestring features and 2 fields, and is in the svy21 projected coordinate system.
Dataset used: pre-schools-location-kml
File format: kml
Data frame type: point feature
preschool = st_read("data/geospatial/pre-schools-location-kml.kml")
Reading layer `PRESCHOOLS_LOCATION' from data source
`C:\IS415\IS415_msty\_posts\2021-08-30-hands-on-exercise-2\data\geospatial\pre-schools-location-kml.kml'
using driver `KML'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6824 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
From the output message, we can see that our preschool point feature data frame contains 1925 point features and 2 fields, and is in the wgs84 geodetic coordinate system.
For aspatial data, such as the listings Airbnb dataset, there’s an extra step in the importing process. We’ll import it into a tibble data frame, then convert it into a simple feature data frame.
Since our listings data set is in a csv file format, we’ll use the read_csv() function from the readr package, like so:
listings <- read_csv("data/aspatial/listings.csv")
glimpse(listings)
Rows: 4,252
Columns: 16
$ id <dbl> 50646, 71609, 71896, 71903, 2~
$ name <chr> "Pleasant Room along Bukit Ti~
$ host_id <dbl> 227796, 367042, 367042, 36704~
$ host_name <chr> "Sujatha", "Belinda", "Belind~
$ neighbourhood_group <chr> "Central Region", "East Regio~
$ neighbourhood <chr> "Bukit Timah", "Tampines", "T~
$ latitude <dbl> 1.33432, 1.34537, 1.34754, 1.~
$ longitude <dbl> 103.7852, 103.9589, 103.9596,~
$ room_type <chr> "Private room", "Private room~
$ price <dbl> 80, 178, 81, 81, 52, 40, 72, ~
$ minimum_nights <dbl> 90, 90, 90, 90, 14, 14, 90, 8~
$ number_of_reviews <dbl> 18, 20, 24, 48, 20, 13, 133, ~
$ last_review <date> 2014-07-08, 2019-12-28, 2014~
$ reviews_per_month <dbl> 0.22, 0.28, 0.33, 0.67, 0.20,~
$ calculated_host_listings_count <dbl> 1, 4, 4, 4, 50, 50, 7, 1, 50,~
$ availability_365 <dbl> 365, 365, 365, 365, 353, 364,~
From the output message, we can see that our listings tibble data frame has 4,252 rows and 16 columns (not features and fields like in our simple feature data frame!). Take note of the latitude and longitude fields - we’ll be using them in the next phase.
Assumption: The data is in the wgs84 Geographic Coordinate System on account of its latitude/longitude fields.
Now, let’s convert our listings tibble data frame into a simple feature data frame by using the st_as_sf() function from the sf package.
listings_sf <- st_as_sf(listings,
coords = c("longitude", "latitude"),
crs=4326) %>%
st_transform(crs = 3414)
Let’s explain the code chunk!
- coords: specifies the columns holding the coordinates - the x (longitude) column first, then the y (latitude) column
- crs: the coordinate system in EPSG format - 4326 is the EPSG code of wgs84
- st_transform(): reprojects the data into the svy21 projected coordinate system (EPSG: 3414)
This gives us the new simple feature data frame, listings_sf:
glimpse(listings_sf)
Rows: 4,252
Columns: 15
$ id <dbl> 50646, 71609, 71896, 71903, 2~
$ name <chr> "Pleasant Room along Bukit Ti~
$ host_id <dbl> 227796, 367042, 367042, 36704~
$ host_name <chr> "Sujatha", "Belinda", "Belind~
$ neighbourhood_group <chr> "Central Region", "East Regio~
$ neighbourhood <chr> "Bukit Timah", "Tampines", "T~
$ room_type <chr> "Private room", "Private room~
$ price <dbl> 80, 178, 81, 81, 52, 40, 72, ~
$ minimum_nights <dbl> 90, 90, 90, 90, 14, 14, 90, 8~
$ number_of_reviews <dbl> 18, 20, 24, 48, 20, 13, 133, ~
$ last_review <date> 2014-07-08, 2019-12-28, 2014~
$ reviews_per_month <dbl> 0.22, 0.28, 0.33, 0.67, 0.20,~
$ calculated_host_listings_count <dbl> 1, 4, 4, 4, 50, 50, 7, 1, 50,~
$ availability_365 <dbl> 365, 365, 365, 365, 353, 364,~
$ geometry <POINT [m]> POINT (22646.02 35167.9~
Note that a new column called geometry has been added! In addition, longitude and latitude have both been dropped.
Now that we’ve got our data frames, this begs the question: what exactly is inside them? Let’s learn how to retrieve the information of the dataframe with 3 simple methods!
Let’s say that we want a preliminary look at our data - just seeing the basic feature information is sufficient. In this case, st_geometry() is most appropriate:
st_geometry(mpsz)
Geometry set for 323 features
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
First 5 geometries:
As we can see, only the basic information of the feature class (type of geometry, geographic extent of features and CRS) is displayed.
However, basic information won’t take us very far. Let’s dig a little deeper and learn about the associated attribute information in the data frame. This is where glimpse() comes in:
glimpse(mpsz)
Rows: 323
Columns: 16
$ OBJECTID <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15~
$ SUBZONE_NO <int> 1, 1, 3, 8, 3, 7, 9, 2, 13, 7, 12, 6, 1, 5, 1, 1,~
$ SUBZONE_N <chr> "MARINA SOUTH", "PEARL'S HILL", "BOAT QUAY", "HEN~
$ SUBZONE_C <chr> "MSSZ01", "OTSZ01", "SRSZ03", "BMSZ08", "BMSZ03",~
$ CA_IND <chr> "Y", "Y", "Y", "N", "N", "N", "N", "Y", "N", "N",~
$ PLN_AREA_N <chr> "MARINA SOUTH", "OUTRAM", "SINGAPORE RIVER", "BUK~
$ PLN_AREA_C <chr> "MS", "OT", "SR", "BM", "BM", "BM", "BM", "SR", "~
$ REGION_N <chr> "CENTRAL REGION", "CENTRAL REGION", "CENTRAL REGI~
$ REGION_C <chr> "CR", "CR", "CR", "CR", "CR", "CR", "CR", "CR", "~
$ INC_CRC <chr> "5ED7EB253F99252E", "8C7149B9EB32EEFC", "C35FEFF0~
$ FMEL_UPD_D <date> 2014-12-05, 2014-12-05, 2014-12-05, 2014-12-05, ~
$ X_ADDR <dbl> 31595.84, 28679.06, 29654.96, 26782.83, 26201.96,~
$ Y_ADDR <dbl> 29220.19, 29782.05, 29974.66, 29933.77, 30005.70,~
$ SHAPE_Leng <dbl> 5267.381, 3506.107, 1740.926, 3313.625, 2825.594,~
$ SHAPE_Area <dbl> 1630379.3, 559816.2, 160807.5, 595428.9, 387429.4~
$ geometry <MULTIPOLYGON [m]> MULTIPOLYGON (((31495.56 30..., MULT~
With glimpse(), we now know the data type of each field - for example, the FMEL_UPD_D field is of the date data type.
Sometimes, we want to dig as deep as we can and reveal all the information of a feature object. In this case, head() is best:
head(mpsz)
Simple feature collection with 6 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 24468.89 ymin: 28369.47 xmax: 32362.39 ymax: 30542.74
Projected CRS: SVY21
OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
1 1 1 MARINA SOUTH MSSZ01 Y MARINA SOUTH
2 2 1 PEARL'S HILL OTSZ01 Y OUTRAM
3 3 3 BOAT QUAY SRSZ03 Y SINGAPORE RIVER
4 4 8 HENDERSON HILL BMSZ08 N BUKIT MERAH
5 5 3 REDHILL BMSZ03 N BUKIT MERAH
6 6 7 ALEXANDRA HILL BMSZ07 N BUKIT MERAH
PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D
1 MS CENTRAL REGION CR 5ED7EB253F99252E 2014-12-05
2 OT CENTRAL REGION CR 8C7149B9EB32EEFC 2014-12-05
3 SR CENTRAL REGION CR C35FEFF02B13E0E5 2014-12-05
4 BM CENTRAL REGION CR 3775D82C5DDBEFBD 2014-12-05
5 BM CENTRAL REGION CR 85D9ABEF0A40678F 2014-12-05
6 BM CENTRAL REGION CR 9D286521EF5E3B59 2014-12-05
X_ADDR Y_ADDR SHAPE_Leng SHAPE_Area
1 31595.84 29220.19 5267.381 1630379.3
2 28679.06 29782.05 3506.107 559816.2
3 29654.96 29974.66 1740.926 160807.5
4 26782.83 29933.77 3313.625 595428.9
5 26201.96 30005.70 2825.594 387429.4
6 25358.82 29991.38 4428.913 1030378.8
geometry
1 MULTIPOLYGON (((31495.56 30...
2 MULTIPOLYGON (((29092.28 30...
3 MULTIPOLYGON (((29932.33 29...
4 MULTIPOLYGON (((27131.28 30...
5 MULTIPOLYGON (((26451.03 30...
6 MULTIPOLYGON (((25899.7 297...
Note that the ‘n’ argument allows us to select how many records to display!
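For instance, a quick sketch of the ‘n’ argument in action - any small number works here:

```r
# display only the first 3 records of the mpsz simple feature data frame
head(mpsz, n = 3)
```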
Now that we’ve got our data, we get into the meat of the matter: visualisation! Having a dataframe of fields and data points is taxing for human eyes to process, but with a bit of plotting and mapping, our geospatial data becomes a gold mine of great insights. Let’s try to plot() it out:
plot(mpsz)
This is a handy multi-plot of all attributes (up to a reasonable amount). But maybe we just want to plot a specific attribute:
plot(mpsz["PLN_AREA_N"])
There are also times where we need just the geometry - in other words, the map outline:
plot(st_geometry(mpsz))
Projection transformation is when we project a simple feature data frame from one coordinate system to another. There are two common issues that require projection transformation: (a) missing or inaccurate coordinate system, or (b) inappropriate coordinate systems for the analysis chosen.
To tackle the first issue, we’ll need to assign an EPSG code to the data frame. Firstly, let’s check the coordinate system of our mpsz data frame:
st_crs(mpsz)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
Although the mpsz data frame is projected in svy21, the output doesn’t show the correct EPSG code for svy21, which should be 3414 (the EPSG 9001 codes in the printout actually refer to the metre length unit). Let’s change it:
mpsz3414 <- st_set_crs(mpsz, 3414)
st_crs(mpsz3414)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
Still remember our geospatial data overview in 3.0? You might have noticed that the coordinate system differed among our data frames: mpsz and cyclingpath are in svy21, while preschool is in wgs84. preschool might run into issues when we’re performing geoprocessing, because a geographic coordinate system is not appropriate if our analysis needs distance and/or area measurements.
preschool3414 <- st_transform(preschool,
crs = 3414)
st_geometry(preschool3414)
Geometry set for 1925 features
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 11203.01 ymin: 25596.33 xmax: 45404.24 ymax: 49300.88
z_range: zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
Notice the change to the svy21 projected coordinate system!
You might notice that the values in the bounding box have changed, too! They are now well beyond the decimal-degree range used by the majority of geographic coordinate systems - our coordinates are now expressed in metres.
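A quick way to see this for yourself is to compare the bounding boxes side by side (just a sketch - st_bbox() extracts the extent we saw in the printouts above):

```r
# extent before transformation: decimal degrees (wgs84)
st_bbox(preschool)

# extent after transformation: metres (svy21 / Singapore TM)
st_bbox(preschool3414)
```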
Other than handling data, the handy sf package also has functions for geoprocessing (otherwise known as GIS analysis). Let’s explore some commonly-used geoprocessing functions!
Let’s say that in a bid to promote healthier living, the authorities are planning to upgrade the existing cycling paths, making it safer and more appealing to cyclists all around! To do that, they’ll need to acquire 5 metres of reserved land on both sides of the cycling path. Our task is to find the total area of land that’s to be acquired.
Firstly, let’s compute the 5-meter buffers around the cycling paths:
buffer_cycling <- st_buffer(cyclingpath,
dist=5, nQuadSegs = 30)
Then, let’s actually calculate the area of the buffers:
buffer_cycling$AREA <- st_area(buffer_cycling)
Lastly, let’s find the total area with sum():
sum(buffer_cycling$AREA)
1642750 [m^2]
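Bonus tip: st_area() returns a value with units attached (square metres here). If square kilometres read better in a report, a small sketch using the units package (assuming it’s installed) converts it for us:

```r
library(units)

total_area <- sum(buffer_cycling$AREA)
# convert the total from square metres to square kilometres for readability
set_units(total_area, km^2)
```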
Let’s say that a preschool service group is organising a nation-wide event, and wants to find out the number of preschools in each Planning Subzone.
We can kill two birds with one stone with this handy combination of st_intersects() and lengths(). st_intersects() helps us identify the preschools located in each Planning Subzone, while lengths() counts the number of preschools per Planning Subzone.
mpsz3414$`PreSch Count`<- lengths(st_intersects(mpsz3414, preschool3414))
Let’s check a summary of PreSch Count:
summary(mpsz3414$`PreSch Count`)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 0.00 3.00 5.96 9.00 58.00
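The minimum of 0 tells us some subzones have no pre-schools at all. As a side exercise (a sketch using dplyr, which we already have loaded via tidyverse), we could list those subzones by name:

```r
# which planning subzones have no pre-schools at all?
mpsz3414 %>%
  filter(`PreSch Count` == 0) %>%
  select(SUBZONE_N, `PreSch Count`)
```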
Now, the preschool service group wants to hold a subzone-specific event! They want to target the Planning Subzone with the most number of preschools to maximise their outreach, so let’s help them out with top_n():
top_n(mpsz3414, 1, `PreSch Count`)
Simple feature collection with 1 feature and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 39655.33 ymin: 35966 xmax: 42940.57 ymax: 38622.37
Projected CRS: SVY21 / Singapore TM
OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
1 189 2 TAMPINES EAST TMSZ02 N TAMPINES
PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D
1 TM EAST REGION ER 21658EAAF84F4D8D 2014-12-05
X_ADDR Y_ADDR SHAPE_Leng SHAPE_Area
1 41122.55 37392.39 10180.62 4339824
geometry PreSch Count
1 MULTIPOLYGON (((42196.76 38... 58
Lastly, the preschool service group wants to know the density of preschools by planning subzone for future events. We’ll need to derive the area of each planning subzone:
mpsz3414$Area <- mpsz3414 %>%
st_area()
Now, introducing a new function, mutate()! It helps us add the density column to mpsz3414:
mpsz3414 <- mpsz3414 %>%
mutate(`PreSch Density` = `PreSch Count`/Area * 1000000)
While they look similar and have a similar function (to add new variables), there’s a difference between mutate() and transmute()! mutate() adds new variables and preserves existing ones while transmute() adds new variables and drops existing ones. Use the appropriate one for the situation!
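A minimal side-by-side sketch of the difference (the variable names here are just for illustration):

```r
# mutate() keeps every existing column and adds the new one
mutated <- mpsz3414 %>%
  mutate(`PreSch Density` = `PreSch Count`/Area * 1000000)

# transmute() keeps ONLY the new column (sf objects also retain
# their geometry column, since geometry is "sticky")
transmuted <- mpsz3414 %>%
  transmute(`PreSch Density` = `PreSch Count`/Area * 1000000)

ncol(mutated)     # all original columns plus PreSch Density
ncol(transmuted)  # far fewer columns
```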
Why do we do EDA? Here’s a great Medium article that introduces EDA: from its significance to its components, it’s got you covered!
For this EDA section, we’ll be introducing various ggplot2 functions that’ll help us create functional and yet truthful statistical graphs that’ll help us visualise and understand our data!
Firstly, let’s use hist() to plot a histogram and reveal the distribution of PreSch Density:
hist(mpsz3414$`PreSch Density`)
That looks good - but it’s not something you’ll publish in your reports! Let’s add a little customisation:
ggplot(data=mpsz3414,
aes(x= as.numeric(`PreSch Density`)))+
geom_histogram(bins=20,
color="black",
fill="light blue") +
labs(title = "Are pre-schools evenly distributed in Singapore?",
subtitle= "There are many planning sub-zones with a single pre-school, on the other hand, \nthere are two planning sub-zones with at least 20 pre-schools",
x = "Pre-school density (per km sq)",
y = "Frequency")
DIY: Using ggplot2, plot a scatterplot showing the relationship between Pre-school Density and Pre-school Count.
Let’s approach this step-by-step. Firstly, we’ll need the variables Pre-school Density and Pre-school Count. Luckily, we’ve already created both in 2.5.2 Point-In-Polygon Count.
Next, we need to plot a scatterplot. For our histogram, we used geom_histogram - and for a scatterplot, we’ll use geom_point. We can further customise this by adjusting the colour, size and fill of the points, like so:
ggplot(data=mpsz3414,
aes(y = `PreSch Count`,
x= as.numeric(`PreSch Density`)))+
geom_point(color="black",
fill="light blue") +
labs(title = "",
x = "Pre-school density (per km sq)",
y = "Pre-school count")
Seem finished? There’s actually one more step: ensuring that your data is presented as fairly as possible - we don’t want to mislead our readers 😱 In this case, notice that the aspect ratio of our graph is off - its height is greater than its width! Let’s correct that by setting limits with the xlim() and ylim() functions inside ggplot2:
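Here’s a sketch of the corrected plot - the exact limits are a judgment call, but running both axes from 0 to 40 gives matching scales:

```r
ggplot(data=mpsz3414,
       aes(y = `PreSch Count`,
           x= as.numeric(`PreSch Density`)))+
  geom_point(color="black",
             fill="light blue") +
  # matching axis limits keep the aspect ratio honest
  xlim(0, 40) +
  ylim(0, 40) +
  labs(title = "",
      x = "Pre-school density (per km sq)",
      y = "Pre-school count")
```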
And there we have it! A representative scatterplot visualisation ✨
This is the end of this week’s exercise! I’m really excited to learn more about geospatial analytics: we’ve just begun to scratch the surface, and there’s so much more to explore and experiment with!
On a related sidenote: I’ve learned a number of things myself when creating this blog post! For example, I wanted to have a table of contents, but googling ‘sidebar navigation pane’ proved to be far less effective than just reading the Distill Basics - which is a lesson learnt on reading the provided materials!
On a funny note, here’s a mini-comic on my experience with tabsets:
Tabsets are meant for html documents, but this output is a distill article… Of course it didn’t work 😠Well, that’s just another learning point in our Geospatial Analytics Journey !
Walk with me as we dive deeper in the coming weeks! 🤗 💖 ✨
P.S. I found my old tablet so I could illustrate the points better! Handwriting will be more consistent from this blog post onwards 😄