Todayâs Adventure: Thematic & Analytical Mapping! With the tmap package, letâs learn how to plot functional and truthful choropleth maps đȘ
In Hands-On Exercise 02, we learned how to handle geospatial data in R: from importing to content-checking to plotting and projection! We ever learned a couple of geoprocessing tricks đ. Now that we know how to go all of that, letâs get into the meat of geospatial visualisations: choropleth maps!
So what are choropleth maps? Hereâs a great primer on what they are (with examples!) as well as the advantages + disadvantages of using them - but in short, theyâre thematic maps: maps used to represent statistical data of a geographic region - usually through patterns or graduated colours. For example, seeing the distribution of elderly across Singaporeâs various subzones!
The R packages weâll be introducing today are:
In addition, weâll be using the packages from our last lesson:
packages = c('sf', 'tmap', 'tidyverse')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
The datasets used for this exercise are:
MP14_SUBZONE_WEB_PL
) from data.gov.sgrespopagesextod2011to2020.csv
) from Department of Statistics, SingaporeNote: Our aspatial data file does not contain any coordinates values, but itâs PA and SZ fields can be used as unique identifiers to geocode to
MP14_SUBZONE_WEB_PL
shapefile!
Weâve learned how to import data in Hands-On Exercise 02, so try doing it yourself in this section! If youâre stuck, click on âshow codeâ below.
mpsz <- st_read(dsn = "data/geospatial",
layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\IS415\IS415_msty\_posts\2021-09-02-hands-on-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
Letâs examine the content - note that only the first 10 records are displayed for brevity:
mpsz
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
First 10 features:
OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND
1 1 1 MARINA SOUTH MSSZ01 Y
2 2 1 PEARL'S HILL OTSZ01 Y
3 3 3 BOAT QUAY SRSZ03 Y
4 4 8 HENDERSON HILL BMSZ08 N
5 5 3 REDHILL BMSZ03 N
6 6 7 ALEXANDRA HILL BMSZ07 N
7 7 9 BUKIT HO SWEE BMSZ09 N
8 8 2 CLARKE QUAY SRSZ02 Y
9 9 13 PASIR PANJANG 1 QTSZ13 N
10 10 7 QUEENSWAY QTSZ07 N
PLN_AREA_N PLN_AREA_C REGION_N REGION_C
1 MARINA SOUTH MS CENTRAL REGION CR
2 OUTRAM OT CENTRAL REGION CR
3 SINGAPORE RIVER SR CENTRAL REGION CR
4 BUKIT MERAH BM CENTRAL REGION CR
5 BUKIT MERAH BM CENTRAL REGION CR
6 BUKIT MERAH BM CENTRAL REGION CR
7 BUKIT MERAH BM CENTRAL REGION CR
8 SINGAPORE RIVER SR CENTRAL REGION CR
9 QUEENSTOWN QT CENTRAL REGION CR
10 QUEENSTOWN QT CENTRAL REGION CR
INC_CRC FMEL_UPD_D X_ADDR Y_ADDR SHAPE_Leng
1 5ED7EB253F99252E 2014-12-05 31595.84 29220.19 5267.381
2 8C7149B9EB32EEFC 2014-12-05 28679.06 29782.05 3506.107
3 C35FEFF02B13E0E5 2014-12-05 29654.96 29974.66 1740.926
4 3775D82C5DDBEFBD 2014-12-05 26782.83 29933.77 3313.625
5 85D9ABEF0A40678F 2014-12-05 26201.96 30005.70 2825.594
6 9D286521EF5E3B59 2014-12-05 25358.82 29991.38 4428.913
7 7839A8577144EFE2 2014-12-05 27680.06 30230.86 3275.312
8 48661DC0FBA09F7A 2014-12-05 29253.21 30222.86 2208.619
9 1F721290C421BFAB 2014-12-05 22077.34 29893.78 6571.323
10 3580D2AFFBEE914C 2014-12-05 24168.31 30104.18 3454.239
SHAPE_Area geometry
1 1630379.3 MULTIPOLYGON (((31495.56 30...
2 559816.2 MULTIPOLYGON (((29092.28 30...
3 160807.5 MULTIPOLYGON (((29932.33 29...
4 595428.9 MULTIPOLYGON (((27131.28 30...
5 387429.4 MULTIPOLYGON (((26451.03 30...
6 1030378.8 MULTIPOLYGON (((25899.7 297...
7 551732.0 MULTIPOLYGON (((27746.95 30...
8 290184.7 MULTIPOLYGON (((29351.26 29...
9 1084792.3 MULTIPOLYGON (((20996.49 30...
10 631644.3 MULTIPOLYGON (((24472.11 29...
popdata <- read_csv("data/aspatial/respopagesextod2011to2020.csv")
Hopefully, after this lesson, youâve remembered the main methods for importing geospatial and attribute data respectively!
Before we can start mapping out our thematic map, weâll need to prepare a data table with our required values, which are as follows:
popdata2020 <- popdata %>%
filter(Time == 2020) %>%
group_by(PA, SZ, AG) %>%
summarise(`POP` = sum(`Pop`)) %>%
ungroup()%>%
pivot_wider(names_from=AG,
values_from=POP) %>%
mutate(YOUNG = rowSums(.[3:6])
+rowSums(.[12])) %>%
mutate(`ECONOMY ACTIVE` = rowSums(.[7:11])+
rowSums(.[13:15]))%>%
mutate(`AGED`=rowSums(.[16:21])) %>%
mutate(`TOTAL`=rowSums(.[3:21])) %>%
mutate(`DEPENDENCY` = (`YOUNG` + `AGED`)
/`ECONOMY ACTIVE`) %>%
select(`PA`, `SZ`, `YOUNG`,
`ECONOMY ACTIVE`, `AGED`,
`TOTAL`, `DEPENDENCY`)
Hmm, there are some new functions inside here that we havenât touched on before! Letâs explain them, one by one:
PA
, SZ
AND AG
, and we perform the summarise() operation on it before ungrouping.DEPENDENCY
column, we calculated the value of the young + aged groups over the economy-active groups and used mutate() to add it as a new column.Lastly, hereâs an illustrated example of pivot_wider(). Itâs mean to âwidenâ data, which means increasing the number of columns and decreasing the number of rows.
From the pivot_wider() example above, we can see the the name_from argument takes column that we want to expand the table by, while values_from refers to the column that we get the cell values from. >Note: notice that while the column names are âgoneâ in the pivot_wider() table, the viewer still understands what they represent!
Credits to to the Youtube Channel SR for their video example on pivot_wider() đȘ
Weâre almost ready to join the geospatial and attribute data - but wait! Thereâs one extra step: weâve got to convert the values in the PA
and SZ
fields to uppercase. This is because the values from those fields are in both uppercase and lowercase; on the other hand, SUBZONE_N
and PLN_AREA_N
are in uppercase only.
popdata2020 <- popdata2020 %>%
mutate_at(.vars = vars(PA, SZ),
.funs = funs(toupper)) %>%
filter(`ECONOMY ACTIVE` > 0)
Now, letâs join the two sets of data with left_join()! Here, weâll be using the planning subzone names (SUBZONE_N and SZ from the geospatial and attribute data frames respectively) as the common identifier.
mpsz_pop2020 <- left_join(mpsz, popdata2020,
by = c("SUBZONE_N" = "SZ"))
Now you might be thinking⊠if thereâs a left_join(), surely thereâs a right_join() as well? Whatâs the difference, and why arenât we using right_join()?
left_join(a,b) is used when dataframe âaâ is your âmainâ dataframe, and youâd like to merge some of the data from âbâ into âaâ. right_join(a,b) is vice versa: âbâ is your main, while âaâ is additional. In addition, the newly joined dataframe is modelled after your âmainâ dataframe - and since we want our output to be a simple features dataframe, weâre putting mpsz
as our âmainâ one. You can find out more here!
Now into the meat of the matter: choropleth mapping! There are two approaches when preparing these maps, namely:
If youâd like a fast method that gives you a concise visualisation, qtm() is the way to go - itâll give you a cartographic standard choropleth map as shown below:
tmap_mode("plot")
qtm(mpsz_pop2020,
fill = "DEPENDENCY")
As we can see, the âfillâ argument is used to map the attribute we have in mind - in this case, DEPENDENCY
. But what is the tmap_mode()? Letâs fiddle around with itâŠ
tmap_options(check.and.fix = TRUE)
tmap_mode("view")
qtm(mpsz_pop2020,
fill = "DEPENDENCY")
A wild interactive map appeared! đ€Ł If you change the tmap_mode() to âviewâ, youâll get an interactive map - try hovering over it to see the values of each subzone, and zoom in and out to target specific areas!
While qtm() is well and good, it offers very little in way of customisation. If we want high-quality cartographic choropleth maps, we should make use of tmapâs drawing elements. Hereâs an example of what a customised map looks like:
tmap_mode("plot")
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
style = "quantile",
palette = "Blues",
title = "Dependency ratio") +
tm_layout(main.title = "Distribution of Dependency Ratio by planning subzone",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar() +
tm_grid(alpha =0.2) +
tm_credits("Source: Planning Sub-zone boundary from Urban Redevelopment Authorithy (URA)\n and Population data from Department of Statistics DOS",
position = c("left", "bottom"))
Wow! Thatâs a detailed map - and it may seem like a lot to take in at once, but letâs break down our map-making process step-by-step.
Our very first step is the base map: the basic building block upon which weâll add our statistical details to! Weâll define the input data mpsz_pop2020
into tm_shape(), and then add one or more layer elements such as tm_fill() and tm_polygons(). In this case, weâll use tm_polygons() to draw the planning subzone polygons:
tm_shape(mpsz_pop2020) +
tm_polygons()
Now, letâs achieve the same level of visualisation as qtm(): we want to show the geographic distribution of a selected variable by planning subzone. Easy-peasy: just assign the target variable (in this case DEPENDENCY
) to tm_polygons(), like so:
tm_shape(mpsz_pop2020)+
tm_polygons("DEPENDENCY")
Hereâs a secret: tm_polygons() is actually two kids in a trench coat!
To be exact, tm_polygons() is a wrapper of tm_fill() and tm_border(). tm_fill() shades the polygons(using the default colour scheme) while tm_borders() adds the borders of the shapefile onto the choropleth map.
If we use tm_fill() aloneâŠ
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY")
The planning subzones are shaded according to their respective dependency values, but there arenât any boundaries. Letâs remedy that:
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY") +
tm_borders(lwd = 0.1, alpha = 1)
This looks a bit different from using just tm_polygons() - notice how the borders are a lot thinner? This is due us tweaking the arguments for tm_borders: - alpha = transparency value, from 0 (totally transparent) to 1 (totally opaque). By default, the alpha value of the column is used - which is normally 1. - col = border colour, - lwd = border line width. The default is 1, and - lty = border line type. The default is âsolidâ.
If youâve got a keen eye, you might have noticed the legend in the lower-right corner of our maps, marking the dependency in increments of 5. You might also be thinking that the red subzone must be an outlier - itâs dark red, while everything else is a pale yellow, and only two of the listed classes are used.
If so - we come to the conclusion: the outlier is affecting the map visualisation, and we canât see the varying intensities of other subzones đČ What do we do?
No fear - data classification is here! We ought to tweak the data classification method thatâs being used so as to adjust the data ranges/classes. tmap provides a total ten data classification methods: fixed, sd, equal, pretty (default), quantile, kmeans, hclust, bclust, fisher, and jenks. Weâll define these in the style argument of tm_fill() or tm_polygons() will be used.
Letâs test out the jenks and equal classification methods!
jenksmap = tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
n = 5,
style = "jenks") +
tm_borders(alpha = 0.5) +
tm_layout(main.title="Jenks Classification")
equalmap = tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
n = 5,
style = "equal") +
tm_borders(alpha = 0.5) +
tm_layout(main.title="Equal Classification")
If youâre curious about what the other classification methods look like, you can also check out this handy website for visualisations!
**** DIY section Prepare several choropleth maps, using similar classification methods but with different numbers of classes (i.e. 2, 6, 10, 20). What observation can you draw from comparing the output maps?
As we can see: across the âequalâ classifications, the map mostly stays the same - and only at very high numbers of classes would you see any differentiation for a dataset like ours (which has an outlier subzone). On the other hand, âjenksâ seems to have a more âaccurateâ subzone that allows for varying intensities to be visaulised even in the presence of subsets, even at small class numbers.
However, itâs likely that your viewer wonât be able to differentiate the difference between each class the greater the class number, as we can see from our class number = 20 examples. On the other hand, having too few classes makes it hard for any differentiation between subzones to be seen! This balance is something we have to keep in mind as we choose our classification methods for choropleth modelling.
For the built-in styles weâve learned previously, the category breaks are computer internally - but we can override these and explicitly set the breakpoints by using the breaks
argument in tm_fill().
Note: for tmap, breaks include a minimum and maximum - so if you want n categories, youâll need to specify n+1 elements in the breaks argument!
Before we start, letâs get some descriptive statistics on the targeted variable:
summary(mpsz_pop2020$DEPENDENCY)
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
0.0000 0.6519 0.7025 0.7742 0.7645 19.0000 92
With reference to the results above, weâll set our break points at 0.60, 0.70, 0.80, and 0.90. In addition, as mentioned in our note above, weâll need to include a minimum and maximum, which weâll set at 0 and 100. Our breaks vector is thus c(0, 0.60, 0.70, 0.80, 0.90, 1.00).
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
breaks = c(0, 0.60, 0.70, 0.80, 0.90, 1.00)) +
tm_borders(alpha = 0.5)
Tired of seeing the olâ red-orange-yellow colour on all our maps? Never fear - RColorBrewer is here! Letâs try changing up our colour scheme by defining the palette in tm_fill():
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
n = 6,
style = "quantile",
palette = "Blues") +
tm_borders(alpha = 0.5)
Wow, that BLUE me away! Bad puns aside, did you know that you can reverse the colour palette? Just do add a â-â prefix:
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
style = "quantile",
palette = "-Greens") +
tm_borders(alpha = 0.5)
To further customise your maps, maybe youâd like to add titles, scale bars and aspect ratios. Maybe youâd like to change just one aspect, or youâd like a complete overhaul on the presentation. In this section, weâll introduce you to three methods to switch up your map layout!
Change the placement, format and appearance of the legend with the legend
option:
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
style = "jenks",
palette = "Blues",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(main.title = "Distribution of Dependency Ratio by planning subzone \n(Jenks classification)",
main.title.position = "center",
main.title.size = 1,
legend.height = 0.45,
legend.width = 0.35,
legend.outside = FALSE,
legend.position = c("right", "bottom"),
frame = FALSE) +
tm_borders(alpha = 0.5)
Change a wide variety of layout settings with the tmap_style() option:
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
style = "quantile",
palette = "-Greens") +
tm_borders(alpha = 0.5) +
tmap_style("classic")
Add map furniture such as a compass, scale bar or grid lines with their associated methods:
tm_shape(mpsz_pop2020)+
tm_fill("DEPENDENCY",
style = "quantile",
palette = "Blues",
title = "No. of persons") +
tm_layout(main.title = "Distribution of Dependency Ratio \nby planning subzone",
main.title.position = "center",
main.title.size = 1.2,
legend.height = 0.45,
legend.width = 0.35,
frame = TRUE) +
tm_borders(alpha = 0.5) +
tm_compass(type="8star", size = 2) +
tm_scale_bar(width = 0.15) +
tm_grid(lwd = 0.1, alpha = 0.2) +
tm_credits("Source: Planning Sub-zone boundary from Urban Redevelopment Authorithy (URA)\n and Population data from Department of Statistics DOS",
position = c("left", "bottom"))
And lastly, reset to the default style with:
tmap_style("white")
There are times were we need to compare maps - and itâs easier to compare them when theyâre side by side instead of two paragraphs apart! They also enable visualisations of how spatial relationships change with respect to another variable, such as time.
In tmap, facet maps can be plotted in three ways:
Defining ncols in tm_fill():
Assigning multiple values to at least one of the aesthetic arguments:
tm_shape(mpsz_pop2020) +
tm_fill("DEPENDENCY",
style = "quantile",
palette = "Blues",
thres.poly = 0) +
tm_facets(by="REGION_N",
free.coords=TRUE,
drop.shapes=TRUE) +
tm_layout(legend.show = FALSE,
title.position = c("center", "center"),
title.size = 20) +
tm_borders(alpha = 0.5)
youngmap <- tm_shape(mpsz_pop2020)+
tm_polygons("YOUNG",
style = "quantile",
palette = "Blues")
agedmap <- tm_shape(mpsz_pop2020)+
tm_polygons("AGED",
style = "quantile",
palette = "Blues")
tmap_arrange(youngmap, agedmap, asp=1, ncol=2)
Other than creating facet maps, we also have the option of having our map meet a particular selection criterion with the selection function:
tm_shape(mpsz_pop2020[mpsz_pop2020$REGION_N=="CENTRAL REGION", ])+
tm_fill("DEPENDENCY",
style = "quantile",
palette = "Blues",
legend.hist = TRUE,
legend.is.portrait = TRUE,
legend.hist.z = 0.1) +
tm_layout(legend.outside = TRUE,
legend.height = 0.45,
legend.width = 5.0,
legend.position = c("right", "bottom"),
frame = FALSE) +
tm_borders(alpha = 0.5)
With this, weâve started our foray into the visualisation portion of geospatial analytics! Thereâs still lots to learn, so stick by my side đ Weâll see you next exercise!