Hands-on Ex 3

Published

December 2, 2023

Modified

December 16, 2023

Spatial Interaction: Processing and Visualising Flow Data

1 What is Spatial Interaction?

Spatial interaction represents the flow of people, materials, or information between locations in geographical space. Each spatial interaction is composed of a discrete origin->destination pair. Each pair can be represented as a cell in a matrix where rows are related to the locations (centroids) of origin, while columns are related to locations (centroids) of destination. Such a matrix is commonly known as an origin/destination matrix, or a spatial interaction matrix.

2 Loading the Packages

For the purpose of this exercise, four r packages will be used. They are:

  • sf for importing, integrating, processing and transforming geospatial data
  • tidyverse for importing, integrating, wrangling and visualising data
  • tmap for creating thematic maps
  • DT for interactive dataframe styling
pacman::p_load(tmap, sf, DT, stplanr,
               performance,
               ggpubr, tidyverse)

3 Loading Aspatial Data

The dataset used is Passenger Volume by Origin Bus Stop from LTA Datamall. The data extracted is from September 2023.

odbus09 <- read_csv("data/aspatial/origin_destination_bus_202309.csv")

glimpse(odbus09)
Rows: 5,714,196
Columns: 7
$ YEAR_MONTH          <chr> "2023-09", "2023-09", "2023-09", "2023-09", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY", "WEEKDAY",…
$ TIME_PER_HOUR       <dbl> 17, 10, 10, 7, 7, 11, 16, 16, 16, 20, 7, 11, 11, 1…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "24499", "65239", "65239", "23519", "23519", "5250…
$ DESTINATION_PT_CODE <chr> "22221", "65159", "65159", "23311", "23311", "4204…
$ TOTAL_TRIPS         <dbl> 1, 9, 2, 6, 1, 2, 18, 3, 2, 1, 2, 5, 3, 5, 5, 19, …

glimpse() reveals that the values in ORIGIN_PT_CODE and DESTINATON_PT_CODE are character data types. As these represent unique bus stops, we treat them as categorical data and cast them as factor type:

odbus09 <- odbus09 %>%
  mutate(ORIGIN_PT_CODE = as.factor(ORIGIN_PT_CODE),
         DESTINATION_PT_CODE = as.factor( DESTINATION_PT_CODE))

3.1 Extracting data for study

For the purpose of this exercise, we will extract commuting flows on weekday and between 6 and 9 o’clock.

odbus6_9 <- odbus09 %>%
  filter(
    DAY_TYPE == "WEEKDAY"
  ) %>%
  filter(
    TIME_PER_HOUR >= 6 &
    TIME_PER_HOUR <= 9
  ) %>%
  group_by(
    ORIGIN_PT_CODE,
    DESTINATION_PT_CODE
  ) %>%
  summarise(
    TRIPS = sum(TOTAL_TRIPS)
  ) %>%
  ungroup()

DT::datatable(head(odbus6_9, 10))

3.2 Saving the dataframe as RDS

We will save the output in rds format for future usage:

write_rds(odbus6_9, "data/rds/odbus6_9.rds")

Then, import the rds file into the R environment

odbus6_9 <- read_rds("data/rds/odbus6_9.rds")

4 Loading Geospatial Data

Two geospatial data will be used in this exercise, they are:

  • busstop: This data provides the location of bus stops
  • MPSZ-2019: This data provides the sub-zone boundary of URA Master Plan 2019

Both geospatial dataframes are transformed into the same EPSG code 3414 based on Co-ordinate Reference System (CRS)

busstop is a Simple Features Dataframe (point)

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `C:\haileycsy\ISSS624-AGA\Hands-on_Ex\hoe3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21

MPSZ 2019 is a Simple Features Dataframe (Polygon)

mpsz <- st_read(dsn = "data/geospatial",
                layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `C:\haileycsy\ISSS624-AGA\Hands-on_Ex\hoe3\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

The code chunk below writes the mpsz sf tibble data frame into an rds file for future use:

mpsz <- write_rds(mpsz, "data/rds/mpsz.rds")

5 Geospatial Data Wrangling

The following code chunk integrates the planning subzone codes (i.e. SUBZONE_C) of mpsz sf data frame into busstop sf data frame.

  • st_intersection() is used to perform point and polygon overly and the output will be in point sf object.
  • select() of dplyr package is then use to retain only BUS_STOP_N and SUBZONE_C in the busstop_mpsz sf data frame.

Note:5 bus stops are excluded in the resultant data frame because they are outside of Singapore boundary.

busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C) %>%
  st_drop_geometry()
od_data <- left_join(
    odbus6_9 , busstop_mpsz,
    by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")
  ) %>%
  rename(
    ORIGIN_BS = ORIGIN_PT_CODE,
    ORIGIN_SZ = SUBZONE_C,
    DESTIN_BS = DESTINATION_PT_CODE)
duplicates <- od_data %>%
  group_by_all() %>%
  filter(n() > 1) %>%
  ungroup()

duplicates
# A tibble: 1,154 × 4
   ORIGIN_BS DESTIN_BS TRIPS ORIGIN_SZ
   <chr>     <fct>     <dbl> <chr>    
 1 11009     01411         9 QTSZ01   
 2 11009     01411         9 QTSZ01   
 3 11009     01421        19 QTSZ01   
 4 11009     01421        19 QTSZ01   
 5 11009     01511        10 QTSZ01   
 6 11009     01511        10 QTSZ01   
 7 11009     01521         5 QTSZ01   
 8 11009     01521         5 QTSZ01   
 9 11009     01611         3 QTSZ01   
10 11009     01611         3 QTSZ01   
# ℹ 1,144 more rows

There are 1,154 duplicated rows. These are removed by retaining only unique values with the following code:

od_data <- unique(od_data)

Check again to ensure that duplicates have been truly removed:

duplicates_2 <- od_data %>%
  group_by_all() %>%
  filter(n() > 1) %>%
  ungroup()

duplicates_2
# A tibble: 0 × 4
# ℹ 4 variables: ORIGIN_BS <chr>, DESTIN_BS <fct>, TRIPS <dbl>, ORIGIN_SZ <chr>
od_data <- left_join(od_data , busstop_mpsz,
            by = c("DESTIN_BS" = "BUS_STOP_N")) 

od_data <- od_data %>%
  rename(
    DESTIN_SZ = SUBZONE_C
  ) %>%
  drop_na() %>%
  group_by(
    ORIGIN_SZ, DESTIN_SZ
  ) %>%
  summarise(
    MORNING_PEAK = sum(TRIPS)
  ) %>%
  ungroup()
write_rds(od_data, "data/rds/od_data.rds")
od_data <- read_rds("data/rds/od_data.rds")

6 Visualising Spatial Interaction

6.1 Removing intra-zonal flows

As we are interested in the spatial flows between different subzones, we remove those orign/destination flows within the subzones:

od_data1 <- od_data[od_data$ORIGIN_SZ!=od_data$DESTIN_SZ,]

6.2 Creating desire lines

Desire lines are straight lines that represent ‘origin-destination’ data that records how many people travel (or could travel) between places (points or zones)

In this code chunk, od2line() of stplanr package is used to create the desire lines:

flowline <- od2line(flow = od_data1, 
                    zones = mpsz,
                    zone_code = "SUBZONE_C")

6.3 Visualising desire lines

tmap_options(check.and.fix = TRUE)

tmap_mode("plot")

tm_shape(mpsz) +
  tm_polygons() +
flowline %>%  
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3
  ) +
  tm_layout(
    frame = FALSE
  )

When the flow data is visually messy and highly skewed like the one shown above, it is wiser to focus on selected flows.

Thus, we will focus on flows greater >= 5000:

tm_shape(mpsz) +
  tm_polygons() +
flowline %>%  
  filter(MORNING_PEAK >= 5000) %>%
tm_shape() +
  tm_lines(lwd = "MORNING_PEAK",
           style = "quantile",
           scale = c(0.1, 1, 3, 5, 7, 10),
           n = 6,
           alpha = 0.3
  ) +
  tm_compass(
    type="8star", size = 2
  ) +
  tm_layout(
    frame = FALSE
  ) +
  tmap_style("classic")