Take-home Ex 2

Published

December 7, 2023

Modified

December 18, 2023

Singapore Train commuter flows

1 Project Brief

As city-wide urban infrastructure such as public buses, mass rapid transit, public utilities and roads become digital, the data sets obtained can be used for tracking movement patterns through space and time. This is particularly true with the recent trends in massive deployment of pervasive computing technologies such as GPS on vehicles and SMART cards used by public transport commuters.

Unfortunately, this explosive growth of geospatially-referenced data has far outpaced the planner’s ability to utilize and transform the data into insightful information. Little applied research has shown how these disparate data sources can be integrated, analysed, and modelled to support policy making, or how geospatial data science and analysis (GDSA) can be used to support decision-making.

This study aims to demonstrate the potential value of GDSA to integrate publicly available data from multiple sources for building spatial interaction models to determine factors affecting urban mobility patterns in public train transit.

2 Loading Required packages

The following packages are used in this exercise:

tmap for cartography
mapview for interactive map backgrounds & leaflet.providers for basemap customisation
sf and sp for geospatial data handling
stplanr for creating ‘desire lines’
tidyverse and reshape2 for aspatial data transformation
sfdep and spdep for computing spatial autocorrelation
Hmisc for summary statistics
kableExtra and DT for formatting of dataframes
ggplot2, patchwork and ggrain for visualising attributes
performance and ggpubr for Spatial Interaction modelling
urbnthemes for consistent plot themes
scales for ggplot axis break formatting

options(repos = c(CRAN = "https://cran.rstudio.com/"))

pacman::p_load(tmap, sf, sp, tidyverse, sfdep, stplanr,
               mapview, leaflet.providers,
               Hmisc, kableExtra, DT, reshape2,
               ggplot2, patchwork, ggrain, urbnthemes, knitr,
               performance, ggpubr, scales)

3 Importing the Data

3.1 Aspatial Data

This study uses 2 aspatial datasets pertaining to Public Train Trips:

train, a dataset from LTA Datamall, Passenger Volume by Origin Destination Train Stations for October 2023.
train_codes, also from LTA Datamall, lists train station codes and names. This is used to assign MRT train station codes to the geospatial data set with train station locations.

The downloaded dataset is in .csv format. We use the function read_csv() to import the data into the R environment.

train_oct23 <- read_csv("data/aspatial/origin_destination_train_202310.csv")

# remove any duplicated rows
train_oct23 <- distinct(train_oct23)

str(train_oct23)

tibble [800,595 × 7] (S3: tbl_df/tbl/data.frame)
 $ YEAR_MONTH         : chr [1:800595] "2023-10" "2023-10" "2023-10" "2023-10" ...
 $ DAY_TYPE           : chr [1:800595] "WEEKENDS/HOLIDAY" "WEEKDAY" "WEEKENDS/HOLIDAY" "WEEKDAY" ...
 $ TIME_PER_HOUR      : num [1:800595] 9 6 12 12 12 12 19 19 9 9 ...
 $ PT_TYPE            : chr [1:800595] "TRAIN" "TRAIN" "TRAIN" "TRAIN" ...
 $ ORIGIN_PT_CODE     : chr [1:800595] "EW32" "BP4" "NE15" "SW5" ...
 $ DESTINATION_PT_CODE: chr [1:800595] "DT24" "EW31" "SW5" "NE15" ...
 $ TOTAL_TRIPS        : num [1:800595] 1 22 51 87 48 73 1 3 1 2 ...

train_oct23 is a tibble dataframe consisting of the following variables:

YEAR_MONTH: Month of data collection in YYYY-MM format
DAY_TYPE: Category of Day
TIME_PER_HOUR: Extracted hour of day
PT_TYPE: Public transport type
ORIGIN_PT_CODE: ID of Trip Origin Train Station
DESTINATION_PT_CODE: ID of Trip Destination Train Station
TOTAL_TRIPS: Sum of trips made per origin-Destination

Hmisc::describe(train_oct23)

train_oct23 

 7  Variables      800595  Observations
--------------------------------------------------------------------------------
YEAR_MONTH 
       n  missing distinct    value 
  800595        0        1  2023-10 
                  
Value      2023-10
Frequency   800595
Proportion       1
--------------------------------------------------------------------------------
DAY_TYPE 
       n  missing distinct 
  800595        0        2 
                                            
Value               WEEKDAY WEEKENDS/HOLIDAY
Frequency            424314           376281
Proportion             0.53             0.47
--------------------------------------------------------------------------------
TIME_PER_HOUR 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
  800595        0       20    0.997    14.01    6.042        6        7 
     .25      .50      .75      .90      .95 
      10       14       18       21       22 
                                                                            
Value          0     5     6     7     8     9    10    11    12    13    14
Frequency   4083 23741 37878 42772 43653 43289 43466 44412 45463 45298 44883
Proportion 0.005 0.030 0.047 0.053 0.055 0.054 0.054 0.055 0.057 0.057 0.056
                                                                
Value         15    16    17    18    19    20    21    22    23
Frequency  45358 46141 47505 46815 44605 42622 41491 38360 28760
Proportion 0.057 0.058 0.059 0.058 0.056 0.053 0.052 0.048 0.036

For the frequency table, variable is rounded to the nearest 0
--------------------------------------------------------------------------------
PT_TYPE 
       n  missing distinct    value 
  800595        0        1    TRAIN 
                 
Value       TRAIN
Frequency  800595
Proportion      1
--------------------------------------------------------------------------------
ORIGIN_PT_CODE 
       n  missing distinct 
  800595        0      171 

lowest : BP10 BP11 BP12 BP13 BP2 , highest: TE4  TE5  TE6  TE7  TE8 
--------------------------------------------------------------------------------
DESTINATION_PT_CODE 
       n  missing distinct 
  800595        0      171 

lowest : BP10 BP11 BP12 BP13 BP2 , highest: TE4  TE5  TE6  TE7  TE8 
--------------------------------------------------------------------------------
TOTAL_TRIPS 
       n  missing distinct     Info     Mean      Gmd      .05      .10 
  800595        0     4672    0.998      104    171.3        1        1 
     .25      .50      .75      .90      .95 
       4       15       60      214      455 

lowest :     1     2     3     4     5, highest: 16799 17670 18763 19585 21050
--------------------------------------------------------------------------------

From the summary statistics above, we can derive that:

There are 800,595 Origin-Destination (OD) records in October 2023, each covering one origin, destination, day type and hour combination
TIME_PER_HOUR takes 20 distinct values: hour 0 and hours 5 to 23, with no records for hours 1 to 4
The highest number of trips for an OD route is 21,050, while the 95th Percentile is only 455. This suggests a highly right-skewed distribution, with particularly busy routes.

As we are interested in studying the passenger flows for peak hour periods only, the number of trips are calculated for each period as defined below:

Peak Period	Tap in Time (hr)
Weekday Morning	6 - 9
Weekday Evening	17 - 20
Weekend/PH Morning	11 - 14
Weekend/PH Evening	16 - 19

The code chunk below computes the traffic volume by peak period for each Origin-Destination route:

train_od <- train_oct23 %>%
   # Categorize trips under period based on day and timeframe
  mutate(period = ifelse(DAY_TYPE == "WEEKDAY" & 
                         TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9, 
                         "Weekday morning peak",
                    ifelse(DAY_TYPE == "WEEKDAY" & 
                           TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20,
                           "Weekday evening peak",
                      ifelse(DAY_TYPE == "WEEKENDS/HOLIDAY" &
                             TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14,
                              "Weekend/PH morning peak",
                        ifelse(DAY_TYPE == "WEEKENDS/HOLIDAY" & 
                              TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19,
                               "Weekend/PH evening peak",
                    "Others"))))
  ) %>%
  # Only retain needed periods for analysis
  filter(
    period != "Others"
  ) %>%
  # compute number of trips per origin-destination station pair for each period
  group_by(
    ORIGIN_PT_CODE,
    DESTINATION_PT_CODE,
    period
  ) %>%
  summarise(
    num_trips = sum(TOTAL_TRIPS)
  ) %>%
  # change all column names to lowercase
  rename_with(
    tolower, everything()
  ) %>%
  ungroup()
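
The nested ifelse() calls above can be hard to audit at the boundary hours. As a minimal base R sketch on made-up rows (the helper name classify_peak and the toy data frame are illustrative assumptions, not part of the original analysis), the same rules can be written as a small function and checked directly:

```r
# Hypothetical helper: mirrors the peak-period rules in the table above.
classify_peak <- function(day_type, hour) {
  period <- rep("Others", length(hour))
  weekday <- day_type == "WEEKDAY"
  weekend <- day_type == "WEEKENDS/HOLIDAY"
  period[weekday & hour >= 6  & hour <= 9]  <- "Weekday morning peak"
  period[weekday & hour >= 17 & hour <= 20] <- "Weekday evening peak"
  period[weekend & hour >= 11 & hour <= 14] <- "Weekend/PH morning peak"
  period[weekend & hour >= 16 & hour <= 19] <- "Weekend/PH evening peak"
  period
}

# Toy rows (made up) that sit on or just outside the period boundaries
toy <- data.frame(
  DAY_TYPE      = c("WEEKDAY", "WEEKDAY", "WEEKDAY",
                    "WEEKENDS/HOLIDAY", "WEEKENDS/HOLIDAY"),
  TIME_PER_HOUR = c(6, 10, 20, 14, 15)
)
toy$period <- classify_peak(toy$DAY_TYPE, toy$TIME_PER_HOUR)
print(toy)
```

Because both ends of each range are inclusive, a record with TIME_PER_HOUR equal to 9 still counts towards the weekday morning peak.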

There are several instances where ORIGIN_PT_CODE and DESTINATION_PT_CODE are composite codes, representing MRT Stations that are interchanges with multiple station lines. For the purpose of this investigation, only a single station code is required, so the first code is kept.

train_od <- train_od %>%
  separate_wider_delim(origin_pt_code,
                       delim = "/", 
                       names = c("origin_station_code", "unused_origin"),
                    # to capture first station for those without multiples
                       too_few = "align_start",
                    # to remove any other unused station columns for 3 or more
                       too_many = "drop"
  ) %>%
  separate_wider_delim(destination_pt_code,
                       delim = "/", 
                       names = c("dest_station_code", "unused_dest"),
                       too_few = "align_start",
                       too_many = "drop"
  ) %>%
  select(
    origin_station_code,
    dest_station_code,
    period,
    num_trips
  )

DT::datatable(head(train_od,20))
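
To illustrate what keeping only the first station code means, here is a minimal base R sketch on made-up codes. It uses sub() rather than separate_wider_delim(), so it is an equivalent idea and not the code used in this study:

```r
# Made-up station codes; interchange codes are joined with "/"
codes <- c("EW32", "NS1/EW24", "NE6/NS24/CC1")

# Keep everything before the first "/" (unchanged when there is no "/")
first_code <- sub("/.*$", "", codes)
print(first_code)
```

Whichever code appears first in the composite string becomes the identifier for the whole interchange, so later joins must use that same code.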

train_codes lists train station codes and names. It is joined to the geospatial dataset for station identification.

train_codes <- read_csv("data/aspatial/train_codes.csv")

str(train_codes)

spc_tbl_ [203 × 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ stn_code           : chr [1:203] "NS1" "NS2" "NS3" "NS4" ...
 $ mrt_station_english: chr [1:203] "Jurong East" "Bukit Batok" "Bukit Gombak" "Choa Chu Kang" ...
 $ mrt_line_english   : chr [1:203] "North-South Line" "North-South Line" "North-South Line" "North-South Line" ...
 $ type               : chr [1:203] "MRT" "MRT" "MRT" "MRT" ...
 - attr(*, "spec")=
  .. cols(
  ..   stn_code = col_character(),
  ..   mrt_station_english = col_character(),
  ..   mrt_line_english = col_character(),
  ..   type = col_character()
  .. )
 - attr(*, "problems")=<externalptr>

4 What is the Distribution of Passenger Traffic across peak periods?

To determine which time period to analyze further, we visualize the distribution of trips by peak period:

set_urbn_defaults(style = "print")

trip_density <- train_od %>%
  ggplot(
    aes(x = period,
        y = num_trips,
        fill = period,
        color = period)
  ) +
  geom_violin(
    # shift violin plot upwards
    position = position_nudge(x = .3, y = 0), alpha = .8
  ) +
  geom_point(
    aes(y = num_trips,
        color = period),
    position = position_jitter(width = .15),
    size = .5,
    alpha = 0.8
  ) +
  labs(
    title = "Oct 2023: Widest range of MRT trips during Weekday Evening Peak"
  ) + 
  theme(
    axis.title.y = element_blank(),
    axis.title.x = element_blank(),
    axis.ticks.y = element_blank(),
    axis.ticks.x = element_blank(),
    legend.position = "none"
  ) +
  coord_flip()

trip_density

The density plot reveals a highly right-skewed distribution for all trips, especially during weekday evening peak periods. This could point towards congested MRT train stations, and would be useful to analyse further to determine the possible factors leading to this high passenger volume.

The OD data for weekday evening peak periods will thus be the focus of the study, and data is extracted using the following code chunk:

weekday_pm_od <- train_od %>%
  filter(period == "Weekday evening peak")

write_rds(weekday_pm_od, "data/rds/weekday_pm_od.rds")

4.1 Geospatial Data

The following geospatial dataframes are used for this exercise:

train_station, the location of train stations (RapidTransitSystemStation) from LTA Datamall
Business, entertn, fnb, finserv, recreation and retail, geospatial data sets of the locations of business establishments, entertainments, food and beverage outlets, financial centres, leisure and recreation centres, retail and services stores/outlets compiled for urban mobility studies
mpsz, masterplan boundary 2019

4.2 Preparing Geospatial Files: train station

All layers are transformed to the SVY21 / Singapore TM projected coordinate reference system (CRS), EPSG code 3414, using st_transform().

train_station is a Simple feature polygon layer based on SVY21 coordinate reference system (CRS).

train_station <- st_read(dsn = "data/geospatial",
                         layer = "RapidTransitSystemStation") %>%
          st_transform(crs = 3414)

Reading layer `RapidTransitSystemStation' from data source 
  `C:\haileycsy\ISSS624-AGA\Take-home_Ex\the2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 220 features and 4 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 6068.209 ymin: 27478.44 xmax: 45377.5 ymax: 47913.58
Projected CRS: SVY21

Dataset limitations

train_station only lists the Type of station (MRT or LRT), Station name and geolocation of each station. There is no station code to join with the aspatial dataset. Thus, the use of train_codes is needed to assign a unique train code identifier to each station
There are only 203 Station codes in train_codes, compared to 220 entries in train_station. There may be duplicated station names, which needs further investigation.
GDAL Message 1: Non closed ring detected – This warning message was received when loading in the geospatial layer. This means that there are polygons that are not closed (starting and ending points are not joined). There is a need for further geospatial data wrangling to rectify this.

mpsz <- st_read(dsn = "data/geospatial",
                layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)

Reading layer `MPSZ-2019' from data source 
  `C:\haileycsy\ISSS624-AGA\Take-home_Ex\the2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84

Checking for duplicated station entries in train_station:

train_duplicates <- train_station %>%
  # remove geometry layer
  st_set_geometry(NULL) %>%
  group_by(TYP_CD_DES,
           STN_NAM_DE
  ) %>%
  # count number of rows per train station
  summarise(
    count = n()
  ) %>%
  ungroup() %>%
  # retrieve stations with more than single row
  filter(count > 1)

DT::datatable(train_duplicates)

The table above shows that several stations have more than one entry in train_station. This may be because some stations are interchanges served by multiple train lines. However, there is no way to differentiate the entries without a station code, and this study assumes that the entries for a station fall within the same boundary. Only one geospatial entry per station is kept using slice():

# Remove duplicates on selected columns
train_station <- train_station %>%
  group_by(STN_NAM_DE) %>%
  slice(1) %>%
  st_as_sf()
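
The check-then-deduplicate pattern can be sketched in base R on a made-up attribute table (no geometry, hypothetical station names). This is only an illustration of the logic, assuming that the first row per name is an acceptable representative:

```r
# Toy attribute table (made up): one station name appears twice
stations <- data.frame(
  STN_NAM_DE = c("A MRT STATION", "B MRT STATION", "A MRT STATION"),
  TYP_CD_DES = c("MRT", "MRT", "MRT")
)

# Count rows per station name and list the names that repeat
counts <- table(stations$STN_NAM_DE)
dup_names <- names(counts[counts > 1])

# Keep the first row per station name, analogous to group_by() + slice(1)
stations_unique <- stations[!duplicated(stations$STN_NAM_DE), ]

print(dup_names)
print(stations_unique)
```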

As train_station does not have station codes that can be joined to the OD dataset, it is joined to train_codes by train station name. However, the station names in train_codes are in title case and lack the suffix “MRT STATION” or “LRT STATION” used in train_station, so an extra data wrangling step is needed before joining.

train_codes_new <- train_codes %>%
  # Assign suffix to train station names and change to upper case
  mutate(station_name = ifelse(type == "MRT", 
                               paste0(toupper(mrt_station_english), " MRT STATION"),
                               paste0(toupper(mrt_station_english), " LRT STATION"))
  )
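
A small base R sketch of the same harmonisation on toy rows is shown here. The column names follow train_codes as printed earlier, the row values are illustrative, and the sketch assumes that type only ever contains "MRT" or "LRT":

```r
# Toy lookup rows using the train_codes column names
codes <- data.frame(
  stn_code            = c("NS1", "BP2"),
  mrt_station_english = c("Jurong East", "South View"),
  type                = c("MRT", "LRT")
)

# Upper-case the name and append the suffix that matches the station type
codes$station_name <- paste0(
  toupper(codes$mrt_station_english), " ", codes$type, " STATION"
)
print(codes$station_name)
```

If type held any other value, this shortcut would produce an unexpected suffix, which is why the ifelse() version above is the safer choice on real data.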

The code chunk below assigns train station code to the geospatial layer to identify each station by code.

train_station_comb <- train_station %>%
  left_join(train_codes_new,
            by = join_by(STN_NAM_DE == station_name,
                         TYP_CD_DES == type)
  ) %>%
  select(
    stn_code,
    STN_NAM_DE,
    TYP_CD_DES
  )

There are several stations that have no stn_code:

train_station_comb %>%
  filter(is.na(stn_code)) %>%
  datatable()

As these are mainly train depots that are not identified by code and are irrelevant to the analysis of OD flows, they are removed from the dataset and the result is saved as a new object, train_station_list:

train_station_list <- train_station_comb %>%
  filter(!is.na(stn_code))

We use st_is_valid() to retrieve the invalid polygon datapoints in train_station_list:

# Retrieve invalid geometries
invalid_indices <- which(!st_is_valid(train_station_list))
print(invalid_indices)

[1] 195

The code chunk above reveals that index 195 is invalid. This is identified as UPPER THOMSON MRT STATION:

print(train_station_list[195,])

Simple feature collection with 1 feature and 3 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 27808.12 ymin: 37278.27 xmax: 28080.89 ymax: 37543.25
Projected CRS: SVY21 / Singapore TM
# A tibble: 1 × 4
# Groups:   STN_NAM_DE [1]
  stn_code STN_NAM_DE                TYP_CD_DES                         geometry
  <chr>    <chr>                     <chr>                         <POLYGON [m]>
1 TE8      UPPER THOMSON MRT STATION MRT        ((27808.12 37518.2, 27811.57 37…

To rectify this invalid geometry, use st_make_valid():

train_station_valid <- st_make_valid(train_station_list)

Check validity of the geometry layer again:

invalid_indices2 <- which(!st_is_valid(train_station_valid))
print(invalid_indices2)

integer(0)

The code chunk shows that there are no more invalid geometries in the file.

# Save as file 
write_rds(train_station_valid, "data/rds/train_station_valid.rds")

train_station_valid <- read_rds("data/rds/train_station_valid.rds")

tmap_mode("view")

tm_basemap("CartoDB.Positron") +
  tm_shape(train_station_valid) +
  tm_dots()

4.3 Creating hexagon grid

From the map above, the current simple features dataframe only shows the train stations as points on the map, which is not suitable for understanding commuter flows between areas. In urban transportation planning, Traffic Analysis Zones (TAZ) are the basic units of spatial areas delineated to tabulate traffic-related models.

The following code chunks create a hexagonal grid frame spanning 750m between edges for each hexagon:

# create hexagon frame
train_hex <- st_make_grid(
    train_station_valid, 
    # for hexagonal cells, cellsize = the distance between opposite edges
    cellsize = 750, 
    square = FALSE
  ) %>%
  st_sf() %>%
  rowid_to_column("hex_id")
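
As a quick arithmetic check on the cell size used above, the following sketch derives the edge length and area of a regular hexagon whose opposite edges are 750 m apart. These are standard geometric identities and not values reported by st_make_grid() itself:

```r
# Distance between opposite edges, as passed to cellsize (metres)
d <- 750

side_length <- d / sqrt(3)        # length of one hexagon edge (metres)
cell_area   <- sqrt(3) / 2 * d^2  # area of one hexagon (square metres)
centre_gap  <- d                  # spacing between centres of adjacent cells

print(round(c(side = side_length, area = cell_area, centre_gap = centre_gap), 1))
```

Each TAZ therefore covers a little under half a square kilometre, and adjacent TAZ centres are 750 m apart.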

This creates a hexagonal grid over the entire area:

tmap_mode("plot")
qtm(train_hex)

# join to train station names
train_stops <- st_join(
    train_station_valid,
    train_hex,
    join = st_intersects
  ) %>%
  st_set_geometry(NULL) %>%
  group_by(stn_code) %>%
  summarise(
  # Ensure that each station gets assigned to a single hex_id
    hex_id = first(hex_id)
  ) %>%
  ungroup() %>%
  group_by(hex_id) %>%
  summarise(
    station_count = n(),
    station_codes = str_c(stn_code, collapse = ","),
  ) %>%
  ungroup()

train_hex_final <- train_hex %>%
  left_join(train_stops,
          by = "hex_id"
  ) %>%
  replace(is.na(.), 0)

# Get hexagons with at least one station 
train_hex_filtered <- train_hex_final %>%
  filter(station_count > 0)

tmap_mode("view")

tm_basemap("CartoDB.Positron") +
  tm_shape(train_hex_filtered) +
  tm_fill(
    col = "station_count",
    palette = "-RdYlGn",
    style = "cont",
    id = "hex_id",
    popup.vars = c("No. of Train Stations: " = "station_count",
                   "Train station codes: " = "station_codes"),
    title = " "
  ) +
  tm_layout(
    # Set legend.show to FALSE to hide the legend
    legend.show = FALSE
  )

5 Which trips are the most congested on Weekday Evening Peaks?

We seek to understand the inter-zone trip volume per TAZ, to understand which routes are the most popular during weekday evening peak periods.

5.1 Preparing Origin-Destination flow data

The following steps are done to prepare O-D flow dataframes at hexagon level:

station_by_hex <- train_hex_final %>%
  select(
    hex_id,
    station_codes
  ) %>%
  st_drop_geometry() %>%
  # create separate rows for each hex_id - station_code pair
  separate_rows(station_codes, 
                sep = ","
  ) %>%
  # drop any hexagon without stations
  filter(station_codes != 0)

od_data <- left_join(
    weekday_pm_od, 
    station_by_hex,
    by = c("origin_station_code" = "station_codes")
  ) %>%
  rename(
    origin_stn = origin_station_code,
    origin_hex = hex_id,
    dest_stn = dest_station_code
  ) %>%
  distinct()

od_data <- left_join(
    od_data,
    station_by_hex,
    by = c("dest_stn" = "station_codes")
  ) %>%
  rename(
    dest_hex = hex_id
  ) %>%
  group_by(
    origin_hex,
    dest_hex
  ) %>%
  summarise(
    weekday_pm_trips = sum(num_trips)
  ) %>%
  ungroup()

write_rds(od_data, "data/rds/od_data.rds")
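
The station-to-hexagon aggregation can be illustrated with a minimal base R sketch on made-up flows. The station codes, hexagon ids and trip counts are hypothetical, and the sketch assumes each station maps to exactly one hexagon:

```r
# Toy station-level flows (made up)
od <- data.frame(
  origin_stn = c("S1", "S2", "S3"),
  dest_stn   = c("S3", "S3", "S1"),
  num_trips  = c(10, 5, 7)
)

# Toy station-to-hexagon lookup: S1 and S2 share hexagon 1
lookup <- data.frame(stn = c("S1", "S2", "S3"), hex_id = c(1, 1, 2))

# Attach origin and destination hexagons by matching station codes
od$origin_hex <- lookup$hex_id[match(od$origin_stn, lookup$stn)]
od$dest_hex   <- lookup$hex_id[match(od$dest_stn, lookup$stn)]

# Sum trips for each hexagon pair
od_hex <- aggregate(num_trips ~ origin_hex + dest_hex, data = od, FUN = sum)
print(od_hex)
```

The two flows that start in hexagon 1 and end in hexagon 2 collapse into a single row, which is the behaviour expected of the hexagon-level od_data.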

5.2 Visualising Spatial Interaction

The following steps are taken to visualise the traffic flows between TAZs.

As we are only interested in inter-zonal flows, we remove the intra-zonal trips from od_data dataframe:

od_data <- read_rds("data/rds/od_data.rds")

od_data_inter <- od_data[od_data$origin_hex!=od_data$dest_hex,]

Desire lines are straight lines that connect origin to destination. od2line() function is used to create these:

od_flow <- od2line(flow = od_data_inter, 
                    zones = train_hex_filtered,
                    zone_code = "hex_id")

quantile() reveals a very large gap between the 75th percentile and the maximum. This will affect the visualisation of OD flows.

quantile(od_flow$weekday_pm_trips)

      0%      25%      50%      75%     100% 
    1.00    36.00   161.00   648.75 63626.00

To visualise this further, we bin the number of trips into custom limits using cut():

od_flow <- od_flow %>%
  mutate(
    trips_quantile = cut(weekday_pm_trips, 
                         breaks = c(0, 50, 100, 250, 500, 1000, 5000, 10000, 20000, Inf),
                         labels = c("< 50", "50 ~ 100", "100 ~ 250", "250 ~ 500",
                                    "500 ~ 1000", "1000 ~ 5000", "5000 ~ 10000",
                                    "10000 ~ 20000", "> 20000"),
                         ordered_result = TRUE
  ))
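
To see how cut() treats values that fall exactly on a break, the sketch below bins a few made-up trip counts with the same breaks but with default labels. By default the intervals are closed on the right, so a count of 50 lands in the first bin:

```r
# Made-up trip counts, including values that sit exactly on a break
trips <- c(1, 50, 51, 100, 20000, 63626)

bins <- cut(trips,
            breaks = c(0, 50, 100, 250, 500, 1000, 5000, 10000, 20000, Inf))

# Default labels use interval notation, e.g. "(0,50]"
print(data.frame(trips, bins, bin_number = as.integer(bins)))
```

Only flows with strictly more than 20,000 trips reach the last bin, which matches the filter applied later for the most popular flows.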

wd_pm_trips <- tm_shape(mpsz) +
  tm_fill(
    col = "#dfdfeb"
    ) +
tm_shape(train_hex_filtered) +
  tm_fill(
    col = "station_count",
    palette = "-RdYlGn",
    alpha = .7,
    style = "cont"
  ) +
  tm_borders(
    col = "#dfdfeb",
    lwd = .5
  ) +
tm_shape(od_flow) +
  tm_lines(
    lwd = "weekday_pm_trips",
    scale = 1.5,
    col = "#1F363D",
    alpha = .7
  ) +
  tm_layout(
    title = "Weekday Evening Peak Traffic Flow",
    scale = .9,
    frame = FALSE
  ) +
  tm_facets(
    along = "trips_quantile",
    free.coords = FALSE
  )

# Save animation as gif
tmap_animation(wd_pm_trips,
               "wd_pm_trips.gif",
               loop = TRUE,
               delay = 80,
               outer.margins = NA,
               restart.delay = 100)

The animated O-D map shows that flows with trip counts below 10,000 are too dense for effective visualisation. As the number of trips increases, we also observe a slight concentration of flow lines within the Central/CBD district. To investigate this further, we focus on looking at the most popular O-D passenger flows for Weekday PM trips > 20,000:

tmap_options(check.and.fix = TRUE)
tmap_mode("plot")

tm_shape(mpsz) +
  tm_fill(
    col = "#dfdfeb"
    ) +
tm_shape(train_hex_filtered) +
  tm_fill(
    col = "station_count",
    palette = "-RdYlGn",
    alpha = .8,
    style = "cont",
    # set legend title
    title = "Station Count",
    popup.vars = c("No. of Train Stations: " = "station_count",
                   "TAZ: " = "hex_id",
                   "Train station codes: " = "station_codes")
  ) +
  tm_borders(
    col = "#dfdfeb",
    lwd = .5
  ) +
od_flow %>% 
  filter(weekday_pm_trips > 20000) %>%
tm_shape() +
  tm_lines(
    lwd = "weekday_pm_trips",
    scale = 1.5,
    style = "quantile",
    n = 6,
    col = "#451F55",
    alpha = .7
  ) +
  tm_layout(
    title = "Weekday Evening Peak Traffic Flow",
    scale = .9,
    legend.stack = "horizontal",
    legend.position = c("right", "bottom"),
    frame = FALSE
  ) +
  tm_style("white")

Insights from OD flow map

A higher train station density in TAZs does not correlate to higher traffic. This is possibly due to the fact that a single station may encompass multiple lines (for instance, TAZ #1074 has the highest station density with 5 station codes, but this only corresponds to 3 stations: Marina Bay, Downtown & Shenton Way)
Several TAZs scattered across the country have many flowlines, which suggests that each is a key origin or destination zone on Weekday evenings. Namely:
- TAZ #1058 (Telok Ayer and Raffles Place MRT Stations)
- TAZ #534 (Jurong East MRT Station)
- TAZ #988 (Yishun MRT Station)
There are also TAZs with fewer but thicker flowlines, indicating highly popular origin or destination train stations, that are likely to be congested during Weekday evening peak periods. These are:
- TAZ #1544 (Pasir Ris MRT Station)
- TAZ #1028 (Novena MRT Station)
- TAZ #774 (Woodlands MRT Station)
- TAZ #573 (Yew Tee MRT Station)

Top 15 O-D flows by number of trips during Weekday Evening Peak Period:

od_data %>%
  left_join(train_hex_filtered,
            by = c("origin_hex" = "hex_id")
  ) %>%
  rename(
    origin_stations = station_codes
  ) %>%
  left_join(train_hex_filtered,
            by = c("dest_hex" = "hex_id")
  ) %>%
  rename(
    dest_stations = station_codes
  ) %>%
  st_drop_geometry() %>%
  select(origin_hex,
         origin_stations,
         dest_hex,
         dest_stations,
         weekday_pm_trips
  ) %>%
  arrange(desc(weekday_pm_trips)) %>%
  slice_head(n = 15) %>%
  datatable()

The table above reveals that Jurong East MRT Station is the busiest origin station, with top OD trips originating from there. Only 2 TAZs occurred multiple times as top destination stations: Boon Lay and Newton MRT Stations. The fact that these stations are not in the city centre is slightly surprising, as one would assume that the busiest origin station would be in the CBD area where people would be commuting from after work.

Further analysis is conducted by using Spatial Interaction Models (SIMs) to determine factors explaining flow density between these TAZs.

6 What makes these trip origins and destinations so popular?

To understand the factors affecting MRT passenger flows during weekday evening peak periods, we calibrate Spatial Interaction Models with the total number of trips as the dependent variable \(Y\) and a variety of factors as independent variables \(X\). These factors can reveal the propulsive qualities of origin zones and the attractive qualities of destination zones.

%%{
  init: {
    'theme': 'base',
    'themeVariables': {
      'primaryColor': '#ACAEDA',
      'primaryTextColor': '#371F49',
      'primaryBorderColor': '#3d7670',
      'lineColor': '#371F49',
      'secondaryColor': '#3d7670',
      'tertiaryColor': '#371F49'
    }
  }
}%%

flowchart LR
    L{fa:fa-moon \nWeekday \nEvening} -.- A{fa:fa-train-subway \nTrain Trips} --> B(Origin)
    A --> C(Destination) -.- E[fa:fa-magnet \nAttractive \nfactors]
    B -.- D[fa:fa-angles-right \nPropulsive \nfactors] --- O[fa:fa-ruler-horizontal Distance]
    D --- F[fa:fa-house-user Residential density]
    D --- G[fa:fa-briefcase Business]
    D --- H[fa:fa-school Schools]
    E --- I[fa:fa-dumbbell Recreation]
    E --- J[fa:fa-burger Food & Beverage]
    E --- K[fa:fa-cart-shopping Retail facilities]
    E --- M[fa:fa-film Entertainment]
    E --- N[fa:fa-ruler-horizontal Distance]

6.1 Attractive & Propulsive Factors

The factors listed above are computed at TAZ level for Spatial Interaction Modelling.

6.1.1 Distance

Computing the distance between Traffic analysis zones at a hexagonal level requires the computation of a distance matrix. This is a table that shows the Euclidean distance between each pair of locations, and is computed using the following steps:

train_hex_filtered

Simple feature collection with 152 features and 3 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 5693.209 ymin: 27079.48 xmax: 45443.21 ymax: 48730.11
Projected CRS: SVY21 / Singapore TM
First 10 features:
   hex_id station_count station_codes                       geometry
1      23             1          EW33 POLYGON ((6068.209 34873.71...
2      39             1          EW32 POLYGON ((6443.209 34224.19...
3      88             1          EW31 POLYGON ((7568.209 33574.67...
4     137             1          EW30 POLYGON ((8693.209 32925.15...
5     237             1          EW29 POLYGON ((10943.21 34224.19...
6     320             1          EW28 POLYGON ((12818.21 34873.71...
7     353             1          EW27 POLYGON ((13568.21 34873.71...
8     436             1          EW26 POLYGON ((15443.21 35523.23...
9     502             1          EW25 POLYGON ((16943.21 35523.23...
10    534             2      EW24,NS1 POLYGON ((17693.21 34224.19...

The hexagon TAZ layer, train_hex_filtered, is a Simple Features dataframe. We first convert it to a SpatialPolygonsDataFrame using the as_Spatial() function:

# convert to SpatialPolygonsDataFrame
train_hex_sf <- as_Spatial(train_hex_filtered)

# compute distance between hexagons
dist <- spDists(train_hex_sf, 
                longlat = FALSE)

head(dist, n=c(10, 10))

           [,1]      [,2]      [,3]     [,4]     [,5]     [,6]     [,7]
 [1,]     0.000   750.000  1984.313 3269.174 4918.079 6750.000 7500.000
 [2,]   750.000     0.000  1299.038 2598.076 4500.000 6408.003 7154.544
 [3,]  1984.313  1299.038     0.000 1299.038 3436.932 5408.327 6139.015
 [4,]  3269.174  2598.076  1299.038    0.000 2598.076 4562.072 5250.000
 [5,]  4918.079  4500.000  3436.932 2598.076    0.000 1984.313 2704.163
 [6,]  6750.000  6408.003  5408.327 4562.072 1984.313    0.000  750.000
 [7,]  7500.000  7154.544  6139.015 5250.000 2704.163  750.000    0.000
 [8,]  9397.473  9093.267  8112.490 7232.738 4683.748 2704.163 1984.313
 [9,] 10894.379 10580.052  9575.359 8649.422 6139.015 4175.823 3436.932
[10,] 11643.131 11250.000 10145.812 9093.267 6750.000 4918.079 4175.823
          [,8]      [,9]     [,10]
 [1,] 9397.473 10894.379 11643.131
 [2,] 9093.267 10580.052 11250.000
 [3,] 8112.490  9575.359 10145.812
 [4,] 7232.738  8649.422  9093.267
 [5,] 4683.748  6139.015  6750.000
 [6,] 2704.163  4175.823  4918.079
 [7,] 1984.313  3436.932  4175.823
 [8,]    0.000  1500.000  2598.076
 [9,] 1500.000     0.000  1500.000
[10,] 2598.076  1500.000     0.000

The resultant matrix has numbers as row and column headers, representing the hexagon pairs. To make this information more usable for analysis, we want to replace this with TAZ hexagon ids instead.

# Create a list of hex ids
hex_names <- train_hex_filtered$hex_id

# Attach hex ids to row and column variables in dist
colnames(dist) <- paste0(hex_names)
rownames(dist) <- paste0(hex_names)

head(dist, n=c(10, 10))

           23        39        88      137      237      320      353      436
23      0.000   750.000  1984.313 3269.174 4918.079 6750.000 7500.000 9397.473
39    750.000     0.000  1299.038 2598.076 4500.000 6408.003 7154.544 9093.267
88   1984.313  1299.038     0.000 1299.038 3436.932 5408.327 6139.015 8112.490
137  3269.174  2598.076  1299.038    0.000 2598.076 4562.072 5250.000 7232.738
237  4918.079  4500.000  3436.932 2598.076    0.000 1984.313 2704.163 4683.748
320  6750.000  6408.003  5408.327 4562.072 1984.313    0.000  750.000 2704.163
353  7500.000  7154.544  6139.015 5250.000 2704.163  750.000    0.000 1984.313
436  9397.473  9093.267  8112.490 7232.738 4683.748 2704.163 1984.313    0.000
502 10894.379 10580.052  9575.359 8649.422 6139.015 4175.823 3436.932 1500.000
534 11643.131 11250.000 10145.812 9093.267 6750.000 4918.079 4175.823 2598.076
          502       534
23  10894.379 11643.131
39  10580.052 11250.000
88   9575.359 10145.812
137  8649.422  9093.267
237  6139.015  6750.000
320  4175.823  4918.079
353  3436.932  4175.823
436  1500.000  2598.076
502     0.000  1500.000
534  1500.000     0.000

dist now has hex_id values as row and column headers, identifying which pair of TAZs each distance belongs to. However, the matrix repeats information (each pair appears in both directions), and it needs to be in a long origin-destination (O-D) format for modelling. The melt() function from the reshape2 package is used for this.

distPair <- melt(dist) %>%
  rename(distance = value,
         origin_hex = Var1,
         dest_hex = Var2)

DT::datatable(head(distPair, 10))
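As a minimal sketch of this reshaping step, the toy example below builds a 3 x 3 distance matrix and converts it to one row per origin-destination pair. It uses base R's as.data.frame(as.table()) instead of reshape2::melt(), purely to keep the example free of dependencies. The coordinates and hexagon ids are made up for illustration.

```r
# Toy centroids (hypothetical coordinates in metres)
pts <- matrix(c(  0,    0,
                750,    0,
                  0, 1000),
              ncol = 2, byrow = TRUE)

# Euclidean distance matrix, labelled with hypothetical hexagon ids
toy_dist <- as.matrix(dist(pts))
rownames(toy_dist) <- colnames(toy_dist) <- c("23", "39", "88")

# Wide matrix -> long O-D table: one row per (origin, destination) pair
toy_pairs <- as.data.frame(as.table(toy_dist), responseName = "distance")
names(toy_pairs)[1:2] <- c("origin_hex", "dest_hex")

print(toy_pairs)
```

Both directions of each pair are kept, along with the zero-distance diagonal, so every possible O-D combination has a row to join against.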

The inter-zonal distance is computed between hexagon centroids. Each hexagon therefore has a distance of 0 to itself, which misrepresents intra-zonal distance, since trips within a zone still cover some ground. To rectify this, we replace all ‘0’ values with a constant that is smaller than the minimum inter-zonal distance.

# Find minimum inter-zonal distance:
distPair %>%
  filter(distance > 0) %>%
  summary()

   origin_hex      dest_hex       distance    
 Min.   :  23   Min.   :  23   Min.   :  750  
 1st Qu.: 796   1st Qu.: 796   1st Qu.: 6666  
 Median :1048   Median :1048   Median :10633  
 Mean   :1016   Mean   :1016   Mean   :11210  
 3rd Qu.:1258   3rd Qu.:1258   3rd Qu.:15056  
 Max.   :1741   Max.   :1741   Max.   :39086

The minimum distance is 750 m, which is the distance between the centre of a hexagon and the centre of an adjacent hexagon. We thus set the intra-zonal distance to 200 m using an ifelse() statement:

# If distance = 0, set to 200, else remain as is
distPair$distance <- ifelse(distPair$distance == 0,
                            200, distPair$distance)

summary(distPair)

   origin_hex      dest_hex       distance    
 Min.   :  23   Min.   :  23   Min.   :  200  
 1st Qu.: 796   1st Qu.: 796   1st Qu.: 6538  
 Median :1048   Median :1048   Median :10633  
 Mean   :1016   Mean   :1016   Mean   :11138  
 3rd Qu.:1258   3rd Qu.:1258   3rd Qu.:15000  
 Max.   :1741   Max.   :1741   Max.   :39086

The minimum distance is now 200 m.
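The same substitution can be sketched on a small made-up vector, with a guard that the chosen constant really is below the smallest inter-zonal distance. The values and the names toy_distance, min_inter and intra_zonal are assumptions for illustration only.

```r
# Hypothetical long-format distances, including two intra-zonal (0) entries
toy_distance <- c(0, 750, 1299.038, 750, 0, 1299.038)

# Smallest genuine inter-zonal distance
min_inter <- min(toy_distance[toy_distance > 0])

# The intra-zonal constant must stay below that minimum
intra_zonal <- 200
stopifnot(intra_zonal < min_inter)

# Replace zeros only; all other distances are left unchanged
toy_distance <- ifelse(toy_distance == 0, intra_zonal, toy_distance)
print(toy_distance)
```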

origin_hex and dest_hex represent unique areas, and are set to factor data types instead of integers:

distPair <- distPair %>%
  mutate(
    origin_hex = as.factor(origin_hex),
    dest_hex = as.factor(dest_hex)
  )

6.1.2 How is the number of trips correlated to distance?

The following steps are taken to prepare the dataframe with both distance and flow data:

Identifying intra-zonal flows
Joining with inter-zonal distances

Set trip number to 0 for intra-zonal flows

od_flow$flowNoIntra <- ifelse(od_flow$origin_hex == od_flow$dest_hex,
                              0, od_flow$weekday_pm_trips)

Set offset for intra-zonal flows to small value (0.00001) and inter-zonal flows to 1

od_flow$offset <- ifelse(od_flow$origin_hex == od_flow$dest_hex,
                         0.00001, 1)

distPair <- read_rds("data/rds/distPair.rds")

od_data_dist <- od_flow %>%
  # Change hex_ids to factor fields
  mutate(
    origin_hex = as.factor(origin_hex),
    dest_hex = as.factor(dest_hex)
  ) %>%
  # Retrieve distance value
  left_join(
    distPair,
    by = c("origin_hex" = "origin_hex",
           "dest_hex" = "dest_hex")
  )

DT::datatable(head(od_data_dist,10))

p1 <- od_data_dist %>%
      st_drop_geometry() %>%
      ggplot(
        aes(x = distance,
            y = weekday_pm_trips)
      ) +
      geom_point(
        size = 1,
        alpha = .6,
        color = "#4d5887"
      ) +
      geom_smooth(method = lm) +
      ggtitle("Trips ~ Distance")

logp1 <- od_data_dist %>%
      st_drop_geometry() %>%
      ggplot(
        aes(x = log(distance),
            y = log(weekday_pm_trips))
      ) +
      geom_point(
        size = 1,
        alpha = .6,
        color = "#4d5887"
      ) +
      geom_smooth(method = lm) +
      ggtitle("log(Trips) ~ log(Distance)")

p1 + logp1

From the plots above, there is no obvious linear trend when using absolute figures. The log-transformed values, on the other hand, reveal an inverse relationship: the number of trips tends to fall as distance increases.
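To illustrate why a log-log view can expose a trend that the raw scale hides, here is a small sketch on simulated data (not the study's flows). It assumes trips follow a power-law decay in distance with multiplicative noise. The exponent of -1.5 and all variable names are arbitrary choices for the example.

```r
set.seed(42)

# Simulated data: trips decay as a power of distance, with lognormal noise
n <- 200
distance <- runif(n, min = 750, max = 40000)
trips <- exp(12 - 1.5 * log(distance) + rnorm(n, sd = 0.5))

# A power law is curved on the raw scale but linear on the log-log scale
fit_raw <- lm(trips ~ distance)
fit_log <- lm(log(trips) ~ log(distance))

# The log-log slope estimates the distance-decay exponent
print(coef(fit_log))

# Compare how well a straight line fits on each scale
print(c(raw = summary(fit_raw)$r.squared,
        loglog = summary(fit_log)$r.squared))
```

A negative slope in the log-log fit corresponds to the inverse relationship described above. This is also why log(distance) is a natural term in a spatial interaction model.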

6.1.3 How is the number of trips correlated to other factors?

To determine the factors that influence the movement of people between TAZs, we look at how flow data is related to propulsive and attractive attributes of the origin and destination areas.

Propulsive attributes refer to factors that encourage or drive movement from one location to another and are associated with the origin of the journey, representing conditions that “push” or “propel” entities away from their current location
Attractive attributes are factors that pull or attract entities toward a specific location. These are associated with the destination and represent conditions that make a location appealing

As the flow data is based on weekday evening peak hours, propulsion is likely to be related to work, schools, or home (perhaps many people work from home). As such, the number of businesses, the number of schools and the residential density per TAZ are computed as propulsive attributes. On the other hand, leisure, food or shopping could be appealing at the end of a workday, and these are taken as attractive attributes.

These attributes are added to the flow data from the following sources:

business is a Simple feature point layer based on SVY21 coordinate reference system (CRS).

business <- st_read(dsn = "data/geospatial",
                    layer = "Business") %>%
          st_transform(crs = 3414)

Reading layer `Business' from data source 
  `C:\haileycsy\ISSS624-AGA\Take-home_Ex\the2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 6550 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3669.148 ymin: 25408.41 xmax: 47034.83 ymax: 50148.54
Projected CRS: SVY21 / Singapore TM

tmap_mode("plot")

tm_shape(mpsz) +
  tm_fill(col = "#dfdfeb") +
tm_shape(train_hex_sf) +
  tm_fill(col =  "#BAFFDF")+
  tm_borders(col = "#dfdfeb") +
tm_shape(business) +
  tm_dots(col = "#2F4858") +
  tm_layout(frame = FALSE)

The result is a simple features dataframe with the point locations of business offices in Singapore. As individual point locations are not directly useful for modelling, we compute the number of offices per TAZ:

train_hex_attr <- train_hex_final %>%
  mutate(num_offices = lengths(st_intersects(train_hex_final, business)))

summary(train_hex_attr$num_offices)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   0.000   3.416   1.000 135.000
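The counting idiom used here, lengths(st_intersects(polygons, points)), can be checked on a tiny made-up layer. In this sketch two square zones stand in for the hexagons and three points stand in for offices. The helper make_square() and all coordinates are hypothetical.

```r
library(sf)

# Hypothetical helper: an axis-aligned square polygon with a given lower-left corner
make_square <- function(xmin, ymin, size) {
  st_polygon(list(rbind(
    c(xmin,        ymin),
    c(xmin + size, ymin),
    c(xmin + size, ymin + size),
    c(xmin,        ymin + size),
    c(xmin,        ymin)
  )))
}

# Two adjacent toy zones in SVY21 (EPSG:3414)
zones <- st_sf(
  zone_id  = c("A", "B"),
  geometry = st_sfc(make_square(0, 0, 10), make_square(10, 0, 10), crs = 3414)
)

# Three toy points: two fall in zone A, one in zone B
pts <- st_sf(
  geometry = st_sfc(st_point(c(2, 2)), st_point(c(5, 8)), st_point(c(15, 5)),
                    crs = 3414)
)

# st_intersects() returns, for each zone, the indices of the points it contains;
# lengths() turns that list into a count per zone
zones$num_points <- lengths(st_intersects(zones, pts))
print(st_drop_geometry(zones))
```

Zones that contain no points get a count of 0 rather than being dropped, which is why the summary above has a minimum of 0.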

schools is an aspatial dataframe with longitude and latitude data. This is imported and transformed into a Simple feature point layer based on SVY21 coordinate reference system (CRS).

schools <- read_csv("data/aspatial/schools.csv")

There are 40 columns in schools, but we only require a few variables. These are renamed and selected, and the data is transformed into a Simple Features Dataframe:

schools_sf <- schools %>%
  rename(
    latitude = "results.LATITUDE",
    longitude = "results.LONGITUDE"
  ) %>%
  select(
    postal_code, 
    school_name, 
    latitude, 
    longitude
  ) %>%
  st_as_sf(
    coords = c("longitude", "latitude"),
    crs=4326
  ) %>%
  st_transform(
    crs = 3414
  )
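The two-step pattern above (declare the CRS the coordinates were recorded in, then reproject) is sketched below on a made-up table. The place names and coordinates are hypothetical and only roughly located within Singapore.

```r
library(sf)

# Hypothetical geocoded records (WGS84 longitude/latitude in decimal degrees)
toy_places <- data.frame(
  name      = c("place_a", "place_b"),
  longitude = c(103.85, 103.70),
  latitude  = c(1.29, 1.34)
)

# Step 1: declare the CRS the coordinates are recorded in (EPSG:4326).
# coords is given as x (longitude) first, then y (latitude).
toy_sf <- st_as_sf(toy_places,
                   coords = c("longitude", "latitude"),
                   crs = 4326)

# Step 2: reproject to SVY21 (EPSG:3414), so coordinates are in metres
toy_svy21 <- st_transform(toy_sf, crs = 3414)

print(st_coordinates(toy_svy21))
```

Passing crs = 3414 directly to st_as_sf() would only label the degree values as metres without converting them, so the points would not line up with the hexagon layer.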

tmap_mode("plot")

tm_shape(mpsz) +
  tm_fill(col = "#dfdfeb") +
tm_shape(train_hex_sf) +
  tm_fill(col =  "#BAFFDF")+
  tm_borders(col = "#dfdfeb") +
tm_shape(schools_sf) +
  tm_dots(col = "#2F4858") +
  tm_layout(frame = FALSE)

train_hex_attr$num_schools <- lengths(st_intersects(train_hex_attr, schools_sf))

summary(train_hex_attr$num_schools)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.1977  0.0000  4.0000

entertn is a Simple feature point layer based on SVY21 coordinate reference system (CRS).

entertn_sf <- st_read(dsn = "data/geospatial",
                    layer = "entertn") %>%
          st_transform(crs = 3414)

Reading layer `entertn' from data source 
  `C:\haileycsy\ISSS624-AGA\Take-home_Ex\the2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 114 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 10809.34 ymin: 26528.63 xmax: 41600.62 ymax: 46375.77
Projected CRS: SVY21 / Singapore TM

tm_shape(mpsz) +
  tm_fill(col = "#dfdfeb") +
tm_shape(train_hex_sf) +
  tm_fill(col =  "#BAFFDF")+
  tm_borders(col = "#dfdfeb") +
tm_shape(entertn_sf) +
  tm_dots(col = "#2F4858") +
  tm_layout(frame = FALSE)

train_hex_attr$num_entertn <- lengths(st_intersects(train_hex_attr, entertn_sf))

summary(train_hex_attr$num_entertn)

    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.00000  0.00000  0.00000  0.06346  0.00000 12.00000

fnb is a Simple feature point layer based on SVY21 coordinate reference system (CRS).

fnb_sf <- st_read(dsn = "data/geospatial",
               layer = "F&B") %>%
          st_transform(crs = 3414)

Reading layer `F&B' from data source 
  `C:\haileycsy\ISSS624-AGA\Take-home_Ex\the2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1919 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25343.27 xmax: 45462.43 ymax: 48796.21
Projected CRS: SVY21 / Singapore TM

tm_shape(mpsz) +
  tm_fill(col = "#dfdfeb") +
tm_shape(train_hex_sf) +
  tm_fill(col =  "#BAFFDF")+
  tm_borders(col = "#dfdfeb") +
tm_shape(fnb_sf) +
  tm_dots(col = "#2F4858") +
  tm_layout(frame = FALSE)

train_hex_attr$num_fnb <- lengths(st_intersects(train_hex_attr, fnb_sf))

summary(train_hex_attr$num_fnb)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  0.000   0.000   0.000   1.065   0.000 115.000

recreation is a Simple feature point layer based on SVY21 coordinate reference system (CRS).

recreation_sf <- st_read(dsn = "data/geospatial",
                      layer = "Liesure&Recreation") %>%
          st_transform(crs = 3414)

Reading layer `Liesure&Recreation' from data source 
  `C:\haileycsy\ISSS624-AGA\Take-home_Ex\the2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 1217 features and 30 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 6010.495 ymin: 25134.28 xmax: 48439.77 ymax: 50078.88
Projected CRS: SVY21 / Singapore TM

tm_shape(mpsz) +
  tm_fill(col = "#dfdfeb") +
tm_shape(train_hex_sf) +
  tm_fill(col =  "#BAFFDF")+
  tm_borders(col = "#dfdfeb") +
tm_shape(recreation_sf) +
  tm_dots(col = "#2F4858") +
  tm_layout(frame = FALSE)

train_hex_attr$num_facilities <- lengths(st_intersects(train_hex_attr, recreation_sf))

summary(train_hex_attr$num_facilities)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.6742  0.0000 31.0000

retail is a Simple feature point layer based on SVY21 coordinate reference system (CRS).

retail_sf <- st_read(dsn = "data/geospatial",
                  layer = "Retails") %>%
          st_transform(crs = 3414)

Reading layer `Retails' from data source 
  `C:\haileycsy\ISSS624-AGA\Take-home_Ex\the2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 37635 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 4737.982 ymin: 25171.88 xmax: 48265.04 ymax: 50135.28
Projected CRS: SVY21 / Singapore TM

tm_shape(mpsz) +
  tm_fill(col = "#dfdfeb") +
tm_shape(train_hex_sf) +
  tm_fill(col =  "#BAFFDF")+
  tm_borders(col = "#dfdfeb") +
tm_shape(retail_sf) +
  tm_dots(col = "#2F4858") +
  tm_layout(frame = FALSE)

train_hex_attr$num_retail <- lengths(st_intersects(train_hex_attr, retail_sf))

summary(train_hex_attr$num_retail)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.00    0.00   21.18    8.00 1474.00

This dataset is in .csv format, and features information pertaining to HDB properties in Singapore.

hdb <- read_csv("data/aspatial/hdb.csv")

# Remove any duplicated rows
hdb <- distinct(hdb)

str(hdb)

tibble [12,442 × 37] (S3: tbl_df/tbl/data.frame)
 $ ...1                 : num [1:12442] 0 1 2 3 4 5 6 7 8 9 ...
 $ blk_no               : chr [1:12442] "1" "1" "1" "1" ...
 $ street               : chr [1:12442] "BEACH RD" "BEDOK STH AVE 1" "CANTONMENT RD" "CHAI CHEE RD" ...
 $ max_floor_lvl        : num [1:12442] 16 14 2 15 4 25 12 14 12 2 ...
 $ year_completed       : num [1:12442] 1970 1975 2010 1982 1975 ...
 $ residential          : chr [1:12442] "Y" "Y" "N" "Y" ...
 $ commercial           : chr [1:12442] "Y" "N" "Y" "N" ...
 $ market_hawker        : chr [1:12442] "N" "N" "N" "N" ...
 $ miscellaneous        : chr [1:12442] "N" "Y" "N" "N" ...
 $ multistorey_carpark  : chr [1:12442] "N" "N" "N" "N" ...
 $ precinct_pavilion    : chr [1:12442] "N" "N" "N" "N" ...
 $ bldg_contract_town   : chr [1:12442] "KWN" "BD" "CT" "BD" ...
 $ total_dwelling_units : num [1:12442] 142 206 0 102 55 96 125 247 95 0 ...
 $ 1room_sold           : num [1:12442] 0 0 0 0 0 0 0 0 0 0 ...
 $ 2room_sold           : num [1:12442] 1 0 0 0 0 0 0 0 0 0 ...
 $ 3room_sold           : num [1:12442] 138 204 0 0 54 0 118 0 62 0 ...
 $ 4room_sold           : num [1:12442] 1 0 0 10 0 0 0 0 0 0 ...
 $ 5room_sold           : num [1:12442] 2 2 0 92 1 96 7 0 33 0 ...
 $ exec_sold            : num [1:12442] 0 0 0 0 0 0 0 0 0 0 ...
 $ multigen_sold        : num [1:12442] 0 0 0 0 0 0 0 0 0 0 ...
 $ studio_apartment_sold: num [1:12442] 0 0 0 0 0 0 0 0 0 0 ...
 $ 1room_rental         : num [1:12442] 0 0 0 0 0 0 0 0 0 0 ...
 $ 2room_rental         : num [1:12442] 0 0 0 0 0 0 0 247 0 0 ...
 $ 3room_rental         : num [1:12442] 0 0 0 0 0 0 0 0 0 0 ...
 $ other_room_rental    : num [1:12442] 0 0 0 0 0 0 0 0 0 0 ...
 $ lat                  : num [1:12442] 1.3 1.32 1.28 1.33 1.39 ...
 $ lng                  : num [1:12442] 104 104 104 104 104 ...
 $ building             : chr [1:12442] "RAFFLES HOTEL" "NIL" "PINNACLE @ DUXTON" "PING YI GARDENS" ...
 $ addr                 : chr [1:12442] "1 BEACH ROAD RAFFLES HOTEL SINGAPORE 189673" "1 BEDOK SOUTH AVENUE 1 SINGAPORE 460001" "1 CANTONMENT ROAD PINNACLE @ DUXTON SINGAPORE 080001" "1 CHAI CHEE ROAD PING YI GARDENS SINGAPORE 461001" ...
 $ postal               : chr [1:12442] "189673" "460001" "080001" "461001" ...
 $ SUBZONE_NO           : num [1:12442] 2 6 3 3 1 9 10 5 3 5 ...
 $ SUBZONE_N            : chr [1:12442] "CITY HALL" "BEDOK SOUTH" "CHINATOWN" "KEMBANGAN" ...
 $ SUBZONE_C            : chr [1:12442] "DTSZ02" "BDSZ06" "OTSZ03" "BDSZ03" ...
 $ PLN_AREA_N           : chr [1:12442] "DOWNTOWN CORE" "BEDOK" "OUTRAM" "BEDOK" ...
 $ PLN_AREA_C           : chr [1:12442] "DT" "BD" "OT" "BD" ...
 $ REGION_N             : chr [1:12442] "CENTRAL REGION" "EAST REGION" "CENTRAL REGION" "EAST REGION" ...
 $ REGION_C             : chr [1:12442] "CR" "ER" "CR" "ER" ...

hdb is a geocoded list of HDB properties in Singapore, with information such as:

blk_no & street: Address of the HDB Property
max_floor_lvl & year_completed: Characteristics of the HDB building, indicative of height and age
residential, commercial, market_hawker, miscellaneous, multistorey_carpark & precinct_pavilion: A series of Y/N flag columns indicating whether the HDB property has these facilities
bldg_contract_town: A code indicating HDB town
total_dwelling_units: Number of units in the block
xx_sold & xx_rental: multiple columns indicating number of units sold and rented per type
lat & lng: Geocoded latitude and longitude of the HDB location
SUBZONE: Subzone information

hdb contains many variables, but only a few are useful for this exercise. The following code:

Renames lng and lat for easier conversion
Filters for residential HDB properties only
Selects only the columns relevant for analysis

hdb <- hdb %>%
  rename(
    longitude = lng,
    latitude = lat
  ) %>%
  filter(residential == "Y") %>%
  select(
    blk_no,
    street,
    postal,
    total_dwelling_units,
    longitude,
    latitude
  )

hdb is an aspatial dataframe, with longitude and latitude columns as variables. These are used to convert it into a simple feature layer with the st_as_sf() function, so that it can be joined at hexagon level.

hdb_sf <- st_as_sf(hdb,
                   coords = c("longitude", "latitude"),
                   crs = 4326) %>%
  st_transform(crs = 3414)

# Show map of HDB blocks

tm_shape(mpsz) +
  tm_fill(col = "#dfdfeb") +
tm_shape(train_hex_sf) +
  tm_fill(col =  "#BAFFDF")+
  tm_borders(col = "#dfdfeb") +
tm_shape(hdb_sf) +
  tm_dots(col = "#2F4858") +
  tm_layout(frame = FALSE)

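The result is a simple features dataframe with the point locations of residential HDB blocks. Each block is matched to the hexagon it falls in, and the number of blocks and dwelling units are totalled per hexagon: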

train_hex_attr <- st_join(
    hdb_sf,
    train_hex_attr,
    join = st_intersects
  ) %>%
  group_by(
    hex_id
  ) %>%
  summarise(
    hdb_blocks = n(),
    hdb_units = sum(total_dwelling_units),
    num_offices = sum(num_offices),
    num_schools = sum(num_schools),
    num_entertn = sum(num_entertn),
    num_fnb = sum(num_fnb),
    num_facilities = sum(num_facilities),
    num_retail = sum(num_retail)
  ) %>%
  ungroup()

summary(train_hex_attr$hdb_blocks)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1.00    9.50   20.00   25.26   39.00   77.00
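One thing worth checking in a join-then-summarise step like this is how hexagon-level counts behave once they are repeated on every block row. The base R sketch below, on made-up tables, shows that sum() adds a hexagon's num_offices once per HDB block, and that hexagons without any block drop out of the result. It then shows one possible alternative, offered as an assumption rather than the method used in this exercise: aggregate the HDB data on its own and left-join it back to the hexagon table.

```r
# Hypothetical hexagon attributes and HDB blocks already tagged with a hex_id
hex_attr <- data.frame(hex_id = c(23, 39, 88),
                       num_offices = c(5, 0, 2))
hdb_pts  <- data.frame(hex_id = c(23, 23, 23, 88),
                       total_dwelling_units = c(100, 120, 80, 90))

# Join blocks to hexagons, then sum every column per hexagon
joined <- merge(hdb_pts, hex_attr, by = "hex_id")
summed <- aggregate(cbind(total_dwelling_units, num_offices) ~ hex_id,
                    data = joined, FUN = sum)
print(summed)   # hex 23: num_offices is summed over its 3 block rows; hex 39 is absent

# Alternative sketch: aggregate HDB blocks alone, then left-join to the hexagons
hdb_pts$hdb_blocks <- 1
hdb_by_hex <- aggregate(cbind(hdb_blocks, total_dwelling_units) ~ hex_id,
                        data = hdb_pts, FUN = sum)
hex_full <- merge(hex_attr, hdb_by_hex, by = "hex_id", all.x = TRUE)
hex_full[is.na(hex_full)] <- 0   # hexagons without HDB blocks get 0, not NA
print(hex_full)
```

With the second approach the per-hexagon office count stays at its original value and all three hexagons are retained. Whether that matters depends on how the attributes are meant to enter the model.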

6.2 Preparing Modeling data by flow origin and destination

The following code chunks select attributes relevant to origin (propulsive factors) and destination (attractive factors) for Spatial Interaction Modelling.

# Origin
train_hex_attr_all <- train_hex_attr %>%
  st_drop_geometry() %>%
  mutate(
    hex_id = as.factor(hex_id)
  ) %>%
  select( -hdb_units)

#join by origin TAZ
flow_attr <- od_data_dist %>%
  mutate(
    origin_hex = as.factor(origin_hex),
    dest_hex = as.factor(dest_hex)
  ) %>%
  left_join(
    train_hex_attr_all,
    by = c("origin_hex" = "hex_id")
  ) %>%
  rename(
    trips = weekday_pm_trips,
    origin_hdb = hdb_blocks,
    origin_schools = num_schools,
    origin_offices = num_offices
  ) %>%
  select(
    -c(num_entertn, num_retail, num_facilities, num_fnb)
  )

#join by destination TAZ
flow_attr <- flow_attr %>%
  left_join(
    train_hex_attr_all,
    by = c("dest_hex" = "hex_id")
  ) %>%
  rename(
    dest_hdb = hdb_blocks,
    dest_entertn = num_entertn,
    dest_fnb = num_fnb,
    dest_facilities = num_facilities,
    dest_retail = num_retail
  ) %>%
  select(
    -c(num_offices, num_schools)
  ) %>%
  replace(is.na(.), 0)

summary(flow_attr)

   origin_hex       dest_hex         trips             trips_quantile
 237    :  151   320    :  151   Min.   :    1.0   < 50       :6373  
 320    :  151   353    :  151   1st Qu.:   36.0   100 ~ 250  :3644  
 353    :  151   436    :  151   Median :  161.0   1000 ~ 5000:3121  
 436    :  151   534    :  151   Mean   :  945.7   250 ~ 500  :2762  
 534    :  151   637    :  151   3rd Qu.:  648.8   < 100      :2518  
 637    :  151   648    :  151   Max.   :63626.0   500 ~ 1000 :2213  
 (Other):20692   (Other):20692                     (Other)    : 967  
  flowNoIntra          offset     distance       origin_hdb   
 Min.   :    1.0   Min.   :1   Min.   :  750   Min.   : 0.00  
 1st Qu.:   36.0   1st Qu.:1   1st Qu.: 6495   1st Qu.: 0.00  
 Median :  161.0   Median :1   Median :10500   Median :13.00  
 Mean   :  945.7   Mean   :1   Mean   :11059   Mean   :19.38  
 3rd Qu.:  648.8   3rd Qu.:1   3rd Qu.:14981   3rd Qu.:34.00  
 Max.   :63626.0   Max.   :1   Max.   :39086   Max.   :77.00  
                                                              
 origin_offices    origin_schools      dest_hdb      dest_entertn    
 Min.   :   0.00   Min.   :  0.00   Min.   : 0.00   Min.   :  0.000  
 1st Qu.:   0.00   1st Qu.:  0.00   1st Qu.: 0.00   1st Qu.:  0.000  
 Median :   0.00   Median :  0.00   Median :14.00   Median :  0.000  
 Mean   :  68.94   Mean   : 22.75   Mean   :19.72   Mean   :  4.338  
 3rd Qu.:  53.00   3rd Qu.: 33.00   3rd Qu.:34.00   3rd Qu.:  0.000  
 Max.   :2295.00   Max.   :174.00   Max.   :77.00   Max.   :108.000  
                                                                     
    dest_fnb      dest_facilities   dest_retail             geometry    
 Min.   :   0.0   Min.   :  0.00   Min.   :    0   LINESTRING   :21598  
 1st Qu.:   0.0   1st Qu.:  0.00   1st Qu.:    0   epsg:3414    :    0  
 Median :   0.0   Median :  0.00   Median :  336   +proj=tmer...:    0  
 Mean   :  53.1   Mean   : 28.39   Mean   : 1503                        
 3rd Qu.:  28.0   3rd Qu.: 36.00   3rd Qu.: 1798                        
 Max.   :1275.0   Max.   :429.00   Max.   :16950

Since the SIM is based on Poisson regression and the explanatory variables enter the model as log values, it is important to ensure that there are no 0 values in the explanatory variables: log(0) is undefined (R returns -Inf), and the model cannot be fitted with such values.
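The problem is easy to reproduce on a short made-up vector. The sketch below uses hypothetical counts and the same replacement value (0.9) that this exercise applies to the real attributes.

```r
# Hypothetical attribute counts for four zones, two of them empty
toy_counts <- c(0, 3, 0, 12)

# log(0) evaluates to -Inf in R, which cannot be used as a predictor value
print(log(toy_counts))

# Replace zeros with a small positive constant before taking logs
toy_fixed <- ifelse(toy_counts == 0, 0.9, toy_counts)
print(log(toy_fixed))
```

After the replacement every logged value is finite. Note that log(0.9) is slightly negative, so former zeros sit just below a count of 1 on the log scale.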

# save columns as a vector
update_cols <- c("origin_hdb", "origin_offices", "origin_schools", "dest_hdb", "dest_entertn", "dest_fnb", "dest_facilities", "dest_retail")

# update all 0 values across columns
flow_attr <- flow_attr %>%
  mutate(across(all_of(update_cols), ~ ifelse(. == 0, 0.9, .)))

summary(flow_attr)

   origin_hex       dest_hex         trips             trips_quantile
 237    :  151   320    :  151   Min.   :    1.0   < 50       :6373  
 320    :  151   353    :  151   1st Qu.:   36.0   100 ~ 250  :3644  
 353    :  151   436    :  151   Median :  161.0   1000 ~ 5000:3121  
 436    :  151   534    :  151   Mean   :  945.7   250 ~ 500  :2762  
 534    :  151   637    :  151   3rd Qu.:  648.8   < 100      :2518  
 637    :  151   648    :  151   Max.   :63626.0   500 ~ 1000 :2213  
 (Other):20692   (Other):20692                     (Other)    : 967  
  flowNoIntra          offset     distance       origin_hdb   
 Min.   :    1.0   Min.   :1   Min.   :  750   Min.   : 0.90  
 1st Qu.:   36.0   1st Qu.:1   1st Qu.: 6495   1st Qu.: 0.90  
 Median :  161.0   Median :1   Median :10500   Median :13.00  
 Mean   :  945.7   Mean   :1   Mean   :11059   Mean   :19.64  
 3rd Qu.:  648.8   3rd Qu.:1   3rd Qu.:14981   3rd Qu.:34.00  
 Max.   :63626.0   Max.   :1   Max.   :39086   Max.   :77.00  
                                                              
 origin_offices    origin_schools      dest_hdb      dest_entertn    
 Min.   :   0.90   Min.   :  0.90   Min.   : 0.90   Min.   :  0.900  
 1st Qu.:   0.90   1st Qu.:  0.90   1st Qu.: 0.90   1st Qu.:  0.900  
 Median :   0.90   Median :  0.90   Median :14.00   Median :  0.900  
 Mean   :  69.46   Mean   : 23.28   Mean   :19.98   Mean   :  5.133  
 3rd Qu.:  53.00   3rd Qu.: 33.00   3rd Qu.:34.00   3rd Qu.:  0.900  
 Max.   :2295.00   Max.   :174.00   Max.   :77.00   Max.   :108.000  
                                                                     
    dest_fnb      dest_facilities   dest_retail               geometry    
 Min.   :   0.9   Min.   :  0.90   Min.   :    0.9   LINESTRING   :21598  
 1st Qu.:   0.9   1st Qu.:  0.90   1st Qu.:    0.9   epsg:3414    :    0  
 Median :   0.9   Median :  0.90   Median :  336.0   +proj=tmer...:    0  
 Mean   :  53.7   Mean   : 28.93   Mean   : 1503.5                        
 3rd Qu.:  28.0   3rd Qu.: 36.00   3rd Qu.: 1798.0                        
 Max.   :1275.0   Max.   :429.00   Max.   :16950.0

write_rds(flow_attr, "data/rds/sim_data.rds")

# residential density
p_res <- flow_attr %>%
      st_drop_geometry() %>%
      ggplot(
        aes(x = log(origin_hdb),
            y = log(trips))
      ) +
      geom_point(
        color = "#8D9EC6",
        size = 1,
        alpha = .7
      ) +
      geom_smooth(method = lm) +
      theme(axis.text.x = element_blank()) +
      ggtitle("Trips ~ Residential Density")

# offices
p_office <- flow_attr %>%
      st_drop_geometry() %>%
      ggplot(
        aes(x = log(origin_offices),
            y = log(trips))
      ) +
      geom_point(
        color = "#4E4B5C",
        size = 1,
        alpha = .7
      ) +
      geom_smooth(method = lm) +
      theme(
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.text.x = element_blank()
      ) +
      ggtitle("Trips ~ Office Count")

# schools
p_sch <- flow_attr %>%
      st_drop_geometry() %>%
      ggplot(
        aes(x = log(origin_schools),
            y = log(trips))
      ) +
      geom_point(
        color = "#f5bc5f",
        size = 1,
        alpha = .7
      ) +
      geom_smooth(method = lm) +
      theme(
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.text.x = element_blank()
      ) +
      ggtitle("Trips ~ School Count")

origin_patch <- p_res + p_office + p_sch
origin_patch + plot_annotation(
  title = "Correlation between trips and propulsive factors",
  subtitle = "+ve observed relationship between trips and office count"
)

# Entertainment
p_ent <- flow_attr %>%
      st_drop_geometry() %>%
      ggplot(
        aes(x = log(dest_entertn),
            y = log(trips))
      ) +
      geom_point(
        size = 1,
        alpha = .7
      ) +
      geom_smooth(method = lm) +
      theme(
        axis.text.x = element_blank()
      ) +
      ggtitle("Trips ~ Entertainment")

# f&b
p_food <- flow_attr %>%
      st_drop_geometry() %>%
      ggplot(
        aes(x = log(dest_fnb),
            y = log(trips))
      ) +
      geom_point(
        size = 1,
        alpha = .7,
        color = "#9590A8"
      ) +
      geom_smooth(method = lm) +
      theme(
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.text.x = element_blank()
      ) +
      ggtitle("Trips ~ F&B")

# recreation
p_rec <- flow_attr %>%
      st_drop_geometry() %>%
      ggplot(
        aes(x = log(dest_facilities),
            y = log(trips))
      ) +
      geom_point(
        size = 1,
        alpha = .7,
        color = "#f5bc5f"
      ) +
      geom_smooth(method = lm) +
      theme(
        axis.text.x = element_blank()
      ) +
      ggtitle("Trips ~ Recreation")

# retail
p_retail <- flow_attr %>%
      st_drop_geometry() %>%
      ggplot(
        aes(x = log(dest_retail),
            y = log(trips))
      ) +
      geom_point(
        size = 1,
        alpha = .7,
        color = "#6D435A"
      ) +
      geom_smooth(method = lm) +
      theme(
        axis.title.y = element_blank(),
        axis.text.y = element_blank(),
        axis.text.x = element_blank()
      ) +
      ggtitle("Trips ~ Retail")

dest_patch <- (p_ent + p_food) / (p_rec + p_retail)
dest_patch + plot_annotation(
  title = "Correlation between trips and attractive factors",
  subtitle = "+ve observed relationship between trip count and all factors"
)

The scatterplots above suggest a positive correlation between the number of trips and the number of offices, and a negative correlation between trips and both residential density and school count. This could mean that higher trip counts are related to higher workplace concentration in the origin TAZ. Similarly, there is a positive correlation between the number of trips and all attractive factors, indicating some relationship between higher trip counts and destination areas with more leisure activities or food options. However, to understand the explanatory strength of these attributes as push or pull factors in these passenger trips, we fit a series of spatial interaction models (SIMs) to estimate the intensity of interactions between locations.

An origin-constrained model fixes the total outflow from each origin, here through a separate intercept for each origin_hex, and uses explanatory variables pertaining to the attractiveness of the destination.
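As a rough sketch of what the origin constraint does, the toy model below is fitted on simulated flows (not the study data). It assumes expected trips of the form exp(origin effect + a * log(destination attractiveness) + b * log(distance)). The origin labels, the single dest_attr variable and the coefficient values are all made up for illustration.

```r
set.seed(1)

# Simulated O-D table: 4 origins x 25 destinations
od <- expand.grid(origin = factor(paste0("o", 1:4)), dest = 1:25)
attr_by_dest <- sample(1:200, 25)            # one attractiveness value per destination
od$dest_attr <- attr_by_dest[od$dest]
od$distance  <- runif(nrow(od), 750, 40000)

# Assumed data-generating process with origin-specific intercepts
origin_effect <- c(o1 = 12, o2 = 13, o3 = 12.5, o4 = 11.5)
lambda <- exp(origin_effect[as.character(od$origin)] +
              0.8 * log(od$dest_attr) - 1.2 * log(od$distance))
od$trips <- rpois(nrow(od), lambda)

# Origin-constrained Poisson model: one intercept per origin, no global intercept
toy_sim <- glm(trips ~ origin + log(dest_attr) + log(distance) - 1,
               family = poisson(link = "log"), data = od)

# Fitted outflows match observed outflows for every origin
observed_out <- tapply(od$trips, od$origin, sum)
fitted_out   <- tapply(fitted(toy_sim), od$origin, sum)
print(round(cbind(observed_out, fitted_out), 2))

print(coef(toy_sim)[c("log(dest_attr)", "log(distance)")])
```

Because a Poisson GLM with a log link reproduces the observed totals for each dummy variable, the fitted trips leaving each origin add up to the observed trips. The model can then only redistribute those trips across destinations according to attractiveness and distance.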

originSIM <- glm(
   # constrain by origin TAZ
    formula = trips ~ origin_hex 
            + log(dest_hdb)
            + log(dest_entertn) 
            + log(dest_fnb)
            + log(dest_facilities)
            + log(dest_retail)
            + log(distance) - 1,
    family = poisson(link = "log"),
    data = flow_attr,
           na.action = na.exclude)

summary(originSIM)


Call:
glm(formula = trips ~ origin_hex + log(dest_hdb) + log(dest_entertn) + 
    log(dest_fnb) + log(dest_facilities) + log(dest_retail) + 
    log(distance) - 1, family = poisson(link = "log"), data = flow_attr, 
    na.action = na.exclude)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-215.68   -23.32   -11.18     1.11   392.90  

Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
origin_hex23         11.8490197  0.0057181  2072.2   <2e-16 ***
origin_hex39         12.4201248  0.0045988  2700.8   <2e-16 ***
origin_hex88         12.4582313  0.0044969  2770.4   <2e-16 ***
origin_hex137        12.3193135  0.0046725  2636.5   <2e-16 ***
origin_hex237        13.4674271  0.0033140  4063.8   <2e-16 ***
origin_hex320        13.0451086  0.0035122  3714.2   <2e-16 ***
origin_hex353        13.6123304  0.0030974  4394.8   <2e-16 ***
origin_hex436        12.8689210  0.0035378  3637.5   <2e-16 ***
origin_hex502        11.4359220  0.0055304  2067.8   <2e-16 ***
origin_hex534        14.2758242  0.0027608  5170.9   <2e-16 ***
origin_hex555        13.0054253  0.0032869  3956.8   <2e-16 ***
origin_hex571        10.4754637  0.0084385  1241.4   <2e-16 ***
origin_hex573        12.5955035  0.0037267  3379.8   <2e-16 ***
origin_hex585        12.6753147  0.0035582  3562.3   <2e-16 ***
origin_hex586        11.9168226  0.0044214  2695.3   <2e-16 ***
origin_hex604        10.7006199  0.0075594  1415.5   <2e-16 ***
origin_hex637        12.7393858  0.0032477  3922.5   <2e-16 ***
origin_hex641        12.5731109  0.0038038  3305.4   <2e-16 ***
origin_hex648        13.4656216  0.0030485  4417.1   <2e-16 ***
origin_hex653        10.7759826  0.0074301  1450.3   <2e-16 ***
origin_hex654         9.4452050  0.0154608   610.9   <2e-16 ***
origin_hex669        11.0601704  0.0064951  1702.9   <2e-16 ***
origin_hex670         9.4211189  0.0164033   574.3   <2e-16 ***
origin_hex671        10.8886658  0.0074563  1460.3   <2e-16 ***
origin_hex687        10.6671142  0.0081410  1310.3   <2e-16 ***
origin_hex700        12.0246129  0.0043300  2777.0   <2e-16 ***
origin_hex703        10.5632600  0.0093153  1134.0   <2e-16 ***
origin_hex708        12.3814688  0.0040301  3072.2   <2e-16 ***
origin_hex714        12.0694991  0.0041256  2925.5   <2e-16 ***
origin_hex728        11.9847411  0.0044636  2685.0   <2e-16 ***
origin_hex729        13.3441185  0.0030300  4403.9   <2e-16 ***
origin_hex746        13.1234996  0.0031055  4225.9   <2e-16 ***
origin_hex749        11.7091472  0.0047312  2474.9   <2e-16 ***
origin_hex759        11.8797064  0.0048717  2438.5   <2e-16 ***
origin_hex763        13.2437947  0.0030369  4361.0   <2e-16 ***
origin_hex774        13.3672393  0.0031099  4298.3   <2e-16 ***
origin_hex777        11.9967863  0.0043930  2730.9   <2e-16 ***
origin_hex790        11.4660936  0.0053789  2131.7   <2e-16 ***
origin_hex798        11.0395804  0.0061331  1800.0   <2e-16 ***
origin_hex812        12.1202840  0.0039791  3046.0   <2e-16 ***
origin_hex813        11.4301995  0.0050262  2274.1   <2e-16 ***
origin_hex824        13.0244530  0.0033435  3895.5   <2e-16 ***
origin_hex826        12.5031433  0.0037063  3373.4   <2e-16 ***
origin_hex847        10.9869947  0.0060916  1803.6   <2e-16 ***
origin_hex859        10.7084013  0.0072209  1483.0   <2e-16 ***
origin_hex861        12.1606061  0.0038937  3123.1   <2e-16 ***
origin_hex863        10.9244837  0.0062173  1757.1   <2e-16 ***
origin_hex880        11.3550298  0.0051938  2186.3   <2e-16 ***
origin_hex903        11.2945262  0.0056642  1994.0   <2e-16 ***
origin_hex908        13.7242614  0.0028836  4759.4   <2e-16 ***
origin_hex910        12.3571422  0.0035984  3434.1   <2e-16 ***
origin_hex928        11.5482449  0.0046950  2459.7   <2e-16 ***
origin_hex940        12.9100319  0.0034640  3726.9   <2e-16 ***
origin_hex943        12.3894077  0.0035285  3511.2   <2e-16 ***
origin_hex946        11.2169526  0.0053975  2078.2   <2e-16 ***
origin_hex973        12.0573495  0.0044460  2712.0   <2e-16 ***
origin_hex976        11.1875520  0.0052927  2113.8   <2e-16 ***
origin_hex977        13.5257399  0.0027477  4922.6   <2e-16 ***
origin_hex982        11.9992138  0.0040929  2931.7   <2e-16 ***
origin_hex987        12.5865654  0.0036554  3443.3   <2e-16 ***
origin_hex988        13.1674375  0.0032093  4102.9   <2e-16 ***
origin_hex993        11.1985315  0.0051253  2184.9   <2e-16 ***
origin_hex1001       10.6470973  0.0072559  1467.4   <2e-16 ***
origin_hex1008       13.2530033  0.0029311  4521.6   <2e-16 ***
origin_hex1010       12.9186001  0.0029988  4307.9   <2e-16 ***
origin_hex1011       13.3282303  0.0028275  4713.7   <2e-16 ***
origin_hex1013       11.1615167  0.0053836  2073.3   <2e-16 ***
origin_hex1014       11.5582350  0.0046726  2473.6   <2e-16 ***
origin_hex1016       10.9862554  0.0059940  1832.9   <2e-16 ***
origin_hex1025       13.5154042  0.0027586  4899.4   <2e-16 ***
origin_hex1026       11.1884899  0.0050857  2200.0   <2e-16 ***
origin_hex1028       13.2642831  0.0028651  4629.7   <2e-16 ***
origin_hex1041       13.6526660  0.0027361  4989.9   <2e-16 ***
origin_hex1042       12.2891960  0.0035028  3508.4   <2e-16 ***
origin_hex1043       12.8624334  0.0029783  4318.8   <2e-16 ***
origin_hex1046       12.8280993  0.0031437  4080.5   <2e-16 ***
origin_hex1050       12.8486946  0.0032343  3972.7   <2e-16 ***
origin_hex1058       14.2714884  0.0025328  5634.7   <2e-16 ***
origin_hex1059       12.1713548  0.0035676  3411.7   <2e-16 ***
origin_hex1060       12.8884403  0.0029724  4336.0   <2e-16 ***
origin_hex1063       11.7446725  0.0042439  2767.4   <2e-16 ***
origin_hex1064       12.8669343  0.0031433  4093.4   <2e-16 ***
origin_hex1074       13.6656190  0.0027558  4958.8   <2e-16 ***
origin_hex1076       13.3729817  0.0027268  4904.3   <2e-16 ***
origin_hex1077       12.4291816  0.0033224  3741.0   <2e-16 ***
origin_hex1082       12.9233131  0.0031630  4085.8   <2e-16 ***
origin_hex1091       12.9477248  0.0030720  4214.8   <2e-16 ***
origin_hex1092       13.4240891  0.0027416  4896.4   <2e-16 ***
origin_hex1093       11.2046068  0.0049790  2250.4   <2e-16 ***
origin_hex1123       10.0042304  0.0098071  1020.1   <2e-16 ***
origin_hex1125       12.7513786  0.0031422  4058.1   <2e-16 ***
origin_hex1126       12.1783812  0.0035932  3389.3   <2e-16 ***
origin_hex1127       12.4611435  0.0033603  3708.4   <2e-16 ***
origin_hex1130       11.2205748  0.0052765  2126.5   <2e-16 ***
origin_hex1140       10.3732245  0.0080509  1288.5   <2e-16 ***
origin_hex1142       11.1846679  0.0051341  2178.5   <2e-16 ***
origin_hex1143       11.3624054  0.0047178  2408.4   <2e-16 ***
origin_hex1161       12.0316826  0.0038509  3124.4   <2e-16 ***
origin_hex1162       11.7881953  0.0042062  2802.6   <2e-16 ***
origin_hex1175       11.3834463  0.0048744  2335.3   <2e-16 ***
origin_hex1176       11.3858032  0.0047908  2376.6   <2e-16 ***
origin_hex1177       10.9015032  0.0058871  1851.8   <2e-16 ***
origin_hex1179       13.0730495  0.0029975  4361.3   <2e-16 ***
origin_hex1199       10.9202118  0.0060754  1797.4   <2e-16 ***
origin_hex1200        9.6664704  0.0110255   876.7   <2e-16 ***
origin_hex1212       10.9179011  0.0059592  1832.1   <2e-16 ***
origin_hex1216       10.0914448  0.0086793  1162.7   <2e-16 ***
origin_hex1225       11.2240082  0.0052914  2121.2   <2e-16 ***
origin_hex1226       12.1920679  0.0036823  3311.0   <2e-16 ***
origin_hex1227       10.8943213  0.0059261  1838.4   <2e-16 ***
origin_hex1233        9.5829577  0.0115616   828.9   <2e-16 ***
origin_hex1244       12.9684986  0.0030424  4262.6   <2e-16 ***
origin_hex1246       12.1051089  0.0038850  3115.8   <2e-16 ***
origin_hex1258       11.3003728  0.0052402  2156.5   <2e-16 ***
origin_hex1259       13.3272033  0.0028706  4642.7   <2e-16 ***
origin_hex1260       12.3939500  0.0035071  3533.9   <2e-16 ***
origin_hex1265       10.2847464  0.0076574  1343.1   <2e-16 ***
origin_hex1266       10.3308516  0.0077235  1337.6   <2e-16 ***
origin_hex1280       12.2853974  0.0036583  3358.2   <2e-16 ***
origin_hex1281       11.6059220  0.0044365  2616.0   <2e-16 ***
origin_hex1282       12.7634123  0.0030939  4125.3   <2e-16 ***
origin_hex1293       11.8872086  0.0041423  2869.7   <2e-16 ***
origin_hex1314       10.4679514  0.0071603  1462.0   <2e-16 ***
origin_hex1315       10.1801398  0.0080908  1258.2   <2e-16 ***
origin_hex1316       12.7443388  0.0031766  4012.0   <2e-16 ***
origin_hex1317       10.6040392  0.0072263  1467.4   <2e-16 ***
origin_hex1325       11.8482920  0.0043323  2734.9   <2e-16 ***
origin_hex1331        9.5692041  0.0108594   881.2   <2e-16 ***
origin_hex1332        9.9809026  0.0089119  1120.0   <2e-16 ***
origin_hex1333       10.4660618  0.0073436  1425.2   <2e-16 ***
origin_hex1343       12.1070055  0.0039725  3047.7   <2e-16 ***
origin_hex1348       10.1894369  0.0079797  1276.9   <2e-16 ***
origin_hex1349       10.5638823  0.0068237  1548.1   <2e-16 ***
origin_hex1350        9.7795372  0.0107447   910.2   <2e-16 ***
origin_hex1365       10.2489892  0.0078761  1301.3   <2e-16 ***
origin_hex1375       11.5086854  0.0050262  2289.7   <2e-16 ***
origin_hex1381       10.5362750  0.0070257  1499.7   <2e-16 ***
origin_hex1382       10.8731900  0.0061990  1754.0   <2e-16 ***
origin_hex1398       10.4137530  0.0076596  1359.6   <2e-16 ***
origin_hex1409       11.1325718  0.0058614  1899.3   <2e-16 ***
origin_hex1414        9.4721562  0.0123244   768.6   <2e-16 ***
origin_hex1474       12.8497757  0.0033959  3783.9   <2e-16 ***
origin_hex1475       10.8850830  0.0067207  1619.6   <2e-16 ***
origin_hex1509       11.4048739  0.0054470  2093.8   <2e-16 ***
origin_hex1526       13.3421428  0.0030191  4419.3   <2e-16 ***
origin_hex1540       12.3919255  0.0039273  3155.3   <2e-16 ***
origin_hex1544       12.4493263  0.0038318  3249.0   <2e-16 ***
origin_hex1575       12.1919042  0.0041091  2967.0   <2e-16 ***
origin_hex1576       11.9121654  0.0046190  2578.9   <2e-16 ***
origin_hex1607       13.0977762  0.0033084  3959.0   <2e-16 ***
origin_hex1624       11.6694607  0.0051816  2252.1   <2e-16 ***
origin_hex1741       12.7039587  0.0038835  3271.3   <2e-16 ***
log(dest_hdb)         0.1906171  0.0002946   647.1   <2e-16 ***
log(dest_entertn)     0.1394649  0.0001874   744.2   <2e-16 ***
log(dest_fnb)         0.0353077  0.0001476   239.3   <2e-16 ***
log(dest_facilities) -0.0377751  0.0001303  -289.9   <2e-16 ***
log(dest_retail)      0.0278225  0.0001677   165.9   <2e-16 ***
log(distance)        -0.7006256  0.0002732 -2564.6   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 297060506  on 21598  degrees of freedom
Residual deviance:  29320336  on 21440  degrees of freedom
AIC: 29468186

Number of Fisher Scoring iterations: 6

Model results reveal that:

The number of weekday evening peak period trips has a statistically significant relationship with all destination attractiveness attributes (p < 0.001 for every term).
The strongest positive association is with the number of HDB blocks (coefficient estimate: 0.1906171), followed by entertainment venues (coefficient estimate: 0.1394649). This suggests that the most attractive factors are housing and entertainment venues such as cinemas and theatres.
Conversely, the strongest negative association is with distance (coefficient estimate: -0.7006256): the further away the destination, the fewer trips it attracts. Facilities also carry a small negative coefficient (-0.0377751).
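Because both the response (through the log link) and the predictors are on a log scale, each coefficient can be read as an elasticity. The sketch below is a rough illustration of that reading, using the estimates copied by hand from the summary above; the helper `trip_multiplier()` is a hypothetical name introduced here, not part of any package.

```r
# Coefficients copied from the origin-constrained model summary above
beta <- c(
  dest_hdb        =  0.1906171,
  dest_entertn    =  0.1394649,
  dest_fnb        =  0.0353077,
  dest_facilities = -0.0377751,
  dest_retail     =  0.0278225,
  distance        = -0.7006256
)

# Hypothetical helper: multiplicative change in expected trips when a
# log-transformed predictor is scaled by `factor`, all other terms held fixed
trip_multiplier <- function(b, factor) factor^b

# Percentage change in expected trips when each predictor is doubled
pct_change <- (trip_multiplier(beta, 2) - 1) * 100
round(pct_change, 1)
```

Under this reading, doubling the distance between an origin and a destination is associated with roughly 38% fewer expected trips, while doubling the number of HDB blocks at the destination is associated with roughly 14% more.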

6.4 Constrained SIM - Destination

Building Model
Model Results

A destination-constrained model features explanatory variables pertaining to the propulsiveness of the origin.

destSIM <- glm(
    formula = trips ~ dest_hex 
            + log(origin_hdb) 
            + log(origin_offices)
            + log(origin_schools)
            + log(distance) - 1,
    family = poisson(link = "log"),
    data = flow_attr
  )

summary(destSIM)


Call:
glm(formula = trips ~ dest_hex + log(origin_hdb) + log(origin_offices) + 
    log(origin_schools) + log(distance) - 1, family = poisson(link = "log"), 
    data = flow_attr)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-145.19   -24.59   -11.88     2.43   462.17  

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)    
dest_hex23          13.1467371  0.0066028  1991.1   <2e-16 ***
dest_hex39          12.7535341  0.0076444  1668.4   <2e-16 ***
dest_hex88          12.3679568  0.0093967  1316.2   <2e-16 ***
dest_hex137         12.1004292  0.0106051  1141.0   <2e-16 ***
dest_hex237         13.5250423  0.0053112  2546.5   <2e-16 ***
dest_hex320         14.9725952  0.0033083  4525.7   <2e-16 ***
dest_hex353         15.7099056  0.0029423  5339.3   <2e-16 ***
dest_hex436         15.2172186  0.0031247  4870.0   <2e-16 ***
dest_hex502         14.0756164  0.0039775  3538.8   <2e-16 ***
dest_hex534         15.4450762  0.0029718  5197.2   <2e-16 ***
dest_hex555         15.3717207  0.0029960  5130.8   <2e-16 ***
dest_hex571         13.3699501  0.0054046  2473.8   <2e-16 ***
dest_hex573         15.1753239  0.0031573  4806.5   <2e-16 ***
dest_hex585         14.9405503  0.0031799  4698.4   <2e-16 ***
dest_hex586         14.5988568  0.0033771  4322.9   <2e-16 ***
dest_hex604         12.5716888  0.0077609  1619.9   <2e-16 ***
dest_hex637         15.2109373  0.0029121  5223.3   <2e-16 ***
dest_hex641         14.6820846  0.0035950  4084.1   <2e-16 ***
dest_hex648         15.1630753  0.0029909  5069.7   <2e-16 ***
dest_hex653         12.2792388  0.0080796  1519.8   <2e-16 ***
dest_hex654         12.0787963  0.0106822  1130.7   <2e-16 ***
dest_hex669         13.5709789  0.0045530  2980.7   <2e-16 ***
dest_hex670         12.2236724  0.0100574  1215.4   <2e-16 ***
dest_hex671         13.6791912  0.0049611  2757.3   <2e-16 ***
dest_hex687         13.5699649  0.0051121  2654.5   <2e-16 ***
dest_hex700         14.0369937  0.0038171  3677.4   <2e-16 ***
dest_hex703         12.4731018  0.0090521  1377.9   <2e-16 ***
dest_hex708         14.6905583  0.0035533  4134.3   <2e-16 ***
dest_hex714         13.3263600  0.0046447  2869.2   <2e-16 ***
dest_hex728         12.9520496  0.0055195  2346.6   <2e-16 ***
dest_hex729         13.0569804  0.0051194  2550.5   <2e-16 ***
dest_hex746         12.4498343  0.0064366  1934.2   <2e-16 ***
dest_hex749         13.0137069  0.0052744  2467.3   <2e-16 ***
dest_hex759         12.8486962  0.0068972  1862.9   <2e-16 ***
dest_hex763         13.9562761  0.0036371  3837.2   <2e-16 ***
dest_hex774         15.8137709  0.0028765  5497.6   <2e-16 ***
dest_hex777         12.4448725  0.0067389  1846.7   <2e-16 ***
dest_hex790         13.8338836  0.0044044  3140.9   <2e-16 ***
dest_hex798         12.6664993  0.0059075  2144.1   <2e-16 ***
dest_hex812         13.5476274  0.0041017  3302.9   <2e-16 ***
dest_hex813         13.2670216  0.0045271  2930.6   <2e-16 ***
dest_hex824         15.4422190  0.0030558  5053.4   <2e-16 ***
dest_hex826         11.4716377  0.0104402  1098.8   <2e-16 ***
dest_hex847         12.2705512  0.0066992  1831.6   <2e-16 ***
dest_hex859         12.7453554  0.0057393  2220.7   <2e-16 ***
dest_hex861         13.9967600  0.0035393  3954.6   <2e-16 ***
dest_hex863         12.6929857  0.0054658  2322.3   <2e-16 ***
dest_hex880         12.6240145  0.0056812  2222.1   <2e-16 ***
dest_hex903         12.7180230  0.0064867  1960.6   <2e-16 ***
dest_hex908         14.5232131  0.0032700  4441.3   <2e-16 ***
dest_hex910         13.8252253  0.0036233  3815.6   <2e-16 ***
dest_hex928         12.1768536  0.0066697  1825.7   <2e-16 ***
dest_hex940         15.5010622  0.0030384  5101.7   <2e-16 ***
dest_hex943         14.3078874  0.0031504  4541.6   <2e-16 ***
dest_hex946         12.8554629  0.0050960  2522.7   <2e-16 ***
dest_hex973         14.5121918  0.0036582  3967.0   <2e-16 ***
dest_hex976         12.4695948  0.0055963  2228.2   <2e-16 ***
dest_hex977         14.4662452  0.0029488  4905.8   <2e-16 ***
dest_hex982         13.6836623  0.0040228  3401.5   <2e-16 ***
dest_hex987         15.1219639  0.0031244  4840.0   <2e-16 ***
dest_hex988         15.7349248  0.0028844  5455.1   <2e-16 ***
dest_hex993         12.5998584  0.0051454  2448.7   <2e-16 ***
dest_hex1001        12.9910542  0.0055852  2326.0   <2e-16 ***
dest_hex1008        13.5744996  0.0037961  3575.9   <2e-16 ***
dest_hex1010        14.0257250  0.0031684  4426.7   <2e-16 ***
dest_hex1011        14.7561971  0.0028091  5252.9   <2e-16 ***
dest_hex1013        12.8069920  0.0051409  2491.2   <2e-16 ***
dest_hex1014        12.9442119  0.0049960  2590.9   <2e-16 ***
dest_hex1016        13.6612765  0.0040839  3345.2   <2e-16 ***
dest_hex1025        14.4393532  0.0029040  4972.1   <2e-16 ***
dest_hex1026        12.4686792  0.0052935  2355.5   <2e-16 ***
dest_hex1028        13.7102119  0.0036018  3806.4   <2e-16 ***
dest_hex1041        13.7093119  0.0035784  3831.1   <2e-16 ***
dest_hex1042        12.9698280  0.0043805  2960.8   <2e-16 ***
dest_hex1043        13.9016626  0.0032072  4334.6   <2e-16 ***
dest_hex1046        14.5321582  0.0030085  4830.4   <2e-16 ***
dest_hex1050        13.8563464  0.0039262  3529.2   <2e-16 ***
dest_hex1058        13.7879597  0.0034034  4051.2   <2e-16 ***
dest_hex1059        12.8675463  0.0044542  2888.9   <2e-16 ***
dest_hex1060        14.1146988  0.0030647  4605.5   <2e-16 ***
dest_hex1063        13.6069044  0.0038448  3539.0   <2e-16 ***
dest_hex1064        14.8123763  0.0029483  5024.1   <2e-16 ***
dest_hex1074        12.9609666  0.0046282  2800.4   <2e-16 ***
dest_hex1076        14.3763986  0.0028790  4993.6   <2e-16 ***
dest_hex1077        14.1746362  0.0030826  4598.3   <2e-16 ***
dest_hex1082        15.1299082  0.0028885  5238.1   <2e-16 ***
dest_hex1091        13.9289661  0.0033258  4188.1   <2e-16 ***
dest_hex1092        14.2358288  0.0030046  4738.1   <2e-16 ***
dest_hex1093        13.0249762  0.0042617  3056.3   <2e-16 ***
dest_hex1123        10.7748731  0.0140069   769.3   <2e-16 ***
dest_hex1125        13.2652079  0.0040434  3280.7   <2e-16 ***
dest_hex1126        13.4739819  0.0037091  3632.7   <2e-16 ***
dest_hex1127        13.7894760  0.0034640  3980.9   <2e-16 ***
dest_hex1130        13.0377358  0.0048269  2701.1   <2e-16 ***
dest_hex1140        12.1079109  0.0071186  1700.9   <2e-16 ***
dest_hex1142        11.9105646  0.0069623  1710.7   <2e-16 ***
dest_hex1143        12.0113085  0.0066645  1802.3   <2e-16 ***
dest_hex1161        13.5959125  0.0037477  3627.8   <2e-16 ***
dest_hex1162        13.5066578  0.0039349  3432.5   <2e-16 ***
dest_hex1175        13.0352841  0.0045372  2873.0   <2e-16 ***
dest_hex1176        13.4858368  0.0038033  3545.8   <2e-16 ***
dest_hex1177        12.6663907  0.0052030  2434.4   <2e-16 ***
dest_hex1179        15.2071895  0.0027177  5595.6   <2e-16 ***
dest_hex1199        13.3384557  0.0045878  2907.4   <2e-16 ***
dest_hex1200        12.2659210  0.0073341  1672.4   <2e-16 ***
dest_hex1212        12.8699221  0.0050419  2552.6   <2e-16 ***
dest_hex1216        12.6356194  0.0060237  2097.6   <2e-16 ***
dest_hex1225        12.9119357  0.0048622  2655.6   <2e-16 ***
dest_hex1226        14.1435771  0.0032347  4372.4   <2e-16 ***
dest_hex1227        12.4285556  0.0057675  2154.9   <2e-16 ***
dest_hex1233        12.2343326  0.0073944  1654.5   <2e-16 ***
dest_hex1244        13.1341644  0.0044682  2939.5   <2e-16 ***
dest_hex1246        14.2609782  0.0033749  4225.5   <2e-16 ***
dest_hex1258        13.2016405  0.0044793  2947.3   <2e-16 ***
dest_hex1259        14.6586848  0.0029335  4997.1   <2e-16 ***
dest_hex1260        13.4397417  0.0039421  3409.3   <2e-16 ***
dest_hex1265        12.6680307  0.0058791  2154.8   <2e-16 ***
dest_hex1266        12.6543472  0.0060178  2102.8   <2e-16 ***
dest_hex1280        14.8066930  0.0030490  4856.3   <2e-16 ***
dest_hex1281        14.4776187  0.0032280  4485.0   <2e-16 ***
dest_hex1282        15.1816805  0.0028317  5361.2   <2e-16 ***
dest_hex1293        13.0206136  0.0047228  2757.0   <2e-16 ***
dest_hex1314        13.0377732  0.0050679  2572.6   <2e-16 ***
dest_hex1315        12.6910291  0.0057915  2191.3   <2e-16 ***
dest_hex1316        15.1822355  0.0028684  5293.0   <2e-16 ***
dest_hex1317        13.0726839  0.0054136  2414.8   <2e-16 ***
dest_hex1325        14.0515889  0.0035259  3985.3   <2e-16 ***
dest_hex1331        12.5273997  0.0062725  1997.2   <2e-16 ***
dest_hex1332        12.6248250  0.0059806  2111.0   <2e-16 ***
dest_hex1333        12.8805144  0.0056160  2293.5   <2e-16 ***
dest_hex1343        13.5099219  0.0042028  3214.5   <2e-16 ***
dest_hex1348        13.0379204  0.0050163  2599.1   <2e-16 ***
dest_hex1349        12.8105077  0.0056926  2250.4   <2e-16 ***
dest_hex1350        11.6063300  0.0106563  1089.2   <2e-16 ***
dest_hex1365        12.9797287  0.0052155  2488.7   <2e-16 ***
dest_hex1375        13.8450899  0.0038186  3625.7   <2e-16 ***
dest_hex1381        13.1293352  0.0050023  2624.7   <2e-16 ***
dest_hex1382        13.2025133  0.0049516  2666.3   <2e-16 ***
dest_hex1398        13.1036741  0.0050645  2587.4   <2e-16 ***
dest_hex1409        13.4986480  0.0043595  3096.4   <2e-16 ***
dest_hex1414        11.9050818  0.0090139  1320.7   <2e-16 ***
dest_hex1474        14.9720173  0.0030837  4855.2   <2e-16 ***
dest_hex1475        13.7818416  0.0041211  3344.2   <2e-16 ***
dest_hex1509        13.8022672  0.0041246  3346.3   <2e-16 ***
dest_hex1526        15.5259412  0.0028624  5424.1   <2e-16 ***
dest_hex1540        14.1764830  0.0037990  3731.6   <2e-16 ***
dest_hex1544        15.1286571  0.0031506  4801.8   <2e-16 ***
dest_hex1575        14.4904840  0.0035450  4087.6   <2e-16 ***
dest_hex1576        14.6635515  0.0033931  4321.6   <2e-16 ***
dest_hex1607        13.1868627  0.0054264  2430.1   <2e-16 ***
dest_hex1624        13.0964724  0.0057861  2263.4   <2e-16 ***
dest_hex1741        14.1043886  0.0043533  3240.0   <2e-16 ***
log(origin_hdb)     -0.2553741  0.0002400 -1064.0   <2e-16 ***
log(origin_offices)  0.1603779  0.0001098  1460.7   <2e-16 ***
log(origin_schools)  0.0488396  0.0001824   267.8   <2e-16 ***
log(distance)       -0.7977006  0.0002747 -2904.2   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 297060506  on 21598  degrees of freedom
Residual deviance:  31833431  on 21442  degrees of freedom
AIC: 31981277

Number of Fisher Scoring iterations: 6

Model results reveal that:

Weekday evening peak period trips have a statistically significant relationship with all origin propulsiveness attributes.
As with the origin-constrained model, the strongest negative association is with distance (coefficient estimate: -0.7977006): as distance increases, the number of trips decreases.
The numbers of offices and schools at the origin are positively associated with the number of trips, while the number of HDB blocks is negatively associated (coefficient estimate: -0.2553741). This suggests that evening peak trips are more likely to start from work or school than from home.
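With one coefficient row per hexagon, the handful of attribute and distance terms discussed above are easy to lose at the bottom of the summary. A minimal sketch of one way to pull out just those rows is shown below. It uses simulated data (`toy`, `toy_fit` and the zone labels are made up for illustration) so that it runs on its own; the same filtering idea should carry over to a model such as `destSIM`.

```r
set.seed(1)

# Simulated stand-in for a flow table: 6 hypothetical destination zones
toy <- data.frame(
  dest_hex       = factor(rep(paste0("z", 1:6), each = 10)),
  origin_offices = runif(60, min = 1, max = 50),
  distance       = runif(60, min = 500, max = 15000)
)
toy$trips <- rpois(
  60,
  lambda = exp(6 + 0.2 * log(toy$origin_offices) - 0.5 * log(toy$distance))
)

# Destination-constrained Poisson model with the same structure as above
toy_fit <- glm(trips ~ dest_hex + log(origin_offices) + log(distance) - 1,
               family = poisson(link = "log"),
               data = toy)

# Keep only the rows that are not destination dummies
coef_table <- coef(summary(toy_fit))
is_dummy   <- grepl("^dest_hex", rownames(coef_table))
attr_table <- coef_table[!is_dummy, , drop = FALSE]
attr_table
```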

6.5 Doubly-constrained SIM

This is an extension of the basic spatial interaction model that introduces constraints on both the origins and destinations of flows.
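A useful way to see what "constrained on both sides" means: when origin and destination enter a Poisson log-link model as factors, the fitted flows add up to the observed totals leaving each origin and arriving at each destination. The sketch below checks this on simulated data. The five zones, the `toy` table and the exclusion of intra-zonal flows are assumptions made purely for illustration.

```r
set.seed(42)

# Toy flow table: every ordered pair of 5 hypothetical zones
zones <- c("A", "B", "C", "D", "E")
toy <- expand.grid(origin = zones, dest = zones, stringsAsFactors = TRUE)

# Drop intra-zonal pairs (an assumption of this toy example)
toy <- toy[as.character(toy$origin) != as.character(toy$dest), ]

# Simulated distances and trip counts with a distance-decay effect
toy$distance <- runif(nrow(toy), min = 1, max = 20)
toy$trips    <- rpois(nrow(toy), lambda = 500 * toy$distance^(-0.7))

# Doubly-constrained model: origin and destination factors plus distance
fit <- glm(trips ~ origin + dest + log(distance),
           family = poisson(link = "log"),
           data = toy)

# Observed versus fitted totals by origin and by destination
observed_out <- tapply(toy$trips, toy$origin, sum)
fitted_out   <- tapply(fitted(fit), toy$origin, sum)
observed_in  <- tapply(toy$trips, toy$dest, sum)
fitted_in    <- tapply(fitted(fit), toy$dest, sum)

all.equal(as.numeric(observed_out), as.numeric(fitted_out), tolerance = 1e-6)
all.equal(as.numeric(observed_in),  as.numeric(fitted_in),  tolerance = 1e-6)
```

The match follows from the Poisson score equations, so it holds up to the convergence tolerance of the fit; the only quantity the model is free to shape is how each origin's total is distributed across destinations, which is where distance comes in.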

Model Building
Model Results

dbcSIM <- glm(formula = trips ~ 
                origin_hex + 
                dest_hex +
                log(distance),
              family = poisson(link = "log"),
              data = flow_attr,
              na.action = na.exclude)

summary(dbcSIM)


Call:
glm(formula = trips ~ origin_hex + dest_hex + log(distance), 
    family = poisson(link = "log"), data = flow_attr, na.action = na.exclude)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-229.662   -14.650    -5.366     3.767   251.861  

Coefficients:
                 Estimate Std. Error   z value Pr(>|z|)    
(Intercept)    13.0913666  0.0084100  1556.649  < 2e-16 ***
origin_hex39    0.5642319  0.0063050    89.490  < 2e-16 ***
origin_hex88    0.5883606  0.0062276    94.477  < 2e-16 ***
origin_hex137   0.4265728  0.0063578    67.094  < 2e-16 ***
origin_hex237   1.4966848  0.0054608   274.077  < 2e-16 ***
origin_hex320   0.8955789  0.0056236   159.253  < 2e-16 ***
origin_hex353   1.6122621  0.0053759   299.905  < 2e-16 ***
origin_hex436   0.8466564  0.0056439   150.011  < 2e-16 ***
origin_hex502  -0.6638253  0.0070779   -93.788  < 2e-16 ***
origin_hex534   2.3397989  0.0052026   449.739  < 2e-16 ***
origin_hex555   1.0235457  0.0055290   185.123  < 2e-16 ***
origin_hex571  -1.7080631  0.0095763  -178.364  < 2e-16 ***
origin_hex573   0.6164374  0.0057810   106.632  < 2e-16 ***
origin_hex585   0.6626137  0.0056818   116.621  < 2e-16 ***
origin_hex586  -0.0836836  0.0062702   -13.346  < 2e-16 ***
origin_hex604  -1.5189815  0.0088099  -172.418  < 2e-16 ***
origin_hex637   0.8915003  0.0055524   160.562  < 2e-16 ***
origin_hex641   0.5590226  0.0058171    96.101  < 2e-16 ***
origin_hex648   1.5709876  0.0053595   293.121  < 2e-16 ***
origin_hex653  -1.4103372  0.0086543  -162.963  < 2e-16 ***
origin_hex654  -2.8259038  0.0161305  -175.190  < 2e-16 ***
origin_hex669  -1.0209615  0.0078576  -129.933  < 2e-16 ***
origin_hex670  -2.8243286  0.0170384  -165.763  < 2e-16 ***
origin_hex671  -1.2568951  0.0087240  -144.074  < 2e-16 ***
origin_hex687  -1.4344641  0.0093248  -153.834  < 2e-16 ***
origin_hex700   0.0277210  0.0061901     4.478 7.52e-06 ***
origin_hex703  -1.5648439  0.0103681  -150.929  < 2e-16 ***
origin_hex708   0.2597694  0.0059778    43.456  < 2e-16 ***
origin_hex714   0.1689250  0.0060608    27.872  < 2e-16 ***
origin_hex728   0.1244928  0.0062766    19.834  < 2e-16 ***
origin_hex729   1.4981632  0.0053707   278.950  < 2e-16 ***
origin_hex746   1.2546869  0.0054283   231.136  < 2e-16 ***
origin_hex749  -0.2469139  0.0064835   -38.083  < 2e-16 ***
origin_hex759  -0.2524856  0.0065647   -38.461  < 2e-16 ***
origin_hex763   1.3862467  0.0053860   257.381  < 2e-16 ***
origin_hex774   1.3036042  0.0054165   240.672  < 2e-16 ***
origin_hex777   0.1602019  0.0062305    25.713  < 2e-16 ***
origin_hex790  -0.8155777  0.0069925  -116.636  < 2e-16 ***
origin_hex798  -0.9000220  0.0075713  -118.873  < 2e-16 ***
origin_hex812   0.2723124  0.0059709    45.606  < 2e-16 ***
origin_hex813  -0.4262031  0.0067184   -63.438  < 2e-16 ***
origin_hex824   0.9072750  0.0055500   163.472  < 2e-16 ***
origin_hex826   0.6843073  0.0057745   118.505  < 2e-16 ***
origin_hex847  -0.9229907  0.0075496  -122.257  < 2e-16 ***
origin_hex859  -1.1363383  0.0084714  -134.139  < 2e-16 ***
origin_hex861   0.3355707  0.0059176    56.707  < 2e-16 ***
origin_hex863  -0.9800962  0.0076526  -128.074  < 2e-16 ***
origin_hex880  -0.5490957  0.0068478   -80.186  < 2e-16 ***
origin_hex903  -0.7460364  0.0071856  -103.824  < 2e-16 ***
origin_hex908   1.9631609  0.0052795   371.844  < 2e-16 ***
origin_hex910   0.5321360  0.0057435    92.650  < 2e-16 ***
origin_hex928  -0.3470483  0.0064857   -53.510  < 2e-16 ***
origin_hex940   0.8193390  0.0056165   145.880  < 2e-16 ***
origin_hex943   0.5719441  0.0057070   100.218  < 2e-16 ***
origin_hex946  -0.6999576  0.0070053   -99.918  < 2e-16 ***
origin_hex973  -0.1667584  0.0062805   -26.552  < 2e-16 ***
origin_hex976  -0.6935922  0.0069464   -99.849  < 2e-16 ***
origin_hex977   1.6842274  0.0052703   319.570  < 2e-16 ***
origin_hex982   0.0405842  0.0060513     6.707 1.99e-11 ***
origin_hex987   0.4957370  0.0057435    86.313  < 2e-16 ***
origin_hex988   1.1363384  0.0054672   207.848  < 2e-16 ***
origin_hex993  -0.6849547  0.0068307  -100.277  < 2e-16 ***
origin_hex1001 -1.3745982  0.0085069  -161.587  < 2e-16 ***
origin_hex1008  1.4318752  0.0053546   267.409  < 2e-16 ***
origin_hex1010  1.0425800  0.0054183   192.418  < 2e-16 ***
origin_hex1011  1.4633404  0.0053113   275.512  < 2e-16 ***
origin_hex1013 -0.8025222  0.0070084  -114.508  < 2e-16 ***
origin_hex1014 -0.4205600  0.0064713   -64.989  < 2e-16 ***
origin_hex1016 -1.0038818  0.0074729  -134.336  < 2e-16 ***
origin_hex1025  1.7044251  0.0052788   322.884  < 2e-16 ***
origin_hex1026 -0.6940548  0.0068054  -101.987  < 2e-16 ***
origin_hex1028  1.3375627  0.0053308   250.912  < 2e-16 ***
origin_hex1041  1.8424202  0.0052582   350.388  < 2e-16 ***
origin_hex1042  0.4347072  0.0057164    76.046  < 2e-16 ***
origin_hex1043  1.0057671  0.0054205   185.548  < 2e-16 ***
origin_hex1046  0.9196896  0.0054774   167.906  < 2e-16 ***
origin_hex1050  0.8862781  0.0055097   160.858  < 2e-16 ***
origin_hex1058  2.4572417  0.0051680   475.469  < 2e-16 ***
origin_hex1059  0.3020139  0.0057664    52.375  < 2e-16 ***
origin_hex1060  1.0098044  0.0054168   186.421  < 2e-16 ***
origin_hex1063 -0.2158337  0.0061786   -34.932  < 2e-16 ***
origin_hex1064  0.9464905  0.0054701   173.030  < 2e-16 ***
origin_hex1074  1.8367952  0.0052620   349.067  < 2e-16 ***
origin_hex1076  1.5244492  0.0052882   288.273  < 2e-16 ***
origin_hex1077  0.5679687  0.0056119   101.207  < 2e-16 ***
origin_hex1082  0.9969216  0.0054708   182.226  < 2e-16 ***
origin_hex1091  1.1357081  0.0054398   208.778  < 2e-16 ***
origin_hex1092  1.6033152  0.0052827   303.504  < 2e-16 ***
origin_hex1093 -0.6953480  0.0067361  -103.227  < 2e-16 ***
origin_hex1123 -1.8255990  0.0107667  -169.560  < 2e-16 ***
origin_hex1125  0.9184157  0.0054947   167.146  < 2e-16 ***
origin_hex1126  0.3231243  0.0057732    55.970  < 2e-16 ***
origin_hex1127  0.5812092  0.0056220   103.380  < 2e-16 ***
origin_hex1130 -0.7995859  0.0069269  -115.432  < 2e-16 ***
origin_hex1140 -1.4525696  0.0092008  -157.875  < 2e-16 ***
origin_hex1142 -0.6592344  0.0068384   -96.402  < 2e-16 ***
origin_hex1143 -0.5033756  0.0065381   -76.991  < 2e-16 ***
origin_hex1161  0.1061669  0.0059217    17.928  < 2e-16 ***
origin_hex1162 -0.2352824  0.0061615   -38.186  < 2e-16 ***
origin_hex1175 -0.4498091  0.0066292   -67.853  < 2e-16 ***
origin_hex1176 -0.4831760  0.0065754   -73.482  < 2e-16 ***
origin_hex1177 -1.0400228  0.0074128  -140.301  < 2e-16 ***
origin_hex1179  1.1650199  0.0053979   215.828  < 2e-16 ***
origin_hex1199 -1.0532806  0.0075592  -139.338  < 2e-16 ***
origin_hex1200 -2.2763550  0.0119127  -191.087  < 2e-16 ***
origin_hex1212 -1.1317545  0.0074663  -151.583  < 2e-16 ***
origin_hex1216 -1.8578205  0.0097887  -189.793  < 2e-16 ***
origin_hex1225 -0.6667848  0.0069371   -96.119  < 2e-16 ***
origin_hex1226  0.2826910  0.0058132    48.629  < 2e-16 ***
origin_hex1227 -1.0604812  0.0074432  -142.477  < 2e-16 ***
origin_hex1233 -2.3785807  0.0124226  -191.472  < 2e-16 ***
origin_hex1244  1.0370773  0.0054276   191.075  < 2e-16 ***
origin_hex1246  0.1337962  0.0059318    22.556  < 2e-16 ***
origin_hex1258 -0.5986330  0.0068908   -86.875  < 2e-16 ***
origin_hex1259  1.4619383  0.0053276   274.406  < 2e-16 ***
origin_hex1260  0.4523398  0.0057015    79.336  < 2e-16 ***
origin_hex1265 -1.7461350  0.0089076  -196.028  < 2e-16 ***
origin_hex1266 -1.6914073  0.0089663  -188.640  < 2e-16 ***
origin_hex1280  0.3532188  0.0057941    60.962  < 2e-16 ***
origin_hex1281 -0.3297471  0.0063424   -51.991  < 2e-16 ***
origin_hex1282  0.9085260  0.0055032   165.090  < 2e-16 ***
origin_hex1293 -0.0547005  0.0061082    -8.955  < 2e-16 ***
origin_hex1314 -1.5175325  0.0084759  -179.041  < 2e-16 ***
origin_hex1315 -1.7980359  0.0092985  -193.369  < 2e-16 ***
origin_hex1316  0.9000275  0.0055387   162.498  < 2e-16 ***
origin_hex1317 -1.3266205  0.0085158  -155.783  < 2e-16 ***
origin_hex1325 -0.0658630  0.0062194   -10.590  < 2e-16 ***
origin_hex1331 -2.3895038  0.0117770  -202.896  < 2e-16 ***
origin_hex1332 -1.9783692  0.0100251  -197.341  < 2e-16 ***
origin_hex1333 -1.4947833  0.0086411  -172.986  < 2e-16 ***
origin_hex1343  0.1557659  0.0059790    26.052  < 2e-16 ***
origin_hex1348 -1.7256753  0.0091934  -187.709  < 2e-16 ***
origin_hex1349 -1.3686042  0.0082142  -166.615  < 2e-16 ***
origin_hex1350 -2.1602979  0.0116514  -185.410  < 2e-16 ***
origin_hex1365 -1.6465430  0.0091071  -180.798  < 2e-16 ***
origin_hex1375 -0.4354431  0.0067172   -64.825  < 2e-16 ***
origin_hex1381 -1.3575669  0.0083662  -162.267  < 2e-16 ***
origin_hex1382 -1.0002776  0.0076822  -130.207  < 2e-16 ***
origin_hex1398 -1.4757313  0.0089006  -165.801  < 2e-16 ***
origin_hex1409 -0.8365764  0.0073641  -113.602  < 2e-16 ***
origin_hex1414 -2.3786877  0.0131264  -181.214  < 2e-16 ***
origin_hex1474  0.9410580  0.0055852   168.491  < 2e-16 ***
origin_hex1475 -1.1332293  0.0080630  -140.546  < 2e-16 ***
origin_hex1509 -0.7336266  0.0070488  -104.078  < 2e-16 ***
origin_hex1526  1.3999331  0.0053923   259.618  < 2e-16 ***
origin_hex1540  0.4401952  0.0059140    74.432  < 2e-16 ***
origin_hex1544  0.4986567  0.0058609    85.081  < 2e-16 ***
origin_hex1575  0.1588065  0.0060601    26.205  < 2e-16 ***
origin_hex1576 -0.1689128  0.0064144   -26.334  < 2e-16 ***
origin_hex1607  1.1175386  0.0055249   202.273  < 2e-16 ***
origin_hex1624 -0.3471916  0.0068161   -50.937  < 2e-16 ***
origin_hex1741  0.8389002  0.0058361   143.743  < 2e-16 ***
dest_hex39     -0.3468708  0.0094163   -36.837  < 2e-16 ***
dest_hex88     -0.8314518  0.0108803   -76.418  < 2e-16 ***
dest_hex137    -1.1277465  0.0119354   -94.488  < 2e-16 ***
dest_hex237     0.3806882  0.0076289    49.900  < 2e-16 ***
dest_hex320     1.6986850  0.0064337   264.029  < 2e-16 ***
dest_hex353     2.5175418  0.0062469   403.007  < 2e-16 ***
dest_hex436     2.0093474  0.0063334   317.261  < 2e-16 ***
dest_hex502     0.7952085  0.0068075   116.813  < 2e-16 ***
dest_hex534     2.3498157  0.0062668   374.962  < 2e-16 ***
dest_hex555     2.3108372  0.0062848   367.690  < 2e-16 ***
dest_hex571     0.0478647  0.0077514     6.175 6.62e-10 ***
dest_hex573     2.1108765  0.0063465   332.606  < 2e-16 ***
dest_hex585     1.7508771  0.0063780   274.518  < 2e-16 ***
dest_hex586     1.4680433  0.0064859   226.344  < 2e-16 ***
dest_hex604    -0.7700830  0.0095549   -80.596  < 2e-16 ***
dest_hex637     2.2374057  0.0062823   356.142  < 2e-16 ***
dest_hex641     1.5874386  0.0065582   242.052  < 2e-16 ***
dest_hex648     2.0107643  0.0062988   319.231  < 2e-16 ***
dest_hex653    -0.8462923  0.0098037   -86.323  < 2e-16 ***
dest_hex654    -1.2626743  0.0120701  -104.612  < 2e-16 ***
dest_hex669     0.4636665  0.0071735    64.636  < 2e-16 ***
dest_hex670    -1.1348935  0.0115195   -98.520  < 2e-16 ***
dest_hex671     0.4035530  0.0074593    54.101  < 2e-16 ***
dest_hex687     0.2752735  0.0075636    36.394  < 2e-16 ***
dest_hex700     0.8945322  0.0067335   132.847  < 2e-16 ***
dest_hex703    -0.9211151  0.0106357   -86.606  < 2e-16 ***
dest_hex708     1.5699519  0.0065490   239.724  < 2e-16 ***
dest_hex714     0.0975154  0.0072498    13.451  < 2e-16 ***
dest_hex728    -0.2765618  0.0078345   -35.301  < 2e-16 ***
dest_hex729    -0.1501514  0.0075723   -19.829  < 2e-16 ***
dest_hex746    -0.8121345  0.0085278   -95.234  < 2e-16 ***
dest_hex749    -0.1088000  0.0076646   -14.195  < 2e-16 ***
dest_hex759    -0.2793742  0.0088105   -31.709  < 2e-16 ***
dest_hex763     0.7849966  0.0066716   117.663  < 2e-16 ***
dest_hex774     2.7833739  0.0062217   447.366  < 2e-16 ***
dest_hex777    -0.7695847  0.0087386   -88.067  < 2e-16 ***
dest_hex790     0.6455073  0.0070633    91.389  < 2e-16 ***
dest_hex798    -0.4504138  0.0081222   -55.455  < 2e-16 ***
dest_hex812     0.3502090  0.0069326    50.517  < 2e-16 ***
dest_hex813     0.0895837  0.0071927    12.455  < 2e-16 ***
dest_hex824     2.3522382  0.0062958   373.620  < 2e-16 ***
dest_hex826    -1.7474047  0.0118347  -147.651  < 2e-16 ***
dest_hex847    -0.8252480  0.0087299   -94.531  < 2e-16 ***
dest_hex859    -0.5275015  0.0080075   -65.876  < 2e-16 ***
dest_hex861     0.8097408  0.0066175   122.364  < 2e-16 ***
dest_hex863    -0.4116977  0.0078284   -52.590  < 2e-16 ***
dest_hex880    -0.4769136  0.0079756   -59.796  < 2e-16 ***
dest_hex903    -0.3652634  0.0085176   -42.883  < 2e-16 ***
dest_hex908     1.3496211  0.0064557   209.059  < 2e-16 ***
dest_hex910     0.6136111  0.0066772    91.896  < 2e-16 ***
dest_hex928    -1.0389544  0.0087153  -119.211  < 2e-16 ***
dest_hex940     2.4532057  0.0062839   390.394  < 2e-16 ***
dest_hex943     1.0559990  0.0064433   163.891  < 2e-16 ***
dest_hex946    -0.2838397  0.0075695   -37.498  < 2e-16 ***
dest_hex973     1.4158009  0.0066138   214.066  < 2e-16 ***
dest_hex976    -0.8268750  0.0079487  -104.026  < 2e-16 ***
dest_hex977     1.2761932  0.0063654   200.488  < 2e-16 ***
dest_hex982     0.6297387  0.0068664    91.713  < 2e-16 ***
dest_hex987     2.0655655  0.0063398   325.809  < 2e-16 ***
dest_hex988     2.7393000  0.0062240   440.118  < 2e-16 ***
dest_hex993    -0.7007766  0.0076485   -91.623  < 2e-16 ***
dest_hex1001   -0.0972404  0.0078674   -12.360  < 2e-16 ***
dest_hex1008    0.2100492  0.0068002    30.889  < 2e-16 ***
dest_hex1010    0.7715518  0.0064879   118.921  < 2e-16 ***
dest_hex1011    1.5708517  0.0062960   249.501  < 2e-16 ***
dest_hex1013   -0.3098188  0.0076039   -40.745  < 2e-16 ***
dest_hex1014   -0.1270855  0.0074963   -16.953  < 2e-16 ***
dest_hex1016    0.6074577  0.0069025    88.005  < 2e-16 ***
dest_hex1025    1.0713796  0.0063712   168.160  < 2e-16 ***
dest_hex1026   -0.8452333  0.0077602  -108.918  < 2e-16 ***
dest_hex1028    0.5430258  0.0066821    81.266  < 2e-16 ***
dest_hex1041    0.2968982  0.0066961    44.339  < 2e-16 ***
dest_hex1042   -0.4252940  0.0071703   -59.313  < 2e-16 ***
dest_hex1043    0.6534024  0.0065170   100.261  < 2e-16 ***
dest_hex1046    1.4921105  0.0063647   234.437  < 2e-16 ***
dest_hex1050    0.8545220  0.0067996   125.673  < 2e-16 ***
dest_hex1058    0.4789165  0.0066145    72.404  < 2e-16 ***
dest_hex1059   -0.4388871  0.0072187   -60.798  < 2e-16 ***
dest_hex1060    0.9091748  0.0064409   141.156  < 2e-16 ***
dest_hex1063    0.5297147  0.0067955    77.951  < 2e-16 ***
dest_hex1064    1.8080197  0.0063165   286.240  < 2e-16 ***
dest_hex1074   -0.4007732  0.0073085   -54.836  < 2e-16 ***
dest_hex1076    1.1639901  0.0063608   182.994  < 2e-16 ***
dest_hex1077    1.0079190  0.0064353   156.624  < 2e-16 ***
dest_hex1082    2.1441201  0.0062704   341.945  < 2e-16 ***
dest_hex1091    0.5566110  0.0065641    84.796  < 2e-16 ***
dest_hex1092    1.0024906  0.0064127   156.328  < 2e-16 ***
dest_hex1093   -0.1861516  0.0070891   -26.259  < 2e-16 ***
dest_hex1123   -2.5642008  0.0150885  -169.944  < 2e-16 ***
dest_hex1125    0.0628611  0.0069481     9.047  < 2e-16 ***
dest_hex1126    0.3569138  0.0067615    52.786  < 2e-16 ***
dest_hex1127    0.7236400  0.0066133   109.422  < 2e-16 ***
dest_hex1130   -0.0146367  0.0073837    -1.982 0.047446 *  
dest_hex1140   -1.1569765  0.0090647  -127.635  < 2e-16 ***
dest_hex1142   -1.2491134  0.0089665  -139.309  < 2e-16 ***
dest_hex1143   -1.0884172  0.0087340  -124.618  < 2e-16 ***
dest_hex1161    0.5773382  0.0067553    85.465  < 2e-16 ***
dest_hex1162    0.4690120  0.0068526    68.443  < 2e-16 ***
dest_hex1175   -0.0734752  0.0072275   -10.166  < 2e-16 ***
dest_hex1176    0.4381898  0.0067970    64.468  < 2e-16 ***
dest_hex1177   -0.3853478  0.0076661   -50.267  < 2e-16 ***
dest_hex1179    2.2630778  0.0062306   363.221  < 2e-16 ***
dest_hex1199    0.4025798  0.0072150    55.798  < 2e-16 ***
dest_hex1200   -0.6762771  0.0092053   -73.466  < 2e-16 ***
dest_hex1212   -0.1643054  0.0075384   -21.796  < 2e-16 ***
dest_hex1216   -0.2727566  0.0082131   -33.210  < 2e-16 ***
dest_hex1225   -0.1422108  0.0074263   -19.150  < 2e-16 ***
dest_hex1226    1.0987822  0.0064872   169.377  < 2e-16 ***
dest_hex1227   -0.6016470  0.0080614   -74.633  < 2e-16 ***
dest_hex1233   -0.6807825  0.0092629   -73.495  < 2e-16 ***
dest_hex1244    0.1669689  0.0071766    23.266  < 2e-16 ***
dest_hex1246    1.2791763  0.0065173   196.275  < 2e-16 ***
dest_hex1258    0.1541270  0.0071715    21.492  < 2e-16 ***
dest_hex1259    1.7005042  0.0063362   268.381  < 2e-16 ***
dest_hex1260    0.4606862  0.0068740    67.019  < 2e-16 ***
dest_hex1265   -0.2885851  0.0081072   -35.596  < 2e-16 ***
dest_hex1266   -0.2813900  0.0082108   -34.271  < 2e-16 ***
dest_hex1280    1.8756925  0.0063478   295.487  < 2e-16 ***
dest_hex1281    1.5692311  0.0064477   243.380  < 2e-16 ***
dest_hex1282    2.3574545  0.0062648   376.304  < 2e-16 ***
dest_hex1293    0.0279906  0.0073340     3.817 0.000135 ***
dest_hex1314    0.1224465  0.0075386    16.243  < 2e-16 ***
dest_hex1315   -0.2097570  0.0080537   -26.045  < 2e-16 ***
dest_hex1316    2.3820569  0.0062761   379.545  < 2e-16 ***
dest_hex1317    0.1707555  0.0077556    22.017  < 2e-16 ***
dest_hex1325    1.0387308  0.0066054   157.255  < 2e-16 ***
dest_hex1331   -0.3679158  0.0083980   -43.810  < 2e-16 ***
dest_hex1332   -0.2591731  0.0081912   -31.641  < 2e-16 ***
dest_hex1333   -0.0107700  0.0079167    -1.360 0.173700    
dest_hex1343    0.5209893  0.0069866    74.570  < 2e-16 ***
dest_hex1348    0.2024922  0.0075152    26.944  < 2e-16 ***
dest_hex1349   -0.0792529  0.0079755    -9.937  < 2e-16 ***
dest_hex1350   -1.3052145  0.0120194  -108.593  < 2e-16 ***
dest_hex1365    0.1470032  0.0076507    19.214  < 2e-16 ***
dest_hex1375    0.8582576  0.0067580   127.000  < 2e-16 ***
dest_hex1381    0.2779144  0.0074974    37.068  < 2e-16 ***
dest_hex1382    0.3647618  0.0074577    48.911  < 2e-16 ***
dest_hex1398    0.2858935  0.0075386    37.924  < 2e-16 ***
dest_hex1409    0.5060090  0.0070709    71.562  < 2e-16 ***
dest_hex1414   -1.0058322  0.0105913   -94.968  < 2e-16 ***
dest_hex1474    1.9899961  0.0063475   313.511  < 2e-16 ***
dest_hex1475    0.7436149  0.0069189   107.476  < 2e-16 ***
dest_hex1509    0.7091102  0.0069273   102.365  < 2e-16 ***
dest_hex1526    2.5601999  0.0062439   410.032  < 2e-16 ***
dest_hex1540    1.1513733  0.0067128   171.520  < 2e-16 ***
dest_hex1544    2.0955559  0.0063570   329.643  < 2e-16 ***
dest_hex1575    1.3892479  0.0065812   211.094  < 2e-16 ***
dest_hex1576    1.6025497  0.0064956   246.714  < 2e-16 ***
dest_hex1607    0.1885555  0.0077543    24.316  < 2e-16 ***
dest_hex1624   -0.0443418  0.0080126    -5.534 3.13e-08 ***
dest_hex1741    1.1029381  0.0069943   157.692  < 2e-16 ***
log(distance)  -0.8677553  0.0002854 -3040.280  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 57961929  on 21597  degrees of freedom
Residual deviance: 14941183  on 21294  degrees of freedom
AIC: 15089325

Number of Fisher Scoring iterations: 6

Model results reveal that:

The difference between the null deviance (57,961,929) and the residual deviance (14,941,183) is substantial, suggesting that the model's predictor variable (distance) provides valuable information in explaining the variability in the number of weekday evening peak hour train trips.
This is consistent with the origin-constrained and destination-constrained model results, where higher numbers of trips are associated with shorter distances.
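
The two observations above can be quantified directly from the printed summary. The following is a minimal sketch that uses only figures copied from that output; the variable names are illustrative and not part of the original workflow. The share of deviance explained is a different quantity from the \(R^2\) values computed in the next section, so the two should not be expected to agree.

```r
# Figures copied from the model summary printed above
null_deviance     <- 57961929
residual_deviance <- 14941183
null_df           <- 21597
residual_df       <- 21294
beta_log_distance <- -0.8677553   # coefficient of log(distance)

# Share of the null deviance removed by the fitted model (roughly 0.74)
deviance_explained <- 1 - residual_deviance / null_deviance

# Number of estimated coefficients besides the intercept
n_coefficients <- null_df - residual_df

# With a log link and log(distance) as predictor, multiplying distance by k
# multiplies the expected trips by k^beta. For a doubling of distance this
# gives a multiplier of roughly 0.55, holding the constraints fixed.
doubling_multiplier <- 2^beta_log_distance

round(c(deviance_explained  = deviance_explained,
        doubling_multiplier = doubling_multiplier), 3)
n_coefficients
```

Read this way, a doubling of distance is associated with a fall of about 45% in expected trips for the same origin and destination.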

6.6 Model Diagnostics - R-squared

The \(R^2\) value is used as a goodness-of-fit measure to evaluate how well each model explains the variation in the number of O-D trips. Here it is computed as the squared correlation between observed and fitted trips.

Define function for R2 calculation
Calculate R2 for each model

# Function to calculate R2 value

calc_r2 <- function(observed, estimated){
  r <- cor(observed, estimated)
  R2 <- r^2
  R2
}

calc_r2(originSIM$data$trips, originSIM$fitted.values)

[1] 0.2746338

calc_r2(destSIM$data$trips, destSIM$fitted.values)

[1] 0.2508897

calc_r2(dbcSIM$data$trips, dbcSIM$fitted.values)

[1] 0.615503

The \(R^2\) values for each model are summarized below:

SIM	\(R^2\)
Origin-constrained	0.275
Destination-constrained	0.251
Doubly-constrained	0.615

We see a marked improvement in \(R^2\) for the doubly-constrained SIM compared to the singly-constrained models: it accounts for ~62% of the variation in the number of trips, against ~27% and ~25% for the origin-constrained and destination-constrained models respectively.
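
The three separate calls to calc_r2() can also be made in a single pass over a named list of models. The sketch below is self-contained and uses simulated data: the toy data frame, its columns and the two toy models are hypothetical stand-ins, not the SIMs fitted in this study.

```r
set.seed(42)

# Same definition as above: squared Pearson correlation
calc_r2 <- function(observed, estimated) {
  cor(observed, estimated)^2
}

# Hypothetical toy flows: trips decay with distance and grow with "mass"
toy <- data.frame(
  distance = runif(200, min = 500, max = 20000),
  mass     = runif(200, min = 1, max = 10)
)
toy$trips <- rpois(
  200,
  lambda = exp(8 - 0.8 * log(toy$distance) + 0.1 * toy$mass)
)

# Two toy Poisson models standing in for the fitted SIMs
toy_models <- list(
  distance_only = glm(trips ~ log(distance),
                      family = poisson(link = "log"), data = toy),
  distance_mass = glm(trips ~ log(distance) + mass,
                      family = poisson(link = "log"), data = toy)
)

# One R2 value per model, named after the list entries
r2_values <- sapply(
  toy_models,
  function(m) calc_r2(m$data$trips, m$fitted.values)
)
round(r2_values, 3)
```

The same sapply() call should work on a list holding originSIM, destSIM and dbcSIM, since it relies only on the data and fitted.values components already used above.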

6.7 Model Diagnostics - RMSE

Root Mean Squared Error (RMSE) measures how spread out the residuals (prediction errors) are; in general, it tells us how concentrated the data is around the line of best fit. A better-fitting model thus has a lower RMSE score.
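
As a small worked example of the definition, RMSE is the square root of the mean squared difference between observed and estimated values. The helper name rmse() and the four toy values below are made up for illustration.

```r
# Hypothetical helper: RMSE from observed and estimated values
rmse <- function(observed, estimated) {
  sqrt(mean((observed - estimated)^2))
}

# Toy values: errors are 20, -5, 20 and -20
observed  <- c(120, 45, 300, 10)
estimated <- c(100, 50, 280, 30)

# Squared errors sum to 1225, their mean is 306.25, and the root is 17.5
rmse(observed, estimated)
```

The result is in the same units as the response, so for the SIMs it can be read as a typical prediction error in number of trips.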

The following steps are taken to compute the RMSE for all models for comparison:

Save all models into a list
Compute RMSE

all_models <- list(
  origin_constrained = originSIM,
  destination_constrained = destSIM,
  doubly_constrained = dbcSIM)

The compare_performance() function from the performance package computes the RMSE score for all models:

compare_performance(all_models,
                    metrics = "RMSE")

# Comparison of Model Performance Indices

Name                    | Model |     RMSE
------------------------------------------
origin_constrained      |   glm | 2238.430
destination_constrained |   glm | 2266.699
doubly_constrained      |   glm | 1624.140

The model comparison reveals that the doubly constrained model has the lowest RMSE score, and is the best fitting model out of all 3 SIMs. This result is consistent with the earlier \(R^2\) comparison, thus strongly positioning the doubly-constrained model as the best fit model.

6.8 Model Diagnostics - Fitted vs Observed values

Plotting the model’s fitted versus observed values could provide insights into the spread and linearity of the model; in general, a well-fitted model would exhibit a tight and linear relationship between the fitted and observed values. A scattered or non-linear pattern, on the other hand, may indicate that the model does not capture the underlying structure of the data.

Save fitted values as variables
Append to flow dataframe

originSIM_fitted <- as.data.frame(originSIM$fitted.values) %>%
  round(digits = 0)

destSIM_fitted <- as.data.frame(destSIM$fitted.values) %>%
  round(digits = 0)

dbcSIM_fitted <- as.data.frame(dbcSIM$fitted.values) %>%
  round(digits = 0)

flow_attr <- flow_attr %>%
  cbind(
    originSIM_fitted,
    destSIM_fitted,
    dbcSIM_fitted
  ) %>%
  rename(
    orc_trips = originSIM.fitted.values,
    destc_trips = destSIM.fitted.values,
    dbc_trips = dbcSIM.fitted.values
  )

code block

p_orc <- ggplot(
          data = flow_attr,
          aes(x = orc_trips,
              y = trips)
  ) +
  geom_point(
    size = flow_attr$trips/10000,
    alpha = .6
  ) +
  xlim(0, 50000) +
  geom_smooth(
    method = lm,
    se = TRUE
  ) +
  labs(title = "Origin-constrained") +
  theme(
    plot.title = element_text(size = 10),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank()
  )

p_destc <- ggplot(
            data = flow_attr,
            aes(x = destc_trips,
                y = trips)
  ) +
  geom_point(
    size = flow_attr$trips/10000,
    color = "#4d5887",
    alpha = .6
  ) +
  xlim(0, 50000) +
  geom_smooth(
    method = lm,
    se = TRUE
  ) +
  labs(title = "Destination-constrained") +
  theme(
    plot.title = element_text(size = 10),
    axis.text.y = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.y = element_blank()
  )

p_dbc <- ggplot(
          data = flow_attr,
          aes(x = dbc_trips,
              y = trips)
  ) +
  geom_point(
    size = flow_attr$trips/10000,
    color = "#6D435A",
    alpha = .6
  ) +
  xlim(0, 50000) +
  geom_smooth(
    method = lm,
    se = TRUE
  ) +
  labs(title = "Doubly-constrained") +
  theme(
    plot.title = element_text(size = 10),
    axis.text.y = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.y = element_blank()
  )

p_orc + p_destc + p_dbc

The scatterplots with original values above show a stronger linear trend between fitted and observed values in the doubly-constrained model, compared to the wider scatter of points in the origin-constrained and destination-constrained models. However, because the distribution of trips is heavily skewed, plots on the original scale may lead to disproportionate or misleading conclusions. To stabilize the variance across the range of values, we look at log-transformed fitted versus observed trip values instead.
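
One point to watch when taking logs: the fitted values were rounded to whole numbers earlier, so a small fitted flow can become 0, and log(0) is -Inf, which a linear smoother cannot use. The sketch below shows one way to handle this by keeping only strictly positive pairs before transforming. The toy data frame and its column names are hypothetical, and whether any zeros occur in flow_attr is not confirmed by the output shown here.

```r
# Hypothetical toy data: the first fitted value was rounded down to 0
toy <- data.frame(
  trips        = c(5, 120, 3400, 1),
  fitted_trips = c(0, 98, 2950, 2)
)

# The zero becomes -Inf on the log scale
log(toy$fitted_trips)

# Keep only rows where both values are strictly positive, then transform
plot_data <- subset(toy, trips > 0 & fitted_trips > 0)
plot_data$log_trips  <- log(plot_data$trips)
plot_data$log_fitted <- log(plot_data$fitted_trips)

nrow(plot_data)   # 3 of the 4 rows remain
```

Filtering is only one option; skipping the rounding step before plotting would also avoid the zeros.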

code block

log_orc <- ggplot(
          data = flow_attr,
          aes(x = log(orc_trips),
              y = log(trips))
  ) +
  geom_point(
    size = flow_attr$trips/10000,
    alpha = .6
  ) +
  geom_smooth(
    method = lm,
    se = TRUE
  ) +
  labs(title = "Log(Origin-constrained)") +
  theme(
    plot.title = element_text(size = 10),
    axis.text.x = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.x = element_blank()
  )

log_destc <- ggplot(
            data = flow_attr,
            aes(x = log(destc_trips),
                y = log(trips))
  ) +
  geom_point(
    size = flow_attr$trips/10000,
    color = "#4d5887",
    alpha = .6
  ) +
  geom_smooth(
    method = lm,
    se = TRUE
  ) +
  labs(title = "Log(Destination-constrained)") +
  theme(
    plot.title = element_text(size = 10),
    axis.text.y = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.y = element_blank()
  )

log_dbc <- ggplot(
          data = flow_attr,
          aes(x = log(dbc_trips),
              y = log(trips))
  ) +
  geom_point(
    size = flow_attr$trips/10000,
    color = "#6D435A",
    alpha = .6
  ) +
  geom_smooth(
    method = lm,
    se = TRUE
  ) +
  labs(title = "Log(Doubly-constrained)") +
  theme(
    plot.title = element_text(size = 10),
    axis.text.y = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks.x = element_blank(),
    axis.title.y = element_blank()
  )

log_orc + log_destc + log_dbc

The log-transformed scatterplots are easier to interpret. The doubly-constrained model still shows the closest linear relationship between fitted and observed values, while the points in the origin-constrained and destination-constrained models taper away from the linear trend towards the top of the range.
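
The six plot definitions in this section differ only in the fitted column, the title, the colour and whether a log transform is applied, so they could be produced by one small helper. This is a sketch under assumptions: plot_fit() and the toy data frame are hypothetical, and the helper leaves out the point-size scaling and axis-text suppression used in the plots above.

```r
library(ggplot2)

# Hypothetical helper: scatterplot of observed vs fitted trips for one model.
# `trans` is applied to both axes (identity by default, log for the log plots).
plot_fit <- function(data, fitted_col, title,
                     colour = "black", trans = identity) {
  plot_df <- data.frame(
    fitted   = trans(data[[fitted_col]]),
    observed = trans(data[["trips"]])
  )
  ggplot(plot_df, aes(x = fitted, y = observed)) +
    geom_point(colour = colour, alpha = 0.6) +
    geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
    labs(title = title) +
    theme(plot.title = element_text(size = 10))
}

# Toy stand-in for flow_attr with two of the fitted columns
toy_flows <- data.frame(
  trips     = c(120, 45, 300, 10, 800),
  orc_trips = c(100, 50, 280, 30, 650),
  dbc_trips = c(115, 48, 310, 12, 780)
)

p_raw <- plot_fit(toy_flows, "orc_trips", "Origin-constrained")
p_log <- plot_fit(toy_flows, "dbc_trips", "Log(Doubly-constrained)",
                  colour = "#6D435A", trans = log)
```

With patchwork loaded, the resulting plots can be combined in the same way as above, for example p_raw + p_log.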

7 Key Takeaways from Study

From the goodness-of-fit and linearity checks conducted, the doubly-constrained SIM consistently outperformed the origin-constrained and destination-constrained SIMs. This suggests that:

The simplicity of the model is effective in capturing underlying patterns in the data. Despite having fewer explanatory variables than the other two models, the doubly-constrained SIM achieved better performance, indicating that the additional complexity of the other models does not necessarily improve their explanatory power.
Distance between TAZs is thus the key explanatory variable influencing the number of trips during weekday evening peak periods, with shorter distances tending to produce more trips. Even in the origin-constrained and destination-constrained models, distance had the strongest negative association (a coefficient of around -0.79). Where distance enters the model as log(distance), as in the summary output above, this coefficient is best read as an elasticity: a 1% increase in distance is associated with a decrease of roughly 0.79% in the expected number of trips. When we compare the top trips by number of trips with the top trips by distance below, we see that the most frequent trips by passenger volume are ~10 times shorter in distance than the least popular trips:

Most popular trips recorded an average distance of ~3500m

code block

od_data_dist %>%
  st_drop_geometry() %>%
  arrange(desc(weekday_pm_trips)) %>%
  mutate(
    distance = round(distance,0)
  ) %>%
  slice_head(n = 10) %>%
  select(
    origin_hex,
    dest_hex,
    weekday_pm_trips,
    distance
  ) %>%
  datatable()

Least popular trips recorded an average distance of ~37000m

code block

od_data_dist %>%
  st_drop_geometry() %>%
  arrange(desc(distance)) %>%
  mutate(
    distance = round(distance,0)
  ) %>%
  slice_head(n = 15) %>%
  select(
    origin_hex,
    dest_hex,
    weekday_pm_trips,
    distance
  ) %>%
  datatable()

However, this does not indicate that the origin-constrained and destination-constrained SIMs are invalid, or that other variables have no explanatory value. The models need further calibration, for example by including data from a longer time range (over 6 months).
The explanatory variables used for the SIMs relate to the quantity of specific types of facilities within each TAZ, but do not account for the quality of these features. More qualitative data, such as the types of shops in retail areas or the popularity of F&B outlets, could be considered as additional factors.
Spatial Interaction Models assume independence among observations in each TAZ, and do not consider the relative attractiveness or propulsiveness of neighbouring areas. Further spatial econometric modifications could be made to the SIMs by adding weighted metrics to understand the relative influence of the neighbourhood on each TAZ.