Author: Adepeju, M.
Big Data Centre, Manchester Metropolitan University, Manchester, M15 6BH, UK
Date: 2024-07-24
Abstract
In light of the increasingly limited access to comprehensive spatially and temporally logged point data, the stppSim package presents an alternative data solution that holds substantial promise across a range of research and practical applications. The package lets users specify the attributes of a collection of ‘agents’ (representing entities such as objects or individuals), whose activities within spatial (landscape) and temporal contexts generate new point patterns and interactions within the surroundings. The resulting points and patterns can then be measured, analysed, and processed to support the assessment and evaluation of spatial and/or temporal models.
In numerous research scenarios, the availability of detailed
spatiotemporal (ST) point data is often greatly limited due to privacy
considerations. To tackle this issue, the R package stppSim
has been created. It enables
users to replicate real-world data situations, thus offering an
alternative reservoir of spatiotemporal point patterns. The suggested
methodology employs microsimulation and agent-based methodologies to
generate a collection of ‘walkers’ (which can represent agents, objects,
individuals, etc.). These walkers possess defined movement
characteristics and engage with the surrounding environment.
The package includes two main functions: (i) psim_artif and (ii)
psim_real, both of which play a central role in simulating defined
spatiotemporal interactions within point data. The function psim_artif
generates these interactions based on
user-provided parameters, effectively executing the simulation process
without relying on any existing point data. In contrast, the function
psim_real generates point interactions using the provided actual sample
dataset. This latter function proves particularly valuable in situations
where genuine point data is scarce or inadequate for practical
applications.
The following section describes three essential components of the simulation: the agents, the spatial factors, and the temporal aspects:
The agents (walkers)
The following properties define the agents:
Movement - Agents or walkers can navigate in diverse directions and can detect obstacles or restrictions along their trajectories. These movements are primarily governed by an inherent transition matrix (TM), which establishes two primary operational states: the exploratory state (where a walker is engaged in environmental exploration) and the performative state (where a walker is executing an action). The probabilistic nature of this TM introduces diversity in behavioral patterns among the walkers. To switch from one state to the other, a categorical distribution is assigned to a latent state variable \(z_{t}\), such that each step (in time) may result in the next state, independent of the previous state: \[z_t \sim \text{Categorical}(\Psi_{1}, \Psi_{2})\] such that \(\Psi_{i} = \Pr(z_t = i)\), where \(\Psi_{i}\) is the fixed probability of being in state \(i\) at time \(t\), and \(\sum_{i=1}^{2}\Psi_{i}=1\).
Spatial perception [s_threshold] - The perception range of a walker at a specified location is determined by the parameter s_threshold. As the walker changes its position, this parameter is updated. A common technique for setting this parameter is to plot the data and then select an estimate that aligns with prior assumptions about the parameter. For many use cases, this strategy is quite effective. For psim_artif, users need to specify a value. However, for psim_real, the best-suited s_threshold value can be derived from the available sample dataset.
Steps [step_length] - The step_length is the furthest distance a walker travels from one location point to another, which essentially characterizes the walker’s speed across an area. It is vital to set the step_length judiciously, especially when the walker’s movements are confined to tight pathways such as a route network. In that case, the chosen value should be less than the pathway’s breadth.
Proportional ratios [p_ratio] - This refers to the density of events produced by the walkers in a given space. Specifically, it represents the fraction of total events stemming from a select group of the most active starting points. Take, for instance, a 20:80 ratio: this suggests that 20% of starting points (or walkers) are responsible for generating 80% of all point events. This implies that starting points have varying intensity values, which can be used to predict the eventual spatial distribution of these events, termed the spatial model.
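The agent properties above can be illustrated with a minimal base-R sketch. It is not the package's internal implementation: the state probabilities in `psi`, the uniform draw of headings and distances, the rule that events are recorded only in the performative state, and the way `p_ratio` is turned into origin probabilities are all assumptions made for illustration. The origin coordinates are borrowed from the `mfocal` example used later in this guide.

```r
set.seed(42)  # reproducible illustration

# Assumed state probabilities: Psi_1 (exploratory), Psi_2 (performative)
psi <- c(exploratory = 0.7, performative = 0.3)
n_steps <- 100
step_length <- 20  # maximum distance per step, in map units

# Draw the latent state z_t independently at every time step
z <- sample(names(psi), size = n_steps, replace = TRUE, prob = psi)

# Random headings, and step distances bounded above by step_length
heading <- runif(n_steps, min = 0, max = 2 * pi)
step_dist <- runif(n_steps, min = 0, max = step_length)

# Accumulate the walker's path from an illustrative origin
origin <- c(x = 530000, y = 182250)
path <- data.frame(
  x = origin[["x"]] + cumsum(step_dist * cos(heading)),
  y = origin[["y"]] + cumsum(step_dist * sin(heading)),
  state = z
)

# Assumption: an event is recorded only when the walker is performative
events <- path[path$state == "performative", c("x", "y")]

# p_ratio sketch: the top 20% of origins share 80% of the event probability
n_origin <- 50
p_ratio <- 20
n_top <- round(n_origin * p_ratio / 100)
origin_prob <- c(
  rep((100 - p_ratio) / 100 / n_top, n_top),
  rep(p_ratio / 100 / (n_origin - n_top), n_origin - n_top)
)
```

Because the states are drawn independently at each step, the share of performative steps should settle near the assumed value of `psi[["performative"]]` as `n_steps` grows.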
The following are the key properties of a landscape:
Spatial bandwidth [s_band] - The spatial bandwidth is used to identify event re-occurrences that take place between two specific spatial thresholds. For instance, setting a spatial bandwidth of 200m to 400m means the user aims to detect repeated events happening within this distance range. When paired with the temporal bandwidth (discussed further below), this defines a complete spatiotemporal bandwidth. Please note: this applies solely to point pattern simulations created from scratch using the psim_artif function. For simulations based on actual sample datasets, spatial bandwidths are identified automatically.
Origins [coords] - Walkers
originate from specific starting points, referred to as origins. These
origins can be randomly scattered throughout an area or may follow
particular spatial patterns. Each origin is characterized by its xy
coordinates. For instance, in the context of criminology, an offender
might be represented as a walker, with their home serving as the
origin.
There are two primary patterns in which origins can be concentrated: nucleated and dispersed (Hornby and Jones, 1991). In a nucleated concentration, all origins cluster around a single central point. A dispersed concentration, by contrast, features multiple focal points, with origins possibly spread randomly throughout the area (see Figure 1).
Boundary [poly] - A landscape has defined boundaries, either represented by a polygon shapefile (known as poly) or determined by the spatial extent of the sample point data.
Restrictions [restriction_feat] - Features that act as barriers consist of two main components:
Regions outside of the defined boundary (poly), which have the maximum restriction value of 1. This means that walkers are prohibited from moving beyond this boundary.
Features inside the boundary that hinder movement. These can be specific types of land use or physical landforms, like fenced-off areas or hills.
To produce a restriction map, one typically follows a two-step process. For instance, when using a boundary shapefile of the Camden area in London (UK), a restriction map can be constructed in the following manner:
Step 1: Generate the boundary restriction
#load shapefile data
load(file = system.file("extdata", "camden.rda", package="stppSim"))
#extract boundary shapefile
boundary = camden$boundary # get boundary
#compute the restriction map
restrct_map <- space_restriction(shp = boundary,res = 20, binary = TRUE)
#plot the restriction map
plot(restrct_map)
Step 2: Set the restrct_map above as the basemap, and then stack the land use features to define the restrictions within the area:
# get landuse data
landuse = camden$landuse
#compute the restriction map
full_restrct_map <- space_restriction(shp = landuse,
baseMap = restrct_map, res = 20, field = "restrVal", background = 1)
#plot the restriction map
plot(full_restrct_map)
Figure 2 provides a graphical representation of both the boundary extent and the restrictions posed by the within-features. These within-features are categorized into three separate classes, each having a unique restriction value as enumerated below:
0.5
0.7
0.9
These values indicate the relative restriction each land use type imposes on movement.
Within the simulation function, the boundary and the within-features are supplied through the poly and restriction_feat parameters, respectively. Both are provided in the .shp (shapefile) format.
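One way to read the restriction values is as the chance that an attempted move is blocked. The base-R sketch below works under that assumption only; the acceptance rule, the helper `accept_move`, and the class labels are hypothetical and are not taken from the package. Only the values 0.5, 0.7, 0.9, and the maximum of 1 come from the text above.

```r
set.seed(1)  # reproducible illustration

# Hypothetical rule: a move into a cell with restriction value r
# is accepted with probability 1 - r
accept_move <- function(r) {
  stopifnot(all(r >= 0 & r <= 1))
  runif(length(r)) >= r
}

# Illustrative labels; only the numeric values appear in the text
restr_vals <- c(open = 0, class_a = 0.5, class_b = 0.7,
                class_c = 0.9, outside = 1)

# Estimate the acceptance rate for each restriction value
n_trials <- 10000
accept_rate <- sapply(restr_vals, function(r) {
  mean(accept_move(rep(r, n_trials)))
})
round(accept_rate, 2)
```

Under this reading, a value of 1 blocks every move, which matches the statement that walkers cannot cross the boundary, while intermediate values only reduce how often walkers enter a feature.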
Focal points [n_foci] - Locations, or origins, that hold greater significance often present more opportunities for event occurrences. This parameter is specified when using psim_artif. Users generally determine the number of focal points they wish to simulate. In terms of urban landscape structure, a focal point can equate to a city/town centre. Additionally, if there is a principal focal point within a city, it can be specified using the mfocal parameter. By default, the value of mfocal is set to NULL.
There is also a foci separation parameter (foci_separation) that lets users define how close or far apart these focal points are from each other. This parameter accepts values ranging from 1 to 100. A value of 1 signifies the closest proximity, whereas 100 indicates the farthest distance between focal points.
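The difference between nucleated and dispersed origin patterns can be sketched in base R as follows. The helper `make_origins`, the Gaussian scatter, the `spread` value, and the coordinate ranges are assumptions chosen for illustration; the package may place origins differently.

```r
set.seed(7)  # reproducible illustration

# Hypothetical helper: scatter n_origin points around one or more foci
make_origins <- function(n_origin, foci, spread) {
  # Assign each origin to a focal point at random
  idx <- sample(nrow(foci), n_origin, replace = TRUE)
  data.frame(
    x = foci$x[idx] + rnorm(n_origin, sd = spread),
    y = foci$y[idx] + rnorm(n_origin, sd = spread),
    focus = idx
  )
}

# Nucleated: every origin clusters around a single central point
centre <- data.frame(x = 530000, y = 182250)
nucleated <- make_origins(50, centre, spread = 300)

# Dispersed: origins are shared among several focal points (here 5)
foci <- data.frame(
  x = runif(5, min = 526000, max = 531000),
  y = runif(5, min = 181000, max = 186000)
)
dispersed <- make_origins(50, foci, spread = 300)
```

Plotting `nucleated` and `dispersed` side by side, for example with `plot(dispersed$x, dispersed$y)`, shows one cluster in the first case and several in the second.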
The following parameters define the temporal dimension:
Temporal bandwidth [t_band] - The temporal bandwidth is used to identify event re-occurrences that take place between two specific temporal thresholds. For instance, setting a temporal bandwidth of 2 days to 4 days means the user aims to detect repeated events happening within this time range. When paired with the spatial bandwidth (discussed above), this defines a complete spatiotemporal bandwidth. As with the spatial bandwidth, this applies solely to point pattern simulations created from scratch using the psim_artif function. For simulations based on actual sample datasets, temporal bandwidths are identified automatically.
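To make the idea of a spatiotemporal bandwidth concrete, the base-R sketch below counts pairs of events that are 200m to 400m apart in space and 2 to 4 days apart in time. The random point data and the inclusive treatment of both thresholds are assumptions for illustration; this is not how the package detects or generates interactions internally.

```r
set.seed(3)  # reproducible illustration

# Illustrative events: coordinates in metres, time in days
pts <- data.frame(
  x = runif(200, min = 0, max = 2000),
  y = runif(200, min = 0, max = 2000),
  day = sample(1:60, 200, replace = TRUE)
)

s_band <- c(200, 400)  # spatial bandwidth (metres)
t_band <- c(2, 4)      # temporal bandwidth (days)

# Pairwise spatial and temporal separations
d_space <- as.matrix(dist(pts[, c("x", "y")]))
d_time <- as.matrix(dist(pts$day))

# Pairs falling inside both bandwidths (thresholds assumed inclusive)
in_band <- d_space >= s_band[1] & d_space <= s_band[2] &
  d_time >= t_band[1] & d_time <= t_band[2]

# Count each unordered pair once
n_pairs <- sum(in_band[upper.tri(in_band)])
n_pairs
```

Comparing `n_pairs` against the count obtained after shuffling the `day` column gives a rough sense of whether a data set contains more close-in-space, close-in-time pairs than chance alone would produce.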
Long-term trend [trend] - This parameter establishes the overarching trend of the time series to be simulated. The trend can be categorized as stable, rising, or falling.
Stable: Indicates that the time series remains relatively constant over time, with no significant upward or downward trend.
Rising: Suggests an upward trend in the time series. When this is selected, the supplementary slope argument can be used to further define the incline of the trend as either gentle (a moderate increase) or steep (a rapid increase).
Falling: Denotes a downward trend in the time series. As with the rising trend, the slope argument can be used to distinguish between a gentle decline and a steep drop.
This parameter is pertinent only when simulating a time series from scratch, without any pre-existing data.
Seasonal peak [fpeak] - This parameter sets the first temporal peak of a sinusoidal pattern in a time series, thereby controlling the medium-term undulations throughout the series’ duration. For instance, a first peak set at 90 days denotes a seasonal cycle spanning 180 days in the time series. This approach is primarily employed when the simulation’s objective is not to produce spatiotemporal interactions but to capture more general cyclic patterns within the data.
Figure 3 depicts the anticipated seasonal patterns determined by various fpeak values. Beginning at 90 days, each subsequent pattern sees the fpeak value increased by one month. As the fpeak date is pushed forward, the number of full seasonal cycles decreases.
The integration of the long-term trend with the seasonal peak shapes the temporal model for
the simulation. Before launching the actual simulation, it is advisable
to either preview or review this model to ensure accuracy and alignment
with objectives.
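The combination of a trend and a seasonal peak can be sketched in base R as below. The function `temporal_sketch`, the cosine form, the baseline of 3, and the gradients attached to "gentle" and "steep" are hypothetical choices for illustration; they are not the package's internal temporal model. The cosine is arranged so that its first maximum falls on day `fpeak` and its period is twice `fpeak`, matching the 90-day peak and 180-day cycle described above.

```r
# Hypothetical temporal model: linear trend plus a cosine seasonal term
temporal_sketch <- function(n_days = 365, fpeak = 90,
                            trend = c("stable", "rising", "falling"),
                            slope = c("gentle", "steep")) {
  trend <- match.arg(trend)
  slope <- match.arg(slope)
  t <- seq_len(n_days)

  # Assumed gradients for the two slope settings
  grad <- switch(slope, gentle = 0.5, steep = 1.5)
  direction <- switch(trend, stable = 0, rising = 1, falling = -1)
  trend_part <- direction * grad * (t / n_days)

  # First maximum on day `fpeak`; full cycle length of 2 * fpeak days
  seasonal_part <- cos(pi * (t - fpeak) / fpeak)

  data.frame(day = t, intensity = 3 + trend_part + seasonal_part)
}

tm <- temporal_sketch(fpeak = 90, trend = "rising", slope = "gentle")
head(tm)
```

Calling `plot(tm$day, tm$intensity, type = "l")` gives a quick preview of the curve, similar in spirit to reviewing the temporal model before running a full simulation.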
Installing stppSim
From the R console, type:
#To install from `CRAN`
install.packages("stppSim")
#To install the development version, type:
remotes::install_github("MAnalytics/stppSim")
#Note: `remotes` is an extra package that needs to be installed before running this code.
Now, to load the package,
library(stppSim)
The interactive argument
Both the psim_artif and psim_real functions include the interactive argument, which is set to FALSE by default. When the interactive argument is set to TRUE, the console displays queries during the function’s execution, prompting the user to decide whether they wish to view the spatial and temporal models of the simulation.
The spatial model displays the origins’ locations and their strength distribution across the simulated space. This strength distribution gives an insight into how the simulated points (events) are likely to be distributed.
The temporal model, on the other hand, offers a smoothed visual representation of the expected trend and seasonal pattern.
Thus, by using the interactive option, users are given
the advantage of reviewing both spatial and temporal patterns, ensuring
that they align with their expectations and objectives before moving
forward with the complete simulation.
Three essential arguments are necessary for the simulation:
n_events - This refers to the number of points to simulate. Instead of providing just a single value, it is recommended to input a vector of values, for instance n_events = c(200, 500, 1000, 2000). The output is presented as a list, with each value corresponding to a separate data frame. Notably, the length of n_events has minimal to no impact on processing duration.
start_date - This designates the commencement date of the time series.
poly - This represents the polygon shapefile that demarcates the boundary of the study area. The simulated point patterns are restricted to occur within this designated boundary.
By providing these arguments, users can customize the scope and specifics of their simulation to meet their research objectives.
To generate a spatiotemporal point pattern (stpp) using a boundary shapefile for the Camden Borough of London, which is embedded in the package, use the following code:
#load the data
load(file = system.file("extdata", "camden.rda",
package="stppSim"))
boundary <- camden$boundary # get boundary data
#specifying data sizes
pt_sizes = c(200, 1000, 2000)
#simulate data
artif_stpp <- psim_artif(n_events=pt_sizes, start_date = "2021-01-01",
poly=boundary, n_origin=50, restriction_feat = NULL,
field = NA,
n_foci=5, foci_separation = 10, mfocal = NULL,
conc_type = "dispersed",
p_ratio = 20, s_threshold = 50, step_length = 20,
trend = "stable", fpeak=NULL,
slope = NULL,show.plot=FALSE, show.data=FALSE)
The processing time on an Intel Core i7-7500CPU @ 2.70GHz, 16.0GB RAM PC is 12.5 minutes. The processing time increases to 45.2 minutes if landscape restriction is added.
Specifically, this increase occurs when the argument restriction_feat = camden$landuse is used, accompanied by field = "restrVal".
To retrieve the result for any n_events value, simply index the output object by position. For example, to retrieve the result based on n_events = 1000, type:
stpp_1000 <- artif_stpp[[2]]
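Positional indexing is easy to get wrong when the vector of sizes changes. The base-R sketch below shows one way to look a result up by its size instead. It uses a mock list in place of real psim_artif output, so the `id` column and the `n_` names are placeholders and say nothing about the structure of the data frames the package returns.

```r
pt_sizes <- c(200, 1000, 2000)

# Mock stand-in for the simulation output: one data frame per size
mock_stpp <- lapply(pt_sizes, function(n) data.frame(id = seq_len(n)))

# Look up the list position that corresponds to a requested size
stpp_1000 <- mock_stpp[[match(1000, pt_sizes)]]
nrow(stpp_1000)

# Alternatively, name the list elements once and index by name
names(mock_stpp) <- paste0("n_", pt_sizes)
stpp_2000 <- mock_stpp[["n_2000"]]
```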
The configuration and clustering of events in the spatial domain can be fine-tuned by adjusting the parameters that determine the spatial components (such as restriction_feat, n_origin, mfocal, foci_separation, n_foci, s_band, and so forth) as well as those that guide walker behaviors (for example, step_length, s_threshold, and p_ratio). To introduce a focal point in the simulation (see the mfocal argument in the package manual), use the make_grids function. This function produces an interactive map that displays, and permits the extraction of, the xy coordinates of any location on the map. With OpenStreetMap integrated, the interactive map helps users pinpoint specific locations more conveniently.
Figure 4 showcases the spatial point patterns (spp) for n_events = 1000 under diverse parameter settings. Note: the spatial configuration may differ with each code execution due to inherent random aspects within the function.
Figure 4a displays the outcome when relying solely on default arguments, as demonstrated in the previous code.
Figure 4b presents the pattern resulting from the addition of two parameters: restriction_feat = camden$landuse and mfocal = c(530000, 182250). Here, the first parameter restricts the number of events created within the land use (restriction) features, while the second emphasizes a central spatial concentration of origins, highlighted by a red dot on the map.
Figure 4c depicts the configuration when the restriction_feat and mfocal parameters are retained (as in 4b), but with foci_separation = 50 added. This ensures a moderate spatial distance between individual focal points.
Lastly, Figure 4d illustrates the spatial pattern when, besides maintaining the mfocal setting (as in the figures above), s_threshold and step_length are set to 250 and 50, respectively. This configuration aims to promote a broader distribution of points relative to their origins.