Spatial data analaysis Training

Yebelay Berehan

Debre Markos University + C4ED



2024-07-27

Outlines

  • What is spatial data and why should we use it?

  • Types of spatial data

  • Spatial data in R: how to store, load, and tidy spatial data

  • Visualizing spatial data

  • Exploring spatial data: Spatial authorization and spatial statistics to quantify relationships

What is Spatial data and why should we use it?

  • Spatial data, also known as geospatial data, refers to information that identifies the geographic location and characteristics of natural or constructed features and boundaries on the Earth.

  • This data is often represented in terms of Cartesian coordinates (x,y) for two-dimensional maps, but may also include altitude (z) for a three-dimensional representation.

  • Spatial data can come in various forms including points, lines, and polygons.

  • The concept of spatial data is integral to a variety of applications that require an understanding of how different elements relate to each other within a geographical space.

  • This data can be collected through various means, including but not limited to, satellite imagery, aerial photography, and ground-based surveys.

Spatial Data Formats

  • Let us see a basic way to represent the spatial data.

  • But there is a variety of data formats to represent the data to suit different applications.

  • In most cases, spatial data formats are an extension of existing data formats.

    Type Non-Spatial Data Spatial Data
    Text csv, json, xml csv, geojson, gml, kml
    Binary/Compressed pdf, xls, zip shapefile, geopdf, geopackage
    Images tiff, jpg, png geotiff, jpeg2000
    Databases SQLite, PostgreSQL, Oracle Spatialite, PostGIS, Oracle Spatial

Spatial Data Types

Spatial Data can be broadly categorized into 2 types - Vector and Raster.

Type Sub Types Examples
Vector Data Points: Represents features with a single coordinate pair (x, y). Locations of ATMs, tree
Lines: Represents linear features as ordered sequences of points. Roads, rivers, utility lines
Polygons: Represents areas enclosed by closed loops of lines. Buildings, lakes, zones
Raster Data Grids: Represents continuous data across a surface. Satellite images, digital elevation models (DEMs)
Pixels: Smallest units in a raster dataset, each with a specific value. Values representing color, temperature,

Other data associated with Vector and Raster

  • Attribute data: Additional information describing the characteristics of spatial features.
  • For example in Vector data:
    • Schools: have school name, number of students, education level.
    • Roads:have road type, traffic volume, maintenance status.
    • Regions: have population density, average income, and land use type.

In Raster Data: Each pixel can have multiple attributes, such as vegetation type or pollution levels.

What are the components of spatial data

  • Geometry: Refers to the coordinates that define the shape of an object.
    • Coordinates: These can be in different coordinate systems, such as geographic (latitude) and longitude) or projected systems.
    • Types of Geometries: Common types include points, lines, and polygons.
    • Dimension: Geometry can also have dimensional attributes like 2D (x, y), 3D (x, y, z), or even 4D (x, y, z, time).
  • Topology: Refers to the spatial relationships between geometric entities.
    • Spatial Relationships
      • Adjacency: How entities are next to each other.
      • Connectivity: How entities are connected.
      • Containment: How one entity is contained within another.
    • Rules and Constraints: Enforces rules such as:
      • No overlapping polygons.
      • Lines meeting at nodes.
      • Ensuring polygon boundaries are closed loops.

Attribute Data: provide detailed information about spatial features.

  • Example: A point representing a city may have attributes like population, elevation, and city name.

  • Types of Attributes: Attribute data is either,

    • Numeric (population size),
    • Categorical (land use type), or
    • Temporal (date of a satellite image).
  • Attribute data is often stored in tables and linked to geometry through unique identifiers.

How does the workflow of handling spatial data effectively

Data Collection: Data can be collected in either

  • Satellite Imagery: Captured by remote sensing satellites, these images can provide data on land cover, vegetation, weather patterns, and more.

  • GPS Surveys: Use Global Positioning System technology to collect precise location data for mapping and navigation purposes.

  • Traditional Surveying Methods: Involve measuring angles, distances, and elevations to map out areas accurately, often using tools like theodolites and total stations.

Data Storage:

  • Shapefiles: A popular vector data format that stores geometric locations and associated attribute data.
    • It consists of at least three files with extensions .shp (geometry), .shx (shape index), and .dbf (attribute data).
  • GeoJSON: A format for encoding a variety of geographic data structures using JavaScript Object Notation (JSON).
  • GeoTIFF: A raster format that includes spatial metadata embedded within the file, making it suitable for georeferenced raster data like satellite images.
  • Spatial Databases: PostgreSQL with PostGIS extensions enables the storage, querying, and manipulation of spatial data within a relational database system, supporting complex spatial operations and large datasets.

Practice for Spatial Data in R

  • Spatial Data: Information about the location and shape of geographic features.
  • Types: Vector Data and Raster Data.

Shapefile

  • Shapefile: A common format for representing vector data.
    • .shp: Contains geometry data.
    • .shx: Positional index of the geometry data.
    • .dbf: Stores attributes for each shape.
  • Additional Files:
    • .prj: Describes the projection (plain text).
    • .sbn and .sbx: Spatial indices of the geometry data.
    • .shp.xml: Contains spatial metadata in XML format.

The sf package for spatial vector data

  • The sf package stores geometric features (termed simple features, hence sf) in a data frame.

  • Geographic data is stored in the special geometry column.

  • The sf package is a modern alternative to the traditional sp, rgeos, and rgdal packages.

  • Supports a wide range of geometry types: points, lines, polygons, and their ‘multi’ versions.

  • Useful for handling geographic vector data.

Understanding Shapefiles

  • Shapefiles are a simple non-topological format to store geometric location and attribute information for geographical features.

  • They are commonly used in spatial analyses as a type of map.

  • We can read a shapefile or a sf object with the st_read() function of sf.

  • For example, here we read the Eth_Region_2013.shp shapefile of sf which contains the regions of Ethiopia, with their name, and Region and country code.

Reading a Shapefile with sf

Code
library(sf)
ethR_shape <- st_read("Ethiopia_All/Eth_Region_2013.shp", 
                      quiet = TRUE)
class(ethR_shape)
[1] "sf"         "data.frame"
Code
head(ethR_shape, 2)
Simple feature collection with 2 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 460361.8 ymin: 976360 xmax: 864787.5 ymax: 1602973
Projected CRS: Adindan / UTM zone 37N
   REGIONNAME REG_P_CODE REG_Pcode      HRname HRpcode HRparent
1 Addis Ababa         14      ET14 Addis Ababa    ET14       ET
2        Afar          2      ET02        Afar    ET02       ET
                        geometry
1 MULTIPOLYGON (((475625.4 10...
2 MULTIPOLYGON (((626409.3 16...
  • The sf object ethR_shape is a data.frame containing a collection with

    • [11 simple features (rows)] {style=“color:#ae01c7;”} and 6 attributes (columns) [plus a list-column with the geometry of each feature] {style=“color:red;”}.
  • A sf object contains the following objects of class sf, sfc and sfg:

    • sf (simple feature): each row of the data.frame is a single simple feature consisting of attributes and geometry.
    • sfc (simple feature geometry list-column): the geometry column of the data.frame is a list-column of class sfc with the geometry of each simple feature.
    • sfg (simple feature geometry): each of the rows of the sfc list-column corresponds to the simple feature geometry (sfg) of a single simple feature.
  • The sf package stores geometric features in a data frame.

  • Geographic data is stored in the special geometry column.

  • Using dim() and names() functions we can see how many rows and columns it contains, as well as the names of the columns.
Code
dim(ethR_shape)
[1] 11  7
Code
names(ethR_shape)
[1] "REGIONNAME" "REG_P_CODE" "REG_Pcode"  "HRname"     "HRpcode"   
[6] "HRparent"   "geometry"  
  • Each row represents a geographic element, in this case, one of the 11 regions of Ethiopia.
  • Columns contain information about the region, including the name and administrative code.
  • The last column is the geometry column.

Displaying the First Few Geometries

Code
head(ethR_shape$geometry)
Geometry set for 6 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -162404.5 ymin: 695805 xmax: 866537.7 ymax: 1602973
Projected CRS: Adindan / UTM zone 37N
First 5 geometries:

Each region is captured as a multipolygon geometry.

Plotting the Shapefile

Code
eth_reg <- ethR_shape %>%
  dplyr::select(region_name = REGIONNAME)
plot(eth_reg, main = "Region name")
  • The default behavior of plot with an sf object is to create a map for each column of data that describes the geography.

Plotting with ggplot2

Code
library(ggplot2)
region_map <- ggplot(ethR_shape) +geom_sf(aes(fill=REGIONNAME))+
  scale_fill_discrete("Region") + coord_sf(datum = NA) +
  theme_bw() + labs(x = NULL, y = NULL) + 
  ggtitle("Map of regions in Ethiopia")
print(region_map)
  • ggplot2 is flexible once you learn the syntax.
  • The plot shows each region filled with different colors.

Adding Labels to the Map

  • Adding labels to each region ensures they do not overlap.
Code
map_label <- ggplot() + geom_sf(data = ethR_shape) +
  ggrepel::geom_label_repel(data = ethR_shape,
    aes(label = REGIONNAME, geometry = geometry),
    stat = "sf_coordinates",min.segment.length = 0) + 
  labs(x = NULL, y = NULL)+ coord_sf(datum = NA) +  theme_bw()
print(map_label)
  • The st_geometry() function can be used to retrieve the simple feature geometry list-column (sfc).
Code
# Geometries printed in abbreviated form 
st_geometry(ethR_shape) # View complete geometry 
Geometry set for 11 features 
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -162404.5 ymin: 375657.9 xmax: 1491092 ymax: 1641360
Projected CRS: Adindan / UTM zone 37N
First 5 geometries:
Code
st_geometry(ethR_shape)[[1]]

Creating a sf object

  • We can use the st_sf() function to create a sf object by providing two elements, namely, a data.frame with the attributes of each feature, and a simple feature geometry list-column sfc containing simple feature geometries sfg.

  • In more detail, we create simple feature geometries sfg and use the st_sfc() function to create a simple feature geometry list-column sfc with them.

  • Then, we use st_sf() to put the data.frame with the attributes and the simple feature geometry list-column sfc together.

Simple feature geometries sfg objects can be, for example, of type POINT (single point), MULTIPOINT (set of points) or POLYGON (polygon), and can be created with st_point(), st_multipoint() and st_polygon(), respectively.

  • Here, we create a sf object containing two single points, a set of points, and a polygon, with one attribute.

  • First, we create the simple feature geometry objects (sfg) of type POINT, MULTIPOINT, and POLYGON.

  • Then, we use st_sfc() to create a simple feature geometry list-column sfc with the sfg objects.

  • Finally, we use st_sf() to put the data.frame with the attribute and the simple feature geometry list-column sfc together.

Code
# Single point (point as a vector) 
p1_sfg <- st_point(c(2, 2)) 
p2_sfg <- st_point(c(2.5, 3))  # Set of points 
p <- rbind(c(6, 2), c(6.1, 2.6), c(6.8, 2.5), 
           c(6.2, 1.5), c(6.8, 1.8)) 
mp_sfg <- st_multipoint(p)  
Code
#|ccolumn: true
p1 <- rbind(c(10, 0), c(11, 0), c(13, 2), c(12, 4), 
            c(11, 4), c(10, 0)) 
p2 <- rbind(c(11, 1), c(11, 2), c(12, 2), c(11, 1)) 
pol_sfg <- st_polygon(list(p1, p2))  # Create sf object 
p_sfc <- st_sfc(p1_sfg, p2_sfg, mp_sfg, pol_sfg) 
df <- data.frame(v1 = c("A", "B", "C", "D"))

# Plot single points, set of points and polygon
p_sf <- st_sf(df, geometry = p_sfc)  
library(ggplot2) 
ggplot(p_sf) + geom_sf(aes(col = v1), size = 3) + 
  theme_bw()

st_*() functions

Common functions to manipulate sf objects include the following:

  • We can delete some of the polygons by taking a subset of the rows of map.
Code
# Delete polygon 
map <- ethR_shape %>%
  st_filter(ethR_shape[ethR_shape$REG_P_CODE %in% c("3", "4"), ])

ggplot(map) + geom_sf(aes(fill = HRname))  
# Combine geometries 
ggplot(st_union(map, by_feature=FALSE) %>% st_sf())+geom_sf() 
# Simplify
ggplot(st_simplify(map, dTolerance =10000))+geom_sf()
Code

Code

sf object obtained by deleting some of its polygons (top), combining polygons (middle), and simplifying polygons (bottom).

Transforming Point Data to an sf Object

  • The st_as_sf() function allows us to convert a foreign object to an sf object.

  • This can be specifying in the argument coords the name of the columns that contain the point coordinates.

  • Example: Converting a Data Frame to an sf Object

  • Here, we use st_as_sf() to turn a data frame containing coordinates long and lat and two variables place and value into an sf object.

  • Then, we use st_crs() to set the coordinate reference system given by the EPSG code 4326 to represent longitude and latitude coordinates.

Code
library(sf)
library(mapview)
d <- data.frame(
  place = c("Ethiopia", "Kenya", "Somalia", "Sudan"),
  long = c(39.9559, 37.9062, 45.0792, 30.2176),
  lat = c(9.145, -1.286389, 2.046934, 19.6133),
  value = c(200, 150, 100, 300))
class(d)
[1] "data.frame"
Code
dsf <- st_as_sf(d, coords = c("long", "lat"))
st_crs(dsf) <- 4326
class(dsf)
[1] "sf"         "data.frame"
Code
mapview(dsf)

Counting the Number of Points within Polygons

Using st_intersects()

  • We can use the st_intersects() function of sf to count the number of points within the polygons of an sf object.
  • The returned object is a list with feature ids intersected in each of the polygons.
  • We can use the lengths() function to calculate the number of points inside each feature.

Example: Counting Points within Polygons

In this example, we create a map with divisions (an sf object) and generate random points over the map.

We then count the number of points within each polygon using st_intersects() and visualize the results with ggplot2.

Code
library(sf)
library(ggplot2)
# Points over map (simple feature geometry list-column sfc)
points <- st_sample(map, size = 100)
# Map of points within polygons
ggplot() + geom_sf(data = map) + geom_sf(data = points)

Code
# Intersection (first argument map, then points)
inter <- st_intersects(map, points)
# Add point count to each polygon
map$count <- lengths(inter)
# Map of number of points within polygons
ggplot(map) + geom_sf(aes(fill = count))

Identifying Polygons Containing Points

Using st_intersects()

  • Given an sf object with points and an sf object with polygons, we can use the st_intersects() function to obtain the polygon each of the points belongs to.

Example: Identifying Polygons Containing Points

  • In this example, we create a map with divisions (an sf object) and generate three random points over the map.

  • We then identify which polygons contain these points and add the polygon names to the points data.

Code
library(sf)
library(ggplot2)

# Points over map (sf object)
points <- st_sample(map, size = 3) %>% st_as_sf()
inter <- st_intersects(points, map)# first points, then map
points$areaname <- map[unlist(inter), "HRname", drop = TRUE] # drop geometry
points
Simple feature collection with 3 features and 1 field
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 28324.07 ymin: 663273 xmax: 851951.7 ymax: 1141708
Projected CRS: Adindan / UTM zone 37N
                         x         areaname
1  POINT (851951.7 663273)           Somali
2 POINT (28324.07 1141708) Beneshangul Gumu
3  POINT (662342.5 945727)           Oromia
Code
# Map
ggplot(map) + geom_sf() + geom_sf(data = points) + 
 geom_sf_label(data = map[unlist(inter), ], aes(label = HRname), nudge_y = 0.2)

Joining Map and Data

Using left_join()

  • Sometimes, a map and its corresponding data are available separately and we may wish to create an sf object representing the map with the added data that we can manipulate and plot.

  • We can create an sf map with the data attributes by joining the map and the data with the left_join() function of the dplyr package.

Example: Adding Air Pollution Data to a World Map

First, we use the ne_countries() function of rnaturalearth to download the world map with the country polygons of class sf.

Code
library(rnaturalearth)
map <- ne_countries(returnclass = "sf")
  • Then, we use the wbstats package to download a data frame of air pollution data from the World Bank.
    • Specifically, we search the pollution indicators with wb_search(), and use wb_data() to download PM2.5 in the year 2016 by specifying the indicator corresponding to PM2.5, and the start and end dates.
Code
library(wbstats)
indicators <- wb_search(pattern = "pollution")
#d <- wb_data(indicator = "EN.ATM.PM25.MC.M3", start_date = 2016, end_date = 2016)
  • Next, we use the left_join() function of dplyr to join the map and the data, specifying the argument by with the variables we wish to join by. Here, we use the ISO3 standard code of the countries rather than the country names, since names can be written differently in the map and the data frame.
Code
library(dplyr)
library(ggplot2)
library(viridis)

map1 <- left_join(map, d, by = c("iso_a3" = "iso3c"))
ggplot(map1) + geom_sf(aes(fill = EN.ATM.PM25.MC.M3)) +
  scale_fill_viridis() + labs(fill = "PM2.5") + theme_bw()

Note

When we use left_join(), the class of the resulting object is the same as the class of the first argument.

Code
map1 <- left_join(map, d, by = c("iso_a3" = "iso3c"))
class(map1)
# [1] "sf" "data.frame"

d1 <- left_join(d, map, by = c("iso3c" = "iso_a3"))
class(d1)
# [1] "data.frame"

Raster Data

  • Raster Data: A spatial data structure that divides the study region into equal-sized rectangles called cells or pixels, storing one or more values for each cell.
  • Uses: Represent spatially continuous phenomena such as elevation, temperature, or air pollution values.
  • Main Packages:
    • terra: Primary package for working with raster data. Also supports vector data.
    • raster: Previously used for raster data; terra is faster and has more functionality.
    • stars: Used for analyzing raster data and spatial data cubes (arrays with spatial dimensions).

GeoTIFF

  • GeoTIFF: A common format for raster data, with the extension .tif.
  • Example: Reading the elev.tif file from the terra package, representing elevation in Luxembourg.
Code
# Load the terra package 
library(terra)
pathraster <- system.file("ex/elev.tif", package = "terra")
r <- terra::rast(pathraster)
print(r)
class       : SpatRaster 
dimensions  : 90, 95, 1  (nrow, ncol, nlyr)
resolution  : 0.008333333, 0.008333333  (x, y)
extent      : 5.741667, 6.533333, 49.44167, 50.19167  (xmin, xmax, ymin, ymax)
coord. ref. : lon/lat WGS 84 (EPSG:4326) 
source      : elev.tif 
name        : elevation 
min value   :       141 
max value   :       547 
Code
# Plot the raster data
plot(r)

EPSG Codes and CRS Transformation

EPSG Codes

  • Most common CRSs can be specified by providing their EPSG (European Petroleum Survey Group) codes or their Proj4 strings.

Details of a given projection can be inspected using the st_crs() function of the sf package.

  • Example: EPSG Code 4326

  • EPSG code 4326 refers to the WGS84 longitude/latitude projection.

Code
st_crs("EPSG:4326")$Name
[1] "WGS 84"
Code
st_crs("EPSG:4326")$proj4string
[1] "+proj=longlat +datum=WGS84 +no_defs"
Code
st_crs("EPSG:4326")$epsg
[1] 4326

Transforming CRS with sf and terra

  • Functions sf::st_crs() and terra::crs() allow us to get the CRS of spatial data.

  • These functions also allow us to set a CRS to spatial data by using st_crs(x) <- value if x is a sf object, and crs(r) <- value if r is a raster.

Important Note

  • Setting a CRS does not transform the data; it just changes the CRS label.

  • We may want to set a CRS to data that does not come with CRS, and the CRS should be what it is, not what we would like it to be.

  • We use sf::st_transform() and terra::project() to transform the sf or raster data, respectively, to a new CRS.

Working with sf Package

Reading and Getting the CRS

Code
library(sf)
pathshp <- system.file("shape/nc.shp", package = "sf")
map <- st_read(pathshp, quiet = TRUE)
# Get CRS
st_crs(map)
Coordinate Reference System:
  User input: NAD27 
  wkt:
GEOGCRS["NAD27",
    DATUM["North American Datum 1927",
        ELLIPSOID["Clarke 1866",6378206.4,294.978698213898,
            LENGTHUNIT["metre",1]]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["latitude",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["longitude",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    ID["EPSG",4267]]

Transforming the CRS

Code
# Transform CRS
map2 <- st_transform(map, crs = "EPSG:4326")

# Get CRS of transformed data
st_crs(map2)
Coordinate Reference System:
  User input: EPSG:4326 
  wkt:
GEOGCRS["WGS 84",
    ENSEMBLE["World Geodetic System 1984 ensemble",
        MEMBER["World Geodetic System 1984 (Transit)"],
        MEMBER["World Geodetic System 1984 (G730)"],
        MEMBER["World Geodetic System 1984 (G873)"],
        MEMBER["World Geodetic System 1984 (G1150)"],
        MEMBER["World Geodetic System 1984 (G1674)"],
        MEMBER["World Geodetic System 1984 (G1762)"],
        MEMBER["World Geodetic System 1984 (G2139)"],
        ELLIPSOID["WGS 84",6378137,298.257223563,
            LENGTHUNIT["metre",1]],
        ENSEMBLEACCURACY[2.0]],
    PRIMEM["Greenwich",0,
        ANGLEUNIT["degree",0.0174532925199433]],
    CS[ellipsoidal,2],
        AXIS["geodetic latitude (Lat)",north,
            ORDER[1],
            ANGLEUNIT["degree",0.0174532925199433]],
        AXIS["geodetic longitude (Lon)",east,
            ORDER[2],
            ANGLEUNIT["degree",0.0174532925199433]],
    USAGE[
        SCOPE["Horizontal component of 3D system."],
        AREA["World."],
        BBOX[-90,-180,90,180]],
    ID["EPSG",4326]]

Working with terra Package

Reading and Getting the CRS of a Raster

Code
library(terra)
pathraster <- system.file("ex/elev.tif", package = "terra")
r <- rast(pathraster)
# Get CRS
crs(r)
[1] "GEOGCRS[\"WGS 84\",\n    ENSEMBLE[\"World Geodetic System 1984 ensemble\",\n        MEMBER[\"World Geodetic System 1984 (Transit)\"],\n        MEMBER[\"World Geodetic System 1984 (G730)\"],\n        MEMBER[\"World Geodetic System 1984 (G873)\"],\n        MEMBER[\"World Geodetic System 1984 (G1150)\"],\n        MEMBER[\"World Geodetic System 1984 (G1674)\"],\n        MEMBER[\"World Geodetic System 1984 (G1762)\"],\n        MEMBER[\"World Geodetic System 1984 (G2139)\"],\n        ELLIPSOID[\"WGS 84\",6378137,298.257223563,\n            LENGTHUNIT[\"metre\",1]],\n        ENSEMBLEACCURACY[2.0]],\n    PRIMEM[\"Greenwich\",0,\n        ANGLEUNIT[\"degree\",0.0174532925199433]],\n    CS[ellipsoidal,2],\n        AXIS[\"geodetic latitude (Lat)\",north,\n            ORDER[1],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        AXIS[\"geodetic longitude (Lon)\",east,\n            ORDER[2],\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n    USAGE[\n        SCOPE[\"Horizontal component of 3D system.\"],\n        AREA[\"World.\"],\n        BBOX[-90,-180,90,180]],\n    ID[\"EPSG\",4326]]"

Transforming the CRS of a Raster

Code
# Transform CRS
r2 <- terra::project(r, "EPSG:2169")
# Get CRS of transformed data
crs(r2)
[1] "PROJCRS[\"LUREF / Luxembourg TM\",\n    BASEGEOGCRS[\"LUREF\",\n        DATUM[\"Luxembourg Reference Frame\",\n            ELLIPSOID[\"International 1924\",6378388,297,\n                LENGTHUNIT[\"metre\",1]]],\n        PRIMEM[\"Greenwich\",0,\n            ANGLEUNIT[\"degree\",0.0174532925199433]],\n        ID[\"EPSG\",4181]],\n    CONVERSION[\"Luxembourg TM\",\n        METHOD[\"Transverse Mercator\",\n            ID[\"EPSG\",9807]],\n        PARAMETER[\"Latitude of natural origin\",49.8333333333333,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8801]],\n        PARAMETER[\"Longitude of natural origin\",6.16666666666667,\n            ANGLEUNIT[\"degree\",0.0174532925199433],\n            ID[\"EPSG\",8802]],\n        PARAMETER[\"Scale factor at natural origin\",1,\n            SCALEUNIT[\"unity\",1],\n            ID[\"EPSG\",8805]],\n        PARAMETER[\"False easting\",80000,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8806]],\n        PARAMETER[\"False northing\",100000,\n            LENGTHUNIT[\"metre\",1],\n            ID[\"EPSG\",8807]]],\n    CS[Cartesian,2],\n        AXIS[\"northing (X)\",north,\n            ORDER[1],\n            LENGTHUNIT[\"metre\",1]],\n        AXIS[\"easting (Y)\",east,\n            ORDER[2],\n            LENGTHUNIT[\"metre\",1]],\n    USAGE[\n        SCOPE[\"Engineering survey, topographic mapping.\"],\n        AREA[\"Luxembourg.\"],\n        BBOX[49.44,5.73,50.19,6.53]],\n    ID[\"EPSG\",2169]]"

The terra Package for Raster and Vector Data

The terra package (Hijmans 2022) provides functions to create, read, manipulate, and write raster and vector data. Raster data represents spatially continuous phenomena through a grid of equally sized cells, while vector data includes points, lines, and polygons with associated attributes. This chapter demonstrates how to handle raster and vector data using terra.

Raster Data

Creating and Reading Raster Data

The rast() function can create and read raster data. The writeRaster() function allows writing raster data. Here, we read elevation data for Luxembourg from a file provided by terra.

Code
library(terra) 
pathraster <- system.file("ex/elev.tif", package = "terra") 
r <- rast(pathraster)
plot(r)

We can also create a SpatRaster object by specifying dimensions and extents.

Raster Operations

Several functions provide information about raster size and dimensions.

Code
nrow(r) # number of rows
[1] 90
Code
ncol(r) # number of columns
[1] 95
Code
dim(r) # dimensions
[1] 90 95  1
Code
ncell(r) # number of cells
[1] 8550

The values() function sets and accesses raster values.

Code
values(r) <- 1:ncell(r)

Creating multilayer rasters and subsetting layers is straightforward.

Code
r2 <- r * r
s <- c(r, r2)
plot(s[[2]]) # layer 2

Generic operations on rasters include:

Code
plot(min(s))

Code
plot(r + r + 10)

Code
plot(round(r))

Code
plot(r == 1)

Vector Data

The SpatVector class handles vector data with attributes. The vect() function reads shapefiles, and writeVector() writes SpatVector objects. Here, we obtain a map of Luxembourg’s divisions.

Code
pathshp <- system.file("ex/lux.shp", package = "terra")
v <- vect(pathshp)

Creating a SpatVector with point locations and attributes:

Code
long <- c(-0.118092, 2.349014, -3.703339, 12.496366)
lat <- c(51.509865, 48.864716, 40.416729, 41.902782)
longlat <- cbind(long, lat)

crspoints <- "+proj=longlat +datum=WGS84"
d <- data.frame(place = c("London", "Paris", "Madrid", "Rome"), value = c(200, 300, 400, 500))
pts <- vect(longlat, atts = d, crs = crspoints)
plot(pts)

Cropping, Masking, and Aggregating Raster Data

The terra package provides functions to crop, mask, and aggregate raster data. Here, we demonstrate these operations with temperature data for Spain.

Downloading Data

Code
library(terra)
r <- geodata::worldclim_country(country = "Spain", var = "tavg", res = 10, path = tempdir())
plot(r)

Averaging Temperature Data

Code
r <- mean(r)
plot(r)

Cropping and Masking Data

Code
library(ggplot2)
library(terra)
map <- rnaturalearth::ne_states("Spain", returnclass = "sf")
map <- map[-which(map$region == "Canary Is."), ] # delete region
ggplot(map) + geom_sf()

Aggregating Data

Code
r <- terra::aggregate(r, fact = 20, fun = "mean", na.rm = TRUE)
plot(r)

Extracting Raster Values at Points

The extract() function retrieves raster values at specified points.

Example

Code
library(terra)
r <- rast(system.file("ex/elev.tif", package = "terra"))
v <- vect(system.file("ex/lux.shp", package = "terra"))

points <- crds(centroids(v))

plot(r)
plot(v, add = TRUE)
points(points)

Extracting Values

Code
points <- as.data.frame(points)
valuesatpoints <- extract(r, points)
cbind(points, valuesatpoints)
          x        y ID elevation
1  6.009082 50.07064  1       444
2  6.127425 49.86614  2       295
3  5.886502 49.80014  3       382
4  6.165081 49.92886  4       404
5  5.914545 49.93892  5       414
6  6.378449 49.78511  6       320
7  6.311601 49.54569  7       193
8  6.346395 49.68742  8       228
9  5.963503 49.64159  9       313
10 6.023816 49.52331 10       282
11 6.167624 49.61815 11       328
12 6.113598 49.75744 12       221

Extracting and Averaging Raster Values

Extracting raster values within polygons and computing area-weighted averages.

Extracting Values Within Polygons

Code
# Extracted raster cells within each polygon
head(extract(r, v, na.rm = TRUE))
  ID elevation
1  1       547
2  1       485
3  1       497
4  1       515
5  1       515
6  1       515
Code
# Extracted raster cells and percentage of area
# covered within each polygon
head(extract(r, v, na.rm = TRUE, weights = TRUE))
  ID elevation     weight
1  1        NA 0.04545454
2  1        NA 0.10909091
3  1       529 0.24545454
4  1       542 0.46363635
5  1       547 0.68181816
6  1       535 0.11818181

Averaging Values

Code
# Average raster values by polygon
v$avg <- extract(r, v, mean, na.rm = TRUE)$elevation

# Area-weighted average raster values by polygon (weights = TRUE)
v$weightedavg <- extract(r, v, mean, na.rm = TRUE, weights = TRUE)$elevation

library(ggplot2)
library(tidyterra)

# Plot average raster values within polygons
ggplot(data = v) + geom_spatvector(aes(fill = avg)) + scale_fill_terrain_c()

Code
# Plot area-weighted average raster values within polygons
ggplot(data = v) + geom_spatvector(aes(fill = weightedavg)) + scale_fill_terrain_c()

Introduction to Mapping in R

Importance of Maps

  • Maps are essential tools for visualizing spatial data.
  • They help to reveal patterns and trends that are not immediately apparent in tabular data.
  • Useful in various fields such as geography, urban planning, epidemiology, and environmental science.

Packages Covered

  • ggplot2: A powerful package for creating static maps based on the grammar of graphics.
  • leaflet: A package for creating interactive maps using the Leaflet JavaScript library.
  • mapview: A package for quick and easy interactive map visualization.
  • tmap: A versatile package that can create both static and interactive maps.
  • flowmapblue: A package for visualizing mobility flows with interactive maps.

Preparing Areal Data

Data Source

  • The data used is from a study on sudden infant deaths in North Carolina, USA, for the years 1974 and 1979.

Loading Data

  • Use the sf package to load shapefile data.
Code
library(sf)
nameshp <- system.file("shape/nc.shp", package = "sf")
d <- st_read(nameshp, quiet = TRUE)
d$vble <- d$SID74
d$vble2 <- d$SID79

Static Maps with ggplot2

Grammar of Graphics

  • ggplot2 is based on the “grammar of graphics”, which provides a structured way to describe and build graphs.

  • You define aesthetics (like colors and shapes) and layers (like points, lines, and polygons).

Creating a Map

  • First, load the required libraries.

  • Use geom_sf() to plot the spatial data.

  • Customize the map with color scales and themes.

Code
library(ggplot2)
library(viridis)
ggplot(d) + geom_sf(aes(fill = vble)) +
  scale_fill_viridis() + theme_bw()

Saving Plots

  • Save your plots to a file using ggsave().

    Interactive Maps with leaflet

    Creating a Leaflet Map

    • Leaflet maps are interactive and require transforming the coordinate reference system (CRS) to EPSG:4326.
Code
d <- st_transform(d, 4326)

Define Color Palette and Create Map

  • Define a color palette using colorNumeric().

  • Create the map and add polygons with addPolygons().

  • Add a legend with addLegend().

    Code
    library(leaflet)
    pal <- colorNumeric(palette = "YlOrRd", domain = d$vble)
    leaflet(d) %>% addTiles() %>%
      addPolygons(color = "white", fillColor = ~ pal(vble),
                  fillOpacity = 0.8) %>%
      addLegend(pal = pal, values = ~vble, opacity = 0.8)

Quick Maps with mapview

Creating a Map

  • Use mapview() for quick and simple interactive maps.
Code
library(mapview)
mapview(d, zcol = "vble")

Customizing Maps

  • Customize the map by changing the background, color palette, and legend title.
Code
library(RColorBrewer)
palette <- colorRampPalette(brewer.pal(9, "YlOrRd"))

mapview(d, zcol = "vble", map.types = "CartoDB.DarkMatter",
        col.regions = colorRampPalette(brewer.pal(9, "YlOrRd"))(), 
        layer.name = "SDI")

Side-by-Side and Synchronized Maps

Side-by-Side Maps

  • Display multiple maps side-by-side for comparison.
Code
library(leaflet.extras2)
#m1 | m2

Synchronized Maps

  • Sync multiple maps to pan and zoom together using leafsync.
Code
m <- leafsync::sync(m1, m2)

Static and Interactive Maps with tmap

Static Map

  • Create static maps with tmap using tm_shape() and tm_polygons().
Code
library(tmap)
tmap_mode("plot")
tm_shape(d) + tm_polygons("vble")

Interactive Map

  • Switch to interactive mode with tmap_mode("view") and create maps.
Code
tmap_mode("view")
tm_shape(d) + tm_polygons("vble")

Mapping Point Data

Creating Point Data Map with ggplot2

  • Plot point data with ggplot2, using geom_sf() to define aesthetics for color and size.
Code
ggplot(d) + geom_sf(aes(col = vble, size = size)) +
  scale_color_viridis()

Mapping Raster Data

Creating Raster Data Map with ggplot2

  • Plot raster data by converting it to a data frame and using geom_raster().
Code
ggplot(d) + geom_sf() +
  geom_raster(data = as.data.frame(r, xy = TRUE),
    aes(x = x, y = y, fill = elevation))

Mapping Mobility Flows with flowmapblue

Creating Interactive Flow Map

  • Create interactive flow maps with flowmapblue.

  • Define locations and flows data frames.

  • Generate the map with clustering, dark mode, and animation options.

Code
library(flowmapblue)
locations <- data.frame(
  id = c(1, 2, 3),
  name = c("New York", "London", "Rio de Janeiro"),
  lat = c(40.713543, 51.507425, -22.906241),
  lon = c(-74.011219, -0.127738, -43.180244))
flows <- data.frame(
  origin = c(1, 2, 3, 2, 1, 3),
  dest = c(2, 1, 1, 3, 3 , 2),
  count = c(42, 51, 50, 40, 22, 42))
flowmapblue(locations, flows, mapboxAccessToken,
            clustering = TRUE, darkMode = TRUE, animation = FALSE)

R Packages to Download Open Spatial Data

Spatial data are used in a wide range of disciplines including environment, health, agriculture, economy, and society (Moraga and Baker 2022). Several R packages have been recently developed as clients for various databases that can be used for easy access to spatial data including administrative boundaries, climatic, and OpenStreetMap data. Here, we give short reproducible examples on how to download and visualize spatial data that can be useful in different settings. More extended examples and details about the capabilities of each of the packages can be seen at the packages’ websites and the rspatialdata website which provides a collection of tutorials on R packages to download and visualize spatial data using R.

Administrative Boundaries of Countries

Overview

  • Administrative boundaries data can be obtained from multiple packages:
    • rnaturalearth: Global administrative boundaries.
    • tidycensus and tigris: USA-specific data.
    • mapSpain: Spain-specific data.
    • geobr: Brazil-specific data.
    • giscoR: Eurostat GISCO data.