In this page, we demonstrate how to extract values from maps using R. Often we have maps showing values for each pixels, for example, ITN access and usage map with 5km x 5km resolutions or Population counts with 1km x 1km resolutions. To convert these maps to useful estimates for our model, we need to extract the values underlying each pixel in the map, within the geographical area of interest. For example, if the map covers the entire Sub-Saharan Africa, and all we need is values from Burkina Faso, we will need to only extract the pixels within the country’s boundary. Then we would need to aggregate the values of the pixels, to obtain some form of country-wide estimate. Similar analogy is applicable to region, district and province.
We use the sp and raster packages in R to achieve this. For the first example, we will walk through extracting ITN usage estimate for a Burkina Faso district of Kampti.
Shape file serves as a “cookie cutter” such that we can ensure to extract values within the boundary of a district or area of interest. We will discuss about the use of population distribution map later.
Here are the code walkthrough. First we import the libraries, and load the input using raster() and shapefile() functions.
The summaries for each of the object suggests that shp has a different CRS (Coordinate Reference System) than others. The CRS of all objects must align for our work here, and it’s easier to convert CRS in shp to that of other raster objects.
library(rgdal)
Please note that rgdal will be retired by the end of 2023,
plan transition to sf/stars/terra functions using GDAL and PROJ
at your earliest convenience.
rgdal: version: 1.5-32, (SVN revision 1176)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 3.4.3, released 2022/04/22
Path to GDAL shared files: /usr/share/gdal
GDAL binary built with GEOS: TRUE
Loaded PROJ runtime: Rel. 8.2.1, January 1st, 2022, [PJ_VERSION: 821]
Path to PROJ shared files: /home/runner/.local/share/proj:/usr/share/proj
PROJ CDN enabled: FALSE
Linking to sp version:1.5-0
To mute warnings of possible GDAL/OSR exportToProj4() degradation,
use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
shp <-spTransform(shp, crs(itn_raster))
Now we can plot and see the raster and shapes. Here’s ITN raster.
plot(itn_raster)plot(shp, add=T)
And here’s population raster. There are a little misalignment in the north for this raster, but for our purpose, this problem is not too big.
plot(pop_raster)plot(shp, add=T)
Naïve Extraction
The most naïve way of extracting values for each district here is to directly use the extract() function. When supplied with a set of polygons (shapes), the function pull the values underlying all the pixels within the shapes, and aggregate them, say by taking means. The function can be supplied by a set of points too.
df <- shp@data # Pull data out of shapefiles (District names etc.)# Extract and aggregate using mean functiondf$itn_cov <-extract(itn_raster, shp, fun=mean)# Look at values for Bittou districtdf[df$NOMDEP=='BITTOU', c('NOMDEP', 'NOMREGION', 'itn_cov')]
Sometimes averaging each pixel without weights doesn’t feel quite “right”: Pixels inside uninhabited forests aren’t the same as pixels inside a crowded town when we care about ITN coverage. Therefore, we may elect to use population map to weight the pixels and obtain a population-weighted estimate for each district.
# First try to stack the ITN and Population map## To do so, need to ensure two maps cover same area## Crop ITN raster according to Population rasteritn_raster_cr <-crop(itn_raster, pop_raster)## Lower resolution of Population raster to the level of ITN rasterpop_raster_res <-resample(pop_raster, itn_raster_cr, )## Stacking the rasteritn_pop_stack <-stack(itn_raster_cr, pop_raster_res)# Extract the raster stack according to the shapes, no aggregationout <-extract(itn_pop_stack, shp)# Define functions to calculate weighted average from output matrixweighted_avg <-function (x, w) sum(x * w) /sum(w)estim_from_matrix <-function (mat) { mat <-na.omit(mat)return(weighted_avg(mat[,1], mat[,2]))}
Without specifying the aggregating function for extract(), the function pull every single pixel out of the raster stack. Result is a list of matrices, first column is ITN coverage, second column is population count. So we define a function estim_from_matrix() that would calculate weighted average of first column according to second column. sapply() applies the function to each element (matrix) of the list, and combine the output nicely into a vector.
The ITN coverage estimates from population-weighted method is almost the same as the naive method. This is potentially due to the ITN estimates within the district being relatively homogeneous.