Clasificación supervisada y cálculo de métricas de paisaje

Author

Elio Lagunes-Díaz

Code: module loading and file reading
library(knitr, include.only = "kable")
library(terra)
library(caret)    
library(randomForest)
library(dplyr)       
library(doParallel)  
library(data.table)
library(landscapemetrics)
library(tidyr, include.only = "pivot_longer")

puntos = vect("~/papers_y_tesis/A_RESPALDARparques_marisela/puntos_clasificacion.gpkg") |>   project("epsg:3857")
cls = data.frame(landuse = 1:7, cover = unique(puntos$cobertura) |> sort() )
coltb = data.frame(landuse = 1:7, 
                   col = c("lightblue", "chartreuse4", "gray20", "gray80", "gray80", "ghostwhite", "gold"))

Supervised classification

Imagery

We performed a supervised classification for characterizing landscape features in our study area: we used two images, 4.7 m resolution, 4 bands (red, green, blue and near infrared) from the Planet remote sensing platform, from the “Tropical Normalized Analytic Monthly series basemaps” (Planet Labs PBC, 2024), made available through the Norway’s International Climate and Forests Initiative (NICFI). We selected one image for April 2024, for the period when the point count survey was carried out, and another image for August, 2023, the image for the most recent rainy season, when plant phenology allows for better differentiation of its spectral signatures from other classes. We assessed the accuracy of classification on both images before selecting one for the landscape metric calculation.

Planet Labs PBC. (2024). Planet application program interface: In space for life on earth. Retrieved from https://api.planet.com

Training point selection

We visually selected points for sampling the different land use and land cover classes on a Google Satellite image (Google, 2024), using the Quickmapservices plugin for the QGIS software (QGIS Development Team, 2024). The classes we used for the sampling were: asphalt, built-up, water, grass and forest; to avoid mixing in very different spectral signatures, built-up class was subdivided into concrete (raw and waterproof coated) and metal-roof (warehouses). At least 15 samples were selected from each class (Table 1).

Google. (2024). Imagery from mexico city. Lat: 19.4142, long: -99.1258. Map Data: Airbus, CNES / Airbus, Landsat, Copernicus, Maxar Technologies. Retrieved from https://api.planet.com
QGIS Development Team. (2024). QGIS geographic information system. Open Source Geospatial Foundation. Retrieved from https://www.qgis.org/
Table 1: Number of point samples taken by class
Class Freq
water 15
trees 26
asphalt 19
concrete 20
waterproof concrete 15
metal-roof 15
grass 15

We used the modules terra (Hijmans, 2023) for raster processing, randomForest (Liaw & Wiener, 2002) for classification with the random forest algorithm and caret (Kuhn & Max, 2008) for predictive model building and analysis, in the R statistical computing language (R Core Team, 2024)

Hijmans, R. J. (2023). Terra: Spatial data analysis. Retrieved from https://CRAN.R-project.org/package=terra
Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22. Retrieved from https://CRAN.R-project.org/doc/Rnews/
Kuhn, & Max. (2008). Building predictive models in r using the caret package. Journal of Statistical Software, 28(5), 1–26. Retrieved from https://www.jstatsoft.org/index.php/jss/article/view/v028i05
R Core Team. (2024). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Image classification

The selected image for the landscape metric process was that of August 2023, which had the highest accuracy, with a benchmarked median Kappa of 0.925, while the April 2024 image had a median Kappa of 0.85 (Figure 1). The confusion matrix for the August 2023 image is shown at Table 2 and the final classified image is Figure 2. System time for the classification step for a resampled image to 14 m resolution is at Table 3.

Code: benchmarks
# ajustes para random forests
mc <- makeCluster(2) # mis dos pobres núcleos
registerDoParallel(mc)
micontrol = trainControl(method="repeatedcv", 
                         number=3, 
                         repeats=2,
                         returnResamp='all', 
                         allowParallel=TRUE)

imagenes = c("~/papers_y_tesis/A_RESPALDARparques_marisela/planet.tif", "~/papers_y_tesis/A_RESPALDARparques_marisela/planet_cdmx_2023_agosto.tif")

bmarks = vector(mode = "list", length = 2L)
for(i in 1:2 ){ 
    imgsat = terra::rast(imagenes[i])
    imgsat[[gsub(".tif", "_5", basename(imagenes[i])) ]]  = NULL
    puntos = terra::crop(puntos, imgsat) # por el caso de tener puntos fuera de la imagen
    firmas = terra::extract(imgsat, puntos)
    firmas$landuse = factor(puntos$cobertura) |> as.numeric()
    
  kappas = vector(mode = "list", length = 20)  
  for(j in 1:20){ 
    firmas = as.data.table(firmas) # para muestra estratificada ---------------------
    train_df = firmas[, .SD[sample(1:.N, trunc(.N*.65))], by = landuse] 
    test_df = firmas[!firmas$ID %in% train_df$ID]
    # RANDOM FOREST --------------
    fit.rf = train(as.factor(landuse) ~ ., data=train_df[,-"ID"],
                    method = "rf", metric = "Accuracy", preProc = c("center", "scale"), 
                    trControl = micontrol)
    p2 = predict(fit.rf, test_df, type = "raw")
  kappas[[j]] = confusionMatrix(p2, as.factor(test_df$landuse))$overall[["Kappa"]]
  }
  bmarks[[i]] = kappas
}

bmarksdf = data.frame(april2024 = do.call(rbind, bmarks[[1]]), august2023 = do.call(rbind, bmarks[[2]]))
bmarksdf = bmarksdf |> tidyr::pivot_longer(cols = everything(), names_to  = "image", values_to =   "Kappa")
bmarksdf$image = factor(bmarksdf$image, levels = c("august2023", "april2024"))
par(bg = "cornsilk")
boxplot(Kappa ~ image, data = bmarksdf)
Figure 1: Benchmark for the two images, performed on 20 predictions
Code: image processing
imagen = "~/papers_y_tesis/A_RESPALDARparques_marisela/planet_cdmx_2023_agosto.tif"
imgsat = rast(imagen)
imgsat[[gsub(".tif", "_5", basename(imagen)) ]]  = NULL
puntos = crop(puntos, imgsat)
firmas = extract(imgsat, puntos)
firmas$landuse = factor(puntos$cobertura) |> as.numeric()

firmas = as.data.table(firmas)
train_df = firmas[, .SD[sample(1:.N, trunc(.N*.65))], by = landuse]
test_df = firmas[!firmas$ID %in% train_df$ID]

set.seed(1)
fit.rf = train(as.factor(landuse) ~ .,
                data=train_df[,-"ID"],
                method = "rf",
                metric= "Accuracy",
                preProc = c("center", "scale"), 
                trControl = micontrol
)
#p1 = predict(fit.rf, train_df, type = "raw")
#confusionMatrix(p1, as.factor(train_df$landuse))
p2 = predict(fit.rf, test_df, type = "raw")
Code: confusion matrix for August 2023
knitr::kable(confusionMatrix(factor(p2, labels = cobsdf$Class), factor(test_df$landuse, labels = cobsdf$Class))$table)
Table 2: Confusion matrix for Aug. 2023 image
water trees asphalt concrete waterproof concrete metal-roof grass
water 5 0 0 0 0 0 0
trees 1 10 1 0 0 0 0
asphalt 0 0 6 0 1 0 0
concrete 0 0 0 7 0 0 0
waterproof concrete 0 0 0 0 5 0 0
metal-roof 0 0 0 0 0 6 0
grass 0 0 0 0 0 0 5
Code: class prediction systime
# resamplear para más rápido -----------------
resampleador = rast(ext(imgsat), resolution = c(14, 14))
system.time({
imgsat = resample(imgsat, resampleador, method = "bilinear")
clases = predict(imgsat, fit.rf)
})
Table 3: Whole image classification system time
   user  system elapsed 
243.050  13.372 260.411 
Code: class prediction for the whole scene
# resamplear para más rápido -----------------

levels(clases) = cls
coltab(clases) = coltb
par(bg = "cornsilk")
plot(clases, axes = F)
Figure 2: Classification result

Landscape metric calculation

On the classified image we calculated class area (ha), total edge length (m), euclidean nearest neighbor mean distance (m) and patch number for each of the five classes using the module landscapemetrics (Hesselbarth, Sciaini, With, Wiegand, & Nowosad, 2019). We calculated each of these metrics at 9 spatial scales away from the point count center (radius 100-900m, by 100m) for each study site. Land cover was reclassified into water, trees, asphalt, built-up and grass.

Hesselbarth, M. H. K., Sciaini, M., With, K. A., Wiegand, K., & Nowosad, J. (2019). Landscapemetrics: An open-source r tool to calculate landscape metrics. Ecography, 42, 1648–1657.
Code: class reclassify
parques = vect("~/papers_y_tesis/A_RESPALDARparques_marisela/parques_nuevos.gpkg")

clases[clases[] %in% c(4,5,6)] = 4
clases = droplevels(clases)

id_clases = data.frame(
  value = c(1, 2, 3, 4, 7),
  cover = c("agua", "arbórea", "asfalto", "construído", "pasto")
)

levels(clases) = id_clases

Before computing landscape metrics, we insure the image has a projected coordinate reference system (Table 4):

Code: landscape crs check
check_landscape(clases) |> knitr::kable()
Table 4: Lansdscape crs check
layer crs units class n_classes OK
1 projected m integer 5

The description of the computed metrics are shown on Table 5:

Code: metric table
metricas = data.frame(lsm_function_name = c("lsm_c_enn_mn", "lsm_c_ca", "lsm_c_te", "lsm_c_np"), 
           description = c("Class euclidean nearest neighbor mean distance",
                         "Class area", 
                         "Class total edge",
                         "Patch number"),
           units = c("m", "ha", "m", "none" )) 


knitr::kable(metricas)
Table 5: Landscape metrics used for this work
lsm_function_name description units
lsm_c_enn_mn Class euclidean nearest neighbor mean distance m
lsm_c_ca Class area ha
lsm_c_te Class total edge m
lsm_c_np Patch number none

900 m buffers departing the central point for each park are shown on Figure 3 and Figure 4.

Code: park plot
par(mfrow = c(4,4), bg = "cornsilk")
for(i in 1:16) { 
crop(clases, terra::buffer(parques[i,], 900)) |> 
  mask( terra::buffer(parques[i,], 900)) |> 
  plot(main = paste0( parques$Parques[i], ", 900m"), cex.main = .8, legend = F, axes = F )
}
Figure 3: The parks we used for the analysis (1)
Figure 4: The parks we used for the analysis (2)

The resulting landscape-metrics, calculated at 100, 200, 300, …, 900 m departing each point count central point are shown on Table 7. Since this process is cumbersome, the code was written to avoid storing high volumes of information on memory (RAM). System time for the 9 buffers, computed for a sample of two parks on a 4 core (process was not run in parallel here), 1.9 GHz AMD processor is at Table 6.

Code: sample landscape metric calculation system time
system.time({
buffers = seq(100, 900, by = 100 )
parks_lsm = vector(mode = "list", length = length(buffers))
names(parks_lsm) = paste0("b_", buffers)
for(i in buffers){ 
lsm_buff = vector(mode = "list", length = 2) # solo 2 parques para el html de quarto 
names(lsm_buff) = parques$Parques[1:2]       # - - - -
for(j in  seq_along(parques)[1:2]){          # - - - -
  temp = crop(clases, terra::buffer(parques[j,], i)) |> 
    mask( terra::buffer(parques[j,], i)) 
  tempdf = bind_rows(
    lsm_c_enn_mn(temp), 
    lsm_c_ca(temp), 
    lsm_c_te(temp), 
    lsm_c_np(temp)
    )
  lsm_buff[[parques$Parques[j]]] = tempdf
  }
 parks_lsm[[paste0("b_", i)]] = bind_rows(lsm_buff, .id = "parque")
}
}) 
Table 6: System time for 2 parks, 9 buffers
   user  system elapsed 
  3.139   0.012   3.188 
Code: sample landscape metric calculation
parks_lsm_b = bind_rows(parks_lsm, .id = "buffer")

parks_lsm_b = parks_lsm_b |>
  left_join(id_clases, by = c("class" = "value"))

#parks_lsm_b |>
#  write.csv("parques_lsm_planet.csv", row.names = FALSE)

parks_lsm_b |> head() |> kable()
Table 7: Example table of the landscape metrics
buffer parque layer level class id metric value cover
b_100 Alameda del Sur 1 class 2 NA enn_mn 28.00000 arbórea
b_100 Alameda del Sur 1 class 3 NA enn_mn 79.33333 asfalto
b_100 Alameda del Sur 1 class 4 NA enn_mn 45.50000 construído
b_100 Alameda del Sur 1 class 7 NA enn_mn NA pasto
b_100 Alameda del Sur 1 class 2 NA ca 2.82240 arbórea
b_100 Alameda del Sur 1 class 3 NA ca 0.15680 asfalto