Clasificación supervisada y cálculo de métricas de paisaje
Code: module loading and file reading
library(knitr, include.only = "kable")
library(terra)
library(caret)
library(randomForest)
library(dplyr)
library(doParallel)
library(data.table)
library(landscapemetrics)
library(tidyr, include.only = "pivot_longer")
= vect("~/papers_y_tesis/A_RESPALDARparques_marisela/puntos_clasificacion.gpkg") |> project("epsg:3857")
puntos = data.frame(landuse = 1:7, cover = unique(puntos$cobertura) |> sort() )
cls = data.frame(landuse = 1:7,
coltb col = c("lightblue", "chartreuse4", "gray20", "gray80", "gray80", "ghostwhite", "gold"))
Supervised classification
Imagery
We performed a supervised classification for characterizing landscape features in our study area: we used two images, 4.7 m resolution, 4 bands (red, green, blue and near infrared) from the Planet remote sensing platform, from the “Tropical Normalized Analytic Monthly series basemaps” (Planet Labs PBC, 2024), made available through the Norway’s International Climate and Forests Initiative (NICFI). We selected one image for April 2024, for the period when the point count survey was carried out, and another image for August, 2023, the image for the most recent rainy season, when plant phenology allows for better differentiation of its spectral signatures from other classes. We assessed the accuracy of classification on both images before selecting one for the landscape metric calculation.
Training point selection
We visually selected points for sampling the different land use and land cover classes on a Google Satellite image (Google, 2024), using the Quickmapservices plugin for the QGIS software (QGIS Development Team, 2024). The classes we used for the sampling were: asphalt, built-up, water, grass and forest; to avoid mixing in very different spectral signatures, built-up class was subdivided into concrete (raw and waterproof coated) and metal-roof (warehouses). At least 15 samples were selected from each class (Table 1).
Class | Freq |
---|---|
water | 15 |
trees | 26 |
asphalt | 19 |
concrete | 20 |
waterproof concrete | 15 |
metal-roof | 15 |
grass | 15 |
We used the modules terra
(Hijmans, 2023) for raster processing, randomForest
(Liaw & Wiener, 2002) for classification with the random forest algorithm and caret
(Kuhn & Max, 2008) for predictive model building and analysis, in the R statistical computing language (R Core Team, 2024)
Image classification
The selected image for the landscape metric process was that of August 2023, which had the highest accuracy, with a benchmarked median Kappa of 0.925, while the April 2024 image had a median Kappa of 0.85 (Figure 1). The confusion matrix for the August 2023 image is shown at Table 2 and the final classified image is Figure 2. System time for the classification step for a resampled image to 14 m resolution is at Table 3.
Code: benchmarks
# ajustes para random forests
<- makeCluster(2) # mis dos pobres núcleos
mc registerDoParallel(mc)
= trainControl(method="repeatedcv",
micontrol number=3,
repeats=2,
returnResamp='all',
allowParallel=TRUE)
= c("~/papers_y_tesis/A_RESPALDARparques_marisela/planet.tif", "~/papers_y_tesis/A_RESPALDARparques_marisela/planet_cdmx_2023_agosto.tif")
imagenes
= vector(mode = "list", length = 2L)
bmarks for(i in 1:2 ){
= terra::rast(imagenes[i])
imgsat gsub(".tif", "_5", basename(imagenes[i])) ]] = NULL
imgsat[[= terra::crop(puntos, imgsat) # por el caso de tener puntos fuera de la imagen
puntos = terra::extract(imgsat, puntos)
firmas $landuse = factor(puntos$cobertura) |> as.numeric()
firmas
= vector(mode = "list", length = 20)
kappas for(j in 1:20){
= as.data.table(firmas) # para muestra estratificada ---------------------
firmas = firmas[, .SD[sample(1:.N, trunc(.N*.65))], by = landuse]
train_df = firmas[!firmas$ID %in% train_df$ID]
test_df # RANDOM FOREST --------------
= train(as.factor(landuse) ~ ., data=train_df[,-"ID"],
fit.rf method = "rf", metric = "Accuracy", preProc = c("center", "scale"),
trControl = micontrol)
= predict(fit.rf, test_df, type = "raw")
p2 = confusionMatrix(p2, as.factor(test_df$landuse))$overall[["Kappa"]]
kappas[[j]]
}= kappas
bmarks[[i]]
}
= data.frame(april2024 = do.call(rbind, bmarks[[1]]), august2023 = do.call(rbind, bmarks[[2]]))
bmarksdf = bmarksdf |> tidyr::pivot_longer(cols = everything(), names_to = "image", values_to = "Kappa")
bmarksdf $image = factor(bmarksdf$image, levels = c("august2023", "april2024"))
bmarksdfpar(bg = "cornsilk")
boxplot(Kappa ~ image, data = bmarksdf)
Code: image processing
= "~/papers_y_tesis/A_RESPALDARparques_marisela/planet_cdmx_2023_agosto.tif"
imagen = rast(imagen)
imgsat gsub(".tif", "_5", basename(imagen)) ]] = NULL
imgsat[[= crop(puntos, imgsat)
puntos = extract(imgsat, puntos)
firmas $landuse = factor(puntos$cobertura) |> as.numeric()
firmas
= as.data.table(firmas)
firmas = firmas[, .SD[sample(1:.N, trunc(.N*.65))], by = landuse]
train_df = firmas[!firmas$ID %in% train_df$ID]
test_df
set.seed(1)
= train(as.factor(landuse) ~ .,
fit.rf data=train_df[,-"ID"],
method = "rf",
metric= "Accuracy",
preProc = c("center", "scale"),
trControl = micontrol
)#p1 = predict(fit.rf, train_df, type = "raw")
#confusionMatrix(p1, as.factor(train_df$landuse))
= predict(fit.rf, test_df, type = "raw") p2
Code: confusion matrix for August 2023
::kable(confusionMatrix(factor(p2, labels = cobsdf$Class), factor(test_df$landuse, labels = cobsdf$Class))$table) knitr
water | trees | asphalt | concrete | waterproof concrete | metal-roof | grass | |
---|---|---|---|---|---|---|---|
water | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
trees | 1 | 10 | 1 | 0 | 0 | 0 | 0 |
asphalt | 0 | 0 | 6 | 0 | 1 | 0 | 0 |
concrete | 0 | 0 | 0 | 7 | 0 | 0 | 0 |
waterproof concrete | 0 | 0 | 0 | 0 | 5 | 0 | 0 |
metal-roof | 0 | 0 | 0 | 0 | 0 | 6 | 0 |
grass | 0 | 0 | 0 | 0 | 0 | 0 | 5 |
Code: class prediction systime
# resamplear para más rápido -----------------
= rast(ext(imgsat), resolution = c(14, 14))
resampleador system.time({
= resample(imgsat, resampleador, method = "bilinear")
imgsat = predict(imgsat, fit.rf)
clases })
user system elapsed
243.050 13.372 260.411
Landscape metric calculation
On the classified image we calculated class area (ha), total edge length (m), euclidean nearest neighbor mean distance (m) and patch number for each of the five classes using the module landscapemetrics (Hesselbarth, Sciaini, With, Wiegand, & Nowosad, 2019). We calculated each of these metrics at 9 spatial scales away from the point count center (radius 100-900m, by 100m) for each study site. Land cover was reclassified into water, trees, asphalt, built-up and grass.
Code: class reclassify
= vect("~/papers_y_tesis/A_RESPALDARparques_marisela/parques_nuevos.gpkg")
parques
%in% c(4,5,6)] = 4
clases[clases[] = droplevels(clases)
clases
= data.frame(
id_clases value = c(1, 2, 3, 4, 7),
cover = c("agua", "arbórea", "asfalto", "construído", "pasto")
)
levels(clases) = id_clases
Before computing landscape metrics, we insure the image has a projected coordinate reference system (Table 4):
Code: landscape crs check
check_landscape(clases) |> knitr::kable()
layer | crs | units | class | n_classes | OK |
---|---|---|---|---|---|
1 | projected | m | integer | 5 | ✔ |
The description of the computed metrics are shown on Table 5:
Code: metric table
= data.frame(lsm_function_name = c("lsm_c_enn_mn", "lsm_c_ca", "lsm_c_te", "lsm_c_np"),
metricas description = c("Class euclidean nearest neighbor mean distance",
"Class area",
"Class total edge",
"Patch number"),
units = c("m", "ha", "m", "none" ))
::kable(metricas) knitr
lsm_function_name | description | units |
---|---|---|
lsm_c_enn_mn | Class euclidean nearest neighbor mean distance | m |
lsm_c_ca | Class area | ha |
lsm_c_te | Class total edge | m |
lsm_c_np | Patch number | none |
900 m buffers departing the central point for each park are shown on Figure 3 and Figure 4.
Code: park plot
par(mfrow = c(4,4), bg = "cornsilk")
for(i in 1:16) {
crop(clases, terra::buffer(parques[i,], 900)) |>
mask( terra::buffer(parques[i,], 900)) |>
plot(main = paste0( parques$Parques[i], ", 900m"), cex.main = .8, legend = F, axes = F )
}
The resulting landscape-metrics, calculated at 100, 200, 300, …, 900 m departing each point count central point are shown on Table 7. Since this process is cumbersome, the code was written to avoid storing high volumes of information on memory (RAM). System time for the 9 buffers, computed for a sample of two parks on a 4 core (process was not run in parallel here), 1.9 GHz AMD processor is at Table 6.
Code: sample landscape metric calculation system time
system.time({
= seq(100, 900, by = 100 )
buffers = vector(mode = "list", length = length(buffers))
parks_lsm names(parks_lsm) = paste0("b_", buffers)
for(i in buffers){
= vector(mode = "list", length = 2) # solo 2 parques para el html de quarto
lsm_buff names(lsm_buff) = parques$Parques[1:2] # - - - -
for(j in seq_along(parques)[1:2]){ # - - - -
= crop(clases, terra::buffer(parques[j,], i)) |>
temp mask( terra::buffer(parques[j,], i))
= bind_rows(
tempdf lsm_c_enn_mn(temp),
lsm_c_ca(temp),
lsm_c_te(temp),
lsm_c_np(temp)
)$Parques[j]]] = tempdf
lsm_buff[[parques
}paste0("b_", i)]] = bind_rows(lsm_buff, .id = "parque")
parks_lsm[[
} })
user system elapsed
3.139 0.012 3.188
Code: sample landscape metric calculation
= bind_rows(parks_lsm, .id = "buffer")
parks_lsm_b
= parks_lsm_b |>
parks_lsm_b left_join(id_clases, by = c("class" = "value"))
#parks_lsm_b |>
# write.csv("parques_lsm_planet.csv", row.names = FALSE)
|> head() |> kable() parks_lsm_b
buffer | parque | layer | level | class | id | metric | value | cover |
---|---|---|---|---|---|---|---|---|
b_100 | Alameda del Sur | 1 | class | 2 | NA | enn_mn | 28.00000 | arbórea |
b_100 | Alameda del Sur | 1 | class | 3 | NA | enn_mn | 79.33333 | asfalto |
b_100 | Alameda del Sur | 1 | class | 4 | NA | enn_mn | 45.50000 | construído |
b_100 | Alameda del Sur | 1 | class | 7 | NA | enn_mn | NA | pasto |
b_100 | Alameda del Sur | 1 | class | 2 | NA | ca | 2.82240 | arbórea |
b_100 | Alameda del Sur | 1 | class | 3 | NA | ca | 0.15680 | asfalto |