Clasificación supervisada y cálculo de métricas de paisaje

Author

Elio Lagunes-Díaz

Code: module loading and file reading

library(knitr, include.only = "kable")
library(terra)
library(caret)    
library(randomForest)
library(dplyr)       
library(doParallel)  
library(data.table)
library(landscapemetrics)
library(tidyr, include.only = "pivot_longer")

puntos = vect("~/papers_y_tesis/A_RESPALDARparques_marisela/puntos_clasificacion.gpkg") |>   project("epsg:3857")
cls = data.frame(landuse = 1:7, cover = unique(puntos$cobertura) |> sort() )
coltb = data.frame(landuse = 1:7, 
                   col = c("lightblue", "chartreuse4", "gray20", "gray80", "gray80", "ghostwhite", "gold"))

Supervised classification

Imagery

We performed a supervised classification for characterizing landscape features in our study area: we used two images, 4.7 m resolution, 4 bands (red, green, blue and near infrared) from the Planet remote sensing platform, from the “Tropical Normalized Analytic Monthly series basemaps” (Planet Labs PBC, 2024), made available through the Norway’s International Climate and Forests Initiative (NICFI). We selected one image for April 2024, for the period when the point count survey was carried out, and another image for August, 2023, the image for the most recent rainy season, when plant phenology allows for better differentiation of its spectral signatures from other classes. We assessed the accuracy of classification on both images before selecting one for the landscape metric calculation.

Planet Labs PBC. (2024). Planet application program interface: In space for life on earth. Retrieved from https://api.planet.com

Training point selection

We visually selected points for sampling the different land use and land cover classes on a Google Satellite image (Google, 2024), using the Quickmapservices plugin for the QGIS software (QGIS Development Team, 2024). The classes we used for the sampling were: asphalt, built-up, water, grass and forest; to avoid mixing in very different spectral signatures, built-up class was subdivided into concrete (raw and waterproof coated) and metal-roof (warehouses). At least 15 samples were selected from each class (Table 1).

Google. (2024). Imagery from mexico city. Lat: 19.4142, long: -99.1258. Map Data: Airbus, CNES / Airbus, Landsat, Copernicus, Maxar Technologies. Retrieved from https://api.planet.com

QGIS Development Team. (2024). QGIS geographic information system. Open Source Geospatial Foundation. Retrieved from https://www.qgis.org/

Table 1: Number of point samples taken by class

Class	Freq
water	15
trees	26
asphalt	19
concrete	20
waterproof concrete	15
metal-roof	15
grass	15

We used the modules terra (Hijmans, 2023) for raster processing, randomForest (Liaw & Wiener, 2002) for classification with the random forest algorithm and caret (Kuhn & Max, 2008) for predictive model building and analysis, in the R statistical computing language (R Core Team, 2024)

Hijmans, R. J. (2023). Terra: Spatial data analysis. Retrieved from https://CRAN.R-project.org/package=terra

Liaw, A., & Wiener, M. (2002). Classification and regression by randomForest. R News, 2(3), 18–22. Retrieved from https://CRAN.R-project.org/doc/Rnews/

Kuhn, & Max. (2008). Building predictive models in r using the caret package. Journal of Statistical Software, 28(5), 1–26. Retrieved from https://www.jstatsoft.org/index.php/jss/article/view/v028i05

R Core Team. (2024). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Image classification

The selected image for the landscape metric process was that of August 2023, which had the highest accuracy, with a benchmarked median Kappa of 0.925, while the April 2024 image had a median Kappa of 0.85 (Figure 1). The confusion matrix for the August 2023 image is shown at Table 2 and the final classified image is Figure 2. System time for the classification step for a resampled image to 14 m resolution is at Table 3.

Code: benchmarks

# ajustes para random forests
mc <- makeCluster(2) # mis dos pobres núcleos
registerDoParallel(mc)
micontrol = trainControl(method="repeatedcv", 
                         number=3, 
                         repeats=2,
                         returnResamp='all', 
                         allowParallel=TRUE)

imagenes = c("~/papers_y_tesis/A_RESPALDARparques_marisela/planet.tif", "~/papers_y_tesis/A_RESPALDARparques_marisela/planet_cdmx_2023_agosto.tif")

bmarks = vector(mode = "list", length = 2L)
for(i in 1:2 ){ 
    imgsat = terra::rast(imagenes[i])
    imgsat[[gsub(".tif", "_5", basename(imagenes[i])) ]]  = NULL
    puntos = terra::crop(puntos, imgsat) # por el caso de tener puntos fuera de la imagen
    firmas = terra::extract(imgsat, puntos)
    firmas$landuse = factor(puntos$cobertura) |> as.numeric()
    
  kappas = vector(mode = "list", length = 20)  
  for(j in 1:20){ 
    firmas = as.data.table(firmas) # para muestra estratificada ---------------------
    train_df = firmas[, .SD[sample(1:.N, trunc(.N*.65))], by = landuse] 
    test_df = firmas[!firmas$ID %in% train_df$ID]
    # RANDOM FOREST --------------
    fit.rf = train(as.factor(landuse) ~ ., data=train_df[,-"ID"],
                    method = "rf", metric = "Accuracy", preProc = c("center", "scale"), 
                    trControl = micontrol)
    p2 = predict(fit.rf, test_df, type = "raw")
  kappas[[j]] = confusionMatrix(p2, as.factor(test_df$landuse))$overall[["Kappa"]]
  }
  bmarks[[i]] = kappas
}

bmarksdf = data.frame(april2024 = do.call(rbind, bmarks[[1]]), august2023 = do.call(rbind, bmarks[[2]]))
bmarksdf = bmarksdf |> tidyr::pivot_longer(cols = everything(), names_to  = "image", values_to =   "Kappa")
bmarksdf$image = factor(bmarksdf$image, levels = c("august2023", "april2024"))
par(bg = "cornsilk")
boxplot(Kappa ~ image, data = bmarksdf)

Figure 1: Benchmark for the two images, performed on 20 predictions

Code: image processing

imagen = "~/papers_y_tesis/A_RESPALDARparques_marisela/planet_cdmx_2023_agosto.tif"
imgsat = rast(imagen)
imgsat[[gsub(".tif", "_5", basename(imagen)) ]]  = NULL
puntos = crop(puntos, imgsat)
firmas = extract(imgsat, puntos)
firmas$landuse = factor(puntos$cobertura) |> as.numeric()

firmas = as.data.table(firmas)
train_df = firmas[, .SD[sample(1:.N, trunc(.N*.65))], by = landuse]
test_df = firmas[!firmas$ID %in% train_df$ID]

set.seed(1)
fit.rf = train(as.factor(landuse) ~ .,
                data=train_df[,-"ID"],
                method = "rf",
                metric= "Accuracy",
                preProc = c("center", "scale"), 
                trControl = micontrol
)
#p1 = predict(fit.rf, train_df, type = "raw")
#confusionMatrix(p1, as.factor(train_df$landuse))
p2 = predict(fit.rf, test_df, type = "raw")

Code: confusion matrix for August 2023

knitr::kable(confusionMatrix(factor(p2, labels = cobsdf$Class), factor(test_df$landuse, labels = cobsdf$Class))$table)

Table 2: Confusion matrix for Aug. 2023 image

	water	trees	asphalt	concrete	waterproof concrete	metal-roof	grass
water	5	0	0	0	0	0	0
trees	1	10	1	0	0	0	0
asphalt	0	0	6	0	1	0	0
concrete	0	0	0	7	0	0	0
waterproof concrete	0	0	0	0	5	0	0
metal-roof	0	0	0	0	0	6	0
grass	0	0	0	0	0	0	5

Code: class prediction systime

# resamplear para más rápido -----------------
resampleador = rast(ext(imgsat), resolution = c(14, 14))
system.time({
imgsat = resample(imgsat, resampleador, method = "bilinear")
clases = predict(imgsat, fit.rf)
})

Table 3: Whole image classification system time

   user  system elapsed 
243.050  13.372 260.411

Code: class prediction for the whole scene

# resamplear para más rápido -----------------

levels(clases) = cls
coltab(clases) = coltb
par(bg = "cornsilk")
plot(clases, axes = F)

Landscape metric calculation

On the classified image we calculated class area (ha), total edge length (m), euclidean nearest neighbor mean distance (m) and patch number for each of the five classes using the module landscapemetrics (Hesselbarth, Sciaini, With, Wiegand, & Nowosad, 2019). We calculated each of these metrics at 9 spatial scales away from the point count center (radius 100-900m, by 100m) for each study site. Land cover was reclassified into water, trees, asphalt, built-up and grass.

Hesselbarth, M. H. K., Sciaini, M., With, K. A., Wiegand, K., & Nowosad, J. (2019). Landscapemetrics: An open-source r tool to calculate landscape metrics. Ecography, 42, 1648–1657.

Code: class reclassify

parques = vect("~/papers_y_tesis/A_RESPALDARparques_marisela/parques_nuevos.gpkg")

clases[clases[] %in% c(4,5,6)] = 4
clases = droplevels(clases)

id_clases = data.frame(
  value = c(1, 2, 3, 4, 7),
  cover = c("agua", "arbórea", "asfalto", "construído", "pasto")
)

levels(clases) = id_clases

Before computing landscape metrics, we insure the image has a projected coordinate reference system (Table 4):

Code: landscape crs check

check_landscape(clases) |> knitr::kable()

Table 4: Lansdscape crs check

layer	crs	units	class	n_classes	OK
1	projected	m	integer	5	✔

The description of the computed metrics are shown on Table 5:

Code: metric table

metricas = data.frame(lsm_function_name = c("lsm_c_enn_mn", "lsm_c_ca", "lsm_c_te", "lsm_c_np"), 
           description = c("Class euclidean nearest neighbor mean distance",
                         "Class area", 
                         "Class total edge",
                         "Patch number"),
           units = c("m", "ha", "m", "none" )) 


knitr::kable(metricas)

Table 5: Landscape metrics used for this work

lsm_function_name	description	units
lsm_c_enn_mn	Class euclidean nearest neighbor mean distance	m
lsm_c_ca	Class area	ha
lsm_c_te	Class total edge	m
lsm_c_np	Patch number	none

900 m buffers departing the central point for each park are shown on Figure 3 and Figure 4.

Code: park plot

par(mfrow = c(4,4), bg = "cornsilk")
for(i in 1:16) { 
crop(clases, terra::buffer(parques[i,], 900)) |> 
  mask( terra::buffer(parques[i,], 900)) |> 
  plot(main = paste0( parques$Parques[i], ", 900m"), cex.main = .8, legend = F, axes = F )
}

Figure 3: The parks we used for the analysis (1)

Figure 4: The parks we used for the analysis (2)

The resulting landscape-metrics, calculated at 100, 200, 300, …, 900 m departing each point count central point are shown on Table 7. Since this process is cumbersome, the code was written to avoid storing high volumes of information on memory (RAM). System time for the 9 buffers, computed for a sample of two parks on a 4 core (process was not run in parallel here), 1.9 GHz AMD processor is at Table 6.

Code: sample landscape metric calculation system time

system.time({
buffers = seq(100, 900, by = 100 )
parks_lsm = vector(mode = "list", length = length(buffers))
names(parks_lsm) = paste0("b_", buffers)
for(i in buffers){ 
lsm_buff = vector(mode = "list", length = 2) # solo 2 parques para el html de quarto 
names(lsm_buff) = parques$Parques[1:2]       # - - - -
for(j in  seq_along(parques)[1:2]){          # - - - -
  temp = crop(clases, terra::buffer(parques[j,], i)) |> 
    mask( terra::buffer(parques[j,], i)) 
  tempdf = bind_rows(
    lsm_c_enn_mn(temp), 
    lsm_c_ca(temp), 
    lsm_c_te(temp), 
    lsm_c_np(temp)
    )
  lsm_buff[[parques$Parques[j]]] = tempdf
  }
 parks_lsm[[paste0("b_", i)]] = bind_rows(lsm_buff, .id = "parque")
}
})

Table 6: System time for 2 parks, 9 buffers

   user  system elapsed 
  3.139   0.012   3.188

Code: sample landscape metric calculation

parks_lsm_b = bind_rows(parks_lsm, .id = "buffer")

parks_lsm_b = parks_lsm_b |>
  left_join(id_clases, by = c("class" = "value"))

#parks_lsm_b |>
#  write.csv("parques_lsm_planet.csv", row.names = FALSE)

parks_lsm_b |> head() |> kable()

Table 7: Example table of the landscape metrics

buffer	parque	layer	level	class	id	metric	value	cover
b_100	Alameda del Sur	1	class	2	NA	enn_mn	28.00000	arbórea
b_100	Alameda del Sur	1	class	3	NA	enn_mn	79.33333	asfalto
b_100	Alameda del Sur	1	class	4	NA	enn_mn	45.50000	construído
b_100	Alameda del Sur	1	class	7	NA	enn_mn	NA	pasto
b_100	Alameda del Sur	1	class	2	NA	ca	2.82240	arbórea
b_100	Alameda del Sur	1	class	3	NA	ca	0.15680	asfalto