superClass {RStoolbox}    R Documentation

Supervised Classification


Description

Supervised classification in either classification or regression mode, based on vector training data (points or polygons).


Usage

superClass(img, trainData, valData = NULL, responseCol = NULL,
  nSamples = 1000, polygonBasedCV = FALSE, trainPartition = NULL,
  model = "rf", tuneLength = 3, kfold = 5, minDist = 2,
  mode = "classification", predict = TRUE, predType = "raw",
  filename = NULL, verbose, overwrite = TRUE, ...)



Arguments

img: Raster* object. Typically remote sensing imagery, which is to be classified.

trainData: SpatialPolygonsDataFrame or SpatialPointsDataFrame containing the training locations.

valData: SpatialPolygonsDataFrame or SpatialPointsDataFrame containing the validation locations (optional).

responseCol: Character or integer giving the column in trainData which contains the response variable. Can be omitted when trainData has only one column.

nSamples: Integer. Number of samples per land cover class.

polygonBasedCV: Logical. If TRUE, model tuning during cross-validation is conducted on a per-polygon basis. Use this to deal with overfitting issues. Does not affect training data supplied as SpatialPointsDataFrames.

trainPartition: Numeric. Proportion (between zero and one, polygon based) of trainData that goes into the training data set. Ignored if valData is provided.

model: Character. Which model to use. See train for options. Defaults to random forest ('rf'). In addition to the standard caret models, a maximum likelihood classification is available via model = 'mlc'.

tuneLength: Integer. Number of levels for each tuning parameter (see train for details).

kfold: Integer. Number of cross-validation resamples during model tuning.

minDist: Numeric. Minimum distance between training and validation data, e.g. minDist = 1 clips validation polygons to ensure a minimal distance of one pixel (pixel size according to img) to the nearest training polygon. Requires all data to carry valid projection information.

mode: Character. Model type: 'regression' or 'classification'.

predict: Logical. Produce a map (TRUE, default) or only fit and validate the model (FALSE).

predType: Character. Type of the final output raster. Either "raw" for class predictions or "prob" for class probabilities. Class probabilities are not available for all classification models (see predict.train).

filename: Path to output file (optional). If NULL, standard raster handling will apply, i.e. storage either in memory or in the raster temp directory.

verbose: Logical. Print progress and statistics during execution.

overwrite: Logical. Overwrite the spatial prediction raster if it already exists.

...: Further arguments to be passed to train.


Details

superClass performs the following steps:

  1. Ensure non-overlap between training and validation data. This is necessary to avoid biased performance estimates. A minimum distance (minDist) in pixels can be provided to enforce a given distance between training and validation data.

  2. Sample training coordinates. If trainData (and valData, if present) are SpatialPolygonsDataFrames, superClass will calculate the area per polygon and sample nSamples locations per class within these polygons. The number of samples per individual polygon scales with the polygon area, i.e. the bigger the polygon, the more samples.

  3. Split training/validation. If valData was provided (recommended), the samples from these polygons will be held out and used only for validation, not for model fitting. If trainPartition is provided, the training polygons will be divided into training polygons and validation polygons.

  4. Extract raster data. The predictor values on the sample pixels are extracted from img.

  5. Fit the model. The model is fit to the sampled training data with caret::train, including parameter tuning (tuneLength) in kfold cross-validation. polygonBasedCV = TRUE defines cross-validation folds based on polygons (recommended); otherwise folds are defined on a per-pixel basis.

  6. Predict the classes of all pixels in img based on the final model.

  7. Validate the model with the independent validation data.
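
The sampling in step 2 and the polygon-based folds in step 5 can be illustrated with a minimal base R sketch. It is not the superClass implementation: the polys data frame, its columns, and the rounding rule are assumptions made up for illustration, and no RStoolbox or caret functions are called.

```r
# Hypothetical input: one row per training polygon (not an RStoolbox object).
polys <- data.frame(
  id    = 1:6,
  class = c("A", "A", "B", "B", "B", "C"),
  area  = c(10, 30, 5, 5, 10, 20)
)
nSamples <- 100  # samples per class, mirroring the nSamples argument

# Step 2 (sketch): split nSamples within each class in proportion to area.
# Rounding is an assumption; with other areas the class totals may differ
# slightly from nSamples.
polys$share <- polys$area / ave(polys$area, polys$class, FUN = sum)
polys$n     <- round(nSamples * polys$share)

# Step 5 (sketch): assign each polygon to exactly one fold, then let every
# sample inherit the fold of its polygon, so samples from one polygon never
# end up on both sides of a cross-validation split.
kfold <- 3
set.seed(1)
polys$fold <- sample(rep_len(seq_len(kfold), nrow(polys)))
samples <- polys[rep(seq_len(nrow(polys)), polys$n), c("id", "class", "fold")]

# Samples per class and folds per polygon
print(tapply(polys$n, polys$class, sum))
print(table(polygon = samples$id, fold = samples$fold))
```

Grouping folds by polygon keeps neighbouring, highly similar pixels together, which is why it tends to give less optimistic tuning results than per-pixel folds.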


Value

A list containing [[1]] the model, [[2]] the predicted raster, and [[3]] the class mapping.

See Also



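Examples

library(RStoolbox)
library(raster)
data(rlogo)  # example raster used below; assumed to ship with this RStoolbox version
train <- readRDS(system.file("external/trainingPoints.rds", package="RStoolbox"))

## Plot training data
olpar <- par(no.readonly = TRUE) # back-up par
colors <- c("yellow", "green", "deeppink")
plotRGB(rlogo)  # draw the image first so the points have a plot to be added to
plot(train, add = TRUE, col = colors[train$class], pch = 19)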

## Fit classifier (splitting training into 70% training data, 30% validation data)
SC <- superClass(rlogo, trainData = train, responseCol = "class",
                 model = "rf", tuneLength = 1, trainPartition = 0.7)
SC  # print the validation results and the map summary
#> superClass results
#> ************ Validation **************
#> $validation
#> Confusion Matrix and Statistics
#>           Reference
#> Prediction A B C
#>          A 3 0 0
#>          B 0 3 0
#>          C 0 0 3
#> Overall Statistics
#>                Accuracy : 1          
#>                  95% CI : (0.6637, 1)
#>     No Information Rate : 0.3333     
#>     P-Value [Acc > NIR] : 5.081e-05  
#>                   Kappa : 1          
#>  Mcnemar's Test P-Value : NA         
#> Statistics by Class:
#>                      Class: A Class: B Class: C
#> Sensitivity            1.0000   1.0000   1.0000
#> Specificity            1.0000   1.0000   1.0000
#> Pos Pred Value         1.0000   1.0000   1.0000
#> Neg Pred Value         1.0000   1.0000   1.0000
#> Prevalence             0.3333   0.3333   0.3333
#> Detection Rate         0.3333   0.3333   0.3333
#> Detection Prevalence   0.3333   0.3333   0.3333
#> Balanced Accuracy      1.0000   1.0000   1.0000
#> *************** Map ******************
#> $map
#> class       : RasterLayer 
#> dimensions  : 77, 101, 7777  (nrow, ncol, ncell)
#> resolution  : 1, 1  (x, y)
#> extent      : 0, 101, 0, 77  (xmin, xmax, ymin, ymax)
#> coord. ref. : +proj=merc +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
#> data source : in memory
#> names       : class 
#> values      : 1, 3  (min, max)
#> attributes  :
#>  ID value
#>   1     A
#>   2     B
#>   3     C
## Plots
plot(SC$map, col = colors, legend = FALSE, axes = FALSE, box = FALSE)
legend(1, 1, legend = levels(train$class), fill = colors, title = "Classes",
       horiz = TRUE, bty = "n")

par(olpar) # reset par
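
The overall statistics in the validation output above can be recomputed by hand, which helps when reading them. The following base R sketch copies the printed confusion matrix and assumes that the interval and the p-value come from exact binomial tests; the variable names are illustrative only.

```r
# Confusion matrix copied from the printed validation output
cm <- matrix(c(3, 0, 0,
               0, 3, 0,
               0, 0, 3),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = c("A", "B", "C"),
                             Reference  = c("A", "B", "C")))

n        <- sum(cm)        # number of validation samples
correct  <- sum(diag(cm))  # correctly classified samples
accuracy <- correct / n

# No information rate: share of the largest reference class
nir <- max(colSums(cm)) / n

# Exact binomial 95% confidence interval for the accuracy
ci <- binom.test(correct, n)$conf.int

# One-sided exact test of accuracy > no information rate
p_value <- binom.test(correct, n, p = nir, alternative = "greater")$p.value

# Cohen's kappa: agreement beyond what the marginals alone would give
expected  <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_hat <- (accuracy - expected) / (1 - expected)

print(c(accuracy = accuracy, lower = ci[1], upper = ci[2],
        nir = nir, p_value = p_value, kappa = kappa_hat))
```

With only nine validation samples the lower confidence bound stays near 0.66 even though every sample is classified correctly, so the perfect accuracy should be read with that sample size in mind.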

[Package RStoolbox version 0.2.4 Index]