Landslide susceptibility estimation by random forests technique: sensitivity and scaling issues
Despite the large number of recent advances and developments in landslide susceptibility mapping (LSM), there is still a lack of studies focusing on specific aspects of LSM model sensitivity. For example, the influence of factors such as the survey scale of the landslide conditioning variables (LCVs), the resolution of the mapping unit (MUR), and the optimal number and ranking of LCVs has never been investigated analytically, especially on large data sets.
In this paper we attempt this experimentation, concentrating on the impact of model tuning choices on the final result rather than on the comparison of methodologies. To this end, we adopt a simple implementation of the random forest (RF), a machine learning technique, to produce an ensemble of landslide susceptibility maps for a set of different model settings, input data types and scales. A random forest is an ensemble of decision trees that relates a set of predictors to actual landslide occurrence. Because it is a nonparametric model, it can incorporate a range of numerical or categorical data layers, and there is no need to select unimodal training data, as is required, for example, in linear discriminant analysis. We take into account many widely acknowledged landslide predisposing factors, mainly related to lithology, land use, geomorphology, and structural and anthropogenic constraints. In addition, for each factor we include in the predictor set a measure of its standard deviation (for numerical variables) or its variety (for categorical ones) over the mapping unit.
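The per-unit predictors described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each raster cell carries a mapping-unit label, summarises a numerical variable by its mean and population standard deviation over the unit, and summarises a categorical variable by its most frequent class and its "variety", here assumed to mean the number of distinct classes present in the unit. The function name `aggregate_by_unit` and the output keys are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev


def aggregate_by_unit(unit_ids, numeric, categorical):
    """Summarise cell-level predictors over each mapping unit (hypothetical helper).

    unit_ids    -- mapping-unit label of each raster cell
    numeric     -- numerical predictor value of each cell (e.g. slope)
    categorical -- categorical predictor value of each cell (e.g. lithology)
    Returns {unit: {"num_mean", "num_std", "cat_mode", "cat_variety"}}.
    """
    if not (len(unit_ids) == len(numeric) == len(categorical)):
        raise ValueError("all inputs must have one value per cell")

    # Group the cell values by the mapping unit they fall in.
    num_by_unit = defaultdict(list)
    cat_by_unit = defaultdict(list)
    for unit, value, cls in zip(unit_ids, numeric, categorical):
        num_by_unit[unit].append(value)
        cat_by_unit[unit].append(cls)

    features = {}
    for unit, values in num_by_unit.items():
        classes = Counter(cat_by_unit[unit])
        features[unit] = {
            "num_mean": mean(values),
            # Population standard deviation of the numerical variable in the unit.
            "num_std": pstdev(values),
            # Most frequent class of the categorical variable in the unit.
            "cat_mode": classes.most_common(1)[0][0],
            # Assumed definition of "variety": number of distinct classes.
            "cat_variety": len(classes),
        }
    return features


# Usage: two mapping units, slope (degrees) and lithology per cell.
units = ["A", "A", "A", "B", "B"]
slope = [10.0, 20.0, 30.0, 5.0, 5.0]
lithology = ["clay", "clay", "sand", "marl", "marl"]
table = aggregate_by_unit(units, slope, lithology)
```

Unit A gets a slope mean of 20 with a nonzero spread and a variety of 2, while the homogeneous unit B gets zero spread and a variety of 1; these extra columns are what would be appended to the predictor set alongside the unit-level values.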
As with some other machine learning methods, RF makes it possible to estimate the relative importance of the individual input parameters and to select the optimal configuration of the classification model. The model is first applied using the complete set of input variables; then an iterative process is implemented in which progressively smaller subsets of the parameter space are considered. We also examine the impact of the scale and accuracy of the input variables, as well as the effect of the random component of the RF model, on the susceptibility results. The model is tested in the Arno River basin (central Italy). We find that the dimension of the parameter space, the mapping unit (scale) and the training process strongly influence the classification accuracy and the prediction process.
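The iterative reduction of the parameter space can be sketched as a backward elimination loop. This is an illustrative sketch under stated assumptions, not the procedure used in the paper: at each step the model is retrained with several random seeds (to expose the random component of RF), importances are summed over the seeds, and the least important predictor is dropped. The training step is a hypothetical callback `train(predictors, seed) -> (accuracy, {name: importance})`; the names `backward_elimination` and `best_subset` are also hypothetical.

```python
from statistics import mean, pstdev


def backward_elimination(predictors, train, seeds=(0, 1, 2, 3, 4), min_size=1):
    """Iteratively drop the least important predictor (hypothetical helper).

    train(subset, seed) is an assumed callback returning
    (accuracy, {predictor_name: importance}) for one model run.
    Returns a list of (subset, mean_accuracy, std_accuracy),
    from the full predictor set down to min_size predictors.
    """
    current = list(predictors)
    history = []
    while len(current) >= min_size:
        accuracies = []
        total_importance = dict.fromkeys(current, 0.0)
        # Repeat training with different seeds to measure run-to-run variability.
        for seed in seeds:
            accuracy, importance = train(current, seed)
            accuracies.append(accuracy)
            for name in current:
                total_importance[name] += importance.get(name, 0.0)
        history.append((tuple(current), mean(accuracies), pstdev(accuracies)))
        if len(current) == min_size:
            break
        # Remove the predictor with the lowest importance summed over seeds.
        weakest = min(current, key=lambda name: total_importance[name])
        current.remove(weakest)
    return history


def best_subset(history):
    """Pick the subset with the highest mean accuracy."""
    return max(history, key=lambda step: step[1])


# Usage with a toy, deterministic stand-in for a real RF training run.
WEIGHTS = {"slope": 0.2, "lithology": 0.1, "land_use": 0.05, "noise": 0.0}


def toy_train(subset, seed):
    # An uninformative "noise" layer lowers accuracy; the seed adds small jitter.
    accuracy = 0.6 + sum(WEIGHTS[n] for n in subset)
    accuracy -= 0.05 if "noise" in subset else 0.0
    accuracy += 0.001 * seed
    return accuracy, {n: WEIGHTS[n] for n in subset}


history = backward_elimination(["slope", "lithology", "land_use", "noise"], toy_train)
chosen = best_subset(history)
```

In practice the callback might wrap scikit-learn's `RandomForestClassifier(random_state=seed, oob_score=True)`, returning its `oob_score_` and `feature_importances_`; impurity-based importance is only one possible ranking criterion, and permutation importance is a common alternative. With the toy trainer, removing the uninformative layer first raises accuracy, and the three-predictor subset is selected.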
This, in turn, implies that a careful sensitivity analysis using both traditional and new tools should always be performed before producing final susceptibility maps at any level and scale.