On the Importance of Train-Test Split Ratio of Datasets in Automatic Landslide Detection by Supervised Classification
Kamila Pawłuszek-Filipiak, Andrzej Borkowski
Abstract
Many automatic landslide detection algorithms are based on supervised classification of various remote sensing (RS) data, particularly satellite images and digital elevation models (DEMs) derived from Light Detection and Ranging (LiDAR). Machine learning methods require the collection of both training and testing data to produce and evaluate the classification results. Collecting good-quality landslide ground truth to train classifiers and detect landslides in other regions is a challenge, and it has a significant impact on classification accuracy. Taking this into account, the following research question arises: What is the appropriate training–testing dataset split ratio in supervised classification to effectively detect landslides in a testing area based on DEMs? We investigated this issue for both the pixel-based approach (PBA) and object-based image analysis (OBIA). In both approaches, random forest (RF) classification was used. The experiments were performed in the most landslide-affected area in Poland, in the Outer Carpathians in the vicinity of Rożnów Lake. Based on the accuracy assessment, we found that the training area should be of a similar size to the testing area. We also found that OBIA performs slightly better than PBA when the number of training samples is significantly lower than the number of testing samples. To increase detection performance, the OBIA and PBA results were intersected, median filtering was applied, and small elongated objects were removed. This yielded an overall accuracy (OA) of 80% and an F1 score of 0.50. The results are compared with those of other landslide detection studies and discussed.
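The post-processing and accuracy measures named in the abstract can be illustrated with a minimal sketch. It assumes the PBA and OBIA outputs are binary masks of equal size (1 = landslide), uses a 3x3 median window with replicated borders (the window size is an assumption, not a value from the abstract), and omits both the RF classification and the removal of small elongated objects. All function names and the toy masks are hypothetical; the numbers are made up for illustration and are not the study's data.

```python
from statistics import median


def intersect(mask_a, mask_b):
    """Pixel-wise logical AND of two binary masks (lists of rows of 0/1)."""
    return [[a & b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]


def median_filter3(mask):
    """3x3 median filter; border pixels are handled by replicating the edge."""
    rows, cols = len(mask), len(mask[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                mask[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
            ]
            # Nine values, so the median is always one of the window entries.
            out_row.append(int(median(window)))
        out.append(out_row)
    return out


def overall_accuracy_and_f1(pred, truth):
    """Return (OA, F1) for the landslide class from two binary masks."""
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    oa = (tp + tn) / (tp + fp + fn + tn)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return oa, f1


if __name__ == "__main__":
    # Toy 2x5 masks (hypothetical): 1 TP, 1 FP, 1 FN, 7 TN.
    truth = [[1, 1, 0, 0, 0],
             [0, 0, 0, 0, 0]]
    pred = [[1, 0, 1, 0, 0],
            [0, 0, 0, 0, 0]]
    print(overall_accuracy_and_f1(pred, truth))
```

In the toy case most pixels are non-landslide, so the many true negatives keep OA high even though half of the predicted landslide pixels are wrong. This is one reason an F1 score is usually reported alongside OA for rare classes such as landslides.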
|Journal series||Remote Sensing, ISSN 2072-4292, (N/A 100 points)|
|Publication size in sheets||1.55|
|Keywords in English||automatic landslide detection; OBIA; PBA; random forests; supervised classification|
|License||Journal (articles only); published final; ; with publication|
|Score||= 100.0, 13-04-2021, ArticleFromJournal|
|Publication indicators||= 0; = 0; : 2017 = 1.559; : 2018 = 4.118 (2) - 2018=4.74 (5)|
|Citation count*||1 (2021-05-10)|
* The citation count shown is obtained by analyzing information available on the Internet and is close to the number calculated by the Publish or Perish system.