Open Source Software For Land Cover Mapping From Remote Sensing Data

Pieter Kempeneers (VITO)

11:30 on Friday 20th September (in Session 33, starting at 11:30 a.m., EMCC: Room 4)


Description: A case study on image classification using free and open source software as part of the IEEE IGARSS 2013 data fusion contest.
Abstract:

Open source software is well established for basic raster and vector data processing, with the Geospatial Data Abstraction Library (GDAL) as one of the best known tools. Its utilities and application programming interface (API) have become a common standard for data format conversion, reprojection, and spatial and spectral subsetting. With its command line utilities, GDAL is better suited than most of its commercial counterparts for the automatic processing of very large amounts of data and for repetitive processing tasks. Though GDAL provides an excellent API on which more advanced image processing tasks can be built, not all users have the time or programming skills to get involved in such development. In particular, there is great interest within the remote sensing user community in machine learning techniques applied to remote sensing data. A typical example is the automatic classification of satellite imagery, an area that has long been reserved for commercial software (IDL/ENVI, Erdas Imagine, eCognition, ArcGIS). One of the exceptions is GRASS, which has attracted a lot of interest. It integrates R functionality, which offers a variety of statistical tools. Unfortunately, R is limited by its memory allocation and its lack of performance on large data sets. However, GRASS also offers native functionality that is much better suited for image processing. As an open source Geographical Information System (GIS), it combines image processing, visualization and geospatial modeling into a single integrated development environment, providing its own file management and data structures. More recently, two suites of open source software tools have been developed that combine the power and simplicity of the GDAL command line interface with more advanced, state-of-the-art image processing techniques.
The first is the Orfeo toolbox (http://www.orfeo-toolbox.org), released by the French Centre National d'Etudes Spatiales (CNES) under a free software license (CeCILL). It is based on the medical image processing library ITK and offers both a graphical and a command line user interface. The second is pktools (http://pktools.nongnu.org), released under the GNU General Public License v3. It uses the GDAL API and is available with a command line interface under Linux. Both tools are developed in C++ and are designed for high performance and large data processing. This study focuses on a typical land cover/land use classification problem solved with the Orfeo toolbox and pktools. In particular, the case study deals with the 2013 Data Fusion Contest, organized by the Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society (GRSS). The contest is open to everyone, with the goal of evaluating existing methodologies at the research or operational level to solve remote sensing problems using data from a variety of sensors. The final results of the contest will be announced at the 2013 IEEE International Geoscience and Remote Sensing Symposium in Melbourne, Australia, in July 2013. The current contest involves two datasets: a hyperspectral image and a LiDAR-derived Digital Surface Model (DSM), co-registered and at the same spatial resolution (2.5 m). They were acquired over the University of Houston campus and the neighboring urban area. A particularity of hyperspectral imagery is its high dimensionality: for the current data set, a total of 144 spectral bands were acquired in the 380 nm to 1050 nm region of the electromagnetic spectrum. For the contest, 15 pre-defined classes must be distinguished. The class labels as well as a training set have been made available by the organizing committee.
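As a rough illustration of fusing the two co-registered datasets, the sketch below stacks the spectral bands and the DSM into one feature vector per labelled pixel (144 + 1 = 145 features for the contest data). It uses plain Python lists so that it is self-contained; the function names and the dictionary format used for the training labels are hypothetical and are not the Orfeo toolbox or pktools interface. On disk, a comparable layer stack is often built with GDAL's `gdalbuildvrt -separate`.

```python
from typing import Dict, List, Sequence, Tuple

# A single-band raster as a row-major grid of pixel values (rows x cols).
Band = List[List[float]]


def stack_layers(hyperspectral: Sequence[Band], dsm: Band) -> List[Band]:
    """Append the DSM as one extra layer after the spectral bands.

    Assumes all layers are already co-registered on the same pixel grid,
    as stated for the contest data; only the grid size is checked here.
    """
    rows, cols = len(dsm), len(dsm[0])
    for layer in list(hyperspectral) + [dsm]:
        if len(layer) != rows or any(len(row) != cols for row in layer):
            raise ValueError("all layers must share the same rows x cols grid")
    return list(hyperspectral) + [dsm]


def extract_samples(
    stack: Sequence[Band], labels: Dict[Tuple[int, int], int]
) -> Tuple[List[List[float]], List[int]]:
    """Build one feature vector per labelled pixel.

    `labels` is a hypothetical mapping (row, col) -> class id standing in
    for the contest training set; features are ordered layer by layer.
    """
    features, classes = [], []
    for (r, c), class_id in sorted(labels.items()):
        features.append([layer[r][c] for layer in stack])
        classes.append(class_id)
    return features, classes


if __name__ == "__main__":
    # Two tiny 2 x 2 "spectral bands" plus a DSM layer (heights in metres).
    bands = [[[0.1, 0.2], [0.3, 0.4]], [[0.5, 0.6], [0.7, 0.8]]]
    dsm = [[10.0, 12.0], [30.0, 11.0]]
    X, y = extract_samples(stack_layers(bands, dsm), {(0, 1): 3, (1, 0): 7})
    print(X, y)  # [[0.2, 0.6, 12.0], [0.3, 0.7, 30.0]] [3, 7]
```

Because reflectance values and DSM heights span very different numeric ranges, such features are normally rescaled (for example to [0, 1], or to zero mean and unit variance) before they are passed to a margin-based classifier such as an SVM.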
The abundance of spectral information in the hyperspectral image has the potential to differentiate land cover classes with similar spectral characteristics that cannot be distinguished with traditional sensors, which capture only a few spectral bands in the visible range of the electromagnetic spectrum. For example, the contest includes three grass classes that have to be mapped: healthy grass, stressed (unhealthy) grass and synthetic grass. On the other hand, the high dimensionality of the data makes the classification task challenging due to the Hughes phenomenon. Also known as the curse of dimensionality within the machine learning community, this phenomenon typically occurs in classification problems where the training data are limited with respect to the dimensionality of the input data. However, some state-of-the-art classification methods, such as support vector machines (SVM), have been shown to be more robust to this type of problem than others. Both the Orfeo toolbox and pktools implement SVM, and both were applied in this contest. Another challenge in this contest is that some of the classes are related to land use rather than land cover. For example, the classes include two types of parking lots, roads must be distinguished from highways, and residential from commercial areas. These classes can have identical spectral characteristics, which makes them difficult to classify with spectral information only. This is where the LiDAR-derived DSM can provide valuable additional information to make a better distinction. In addition, the Orfeo toolbox includes powerful feature extraction methods, including Haralick and structural feature set (SFS) textures, that can provide spatial contextual information to the classifier. Although the results of the contest have not been revealed yet, the obtained result looks promising and can be considered a potential winning candidate.
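To make the SVM idea concrete, here is a minimal sketch of a binary linear SVM trained with the Pegasos stochastic sub-gradient method. This is a swapped-in technique chosen because it fits in a few lines; it is not the SVM implementation used by the Orfeo toolbox or pktools. One intuition for the robustness mentioned above is that the regularisation term penalises large weights, so the classifier seeks a wide margin rather than fitting every one of the many input dimensions. The function names and default parameters (`lam`, `epochs`, `seed`) are illustrative assumptions.

```python
import random
from typing import List, Sequence


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 50,
    seed: int = 0,
) -> List[float]:
    """Train a binary linear SVM (labels -1/+1) with Pegasos.

    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * w.x)) by stochastic
    sub-gradient steps with step size 1 / (lam * t). A constant 1.0 feature
    is appended so the last weight acts as a (regularised) bias.
    """
    if set(y) - {-1, 1}:
        raise ValueError("labels must be -1 or +1")
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    rng = random.Random(seed)
    order = list(range(len(data)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            # Is the sample inside the margin (hinge loss > 0) for current w?
            violates = y[i] * _dot(w, data[i]) < 1.0
            # Gradient step on the regulariser shrinks every weight.
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violates:
                # Hinge-loss sub-gradient step towards the violating sample.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if _dot(w, list(x) + [1.0]) >= 0.0 else -1


if __name__ == "__main__":
    X = [[2, 2], [3, 2], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])  # [1, 1, 1, -1, -1, -1]
```

A multi-class problem such as the 15 contest classes is usually handled by combining binary classifiers (for example one-vs-one or one-vs-rest), and libraries such as LIBSVM additionally offer non-linear kernels (e.g. RBF); this linear sketch does neither.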
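Haralick textures are statistics of a grey-level co-occurrence matrix (GLCM), which counts how often pairs of grey levels occur at a fixed pixel offset. The sketch below computes a symmetric, normalised GLCM for one window and four of Haralick's statistics. It is an illustration only, not the Orfeo toolbox implementation; the offset, the number of grey levels and the chosen statistics are assumptions. In practice a texture image is obtained by evaluating such statistics in a sliding window around every pixel, usually after quantising the whole band with one global range so that windows remain comparable.

```python
import math
from typing import Dict, List, Sequence


def quantize(values: Sequence[Sequence[float]], levels: int = 8) -> List[List[int]]:
    """Linearly rescale pixel values to integer grey levels 0..levels-1."""
    lo = min(min(row) for row in values)
    hi = max(max(row) for row in values)
    if hi == lo:
        return [[0] * len(row) for row in values]
    scale = levels / (hi - lo)
    return [[min(int((v - lo) * scale), levels - 1) for v in row] for row in values]


def glcm(grey: Sequence[Sequence[int]], dx: int = 1, dy: int = 0,
         levels: int = 8) -> List[List[float]]:
    """Symmetric, normalised grey-level co-occurrence matrix for offset (dx, dy)."""
    rows, cols = len(grey), len(grey[0])
    counts = [[0.0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = grey[r][c], grey[r2][c2]
                counts[i][j] += 1.0
                counts[j][i] += 1.0  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts] if total else counts


def haralick(p: Sequence[Sequence[float]]) -> Dict[str, float]:
    """A few of Haralick's statistics computed from a normalised GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        # Angular second moment, sometimes reported as "energy".
        "angular_second_moment": sum(v * v for _, _, v in cells),
        # Inverse difference moment, also called homogeneity.
        "inverse_difference_moment": sum(v / (1.0 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }


if __name__ == "__main__":
    # A 2 x 2 checkerboard: every horizontal neighbour pair differs by one level.
    print(haralick(glcm([[0, 1], [1, 0]], levels=2)))
```

For the checkerboard the contrast is 1 and the entropy is ln 2, whereas a uniform window gives zero contrast and zero entropy; this difference in spatial arrangement is the kind of contextual information that spectral values alone cannot provide.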
In the end, the winning solution depends on a number of factors, such as the time available to the participants to fine-tune their methods, the amount of manual interaction introduced (for instance, expert knowledge of the area) and additional training data obtained from extra sources (e.g., Google Earth). The solution presented here introduces no additional information and requires no manual interaction. It aims for a generic and fully automatic process that can be applied to other classification problems. It also shows that this challenging classification task can be performed using free and open source software only. All steps used for this solution will be presented, showing some of the potential of the Orfeo toolbox and pktools.