The Importance of Forests
Forests are one of humankind’s most valuable resources. They cover roughly 30% of the Earth’s land surface, providing habitat for wildlife, wood for buildings and fuel, and, most importantly, 20–50% of the Earth’s oxygen (Carlowicz, 2012). Governments around the world have recognized the importance of forests and have created institutions to manage them. In the United States, the US Forest Service is charged with this task, but the US government also funds private organizations to monitor environmental health, including the health of forests. The National Ecological Observatory Network (NEON) is one such organization. NEON, funded by the National Science Foundation (NSF), uses a variety of methods to collect both aquatic and land-based environmental data.
Forest Inventory
The systematic process of collecting data on forests is called forest inventory. This data can include species distribution, species abundance, tree geolocations, canopy area, height, and tree health, among other things. Soil and tree samples are also collected as part of these inventories. Some data is collected by surveyors on the ground while other data is collected from airborne platforms such as planes or satellites. Most inventories use a combination of these methods.
NEON monitors forests across the continental US, Alaska, and Puerto Rico. NEON divides its area of responsibility into regions called domains (see Figure 1). There are 16 domains throughout the US and Puerto Rico, and each domain contains a NEON observation site. As part of its forest inventory process, NEON collects data on forests within each observation site. NEON flies planes over each site once a year to collect hyperspectral, RGB, and LiDAR data (Kampe et al., 2010). To ensure consistency within the data, these annual flights are scheduled when forests should be at their maximum greenness. These data products are publicly available on the NEON website.
Collecting data manually is a slow process, and NEON sites cover thousands of hectares. Ideally, a forest inventory would include every tree and cover every square meter of land on an observation site, but time and manpower constraints limit the granularity of the inventory process. Thus, forest health is estimated by sampling a subset of the forest area. Airborne platforms, however, can collect image and LiDAR data over many hectares in a single flyover. The NEON airborne observation platforms (AOPs) collect terabytes of data from each site. While the precision of the data varies with the instrument used, the data cover the entire forested area. The challenge then becomes how to leverage airborne data to improve the inventory process.
Tree Species Classification
One straightforward application of airborne hyperspectral data is tree species identification. A normal photograph captures light reflected from a scene that falls within the visible spectrum; for human beings, this is light with wavelengths between 380 and 700 nanometers. Hyperspectral images capture a much larger swath of the electromagnetic spectrum. NEON’s hyperspectral cameras capture light with wavelengths between 380 nm and 2510 nm at a resolution of 1 square meter per pixel. These wavelengths are separated into 426 bands, each roughly 5 nm wide, and each band can be thought of as one color channel. Just as regular photographs are commonly decomposed into red, green, and blue color channels, NEON’s hyperspectral images are decomposed into 426 color channels. Using only 16 bits of color depth per band, the number of “colors” that can be represented is on the order of 10^2051. In practice, this space is reduced by removing channels that are more strongly influenced by atmospheric factors. Nevertheless, with such a huge color space, it is not hard to imagine that tree species signatures are distinct enough to be identified from their hyperspectral data. This process is further complicated, however, by forest canopy structure. Many forests have closed canopies, meaning the crowns of individual trees become confluent. In order to make use of hyperspectral data, individual tree crowns must first be separated in the images. Therefore, the problem of species identification from airborne data is a two-part problem:
1) Crown delineation
2) Species identification.
Both problems are active areas of research. A third problem that arises in attempting to resolve the first two is aligning remote sensing data with data collected on the ground. From an aerial perspective in closed-canopy forests, the location of the tree stem is not visible. Though the AOP remote sensing data assign geospatial coordinates to image pixels, canopy centroids do not always align with tree stems, so it can be unclear which stems correspond to which canopies. This problem is particularly important in the creation of datasets, as tree species are most accurately identified from traits observed in ground-collected data.
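To put the size of the hyperspectral color space described above into perspective, here is a quick back-of-the-envelope calculation in Python; the band count and bit depth are the figures quoted above.

```python
import math

bands = 426          # NEON hyperspectral bands
bits_per_band = 16   # bits of color depth per band

# Number of distinct spectra = 2^(bands * bits_per_band); report its base-10 exponent.
log10_colors = bands * bits_per_band * math.log10(2)
print(f"approximately 10^{log10_colors:.1f} representable spectra")  # ~10^2051.8
```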
Data Science Challenge
The University of Florida’s Weecology lab, in collaboration with the data science lab (DSR), hosted the IDTrees data science competition to crowdsource potential solutions to these problems. The 2017 competition used NEON remote sensing data collected from the Ordway-Swisher Biological Station (OSBS) in Florida, in addition to field data. It included hyperspectral, high-resolution RGB, and LiDAR data covering 8 species of trees, as well as ground-based measurements of tree size, location, and type. Teams competed in three tasks: individual tree crown delineation, crown alignment with field data, and species identification. Teams could participate in any combination of tasks.
The Data
OSBS NEON AOP data from 2014 and 2015 was combined with data collected in the field in 2017. The final dataset consisted of LiDAR point cloud data, a canopy height model, hyperspectral reflectance data, and high-resolution RGB photographs. The spatial resolution of each modality is given in the table below. The species classification data consisted of 6831 pixels from 452 tree crowns. The species used were Quercus laevis, Pinus palustris, Acer rubrum, Liquidambar styraciflua, Quercus geminata, Pinus taeda, Quercus nigra, and Pinus elliottii. An “other” category was also included. The dataset is highly imbalanced; Figure 2 gives the frequency of each species within the dataset. Examples of each raster are shown in Figure 3.
Data Product | Description | Spatial Resolution | Data Format |
---|---|---|---|
RGB photographs | Raster data from visible spectrum | 100 cm^2 | GeoTiff |
LiDAR point cloud | Height of surface features on the ground given in rectangular coordinates | 6 points per m^2 | .las |
LiDAR canopy height model | Raster data with the height of the canopy vegetation | 1 m^2 | GeoTiff |
Hyperspectral surface reflectance | Raster data of surface reflected light in 380 - 2510 nm wavelength | 1 m^2 | GeoTiff |
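As an illustration of how these raster products can be read, here is a minimal sketch using the rasterio library; the file names are hypothetical placeholders rather than the competition’s actual paths.

```python
import rasterio

# Hypothetical file names standing in for the competition's GeoTiff products.
with rasterio.open("OSBS_hyperspectral_plot01.tif") as src:
    reflectance = src.read()        # array of shape (bands, rows, cols)
    print(src.count, "bands,", src.res, "m pixel size,", src.crs)

with rasterio.open("OSBS_chm_plot01.tif") as src:
    canopy_height = src.read(1)     # single-band canopy height model, in meters
```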
Metrics
The Jaccard coefficient was used to measure the performance of the submitted segmentation algorithms. Given two areas, A and B, the Jaccard coefficient is given by

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|} \tag{1}$$

Here A is the ground truth area and B is the estimated area. If there is no overlap between the two, a score of zero is returned; if the two areas overlap perfectly, a score of 1 is returned.
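A minimal sketch of this metric, assuming the ground truth and estimated crowns are represented as rasterized boolean masks (the competition itself scored delineated crown polygons):

```python
import numpy as np

def jaccard(truth: np.ndarray, estimate: np.ndarray) -> float:
    """Jaccard coefficient (Eq. 1) between two boolean crown masks of equal shape."""
    intersection = np.logical_and(truth, estimate).sum()
    union = np.logical_or(truth, estimate).sum()
    return float(intersection / union) if union > 0 else 0.0  # no crown in either mask
```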
The goal of the alignment task was to align field data with remote sensing data. In the field data the locations of the stems are mapped, while the remote sensing data captures mostly canopy information. The alignment algorithm must match each crown with exactly one stem. Performance on this task was measured using the trace of the prediction matrix,

$$\text{score} = \frac{1}{N}\,\operatorname{tr}(M) \tag{2}$$

where M is the submitted prediction matrix, whose diagonal entries hold the probability assigned to each correct crown-stem pair, and N is the number of crowns; a perfect alignment therefore scores 1.
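A small sketch of this score, under the assumption stated above that the submitted matrix is arranged with the correct crown-stem pairs on the diagonal (an assumption about the matrix layout, not the official scoring code):

```python
import numpy as np

def alignment_score(M: np.ndarray) -> float:
    """Mean of the diagonal of the prediction matrix (Eq. 2).

    Assumes row i of M holds the probabilities that crown i is paired with each
    candidate stem, ordered so that the correct stem sits on the diagonal.
    """
    return float(np.trace(M) / M.shape[0])
```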
The species classification task required competitors to classify each pixel based on hyperspectral, RGB, or LiDAR data, or some combination of the three. Species classification performance was measured using two metrics: categorical cross entropy and rank-1 accuracy.
Categorical cross entropy is given by

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} y_{i,c}\,\log\left(p_{i,c}\right) \tag{3}$$

where N is the number of pixels, C is the number of classes, y_{i,c} is 1 if pixel i belongs to species c and 0 otherwise, and p_{i,c} is the predicted probability that pixel i belongs to species c.
Rank-1 accuracy is given by

$$\text{accuracy} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\left[\operatorname{arg\,max}_{c}\, p_{i,c} = y_i\right] \tag{4}$$

that is, the fraction of pixels whose highest-probability predicted class matches the true class y_i.
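Both metrics are straightforward to compute from one-hot labels and predicted class probabilities. The sketch below is illustrative rather than the competition’s official evaluation code:

```python
import numpy as np

def cross_entropy(y_onehot: np.ndarray, p_pred: np.ndarray) -> float:
    """Categorical cross entropy (Eq. 3) averaged over pixels."""
    eps = 1e-12  # avoid log(0) when a predicted probability is exactly zero
    return float(-np.mean(np.sum(y_onehot * np.log(p_pred + eps), axis=1)))

def rank1_accuracy(y_onehot: np.ndarray, p_pred: np.ndarray) -> float:
    """Rank-1 accuracy (Eq. 4): share of pixels whose top class is correct."""
    return float(np.mean(np.argmax(p_pred, axis=1) == np.argmax(y_onehot, axis=1)))
```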
Models Submitted
Six teams participated in the competition. Table 2 lists each team’s submission, with a brief description of the model used and a reference to the paper containing a detailed discussion of its implementation.
Group | Method | Reference |
---|---|---|
Task 1 | ||
FEM | itcSegment | Dalponte, Frizzera & Gianelle (2018) |
Shawn | Watershed based on CHM and NDVI | Taylor (2018) |
Conor | Watershed based on CHM and NDVI | McMahon (2018) |
Task 2 | ||
FEM | Euclidean distance of spatial coordinates, height, and crown radius | Dalponte, Frizzera & Gianelle (2018) |
Conor | RMS minimization of relations between geographic coordinates and estimated geographic coordinates | McMahon (2018) |
Task 3 | ||
FEM | Support vector machine | Dalponte, Frizzera & Gianelle (2018) |
Conor | Ensemble of maximum likelihood classifiers | McMahon (2018) |
Stnfd.CCB | Ensemble of random forest and gradient boosting | Anderson (2018) |
GtrSns | Multi-instance adaptive cosine estimator | Zou, Gader & Zare (2018) |
BRG | Multilayer perceptron | Sumsion et al. (2018) |
Results
The results of the competition are summarized in the tables below. The crown delineation task was the most difficult and produced the worst results. Crown delineation is akin to image segmentation, which is an inherently difficult problem and remains an active area of research. A more in-depth discussion of the results is available in Marconi et al. (2019).
In the alignment task, the FEM team achieved a perfect score. This is probably because the dataset was made artificially simple: only crowns with matching associated stems were included in the task (Marconi et al., 2019).
The best performing model in the species classification task was the random forest and gradient boosting ensemble. This is surprising considering that other entries were built on nominally more powerful models, such as multilayer perceptrons (MLPs). However, research in remote-sensing-based species classification suggests that one of the biggest pitfalls in the use of hyperspectral data is overfitting, a trap that models like MLPs fall into easily, while decision-tree-based methods are more resilient (Fassnacht et al., 2016). The problem of overfitting can be exacerbated by a small dataset, as was the case in this competition. Furthermore, only a fraction of the variance in hyperspectral signatures is due to differences in species traits, with the remainder coming from environmental variables, equipment parameters, noise, and other factors; dimensionality reduction techniques such as PCA or ICA therefore become important, as they help home in on the relevant features. Prudent use of PCA in combination with decision-tree-based methods may explain why this model performed so well.
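To make the idea concrete, below is a minimal sketch of the general “dimensionality reduction plus tree ensemble” recipe using scikit-learn. It is not the CCB-ID implementation (see Anderson, 2018); the file names and parameter choices are illustrative assumptions.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# X: (n_pixels, n_bands) hyperspectral reflectance; y: integer species labels.
# The .npy file names are hypothetical placeholders.
X = np.load("train_reflectance.npy")
y = np.load("train_species.npy")

model = make_pipeline(
    StandardScaler(),               # put all bands on a comparable scale
    PCA(n_components=30),           # compress hundreds of correlated bands
    GradientBoostingClassifier(),   # tree-based learner, relatively resistant to overfitting
)
print(cross_val_score(model, X, y, cv=5).mean())
```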
Task 1: Crown Delineation
Participant | Model | Score |
---|---|---|
FEM | itcSegment | 0.3402 |
Conor | Watershed based on CHM and NDVI | 0.184 |
Shawn | Watershed based on CHM and NDVI | 0.0555 |
Task 2: Alignment
Participant | Model | Score |
---|---|---|
FEM | Euclidean distance of spatial coordinates, height, and crown radius | 1 |
Conor | RMS minimization of relations between geographic coordinates and estimated crown diameter | 0.48 |
Task 3: Species Classification
Participant | Model | Cross Entropy | Rank-1 Accuracy |
---|---|---|---|
StanfordCCB | Ensemble of random forest and gradient boosting | 0.4465 | 0.9194 |
FEM | SVM | 0.8769 | 0.88 |
GatorSense | Multi-instance adaptive cosine estimator | 0.9386 | 0.864 |
Conor | Ensemble of maximum likelihood classifiers | 1.2247 | 0.8226 |
BRG | Multilayer perceptron | 1.4478 | 0.688 |
Conclusion
Overall, the competition was a success. The species classification results were particularly promising. Such a high level of accuracy bodes well for automating species classification from remote sensing data. Both the Weecology and DSR labs hope to hold the challenge annually and continue to advance techniques in this vital area.
References
- Anderson CB. 2018. The CCB-ID approach to tree species mapping with airborne imaging spectroscopy. PeerJ 6:e5666 https://doi.org/10.7717/peerj.5666
- Carlowicz, Michael. “Seeing Forests for the Trees and the Carbon: Mapping the World’s Forests in Three Dimensions.” NASA, NASA, 12 Jan. 2012, earthobservatory.nasa.gov/features/ForestCarbon.
- Fassnacht, Fabian Ewald, et al. “Review of studies on tree species classification from remotely sensed data.” Remote Sensing of Environment 186 (2016): 64-87.
- “IDTrees” NIST DSE Plant Identification with NEON Remote Sensing Data, University of Florida Weecology Lab, 2017, www.ecodse.org/.
- Kampe, Thomas U., et al. “NEON: the first continental-scale ecological observatory with airborne remote sensing of vegetation canopy biochemistry and structure.” Journal of Applied Remote Sensing 4.1 (2010): 043510.
- Wasser, Leah A. “About Hyperspectral Remote Sensing Data.” NSF NEON | Open Data to Understand Our Ecosystems, 7 Oct. 2020, www.neonscience.org/resources/learning-hub/tutorials/hyper-spec-intro.
- Marconi, Sergio, et al. “A data science challenge for converting airborne remote sensing data into ecological information.” PeerJ 6 (2019): e5843.