Machine Learning for LiDAR Classification

ASOG Article of the Month: May 2021

ASOG Author: Jackson Beebe

Hey everyone, my name is Jackson Beebe and I'm a former ASO turned desk-jockey working as a geospatial analyst in the Tampa area. Recently I've been working on some tools that will better automate the LiDAR classification process and wrote an article about it for another publication. I'm sharing here because I thought y'all would appreciate the read. I'd love to hear any thoughts or input!

In the airborne LiDAR industry, after acquisition and calibration, roughly 30-40% of a project’s processing budget is dedicated to point classification, according to the PMs I’ve spoken with. Every derivative product that follows depends on that point cloud being classified accurately. While there are programs that automate this process, those routines still require an analyst to manually comb through the dataset, verify the programs’ accuracy, and make corrections by hand. With readily available machine learning algorithms, I think we can create a workflow that classifies LiDAR data completely and accurately on its own.

To improve the classification process, I’ve been experimenting with various decision-tree algorithms to classify small datasets (between 1-5 tiles) of both topographic and bathymetric LiDAR provided by NOAA and USGS. So far, the results have been promising, with classification accuracies hovering around 98 percent. The decision trees I’ve been using are predictive models that take observations about the data (X, Y, Z, intensity, returns, scan angle) and use those observations to gradually work toward a conclusion about the target variable (classification). In simpler terms, the machine learning algorithm is playing “21 questions” with the provided information to narrow down the target.
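To make that “21 questions” idea concrete, here’s a minimal sketch of training a decision tree on per-point attributes with scikit-learn. The data is purely synthetic stand-in data (two fabricated classes, ground and vegetation), not the NOAA/USGS tiles; a real workflow would read these same fields out of a LAS/LAZ file instead.

```python
# Sketch: a decision tree classifying points from raw per-point LiDAR
# attributes (X, Y, Z, intensity, return number, number of returns,
# scan angle). The two classes below are fabricated for illustration.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 10_000

# "Ground": low Z, strong intensity, single returns.
ground = np.column_stack([
    rng.uniform(0, 100, n), rng.uniform(0, 100, n),  # X, Y
    rng.normal(10, 0.5, n),                          # Z (m)
    rng.normal(200, 20, n),                          # intensity
    np.ones(n), np.ones(n),                          # return #, total returns
    rng.uniform(-20, 20, n),                         # scan angle (deg)
])
# "Vegetation": higher Z, weaker intensity, multiple returns.
veg = np.column_stack([
    rng.uniform(0, 100, n), rng.uniform(0, 100, n),
    rng.normal(25, 5, n),
    rng.normal(80, 20, n),
    rng.integers(1, 4, n), 3 * np.ones(n),
    rng.uniform(-20, 20, n),
])
X = np.vstack([ground, veg])
y = np.array([2] * n + [5] * n)  # ASPRS codes: 2 = ground, 5 = high vegetation

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
clf = DecisionTreeClassifier(max_depth=8, random_state=0)
clf.fit(X_train, y_train)
print(f"accuracy: {clf.score(X_test, y_test):.3f}")
```

Because the synthetic classes are cleanly separable, the toy accuracy will be unrealistically high; the point is only the shape of the workflow: features in, class codes out.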

When tackling a project with AI, there are a host of methods and algorithms to pick from. Decision-tree algorithms are preferable to other machine learning or deep learning approaches for a few reasons, but mostly because they are not resource intensive. Many machine learning programs require powerful GPUs or processing servers, while decision trees work well on ordinary CPUs. For example, on a desktop CPU, these algorithms have trained in anywhere from 30 seconds to a few minutes per tile and classified individual tiles just as fast. Along with the speed, decision trees are explainable: we can look at each individual classification and see how and why the algorithm came to that conclusion.
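That explainability is easy to demonstrate: scikit-learn can print a trained tree’s learned rules as plain if/then splits. The two feature names and the data below are made up for the example, not taken from my tests.

```python
# Sketch: printing a decision tree's rules with export_text, so an
# analyst can see exactly which thresholds drove each classification.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(1)
# Toy features: height above local ground and intensity (illustrative).
height = np.concatenate([rng.normal(0.1, 0.05, 500), rng.normal(8, 2, 500)])
intensity = np.concatenate([rng.normal(180, 15, 500), rng.normal(70, 15, 500)])
X = np.column_stack([height, intensity])
y = np.array([2] * 500 + [5] * 500)  # 2 = ground, 5 = vegetation

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
rules = export_text(tree, feature_names=["height_above_ground", "intensity"])
print(rules)
```

The printed output is a readable rule set, which is exactly the kind of audit trail a black-box neural network can’t give you.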

The most promising aspect of my tests so far is that these algorithms are using only X, Y, and Z values, intensity, return number versus total number of returns, and scan angle, rather than the derived datapoints that existing LiDAR classification programs already rely on. For example, a noise classification routine might look at an individual point and decide whether it is noise based on how many neighboring points it has and how far away those neighbors are. By adding derived datapoints like those and increasing the number of relevant observations about the data, I think we will see the largest gains in accuracy and get closer to “hands-free” LiDAR classification.
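As a sketch of what those neighborhood datapoints could look like, the snippet below uses a SciPy KD-tree to compute, for each point, the distance to its k-th nearest neighbor and the number of neighbors within a fixed radius. Isolated points (few neighbors, large k-NN distance) are likely noise. The point cloud, radius, and k are arbitrary choices for the example.

```python
# Sketch: deriving neighborhood features (k-NN distance, neighbor count)
# that could be stacked alongside raw attributes before training.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(2)
surface = rng.uniform(0, 50, size=(5000, 3))   # dense "real" points
surface[:, 2] = rng.normal(10, 0.3, 5000)      # flatten Z around 10 m
noise = rng.uniform(0, 50, size=(20, 3))
noise[:, 2] = rng.uniform(30, 60, 20)          # stray high points
points = np.vstack([surface, noise])

tree = cKDTree(points)
k = 8
dists, _ = tree.query(points, k=k + 1)         # +1: first hit is the point itself
knn_dist = dists[:, -1]                        # distance to k-th neighbor
n_within = np.array(
    [len(idx) - 1 for idx in tree.query_ball_point(points, r=2.0)])

# Stack the derived features alongside the raw coordinates for training.
features = np.column_stack([points, knn_dist, n_within])
print(features.shape)
```

On this toy cloud, the 20 stray high points end up with much larger k-NN distances than the surface points, which is the signal a tree could split on.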

While there is still plenty of research to be done, the tests I’ve seen so far suggest this is a viable answer to the manpower shortages everyone faces and the timelines we are always racing.

 
