This first tutorial will guide you through testing the correct installation of cLASpy_T. The tutorial assumes that the installation is clomplete and that the venv is operational. If this is not the case, see Install.

Explore Dataset

Using point cloud visualization software such as CloudCompare, open the Test/Orne_20130525.las file in the cLASpy_T root directory. You should see something like this:

../../_images/CC_Orne_20130525.png

Note

This point cloud is too light (only 50,000 points) to correspond to a real use case, but it’s useful to test that cLASpy_T works.

To create an automatic classification model, different types of data are required. These are labels, which identify the class to which each point belongs, and features, which describe each point.

Labels

Labels are contained in the ‘Target’ scalar field. You can view these labels by selecting the point cloud Orne_20130525.las, then in Scalar Fields, scroll down and select ‘Target’. You see 9 classes, from 0 to 8 as follow:

  1. Water

  2. Wet sand

  3. Dry sand

  4. Mix (sand + mud)

  5. Mud

  6. Grass and Schorre

  7. High vegetation and Buildings

  8. Roads

  9. Low vegetation

../../_images/CC_Orne_20130525_target.png

Features

Features are all other scalar fields such as ‘Roughness (5)’, ‘Omnivariance (10)’, ‘R’, ‘G’, ‘B’ or ‘Return Number’.

../../_images/CC_Orne_20130525_feature.png

The goal of a supervised machine learning algorithms is to model the membership of points to their class, or their label/target, based on input features. The choice of these features is therefore essential for a consistent and robust model.

Test command-line

The first step is to test cLASpy_T command line, to ensure all library dependencies are properly installed.

You will create a classification model by running a training session with the ‘train’ module and the very light point cloud Orne_20130525.las in the Test folder of the cLASpy_T sources. This point cloud contains the ‘Target’ scalar field and the features needed for training.

Then, you will use the model you created to make predictions on the same point cloud, to ensure that the ‘predict’ module is also fully operational.

If all goes well, you should obtain a folder containing 4 files: a model, a LAS point cloud and two reports, one for training and the other for prediction.

First run

Activate the Python virtual environment (venv) created during installation process, from the cLASpy_T folder.

On Windows:

C:\Users\Me\Code\cLASpy_T>.venv\claspy_venv\Scripts\activate

On Linux:

me@pc:~/Code/cLASpy_T$ source .venv/claspy_venv/bin/activate

cLASpy_T consists of 3 modules: ‘train’, ‘predict’ and ‘segment’. The first and second are used to train a supervised model and make predictions. The ‘segment’ module perform unsupervised machine learning, with the KMeans algorithm.

You can get more details about cLASpy_T and modules with --help command.

Example: help for ‘predict’ module

python cLASpy_T.py predict --help
usage: cLASpy_T.py predict [-h] [-c] [-i] [-o] [-m] [--fillnan]

-------------------------------------------------------------------------------

                          cLASpy_T
                      predict sub-command
                  =========================

'predict' makes predictions on the input point cloud according the selected model.

For predictions, two files are required:
  --> the input_data file with the same features used to create the model.
  --> the '*.model' file created during the training phase.

-------------------------------------------------------------------------------

options:
  -h, --help          show this help message and exit
  -c , --config       give the configuration file with all parameters
                          and selected scalar fields.
                          [WINDOWS]: 'X:/path/to/the/config.json'
                          [UNIX]: '/path/to/the/config.json'

  -i , --input_data   set the input data file:
                          [WINDOWS]: 'X:/path/to/the/input_data.file'
                          [UNIX]: '/path/to/the/input_data.file'

  -o , --output       set the output folder to save all result files:
                          [WINDOWS]: 'X:/path/to/the/output/folder'
                          [UNIX]: '/path/to/the/output/folder'
                          Default: '/path/to/the/input_data/'

  -m , --model        import the model file to make predictions:
                          '/path/to/the/training/file.model'

  --fillnan           set the value to fill NaN for feature values.
                          Could be 'median', 'mean', int or float.
                          Default: '--fillnan='median'

Note

If it doesn’t work, check the cLASpy_T dependencies are installed, as explained in the Install section.

Training

To train your first model with the ‘train’ module, you need to set the algorithm and the input file. All other arguments of ‘train’ module are optional.

Run the following command to train a basic RandomForestClassifier model.

python cLASpy_T.py train -a=rf -i=./Test/Orne_20130525.las
  • -a: set the supervised algorithm, here rf refers to RandomForestClassifier

  • -i: set the point cloud file, here Orne_20130525.las

Training Ouput

The first part of the terminal output shows the cLASpy_T mode, the algorithm used and the input data file.

# # # # # # # # # #  cLASpy_T  # # # # # # # # # # # #
- - - - - - - -    TRAIN MODE    - - - - - - - - - -
* * * *    Point Cloud Classification    * * * * * *

Algorithm used: RandomForestClassifier
Path to LAS file: Test\Orne_20130525.las

Create a new folder to store the result files... Done.

Once the file has been loaded, the output shows the LAS format and the total number of points. Then, the cLASpy_T pipeline starts, with the input data formatted in pandas.DataFrame (see 10 minutes to pandas).

If no list of selected features is provided with --features (-f) argument, the default behavior of cLASpy_T is to retrieve all extra dimensions from the LAS file as selected features. The standard LAS file dimensions are discarded by default.

The ‘train’ module also search the ‘Target’ field in the data and shows the labels used. Here, there are 9 labels, from 0 to 8 as already seen with CloudCompare.

LAS Version: 1.2
LAS point format: 1
Number of points: 50,000

Step 1/7: Formatting data as pandas.DataFrame...

All features in input_data will be used!
Except X, Y, Z and LAS standard dimensions!

LABELS FROM DATASET:
[0, 1, 2, 3, 4, 5, 6, 7, 8]

The second step of the cLASpy_T pipeline is to split dataset into train and test sets, according to the --train_r: the training ratio. Here, the train and test sets are 25,000 points each, according the default --train_r =0.5.

The third step scales the dataset according the --scaler selected: StandardScaler, MinMaxScaler or RobustScaler (see scalers).

Step 2/7: Splitting data in train and test sets...

Random state to split data: 0
        Number of used points: 50 000 pts
        Size of train|test datasets: 25 000 pts | 25 000 pts

Step 3/7: Scaling data...

Warning

With large dataset, this step consumes a lot of RAM and can take a long time if memory is full. If cLASpy_T stops at this stage with RAM full, reduce the size of the point cloud, or increase the computer’s RAM capacity.

Step 4/7 is the actual model training. Depending on the point cloud size, the algorithm used and the number of features and classes, this step is often the longest. It can last from a few minutes to several hours.

The training uses the cross-validation method (CV for short) to ensure robust models. So, 5 training are performed simultaneously on 5 subsets of trainset (see cross-validation). Here, the training set is composed of 25,000 points, so 5 subsets of 5,000 points are created. Each subset, or fold, is used once to test the model trained with the other 4 folds.

Once done, cLASpy_T returns the global accuracy of the 5 sub-models. To check that the model is consistent and robust, the 5 scores must be close to each other. If one or more scores are several units (%) apart, there is a problem with the classes, features, model or training parameters.

Step 4/7: Training model with cross validation...

Random state for the StratifiedShuffleSplit: 0

        Training model scores with cross-validation:
        [0.6934 0.6918 0.6898 0.6878 0.6862]

Model trained!

Note

Check CPUs are working to make sure that cLASpy_T isn’t freezing. The number of CPUs used by cLASpy_T can be set with --n_jobs argument.

After training, cLASpy_T tests the model by making predictions on the 25,000 points of the test set, created during step 2/7. The results are presented in the form of a confusion matrix and a classification report.

The confusion matrix allows to explore in detail the predictions made by the model for each point. The columns present the predictions made by the model, while the rows correspond to the expected classes for each point. The end of each column corresponds to the precision of each class, while the end of each row corresponds to the recall of each class. The global accuracy is the end of the last line, here: 69.6%.

The classification report exposes the same results by classes, with the number of points for each class (support).

Step 5/7: Creating confusion matrix...

CONFUSION MATRIX:
Predicted         0         1        2        3        4         5         6        7         8  Recall
True
0          5064.000   194.000     1.00   16.000    3.000   126.000    21.000   32.000    10.000   0.926
1           355.000  4635.000   367.00   34.000   27.000    29.000     3.000    7.000     2.000   0.849
2             1.000   769.000  2347.00    4.000    0.000    19.000     1.000    7.000    24.000   0.740
3           364.000   735.000    65.00  248.000   15.000   169.000     0.000   10.000     6.000   0.154
4           115.000   794.000    16.00   92.000  151.000    89.000    44.000   75.000    40.000   0.107
5           128.000    40.000    14.00   17.000    6.000  1808.000   200.000   14.000   417.000   0.684
6            20.000     5.000    11.00    1.000    4.000    60.000  1324.000   35.000   419.000   0.705
7           377.000    31.000    23.00    5.000   28.000    60.000   223.000  185.000   104.000   0.179
8             2.000    17.000    53.00    2.000    1.000   168.000   420.000   16.000  1636.000   0.707
Precision     0.788     0.642     0.81    0.592    0.643     0.715     0.592    0.486     0.616   0.696

TEST REPORT:
              precision    recall  f1-score   support

          0       0.79      0.93      0.85      5467
          1       0.64      0.85      0.73      5459
          2       0.81      0.74      0.77      3172
          3       0.59      0.15      0.24      1612
          4       0.64      0.11      0.18      1416
          5       0.72      0.68      0.70      2644
          6       0.59      0.70      0.64      1879
          7       0.49      0.18      0.26      1036
          8       0.62      0.71      0.66      2315

    accuracy                           0.70     25000
  macro avg       0.65      0.56      0.56     25000
weighted avg       0.69      0.70      0.66     25000

The step 6/7 save the model, and all other needed parameters such as scaler, in a binary file with a .model extension. This binary file is created with joblib python library.

The last step writes all relevant training parameters to a report file.

Step 6/7: Saving model and scaler in file:

Model path: Test\Orne_20130525/
Model file: train_rf50kpts_1217_1619.model

Step 7/7: Creating classification report:
Test\Orne_20130525/train_rf50kpts_1217_1619.txt

Training done in 0:00:03.095406

Note

Bravo ! You trained your first machine learning model with cLASpy_T.

Prediction

You now have a trained machine learning model. We’ll use it on the same dataset with the ‘predict’ module to make predictions and check that this module is working properly.

To use your first model with the ‘predict’ module, you must pass the .model file with the -m argument and set the input file. All other arguments of ‘predict’ module are optional.

Run the following command to make prediction with your model.

python cLASpy_T.py predict -m=Test/Orne_20130525/train_gb50kpts_mmjj_HHMM.model -i=Test/Orne_20130525.las
  • -m: set the model to make predictions, change train_gb50kpts_mmjj_HHMM.model with your model file.

  • -i: set the point cloud file, here Orne_20130525.las

Prediction Ouput

The first part of the terminal output shows the cLASpy_T mode.

At the first step, cLASpy_T loads the model and gives the labels, the original algorithm and the input data file. Once the file has been loaded, the output shows the LAS format and the total number of points.

# # # # # # # # # #  cLASpy_T  # # # # # # # # # # # #
- - - - - - - -    PREDICT MODE    - - - - - - - - - -
* * * * *    Point Cloud Classification    * * * * * *

Step 1/6: Loading model...
LABELS FROM MODEL:
[0, 1, 2, 3, 4, 5, 6, 7, 8]

      Any PCA data to load from model.

Algorithm used: RandomForestClassifier
Path to LAS file: Test\Orne_20130525.las

Create a new folder to store the result files... Folder already exists.

LAS Version: 1.2
LAS point format: 1
Number of points: 50,000

The second step, the ‘predict’ module format data as pandas.DataFrame and check that all features used by the model are in the input data. At this point, the ‘Target’ field is discarded if present.

Step 2/6: Formatting data as pandas.DataFrame...

Get selected features:
- roughness_(2) asked --> Roughness (2) found
- surface_variation_(10) asked --> Surface variation (10) found
- eigenentropy_(10) asked --> Eigenentropy (10) found
- anisotropy_(2) asked --> Anisotropy (2) found
- g asked --> G found
...
...
...
- omnivariance_(5) asked --> Omnivariance (5) found
- saturation asked --> Saturation found
- anisotropy_(10) asked --> Anisotropy (10) found
- omnivariance_(2) asked --> Omnivariance (2) found
- calibintensity_(5) asked --> CalibIntensity (5) found
- original_intensity asked --> Original_Intensity found
- sphericity_(10) asked --> Sphericity (10) found
- surface_variation_(5) asked --> Surface variation (5) found

Number of selected features: 32
Number of final used features: 32
--> All required features are present!