This first tutorial will guide you through testing the correct installation of |claspyt|. The tutorial assumes that the installation is complete and that the *venv* is operational. If this is not the case, see :doc:`/install`.

Explore Dataset
*****************

Using point cloud visualization software such as `CloudCompare`_, open the :file:`Test/Orne_20130525.las` file in the |claspyt| root directory. You should see something like this:

.. image:: CC_Orne_20130525.png
    :width: 600

.. note::
    This point cloud is too small (only 50,000 points) to represent a real use case, but it is useful for testing that |claspyt| works.

To create an automatic classification model, different types of data are required. These are **labels**, which identify the class to which each point belongs, and **features**, which describe each point.

Labels
=======

Labels are contained in the **'Target'** scalar field. You can view these labels by selecting the point cloud :file:`Orne_20130525.las`, then in :command:`Scalar Fields`, scroll down and select **'Target'**. You see 9 classes, from 0 to 8, as follows:

0. Water
#. Wet sand
#. Dry sand
#. Mix (sand + mud)
#. Mud
#. Grass and Schorre
#. High vegetation and Buildings
#. Roads
#. Low vegetation

.. image:: CC_Orne_20130525_target.png
    :width: 600

Features
==========

Features are all other scalar fields such as **'Roughness (5)'**, **'Omnivariance (10)'**, **'R'**, **'G'**, **'B'** or **'Return Number'**.

.. image:: CC_Orne_20130525_feature.png
    :width: 600

The goal of a supervised machine learning algorithm is to model the membership of points to their **class**, or their **label/target**, based on input **features**. The choice of these **features** is therefore essential for a consistent and robust **model**.

Test command-line
*******************

The first step is to test the |claspyt| command line, to ensure all library dependencies are properly installed.

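
Once the *venv* is active, an optional quick check can confirm that a few libraries can be found before running the command line. This is a minimal sketch and not part of |claspyt|: the function name ``missing_packages`` is hypothetical, and the package list is an assumption based only on the libraries this tutorial mentions (pandas, scikit-learn, joblib). The complete dependency list is given in :doc:`/install`.

.. code-block:: python

    from importlib.util import find_spec

    # Assumed import names, taken from the libraries this tutorial mentions.
    # The authoritative dependency list is on the installation page.
    ASSUMED_PACKAGES = ["pandas", "sklearn", "joblib"]


    def missing_packages(names):
        """Return the names that cannot be located in the active environment."""
        return [name for name in names if find_spec(name) is None]


    if __name__ == "__main__":
        missing = missing_packages(ASSUMED_PACKAGES)
        if missing:
            print("Missing packages:", ", ".join(missing))
        else:
            print("All checked packages were found.")

An empty result only means the packages were located, not that their versions match what |claspyt| expects.
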
You will create a classification model by running a training session with the **'train'** module and the very small point cloud :file:`Orne_20130525.las` in the :file:`Test` folder of the |claspyt| sources. This point cloud contains the **'Target'** scalar field and the **features** needed for training. Then, you will use the model you created to make predictions on the same point cloud, to ensure that the **'predict'** module is also fully operational.

If all goes well, you should obtain a folder containing **4 files**: a model, a LAS point cloud and two reports, one for training and the other for prediction.

First run
==========

Activate the Python virtual environment (*venv*) created during the installation process, from the :file:`cLASpy_T` folder.

On Windows:

.. code-block:: doscon

    C:\Users\Me\Code\cLASpy_T>.venv\claspy_venv\Scripts\activate

On Linux:

.. code-block:: console

    me@pc:~/Code/cLASpy_T$ source .venv/claspy_venv/bin/activate

|claspyt| consists of 3 modules: **'train', 'predict'** and **'segment'**. The first and second are used to train a supervised model and make predictions. The **'segment'** module performs unsupervised machine learning with the **KMeans** algorithm.

You can get more details about |claspyt| and its modules with the :command:`--help` option.

Example: help for the **'predict'** module

.. code-block:: console

    python cLASpy_T.py predict --help

.. code-block:: console

    usage: cLASpy_T.py predict [-h] [-c] [-i] [-o] [-m] [--fillnan]

    -------------------------------------------------------------------------------
    cLASpy_T
    predict sub-command
    =========================

    'predict' makes predictions on the input point cloud according the selected model.

    For predictions, two files are required:
      --> the input_data file with the same features used to create the model.
      --> the '*.model' file created during the training phase.
    -------------------------------------------------------------------------------

    options:
      -h, --help          show this help message and exit
      -c , --config       give the configuration file with all parameters
                          and selected scalar fields.
                              [WINDOWS]: 'X:/path/to/the/config.json'
                              [UNIX]: '/path/to/the/config.json'
      -i , --input_data   set the input data file:
                              [WINDOWS]: 'X:/path/to/the/input_data.file'
                              [UNIX]: '/path/to/the/input_data.file'
      -o , --output       set the output folder to save all result files:
                              [WINDOWS]: 'X:/path/to/the/output/folder'
                              [UNIX]: '/path/to/the/output/folder'
                              Default: '/path/to/the/input_data/'
      -m , --model        import the model file to make predictions:
                              '/path/to/the/training/file.model'
      --fillnan           set the value to fill NaN for feature values.
                          Could be 'median', 'mean', int or float.
                              Default: '--fillnan='median'

.. note::
    If it doesn't work, check that the |claspyt| dependencies are installed, as explained in the :doc:`/install` section.

Training
=========

To train your first model with the **'train'** module, you need to set the algorithm and the input file. All other arguments of the **'train'** module are optional.

Run the following command to train a basic *RandomForestClassifier* model.

.. code-block:: console

    python cLASpy_T.py train -a=rf -i=./Test/Orne_20130525.las

* **-a**: set the supervised algorithm, here *rf* refers to *RandomForestClassifier*
* **-i**: set the point cloud file, here :file:`Orne_20130525.las`

Training Output
----------------

The first part of the terminal output shows the |claspyt| mode, the algorithm used and the input data file.

.. code-block:: console

    # # # # # # # # # # cLASpy_T # # # # # # # # # # # #
    - - - - - - - - TRAIN MODE - - - - - - - - - -
    * * * * Point Cloud Classification * * * * * *

    Algorithm used: RandomForestClassifier
    Path to LAS file: Test\Orne_20130525.las

    Create a new folder to store the result files... Done.

Once the file has been loaded, the output shows the LAS format and the total number of points.

Then, the |claspyt| pipeline starts, with the input data formatted as a pandas.DataFrame (see `10 minutes to pandas`_). If no list of selected features is provided with the :command:`--features (-f)` argument, the default behavior of |claspyt| is to retrieve all extra dimensions from the LAS file as selected features. The standard LAS file dimensions are discarded by default.

The **'train'** module also searches for the **'Target'** field in the data and shows the labels used. Here, there are 9 labels, from 0 to 8, as already seen with `CloudCompare`_.

.. code-block:: console

    LAS Version: 1.2
    LAS point format: 1
    Number of points: 50,000

    Step 1/7: Formatting data as pandas.DataFrame...
    All features in input_data will be used!
    Except X, Y, Z and LAS standard dimensions!

    LABELS FROM DATASET: [0, 1, 2, 3, 4, 5, 6, 7, 8]

The second step of the |claspyt| pipeline splits the dataset into train and test sets, according to the training ratio set with :command:`--train_r`. Here, the train and test sets contain 25,000 points each, according to the default :command:`--train_r=0.5`.

The third step scales the dataset according to the selected :command:`--scaler`: *StandardScaler, MinMaxScaler or RobustScaler* (see `scalers`_).

.. code-block:: console

    Step 2/7: Splitting data in train and test sets...
    Random state to split data: 0
    Number of used points: 50 000 pts
    Size of train|test datasets: 25 000 pts | 25 000 pts

    Step 3/7: Scaling data...

.. warning::
    With large datasets, this step consumes a lot of RAM and can take a long time if memory is full. If |claspyt| stops at this stage with RAM full, reduce the size of the point cloud, or increase the computer's RAM capacity.

Step 4/7 is the actual model training. Depending on the point cloud size, the algorithm used and the number of features and classes, this step is often the longest. It can last from a few minutes to several hours. The training uses the cross-validation method (CV for short) to ensure robust models.
So, 5 trainings are performed simultaneously on 5 subsets of the training set (see `cross-validation`_). Here, the training set is composed of 25,000 points, so 5 subsets of 5,000 points are created. Each subset, or fold, is used once to test the model trained with the other 4 folds. Once done, |claspyt| returns the global accuracy of the 5 sub-models.

To check that the model is consistent and robust, the 5 scores must be close to each other. If one or more scores are several percentage points apart, there is a problem with the classes, features, model or training parameters.

.. code-block:: console

    Step 4/7: Training model with cross validation...
    Random state for the StratifiedShuffleSplit: 0

    Training model scores with cross-validation:
    [0.6934 0.6918 0.6898 0.6878 0.6862]

    Model trained!

.. note::
    Check that the CPUs are working to make sure that |claspyt| isn't frozen. The number of CPUs used by |claspyt| can be set with the :command:`--n_jobs` argument.

After training, |claspyt| tests the model by making predictions on the 25,000 points of the test set, created during step 2/7. The results are presented in the form of a `confusion matrix`_ and a `classification report`_.

The `confusion matrix`_ lets you explore in detail the predictions made by the model. The columns present the predictions made by the model, while the rows correspond to the expected classes for each point. The end of each column gives the **precision** of each class, while the end of each row gives the **recall** of each class. The **global accuracy** is at the end of the last row, here: **69.6%**.

The `classification report`_ presents the same results by class, with the number of points for each class (support).

.. code-block:: console

    Step 5/7: Creating confusion matrix...

    CONFUSION MATRIX:
    Predicted         0         1         2         3         4         5         6         7         8    Recall
    True
    0          5064.000   194.000      1.00    16.000     3.000   126.000    21.000    32.000    10.000     0.926
    1           355.000  4635.000    367.00    34.000    27.000    29.000     3.000     7.000     2.000     0.849
    2             1.000   769.000   2347.00     4.000     0.000    19.000     1.000     7.000    24.000     0.740
    3           364.000   735.000     65.00   248.000    15.000   169.000     0.000    10.000     6.000     0.154
    4           115.000   794.000     16.00    92.000   151.000    89.000    44.000    75.000    40.000     0.107
    5           128.000    40.000     14.00    17.000     6.000  1808.000   200.000    14.000   417.000     0.684
    6            20.000     5.000     11.00     1.000     4.000    60.000  1324.000    35.000   419.000     0.705
    7           377.000    31.000     23.00     5.000    28.000    60.000   223.000   185.000   104.000     0.179
    8             2.000    17.000     53.00     2.000     1.000   168.000   420.000    16.000  1636.000     0.707
    Precision     0.788     0.642      0.81     0.592     0.643     0.715     0.592     0.486     0.616     0.696

    TEST REPORT:
                  precision    recall  f1-score   support

               0       0.79      0.93      0.85      5467
               1       0.64      0.85      0.73      5459
               2       0.81      0.74      0.77      3172
               3       0.59      0.15      0.24      1612
               4       0.64      0.11      0.18      1416
               5       0.72      0.68      0.70      2644
               6       0.59      0.70      0.64      1879
               7       0.49      0.18      0.26      1036
               8       0.62      0.71      0.66      2315

        accuracy                           0.70     25000
       macro avg       0.65      0.56      0.56     25000
    weighted avg       0.69      0.70      0.66     25000

Step 6/7 saves the model, and all other needed parameters such as the scaler, in a binary file with a :file:`.model` extension. *This binary file is created with the joblib Python library.*

The last step writes all relevant training parameters to a report file.

.. code-block:: console

    Step 6/7: Saving model and scaler in file:
    Model path: Test\Orne_20130525/
    Model file: train_rf50kpts_1217_1619.model

    Step 7/7: Creating classification report:
    Test\Orne_20130525/train_rf50kpts_1217_1619.txt

    Training done in 0:00:03.095406

.. note::
    Bravo! You trained your first machine learning model with |claspyt|.

Prediction
===========

You now have a trained machine learning model. You will use it on the same dataset with the **'predict'** module to make predictions and check that this module is working properly.

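
As an optional cross-check of the training report, the recall, precision and global accuracy can be recomputed from the confusion matrix printed at step 5/7. The sketch below is not part of |claspyt|: the function names are hypothetical, and the values are copied from the training output shown in this tutorial (rows are the true classes, columns the predicted classes).

.. code-block:: python

    # Confusion matrix copied from the step 5/7 output of this tutorial.
    # Rows: true class (0 to 8). Columns: predicted class (0 to 8).
    CONFUSION = [
        [5064, 194, 1, 16, 3, 126, 21, 32, 10],
        [355, 4635, 367, 34, 27, 29, 3, 7, 2],
        [1, 769, 2347, 4, 0, 19, 1, 7, 24],
        [364, 735, 65, 248, 15, 169, 0, 10, 6],
        [115, 794, 16, 92, 151, 89, 44, 75, 40],
        [128, 40, 14, 17, 6, 1808, 200, 14, 417],
        [20, 5, 11, 1, 4, 60, 1324, 35, 419],
        [377, 31, 23, 5, 28, 60, 223, 185, 104],
        [2, 17, 53, 2, 1, 168, 420, 16, 1636],
    ]


    def recall_per_class(matrix):
        """Correct predictions of a class divided by its row total."""
        return [row[i] / sum(row) for i, row in enumerate(matrix)]


    def precision_per_class(matrix):
        """Correct predictions of a class divided by its column total."""
        columns = list(zip(*matrix))
        return [col[i] / sum(col) for i, col in enumerate(columns)]


    def global_accuracy(matrix):
        """Sum of the diagonal divided by the total number of points."""
        correct = sum(matrix[i][i] for i in range(len(matrix)))
        total = sum(sum(row) for row in matrix)
        return correct / total


    if __name__ == "__main__":
        print([round(r, 3) for r in recall_per_class(CONFUSION)])
        print([round(p, 3) for p in precision_per_class(CONFUSION)])
        print(round(global_accuracy(CONFUSION), 3))

Rounded to three decimals, these functions give a recall of 0.926 and a precision of 0.788 for class 0, and a global accuracy of 0.696, matching the report. A class with no points in a row or column would cause a division by zero, which this sketch does not handle.
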
To use your first model with the **'predict'** module, you must pass the :file:`.model` file with the **-m** argument and set the input file. All other arguments of the **'predict'** module are optional.

Run the following command to make predictions with your model.

.. code-block:: console

    python cLASpy_T.py predict -m=Test/Orne_20130525/train_rf50kpts_mmdd_HHMM.model -i=Test/Orne_20130525.las

* **-m**: set the model to make predictions, replace :file:`train_rf50kpts_mmdd_HHMM.model` with your model file.
* **-i**: set the point cloud file, here :file:`Orne_20130525.las`

Prediction Output
------------------

The first part of the terminal output shows the |claspyt| mode. In the first step, |claspyt| loads the model and gives the labels, the original algorithm and the input data file. Once the file has been loaded, the output shows the LAS format and the total number of points.

.. code-block:: console

    # # # # # # # # # # cLASpy_T # # # # # # # # # # # #
    - - - - - - - - PREDICT MODE - - - - - - - - - -
    * * * * * Point Cloud Classification * * * * * *

    Step 1/6: Loading model...
    LABELS FROM MODEL: [0, 1, 2, 3, 4, 5, 6, 7, 8]
    Any PCA data to load from model.

    Algorithm used: RandomForestClassifier
    Path to LAS file: Test\Orne_20130525.las

    Create a new folder to store the result files... Folder already exists.

    LAS Version: 1.2
    LAS point format: 1
    Number of points: 50,000

In the second step, the **'predict'** module formats the data as a pandas.DataFrame and checks that all features used by the model are in the input data. At this point, the **'Target'** field is discarded if present.

.. code-block:: console

    Step 2/6: Formatting data as pandas.DataFrame...
    Get selected features:
     - roughness_(2) asked --> Roughness (2) found
     - surface_variation_(10) asked --> Surface variation (10) found
     - eigenentropy_(10) asked --> Eigenentropy (10) found
     - anisotropy_(2) asked --> Anisotropy (2) found
     - g asked --> G found
     ... ... ...
     - omnivariance_(5) asked --> Omnivariance (5) found
     - saturation asked --> Saturation found
     - anisotropy_(10) asked --> Anisotropy (10) found
     - omnivariance_(2) asked --> Omnivariance (2) found
     - calibintensity_(5) asked --> CalibIntensity (5) found
     - original_intensity asked --> Original_Intensity found
     - sphericity_(10) asked --> Sphericity (10) found
     - surface_variation_(5) asked --> Surface variation (5) found

    Number of selected features: 32
    Number of final used features: 32
    --> All required features are present!

.. _CloudCompare: https://www.cloudcompare.org/
.. _10 minutes to pandas: https://pandas.pydata.org/docs/user_guide/10min.html#min
.. _scalers: https://scikit-learn.org/stable/auto_examples/preprocessing/plot_all_scaling.html
.. _cross-validation: https://scikit-learn.org/stable/modules/cross_validation.html
.. _confusion matrix: https://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix
.. _classification report: https://scikit-learn.org/stable/modules/model_evaluation.html#classification-report