geocomputing/R/R_LiDAR/R_lidar_course_exercises/session3.3-lidR-LAScatalog-class.Rmd at master · csc-training/geocomputing · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
---
title: "The lidR package - LAScatalog formal class"
author: "Eduardo González"
date: "23. February 2019"
output:
  html_document:
    df_print: paged
    theme: flatly
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(lidR)
```


This materials have been adapted from the official [lidR wiki documentation](https://github.com/Jean-Romain/lidR/wiki).

A LAScatalog class is a representation in R of a file or a set of las files not loaded in memory. Indeed, a regular computer cannot load the entire point cloud in R if it covers a wide area. For very high density datasets it can even fail loading a single file. In lidR, we use a LAScatalog to process datasets that cannot fit in memory.

## Build a LAScatalog object reading a folder of las files

Create a LAScatalog is stragight forward, you can use a folder or a las file as follows:
```r
ctg <- catalog("path/to/las/files/")
# or
ctg <- catalog("path/to/las/files/big_file.las")
```

To read a folder of las files and plot their footprint to a map viewer:
```{r}
ctg_subset <- catalog("/appl/data/geo/mml/laserkeilaus/2008_latest/2019/U442/1/")
print(ctg_subset)
plot(ctg_subset, map = TRUE)
```


## Basic structure of a LAScatalog object

A LAScatalog inherits a SpatialPolygonsDataFrame. Thus, it has the structure of a SpatialPolygonsDataFrame plus some extra slots that store information relative to how the LAScatalog will be processed.

The slot `data` of a LAScatalog object contains a `data.frame` with the most important information read from the header of .las or .laz files. Reading only the header of the file provides an overview of the content of the files very quickly without actually loading the point cloud. The columns of the table are named after the [LAS specification](http://www.asprs.org/wp-content/uploads/2010/12/LAS_1_4_r13.pdf) version 1.4

```{r}
ctg_subset@data
```


See other slots documentation with:
```{r}
?`LAScatalog-class`
```

## Validation of LAScatalog object

Users commonly report bugs arising from the fact that the point cloud is invalid. This is why we introduced the function `lascheck` to perform an inspection of the LAScatalog objects. This function checks if a LAScatalog object is consistent (files are all of the same type for example). For example, it may happen that a catalog mixes files of type 1 with files of type 3 or files with different scale factors.

```{r}
lascheck(ctg_subset)
```

The function `lascheck` when applied to a catalog does not perform a deep inspection of the point cloud unlike when applied to a LAS object. Indeed the point cloud is not actually read.


## Display a LAScatalog object

lidR provides a simple `plot` function to plot a LAScatalog object:

```{r, fig.width=4, fig.height=4}
plot(ctg_subset)
```

The option `mapview = TRUE` displays the catalog on an interactive map with pan and zoom and allows the addition of a satellite map in the background. It uses the package `mapview` internally. It is often useful to check if the CRS of the file are properly registered. The epsg code of the las file appears to be often incorrect, according to our own experience.

```{r}
plot(ctg_subset, mapview = TRUE, map.type = "Esri.WorldImagery")
```


## Apply lidR functions on a LAScatalog

Most of lidR functions are compatible with a LAScatalog and work almost like with a single point cloud loaded in memory. In the following example we use the function `grid_metrics` to compute the mean elevation of the points. The output is a continuous wall-to-wall RasterLayer. It works exactly as if the input was a LAS object.

```{r}
hmean <- grid_metrics(las = ctg_subset, func = mean(Z), res = 20)
```

However, processing a LAScatalog usually requires some tuning of the processing options to get better control of the computation. Indeed, if the catalog is huge the output is likely to be huge as well, and maybe the output cannot fit in the R memory. For example, `lasnormalize` throws an error if used 'as is' without tuning the processing options. Using `lasnormalize` like in the following example the expected `output` would be a huge point cloud loaded in memory. The lidR package forbids such a calls.

Instead, one can use the processing option `opt_output_files`. Processing options drive how the big files are split in small chunks and how the outputs are either returned into R or written on disk into files.

```{r, error = TRUE, purl = FALSE}
opt_output_files(ctg_subset) <- "./outputs/hmean_{ORIGINALFILENAME}"
summary(ctg_subset)
hmean <- grid_metrics(las = ctg_subset, func = mean(Z), res = 20)

```

Now besides the `hmean` R object, the resulting tif files have been saved to the `outputs` folder. The user can check how the catalog will be processed by calling `summary`.

In the next example, a catalog is normalized. The output is not a point cloud but a LAScatalog pointing to the newly created files.
```{r}
opt_output_files(ctg_subset) <- "./outputs/lasnormalized_{ORIGINALFILENAME}"
summary(ctg_subset)
las_subset_normalized <- lasnormalize(ctg_subset, tin())
plot(ctg_subset, mapview = TRUE, map.type = "Esri.WorldImagery")
```

You can set more options on how a catalog is going to be used when you run some function on it. By default, the computations are run on one full file at a time, but you can also specify a square chunk as a computation unit.
```{r}
opt_chunk_size(ctg_subset) <- 100
opt_output_files(ctg_subset) <- "./outputs/lasnormalized_{XLEFT}_{YBOTTOM}_{ID}"
summary(ctg_subset)
las_subset_normalized <- lasnormalize(ctg_subset, tin())
plot(las_subset_normalized, mapview = TRUE, map.type = "Esri.WorldImagery")
```

To set the processing to the file level (one file at a time) set the chunk size to zero:
```{r, fig.show='hold'}
opt_chunk_size(ctg_subset) <- 0
plot(ctg_subset, chunk = TRUE)
summary(ctg_subset)
```


## Creating a Digital Terrain Model as a parallel job in Puhti
lidR catalogs can be set to use a certain number of cpus using the `opt_cores(ctg)`. To assign 16 cpus to your options (it will set the maximun number that your system has):
```r
opt_cores(ctg_subset) <- 16
```

The way the catalog work is such that only one task is performed at a time but you can set the number of cores that will be used to run that task.

Let's repeat the previous exercise but runing the same DTM computation making use of the catalog object.

Open the file `simple_catalog_lidR.R` and study it. It includes some parts that relate to the batch job you will need to send a request to Puhti.

Open also the batch job file `simple_catalog_lidR_batchjob.sh`, it includes some explanations on the contents.

Once you are sure all the parameters are correctly set, open a terminal to Puhti (using Putty or you can also open it from NoMachine). Go to the project folder and send your batch job with:
```bash
sbatch simple_catalog_lidR_batchjob.sh
```
The input data in those scripts is the same as before `las_files.txt`.

The results are written to the `./batch_output` folder, check also the batch jobs' output and error text files.