geocomputing/R/R_LiDAR/R_lidar_course_exercises/session3.2-lidR-LAS-class.Rmd at master · csc-training/geocomputing · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
---
title: "The lidR package - LAS formal class"
author: "Eduardo González"
date: "23. February 2019"
output:
  html_document:
    df_print: paged
    theme: flatly
---

This materials have been adapted from the official [lidR wiki documentation](https://github.com/Jean-Romain/lidR/wiki) and [A Brief Introduction of lidR](http://xzsunbest.tk/2018/07/30/ABriefIntroductionOfLidR/).

# The lidR package

lidr is an R package for Airborne LiDAR Data Manipulation and Visualization for Forestry Applications.

The lidR package provides functions to read and write `.las` and `.laz` files, plot point clouds, compute metrics using an area-based approach, compute digital canopy models, thin lidar data, manage a catalog of datasets, automatically extract ground inventories, process a set of tiles using multicore processing, individual tree segmentation, classify data from geographic data, and provides other tools to manipulate LiDAR data in a research and development context.

In CSC's Puhti the current lidR version is 2.1.4 and can be loaded with `module load r-env` along with many other R packages and GIS software.
```{r results='hide' }
library("lidR")
```


## Build a LAS object reading a las file

The function `readLAS` reads one or several .las or .laz file(s) to build a LAS object.

```{r}
LASfile <- system.file("extdata", "Megaplot.laz", package="lidR")
las_example <- readLAS(LASfile)
print(las_example)
```


In the same way you can read any LAS or LAZ files. You can find the lidar data provided by the National Land Survey of Finland/Maanmittauslaitos (NLSF/MML) ready to use in Puhti. For more information about this dataset (and other GIS datasets) see: [GIS data in Puhti](https://research.csc.fi/gis_data_in_csc_computing_env).

Once you know the path to a .las or .laz file in Puhti, you can open it with (note that the files are often over 100 Mb in size and will take a bit to load):
```{r  }
LASfile <- "/appl/data/geo/mml/laserkeilaus/2008_latest/2019/L331/1/L3313F3.laz"
las <- readLAS(LASfile)
print(las)
```

Notice that the memory use of the `las` object in R is 3Gb. For the rest of the exercise let's get a smaller version use one of lidR cliping functions:
```{r}
# Create sample from bbox
# lasclipRectangle(las, xleft, ybottom, xright, ytop)
las_small <- lasclipRectangle(las, las@bbox[1], las@bbox[2], las@bbox[1] + 500, las@bbox[2] + 500)
plot(las_small)
```

The clipped object covers 500 x 500 meters of the original 3000 x 3000 meters and needs only 100 Mb. We will use this LAS object for the rest of this exercise.


## Basic structure of a LAS object

You can verify the class of the object you just created with:
```{r }
class(las_small)
```

And the type of object ant its slots with:
```{r}
typeof(las_small)
slotNames(las_small)
```


A `LAS` object is composed of four slots: `@data`, `@header`, `@proj4string` and `@bbox`, and inherits `Spatial` from package `sp`.

Note that if you have an older version of lidR the list of slots  may be different. Check for your package version and try installing a different version if necessary (version 2.0.1. was used to create this exercise).

### @data: the point cloud

The slot `data` of a LAS object contains a `data.table` with the data read from .las or .laz file(s). The columns of the table are named after the [LAS specification](http://www.asprs.org/wp-content/uploads/2010/12/LAS_1_4_r13.pdf) version 1.4. Each name is reserved and is associated with a given type:

- `X` `Y` `Z` (dbl)
- `Intensity` (int)
- `gpstime` (dbl)
- `ReturnNumber` (int)
- `NumberOfReturns` (int)
- `ScanDirectionFlag` (int)
- `EdgeOfFlightline`(int)
- `Classification` (int)
- `Synthetic_flag` (bool)
- `Withheld_flag` (bool)
- `Keypoint_flag` (bool)
- `ScanAngle` (int)
- `UserData` (int)
- `PointSourceID` (int)
- `R` `G` `B` (int)
- `NIR` (int)

Here we can already see some deviations from the official las format specifications. For example, the attribute 'Classification' should be an `unsigned char` stored on 8 bits. However, the R language does not support this data type and consequently this attribute is stored in a 32-bit signed `int`. One can read the official las specifications to figure out the other deviations from the original file format induced by the fact that R only has 32-bit signed integers and 64-bit signed decimal numbers.

You can check this type of information from your data with commands like:
```{r}
names(las_small@data)
sapply(las_small@data, typeof)
sapply(las_small@data, class)
```


### @header: the header

A `LAS` object contains a slot `@header` that represents the header of the las file. The header is stored in a `LASheader`  object. A `LASheader` object contains two slots: `@PHB` for the public header block and `@VLR` for the variable length records. Both slots are lists labeled according to the las file format specification. See [public documentation of las file format](http://www.asprs.org/wp-content/uploads/2010/12/LAS_1_4_r13.pdf) for more information about las headers. Users should never normally have to worry about the header as long as they use functions from lidR. Everything is managed internally to ensure that objects are valid. However, users still need to know that the contents of the header are important, especially when writing `LAS` objects into las or laz files.

```{r}
las_small@header
```

### @proj4string: the CRS

The slot `@proj4string` is inherited from the `Spatial` class from the `sp` package. It is a `CRS` object that stores the coordinate reference system (CRS) of the las file. In the official las specifications the CRS is stored in the header. In a LAS object the CRS is stored in the header using the EPSG code of the CRS, but it is also stored in the slot `@proj4string`. This is to ensure it meets R standards and is in accordance with other spatial data packages in the R ecosystem. Consequently, to get a valid LAS object properly written into a las file it is important to set the CRS using the function `epsg()`. This function updates the header of the LAS object **and** the proj4string, while functions such as `raster::projection()` or `sp::proj4string` only update the slot `@proj4string`.

```{r}
epsg(las_small)
projection(las_small)
```

### @bbox: the bounding box

The slot `@bbox` is inherited from the `Spatial` class from `sp`. It is a `matrix` object that stores the XY bounding box of the point cloud. In the official las specifications the bounding box is stored in the header. In a `LAS` object the bounding box is stored both in the header and also stored in the slot `@bbox` (to be in compliance with R standards and other spatial data R packages). The user should never change the bounding box manually. However, doing that will have few consequences because this slot is of little practical use.

```{r}
las_small@bbox
```

## Validation of LAS objects

It is common that users report bugs arising from the fact that a point cloud is invalid. This is why we introduced the function `lascheck` to perform a deep inspection of LAS objects. This function checks if a LAS object is in accordance with the las specifications but also it checks for weird point clouds that could be valid with respect to the specifications but invalid for actual processing. For example, it often happens that a las file contains duplicated points for no valid reason. This may lead to trees being detected twice, to invalid metrics, or to errors in DTM generation, and so on...

```{r}
lascheck(las_small)
```

## Display a LAS object

lidR provides a simple `plot` function to plot a LAS object in 3D. It is based on the `rgl` package. The `rgl` package is amazing but has some problems working with large point clouds.

```{r}
plot(las_small)
```

lidR has [its own viewer](https://github.com/Jean-Romain/PointCloudViewer) to overcome this issue. This viewer is fully compatible with `lidR` but still in heavy development.

```{r}
plot(las_small, backend = "pcv")
```

## Memory considerations

This section is of major importance because there are many instances where R is weak at memory management.

Firstly, it is important to note that R only enables manipulation of 32-bit integers and 64-bit decimal numbers. But the las specification states, for example, that the intensity is stored on 16 bits (see previous sections). When read in R it must be converted to 32 bits and therefore will use twice as much memory than is needed. Worse, the return numbers are stored on 3 bits in las files but 32 bits in R, therefore using 11 times more memory than is required. Last but not least, flags are stored on 1 bit, whereas R uses 32 bits. This is 32 times more memory than is needed. As a consequence, a LAS object is 2 to 3 times larger than it needs to be.

Secondly, the way the point cloud is stored and the way R works implies that copies will be made of the point cloud either in the user's workspace or internally. Considering that point clouds can be huge it is important to be aware of this point.

There is more detail about memory considerations in the last part of this materials.

## Creating a Digital Terrain Model as an array job in Puhti
lidR has several tools and algorithms to manipulate and process lidar data. You can see them in the [Man pages for lidR](https://rdrr.io/github/Jean-Romain/lidR/man/).

The `grid_terrain`function creates a DTM by interpolating ground points. You can use three algorithms for this task `nnidw`, `tin` and `riging`, in the example below we  use the `tin` algorithm. The result is an R RasterLayer that you can manipulate in R or save it to a file whenever needed.

```{r}
dtm <- grid_terrain(las_small, algorithm = tin())
writeRaster(dtm, paste0("./outputs/", "dtm_las_small.tif"), format="GTiff", overwrite=TRUE)
```

Other functions work in a similar way. Look at the documentation and test the ones you find most useful.


Let's use the `grid_terrain` function to run an R script that we can run as an array job in Puhti. An array job is meant to run the same script on multiple files given for ex. as a list of files in a text file.

Array jobs need a list of filepaths to the LAZ files. Example of this kind of file is the file las_files.txt

Open the file `simple_lidR.R` and study it. It includes some parts that relate to the batch job you will need to send a request to Puhti.

Open also the Puhti batch job file `simple_lidR_arrayjob.sh`, it includes some explanations on the contents.

Once you are sure all the parameters are correctly set, **open a terminal connection to Puhti** (using Putty or you can also open it from NoMachine). Go to the project folder and send your batch job with:
```bash
sbatch simple_lidR_arrayjob.sh
```

The results are written to the `array_output` folder, check also the batch jobs' output and error text files.

## Additional exercise: Segment individual trees and compute metrics
This short exercise is adapted from the [Segment individual trees and compute metrics](https://github.com/Jean-Romain/lidR/wiki/Segment-individual-trees-and-compute-metrics) example by Jean-Romain. We will use our own dataset though.

### Classify ground points
`lasground` provides a several algorithm to classify ground points. Here we use the csf algorithm because it works well without need to tune the parameters.

```{r}
las_small <- lasground(las_small, csf())
plot(las_small, color = "Classification")
```

### Height normalize the dataset

We need to set the ground at 0. We could subtract the DTM to obtain ground points at 0 but here we won't use a DTM but we will rather interpolate each point exactly.

It is important to notice here that neither the ground classification nor the DTM interpolation where performed using a buffer around the region of interest.

```{r}
las_small <- lasnormalize(las_small, tin())
plot(las_small)
```

### Tree segmentation

There are several methods to segment the tree in lidR the following will use a watershed, that is far to be the best, but is good enough for this easy to segment example.

### Compute a canopy height model

In the next steps we will use an algorithm that requires an canopy height model. This step can be skipped if you chose an algorithm that performed the segmentation at the point cloud level. So, let's compute a digital surface model with the pit-free algorithm (see also canopy height models in lidR).

```{r}
algo = pitfree(thresholds = c(0,10,20,30,40,50), subcircle = 0.2)
chm  = grid_canopy(las_small, 0.5, algo)
plot(chm, col = height.colors(50))
```

Optionally we can smooth this CHM using the raster package

```{r}
# smoothing post-process (e.g. two pass, 3x3 median convolution)
ker <- matrix(1,3,3)
chm <- focal(chm, w = ker, fun = median)
chm <- focal(chm, w = ker, fun = median)

plot(chm, col = height.colors(50)) # check the image
```

### Segment the trees

The segmentation can be achieved with lastrees. Here I chose the watershed algorithm with a threshold of 4 meters. The point cloud has been updated and each point now has a number that refers to an individual tree (treeID). Points that not trees are assigned the id value NA.

```{r}
algo <- watershed(chm, th = 4)
las_small  <- lastrees(las_small, algo)

# remove points that are not assigned to a tree
trees <- lasfilter(las_small, !is.na(treeID))

plot(trees, color = "treeID", colorPalette = pastel.colors(100))
```

### Compute some metrics and hulls
```{r}
metric <- tree_metrics(las_small, .stdtreemetrics)
hulls  <- tree_hulls(las_small)
hulls@data <- dplyr::left_join(hulls@data, metric@data)

spplot(hulls, "Z")
```

### Deal with a raster

In the previous example, even if the segmentation is done using a canopy height model, the classification has been made on the point cloud. This is because lidR is a point cloud oriented library. But one may want to get the raster to work with rasters. In that case the function watershed can be used standalone:

```{r}
crowns <- watershed(chm, th = 4)()
plot(crowns, col = pastel.colors(100))
```

Once you are working with rasters the lidR package is not implied anymore. User can rely on the raster package for further analysis. For example:

```{r}
contour <- rasterToPolygons(crowns, dissolve = TRUE)

plot(chm, col = height.colors(50))
plot(contour, add = T)
```


# Some extra things to try at home

## Memory considerations - continuation

### Deep copies

Let's assume we have loaded a large las file that uses 1 GB of R memory.

```r
las.original <- readLAS("big_file.las")
```

Suppose we now want to remove a few outliers above 50 m. One can write the following:

```r
las.denoised <- lasfilter(las.original, Z < 50)
```

And the user now has two objects:

- `las.original` of size 1 GB
- `las.denoised` that is also 1 GB, because we only removed a dozen or so points out of millions.

This uses 2 GB of memory. This is how R works. When a vector is subsetted it is necessarily copied. We talk about **deep copies**. In regular data processing it rarely matters and this behavior is barely noticeable. Indeed, it is rare that data uses a lot of memory. But LiDAR datasets are often massive, and this necessitates that users must carefully consider memory usage to avoid running out of RAM.

### Shallow copies

In the previous example we showed a deep copy. A deep copy means that the point cloud is actually copied into the memory. A deep copy occurs when the number of points of the output is different from the number of points of the input. But many functions return the same number of point as the input. In such cases only **shallow copies** are made. For example, when classifying points into ground and non-ground:

```r
las.classified <- lasground(las.original, csf())
```

In this case the vectors that store the X Y Z coordinates as well as those that store the Intensity, ReturnNumber, NumberOfReturn and other attributes were not modified by the function. Only the contents of the 'Classification' attribute were modified. In this case `las.classified` and `las.original`, even though they are two different objects, share the same memory for X Y Z, and so on, but the attributes 'Classification' are different. In conclusion:

- `las.original` is of size 1 GB
- `las.classified` is also 1 GB.

But both together they are not equal to 2 GB, but ~1.1 GB because they share the same memory. The content of the original LAS object was shallow copied. An understanding of the concepts of deep and shallow copies is important for optimizing your scripts.

As we have seen, because of the way R is designed, lidR uses a large amount of memory anyway. To deal with this limitation `readLAS` has two optimizations: the parameter `select` and the parameter `filter`.

### Parameter `select`

To save memory only useful data can be loaded. `readLAS` can take an optional parameter `select` which enables the user to selectively load the data of interest. For example, one can load only the `X Y Z` fields. This selection is done at the C++ level while reading and is memory-optimized.

```r
las <- readLAS("file", select = "xyz")
las <- readLAS("file", select = "xyzi")
las <- readLAS("file", select = "* -i -u") # Negation works too
```

### Parameter `filter`

While `select` enables the user to select "columns" (or attributes) while reading files, `filter` allows selection of "rows" (or points) while reading. Again, the selection is done at the C++ level and is memory-optimized so not a single bit is lost at the R level. Removing data at reading time that is superfluous for your purposes saves memory and decreases computation time.

```r
las <- readLAS("file", filter = "-keep_first")
las <- readLAS("file", select = "xyzi", filter = "-keep_first -drop_z_below 5 -drop_z_above 50")
```

## Allowed and non-allowed manipulation of a LAS object

R users who are used to manipulating spatial data are likely to be very familiar with the `sp` package and all the classes used to store spatial data, such as `SpatialPointsDataFrame`, `SpatialPolygonsDataFrame`, and so on. The data contained in these classes are freely modifiable by the user because they can be of any type. A `LAS` object is not freely modifiable because it is a strongly standardized representation of a las file.

For example, users cannot replace the `Classification` attribute with the value `0` because `0` is a decimal number in R and the 'Classification' attribute is an integer. The following throws an error:

```{r, error = TRUE, purl = FALSE}
las_small$Classification <- 0
```

In R `0L` is an integer and thus the following is allowed:

```{r}
las_small$Classification <- 0L
```

It would be possible to automatically cast the input into the correct type without throwing an error. But for the lidR package we chose to be very pedantic on this point to avoid any potential problems and because we would prefer users to be careful about the content of their data.

The addition of a new column is also restricted. For example, one may want to add an attribute `R` corresponding to the red channel.
```{r, error = TRUE, purl = FALSE}
las_small$R <- 0
```

This is not allowed because a LAS object should always be valid. By allowing the user to add an R column the LAS object would no longer be valid for two reasons:

1. `R` is a reserved name of the core attributes and must be an integer. In the example above it is a decimal number.
2. A LAS file with RGB attributes is of type 3, 7 or 8. As a result the header must be updated, but in the previous example it is not.

In consequence, adding a column must be done via the functions `lasadddata` or `lasaddextrabytes`. This way users are forced to read the documentation of these two functions. And yet some restrictions are still in place. For example, the following is not allowed for the same reasons as above:

```{r, error = TRUE, purl = FALSE}
las_small <- lasadddata(las_small, 0, "R")
```

But anyway, R being R there is no way to completely restrict editing of objects. Users can always by-pass the restrictions to make LAS objects that are not strictly valid:

```{r}
las_small@data$R <- 0
```

```{r, echo = FALSE}
las_small@data$R <- NULL
```

In conclusion, a LAS object is not actually immutable but at least there are some restrictions to ensure that the user is aware that not everything is authorized.

## Extra attributes and extra bytes in a LAS object

As we have seen, a LAS object contains a core of attributes associated with reserved names in accordance with the las specifications. It is possible, however, to add more attributes to a LAS object even if they are not part of the core attributes imposed by the las specifications.

### Extra attributes

Extra attributes are just like adding a column in a regular table in R. One can freely modify the data using the function `lasadddata`. It is thus possible to add an attribute to a LAS object. For example, it is possible to attribute an ID to each point and use this value in subsequent code:

```{r}
las_small  <- lasadddata(las_small, 1:1468843, "ID")
las_small2 <- lasfilter(las_small, ID > 1000000)
plot(las_small2)
```

But it is important to understand that this attribute is invalid with respect to the las specifications. Thus it can be used at the R level but will not be written in a las file and thus will be lost at write time. Depending on the purpose of this attribute it may or may not be useful to be able to write this extra data. Most of the time the information is only useful at the R level but sometimes it might be appropriate to store the data in a file.

### Extra bytes attributes

The las specifications allow for storing extra attributes that are not part of the core attributes. but the way to do this is more complex. Basically it is called extra bytes attributes and it implies modification of the LAS object header to indicate that the contents of the file contains more than the core attributes. This is abstracted with the function `lasaddextrabytes`.

```{r}
las_small  <- lasaddextrabytes(las_small, 1:1468843, "ID", "An ID for each point")
```

Using this function, the header is updated according to the las specification and thus the extra bytes attributes can be written in the file. lidR supports up to 10 extra bytes attributes. The extra bytes attributes are limited to being of type numeric. Indeed, the las specifications do not allow for storing extra bytes attributes of type string or type boolean. Thus the following fails:

```{r, error = TRUE, purl = FALSE}
abc  <- sample(letters, 1468843, replace = TRUE)
las_small  <- lasaddextrabytes(las_small, abc, "ID", "An ID for each point")
```