
Commit e102738

Merge pull request #55 from IGNF/dev
Integration of Entropy in decision process
2 parents: 27d048d + e23043b

22 files changed: +271 -2529 lines

.github/workflows/cicd.yaml

Lines changed: 19 additions & 8 deletions
@@ -25,17 +25,28 @@ jobs:
       run: docker run lidar_prod_im pytest --ignore=actions-runner --ignore="notebooks"
 
     - name: Full module run on LAS subset
-      run: docker run -v /var/data/cicd/CICD_github_assets:/CICD_github_assets lidar_prod_im
-
-    - name: Evaluate decisions using optimization code on a single, corrected LAS
       run: >
-        docker run -v /var/data/cicd/CICD_github_assets:/CICD_github_assets lidar_prod_im
-        python lidar_prod/run.py print_config=true +task='optimize'
+        docker run
+        -v /var/data/cicd/CICD_github_assets/M8.4/inputs/:/inputs/
+        -v /var/data/cicd/CICD_github_assets/M8.4/outputs/:/outputs/ lidar_prod_im
+        python lidar_prod/run.py
+        print_config=true
+        paths.src_las=/inputs/730000_6360000.subset.prototype_format202.las
+        paths.output_dir=/outputs/
+
+    - name: Evaluate decisions using optimization task (debug mode, on a single, corrected LAS)
+      run: >
+        docker run
+        -v /var/data/cicd/CICD_github_assets/M8.4/inputs/evaluation/:/inputs/
+        -v /var/data/cicd/CICD_github_assets/M8.4/outputs/evaluation/:/outputs/ lidar_prod_im
+        python lidar_prod/run.py
+        print_config=true
+        +task='optimize'
         +building_validation.optimization.debug=true
         building_validation.optimization.todo='prepare+evaluate+update'
-        building_validation.optimization.paths.input_las_dir=/CICD_github_assets/M8.0/20220204_building_val_V0.0_model/20211001_buiding_val_val/
-        building_validation.optimization.paths.results_output_dir=/CICD_github_assets/opti/
-        building_validation.optimization.paths.building_validation_thresholds_pickle=/CICD_github_assets/M8.3B2V0.0/optimized_thresholds.pickle
+        building_validation.optimization.paths.input_las_dir=/inputs/
+        building_validation.optimization.paths.results_output_dir=/outputs/
+        building_validation.optimization.paths.building_validation_thresholds_pickle=/inputs/optimized_thresholds.pickle
 
     - name: clean the server for further uses
       if: always() # always do it, even if something failed

dockerfile renamed to Dockerfile

Lines changed: 8 additions & 11 deletions
@@ -14,7 +14,6 @@ RUN apt-get update && apt-get upgrade -y && apt-get install -y \
     wget \
     git \
     postgis \
-    pdal \
     libgl1-mesa-glx libegl1-mesa libxrandr2 libxrandr2 libxss1 libxcursor1 libxcomposite1 libasound2 libxi6 libxtst6 # packages needed for anaconda
 
 # install anaconda
@@ -38,17 +37,15 @@ SHELL ["conda", "run", "-n", "lidar_prod", "/bin/bash", "-c"]
 RUN echo "Make sure pdal is installed:"
 RUN python -c "import pdal"
 
-# the entrypoint garanty that all command will be runned in the conda environment
-ENTRYPOINT ["conda", \
-    "run", \
-    "-n", \
+# the entrypoint guarantees that all commands will be run in the conda environment
+ENTRYPOINT ["conda", \
+            "run", \
+            "-n", \
     "lidar_prod"]
 
 # cmd for a normal run (non evaluate)
-CMD ["python", \
-    "lidar_prod/run.py", \
+CMD ["python", \
+     "lidar_prod/run.py", \
     "print_config=true", \
-    "paths.src_las=/CICD_github_assets/M8.0/20220204_building_val_V0.0_model/subsets/871000_6617000_subset_with_probas.las", \
-    "paths.output_dir=/CICD_github_assets/app/", \
-    "data_format.codes.building.candidates=[202]", \
-    "building_validation.application.building_validation_thresholds_pickle=/CICD_github_assets/M8.3B2V0.0/optimized_thresholds.pickle"]
+    "paths.src_las=your_las.las", \
+    "paths.output_dir=./path/to/outputs/"]

README.md

Lines changed: 19 additions & 12 deletions
@@ -35,23 +35,29 @@ Goal: Confirm or refute groups of candidate building points when possible, mark
 
 1) Clustering of _candidate buildings points_ into connected components.
 2) Point-level decision
-   1) Decision at the point-level based on probabilities: `confirmed` if p>=`C1` / `refuted` if (1-p)>=`R1`
-   2) Identification of points that are `overlayed` by a building vector from the database.
+   1) Identification of points with ambiguous probability: `high entropy` if entropy $\geq$ `E1`
+   2) Identification of points that are `overlayed` by a building vector from the database.
+   3) Decision at the point-level based on probabilities:
+      1) `confirmed` if:
+         1) p $\geq$ `C1`, or
+         2) `overlayed` and p $\geq$ `C1` * `Cr`, where `Cr` is a relaxation factor that reduces the confidence required to confirm a point that is overlayed by a building vector.
+      2) `refuted` if (1-p) $\geq$ `R1`
 3) Group-level decision:
-   1) Confirmation: if proportion of `confirmed` points >= `C2` OR if proportion of `overlayed` points >= `O1`
-   2) Refutation: if proportion of `refuted` points >= `R2` AND proportion of `overlayed` points < `O1`
-   3) Uncertainty: elsewise.
+   1) Uncertain due to high entropy: if proportion of `high entropy` points $\geq$ `E2`
+   2) Confirmation: if proportion of `confirmed` points $\geq$ `C2` OR if proportion of `overlayed` points $\geq$ `O1`
+   3) Refutation: if proportion of `refuted` points $\geq$ `R2` AND proportion of `overlayed` points < `O1`
+   4) Uncertainty: otherwise (this is a safeguard: uncertain groups should already have been captured via their entropy)
4) Update of the point cloud classification
 
-Decision thresholds `C1`, `C2`, `R1`, `R2`, `O1` are chosen via a multi-objective hyperparameter optimization that aims to maximize automation, precision, and recall of the decisions. Right now we have automation=90%, precision=98%, recall=98% on a validation dataset. Illustration comes from older version.
+Decision thresholds `E1`, `E2`, `C1`, `C2`, `R1`, `R2`, `O1` are chosen via a multi-objective hyperparameter optimization that aims to maximize automation, precision, and recall of the decisions. Right now we have automation=91%, precision=98.5%, recall=98.1% on a validation dataset. The illustration below comes from an older version.
 
 ![](assets/img/LidarBati-BuildingValidationM7.1V2.0.png)
 
 #### B) Building Completion
 
 Goal: Confirm points that were too isolated to make up a group but have high-enough probability nevertheless (e.g. walls)
 
-Identify _candidate buildings points_ that have not been clustered in previous step due AND have high enough probability (p>=0.5)).
+Among _candidate buildings points_ that were not clustered in the previous step, identify those which nevertheless meet the requirements to be `confirmed`.
 Cluster them together with previously confirmed building points in a relaxed, vertical fashion (higher tolerance, XY plan).
 For each cluster, if some points were confirmed, the others are considered to belong to the same building, and are
 therefore confirmed as well.
@@ -63,7 +69,9 @@ therefore confirmed as well.
 
 Goal: Highlight potential buildings that were missed by the rule-based algorithm, for human inspection.
 
-Clustering of points that have a probability of beind a building p>=`C1` AND are **not** _candidate buildings points_. This clustering defines a LAS extra dimensions (default name `Group`).
+Among points that were **not** _candidate buildings points_, identify those which meet the requirements to be `confirmed`, and cluster them.
+
+This clustering defines a LAS extra dimension (`Group`) which indexes newly found clusters that may be missed buildings.
 
 ![](assets/img/LidarBati-BuildingIdentification.png)
 
@@ -100,7 +108,7 @@ To run the module from anywhere, you can install it as a package in your virtual
 conda activate lidar_prod
 
 # install the package
-pip install --upgrade https://github.com/IGNF/lidar-prod-quality-control/tarball/main # from github directly
+pip install --upgrade https://github.com/IGNF/lidar-prod-quality-control/tarball/prod # from github directly, using the production branch
 pip install -e . # from local sources
 ```
 
@@ -153,13 +161,12 @@ conda activate lidar_prod
 python lidar_prod/run.py +task=optimize building_validation.optimization.todo='prepare+evaluate+update' building_validation.optimization.paths.input_las_dir=[path/to/labelled/test/dataset/] building_validation.optimization.paths.results_output_dir=[path/to/save/results] building_validation.optimization.paths.building_validation_thresholds_pickle=[path/to/optimized_thresholds.pickle]
 ```
 
-### CICD, Releases and versions
+### CICD and versions
 
 New features are staged in the `dev` branch, and the CICD workflow is run when a pull request to merge is created.
 In Actions, check the output of a full evaluation on a single LAS to spot potential regressions. The app is also run
 on a subset of a LAS, which can be visually inspected before merging - there can always be surprises.
 
 Package version follows semantic versioning conventions and is defined in `setup.py`.
 
-Releases are generated when new high-level functionnality are implemented (e.g. a new step in the production process) or
-when key parameters are changed. Generally speaking, the latest release `Vx.y.z` is the one to use in production.
+Releases are generated when new high-level functionalities are implemented (e.g. a new step in the production process), and serve a documentation role. Production-ready code is fast-forwarded to the `prod` branch when needed.
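For intuition on the new `high entropy` criterion above: a point's entropy is high when the model's class probabilities are spread out rather than peaked on one class. A minimal NumPy sketch of the idea (illustrative only: in production the `entropy` dimension is produced upstream by the deep learning model, and `E1` corresponds to `min_entropy_uncertainty` in the application config):

```python
import numpy as np

def shannon_entropy(probas: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-point Shannon entropy of class probabilities (shape: n_points x n_classes)."""
    p = np.clip(probas, eps, 1.0)
    return -np.sum(p * np.log(p), axis=1)

# Two points: one confidently classified, one ambiguous.
probas = np.array([[0.98, 0.01, 0.01],
                   [0.40, 0.35, 0.25]])

E1 = 0.88  # illustrative value of the min_entropy_uncertainty threshold
high_entropy = shannon_entropy(probas) >= E1  # -> [False, True]
```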

bash/setup_environment/requirements.yml

Lines changed: 2 additions & 1 deletion
@@ -11,7 +11,8 @@ dependencies:
   - isort # import sorting
   - flake8 # code analysis
   # --------- geo --------- #
-  - conda-forge:python-pdal
+  - conda-forge:pdal==2.3.*
+  - conda-forge:python-pdal==3.0.*
   - conda-forge:laspy==2.1.*
   - numpy
   - scikit-learn

configs/building_validation/application/default.yaml

Lines changed: 8 additions & 6 deletions
@@ -22,9 +22,11 @@ bd_uni_request:
 
 # TODO: update min_frac_confirmation_factor_if_bd_uni_overlay and others after optimization...
 thresholds:
-  min_confidence_confirmation: 0.697
-  min_frac_confirmation: 0.384
-  min_frac_confirmation_factor_if_bd_uni_overlay: 0.808
-  min_uni_db_overlay_frac: 0.508
-  min_confidence_refutation: 0.973
-  min_frac_refutation: 0.285
+  min_confidence_confirmation: 0.6400365762003571 # min proba to validate a point
+  min_frac_confirmation: 0.779844069887882 # min fraction of confirmed points per group for confirmation
+  min_frac_confirmation_factor_if_bd_uni_overlay: 0.5894477997785892 # relaxation factor applied to min proba when a point is under a BDUni vector
+  min_uni_db_overlay_frac: 0.5041941489707767 # min fraction of points under a BDUni vector per group for confirmation
+  min_confidence_refutation: 0.7477148092712739 # min proba to refute a point
+  min_frac_refutation: 0.7979734453001499 # min fraction of refuted points per group for refutation
+  min_entropy_uncertainty: 0.884546947499147 # min entropy to flag a point as uncertain
+  min_frac_entropy_uncertain: 0.7271206406484895 # min fraction of high-entropy points per group to flag the group as uncertain
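To see how these values are meant to combine, here is a simplified sketch of the group-level decision described in the README, written against this `thresholds` block (an illustration only, not the actual `BuildingValidator` code; it assumes `thresholds` sits at the root of this file and that per-point probabilities, entropies, and BDUni overlay flags are available as NumPy arrays):

```python
import numpy as np
from omegaconf import OmegaConf

t = OmegaConf.load("configs/building_validation/application/default.yaml").thresholds

def group_decision(probas: np.ndarray, entropies: np.ndarray, overlays: np.ndarray) -> str:
    """Decide the fate of one group of candidate points, in the documented order.

    overlays is a boolean mask of points under a BDUni building vector.
    """
    # Point-level flags
    relaxed = t.min_confidence_confirmation * t.min_frac_confirmation_factor_if_bd_uni_overlay
    confirmed = (probas >= t.min_confidence_confirmation) | (overlays & (probas >= relaxed))
    refuted = (1 - probas) >= t.min_confidence_refutation
    uncertain = entropies >= t.min_entropy_uncertainty

    # Group-level decision, entropy safeguard first
    if uncertain.mean() >= t.min_frac_entropy_uncertain:
        return "unsure (high entropy)"
    if confirmed.mean() >= t.min_frac_confirmation or overlays.mean() >= t.min_uni_db_overlay_frac:
        return "confirmed"
    if refuted.mean() >= t.min_frac_refutation and overlays.mean() < t.min_uni_db_overlay_frac:
        return "refuted"
    return "unsure"
```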

configs/building_validation/optimization/default.yaml

Lines changed: 3 additions & 3 deletions
@@ -33,10 +33,10 @@ study:
   directions: ["maximize","maximize","maximize"]
   sampler:
     _target_: optuna.samplers.NSGAIISampler
-    population_size: 30
+    population_size: 50
     mutation_prob: 0.25
-    crossover_prob: 0.8
-    swapping_prob: 0.5
+    crossover_prob: 0.1
+    swapping_prob: 0.1
     seed: 12345
     constraints_func:
       _target_: functools.partial
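For reference, the `sampler` block above is what Hydra instantiates into an Optuna NSGA-II sampler. A standalone sketch with a dummy objective in place of the real threshold-evaluation task:

```python
import optuna

def objective(trial: optuna.trial.Trial):
    # Dummy stand-in for the real objective, which evaluates decision thresholds
    # on a labelled dataset and returns (automation, precision, recall).
    c1 = trial.suggest_float("min_confidence_confirmation", 0.0, 1.0)
    r1 = trial.suggest_float("min_confidence_refutation", 0.0, 1.0)
    automation = 1.0 - abs(c1 - 0.64)
    precision = 1.0 - abs(r1 - 0.75)
    recall = 0.5 * (automation + precision)
    return automation, precision, recall

sampler = optuna.samplers.NSGAIISampler(
    population_size=50,  # raised from 30 in this commit
    mutation_prob=0.25,
    crossover_prob=0.1,  # lowered from 0.8
    swapping_prob=0.1,   # lowered from 0.5
    seed=12345,
)
study = optuna.create_study(directions=["maximize", "maximize", "maximize"], sampler=sampler)
study.optimize(objective, n_trials=100)
```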

configs/data_format/cleaning/default.yaml

Lines changed: 0 additions & 7 deletions
This file was deleted.

configs/data_format/default.yaml

Lines changed: 25 additions & 10 deletions
@@ -5,35 +5,50 @@ crs: 2154
 # Those names connect the logics between successive tasks
 las_dimensions:
   # input
-  classification: classification #las format
+  classification: classification # las format
+
+  # Extra dims
+  # ATTENTION: if extra dimensions are added, you may want to add them to the cleaning.input parameter as well.
   ai_building_proba: building # user-defined - output by deep learning model
+  entropy: entropy # user-defined - output by deep learning model
 
-  # intermediary channels
+  # Intermediary channels
   cluster_id: ClusterID # pdal-defined -> created by clustering operations
   uni_db_overlay: BDTopoOverlay # user-defined -> a 0/1 flag for presence of a BDUni vector
   candidate_buildings_flag: F_CandidateB # -> a 0/1 flag identifying candidate buildings found by rules-based classification
   ClusterID_candidate_building: CID_CandidateB # -> cluster index from BuildingValidator, 0 if no cluster, 1-n otherwise
   ClusterID_isolated_plus_confirmed: CID_IsolatedOrConfirmed # -> cluster index from BuildingCompletor, 0 if no cluster, 1-n otherwise
 
-
-  # additionnal output channel
+  # Additional output channel
   ai_building_identified: Group
 
+cleaning:
+  input:
+    _target_: lidar_prod.tasks.cleaning.Cleaner
+    extra_dims:
+      - "${data_format.las_dimensions.ai_building_proba}=float"
+      - "${data_format.las_dimensions.entropy}=float"
+  output:
+    # Extra dims that are kept when cleaning dimensions.
+    # You can override with "all" to keep all extra dimensions at development time.
+    _target_: lidar_prod.tasks.cleaning.Cleaner
+    extra_dims:
+      - "${data_format.las_dimensions.ai_building_identified}=uint"
+      - "${data_format.las_dimensions.ai_building_proba}=float"
+
 codes:
   building:
     candidates: [202] # found by rules-based classification (TerraScan)
     detailed: # used for detailed output when doing threshold optimization
+      unsure_by_entropy: 200 # unsure (based on entropy)
       unclustered: 202 # refuted
       ia_refuted: 110 # refuted
-      ia_refuted_and_db_overlayed: 111 # unsure
-      both_unsure: 112 # unsure
+      ia_refuted_but_under_db_uni: 111 # unsure
+      both_unsure: 112 # unsure (otherwise)
       ia_confirmed_only: 113 # confirmed
       db_overlayed_only: 114 # confirmed
       both_confirmed: 115 # confirmed
     final: # used at the end of the building process
       unsure: 214 # unsure
       not_building: 208 # refuted
-      building: 6 # confirmed
-
-defaults:
-  - cleaning: default.yaml
+      building: 6 # confirmed
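The two new `cleaning` entries are regular Hydra-instantiable configs: `hydra.utils.instantiate` turns each into a `Cleaner` and runs it on a LAS file (see `lidar_prod/application.py` below). A minimal sketch of the equivalent direct calls, assuming the `Cleaner(extra_dims=...)` constructor implied by the `_target_`/`extra_dims` keys, with the dimension names resolved from `las_dimensions`:

```python
from lidar_prod.tasks.cleaning import Cleaner

# Mirrors data_format.cleaning.input: keep only the two model outputs
# ("building" probability and "entropy") before processing.
input_cleaner = Cleaner(extra_dims=["building=float", "entropy=float"])
input_cleaner.run("your_las.las", "/tmp/cleaned_input.las")

# Mirrors data_format.cleaning.output: keep only the "Group" index
# and the "building" probability in the final LAS.
output_cleaner = Cleaner(extra_dims=["Group=uint", "building=float"])
output_cleaner.run("/tmp/cleaned_input.las", "outputs/your_las.las")
```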

flake8_output.txt

Lines changed: 0 additions & 42 deletions
This file was deleted.

lidar_prod/application.py

Lines changed: 14 additions & 6 deletions
@@ -29,23 +29,31 @@ def apply(config: DictConfig):
 
     """
     assert os.path.exists(config.paths.src_las)
-    in_f = config.paths.src_las
-    out_f = osp.join(config.paths.output_dir, osp.basename(in_f))
+    IN_F = config.paths.src_las
+    OUF_F = osp.join(config.paths.output_dir, osp.basename(IN_F))
 
     with TemporaryDirectory() as td:
         # Temporary LAS file for intermediary results.
-        temp_f = osp.join(td, osp.basename(in_f))
+        temp_f = osp.join(td, osp.basename(IN_F))
 
+        # Removes unnecessary input dimensions to reduce memory usage
+        cl: Cleaner = hydra.utils.instantiate(config.data_format.cleaning.input)
+        cl.run(IN_F, temp_f)
+
+        # Validate buildings (unsure/confirmed/refuted) on a per-group basis.
         bv: BuildingValidator = hydra.utils.instantiate(
             config.building_validation.application
         )
-        bv.run(in_f, temp_f)
+        bv.run(temp_f, temp_f)
 
+        # Complete buildings with non-candidates that were nevertheless confirmed
         bc: BuildingCompletor = hydra.utils.instantiate(config.building_completion)
         bc.run(temp_f, temp_f)
 
+        # Define groups of confirmed building points among non-candidates
         bi: BuildingIdentifier = hydra.utils.instantiate(config.building_identification)
         bi.run(temp_f, temp_f)
 
-        cl: Cleaner = hydra.utils.instantiate(config.data_format.cleaning)
-        cl.run(temp_f, out_f)
+        # Remove unnecessary intermediary dimensions
+        cl: Cleaner = hydra.utils.instantiate(config.data_format.cleaning.output)
+        cl.run(temp_f, OUF_F)
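Putting it together, the refactored `apply()` above can be driven programmatically the same way `lidar_prod/run.py` drives it from the CLI. A sketch (the `configs` path and `config_name="config"` are assumptions about the repo layout, not shown in this diff):

```python
from hydra import compose, initialize

from lidar_prod.application import apply

# Equivalent in spirit to:
#   python lidar_prod/run.py paths.src_las=your_las.las paths.output_dir=./outputs/
with initialize(config_path="configs"):  # assumed location of the Hydra configs
    cfg = compose(
        config_name="config",  # assumed root config name
        overrides=["paths.src_las=your_las.las", "paths.output_dir=./outputs/"],
    )

apply(cfg)  # clean input -> validate -> complete -> identify -> clean output
```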
