
Conversation

@BENR0 (Collaborator) commented Jun 16, 2025

This implements the LI flash geometry product as proposed by ESSL as a composite, so it can easily be produced with Satpy.

This is a proof-of-concept PR. As discussed during the spring PCW, this is not really a composite but fits better with the concept of a "processor". It is currently still implemented under composites because, for a proof of concept, it did not make sense to add "processor" loading (similar to composites) to Satpy yet.

Things to note:

  • There are multiple things still to discuss, including:
    • How do we proceed with this new concept of "features" as geometries? Is it a good idea to introduce a "processor" concept as a more general approach to modifying data that does not fit into a composite output?
    • What does it imply for other functionality that might be needed (or nice to have) if Satpy is able to process/use geometries? Currently, to comply with Satpy's DataID and dataset access methods, this PR introduces a bare-bones geometry container that just wraps a geopandas GeoDataFrame.

I will add documentation later on, so just a few notes on how to use this in the meantime.
Together with the LI LGR files, you can simply load "acc_flash_geometry" and you will get the flash geometry product.
With the feature writer you can save this as an sqlite file or as GeoJSON, just by using the writer together with the corresponding file extension.
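
A minimal sketch of that workflow (the reader name "li_l2_nc" matches Satpy's existing LI L2 reader, but the "feature" writer name and the exact call pattern here are assumptions based on this description):

from satpy import Scene

scn = Scene(filenames=lgr_files, reader="li_l2_nc")
scn.load(["acc_flash_geometry"])
# the file extension selects the output format
scn.save_dataset("acc_flash_geometry", filename="flash_geom.json", writer="feature")    # GeoJSON
scn.save_dataset("acc_flash_geometry", filename="flash_geom.sqlite", writer="feature")  # sqlite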

The flash geometry algorithm implemented here already includes distance thresholding, with a default of 10 km (the value currently used by ESSL), to mitigate the issue that unreasonable connections between groups are sometimes made. The threshold can be changed through the Satpy config so it is easy to experiment with: either add a "flash_geom_distance_threshold" setting under "composites" in the Satpy config, or use a with-statement:

with satpy.config.set({"composites.flash_geom_distance_threshold": value}):
    ...
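
On the implementation side, the compositor can read the setting back through Satpy's donfig-based config, along these lines (a sketch; it assumes the 10 km default is stored in metres as 10000.0, which is not stated above):

import satpy

# fall back to the ESSL default if the user has not set anything
threshold = satpy.config.get("composites.flash_geom_distance_threshold", 10000.0)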

  • Tests added
  • Fully documented

@codecov (bot) commented Jun 16, 2025

Codecov Report

❌ Patch coverage is 16.90141% with 118 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.10%. Comparing base (6602e1e) to head (d7fb9f3).
⚠️ Report is 57 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| satpy/composites/lightning.py | 21.79% | 61 Missing ⚠️ |
| satpy/writers/oracle.py | 2.17% | 45 Missing ⚠️ |
| satpy/writers/feature.py | 40.00% | 9 Missing ⚠️ |
| satpy/vectorscene.py | 0.00% | 3 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3150      +/-   ##
==========================================
- Coverage   96.30%   96.10%   -0.21%     
==========================================
  Files         463      466       +3     
  Lines       58200    58828     +628     
==========================================
+ Hits        56048    56535     +487     
- Misses       2152     2293     +141     
| Flag | Coverage Δ |
| --- | --- |
| behaviourtests | 3.60% <0.00%> (-0.05%) ⬇️ |
| unittests | 96.19% <16.90%> (-0.21%) ⬇️ |


@coveralls (bot) commented

Pull Request Test Coverage Report for Build 15680470090

Details

  • 24 of 139 (17.27%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.2%) to 96.2%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| satpy/writers/feature.py | 6 | 15 | 40.0% |
| satpy/writers/oracle.py | 1 | 46 | 2.17% |
| satpy/composites/lightning.py | 17 | 78 | 21.79% |

Totals
Change from base Build 15633405589: -0.2%
Covered Lines: 56001
Relevant Lines: 58213

💛 - Coveralls

@gerritholl (Member) left a comment:


First small thoughts based on initial read-through. I know this is a draft PR so it's still being worked on. Biggest open tasks I see:

  • Is this dask-friendly or can it be made dask-friendly?
  • Unit tests.
  • Documentation.

import logging
from typing import Optional, Sequence

import geopandas as gpd
Member:

If this is an optional dependency, the import should probably be inside the processor implementation. Not sure how that would work for annotations, though.
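
A common pattern for exactly this (a sketch; the function name is hypothetical): defer the runtime import into the function body, and give type checkers the import behind TYPE_CHECKING, combined with from __future__ import annotations so the annotation is never evaluated at runtime:

from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    import geopandas as gpd  # seen by type checkers only


def build_flash_geometry(points) -> gpd.GeoDataFrame:
    # deferred import keeps geopandas an optional dependency
    import geopandas as gpd
    return gpd.GeoDataFrame(geometry=points)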

import geopandas as gpd
import numpy as np
import pandas as pd
import pyproj as pro
Member:

In other modules this is simply imported as pyproj.

pandas.DataFrame
References:
[1] https://gist.github.com/emiliom/87a6aa137610bf59787868943b406e8f
Member:

Is this reference correct? It links to a gist on holoviz by Emilio Mayorga and does not seem to have anything to do with Pieter Groenemeijer's flash geometry algorithm.

Collaborator (Author):

Indeed. I don't know how the wrong link got into this. I will try and find the correct one.

super().__init__(name, filename, base_dir, **kwargs)

def save_dataset(self, dataset, filename=None, layer_name=None, **kwargs):
"""Save the geodataframe to sqlite."""
Member:

It seems it writes either to sqlite or to JSON, correct?

Collaborator (Author):

Correct

def write_gdf_to_oracle(gdf, table_name, user, password, host, port, service_name):
"""Write geodataframe to oracle database.
The connection data like user/pw/host/port/service_name are taken from the config.
Member:

I don't see that they are taken from the config; rather, they are passed by the user to this function.

Collaborator (Author):

Yes, that is correct. I had to refactor a little from our custom chain and did not adapt the documentation.

@mraspaud (Member) commented:

We just had a meeting on this topic; here are my notes:

Pytroll LI geometry meeting

At the last PCW, Ro started implementing it.
It's still a bit rough.

Should satpy support geometries?

No reason why not.

(Geometries could also be used to crop/filter data?)

Right now there is a PR that hacks the geometries into Satpy. It can be written to a database.
The algorithm's input is points (an xarray.DataArray) and its output is multilines (a GeoDataFrame).
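
Those in/out types, as a minimal sketch (the grouping of points into per-flash line segments is the actual algorithm and is elided here; lines_per_flash and flash_ids are hypothetical placeholders):

import geopandas as gpd
from shapely.geometry import MultiLineString

# one MultiLineString per flash, built from that flash's point coordinates
geoms = [MultiLineString(segments) for segments in lines_per_flash]
gdf = gpd.GeoDataFrame({"flash_id": flash_ids}, geometry=geoms, crs="EPSG:4326")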

Wind barbs and lightning are other examples of vector data that could use the same processing path.

Datatree (considered as a future internal data structure for the Scene) would not be compatible with a dataframe.

Ideally, the Scene should be split into container vs processor.

We could have a VectorScene as an alternative to a RasterScene.

If the readers know what kind of data they produce, we could decide what scene to use.

Could we use the existing Scene and just add the vector functionality to it, so that the Scene would know when there is a need to rasterise?

As long as there is no need to combine geometries with rasters for now, we can start with a separate vectorscene object.

At the moment, the LI reader returns a "swath" which should really be a geometry: could we have a "get_vector_dataset" that the yaml reader (subclass?) could use?

DataFrames have experimental attributes (attrs).
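
For reference, pandas' DataFrame.attrs (which GeoDataFrame inherits) is a plain dict, so Satpy-style metadata could be attached like this; pandas marks attrs as experimental, so propagation through operations is not guaranteed:

gdf.attrs["platform_name"] = "MTG-I1"
gdf.attrs["start_time"] = start_time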

We agree on using GeoDataFrame (check attrs) and trying this new container (VectorScene). Try a vector product coming out directly from the reader.
We should not use the composite class for geometries.

A modifier is a good match for what we want to do with flash geometries, but it's not exactly the same. The logic of putting points together could also be a composite.

Discussion about composites vs. an algorithm that could potentially generate multiple products.

Start with the existing PR and try to adapt it.

@pnuu (Member) commented Oct 16, 2025

Some thoughts, which I also partially wrote on Slack, follow.

What if the writer handled the geometry creation?

  1. Create a writer called, let's say, li_shapes_to_geojson that includes the conversion from LFL/LGR raster data to geometries and then saves the file as GeoJSON.
  2. Extract the shape/geometry creation code to a new module.
  3. Create more writers that use the same shape/geometry creation code but handle the data so that it can be saved to whichever database or format is necessary.

The usage would be straightforward:

scn = Scene(...)
scn.load([something])
scn.save_dataset(something, filename="li.json", writer="li_shapes_to_geojson")

and for databases:

...
# connection address, credentials, table, format string, and whatever else is needed
postgis_settings = {...}
scn.save_dataset(something, writer="li_shapes_to_postgis", writer_kwargs=postgis_settings)

To limit the data to a specific area, there might be a need for some data reduction. The simplest to implement would be in the "compositor" used for "something", with the area definition given in the composite YAML definition. The more correct way would be to have a "resampler" that masks out pixels outside of the given area (a modified version of crop, I guess?), with the writer ignoring NaNs in its handling.
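
A sketch of the point-filtering part (assuming lons and lats hold the point coordinates and adef is a pyresample AreaDefinition; the helper name is hypothetical):

from pyproj import Transformer

def mask_outside_area(lons, lats, adef):
    """Return a boolean mask that is True for points inside the area's extent."""
    to_area = Transformer.from_crs("EPSG:4326", adef.crs, always_xy=True)
    x, y = to_area.transform(lons, lats)
    xmin, ymin, xmax, ymax = adef.area_extent
    return (x >= xmin) & (x <= xmax) & (y >= ymin) & (y <= ymax)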

Usage together with rasters in the same Scene? No idea 😅

I do something like this in my Trollflow2 plugin (see example config here) that writes point data to our PostGIS database. The data can be anything that can be described as point data, but I've used it for LI/LFL.

@pnuu (Member) commented Oct 16, 2025

Possibilities for the "use geometries with rasters loaded in the same Scene" case, with the above approach that has no VectorScene and where the geometries are handled as-is where needed:

  • add an option to the compositor to call the geometry creator and another "rasterizer"
    • very hacky, confusing, and duplicates things
  • add an enhancement that creates the geometries and then rasterizes the data (a rasterization sketch follows after this list)
    • to which area to rasterize?
    • it should be the final desired output area and resolution without any additional resampling/scaling, but this might not be known at this stage, depending on the generate=False/True setting
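
The rasterization step itself could look like this (a sketch using rasterio; it assumes the geometries are already in the area's CRS, and the burn value and dtype are arbitrary choices):

from affine import Affine
from rasterio.features import rasterize

def rasterize_to_area(gdf, adef):
    """Burn geometries into a 2D array matching a pyresample AreaDefinition."""
    xmin, ymin, xmax, ymax = adef.area_extent
    transform = Affine(adef.pixel_size_x, 0.0, xmin,
                       0.0, -adef.pixel_size_y, ymax)
    return rasterize(
        ((geom, 1) for geom in gdf.geometry),
        out_shape=(adef.height, adef.width),
        transform=transform,
        fill=0,
        dtype="uint8",
    )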

@pnuu (Member) commented Oct 16, 2025

> At the moment, the LI reader returns a "swath" which should really be a geometry: could we have a "get_vector_dataset" that the yaml reader (subclass?) could use?

If the reader for LFL and LGR data returned a geometry instead of a swath, a lot of the more traditional processing capabilities would be lost or made unnecessarily complicated, like creating heatmaps with resampler="bucket_sum".
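
For reference, that heatmap workflow as a sketch ("li_l2_nc" is Satpy's LI L2 reader and "bucket_sum" is an existing Satpy resampler, but the dataset name and target area here are only illustrative):

from satpy import Scene

scn = Scene(filenames=lfl_files, reader="li_l2_nc")
scn.load(["flash_radiance"])
# sum the point values falling into each cell of the target grid
heatmap = scn.resample("euro4", resampler="bucket_sum")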

@pnuu (Member) commented Oct 16, 2025

Pseudo-code for my approach would be something like:

class LightningPathGeometry:
    """Create lightning path geometries."""
    def __init__(self, data, ..., adef=None):
        ...
        self.adef = adef
        self._filter_by_area(adef)
        ...
    def _filter_by_area(self, adef):
        if self.adef:
            ...
        ...
    def generate_geometries(self):
        ...
    def to_geojson(self):
        if not self.geometries:
            self.generate_geometries()
        ...
    def to_raster(self):
        if not self.geometries:
            self.generate_geometries()
        ...

class GeoJsonWriter:
    def __init__(self, ..., processor):
        ...
    def save(self, filename, data, ..., adef=None):
        proc = self.processor(data, ..., adef=adef)
        gjson = proc.to_geojson()
        self._write(filename, gjson)

class GeometryRasterCompositor:
    def __init__(self, ..., processor):
        self.processor = processor
    def __call__(self, datasets, ..., adef=None):
        proc = self.processor(datasets, ..., adef=adef)
        raster_data = proc.to_raster()
        ...

Using the writer is like in my first braindump. For the raster case, the processor class would be defined in the composite YAML. Area definition handling for filtering and rasterization is still to be figured out.

@mraspaud (Member) commented:

> At the moment, the LI reader returns a "swath" which should really be a geometry: could we have a "get_vector_dataset" that the yaml reader (subclass?) could use?

> If the reader for LFL and LGR data returned a geometry instead of a swath, a lot of the more traditional processing capabilities would be lost or made unnecessarily complicated, like creating heatmaps with resampler="bucket_sum".

Yes, the resampling today acts as an implicit rasterising method.
The idea here is to have an explicit way of rasterising a VectorScene to get the corresponding RasterScene. Resampling will surely be one implementation of it.

This will allow a better separation of concerns, where the VectorScene does vector things, easily coupling it to dedicated vector writers for example, and thus not encumbering the existing (Raster)Scene. A workflow could be for example:

vscene = VectorScene(...)
vscene.load(...some LI stuff...)
vscene.save_datasets_to_database(<database_url>)  # allows writing without converting from raster/dataarray

rscene = vscene.rasterise(method="accumulate", area="fci_1km")
# possibly combine rscene with an fci scene for example
rscene.save_datasets()  # to images

One reason for splitting into another scene class is also that the current Scene class is already doing so much that (at least) I am really reluctant to add more functionality to it if we can avoid it.

@gerritholl (Member) commented:

Thanks @mraspaud for the write-up. As for:

> If the readers know what kind of data they produce, we could decide what scene to use

I think that would be the FileHandler, wouldn't it? The word "reader" is used for multiple things in Satpy.

@pnuu (Member) commented Oct 30, 2025

I'd say the data should be read directly as vectors if and only if the data are already vectors in the files being handled. At the moment, the only such data I know of is NWC SAF GEO RDT (Rapidly Developing Thunderstorms), which we don't have a reader for.

I don't consider point data such as LI/LFL to be vector data, nor atmospheric motion vectors if they are given as u- and v-components or as speed and direction. These are data that could already be put in a database (as an example) as-is, without any vectorization. In fact, if the data are read into a vector format, they might require deconstruction before being usable.

BENR0 and others added 5 commits November 10, 2025 15:51

  • Started a module containing a VectorScene class. So far no implementation and no tests. In fact, this commit serves mostly to test if I can push to Benjamin's branch ☺
  • Split the functionality shared with VectorScene off from Scene into a BaseScene class. Leaves anything related to resampling, areas, and composites in Scene. Just a first attempt; will very likely be further adjusted in later commits.