-
Notifications
You must be signed in to change notification settings - Fork 321
This adds proof of concept LI flash geometry as propose by ESSL as a composite including some writers #3150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3150 +/- ##
==========================================
- Coverage 96.30% 96.10% -0.21%
==========================================
Files 463 466 +3
Lines 58200 58828 +628
==========================================
+ Hits 56048 56535 +487
- Misses 2152 2293 +141
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Pull Request Test Coverage Report for Build 15680470090Details
💛 - Coveralls |
gerritholl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First small thoughts based on initial read-through. I know this is a draft PR so it's still being worked on. Biggest open tasks I see:
- Is this dask-friendly or can it be made dask-friendly?
- Unit tests.
- Documentation.
| import logging | ||
| from typing import Optional, Sequence | ||
|
|
||
| import geopandas as gpd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is an optional dependency, the import should probably be inside the processor implementation. Not sure how that would work for annotations, though.
| import geopandas as gpd | ||
| import numpy as np | ||
| import pandas as pd | ||
| import pyproj as pro |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other modules this is simply imported as pyproj.
| pandas.DataFrame | ||
| References: | ||
| [1] https://gist.github.com/emiliom/87a6aa137610bf59787868943b406e8f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this reference correct? It links to a gist on holoviz by Emilio Mayorga and does not seem to have anything to do with Pieter Groenemeijers flash geometry algorithm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I don't know how the wrong link got into this. I will try and find the correct one.
| super().__init__(name, filename, base_dir, **kwargs) | ||
|
|
||
| def save_dataset(self, dataset, filename=None, layer_name=None, **kwargs): | ||
| """Save the geodataframe to sqlite.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems it writes either to sqlite or to JSON, correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct
| def write_gdf_to_oracle(gdf, table_name, user, password, host, port, service_name): | ||
| """Write geodataframe to oracle database. | ||
| The connection data like user/pw/host/port/service_name are taken from the config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see that they are taken from the config, rather they are passed by the user to this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is correct. I had to refactor a little bit from our custom chain and did not adapt the documentation.
|
We just had a meeting on this topic, here are my notes: Pytroll LI geometry meetingLast PCW Ro started implementing it. Should satpy support geometries? No reason why not. (Geometries could also be used to crop/filter data ?) Right now there is a PR that hacks the geometries into satpy. It can be written to database. Windbarbs and Lightnings are another example of vector data that could use the same processing way. Datatree (thought of a future internal data structure for the Scene) would not be compatible with a dataframe. Ideally, the Scene should be split into container vs processor. We could have a VectorScene as an alternative to a RasterScene. If the readers know what kind of data they produce, we could decide what scene to use. Can we use the existing scene and just add the vector functionality to it, and the scene would know when there is a need to rasterise. As long as there is no need to combine geometries with rasters for now, we can start with a separate vectorscene object. At the moment, the LI reader returns a "swath" which should really be a geometry: could we have a "get_vector_dataset" that the yaml reader (subclass?) could use? Dataframes has experimental attributes. We agree on using GeoDataFrame (check attrs), try this new container (VectorScene). Try vector product coming out directly from the reader. A modifier is a good match for what we want to do with flash geometries, but it's not exactly the same. The logic of putting points together could also be a composite. Discussion about composites vs algorithm that could potentially generate multiple products. Start with the existing PR and try to adapt it. |
|
Some thoughts I also wrote partially to Slack follow. What if the writer handled the geometry creation?
The usage would be straightforward scn = Scene(...)
scn.load([something])
scn.save_dataset(something, filename="li.json", writer="li_shapes_to_geojson")and for databases ...
# connection address, credentials, table, format string, and whatever else is needed
postgis_settings = {...}
scn.save_dataset(something, writer="li_shapes_to_postgis", writer_kwargs=postgis_settings)To limit the data to a specific area there might be a need for some data reduction. The most simple to implement would be in the "compositor" that is used for Usage together with rasters in the same Scene? No idea 😅 I do something like this in my Trollflow2 plugin (see example config here) that writes point data to our PostGIS database. The data can be anything that can be described as point data, but I've used it for LI/LFL. |
|
Possibilities for the "use geometries with rasters loaded in the same
|
If the reader for LFL and LGR data returned a geometry instead of swath, a lot of the more traditional processing capabilities would be lost, or made unnecessarily complicated, like creating heatmaps with |
|
Pseudo-code for my approach would be something like: class LightningPathGeometry:
"""Create lightning path geometries."""
def __init__(self, data, ..., adef=None):
...
self.adef = adef:
self._filter_by_area(adef)
...
def _filter_by_area(self, adef):
if self.adef:
...
...
def generate_geometries(self):
...
def to_geojson(self):
if not self.geometries:
self.generate_geometries()
...
def to_raster(self):
if not self.geometries:
self.generate_geometries()
...
class GeoJsonWriter:
def __init__(self, ..., processor):
...
def save(self, filename, data, ..., adef=None):
proc = processor(data, ..., adef=adef)
gjson = proc.to_geojson()
self._write(filename, gjson)
def GeometryRasterCompositor:
def __init__(self, ..., processor):
self.processor = processor
def __call__(self, datasets, ...):
proc = self.processor(datasets, ..., adef=adef)
raster_data = proc.to_raster()
...Using the writer is like in my first braindump. For raster case, the processor class would be defined in the composite YAML. Area definition handling for filtering and rasterization is to be figured out. |
Yes, the resampling today acts as an implicit rasterising method. This will allow a better separation of concerns, where the VectorScene does vector things, easily coupling it to dedicated vector writers for example, and thus not encumbering the existing (Raster)Scene. A workflow could be for example: vscene = VectorScene(...)
vscene.load(...some LI stuff...)
vscene.save_datasets_to_database(<database_url>) # allows writing without converting from raster/dataarray
rscene = vscene.rasterise(method="accumulate", area="fci_1km")
# possibly combine rscene with an fci scene for example
rscene.save_datasets() # to imagesOne reason also for splitting into another scene class is that the current Scene class is already doing so much that (at least) I am really reluctant on adding more functionality to it if we can avoid it. |
|
Thanks @mraspaud for the write-up. As for:
I think that would be the FileHandler, wouldn't it? The word reader is used for multiple things in satpy, |
|
I'd say the data should be read directly as vector only and only if the data are vectors already in the files being handled. At the moment I think the only such data I know of is NWC SAF GEO RDT (Rapidly Developing Thunderstorms), which we don't have a reader for. I don't consider point data such as LI/LFL as vector data, nor atmospheric motion vectors if they are given as u- and v- components or as speed and direction. These are already data that could be put in a database (as an example) as-is without any vectorization. In fact if the data are read to a vector format, it might require deconstruction before being usable. |
Started a module containing a VectorScene class. So far no implementation and no tests. In fact this commit serves mostly to test if I can push to benjamins branch ☺
… feat_geometry_processor
Split off from the Scene functionality shared with VectorScene into a BaseScene class. Leaves anything related to resampling, areas, composites in Scene. Just a first attempt, will very likely be further adjusted in later commits.
This implements the LI flash geometry product as proposed by ESSl as a composite so it can easily produces with Satpy.
This is a proof of concept PR. As discussed during the spring PCW this is not really a composite but fits more in line with a concept of a "processor". Currently this is still implemented under composites just because for a proof of concept it did not make sense to add a "processor" loading similiar to the composites to Satpy yet.
Things to note:
DatyIDand Dataset access methods this PR introduces a bare bones geometry container just wrapping a geopandas geodataframe.I will add some documentation later on, so just a few notes on how to use this in the mean time.
Together with the LI LGR files you can just load "acc_flash_geometry" and you will get the flash geometry product.
With the feature writer you can save this as a sqlite file or as geojson by just using the writer together with the corresponding file extension.
The flash geometry algorithm implemented here is already including thresholding with a default of 10km (currently also used by ESSL) to mitigate the issue that sometimes unreasonable connections between groups are made. This threshold is adaptable by using the satpy config in order to be easily be played around with. Just include a new setting of "flash_geom_distance_treshold" under "composites" in the Satpy config or use a:
statement