Prefect 3.0 Flow: Convert Zarr Files to NetCDF4 and Store in Blob Storage
Implement a Prefect flow that processes and concatenates xarray datasets stored as Zarr stores in a specified Azure Blob Storage container. The flow will:
- Identify individual file IDs from the container directory structure.
- Open both normal and denoised Zarr files using existing functions.
- Retrieve additional metadata from the database for each file ID.
- Concatenate the datasets while preserving metadata.
- Convert the concatenated dataset to NetCDF4 format.
- Optionally create echograms of the concatenated dataset (both normal and denoised) and compute MVBS and NASC.
- Store the NetCDF4 file in an output Blob Storage container and generate an access link.
The flow will be served as a deployment with the following parameters:
    load_and_process_files.serve(
        name='convert-to-netcdf',
        parameters={
            'cruise_id': 'example_cruise',
            'load_from_blobstorage': True,
            'get_list_from_db': False,
            'start_datetime': None,
            'end_datetime': None,
            'source_container': 'input-zarr-container',
            'save_to_blobstorage': True,
            'output_container': 'output-netcdf-container',
            'save_to_directory': False,
            'output_directory': '',
            'plot_echograms': False,
            'compute_nasc': False,
            'compute_mvbs': False,
            'chunks_ping_time': 500,
            'chunks_range_sample': 500,
            'batch_size': BATCH_SIZE
        }
    )

Workflow Steps
1. Retrieve List of File IDs from Container
- List all folders under `{cruise_id}/`.
- Extract `{individual_file_id}` from folder names.
- Identify the presence of both `{individual_file_id}.zarr` and `{individual_file_id_denoised}.zarr`.
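The listing step above can be sketched in plain Python, independent of the Azure SDK. This is a minimal sketch under assumptions the issue does not confirm: that blobs are laid out as `{cruise_id}/{file_id}/{file_id}.zarr/...`, that the denoised store is named `{file_id}_denoised.zarr`, and that blob names arrive as a flat list of strings. The helper name `find_file_ids` is hypothetical.

```python
from pathlib import PurePosixPath

# Assumed naming convention for denoised stores; adjust to the real layout.
DENOISED_SUFFIX = "_denoised"


def find_file_ids(blob_names, cruise_id):
    """Map each file ID under `cruise_id/` to the store variants found for it."""
    found = {}
    for name in blob_names:
        parts = PurePosixPath(name).parts
        # Expect at least: cruise_id / file_id / <store>.zarr
        if len(parts) < 3 or parts[0] != cruise_id:
            continue
        file_id, store = parts[1], parts[2]
        variants = found.setdefault(file_id, {"normal": False, "denoised": False})
        if store == f"{file_id}.zarr":
            variants["normal"] = True
        elif store == f"{file_id}{DENOISED_SUFFIX}.zarr":
            variants["denoised"] = True
    return found


# Illustrative blob names only; they are not taken from a real container.
blobs = [
    "example_cruise/D20240101-T000000/D20240101-T000000.zarr/.zgroup",
    "example_cruise/D20240101-T000000/D20240101-T000000_denoised.zarr/.zgroup",
    "example_cruise/D20240101-T010000/D20240101-T010000.zarr/.zgroup",
    "other_cruise/X/X.zarr/.zgroup",
]
result = find_file_ids(blobs, "example_cruise")
# Keep only IDs for which both the normal and the denoised store exist.
complete = sorted(fid for fid, v in result.items() if v["normal"] and v["denoised"])
```

Returning the per-variant flags, rather than only the complete IDs, lets the flow log or skip files whose denoised store is missing.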
2. Retrieve Metadata from Database
- Extend `FileSegmentService` to fetch metadata for each file, including:
  - `location`
  - `file_name`
  - `id`
  - `location_data`
  - `file_freqs`
  - `file_start_time`
  - `file_end_time`
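One possible shape for the fetched record is sketched below. The field names come from the list above, but the types, the `FileMetadata` class, and the `to_netcdf_attrs` helper are assumptions for illustration, not part of `FileSegmentService`. The conversion matters because NetCDF attributes hold strings and numbers, so datetimes and nested structures need to be serialized before they are attached to a dataset.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class FileMetadata:
    # Field names follow the issue; every type here is an assumption.
    location: str
    file_name: str
    id: int
    location_data: dict
    file_freqs: str
    file_start_time: datetime
    file_end_time: datetime


def to_netcdf_attrs(meta):
    """Flatten a metadata record into values that are safe as NetCDF attributes."""
    attrs = {}
    for key, value in asdict(meta).items():
        if isinstance(value, datetime):
            attrs[key] = value.isoformat()
        elif isinstance(value, (dict, list)):
            attrs[key] = json.dumps(value, sort_keys=True)
        else:
            attrs[key] = value
    return attrs


# Illustrative values only.
meta = FileMetadata(
    location="example-location",
    file_name="D20240101-T000000.raw",
    id=1,
    location_data={"lat": 1.5, "lon": 2.5},
    file_freqs="38000,200000",
    file_start_time=datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    file_end_time=datetime(2024, 1, 1, 1, 0, tzinfo=timezone.utc),
)
attrs = to_netcdf_attrs(meta)
```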
3. Load Zarr Datasets
- Use `open_zarr_store()` to lazily load both normal and denoised datasets.
4. Concatenate Zarr Datasets
- Call `concatenate_zarr_files()` to merge all datasets while keeping metadata.
- Ensure datasets are rechunked appropriately.
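The concatenation step might look roughly like the sketch below. It does not show the existing `concatenate_zarr_files()`, whose implementation is not given here. It assumes the datasets share a `ping_time` dimension (suggested by the `chunks_ping_time` parameter) and carry a per-file `file_id` attribute. The `Sv` variable and the `make_segment` and `concatenate_segments` helpers are hypothetical stand-ins.

```python
import numpy as np
import xarray as xr


def make_segment(start, n_pings, file_id):
    """Build a tiny in-memory stand-in for one opened Zarr store."""
    ping_time = np.datetime64(start) + np.arange(n_pings) * np.timedelta64(1, "s")
    return xr.Dataset(
        {"Sv": (("ping_time", "range_sample"), np.zeros((n_pings, 4)))},
        coords={"ping_time": ping_time, "range_sample": np.arange(4)},
        attrs={"cruise_id": "example_cruise", "file_id": file_id},
    )


def concatenate_segments(datasets):
    """Concatenate along ping_time in time order, keeping non-conflicting attrs."""
    ordered = sorted(datasets, key=lambda ds: ds["ping_time"].values[0])
    # "drop_conflicts" keeps attributes that agree across all inputs and
    # drops those that differ (here, the per-file file_id).
    combined = xr.concat(ordered, dim="ping_time", combine_attrs="drop_conflicts")
    # Record the per-file IDs explicitly so they survive the merge.
    combined.attrs["source_file_ids"] = ",".join(ds.attrs["file_id"] for ds in ordered)
    return combined


later = make_segment("2024-01-01T00:00:10", 2, "b")
earlier = make_segment("2024-01-01T00:00:00", 3, "a")
combined = concatenate_segments([later, earlier])
```

With dask installed, rechunking could then follow as `combined.chunk({"ping_time": 500, "range_sample": 500})`, mirroring the `chunks_ping_time` and `chunks_range_sample` parameters.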
5. Convert to NetCDF4
- Use `save_dataset_to_netcdf()` to convert the dataset.
6. Upload to Output Container
- Store the NetCDF4 file in `output_container`.
- Generate an access link via `generate_container_access_url()`.