Implement a Prefect flow to combine a list of zarr files and convert to NetCDF

### **Prefect 3.0 Flow: Convert Zarr Files to NetCDF4 and Store in Blob Storage**

Implement a Prefect flow that processes and concatenates xarray datasets stored as Zarr stores in a specified Azure Blob Storage container. The flow will:
1. Identify individual file IDs from the container directory structure.
2. Open both normal and denoised Zarr files using existing functions.
3. Retrieve additional metadata from the database for each file ID.
4. Concatenate the datasets while preserving metadata.
5. Convert the concatenated dataset to NetCDF4 format.
6. Optionally create echograms of the concatenated dataset (both denoised and normal) and compute MVBS and NASC.
7. Store the NetCDF4 file in an output Blob Storage container and generate an access link.
---

The flow will be served with the following default parameters:

```python
load_and_process_files.serve(
    name='convert-to-netcdf',
    parameters={
        'cruise_id': 'example_cruise',
        'load_from_blobstorage': True,
        'get_list_from_db': False,
        'start_datetime': None,
        'end_datetime': None,
        'source_container': 'input-zarr-container',
        'save_to_blobstorage': True,
        'output_container': 'output-netcdf-container',
        'save_to_directory': False,
        'output_directory': '',
        'plot_echograms': False,
        'compute_nasc': False,
        'compute_mvbs': False,
        'chunks_ping_time': 500,
        'chunks_range_sample': 500,
        'batch_size': BATCH_SIZE
    }
)

```

---

### **Workflow Steps**
#### **1. Retrieve List of File IDs from Container**
- List all folders under `{cruise_id}/`.
- Extract `{individual_file_id}` from the folder names.
- Check that both `{individual_file_id}.zarr` and `{individual_file_id_denoised}.zarr` exist for each ID.
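
The ID extraction could be sketched as a pure function over a list of blob names, which keeps it testable without a storage account. This is only an illustration: the helper names `find_file_ids` and `complete_file_ids`, the `_denoised` suffix, and the `{cruise_id}/{file_id}/{file_id}.zarr/...` layout are assumptions, not part of the existing codebase.

```python
from collections import defaultdict

# Assumed naming convention for the denoised store; adjust to the real one.
DENOISED_SUFFIX = "_denoised"


def find_file_ids(blob_names, cruise_id):
    """Map each file ID to the set of variants ("normal", "denoised") found."""
    prefix = f"{cruise_id}/"
    found = defaultdict(set)
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        # Blob names are flat paths, so look for the first "*.zarr" segment.
        for part in name[len(prefix):].split("/"):
            if part.endswith(".zarr"):
                stem = part[: -len(".zarr")]
                if stem.endswith(DENOISED_SUFFIX):
                    found[stem[: -len(DENOISED_SUFFIX)]].add("denoised")
                else:
                    found[stem].add("normal")
                break
    return dict(found)


def complete_file_ids(blob_names, cruise_id):
    """Return sorted file IDs that have both a normal and a denoised store."""
    variants = find_file_ids(blob_names, cruise_id)
    return sorted(
        fid for fid, kinds in variants.items() if kinds == {"normal", "denoised"}
    )
```

IDs that have only one of the two stores are reported by `find_file_ids` but left out of `complete_file_ids`, so the flow can log them and decide whether to skip or fail.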

#### **2. Retrieve Metadata from Database**
- Extend `FileSegmentService` to fetch metadata for each file, including:
    - `location`
    - `file_name`
    - `id`
    - `location_data`
    - `file_freqs`
    - `file_start_time`
    - `file_end_time`
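
One possible shape for the returned record is a small dataclass carrying the fields listed above. The class name `FileMetadata`, the `from_row` helper, and all field types are assumptions made for illustration; the actual column types in the database may differ.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Any, Optional


@dataclass
class FileMetadata:
    """Per-file metadata to attach to a dataset (field types are assumed)."""

    id: int
    file_name: str
    location: Optional[str] = None
    location_data: Optional[Any] = None
    file_freqs: Optional[str] = None
    file_start_time: Optional[datetime] = None
    file_end_time: Optional[datetime] = None

    @classmethod
    def from_row(cls, row):
        """Build a record from a dict-like database row, ignoring extra columns."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in row.items() if key in known})
```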

#### **3. Load Zarr Datasets**
- Use `open_zarr_store()` to lazily load both normal and denoised datasets.

#### **4. Concatenate Zarr Datasets**
- Call `concatenate_zarr_files()` to merge all datasets while keeping metadata.
- Ensure datasets are rechunked appropriately.
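
As a rough sketch of what the concatenation might look like with plain xarray, assuming the datasets share a `ping_time` dimension and that files should be ordered by their first ping. The function name `concatenate_along_ping_time` is hypothetical and is not the existing `concatenate_zarr_files()`.

```python
import xarray as xr


def concatenate_along_ping_time(datasets, chunks=None):
    """Concatenate datasets along ping_time, keeping non-conflicting attrs."""
    if not datasets:
        raise ValueError("no datasets to concatenate")
    # Order files chronologically by their first ping before concatenating.
    ordered = sorted(datasets, key=lambda ds: ds["ping_time"].values[0])
    # "drop_conflicts" keeps attrs that agree across files and drops the rest.
    combined = xr.concat(ordered, dim="ping_time", combine_attrs="drop_conflicts")
    if chunks is not None:
        # Rechunking needs dask, e.g. {"ping_time": 500, "range_sample": 500}.
        combined = combined.chunk(chunks)
    return combined
```

With `combine_attrs="drop_conflicts"`, attributes whose values differ between files (a per-file `file_name`, for example) are dropped from the result, so per-file metadata would need to be collected separately if it has to survive the merge.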

#### **5. Convert to NetCDF4**
- Use `save_dataset_to_netcdf()` to convert the dataset.
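
One thing to watch during conversion: NetCDF attributes can hold only strings, numbers, and arrays of these, so nested metadata such as `location_data` or datetime values may need to be serialized first. The helper below is a minimal sketch of that step; the name `netcdf_safe_attrs` is hypothetical, and whether `save_dataset_to_netcdf()` already handles this is not known from this issue.

```python
import json
from datetime import date, datetime


def netcdf_safe_attrs(attrs):
    """Return a copy of attrs with values NetCDF can store as attributes."""
    safe = {}
    for key, value in attrs.items():
        if value is None:
            continue  # NetCDF has no null attribute value, so drop it
        if isinstance(value, bool):
            safe[key] = int(value)  # store booleans as 0/1
        elif isinstance(value, (str, int, float)):
            safe[key] = value
        elif isinstance(value, (datetime, date)):
            safe[key] = value.isoformat()
        else:
            # Dicts, lists, and anything else become a JSON string.
            safe[key] = json.dumps(value, default=str)
    return safe
```

The result can be assigned back with `ds.attrs = netcdf_safe_attrs(ds.attrs)` before writing, and the JSON strings can be decoded with `json.loads` when the file is read again.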

#### **6. Upload to Output Container**
- Store the NetCDF4 file in `output_container`.
- Generate an access link via `generate_container_access_url()`.

