| 1 | +# Lithops Package for MUR SST Data Processing |
| 2 | + |
| 3 | +This package provides functionality for processing MUR SST (Multi-scale Ultra-high Resolution Sea Surface Temperature) data using [Lithops](https://lithops-cloud.github.io/), a framework for serverless computing. |
| 4 | + |
| 5 | +## Environment + Lithops Setup |
| 6 | + |
| 7 | +1. Set up a Python environment. The example below uses [`uv`](https://docs.astral.sh/uv/), but other environment managers should work as well: |
| 8 | + |
| 9 | +```sh |
| 10 | +uv venv virtualizarr-lithops --python 3.11 |
| 11 | +source virtualizarr-lithops/bin/activate |
| 12 | +uv pip install -r requirements.txt |
| 13 | +``` |
| 14 | + |
| 15 | +2. Follow the [AWS Lambda Configuration](https://lithops-cloud.github.io/docs/source/compute_config/aws_lambda.html#configuration) instructions, unless you already have an appropriate AWS IAM role to use. |
| 16 | + |
| 17 | +3. Follow the [AWS Credential setup](https://lithops-cloud.github.io/docs/source/compute_config/aws_lambda.html#aws-credential-setup) instructions. |
| 18 | + |
| 19 | +4. Check the compute and storage backends for [Lithops](https://lithops-cloud.github.io/docs/source/configuration.html) in `lithops.yaml` and modify them as necessary. |
| 20 | + |
| 21 | + |
| 22 | +5. Build the Lithops Lambda runtime if it does not exist in your target AWS environment. |
| 23 | +```bash |
| 24 | +export LITHOPS_CONFIG_FILE=$(pwd)/lithops.yaml |
| 25 | +lithops runtime build -b aws_lambda -f Dockerfile vz-runtime |
| 26 | +``` |
| 27 | + |
| 28 | +You may prefer to build the Lambda runtime on EC2: Docker can be a resource hog on a local machine, and pushing to ECR is faster from EC2. If you wish to use EC2, see the scripts in `ec2_for_lithops_runtime/` in this directory. |
| 29 | + |
| 30 | +> [!IMPORTANT] |
| 31 | +> If the runtime was created with a different IAM identity, an appropriate `user_id` will need to be included in the Lithops configuration under `aws_lambda`. |
| 32 | + |
| 33 | +> [!TIP] |
| 34 | +> You can configure the AWS Lambda architecture via the `architecture` key under `aws_lambda` in the lithops configuration file. |
| 35 | + |
| 36 | + |
| 37 | +6. (Optional) To rebuild the Lithops Lambda runtime image, delete the existing one: |
| 38 | + |
| 39 | +```bash |
| 40 | +lithops runtime delete -b aws_lambda -d virtualizarr-runtime |
| 41 | +``` |
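For orientation, the settings referred to in the steps above can be sketched as a Python dict that mirrors the layout of `lithops.yaml`. This is a rough sketch, not the configuration shipped with this package: every value is a placeholder, the helper `configured_backends` is hypothetical, and the key names (`backend`, `storage`, `execution_role`, `runtime`, `architecture`, `user_id`) are taken from the Lithops AWS Lambda documentation and should be checked against the Lithops version you have installed.

```python
# Placeholder values throughout; replace them with your own account details.
lithops_config = {
    "lithops": {
        "backend": "aws_lambda",  # compute backend
        "storage": "aws_s3",      # storage backend
    },
    "aws_lambda": {
        # IAM role from the AWS Lambda Configuration step (placeholder ARN)
        "execution_role": "arn:aws:iam::123456789012:role/lithops-execution-role",
        # Runtime name passed to `lithops runtime build` in step 5
        "runtime": "vz-runtime",
        # Lambda architecture; Lithops documents x86_64 and arm64
        "architecture": "x86_64",
        # Only needed if the runtime was built by a different IAM identity:
        # "user_id": "<id of the identity that built the runtime>",
    },
}


def configured_backends(config: dict) -> tuple[str, str]:
    """Return the (compute, storage) backend names from a Lithops-style config."""
    section = config["lithops"]
    return section["backend"], section["storage"]


print(configured_backends(lithops_config))
```

A dict with this layout can also be passed to `lithops.FunctionExecutor(config=...)` as an alternative to pointing `LITHOPS_CONFIG_FILE` at a YAML file.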
| 42 | + |
| 43 | +## Package Structure |
| 44 | + |
| 45 | +The package is organized into the following modules: |
| 46 | + |
| 47 | +- `__init__.py`: Package initialization and exports |
| 48 | +- `config.py`: Configuration settings and constants |
| 49 | +- `models.py`: Data models and structures |
| 50 | +- `url_utils.py`: URL generation and file listing |
| 51 | +- `repo.py`: Icechunk repository management |
| 52 | +- `virtual_datasets.py`: Virtual dataset operations |
| 53 | +- `zarr_operations.py`: Zarr array operations |
| 54 | +- `helpers.py`: Data helpers |
| 55 | +- `lithops_functions.py`: Lithops execution wrappers |
| 56 | +- `cli.py`: Command-line interface |
| 57 | + |
| 58 | +## Usage |
| 59 | + |
| 60 | +### Command-line Interface |
| 61 | + |
| 62 | +The package provides a command-line interface for running various functions: |
| 63 | + |
| 64 | +```bash |
| 65 | +python main.py <function> [options] |
| 66 | +``` |
| 67 | + |
| 68 | +Available functions: |
| 69 | + |
| 70 | +- `write_to_icechunk`: Write data to Icechunk |
| 71 | +- `check_data_store_access`: Check access to the data store |
| 72 | +- `calc_icechunk_store_mean`: Calculate the mean of the Icechunk store |
| 73 | +- `calc_original_files_mean`: Calculate the mean of the original files |
| 74 | +- `list_installed_packages`: List installed packages |
| 75 | + |
| 76 | +Options: |
| 77 | + |
| 78 | +- `--start_date`: Start date for data processing (YYYY-MM-DD) |
| 79 | +- `--end_date`: End date for data processing (YYYY-MM-DD) |
| 80 | +- `--append_dim`: Append dimension for writing to Icechunk |
| 81 | + |
| 82 | +### Examples |
| 83 | + |
| 84 | +#### Writing Data to Icechunk |
| 85 | + |
| 86 | +```bash |
| 87 | +python main.py write_to_icechunk --start_date 2022-01-01 --end_date 2022-01-02 |
| 88 | +``` |
| 89 | + |
| 90 | +#### Calculating the Mean of the Icechunk Store |
| 91 | + |
| 92 | +```bash |
| 93 | +python main.py calc_icechunk_store_mean --start_date 2022-01-01 --end_date 2022-01-31 |
| 94 | +``` |
| 95 | + |
| 96 | +#### Checking Data Store Access |
| 97 | + |
| 98 | +```bash |
| 99 | +python main.py check_data_store_access |
| 100 | +``` |
| 101 | + |
| 102 | +## Programmatic Usage |
| 103 | + |
| 104 | +You can also use the package programmatically: |
| 105 | + |
| 106 | +```python |
| 107 | +from lithops_functions import write_to_icechunk |
| 108 | + |
| 109 | +# Write data to Icechunk |
| 110 | +write_to_icechunk(start_date="2022-01-01", end_date="2022-01-31") |
| 111 | +``` |
| 112 | + |
| 113 | +## Testing |
| 114 | + |
| 115 | +To test the package, you can use the provided test functions: |
| 116 | + |
| 117 | +```bash |
| 118 | +python main.py check_data_store_access |
| 119 | +``` |
| 120 | + |
| 121 | +This will verify that the package can access the data store. |