27 changes: 25 additions & 2 deletions README.md
@@ -2,8 +2,16 @@

[![Unittest Gaiaflow](https://github.com/bcdev/gaiaflow/actions/workflows/unittest.yml/badge.svg)](https://github.com/bcdev/gaiaflow/actions/workflows/unittest.yml)
[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/charliermarsh/ruff/main/assets/badge/v0.json)](https://github.com/charliermarsh/ruff)
[![Docs](https://img.shields.io/badge/docs-mkdocs-blue)](https://bcdev.github.io/gaiaflow/)
![Static Badge](https://img.shields.io/badge/Airflow-3.0-8A2BE2?logo=apacheairflow)
![Static Badge](https://img.shields.io/badge/MLFlow-darkblue?logo=mlflow)
![Static Badge](https://img.shields.io/badge/MinIO-red?logo=minio)
![Static Badge](https://img.shields.io/badge/Jupyter-grey?logo=jupyter)
![Static Badge](https://img.shields.io/badge/Minikube-lightblue?logo=kubernetes)

![Gaiaflow](assets/gaiaflow.png)


<sub>(Image created using ChatGPT)</sub>

The word `GaiaFlow` is a combination of `Gaia` (the Greek goddess of Earth, symbolizing our planet)
@@ -98,7 +106,9 @@ Any files or folders marked with `^` can be extended, but carefully.
├── utils.py * # Utility function to get the minikube gateway IP required for testing.
├── docker_config.py * # Utility function to get the docker image name based on your project.
├── kube_config_inline * # This file is needed for Airflow to communicate with Minikube when testing locally in a prod env.
└── dockerfiles/ * # Dockerfiles and compose files
├── airflow_test.cfg * # This file is needed for testing your airflow dags.
├── Dockerfile ^ # Dockerfile for your package.
└── dockerfiles/ * # Dockerfiles required by Docker compose
```


@@ -245,7 +255,7 @@ Once the pre-requisites are done, you can go ahead with the project creation:
When prompted for input, enter the details requested. If you don't provide any
input for a given choice, the first choice from the list is taken as the default.

Once the project is created, please read the README.md from that.
Once the project is created, please read the [user guide](https://bcdev.github.io/gaiaflow/dev/).
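For reference, a minimal sketch of instantiating the template, assuming the repository is consumed directly with `cookiecutter` (the exact source URL, or a local checkout, may differ):

```bash
pip install cookiecutter
cookiecutter https://github.com/bcdev/gaiaflow
```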


## Troubleshooting
@@ -322,3 +332,16 @@ If you face any other problems not mentioned above, please reach out to us.
- [Minio](https://min.io/docs/minio/container/index.html)
- [JupyterLab](https://jupyterlab.readthedocs.io/)
- [Minikube](https://minikube.sigs.k8s.io/docs/)


### TODO:

Make ECR work. How to add credentials?

S3 credentials access?

Add sensor based DAGs

Make CI unittest using conda instead

Update CI to use ECR credentials.
13 changes: 9 additions & 4 deletions docs/dev.md
@@ -43,6 +43,7 @@ Any files or folders marked with `^` can be extended, but carefully.
├── utils.py * # Utility function to get the minikube gateway IP required for testing.
├── docker_config.py * # Utility function to get the docker image name based on your project.
├── kube_config_inline * # This file is needed for Airflow to communicate with Minikube when testing locally in a prod env.
├── airflow_test.cfg * # This file is needed for testing your airflow dags.
├── Dockerfile ^ # Dockerfile for your package.
└── dockerfiles/ * # Dockerfiles required by Docker compose
```
@@ -125,10 +126,11 @@ Then start the MLOps services using:

```bash
python mlops_manager.py --start -b
```

NOTE: When you run this for the first time, make sure you use the `-b` flag as
it builds the images for the first time as shown above.
Next time when you start it again, you start it without the flag as it saves
time by not building the same images again:


**NOTE**: The `-b` flag only needs to be used the first time.
For subsequent starts
and restarts, use the same command as above, without the `-b` flag.
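In short, the expected start sequence (commands as used in this guide):

```bash
# First start: build the service images once
python mlops_manager.py --start -b

# Later starts/restarts: reuse the images already built
python mlops_manager.py --start
```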


### 3. Accessing the services
@@ -275,6 +277,9 @@ mlflow logging. You can literally just add a `#` to the
the files, empty files have 0 bytes of content and that
creates issues with the urllib3 upload to S3 (this
happens inside MLFlow)
- If you hit any errors while using the Minikube manager, try restarting it
via `python minikube_manager.py --restart`, followed by
`python mlops_manager.py --restart`, to make sure the changes are synced.



36 changes: 19 additions & 17 deletions docs/prod.md
@@ -173,26 +173,28 @@ per project by any team member of that project)
- Make sure you change the `task_factory` mode to `prod`.
- Contact the CDR maintainers and ask them to add your repository as a
`submodule` (They will know what to do!)
- Create a `PYPI_API_TOKEN` on the PyPI website and add it as a secret to the
repository.
- Do the same with a `CODECOV_TOKEN` from the Codecov website.
- Create a Personal Access Token from your account (make sure your account is a
member of the bcdev org for this to work).

Do as follows:

- Go to your GitHub account Settings
- Navigate to `<> Developer Settings` (the last button on the page)
- Create a `Personal access token (classic)`
- See the image below for reference:

- Only tick the `repo` permissions
- Create the token
- Copy and keep it safe somewhere (maybe in KeePassXC)
- Now, navigate to your newly created project and create a new Secret
- Go to Repository settings
- Navigate to `Secrets and Variables -> Actions -> New repository secret`
- In the `Name` field, insert `CDR_PAT`
- In the `Secret` field, insert the token you generated a few moments ago
- Click on `Add Secret` (or set the secret from the CLI, as sketched below)

- Now you are ready to deploy your dags to the production Airflow.

13 changes: 0 additions & 13 deletions env.yml

This file was deleted.

2 changes: 1 addition & 1 deletion tests/test_cookiecutter.py
@@ -23,7 +23,7 @@
"minikube_manager.py",
"kube_config_inline",
"utils.py",
"docker_config.py",
"docker_image_name_generator.py",
"docker-compose.yml",
"README.md",
".env",
2 changes: 2 additions & 0 deletions {{ cookiecutter.folder_name }}/.env
@@ -52,3 +52,5 @@ POSTGRES_AIRFLOW_DB=airflow

J_MLFLOW_TRACKING_URI="http://localhost:5000"
J_MLFLOW_S3_ENDPOINT_URL="http://localhost:9000"
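Presumably the `J_`-prefixed variables are the Jupyter-side endpoints for the local services. A sketch of consuming them from a notebook; the MLflow calls are standard API, but the mapping to `MLFLOW_S3_ENDPOINT_URL` is an assumption:

```python
import os

import mlflow

# Assumption: the J_ variables are meant to be read from Jupyter.
# Point the MLflow client at the local tracking server.
mlflow.set_tracking_uri(os.environ["J_MLFLOW_TRACKING_URI"])

# MLflow's S3 artifact store reads this standard variable for the MinIO endpoint.
os.environ["MLFLOW_S3_ENDPOINT_URL"] = os.environ["J_MLFLOW_S3_ENDPOINT_URL"]
```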


20 changes: 11 additions & 9 deletions {{ cookiecutter.folder_name }}/.github/workflows/publish.yml
@@ -17,24 +17,26 @@ jobs:
- name: checkout
uses: actions/checkout@v4
{% raw %}
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v3
- name: Set up MicroMamba and install dependencies with Python ${{ matrix.python-version }}
uses: mamba-org/setup-micromamba@v1
with:
python-version: ${{ matrix.python-version }}
environment-file: environment.yml
create-args: >-
python=${{ matrix.python-version }}
{% endraw %}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .[dev,lint,test]

- name: Lint with ruff
run: |
ruff check

- name: Run unit tests
shell: bash -l {0}
run:
pytest test/ --cov=xcube_gedidb --cov-report=xml

- name: Run unit tests
shell: bash -l {0}
run: |
pytest --cov={{ cookiecutter.package_name }} --cov-branch --cov-report=xml
pytest -m "not gaiaflow" --cov={{ cookiecutter.package_name }} --cov-branch --cov-report=xml

- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v4
22 changes: 12 additions & 10 deletions {{ cookiecutter.folder_name }}/.github/workflows/unittest.yml
@@ -16,25 +16,27 @@ jobs:
steps:
- name: checkout
uses: actions/checkout@v4
{% raw %}
- name: Set up Python ${{ matrix.python-version }}
uses: actions/setup-python@v3
{% raw %}
- name: Set up MicroMamba and install dependencies with Python ${{ matrix.python-version }}
uses: mamba-org/setup-micromamba@v1
with:
python-version: ${{ matrix.python-version }}
environment-file: environment.yml
create-args: >-
python=${{ matrix.python-version }}
{% endraw %}
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install .[dev,lint,test]

- name: Lint with ruff
run: |
ruff check

- name: Run unit tests
shell: bash -l {0}
run:
pytest test/ --cov=xcube_gedidb --cov-report=xml

- name: Run unit tests
shell: bash -l {0}
run: |
pytest --cov={{ cookiecutter.package_name }} --cov-branch --cov-report=xml
pytest -m "not gaiaflow" --cov={{ cookiecutter.package_name }} --cov-branch --cov-report=xml

- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v4
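Both workflows now deselect integration-style tests with `pytest -m "not gaiaflow"`. For pytest to recognize that marker without warnings, it should be registered; a sketch, assuming the project keeps its pytest settings in `pyproject.toml`:

```toml
[tool.pytest.ini_options]
markers = [
    "gaiaflow: tests that need the local gaiaflow services (deselected in CI)",
]
```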
2 changes: 1 addition & 1 deletion {{ cookiecutter.folder_name }}/README.md
@@ -13,7 +13,7 @@ Add License Badge

{{ cookiecutter.project_name|capitalize }}

Please have a look at the documentation to get started!
Please have a look at the [documentation](https://bcdev.github.io/gaiaflow/getting_started/) to get started!
Feel free to update this README and make it your own.

To add a license, please choose which license you want for your project
3 changes: 3 additions & 0 deletions {{ cookiecutter.folder_name }}/airflow_test.cfg
@@ -0,0 +1,3 @@
[core]
dags_folder = ./dags
load_examples = False
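This config is what the local DAG tests run against. Per the notes elsewhere in this PR, point Airflow at it before invoking pytest; on Windows, use `set AIRFLOW_CONFIG=%cd%\airflow_test.cfg`:

```bash
# Run the DAG tests against the test config (skipping the logs folder)
export AIRFLOW_CONFIG="$(pwd)/airflow_test.cfg"
pytest --ignore=logs
```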
2 changes: 0 additions & 2 deletions {{ cookiecutter.folder_name }}/dags/README.md
@@ -18,8 +18,6 @@ DAGs.
- For testing purposes, you can trigger DAGs manually. If you would like to trigger them manually
as part of your workflow as well, you can!
- But if you want your DAG to run periodically, setting the start_date and schedule is important.
- NOTE: By default, if you set a `start_date` in the past, Airflow will try to backfill all those runs. To avoid that,
use catchup=False inside the dag definitions.
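For context, a minimal sketch of a scheduled DAG; the names and dates are placeholders, and `catchup=False` is spelled out even though newer Airflow releases default to it:

```python
from datetime import datetime

from airflow import DAG  # on Airflow 3 you may prefer `from airflow.sdk import DAG`

with DAG(
    dag_id="example_scheduled_dag",   # hypothetical name
    start_date=datetime(2024, 1, 1),  # a fixed date in the past
    schedule="@daily",                # run once per day
    catchup=False,                    # do not backfill runs between start_date and now
) as dag:
    ...  # define tasks here
```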


## Common parameters used while defining a DAG
39 changes: 4 additions & 35 deletions {{ cookiecutter.folder_name }}/dags/change_me_task_factory_dag.py
@@ -112,7 +112,7 @@
# create a docker image to run your package with all the dependencies included.
# Please update the image name below:
# TODO: Talk with Tejas to align on image naming.
image="my-local-image/my-package:0.0.1",
image="<your-image-name>",

# TODO: Discuss with Tejas about a process for creating secrets
secrets=["my-minio-creds"],
@@ -149,7 +149,7 @@
},
},

image="my-local-image/my-package:0.0.1",
image="<your-image-name>",
secrets=["my-minio-creds"],
env_vars={
"MLFLOW_TRACKING_URI": f"http://{MINIKUBE_GATEWAY}:5000",
@@ -174,7 +174,7 @@
"key": "return_value",
},
},
image="my-local-image/my-package:0.0.1",
image="<your-image-name>",
secrets=["my-minio-creds"],
env_vars={
"MLFLOW_TRACKING_URI": f"http://{MINIKUBE_GATEWAY}:5000",
@@ -187,35 +187,4 @@
trainer >> predictor
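All three tasks reference a Kubernetes secret named `my-minio-creds`. A hedged sketch of creating it in Minikube; the key names are assumptions, not confirmed by this template:

```bash
kubectl create secret generic my-minio-creds \
  --from-literal=AWS_ACCESS_KEY_ID=<minio-access-key> \
  --from-literal=AWS_SECRET_ACCESS_KEY=<minio-secret-key>
```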


# TODO:
# [DONE] Update ti.xcom code with simple return dict statements.
# [DONE] Update the cookiecutter so that it allows using Airflow standalone (
# without
# MLOps) for projects requiring only Airflow.
# Make ECR work. How to add credentials?
# [DONE]Make sure task factory works out of the box when new projects are
# created.
# [DONE]Add tests for Airflow dags.
# [DONE]Update the documentation stating that we should only return simple
# objects from the
# main function that airflow needs to execute.
# [DONE]Update documentation providing best practices while working with
# Docker (
# cleanup images on registry, local etc.)
# S3 credentials access?
# Add sensor based DAGs
# [DONE] Add version.py in package
# [DONE] Improve change_me_train.py and other files.
# Make CI unittest using conda instead
# Update CI to use ECR credentials.
# Run ruff, isort.
# [done] Update documentation also including, restarting airflow service after
# env update. now possible using --restart
# [done] after starting prod, restart airflow containers.
# [done] on windows, run pytest --ignore=logs and before that run set
# [done] AIRFLOW_CONFIG=%cd%\airflow_test.cfg
# check jupyter notebooks if they work to be sure.
# [DONE] add task_factory tutorial
# [DONE] write up about the architecture
# [DONE] check all files and readmes once more.
# [DONE] update the architecture diagram in main README

@@ -62,10 +62,10 @@
func_path="{{ cookiecutter.package_name }}.example_preprocess",
func_kwargs={"dummy_arg": "hello world"},

# For prod_local and prod mode only
# When you run the ./prod_local_setup.sh as shown above, it will also
# create a docker image from your package with your environment.yml.
# Please put the image name below
# For prod_local and prod mode only
# You must run `python minikube_manager.py --build-only`; it will then
# create a docker image to run your package with all the dependencies included.
# Please update the image name below:
image="<your-image-name>",
secrets=["my-minio-creds"],
env_vars={