13 changes: 13 additions & 0 deletions CHANGES.md
@@ -1,3 +1,16 @@
## Version 0.0.3 (in development)

* **Bug fix** - `ExternalPythonOperator` no longer requires Airflow to be
  installed in the external environment.

* JupyterLab can now be started in any conda/mamba environment via Gaiaflow.

* Added a user workflow diagram to the `Overview` page of the documentation.

* Added a new subcommand `gaiaflow dev update-deps` to update
  dependencies on the fly in Airflow containers for workflow tasks.



## Version 0.0.2

* **Chore**: Update `pyproject.toml` to include `README.md` and necessary links.
61 changes: 61 additions & 0 deletions docs/index.md
@@ -215,3 +215,64 @@ Quick rule of thumb:
- Use `prod_local` for testing end-to-end workflows on production-like settings.
- Use `prod` for production pipelines in the real cluster.

## User workflow
A typical user workflow could look like this:

```mermaid
flowchart TD
A1[Start] --> A
A{All prerequisites installed except the template?} -->|Yes| D{Which OS?}
A -->|No| STOP[Stop]

D -->|Windows WSL2| E1[Install template via conda/mamba/miniforge prompt<br/>then create env in WSL2 CLI + install gaiaflow]
D -->|Linux| E2[Install template + create env<br/>then install gaiaflow]
E1 --> M
E2 --> M

M[Start services in Dev Mode<br/>`gaiaflow start dev -p .`] --> N[Experiment in JupyterLab if needed]

N --> P[Refactor to production code<br/>inside the Python package, logging experiments to MLflow and artifacts to S3 MinIO if needed]
P --> Q[Write tests in tests/ folder]
Q --> R[Create workflows with create_task mode=dev in Airflow]
R --> S[Run & monitor DAG in Airflow]
S --> T[Track experiments in MLflow & outputs in MinIO]
T --> V{Workflow ran successfully in dev mode?}
V --> |Yes| V1{Test your package in docker mode?}

V1 --> |Yes| V2[Change create_task mode to dev_docker and run gaiaflow dev dockerize -p .]

V2 --> V3{Docker mode ran successfully?}
V3 --> |Yes| V4{Move to production?}

V4 --> |Yes| X1[Start Prod-Local infra<br/>`gaiaflow prod-local start -p .`]

V --> |No| V5[Fix bugs]
V1 --> |No| STOP[Stop]
V3 --> |No| V6[Fix bugs]
V4 --> |No| STOP[Stop]

V5 --> R
V6 --> V3

X1 --> X2[Build Docker image<br/>`gaiaflow prod-local dockerize -p .`]
X2 --> X3[Configure secrets if needed]
X3 --> Y{Prod-Local Success?}
Y -->|Yes| Y1[Set create_task mode to prod<br/>Local services are no longer needed]
Y1 --> Z[Deploy to Production Cluster]

Y -->|No| Y2[Fix the bugs]
Y2 --> Y

class A,D,Y,V,V1,V3,V4 decision;
class E1 windows;
class E2 linux;
class Z prod;
class STOP stop;

classDef decision fill:#FFDD99,stroke:#E67E22,stroke-width:2px;
classDef windows fill:#AED6F1,stroke:#2874A6,stroke-width:2px;
classDef linux fill:#ABEBC6,stroke:#1E8449,stroke-width:2px;
classDef prod fill:#D7BDE2,stroke:#7D3C98,stroke-width:2px;
classDef stop fill:#F5B7B1,stroke:#922B21,stroke-width:2px;

```
2 changes: 2 additions & 0 deletions docs/start.md
@@ -207,6 +207,8 @@ This means now you have successfully installed Docker.

### Gaiaflow Installation

On Windows, install `gaiaflow` in a new mamba/conda environment from a WSL2 terminal.

You can install Gaiaflow directly via pip:

`pip install gaiaflow`
6 changes: 6 additions & 0 deletions mkdocs.yml
@@ -76,8 +76,14 @@ markdown_extensions:
- pymdownx.emoji:
emoji_index: !!python/name:material.extensions.emoji.twemoji
emoji_generator: !!python/name:material.extensions.emoji.to_svg
- pymdownx.superfences:
custom_fences:
- name: mermaid
class: mermaid
format: !!python/name:pymdownx.superfences.fence_code_format

extra:
mermaid: true
social:
- icon: fontawesome/brands/github
link: https://github.com/bcdev/gaiaflow
1,022 changes: 1,020 additions & 2 deletions pixi.lock

Large diffs are not rendered by default.

5 changes: 4 additions & 1 deletion pyproject.toml
@@ -1,7 +1,7 @@
[project]
name = "gaiaflow"
requires-python = ">= 3.11"
version = "0.0.2"
version = "0.0.3.dev0"
description = "Local-first MLOps infrastructure python tool that simplifies the process of building, testing, and deploying ML workflows."
authors = [{name = "Yogesh Kumar Baljeet Singh", email = "[email protected]"}]
dependencies = [
@@ -16,6 +16,7 @@ dependencies = [
"apache-airflow-providers-fab>=2.3.1,<3",
"apache-airflow-providers-docker>=4.4.2,<5",
"mlflow>=3.2.0,<4",
"jupyter>=1.1.1,<2",
]
readme= "README.md"

@@ -78,6 +79,8 @@ mkdocs-material = ">=9.6.17,<10"
mkdocstrings = ">=0.30.0,<0.31"
mkdocstrings-python = ">=1.17.0,<2"
pytest-cov = ">=6.2.1,<7"
jupyter = ">=1.1.1,<2"
pymdown-extensions = ">=10.16.1,<11"

[project.scripts]
gaiaflow = "gaiaflow.cli.cli:app"
2 changes: 1 addition & 1 deletion src/docker_stuff/airflow/Dockerfile
@@ -19,7 +19,7 @@ RUN micromamba --version
USER airflow
COPY environment.yml .

RUN micromamba env create -f environment.yml -n default_user_env
RUN micromamba env create -f environment.yml -n default_user_env && rm environment.yml
RUN micromamba env list

RUN micromamba shell init -s bash
47 changes: 38 additions & 9 deletions src/gaiaflow/cli/commands/mlops.py
@@ -52,12 +52,16 @@ def start(
jupyter_port: int = typer.Option(
8895, "--jupyter-port", "-j", help="Port for JupyterLab"
),
delete_volume: bool = typer.Option(
False, "--delete-volume", "-v", help="Delete volumes on shutdown"
),
docker_build: bool = typer.Option(
False, "--docker-build", "-b", help="Force Docker image build"
),
user_env_name: str = typer.Option(
None, "--env", "-e", help="Provide conda/mamba environment name for "
"Jupyter Lab to run. If not set, it will use the name from your environment.yml file."
),
env_tool: str = typer.Option(
"mamba", "--env-tool", "-t", help="Which tool to use for running your Jupyter lab. Options: mamba, conda",
),
):
imports = load_imports()
typer.echo(f"Selected Gaiaflow services: {service}")
@@ -84,8 +88,9 @@
service=s,
cache=cache,
jupyter_port=jupyter_port,
delete_volume=delete_volume,
docker_build=docker_build,
user_env_name=user_env_name,
env_tool=env_tool,
)
else:
typer.echo("Running start with all services")
@@ -97,8 +102,9 @@
service=Service.all,
cache=cache,
jupyter_port=jupyter_port,
delete_volume=delete_volume,
docker_build=docker_build,
user_env_name=user_env_name,
env_tool=env_tool,
)


@@ -265,8 +271,31 @@ def dockerize(
image_name=image_name
)

@app.command(help="Update the dependencies for the Airflow tasks. This command "
"synchronizes the running container environments with the "
"project's `environment.yml`. Make sure you have updated "
"`environment.yml` before running this, as the container "
"environments are updated based on its contents.")
def update_deps(
project_path: Path = typer.Option(..., "--path", "-p", help="Path to your project"),
):
imports = load_imports()
gaiaflow_path, user_project_path = imports.create_gaiaflow_context_path(
project_path
)
gaiaflow_path_exists = imports.gaiaflow_path_exists_in_state(gaiaflow_path, True)
if not gaiaflow_path_exists:
imports.save_project_state(user_project_path, gaiaflow_path)
else:
typer.echo(
f"Gaiaflow project already exists at {gaiaflow_path}. Skipping "
f"saving to the state"
)


# TODO: Let the user update the current infra with new local packages or
#  mounts as needed.
# def update():
typer.echo("Running update_deps")
imports.MlopsManager.run(
gaiaflow_path=gaiaflow_path,
user_project_path=user_project_path,
action=imports.ExtendedAction.UPDATE_DEPS,
)
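The new `update_deps` command funnels into `MlopsManager.run` with `ExtendedAction.UPDATE_DEPS`. A minimal sketch of this action-dispatch pattern follows; the `Action` dataclass and the handler bodies here are simplified assumptions, not the actual gaiaflow internals:

```python
# Hypothetical sketch of dispatching an Action to a handler; names and
# behavior are illustrative only, not the real gaiaflow implementation.
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    name: str


START = Action("start")
UPDATE_DEPS = Action("update_deps")


def run(action: Action) -> str:
    # Dispatch table: each known action maps to a handler; unknown
    # actions fail fast with a clear error.
    handlers = {
        START: lambda: "starting services",
        UPDATE_DEPS: lambda: "syncing container envs with environment.yml",
    }
    try:
        return handlers[action]()
    except KeyError:
        raise ValueError(f"Invalid action: {action.name}")


assert run(UPDATE_DEPS) == "syncing container envs with environment.yml"
```

A frozen dataclass is hashable, which is what lets `Action` instances serve as dictionary keys in the dispatch table.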
1 change: 1 addition & 0 deletions src/gaiaflow/constants.py
@@ -19,6 +19,7 @@ class ExtendedAction:
DOCKERIZE = Action("dockerize")
CREATE_CONFIG = Action("create_config")
CREATE_SECRET = Action("create_secret")
UPDATE_DEPS = Action("update_deps")


GAIAFLOW_CONFIG_DIR = Path.home() / ".gaiaflow"
4 changes: 3 additions & 1 deletion src/gaiaflow/core/operators.py
@@ -125,7 +125,7 @@ def create_task(self):
from gaiaflow.core.runner import run

args, kwargs = self.resolve_args_kwargs()
kwargs["params"] = self.params
kwargs["params"] = dict(self.params)
op_kwargs = {"func_path": self.func_path, "args": args, "kwargs": kwargs}

return ExternalPythonOperator(
@@ -135,6 +135,8 @@
op_kwargs=op_kwargs,
do_xcom_push=True,
retries=self.retries,
expect_airflow=False,
expect_pendulum=False,
)
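The switch from `self.params` to `dict(self.params)` matters because the external interpreter receives its arguments via serialization, and Airflow's params object is a mapping wrapper rather than a plain dict. A sketch of the failure mode, using a hypothetical unpicklable mapping as a stand-in for the real params object:

```python
# Sketch of why plain-dict conversion helps. ParamsWrapper is a hypothetical
# stand-in for a Mapping that carries unpicklable state; Airflow's actual
# params object differs, but the serialization principle is the same.
import os
import pickle
from collections.abc import Mapping


class ParamsWrapper(Mapping):
    """Mapping backed by an unpicklable resource (an open file handle)."""

    def __init__(self, data):
        self._data = data
        self._handle = open(os.devnull)  # file objects cannot be pickled

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


params = ParamsWrapper({"threshold": 0.5})

try:
    pickle.dumps(params)  # fails: the wrapper drags its file handle along
except TypeError:
    pass

payload = pickle.dumps(dict(params))  # a plain dict serializes cleanly
assert pickle.loads(payload) == {"threshold": 0.5}
```

Converting to `dict` at the boundary keeps only the key-value data, which is all the external function needs.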


Expand Down
4 changes: 0 additions & 4 deletions src/gaiaflow/core/runner.py
@@ -49,17 +49,13 @@ def run(
print(f"Running {func_path} with args: {args} and kwargs :{kwargs}")
result = func(*args, **kwargs)
print("Function result:", result)
print("mode::::", mode, type(mode))
if mode == "prod" or mode == "prod_local":
# This is needed when we use KubernetesPodOperator and want to
# share information via XCOM.
_write_xcom_result(result)
if mode == "dev_docker":
print("inside dev_docker condition")
with open("/tmp/script.out", "wb+") as tmp:
pickle.dump(result, tmp)
# print("printing result now:::")
# print(json.dumps(result))

return result
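The `_write_xcom_result` call for `prod`/`prod_local` follows the KubernetesPodOperator convention of writing the return value as JSON to `/airflow/xcom/return.json`, where the XCom sidecar container picks it up. A hedged sketch of what such a helper might look like; the function name and exact behavior of gaiaflow's helper are assumptions:

```python
# Hypothetical sketch of a _write_xcom_result-style helper. The
# /airflow/xcom/return.json path is the documented KubernetesPodOperator
# convention; the helper itself is an illustrative assumption.
import json
import os
import tempfile


def write_xcom_result(result, xcom_dir="/airflow/xcom"):
    # The sidecar reads exactly one file: <xcom_dir>/return.json.
    os.makedirs(xcom_dir, exist_ok=True)
    with open(os.path.join(xcom_dir, "return.json"), "w") as f:
        json.dump(result, f)


# Usage, redirected to a temp dir so the sketch runs anywhere:
with tempfile.TemporaryDirectory() as d:
    write_xcom_result({"rows": 42}, xcom_dir=d)
    with open(os.path.join(d, "return.json")) as f:
        assert json.load(f) == {"rows": 42}
```

This only works for JSON-serializable results, which is why the `dev_docker` path above pickles to `/tmp/script.out` instead.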

8 changes: 3 additions & 5 deletions src/gaiaflow/managers/minikube_manager.py
@@ -59,15 +59,13 @@ def __init__(
)

def _get_valid_actions(self) -> Set[Action]:
return {
BaseAction.START,
BaseAction.STOP,
BaseAction.RESTART,
BaseAction.CLEANUP,
base_actions = super()._get_valid_actions()
extra_actions = {
ExtendedAction.DOCKERIZE,
ExtendedAction.CREATE_CONFIG,
ExtendedAction.CREATE_SECRET,
}
return base_actions | extra_actions

@classmethod
def run(cls, **kwargs):
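The refactor above replaces a hard-coded action set with a union over the base class's actions, so any action added to the base manager propagates to subclasses automatically. A stripped-down sketch of the pattern, with class and action names simplified from the real code:

```python
# Simplified sketch of the _get_valid_actions override pattern: the subclass
# extends the inherited set via union instead of restating every base action.
class BaseManager:
    def _get_valid_actions(self) -> set:
        return {"start", "stop", "restart", "cleanup"}


class MinikubeManager(BaseManager):
    def _get_valid_actions(self) -> set:
        # Inherit the base actions, then add Minikube-specific ones.
        return super()._get_valid_actions() | {
            "dockerize",
            "create_config",
            "create_secret",
        }


actions = MinikubeManager()._get_valid_actions()
assert "start" in actions and "dockerize" in actions
assert len(actions) == 7
```

If `BaseManager` later gains a new action, `MinikubeManager` picks it up with no change, which was the weakness of the old hard-coded set.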