diff --git a/Kubernetes/Examples/Airflow/README.md b/Kubernetes/Examples/Airflow/README.md
Lines changed: 103 additions & 0 deletions
diff --git a/Kubernetes/Examples/Airflow/ai-training-run.py b/Kubernetes/Examples/Airflow/ai-training-run.py
Lines changed: 180 additions & 0 deletions
diff --git a/Kubernetes/Examples/Airflow/clone-volume.py b/Kubernetes/Examples/Airflow/clone-volume.py
Lines changed: 65 additions & 0 deletions
diff --git a/Kubernetes/Examples/Airflow/cluster-role-ntap-dsutil.yaml b/Kubernetes/Examples/Airflow/cluster-role-ntap-dsutil.yaml
Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,103 @@
+# Apache Airflow Examples
+This directory contains example [DAG](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#dags) definitions that show how NetApp data management functions can be incorporated into automated workflows that are orchestrated using the [Apache Airflow](https://airflow.apache.org) framework.
+
+## Getting Started
+
+### Instructions for Use
+The Python files referenced in the [DAG Definitions](#dag-definitions) section contain Airflow DAG definitions. To utilize one of these example DAGs, define your parameters within the Python code as indicated in the comments and then upload the Python file to Airflow. The method of uploading the file to Airflow will depend on your specific Airflow deployment. Typically, Airflow is configured to automatically pull DAG definitions from a specific Git repo or persistent volume.
+
+### Prerequisites
+
+These DAGs require the following prerequisites in order to function correctly.
+
+- Airflow must be deployed within a Kubernetes cluster. These example DAGs do not support Airflow deployments that are not Kubernetes-based.
+- Airflow must be configured to use the [Celery Executor](https://airflow.apache.org/docs/apache-airflow/stable/executor/celery.html). Although they may work with other executors, these DAGs have only been validated with the Celery Executor.
+- [Trident](https://netapp.io/persistent-storage-provisioner-for-kubernetes/), NetApp's dynamic storage orchestrator for Kubernetes, must be installed within the Kubernetes cluster.
+- A cluster role that has all of the required permissions for executing NetApp Data Science Toolkit for Kubernetes operations must be present in the Kubernetes cluster. For an example, see [cluster-role-ntap-dsutil.yaml](cluster-role-ntap-dsutil.yaml). This file contains the manifest for a Kubernetes ClusterRole named 'ntap-dsutil' that has all of the required permissions for executing toolkit operations within the cluster.
+- Your Airflow [Kubernetes Pod Operator](https://airflow.apache.org/docs/apache-airflow-providers-cncf-kubernetes/stable/operators.html#kubernetespodoperator) service account must be bound to the previously mentioned cluster role within the namespace in which you intend to execute the DAGs. Note that the default Airflow Kubernetes Pod Operator service account is 'default'. For an example, see [role-binding-airflow-ntap-dsutil.yaml](role-binding-airflow-ntap-dsutil.yaml). This file contains the manifest for a Kubernetes RoleBinding named 'airflow-ntap-dsutil' that binds the 'default' ServiceAccount to the 'ntap-dsutil' cluster role within the 'airflow' namespace.
+
+Some of the DAGs have additional prerequisites, which are noted under the specific DAG definitions below.
+
+<a name="dag-definitions"></a>
+
+## DAG Definitions
+
+### [ai-training-run.py](ai-training-run.py)
+
+#### Additional Prerequisites
+
+In addition to the standard prerequisites outlined above, this DAG requires the following additional prerequisites in order to function correctly.
+
+- Volume snapshots must be enabled within the Kubernetes cluster. Refer to the [Trident documentation](https://netapp-trident.readthedocs.io/en/latest/kubernetes/operations/tasks/volumes/snapshots.html) for more information on volume snapshots.
+
+#### Description
+DAG definition for an AI/ML training run with built-in, near-instantaneous dataset and model versioning. This is intended to demonstrate how a data scientist could define an automated AI/ML workflow that incorporates dataset and model versioning as well as dataset-to-model traceability.
+
+#### Workflow Steps
+1. Optional: Execute a data prep step.
+2. Create a Snapshot copy, using NetApp Snapshot technology, of the dataset volume. This Snapshot copy is created for traceability purposes. Each time that this pipeline workflow is executed, a Snapshot copy is created. Therefore, as long as the Snapshot copy is not deleted, it is always possible to trace a specific training run back to the exact training dataset that was used for that run.
+3. Execute a training step.
+4. Create a Snapshot copy, using NetApp Snapshot technology, of the trained model volume. This Snapshot copy is created for versioning purposes. Each time that this pipeline workflow is executed, a Snapshot copy is created. Therefore, for each individual training run, a read-only versioned copy of the resulting trained model is automatically saved.
+5. Execute a validation step.
+
+### [clone-volume.py](clone-volume.py)
+
+#### Description
+DAG definition for a workflow that can be used to near-instantaneously and efficiently clone any Trident-managed volume, regardless of size, that resides in the Kubernetes namespace in which the DAG is executed. This is intended to demonstrate how a data scientist or data engineer could define an automated workflow that incorporates the rapid cloning of datasets and/or models for use in workspaces, etc.
+
+#### Workflow Steps
+1. Create a clone, using NetApp FlexClone technology, of the source volume.
+
+### [replicate-data-cloud-sync.py](replicate-data-cloud-sync.py)
+
+#### Additional Prerequisites
+
+In addition to the standard prerequisites outlined above, this DAG requires the following additional prerequisites in order to function correctly.
+
+- An Airflow connection of type "http" containing your Cloud Sync API refresh token must exist within the Airflow connections database. This connection can be created via the Airflow UI dashboard by navigating to 'Admin' -> 'Connections' using the main menu. When creating this connection, enter your Cloud Sync API refresh token into the 'Password' field.
+
+#### Description
+DAG definition for a workflow that can be used to perform a sync operation for an existing [Cloud Sync](https://cloudsync.netapp.com) relationship. This is intended to demonstrate how a data scientist or data engineer could define an automated AI/ML workflow that incorporates Cloud Sync for data movement between platforms (e.g. NFS, S3) and/or across environments (e.g. edge data center, core data center, private cloud, public cloud).
+
+#### Workflow Steps
+1. Perform a sync operation for the specified Cloud Sync relationship.
+
+> Tip: If you do not know the Cloud Sync relationship ID for a specific relationship, you can retrieve it by using NetApp Data Science Toolkit for Traditional Environments (refer to the 'list all Cloud Sync relationships' operation).
+
+### [replicate-data-snapmirror.py](replicate-data-snapmirror.py)
+
+#### Compatibility
+
+This DAG is only compatible with ONTAP storage systems/instances running ONTAP 9.7 or above.
+
+#### Additional Prerequisites
+
+In addition to the standard prerequisites outlined above, this DAG requires the following additional prerequisites in order to function correctly.
+
+- An Airflow connection of type "http" containing your ONTAP cluster or SVM admin account details must exist within the Airflow connections database. This connection can be created via the Airflow UI dashboard by navigating to 'Admin' -> 'Connections' using the main menu. When creating this connection, enter your ONTAP cluster or SVM management LIF into the 'Host' field, your ONTAP cluster/SVM admin username into the 'Login' field, and your ONTAP cluster/SVM admin password into the 'Password' field.
+
+#### Description
+DAG definition for a workflow that can be used to perform a sync operation for an existing asynchronous SnapMirror relationship. This is intended to demonstrate how a data scientist or data engineer could define an automated AI/ML workflow that incorporates SnapMirror replication for data movement across environments (e.g. edge data center, core data center, private cloud, public cloud).
+
+#### Workflow Steps
+1. Perform a sync operation for the specified asynchronous SnapMirror relationship.
+
+> Tip: If you do not know the SnapMirror relationship UUID for a specific relationship, you can retrieve it by using NetApp Data Science Toolkit for Traditional Environments (refer to the 'list all SnapMirror relationships' operation).
+
+### [replicate-data-xcp.py](replicate-data-xcp.py)
+
+#### Additional Prerequisites
+
+In addition to the standard prerequisites outlined above, this DAG requires the following additional prerequisites in order to function correctly.
+
+- An Airflow connection of type "SSH", containing SSH access details for a Linux host on which NetApp XCP is installed and configured, must exist within the Airflow connections database. This connection can be created via the Airflow UI dashboard by navigating to 'Admin' -> 'Connections' using the main menu.
+
+#### Description
+DAG definition for a workflow that invokes NetApp XCP to quickly and reliably replicate data between NFS endpoints. Potential use cases include the following:
+- Replicating newly acquired sensor data gathered at the edge back to the core data center or to the cloud to be used for AI/ML model training or retraining.
+- Replicating a newly trained or newly updated model from the core data center to the edge or to the cloud to be deployed as part of an inferencing application.
+- Copying data from a Hadoop data lake (through Hadoop NFS Gateway) to a high-performance AI/ML training environment for use in the training of an AI/ML model.
+- Copying NFS-accessible data from a legacy or non-NetApp system of record to a high-performance AI/ML training environment for use in the training of an AI/ML model.
+
+#### Workflow Steps
+1. Invoke an XCP copy or sync operation.
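The compatibility note for replicate-data-snapmirror.py (ONTAP 9.7 or later) can be checked before the DAG is triggered. The following is a minimal Python sketch and is not part of the toolkit: the helper name `supports_snapmirror_dag` is hypothetical, and it assumes the ONTAP version is available as integer `generation`/`major`/`minor` values, which is how we understand the `version` object of the ONTAP REST API's cluster endpoint to report it.

```python
from typing import Mapping

# README: replicate-data-snapmirror.py requires ONTAP 9.7 or later
MIN_ONTAP_RELEASE = (9, 7)


def supports_snapmirror_dag(version: Mapping[str, int]) -> bool:
    """Check an ONTAP version against the SnapMirror DAG's minimum release.

    `version` is assumed to hold integer 'generation' and 'major' keys,
    e.g. {"generation": 9, "major": 8, "minor": 0} for ONTAP 9.8.0.
    """
    release = (int(version["generation"]), int(version["major"]))
    # Integer tuples compare numerically, so 9.10 correctly ranks above 9.7
    return release >= MIN_ONTAP_RELEASE


if __name__ == "__main__":
    for v in ({"generation": 9, "major": 6, "minor": 0},
              {"generation": 9, "major": 10, "minor": 1}):
        print(v, supports_snapmirror_dag(v))
```

Comparing integer tuples rather than version strings matters here: as strings, "9.10" sorts before "9.7".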
@@ -0,0 +1,180 @@
+# Airflow DAG Definition: AI Training Run
+#
+# Steps:
+#   1. Data prep job
+#   2. Dataset snapshot (for traceability)
+#   3. Training job
+#   4. Model snapshot (for versioning/baselining)
+#   5. Inference validation job
+
+
+from airflow import DAG
+from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator
+from airflow.operators.python import PythonOperator
+from airflow.utils.dates import days_ago
+from kubernetes.client import models as k8s
+import uuid
+
+
+##### DEFINE PARAMETERS: Modify parameter values in this section to match your environment #####
+
+## Define default args for DAG
+ai_training_run_dag_default_args = {
+    'owner': 'NetApp'
+}
+
+## Define DAG details
+ai_training_run_dag = DAG(
+    dag_id='ai_training_run',
+    default_args=ai_training_run_dag_default_args,
+    schedule_interval=None,
+    start_date=days_ago(2),
+    tags=['training']
+)
+
+# Define Kubernetes namespace to execute DAG in
+namespace = 'airflow'
+
+## Define volume details (change values as necessary to match your environment)
+
+# Dataset volume
+dataset_volume_pvc_existing = 'dataset-vol'
+dataset_volume = k8s.V1Volume(
+    name=dataset_volume_pvc_existing,
+    persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(claim_name=dataset_volume_pvc_existing),
+)
+dataset_volume_mount = k8s.V1VolumeMount(
+    name=dataset_volume_pvc_existing, 
+    mount_path='/mnt/dataset', 
+    sub_path=None, 
+    read_only=False
+)
+
+# Model volume
+model_volume_pvc_existing = 'airflow-model-vol'
+model_volume = k8s.V1Volume(
+    name=model_volume_pvc_existing,
+    persistent_volume_claim=k8s.V1PersistentVolumeClaimVolumeSource(claim_name=model_volume_pvc_existing),
+)
+model_volume_mount = k8s.V1VolumeMount(
+    name=model_volume_pvc_existing, 
+    mount_path='/mnt/model', 
+    sub_path=None, 
+    read_only=False
+)
+
+## Define job details (change values as needed)
+
+# Data prep step
+data_prep_step_container_image = "nvcr.io/nvidia/tensorflow:21.03-tf1-py3"
+data_prep_step_command = ["echo", "'No data prep command entered'"] # Replace this echo command with the data prep command that you wish to execute
+data_prep_step_resources = {} # Hint: To request that 1 GPU be allocated to job pod, change to: {'limit_gpu': 1}
+
+# Training step
+train_step_container_image = "nvcr.io/nvidia/tensorflow:21.03-tf1-py3"
+train_step_command = ["echo", "'No training command entered'"] # Replace this echo command with the training command that you wish to execute
+train_step_resources = {} # Hint: To request that 1 GPU be allocated to job pod, change to: {'limit_gpu': 1}
+
+# Inference validation step
+validate_step_container_image = "nvcr.io/nvidia/tensorflow:21.03-tf1-py3"
+validate_step_command = ["echo", "'No inference validation command entered'"] # Replace this echo command with the inference validation command that you wish to execute
+validate_step_resources = {} # Hint: To request that 1 GPU be allocated to job pod, change to: {'limit_gpu': 1}
+
+################################################################################################
+
+
+# Define DAG steps/workflow
+with ai_training_run_dag as dag:
+
+    # Define step to generate uuid for run
+    generate_uuid = PythonOperator(
+        task_id='generate-uuid',
+        python_callable=lambda: str(uuid.uuid4())
+    )
+
+    # Define data prep step using Kubernetes Pod operator (https://airflow.apache.org/docs/stable/kubernetes.html#kubernetespodoperator)
+    data_prep = KubernetesPodOperator(
+        namespace=namespace,
+        image=data_prep_step_container_image,
+        cmds=data_prep_step_command,
+        resources = data_prep_step_resources,
+        volumes=[dataset_volume, model_volume],
+        volume_mounts=[dataset_volume_mount, model_volume_mount],
+        name="ai-training-run-data-prep",
+        task_id="data-prep",
+        is_delete_operator_pod=True,
+        hostnetwork=False
+    )
+
+    # Define step to take a snapshot of the dataset volume for traceability
+    dataset_snapshot = KubernetesPodOperator(
+        namespace=namespace,
+        image="python:3",
+        cmds=["/bin/bash", "-c"],
+        arguments=["\
+            python3 -m pip install ipython kubernetes pandas tabulate && \
+            git clone https://github.com/NetApp/netapp-data-science-toolkit && \
+            mv /netapp-data-science-toolkit/Kubernetes/ntap_dsutil_k8s.py / && \
+            /ntap_dsutil_k8s.py create volume-snapshot --pvc-name=" + str(dataset_volume_pvc_existing) + " --snapshot-name=dataset-{{ task_instance.xcom_pull(task_ids='generate-uuid', dag_id='ai_training_run', key='return_value') }} --namespace=" + namespace],
+        name="ai-training-run-dataset-snapshot",
+        task_id="dataset-snapshot",
+        is_delete_operator_pod=True,
+        hostnetwork=False
+    )
+
+    # State that the dataset snapshot should be created after the data prep job completes and the uuid job completes
+    data_prep >> dataset_snapshot
+    generate_uuid >> dataset_snapshot
+
+    # Define training step using Kubernetes Pod operator (https://airflow.apache.org/docs/stable/kubernetes.html#kubernetespodoperator)
+    train = KubernetesPodOperator(
+        namespace=namespace,
+        image=train_step_container_image,
+        cmds=train_step_command,
+        resources = train_step_resources,
+        volumes=[dataset_volume, model_volume],
+        volume_mounts=[dataset_volume_mount, model_volume_mount],
+        name="ai-training-run-train",
+        task_id="train",
+        is_delete_operator_pod=True,
+        hostnetwork=False
+    )
+
+    # State that training job should be executed after dataset volume snapshot is taken
+    dataset_snapshot >> train
+
+    # Define step to take a snapshot of the model volume for versioning/baselining
+    model_snapshot = KubernetesPodOperator(
+        namespace=namespace,
+        image="python:3",
+        cmds=["/bin/bash", "-c"],
+        arguments=["\
+            python3 -m pip install ipython kubernetes pandas tabulate && \
+            git clone https://github.com/NetApp/netapp-data-science-toolkit && \
+            mv /netapp-data-science-toolkit/Kubernetes/ntap_dsutil_k8s.py / && \
+            /ntap_dsutil_k8s.py create volume-snapshot --pvc-name=" + str(model_volume_pvc_existing) + " --snapshot-name=model-{{ task_instance.xcom_pull(task_ids='generate-uuid', dag_id='ai_training_run', key='return_value') }} --namespace=" + namespace],
+        name="ai-training-run-model-snapshot",
+        task_id="model-snapshot",
+        is_delete_operator_pod=True,
+        hostnetwork=False
+    )
+
+    # State that the model snapshot should be created after the training job completes
+    train >> model_snapshot
+
+    # Define inference validation step using Kubernetes Pod operator (https://airflow.apache.org/docs/stable/kubernetes.html#kubernetespodoperator)
+    validate = KubernetesPodOperator(
+        namespace=namespace,
+        image=validate_step_container_image,
+        cmds=validate_step_command,
+        resources = validate_step_resources,
+        volumes=[dataset_volume, model_volume],
+        volume_mounts=[dataset_volume_mount, model_volume_mount],
+        name="ai-training-run-validate",
+        task_id="validate",
+        is_delete_operator_pod=True,
+        hostnetwork=False
+    )
+
+    # State that inference validation job should be executed after model volume snapshot is taken
+    model_snapshot >> validate
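Each run's traceability rests on the dataset and model snapshots sharing the UUID produced by the generate-uuid task, which the DAG injects into the toolkit command through a Jinja `xcom_pull` template. The plain-Python sketch below reproduces that naming and command construction outside Airflow so it can be inspected or unit-tested. The helper names `snapshot_names` and `snapshot_command` are hypothetical, and the name check assumes VolumeSnapshot names must satisfy the DNS-1123 subdomain rule that Kubernetes applies to most object names.

```python
import re
import uuid

# DNS-1123 subdomain: lowercase alphanumerics, '-' and '.', starting and
# ending with an alphanumeric character, at most 253 characters.
DNS1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)


def snapshot_names(run_id: str) -> dict:
    """Derive the dataset and model snapshot names for one training run."""
    names = {"dataset": f"dataset-{run_id}", "model": f"model-{run_id}"}
    for name in names.values():
        if len(name) > 253 or not DNS1123_SUBDOMAIN.fullmatch(name):
            raise ValueError(f"invalid VolumeSnapshot name: {name!r}")
    return names


def snapshot_command(pvc_name: str, snapshot_name: str, namespace: str) -> str:
    """Build the toolkit call issued by the dataset-snapshot and model-snapshot tasks."""
    return (
        "/ntap_dsutil_k8s.py create volume-snapshot"
        f" --pvc-name={pvc_name}"
        f" --snapshot-name={snapshot_name}"
        f" --namespace={namespace}"
    )


if __name__ == "__main__":
    # Stand-in for the value the generate-uuid task pushes to XCom
    run_id = str(uuid.uuid4())
    names = snapshot_names(run_id)
    print(snapshot_command("dataset-vol", names["dataset"], "airflow"))
    print(snapshot_command("airflow-model-vol", names["model"], "airflow"))
```

Because both names end in the same run ID, a model snapshot can be matched to the dataset snapshot it was trained from simply by stripping the prefix.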
@@ -0,0 +1,65 @@
+# Airflow DAG Definition: Clone Volume
+#
+# Steps:
+#   1. Clone source volume
+
+
+from airflow import DAG
+from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import KubernetesPodOperator
+from airflow.utils.dates import days_ago
+
+
+##### DEFINE PARAMETERS: Modify parameter values in this section to match your environment #####
+
+## Define default args for DAG
+clone_volume_dag_default_args = {
+    'owner': 'NetApp'
+}
+
+## Define DAG details
+clone_volume_dag = DAG(
+    dag_id='clone_volume',
+    default_args=clone_volume_dag_default_args,
+    schedule_interval=None,
+    start_date=days_ago(2),
+    tags=['vol-clone']
+)
+
+# Define Kubernetes namespace to execute DAG in (volume must be located in same namespace)
+namespace = 'airflow'
+
+## Define volume details (change values as necessary to match your environment)
+source_volume_pvc_name = "gold-datavol"
+new_volume_pvc_name = "datavol-clone-2"
+clone_from_snapshot = True
+source_volume_snapshot_name = "snap1"
+
+################################################################################################
+
+
+# Construct command args
+arg = "\
+    python3 -m pip install ipython kubernetes pandas tabulate && \
+    git clone https://github.com/NetApp/netapp-data-science-toolkit && \
+    mv /netapp-data-science-toolkit/Kubernetes/ntap_dsutil_k8s.py / && \
+    /ntap_dsutil_k8s.py clone volume --namespace=" + str(namespace) + " --new-pvc-name=" + str(new_volume_pvc_name)
+if clone_from_snapshot:
+    arg += " --source-snapshot-name=" + str(source_volume_snapshot_name)
+else:
+    arg += " --source-pvc-name=" + str(source_volume_pvc_name)
+
+
+# Define DAG steps/workflow
+with clone_volume_dag as dag:
+
+    # Define step to clone source volume
+    clone_volume = KubernetesPodOperator(
+        namespace=namespace,
+        image="python:3",
+        cmds=["/bin/bash", "-c"],
+        arguments=[arg],
+        name="clone-volume-clone-volume",
+        task_id="clone-volume",
+        is_delete_operator_pod=True,
+        hostnetwork=False
+    )
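The string concatenation in clone-volume.py can be wrapped in a small function that makes the either/or choice of clone source explicit. This is a sketch under our own assumptions: `build_clone_command` is a hypothetical helper, raising `ValueError` when both or neither source is given is our design choice rather than documented toolkit behavior, and `shlex.quote` is added to guard against names containing spaces or shell metacharacters (it leaves ordinary PVC and snapshot names unchanged).

```python
import shlex
from typing import Optional

# Same toolkit bootstrap the clone-volume DAG runs inside the python:3 pod
TOOLKIT_SETUP = (
    "python3 -m pip install ipython kubernetes pandas tabulate && "
    "git clone https://github.com/NetApp/netapp-data-science-toolkit && "
    "mv /netapp-data-science-toolkit/Kubernetes/ntap_dsutil_k8s.py / && "
)


def build_clone_command(
    namespace: str,
    new_pvc_name: str,
    source_pvc_name: Optional[str] = None,
    source_snapshot_name: Optional[str] = None,
) -> str:
    """Return the bash command for the clone-volume task.

    Exactly one clone source must be supplied, mirroring the DAG's
    clone_from_snapshot switch.
    """
    if (source_pvc_name is None) == (source_snapshot_name is None):
        raise ValueError("specify exactly one of source_pvc_name or source_snapshot_name")

    args = [
        "/ntap_dsutil_k8s.py", "clone", "volume",
        f"--namespace={namespace}",
        f"--new-pvc-name={new_pvc_name}",
    ]
    if source_snapshot_name is not None:
        args.append(f"--source-snapshot-name={source_snapshot_name}")
    else:
        args.append(f"--source-pvc-name={source_pvc_name}")

    # shlex.quote returns simple names unchanged but quotes anything
    # containing spaces or shell metacharacters
    return TOOLKIT_SETUP + " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    print(build_clone_command("airflow", "datavol-clone-2", source_snapshot_name="snap1"))
```

The returned string can be passed as `arguments=[...]` to the KubernetesPodOperator together with `cmds=["/bin/bash", "-c"]`, in the same way the DAG passes `arg`.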
@@ -0,0 +1,18 @@
+---
+kind: ClusterRole
+apiVersion: rbac.authorization.k8s.io/v1
+metadata:
+  name: ntap-dsutil
+rules:
+- apiGroups: [""]
+  resources: ["persistentvolumeclaims", "persistentvolumeclaims/status", "services"]
+  verbs: ["get", "list", "create", "delete"]
+- apiGroups: ["snapshot.storage.k8s.io"]
+  resources: ["volumesnapshots", "volumesnapshots/status", "volumesnapshotcontents", "volumesnapshotcontents/status"]
+  verbs: ["get", "list", "create", "delete"]
+- apiGroups: ["apps", "extensions"]
+  resources: ["deployments", "deployments/scale", "deployments/status"]
+  verbs: ["get", "list", "create", "delete", "patch", "update"]
+- apiGroups: [""]
+  resources: ["nodes"]
+  verbs: ["get", "list"]
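The README's prerequisites refer to role-binding-airflow-ntap-dsutil.yaml, a RoleBinding named 'airflow-ntap-dsutil' that binds the 'default' ServiceAccount to the 'ntap-dsutil' ClusterRole above within the 'airflow' namespace; that manifest is not part of this diff. The sketch below generates an equivalent manifest from Python as JSON. The function name and its parameters are hypothetical, and the defaults are taken from the README's description.

```python
import json


def role_binding_manifest(
    namespace: str = "airflow",
    service_account: str = "default",
    cluster_role: str = "ntap-dsutil",
    name: str = "airflow-ntap-dsutil",
) -> dict:
    """Bind a ServiceAccount to a ClusterRole within a single namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",  # namespaced: rules apply only in `namespace`
        "metadata": {"name": name, "namespace": namespace},
        "subjects": [
            {"kind": "ServiceAccount", "name": service_account, "namespace": namespace}
        ],
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
    }


if __name__ == "__main__":
    # JSON is valid YAML, so kubectl can apply this output directly
    print(json.dumps(role_binding_manifest(), indent=2))
```

For example, `python3 role_binding.py > binding.json && kubectl apply -f binding.json` (the file names are placeholders). Because this is a RoleBinding, the ClusterRole's rules apply only inside the bound namespace, and rules for cluster-scoped resources such as `nodes` are not granted through it; a ClusterRoleBinding would be needed for those.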