Description
Current Situation
If you want to use non-standard Python libraries in an Airflow job, you need to build a custom image that pip-installs them and then use that custom image in your cluster.
Preferred Situation
You can configure a requirements.txt, which is then installed into the Airflow deployment.
Example
E.g. if you want to use pandas==2.2.2 in a DAG, you currently need to set up a CI/CD pipeline that builds and deploys a custom Airflow image. The Dockerfile would look like:
```dockerfile
# Build args used in FROM have to be declared before the FROM instruction
ARG AIRFLOW_VERSION
ARG STACKABLE_VERSION

FROM oci.stackable.tech/sdp/airflow:${AIRFLOW_VERSION}-stackable${STACKABLE_VERSION}

# Install custom Python libraries
RUN pip install \
    --no-cache-dir \
    --upgrade \
    pandas==2.2.2
```
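The resulting image then has to be pushed to a registry and referenced in the cluster definition, roughly like this (the registry and tag are made up, and I'm going from memory on the image selection fields):

```yaml
spec:
  image:
    # Hypothetical registry and tag; point this at wherever the custom image was pushed
    custom: my-registry.example.com/airflow-custom:2.9.3-pandas
    productVersion: 2.9.3
```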
Although this is fairly easy to do, it implies maintenance effort and resources. I consider this a fairly common use case, so we should think about whether we could cover it with something like the following (no strong opinion on the naming, nor on where it should live in the CRD and how):
```yaml
---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  image:
    productVersion: 2.9.3
  clusterConfig:
    loadExamples: false
    exposeConfig: false
    credentialsSecret: simple-airflow-credentials
    requirements:
      configMap:
        name: custom-requirements
```

and a ConfigMap:

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: custom-requirements
data:
  requirements.txt: |
    pandas==2.2.2
```

I think a solution at the operator level would remove the pain of constructing and maintaining a build pipeline for the cluster. It moves the maintenance effort into the Airflow operator, but the operator already needs that kind of attention anyway (Stackable versions, product versions).
However, I can't evaluate how much effort we would need to put in to achieve this and what risks it would imply.
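To make the idea a bit more concrete, here is a rough sketch of what the operator could render into each Airflow pod: an init container that installs the requirements from the mounted ConfigMap into a shared volume which is put on the PYTHONPATH. All container/volume names, paths and the mechanism itself are assumptions for illustration, not existing operator behavior:

```yaml
# Hypothetical fragment of a pod template the operator could generate.
# Names and mount paths are made up for illustration.
initContainers:
  - name: install-requirements
    # Reuse the Airflow image of the main container so pip and Python versions match
    image: oci.stackable.tech/sdp/airflow:${AIRFLOW_VERSION}-stackable${STACKABLE_VERSION}
    command:
      - /bin/bash
      - -c
      - pip install --no-cache-dir --target /stackable/python-packages -r /stackable/requirements/requirements.txt
    volumeMounts:
      - name: requirements        # the user-provided ConfigMap from the CRD
        mountPath: /stackable/requirements
      - name: python-packages     # shared with the main container
        mountPath: /stackable/python-packages
containers:
  - name: airflow
    env:
      # Would have to append to any existing PYTHONPATH instead of replacing it
      - name: PYTHONPATH
        value: /stackable/python-packages
    volumeMounts:
      - name: python-packages
        mountPath: /stackable/python-packages
volumes:
  - name: requirements
    configMap:
      name: custom-requirements
  - name: python-packages
    emptyDir: {}
```

Whether something like this runs as an init container, as part of the entrypoint, or as a plain `pip install --user` on container start is exactly the kind of design and risk question that would need evaluation.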