leonkosak/kestra-automation
The Purpose

This is a sample project demonstrating how to implement automation in Kestra, with emphasis on fully containerized runtime execution and a GitOps-first approach:

  • Executing commands in container images (OCI images) - e.g. running Python scripts inside a container image (with additional parameters)

  • Building reusable subflows (modular approach) (using io.kestra.plugin.core.flow.Subflow plugin)

  • Building various types of workflows - parallel execution of tasks (using io.kestra.plugin.core.flow.Parallel plugin), parallel with sequential parts (combination of io.kestra.plugin.core.flow.Parallel and io.kestra.plugin.core.flow.Sequential plugins)

  • Triggering another workflow when the first workflow is finished (successfully).

  • Building a container image (OCI image) with an application (designed for applications written in scripting languages like Python, PowerShell, Bash, ...) - using Kestra as a CI/CD tool

  • ...

About OCI image: https://opencontainers.org/

Sample Python Automation Boilerplate project that is used in workflows: https://github.com/leonkosak/python-automation-boilerplate
It's important to understand that this project is fully standalone and therefore "unaware" of the potential orchestrator that performs the actual automation (executing shell commands inside io.kestra.plugin.docker.Run in (sub)workflows).
The described approach takes modularity even further - the orchestrator can be replaced more easily in the future if needed.
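For illustration, the entry point of such a standalone project might accept the orchestrator only as an informational CLI flag, so no control flow depends on who invokes it. This is a hedged sketch; the actual interface of the boilerplate project may differ:

```python
# Hypothetical sketch of an orchestrator-agnostic entry point: the script
# only receives the orchestrator name as an informational flag, so Kestra
# (or any future orchestrator) is just another caller.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="sample automation entry point")
    parser.add_argument("--orchestrator", default="NONE",
                        help="name of the calling orchestrator, e.g. KESTRA")
    return parser

def run(orchestrator: str) -> str:
    # Real automation logic would go here; the orchestrator name is used
    # only for logging/reporting, never for branching.
    return f"automation finished (invoked by: {orchestrator})"

if __name__ == "__main__":
    args = build_parser().parse_args()
    print(run(args.orchestrator))
```

Because the flag carries no behavior, swapping Kestra for another orchestrator requires no change in the automation project itself.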

Remarks

  • Even though this demo project for Kestra is based on a sample Python project, it can be used with a minimal amount of trivial changes (especially for building and executing other projects written in scripting languages). This is possible due to the "fully-containerized approach" - from development to deployment.

  • For building container (OCI) images in languages that require a compilation step (e.g. Java, C#, C++, ...), more changes are needed in the workflow that provides the CI/CD functionality. Other workflows which execute automation require a minimal amount of changes (mostly just changed shell commands) inside the io.kestra.plugin.docker.Run part (in workflows and subflows).

Development Environment

Because of the GitOps-first approach to development, it's recommended that workflow development is done outside Kestra.
(It's possible to set up a workflow in Kestra that periodically commits workflows (.yml files) to a Git repository, but here the opposite approach is implemented - a specific workflow in Kestra periodically fetches workflow definitions (.yml files) from a specific Git repository.
Development of Kestra workflows is therefore done in Visual Studio Code with the Kestra extension installed, and the produced workflow definitions are pushed to the Git repo.)

Requirements

Important
When the Kestra extension is installed, the command palette should be opened in Visual Studio Code (press the F1 function key); start typing "Kestra" and select the option Kestra: Download Kestra schema when it appears.
Press "Enter" to download the schema from the official URL and wait for the notification about the successful download in the bottom-right corner of Visual Studio Code.
It's recommended to periodically refresh (re-download) this schema to stay aligned with newer versions of Kestra.

When a YAML file (.yml) is created and opened, the schema kestra:/flow-schema.json should be shown as selected.

When a .yml file is opened (active window) in Visual Studio Code, there is a purple (Kestra color) icon on the right side of the row where the tabs are. Clicking this icon opens a side panel where documentation appears.
This documentation window actively changes based on the cursor position inside the .yml file; it shows documentation for the Kestra plugin defined in the current row.
The documentation window can also be opened by pressing the F1 function key and selecting the option Kestra: Open Kestra documentation (start typing "Kestra").

Development

Because of the decision that a specific Kestra workflow periodically fetches a Git repository for changes, this workflow must be established inside Kestra before the periodic operation can be executed.

Recommended first actions when initial setup of Kestra is established

  • It's highly recommended that a dedicated instance is established for each environment (e.g. test, production) in order to have full predictability when developing workflows and also when upgrading Kestra itself.

  • It's also highly recommended that each environment (and consequently each Kestra instance) is associated with a defined Git branch name (from which workflow definitions are taken/synchronized).

  • In order to make Git operations easier (e.g. merging between branches, ...), it's important to recognize specific values inside workflow definitions (.yml files) and "extract" them to the KV Store (Key-Value Store) and to Secrets (when specific values are sensitive).

  • It's highly recommended that all workflow definitions are stored inside one folder, organized with a subfolder structure. The default folder name for workflows in Kestra is _flows.

  • The development team should decide where "technically-related" workflows (such as workflows for syncing .yml files, building OCI images, ...) should be stored.

    • If the development team decides to leverage the predefined system namespace for such workflows, it's recommended to create a folder named _flows_system alongside the _flows folder, place technically-related workflows inside it, and reference "system" as the namespace in the .yml files.
      It's important not to forget to write a synchronization workflow for these system workflows (or to add a task to an existing workflow which synchronizes the content-related workflows inside the _flows folder).
      If Kestra is used by multiple development teams, keep in mind that other teams may also create workflows in the system namespace; a workflow that synchronizes system workflows therefore SHOULD NOT delete other workflow definitions inside this namespace!
      How the system namespace differs from other namespaces in Kestra: https://kestra.io/docs/concepts/system-flows

    • If the development team decides to treat its own system workflows as regular content-related workflows, it's recommended to create a folder named system somewhere inside the _flows folder structure and place system-related workflows there (and set the namespace in such workflows correctly).

      Important

      • When a new Kestra instance is established, manually copy the workflow which executes Git synchronization and run it to bootstrap the initial state of workflows from Git.
        (If this workflow has a periodic trigger defined, the automation for workflow synchronization is thereby established.)

      • It's also recommended to use entries from the KV Store and Secrets in synchronization-based system workflows for easier GitOps operations.

  1. When a Kestra instance is established, it's highly recommended to define and set some entries in the KV Store (Key-Value Store).
    In the case of a GitOps approach, these entries could be (examples):

    • ENVIRONMENT_GIT_BRANCH_NAME: Git branch name for the current environment (e.g. "main", "master", "test", "prod", "staging", ...)
      This KV entry is useful when there are multiple phases in software (workflow) development and the development team wants this info stored in one place and referenced in multiple workflows.
      It's recommended that this Key-Value item is stored inside the system namespace for the sake of convention ("technically-related" KV item).

    • ENVIRONMENT_NAME: "Friendly" name for the environment (in case the Git branch name is not good enough, e.g. for some reports, ...)

    • ENVIRONMENT_WORKFLOW_SYNC_USER_GIT_USERNAME: Username of the Git user that has access to the repository where workflow definitions are stored.
      Needed only if this Git repo is protected.
      If the development team prefers, the username can also be stored in Secrets.

    These are just examples. The real KV Store items should be defined based on the specific project.
    Other example entries: URLs to Git repos, URLs to container images in container registries, ...
    It's recommended to use highly descriptive names for keys in the KV Store in order to prevent confusion when the number of items grows (consider also DDD naming conventions: https://medium.com/unil-ci-software-engineering/clean-ddd-lessons-project-structure-and-naming-conventions-00d0b9c57610).

  2. When a Kestra instance is established, it's highly recommended to define and set some entries in Secrets.
    All recommendations written for the KV Store are applicable to Secrets as well.
    The development team should identify which items should go to the KV Store and which to Secrets based on data sensitivity.

    Important
    Secrets functionality in Kestra behaves differently compared to the KV Store.
    In Kestra, KV Store items can also be referenced by specifying the namespace in which they are stored (e.g. {{ kv('MY_KV_STORE_KEY', 'namespace_where_item_is_stored') }}), which makes a specific KV Store item globally available for referencing.
    This is not the case for Secrets: inside a specific workflow, only secrets in the same namespace or in parent namespaces can be accessed.
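As a hypothetical illustration of this scoping difference (entry and secret names are examples, matching those above):

```yaml
id: kv_vs_secret_scoping_demo
namespace: company.team.demo

tasks:
  - id: show_kv_scoping
    type: io.kestra.plugin.core.log.Log
    # kv() can name the namespace where the entry lives, so an item stored
    # in the system namespace is reachable from company.team.demo:
    message: "branch: {{ kv('ENVIRONMENT_GIT_BRANCH_NAME', 'system') }}"

  - id: use_secret
    type: io.kestra.plugin.docker.Run
    containerImage: alpine:latest
    # secret() takes no namespace argument - the secret must be defined in
    # company.team.demo or one of its parents (company.team, company):
    env:
      MY_TOKEN: "{{ secret('MY_DESIRED_SECRET_KEY') }}"
    commands:
      - sh
      - -c
      - "echo secret received"
```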

General Development Recommendations and Good Practices

  • Start planning the workflow namespace (folder) structure early, because later changes may break workflows (namespace naming changes in subflows, ...).
    For larger projects it's highly recommended to strictly follow DDD naming conventions and namespace nesting.
    Following DDD conventions, namespace names should go from "general to more specific".
    Example: com.mycompany.department.team. ...
    Defining the namespace structure this way also helps with filtering workflows and other related operations.

  • Use subflows (the io.kestra.plugin.core.flow.Subflow plugin) and create minimal, fully-rounded (content-wise) execution units that can also be executed standalone if needed.
    Use these subflows inside tasks of bigger workflows as modular units.

  • If possible, pack all automation application logic into an OCI (Docker) image and then use the io.kestra.plugin.docker.Run plugin to execute commands inside the container.
    (Build and save OCI images locally - there is no need to push those images to a container registry if the source code and Dockerfile definition allow building quickly from scratch.)
    Executing logic inside a container also makes the runtime more predictable and portable (e.g. when replacing the orchestrator, ...).

  • For "serious" workflow and automation development, it's recommended that Kestra is used just as an orchestration and monitoring platform. Use a GitOps-first approach, develop workflows and application logic with other developer tools (e.g. Visual Studio Code), and commit to Git from there.

Example YAML workflow definitions

  • Workflows Synchronization from Git
id: sync_flows_from_git
namespace: company.team

tasks:
  - id: sync_flows
    type: io.kestra.plugin.git.SyncFlows
    gitDirectory: _flows # optional; set to _flows by default
    targetNamespace: "system" # required
    includeChildNamespaces: true # optional; by default, it's set to false to allow explicit definition
    delete: true # optional; by default, it's set to false to avoid destructive behavior
    url: "https://<url_to_git_repository>" # required
    branch: "{{ kv('ENVIRONMENT_GIT_BRANCH_NAME', 'system') }}"
    username: "{{ kv('ENVIRONMENT_WORKFLOW_SYNC_USER_GIT_USERNAME', 'system') }}" # if Git repo is protected
    password: "{{ secret('ENVIRONMENT_WORKFLOW_SYNC_USER_GIT_TOKEN') }}" # if Git repo is protected
    dryRun: false  # if true, the task will only log which flows from Git will be added/modified or deleted in kestra without making any changes in kestra backend yet

triggers:
  - id: every_two_minutes
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "*/2 * * * *" # every 2 minutes
  • Build (Create) OCI (container) image based on sample Python project
id: build_python_oci_image_deploy_multistage
namespace: company.team.system

tasks:
  - id: workspace
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: generate_build_version
        type: io.kestra.plugin.core.debug.Return
        format: "{{ now() | date('yyyyMMdd-HHmmss') }}"

      - id: clone_repo
        type: io.kestra.plugin.git.Clone
        url: "https://<url_to_git_repository>"
        branch: "{{ kv('ENVIRONMENT_GIT_BRANCH_NAME', 'system') }}"
        username: "{{ kv('ENVIRONMENT_WORKFLOW_SYNC_USER_GIT_USERNAME', 'system') }}" # if Git repo is protected
        password: "{{ secret('ENVIRONMENT_WORKFLOW_SYNC_USER_GIT_TOKEN') }}" # if Git repo is protected
        directory: repo
    
      - id: get_commit_hash
        type: io.kestra.plugin.scripts.shell.Commands
        containerImage: alpine/git:latest
        commands:
          - |
            cd repo
            HASH=$(git rev-parse HEAD)
            echo $HASH > $HASH
            mv $HASH ..
        outputFiles:
          - "*"

      - id: read_commit_hash
        type: io.kestra.plugin.core.debug.Return
        format: "{{ outputs.get_commit_hash.outputFiles | keys | first }}"

      - id: copy_requirements
        type: io.kestra.plugin.scripts.shell.Commands
        commands:
          - cp repo/requirements.txt repo/docker/deploy
        
      - id: copy_src
        type: io.kestra.plugin.scripts.shell.Commands
        commands:
          - cp -r repo/src repo/docker/deploy

      - id: build_image
        type: io.kestra.plugin.docker.Build
        dockerfile: repo/docker/deploy/Dockerfile.multistage
        push: false
        tags:
          - "local/project-python-deploy-{{ kv('ENVIRONMENT_GIT_BRANCH_NAME', 'system') }}:latest"
          - "local/project-python-deploy-{{ kv('ENVIRONMENT_GIT_BRANCH_NAME', 'system') }}:{{ outputs.generate_build_version.value }}"
        buildArgs:
          BUILD_VERSION: "{{ outputs.generate_build_version.value }}"
          BASE_BUILD_VERSION: "{{ outputs.generate_build_version.value }}"
          SVC_COMMIT_INFO: "{{ outputs.read_commit_hash.value }}"

      - id: inspect_image_final
        type: io.kestra.plugin.docker.Run
        containerImage: "local/project-python-deploy-{{ kv('ENVIRONMENT_GIT_BRANCH_NAME', 'system') }}:latest"
        commands:
          - sh
          - -c
          - "echo BUILD_VERSION=$BUILD_VERSION && echo BASE_BUILD_VERSION=$BASE_BUILD_VERSION && echo SVC_COMMIT_INFO=$SVC_COMMIT_INFO && ls -R /app"
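The buildArgs above are consumed by the referenced Dockerfile. The real Dockerfile.multistage lives in the boilerplate repository; a minimal hypothetical sketch of how such build arguments can be accepted and persisted (so the inspect_image_final task can echo them at runtime) might look like:

```dockerfile
# Hypothetical sketch only - the actual Dockerfile.multistage is part of
# the python-automation-boilerplate repository.
FROM python:3.12-slim AS build
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.12-slim
ARG BUILD_VERSION
ARG BASE_BUILD_VERSION
ARG SVC_COMMIT_INFO
# Persist the build arguments as environment variables so they remain
# visible in the running container.
ENV BUILD_VERSION=${BUILD_VERSION} \
    BASE_BUILD_VERSION=${BASE_BUILD_VERSION} \
    SVC_COMMIT_INFO=${SVC_COMMIT_INFO}
COPY --from=build /install /usr/local
COPY src /app/src
```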
  • Example Subflow YAML definition (based on the sample Python project)
    Use an additional parameter named credentials inside the io.kestra.plugin.docker.Run plugin if the container image referenced by the containerImage attribute requires authentication to the container registry where the image is stored.
    Usually, authentication to a container registry is done via username and password/token. Example of a credentials object definition:
credentials:
  username: "{{ kv('ENVIRONMENT_CONTAINER_REGISTRY_USER_USERNAME', 'system') }}"
  password: "{{ secret('ENVIRONMENT_CONTAINER_REGISTRY_USER_PWD') }}"

Consider "composing" the containerImage value from KV Store items to make it globally configurable, e.g. when the image is moved to a different location (container registry).
Example:

containerImage: "{{ kv('ENVIRONMENT_CONTAINER_REGISTRY_BASE_URL', 'system') }}/project-python-deploy-{{ kv('ENVIRONMENT_GIT_BRANCH_NAME', 'system') }}:latest"
id: script_demo_subflow_f1
namespace: company.team.demo

tasks:
  - id: wd_01
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: run_01
        type: io.kestra.plugin.docker.Run
        containerImage: "local/project-python-deploy-{{ kv('ENVIRONMENT_GIT_BRANCH_NAME', 'system') }}:latest"
        commands:
          - python3
          - /app/src/<path_to_file>/<filename>.py
          - --orchestrator
          - KESTRA
  • Example workflow that demonstrates parallel execution of four subflows
id: parallel_four_features
namespace: company.team.demo

tasks:
  - id: all_parallel
    type: io.kestra.plugin.core.flow.Parallel
    tasks:
      - id: run_01
        type: io.kestra.plugin.core.flow.Subflow
        flowId: script_demo_subflow_f1
        namespace: company.team.demo

      - id: run_02
        type: io.kestra.plugin.core.flow.Subflow
        flowId: script_demo_subflow_f2
        namespace: company.team.demo

      - id: run_03
        type: io.kestra.plugin.core.flow.Subflow
        flowId: script_demo_subflow_f3
        namespace: company.team.demo

      - id: run_04
        type: io.kestra.plugin.core.flow.Subflow
        flowId: script_demo_subflow_f4
        namespace: company.team.demo
  • Example workflow where features F1 and F3 execute in parallel in the first batch; if both are successful, the next batch with features F2 and F4 is also executed in parallel
id: parallel_scripts_in_batches_subtasks
namespace: company.team.demo

tasks:
  - id: sequential_batches
    type: io.kestra.plugin.core.flow.Sequential
    tasks:
      - id: batch_01_03
        type: io.kestra.plugin.core.flow.Parallel
        tasks:
          - id: run_01
            type: io.kestra.plugin.core.flow.Subflow
            flowId: script_demo_subflow_f1
            namespace: company.team.demo

          - id: run_03
            type: io.kestra.plugin.core.flow.Subflow
            flowId: script_demo_subflow_f3
            namespace: company.team.demo

      - id: batch_02_04
        type: io.kestra.plugin.core.flow.Parallel
        tasks:
          - id: run_02
            type: io.kestra.plugin.core.flow.Subflow
            flowId: script_demo_subflow_f2
            namespace: company.team.demo

          - id: run_04
            type: io.kestra.plugin.core.flow.Subflow
            flowId: script_demo_subflow_f4
            namespace: company.team.demo
  • Example of how to trigger another workflow when the first finishes (with conditions)
    (The definition below should be placed inside the second workflow (the workflow which is called) as one of its triggers.)
    With the states listed below, the second workflow is triggered even if the first workflow fails.
triggers:
  - id: trigger_second_flow
    type: io.kestra.plugin.core.trigger.Flow
    conditions:
      - type: io.kestra.plugin.core.condition.ExecutionFlow
        namespace: company.team.demo
        flowId: first_workflow_id
    states:
      - SUCCESS
      - FAILED
      - WARNING
  • Example of how to set and pass environment variables inside the io.kestra.plugin.docker.Run plugin for (sub)workflows
    (This is a good solution for passing secrets as runtime-defined variables from the Kestra secrets store, i.e. not variables "baked into" the OCI (Docker) image.)
id: script_demo_passing_environment_variables
namespace: company.team.demo

tasks:
  - id: wd_01
    type: io.kestra.plugin.core.flow.WorkingDirectory
    tasks:
      - id: run_01
        type: io.kestra.plugin.docker.Run
        containerImage: "local/project-python-deploy-{{ kv('ENVIRONMENT_GIT_BRANCH_NAME', 'system') }}:latest"
        env:
          MY_ENV_VAL1: "{{ kv('MY_DESIRED_KEY', 'system') }}"
          MY_ENV_VAL2: "{{ secret('MY_DESIRED_SECRET_KEY') }}"
        commands:
          - python3
          - /app/src/<path_to_file>/<filename>.py
          - --orchestrator
          - KESTRA

(In programming languages, these environment variables can be read by key (for this example, MY_ENV_VAL1 and MY_ENV_VAL2).)
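For instance, a minimal Python sketch (key names match the env mapping in the workflow above):

```python
# Minimal sketch: reading the environment variables that the
# io.kestra.plugin.docker.Run task passed into the container.
import os

def read_config() -> dict:
    # Keys match the env mapping defined in the workflow above.
    keys = ("MY_ENV_VAL1", "MY_ENV_VAL2")
    return {key: os.environ.get(key, "") for key in keys}
```

In PowerShell, the equivalent values would be read via $env:MY_ENV_VAL1 and $env:MY_ENV_VAL2.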
