The BigFlow package offers a command-line tool called bigflow.
It lets you run, build, access logs, and deploy your workflows from the command line on any machine with Python.
BigFlow CLI is the recommended way of working with BigFlow projects on a local machine as well as for build and deployment automation on CI/CD servers.
Install the BigFlow PIP package in a fresh virtual environment in your project directory.
Test BigFlow CLI:
bigflow -h
You should see the welcome message and the list of all bigflow commands:
...
Welcome to BigFlow CLI. Type: bigflow {command} -h to print detailed help for
a selected command.
positional arguments:
{run,deploy-dags,deploy-image,deploy,build-dags,build-image,build-package,build,project-version,pv,release,start-project,logs,build-requirements}
...
Each command has its own set of arguments. Check it with -h, for example:
bigflow run -h
Complete sources for Cheat Sheet examples are available in this repository as a part of the Docs project.
git clone https://github.com/allegro/bigflow.git
cd bigflow/docs
If you have BigFlow installed, you can easily run them.
The bigflow run command lets you run a job or a workflow
for a given runtime.
Typically, bigflow run is used for local development because it's the simplest way to execute a workflow.
It is not recommended for production use, because:
- No deployment to Composer (Airflow) is done.
- The bigflow run process is executed on a local machine. If you kill or suspend it, what happens on GCP is undefined.
- It uses local authentication so it relies on permissions of your Google account.
- It executes a job or workflow only once (while in a production environment you probably want your workflows to be run periodically by Composer).
Getting help for the run command
bigflow run -h
Run the hello_world_workflow.py workflow:
bigflow run --workflow hello_world_workflow
Run the single job:
bigflow run --job hello_world_workflow.say_goodbye
Run the workflow with concrete runtime
When running a workflow or a job with the CLI, the runtime parameter
defaults to now. You can set it to a concrete value using the --runtime argument.
bigflow run --workflow hello_world_workflow --runtime '2020-08-01 10:00:00'
Run the workflow on selected environment
If you don't set the config parameter,
the workflow configuration (environment)
is taken from the default
config (dev in this case):
bigflow run --workflow hello_config_workflow
Select the concrete environment using the config parameter.
bigflow run --workflow hello_config_workflow --config dev
bigflow run --workflow hello_config_workflow --config prod
There are five commands to build your deployment artifacts:
- build-dags generates Airflow DAG files from your workflows. DAG files are saved to a local .dags dir.
- build-package generates a PIP package from your project based on setup.py.
- build-image generates a Docker image with this package and all requirements.
- build-requirements compiles resources/requirements.in into resources/requirements.txt (it's an optional step).
- build simply runs build-dags, build-package, build-image and build-requirements.
Before using the build commands make sure that you have
a valid deployment_config.py file.
It should define the docker_repository parameter.
Getting help for build commands:
bigflow build-dags -h
bigflow build-package -h
bigflow build-image -h
bigflow build-requirements -h
bigflow build -h
The build-dags command takes two optional parameters:
- --start-time: the first runtime of your workflows. If empty, the current hour (datetime.datetime.now().replace(minute=0, second=0, microsecond=0)) is used.
- --workflow: leave empty to build DAGs from all workflows. Set a workflow ID to build a selected workflow only.
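The default --start-time described above can be reproduced in Python. This is a minimal sketch built only on the expression quoted in the parameter description. The helper names default_start_time and as_cli_value are hypothetical, and the output format is an assumption that mirrors the '2020-08-01 10:00:00' values used in this document's examples.

```python
import datetime


def default_start_time(now=None):
    """Truncate a datetime to the start of its hour (hypothetical helper)."""
    if now is None:
        now = datetime.datetime.now()
    return now.replace(minute=0, second=0, microsecond=0)


def as_cli_value(moment):
    """Format a datetime like the timestamp values shown in this document.

    Assumption: the 'YYYY-MM-DD hh:mm:ss' layout is taken from the examples,
    not from a formal specification of the CLI.
    """
    return moment.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    # Prints the current hour, e.g. a value you could pass to --start-time.
    print(as_cli_value(default_start_time()))
```

Passing an explicit datetime to default_start_time makes the truncation easy to check without depending on the wall clock.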
Build DAG files for all workflows with default start-time:
bigflow build-dags
Build the DAG file for the hello_world_workflow.py workflow
with given start-time:
bigflow build-dags --workflow hello_world_workflow --start-time '2020-08-01 10:00:00'
Building a PIP package
Call the build-package command to build a PIP package from your project.
The command requires no parameters; all configuration is taken from setup.py and deployment_config.py
(see Project structure and build).
Your PIP package is saved to a .tar.gz file in the dist dir.
bigflow build-package
Building a Docker image
The build-image command builds
a Docker image with Python, your project's PIP package, and
all requirements. Next, the image is exported to a tar file in the ./.image dir.
bigflow build-image
Build requirements.txt
The build-requirements command tries to resolve and freeze dependencies based on the resources/requirements.in file.
You can learn more about that concept in the Project structure and build chapter.
bigflow build-requirements
Build a whole project with a single command
The build command builds both artifacts (DAG files and a Docker image).
Internally, it executes the build-dags, build-package, and build-image commands.
bigflow build
At this stage, you should have two deployment artifacts
created by the bigflow build command.
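Before deploying, it can be useful to confirm that both artifacts are really on disk. The following is a minimal sketch, not part of BigFlow: find_build_artifacts is a hypothetical helper, the .dags and .image directory names and the .*-.*\.tar file name pattern are the ones this document mentions, and treating DAG files as *.py files and matching the pattern against the whole file name are assumptions.

```python
import re
from pathlib import Path

# File name pattern this document gives for the default image tar file.
IMAGE_TAR_PATTERN = re.compile(r".*-.*\.tar")


def find_build_artifacts(project_dir="."):
    """Return (dag_files, image_tars) found under project_dir.

    Hypothetical helper; it only inspects the file system and does not
    call BigFlow.
    """
    root = Path(project_dir)
    dags_dir = root / ".dags"
    image_dir = root / ".image"

    # Assumption: generated Airflow DAG files are Python files.
    dag_files = sorted(dags_dir.glob("*.py")) if dags_dir.is_dir() else []

    image_tars = []
    if image_dir.is_dir():
        # Assumption: the pattern must match the complete file name.
        image_tars = sorted(
            p for p in image_dir.iterdir()
            if p.is_file() and IMAGE_TAR_PATTERN.fullmatch(p.name)
        )
    return dag_files, image_tars


if __name__ == "__main__":
    dags, images = find_build_artifacts()
    print(f"DAG files: {len(dags)}, image tar files: {len(images)}")
```

An empty list for either artifact suggests that the corresponding build command has not been run yet in this project directory.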
There are three commands to deploy your workflows to Google Cloud Composer:
- deploy-dags uploads all DAG files from a .dags folder to a Google Cloud Storage Bucket which underlies your Composer's DAGs Folder.
- deploy-image pushes a docker image to Docker Registry which should be readable from your Composer's Kubernetes cluster.
- deploy simply runs both deploy-dags and deploy-image.
Important. By default, BigFlow takes deployment configuration parameters
from a ./deployment_config.py file.
If you need more flexibility you can set these parameters explicitly via command line.
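On a CI/CD server, explicit parameters are often assembled by a script rather than typed by hand. The sketch below shows one way to build the argument list for deploy-dags in Python. It is an illustration, not a BigFlow API: deploy_dags_command and redact are hypothetical helpers, the flags are the ones that appear in this document's deploy examples, and the BIGFLOW_VAULT_SECRET environment variable name is an assumption.

```python
import os
import shlex


def deploy_dags_command(dags_dir, dags_bucket, gcp_project_id,
                        vault_endpoint, vault_secret,
                        clear_dags_folder=False):
    """Build the argv list for `bigflow deploy-dags` with Vault authentication."""
    args = [
        "bigflow", "deploy-dags",
        "--dags-dir", dags_dir,
        "--auth-method=vault",
        "--vault-secret", vault_secret,
        "--vault-endpoint", vault_endpoint,
        "--dags-bucket", dags_bucket,
        "--gcp-project-id", gcp_project_id,
    ]
    if clear_dags_folder:
        args.append("--clear-dags-folder")
    return args


def redact(args, secret):
    """Return a copy of args that is safe to log, with the secret masked."""
    return ["*****" if secret and arg == secret else arg for arg in args]


if __name__ == "__main__":
    # Assumption: the secret is provided through an environment variable.
    secret = os.environ.get("BIGFLOW_VAULT_SECRET", "")
    cmd = deploy_dags_command(
        dags_dir="/tmp/my_dags",
        dags_bucket="europe-west1-12323a-bucket",
        gcp_project_id="my_gcp_dev_project",
        vault_endpoint="https://example.com/vault",
        vault_secret=secret,
        clear_dags_folder=True,
    )
    # Only print the command here; hand `cmd` to subprocess.run to execute it.
    print(shlex.join(redact(cmd, secret)))
```

Keeping the command as a list and passing it to subprocess.run avoids shell quoting problems, and masking the secret keeps it out of build logs.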
Getting help for deploy commands:
bigflow deploy-dags -h
bigflow deploy-image -h
bigflow deploy -h
Deploy DAG files
Upload DAG files from a .dags dir to a dev Composer using
local account authentication.
Configuration is taken from deployment_config.py:
bigflow deploy-dags --config dev
Upload DAG files from a given dir using authentication with Vault. Configuration is passed via command line arguments:
bigflow deploy-dags \
--dags-dir '/tmp/my_dags' \
--auth-method=vault \
--vault-secret ***** \
--vault-endpoint 'https://example.com/vault' \
--dags-bucket europe-west1-12323a-bucket \
--gcp-project-id my_gcp_dev_project \
--clear-dags-folder
Deploy Docker image
Upload a Docker image imported from a .tar file with the default path
(default path is: the first file from the .image dir with a name with pattern .*-.*\.tar).
Configuration is taken from deployment_config.py.
Local account authentication is used:
bigflow deploy-image --config dev
Upload a Docker image imported from the .tar file with the given path.
Configuration is passed via command line arguments.
Authentication with Vault is used:
bigflow deploy-image \
--image-tar-path '/tmp/image-0.1.0.tar' \
--docker-repository 'eu.gcr.io/my_gcp_dev_project/my_project' \
--auth-method=vault \
--vault-secret ***** \
--vault-endpoint 'https://example.com/vault'
Complete deploy examples
Upload DAG files from the .dags dir and a Docker image from the default path.
Configuration is taken from deployment_config.py.
Local account authentication is used:
bigflow deploy --config dev
The same, but the config (environment) name is defaulted to dev:
bigflow deploy
Upload DAG files from the specified dir and the Docker image from the specified path. Configuration is passed via command line arguments. Authentication with Vault is used:
bigflow deploy \
--image-tar-path '/tmp/image-0.1.0.tar' \
--dags-dir '/tmp/my_dags' \
--docker-repository 'eu.gcr.io/my_gcp_dev_project/my_project' \
--auth-method=vault \
--vault-secret ***** \
--vault-endpoint 'https://example.com/vault' \
--dags-bucket europe-west1-12323a-bucket \
--gcp-project-id my_gcp_dev_project \
--clear-dags-folder
Deploy using deployment_config.py from non-default path
By default, the deployment_config.py file
is located in the main directory of your project, so bigflow expects it to exist under this path:
./deployment_config.py.
You can change this location by setting the deployment-config-path parameter:
bigflow deploy --deployment-config-path '/tmp/my_deployment_config.py'
The bigflow logs command lets you generate a link leading to your project/workflow logs in GCP Logging. It generates
a link for every workflow that has a logging configuration.
The output of the bigflow logs command consists of two parts: an infrastructure link and a workflow link.
The workflow link contains logs from user code, Dataflow jobs, Dataproc jobs, and exceptions that may occur during workflow execution.
The links will be created for every workflow found by BigFlow in the project directory.
The infrastructure link contains logs from Kubernetes pods/containers and Dataflow workers. The links will be created
for every unique project id found in workflows.
Use the bigflow start-project command to create a sample project and try all of the above commands yourself.