Kubox is an on-demand data platform designed to build and deploy analytics applications anywhere. It combines open-source Kubernetes with a customisable data infrastructure, making it easy to scale and manage complex data workloads. Kubox offers the simplicity of SaaS with the flexibility of PaaS, minimising overhead while providing a vendor-neutral data infrastructure.
https://docs.kubox.ai/introduction
Tip
Kubox is currently in its early-stage public preview and under active development. We’re continuously improving and refining the platform, so things may change as we grow. We welcome your feedback and suggestions to help shape the future of Kubox AI.
- Introduction
- Installation and Setup
- Finding Urban Hotspots for 21 Major Urban Areas of Australia
- Local Development
- License
In this example, we use Kubox to provision a cluster in the AWS cloud and use open-source tools such as Dask and Dagster to process satellite images. You can read more about it in Urban growth hotspots across Australia for $15.
To download and configure the AWS CLI and set up authentication, follow the instructions in the AWS Documentation.
Run the following command to verify your AWS CLI credentials:
aws sts get-caller-identity
# Example output
{
  "UserId": "AIDAIEXAMPLEID",
  "Account": "123456789012",
  "Arn": "arn:aws:iam::123456789012:user/example-user"
}

Download and install the Kubox CLI:

curl https://kubox.sh | sh

Verify the installation:
kubox version

The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can download it from Kubernetes.io.
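Before going further, it can help to confirm that the three CLIs this guide relies on are installed. The following is a minimal sketch that assumes only that the executables are named aws, kubox and kubectl, as in the commands in this guide. REQUIRED_TOOLS and find_missing are illustrative names, not an official Kubox check.

```python
"""Check that the command-line tools used in this guide are on PATH.

Illustrative sketch: the tool list mirrors the commands in this README
(aws, kubox, kubectl) and is not an official Kubox prerequisite check.
"""
import shutil

# Hypothetical list, taken from the commands used in this guide.
REQUIRED_TOOLS = ["aws", "kubox", "kubectl"]


def find_missing(tools: list[str]) -> list[str]:
    """Return the tools that cannot be found on PATH, preserving order."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing(REQUIRED_TOOLS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

shutil.which applies the same PATH lookup your shell uses, so a tool reported as missing here will also fail when you run the commands below.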
Clone the example repository:

git clone git@github.com:kubox-ai/urban-extent.git
cd urban-extent

This example runs in the AWS ap-southeast-2 (Sydney) region to avoid data egress costs, so your account needs access to that region.

Run the following command to check that ap-southeast-2 is available to your account:

aws ec2 describe-availability-zones --region ap-southeast-2

Create an AWS IAM role that lets the cluster dynamically create EBS volumes for PostgreSQL. See Role for Kubox EC2 Instances in the Kubox Advanced Configuration documentation for the setup commands.
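If you want to script the region check, you can parse the JSON printed by describe-availability-zones directly. This is a sketch that assumes the aws CLI is installed and configured. describe_zones shells out to the same command shown above, and available_zones keeps only the zones whose State is "available". Both function names are hypothetical helpers.

```python
"""Sketch: confirm a region has available zones, using AWS CLI JSON output.

Assumes the `aws` CLI is installed and configured. The helper names are
illustrative, not part of Kubox or the AWS SDK.
"""
import json
import subprocess


def describe_zones(region: str) -> dict:
    """Run `aws ec2 describe-availability-zones` and return the parsed JSON.

    Raises subprocess.CalledProcessError if the CLI call fails, for example
    when the region is not enabled for the account.
    """
    result = subprocess.run(
        ["aws", "ec2", "describe-availability-zones",
         "--region", region, "--output", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def available_zones(response: dict) -> list[str]:
    """Return the ZoneName of every zone whose State is 'available'."""
    return [
        zone["ZoneName"]
        for zone in response.get("AvailabilityZones", [])
        if zone.get("State") == "available"
    ]
```

For example, available_zones(describe_zones("ap-southeast-2")) should return a non-empty list such as the ap-southeast-2a, 2b and 2c zone names when the region is usable.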
Make sure your AWS service quotas allow one m5.4xlarge, one c5.2xlarge and three m5.12xlarge instances.
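EC2 On-Demand limits for the standard families (the "Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances" quota, which covers M5 and C5) are counted in vCPUs, not instances. That means the requirement above is easier to check as a vCPU total. This worked example assumes that "one of m5.4xlarge and c5.2xlarge" means one of each, and that all five instances run at the same time.

```python
"""Worked example: On-Demand vCPUs needed for the instances in this guide.

Assumption: one m5.4xlarge, one c5.2xlarge and three m5.12xlarge running
concurrently. vCPU counts are the published sizes for these instance types.
"""
VCPUS_PER_INSTANCE = {
    "m5.4xlarge": 16,
    "c5.2xlarge": 8,
    "m5.12xlarge": 48,
}

# Instance counts from the quota requirement above.
REQUESTED = {
    "m5.4xlarge": 1,
    "c5.2xlarge": 1,
    "m5.12xlarge": 3,
}


def total_vcpus(requested: dict[str, int]) -> int:
    """Sum count x vCPUs-per-instance over all requested instance types."""
    return sum(count * VCPUS_PER_INSTANCE[itype] for itype, count in requested.items())


if __name__ == "__main__":
    print(total_vcpus(REQUESTED))  # 16 + 8 + 3 * 48 = 168
```

That gives 168 vCPUs. Other On-Demand standard instances already running in the account count against the same quota, so compare your current usage plus 168 against the limit shown in the Service Quotas console.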
Create an AWS S3 Bucket to store the output files:
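aws s3api create-bucket --bucket my-unique-bucket-name --region ap-southeast-2 --create-bucket-configuration LocationConstraint=ap-southeast-2

S3 bucket names are globally unique, so replace my-unique-bucket-name with your own name throughout this guide.

The bucket policy needs your 12-digit AWS account ID. This is the Account field in the output of:

aws sts get-caller-identity

AWS S3 Bucket Policy (replace AWS_ACCOUNT_ID and my-unique-bucket-name with your values, then save it as s3-policy.json):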
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DelegateS3Access",
      "Effect": "Allow",
      "Principal": {
        "AWS": ["arn:aws:iam::AWS_ACCOUNT_ID:role/KuboxEC2InstanceRole"]
      },
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::my-unique-bucket-name/*",
        "arn:aws:s3:::my-unique-bucket-name"
      ]
    }
  ]
}

Apply the policy to the bucket:
aws s3api put-bucket-policy --bucket my-unique-bucket-name --policy file://s3-policy.json

Update the following files in the repository with your S3 bucket name:

- cluster/infrastructure/apps/urban-extent/dagster-configmap.yaml
- pipeline/.env.example
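The bucket policy needs only two substitutions: the account ID and the bucket name. Rendering it with a script avoids copy-paste mistakes. This is a sketch that reproduces the policy body shown above. It takes the account ID from the Account field of aws sts get-caller-identity and writes s3-policy.json, the file name the put-bucket-policy command expects. render_policy, account_id_from_identity and write_policy are hypothetical helpers.

```python
"""Sketch: render s3-policy.json with your account ID and bucket name.

The policy mirrors the one in this guide; helper names are illustrative.
"""
import json
import re
from pathlib import Path


def account_id_from_identity(identity_json: str) -> str:
    """Extract the Account field from `aws sts get-caller-identity` output."""
    return json.loads(identity_json)["Account"]


def render_policy(account_id: str, bucket: str) -> dict:
    """Build the bucket policy granting the Kubox EC2 instance role access."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DelegateS3Access",
                "Effect": "Allow",
                "Principal": {
                    "AWS": [f"arn:aws:iam::{account_id}:role/KuboxEC2InstanceRole"]
                },
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket}/*",
                    f"arn:aws:s3:::{bucket}",
                ],
            }
        ],
    }


def write_policy(path: str, account_id: str, bucket: str) -> None:
    """Write the rendered policy as indented JSON to `path`."""
    policy = render_policy(account_id, bucket)
    Path(path).write_text(json.dumps(policy, indent=2) + "\n")
```

For example, write_policy("s3-policy.json", account_id_from_identity(output), "my-unique-bucket-name"), where output is the text printed by aws sts get-caller-identity. The 12-digit check catches the common mistake of pasting the ARN or user ID instead of the account number.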
Create the cluster, then check the pods in the kubox namespace:

kubox create -f cluster.yaml
kubectl get pods -n kubox

Connect to the Dask scheduler:

kubectl port-forward service/dask-scheduler 8786:80 -n kubox

Connect to the Dagster web UI:

kubectl port-forward service/nginx 8080:80 -n kubox

Open the browser and navigate to http://dagster.localhost:8080.
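Each kubectl port-forward above keeps running in its own terminal and listens on localhost by default. A quick way to confirm that a forward is up is to try a TCP connection to its local port. The following is a minimal stdlib sketch. FORWARDED_PORTS and is_listening are illustrative names, and the ports come from the commands above.

```python
"""Sketch: check that the kubectl port-forwards are accepting connections."""
import socket

# Local ports from the port-forward commands in this guide.
FORWARDED_PORTS = {"dask-scheduler": 8786, "nginx (Dagster UI)": 8080}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in FORWARDED_PORTS.items():
        state = "open" if is_listening("127.0.0.1", port) else "not reachable"
        print(f"{name}: localhost:{port} {state}")
```

Once the scheduler port is open, a Dask client should be able to connect with Client("tcp://localhost:8786") from dask.distributed. This assumes the dask-scheduler service's port 80 targets the scheduler's client port. Browsers resolve dagster.localhost to the loopback address, and presumably nginx uses that host name to route requests to Dagster.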
Get Jupyter Notebook token
make get-notebook-token

Connect to the Jupyter notebook:

kubectl port-forward service/dask-jupyter 8888:80 -n kubox

When you are finished, delete the cluster:

kubox delete -f cluster.yaml

Local Development

Local development requires the following tools:

- UV - An extremely fast Python package and project manager
- GDAL - Open source library for reading and writing raster data
- npm - Node Package Manager
- yarn - Fast, reliable, and secure dependency management
Create a Python virtual environment:
uv sync

This will create .venv and install all the required packages.
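If commands such as dagster dev pick up the wrong interpreter, check which environment Python is actually running from. The following is a small sketch that uses the standard rule that a virtual environment changes sys.prefix but not sys.base_prefix. uses_project_venv assumes the project's environment lives at .venv in the project root, as uv sync creates it. Both helper names are illustrative.

```python
"""Sketch: report whether Python is running from the project's .venv."""
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """True when sys.prefix differs from sys.base_prefix (i.e. inside a venv)."""
    return sys.prefix != sys.base_prefix


def uses_project_venv(project_root: str = ".") -> bool:
    """True when the active prefix is <project_root>/.venv."""
    return Path(sys.prefix).resolve() == (Path(project_root) / ".venv").resolve()


if __name__ == "__main__":
    print(f"virtualenv active: {in_virtualenv()}")
    print(f"project .venv active: {uses_project_venv()}")
```

Run it from the repository root after source .venv/bin/activate. Both lines should then print True.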
To activate the virtual environment, run:
source .venv/bin/activate

To run the pipeline locally, run:

dagster dev

To add a new dependency, run:

uv add <package-name>

To update the dependencies, run:

uv sync -U

To remove a dependency, run:

uv remove <package-name>

To export the dependencies to requirements.txt, run:

make export

Install Yarn:
npm install --global yarn

Install the project dependencies and build the project:

yarn install
yarn build

Then run the development server:
yarn dev

Open http://localhost:3000 in your browser to see the result.
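The development server can take a few seconds to compile before http://localhost:3000 responds. If you script against it (for example in a smoke test), polling until the server answers is more reliable than a fixed sleep. The following is a stdlib sketch. wait_for_http is a hypothetical helper, and it treats any HTTP response, including error statuses, as "up".

```python
"""Sketch: wait until a local dev server (e.g. `yarn dev`) answers HTTP."""
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `url` until it returns any HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, just with an error status such as 404.
            return True
        except OSError:
            # Connection refused or timed out (URLError is an OSError): retry.
            time.sleep(interval)
    return False
```

For example, wait_for_http("http://localhost:3000", timeout=60) returns True once the page is being served, or False if nothing answers within a minute.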
License

This repository is licensed under the Apache License 2.0. You are free to use, modify, and distribute this project under the terms of the license. See the LICENSE file for more details.

