Build and deploy a fast, scalable ML pipeline for audio transcription, entity extraction, and summarisation using Ray, AWS, and Kubox.
Kubox is an on-demand data platform designed to build and deploy analytics applications anywhere. It combines open-source Kubernetes with a customisable data infrastructure, making it easy to scale and manage complex data workloads. Kubox offers the simplicity of SaaS with the flexibility of PaaS, minimising overhead while providing a vendor-neutral data infrastructure.
https://docs.kubox.ai/introduction
Tip
Kubox is currently in its early-stage public preview and under active development. We’re continuously improving and refining the platform, so things may change as we grow. We welcome your feedback and suggestions to help shape the future of Kubox AI.
- Introduction
- ML Pipeline
- Quickstart
- Installation and Setup
- Running The Application
- Local Development
- Contributing
- License
This example demonstrates an end-to-end ML pipeline that transcribes audio, extracts entities, and summarises content using self-hosted models. It uses OpenAI Whisper for transcription and Llama-3.2 for summarisation on Kubox, optimised for NVIDIA L4 GPUs on AWS. The system processes hour-long audio in just 20 seconds at a cost of around 10 cents.
For efficient distributed processing, we deploy our ML pipeline using Ray on Kubox, enabling self-hosted distributed transcription and summarisation with GPU acceleration. The pipeline consists of three key stages:
For transcription, we use OpenAI Whisper, a state-of-the-art speech recognition model, to convert audio to text. The large (1.5GB) model runs on Ray workers within Kubox, leveraging NVIDIA L4 GPUs for fast, high-accuracy transcription of audio.
Once transcribed, spaCy’s en_core_web_sm extracts named entities such as people, organisations, locations, and legal references. This categorises unstructured text, improving searchability and filtering.
Finally, for summarisation, Llama-3.2-3B-Instruct-FP8 runs on Ray-powered GPU nodes, delivering concise, high-quality summaries while using 50% less memory than the unquantised model. This self-hosted model ensures rapid processing, distilling hour-long audio into key insights in just 20 seconds.
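The three stages above compose into a simple transcript → entities → summary flow. The sketch below is a minimal, dependency-free illustration of that composition, with stub functions standing in for the actual models (Whisper, spaCy, and Llama-3.2) — in the real pipeline each stage runs on Ray workers with GPU acceleration.

```python
from typing import TypedDict


class PipelineResult(TypedDict):
    transcript: str
    entities: list[tuple[str, str]]
    summary: str


def transcribe(audio_path: str) -> str:
    # Stub standing in for the Whisper transcription stage,
    # which converts the audio file to text on a GPU worker.
    return f"Transcript of {audio_path}"


def extract_entities(text: str) -> list[tuple[str, str]]:
    # Stub standing in for spaCy's en_core_web_sm NER pass, which
    # yields (entity_text, label) pairs such as ("Acme Corp", "ORG").
    # Here we just tag capitalised words with a placeholder label.
    return [(word, "MISC") for word in text.split() if word.istitle()]


def summarise(text: str) -> str:
    # Stub standing in for the Llama-3.2 summarisation call;
    # truncation is a placeholder for the generated summary.
    return text[:60]


def run_pipeline(audio_path: str) -> PipelineResult:
    # Each stage consumes the previous stage's output.
    transcript = transcribe(audio_path)
    return {
        "transcript": transcript,
        "entities": extract_entities(transcript),
        "summary": summarise(transcript),
    }


result = run_pipeline("meeting.mp3")
print(result["summary"])
```

The structure, not the stubs, is the point: each stage is an independent function, which is what lets Ray schedule them on separate workers.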
```sh
# 1. Install Kubox CLI
curl https://kubox.sh | sh

# 2. Clone this repository
git clone git@github.com:kubox-ai/audio-ml.git && cd audio-ml

# 3. Create your cluster
kubox create -f cluster.yaml
```

Note
See full instructions below for AWS setup, local development, and advanced options.
To download the AWS CLI, configure it, and set up authentication, follow the instructions in the AWS Documentation.
Run the following command to verify your AWS CLI credentials:
```sh
aws sts get-caller-identity
```

Example output:

```json
{
  "UserId": "AIDAIEXAMPLEID",
  "Account": "123456789012",
  "Arn": "arn:aws:iam::123456789012:user/example-user"
}
```

Download and install Kubox CLI
```sh
curl https://kubox.sh | sh
```

Verify Installation
```sh
kubox version
```

The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can download it from Kubernetes.io.
Kustomize introduces a template-free way to customize application configuration that simplifies the use of off-the-shelf applications. You can download it from kustomize.io.
```sh
git clone git@github.com:kubox-ai/audio-ml.git
cd audio-ml
```

You will need AWS quotas for two r6i.2xlarge instances and one g6.4xlarge instance.
Using a GPU requires creating an AMI in your desired AWS region. If you are using the ap-southeast-2 region, a public AMI (ami-0db4a0fc42c49f8c6) is available and configured by default in the cluster.yaml file. To use GPU instances in other AWS regions, you need to create a custom AMI in that region and update the cluster.yaml file accordingly. For more information, see Creating GPU Amazon Machine Image (AMI).
```sh
kubox create -f cluster.yaml
export KUBECONFIG=./cluster/config/kubeconfig
kubectl get pods -n kubox
```

Note
Currently it takes about 16 minutes to start the GPU containers, download the ML models, and serve the REST endpoints.
```
NAME                                                      READY   STATUS    RESTARTS   AGE
audio-gui-6b6c7799bb-dch2l                                1/1     Running   0          16m
audio-service-raycluster-lkhxz-head-p6z47                 1/1     Running   0          16m
audio-service-raycluster-lkhxz-worker-small-group-4mtqm   1/1     Running   0          16m
kuberay-operator-5f45c7fd48-m5qdw                         1/1     Running   0          20m
```

```sh
kubectl port-forward -n kubox svc/audio-gui 8080:3000
```

Now you can access the GUI at http://localhost:8080
```sh
kubox delete -f cluster.yaml
```

- UV - An extremely fast Python package and project manager
- npm - Node Package Manager
- yarn - Fast, reliable, and secure dependency management
Checking the Ray head service
```sh
kubectl port-forward -n kubox svc/audio-service-head-svc 8265:8265
```

Forward the remote Ray Serve service to your local machine and use it as a backing service for UI development.
```sh
kubectl port-forward -n kubox svc/audio-service-serve-svc 8081:8000
```

Copy the example environment file
```sh
cp .env.example .env
```

Start the UI
```sh
npm run dev
# or
yarn dev
```

Now you can access the GUI at http://localhost:3000
Create virtual environment
```sh
uv venv --python 3.11.9
```

Activate the virtual environment
```sh
source .venv/bin/activate
```

Install dependencies
```sh
uv sync
```

Start the Ray Serve application

```sh
serve run deployment.yaml
```

Now you can access the GUI at http://localhost:8265
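For context, `serve run` consumes a config file in Ray Serve's config format. The sketch below shows the general shape of such a file under assumed names — the application name, import path, and deployment names are hypothetical illustrations, not taken from this repository's actual deployment.yaml:

```yaml
# Hypothetical sketch of a Ray Serve config; names are illustrative only.
applications:
  - name: audio_service          # assumed application name
    import_path: pipeline:app    # assumed module:variable import path
    route_prefix: /
    deployments:
      - name: Transcriber        # e.g. the Whisper stage, pinned to a GPU
        num_replicas: 1
        ray_actor_options:
          num_gpus: 1
```

Consult the repository's deployment.yaml for the actual deployment names and resource settings.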
Tip

If you have issues creating the cluster, see the troubleshooting guide here.
We 💜 contributions from the community!
Whether it's a bug report, feature suggestion, documentation improvement, or code contribution — you are welcome.
- Open an Issue to report bugs or request features
- Submit a Pull Request for improvements
This repository is licensed under the Apache License 2.0. You are free to use, modify, and distribute this project under the terms of the license. See the LICENSE file for more details.
Thank you for using Kubox. Let's build something awesome together! 🚀



