diff --git a/cypress/e2e/broken-links.cy.ts b/cypress/e2e/broken-links.cy.ts
index 977aa9dc8..648f755a3 100644
--- a/cypress/e2e/broken-links.cy.ts
+++ b/cypress/e2e/broken-links.cy.ts
@@ -12,6 +12,7 @@ const IGNORED_URLS = [
   'youtube.com/api',
   'localhost:49152',
   'app.supabase.io',
+  'podbay.fm',
   'docs.planetscale.com',
   'turborepo.org',
   'prisma.io',
diff --git a/dictionary.txt b/dictionary.txt
index 8af2f3b8e..a190d521c 100644
--- a/dictionary.txt
+++ b/dictionary.txt
@@ -12,6 +12,7 @@ CDKs
 CORS
 ECR
 GCP
+VRAM
 GCR
 HPC
 IAM
diff --git a/docs/guides/python/podcast-transcription.mdx b/docs/guides/python/podcast-transcription.mdx
new file mode 100644
index 000000000..b6f90a774
--- /dev/null
+++ b/docs/guides/python/podcast-transcription.mdx
@@ -0,0 +1,435 @@
+---
+description: Use the Nitric framework to build a service for creating podcast transcripts.
+tags:
+  - API
+  - AI & Machine Learning
+languages:
+  - python
+featured:
+  image: /docs/images/guides/podcast-transcription/featured.png
+  image_alt: 'Podcast Transcription featured image'
+published_at: 2024-11-15
+updated_at: 2024-11-15
+---
+
+# Transcribing Podcasts using OpenAI Whisper
+
+## Prerequisites
+
+- [uv](https://docs.astral.sh/uv/#getting-started) - for Python dependency management
+- The [Nitric CLI](/get-started/installation)
+- _(optional)_ An [AWS](https://aws.amazon.com) account
+
+## Getting started
+
+We'll start by creating a new project using Nitric's Python starter template.
+
+<Note>
+  If you want to take a look at the finished code, it can be found
+  [here](https://github.com/nitrictech/examples/tree/main/v1/podcast-transcription).
+</Note>
+
+```bash
+nitric new podcast-transcription py-starter
+cd podcast-transcription
+```
+
+Next, let's install our base dependencies, then add the `openai-whisper` library as an optional dependency.
+
+<Note>
+  We pin `numba==0.60` because `librosa` currently forces an outdated version.
+  This pin may be removed in the future, pending [this
+  issue](https://github.com/librosa/librosa/issues/1883).
+</Note>
+
+```bash
+# Install the base dependencies
+uv sync
+# Add OpenAI whisper dependency
+uv add openai-whisper librosa numpy numba==0.60 --optional ml
+```
+
+<Note>
+  We add the extra dependencies to the `ml` optional dependency group to keep
+  them separate, since they can be quite large. This lets us install them only
+  in the containers that need them.
+</Note>
+
+We'll organize our project structure like so:
+
+```text
++--common/
+|  +-- __init__.py
+|  +-- resources.py
++--batches/
+|  +-- transcribe.py
++--services/
+|  +-- api.py
++--.gitignore
++--.python-version
++-- pyproject.toml
++-- python.dockerfile
++-- python.dockerignore
++-- nitric.yaml
++-- transcribe.dockerfile
++-- transcribe.dockerignore
++-- README.md
+```
+
+## Define our resources
+
+We'll start by creating a file to define our Nitric resources. For this project we'll need an API, a Batch Job, and two buckets: one for the audio files to be transcribed and one for the resulting transcripts. The API will interface with the buckets, while the Batch Job will handle the transcription.
+
+```python title:common/resources.py
+from nitric.resources import job, bucket, api
+
+main_api = api("main")
+
+transcribe_job = job("transcribe")
+
+podcast_bucket = bucket("podcasts")
+transcript_bucket = bucket("transcripts")
+```
+
+## Add our transcription services
+
+Now that our resources are defined, we can import the API and add routes for accessing the buckets, as well as a storage listener that is triggered whenever a new podcast is added to the podcast bucket.
+
+```python title:services/api.py
+from common.resources import main_api, transcript_bucket, podcast_bucket, transcribe_job
+from nitric.application import Nitric
+from nitric.resources import BucketNotificationContext
+from nitric.context import HttpContext
+
+readable_transcript_bucket = transcript_bucket.allow("read")
+writable_podcast_bucket = podcast_bucket.allow("write")
+submittable_transcribe_job = transcribe_job.allow("submit")
+
+# Get a podcast transcript from the bucket
+# !collapse(1:12) collapsed
+@main_api.get("/transcript/:name")
+async def get_transcript(ctx: HttpContext):
+    name = ctx.req.params['name']
+
+    # Get a URL to download our transcript
+    download_url = await readable_transcript_bucket.file(f"{name}-transcript.txt").download_url()
+
+    # Redirect the user to the download URL
+    ctx.res.headers["Location"] = download_url
+    ctx.res.status = 303
+
+    return ctx
+
+# Get a URL for uploading podcasts to the podcast bucket
+# !collapse(1:11) collapsed
+@main_api.get("/podcast/:name")
+async def get_podcast_upload_url(ctx: HttpContext):
+    name = ctx.req.params['name']
+
+    # Get a URL to upload a podcast
+    upload_url = await writable_podcast_bucket.file(name).upload_url()
+
+    # Return the upload URL
+    ctx.res.body = upload_url
+
+    return ctx
+
+# Listen for any write events to the podcast bucket
+# !collapse(1:6) collapsed
+@podcast_bucket.on("write", "*")
+async def on_add_podcast(ctx: BucketNotificationContext):
+    # Submit a job request to the transcription job for the newly written podcast
+    await submittable_transcribe_job.submit({ "podcast_name": ctx.req.key })
+
+    return ctx
+
+Nitric.run()
+```
+
+## Downloading our model
+
+We can download our model and embed it into our container to reduce the start-up time of our transcription. We'll create a script which can be triggered using `uv run download_model.py --model_name turbo`.
+
+```python title:download_model.py
+from whisper import _MODELS, _download
+import argparse
+import os
+
+default = os.path.join(os.path.expanduser("~"), ".cache")
+download_root = os.path.join(os.getenv("XDG_CACHE_HOME", default), "whisper")
+
+def download_whisper_model(model_name="base"):
+    print("downloading model...")
+    # download the model into memory, using the default whisper cache as the download root
+    model = _download(_MODELS[model_name], root=download_root, in_memory=True)
+
+    # make sure the ./.model directory exists
+    os.makedirs("./.model", exist_ok=True)
+
+    # write the model to disk
+    save_path = "./.model/model.pt"
+    with open(save_path, "wb") as f:
+        f.write(model)
+
+    print(f"Model '{model_name}' has been downloaded and saved to '{save_path}'.")
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Download a Whisper model.")
+    parser.add_argument("--model_name", type=str, default="base", help="Name of the model to download.")
+
+    args = parser.parse_args()
+
+    download_whisper_model(model_name=args.model_name)
+```
+
+## Add Transcribe Batch Job
+
+For the job, we'll set the memory requirement to 12000 MB. If you're using anything smaller than the large model this should be safe. You can see the memory requirements for each of the model sizes below.
+
+| Size   | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
+| ------ | ---------- | ------------------ | ------------------ | ------------- | -------------- |
+| tiny   | 39 M       | tiny.en            | tiny               | `~1 GB`       | `~32x`         |
+| base   | 74 M       | base.en            | base               | `~1 GB`       | `~16x`         |
+| small  | 244 M      | small.en           | small              | `~2 GB`       | `~6x`          |
+| medium | 769 M      | medium.en          | medium             | `~5 GB`       | `~2x`          |
+| large  | 1550 M     | N/A                | large              | `~10 GB`      | `1x`           |
+
+We can create our job using the following code:
+
+```python title:batches/transcribe.py
+import whisper
+import io
+import numpy as np
+import os
+import librosa
+from common.resources import transcribe_job, transcript_bucket, podcast_bucket
+from nitric.context import JobContext
+from nitric.application import Nitric
+
+# Set permissions for the transcript bucket for writing
+writeable_transcript_bucket = transcript_bucket.allow("write")
+# Set permissions for the podcast bucket for reading
+readable_podcast_bucket = podcast_bucket.allow("read")
+
+# Get the location of the model
+MODEL = os.environ.get("MODEL", "./.model/model.pt")
+
+@transcribe_job(cpus=1, memory=12000, gpus=0)
+async def transcribe_podcast(ctx: JobContext):
+    podcast_name = ctx.req.data["podcast_name"]
+    print(f"Transcribing: {podcast_name}")
+
+    # Read the bytes from the podcast that was uploaded to the bucket
+    podcast = await readable_podcast_bucket.file(podcast_name).read()
+
+    # Convert the audio bytes into a numpy array for the model's consumption
+    podcast_io = io.BytesIO(podcast)
+    y, sr = librosa.load(podcast_io)
+    audio_array = np.array(y)
+
+    # Load the local whisper model
+    model = whisper.load_model(MODEL)
+
+    # Transcribe the audio and make the output verbose.
+    # We can turn off `FP16` with `fp16=False`, which will use `FP32` instead.
+    # Whether `FP16` works when testing locally depends on your CPU; both `FP16` and `FP32` are supported once deployed.
+    result = model.transcribe(audio_array, verbose=True, fp16=False)
+
+    # Take the output transcript and write it to the transcript bucket
+    transcript = result["text"].encode()
+
+    print("Finished transcribing... Writing to bucket")
+    await writeable_transcript_bucket.file(f"{podcast_name}-transcript.txt").write(transcript)
+    print("Done!")
+
+    return ctx
+
+Nitric.run()
+```
+
+## Deployment Dockerfiles
+
+With our code complete, we can write the Dockerfile that our batch job will run in. It's a two-stage build, split into three main sections:
+
+1. A builder stage that copies our application code and resolves the dependencies using `uv`.
+2. A final stage built on an image that includes the NVIDIA drivers, which sets environment variables to enable GPU use and installs Python 3.11 with apt.
+3. Copying our application from the builder stage and running it.
+
+```docker title:transcribe.dockerfile
+# !collapse(1:13) collapsed
+FROM ghcr.io/astral-sh/uv:python3.11-bookworm-slim AS builder
+
+ARG HANDLER
+ENV HANDLER=${HANDLER}
+
+WORKDIR /app
+
+# Get Python dependencies
+ENV UV_COMPILE_BYTECODE=1 UV_LINK_MODE=copy PYTHONPATH=.
+COPY uv.lock pyproject.toml /app/
+RUN --mount=type=cache,target=/root/.cache/uv \
+    uv sync --frozen -v --no-install-project --extra ml --no-dev --no-python-downloads
+COPY . /app
+
+# !collapse(1:19) collapsed
+FROM nvcr.io/nvidia/driver:550-5.15.0-1065-nvidia-ubuntu22.04
+
+ARG HANDLER
+
+ENV HANDLER=${HANDLER}
+ENV PYTHONUNBUFFERED=TRUE
+ENV PYTHONPATH="."
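+# HANDLER is supplied by Nitric as a build argument and selects which Python
+# file this container runs; PYTHONPATH="." lets that handler resolve imports
+# like `common.resources` from the /app working directory.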
+
+# Set which NVIDIA drivers will be mounted in the container
+ENV NVIDIA_DRIVER_CAPABILITIES=all
+ENV NVIDIA_REQUIRE_CUDA="cuda>=8.0"
+
+# Add Python 3.11 and FFmpeg
+RUN apt-get update -y && \
+  apt-get install -y software-properties-common ffmpeg && \
+  add-apt-repository ppa:deadsnakes/ppa && \
+  apt-get update -y && \
+  apt-get install -y python3.11 && \
+  ln -sf /usr/bin/python3.11 /usr/local/bin/python3.11 && \
+  ln -sf /usr/bin/python3.11 /usr/local/bin/python3 && \
+  ln -sf /usr/bin/python3.11 /usr/local/bin/python
+
+# !collapse(1:8) collapsed
+COPY --from=builder /app /app
+WORKDIR /app
+
+# Place executables in the environment at the front of the path
+ENV PATH="/app/.venv/bin:$PATH"
+
+# Run the service using the path to the handler
+ENTRYPOINT python -u $HANDLER
+```
+
+We'll add a `dockerignore` file to help reduce the size of the Docker image that is being deployed.
+
+```text title:transcribe.dockerignore
+.mypy_cache/
+.nitric/
+.venv/
+nitric-spec.json
+nitric.yaml
+README.md
+```
+
+And add `.model/` to the Python dockerignore.
+
+```text title:python.dockerignore
+.mypy_cache/
+.nitric/
+.venv/
+.model/
+nitric-spec.json
+nitric.yaml
+README.md
+```
+
+Finally, we can update the project file to point our batch job to our new Dockerfile. We also want to add `batch-services` to our enabled preview features.
+
+```yaml title:nitric.yaml
+name: podcast-transcription
+services:
+  - match: services/*.py
+    start: uv run watchmedo auto-restart -p *.py --no-restart-on-command-exit -R uv run $SERVICE_PATH
+    runtime: python
+
+batch-services:
+  - match: batches/*.py
+    start: uv run watchmedo auto-restart -p *.py --no-restart-on-command-exit -R uv run $SERVICE_PATH
+    runtime: transcribe
+
+runtimes:
+  python:
+    dockerfile: python.dockerfile
+  transcribe:
+    dockerfile: transcribe.dockerfile
+
+preview:
+  - batch-services
+```
+
+### Testing the project
+
+Before deploying our project, we can test that it works as expected locally. You can do this using `nitric start` or, if you'd prefer to run the program in containers, `nitric run`. Either way, you can test the transcription by first uploading an audio file to the podcast bucket.
+
+<Note>
+  You can find most free podcasts for download by searching for them on
+  [Podbay](https://podbay.fm/).
+</Note>
+
+You can upload the podcast directly to the bucket using the [local dashboard](/get-started/foundations/projects/local-development#local-dashboard), or use the API instead. If you want to use the API, start by getting an upload URL for the bucket.
+
+```bash
+curl http://localhost:4002/podcast/serial
+http://localhost:55736/write/eyJhbGciOi...
+```
+
+We'll then use the URL to upload the audio file as binary data. I've stored the podcast as `serial.mp3`.
+
+```bash
+curl -X PUT --data-binary @"serial.mp3" http://localhost:55736/write/eyJhbGciOi...
+```
+
+Once that's done, the batch job will be triggered, so you can sit back and watch the transcription logs. When it finishes, you can download the transcript from the bucket using the following cURL request.
+
+```bash
+curl -sL http://localhost:4002/transcript/serial
+```
+
+### Requesting a G instance quota increase
+
+Most AWS accounts **will not** have access to on-demand GPU instances (G
+instances). If you'd like to run models using a GPU, you'll need to request a
+quota increase for G instances.
+
+If you prefer not to use a GPU, you can set `gpus=0` in the `transcribe_job` decorator in `batches/transcribe.py`. The model runs reasonably well on CPU, so a GPU is not strictly necessary.
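+
+If you do have GPU quota available, you could request a GPU by updating the resource requirements on the job decorator. The following is a minimal sketch — the `gpus=1` value assumes your approved quota allows at least one GPU:
+
+```python
+# Sketch: a GPU-enabled variant of the decorator in batches/transcribe.py,
+# assuming an approved G instance quota in your AWS account.
+@transcribe_job(cpus=1, memory=12000, gpus=1)
+async def transcribe_podcast(ctx: JobContext):
+    # With a GPU available, you could also remove `fp16=False` from the
+    # model.transcribe(...) call to use FP16 inference.
+    ...
+```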
+
+<Note>
+  **Important:** If the `gpus` value in `batches/transcribe.py` exceeds the
+  number of available GPUs in your AWS account, the job will never start. If
+  you want to run without a GPU, make sure to set `gpus=0` in the
+  `transcribe_job` decorator. This is just a quirk of how AWS Batch works.
+</Note>
+
+To request a quota increase for G instances in AWS, follow these steps:
+
+1. Go to the [AWS Service Quotas for EC2](https://console.aws.amazon.com/servicequotas/home/services/ec2/quotas) page.
+2. Find/Search for **All G and VT Spot Instance Requests**
+3. Click **Request quota increase**
+4. Choose an appropriate value, e.g. 4, 8, or 16, depending on your needs
+
+_Screenshot: requesting a G instance quota increase on AWS._
+
+Once you've requested the quota increase, it may take some time for AWS to approve it.
+
+### Deploy the project
+
+Once the above is complete, we can deploy the project to the cloud using:
+
+```bash
+nitric up
+```
+
+<Note>
+  The initial deployment may take time due to the size of the Python, NVIDIA
+  driver, and CUDA runtime dependencies.
+</Note>
+
+Once the project is deployed, you can try out some transcriptions: just add a podcast to the bucket and the bucket notification will trigger the transcription job.
+
+Once you're finished, you can destroy the project using `nitric down`.
+
+## Summary
+
+In this guide, we've created a podcast transcription service using OpenAI Whisper and Nitric's Python SDK. We showed how to use batch jobs to run long-running workloads and connect those jobs to buckets that store the generated transcripts. We also demonstrated how to expose buckets using simple CRUD routes on a cloud API. Finally, we created Dockerfiles with GPU support to optimize transcription speed in the cloud.
+
+For more information and advanced usage, refer to the [Nitric documentation](/).
diff --git a/public/images/guides/podcast-transcription/featured.png b/public/images/guides/podcast-transcription/featured.png
new file mode 100644
index 000000000..5d69a468a
Binary files /dev/null and b/public/images/guides/podcast-transcription/featured.png differ