Commit 21ee8c6: Add deployment section.
1 parent 372fc3b

docs/guides/python/ai-podcast-part-1.mdx

Lines changed: 228 additions & 3 deletions
@@ -364,13 +364,238 @@ async def do_generate_audio(ctx: JobContext):

<Note>
  If you like, the download/cache step can also be rolled into the audio
  generation job. However, having the download in a separate job is more
  cost-effective, as you aren't downloading and caching the model on an
  instance where you may also be paying for GPU time.
</Note>
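
As a rough, standalone illustration of that split, the download step only needs CPU, network, and disk. A sketch along the following lines (this is not the guide's actual job code; it assumes the Bark model from Hugging Face `transformers` and a local `.model/` cache directory like the one excluded by the ignore files later in this section) could run on a cheap CPU-only instance:

```python
# Standalone sketch of the caching step, not the guide's actual job code.
# Assumes the Bark model from Hugging Face transformers and a local `.model/`
# cache directory (the same directory excluded by the ignore files below).
from transformers import AutoProcessor, BarkModel


def cache_model(model_id: str = "suno/bark", cache_dir: str = ".model"):
    # Downloading and caching the weights is all this step needs to do, so it
    # can run on a CPU-only instance. The GPU-backed generation job can then
    # load the already-cached weights instead of re-downloading them.
    AutoProcessor.from_pretrained(model_id, cache_dir=cache_dir)
    BarkModel.from_pretrained(model_id, cache_dir=cache_dir)


if __name__ == "__main__":
    cache_model()
```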

Then we can add an API endpoint to trigger the download job, and update our audio endpoint to allow selection of models and voice presets.

```python
from common.resources import main_api, gen_audio_job, download_audio_model_job
from nitric.application import Nitric
from nitric.context import HttpContext

generate_audio = gen_audio_job.allow('submit')
download_audio_model = download_audio_model_job.allow('submit')

audio_model_id = "suno/bark"
default_voice_preset = "v2/en_speaker_6"

@main_api.post("/download-audio-model")
async def download_audio(ctx: HttpContext):
    model_id = ctx.req.query.get("model", audio_model_id)

    if isinstance(model_id, list):
        model_id = model_id[0]

    await download_audio_model.submit({ "model_id": model_id })

# Generate a sample voice line
@main_api.post("/audio/:filename")
async def submit_auto(ctx: HttpContext):
    name = ctx.req.params["filename"]
    model_id = ctx.req.query.get("model", audio_model_id)
    preset = ctx.req.query.get("preset", default_voice_preset)

    if isinstance(model_id, list):
        model_id = model_id[0]

    if isinstance(preset, list):
        preset = preset[0]

    body = ctx.req.data
    if body is None:
        ctx.res.status = 400
        return

    print(f"using preset {preset}")

    await generate_audio.submit({"file": name, "model_id": model_id, "text": body.decode('utf-8'), "preset": preset})


Nitric.run()
```

Once this is done, we can give our project another test to make sure everything is working as expected:

```bash
nitric start
```
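
For example, a quick local smoke test could hit both endpoints with Python's standard library. This is only a sketch: the base URL assumes the default local API address that `nitric start` prints (commonly `http://localhost:4001`), so use whatever address appears in your terminal or the local dashboard.

```python
# Local smoke test sketch. Adjust BASE_URL to the API address printed by
# `nitric start`; http://localhost:4001 is a common default, not a guarantee.
import urllib.request

BASE_URL = "http://localhost:4001"

# Trigger the model download job.
req = urllib.request.Request(f"{BASE_URL}/download-audio-model", method="POST")
with urllib.request.urlopen(req) as res:
    print("download job submitted:", res.status)

# Submit a short line of text to the audio generation endpoint.
text = "Welcome to the AI podcast.".encode("utf-8")
req = urllib.request.Request(f"{BASE_URL}/audio/test", data=text, method="POST")
with urllib.request.urlopen(req) as res:
    print("generation job submitted:", res.status)
```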

## Defining our service Docker images

To make sure our AI workload can properly use GPUs in the cloud, we'll need to make sure it ships with the drivers and libraries to support them.

We'll start by creating a new Dockerfile for our batch service under `docker/torch.dockerfile`.

```dockerfile
# Torch dockerfile
# Used for torch runtime based nitric batch services
# Don't need to include the CUDA runtime as the nvidia pypi dep already ships with it
FROM nvcr.io/nvidia/driver:550-5.15.0-1065-nvidia-ubuntu22.04

ARG HANDLER

ENV HANDLER=${HANDLER}
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONPATH="."
ENV NVIDIA_DRIVER_CAPABILITIES=all
ENV NVIDIA_REQUIRE_CUDA="cuda>=8.0"

RUN apt-get update -y && \
    apt-get install -y ca-certificates curl git python3.11 && \
    update-ca-certificates && \
    curl https://bootstrap.pypa.io/get-pip.py | python3.11 && \
    ln -sf /usr/bin/python3.11 /usr/bin/python3 && \
    ln -sf /usr/bin/python3.11 /usr/bin/python && \
    ln -sf /usr/bin/pip3.11 /usr/bin/pip3 && \
    ln -sf /usr/bin/pip3.11 /usr/bin/pip

RUN pip install --no-cache-dir --upgrade pip pipenv

COPY Pipfile Pipfile.lock ./

RUN pipenv install --system --categories="packages ml" --skip-lock --deploy --verbose

COPY . .

ENTRYPOINT python3.11 -u $HANDLER
```
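
If you want to confirm at runtime that a container built from this image can actually see a GPU, a temporary check like the one below can be dropped into the batch handler while testing. It's only a debugging aid and isn't part of the guide's code:

```python
# Temporary GPU sanity check for debugging, not part of the guide's handlers.
import torch

if torch.cuda.is_available():
    # Confirms the NVIDIA driver and CUDA runtime shipped in the image are usable.
    print("CUDA available:", torch.cuda.get_device_name(0))
else:
    print("No GPU visible; generation will fall back to CPU")
```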

We'll also add an ignorefile for this Dockerfile to help keep the image size down.

```gitignore
.mypy_cache/
.nitric/
.venv/
.model/
```

Next, we'll define a `standard` runtime for the regular services we want to deploy alongside our batch service, under `docker/standard.dockerfile`.

```dockerfile
ARG IMAGE_BASE=python:3.11-slim

FROM ${IMAGE_BASE}

ARG HANDLER

ENV HANDLER=${HANDLER}
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONPATH="."

RUN apt-get update -y && \
    apt-get install -y ca-certificates git && \
    update-ca-certificates

RUN pip install --no-cache-dir --upgrade pip pipenv

COPY . .

ARG CATEGORIES="packages"

RUN pipenv install --categories="${CATEGORIES}" --skip-lock --system --deploy --verbose

ENTRYPOINT python -u $HANDLER
```

And we'll add an ignorefile for this Dockerfile as well.

```gitignore
.mypy_cache/
.nitric/
.venv/
.model/
```

Now we'll update our `nitric.yaml` to include the new Dockerfiles.

```yaml
name: podcast-ai
services:
  - match: services/*.py
    start: pipenv run dev $SERVICE_PATH
    runtime: standard
batch-services:
  - match: batches/*.py
    start: pipenv run dev $SERVICE_PATH
    runtime: torch

runtimes:
  torch:
    dockerfile: './docker/torch.dockerfile'
    args: {}
  standard:
    dockerfile: './docker/standard.dockerfile'
    args: {}

preview:
  - batch-services
```

Finally, we can define our Nitric stack file for deploying to the cloud.

```bash
nitric stack new aws aws
```

This will generate a Nitric stack called `aws` that deploys to AWS.

Then we can tweak our stack settings with some configuration for our batch service and the AWS Batch compute environment it will run in.

```yaml
# The nitric provider to use
provider: nitric/[email protected]
# The target aws region to deploy to
# See available regions:
# https://docs.aws.amazon.com/general/latest/gr/lambda-service.html
region: ap-southeast-2

batch-compute-env:
  min-cpus: 0
  # Allow a maximum of 4 CPUs to be used
  max-cpus: 4
  instance-types:
    # Allow use of G5 instances
    - g5
    - optimal
```

<Note>
  You will need to make sure your machine is configured to deploy to AWS. See
  the [Nitric AWS provider documentation](/providers/aws) for more information.
</Note>

<Note>
  Most AWS accounts will not have access to on-demand GPU instances, so you may
  need to increase your AWS service quotas to allow GPU instances to spin up.
  The model will also run on CPU, so if you can't get access to GPUs you can
  increase the CPU count and memory to compensate.
</Note>

Once that's configured, we can deploy our project to the cloud using:

```bash
nitric up
```

<Note>
  Deployment may take some time due to the size of the Python, NVIDIA driver,
  and CUDA runtime dependencies. Be patient.
</Note>

Once the project is deployed you can try out some generation, just like before. Depending on the hardware you were running on locally, you may notice a speed-up in generation time.
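
For instance, a deployed test could select a model and voice preset via the query parameters added earlier. This is a sketch only: the base URL below is a placeholder, so use the API gateway URL that `nitric up` prints for your stack.

```python
# Deployed smoke test sketch. Replace API_URL with the gateway URL printed by
# `nitric up`; the value below is only a placeholder.
import urllib.parse
import urllib.request

API_URL = "https://your-api-id.execute-api.ap-southeast-2.amazonaws.com"  # placeholder

params = urllib.parse.urlencode({"model": "suno/bark", "preset": "v2/en_speaker_6"})
text = "Hello from a cloud GPU.".encode("utf-8")

req = urllib.request.Request(
    f"{API_URL}/audio/cloud-test?{params}", data=text, method="POST"
)
with urllib.request.urlopen(req) as res:
    print("generation job submitted:", res.status)
```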

<Note>
  Running the project in the cloud will incur costs. Make sure to monitor your
  usage and shut down the project when you're done.

  From testing, running this project on a g5.xlarge costs roughly $0.05 per
  minute of generated audio, based on standard EC2 pricing for US regions.
</Note>
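
As a rough back-of-envelope on what that figure implies (assuming on-demand g5.xlarge pricing of roughly US$1.01/hour, which you should verify against current EC2 pricing for your region):

```python
# Back-of-envelope check of the cost figure above. The instance price is an
# assumption; verify it against current EC2 pricing before relying on it.
G5_XLARGE_PER_HOUR = 1.01        # assumed on-demand USD price for g5.xlarge
COST_PER_AUDIO_MINUTE = 0.05     # figure quoted above

instance_minutes = COST_PER_AUDIO_MINUTE / (G5_XLARGE_PER_HOUR / 60)
print(f"~{instance_minutes:.1f} minutes of g5.xlarge time per minute of audio")
# Roughly 3 minutes of instance time per minute of generated audio.
```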

In part two of this guide, we'll look at adding an LLM agent to our project to automatically generate scripts for our podcasts from small prompts.
