
Commit 1e0392e
change project structure
1 parent b3b5f54

File changed: docs/guides/python/podcast-transcription.mdx (46 additions, 50 deletions)
````diff
@@ -19,6 +19,11 @@ languages:
 
 We'll start by creating a new project using Nitric's python starter template.
 
+<Note>
+If you want to take a look at the finished code, it can be found
+[here](https://github.com/nitrictech/examples/tree/main/v1/podcast-transcription).
+</Note>
+
 ```bash
 nitric new podcast-transcription py-starter
 cd podcast-transcription
````
````diff
@@ -42,32 +47,29 @@ uv add openai-whisper --optional ml
 We'll organize our project structure like so:
 
 ```text
-+--src/
-| +--__init__.py
-| +--resources.py
-| +--services/
-| +--__init__.py
-| +--main.py
-| +--jobs/
-| +--__init__.py
-| +--api.py
-+--nitric.yaml
-+--docker/
-| +-- transcribe.dockerfile
-| +-- transcribe.dockerignore
-| +-- python.dockerfile
-| +-- python.dockerignore
++--common/
+| +-- __init__.py
+| +-- resources.py
++--batches/
+| +-- transcribe.py
++--services/
+| +-- api.py
 +--.gitignore
 +--.python-version
-+--pyproject.toml
-+--README.md
++-- pyproject.toml
++-- python.dockerfile
++-- python.dockerignore
++-- nitric.yaml
++-- transcribe.dockerfile
++-- transcribe.dockerignore
++-- README.md
 ```
 
 ## Define our resources
 
 We'll start by creating a file to define our Nitric resources. For this project we'll need an API, Batch Job, and two buckets, one for the audio files to be transcribed and one for the resulting transcripts. The API will interface with the buckets, while the Batch Job will handle the transcription.
 
-```python title:src/resources.py
+```python title:common/resources.py
 from nitric.resources import job, bucket, api
 
 main_api = api("main")
````
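For orientation: the diff only shows the first lines of the new `common/resources.py`, but combined with the imports shown in `services/api.py` and the `transcript_bucket = bucket("transcripts")` context visible in the next hunk, the completed file plausibly looks like the sketch below. The `"podcasts"` and `"transcribe"` resource name strings are assumptions; the variable names come from the diff.

```python
# Sketch of common/resources.py after this restructure.
# Variable names match the imports in services/api.py and
# batches/transcribe.py; the "podcasts" and "transcribe" name
# strings are assumptions.
from nitric.resources import job, bucket, api

main_api = api("main")
transcribe_job = job("transcribe")
podcast_bucket = bucket("podcasts")
transcript_bucket = bucket("transcripts")
```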
````diff
@@ -82,8 +84,8 @@ transcript_bucket = bucket("transcripts")
 
 Now that we have defined resources, we can import our API and add some routes to access the buckets. Start by importing the resources and adding permissions to the resources.
 
-```python title:src/services/api.py
-from src.resources import main_api, transcript_bucket, podcast_bucket, transcribe_job
+```python title:services/api.py
+from common.resources import main_api, transcript_bucket, podcast_bucket, transcribe_job
 from nitric.application import Nitric
 from nitric.resources import BucketNotificationContext
 from nitric.context import HttpContext
````
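The permission setup itself is collapsed out of the diff. A minimal sketch of how `services/api.py` plausibly wires it up, assuming Nitric's `allow` API; the exact permission sets chosen here are assumptions:

```python
from common.resources import main_api, transcript_bucket, podcast_bucket, transcribe_job
from nitric.application import Nitric
from nitric.context import HttpContext

# Assumed permission sets: the API reads and writes both buckets
# and submits transcription jobs.
transcripts = transcript_bucket.allow("read", "write")
podcasts = podcast_bucket.allow("read", "write")
transcribable = transcribe_job.allow("submit")

Nitric.run()
```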
````diff
@@ -96,9 +98,9 @@ Nitric.run()
 
 We'll then write a route for getting a file from the transcription bucket. These will get a signed download url and redirect the user to this url for downloading the text content.
 
-```python title:src/services/api.py
+```python title:services/api.py
 # !collapse(1:7) collapsed
-from src.resources import main_api, transcript_bucket, podcast_bucket, transcribe_job
+from common.resources import main_api, transcript_bucket, podcast_bucket, transcribe_job
 from nitric.application import Nitric
 from nitric.resources import BucketNotificationContext
 from nitric.context import HttpContext
````
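The route body is collapsed in the diff; as a sketch, a signed-URL redirect with the Nitric Python SDK might look like the following. The route path, transcript file naming, and redirect status code are assumptions:

```python
from common.resources import main_api, transcript_bucket
from nitric.application import Nitric
from nitric.context import HttpContext

transcripts = transcript_bucket.allow("read")

@main_api.get("/transcripts/:name")
async def get_transcript(ctx: HttpContext):
    name = ctx.req.params["name"]
    # Create a time-limited signed URL for the transcript file...
    url = await transcripts.file(f"{name}.txt").download_url()
    # ...and redirect the caller to it to download the text content.
    ctx.res.status = 307
    ctx.res.headers["Location"] = url

Nitric.run()
```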
````diff
@@ -122,9 +124,9 @@ Nitric.run()
 
 We will add a storage listener which will be triggered by files being added to the `podcast_bucket`.
 
-```python
+```python title:services/api.py
 # !collapse(1:18) collapsed
-from src.resources import main_api, transcript_bucket, podcast_bucket, transcribe_job
+from common.resources import main_api, transcript_bucket, podcast_bucket, transcribe_job
 from nitric.application import Nitric
 from nitric.resources import BucketNotificationContext
 from nitric.context import HttpContext
````
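The listener body is also collapsed; a sketch of a bucket notification that submits the batch job, assuming a `write` trigger on all keys and a `podcast_name` payload key:

```python
from common.resources import podcast_bucket, transcribe_job
from nitric.application import Nitric
from nitric.resources import BucketNotificationContext

transcribable = transcribe_job.allow("submit")

@podcast_bucket.on("write", "*")
async def on_podcast_added(ctx: BucketNotificationContext):
    # Hand the uploaded file's key to the batch job for transcription.
    # The "podcast_name" payload key is an assumption.
    await transcribable.submit({"podcast_name": ctx.req.key})

Nitric.run()
```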
````diff
@@ -156,9 +158,9 @@ Nitric.run()
 
 Start by adding our imports and adding permissions to the resources we defined earlier.
 
-```python title:src/job/transcribe.py
+```python title:batches/transcribe.py
 import whisper
-from src.resources import transcribe_job, transcript_bucket, podcast_bucket
+from common.resources import transcribe_job, transcript_bucket, podcast_bucket
 from nitric.context import JobContext
 from nitric.application import Nitric
 
````
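The permission lines are again collapsed; the top of `batches/transcribe.py` plausibly splits access so the job reads raw audio and writes transcripts. This split is an assumption:

```python
import whisper
from common.resources import transcribe_job, transcript_bucket, podcast_bucket
from nitric.context import JobContext
from nitric.application import Nitric

# Assumed least-privilege split for the batch job.
podcasts = podcast_bucket.allow("read")
transcripts = transcript_bucket.allow("write")

Nitric.run()
```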
````diff
@@ -178,10 +180,10 @@ We'll then create our Job and set the required memory to `12000`. This is a safe
 | medium | 769 M | medium.en | medium | `~5 GB` | `~2x` |
 | large | 1550 M | N/A | large | `~10 GB` | `1x` |
 
-```python title:src/job/transcribe.py
+```python title:batches/transcribe.py
 # !collapse(1:7) collapsed
 import whisper
-from src.resources import transcribe_job, transcript_bucket, podcast_bucket
+from common.resources import transcribe_job, transcript_bucket, podcast_bucket
 from nitric.context import JobContext
 from nitric.application import Nitric
 
````
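The job definition itself is collapsed; given the `12000` memory figure in the hunk context above, the registration plausibly uses Nitric's batch decorator like so. The `cpus`/`gpus` values and the payload key are assumptions:

```python
from common.resources import transcribe_job
from nitric.context import JobContext

@transcribe_job(cpus=1, memory=12000, gpus=0)  # cpus/gpus values assumed
async def transcribe_podcast(ctx: JobContext):
    # The "podcast_name" payload key is assumed from the API sketch above.
    podcast_name = ctx.req.data["podcast_name"]
    print(f"Transcribing {podcast_name}")
```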
````diff
@@ -197,10 +199,10 @@ Nitric.run()
 
 We'll then read the audio file that is referenced in the `JobContext` data that was sent with the submit request. We'll write the podcast to a local file so that the model can read from it.
 
-```python title:src/job/transcribe.py
+```python title:batches/transcribe.py
 # !collapse(1:7) collapsed
 import whisper
-from src.resources import transcribe_job, transcript_bucket, podcast_bucket
+from common.resources import transcribe_job, transcript_bucket, podcast_bucket
 from nitric.context import JobContext
 from nitric.application import Nitric
 
````
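A sketch of that step, extending the handler from the previous sketch (the `podcasts` reference and `podcast_name` key carry over; the `/tmp` staging path is an assumption):

```python
@transcribe_job(cpus=1, memory=12000, gpus=0)
async def transcribe_podcast(ctx: JobContext):
    podcast_name = ctx.req.data["podcast_name"]  # assumed payload key

    # Pull the audio bytes out of the podcast bucket...
    audio_bytes = await podcasts.file(podcast_name).read()

    # ...and stage them on local disk so whisper can open the file by path.
    local_path = f"/tmp/{podcast_name}"
    with open(local_path, "wb") as f:
        f.write(audio_bytes)
```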
````diff
@@ -224,10 +226,10 @@ Nitric.run()
 
 We'll then load our model and transcribe the audio. This is where we can choose the model based on balancing speed, size, and accuracy. We can turn off `FP16` with `fp16=False` which will use `FP32` instead. This will depend on what is supported on your CPU when testing locally, however, `FP16` and `FP32` are supported on Lambda.
 
-```python title:src/job/transcribe.py
+```python title:batches/transcribe.py
 # !collapse(1:7) collapsed
 import whisper
-from src.resources import transcribe_job, transcript_bucket, podcast_bucket
+from common.resources import transcribe_job, transcript_bucket, podcast_bucket
 from nitric.context import JobContext
 from nitric.application import Nitric
 
````
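The model call itself is plain `openai-whisper` usage; a sketch continuing from the staged file, with the model choice being the size/speed trade-off from the table above:

```python
import whisper

# Load a model sized to the memory/speed trade-off in the table above.
model = whisper.load_model("base")

# fp16=False falls back to FP32, which avoids the half-precision warning
# on CPUs that don't support FP16 when testing locally.
result = model.transcribe(local_path, fp16=False)

print(result["text"])
```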
````diff
@@ -255,10 +257,10 @@ Nitric.run()
 
 Finally, we'll take the outputted transcript and write that to the transcript bucket. This transcript is stored in `result["text"]`.
 
-```python title:src/job/transcribe.py
+```python title:batches/transcribe.py
 # !collapse(1:7) collapsed
 import whisper
-from src.resources import transcribe_job, transcript_bucket, podcast_bucket
+from common.resources import transcribe_job, transcript_bucket, podcast_bucket
 from nitric.context import JobContext
 from nitric.application import Nitric
 
````
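And the final step of the handler sketch, writing `result["text"]` into the transcript bucket via the `transcripts` reference from earlier; the output file naming is an assumption:

```python
# Persist the transcript under a name derived from the podcast (naming assumed).
transcript = result["text"]
await transcripts.file(f"{podcast_name}-transcript.txt").write(transcript.encode())
```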
````diff
@@ -293,7 +295,7 @@ Nitric.run()
 
 With our code complete, we can write a dockerfile that our batch job will run in. Start with the base image that copies our application code and resolves the dependencies using `uv`.
 
-```docker title:docker/transcribe.dockerfile
+```docker title:transcribe.dockerfile
 # The python version must match the version in .python-version
 FROM ghcr.io/astral-sh/uv:python3.11-bookworm-slim AS builder
 
````
````diff
@@ -313,7 +315,7 @@ RUN --mount=type=cache,target=/root/.cache/uv \
 
 The next stage is to build upon our base with another image with Nvidia drivers. We'll set some environment variables to enable GPU use and download Python 3.11 with apt.
 
-```docker title:docker/transcribe.dockerfile
+```docker title:transcribe.dockerfile
 # !collapse(1:14) collapsed
 FROM ghcr.io/astral-sh/uv:python3.11-bookworm-slim AS builder
 
````
````diff
@@ -350,7 +352,7 @@ RUN apt-get update -y && \
 
 Finally, we'll get our application from the base image and run our application.
 
-```docker title:docker/transcribe.dockerfile
+```docker title:transcribe.dockerfile
 # !collapse(1:31) collapsed
 FROM ghcr.io/astral-sh/uv:python3.11-bookworm-slim AS builder
 
````
````diff
@@ -397,7 +399,7 @@ ENTRYPOINT python -u $HANDLER
 
 We'll add a `dockerignore` to help reduce the size of the Docker Image that is being deployed.
 
-```text title:docker/transcribe.dockerignore
+```text title:transcribe.dockerignore
 .mypy_cache/
 .nitric/
 .venv/
````
````diff
@@ -407,29 +409,23 @@ README.md
 
 Finally, we can update the project file to point our batch job to our new dockerfile.
 
-```yaml
+```yaml title:nitric.yaml
 name: podcast-transcription
 services:
-  - basedir: ''
-    match: src/services/api.py
+  - match: services/api.py
     runtime: python
     start: uv run watchmedo auto-restart -p *.py --no-restart-on-command-exit -R python -- -u $SERVICE_PATH
 
 batch-services:
-  - basedir: ''
-    match: src/jobs/transcribe.py
+  - match: batches/transcribe.py
     runtime: transcribe
     start: uv run watchmedo auto-restart -p *.py --no-restart-on-command-exit -R python -- -u $SERVICE_PATH
 
 runtimes:
   python:
-    dockerfile: ./docker/python.dockerfile
-    context: ''
-    args: {}
+    dockerfile: python.dockerfile
   transcribe:
-    dockerfile: ./docker/transcribe.dockerfile
-    context: ''
-    args: {}
+    dockerfile: transcribe.dockerfile
 
 preview:
   - batch-services
````
````diff
@@ -494,4 +490,4 @@ You can destroy the project once it is finished using `nitric down`.
 
 In this guide, we've created a podcast transcription service using OpenAI Whisper and Nitric's Python SDK. We showed how to use batch jobs to run long-running workloads and connect these jobs to buckets to store generated transcripts. We also demonstrated how to expose buckets using simple CRUD routes on a cloud API. Finally, we were able to create dockerfiles with GPU support to optimize the generation speeds on the cloud.
 
-For more information and advanced usage, refer to the [Nitric documentation](/docs).
+For more information and advanced usage, refer to the [Nitric documentation](/).
````
