This repository was archived by the owner on May 20, 2025. It is now read-only.

Commit 81c77c0

add featured image
1 parent: 4a6c589 · commit: 81c77c0

File tree

2 files changed: +37 additions, -4 deletions

docs/guides/python/podcast-transcription.mdx

Lines changed: 37 additions & 4 deletions
@@ -5,8 +5,11 @@ tags:
   - AI & Machine Learning
 languages:
   - python
-published_at: 2024-11-07
-updated_at: 2024-11-07
+featured:
+  image: /docs/images/guides/podcast-transcription/featured.png
+  image_alt: 'Podcast Transcription featured image'
+published_at: 2024-11-15
+updated_at: 2024-11-15
 ---

 # Transcribing Podcasts using OpenAI Whisper
@@ -286,7 +289,9 @@ RUN apt-get update -y && \
     add-apt-repository ppa:deadsnakes/ppa && \
     apt-get update -y && \
     apt-get install -y python3.11 && \
-    ln -sf /usr/bin/python3.11 /usr/local/bin/python3.11
+    ln -sf /usr/bin/python3.11 /usr/local/bin/python3.11 && \
+    ln -sf /usr/bin/python3.11 /usr/local/bin/python3 && \
+    ln -sf /usr/bin/python3.11 /usr/local/bin/python

 # !collapse(1:8) collapsed
 COPY --from=builder /app /app
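
The added symlinks make `python3` and `python` resolve to Python 3.11 in the runtime image, not just `python3.11`. As a quick sanity check after building, you could run something like the sketch below; the `python.dockerfile` filename and `transcribe-batch` tag are assumptions for illustration, not names taken from the guide, and `--entrypoint` is overridden so each alias can be queried directly.

```bash
# Build the runtime image from the custom dockerfile shown above
# (adjust -f to whatever dockerfile name your project actually uses).
docker build -f python.dockerfile -t transcribe-batch .

# Each alias should report the same Python 3.11.x interpreter.
docker run --rm --entrypoint python transcribe-batch --version
docker run --rm --entrypoint python3 transcribe-batch --version
docker run --rm --entrypoint python3.11 transcribe-batch --version
```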
@@ -346,12 +351,40 @@ preview:
   - batch-services
 ```

+### Testing the project
+
+Before deploying our project, we can test that it works as expected locally. You can do this using `nitric start`, or, if you'd prefer to run the program in containers, use `nitric run`. Either way, you can test the transcription by first uploading an audio file to the podcast bucket.
+
+<Note>
+  You can find most free podcasts for download by searching for them on
+  [Podbay](https://podbay.fm/).
+</Note>
+
+You can upload the podcast directly to the bucket using the [local dashboard](/get-started/foundations/projects/local-development#local-dashboard), or use the API instead. If you want to use the API, start by getting the upload URL for the bucket.
+
+```bash
+curl http://localhost:4002/podcast/serial
+http://localhost:55736/write/eyJhbGciOi...
+```
+
+We'll then use the URL to upload the audio as binary data. I've stored the podcast as `serial.mp3`.
+
+```bash
+curl -X PUT --data-binary @"serial.mp3" http://localhost:55736/write/eyJhbGciOi...
+```
+
+Once that's done, the batch job will be triggered, so you can sit back and watch the transcription logs. When it finishes, you can download the transcription from the bucket using the following cURL request.
+
+```bash
+curl -sL http://localhost:4002/transcript/serial
+```
+
 ### Requesting a G instance quota increase

 Most AWS accounts **will not** have access to on-demand GPU instances (G
 Instances). If you'd like to run models using a GPU, you'll need to request a quota increase for G instances.

-If you prefer not to use a GPU you can set `gpus=0` in the `@transcribe_podcast` decorator in `batches/transcribe.py`.
+If you prefer not to use a GPU, you can set `gpus=0` in the `@transcribe_podcast` decorator in `batches/transcribe.py`. The model runs reasonably well on CPU, so a GPU isn't strictly necessary.

 <Note>
 **Important:** If the gpus value in `batches/transcribe.py` exceeds the number
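
For reference, the two curl steps in the new testing section (fetch the signed upload URL, then PUT the audio file to it) can be chained in the shell so the URL never has to be copied by hand. This is only a sketch built on the commands above: it assumes the API returns the signed URL as a plain-text body, and it reuses the guide's `localhost:4002` port and `serial.mp3` filename; the signed write URL and its port will differ on every run.

```bash
# Fetch the signed write URL for the 'serial' entry in the podcast bucket,
# then upload the audio file to it in one step.
UPLOAD_URL=$(curl -s http://localhost:4002/podcast/serial)
curl -X PUT --data-binary @"serial.mp3" "$UPLOAD_URL"

# Once the batch job has finished, save the transcript locally.
curl -sL http://localhost:4002/transcript/serial -o serial.txt
```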
featured.png

Binary image file added (28.2 KB)
