   - AI & Machine Learning
 languages:
   - python
-published_at: 2024-11-07
-updated_at: 2024-11-07
+featured:
+  image: /docs/images/guides/podcast-transcription/featured.png
+  image_alt: 'Podcast Transcription featured image'
+published_at: 2024-11-15
+updated_at: 2024-11-15
 ---

 # Transcribing Podcasts using OpenAI Whisper
@@ -286,7 +289,9 @@ RUN apt-get update -y && \
   add-apt-repository ppa:deadsnakes/ppa && \
   apt-get update -y && \
   apt-get install -y python3.11 && \
-  ln -sf /usr/bin/python3.11 /usr/local/bin/python3.11
+  ln -sf /usr/bin/python3.11 /usr/local/bin/python3.11 && \
+  ln -sf /usr/bin/python3.11 /usr/local/bin/python3 && \
+  ln -sf /usr/bin/python3.11 /usr/local/bin/python

 # !collapse(1:8) collapsed
 COPY --from=builder /app /app
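
The three `ln -sf` lines added above point the names `python3.11`, `python3`, and `python` at the same interpreter, so tooling that invokes any of those names resolves to Python 3.11. The pattern can be sketched outside the container with a stand-in file in a temp directory (hypothetical paths, not the real Python install):

```shell
# Stand-in demo of the Dockerfile's symlink pattern:
# three names resolving to the same file.
tmp=$(mktemp -d)
printf 'stand-in for python3.11\n' > "$tmp/python3.11"
ln -sf "$tmp/python3.11" "$tmp/python3"
ln -sf "$tmp/python3.11" "$tmp/python"
# Reading through either link yields the target's contents.
cat "$tmp/python3" "$tmp/python"
```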
@@ -346,12 +351,40 @@ preview:
   - batch-services
 ```

+### Testing the project
+
+Before deploying our project, we can test that it works as expected locally. You can do this with `nitric start`, or with `nitric run` if you'd prefer to run the program in containers. Either way, you can test the transcription by first uploading an audio file to the podcast bucket.
+
+<Note>
+  You can find most podcasts available for free download by searching on
+  [Podbay](https://podbay.fm/).
+</Note>
+
+You can upload the podcast directly to the bucket using the [local dashboard](/get-started/foundations/projects/local-development#local-dashboard), or via the API. If you want to use the API, start by getting the upload URL for the bucket.
+
+```bash
+curl http://localhost:4002/podcast/serial
+http://localhost:55736/write/eyJhbGciOi...
+```
+
+We'll then use this URL to upload the audio file with a PUT request. In this example, the podcast is stored as `serial.mp3`.
+
+```bash
+curl -X PUT --data-binary @"serial.mp3" http://localhost:55736/write/eyJhbGciOi...
+```
+
+Once that's done, the batch job will be triggered, so you can sit back and watch the transcription logs. When it finishes, you can download the transcript from the bucket with the following cURL request.
+
+```bash
+curl -sL http://localhost:4002/transcript/serial
+```
+
 ### Requesting a G instance quota increase

 Most AWS accounts **will not** have access to on-demand GPU instances (G
 Instances). If you'd like to run models using a GPU, you'll need to request a quota increase for G instances.

-If you prefer not to use a GPU you can set `gpus=0` in the `@transcribe_podcast` decorator in `batches/transcribe.py`.
+If you prefer not to use a GPU, you can set `gpus=0` in the `@transcribe_podcast` decorator in `batches/transcribe.py`. The model runs reasonably well on CPU, so a GPU isn't strictly necessary.
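+
+For reference, the CPU-only change is a single decorator argument. This is an illustrative fragment only, not the full handler from `batches/transcribe.py` (the function body and any other decorator parameters follow the guide):
+
+```python
+# Illustrative fragment: `transcribe_podcast` is the job defined
+# earlier in the guide; gpus=0 requests a CPU-only batch job.
+@transcribe_podcast(gpus=0)
+async def transcribe(ctx):
+    ...
+```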

 <Note>
   **Important:** If the gpus value in `batches/transcribe.py` exceeds the number
 | 