Commit 43128f4: add genai guide (#19262)

Signed-off-by: Craig Osterhout <[email protected]>
Co-authored-by: Stephanie Aurelio <[email protected]>
1 parent 51b3996

7 files changed: +253 −6 lines
New guide file: 243 additions, 0 deletions
---
description: Explore a generative AI video analysis app that uses Docker, OpenAI, and Pinecone.
keywords: python, generative ai, genai, llm, whisper, pinecone, openai
title: GenAI video transcription and chat
---

## Overview

This guide presents a project on video transcription and analysis using a set of
technologies related to the
[GenAI Stack](https://www.docker.com/blog/introducing-a-new-genai-stack/).

The project showcases the following technologies:

- [Docker and Docker Compose](#docker-and-docker-compose)
- [OpenAI](#openai-api)
- [Whisper](#whisper)
- [Embeddings](#embeddings)
- [Chat completions](#chat-completions)
- [Pinecone](#pinecone)
- [Retrieval-Augmented Generation](#retrieval-augmented-generation)

> **Acknowledgment**
>
> This guide is a community contribution. Docker would like to thank
> [David Cardozo](https://www.davidcardozo.com/) for his contribution
> to this guide.
## Prerequisites

- You have an [OpenAI API Key](https://platform.openai.com/api-keys).
- You have a [Pinecone API Key](https://app.pinecone.io/).
- You have installed the latest version of [Docker Desktop](../../../get-docker.md). Docker adds new features regularly and some parts of this guide may work only with the latest version of Docker Desktop.
- You have a [Git client](https://git-scm.com/downloads). The examples in this section use a command-line based Git client, but you can use any client.
## About the application

The application is a chatbot that can answer questions from a video. In
addition, it provides timestamps from the video that can help you find the
sources used to answer your question.
## Get and run the application

1. Clone the sample application's repository. In a terminal, run the following
   command.

   ```console
   $ git clone https://github.com/Davidnet/docker-genai.git
   ```

   The project contains the following directories and files:

   ```text
   ├── docker-genai/
   │   ├── docker-bot/
   │   ├── yt-whisper/
   │   ├── .env.example
   │   ├── .gitignore
   │   ├── LICENSE
   │   ├── README.md
   │   └── docker-compose.yaml
   ```
2. Specify your API keys. In the `docker-genai` directory, create a text file
   called `.env` and specify your API keys inside. The following shows the
   contents of the `.env.example` file, which you can refer to as an example.

   ```text
   #----------------------------------------------------------------------------
   # OpenAI
   #----------------------------------------------------------------------------
   OPENAI_TOKEN=your-api-key # Replace your-api-key with your personal API key

   #----------------------------------------------------------------------------
   # Pinecone
   #----------------------------------------------------------------------------
   PINECONE_TOKEN=your-api-key # Replace your-api-key with your personal API key
   ```
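   The services consume these values as environment variables at runtime. As a rough illustration only (a hypothetical helper, not code from the repository), a Python service could validate both tokens at startup so a missing key fails fast instead of surfacing as a rejected API call later:

   ```python
   import os


   def load_api_tokens() -> dict:
       """Read the API tokens supplied via the .env file.

       Raises a clear error early if a token is missing, rather than
       failing later on the first OpenAI or Pinecone call.
       """
       tokens = {}
       for name in ("OPENAI_TOKEN", "PINECONE_TOKEN"):
           value = os.environ.get(name)
           if not value:
               raise RuntimeError(f"Missing required environment variable: {name}")
           tokens[name] = value
       return tokens
   ```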
3. Build and run the application. In a terminal, change directory to your
   `docker-genai` directory and run the following command.

   ```console
   $ docker compose up --build
   ```

   Docker Compose builds and runs the application based on the services defined
   in the `docker-compose.yaml` file. When the application is running, you'll
   see the logs of two services in the terminal.

   In the logs, you'll see the services are exposed on ports `8503` and `8504`.
   The two services are complementary to each other.

   The `yt-whisper` service is running on port `8503`. This service feeds the
   Pinecone database with videos that you want to archive in your knowledge
   database. The following section explores this service.
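   The repository's actual `docker-compose.yaml` may differ, but a compose file wiring two such services to the `.env` file and ports `8503` and `8504` could look roughly like the following sketch (the service names and build paths are assumptions based on the directory listing above):

   ```yaml
   # Hypothetical sketch; see the repository for the actual file.
   services:
     yt-whisper:
       build: yt-whisper
       env_file: .env        # provides OPENAI_TOKEN and PINECONE_TOKEN
       ports:
         - "8503:8503"
     dockerbot:
       build: docker-bot
       env_file: .env
       ports:
         - "8504:8504"
   ```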
## Using the yt-whisper service

The yt-whisper service is a YouTube video processing service that uses the OpenAI
Whisper model to generate transcriptions of videos and stores them in a Pinecone
database. The following steps show how to use the service.

1. Open a browser and access the yt-whisper service at [http://localhost:8503](http://localhost:8503).

2. Once the application appears, in the **Youtube URL** field specify a YouTube
   video URL and select **Submit**. The following example uses
   [https://www.youtube.com/watch?v=yaQZFhrW0fU](https://www.youtube.com/watch?v=yaQZFhrW0fU).

   ![Submitting a video in the yt-whisper service](images/yt-whisper.webp)

   The yt-whisper service downloads the audio of the video, uses Whisper to
   transcribe it into a WebVTT (`*.vtt`) format (which you can download), then
   uses the text-embedding-3-small model to create embeddings, and finally
   uploads those embeddings into the Pinecone database.

   After processing the video, a video list appears in the web app that informs
   you which videos have been indexed in Pinecone. It also provides a button to
   download the transcript.

   ![A processed video in the yt-whisper service](images/yt-whisper-2.webp)

   You can now access the dockerbot service on port `8504` and ask questions
   about the videos.
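WebVTT cues carry timestamps such as `00:09:13.000`. As an illustration of how such a cue maps to the timestamped YouTube links the app produces (this is a hypothetical helper, not code taken from the project), converting a cue timestamp into the `&t=<seconds>s` form might look like:

```python
def vtt_timestamp_to_seconds(timestamp: str) -> int:
    """Convert a WebVTT cue timestamp (HH:MM:SS.mmm) to whole seconds.

    A value like "00:09:13.000" becomes 553, which is the form used in
    links such as https://www.youtube.com/watch?v=...&t=553s
    """
    hms, _, _millis = timestamp.partition(".")
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds
```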
## Using the dockerbot service

The dockerbot service is a question-answering service that leverages both the
Pinecone database and an AI model to provide responses. The following steps show
how to use the service.

> **Note**
>
> You must process at least one video via the
> [yt-whisper service](#using-the-yt-whisper-service) before using
> the dockerbot service.

1. Open a browser and access the service at
   [http://localhost:8504](http://localhost:8504).

2. In the **What do you want to know about your videos?** text box, ask the
   Dockerbot a question about a video that was processed by the yt-whisper
   service. The following example asks the question, "What is a sugar cookie?".
   The answer to that question exists in the video processed in the previous
   example,
   [https://www.youtube.com/watch?v=yaQZFhrW0fU](https://www.youtube.com/watch?v=yaQZFhrW0fU).

   ![Asking a question to the Dockerbot](images/bot.webp)

   In this example, the Dockerbot answers the question and
   provides links to the video with timestamps, which may contain more
   information about the answer.

   The dockerbot service takes the question, turns it into an embedding using
   the text-embedding-3-small model, queries the Pinecone database to find
   similar embeddings, and then passes that context to the gpt-4-turbo-preview
   model to generate an answer.

3. Select the first link to see what information it provides. Based on the
   previous example, select
   [https://www.youtube.com/watch?v=yaQZFhrW0fU&t=553s](https://www.youtube.com/watch?v=yaQZFhrW0fU&t=553s).

   In the example link, you can see that the section of video perfectly answers
   the question, "What is a sugar cookie?".
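The retrieve-then-generate flow described above can be sketched in Python. The following is an illustrative assembly of a chat-completions `messages` payload, not the project's actual code; the `(url, start_seconds, text)` snippet shape is an assumption made for the example:

```python
def build_chat_messages(question: str, snippets: list) -> list:
    """Assemble a chat-completions payload that grounds the answer in
    transcript snippets retrieved from the vector database.

    Each snippet is a (video_url, start_seconds, text) tuple, so the
    model can cite timestamped sources in its answer.
    """
    context = "\n".join(
        f"[{url}&t={start}s] {text}" for url, start, text in snippets
    )
    return [
        {
            "role": "system",
            "content": "Answer using only the provided video transcript "
                       "excerpts, and cite their timestamped links.\n\n" + context,
        },
        {"role": "user", "content": question},
    ]
```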
## Explore the application architecture

The following image shows the application's high-level service architecture, which includes:

- yt-whisper: A local service, run by Docker Compose, that interacts with the
  remote OpenAI and Pinecone services.
- dockerbot: A local service, run by Docker Compose, that interacts with the
  remote OpenAI and Pinecone services.
- OpenAI: A remote third-party service.
- Pinecone: A remote third-party service.

![Application architecture diagram](images/architecture.webp)
## Explore the technologies used and their role

### Docker and Docker Compose

The application uses Docker to run its services in containers, providing a
consistent and isolated environment. This means the application
will operate as intended within its Docker containers, regardless of the
underlying system differences. To learn more about Docker, see the [Getting started overview](../../get-started/_index.md).

Docker Compose is a tool for defining and running multi-container applications.
Compose makes it easy to run this application with a single command, `docker
compose up`. For more details, see the [Compose overview](../../../compose/_index.md).
### OpenAI API

The OpenAI API provides an LLM service that's known for its cutting-edge AI and
machine learning technologies. In this application, OpenAI's technology is used
to generate transcriptions from audio (using the Whisper model) and to create
embeddings for text data, as well as to generate responses to user queries
(using GPT and chat completions). For more details, see
[openai.com](https://openai.com/product).
### Whisper

Whisper is an automatic speech recognition system developed by OpenAI, designed
to transcribe spoken language into text. In this application, Whisper is used to
transcribe the audio from YouTube videos into text, enabling further processing
and analysis of the video content. For more details, see [Introducing Whisper](https://openai.com/research/whisper).
### Embeddings

Embeddings are numerical representations of text or other data types, which
capture their meaning in a way that can be processed by machine learning
algorithms. In this application, embeddings are used to convert video
transcriptions into a vector format that can be queried and analyzed for
relevance to user input, facilitating efficient search and response generation
in the application. For more details, see OpenAI's
[Embeddings](https://platform.openai.com/docs/guides/embeddings) documentation.

![Embedding diagram](images/embeddings.webp)
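Relevance between two embeddings is commonly measured with cosine similarity, which a vector database can compute server-side at scale. The following minimal sketch shows the idea on plain Python lists (an illustration of the metric, not the application's code):

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Similarity score for two embedding vectors: 1.0 means the vectors
    point the same way, 0.0 means they are unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```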
### Chat completions

Chat completion, as utilized in this application through OpenAI's API, refers to
the generation of conversational responses based on a given context or prompt.
In the application, it is used to provide intelligent, context-aware answers to
user queries by processing and integrating information from video transcriptions
and other inputs, enhancing the chatbot's interactive capabilities. For more
details, see OpenAI's
[Chat Completions API](https://platform.openai.com/docs/guides/text-generation) documentation.
### Pinecone

Pinecone is a vector database service optimized for similarity search, used for
building and deploying large-scale vector search applications. In this
application, Pinecone is employed to store and retrieve the embeddings of video
transcriptions, enabling efficient and relevant search functionality within the
application based on user queries. For more details, see
[pinecone.io](https://www.pinecone.io/).
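To make the store-and-retrieve pattern concrete without calling a real service, here is a toy in-memory stand-in (this is not the Pinecone client API). It assumes the stored vectors are unit-normalized, as OpenAI embeddings are, so a plain dot product ranks results by cosine similarity:

```python
class ToyVectorIndex:
    """In-memory stand-in illustrating the upsert/query pattern that a
    vector database such as Pinecone provides at scale, server-side."""

    def __init__(self):
        self._items = {}  # id -> (vector, metadata)

    def upsert(self, item_id, vector, metadata):
        """Insert or overwrite a vector and its metadata."""
        self._items[item_id] = (vector, metadata)

    def query(self, vector, top_k=1):
        """Return the top_k stored items most similar to `vector`,
        ranked by dot product (cosine similarity for unit vectors)."""
        def dot(a, b):
            return sum(x * y for x, y in zip(a, b))

        ranked = sorted(
            self._items.items(),
            key=lambda kv: dot(kv[1][0], vector),
            reverse=True,
        )
        return [{"id": i, "metadata": meta} for i, (_vec, meta) in ranked[:top_k]]
```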
### Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a technique that combines information
retrieval with a language model to generate responses based on retrieved
documents or data. In RAG, the system retrieves relevant information (in this
case, via embeddings from video transcriptions) and then uses a language model
to generate responses based on this retrieved data. For more details, see
OpenAI's cookbook for
[Retrieval Augmented Generative Question Answering with Pinecone](https://cookbook.openai.com/examples/vector_databases/pinecone/gen_qa).
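Putting the pieces together, the RAG loop can be sketched as a short function with injected stand-ins; `embed`, `index`, and `generate` are hypothetical callables used for illustration, not the project's actual interfaces:

```python
def answer_with_rag(question, embed, index, generate):
    """Minimal RAG loop: embed the question, retrieve the most similar
    stored snippets, and hand both to a generation step.

    Assumptions: `embed` maps text to a vector, `index.query` returns
    matches with a metadata["text"] field, and `generate` stands in for
    a chat-completion call.
    """
    matches = index.query(embed(question), top_k=3)
    context = "\n".join(m["metadata"]["text"] for m in matches)
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)
```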
## Next steps

Explore how to [create a PDF bot application](../genai-pdf-bot/_index.md) using
generative AI, or view more GenAI samples in the
[GenAI Stack](https://github.com/docker/genai-stack) repository.
data/toc.yaml

Lines changed: 10 additions & 6 deletions

```yaml
- sectiontitle: Generative AI
  section:
    - sectiontitle: PDF analysis and chat
      section:
        - path: /guides/use-case/genai-pdf-bot/
          title: Overview
        - path: /guides/use-case/genai-pdf-bot/containerize/
          title: Containerize your app
        - path: /guides/use-case/genai-pdf-bot/develop/
          title: Develop your app
    - path: /guides/use-case/genai-video-bot/
      title: Video transcription and chat
```
