---
description: Explore a generative AI video analysis app that uses Docker, OpenAI, and Pinecone.
keywords: python, generative ai, genai, llm, whisper, pinecone, openai
title: GenAI video transcription and chat
---

## Overview

This guide presents a project on video transcription and analysis using a set of
technologies related to the
[GenAI Stack](https://www.docker.com/blog/introducing-a-new-genai-stack/).

The project showcases the following technologies:
- [Docker and Docker Compose](#docker-and-docker-compose)
- [OpenAI](#openai-api)
- [Whisper](#whisper)
- [Embeddings](#embeddings)
- [Chat completions](#chat-completions)
- [Pinecone](#pinecone)
- [Retrieval-Augmented Generation](#retrieval-augmented-generation)

> **Acknowledgment**
>
> This guide is a community contribution. Docker would like to thank
> [David Cardozo](https://www.davidcardozo.com/) for his contribution
> to this guide.

## Prerequisites

- You have an [OpenAI API Key](https://platform.openai.com/api-keys).
- You have a [Pinecone API Key](https://app.pinecone.io/).
- You have installed the latest version of [Docker Desktop](../../../get-docker.md). Docker adds new features regularly and some parts of this guide may work only with the latest version of Docker Desktop.
- You have a [Git client](https://git-scm.com/downloads). The examples in this section use a command-line based Git client, but you can use any client.

## About the application

The application is a chatbot that can answer questions about a video. In
addition, it provides timestamps from the video that can help you find the
sources used to answer your question.

## Get and run the application

1. Clone the sample application's repository. In a terminal, run the following
   command.
   ```console
   $ git clone https://github.com/Davidnet/docker-genai.git
   ```
   The project contains the following directories and files:
   ```text
   ├── docker-genai/
   │   ├── docker-bot/
   │   ├── yt-whisper/
   │   ├── .env.example
   │   ├── .gitignore
   │   ├── LICENSE
   │   ├── README.md
   │   └── docker-compose.yaml
   ```

2. Specify your API keys. In the `docker-genai` directory, create a text file
   called `.env` and specify your API keys inside. The following is the content
   of the `.env.example` file, which you can refer to as an example.

   ```text
   #----------------------------------------------------------------------------
   # OpenAI
   #----------------------------------------------------------------------------
   OPENAI_TOKEN=your-api-key # Replace your-api-key with your personal API key

   #----------------------------------------------------------------------------
   # Pinecone
   #----------------------------------------------------------------------------
   PINECONE_TOKEN=your-api-key # Replace your-api-key with your personal API key
   ```

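   Docker Compose reads variables from a `.env` file in the project directory
   and can pass them to the services. As a minimal sketch, assuming the
   services receive these values as environment variables, a service written
   in Python could read them like this:

   ```python
   import os

   # Fails fast with a KeyError if a key was not provided to the container.
   openai_token = os.environ["OPENAI_TOKEN"]
   pinecone_token = os.environ["PINECONE_TOKEN"]
   ```
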
3. Build and run the application. In a terminal, change directory to your
   `docker-genai` directory and run the following command.
   ```console
   $ docker compose up --build
   ```
   Docker Compose builds and runs the application based on the services defined
   in the `docker-compose.yaml` file. When the application is running, you'll
   see the logs of two services in the terminal.

   In the logs, you'll see the services are exposed on ports `8503` and `8504`.
   The two services are complementary to each other.

   The `yt-whisper` service is running on port `8503`. This service feeds the
   Pinecone database with videos that you want to archive in your knowledge
   database. The following section explores this service.

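   For reference, the following is a minimal sketch of what a Compose file for
   these two services might look like. The build contexts and port mappings
   here are illustrative assumptions; the repository's `docker-compose.yaml`
   is the source of truth.

   ```yaml
   # Illustrative sketch only, not the actual file from the repository.
   services:
     yt-whisper:
       build: ./yt-whisper
       env_file: .env        # provides OPENAI_TOKEN and PINECONE_TOKEN
       ports:
         - "8503:8503"
     dockerbot:
       build: ./docker-bot
       env_file: .env
       ports:
         - "8504:8504"
   ```
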
## Using the yt-whisper service

The yt-whisper service is a YouTube video processing service that uses the OpenAI
Whisper model to generate transcriptions of videos and stores them in a Pinecone
database. The following steps show how to use the service.

1. Open a browser and access the yt-whisper service at [http://localhost:8503](http://localhost:8503).
2. Once the application appears, in the **Youtube URL** field specify a YouTube
   video URL and select **Submit**. The following example uses
   [https://www.youtube.com/watch?v=yaQZFhrW0fU](https://www.youtube.com/watch?v=yaQZFhrW0fU).

   ![yt-whisper](images/yt-whisper.webp?w=500&border=true)

   The yt-whisper service downloads the audio of the video, uses Whisper to
   transcribe it into a WebVTT (`*.vtt`) format (which you can download), then
   uses the text-embedding-3-small model to create embeddings, and finally
   uploads those embeddings into the Pinecone database. A condensed sketch of
   this pipeline appears after these steps.

   After processing the video, a video list appears in the web app that informs
   you which videos have been indexed in Pinecone. It also provides a button to
   download the transcript.

   ![yt-whisper-demo](images/yt-whisper-demo.webp?w=500&border=true)

   You can now access the dockerbot service on port `8504` and ask questions
   about the videos.

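The following is a minimal sketch of that pipeline in Python, assuming the
audio has already been downloaded and that a Pinecone index exists. The index
name, vector IDs, and metadata fields are illustrative assumptions, not taken
from the repository.

```python
import os

from openai import OpenAI
from pinecone import Pinecone

client = OpenAI(api_key=os.environ["OPENAI_TOKEN"])
pc = Pinecone(api_key=os.environ["PINECONE_TOKEN"])
index = pc.Index("video-embeddings")  # hypothetical index name

# 1. Transcribe the downloaded audio into WebVTT with Whisper.
with open("audio.mp3", "rb") as audio:
    vtt = client.audio.transcriptions.create(
        model="whisper-1", file=audio, response_format="vtt"
    )

# 2. Embed each WebVTT cue and upsert it with its text as metadata,
#    so the dockerbot service can retrieve it later.
for i, cue in enumerate(vtt.split("\n\n")):
    if not cue.strip():
        continue
    embedding = client.embeddings.create(
        model="text-embedding-3-small", input=cue
    ).data[0].embedding
    index.upsert(vectors=[{
        "id": f"yaQZFhrW0fU-{i}",  # video ID plus cue number
        "values": embedding,
        "metadata": {"text": cue},
    }])
```
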
## Using the dockerbot service

The dockerbot service is a question-answering service that leverages both the
Pinecone database and an AI model to provide responses. The following steps show
how to use the service.

> **Note**
>
> You must process at least one video via the
> [yt-whisper service](#using-the-yt-whisper-service) before using
> the dockerbot service.

1. Open a browser and access the service at
   [http://localhost:8504](http://localhost:8504).

2. In the **What do you want to know about your videos?** text box, ask the
   Dockerbot a question about a video that was processed by the yt-whisper
   service. The following example asks the question, "What is a sugar cookie?".
   The answer to that question exists in the video processed in the previous
   example,
   [https://www.youtube.com/watch?v=yaQZFhrW0fU](https://www.youtube.com/watch?v=yaQZFhrW0fU).

   ![Dockerbot](images/dockerbot.webp?w=500&border=true)

   In this example, the Dockerbot answers the question and
   provides links to the video with timestamps, which may contain more
   information about the answer.

   The dockerbot service takes the question, turns it into an embedding using
   the text-embedding-3-small model, queries the Pinecone database to find
   similar embeddings, and then passes that context into the gpt-4-turbo-preview
   model to generate an answer. A sketch of this flow appears after these steps.

3. Select the first link to see what information it provides. Based on the
   previous example, select
   [https://www.youtube.com/watch?v=yaQZFhrW0fU&t=553s](https://www.youtube.com/watch?v=yaQZFhrW0fU&t=553s).

   In the example link, you can see that the section of video perfectly answers
   the question, "What is a sugar cookie?".

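The following is a minimal sketch of that retrieval-and-answer flow in Python.
The index name and prompt wording are illustrative assumptions; the actual
service may structure its prompt differently.

```python
import os

from openai import OpenAI
from pinecone import Pinecone

client = OpenAI(api_key=os.environ["OPENAI_TOKEN"])
pc = Pinecone(api_key=os.environ["PINECONE_TOKEN"])
index = pc.Index("video-embeddings")  # hypothetical index name

question = "What is a sugar cookie?"

# 1. Turn the question into an embedding.
query_embedding = client.embeddings.create(
    model="text-embedding-3-small", input=question
).data[0].embedding

# 2. Retrieve the most similar transcript cues from Pinecone.
results = index.query(vector=query_embedding, top_k=3, include_metadata=True)
context = "\n\n".join(match.metadata["text"] for match in results.matches)

# 3. Let the chat model answer using the retrieved cues as context.
response = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[
        {"role": "system",
         "content": f"Answer using only this transcript context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```
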
## Explore the application architecture

The following image shows the application's high-level service architecture, which includes:
- yt-whisper: A local service, run by Docker Compose, that interacts with the
  remote OpenAI and Pinecone services.
- dockerbot: A local service, run by Docker Compose, that interacts with the
  remote OpenAI and Pinecone services.
- OpenAI: A remote third-party service.
- Pinecone: A remote third-party service.

![App architecture](images/video-transcription-architecture.webp)

## Explore the technologies used and their role

### Docker and Docker Compose

The application runs in Docker containers, which provide a consistent and
isolated environment. This means the application will operate as intended
within its containers, regardless of differences in the underlying system. To
learn more about Docker, see the [Getting started overview](../../get-started/_index.md).

Docker Compose is a tool for defining and running multi-container applications.
Compose makes it easy to run this application with a single command, `docker
compose up`. For more details, see the [Compose overview](../../../compose/_index.md).

### OpenAI API

The OpenAI API provides an LLM service that's known for its cutting-edge AI and
machine learning technologies. In this application, OpenAI's technology is used
to generate transcriptions from audio (using the Whisper model) and to create
embeddings for text data, as well as to generate responses to user queries
(using GPT and chat completions). For more details, see
[openai.com](https://openai.com/product).

### Whisper

Whisper is an automatic speech recognition system developed by OpenAI, designed
to transcribe spoken language into text. In this application, Whisper is used to
transcribe the audio from YouTube videos into text, enabling further processing
and analysis of the video content. For more details, see [Introducing Whisper](https://openai.com/research/whisper).

### Embeddings

Embeddings are numerical representations of text or other data types, which
capture their meaning in a way that can be processed by machine learning
algorithms. In this application, embeddings are used to convert video
transcriptions into a vector format that can be queried and analyzed for
relevance to user input, facilitating efficient search and response generation
in the application. For more details, see OpenAI's
[Embeddings](https://platform.openai.com/docs/guides/embeddings) documentation.

![Embeddings](images/embeddings.webp?w=500&border=true)

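As a minimal sketch of the idea, the snippet below embeds two pieces of text
with the same text-embedding-3-small model the application uses and compares
them with a hand-rolled cosine similarity. The example sentences are
illustrative.

```python
import math
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_TOKEN"])

def embed(text: str) -> list[float]:
    """Return the embedding vector for a piece of text."""
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: higher means more semantically similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

question = embed("How do I bake sugar cookies?")
print(cosine(question, embed("Cream the butter and sugar, then add flour.")))
print(cosine(question, embed("Docker Compose defines multi-container apps.")))
```
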
### Chat completions

Chat completion, as utilized in this application through OpenAI's API, refers to
the generation of conversational responses based on a given context or prompt.
In the application, it is used to provide intelligent, context-aware answers to
user queries by processing and integrating information from video transcriptions
and other inputs, enhancing the chatbot's interactive capabilities. For more
details, see OpenAI's
[Chat Completions API](https://platform.openai.com/docs/guides/text-generation) documentation.
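
The shape of a chat completion request is a list of role-tagged messages. As a
minimal sketch, independent of this application's actual prompts:

```python
import os

from openai import OpenAI

client = OpenAI(api_key=os.environ["OPENAI_TOKEN"])

response = client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[
        # The system message sets the assistant's behavior;
        # the user message carries the actual question.
        {"role": "system", "content": "You answer questions about videos."},
        {"role": "user", "content": "What is a sugar cookie?"},
    ],
)
print(response.choices[0].message.content)
```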

### Pinecone

Pinecone is a vector database service optimized for similarity search, used for
building and deploying large-scale vector search applications. In this
application, Pinecone is employed to store and retrieve the embeddings of video
transcriptions, enabling efficient and relevant search functionality within the
application based on user queries. For more details, see
[pinecone.io](https://www.pinecone.io/).
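
As a minimal sketch, creating an index sized for text-embedding-3-small
vectors (1,536 dimensions) might look like the following. The index name and
serverless settings are illustrative assumptions.

```python
import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_TOKEN"])

if "video-embeddings" not in pc.list_indexes().names():
    pc.create_index(
        name="video-embeddings",   # hypothetical index name
        dimension=1536,            # matches text-embedding-3-small output
        metric="cosine",           # similarity measure used at query time
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
```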

### Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a technique that combines information
retrieval with a language model to generate responses based on retrieved
documents or data. In RAG, the system retrieves relevant information (in this
case, via embeddings from video transcriptions) and then uses a language model
to generate responses based on this retrieved data. For more details, see
OpenAI's cookbook for
[Retrieval Augmented Generative Question Answering with Pinecone](https://cookbook.openai.com/examples/vector_databases/pinecone/gen_qa).

## Next steps

Explore how to [create a PDF bot application](../genai-pdf-bot/_index.md) using
generative AI, or view more GenAI samples in the
[GenAI Stack](https://github.com/docker/genai-stack) repository.