Commit e66f080

Fix YouTube transcript capture
1 parent 7004641 commit e66f080

File tree

5 files changed

+222
-493
lines changed


docs/docs.json

Lines changed: 5 additions & 1 deletion

```diff
@@ -273,14 +273,14 @@
       "integrations/data/pandas_and_pyarrow",
       "integrations/data/polars_arrow",
       "integrations/data/dlt",
-      "integrations/data/phidata",
       "integrations/data/voxel51"
     ]
   },
   {
     "group": "AI Platforms & Frameworks",
     "pages": [
       "integrations/ai/huggingface",
+      "integrations/ai/agno",
       "integrations/ai/langchain",
       "integrations/ai/llamaIndex",
       "integrations/ai/genkit",
@@ -391,6 +391,10 @@
     "source": "/integrations/frameworks/:slug*",
     "destination": "integrations/ai/:slug*"
   },
+  {
+    "source": "/integrations/data/phidata",
+    "destination": "integrations/ai/agno"
+  },
   {
     "source": "/tutorials/rag/:slug*",
     "destination": "tutorials/agents/:slug*"
```

docs/integrations/ai/agno.mdx

Lines changed: 151 additions & 0 deletions

---
title: "Agno"
sidebarTitle: "Agno"
description: "Build a search assistant using the Agno agent framework with LanceDB as the knowledge backend."
---

import {
  PyFrameworksAgnoAgent,
  PyFrameworksAgnoCliChat,
  PyFrameworksAgnoIngestYoutube,
  PyFrameworksAgnoSetup,
} from '/snippets/integrations.mdx';

[Agno](https://docs.agno.com/introduction) is a framework for building agentic AI applications.
It supports LanceDB as a knowledge backend, letting you ingest and retrieve external content for your agents with minimal setup.

Pairing Agno's `Knowledge` system with LanceDB gives you a clean agentic RAG setup.
The steps below build a YouTube transcript-aware Agno assistant that can:

- Ingest a transcript from a YouTube video via the YouTube API
- Store embeddings and metadata in LanceDB
- Retrieve context during responses with hybrid search
- Answer questions about the video content in a CLI chat loop

## Prerequisites

Install the dependencies:

<CodeGroup>
```bash pip icon="terminal"
pip install -U agno openai lancedb youtube-transcript-api beautifulsoup4
```

```bash uv icon="terminal"
uv add agno openai lancedb youtube-transcript-api beautifulsoup4
```
</CodeGroup>

## Step 1: Configure LanceDB-backed knowledge

First, initialize the core `Knowledge` object that your agent will use for retrieval.
It configures LanceDB as the vector store, enables hybrid search with native LanceDB full-text search (FTS), and sets the embedding model.

<CodeBlock filename="Python" language="Python" icon="python">
{PyFrameworksAgnoSetup}
</CodeBlock>
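Hybrid search combines a vector-similarity ranking with a keyword (FTS) ranking. LanceDB performs this fusion natively, so you never write it yourself; purely to illustrate the idea, here is a reciprocal rank fusion (RRF) sketch over two made-up result lists (the function name and chunk IDs are invented for this example):

```python
# Illustration only: reciprocal rank fusion (RRF), one common way to merge
# a vector-search ranking with a keyword (FTS) ranking. LanceDB's hybrid
# search does its own fusion internally; this just shows the principle.

def rrf_merge(vector_ranked, keyword_ranked, k=60):
    """Merge two ranked lists of doc IDs; documents that rank well in
    either list accumulate score, and smaller ranks contribute more."""
    scores = {}
    for ranking in (vector_ranked, keyword_ranked):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings for the query "what kinds of data can LanceDB handle?"
vector_hits = ["chunk_7", "chunk_2", "chunk_9"]   # semantic matches
keyword_hits = ["chunk_2", "chunk_4", "chunk_7"]  # exact-term matches

print(rrf_merge(vector_hits, keyword_hits))
# chunk_2 and chunk_7 appear in both lists, so they rank first
```

Chunks that both rankings agree on float to the top, which is why hybrid search tends to be robust to queries that are phrased differently from the transcript text.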

## Step 2: Fetch and ingest the YouTube transcript

Next, extract the YouTube video ID, fetch the full transcript, and flatten it into plain text for indexing.
The snippet below then inserts that transcript text into the Agno knowledge base, which writes vectors and metadata to LanceDB.

<CodeBlock filename="Python" language="Python" icon="python">
{PyFrameworksAgnoIngestYoutube}
</CodeBlock>
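The ID-extraction and flattening parts of this step need only the standard library. A minimal sketch, with helper names of our own choosing (the actual transcript fetch in the snippet above goes through `youtube-transcript-api`):

```python
# Stdlib-only helpers for the pre-ingestion steps. The helper names are
# illustrative; the real transcript fetch uses youtube-transcript-api.
from urllib.parse import urlparse, parse_qs

def extract_video_id(url: str) -> str:
    """Pull the video ID out of common YouTube URL shapes."""
    parsed = urlparse(url)
    if parsed.hostname == "youtu.be":
        # Short links carry the ID directly in the path.
        return parsed.path.lstrip("/")
    # Standard watch URLs keep the ID in the ?v= query parameter.
    return parse_qs(parsed.query)["v"][0]

def flatten_transcript(segments: list) -> str:
    """Join timed transcript segments into one text blob for indexing."""
    return " ".join(seg["text"].strip() for seg in segments)

video_id = extract_video_id("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
text = flatten_transcript([
    {"text": "LanceDB stores vectors", "start": 0.0},
    {"text": "and multimodal data", "start": 2.1},
])
```

Flattening discards the per-segment timestamps; that is fine here because retrieval operates on text chunks, not on playback positions.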

<Info>
This path explicitly fetches the transcript first, then inserts the transcript text into LanceDB through Agno.
</Info>

## Step 3: Create an Agno agent with knowledge search

Next, construct an Agno `Agent` and attach the knowledge base you just populated.
With `search_knowledge=True`, the agent performs retrieval before answering, so responses stay grounded in transcript context.

In Agno, retrieval is exposed as a tool call that the model can invoke at runtime.
When `search_knowledge=True`, Agno makes a knowledge-search tool (shown in the output as `search_knowledge_base(...)`) available to the model; the model decides when to call it, Agno executes the tool, and the returned context is fed back into the final answer.

<CodeBlock filename="Python" language="Python" icon="python">
{PyFrameworksAgnoAgent}
</CodeBlock>
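The retrieval-as-tool-call flow described above is not Agno-specific. Stripped to its essentials, it looks like the sketch below; everything here (the stub model, the corpus, the search function) is invented for illustration, with Agno playing the role of the framework that executes the tool call:

```python
# A toy version of the retrieval-as-tool-call loop described above.
# Everything here (stub model, corpus, search function) is illustrative.

CORPUS = {
    "chunk_1": "LanceDB can handle images, audio, and video.",
    "chunk_2": "The transcript discusses multimodal AI data.",
}

def search_knowledge_base(query):
    """Stand-in retrieval tool: naive keyword match over the corpus."""
    terms = query.lower().split()
    return [text for text in CORPUS.values()
            if any(t in text.lower() for t in terms)]

def stub_model(prompt, context=None):
    """Stand-in model: first asks for retrieval, then answers with context."""
    if context is None:
        return ("tool_call", prompt)       # model decides to call the tool
    return ("answer", " ".join(context))   # model grounds its answer

def run_agent(question):
    kind, payload = stub_model(question)
    if kind == "tool_call":
        # The framework (Agno, here stubbed) executes the tool call...
        context = search_knowledge_base(payload)
        # ...and feeds the retrieved context back for the final answer.
        kind, payload = stub_model(question, context=context)
    return payload

print(run_agent("what data can LanceDB handle"))
```

The key design point is that the model only *requests* retrieval; the framework owns execution, which keeps credentials and database access out of the model's hands.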

## Step 4: Start a CLI chat loop

Finally, ask an initial question and then start an interactive loop for follow-up queries.
Each prompt runs through the same retrieval pipeline, so you can iteratively inspect what the transcript contains.

<CodeBlock filename="Python" language="Python" icon="python">
{PyFrameworksAgnoCliChat}
</CodeBlock>
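A chat loop like this is just a read-ask-print cycle. A framework-agnostic sketch, with the prompt source injected so the loop can be driven by `input()` interactively or by a list in tests (`ask` stands in for the agent call; none of these names come from Agno):

```python
# Framework-agnostic CLI chat loop. `ask` stands in for the agent call;
# `prompts` is injected so the loop can read from stdin or from a script.

def chat_loop(ask, prompts):
    """Feed each prompt to `ask`; stop on 'exit'. Returns all replies."""
    replies = []
    for prompt in prompts:
        if prompt.strip().lower() == "exit":
            break
        reply = ask(prompt)
        print(f"You: {prompt}\nAgent: {reply}")
        replies.append(reply)
    return replies

# Interactive use would wire `ask` to the real agent and read prompts
# from stdin; here a trivial echo-style `ask` keeps the sketch runnable.
echo = chat_loop(str.upper, ["hello", "what is lancedb?", "exit", "ignored"])
```

Separating the loop from the agent call also makes it easy to swap in a different backend later without touching the CLI plumbing.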

<Info>
Want local-first inference? Replace the OpenAI model/embedder classes with Agno's Ollama providers. See Agno's Ollama knowledge examples: [docs.agno.com/examples/models/ollama/chat/knowledge](https://docs.agno.com/examples/models/ollama/chat/knowledge).
</Info>

### Question 1

The following question is asked in the CLI chat loop:

```
┏━ Message ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                  ┃
┃ Q: What kinds of data can LanceDB handle?                                        ┃
┃                                                                                  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
┏━ Tool Calls ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                  ┃
┃ • search_knowledge_base(query=What kinds of data can LanceDB handle?)            ┃
┃ • search_knowledge_base(query=LanceDB images audio video handle kinds of data    ┃
┃ can handle 'LanceDB can handle' 'kinds of data' 'images audio video' transcript) ┃
┃                                                                                  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
┏━ Response (19.1s) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                  ┃
┃                                                                                  ┃
┃ • Images, audio, video — i.e., multimodal AI data and “all manners of things     ┃
┃ you don't put into traditional databases” (per the transcript).                  ┃
┃                                                                                  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
```

We get the response based on the transcript's contents, as expected.

### Question 2

Let's ask a more specific question about the CEO of LanceDB, which is also in the transcript:

```
You: What is the name of the CEO of LanceDB?
INFO Found 10 documents
┏━ Message ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                  ┃
┃ What is the name of the CEO of LanceDB?                                          ┃
┃                                                                                  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
┏━ Tool Calls ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                  ┃
┃ • search_knowledge_base(query=CEO of LanceDB)                                    ┃
┃                                                                                  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
┏━ Response (16.7s) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃                                                                                  ┃
┃                                                                                  ┃
┃ • According to the retrieved YouTube transcript/title, the CEO of LanceDB is     ┃
┃ Chang She.                                                                       ┃
┃                                                                                  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
```

We get the response based on the transcript's contents and title, as expected.

## Why this works well

To start, LanceDB OSS can run from a local directory, so transcript data can stay on your machine when you are using the OSS stack.

- You do not need to maintain a separate transcript parser in your application code.
- You do not need to hand-roll chunking and retrieval orchestration across multiple modules.
- One explicit Agno `Knowledge` object, backed by LanceDB, defines both ingestion and search behavior in one place.
- Fewer moving parts means the tutorial stays readable, and the same pattern is easier to carry into production code.

As your application's needs grow, you can migrate to LanceDB [Enterprise](/enterprise) for
convenience features like automatic compaction and reindexing, and the ability to scale to
very large datasets.
