
Commit 410db1f

docs: migrate embedder→embedding_model and require vectordb across tool docs; add provider examples (en/ko/pt-BR) (#3804)

* docs(tools): migrate embedder->embedding_model, require vectordb; add Chroma/Qdrant examples across en/ko/pt-BR PDF/TXT/XML/MDX/DOCX/CSV/Directory docs
* docs(observability): apply latest Datadog tweaks in ko and pt-BR

1 parent 5d6b4c9

23 files changed: +540 −390 lines
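The rename is mechanical, so the commit applies the same change to every tool doc. As a rough sketch of the mapping, here is a hypothetical helper (for illustration only; crewai_tools ships no such function — only the key names `embedder`, `embedding_model`, and `vectordb` come from the diffs in this commit):

```python
def migrate_tool_config(old: dict) -> dict:
    """Rewrite a legacy tool config to the shape used after this commit.

    Hypothetical illustration of the documented migration:
    - "embedder" is renamed to "embedding_model";
    - a "vectordb" block is now required, so a ChromaDB default is added;
    - the legacy "llm" block no longer appears in the updated examples.
    """
    new = {}
    if "embedder" in old:
        new["embedding_model"] = old["embedder"]
    if "vectordb" not in new:
        new["vectordb"] = {"provider": "chromadb", "config": {}}
    return new


# A config in the pre-migration shape shown in the deleted doc lines:
legacy = {
    "llm": {"provider": "ollama", "config": {"model": "llama2"}},
    "embedder": {
        "provider": "google",
        "config": {"model": "models/embedding-001", "task_type": "retrieval_document"},
    },
}

migrated = migrate_tool_config(legacy)
print(sorted(migrated))  # → ['embedding_model', 'vectordb']
```

The embedding provider settings carry over unchanged; only the key name and the new vector-store requirement differ.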

docs/en/observability/datadog.mdx

Lines changed: 6 additions & 2 deletions
@@ -93,11 +93,15 @@ After running the application, you can view the traces in [Datadog LLM Observabi
 
 Clicking on a trace will show you the details of the trace, including total tokens used, number of LLM calls, models used, and estimated cost. Clicking into a specific span will narrow down these details, and show related input, output, and metadata.
 
-![Datadog LLM Observability Trace View](/images/datadog-llm-observability-1.png)
+<Frame>
+  <img src="/images/datadog-llm-observability-1.png" alt="Datadog LLM Observability Trace View" />
+</Frame>
 
 Additionally, you can view the execution graph view of the trace, which shows the control and data flow of the trace, which will scale with larger agents to show handoffs and relationships between LLM calls, tool calls, and agent interactions.
 
-![Datadog LLM Observability Agent Execution Flow View](/images/datadog-llm-observability-2.png)
+<Frame>
+  <img src="/images/datadog-llm-observability-2.png" alt="Datadog LLM Observability Agent Execution Flow View" />
+</Frame>
 
 ## References

docs/en/tools/file-document/csvsearchtool.mdx

Lines changed: 19 additions & 19 deletions
@@ -54,25 +54,25 @@ The following parameters can be used to customize the `CSVSearchTool`'s behavior
 By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
 
 ```python Code
+from chromadb.config import Settings
+
 tool = CSVSearchTool(
-    config=dict(
-        llm=dict(
-            provider="ollama", # or google, openai, anthropic, llama2, ...
-            config=dict(
-                model="llama2",
-                # temperature=0.5,
-                # top_p=1,
-                # stream=true,
-            ),
-        ),
-        embedder=dict(
-            provider="google", # or openai, ollama, ...
-            config=dict(
-                model="models/embedding-001",
-                task_type="retrieval_document",
-                # title="Embeddings",
-            ),
-        ),
-    )
+    config={
+        "embedding_model": {
+            "provider": "openai",
+            "config": {
+                "model": "text-embedding-3-small",
+                # "api_key": "sk-...",
+            },
+        },
+        "vectordb": {
+            "provider": "chromadb",  # or "qdrant"
+            "config": {
+                # "settings": Settings(persist_directory="/content/chroma", allow_reset=True, is_persistent=True),
+                # from qdrant_client.models import VectorParams, Distance
+                # "vectors_config": VectorParams(size=384, distance=Distance.COSINE),
+            }
+        },
+    }
 )
 ```

docs/en/tools/file-document/directorysearchtool.mdx

Lines changed: 19 additions & 17 deletions
@@ -46,23 +46,25 @@ tool = DirectorySearchTool(directory='/path/to/directory')
 The DirectorySearchTool uses OpenAI for embeddings and summarization by default. Customization options for these settings include changing the model provider and configuration, enhancing flexibility for advanced users.
 
 ```python Code
+from chromadb.config import Settings
+
 tool = DirectorySearchTool(
-    config=dict(
-        llm=dict(
-            provider="ollama", # Options include ollama, google, anthropic, llama2, and more
-            config=dict(
-                model="llama2",
-                # Additional configurations here
-            ),
-        ),
-        embedder=dict(
-            provider="google", # or openai, ollama, ...
-            config=dict(
-                model="models/embedding-001",
-                task_type="retrieval_document",
-                # title="Embeddings",
-            ),
-        ),
-    )
+    config={
+        "embedding_model": {
+            "provider": "openai",
+            "config": {
+                "model": "text-embedding-3-small",
+                # "api_key": "sk-...",
+            },
+        },
+        "vectordb": {
+            "provider": "chromadb",  # or "qdrant"
+            "config": {
+                # "settings": Settings(persist_directory="/content/chroma", allow_reset=True, is_persistent=True),
+                # from qdrant_client.models import VectorParams, Distance
+                # "vectors_config": VectorParams(size=384, distance=Distance.COSINE),
+            }
+        },
+    }
 )
 ```

docs/en/tools/file-document/docxsearchtool.mdx

Lines changed: 19 additions & 19 deletions
@@ -56,25 +56,25 @@ The following parameters can be used to customize the `DOCXSearchTool`'s behavio
 By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
 
 ```python Code
+from chromadb.config import Settings
+
 tool = DOCXSearchTool(
-    config=dict(
-        llm=dict(
-            provider="ollama", # or google, openai, anthropic, llama2, ...
-            config=dict(
-                model="llama2",
-                # temperature=0.5,
-                # top_p=1,
-                # stream=true,
-            ),
-        ),
-        embedder=dict(
-            provider="google", # or openai, ollama, ...
-            config=dict(
-                model="models/embedding-001",
-                task_type="retrieval_document",
-                # title="Embeddings",
-            ),
-        ),
-    )
+    config={
+        "embedding_model": {
+            "provider": "openai",
+            "config": {
+                "model": "text-embedding-3-small",
+                # "api_key": "sk-...",
+            },
+        },
+        "vectordb": {
+            "provider": "chromadb",  # or "qdrant"
+            "config": {
+                # "settings": Settings(persist_directory="/content/chroma", allow_reset=True, is_persistent=True),
+                # from qdrant_client.models import VectorParams, Distance
+                # "vectors_config": VectorParams(size=384, distance=Distance.COSINE),
+            }
+        },
+    }
 )
 ```

docs/en/tools/file-document/mdxsearchtool.mdx

Lines changed: 19 additions & 21 deletions
@@ -48,27 +48,25 @@ tool = MDXSearchTool(mdx='path/to/your/document.mdx')
 The tool defaults to using OpenAI for embeddings and summarization. For customization, utilize a configuration dictionary as shown below:
 
 ```python Code
+from chromadb.config import Settings
+
 tool = MDXSearchTool(
-    config=dict(
-        llm=dict(
-            provider="ollama", # Options include google, openai, anthropic, llama2, etc.
-            config=dict(
-                model="llama2",
-                # Optional parameters can be included here.
-                # temperature=0.5,
-                # top_p=1,
-                # stream=true,
-            ),
-        ),
-        embedder=dict(
-            provider="google", # or openai, ollama, ...
-            config=dict(
-                model="models/embedding-001",
-                task_type="retrieval_document",
-                # Optional title for the embeddings can be added here.
-                # title="Embeddings",
-            ),
-        ),
-    )
+    config={
+        "embedding_model": {
+            "provider": "openai",
+            "config": {
+                "model": "text-embedding-3-small",
+                # "api_key": "sk-...",
+            },
+        },
+        "vectordb": {
+            "provider": "chromadb",  # or "qdrant"
+            "config": {
+                # "settings": Settings(persist_directory="/content/chroma", allow_reset=True, is_persistent=True),
+                # from qdrant_client.models import VectorParams, Distance
+                # "vectors_config": VectorParams(size=384, distance=Distance.COSINE),
+            }
+        },
+    }
 )
 ```

docs/en/tools/file-document/pdfsearchtool.mdx

Lines changed: 56 additions & 20 deletions
@@ -45,28 +45,64 @@ tool = PDFSearchTool(pdf='path/to/your/document.pdf')
 
 ## Custom model and embeddings
 
-By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
+By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows. Note: a vector database is required because generated embeddings must be stored and queried from a vectordb.
 
 ```python Code
+from crewai_tools import PDFSearchTool
+
+# - embedding_model (required): choose provider + provider-specific config
+# - vectordb (required): choose vector DB and pass its config
+
 tool = PDFSearchTool(
-    config=dict(
-        llm=dict(
-            provider="ollama", # or google, openai, anthropic, llama2, ...
-            config=dict(
-                model="llama2",
-                # temperature=0.5,
-                # top_p=1,
-                # stream=true,
-            ),
-        ),
-        embedder=dict(
-            provider="google", # or openai, ollama, ...
-            config=dict(
-                model="models/embedding-001",
-                task_type="retrieval_document",
-                # title="Embeddings",
-            ),
-        ),
-    )
+    config={
+        "embedding_model": {
+            # Supported providers: "openai", "azure", "google-generativeai", "google-vertex",
+            # "voyageai", "cohere", "huggingface", "jina", "sentence-transformer",
+            # "text2vec", "ollama", "openclip", "instructor", "onnx", "roboflow", "watsonx", "custom"
+            "provider": "openai",  # or: "google-generativeai", "cohere", "ollama", ...
+            "config": {
+                # Model identifier for the chosen provider. "model" will be auto-mapped to "model_name" internally.
+                "model": "text-embedding-3-small",
+                # Optional: API key. If omitted, the tool will use provider-specific env vars when available
+                # (e.g., OPENAI_API_KEY for provider="openai").
+                # "api_key": "sk-...",
+
+                # Provider-specific examples:
+                # --- Google Generative AI ---
+                # (Set provider="google-generativeai" above)
+                # "model": "models/embedding-001",
+                # "task_type": "retrieval_document",
+                # "title": "Embeddings",
+
+                # --- Cohere ---
+                # (Set provider="cohere" above)
+                # "model": "embed-english-v3.0",
+
+                # --- Ollama (local) ---
+                # (Set provider="ollama" above)
+                # "model": "nomic-embed-text",
+            },
+        },
+        "vectordb": {
+            "provider": "chromadb",  # or "qdrant"
+            "config": {
+                # For ChromaDB: pass "settings" (chromadb.config.Settings) or rely on defaults.
+                # Example (uncomment and import):
+                # from chromadb.config import Settings
+                # "settings": Settings(
+                #     persist_directory="/content/chroma",
+                #     allow_reset=True,
+                #     is_persistent=True,
+                # ),
+
+                # For Qdrant: pass "vectors_config" (qdrant_client.models.VectorParams).
+                # Example (uncomment and import):
+                # from qdrant_client.models import VectorParams, Distance
+                # "vectors_config": VectorParams(size=384, distance=Distance.COSINE),
+
+                # Note: collection name is controlled by the tool (default: "rag_tool_collection"), not set here.
+            }
+        },
+    }
 )
 ```

docs/en/tools/file-document/txtsearchtool.mdx

Lines changed: 35 additions & 19 deletions
@@ -57,25 +57,41 @@ By default, the tool uses OpenAI for both embeddings and summarization.
 To customize the model, you can use a config dictionary as follows:
 
 ```python Code
+from chromadb.config import Settings
+
 tool = TXTSearchTool(
-    config=dict(
-        llm=dict(
-            provider="ollama", # or google, openai, anthropic, llama2, ...
-            config=dict(
-                model="llama2",
-                # temperature=0.5,
-                # top_p=1,
-                # stream=true,
-            ),
-        ),
-        embedder=dict(
-            provider="google", # or openai, ollama, ...
-            config=dict(
-                model="models/embedding-001",
-                task_type="retrieval_document",
-                # title="Embeddings",
-            ),
-        ),
-    )
+    config={
+        # Required: embeddings provider + config
+        "embedding_model": {
+            "provider": "openai",  # or google-generativeai, cohere, ollama, ...
+            "config": {
+                "model": "text-embedding-3-small",
+                # "api_key": "sk-...",  # optional if env var is set
+                # Provider examples:
+                # Google → model: "models/embedding-001", task_type: "retrieval_document"
+                # Cohere → model: "embed-english-v3.0"
+                # Ollama → model: "nomic-embed-text"
+            },
+        },
+
+        # Required: vector database config
+        "vectordb": {
+            "provider": "chromadb",  # or "qdrant"
+            "config": {
+                # Chroma settings (optional persistence)
+                # "settings": Settings(
+                #     persist_directory="/content/chroma",
+                #     allow_reset=True,
+                #     is_persistent=True,
+                # ),
+
+                # Qdrant vector params example:
+                # from qdrant_client.models import VectorParams, Distance
+                # "vectors_config": VectorParams(size=384, distance=Distance.COSINE),
+
+                # Note: collection name is controlled by the tool (default: "rag_tool_collection").
+            }
+        },
+    }
 )
 ```

docs/en/tools/file-document/xmlsearchtool.mdx

Lines changed: 19 additions & 19 deletions
@@ -54,25 +54,25 @@ It is an optional parameter during the tool's initialization but must be provide
 By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
 
 ```python Code
+from chromadb.config import Settings
+
 tool = XMLSearchTool(
-    config=dict(
-        llm=dict(
-            provider="ollama", # or google, openai, anthropic, llama2, ...
-            config=dict(
-                model="llama2",
-                # temperature=0.5,
-                # top_p=1,
-                # stream=true,
-            ),
-        ),
-        embedder=dict(
-            provider="google", # or openai, ollama, ...
-            config=dict(
-                model="models/embedding-001",
-                task_type="retrieval_document",
-                # title="Embeddings",
-            ),
-        ),
-    )
+    config={
+        "embedding_model": {
+            "provider": "openai",
+            "config": {
+                "model": "text-embedding-3-small",
+                # "api_key": "sk-...",
+            },
+        },
+        "vectordb": {
+            "provider": "chromadb",  # or "qdrant"
+            "config": {
+                # "settings": Settings(persist_directory="/content/chroma", allow_reset=True, is_persistent=True),
+                # from qdrant_client.models import VectorParams, Distance
+                # "vectors_config": VectorParams(size=384, distance=Distance.COSINE),
+            }
+        },
+    }
 )
 ```
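All seven tool docs now agree on the same two required top-level keys. A stdlib-only sanity check for the new shape can be sketched as follows (a hypothetical validator for illustration; crewai_tools provides no such function):

```python
REQUIRED_KEYS = ("embedding_model", "vectordb")


def validate_config(config: dict) -> list:
    """Return a list of problems with a post-migration tool config.

    Hypothetical check based on the shape documented in this commit:
    both "embedding_model" and "vectordb" are required, each with a
    "provider" entry, and the legacy "embedder" key is no longer valid.
    """
    problems = [f"missing required key: {key}" for key in REQUIRED_KEYS if key not in config]
    if "embedder" in config:
        problems.append("legacy key 'embedder' found; rename it to 'embedding_model'")
    for key in REQUIRED_KEYS:
        if key in config and "provider" not in config[key]:
            problems.append(f"'{key}' needs a 'provider' entry")
    return problems


ok = {
    "embedding_model": {"provider": "openai", "config": {"model": "text-embedding-3-small"}},
    "vectordb": {"provider": "chromadb", "config": {}},
}
stale = {"embedder": {"provider": "google"}}

print(validate_config(ok))  # → []
```

Running the check on a pre-migration config (like `stale` above) reports the missing required keys and the legacy `embedder` key.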

0 commit comments