`examples/voice_agent/README.md` (+7 −4)
@@ -11,7 +11,6 @@ As of now, we only support English input and output, but more languages will be
 - [🚀 Quick Start](#-quick-start)
 - [📑 Supported Models and Features](#-supported-models-and-features)
 - [🤖 LLM](#-llm)
-- [Thinking/reasoning Mode for LLMs](#thinkingreasoning-mode-for-llms)
 - [🎤 ASR](#-asr)
 - [💬 Speaker Diarization](#-speaker-diarization)
 - [🔉 TTS](#-tts)
@@ -171,6 +170,9 @@ For vLLM server, if you specify `--reasoning_parser` in `vllm_server_params`, th
 
 We use [cache-aware streaming FastConformer](https://arxiv.org/abs/2312.17279) to transcribe the user's speech into text. While new models will be released soon, we use the existing English models for now:
@@ -244,10 +246,11 @@ The tools are then registered to the LLM via the `register_direct_tools_to_llm`
 
 More details on tool calling with Pipecat can be found in the [Pipecat documentation](https://docs.pipecat.ai/guides/learn/function-calling).
 
-#### Notes on system prompt with tools
+#### Notes on tool calling issues
 
-We notice that sometimes the LLM cannot do anything that's not related to the provided tools, or it might not actually use the tools even though it says it's using them. To alleviate this issue, we insert additional instructions into the system prompt (e.g., in `server/server_configs/llm_configs/nemotron_nano_v2.yaml`):
-- "Before responding to the user, check if the user request requires using external tools, and use the tools if they match with the user's intention. Otherwise, use your internal knowledge to answer the user's question. Do not use tools for casual conversation or when the tools don't fit the use cases. You should still try to address the user's request when it's not related to the provided tools."
+We notice that sometimes the LLM cannot do anything that's not related to the provided tools, or it might not actually use the tools even though it says it's using them. To alleviate this issue, we insert additional instructions into the system prompt to regulate its behavior (e.g., in `server/server_configs/llm_configs/nemotron_nano_v2.yaml`).
+
+Sometimes, after answering a question related to the tools, the LLM might refuse to answer questions that are not related to the tools, or vice versa. This phenomenon is sometimes called "commitment bias" or "tunnel vision". To alleviate this issue, we can insert additional instructions into the system prompt, explicitly asking the LLM to use or not use the tools in the user's query.
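The diff above registers tools with the LLM (via `register_direct_tools_to_llm`) and patches the system prompt to fight "tunnel vision". As a hedged illustration of what such a registration might carry, here is an OpenAI/vLLM-style tool schema; the `get_weather` tool, its handler, and the prompt text below are hypothetical examples, not code or tools shipped with this repo:

```python
def get_weather(city: str) -> str:
    """Hypothetical handler the agent would invoke when the LLM emits a
    tool call; a real handler would query a weather service."""
    return f"Sunny in {city}"  # placeholder result for illustration


# OpenAI-style function/tool schema, as accepted by vLLM's tool calling.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string", "description": "City name"}},
            "required": ["city"],
        },
    },
}

# The mitigation described above: explicit guidance appended to the
# system prompt so the model neither over-uses the tools nor refuses
# to answer questions outside their scope.
anti_tunnel_vision = (
    "If you are provided with a set of tools, use them only when needed; "
    "do not limit your capabilities to the scope of the tools."
)
```

The schema's `parameters` field is a JSON Schema object, so any OpenAI-compatible client can pass `weather_tool` in its `tools` list unchanged.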
`examples/voice_agent/server/server_configs/default.yaml` (+1 −1)
@@ -46,7 +46,7 @@ llm:
   enable_reasoning: false # it's best to turn off reasoning for the lowest latency; setting it to True will use the same config ending with `_think.yaml` instead
   # `system_prompt` is used as the system prompt to the LLM; please refer to each LLM's webpage for special functions like enabling/disabling thinking
   # system_prompt: /path/to/prompt.txt # or use the path to a txt file that contains a long prompt, for example in `../example_prompts/fast_bite.txt`
-  system_prompt: "You are a helpful AI agent named Lisa. Start by greeting the user warmly and introducing yourself within one sentence. Your answer should be concise and to the point. You might also see speaker tags (<speaker_0>, <speaker_1>, etc.) in the user context. You should respond to the user based on the speaker tag and the context of that speaker. Do not include the speaker tags in your response, use them only to identify the speaker. Do not include any emoji in response."
+  system_prompt: "You are a helpful AI agent named Lisa. Start by greeting the user warmly and introducing yourself within one sentence. Your answer should be concise and to the point. You might also see speaker tags (<speaker_0>, <speaker_1>, etc.) in the user context. You should respond to the user based on the speaker tag and the context of that speaker. Do not include the speaker tags in your response, use them only to identify the speaker. Avoid using emoji in your response."
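Per the comment on `enable_reasoning`, turning it on swaps in the `_think.yaml` variant of the selected LLM config. A minimal sketch of that selection logic (assumed behavior; the helper below is illustrative, not the repo's actual code):

```python
def resolve_llm_config(config_path: str, enable_reasoning: bool) -> str:
    """Hypothetical helper: map "<name>.yaml" to "<name>_think.yaml"
    when reasoning is enabled, as described in default.yaml."""
    if not enable_reasoning:
        return config_path
    if config_path.endswith("_think.yaml"):
        return config_path  # already the reasoning variant
    return config_path[: -len(".yaml")] + "_think.yaml"


print(resolve_llm_config("llm_configs/nemotron_nano_v2.yaml", True))
# llm_configs/nemotron_nano_v2_think.yaml
```

Keeping the mapping purely name-based means the two variants can differ only in their suffix-controlled reasoning settings while sharing everything else.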
`examples/voice_agent/server/server_configs/llm_configs/nemotron_nano_v2.yaml` (+1 −1)
@@ -6,7 +6,7 @@ type: vllm # Overwrite to vllm to enable tool calling, the HF backend currently
 dtype: bfloat16 # torch.dtype for LLM
 device: "cuda"
 system_role: "system" # role for system prompt, set it to `user` for models that do not support system prompt
-system_prompt_suffix: "Before responding to the user, check if the user request requires using external tools, and use the tools if they match with the user's intention. Otherwise, use your internal knowledge to answer the user's question. Do not use tools for casual conversation or when the tools don't fit the use cases. You should still try to address the user's request when it's not related to the provided tools. /no_think" # a string that would be appended to the system prompt; `/think` and `/no_think` are used to enable/disable thinking
+system_prompt_suffix: "Before responding to the user, check if the user request requires using external tools, and use the tools if they match with the user's intention. Otherwise, use your internal knowledge to answer the user's question. Do not use tools for casual conversation or when the tools don't fit the use cases. You should still try to address the user's request when it's not related to the provided tools. If you are provided with a set of tools, use them only when needed, do not limit your capabilities to the scope of the tools. If the purpose of a tool matches well with a user's request, always try to call the tool first. Conversation history should not limit your behavior on whether you can use tools. You must answer questions not related to the tools. /no_think" # a string that would be appended to the system prompt; `/think` and `/no_think` are used to enable/disable thinking
 enable_tool_calling: True # set to True since the vllm config below supports tool calling
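The trailing `/no_think` token in `system_prompt_suffix` disables Nemotron's reasoning mode. A hedged sketch (assumed behavior, not the repo's code) of how a server might combine the base `system_prompt` with `system_prompt_suffix` while keeping the token consistent with `enable_reasoning`:

```python
def apply_suffix(system_prompt: str, suffix: str, enable_reasoning: bool) -> str:
    """Hypothetical helper: append the config's suffix to the base prompt,
    overriding any `/think` / `/no_think` token in the suffix so it
    matches the `enable_reasoning` setting."""
    token = "/think" if enable_reasoning else "/no_think"
    # Strip "/no_think" before "/think"; "/think" is not a substring of
    # "/no_think", so neither replace can corrupt the other token.
    body = suffix.replace("/no_think", "").replace("/think", "").strip()
    return f"{system_prompt} {body} {token}"


print(apply_suffix("You are Lisa.", "Be concise. /no_think", True))
# You are Lisa. Be concise. /think
```

Normalizing the token here means a config authored with `/no_think` still honors `enable_reasoning: true` without editing the YAML.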