Commit 792004d

fix (chat): voice mode smoothness
1 parent 1a0d32a commit 792004d

6 files changed, 320 additions and 210 deletions

src/client/app/chat/page.js

Lines changed: 5 additions & 4 deletions
@@ -963,6 +963,10 @@ export default function ChatPage() {
 const handleStartVoice = async () => {
 if (connectionStatus !== "disconnected") return
 console.log("[ChatPage] handleStartVoice triggered.")
+// --- ADD POSTHOG EVENT TRACKING ---
+posthog?.capture("voice_mode_activated")
+voiceModeStartTimeRef.current = Date.now() // Set start time
+// --- END POSTHOG EVENT TRACKING ---
 
 setConnectionStatus("connecting")
 setVoiceStatusText("Connecting...")
@@ -1207,6 +1211,7 @@ export default function ChatPage() {
 console.log("[ChatPage] Toggling voice mode OFF.")
 handleStopVoice()
 setIsVoiceMode(false)
+fetchInitialMessages()
 } else {
 console.log("[ChatPage] Toggling voice mode ON.")
 // Switching TO voice mode, first get permissions
@@ -1215,10 +1220,6 @@ export default function ChatPage() {
 console.log(
 "[ChatPage] Permissions granted, activating voice mode."
 )
-// --- ADD POSTHOG EVENT TRACKING ---
-posthog?.capture("voice_mode_activated")
-voiceModeStartTimeRef.current = Date.now() // Set start time
-// --- END POSTHOG EVENT TRACKING ---
 setIsVoiceMode(true)
 } else {
 console.warn(

src/client/lib/webrtc-client.js

Lines changed: 6 additions & 0 deletions
@@ -274,6 +274,12 @@ export class WebRTCClient {
 }
 
 disconnect() {
+// --- MODIFICATION: Make disconnect idempotent ---
+// If peerConnection is already null, we've already cleaned up.
+if (this.peerConnection === null) {
+console.log("[WebRTCClient] Already disconnected, skipping redundant disconnect call.");
+return;
+}
 console.log("[WebRTCClient] Disconnecting...")
 if (this.disconnectTimer) {
 clearTimeout(this.disconnectTimer)

src/server/main/chat/prompts.py

Lines changed: 74 additions & 47 deletions
@@ -44,70 +44,96 @@
 DO NOT PROVIDE ANY ADDITIONAL TEXT OR EXPLANATIONS. ONLY RETURN THE JSON OBJECT.
 """
 
+# NEW: Language code to full name mapping
+LANGUAGE_CODE_MAPPING = {
+'en': 'English', 'hi': 'Hindi', 'es': 'Spanish', 'fr': 'French',
+'de': 'German', 'it': 'Italian', 'pt': 'Portuguese', 'ru': 'Russian',
+'ja': 'Japanese', 'ko': 'Korean', 'zh': 'Chinese', 'ar': 'Arabic',
+'bn': 'Bengali', 'gu': 'Gujarati', 'kn': 'Kannada', 'ml': 'Malayalam',
+'mr': 'Marathi', 'pa': 'Punjabi', 'ta': 'Tamil', 'te': 'Telugu',
+'ur': 'Urdu',
+# Add more mappings as needed
+}
+
 VOICE_STAGE_1_SYSTEM_PROMPT = """
-You are an expert Triage AI for a real-time voice conversation. Your primary responsibility is to VERY QUICKLY classify the user's intent and extract necessary information. Latency is critical.
+You are an expert Triage AI for a real-time voice conversation. Your primary responsibility is to VERY QUICKLY classify the user's query and provide an immediate response in their language. Latency is critical.
 
-You have two classifications for the user's intent:
-1. `simple_request`: A request that can be answered quickly. This usually involves retrieving information (e.g., "what's the weather?", "what's my next meeting?"), a single action (e.g., "send a short message"), or a simple question. These can be handled synchronously by the main AI.
-2. `complex_task`: A request that requires multiple steps, external research, creating/modifying documents, or will take more than a few seconds to complete. Examples: "Summarize my unread emails", "plan a trip to Paris", "draft a blog post about AI", "research these topics and create a report". These tasks must be offloaded to an asynchronous background worker.
+The user is speaking in **{detected_language}**.
+
+You have two classifications for the user's query:
+1. `conversational`: A query that does not require any tools to answer. This includes greetings, simple questions, or chit-chat.
+2. `task`: A query that requires one or more tools to be executed. Tasks can be further divided into:
+- `simple`: Can be completed in a few seconds (e.g., checking weather, a single calendar event).
+- `complex`: Requires multiple steps or significant time (e.g., summarizing emails, planning a trip).
 
 CRITICAL INSTRUCTIONS:
 - Analyze the user's latest message and the conversation history.
-- `intent_type` (string): MUST be either "simple_request" or "complex_task".
-- `summary_for_task` (string): If the intent is a `complex_task`, provide a concise, self-contained summary of the user's request. This will be used as the name for the background task. For `simple_request`, this can be an empty string.
-- `tools` (list of strings): For a `simple_request`, provide a list of tools needed to answer it. For a `complex_task`, this list can be empty, as a more detailed planner will select tools later.
+- `query_type` (string): MUST be either "conversational" or "task".
+- `task_type` (string): If `query_type` is "task", this MUST be "simple" or "complex". If `query_type` is "conversational", this MUST be `null`.
+- `response` (string): This is a crucial field for immediate user feedback. Your response in this field MUST be in **{detected_language}**.
+- If `query_type` is "conversational", this field MUST contain the direct, complete answer to the user's question in **{detected_language}**.
+- If `query_type` is "task" and `task_type` is "simple", this field MUST contain a short, reassuring phrase in **{detected_language}**, like "Sure, let me check that for you." or "Okay, one moment."
+- If `query_type` is "task" and `task_type` is "complex", this field MUST contain a confirmation that the task has been created in **{detected_language}**, like "Okay, I've added that to your tasks list."
+- `summary_for_task` (string): If `task_type` is "complex", provide a concise, self-contained summary of the request for the background worker. This summary MUST be in **English**. Otherwise, this MUST be `null`.
+- `tools` (list of strings): If `task_type` is "simple", provide a list of tools needed. Otherwise, this MUST be an empty list `[]`.
 
 Here is the list of available tools:
-{
-"file_management": "Reading, writing, and listing files.",
-"accuweather": "Getting weather information.",
-"discord": "Interacting with Discord.",
-"gcalendar": "Managing Google Calendar events.",
-"gdocs": "Creating/editing Google Docs.",
-"gdrive": "Searching and reading files in Google Drive.",
-"github": "Interacting with GitHub.",
-"gmail": "Managing emails in Gmail.",
-"gmaps": "Navigation and location search.",
-"gpeople": "Managing contacts.",
-"gsheets": "Creating/editing Google Sheets.",
-"gslides": "Creating/editing Google Slides.",
-"internet_search": "Searching the internet.",
-"news": "Getting news updates.",
-"notion": "Managing pages in Notion.",
-"quickchart": "Generating charts.",
-"slack": "Interacting with Slack.",
-"trello": "Managing Trello boards.",
-"whatsapp": "Sending WhatsApp messages."
-}
+{{
+"file_management": "Reading, writing, and listing files.",
+"accuweather": "Getting weather information.",
+"discord": "Interacting with Discord.",
+"gcalendar": "Managing Google Calendar events.",
+"gdocs": "Creating/editing Google Docs.",
+"gdrive": "Searching and reading files in Google Drive.",
+"github": "Interacting with GitHub.",
+"gmail": "Managing emails in Gmail.",
+"gmaps": "Navigation and location search.",
+"gpeople": "Managing contacts.",
+"gsheets": "Creating/editing Google Sheets.",
+"gslides": "Creating/editing Google Slides.",
+"internet_search": "Searching the internet.",
+"news": "Getting news updates.",
+"notion": "Managing pages in Notion.",
+"quickchart": "Generating charts.",
+"slack": "Interacting with Slack.",
+"trello": "Managing Trello boards.",
+"whatsapp": "Sending WhatsApp messages."
+}}
 
 Your response MUST be a single, valid JSON object. DO NOT provide any other text.
 
-Example 1:
-User: "what's on my calendar for today?"
+Example 1 (Simple Task, User speaks Hindi):
+User: "आज का मौसम कैसा है?"
 Your JSON Output:
-{
-"intent_type": "simple_request",
-"summary_for_task": "",
-"tools": ["gcalendar"]
-}
-
-Example 2:
+{{
+"query_type": "task",
+"task_type": "simple",
+"response": "ज़रूर, मैं अभी जांच करता हूँ।",
+"summary_for_task": null,
+"tools": ["accuweather"]
+}}
+
+Example 2 (Complex Task):
 User: "Can you research the latest advancements in AI and create a summary document for me?"
 Your JSON Output:
-{
-"intent_type": "complex_task",
+{{
+"query_type": "task",
+"task_type": "complex",
+"response": "Okay, the task has been added to your list. I'll start the research now.",
 "summary_for_task": "Research latest AI advancements and create a summary document.",
 "tools": []
-}
+}}
 
-Example 3:
-User: "send a quick message to john on slack saying I'm running 5 minutes late"
+Example 3 (Conversational, User speaks Spanish):
+User: "Hola, ¿cómo estás?"
 Your JSON Output:
-{
-"intent_type": "simple_request",
-"summary_for_task": "",
-"tools": ["slack"]
-}
+{{
+"query_type": "conversational",
+"task_type": null,
+"response": "¡Estoy muy bien, gracias por preguntar! ¿Cómo puedo ayudarte hoy?",
+"summary_for_task": null,
+"tools": []
+}}
 """
 
 STAGE_2_SYSTEM_PROMPT = """
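
Reviewer note on the hunk above: the literal braces in the tool list and examples are now doubled, presumably because the prompt is passed through `str.format` to fill `{detected_language}`. The commit does not show the call site, so the following is only a minimal sketch under that assumption. `PROMPT_TEMPLATE` is an abbreviated stand-in for `VOICE_STAGE_1_SYSTEM_PROMPT`, and `render_prompt`, `validate_triage`, and the fall-back-to-English policy are hypothetical names and choices, not code from this repository. The validator simply restates the cross-field rules from the CRITICAL INSTRUCTIONS list.

```python
import json

# Abbreviated stand-ins for the names added in prompts.py; the real mapping and
# prompt are longer. Everything else in this block is hypothetical.
LANGUAGE_CODE_MAPPING = {"en": "English", "hi": "Hindi", "es": "Spanish"}

PROMPT_TEMPLATE = """The user is speaking in **{detected_language}**.
Here is the list of available tools:
{{
"accuweather": "Getting weather information."
}}
"""


def render_prompt(language_code: str) -> str:
    """Fill the placeholder; unmapped codes fall back to English (assumed policy)."""
    language = LANGUAGE_CODE_MAPPING.get(language_code, "English")
    # str.format turns each doubled brace back into a single literal brace.
    return PROMPT_TEMPLATE.format(detected_language=language)


def validate_triage(raw: str) -> dict:
    """Parse the model's JSON and enforce the cross-field rules from the prompt."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    query_type = data.get("query_type")
    task_type = data.get("task_type")
    if query_type not in ("conversational", "task"):
        raise ValueError(f"unexpected query_type: {query_type!r}")
    if query_type == "conversational" and task_type is not None:
        raise ValueError("task_type must be null for conversational queries")
    if query_type == "task" and task_type not in ("simple", "complex"):
        raise ValueError(f"unexpected task_type: {task_type!r}")
    response = data.get("response")
    if not isinstance(response, str) or not response:
        raise ValueError("response must be a non-empty string")
    summary = data.get("summary_for_task")
    if task_type == "complex":
        if not summary:
            raise ValueError("complex tasks need summary_for_task")
    elif summary is not None:
        raise ValueError("summary_for_task must be null unless task_type is complex")
    tools = data.get("tools")
    if not isinstance(tools, list):
        raise ValueError("tools must be a list")
    if task_type != "simple" and tools:
        raise ValueError("tools must be empty unless task_type is simple")
    return data
```

Calling `render_prompt("hi")` yields a prompt that names Hindi and contains single braces around the tool list. Rejecting malformed triage output early, rather than trusting the model to follow the schema, keeps a bad classification from reaching the tool-execution stage.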
@@ -168,6 +194,7 @@
 1. **Language Handling**: The user is speaking in `{detected_language}`. ALL of your internal reasoning, thoughts (`<think>` tags), and tool calls MUST be in English. However, your final user-facing response (inside `<answer>` tags) MUST be in `{detected_language}`.
 2. **Be Fast and Concise**: This is a voice conversation. Provide a direct answer without unnecessary preamble. Get straight to the point.
 3. **Execute Directly**: Use the tools you have been given to fulfill the user's request immediately. Do not plan long tasks.
+4. **Tool Lifecycle**: After calling a tool, you will receive the result. You MUST then analyze this result and formulate your final, user-facing answer based on it. Do not simply repeat your thought process.
 4. **Use Memory**: If you need personal information about the user (e.g., their manager's name, their preferences), use the `memory_mcp-search_memory` tool.
 5. **Handle Failures Gracefully**: If a tool fails, inform the user clearly and concisely in their language. For example, "I couldn't access your calendar right now."
 6. **Final Answer for Voice**: Your final response will be converted to speech. It MUST be a single, complete, conversational answer in the user's language. Wrap your final response in `<answer>` tags. For example: `<answer>Your next meeting is at 3 PM with the design team.</answer>`.
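
Reviewer note on the hunk above: since only the text inside `<answer>` tags is meant to be spoken, the server side needs some way to pull it out before text-to-speech. The commit does not show that code, so this is a hedged sketch of one way it might look. `extract_spoken_answer` is a hypothetical helper, and the fallback of stripping `<think>` blocks when no `<answer>` tag is present is an assumed policy rather than something the source specifies.

```python
import re

# Non-greedy matches with DOTALL so multi-line answers and thoughts are handled.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def extract_spoken_answer(model_output: str) -> str:
    """Return the text to send to speech synthesis (hypothetical helper)."""
    match = ANSWER_RE.search(model_output)
    if match:
        return match.group(1).strip()
    # Assumed fallback: drop any reasoning blocks and speak what remains.
    return THINK_RE.sub("", model_output).strip()
```

For instance, passing the example string from item 6 with a leading `<think>` block returns just the sentence about the 3 PM meeting, with the reasoning left out of the audio.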
