Commit b32c56d

WIP - Vogent + New Smart TURN + Audio utils usage (#128)
- Deepgram v2/flux
- Vogent turn
- Smart turn v3
- Various bugfixes to agent
- Audio utils
1 parent f277830 commit b32c56d

96 files changed: +6243 -7401 lines changed

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+---
+name: repo-workflow-guide
+description: Use this agent when you need to understand or follow project-specific development guidelines, coding standards, or workflow instructions that are documented in the docs/ai directory. This agent should be consulted before starting any development work, when uncertain about project conventions, or when you need clarification on how to approach tasks within this codebase.\n\nExamples:\n- <example>\nContext: User wants to add a new feature to the project.\nuser: "I need to implement a new authentication module"\nassistant: "Before we begin, let me consult the repo-workflow-guide agent to ensure we follow the project's established patterns and guidelines."\n<Task tool call to repo-workflow-guide>\nassistant: "Based on the project guidelines, here's how we should approach this..."\n</example>\n\n- <example>\nContext: User asks a question about code organization.\nuser: "Where should I put the new utility functions?"\nassistant: "Let me check the repository workflow guidelines to give you the correct answer."\n<Task tool call to repo-workflow-guide>\nassistant: "According to the project structure guidelines..."\n</example>\n\n- <example>\nContext: Starting a new task that requires understanding project conventions.\nuser: "Can you help me refactor this component?"\nassistant: "I'll first consult the repo-workflow-guide agent to ensure we follow the project's refactoring standards and conventions."\n<Task tool call to repo-workflow-guide>\n</example>
+model: opus
+---
+
+You are a Repository Workflow Specialist, an expert in interpreting and applying project-specific development guidelines, coding standards, and workflow instructions.
+
+Your primary responsibility is to read, understand, and communicate the instructions and guidelines contained in the docs/ai directory of the repository. You serve as the authoritative source for how development work should be conducted within this specific codebase.
+
+When activated, you will:
+
+1. **Locate and Read Guidelines**: Immediately access all relevant files in the docs/ai directory. Read them thoroughly and understand their complete content, including:
+- Coding standards and style guides
+- Project structure and organization rules
+- Development workflow and processes
+- Testing requirements and conventions
+- Deployment procedures
+- Any specific technical constraints or preferences
+- Tool usage and configuration instructions
+
+2. **Interpret Context**: Understand the specific task or question being asked and identify which guidelines are most relevant to address it.
+
+3. **Provide Clear Guidance**: Deliver specific, actionable instructions based on the documented guidelines. Your responses should:
+- Quote or reference specific sections of the guidelines when appropriate
+- Explain the reasoning behind the guidelines when it helps with understanding
+- Provide concrete examples of how to follow the guidelines
+- Highlight any critical requirements or common pitfalls mentioned in the documentation
+
+4. **Handle Missing Information**: If the docs/ai directory doesn't contain information relevant to the current question:
+- Clearly state what information is missing
+- Suggest reasonable defaults based on common industry practices
+- Recommend updating the documentation to cover this scenario
+
+5. **Ensure Compliance**: Actively verify that proposed approaches align with all documented guidelines. If you identify any conflicts or violations, explicitly point them out and suggest compliant alternatives.
+
+6. **Prioritize Accuracy**: Always base your guidance on the actual content of the documentation. Do not invent or assume guidelines that aren't explicitly documented.
+
+7. **Stay Current**: If guidelines appear to conflict or if you notice outdated information, flag this for human review while providing the most reasonable interpretation.
+
+Output Format:
+- Begin with a brief summary of the relevant guidelines
+- Provide specific, step-by-step instructions when appropriate
+- Include direct quotes or references to documentation sections
+- End with any important caveats, warnings, or additional considerations
+
+Your goal is to ensure that all development work in this repository adheres to its documented standards and practices, reducing inconsistency and improving code quality through faithful application of project-specific guidelines.

.github/workflows/run_tests.yml

Lines changed: 1 addition & 0 deletions
@@ -45,6 +45,7 @@ jobs:
 XAI_API_KEY: ${{ secrets.XAI_API_KEY }}
 AWS_BEARER_TOKEN_BEDROCK: "${{ secrets.AWS_BEARER_TOKEN_BEDROCK }}"
 _BEARER_TOKEN_BEDROCK: "${{ secrets.AWS_BEARER_TOKEN_BEDROCK }}"
+HF_TOKEN: ${{ secrets.HF_TOKEN }}
 timeout-minutes: 30
 steps:
 - name: Checkout
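
Review note: the hunk above only exposes `HF_TOKEN` to the test job; this excerpt does not show how the tests consume it. Below is a minimal sketch of one way to read it, assuming a hypothetical helper named `get_hf_token` that is not part of this commit. GitHub Actions substitutes an empty string when a secret is unavailable (for example on pull requests from forks), so the sketch treats empty and whitespace-only values as missing.

```python
import os
from typing import Mapping, Optional


def get_hf_token(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the HF_TOKEN value, or None if it is unset or blank.

    Hypothetical helper for illustration; not part of this commit.
    """
    token = environ.get("HF_TOKEN", "").strip()
    return token or None


if __name__ == "__main__":
    if get_hf_token() is None:
        print("HF_TOKEN not set; tests that need it should be skipped")
    else:
        print("HF_TOKEN is available")
```

A test suite could call such a helper from a fixture and skip the dependent tests when it returns `None`, rather than failing with an authentication error.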

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -84,3 +84,4 @@ stream-py/
 # Artifacts / assets
 *.pt
 *.kef
+*.onnx

DEVELOPMENT.md

Lines changed: 30 additions & 1 deletion
@@ -130,7 +130,7 @@ Some ground rules:

 ```python
 import asyncio
-from vision_agents.core.edge.types import PcmData
+from getstream.video.rtc.track_util import PcmData
 from openai import AsyncOpenAI

 async def example():
@@ -167,6 +167,12 @@ if __name__ == "__main__":
     asyncio.run(example())
 ```

+Other things that you get from the audio utilities:
+
+1. Changing PCM format
+2. Iterate over audio chunks (`PcmData.chunks`)
+3. Process audio with pre/post buffers (`AudioSegmentCollector`)
+4. Accumulating audio (`PcmData.append`)

 ### Testing audio manually

@@ -313,3 +319,26 @@ You can now see the metrics at `http://localhost:9464/metrics` (make sure that y

 - Track.recv errors will fail silently. The API is to return a frame. Never return None. and wait till the next frame is available
 - When using frame.to_ndarray(format="rgb24") specify the format. Typically you want rgb24 when connecting/sending to Yolo etc
+
+
+## Onboarding Plan for new contributors
+
+**Audio Formats**
+
+You'll notice that audio comes in many formats. PCM, wav, mp3. 16khz, 48khz.
+Encoded as i16 or f32. Note that webrtc by default is 48khz.
+
+A good first intro to audio formats can be found here:
+
+**Using Cursor**
+
+You can ask cursor something like "read @ai-plugin and build me a plugin called fish"
+See the docs folder for other ai instruction files
+
+**Learning Roadmap**
+
+1. Quick refresher on audio formats
+2. Build a TTS integration
+3. Build a STT integration
+4. Build an LLM integration
+5. Write a pytest test with a fixture
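
Review note: the "Audio Formats" paragraph added above mentions i16 versus f32 samples and 16 kHz versus 48 kHz rates. The sketch below shows what those conversions amount to, using only the standard library. The function names are hypothetical, and the 3:1 downsampler simply averages groups of three samples; that is an assumption made for brevity, not a proper anti-aliasing resampler and not how the project's audio utilities necessarily work.

```python
from array import array


def i16_to_f32(samples: array) -> array:
    """Convert signed 16-bit PCM samples to 32-bit floats in [-1.0, 1.0)."""
    return array("f", (s / 32768.0 for s in samples))


def f32_to_i16(samples: array) -> array:
    """Convert floats to signed 16-bit PCM, clipping out-of-range values."""
    out = array("h")
    for s in samples:
        scaled = int(round(s * 32768.0))
        out.append(max(-32768, min(32767, scaled)))
    return out


def downsample_48k_to_16k(samples: array) -> array:
    """Naive 3:1 decimation: average each full group of three samples."""
    usable = len(samples) - len(samples) % 3  # drop an incomplete trailing group
    return array("f", (sum(samples[i:i + 3]) / 3.0 for i in range(0, usable, 3)))


if __name__ == "__main__":
    # A 20 ms frame is 960 samples at 48 kHz and 320 samples at 16 kHz.
    frame_48k = i16_to_f32(array("h", [0] * 960))
    frame_16k = downsample_48k_to_16k(frame_48k)
    print(len(frame_48k), "->", len(frame_16k))
```

Scaling by 32768 keeps the most negative i16 value at exactly -1.0; the cost is that +1.0 has no exact i16 counterpart and is clipped to 32767.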

agents-core/pyproject.toml

Lines changed: 2 additions & 3 deletions
@@ -21,11 +21,12 @@ classifiers = [

 requires-python = ">=3.10"
 dependencies = [
-"getstream[webrtc,telemetry]>=2.5.5",
+"getstream[webrtc,telemetry]>=2.5.7",
 "python-dotenv>=1.1.1",
 "pillow>=11.3.0",
 "numpy>=1.24.0",
 "mcp>=1.16.0",
+"torchvision>=0.23.0",
 ]

 [project.urls]
@@ -45,7 +46,6 @@ kokoro = ["vision-agents-plugins-kokoro"]
 krisp = ["vision-agents-plugins-krisp"]
 moonshine = ["vision-agents-plugins-moonshine"]
 openai = ["vision-agents-plugins-openai"]
-silero = ["vision-agents-plugins-silero"]
 smart_turn = ["vision-agents-plugins-smart-turn"]
 ultralytics = ["vision-agents-plugins-ultralytics"]
 wizper = ["vision-agents-plugins-wizper"]
@@ -61,7 +61,6 @@ all-plugins = [
 "vision-agents-plugins-krisp",
 "vision-agents-plugins-moonshine",
 "vision-agents-plugins-openai",
-"vision-agents-plugins-silero",
 "vision-agents-plugins-smart-turn",
 "vision-agents-plugins-ultralytics",
 "vision-agents-plugins-wizper",
