tuann04/duvent-mvp

System Architecture

1. Overview

Duvent is a real-time voice language learning application designed for "Home Lab" deployment with a path to cloud scalability. It uses a Hybrid Microservices architecture to separate business logic from heavy AI computation.

2. High-Level Architecture Diagram

+-----------------+       +-----------------------+
|                 |       |                       |
|  User / Browser |<----->|   Frontend (Next.js)  |
|                 |       |                       |
+-----------------+       +-----------+-----------+
        ^                             |
        | WS (Audio Stream)           | REST (Auth/State)
        v                             v
+-------------------------------------------------------+
|                 Backend (Golang)                      |
|             "The Orchestrator"                        |
+-------------------------------------------------------+
        |                  |                 |
   gRPC | (Bidirectional)  | HTTP (JSON)     | TCP (SQL)
        |                  v                 v
        |         +-----------------+  +--------------+
        |         |  Ollama (LLM)   |  |  PostgreSQL  |
        |         |  (Localhost)    |  |              |
        |         +-----------------+  +--------------+
        v
+-----------------------+
|   AI Service (Python) |
|      "The Worker"     |
|  (Whisper + Kokoro)   |
+-----------------------+

3. Component Details

3.1. Frontend (Next.js)

  • Responsibility:
    • Voice Activity Detection (VAD): Detects speech to optimize bandwidth.
    • WebSocket Client: Streams audio chunks to the Go Backend.
    • UI: Chat interface, topic selection, audio visualization.
  • Communication:
    • WebSocket -> Backend (Real-time Audio).
    • REST/HTTP -> Backend (Management).

3.2. Backend (Golang)

  • Responsibility:
    • WebSocket Server: Terminates user connections.
    • Session Management: Tracks conversation state.
    • Orchestration:
      • Routes incoming audio -> Python AI Service (gRPC).
      • Routes recognized text -> Ollama (HTTP).
      • Routes LLM response -> Python AI Service (gRPC) for TTS.
      • Streams TTS audio -> User.
  • Key Libraries: gorilla/websocket, grpc-go, pgx (Postgres driver).

3.3. AI Service (Python)

  • Responsibility:
    • gRPC Server: Exposes StreamAudio and SynthesizeSpeech endpoints.
    • STT (Speech-to-Text): Runs faster-whisper (or whisper.cpp) on CPU/GPU.
    • TTS (Text-to-Speech): Runs kokoro on CPU/GPU.
    • Optimization: Keeps models loaded in memory.
  • Key Libraries: grpcio, faster-whisper, kokoro, torch.
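
The gRPC contract between Go and Python could be sketched as below. The two RPC names come from the text above; the message shapes and field names are assumptions, not the repo's actual .proto:

```protobuf
syntax = "proto3";

package duvent.ai;

service AIService {
  // Bidirectional: audio chunks in, partial transcripts out.
  rpc StreamAudio(stream AudioChunk) returns (stream Transcript);
  // Server-streaming: text in, synthesized audio chunks out.
  rpc SynthesizeSpeech(SynthesisRequest) returns (stream AudioChunk);
}

message AudioChunk {
  bytes data = 1;        // raw PCM or encoded audio bytes
  int32 sample_rate = 2;
}

message Transcript {
  string text = 1;
  bool is_final = 2;     // distinguishes partial vs final hypotheses
}

message SynthesisRequest {
  string text = 1;
  string voice = 2;      // e.g. a kokoro voice preset
}
```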

3.4. LLM Provider (Ollama)

  • Responsibility:
    • Runs the Large Language Model (e.g., Llama 3 8B Quantized).
    • Handles chat context and generation.
  • Interface: Standard Ollama REST API (/api/chat).

3.5. Database (PostgreSQL)

  • Responsibility: Persists Users, Topics, Conversations, and Feedback logs.
  • Schema: Relational data + JSONB for flexible feedback structures.

4. Data Flow: The "Phone Call" Loop

  1. User Speaks: Audio chunks sent via WS to Golang.
  2. Transcribe: Golang streams audio via gRPC to Python AI Service.
  3. Result: Python streams text transcript back to Golang.
  4. Think: Golang buffers transcript (until silence/sentence end), then sends text to Ollama (HTTP).
  5. Respond: Ollama returns text response to Golang.
  6. Synthesize: Golang streams response text via gRPC to Python AI Service.
  7. Play: Python streams audio bytes back to Golang -> Golang forwards to User via WS.

5. Deployment Strategy

  • Development (Local): All components run on localhost. Docker for Postgres.
  • Production (Hybrid):
    • Backend/DB on Cloud VPS.
    • AI Service/Ollama on GPU Node (or Local machine via Tunnel).

6. Database Schema (Draft)

Users

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary Key |
| created_at | TIMESTAMP | |

Topics

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary Key |
| name | VARCHAR | e.g., "Software Engineering Interview" |
| system_prompt | TEXT | The initial instruction for the AI |

Conversations

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary Key |
| user_id | UUID | FK to Users |
| topic_id | UUID | FK to Topics |
| status | VARCHAR | active, completed |
| created_at | TIMESTAMP | |

Messages

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Primary Key |
| conversation_id | UUID | FK to Conversations |
| role | VARCHAR | user, assistant |
| content | TEXT | The text content |
| audio_path | VARCHAR | Path to local file: /uploads/conv_{id}_{seq}.mp3 |
| feedback | JSONB | Structured feedback for this turn |
| created_at | TIMESTAMP | |
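
The Messages draft above translates to DDL roughly as follows. Types and constraints are a sketch, and gen_random_uuid() assumes PostgreSQL 13+ (or the pgcrypto extension on older versions):

```sql
CREATE TABLE messages (
    id              UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    conversation_id UUID NOT NULL REFERENCES conversations(id),
    role            VARCHAR(16) NOT NULL CHECK (role IN ('user', 'assistant')),
    content         TEXT NOT NULL,
    audio_path      VARCHAR(255),
    feedback        JSONB,
    created_at      TIMESTAMP NOT NULL DEFAULT now()
);
```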

7. API Endpoints

REST (Management)

  • POST /api/session/init - Create guest user.
  • GET /api/topics - List topics.
  • GET /api/conversations/{id}/history - Get past messages.

WebSocket (Real-time)

  • WS /ws/conversation/{topic_id} - Main entry point for the call.
    • Upstream (Client -> Server):
      • Binary: Audio Data (PCM/WebM).
      • JSON: { "event": "start_speaking" }
      • JSON: { "event": "stop_speaking" }
    • Downstream (Server -> Client):
      • Binary: Audio Data (MP3/PCM).
      • JSON: { "event": "transcript", "data": "..." }
      • JSON: { "event": "feedback", "data": "..." }
