ๆAI(renai) is a web application that improves communication skills (especially with the opposite sex) through 3D real-time AI conversation simulation. It provides feedback on both verbal communication (conversation skills) and non-verbal communication (facial expressions, eye contact).
graph TB
subgraph "Client Layer"
User[User]
Partner[Partner]
Camera[Camera]
Mic[Microphone]
end
subgraph "Frontend - Next.js 15 on Vercel"
subgraph "Main Pages"
SimPage["/simulation<br/>AI Conversation Simulation"]
FeedbackPage["/feedback<br/>Feedback Display"]
PracticePage["/practice<br/>Practice Booking"]
PartnerPage["/partner<br/>Partner Dashboard"]
SessionPage["/session<br/>Session Details"]
TestCallPage["/test-call<br/>Call Test"]
end
subgraph "Core Hooks"
UseConversation[useConversation<br/>Conversation Management]
UseFacialAnalysis[useFacialAnalysis<br/>MediaPipe Facial Expression Analysis]
UseLipSync[useLipSync<br/>Web Audio API Lip Sync]
UseVRM[useVRM<br/>VRM Control]
UseAgoraCall[useAgoraCall<br/>Agora Video Call]
end
subgraph "3D/UI Layer"
VRMAvatar[VRM Avatar<br/>React Three Fiber]
MediaPipe[MediaPipe<br/>478-Point Face Landmarks]
AudioRecorder[AudioRecorder<br/>Audio Recording]
end
end
subgraph "Backend - Hono on Cloudflare Workers"
subgraph "API Routes"
SessionsAPI["/api/sessions<br/>Session Management"]
ConversationAPI["/api/conversation<br/>AI Conversation Generation"]
FeedbackAPI["/api/feedback<br/>Feedback Generation"]
SpeechAPI["/api/stt, /api/tts<br/>Speech Processing"]
AgoraAPI["/api/agora/token<br/>Token Generation"]
PartnersAPI["/api/partners<br/>Partner Management"]
AuthAPI["/api/auth<br/>Better Auth"]
end
subgraph "Service Layer"
ConversationService[conversation.ts<br/>Business Logic]
AIClient[ai-client.ts<br/>Gemini API Client]
STTService[stt.ts<br/>STT/TTS Processing]
end
end
subgraph "Database Layer - Supabase"
PostgreSQL[(PostgreSQL)]
Storage[(Supabase Storage<br/>Audio Files)]
end
subgraph "External Services"
Gemini[Google Gemini 2.5 Flash<br/>AI Conversation and Feedback]
ElevenLabs[ElevenLabs API<br/>TTS/STT]
Agora[Agora RTC<br/>Video Call]
end
subgraph "Data Models - Prisma"
Conversation[Conversation<br/>Conversation Session]
Message[Message<br/>Message History]
Feedback[Feedback<br/>Feedback]
GestureMetrics[GestureMetrics<br/>Expression and Gaze Data]
HumanPartnerSession[HumanPartnerSession<br/>Human Partner Session]
HumanPartnerFeedback[HumanPartnerFeedback<br/>Partner Feedback]
PracticeSlot[PracticeSlot<br/>Booking Slot]
UserModel[User<br/>User]
end
%% Client → Frontend
User --> SimPage
User --> FeedbackPage
User --> PracticePage
Partner --> PartnerPage
Camera --> MediaPipe
Mic --> AudioRecorder
%% Frontend internal flow
SimPage --> UseConversation
SimPage --> UseFacialAnalysis
SimPage --> UseVRM
UseConversation --> AudioRecorder
UseFacialAnalysis --> MediaPipe
UseVRM --> VRMAvatar
UseLipSync --> VRMAvatar
TestCallPage --> UseAgoraCall
SessionPage --> UseAgoraCall
%% Frontend → Backend
UseConversation --> SpeechAPI
UseConversation --> ConversationAPI
FeedbackPage --> FeedbackAPI
PracticePage --> PartnersAPI
PartnerPage --> PartnersAPI
UseAgoraCall --> AgoraAPI
SimPage --> SessionsAPI
%% Backend internal flow
ConversationAPI --> ConversationService
FeedbackAPI --> ConversationService
ConversationService --> AIClient
SpeechAPI --> STTService
%% Backend → External Services
AIClient --> Gemini
STTService --> ElevenLabs
AgoraAPI --> Agora
%% Backend → Database
SessionsAPI --> PostgreSQL
ConversationService --> PostgreSQL
STTService --> Storage
PartnersAPI --> PostgreSQL
AuthAPI --> PostgreSQL
%% Data model relations
Conversation --> Message
Conversation --> Feedback
Conversation --> GestureMetrics
HumanPartnerSession --> HumanPartnerFeedback
HumanPartnerSession --> PracticeSlot
UserModel --> HumanPartnerSession
%% Styling
classDef frontend fill:#61dafb,stroke:#333,stroke-width:2px,color:#000
classDef backend fill:#f39c12,stroke:#333,stroke-width:2px,color:#000
classDef database fill:#2ecc71,stroke:#333,stroke-width:2px,color:#000
classDef external fill:#e74c3c,stroke:#333,stroke-width:2px,color:#fff
classDef model fill:#9b59b6,stroke:#333,stroke-width:2px,color:#fff
class SimPage,FeedbackPage,PracticePage,PartnerPage,SessionPage,TestCallPage,UseConversation,UseFacialAnalysis,UseLipSync,UseVRM,UseAgoraCall,VRMAvatar,MediaPipe,AudioRecorder frontend
class SessionsAPI,ConversationAPI,FeedbackAPI,SpeechAPI,AgoraAPI,PartnersAPI,AuthAPI,ConversationService,AIClient,STTService backend
class PostgreSQL,Storage database
class Gemini,ElevenLabs,Agora external
class Conversation,Message,Feedback,GestureMetrics,HumanPartnerSession,HumanPartnerFeedback,PracticeSlot,UserModel model
sequenceDiagram
participant User as User
participant Frontend as Frontend<br/>(Next.js)
participant Backend as Backend<br/>(Hono/Workers)
participant ElevenLabs as ElevenLabs API
participant Gemini as Gemini 2.5 Flash
participant DB as PostgreSQL
participant Storage as Supabase Storage
User->>Frontend: Voice input
Frontend->>Frontend: AudioRecorder recording
Frontend->>Backend: POST /api/stt
Backend->>ElevenLabs: Speech-to-text conversion
ElevenLabs-->>Backend: Text
Backend-->>Frontend: Recognized text
Frontend->>Backend: POST /api/conversation/generate
Note over Backend: Intimacy level determination<br/>(shy/friendly/open)
Backend->>Gemini: AI response generation request<br/>(including persona and situation)
Gemini-->>Backend: AI response text
Backend-->>Frontend: AI response
Frontend->>Backend: POST /api/tts
Backend->>ElevenLabs: Text-to-speech conversion
ElevenLabs-->>Backend: Audio file
Backend->>Storage: Save audio file
Storage-->>Backend: Save complete
Backend-->>Frontend: Audio URL
Frontend->>Frontend: Web Audio API<br/>Frequency analysis (FFT)
Frontend->>Frontend: VRM avatar<br/>Lip sync control
par Facial expression analysis
Frontend->>Frontend: MediaPipe<br/>478-point landmark detection
Frontend->>Frontend: Smile and gaze analysis
end
Frontend->>Backend: POST /api/sessions/:id/messages
Backend->>DB: Save message
Frontend->>Backend: POST /api/sessions/:id/gestures
Backend->>DB: Save GestureMetrics
User->>Frontend: End session
Frontend->>Backend: POST /api/conversation/feedback
Backend->>Gemini: Feedback generation<br/>(conversation history + gesture metrics)
Gemini-->>Backend: Evaluation and improvement points
Backend->>DB: Save Feedback
Backend-->>Frontend: Feedback
Frontend->>User: Display feedback
graph LR
subgraph "Frontend Technologies"
NextJS[Next.js 15<br/>React 19]
R3F[React Three Fiber<br/>Three.js]
VRM["@pixiv/three-vrm<br/>VRM Model"]
MediaPipeLib[MediaPipe<br/>Face Analysis]
WebAPIs[Web APIs<br/>Speech/Audio]
Agora1[Agora RTC SDK]
end
subgraph "Backend Technologies"
Hono[Hono Framework]
Workers[Cloudflare Workers]
ZodOpenAPI["@hono/zod-openapi"]
BetterAuth[Better Auth]
end
subgraph "Data Layer"
Prisma[Prisma ORM]
PrismaAccelerate[Prisma Accelerate]
Supabase[Supabase<br/>PostgreSQL + Storage]
end
subgraph "AI/External Services"
Gemini2[Google Gemini<br/>2.5 Flash]
ElevenLabs2[ElevenLabs<br/>TTS/STT]
Agora2[Agora RTC]
end
NextJS --> R3F
R3F --> VRM
NextJS --> MediaPipeLib
NextJS --> WebAPIs
NextJS --> Agora1
Hono --> Workers
Hono --> ZodOpenAPI
Hono --> BetterAuth
Hono --> Prisma
Prisma --> PrismaAccelerate
PrismaAccelerate --> Supabase
NextJS --> Prisma
Hono --> Gemini2
Hono --> ElevenLabs2
Hono --> Agora2
- Framework: Next.js 15 (App Router), React 19
- 3D Rendering:
  - React Three Fiber (Three.js React wrapper)
  - @pixiv/three-vrm (VRM model loading and rendering)
  - @pixiv/three-vrm-animation (Animation control)
- Facial Analysis: MediaPipe Tasks Vision (478-point face landmark detection)
- Audio Processing:
  - Web Speech API (Speech recognition)
  - Web Audio API (Frequency analysis for lip sync)
  - ElevenLabs API (TTS/STT)
- Video Calling: Agora RTC SDK
- Styling: TailwindCSS 4
- UI: Radix UI
- Testing: Jest, React Testing Library
- Type Safety: TypeScript
- Framework: Hono (Lightweight web framework)
- Runtime: Cloudflare Workers (Edge computing)
- API Design: @hono/zod-openapi (Auto-generate OpenAPI schema)
- ORM: Prisma (PostgreSQL)
- Authentication: Better Auth
- Validation: Zod
- DB: Supabase PostgreSQL
- ORM: Prisma (separate clients generated for the frontend and the backend)
- Storage: Supabase Storage (Audio file storage)
- Edge DB Connection: Prisma Accelerate (Cloudflare Workers support)
- Conversation Generation: Google Gemini 2.5 Flash
- Speech Synthesis: ElevenLabs API
- Video Calling: Agora RTC
- Frontend: Vercel
- Backend: Cloudflare Workers
- Monorepo Management: pnpm workspace
- CI/CD: GitHub Actions, Husky (Git hooks)
- Code Quality: Biome (Linter & Formatter)
/home/daccho/code/tk_b_2515/
├── frontend/                 # Next.js frontend
├── backend/                  # Hono backend
├── prisma/                   # Prisma schema (shared)
├── .claude/                  # Claude Code configuration
├── .tmp/                     # Temporary files (design docs, task management)
├── docs/                     # Documentation
├── pnpm-workspace.yaml       # Monorepo configuration
└── package.json              # Root package
frontend/src/
├── app/                          # Next.js App Router pages
│   ├── page.tsx                  # Home page
│   ├── login/                    # Login
│   ├── signup/                   # Signup
│   ├── simulation/               # AI conversation simulation
│   ├── feedback/                 # Feedback display
│   ├── practice/                 # Human partner practice booking
│   ├── partner/                  # Partner pages
│   ├── session/                  # Session details
│   ├── test-call/                # Video call test
│   └── api/                      # Next.js API Routes (auth callbacks, etc.)
├── components/                   # React components
│   ├── Avatar/                   # VRM avatar components
│   ├── simulation/               # Simulation UI
│   ├── auth/                     # Auth UI
│   └── ui/                       # Common UI components
├── hooks/                        # Custom hooks
│   ├── useConversation.ts        # Conversation management (STT→AI→TTS)
│   ├── useFacialAnalysis.ts      # MediaPipe facial analysis
│   ├── useLipSync.ts             # Web Audio API lip sync
│   ├── useVRM.ts                 # VRM control
│   ├── useAudioRecorder.ts       # Audio recording
│   ├── useGestureTracking.ts     # Gesture tracking
│   ├── useAgoraCall.ts           # Agora video call
│   └── useSimulationTimer.ts     # Timer
├── lib/                          # Utilities
│   ├── api/                      # API client (fetch wrapper)
│   ├── audio/                    # Audio analysis utilities
│   ├── cache/                    # Audio cache
│   ├── auth.ts                   # Better Auth configuration
│   ├── prisma.ts                 # Prisma client
│   └── supabase.ts               # Supabase client
└── __tests__/                    # Test code
backend/src/
├── index.ts                          # Entry point
├── server.ts                         # Local development server
├── routes/                           # API routes
│   ├── api.ts                        # Route integration
│   └── modules/                      # Feature-based route modules
│       ├── sessions.routes.ts        # Session management
│       ├── messages.routes.ts        # Messages
│       ├── conversation.routes.ts    # AI conversation generation
│       ├── feedback.routes.ts        # Feedback
│       ├── speech.routes.ts          # STT/TTS
│       ├── agora.routes.ts           # Agora token generation
│       ├── partners.routes.ts        # Partner management
│       ├── auth.routes.ts            # Authentication
│       └── debug.routes.ts           # Debug endpoints
├── services/                         # Business logic
│   ├── conversation.ts               # AI conversation & feedback generation
│   ├── ai-client.ts                  # Gemini API client
│   ├── stt.ts                        # STT/TTS processing
│   └── client.ts                     # Common API client
├── middleware/                       # Middleware
│   ├── env.ts                        # Environment variable validation
│   ├── logger.ts                     # Logging
│   └── error.ts                      # Error handling
└── lib/                              # Libraries
    ├── prisma.ts                     # Prisma client
    └── supabase.ts                   # Supabase client
- Conversation: Conversation session
- Message: Message history (user/assistant)
- Feedback: AI feedback (verbal/non-verbal evaluation, scores)
- GestureMetrics: Facial expression and eye contact data aggregation
- HumanPartnerSession: Video call session with human partner
- HumanPartnerFeedback: Session feedback (AI + partner evaluation)
- PracticeSlot: Partner booking slot management
- User: User (role: user/partner/admin)
- Account: Account information (Better Auth)
- Session: Session information
- Verification: Verification information
- Conversation 1:N Message
- Conversation 1:1 Feedback
- Conversation 1:1 GestureMetrics
- HumanPartnerSession 1:1 HumanPartnerFeedback
- HumanPartnerSession N:1 PracticeSlot
- User 1:N HumanPartnerSession (both user and partner relations)
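As an illustration of what the GestureMetrics model aggregates, the sketch below reduces per-frame observations to ratios for one conversation. The field names (`smiling`, `eyeContact`, `smileRatio`, `eyeContactRatio`) are hypothetical and are not taken from the Prisma schema.

```typescript
// Hypothetical shapes: these field names are assumptions, not the real schema.
interface FrameSample {
  smiling: boolean;
  eyeContact: boolean;
}

interface GestureSummary {
  sampleCount: number;
  smileRatio: number; // share of analyzed frames with a smile, 0..1
  eyeContactRatio: number; // share of analyzed frames with eye contact, 0..1
}

// Reduce per-frame observations to one aggregate record per conversation.
function summarizeGestures(samples: FrameSample[]): GestureSummary {
  const sampleCount = samples.length;
  if (sampleCount === 0) {
    return { sampleCount: 0, smileRatio: 0, eyeContactRatio: 0 };
  }
  const smiles = samples.filter((s) => s.smiling).length;
  const contacts = samples.filter((s) => s.eyeContact).length;
  return {
    sampleCount,
    smileRatio: smiles / sampleCount,
    eyeContactRatio: contacts / sampleCount,
  };
}
```

Storing one aggregate per conversation, rather than every frame, matches the 1:1 relation between Conversation and GestureMetrics.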
The AI's response style changes across three intimacy levels (shy, friendly, open). System prompts are generated dynamically for each level and also control the emoji usage rules.
Location: backend/src/services/conversation.ts
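A minimal sketch of how a message-count-based intimacy level could feed a dynamically built system prompt. The thresholds, function names, and prompt wording are assumptions for illustration and do not reproduce the code in `conversation.ts`.

```typescript
// Hypothetical sketch: thresholds, names, and prompt wording are assumptions.
type IntimacyLevel = "shy" | "friendly" | "open";

// Assumed thresholds on the number of messages exchanged so far.
const FRIENDLY_AFTER = 6;
const OPEN_AFTER = 14;

function intimacyLevelFor(messageCount: number): IntimacyLevel {
  if (messageCount >= OPEN_AFTER) return "open";
  if (messageCount >= FRIENDLY_AFTER) return "friendly";
  return "shy";
}

// Per-level style rules, including an emoji rule.
const STYLE_RULES: Record<IntimacyLevel, { tone: string; emoji: string }> = {
  shy: {
    tone: "Reply briefly and politely, with some reserve.",
    emoji: "Do not use emoji.",
  },
  friendly: {
    tone: "Reply warmly and ask light follow-up questions.",
    emoji: "Use at most one emoji per reply.",
  },
  open: {
    tone: "Reply casually and share your own feelings.",
    emoji: "Emoji are allowed where natural.",
  },
};

// Assemble the system prompt from persona, situation, and the current level.
function buildSystemPrompt(
  persona: string,
  situation: string,
  messageCount: number,
): string {
  const level = intimacyLevelFor(messageCount);
  const rules = STYLE_RULES[level];
  return [
    `Persona: ${persona}`,
    `Situation: ${situation}`,
    `Intimacy level: ${level}`,
    rules.tone,
    rules.emoji,
  ].join("\n");
}
```

Keeping the rules in a lookup table means a new level or a changed emoji rule only touches data, not the prompt-building logic.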
A custom algorithm analyzes eye contact and smiles from the 478 face landmarks. Analysis runs at 3 fps (333 ms intervals) to limit processing load.
Location: frontend/src/hooks/useFacialAnalysis.ts
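The 333 ms interval can be enforced with a small time gate in front of the landmark detector. This is a sketch under assumed names (`createAnalysisGate`, `shouldAnalyze`); only the interval itself comes from the description above.

```typescript
// Assumption: names are illustrative; only the 333 ms interval is from the text.
const ANALYSIS_INTERVAL_MS = 333; // about 3 analyses per second

function createAnalysisGate(intervalMs: number = ANALYSIS_INTERVAL_MS) {
  let lastRunMs = Number.NEGATIVE_INFINITY;
  // Returns true when enough time has passed to analyze another frame.
  return function shouldAnalyze(nowMs: number): boolean {
    if (nowMs - lastRunMs < intervalMs) return false;
    lastRunMs = nowMs;
    return true;
  };
}
```

In a render loop, one would typically call `shouldAnalyze(performance.now())` on every animation frame and run face detection only when it returns `true`, so rendering stays at full frame rate while analysis stays near 3 fps.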
The Web Audio API performs frequency analysis (FFT size 2048), the human voice range (300-3400 Hz) is extracted, and attack/release control keeps the mouth animation natural.
Location: frontend/src/hooks/useLipSync.ts
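The three steps above (FFT bins, voice-band extraction, attack/release smoothing) can be sketched as pure functions. The function names and the attack/release coefficients are assumptions; the input is assumed to be the byte magnitudes that `AnalyserNode.getByteFrequencyData` produces.

```typescript
const FFT_SIZE = 2048;
const VOICE_LOW_HZ = 300;
const VOICE_HIGH_HZ = 3400;

interface BinRange {
  low: number;
  high: number;
}

// Map the voice band to FFT bin indices. Each bin spans sampleRate / fftSize Hz.
function voiceBinRange(sampleRate: number, fftSize: number = FFT_SIZE): BinRange {
  const hzPerBin = sampleRate / fftSize;
  return {
    low: Math.ceil(VOICE_LOW_HZ / hzPerBin),
    high: Math.floor(VOICE_HIGH_HZ / hzPerBin),
  };
}

// Average the byte magnitudes (0..255) inside the band, normalized to 0..1.
function voiceLevel(bins: Uint8Array, range: BinRange): number {
  const last = Math.min(range.high, bins.length - 1);
  let sum = 0;
  for (let i = range.low; i <= last; i++) {
    sum += bins[i] ?? 0;
  }
  const count = last - range.low + 1;
  return count > 0 ? sum / count / 255 : 0;
}

// Attack/release smoothing: open the mouth quickly, close it more slowly.
// The default coefficients are assumed values, not the project's tuning.
function smoothMouth(
  previous: number,
  target: number,
  attack: number = 0.5,
  release: number = 0.15,
): number {
  const rate = target > previous ? attack : release;
  return previous + (target - previous) * rate;
}
```

At a 48 kHz sample rate each bin covers about 23.4 Hz, so the voice band maps to bins 13 through 145. The smoothed value would then drive the avatar's mouth-open blend shape on each frame.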
Feedback evaluates both verbal (conversation skills) and non-verbal (expressions, eye contact) aspects. Scoring uses a deduction system, and the feedback includes specific improvement suggestions.
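One way to picture a deduction system is a base score reduced by weighted penalty counts. The weights, the 0-100 range, and all names below are hypothetical; the actual scoring rules are not specified here.

```typescript
// Hypothetical sketch: weights, score range, and names are assumptions.
interface DeductionInputs {
  aiQuestionCount: number; // questions the AI had to ask to keep the talk going
  conversationBreaks: number; // pauses where the conversation stalled
  shortUtterances: number; // very short user replies
}

const DEDUCTION_WEIGHTS: DeductionInputs = {
  aiQuestionCount: 2,
  conversationBreaks: 5,
  shortUtterances: 3,
};

// Subtract weighted penalties from the base score and clamp to 0..100.
function applyDeductions(
  baseScore: number,
  inputs: DeductionInputs,
): { score: number; totalDeduction: number } {
  const totalDeduction =
    inputs.aiQuestionCount * DEDUCTION_WEIGHTS.aiQuestionCount +
    inputs.conversationBreaks * DEDUCTION_WEIGHTS.conversationBreaks +
    inputs.shortUtterances * DEDUCTION_WEIGHTS.shortUtterances;
  const score = Math.max(0, Math.min(100, baseScore - totalDeduction));
  return { score, totalDeduction };
}
```

Returning the total deduction alongside the score makes it easier to explain to the user why points were lost.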
[Frontend: Vercel]
Next.js 15
↓
[Backend: Cloudflare Workers]
Hono + Prisma Accelerate
↓
[Database: Supabase PostgreSQL]
[Storage: Supabase Storage]
[External APIs]
- Google Gemini
- ElevenLabs
- Agora RTC
- Frontend: Vercel Environment Variables
- Backend: Cloudflare Workers Secrets
- Local Development: .env (refer to .env.example)
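Whichever source supplies the variables, validating them once at startup surfaces missing secrets early. The sketch below uses plain TypeScript; the variable names (`DATABASE_URL`, `GEMINI_API_KEY`, `ELEVENLABS_API_KEY`) are assumed for illustration and may differ from those in `.env.example`.

```typescript
// Assumed variable names for illustration; check .env.example for the real ones.
const REQUIRED_KEYS = ["DATABASE_URL", "GEMINI_API_KEY", "ELEVENLABS_API_KEY"] as const;
type RequiredKey = (typeof REQUIRED_KEYS)[number];
type AppEnv = Record<RequiredKey, string>;

// Throw one error that lists every missing variable, or return a typed env object.
function validateEnv(source: Record<string, string | undefined>): AppEnv {
  const missing = REQUIRED_KEYS.filter((key) => !source[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const env = {} as AppEnv;
  for (const key of REQUIRED_KEYS) {
    env[key] = source[key] as string;
  }
  return env;
}
```

Reporting all missing keys in a single error avoids fixing them one deploy at a time.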
- Monorepo Structure: Separate frontend/backend while sharing Prisma schema
- Two Prisma Clients: Separate output for frontend and backend
- Intimacy Level Control: Conversation style changes based on message count
- 3fps Facial Analysis: Balance between performance and accuracy
- Edge Computing: Low latency with Cloudflare Workers
- Human Partner Feature: Practice with real people, not just AI
- Challenge: Direct DB connection not possible on edge runtime
- Solution: Use Prisma Accelerate
- Challenge: detectForVideo requires monotonically increasing integers (microseconds)
- Solution: Maintain previous timestamp, implement monotonic increase guarantee logic
- Challenge: TTS audio playback fails due to browser autoplay restrictions
- Solution: Error handling, display message prompting user interaction
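The monotonic-timestamp guarantee mentioned above can be sketched as a small closure that remembers the previous value. The helper name is hypothetical, and the timestamp unit is left to the caller.

```typescript
// Hypothetical helper: returns strictly increasing integer timestamps even
// when the clock source repeats a value or steps backwards.
function createMonotonicTimestamp() {
  let last = -1;
  return function next(now: number): number {
    const candidate = Math.floor(now);
    // Use the clock value when it moved forward; otherwise bump by one.
    last = candidate > last ? candidate : last + 1;
    return last;
  };
}
```

A caller would pass the result of `next(performance.now())` as the timestamp argument of `detectForVideo`, so two frames that land on the same integer never produce a repeated timestamp.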
/api
├── /health                              # Health check
├── /auth                                # Authentication (Better Auth)
├── /sessions                            # Session management
│   ├── POST /sessions                   # Create session
│   ├── GET /sessions/:id                # Get session
│   ├── PATCH /sessions/:id/finish       # Finish session
│   ├── POST /sessions/:id/messages      # Save message
│   ├── POST /sessions/:id/gestures      # Save gesture metrics
│   └── POST /sessions/:id/feedback      # Save feedback
├── /conversation                        # AI conversation
│   ├── POST /generate                   # Generate AI response (Gemini)
│   └── POST /feedback                   # Generate feedback
├── /stt                                 # Speech-to-text
├── /tts                                 # Text-to-speech
├── /voices                              # Available voices list
├── /agora/token                         # Agora token generation
└── /partners                            # Partner management
    ├── GET /sessions/waiting            # List waiting sessions
    └── PATCH /sessions/:id/join         # Join session
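On the client side, the session endpoints listed above could be wrapped in a small typed route map so paths are built in one place. This is a sketch: the `sessionRoutes` object and `RouteRequest` type are hypothetical and not part of the project's API client.

```typescript
// Hypothetical helper: builds method/path pairs for the session endpoints.
type HttpMethod = "GET" | "POST" | "PATCH";

interface RouteRequest {
  method: HttpMethod;
  path: string;
}

// Build the common /api/sessions/:id prefix, escaping the id.
function sessionPath(id: string, suffix: string = ""): string {
  return `/api/sessions/${encodeURIComponent(id)}${suffix}`;
}

const sessionRoutes = {
  create: (): RouteRequest => ({ method: "POST", path: "/api/sessions" }),
  get: (id: string): RouteRequest => ({ method: "GET", path: sessionPath(id) }),
  finish: (id: string): RouteRequest => ({ method: "PATCH", path: sessionPath(id, "/finish") }),
  saveMessage: (id: string): RouteRequest => ({ method: "POST", path: sessionPath(id, "/messages") }),
  saveGestures: (id: string): RouteRequest => ({ method: "POST", path: sessionPath(id, "/gestures") }),
  saveFeedback: (id: string): RouteRequest => ({ method: "POST", path: sessionPath(id, "/feedback") }),
};
```

A fetch wrapper can then take a `RouteRequest` plus an optional JSON body, which keeps method and path from drifting apart across call sites.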
- 3D VRM avatar with realistic lip sync
- Real-time facial expression and eye contact analysis
- Multi-level intimacy system
- Multiple scenarios (library, classroom, Christmas)
- Verbal evaluation (leadership, continuity, development, empathy, question appropriateness)
- Non-verbal evaluation (smile, eye contact stability, eye direction)
- Deduction system (AI question count, conversation breaks, short utterances)
- Specific improvement suggestions
- Video call sessions with real partners
- Booking slot management
- Combined AI and partner feedback
- Audio Caching: Cache TTS audio files to reduce API calls
- 3fps Facial Analysis: Reduce processing frequency for better performance
- Edge Computing: Deploy backend on Cloudflare Workers for low latency
- Lazy Loading: Lazy load heavy 3D models and libraries
- Prisma Accelerate: Connection pooling for database access
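As an illustration of the audio caching idea, the sketch below keeps a bounded in-memory map from (voice, text) to an audio URL and evicts the least recently used entry. The class name, key format, and size limit are assumptions, not the project's cache implementation.

```typescript
// Hypothetical sketch of a bounded TTS cache keyed by voice and text.
class AudioUrlCache {
  private readonly entries = new Map<string, string>();
  private readonly maxEntries: number;

  constructor(maxEntries: number = 50) {
    this.maxEntries = maxEntries;
  }

  private key(voiceId: string, text: string): string {
    return `${voiceId}\u0000${text}`;
  }

  // Return the cached URL, if any, and mark the entry as recently used.
  get(voiceId: string, text: string): string | undefined {
    const k = this.key(voiceId, text);
    const url = this.entries.get(k);
    if (url !== undefined) {
      this.entries.delete(k);
      this.entries.set(k, url);
    }
    return url;
  }

  // Store a URL; evict the least recently used entry when over capacity.
  set(voiceId: string, text: string, url: string): void {
    const k = this.key(voiceId, text);
    this.entries.delete(k);
    this.entries.set(k, url);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
  }
}
```

Before calling `/api/tts`, the client would check `get(voiceId, text)` and skip the request on a hit, which is where the reduction in API calls comes from.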