ๆ‹AI - Architecture Documentation

Overview

ๆ‹AI(renai) is a web application that improves communication skills (especially with the opposite sex) through 3D real-time AI conversation simulation. It provides feedback on both verbal communication (conversation skills) and non-verbal communication (facial expressions, eye contact).

System Architecture

Overall Architecture Diagram

graph TB
    subgraph "Client Layer"
        User[User]
        Partner[Partner]
        Camera[Camera]
        Mic[Microphone]
    end

    subgraph "Frontend - Next.js 15 on Vercel"
        subgraph "Main Pages"
            SimPage[/simulation<br/>AI conversation simulation]
            FeedbackPage[/feedback<br/>Feedback display]
            PracticePage[/practice<br/>Practice booking]
            PartnerPage[/partner<br/>Partner dashboard]
            SessionPage[/session<br/>Session details]
            TestCallPage[/test-call<br/>Call test]
        end

        subgraph "Core Hooks"
            UseConversation[useConversation<br/>Conversation management]
            UseFacialAnalysis[useFacialAnalysis<br/>MediaPipe facial expression analysis]
            UseLipSync[useLipSync<br/>Web Audio API lip sync]
            UseVRM[useVRM<br/>VRM control]
            UseAgoraCall[useAgoraCall<br/>Agora video call]
        end

        subgraph "3D/UI Layer"
            VRMAvatar[VRM avatar<br/>React Three Fiber]
            MediaPipe[MediaPipe<br/>478-point face landmarks]
            AudioRecorder[AudioRecorder<br/>Audio recording]
        end
    end

    subgraph "Backend - Hono on Cloudflare Workers"
        subgraph "API Routes"
            SessionsAPI[/api/sessions<br/>Session management]
            ConversationAPI[/api/conversation<br/>AI conversation generation]
            FeedbackAPI[/api/feedback<br/>Feedback generation]
            SpeechAPI[/api/stt, /api/tts<br/>Speech processing]
            AgoraAPI[/api/agora/token<br/>Token generation]
            PartnersAPI[/api/partners<br/>Partner management]
            AuthAPI[/api/auth<br/>Better Auth]
        end

        subgraph "Service Layer"
            ConversationService[conversation.ts<br/>Business logic]
            AIClient[ai-client.ts<br/>Gemini API client]
            STTService[stt.ts<br/>STT/TTS processing]
        end
    end

    subgraph "Database Layer - Supabase"
        PostgreSQL[(PostgreSQL)]
        Storage[(Supabase Storage<br/>Audio files)]
    end

    subgraph "External Services"
        Gemini[Google Gemini 2.5 Flash<br/>AI conversation and feedback]
        ElevenLabs[ElevenLabs API<br/>TTS/STT]
        Agora[Agora RTC<br/>Video calling]
    end

    subgraph "Data Models - Prisma"
        Conversation[Conversation<br/>Conversation session]
        Message[Message<br/>Message history]
        Feedback[Feedback<br/>Feedback]
        GestureMetrics[GestureMetrics<br/>Facial expression and gaze data]
        HumanPartnerSession[HumanPartnerSession<br/>Human partner session]
        HumanPartnerFeedback[HumanPartnerFeedback<br/>Partner feedback]
        PracticeSlot[PracticeSlot<br/>Booking slot]
        UserModel[User<br/>User]
    end

    %% Client → Frontend
    User --> SimPage
    User --> FeedbackPage
    User --> PracticePage
    Partner --> PartnerPage
    Camera --> MediaPipe
    Mic --> AudioRecorder

    %% Frontend internal flow
    SimPage --> UseConversation
    SimPage --> UseFacialAnalysis
    SimPage --> UseVRM
    UseConversation --> AudioRecorder
    UseFacialAnalysis --> MediaPipe
    UseVRM --> VRMAvatar
    UseLipSync --> VRMAvatar

    TestCallPage --> UseAgoraCall
    SessionPage --> UseAgoraCall

    %% Frontend → Backend
    UseConversation --> SpeechAPI
    UseConversation --> ConversationAPI
    FeedbackPage --> FeedbackAPI
    PracticePage --> PartnersAPI
    PartnerPage --> PartnersAPI
    UseAgoraCall --> AgoraAPI
    SimPage --> SessionsAPI

    %% Backend internal flow
    ConversationAPI --> ConversationService
    FeedbackAPI --> ConversationService
    ConversationService --> AIClient
    SpeechAPI --> STTService

    %% Backend → External services
    AIClient --> Gemini
    STTService --> ElevenLabs
    AgoraAPI --> Agora

    %% Backend → Database
    SessionsAPI --> PostgreSQL
    ConversationService --> PostgreSQL
    STTService --> Storage
    PartnersAPI --> PostgreSQL
    AuthAPI --> PostgreSQL

    %% Data model relations
    Conversation --> Message
    Conversation --> Feedback
    Conversation --> GestureMetrics
    HumanPartnerSession --> HumanPartnerFeedback
    HumanPartnerSession --> PracticeSlot
    UserModel --> HumanPartnerSession

    %% Styling
    classDef frontend fill:#61dafb,stroke:#333,stroke-width:2px,color:#000
    classDef backend fill:#f39c12,stroke:#333,stroke-width:2px,color:#000
    classDef database fill:#2ecc71,stroke:#333,stroke-width:2px,color:#000
    classDef external fill:#e74c3c,stroke:#333,stroke-width:2px,color:#fff
    classDef model fill:#9b59b6,stroke:#333,stroke-width:2px,color:#fff

    class SimPage,FeedbackPage,PracticePage,PartnerPage,SessionPage,TestCallPage,UseConversation,UseFacialAnalysis,UseLipSync,UseVRM,UseAgoraCall,VRMAvatar,MediaPipe,AudioRecorder frontend
    class SessionsAPI,ConversationAPI,FeedbackAPI,SpeechAPI,AgoraAPI,PartnersAPI,AuthAPI,ConversationService,AIClient,STTService backend
    class PostgreSQL,Storage database
    class Gemini,ElevenLabs,Agora external
    class Conversation,Message,Feedback,GestureMetrics,HumanPartnerSession,HumanPartnerFeedback,PracticeSlot,UserModel model

Conversation Flow Sequence

sequenceDiagram
    participant User as User
    participant Frontend as Frontend<br/>(Next.js)
    participant Backend as Backend<br/>(Hono/Workers)
    participant ElevenLabs as ElevenLabs API
    participant Gemini as Gemini 2.5 Flash
    participant DB as PostgreSQL
    participant Storage as Supabase Storage

    User->>Frontend: Voice input
    Frontend->>Frontend: AudioRecorder recording
    Frontend->>Backend: POST /api/stt
    Backend->>ElevenLabs: Speech-to-text conversion
    ElevenLabs-->>Backend: Text
    Backend-->>Frontend: Recognized text

    Frontend->>Backend: POST /api/conversation/generate
    Note over Backend: Intimacy level determination<br/>(shy/friendly/open)
    Backend->>Gemini: AI response generation request<br/>(including persona and situation)
    Gemini-->>Backend: AI response text
    Backend-->>Frontend: AI response

    Frontend->>Backend: POST /api/tts
    Backend->>ElevenLabs: Text-to-speech conversion
    ElevenLabs-->>Backend: Audio file
    Backend->>Storage: Save audio file
    Storage-->>Backend: Save complete
    Backend-->>Frontend: Audio URL

    Frontend->>Frontend: Web Audio API<br/>Frequency analysis (FFT)
    Frontend->>Frontend: VRM avatar<br/>Lip sync control

    par Facial expression analysis
        Frontend->>Frontend: MediaPipe<br/>478-point landmark detection
        Frontend->>Frontend: Smile and gaze analysis
    end

    Frontend->>Backend: POST /api/sessions/:id/messages
    Backend->>DB: Save message

    Frontend->>Backend: POST /api/sessions/:id/gestures
    Backend->>DB: Save GestureMetrics

    User->>Frontend: End session
    Frontend->>Backend: POST /api/conversation/feedback
    Backend->>Gemini: Feedback generation<br/>(conversation history + gesture metrics)
    Gemini-->>Backend: Evaluation and improvement points
    Backend->>DB: Save Feedback
    Backend-->>Frontend: Feedback
    Frontend->>User: Display feedback
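The turn shown in the sequence above can be sketched as a single function. This is a minimal sketch, not the project's useConversation hook: the `post` helper, the request and response field names (`audio`, `message`, `text`, `audioUrl`, `role`, `content`), and the choice to save both messages at the end of the turn are assumptions made for illustration. Only the endpoint paths come from the diagram.

```typescript
// Hypothetical transport: sends a body to a backend path and resolves to the parsed response.
type Post = (path: string, body: unknown) => Promise<unknown>;

interface TurnResult {
  userText: string;
  aiText: string;
  audioUrl: string;
}

// Runs one conversation turn: speech-to-text, AI reply, text-to-speech, then message persistence.
// All payload shapes below are assumed; the real schemas are not shown in this document.
async function runConversationTurn(
  post: Post,
  sessionId: string,
  audio: Uint8Array,
): Promise<TurnResult> {
  // 1. Recorded audio -> recognized text
  const stt = (await post("/api/stt", { audio })) as { text: string };

  // 2. Recognized text -> AI response text
  const reply = (await post("/api/conversation/generate", {
    sessionId,
    message: stt.text,
  })) as { text: string };

  // 3. AI response text -> URL of the synthesized audio
  const tts = (await post("/api/tts", { text: reply.text })) as { audioUrl: string };

  // 4. Persist both sides of the exchange
  const messagesPath = `/api/sessions/${sessionId}/messages`;
  await post(messagesPath, { role: "user", content: stt.text });
  await post(messagesPath, { role: "assistant", content: reply.text });

  return { userText: stt.text, aiText: reply.text, audioUrl: tts.audioUrl };
}
```

Injecting `post` keeps the flow independent of any particular fetch wrapper, so the ordering of the calls can be exercised with a fake transport.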

Technology Stack

graph LR
    subgraph "Frontend Technologies"
        NextJS[Next.js 15<br/>React 19]
        R3F[React Three Fiber<br/>Three.js]
        VRM[@pixiv/three-vrm<br/>VRM models]
        MediaPipeLib[MediaPipe<br/>Face analysis]
        WebAPIs[Web APIs<br/>Speech/Audio]
        Agora1[Agora RTC SDK]
    end

    subgraph "Backend Technologies"
        Hono[Hono Framework]
        Workers[Cloudflare Workers]
        ZodOpenAPI[@hono/zod-openapi]
        BetterAuth[Better Auth]
    end

    subgraph "Data Layer"
        Prisma[Prisma ORM]
        PrismaAccelerate[Prisma Accelerate]
        Supabase[Supabase<br/>PostgreSQL + Storage]
    end

    subgraph "AI/External Services"
        Gemini2[Google Gemini<br/>2.5 Flash]
        ElevenLabs2[ElevenLabs<br/>TTS/STT]
        Agora2[Agora RTC]
    end

    NextJS --> R3F
    R3F --> VRM
    NextJS --> MediaPipeLib
    NextJS --> WebAPIs
    NextJS --> Agora1

    Hono --> Workers
    Hono --> ZodOpenAPI
    Hono --> BetterAuth

    Hono --> Prisma
    Prisma --> PrismaAccelerate
    PrismaAccelerate --> Supabase
    NextJS --> Prisma

    Hono --> Gemini2
    Hono --> ElevenLabs2
    Hono --> Agora2

Technology Stack Details

Frontend

  • Framework: Next.js 15 (App Router), React 19
  • 3D Rendering:
    • React Three Fiber (Three.js React wrapper)
    • @pixiv/three-vrm (VRM model loading and rendering)
    • @pixiv/three-vrm-animation (Animation control)
  • Facial Analysis: MediaPipe Tasks Vision (478-point face landmark detection)
  • Audio Processing:
    • Web Speech API (Speech recognition)
    • Web Audio API (Frequency analysis for lip sync)
    • ElevenLabs API (TTS/STT)
  • Video Calling: Agora RTC SDK
  • Styling: TailwindCSS 4
  • UI: Radix UI
  • Testing: Jest, React Testing Library
  • Type Safety: TypeScript

Backend

  • Framework: Hono (Lightweight web framework)
  • Runtime: Cloudflare Workers (Edge computing)
  • API Design: @hono/zod-openapi (Auto-generate OpenAPI schema)
  • ORM: Prisma (PostgreSQL)
  • Authentication: Better Auth
  • Validation: Zod

Database & Storage

  • DB: Supabase PostgreSQL
  • ORM: Prisma (Two separate client generations for frontend and backend)
  • Storage: Supabase Storage (Audio file storage)
  • Edge DB Connection: Prisma Accelerate (Cloudflare Workers support)

AI & External Services

  • Conversation Generation: Google Gemini 2.5 Flash
  • Speech Synthesis and Recognition: ElevenLabs API (TTS/STT)
  • Video Calling: Agora RTC

Infrastructure & Deployment

  • Frontend: Vercel
  • Backend: Cloudflare Workers
  • Monorepo Management: pnpm workspace
  • CI/CD: GitHub Actions, Husky (Git hooks)
  • Code Quality: Biome (Linter & Formatter)

Directory Structure

Root Structure

/home/daccho/code/tk_b_2515/
├── frontend/          # Next.js frontend
├── backend/           # Hono backend
├── prisma/            # Prisma schema (shared)
├── .claude/           # Claude Code configuration
├── .tmp/              # Temporary files (design docs, task management)
├── docs/              # Documentation
├── pnpm-workspace.yaml # Monorepo configuration
└── package.json       # Root package

Frontend Structure

frontend/src/
├── app/                 # Next.js App Router pages
│   ├── page.tsx         # Home page
│   ├── login/           # Login
│   ├── signup/          # Signup
│   ├── simulation/      # AI conversation simulation
│   ├── feedback/        # Feedback display
│   ├── practice/        # Human partner practice booking
│   ├── partner/         # Partner pages
│   ├── session/         # Session details
│   ├── test-call/       # Video call test
│   └── api/             # Next.js API Routes (auth callbacks, etc.)
├── components/          # React components
│   ├── Avatar/          # VRM avatar components
│   ├── simulation/      # Simulation UI
│   ├── auth/            # Auth UI
│   └── ui/              # Common UI components
├── hooks/               # Custom hooks
│   ├── useConversation.ts      # Conversation management (STT→AI→TTS)
│   ├── useFacialAnalysis.ts    # MediaPipe facial analysis
│   ├── useLipSync.ts           # Web Audio API lip sync
│   ├── useVRM.ts               # VRM control
│   ├── useAudioRecorder.ts     # Audio recording
│   ├── useGestureTracking.ts   # Gesture tracking
│   ├── useAgoraCall.ts         # Agora video call
│   └── useSimulationTimer.ts   # Timer
├── lib/                 # Utilities
│   ├── api/             # API client (fetch wrapper)
│   ├── audio/           # Audio analysis utilities
│   ├── cache/           # Audio cache
│   ├── auth.ts          # Better Auth configuration
│   ├── prisma.ts        # Prisma client
│   └── supabase.ts      # Supabase client
└── __tests__/           # Test code

Backend Structure

backend/src/
├── index.ts             # Entry point
├── server.ts            # Local development server
├── routes/              # API routes
│   ├── api.ts           # Route integration
│   └── modules/         # Feature-based route modules
│       ├── sessions.routes.ts      # Session management
│       ├── messages.routes.ts      # Messages
│       ├── conversation.routes.ts  # AI conversation generation
│       ├── feedback.routes.ts      # Feedback
│       ├── speech.routes.ts        # STT/TTS
│       ├── agora.routes.ts         # Agora token generation
│       ├── partners.routes.ts      # Partner management
│       ├── auth.routes.ts          # Authentication
│       └── debug.routes.ts         # Debug endpoints
├── services/            # Business logic
│   ├── conversation.ts  # AI conversation & feedback generation
│   ├── ai-client.ts     # Gemini API client
│   ├── stt.ts           # STT/TTS processing
│   └── client.ts        # Common API client
├── middleware/          # Middleware
│   ├── env.ts           # Environment variable validation
│   ├── logger.ts        # Logging
│   └── error.ts         # Error handling
└── lib/                 # Libraries
    ├── prisma.ts        # Prisma client
    └── supabase.ts      # Supabase client

Data Models (Prisma Schema)

AI Conversation Related

  • Conversation: Conversation session
  • Message: Message history (user/assistant)
  • Feedback: AI feedback (verbal/non-verbal evaluation, scores)
  • GestureMetrics: Facial expression and eye contact data aggregation

Human Partner Related

  • HumanPartnerSession: Video call session with human partner
  • HumanPartnerFeedback: Session feedback (AI + partner evaluation)
  • PracticeSlot: Partner booking slot management

Authentication

  • User: User (role: user/partner/admin)
  • Account: Account information (Better Auth)
  • Session: Session information
  • Verification: Verification information

Relations

  • Conversation 1:N Message
  • Conversation 1:1 Feedback
  • Conversation 1:1 GestureMetrics
  • HumanPartnerSession 1:1 HumanPartnerFeedback
  • HumanPartnerSession N:1 PracticeSlot
  • User 1:N HumanPartnerSession (both user and partner relations)

Core Architecture Patterns

Intimacy System

The AI changes its response style across three intimacy levels (shy/friendly/open). System prompts are generated dynamically for each level and also control the emoji usage rules.

Location: backend/src/services/conversation.ts
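A rough sketch of how a level could be chosen and folded into a system prompt is shown here. The design decisions later in this document say the level depends on message count; the thresholds (5 and 15), the style rule wording, and the function names are illustrative assumptions, not the contents of conversation.ts.

```typescript
type IntimacyLevel = "shy" | "friendly" | "open";

// Thresholds are placeholders chosen for this example only.
function intimacyLevelFor(messageCount: number): IntimacyLevel {
  if (messageCount < 5) return "shy";
  if (messageCount < 15) return "friendly";
  return "open";
}

// Hypothetical style rules per level, including an emoji usage rule.
const STYLE_RULES: Record<IntimacyLevel, string> = {
  shy: "Reply briefly and politely. Do not use emoji.",
  friendly: "Reply warmly and ask follow-up questions. Use at most one emoji.",
  open: "Reply casually and share personal opinions. Emoji are allowed.",
};

// Builds the system prompt from persona, situation, and the current intimacy level.
function buildSystemPrompt(persona: string, situation: string, messageCount: number): string {
  const level = intimacyLevelFor(messageCount);
  return [
    `You are ${persona}.`,
    `Situation: ${situation}.`,
    `Intimacy level: ${level}.`,
    STYLE_RULES[level],
  ].join("\n");
}
```

Keeping the level selection separate from the prompt text makes it possible to tune thresholds without touching the wording of each style.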

MediaPipe Facial Analysis

A custom algorithm analyzes eye contact and smiles from the 478 face landmarks. To limit processing cost, analysis runs at 3 fps (every 333 ms).

Location: frontend/src/hooks/useFacialAnalysis.ts
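One way to hold analysis to roughly 3 fps inside a per-frame render loop is a small time gate. The sketch below is an assumption about how such throttling could look, not the hook's actual code; `createFrameThrottle` and `shouldAnalyze` are hypothetical names.

```typescript
const ANALYSIS_INTERVAL_MS = 333; // about 3 analyses per second

// Returns a function that reports whether enough time has passed to analyze another frame.
function createFrameThrottle(intervalMs: number = ANALYSIS_INTERVAL_MS) {
  let lastRunMs = -Infinity; // guarantees the first frame is analyzed
  return function shouldAnalyze(nowMs: number): boolean {
    if (nowMs - lastRunMs < intervalMs) return false;
    lastRunMs = nowMs;
    return true;
  };
}
```

In a browser this would typically be called once per animation frame with the current clock reading, running landmark detection only when it returns true.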

Lip Sync

Lip sync analyzes the audio spectrum with the Web Audio API (FFT size 2048), extracts the human voice band (300-3400 Hz), and applies attack/release control to keep the mouth animation natural.

Location: frontend/src/hooks/useLipSync.ts
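The arithmetic behind the voice band extraction and the smoothing can be sketched with plain functions. This assumes byte frequency data of the kind an AnalyserNode produces (one value from 0 to 255 per bin, with fftSize / 2 bins); the function names and the attack/release coefficients are assumptions, not values taken from useLipSync.ts.

```typescript
// Index range of FFT bins covering [lowHz, highHz].
// Each bin spans sampleRate / fftSize Hz, and only fftSize / 2 bins are available.
function voiceBinRange(sampleRate: number, fftSize: number, lowHz = 300, highHz = 3400) {
  const hzPerBin = sampleRate / fftSize;
  const start = Math.ceil(lowHz / hzPerBin);
  const end = Math.min(Math.floor(highHz / hzPerBin), fftSize / 2 - 1);
  return { start, end };
}

// Average level of the voice band, normalized to 0..1.
function voiceLevel(bins: Uint8Array, start: number, end: number): number {
  let sum = 0;
  for (let i = start; i <= end; i++) sum += bins[i] ?? 0;
  return sum / ((end - start + 1) * 255);
}

// Attack/release smoothing: move quickly toward a louder target, slowly toward a quieter one.
// The 0.5 and 0.1 coefficients are illustrative.
function smoothMouth(current: number, target: number, attack = 0.5, release = 0.1): number {
  const rate = target > current ? attack : release;
  return current + (target - current) * rate;
}
```

At a 48 kHz sample rate with an FFT size of 2048, each bin covers about 23.4 Hz, so the 300-3400 Hz band maps to bins 13 through 145. The smoothed value can then drive the avatar's mouth-open blend shape once per frame.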

Comprehensive Feedback

Feedback evaluates both verbal (conversation skills) and non-verbal (facial expressions, eye contact) aspects. Scores use a deduction system and come with specific improvement suggestions.

Deployment Architecture

[Frontend: Vercel]
    Next.js 15
    ↓
[Backend: Cloudflare Workers]
    Hono + Prisma Accelerate
    ↓
[Database: Supabase PostgreSQL]
[Storage: Supabase Storage]
[External APIs]
    - Google Gemini
    - ElevenLabs
    - Agora RTC

Environment Variables

  • Frontend: Vercel Environment Variables
  • Backend: Cloudflare Workers Secrets
  • Local Development: .env (refer to .env.example)

Unique Design Decisions

  1. Monorepo Structure: Separate frontend/backend while sharing Prisma schema
  2. Two Prisma Clients: Separate output for frontend and backend
  3. Intimacy Level Control: Conversation style changes based on message count
  4. 3fps Facial Analysis: Balance between performance and accuracy
  5. Edge Computing: Low latency with Cloudflare Workers
  6. Human Partner Feature: Practice with real people, not just AI

Technical Challenges & Solutions

Prisma on Cloudflare Workers

  • Challenge: Direct DB connection not possible on edge runtime
  • Solution: Use Prisma Accelerate

MediaPipe Timestamp Constraints

  • Challenge: detectForVideo requires an integer timestamp (in milliseconds) that increases monotonically from one call to the next
  • Solution: Keep the previous timestamp and bump the new value whenever it would not be greater than the last one
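A minimal sketch of such a guard follows. It is an illustration of the idea, not the code in useFacialAnalysis.ts; `createMonotonicTimestamp` is a hypothetical helper, and `now` is whatever clock reading the caller passes in the unit the detector expects.

```typescript
// Returns a function that maps any clock reading to a strictly increasing integer timestamp.
function createMonotonicTimestamp() {
  let last = -1;
  return function next(now: number): number {
    const candidate = Math.floor(now); // the detector needs an integer
    // If the clock repeats or goes backwards, step one past the previous value instead.
    last = candidate > last ? candidate : last + 1;
    return last;
  };
}
```

The returned value would be passed as the timestamp argument on each detection call, so two frames that land in the same clock tick never produce equal timestamps.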

Autoplay Restrictions

  • Challenge: TTS audio playback fails due to browser autoplay restrictions
  • Solution: Catch the playback error and display a message prompting the user to interact with the page
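The handling described here can be sketched as follows. Browsers reject the promise returned by `play()` with an error named `NotAllowedError` when autoplay is blocked. The `Playable` interface, the `playTtsAudio` name, and the message text are assumptions for illustration.

```typescript
// Minimal shape of an audio element; a browser HTMLAudioElement satisfies it.
interface Playable {
  play(): Promise<void>;
}

type PlayResult = "playing" | "blocked";

// Tries to play TTS audio. If autoplay is blocked, notifies the caller instead of failing.
async function playTtsAudio(
  audio: Playable,
  onBlocked: (message: string) => void,
): Promise<PlayResult> {
  try {
    await audio.play();
    return "playing";
  } catch (err) {
    const name = (err as { name?: string } | null)?.name;
    if (name === "NotAllowedError") {
      // Hypothetical user-facing message
      onBlocked("Tap the screen to enable audio playback.");
      return "blocked";
    }
    throw err; // other errors (decode failures, aborted loads) are not autoplay issues
  }
}
```

After the user taps or clicks, calling `playTtsAudio` again from that event handler should succeed, because playback is then tied to a user gesture.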

API Endpoints

/api
├── /health                         # Health check
├── /auth                           # Authentication (Better Auth)
├── /sessions                       # Session management
│   ├── POST /sessions              # Create session
│   ├── GET /sessions/:id           # Get session
│   ├── PATCH /sessions/:id/finish  # Finish session
│   ├── POST /sessions/:id/messages # Save message
│   ├── POST /sessions/:id/gestures # Save gesture metrics
│   └── POST /sessions/:id/feedback # Save feedback
├── /conversation                   # AI conversation
│   ├── POST /generate              # Generate AI response (Gemini)
│   └── POST /feedback              # Generate feedback
├── /stt                            # Speech-to-text
├── /tts                            # Text-to-speech
├── /voices                         # Available voices list
├── /agora/token                    # Agora token generation
└── /partners                       # Partner management
    ├── GET /sessions/waiting       # List waiting sessions
    └── PATCH /sessions/:id/join    # Join session

Key Features

AI Conversation Simulation

  • 3D VRM avatar with realistic lip sync
  • Real-time facial expression and eye contact analysis
  • Multi-level intimacy system
  • Multiple scenarios (library, classroom, Christmas)

Comprehensive Feedback

  • Verbal evaluation (leadership, continuity, development, empathy, question appropriateness)
  • Non-verbal evaluation (smile, eye contact stability, eye direction)
  • Deduction system (AI question count, conversation breaks, short utterances)
  • Specific improvement suggestions
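To make the deduction idea concrete, here is a small sketch that subtracts weighted penalties from a base score. The penalty weights, the 0-100 score range, the field names, and the reading of each deduction category are all assumptions; the actual scoring rules are not given in this document.

```typescript
// Counts for the three deduction categories listed above (interpretations assumed).
interface ConversationStats {
  aiQuestionCount: number;    // questions the AI asked
  conversationBreaks: number; // breaks in the conversation
  shortUtterances: number;    // very short user replies
}

// Placeholder weights: points removed per occurrence.
const PENALTIES: ConversationStats = {
  aiQuestionCount: 2,
  conversationBreaks: 5,
  shortUtterances: 1,
};

// Subtracts the weighted deductions from the base score and clamps the result to 0..100.
function applyDeductions(baseScore: number, stats: ConversationStats): number {
  const total =
    stats.aiQuestionCount * PENALTIES.aiQuestionCount +
    stats.conversationBreaks * PENALTIES.conversationBreaks +
    stats.shortUtterances * PENALTIES.shortUtterances;
  return Math.max(0, Math.min(100, baseScore - total));
}
```

With these placeholder weights, a base score of 80 with 3 AI questions, 1 break, and 4 short utterances yields 80 - (6 + 5 + 4) = 65.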

Human Partner Practice

  • Video call sessions with real partners
  • Booking slot management
  • Combined AI and partner feedback

Performance Optimizations

  1. Audio Caching: Cache TTS audio files to reduce API calls
  2. 3fps Facial Analysis: Reduce processing frequency for better performance
  3. Edge Computing: Deploy backend on Cloudflare Workers for low latency
  4. Lazy Loading: Lazy load heavy 3D models and libraries
  5. Prisma Accelerate: Connection pooling for database access
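As an illustration of the audio caching item, the sketch below memoizes TTS results by voice and text so that repeated lines reuse the same audio URL. It is an in-memory example under stated assumptions: `createTtsCache`, the `Synthesize` signature, and the cache key format are hypothetical and may differ from the project's lib/cache implementation.

```typescript
// Hypothetical TTS call: resolves to the URL of the synthesized audio.
type Synthesize = (text: string, voiceId: string) => Promise<string>;

// Wraps a TTS function with an in-memory cache keyed by voice and text.
function createTtsCache(synthesize: Synthesize) {
  const cache = new Map<string, Promise<string>>();
  return function getAudioUrl(text: string, voiceId: string): Promise<string> {
    const key = `${voiceId}:${text}`;
    let pending = cache.get(key);
    if (!pending) {
      // Store the promise itself so concurrent requests share one API call.
      pending = synthesize(text, voiceId).catch((err) => {
        cache.delete(key); // do not keep failed requests in the cache
        throw err;
      });
      cache.set(key, pending);
    }
    return pending;
  };
}
```

Caching the promise rather than the resolved value means two requests for the same line that arrive at the same time still trigger only one synthesis call.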