Implement Local Speech To Text #13

@stone-alex

Description

Level – advanced.

Requirements

  • Pure Java integration, no DLLs, no JNI.
  • Minimum Java version: 21.
  • Linux first; Windows is optional but nice to have.
  • Either use a Java API for a local STT model or implement a WebSocket integration with an STT server (running locally or in a home lab).
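
For the WebSocket option, the JDK's built-in java.net.http.WebSocket client keeps the integration pure Java. The following is a minimal sketch, not a definitive implementation. The class name LocalSttSocketClient and the wire protocol (one binary message of PCM audio in, one text message out) are assumptions for illustration; a real STT server defines its own protocol and endpoint.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

/** Hypothetical client: sends one utterance and completes with the server's text reply. */
final class LocalSttSocketClient implements WebSocket.Listener {

    private final StringBuilder fragments = new StringBuilder();
    private final CompletableFuture<String> transcript = new CompletableFuture<>();

    /** Joins text fragments; returns true once the final fragment has arrived. */
    public boolean accept(CharSequence data, boolean last) {
        fragments.append(data);
        if (last) {
            transcript.complete(fragments.toString());
            fragments.setLength(0);
        }
        return last;
    }

    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        accept(data, last);
        webSocket.request(1); // ask for the next message; needed when onText is overridden
        return null;
    }

    @Override
    public void onError(WebSocket webSocket, Throwable error) {
        transcript.completeExceptionally(error);
    }

    public CompletableFuture<String> transcript() {
        return transcript;
    }

    /** Assumed protocol: one binary PCM message in, one text message out. */
    public static String transcribe(URI server, byte[] pcm) {
        LocalSttSocketClient listener = new LocalSttSocketClient();
        WebSocket socket = HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(server, listener)
                .join();
        socket.sendBinary(ByteBuffer.wrap(pcm), true).join();
        String text = listener.transcript().join();
        socket.sendClose(WebSocket.NORMAL_CLOSURE, "done").join();
        return text;
    }
}
```

A call would look like LocalSttSocketClient.transcribe(URI.create("ws://localhost:8080/stt"), pcmBytes), where the URL is a placeholder. The blocking join() calls are only acceptable because recognition is expected to run in its own thread.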

Pipeline

  • Detect user audio hardware (use existing AudioFormatDetector class)
  • Implement Audio calibration (use existing AudioCalibrator class)
  • Implement as a true singleton and run speech recognition in its own thread
  • Open the audio stream and build VOD.
    • Check the RMS.
    • If the RMS falls below the upper threshold and stays there for ~1 second, stop recording and send the VOD to STT.
  • Sanitize the received transcript with STTSanitizer.getInstance().correctMistakes(fullTranscript)
  • Send the sanitized transcript to the LLM by publishing a UserInputEvent (text and confidence) and a TTSInterruptEvent.
  • Honor streaming mode. When streaming mode is on, only send transcripts that contain the word "computer".
  • Place the implementation in the elite.intel.ai.ears.local package.
  • (See GoogleSTT class for inspiration)
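
The RMS check and the streaming-mode rule from the pipeline above can be sketched roughly as follows. This is an illustration only: SilenceEndpointer and StreamingGate are hypothetical names, the audio is assumed to be 16-bit signed little-endian mono PCM, and ignoring silence before any speech is heard is an added assumption. The real thresholds should come from the existing AudioCalibrator, whose API is not shown here.

```java
import java.util.regex.Pattern;

/** Hypothetical helper: decides when an utterance has ended, based on frame RMS. */
final class SilenceEndpointer {

    private final double threshold;   // RMS below this counts as quiet
    private final long silenceMillis; // required quiet time, ~1000 ms per the pipeline
    private long quietMillis = 0;
    private boolean speechSeen = false;

    public SilenceEndpointer(double threshold, long silenceMillis) {
        this.threshold = threshold;
        this.silenceMillis = silenceMillis;
    }

    /** Root mean square of 16-bit signed little-endian mono samples. */
    public static double rms(byte[] pcm, int length) {
        int samples = length / 2;
        if (samples == 0) {
            return 0.0;
        }
        double sumSquares = 0.0;
        for (int i = 0; i < samples; i++) {
            int lo = pcm[2 * i] & 0xFF; // low byte, unsigned
            int hi = pcm[2 * i + 1];    // high byte, sign-extended
            int sample = (hi << 8) | lo;
            sumSquares += (double) sample * sample;
        }
        return Math.sqrt(sumSquares / samples);
    }

    /** Feed one frame; returns true when recording should stop and the audio be sent to STT. */
    public boolean onFrame(double frameRms, long frameMillis) {
        if (frameRms >= threshold) {
            speechSeen = true;
            quietMillis = 0;
            return false;
        }
        if (!speechSeen) {
            return false; // assumption: leading silence never ends an utterance
        }
        quietMillis += frameMillis;
        return quietMillis >= silenceMillis;
    }

    /** Prepare for the next utterance. */
    public void reset() {
        quietMillis = 0;
        speechSeen = false;
    }
}

/** Hypothetical gate: in streaming mode only transcripts containing "computer" pass. */
final class StreamingGate {

    private static final Pattern WAKE_WORD =
            Pattern.compile("\\bcomputer\\b", Pattern.CASE_INSENSITIVE);

    public static boolean shouldForward(String transcript, boolean streamingMode) {
        return !streamingMode || WAKE_WORD.matcher(transcript).find();
    }
}
```

The capture loop would call rms on each buffer read from the audio line, pass the result to onFrame, and hand the accumulated audio to the STT backend once onFrame returns true. Matching on word boundaries is a design choice here; a plain substring check would also forward words such as "computers".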

Acceptance Criteria

  • High accuracy transcriptions
  • Low latency
  • Pure Java, no JNI
  • Runs on Linux
  • Gradle builds a fat JAR.
  • Project runs in IntelliJ IDEA without errors.

Metadata

Assignees: No one assigned

Labels: help wanted (extra attention is needed)
