Open
Labels
help wanted
Description
Level – advanced.
Requirements
- Pure Java implementation: no DLLs, no JNI.
- Minimum Java version 21.
- Linux first, Windows optional but nice to have.
- Either use a Java API for a local STT model or implement a WebSocket integration with an STT server (running locally or in a home lab).
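For the WebSocket option, the JDK's own java.net.http client (part of the platform since Java 11) keeps the integration pure Java. The sketch below is not a definitive implementation. It assumes a hypothetical STT server that accepts one utterance as a single binary WebSocket message of raw PCM, sent in fragments, and replies with one text message containing the transcript. The class name WebSocketSttClient, the endpoint, the wire protocol and the 8192-byte fragment size are all assumptions; parsing the reply (for example, JSON) is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

// Hypothetical client: the server protocol (one binary PCM message in,
// one text transcript out) is an assumption, not a known server's API.
final class WebSocketSttClient {

    static final int FRAME_BYTES = 8192; // fragment size, chosen arbitrarily

    /** Splits a recorded utterance into fragments without copying the array. */
    static List<ByteBuffer> toFrames(byte[] pcm, int frameSize) {
        if (frameSize <= 0) {
            throw new IllegalArgumentException("frameSize must be > 0");
        }
        List<ByteBuffer> frames = new ArrayList<>();
        for (int off = 0; off < pcm.length; off += frameSize) {
            int len = Math.min(frameSize, pcm.length - off);
            frames.add(ByteBuffer.wrap(pcm, off, len));
        }
        return frames;
    }

    /** Sends the utterance and completes with the server's first text reply. */
    static CompletableFuture<String> transcribe(URI endpoint, byte[] pcm) {
        CompletableFuture<String> transcript = new CompletableFuture<>();
        WebSocket.Listener listener = new WebSocket.Listener() {
            private final StringBuilder text = new StringBuilder();

            @Override
            public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
                text.append(data); // a text message may arrive in several parts
                if (last) {
                    transcript.complete(text.toString());
                    ws.sendClose(WebSocket.NORMAL_CLOSURE, "done");
                }
                ws.request(1);
                return null;
            }

            @Override
            public CompletionStage<?> onClose(WebSocket ws, int statusCode, String reason) {
                // No-op if the transcript already arrived.
                transcript.completeExceptionally(
                        new IllegalStateException("closed before transcript: " + statusCode));
                return null;
            }

            @Override
            public void onError(WebSocket ws, Throwable error) {
                transcript.completeExceptionally(error);
            }
        };
        HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(endpoint, listener)
                .thenCompose(ws -> {
                    // Chain the sends: each fragment waits for the previous one.
                    CompletableFuture<WebSocket> chain = CompletableFuture.completedFuture(ws);
                    List<ByteBuffer> frames = toFrames(pcm, FRAME_BYTES);
                    for (int i = 0; i < frames.size(); i++) {
                        ByteBuffer frame = frames.get(i);
                        boolean last = i == frames.size() - 1; // final fragment of the message
                        chain = chain.thenCompose(w -> w.sendBinary(frame, last));
                    }
                    return chain;
                })
                .exceptionally(err -> {
                    transcript.completeExceptionally(err);
                    return null;
                });
        return transcript;
    }
}
```

Chaining the sends through thenCompose matters because java.net.http.WebSocket allows only one outstanding send at a time. A caller would block on or compose the returned future, for example `WebSocketSttClient.transcribe(URI.create("ws://localhost:2700"), vod)`, where the URL is a placeholder.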
Pipeline
- Detect the user's audio hardware (use the existing AudioFormatDetector class).
- Implement audio calibration (use the existing AudioCalibrator class).
- Implement the recognizer as a true singleton and run speech recognition on its own thread.
- Open the audio stream and build the VOD (the recorded audio buffer).
- Check the RMS level of each captured audio chunk.
- If the RMS falls below an upper threshold and stays there for about 1 second, stop recording and send the VOD to STT.
- Sanitize the received transcript with STTSanitizer.getInstance().correctMistakes(fullTranscript).
- Send the sanitized transcript to the LLM by publishing a UserInputEvent (text and confidence) and a TTSInterruptEvent.
- Honor streaming mode: when streaming mode is on, only send transcripts that contain the word "computer".
- Place the implementation in the elite.intel.ai.ears.local package.
- See the GoogleSTT class for inspiration.
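The singleton and dedicated-thread requirements could be met with the lazy holder idiom and a single daemon worker thread. This is a minimal sketch. The class name LocalSpeechRecognizer and the start/stop/isRunning methods are hypothetical, not the project's API. The loop body passed to start stands in for the real work (read a chunk from the audio line, check the RMS, and so on). An enum singleton is an equally valid choice that also survives serialization and reflection.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical skeleton; names are illustrative only.
final class LocalSpeechRecognizer {

    // Holder idiom: the JVM initializes INSTANCE lazily and thread-safely.
    private static final class Holder {
        private static final LocalSpeechRecognizer INSTANCE = new LocalSpeechRecognizer();
    }

    private final AtomicBoolean running = new AtomicBoolean(false);
    private Thread worker;

    private LocalSpeechRecognizer() { } // no outside instantiation

    static LocalSpeechRecognizer getInstance() {
        return Holder.INSTANCE;
    }

    /** Starts the recognition loop on its own daemon thread; repeated calls are no-ops. */
    synchronized void start(Runnable loopBody) {
        if (!running.compareAndSet(false, true)) {
            return;
        }
        worker = new Thread(() -> {
            while (running.get() && !Thread.currentThread().isInterrupted()) {
                loopBody.run(); // e.g. read one audio chunk and check its RMS
            }
        }, "local-stt");
        worker.setDaemon(true); // do not keep the JVM alive on exit
        worker.start();
    }

    /** Signals the loop to finish and waits briefly for the thread to end. */
    synchronized void stop() {
        if (!running.compareAndSet(true, false)) {
            return;
        }
        worker.interrupt(); // wakes a loop blocked in an interruptible call
        try {
            worker.join(2_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isRunning() {
        return running.get();
    }
}
```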
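The RMS check and the "below the threshold for about 1 second" stop condition can be expressed as a small end-of-utterance detector. This sketch assumes 16-bit signed little-endian mono PCM and a normalized threshold in the range 0.0 to 1.0. In the real pipeline the format would come from AudioFormatDetector and the threshold from AudioCalibrator, whose APIs are not shown here. The class name SilenceDetector is hypothetical.

```java
// Hypothetical helper: the class name, the PCM format and the threshold
// values are assumptions, not the project's API.
final class SilenceDetector {

    private final double upperThreshold; // normalized RMS, 0.0 .. 1.0
    private final long hangoverMillis;   // how long the level must stay low (~1000 ms)
    private long silentMillis;
    private boolean heardSpeech;

    SilenceDetector(double upperThreshold, long hangoverMillis) {
        this.upperThreshold = upperThreshold;
        this.hangoverMillis = hangoverMillis;
    }

    /** RMS of 16-bit signed little-endian PCM samples, normalized to 0.0 .. 1.0. */
    static double rms(byte[] pcm, int length) {
        int samples = length / 2;
        if (samples == 0) {
            return 0.0;
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < samples; i++) {
            int low = pcm[2 * i] & 0xFF;  // unsigned low byte
            int high = pcm[2 * i + 1];    // sign-extended high byte
            double sample = ((high << 8) | low) / 32768.0;
            sumOfSquares += sample * sample;
        }
        return Math.sqrt(sumOfSquares / samples);
    }

    /**
     * Feeds one captured chunk lasting chunkMillis. Returns true once speech has
     * been heard and the RMS has then stayed below the threshold for at least
     * hangoverMillis, meaning the utterance is complete and can go to STT.
     */
    boolean onChunk(byte[] pcm, int length, long chunkMillis) {
        if (rms(pcm, length) >= upperThreshold) {
            heardSpeech = true; // still talking: restart the silence timer
            silentMillis = 0;
            return false;
        }
        if (!heardSpeech) {
            return false;       // leading silence before any speech
        }
        silentMillis += chunkMillis;
        return silentMillis >= hangoverMillis;
    }

    /** Clears state before recording the next utterance. */
    void reset() {
        silentMillis = 0;
        heardSpeech = false;
    }
}
```

In a capture loop, each chunk would typically come from javax.sound.sampled.TargetDataLine.read(byte[], int, int). The chunk duration follows from the byte count: 3200 bytes of 16 kHz mono 16-bit audio is 3200 / (2 × 16000) s = 100 ms. Requiring speech before the silence timer starts keeps leading silence from ending a recording that never began.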
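The streaming-mode rule can be isolated in a small gate that runs after sanitizing and before publishing the UserInputEvent. This is a sketch under assumptions. The issue only says the transcript must contain the word "computer". Case-insensitive whole-word matching, dropping blank transcripts, and the class name StreamingModeGate are choices made here for illustration. Whether the wake word should be stripped before publishing is not specified, so it is left in.

```java
import java.util.regex.Pattern;

// Hypothetical gate; the matching rules are assumptions, not confirmed behavior.
final class StreamingModeGate {

    // Whole word, any case: matches "Computer," but not "computerized".
    private static final Pattern WAKE_WORD =
            Pattern.compile("\\bcomputer\\b", Pattern.CASE_INSENSITIVE);

    /** Returns true if the sanitized transcript should be published. */
    static boolean shouldPublish(String transcript, boolean streamingMode) {
        if (transcript == null || transcript.isBlank()) {
            return false; // nothing useful to send to the LLM
        }
        return !streamingMode || WAKE_WORD.matcher(transcript).find();
    }
}
```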
Acceptance Criteria
- High transcription accuracy
- Low latency
- Pure Java, no JNI
- Runs on Linux
- Gradle builds a fat jar
- Project runs in IntelliJ IDEA without errors.