
feat(stage-ui): add abstract base interfaces for transcription and sp…#961

Draft
lockrush-dev wants to merge 48 commits into moeru-ai:main from lockrush-dev:feat/provider-abstract-interfaces

Conversation


@lockrush-dev lockrush-dev commented Jan 16, 2026

Refactor All Speech and Transcription Providers to Unified defineProvider() Pattern

Summary

This PR refactors all speech and transcription providers to use the unified defineProvider() pattern (as established in PR #968), improving architectural consistency, reducing code duplication, and enabling consistent provider discovery across all provider types.

Changes

Core Architecture

  • Unified Pattern: Refactored all speech and transcription providers to use the unified defineProvider() pattern instead of separate registries
  • Converter Function: Added convertProviderDefinitionToMetadata() to bridge the unified defineProvider() pattern with the existing store's ProviderMetadata format
  • Task-Based Classification: Providers now declare their capabilities via tasks array (e.g., ['text-to-speech', 'speech'] for speech providers, ['speech-to-text', 'automatic-speech-recognition', 'asr', 'stt'] for transcription providers)
  • Extra Methods: Provider-specific methods like listModels and listVoices are now defined in extraMethods within the unified pattern
  • Deprecated Interfaces: Marked old BaseSpeechProviderDefinition and BaseTranscriptionProviderDefinition interfaces as deprecated (kept for backward compatibility)
  • Shared Utilities: Created normalizeBaseUrl utility function to eliminate code duplication across provider implementations
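
The shared base-URL helper described above can be sketched as follows. This is a minimal illustration of the idea, not the PR's exact implementation (the real `normalizeBaseUrl` may handle more cases):

```ts
// Minimal sketch of a shared base-URL normalizer (illustrative only —
// the actual normalizeBaseUrl in the PR may differ in details).
export function normalizeBaseUrl(baseUrl: string): string {
  const trimmed = baseUrl.trim()
  if (trimmed === '')
    return trimmed
  // Guarantee exactly one trailing slash so endpoint paths append cleanly
  return trimmed.endsWith('/') ? trimmed : `${trimmed}/`
}
```

Centralizing this in one utility means every provider builds request URLs the same way, instead of each implementation re-inventing trailing-slash handling.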

Provider Implementations

Speech Providers (9 total) - All Migrated ✅

  • OpenAI Speech: Refactored to use defineProvider() with tasks: ['text-to-speech', 'speech'] and extraMethods for listModels and listVoices
  • OpenAI Compatible Speech: Refactored to use defineProvider() with tasks: ['text-to-speech', 'speech'] and extraMethods for listModels and listVoices
  • ElevenLabs: Migrated to unified pattern with voice listing support
  • Deepgram TTS: Migrated to unified pattern with voice listing support
  • Microsoft Speech: Migrated to unified pattern with region support and voice listing
  • Index-TTS vLLM: Migrated to unified pattern with voice listing support
  • Alibaba Cloud Model Studio: Migrated to unified pattern with voice listing support
  • Volcengine: Migrated to unified pattern with app configuration and voice listing
  • Player2 Speech: Migrated to unified pattern with voice listing support

Transcription Providers (4 total) - All Migrated ✅

  • OpenAI Transcription: Refactored to use defineProvider() with tasks: ['speech-to-text', 'automatic-speech-recognition', 'asr', 'stt'] and extraMethods for listModels
  • OpenAI Compatible Transcription: Refactored to use defineProvider() with tasks: ['speech-to-text', 'automatic-speech-recognition', 'asr', 'stt'] and extraMethods for listModels
  • Browser Web Speech API: Migrated to unified pattern with browser availability validation
  • Aliyun NLS Transcription: Migrated to unified pattern with streaming transcription support and region configuration

All providers are now automatically registered when imported via the unified registry and can be discovered via listProviders() and getDefinedProvider().
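
Conceptually, registration at import time plus task-based lookup could look like the following sketch. The types here are deliberately simplified, and the real registry's signatures may differ:

```ts
// Hypothetical sketch of a unified provider registry backing
// listProviders() and getDefinedProvider(). Simplified types.
interface ProviderDefinition {
  id: string
  tasks: string[]
}

const registry = new Map<string, ProviderDefinition>()

function defineProvider(def: ProviderDefinition): ProviderDefinition {
  registry.set(def.id, def) // registration happens as a side effect of import
  return def
}

function listProviders(task?: string): ProviderDefinition[] {
  const all = Array.from(registry.values())
  return task ? all.filter(p => p.tasks.includes(task)) : all
}

function getDefinedProvider(id: string): ProviderDefinition | undefined {
  return registry.get(id)
}

defineProvider({ id: 'openai-audio-speech', tasks: ['text-to-speech', 'speech'] })
defineProvider({ id: 'openai-audio-transcription', tasks: ['speech-to-text', 'asr', 'stt'] })
```

Because each provider module calls `defineProvider()` at import time, simply importing the provider files is enough to make them discoverable.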

Code Quality Improvements

  • Type Safety: Improved type safety by using Zod schemas for configuration validation within the unified pattern
  • Code Deduplication: Extracted base URL normalization logic to a shared utility function used across all provider implementations
  • Consistent Pattern: Speech and transcription providers now follow the exact same pattern as chat providers, enabling consistent architecture across all provider types
  • Type Fixes: Resolved TypeScript errors related to ComposerTranslation types, validator signatures, and provider instance type casting
  • Validator Standardization: All validators now use consistent error format with reasonKey support and proper contextOptions parameter handling

Settings UI & Store Improvements

  • Empty Menu Items: Fixed settings index page to filter out routes with empty titles, preventing empty menu items from rendering
  • Model Compatibility: Added watch handlers in speech and hearing stores to ensure model selections remain valid when switching providers
  • Provider-Specific Defaults: Improved default model selection logic to be provider-aware and prevent incompatible model selections
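
The compatibility check behind those watch handlers can be illustrated with a small helper. The names here (`reconcileModel`, `SpeechProvider`) are hypothetical; in the stores this logic would run inside a Vue `watch` on the active provider:

```ts
// Hypothetical helper illustrating the model-compatibility check:
// when the active provider changes, an invalid model selection
// falls back to the provider's own default.
interface SpeechProvider {
  id: string
  models: string[]
  defaultModel: string
}

function reconcileModel(provider: SpeechProvider, selectedModel: string): string {
  return provider.models.includes(selectedModel)
    ? selectedModel
    : provider.defaultModel
}
```

In a store this would be wired up roughly as `watch(activeProvider, p => { selectedModel.value = reconcileModel(p, selectedModel.value) })`, so switching providers can never leave an incompatible model selected.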

Benefits

  • Complete Migration: All speech and transcription providers now use the unified defineProvider() pattern
  • Unified Architecture: All provider types (chat, speech, transcription, embed) now use the same defineProvider() pattern
  • Consistency: Speech and transcription providers follow the exact same interface contract and registration pattern as chat providers
  • Discoverability: Providers can be programmatically discovered and listed via the unified listProviders() and getDefinedProvider() functions
  • Reduced Duplication: Validation, configuration logic, and utility functions are centralized
  • Easier Extension: Adding new providers is simplified with clear interface requirements and automatic registration
  • Type Safety: Improved TypeScript type safety with explicit provider contracts, Zod schemas, and proper type casting
  • Future-Ready: Architecture is now consistent across all provider types, making future additions and maintenance easier

Technical Details

Provider Definition Pattern

```ts
export const providerOpenAISpeech = defineProvider<Config>({
  id: 'openai-audio-speech',
  tasks: ['text-to-speech', 'speech'],
  createProvider: (config) => { /* ... */ },
  extraMethods: {
    listModels: (config, provider) => { /* ... */ },
    listVoices: (config, provider) => { /* ... */ },
  },
  // ... other fields
})
```

Store Integration

The convertProviderDefinitionToMetadata() function converts unified provider definitions to the store's ProviderMetadata format, enabling seamless integration with existing provider store logic.
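
A simplified sketch of what such a converter does is shown below. Field names are illustrative and the store's actual `ProviderMetadata` shape is richer; note that the i18n keys are passed through untranslated — storing translated strings instead of keys was the `nameKey`/`descriptionKey` bug fixed during this PR:

```ts
// Hypothetical sketch of convertProviderDefinitionToMetadata() — the real
// field names and the full ProviderMetadata shape in the store may differ.
interface ProviderDefinition {
  id: string
  nameKey: string
  descriptionKey: string
  tasks: string[]
}

interface ProviderMetadata {
  id: string
  nameKey: string // i18n key, translated at render time, not here
  descriptionKey: string
  category: 'speech' | 'transcription' | 'other'
}

function convertProviderDefinitionToMetadata(def: ProviderDefinition): ProviderMetadata {
  const category: ProviderMetadata['category'] = def.tasks.includes('text-to-speech')
    ? 'speech'
    : def.tasks.includes('speech-to-text') ? 'transcription' : 'other'
  // Identity pass-through for i18n keys: store the key, translate in the UI layer
  return { id: def.id, nameKey: def.nameKey, descriptionKey: def.descriptionKey, category }
}
```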

Validator Pattern

All validators follow a consistent pattern with proper error handling:

```ts
validators: {
  validateConfig: [
    ({ t }: { t: ComposerTranslation }) => ({
      id: 'provider-id:check-config',
      name: t('validator.name.key'),
      validator: async (config: Config, _contextOptions: { t: ComposerTranslation }): Promise<ProviderValidationResult> => {
        const errors: Array<{ error: unknown, errorKey?: string }> = []
        // ... validation logic
        return {
          errors,
          reason: errors.length > 0 ? errorMessages.join(', ') : '',
          reasonKey: errors.length > 0 ? 'provider-id:check-config:invalid' : '',
          valid: errors.length === 0,
        }
      },
    }),
  ],
}
```

Testing

  • ✅ Type checks pass: pnpm run typecheck
  • ✅ Linter checks pass: pnpm lint
  • ✅ Build passes: pnpm -F @proj-airi/stage-tamagotchi build
  • ✅ Existing functionality preserved: All providers maintain backward compatibility
  • ✅ Registry functions work correctly: Providers are automatically registered and can be discovered via unified registry functions
  • ✅ Model compatibility: Provider switching correctly handles model selection

Migration Notes

  • Old defineSpeechProvider() and defineTranscriptionProvider() functions are deprecated but still exported for backward compatibility
  • Old BaseSpeechProviderDefinition and BaseTranscriptionProviderDefinition interfaces are deprecated but still available
  • New providers should use the unified defineProvider() pattern
  • Existing code using deprecated functions will continue to work but should be migrated

Related

…eech providers

- Add BaseTranscriptionProviderDefinition and BaseSpeechProviderDefinition interfaces
- Implement provider adapters to convert between ProviderMetadata and base interfaces
- Add converter functions to integrate base providers with existing providers store
- Implement OpenAI and OpenAI Compatible providers using new base interfaces
- Refactor OpenAI providers to use structured provider definitions

This architectural improvement provides a consistent contract for transcription
and speech providers, making it easier to add new providers and reduce code
duplication in validation and configuration logic.
@gemini-code-assist

Summary of Changes

Hello @lockrush-dev, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the provider architecture within the stage-ui package by introducing abstract base interfaces for transcription and speech services. The primary goal is to standardize how different providers are defined and integrated, leading to a more maintainable, extensible, and type-safe system. This change streamlines the process of adding new providers and ensures a uniform approach to configuration and validation across all speech and transcription functionalities.

Highlights

  • New Abstract Base Interfaces: Introduced BaseTranscriptionProviderDefinition and BaseSpeechProviderDefinition to establish a consistent contract for all transcription and speech providers, enhancing architectural consistency.
  • Provider Adapters and Converters: Added createTranscriptionProviderAdapter and createSpeechProviderAdapter functions to seamlessly convert existing ProviderMetadata into the new base interface definitions. New converter functions (convertSpeechProviderToMetadata, convertTranscriptionProviderToMetadata) integrate base provider implementations with the existing provider store.
  • Refactored Provider Implementations: OpenAI Speech, OpenAI Transcription, OpenAI Compatible Speech, and OpenAI Compatible Transcription providers have been refactored to utilize the new base interfaces, centralizing validation, model/voice listing, and configuration management.
  • Architectural Benefits: The changes lead to improved consistency across providers, reduced code duplication by centralizing common logic, simplified extension for adding new providers, and enhanced TypeScript type safety with explicit contracts.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a solid architectural improvement by creating base interfaces for transcription and speech providers. The refactoring of OpenAI and OpenAI-compatible providers to use these new interfaces significantly reduces code duplication in the main providers.ts store and improves consistency. The use of converter functions to bridge the new provider definitions with the existing metadata-based system is a good integration strategy.

My review includes a few suggestions to further enhance maintainability and type safety:

  • Removing newly added but unused adapter files.
  • Replacing as any type assertions with more specific types in the converter functions to improve type safety.
  • Consolidating duplicated URL normalization logic into a shared utility.

Resolved conflicts by keeping the new base provider interface implementation
using convertSpeechProviderToMetadata and convertTranscriptionProviderToMetadata.
Updated provider implementations to include the latest models and voices from upstream:
- Added gpt-4o-mini-tts-2025-12-15 model to OpenAI Speech
- Added gpt-4o-mini-transcribe-2025-12-15 and gpt-4o-transcribe-diarize to OpenAI Transcription
- Updated voice compatibleModels to include the new model
- Improved model descriptions

github-actions bot commented Jan 16, 2026

⏳ Approval required for deploying to Cloudflare Workers (Preview) for stage-web.

🔭 Waiting for approval: For maintainers, approve here

Hey, @nekomeowww, @sumimakito, @luoling8192, @LemonNekoGH, kindly take some time to review and approve this deployment when you are available. Thank you! 🙏

lockrush-dev and others added 8 commits January 16, 2026 17:18
Remove createSpeechProviderAdapter and createTranscriptionProviderAdapter
as they are not used anywhere in the codebase. These adapters were intended
to convert from ProviderMetadata to BaseProviderDefinition, but all current
providers are implemented directly using the base interfaces.

The adapters can be re-added later if needed for migrating legacy providers.
- Extract normalizeBaseUrl to shared utility to reduce duplication
- Fix type safety in converters.ts by using specific types instead of 'as any'
- Remove redundant async/await wrappers in converter functions
- Update all provider implementations to use shared normalizeBaseUrl utility

This addresses code review feedback about code duplication and type safety.
@nekomeowww

Since you've completed so many tasks about speech & transcription API, would you love to try extending and try using this https://github.com/n1n-api/airi/tree/feat/n1n-provider/packages/stage-ui/src/libs/providers/providers pattern to defineProvider(s) for transcription services & speech services?


lockrush-dev commented Jan 17, 2026

Since you've completed so many tasks about speech & transcription API, would you love to try extending and try using this https://github.com/n1n-api/airi/tree/feat/n1n-provider/packages/stage-ui/src/libs/providers/providers pattern to defineProvider(s) for transcription services & speech services?

I can certainly try! I'll try to take care of that as a part of this pull request.

lockrush-dev and others added 10 commits January 17, 2026 15:55
…ription providers

- Create registry-speech.ts and registry-transcription.ts with defineSpeechProvider() and defineTranscriptionProvider() helpers
- Refactor OpenAI and OpenAI-compatible speech/transcription providers to use new registry pattern
- Export registry functions (listSpeechProviders, listTranscriptionProviders, etc.) for programmatic discovery
- Follows same pattern as existing defineProvider() for chat providers

This provides a consistent architecture across all provider types and enables centralized discovery of speech and transcription providers.
- Fix undefined title type errors in settings.vue layout (WindowTitleBar and PageHeader)
- Ensure title is always a string in settings index page (IconItem)
- Remove duplicate 'li' property in docs/uno.config.ts

These fixes resolve CI build failures in stage-tamagotchi typecheck.
- Ensure routeHeaderMetadata.title is always a string when routeHeaderMetadata exists
- Add conditional check in template to only render PageHeader when title exists
- Fixes 'string | undefined' is not assignable to type 'string' error
- Keep both 'ul' and 'li' styles in docs/uno.config.ts
- Auto-merged settings.vue and index.vue files (conflicts resolved automatically)
- All TypeScript fixes preserved
lockrush-dev pushed a commit to lockrush-dev/airi that referenced this pull request Jan 18, 2026
…fineProvider pattern

Refactor OpenAI and OpenAI-compatible speech/transcription providers to use
the unified `defineProvider()` pattern (matching PR moeru-ai#968 n1n provider pattern)
instead of separate registries. This unifies all providers under a single
consistent API.

Changes:
- Refactor OpenAI speech/transcription providers to use `defineProvider()` with
  `tasks` and `extraMethods` instead of `defineSpeechProvider`/`defineTranscriptionProvider`
- Refactor OpenAI-compatible speech/transcription providers similarly
- Add `convertProviderDefinitionToMetadata()` converter function to bridge unified
  pattern with existing store's `ProviderMetadata` format
- Update store to use new converter function instead of old separate converters
- Mark old base interfaces and registry exports as deprecated for backward compatibility
- Fix settings index page to filter out routes with empty titles (preventing empty
  menu items from rendering)

BREAKING CHANGE: Speech and transcription providers now use the unified
`defineProvider()` pattern. Old `defineSpeechProvider` and `defineTranscriptionProvider`
functions are deprecated but still available for backward compatibility.

Refs: PR moeru-ai#961, PR moeru-ai#968
@lockrush-dev lockrush-dev force-pushed the feat/provider-abstract-interfaces branch from 66beb66 to c58ccbd on January 18, 2026 01:30
- Fix t function parameter type to use ComposerTranslation
- Add wrapper function to adapt ComposerTranslation signature
- Fix createProvider return type casting
- Remove unused imports
…in converters

- Add explicit type cast for contextOptions in getValidatorsOfProvider call
- Ensure t is properly typed as ComposerTranslation when passed to validators
@lockrush-dev

@gemini-code-assist review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request is an excellent and significant refactoring that unifies speech and transcription providers under the defineProvider() pattern. This greatly improves architectural consistency, reduces code duplication, and enhances maintainability. The changes are well-structured, and the use of a converter function to bridge the new provider definitions with the existing store is a smart approach for gradual migration.

My review includes a few suggestions for improvement: one for code simplification in a Vue component, and a high-severity comment on a potential fragility in the new provider converter that could affect future development.

Avoid passing '{}' into extraMethods and simplify settings PageHeader bindings.

lockrush-dev and others added 7 commits January 20, 2026 12:44
…nified defineProvider pattern

- Migrate all 7 speech providers to unified defineProvider() pattern:
  - elevenlabs
  - deepgram-tts
  - microsoft-speech
  - index-tts-vllm
  - alibaba-cloud-model-studio
  - volcengine
  - player2-speech

- Migrate both transcription providers to unified defineProvider() pattern:
  - browser-web-speech-api
  - aliyun-nls-transcription

- Fix convertProviderDefinitionToMetadata bug where nameKey/descriptionKey
  were storing translated strings instead of i18n keys. Use identity function
  (keyExtractor) instead of translator (tWrapper) to extract i18n key strings.

- Update providers.ts to use convertProviderDefinitionToMetadata for all
  migrated providers instead of old pattern.

- Register all migrated providers in providers/index.ts

This completes the migration requested by Niko in PR moeru-ai#968 to unify provider
patterns across the codebase.
- Fix import paths: change ProviderValidationResult from base-types to types
- Update validator signatures to accept contextOptions parameter
- Fix error format: use Array<{error, errorKey?}> and include reasonKey
- Fix type casting: use 'as unknown as' for VoiceProviderWithExtraOptions
- Fix Zod enum: use literal array instead of readonly const with errorMap
- Fix unused parameters: prefix with underscore (_config, _contextOptions)
- Fix type names: ElevenLabsConfig, DeepgramConfig, etc.
- Fix Microsoft Speech region handling with default value
@lockrush-dev

@gemini-code-assist review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request significantly refactors the speech and transcription providers to align with the unified defineProvider() pattern, enhancing architectural consistency and reducing code duplication. The introduction of convertProviderDefinitionToMetadata() effectively bridges the new pattern with the existing store's ProviderMetadata format. Additionally, the changes correctly address the issue of empty menu items in the settings UI and improve model/voice selection consistency when switching providers. The overall direction of these changes is positive, leading to a more maintainable and extensible provider system.

@lockrush-dev lockrush-dev marked this pull request as draft January 22, 2026 16:26
lockrush-dev and others added 9 commits January 22, 2026 11:27
…xy server guidance

- Update default base URL from unspeech.hyp3r.link to api.elevenlabs.io/v1/
- Add migration logic to fix old incorrect base URLs (unspeech.hyp3r.link, api.elevenlabs.io/v2/)
- Add informational Alert component explaining proxy server requirements for web browsers
- Disable voice dropdown until API key is configured
- Improve apiKeyConfigured validation to check for non-empty trimmed strings
- Add documentation about CORS limitations and proxy server usage
…youts

- Migrate comet-api-transcription and openai-compatible-audio-transcription to use TranscriptionProviderSettings wrapper
- Ensure all providers use consistent side-by-side layout (settings 40%, playground 60%)
- Move validation alerts to advanced-settings slot for better organization
- Simplify credential management by leveraging wrapper components
- All speech providers already use SpeechProviderSettings wrapper consistently
- Maintain custom layouts for browser-web-speech-api and aliyun-nls-transcription due to specialized UIs
…API model lists

- Fix Live2D scale reactivity: restore scale from storeToRefs to maintain
  reactivity to store updates while allowing prop overrides
- Fix icon field handling: restore ?? '' fallback in settings index pages
  to prevent undefined icon values
- Add static model lists for CometAPI providers:
  - Speech: 9 TTS models (TTS, Kling TTS, GPT-4o variants, etc.)
  - Transcription: 12 STT models (Gemini variants, Whisper-1, GPT-4o variants)
- Update CometAPI UI: add model dropdowns to both speech and transcription
  settings pages (replacing manual input for transcription)
- Refactor: create useProviderConfig composable to eliminate code duplication
  across provider settings pages
- Update all provider pages to use useProviderConfig composable for consistent
  API key/base URL validation logic