Enable queue for the buttons #452

base: master
Conversation
Hi @jhj0517, thanks for working on this PR to enable queueing for the buttons! This is a highly anticipated feature that could significantly improve workflow and resource management. As you're implementing queueing, I wanted to raise a related point that would greatly enhance its utility, especially for users with varied hardware configurations like mine (low-VRAM GPU). My primary need, especially when processing multiple files in a queue, is the ability to specify the compute device (GPU/CPU), the specific model, and even the transcription engine (e.g., …).

Context and Justification:

Desired Queueing Workflow (with per-file control): ideally, the queue would allow me to submit multiple files, each with its own specified transcription engine, model, and device assignments for its components (e.g., GPU for transcription, CPU for diarization). The system would then process each job according to its own settings, running independent stages concurrently where possible.

This concurrent execution capability, alongside per-file control over device, model, and engine, would ensure maximum resource utilisation and flexibility. Is this something that could be considered as an extension to the queueing functionality being introduced here, or perhaps in a subsequent iteration? A rough sketch of such a job spec follows below. robertrosenbusch/gfx803_rocm#26 (comment)
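To make the request concrete, here is a minimal, purely illustrative sketch of what a per-file job spec could look like. None of these names exist in the Whisper-WebUI codebase; they only describe the shape of the request:

```python
# Purely illustrative: a per-file job spec for the proposed queue.
# All class and field names here are hypothetical, not part of Whisper-WebUI.
from dataclasses import dataclass
from typing import Optional

@dataclass
class QueuedJob:
    file_path: str
    engine: str                    # e.g. "faster-whisper"
    model: str                     # e.g. "large-v2"
    transcribe_device: str         # e.g. "cuda" for the Whisper model
    diarize_device: Optional[str]  # e.g. "cpu" to keep VRAM free
```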
Title: Robust API Parameter Mapping: Support for Named Parameters in API as Well as Positional

Context: Currently, the API for …

Proposal
Implementation Sketch
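A minimal sketch of one possible approach, assuming a mapping from human-readable labels to the `param_N` names that `gradio_client` currently exposes. The two mapping entries below are taken from the example that follows; everything else (the dict, the wrapper function) is an assumption, and ideally the mapping would be generated from the UI component labels rather than hand-written:

```python
# Sketch only: map human-readable names onto the positional/param_N API.
from gradio_client import Client

# Hypothetical mapping, ideally generated from the UI labels.
LABEL_TO_PARAM = {
    "Language": "param_7",
    "Translate_to_English": "param_8",
}

def predict_named(client: Client, api_name: str, **named_kwargs):
    # Translate friendly names back into the param_N keywords the API expects.
    kwargs = {LABEL_TO_PARAM[name]: value for name, value in named_kwargs.items()}
    return client.predict(**kwargs, api_name=api_name)
```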
Example

```python
# Current (positional):
result = client.predict(
    param_7="english",
    param_8=False,
    ...
    api_name="/transcribe_file"
)

# Proposed (named):
result = client.predict(
    Language="english",
    Translate_to_English=False,
    ...
    api_name="/transcribe_file"
)
```

Related Issues/PRs

Thanks!

Thanks for your awesome work and for considering this improvement! This would make scripting and API interop much more user-friendly and robust.
Let’s break down your questions and give you actionable guidance for each, referencing how the codebase works and how you can map/label parameters for the API.

1. Where is it possible to specify the transcription device (not diarization or anything else)?

The transcription device is the device used for the main Whisper transcription model (not VAD, diarization, or music separation). In your API, this is the parameter associated with `param_45` (see the table below).

Example from your API mapping: …

In the code, this looks like: …

Summary: the transcription device is set in `WhisperParams` and maps to `param_45`.

2. How can I enumerate the commands in a similar way to …?
| API param | Purpose/Label | Code location / param |
|---|---|---|
| param_45 | Device (for transcription) | WhisperParams.device |
| param_50 | Device (for UVR) | BGMSeparationParams.uvr_device |
| ... | ... | ... |
Example: How to Patch for Better API Labeling
To help with labeling in API requests (and not just `param_x`), you could pair the ordered values from `to_list()` with the ordered UI labels from `to_gradio_inputs()` (a sketch follows the summary below). This will let you output or log a human-readable mapping for every API request!
Summary of Steps:

- The transcription device is set in `WhisperParams` and maps to `param_45`.
- The order in `to_list()` determines the API parameter order; enumerate them by reading or logging these methods.
- To map labels, walk through the list construction and pair each slot with its UI label (from `to_gradio_inputs`).
- For your PR, consider adding a function that exposes a `{label: value}` mapping for API clarity (see the sketch below).
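As a starting point, here is a minimal sketch of such a helper. It assumes `to_list()` and `to_gradio_inputs()` return values and components in the same order, as described above; `to_label_dict` itself is a hypothetical name:

```python
# Hypothetical helper: build a {label: value} mapping for logging/debugging
# API requests. Assumes to_list() and to_gradio_inputs() share ordering.
def to_label_dict(params) -> dict:
    labels = [comp.label for comp in params.to_gradio_inputs()]
    values = params.to_list()
    if len(labels) != len(values):
        raise ValueError("to_gradio_inputs() and to_list() are out of sync")
    return dict(zip(labels, values))
```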
Let me know if you want a ready-to-merge file patch or code snippet for any of these enhancements!
Feature Request: Add WhisperX Support

Summary

Currently, Whisper-WebUI supports multiple Whisper implementations: …

Proposal

Why WhisperX?
Implementation Plan & File Pointers

1. Add WhisperX as an available backend
2. Implement a new wrapper for WhisperX
3. Wire up the backend in the factory/selection logic (a sketch follows this list)
4. Expose WhisperX-specific options in the UI (optional)
5. Update requirements
6. Update documentation
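For step 3, a minimal sketch of what the selection logic could look like. The function name and backend key are assumptions, not the repo's actual factory:

```python
# Hypothetical factory wiring for step 3. The real factory's name and
# location in Whisper-WebUI will differ; only the dispatch pattern matters.
def create_whisper_inference(backend: str, **kwargs):
    if backend == "whisperx":
        # WhisperXImplementation is the wrapper sketched in the next section.
        return WhisperXImplementation(**kwargs)
    raise ValueError(f"Unknown Whisper backend: {backend!r}")
```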
Example: Integrating WhisperX in the Backend

```python
import whisperx

class WhisperXImplementation:
    def __init__(self, model_name, device, compute_type="float16", **kwargs):
        self.model = whisperx.load_model(model_name, device, compute_type=compute_type)
        self.device = device

    def transcribe(self, audio, **kwargs):
        # Pop our own options so they are not forwarded to whisperx.
        do_align = kwargs.pop("do_align", True)
        do_diarization = kwargs.pop("do_diarization", False)

        result = self.model.transcribe(audio, **kwargs)

        # Alignment: word-level timestamps via a language-specific align model
        if do_align:
            model_a, metadata = whisperx.load_align_model(
                language_code=result["language"], device=self.device)
            result = whisperx.align(
                result["segments"], model_a, metadata, audio, self.device)

        # Diarization (optional): assign speakers to the aligned words
        if do_diarization:
            diarize_model = whisperx.diarize.DiarizationPipeline(device=self.device)
            diarize_segments = diarize_model(audio)
            result = whisperx.assign_word_speakers(diarize_segments, result)

        return result
```

Summary Table
Let me know if you’d like code snippets for other files or more detail on UI integration!


Related issues / PRs

Summarize Changes

- `trigger_mode="multiple"` for the buttons
- `default_concurrency_limit` and `max_size` as CLI args when running `app.py`
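For reference, a minimal sketch of how these two pieces could fit together in `app.py`. The argument names match the change list above, but the surrounding structure is an assumption, not the PR's actual diff:

```python
# Sketch only: wiring queue settings as CLI args, per the change list.
import argparse
import gradio as gr

parser = argparse.ArgumentParser()
parser.add_argument("--default_concurrency_limit", type=int, default=1)
parser.add_argument("--max_size", type=int, default=None)
args = parser.parse_args()

def transcribe(text):
    return text  # placeholder for the real handler

with gr.Blocks() as demo:
    inp = gr.Textbox()
    out = gr.Textbox()
    btn = gr.Button("Transcribe")
    # trigger_mode="multiple" lets repeated clicks enqueue instead of being dropped
    btn.click(transcribe, inputs=inp, outputs=out, trigger_mode="multiple")

demo.queue(
    default_concurrency_limit=args.default_concurrency_limit,
    max_size=args.max_size,
).launch()
```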