# whisperfile http server

A simple HTTP server. WAV files are passed to the inference model via HTTP requests; MP3, FLAC, and OGG files are automatically converted to WAV with miniaudio.

https://github.com/ggerganov/whisper.cpp/assets/1991296/e983ee53-8741-4eb5-9048-afe5e4594b8f

## Usage

Download the latest release (e.g. `whisperfile-0.9.0`), a model (e.g. `whisper-tiny.en-q5_1.bin`), and then:
```
./whisperfile-0.9.0 -m whisper-tiny.en-q5_1.bin

usage: ./whisperfile-0.9.0 -m whisper-tiny.en-q5_1.bin [options]

options:
  -h,        --help              [default] show this help message and exit
  -t N,      --threads N         [4      ] number of threads to use during computation
  -p N,      --processors N      [1      ] number of processors to use during computation
  -ot N,     --offset-t N        [0      ] time offset in milliseconds
  -on N,     --offset-n N        [0      ] segment index offset
  -d N,      --duration N        [0      ] duration of audio to process in milliseconds
  -mc N,     --max-context N     [-1     ] maximum number of text context tokens to store
  -ml N,     --max-len N         [0      ] maximum segment length in characters
  -sow,      --split-on-word     [false  ] split on word rather than on token
  -bo N,     --best-of N         [2      ] number of best candidates to keep
  -bs N,     --beam-size N       [-1     ] beam size for beam search
  -wt N,     --word-thold N      [0.01   ] word timestamp probability threshold
  -et N,     --entropy-thold N   [2.40   ] entropy threshold for decoder fail
  -lpt N,    --logprob-thold N   [-1.00  ] log probability threshold for decoder fail
  -debug,    --debug-mode        [false  ] enable debug mode (eg. dump log_mel)
  -tr,       --translate         [false  ] translate from source language to english
  -di,       --diarize           [false  ] stereo audio diarization
  -tdrz,     --tinydiarize       [false  ] enable tinydiarize (requires a tdrz model)
  -nf,       --no-fallback       [false  ] do not use temperature fallback while decoding
  -ps,       --print-special     [false  ] print special tokens
  -pc,       --print-colors      [false  ] print colors
  -pr,       --print-realtime    [false  ] print output in realtime
  -pp,       --print-progress    [false  ] print progress
  -nt,       --no-timestamps     [false  ] do not print timestamps
  -l LANG,   --language LANG     [en     ] spoken language ('auto' for auto-detect)
  -dl,       --detect-language   [false  ] exit after automatically detecting language
             --prompt PROMPT     [       ] initial prompt
  -m FNAME,  --model FNAME       [models/ggml-base.en.bin] model path
  -oved D,   --ov-e-device DNAME [CPU    ] the OpenVINO device used for encode inference
             --host HOST,        [127.0.0.1] Hostname/ip-address for the server
             --port PORT,        [8080   ] Port number for the server
```

> [!WARNING]
> **Do not run the server example with administrative privileges and ensure it's operated in a sandbox environment, especially since it involves risky operations like accepting user file uploads. Always validate and sanitize inputs to guard against potential security threats.**

## request examples

**/inference**
```
curl 127.0.0.1:8080/inference \
-H "Content-Type: multipart/form-data" \
-F file="@<file-path>" \
-F temperature="0.0" \
-F temperature_inc="0.2" \
-F response_format="json"
```
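
From a script, the same request can be made without curl. A minimal sketch using only the Python standard library — the `build_multipart` and `transcribe` names are illustrative, and the response is assumed to be JSON with a `text` field (as with `response_format=json`):

```python
import json
import mimetypes
import urllib.request
import uuid

def build_multipart(fields, file_field, file_name, file_bytes):
    """Encode plain form fields plus one file as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "", value]
    ctype = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    lines += [f"--{boundary}",
              f'Content-Disposition: form-data; name="{file_field}"; filename="{file_name}"',
              f"Content-Type: {ctype}", ""]
    # Header block, then raw file bytes, then the closing boundary.
    body = ("\r\n".join(lines).encode() + b"\r\n"
            + file_bytes
            + f"\r\n--{boundary}--\r\n".encode())
    return body, f"multipart/form-data; boundary={boundary}"

def transcribe(path, host="127.0.0.1", port=8080):
    """POST an audio file to /inference and return the transcribed text."""
    with open(path, "rb") as f:
        audio = f.read()
    body, ctype = build_multipart(
        {"temperature": "0.0", "temperature_inc": "0.2", "response_format": "json"},
        "file", path, audio)
    req = urllib.request.Request(f"http://{host}:{port}/inference",
                                 data=body, headers={"Content-Type": ctype})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```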

**/load**
```
curl 127.0.0.1:8080/load \
-H "Content-Type: multipart/form-data" \
-F model="<path-to-model-file>"
```
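
Since `/load` takes a single form field, the multipart body is small enough to build inline. A sketch with the standard library — `load_model_request` is an illustrative name, and the model path is a placeholder:

```python
import urllib.request
import uuid

def load_model_request(model_path, host="127.0.0.1", port=8080):
    """Build the multipart POST asking the server to switch models."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model_path}\r\n"
        f"--{boundary}--\r\n"
    ).encode()
    return urllib.request.Request(
        f"http://{host}:{port}/load",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )

# With a server running: urllib.request.urlopen(load_model_request("whisper-tiny.en-q5_1.bin"))
```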