
Commit c2a741f

Update InferenceAPI docs (TGI update mostly) (#1432)

* Update InferenceAPI docs (TGI update mostly)
* fix script
* fix again

1 parent 2b60350 commit c2a741f

File tree

5 files changed: +80 −59 lines changed


docs/api-inference/tasks/automatic-speech-recognition.md

Lines changed: 2 additions & 3 deletions
```diff
@@ -30,7 +30,6 @@ For more details about the `automatic-speech-recognition` task, check out its [d
 ### Recommended models
 
-- [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3): A powerful ASR model by OpenAI.
 - [facebook/seamless-m4t-v2-large](https://huggingface.co/facebook/seamless-m4t-v2-large): An end-to-end model that performs ASR and Speech Translation by MetaAI.
 - [pyannote/speaker-diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1): Powerful speaker diarization model.
 
 This is only a subset of the supported models. Find the model that suits you best [here](https://huggingface.co/models?inference=warm&pipeline_tag=automatic-speech-recognition&sort=trending).
@@ -117,9 +116,9 @@ To use the JavaScript client, see `huggingface.js`'s [package reference](https:/
 | **                epsilon_cutoff** | _number_ | If set to float strictly between 0 and 1, only tokens with a conditional probability greater than epsilon_cutoff will be sampled. In the paper, suggested values range from 3e-4 to 9e-4, depending on the size of the model. See [Truncation Sampling as Language Model Desmoothing](https://hf.co/papers/2210.15191) for more details. |
 | **                eta_cutoff** | _number_ | Eta sampling is a hybrid of locally typical sampling and epsilon sampling. If set to float strictly between 0 and 1, a token is only considered if it is greater than either eta_cutoff or sqrt(eta_cutoff) * exp(-entropy(softmax(next_token_logits))). The latter term is intuitively the expected next token probability, scaled by sqrt(eta_cutoff). In the paper, suggested values range from 3e-4 to 2e-3, depending on the size of the model. See [Truncation Sampling as Language Model Desmoothing](https://hf.co/papers/2210.15191) for more details. |
 | **                max_length** | _integer_ | The maximum length (in tokens) of the generated text, including the input. |
-| **                max_new_tokens** | _integer_ | The maximum number of tokens to generate. Takes precedence over maxLength. |
+| **                max_new_tokens** | _integer_ | The maximum number of tokens to generate. Takes precedence over max_length. |
 | **                min_length** | _integer_ | The minimum length (in tokens) of the generated text, including the input. |
-| **                min_new_tokens** | _integer_ | The minimum number of tokens to generate. Takes precedence over maxLength. |
+| **                min_new_tokens** | _integer_ | The minimum number of tokens to generate. Takes precedence over min_length. |
 | **                do_sample** | _boolean_ | Whether to use sampling instead of greedy decoding when generating new tokens. |
 | **                early_stopping** | _enum_ | Possible values: never, true, false. |
 | **                num_beams** | _integer_ | Number of beams to use for beam search. |
```
