Note that for externally hosted models, options such as `--device`, which control where a local model is placed, have no effect and should not be used. Just as `--model_args` passes arbitrary arguments to the model constructor for local models, it passes arbitrary arguments to the model API for hosted models. See the hosting service's documentation for the arguments it supports.
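
To make the `--model_args` convention concrete, here is a minimal sketch of how a comma-separated `key=value` string could be turned into keyword arguments before being forwarded to a model constructor or a hosted API. `parse_model_args` is a hypothetical helper written for illustration, not the harness's own parser, and the keys and values shown (`model`, `base_url`, `my-model`, the localhost URL) are assumed placeholders.

```python
def parse_model_args(arg_string: str) -> dict[str, str]:
    """Split 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'} (illustrative only)."""
    args: dict[str, str] = {}
    for pair in arg_string.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty input and trailing commas
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        args[key.strip()] = value.strip()
    return args


# Hypothetical argument string for a hosted, OpenAI-compatible endpoint.
hosted_args = parse_model_args(
    "model=my-model,base_url=http://localhost:8000/v1/completions"
)
print(hosted_args)
```

All values stay as strings in this sketch; a real parser may also convert numbers and booleans, so check the harness's behavior before relying on this one.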
| API or Inference Server | Implemented? |`--model <xxx>` name | Models supported: | Request Types: |
| Neuron via AWS Inf2 (Causal LMs) |:heavy_check_mark:|`neuronx`| Any decoder-only AutoModelForCausalLM supported to run on [huggingface-ami image for inferentia2](https://aws.amazon.com/marketplace/pp/prodview-gr3e6yiscria2)|`generate_until`, `loglikelihood`, `loglikelihood_rolling`|
- |[Neural Magic DeepSparse](https://github.com/neuralmagic/deepsparse)|:heavy_check_mark:|`deepsparse`| Any LM from [SparseZoo](https://sparsezoo.neuralmagic.com/) or on [HF Hub with the "deepsparse" tag](https://huggingface.co/models?other=deepsparse)|`generate_until`, `loglikelihood`|
- |[Neural Magic SparseML](https://github.com/neuralmagic/sparseml)|:heavy_check_mark:|`sparseml`| Any decoder-only AutoModelForCausalLM from [SparseZoo](https://sparsezoo.neuralmagic.com/) or on [HF Hub](https://huggingface.co/neuralmagic). Especially useful for models with quantization like [`zoo:llama2-7b-gsm8k_llama2_pretrain-pruned60_quantized`](https://sparsezoo.neuralmagic.com/models/llama2-7b-gsm8k_llama2_pretrain-pruned60_quantized)|`generate_until`, `loglikelihood`, `loglikelihood_rolling`|
|[Your local inference server!](docs/API_guide.md)|:heavy_check_mark:|`local-completions` or `local-chat-completions`| Support for OpenAI API-compatible servers, with easy customization for other APIs. |`generate_until`, `loglikelihood`, `loglikelihood_rolling`|
@@ -613,7 +611,7 @@ Extras dependencies can be installed via `pip install -e ".[NAME]"`