Replies: 1 comment
-
Hi @sandys, thanks for the questions. DJL Serving is a complete end-to-end solution that I think will fit your needs. You can use DJL Serving to host Huggingface LLMs with our Python, DeepSpeed, or FasterTransformer engines. For mpt-7b, I would recommend either the DeepSpeed or Python engine. You can pull the DJL Serving container image, create a model directory, and in this directory include a serving.properties file that configures the engine and the model.
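A minimal sketch of such a serving.properties, assuming the DeepSpeed engine (the option values here are illustrative, not prescriptive):

```properties
# serving.properties — tells DJL Serving which engine and model to use.
engine=DeepSpeed
# Hugging Face model id (or a local/S3 path); mpt-7b requires remote code.
option.model_id=mosaicml/mpt-7b
option.trust_remote_code=true
# Number of GPUs to shard the model across.
option.tensor_parallel_degree=1
option.task=text-generation
```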
We have default Python handlers that take care of model loading and inference processing, but if you want to create your own you can also include a model.py file in the same directory. You can use our default DeepSpeed handler as a guide: https://github.com/deepjavalibrary/djl-serving/blob/master/engines/python/setup/djl_python/deepspeed.py. You can then run the container and serve the model as sketched below.
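For a custom handler, the Python engine calls a handle function with a djl_python Input and expects an Output back. A minimal sketch, with the model id and generation defaults chosen for illustration:

```python
# model.py — minimal custom handler sketch for the DJL Serving Python engine.
from djl_python import Input, Output
from transformers import pipeline

generator = None

def handle(inputs: Input) -> Output:
    global generator
    if generator is None:
        # Load the model once, on the engine's first call.
        generator = pipeline("text-generation", model="mosaicml/mpt-7b",
                             trust_remote_code=True, device_map="auto")
    if inputs.is_empty():
        # An empty input is the engine's warm-up / model-load call.
        return None
    data = inputs.get_as_json()
    result = generator(data["inputs"],
                       max_new_tokens=data.get("max_new_tokens", 64))
    return Output().add_as_json(result)
```

To launch, a sketch assuming the public deepjavalibrary/djl-serving image and the container's default /opt/ml/model model store:

```bash
# Mount the model directory and expose DJL Serving's default port (8080).
docker run -it --rm --gpus all \
  -v /path/to/model-dir:/opt/ml/model \
  -p 8080:8080 \
  deepjavalibrary/djl-serving

# Once the server is up, send a test request (the model name in the URL
# follows the model directory's name):
curl -X POST http://localhost:8080/predictions/model \
  -H "Content-Type: application/json" \
  -d '{"inputs": "Hello, my name is"}'
```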
We have many examples of using our containers with SageMaker. You can find those here: https://github.com/aws/amazon-sagemaker-examples/tree/main/inference/generativeai
-
Hi all,
Has anyone here successfully loaded and worked with any Huggingface LLM?
We tried to use https://huggingface.co/mosaicml/mpt-7b and our attempt is https://github.com/arakoodev/onnx-djl-example, but it doesn't seem to work (we convert the model to ONNX and try to load it in DJL).
Is there a better way to do this? It seems that this PR was merged (#2637), so I'm wondering whether it is possible now.