Welcome to Triton Inference Server Discussions! #5398
Replies: 2 comments
-
I’m using the Merlin multi-stage recommender system example, and both notebooks ran successfully with the default dataset. However, when I try using my own dataset, I encounter an issue during inference from the Triton server. It results in a NoneType error related to the Feast feature repository.

InferenceServerException Traceback (most recent call last)
File D:\build-rec\building-rec.venv\lib\site-packages\merlin\systems\triton\utils.py:230, in send_triton_request(schema, inputs, outputs_list, client, endpoint, request_id, triton_model)
File D:\build-rec\building-rec.venv\lib\site-packages\tritonclient\grpc\_client.py:1572, in InferenceServerClient.infer(self, model_name, inputs, model_version, outputs, request_id, sequence_id, sequence_start, sequence_end, priority, timeout, client_timeout, headers, compression_algorithm, parameters)
File D:\build-rec\building-rec.venv\lib\site-packages\tritonclient\grpc\_utils.py:77, in raise_error_grpc(rpc_error)
InferenceServerException: [StatusCode.INTERNAL] Traceback (most recent call last):
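For reference, here is roughly how the notebook sends the request. The model name, tensor names, and dtypes below are placeholders rather than my exact schema, but the call path (tritonclient.grpc → InferenceServerClient.infer) matches the traceback above:

```python
import numpy as np
import tritonclient.grpc as grpcclient

# Connect to the local Triton gRPC endpoint (default port 8001).
client = grpcclient.InferenceServerClient(url="localhost:8001")

# Placeholder request: a single user id, as the Merlin ensemble expects.
# "user_id", INT32, "executor_model", and "ordered_ids" are illustrative names.
user_ids = np.array([[42]], dtype=np.int32)
infer_input = grpcclient.InferInput("user_id", list(user_ids.shape), "INT32")
infer_input.set_data_from_numpy(user_ids)

requested_output = grpcclient.InferRequestedOutput("ordered_ids")

# client.infer() is where the InferenceServerException above is raised;
# [StatusCode.INTERNAL] indicates the failure happens server-side,
# apparently inside the Feast feature-lookup step.
result = client.infer(
    model_name="executor_model",
    inputs=[infer_input],
    outputs=[requested_output],
)
print(result.as_numpy("ordered_ids"))
```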
-
Subject: Triton + TensorRT-LLM (Llama 3.1 8B) – Feasibility of Stateful Serving + KV Cache Reuse + Priority Caching

Hello everyone, I’m working with Triton Inference Server and the TensorRT-LLM backend to serve the Llama-3.1-8B model. Based on my current setup (attached), my goals for this deployment are:
  - Stateful serving
  - KV cache reuse
  - Priority caching
My question to the community: with this configuration, is this combination feasible?
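To make the question concrete, this is the shape of the client call I have in mind. The model name ("ensemble"), the tensor names ("text_input", "max_tokens", "text_output"), and the use of Triton's generic per-request priority field are assumptions based on the stock tensorrtllm_backend ensemble, not a confirmed recipe for KV cache reuse or priority caching:

```python
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Prompt for the Llama-3.1-8B engine. With KV cache reuse enabled on the
# server side, repeated shared prefixes across requests should be able to
# hit previously computed KV blocks.
prompt = np.array([["You are a helpful assistant.\nUser: hello"]], dtype=object)
max_tokens = np.array([[64]], dtype=np.int32)

text_input = grpcclient.InferInput("text_input", list(prompt.shape), "BYTES")
text_input.set_data_from_numpy(prompt)
max_tokens_input = grpcclient.InferInput("max_tokens", list(max_tokens.shape), "INT32")
max_tokens_input.set_data_from_numpy(max_tokens)

output = grpcclient.InferRequestedOutput("text_output")

# `priority` is Triton's request-scheduling priority; whether and how it maps
# onto TensorRT-LLM's KV-cache retention behaviour is exactly what I'm asking.
result = client.infer(
    model_name="ensemble",
    inputs=[text_input, max_tokens_input],
    outputs=[output],
    priority=1,
)
print(result.as_numpy("text_output"))
```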
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you: