Question
Is there an optimal way to run this model using vLLM serve? I don't think my outputs are correct.
Code
LLM Hosting:
```bash
#!/bin/bash
CACHE_DIR="/data/model_cache"
model="models--ibm--granite-docling-258m"
served_name="granite-docling"
HOST=0.0.0.0
PORT=10070

docker run --rm --gpus '"device=0"' \
  -v /data/model_cache:/data/model_cache \
  -p ${PORT}:${PORT} \
  vllm/vllm-openai:v0.11.0 \
  --model ${CACHE_DIR}/${model} \
  --served-model-name ${served_name} \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.5 \
  --port ${PORT} \
  --host ${HOST}
```
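One thing worth checking: the `--model` path above points at the top level of a Hugging Face cache directory (`models--ibm--granite-docling-258m`), which normally contains `snapshots/<hash>/` subdirectories rather than a `config.json` directly. A minimal alternative sketch, assuming the Hub repo id `ibm-granite/granite-docling-258M` and letting vLLM resolve the cache itself via `HF_HOME` (both of these are assumptions about your setup, not tested here):

```bash
docker run --rm --gpus '"device=0"' \
  -v /data/model_cache:/data/model_cache \
  -e HF_HOME=/data/model_cache \
  -p 10070:10070 \
  vllm/vllm-openai:v0.11.0 \
  --model ibm-granite/granite-docling-258M \
  --served-model-name granite-docling \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.5 \
  --host 0.0.0.0 \
  --port 10070
```

Passing the repo id avoids any ambiguity about which snapshot directory vLLM actually loads.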
Test:
```python
import base64
import mimetypes
import os

from openai import OpenAI


def image_to_base64_data_url(image_path):
    """Encode a local image file as a data URL for the OpenAI-style API."""
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"Image file not found: {image_path}")
    mime_type, _ = mimetypes.guess_type(image_path)
    if not mime_type or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"
    with open(image_path, "rb") as image_file:
        encoded_data = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded_data}"


client = OpenAI(
    api_key="none",
    base_url="http://localhost:10070/v1",
)

model_id = "granite-docling"
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Convert this page to docling."},
            {
                "type": "image_url",
                "image_url": {"url": image_to_base64_data_url("test.png")},
            },
        ],
    }
]

response = client.chat.completions.create(
    model=model_id,
    messages=messages,
)
print(response.choices[0].message.content)
```
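Another possible source of the garbled output: the request above uses the OpenAI client defaults, which means sampling at `temperature=1.0`. Document-conversion models like this are usually run with greedy decoding, and sampling can produce exactly the kind of hallucinated text and broken tables shown below. A small sketch of the request parameters I would try instead (`request_kwargs` is a hypothetical helper, and `max_tokens=4096` is an assumed budget that must fit within the 8192-token context alongside the image prompt):

```python
def request_kwargs(model_id, messages):
    """Build chat-completion kwargs with deterministic decoding settings."""
    return {
        "model": model_id,
        "messages": messages,
        "temperature": 0.0,   # greedy decoding instead of the default 1.0
        "max_tokens": 4096,   # room for long DocTags sequences
    }


kwargs = request_kwargs("granite-docling", [{"role": "user", "content": "..."}])
# response = client.chat.completions.create(**kwargs)
```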
Outputs:
```
<loc_12><loc_177><loc_242><loc_287><loc_22><loc_31><loc_129><loc_37>Table 2. Summary of Classifiers
<loc_258><loc_175><loc_297><loc_189><loc_22><loc_305><loc_232><loc_351>Figure 9. We test the sensitivity of our deep nets to data outside the range we trained it on. We generate 77760 light curves for each noise value. We find that the size of the transit depth does not influence the accuracy. Instead, the ratio of transit depth to noise dictates the accuracy of each detection algorithm. Based on this plot we can estimate the number of light curves required to significantly detect a planet below the noise by binning data together.
<loc_23><loc_13><loc_141><loc_21>102 K. A. Pearson et al.
<loc_22><loc_144><loc_144><loc_168>(Specificities like photometric ranges ∀>0: ∼4.4 and regular spectra and luminance domain) of details mentioned in the article and figures:
<loc_118><loc_44><loc_380><loc_141>BLSSVMMLPCNN 1DWavelet MLPInput features180180180180Trainable Parameters318113,93717,293Layers1454Total Neurons1105109105Neural Connections249225442494Training Accuracy (%)73.591.0899.7299.60Training False Pos.(%)22.343.050.080.21Training False Neg.(%)4.105.850.200.19Sensitivity Test (%)63.1483.1088.7388.45Test False Pos. (%)31.582.920.290.25Test False Neg. (%)5.3713.9810.9711.29
<loc_22><loc_357><loc_237><loc_467>presents the same model (examples in Table 3 with base proportions among laboratories). Pedestrian motion, like cartilage, and the ability to capture articulations of the brain were represented as a function of input pain responses. Both single eyes and relevant network behaviours relying on the left occipital lobe the same number of lights are simple features [2]. Parameter
<loc_254><loc_160><loc_465><loc_194>Length and Rotatory Unravelation
<loc_254><loc_191><loc_481><loc_318>Duration in which the incident lights have repeated stores. The observation that light spikes across Photoshop images has been found in Dr.
</loc_254>```
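For anyone inspecting output like this: the `<loc_…>` tokens are DocTags location markers (the coordinates appear to be quantized, reportedly onto a 0–500 grid, though that is my assumption here). A quick self-contained way to separate the location values from the recognized text for a sanity check:

```python
import re

LOC = re.compile(r"<loc_(\d+)>")


def split_doctags_line(line):
    """Return (list of int loc values, remaining text) for one output line."""
    locs = [int(v) for v in LOC.findall(line)]
    text = LOC.sub("", line).strip()
    return locs, text


locs, text = split_doctags_line(
    "<loc_12><loc_177><loc_242><loc_287>Table 2. Summary of Classifiers"
)
```

This makes it easy to see which text regions the model actually emitted versus what it skipped on the page.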
The output ends up missing a lot of the text and doesn't format the tables correctly. Is there a better way to run this model with vLLM without the docling Python package, or is the package the only way?