Question
Is there an optimal way to run this model using vLLM serve? I don't think my outputs are correct.
Code
LLM Hosting:
```bash
#!/bin/bash
CACHE_DIR="/data/model_cache"
model="models--ibm--granite-docling-258m"
served_name="granite-docling"
HOST=0.0.0.0
PORT=10070

docker run --rm --gpus '"device=0"' \
  -v /data/model_cache:/data/model_cache \
  -p ${PORT}:${PORT} \
  vllm/vllm-openai:v0.11.0 \
  --model ${CACHE_DIR}/${model} \
  --served-model-name ${served_name} \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.5 \
  --port ${PORT} \
  --host ${HOST}
```
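One thing worth checking: the `--model` path above points at the top level of a Hugging Face cache directory (`models--ibm--granite-docling-258m`), which normally contains `snapshots/<hash>/` subdirectories rather than a `config.json` directly. A minimal alternative sketch, assuming the Hub repo id `ibm-granite/granite-docling-258M` and letting vLLM resolve the cache itself via `HF_HOME` (both of these are assumptions about your setup, not tested here):

```bash
docker run --rm --gpus '"device=0"' \
  -v /data/model_cache:/data/model_cache \
  -e HF_HOME=/data/model_cache \
  -p 10070:10070 \
  vllm/vllm-openai:v0.11.0 \
  --model ibm-granite/granite-docling-258M \
  --served-model-name granite-docling \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.5 \
  --host 0.0.0.0 \
  --port 10070
```

Passing the repo id avoids any ambiguity about which snapshot directory vLLM actually loads.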
Test:
```python
import base64
import mimetypes
import os

from openai import OpenAI


def image_to_base64_data_url(image_path):
    """Encode a local image file as a data URL for the OpenAI-style API."""
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"Image file not found: {image_path}")
    mime_type, _ = mimetypes.guess_type(image_path)
    if not mime_type or not mime_type.startswith("image/"):
        mime_type = "image/jpeg"
    with open(image_path, "rb") as image_file:
        encoded_data = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded_data}"


client = OpenAI(
    api_key="none",
    base_url="http://localhost:10070/v1",
)

model_id = "granite-docling"
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Convert this page to docling."},
            {
                "type": "image_url",
                "image_url": {"url": image_to_base64_data_url("test.png")},
            },
        ],
    }
]

response = client.chat.completions.create(
    model=model_id,
    messages=messages,
)
print(response.choices[0].message.content)
```
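Another possible source of the garbled output: the request above uses the OpenAI client defaults, which means sampling at `temperature=1.0`. Document-conversion models like this are usually run with greedy decoding, and sampling can produce exactly the kind of hallucinated text and broken tables shown below. A small sketch of the request parameters I would try instead (`request_kwargs` is a hypothetical helper, and `max_tokens=4096` is an assumed budget that must fit within the 8192-token context alongside the image prompt):

```python
def request_kwargs(model_id, messages):
    """Build chat-completion kwargs with deterministic decoding settings."""
    return {
        "model": model_id,
        "messages": messages,
        "temperature": 0.0,   # greedy decoding instead of the default 1.0
        "max_tokens": 4096,   # room for long DocTags sequences
    }


kwargs = request_kwargs("granite-docling", [{"role": "user", "content": "..."}])
# response = client.chat.completions.create(**kwargs)
```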
Outputs:
```
<loc_12><loc_177><loc_242><loc_287><loc_22><loc_31><loc_129><loc_37>Table 2. Summary of Classifiers
<loc_258><loc_175><loc_297><loc_189><loc_22><loc_305><loc_232><loc_351>Figure 9. We test the sensitivity of our deep nets to data outside the range we trained it on. We generate 77760 light curves for each noise value. We find that the size of the transit depth does not influence the accuracy. Instead, the ratio of transit depth to noise dictates the accuracy of each detection algorithm. Based on this plot we can estimate the number of light curves required to significantly detect a planet below the noise by binning data together.
<loc_23><loc_13><loc_141><loc_21>102 K. A. Pearson et al.
<loc_22><loc_144><loc_144><loc_168>(Specificities like photometric ranges ∀>0: ∼4.4 and regular spectra and luminance domain) of details mentioned in the article and figures:
<loc_118><loc_44><loc_380><loc_141>BLSSVMMLPCNN 1DWavelet MLPInput features180180180180Trainable Parameters318113,93717,293Layers1454Total Neurons1105109105Neural Connections249225442494Training Accuracy (%)73.591.0899.7299.60Training False Pos.(%)22.343.050.080.21Training False Neg.(%)4.105.850.200.19Sensitivity Test (%)63.1483.1088.7388.45Test False Pos. (%)31.582.920.290.25Test False Neg. (%)5.3713.9810.9711.29
<loc_22><loc_357><loc_237><loc_467>presents the same model (examples in Table 3 with base proportions among laboratories). Pedestrian motion, like cartilage, and the ability to capture articulations of the brain were represented as a function of input pain responses. Both single eyes and relevant network behaviours relying on the left occipital lobe the same number of lights are simple features [2]. Parameter
<loc_254><loc_160><loc_465><loc_194>Length and Rotatory Unravelation
<loc_254><loc_191><loc_481><loc_318>Duration in which the incident lights have repeated stores. The observation that light spikes across Photoshop images has been found in Dr.
</loc_254>```
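For anyone inspecting output like this: the `<loc_…>` tokens are DocTags location markers (the coordinates appear to be quantized, reportedly onto a 0–500 grid, though that is my assumption here). A quick self-contained way to separate the location values from the recognized text for a sanity check:

```python
import re

LOC = re.compile(r"<loc_(\d+)>")


def split_doctags_line(line):
    """Return (list of int loc values, remaining text) for one output line."""
    locs = [int(v) for v in LOC.findall(line)]
    text = LOC.sub("", line).strip()
    return locs, text


locs, text = split_doctags_line(
    "<loc_12><loc_177><loc_242><loc_287>Table 2. Summary of Classifiers"
)
```

This makes it easy to see which text regions the model actually emitted versus what it skipped on the page.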
The output ends up missing a lot of the text and doesn't format the tables correctly. Is there a better way to run this model with vLLM without the docling Python package, or is the package the only way?