Commit f6d4910

Merge pull request meta-llama#14 from meta-llama/lmm_infer
Create multi_modal_infer.py

2 parents: e1bbffc + e45b4c6
File tree: 2 files changed, +74 −1 lines

recipes/quickstart/inference/local_inference/README.md

Lines changed: 8 additions & 1 deletion
@@ -1,5 +1,12 @@
# Local Inference

For Multi-Modal inference we have added [multi_modal_infer.py](multi_modal_infer.py), which uses the transformers library.

The way to run this would be:
```
python multi_modal_infer.py --image_path "./resources/image.jpg" --prompt_text "Describe this image" --temperature 0.5 --top_p 0.8 --model_name "meta-llama/Llama-3.2-11B-Vision-Instruct" --hf_token <your_hf_token>
```
Note that the script requires a Hugging Face token passed via `--hf_token` for authentication.

For local inference we have provided an [inference script](inference.py). Depending on the type of finetuning performed during training, the [inference script](inference.py) takes different arguments.
If all model parameters were finetuned, the output dir of the training has to be given as the --model_name argument.
In the case of a parameter-efficient method like LoRA, the base model has to be given as --model_name and the output dir of the training as the --peft_model argument.
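For illustration, the two cases would be invoked roughly as follows (placeholder paths, shown as a sketch rather than copied from the repo):
```
# Full-parameter finetuning: the training output dir is the model
python inference.py --model_name <training_config.output_dir> --prompt_file <test_prompt_file>

# Parameter-efficient finetuning (e.g. LoRA): base model plus adapter weights
python inference.py --model_name <base_model_name> --peft_model <training_config.output_dir> --prompt_file <test_prompt_file>
```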
@@ -87,4 +94,4 @@ python inference.py --model_name <training_config.output_dir> --prompt_file <tes

## Inference on large models like Meta Llama 405B
The FP8 quantized variants of Meta Llama (i.e. meta-llama/Meta-Llama-3.1-405B-FP8 and meta-llama/Meta-Llama-3.1-405B-Instruct-FP8) can be executed on a single node with 8x80GB H100 using the scripts located in this folder.
To run the unquantized Meta Llama 405B variants (i.e. meta-llama/Meta-Llama-3.1-405B and meta-llama/Meta-Llama-3.1-405B-Instruct) we need to use a multi-node setup for inference. The llama-recipes inference script currently does not allow multi-node inference. To run this model you can use vLLM with pipeline and tensor parallelism as shown in [this example](../../../3p_integrations/vllm/README.md).
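As a rough, untested sketch of that multi-node route (the linked vLLM example is the authoritative reference; the command below assumes a recent vLLM release and a Ray cluster already spanning the nodes), serving could combine tensor parallelism within each node and pipeline parallelism across nodes:
```
# Assumed values: 8-way tensor parallel per node, 2 pipeline stages across 2 nodes
vllm serve meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 8 --pipeline-parallel-size 2
```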
recipes/quickstart/inference/local_inference/multi_modal_infer.py

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
import os
import sys
import argparse
from PIL import Image as PIL_Image
import torch
from transformers import MllamaForConditionalGeneration, MllamaProcessor


# Constants
DEFAULT_MODEL = "meta-llama/Llama-3.2-11B-Vision-Instruct"


def load_model_and_processor(model_name: str, hf_token: str):
    """
    Load the model and processor based on the 11B or 90B model.
    """
    model = MllamaForConditionalGeneration.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16, token=hf_token)
    processor = MllamaProcessor.from_pretrained(model_name, token=hf_token)
    return model, processor


def process_image(image_path: str) -> PIL_Image.Image:
    """
    Open and convert an image from the specified path.
    """
    if not os.path.exists(image_path):
        print(f"The image file '{image_path}' does not exist.")
        sys.exit(1)
    with open(image_path, "rb") as f:
        return PIL_Image.open(f).convert("RGB")


def generate_text_from_image(model, processor, image, prompt_text: str, temperature: float, top_p: float):
    """
    Generate text from an image using the model and processor.
    """
    conversation = [
        {"role": "user", "content": [{"type": "image"}, {"type": "text", "text": prompt_text}]}
    ]
    prompt = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
    # Pass the image and the chat-formatted prompt as explicit keyword arguments.
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, temperature=temperature, top_p=top_p, max_new_tokens=512)
    # Drop the echoed prompt so only the newly generated text is returned.
    return processor.decode(output[0])[len(prompt):]


def main(image_path: str, prompt_text: str, temperature: float, top_p: float, model_name: str, hf_token: str):
    """
    Call all the functions.
    """
    model, processor = load_model_and_processor(model_name, hf_token)
    image = process_image(image_path)
    result = generate_text_from_image(model, processor, image, prompt_text, temperature, top_p)
    print("Generated Text: " + result)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Generate text from an image and prompt using the 3.2 MM Llama model.")
    parser.add_argument("--image_path", type=str, help="Path to the image file")
    parser.add_argument("--prompt_text", type=str, help="Prompt text to describe the image")
    parser.add_argument("--temperature", type=float, default=0.7, help="Temperature for generation (default: 0.7)")
    parser.add_argument("--top_p", type=float, default=0.9, help="Top p for generation (default: 0.9)")
    parser.add_argument("--model_name", type=str, default=DEFAULT_MODEL, help=f"Model name (default: '{DEFAULT_MODEL}')")
    parser.add_argument("--hf_token", type=str, required=True, help="Hugging Face token for authentication")

    args = parser.parse_args()
    main(args.image_path, args.prompt_text, args.temperature, args.top_p, args.model_name, args.hf_token)
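The helpers above can also be used programmatically rather than via the CLI; a minimal sketch, assuming the script is importable as `multi_modal_infer` and that the image path and token placeholders are replaced with real values:
```
from multi_modal_infer import load_model_and_processor, process_image, generate_text_from_image

# Placeholder values for illustration only
model, processor = load_model_and_processor("meta-llama/Llama-3.2-11B-Vision-Instruct", hf_token="<your_hf_token>")
image = process_image("./resources/image.jpg")
print(generate_text_from_image(model, processor, image, "Describe this image", temperature=0.5, top_p=0.8))
```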
