= Extracting data from images

include::./includes/attributes.adoc[]
include::./includes/customization.adoc[]

Some large language models now support vision inputs, letting you automate tasks like OCR-ing receipts, detecting objects in photos, or generating image captions.
This guide shows you how to build a Quarkus microservice that sends image data, either as a URL or as Base64-encoded content, to a vision-capable LLM (for example, GPT-4o) using Quarkus LangChain4j.

== Prerequisites

* A Quarkus project with the `quarkus-langchain4j-openai` extension (or another model provider that supports a model with vision capabilities)
* `quarkus.langchain4j.openai.api-key` set in `application.properties`
* A vision-capable model (for example `gpt-4.1-mini` or `o3`; the default model has vision capabilities, but you can specify a different one if needed)

[source,properties]
----
quarkus.langchain4j.openai.api-key=${OPENAI_API_KEY}
quarkus.langchain4j.openai.chat-model.model-name=gpt-4.1-mini
----

* Set the temperature to `0.0` for deterministic outputs, especially for tasks like OCR or object detection where precision matters:

[source,properties]
----
quarkus.langchain4j.openai.chat-model.temperature=0
----

== Vision Capability

Vision-capable LLMs can process and understand images alongside text.
Common use cases include:

* **OCR (Optical Character Recognition)** – extract text from receipts, invoices, or documents
* **Object Detection** – identify and classify objects in a photo
* **Image Captioning** – generate descriptive text for an image
* **Visual Question Answering** – answer questions about image content

NOTE: Image payloads count toward the model’s context window limits.
Always validate image size and format before sending.

== Step 1. Define the AI service

Declare an AI Service interface to encapsulate your vision calls:

[source,java]
----
include::{examples-dir}/io/quarkiverse/langchain4j/samples/images/ImageAiService.java[tags=head]
----

Here, `@RegisterAiService` creates the xref:ai-services.adoc[AI Service], and `@SystemMessage` supplies the global instruction for all methods in the service.
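
For orientation, an interface of this shape might look roughly like the following sketch. The interface name, package, and system prompt are invented for illustration; the included sample above is the authoritative version.

[source,java]
----
package org.acme.images; // hypothetical package

import dev.langchain4j.service.SystemMessage;
import io.quarkiverse.langchain4j.RegisterAiService;

// Illustrative skeleton: one system prompt shared by every method of the service.
@RegisterAiService
@SystemMessage("You extract structured information from images. Answer concisely.")
public interface ImageDescriber {
    // Vision methods are added in the next steps.
}
----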

== Step 2. Pass an image by URL

Use `@ImageUrl` to mark a `String` parameter as a remote image URL:

[source,java]
----
include::{examples-dir}/io/quarkiverse/langchain4j/samples/images/ImageAiService.java[tags=head;url]
----
<1> The `@ImageUrl` annotation tells Quarkus LangChain4j to wrap this `String` as an image URL payload.
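
As a rough sketch (the interface, method, and prompt text are invented for illustration, and the import locations assume the usual Quarkus LangChain4j packages), a URL-based method could be declared like this:

[source,java]
----
package org.acme.images; // hypothetical package

import dev.langchain4j.service.UserMessage;
import io.quarkiverse.langchain4j.ImageUrl;
import io.quarkiverse.langchain4j.RegisterAiService;

@RegisterAiService
public interface ImageDescriber {

    // The annotated String is sent to the model as an image URL rather than as plain text.
    @UserMessage("Describe the content of this image.")
    String describe(@ImageUrl String imageUrl);
}
----

The REST endpoint shown next simply forwards the incoming URL to such a method.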

[source,java]
----
include::{examples-dir}/io/quarkiverse/langchain4j/samples/images/Endpoint.java[tags=head;url]
----
<1> This endpoint accepts `?u=<imageUrl>` and returns the extracted data.

== Step 3. Pass images as Base64 data

Use the `Image` data type for local or in-memory images:

[source,java]
----
include::{examples-dir}/io/quarkiverse/langchain4j/samples/images/ImageAiService.java[tags=head;ocr]
----
<1> The `Image` parameter carries Base64-encoded data plus a MIME type.

In your application code, read and encode the image:

[source,java]
----
include::{examples-dir}/io/quarkiverse/langchain4j/samples/images/Endpoint.java[tags=head;ocr]
----
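
If you need to build the `Image` yourself, a minimal sketch looks like the following. It assumes the LangChain4j `Image` builder with `base64Data` and `mimeType` properties; the class and method names are otherwise made up for the example.

[source,java]
----
package org.acme.images; // hypothetical package

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

import dev.langchain4j.data.image.Image;

public class ImageLoader {

    // Reads a local file, Base64-encodes it, and wraps it as a LangChain4j Image.
    public static Image fromFile(Path path, String mimeType) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        String base64 = Base64.getEncoder().encodeToString(bytes);
        return Image.builder()
                .base64Data(base64)
                .mimeType(mimeType) // e.g. "image/png"
                .build();
    }
}
----

The resulting `Image` can then be passed directly to the OCR method of the AI service.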

== Error-Handling Tips

* **Invalid URL or unreachable host:** make sure the URL is well-formed and publicly reachable before sending it to the model.
* **Oversized Base64 payload:** validate the file size (for example, `< 4 MB`) before encoding to avoid context-window errors; see the sketch after this list.
* **Unsupported MIME type:** check the file's MIME type and only accept supported formats such as `image/jpeg` and `image/png`.
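
A minimal pre-flight check along those lines might look like this (the size limit and the allowed formats are assumptions to adapt to your model and use case):

[source,java]
----
package org.acme.images; // hypothetical package

import java.util.Set;

public class ImageValidator {

    private static final long MAX_BYTES = 4L * 1024 * 1024; // assumed ~4 MB cap
    private static final Set<String> ALLOWED_MIME_TYPES = Set.of("image/jpeg", "image/png");

    // Rejects images that are too large or of an unsupported format before they reach the model.
    public static void validate(byte[] imageBytes, String mimeType) {
        if (imageBytes.length > MAX_BYTES) {
            throw new IllegalArgumentException("Image too large: " + imageBytes.length + " bytes");
        }
        if (!ALLOWED_MIME_TYPES.contains(mimeType)) {
            throw new IllegalArgumentException("Unsupported MIME type: " + mimeType);
        }
    }
}
----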

== Conclusion

In this guide, you learned two ways to pass images to a vision-capable LLM using Quarkus LangChain4j:

* By URL with `@ImageUrl`
* By Base64 data with the `Image` type

Next steps:

* Combine text and image inputs in a single prompt for richer multimodal interactions
* Chain image extraction into downstream workflows (e.g., store OCR results in a database)