Commit 60301df

Blog post about LangChain4J and PDF analysis (#2233)

1 parent 6c29286 · commit 60301df

File tree: 4 files changed, +204 −1 lines changed

_data/authors.yaml

Lines changed: 7 additions & 1 deletion
@@ -579,4 +579,10 @@ jmartisk:
   emailhash: "165fddadd5535ca662008df08e8ad59b"
   job_title: "Software Engineer"
   twitter: "janmartiska"
-  bio: "Software engineer at Red Hat"
+  bio: "Software engineer at Red Hat"
+melloware:
+  name: "Emil Lefkof"
+
+  emailhash: "50d0c0653cd1a798bcc5ba1cbfc70ded"
+  job_title: "Chief Technology Officer"
+  bio: "Chief Technology Officer at KSM Technology Partners LLC"
Lines changed: 197 additions & 0 deletions

@@ -0,0 +1,197 @@
---
layout: post
title: 'Using LangChain4j to analyze PDF documents'
date: 2025-02-17
tags: user-story langchain4j llm ai
synopsis: 'Learn how to extract structured metadata from PDF documents using LangChain4j and AI to automate document analysis.'
author: melloware
thumbnailimage: /assets/images/posts/quarkus-user-stories/melloware/ksm-logo.png
---

:imagesdir: /assets/images/posts/quarkus-user-stories/melloware
ifdef::env-github,env-browser,env-vscode[:imagesdir: ../assets/images/posts/quarkus-user-stories/melloware]
In my consulting work, clients frequently present us with challenging problems that require innovative solutions.
Recently, we were tasked with extracting structured metadata from PDF documents through automated analysis. Below, I'll share a simplified version of this real-world challenge and how we approached it.

== Use Case

Our client receives compressed archives (.zip files) containing up to hundreds of PDF lease documents that need review. Each document contains property lease details that must be validated for accuracy. The review process involves checking various business rules - for example, identifying leases with terms shorter than 2 years. Currently, this validation is done manually, which is time-consuming. The client wants to automate and streamline the review workflow to improve efficiency.

Some complications with these lease documents are:

* The documents are not in a standard format, so each lease may be written differently by a different property manager.
* The documents may be scanned, so the text is sometimes handwritten rather than typewritten.
* The documents may contain multiple pages, which are not always in the same order.
* The lease term may not be an actual date but written as "Expires five years from the start date" or "Expires on the anniversary of the start date".
* Metadata such as acreage and tax parcel information is needed by our client to validate the lease details.

You can understand why it is time-consuming for a human to review and validate these documents.

== Our Solution

After consulting with https://github.com/dliubarskyi[Dmytro Liubarskyi] and collaborating with the Quarkus team, we implemented a solution using LangChain4j for document metadata extraction. We chose https://ai.google.dev/docs/gemini_api_overview[Google Gemini] as our Large Language Model (LLM) since it excels at PDF analysis through its built-in Optical Character Recognition (OCR) capabilities, enabling accurate text extraction from both digital and scanned documents.

== Technical Details

The application is built using:

* Quarkus - A Kubernetes-native Java framework
* LangChain4j - Java bindings for LangChain to interact with LLMs
* Google Gemini AI - For PDF document analysis and information extraction
* Quarkus REST - For handling multipart file uploads
* HTML/JavaScript frontend - Simple UI for file upload and results display
The backend processes the PDF through these steps:

1. Accepts the PDF upload via multipart form data
2. Converts the PDF content to base64 encoding
3. Sends it to Gemini AI with a structured JSON schema for response formatting
4. Returns parsed lease information in a standardized format
5. Displays the results in a tabular format on the web interface
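Steps 1-3 above can be sketched in isolation. The following is a minimal, self-contained illustration; the temporary file and its `%PDF-1.4 sample` contents are invented stand-ins for a real uploaded lease, and the actual endpoint shown later reads the uploaded file instead:

[source,java]
----
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

public class EncodePdf {

    // Read a PDF from disk and base64-encode it, the same way the
    // backend prepares the document before sending it to Gemini.
    static String encode(Path pdf) throws Exception {
        byte[] fileBytes = Files.readAllBytes(pdf);
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for an uploaded lease document
        Path sample = Files.createTempFile("lease", ".pdf");
        Files.write(sample, "%PDF-1.4 sample".getBytes());
        System.out.println(encode(sample));
    }
}
----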

The main components are:

* `LeaseAnalyzerResource` - REST endpoint for PDF analysis
* `LeaseReport` - Data structure for lease information
* Web interface for file upload and results display
== How it works

First we need a Google Gemini API key. You can get one for free; see the https://ai.google.dev/gemini-api/docs/api-key[Gemini API Key Documentation^] for details.

[source,bash]
----
export GOOGLE_AI_GEMINI_API_KEY=<your-google-ai-gemini-api-key>
----

Next we need to install the LangChain4j dependencies:

[source,xml]
----
<dependency>
    <groupId>io.quarkiverse.langchain4j</groupId>
    <artifactId>quarkus-langchain4j-core</artifactId>
    <version>0.24.0</version>
</dependency>
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-google-ai-gemini</artifactId>
    <version>1.0.0-beta1</version>
</dependency>
----

=== Configure Gemini LLM

Next we need to wire the Gemini LLM into the application, using your Google AI Gemini API key.

[source,java]
----
@ApplicationScoped
public class GoogleGeminiConfig {

    @Produces
    @ApplicationScoped
    ChatLanguageModel model() {
        return GoogleAiGeminiChatModel.builder()
                .apiKey(System.getenv("GOOGLE_AI_GEMINI_API_KEY"))
                .modelName("gemini-2.0-flash")
                .build();
    }
}
----

[NOTE]
====
Quarkus LangChain4j will provide autoconfiguration for Gemini in a future release. Currently, manual configuration is required since the Gemini integration is still evolving, with upstream LangChain4j offering three different modules for Google's AI APIs.
====

=== Define your data structure

Now we need to model the data structure for the lease information that we want the LLM to extract from any lease document. You can customize these fields based on the information you need from your PDF documents; in our use case we are extracting the following:

[source,java]
----
public record LeaseReport(
        LocalDate agreementDate,
        LocalDate termStartDate,
        LocalDate termEndDate,
        LocalDate developmentTermEndDate,
        String landlordName,
        String tenantName,
        String taxParcelId,
        BigDecimal acres,
        Boolean exclusiveRights) {
}
----
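Once the metadata lives in a typed record, business rules like the one from the use case (flagging leases with terms shorter than 2 years) become a few lines of Java. The helper below is a hypothetical sketch, not part of the original application; it uses a trimmed-down stand-in for `LeaseReport` carrying only the two term dates the rule needs:

[source,java]
----
import java.time.LocalDate;
import java.time.Period;

public class LeaseRules {

    // Trimmed-down stand-in for the LeaseReport record above,
    // keeping only the fields this rule needs.
    record Term(LocalDate termStartDate, LocalDate termEndDate) {}

    // Flag leases whose term is shorter than two years (24 months).
    static boolean isShortTerm(Term lease) {
        return Period.between(lease.termStartDate(), lease.termEndDate())
                .toTotalMonths() < 24;
    }

    public static void main(String[] args) {
        Term shortLease = new Term(LocalDate.of(2024, 1, 1), LocalDate.of(2025, 6, 30));
        Term longLease = new Term(LocalDate.of(2024, 1, 1), LocalDate.of(2029, 1, 1));
        System.out.println(isShortTerm(shortLease)); // true
        System.out.println(isShortTerm(longLease));  // false
    }
}
----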

=== Create the REST endpoint

Lastly, we need to create a `LeaseAnalyzerResource` class that uses the LLM to extract the lease information from the PDF document.

[source,java]
----
@Inject
ChatLanguageModel model;

@PUT
@Consumes(MediaType.MULTIPART_FORM_DATA)
@Produces(MediaType.TEXT_PLAIN)
public String upload(@RestForm("file") FileUpload fileUploadRequest) {
    final String fileName = fileUploadRequest.fileName();
    log.infof("Uploading file: %s", fileName);

    try {
        // Read the uploaded file into a byte array for processing
        byte[] fileBytes = Files.readAllBytes(fileUploadRequest.filePath());

        // Encode the PDF content to base64 for transmission
        String documentEncoded = Base64.getEncoder().encodeToString(fileBytes);

        // Create a user message with the PDF content for analysis
        UserMessage userMessage = UserMessage.from(
                TextContent.from("Analyze the given document"),
                PdfFileContent.from(documentEncoded, "application/pdf"));

        // Build the chat request with a JSON response format
        ChatRequest chatRequest = ChatRequest.builder()
                .messages(userMessage)
                .parameters(ChatRequestParameters.builder()
                        .responseFormat(responseFormatFrom(LeaseReport.class))
                        .build())
                .build();

        log.info("Google Gemini analyzing....");
        long startTime = System.nanoTime();
        ChatResponse chatResponse = model.chat(chatRequest);
        long endTime = System.nanoTime();
        String response = chatResponse.aiMessage().text();
        log.infof("Google Gemini analyzed in %.2f seconds: %s", (endTime - startTime) / 1_000_000_000.0, response);

        return response;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}
----
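Because the request constrains the response with a JSON schema derived from `LeaseReport`, the text returned by `chatResponse.aiMessage().text()` is a single JSON object whose fields mirror the record. The values below are invented purely for illustration of the shape:

[source,json]
----
{
  "agreementDate": "2023-03-15",
  "termStartDate": "2023-04-01",
  "termEndDate": "2028-03-31",
  "developmentTermEndDate": "2025-03-31",
  "landlordName": "Jane Example",
  "tenantName": "Acme Wind Partners LLC",
  "taxParcelId": "12-345-678",
  "acres": 42.5,
  "exclusiveRights": true
}
----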

There is a simple HTML/JavaScript frontend that allows you to upload a PDF document and view the results. In the example below, three different lease documents were uploaded and analyzed.

image::lease-analyzer.png[Lease Analyzer Results,title="Lease Analyzer Results"]

You can find the complete example code on https://github.com/melloware/quarkus-lease-analyzer[GitHub^].

== Conclusion

This article demonstrated how LangChain4j and AI can be leveraged to automatically extract structured metadata from PDF documents. By implementing this solution, our client will significantly reduce manual document processing time, potentially saving thousands of work hours annually. The combination of LangChain4j and Google Gemini AI proves to be a powerful approach for automating document analysis workflows.
