Commit 979de19

docs(vertex-ai-gemini): Add Gemini PDF support docs and add IT
- Add test case for PDF document summarization using Gemini multimodal capabilities
- Update documentation to reflect PDF support in model comparison table
- Add PDF format to multimodal capabilities documentation
1 parent 87ff101 commit 979de19

4 files changed: 22 additions and 3 deletions


models/spring-ai-vertex-ai-gemini/src/test/java/org/springframework/ai/vertexai/gemini/VertexAiGeminiChatModelIT.java

Lines changed: 17 additions & 0 deletions
@@ -49,6 +49,7 @@
 import org.springframework.core.convert.support.DefaultConversionService;
 import org.springframework.core.io.ClassPathResource;
 import org.springframework.core.io.Resource;
+import org.springframework.util.MimeType;
 import org.springframework.util.MimeTypeUtils;
 
 import static org.assertj.core.api.Assertions.assertThat;
@@ -246,6 +247,22 @@ void multiModalityTest() throws IOException {
 		// https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/intro_multimodal_use_cases.ipynb
 	}
 
+	@Test
+	void multiModalityPdfTest() throws IOException {
+
+		var pdfData = new ClassPathResource("/spring-ai-reference-overview.pdf");
+
+		var userMessage = new UserMessage(
+				"You are a very professional document summarization specialist. Please summarize the given document.",
+				List.of(new Media(new MimeType("application", "pdf"), pdfData)));
+
+		var response = this.chatModel.call(new Prompt(List.of(userMessage)));
+
+		System.out.println(response.getResult().getOutput().getContent());
+
+		assertThat(response.getResult().getOutput().getContent()).containsAnyOf("Spring AI", "portable API");
+	}
+
 	record ActorsFilmsRecord(String actor, List<String> movies) {
 
 	}
Binary file not shown.

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/comparison.adoc

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ This table compares various Chat Models supported by Spring AI, detailing their
 
 | xref::api/chat/anthropic-chat.adoc[Anthropic Claude] | text, image ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::no.svg[width=12] ^a| image::no.svg[width=12] ^a| image::no.svg[width=12]
 | xref::api/chat/azure-openai-chat.adoc[Azure OpenAI] | text, image ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::no.svg[width=12] ^a| image::yes.svg[width=16]
-| xref::api/chat/vertexai-gemini-chat.adoc[Google VertexAI Gemini] | text, image, audio, video ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::no.svg[width=12] ^a| image::yes.svg[width=16]
+| xref::api/chat/vertexai-gemini-chat.adoc[Google VertexAI Gemini] | text, pdf, image, audio, video ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::no.svg[width=12] ^a| image::yes.svg[width=16]
 | xref::api/chat/groq-chat.adoc[Groq (OpenAI-proxy)] | text, image ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::no.svg[width=12] ^a| image::no.svg[width=12] ^a| image::yes.svg[width=16]
 | xref::api/chat/huggingface.adoc[HuggingFace] | text ^a| image::no.svg[width=12] ^a| image::no.svg[width=12] ^a| image::no.svg[width=12] ^a| image::no.svg[width=12] ^a| image::no.svg[width=12] ^a| image::no.svg[width=12] ^a| image::no.svg[width=12]
 | xref::api/chat/mistralai-chat.adoc[Mistral AI] | text ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::yes.svg[width=16] ^a| image::no.svg[width=12] ^a| image::yes.svg[width=16]

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/chat/vertexai-gemini-chat.adoc

Lines changed: 4 additions & 2 deletions
@@ -124,9 +124,11 @@ Read more about xref:api/chat/functions/vertexai-gemini-chat-functions.adoc[Vert
 
 == Multimodal
 
-Multimodality refers to a model's ability to simultaneously understand and process information from various sources, including text, images, audio, and other data formats. This paradigm represents a significant advancement in AI models.
+Multimodality refers to a model's ability to simultaneously understand and process information from various sources, including `text`, `pdf`, `images`, `audio`, and other data formats.
+This paradigm represents a significant advancement in AI models.
 
-Google's Gemini AI models support this capability by comprehending and integrating text, code, audio, images, and video. For more details, refer to the blog post https://blog.google/technology/ai/google-gemini-ai/#introducing-gemini[Introducing Gemini].
+Google's Gemini AI models support this capability by comprehending and integrating text, code, audio, images, and video.
+For more details, refer to the blog post https://blog.google/technology/ai/google-gemini-ai/#introducing-gemini[Introducing Gemini].
 
 Spring AI's `Message` interface supports multimodal AI models by introducing the Media type.
 This type contains data and information about media attachments in messages, using Spring's `org.springframework.util.MimeType` and a `java.lang.Object` for the raw media data.
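
The documentation text above describes a media attachment as a MIME type paired with raw data held as a `java.lang.Object`. The sketch below illustrates only that shape. The `Media` and `UserMessage` records here are hypothetical stand-ins written so the example compiles without Spring on the classpath; they are not the Spring AI classes, the MIME type is a plain `String` rather than Spring's `MimeType`, and `describe` is an invented helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class MediaSketch {

    // Hypothetical stand-in for a media attachment: a MIME type plus raw data as Object.
    record Media(String mimeType, Object data) {
        Media {
            // Minimal sanity check only; real MIME type parsing is stricter.
            if (mimeType == null || !mimeType.contains("/")) {
                throw new IllegalArgumentException("Expected type/subtype but got: " + mimeType);
            }
        }
    }

    // Hypothetical stand-in for a user message: prompt text plus attached media.
    record UserMessage(String text, List<Media> media) {
    }

    // Invented helper: renders the message text followed by each attachment's MIME type.
    static String describe(UserMessage message) {
        StringBuilder sb = new StringBuilder(message.text());
        for (Media m : message.media()) {
            sb.append(" [").append(m.mimeType()).append("]");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Placeholder bytes, not a real PDF document.
        byte[] pdfBytes = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        var message = new UserMessage("Please summarize the given document.",
                List.of(new Media("application/pdf", pdfBytes)));
        System.out.println(describe(message));
    }
}
```

Holding the payload as `Object` lets one attachment type carry a byte array, a resource handle, or a URL, at the cost of the consumer having to check the runtime type before use.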
