Commit 71b8bff

[doc] Add doc for ElasticsearchVectorStore in java (#397)
1 parent bf5beb3 commit 71b8bff

2 files changed: +238 -18 lines changed

api/src/main/java/org/apache/flink/agents/api/vectorstores/VectorStoreQueryMode.java

Lines changed: 4 additions & 6 deletions
@@ -23,15 +23,13 @@
2323
*
2424
* <ul>
2525
* <li>{@link #SEMANTIC}: Use dense vector embeddings and similarity search.
26-
* <li>{@link #KEYWORD}: Use keyword or lexical search when supported by the store.
27-
* <li>{@link #HYBRID}: Combine semantic and keyword search strategies.
2826
* </ul>
2927
*/
3028
public enum VectorStoreQueryMode {
3129
/** Semantic similarity search using embeddings. */
3230
SEMANTIC,
33-
/** Keyword/lexical search (store dependent). */
34-
KEYWORD,
35-
/** Hybrid search combining semantic and keyword results. */
36-
HYBRID;
31+
/** Keyword/lexical search (store dependent). TODO: term-based retrieval */
32+
// KEYWORD,
33+
/** Hybrid search combining semantic and keyword results. TODO: semantic + keyword retrieval */
34+
// HYBRID;
3735
}

docs/content/docs/development/vector_stores.md

Lines changed: 234 additions & 12 deletions
@@ -24,10 +24,6 @@ under the License.
2424

2525
# Vector Stores
2626

27-
{{< hint info >}}
28-
Vector stores are currently supported in the Python API only. Java API support is planned for future releases.
29-
{{< /hint >}}
30-
3127
{{< hint info >}}
3228
This page covers semantic search using vector stores. Additional query modes (keyword, hybrid) are planned for future releases.
3329
{{< /hint >}}
@@ -50,43 +46,99 @@ To use vector stores in your agents, you need to configure both a vector store a
5046

5147
Flink Agents provides decorators to simplify vector store setup within agents:
5248

49+
{{< tabs "Resource Decorators" >}}
50+
51+
{{< tab "Python" >}}
52+
5353
#### @vector_store
5454

5555
The `@vector_store` decorator marks a method that creates a vector store. Vector stores automatically integrate with embedding models for text-based search.
5656

57+
{{< /tab >}}
58+
59+
{{< tab "Java" >}}
60+
61+
#### @VectorStore
62+
63+
The `@VectorStore` annotation marks a method that creates a vector store.
64+
65+
{{< /tab >}}
66+
67+
{{< /tabs >}}
68+
5769
### Query Objects
5870

5971
Vector stores use structured query objects for consistent interfaces:
6072

73+
{{< tabs "Query Objects" >}}
74+
75+
{{< tab "Python" >}}
76+
6177
```python
6278
# Create a semantic search query
6379
query = VectorStoreQuery(
64-
mode=VectorStoreQueryMode.SEMANTIC,
6580
query_text="What is Apache Flink Agents?",
6681
limit=3
6782
)
6883
```
6984

70-
### Query Results
85+
{{< /tab >}}
7186

72-
When you execute a query, you receive a `VectorStoreQueryResult` object that contains the search results:
87+
{{< tab "Java" >}}
7388

74-
```python
75-
# Execute the query
76-
result = vector_store.query(query)
89+
```java
90+
// Create a semantic search query
91+
VectorStoreQuery query = new VectorStoreQuery(
92+
"What is Apache Flink Agents?", // query text
93+
3 // limit
94+
);
7795
```
7896

97+
{{< /tab >}}
98+
99+
{{< /tabs >}}
100+
101+
### Query Results
102+
103+
Executing a query returns a `VectorStoreQueryResult` object that contains the search results.
104+
79105
The `VectorStoreQueryResult` contains:
80106
- **documents**: A list of `Document` objects representing the retrieved results
81107
- Each `Document` has:
82108
- **content**: The actual text content of the document
83109
- **metadata**: Associated metadata (source, category, timestamp, etc.)
84110
- **id**: Unique identifier of the document (if available)
85111
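
The fields listed above can be consumed as in the following sketch. It is illustrative only: `Doc` is a local stand-in written for this example, not the library's `Document` class, and `buildContext` is a hypothetical helper. The sketch shows one way to turn retrieved documents into a context string while tolerating a missing `id`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class QueryResultSketch {

    /** Stand-in for a retrieved document (hypothetical, mirrors the fields listed above). */
    static final class Doc {
        final String id; // may be null when the store does not return an identifier
        final String content;
        final Map<String, Object> metadata;

        Doc(String id, String content, Map<String, Object> metadata) {
            this.id = id;
            this.content = content;
            this.metadata = metadata;
        }
    }

    /** Joins the content of each document into one newline-separated context string. */
    static String buildContext(List<Doc> documents) {
        List<String> parts = new ArrayList<>();
        for (Doc doc : documents) {
            String label = doc.id != null ? doc.id : "unknown";
            parts.add("[" + label + "] " + doc.content);
        }
        return String.join("\n", parts);
    }

    public static void main(String[] args) {
        List<Doc> documents = List.of(
                new Doc("doc-1", "first passage", Map.of("source", "docs")),
                new Doc(null, "second passage", Map.of()));
        System.out.println(buildContext(documents));
    }
}
```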

112+
{{< tabs "Query Results" >}}
113+
114+
{{< tab "Python" >}}
115+
116+
```python
117+
# Execute the query
118+
result = vector_store.query(query)
119+
```
120+
121+
{{< /tab >}}
122+
123+
{{< tab "Java" >}}
124+
125+
```java
126+
// Execute the query
127+
VectorStoreQueryResult result = vectorStore.query(query);
128+
```
129+
130+
{{< /tab >}}
131+
132+
{{< /tabs >}}
133+
86134
### Usage Example
87135

88136
Here's how to define and use vector stores in your agent:
89137

138+
{{< tabs "Usage Example" >}}
139+
140+
{{< tab "Python" >}}
141+
90142
```python
91143
class MyAgent(Agent):
92144

@@ -127,7 +179,6 @@ class MyAgent(Agent):
127179
# Create a semantic search query
128180
user_query = str(event.input)
129181
query = VectorStoreQuery(
130-
mode=VectorStoreQueryMode.SEMANTIC,
131182
query_text=user_query,
132183
limit=3
133184
)
@@ -139,12 +190,73 @@ class MyAgent(Agent):
139190
# Process the retrieved context as needed for your use case
140191
```
141192

193+
{{< /tab >}}
194+
195+
{{< tab "Java" >}}
196+
197+
```java
198+
public class MyAgent extends Agent {
199+
200+
@EmbeddingModelConnection
201+
public static ResourceDescriptor embeddingConnection() {
202+
return ResourceDescriptor.Builder.newBuilder(OpenAIEmbeddingModelConnection.class.getName())
203+
.addInitialArgument("api_key", "your-api-key-here")
204+
.build();
205+
}
206+
207+
@EmbeddingModelSetup
208+
public static ResourceDescriptor embeddingModel() {
209+
return ResourceDescriptor.Builder.newBuilder(OpenAIEmbeddingModelSetup.class.getName())
210+
.addInitialArgument("connection", "embeddingConnection")
211+
.addInitialArgument("model", "text-embedding-3-small")
212+
.build();
213+
}
214+
215+
@VectorStore
216+
public static ResourceDescriptor vectorStore() {
217+
return ResourceDescriptor.Builder.newBuilder(ElasticsearchVectorStore.class.getName())
218+
.addInitialArgument("embedding_model", "embeddingModel")
219+
.addInitialArgument("host", "http://localhost:9200")
220+
.addInitialArgument("index", "my_documents")
221+
.addInitialArgument("vector_field", "content_vector")
222+
.addInitialArgument("dims", 1536)
223+
.build();
224+
}
225+
226+
@Action(listenEvents = InputEvent.class)
227+
public static void searchDocuments(InputEvent event, RunnerContext ctx) {
228+
// Option 1: Manual search via the vector store
229+
VectorStore vectorStore = (VectorStore) ctx.getResource("vectorStore", ResourceType.VECTOR_STORE);
230+
String queryText = (String) event.getInput();
231+
VectorStoreQuery query = new VectorStoreQuery(queryText, 3);
232+
VectorStoreQueryResult result = vectorStore.query(query);
233+
234+
// Option 2: Request context retrieval via built-in events
235+
ctx.sendEvent(new ContextRetrievalRequestEvent(queryText, "vectorStore"));
236+
}
237+
238+
@Action(listenEvents = ContextRetrievalResponseEvent.class)
239+
public static void onSearchResponse(ContextRetrievalResponseEvent event, RunnerContext ctx) {
240+
List<Document> documents = event.getDocuments();
241+
// Process the retrieved documents...
242+
}
243+
}
244+
```
245+
246+
{{< /tab >}}
247+
248+
{{< /tabs >}}
249+
142250
## Built-in Providers
143251

144252
### Chroma
145253

146254
[Chroma](https://www.trychroma.com/home) is an open-source vector database that provides efficient storage and querying of embeddings with support for multiple deployment modes.
147255

256+
{{< hint info >}}
257+
Chroma is currently supported in the Python API only.
258+
{{< /hint >}}
259+
148260
#### Prerequisites
149261

150262
1. Install ChromaDB: `pip install chromadb`
@@ -169,6 +281,10 @@ class MyAgent(Agent):
169281

170282
#### Usage Example
171283

284+
{{< tabs "Chroma Usage Example" >}}
285+
286+
{{< tab "Python" >}}
287+
172288
```python
173289
class MyAgent(Agent):
174290

@@ -208,6 +324,10 @@ class MyAgent(Agent):
208324
...
209325
```
210326

327+
{{< /tab >}}
328+
329+
{{< /tabs >}}
330+
211331
#### Deployment Modes
212332

213333
ChromaDB supports multiple deployment modes:
@@ -265,6 +385,66 @@ def chroma_store() -> ResourceDescriptor:
265385
)
266386
```
267387

388+
### Elasticsearch
389+
390+
[Elasticsearch](https://www.elastic.co/elasticsearch/) is a distributed, RESTful search and analytics engine that supports vector search through dense vector fields and K-Nearest Neighbors (KNN).
391+
392+
{{< hint info >}}
393+
Elasticsearch is currently supported in the Java API only.
394+
{{< /hint >}}
395+
396+
#### Prerequisites
397+
398+
1. An Elasticsearch cluster (version 8.0 or later for KNN support).
399+
2. An index with a `dense_vector` field.
400+
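
For the second prerequisite, a create-index request body along the following lines should be enough to get started. This is a sketch, not something taken from the commit: the helper name `mappingJson` is hypothetical, the field name and dimension are borrowed from the usage example on this page, and the `cosine` similarity is an assumption that should be matched to your embedding model. `index` and `similarity` are set explicitly because early 8.x releases do not index `dense_vector` fields for KNN by default.

```java
public class DenseVectorMappingSketch {

    /** Builds a create-index body that maps a single dense_vector field for KNN search. */
    static String mappingJson(String vectorField, int dims) {
        return "{\"mappings\": {\"properties\": {\"" + vectorField + "\": {"
                + "\"type\": \"dense_vector\", \"dims\": " + dims + ", "
                + "\"index\": true, \"similarity\": \"cosine\"}}}}";
    }

    public static void main(String[] args) {
        // Intended as the body of: PUT /my_documents
        System.out.println(mappingJson("content_vector", 1536));
    }
}
```

The `dims` value in the mapping has to agree with both the `dims` argument of the vector store and the output size of the embedding model.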
401+
#### ElasticsearchVectorStore Parameters
402+
403+
| Parameter | Type | Default | Description |
404+
|-----------|------|---------|-------------|
405+
| `embedding_model` | str | Required | Reference to embedding model resource name |
406+
| `index` | str | Required | Target Elasticsearch index name |
407+
| `vector_field` | str | Required | Name of the dense vector field used for KNN |
408+
| `dims` | int | `768` | Vector dimensionality |
409+
| `k` | int | None | Number of nearest neighbors to return; can be overridden per query |
410+
| `num_candidates` | int | None | Candidate set size for ANN search; can be overridden per query |
411+
| `filter_query` | str | None | Raw JSON Elasticsearch filter query (DSL) applied as a post-filter |
412+
| `host` | str | `"http://localhost:9200"` | Elasticsearch endpoint |
413+
| `hosts` | str | None | Comma-separated list of Elasticsearch endpoints |
414+
| `username` | str | None | Username for basic authentication |
415+
| `password` | str | None | Password for basic authentication |
416+
| `api_key_base64` | str | None | Base64-encoded API key for authentication |
417+
| `api_key_id` | str | None | API key ID for authentication |
418+
| `api_key_secret` | str | None | API key secret for authentication |
419+
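
To make `vector_field`, `k`, and `num_candidates` more concrete, the sketch below builds the `knn` section of an Elasticsearch `_search` request. It illustrates the Elasticsearch request shape only. How `ElasticsearchVectorStore` assembles its own request is not shown in this commit, and `knnSearchBody` is a hypothetical helper.

```java
public class KnnRequestSketch {

    /** Builds a minimal kNN search body; num_candidates must not be smaller than k. */
    static String knnSearchBody(String vectorField, float[] queryVector, int k, int numCandidates) {
        if (numCandidates < k) {
            throw new IllegalArgumentException("num_candidates must not be less than k");
        }
        // Serialize the query embedding as a JSON array of numbers.
        StringBuilder vector = new StringBuilder("[");
        for (int i = 0; i < queryVector.length; i++) {
            if (i > 0) {
                vector.append(", ");
            }
            vector.append(queryVector[i]);
        }
        vector.append("]");
        return "{\"knn\": {\"field\": \"" + vectorField + "\", \"query_vector\": " + vector
                + ", \"k\": " + k + ", \"num_candidates\": " + numCandidates + "}}";
    }

    public static void main(String[] args) {
        // Intended as the body of: POST /my_documents/_search
        System.out.println(knnSearchBody("content_vector", new float[] {0.5f, 0.25f}, 3, 10));
    }
}
```

A larger `num_candidates` generally improves recall at the cost of latency, since more candidates are examined on each shard before the top `k` are returned.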
420+
#### Usage Example
421+
422+
{{< tabs "Elasticsearch Usage Example" >}}
423+
424+
{{< tab "Java" >}}
425+
426+
Here's how to define an Elasticsearch vector store in your Java agent:
427+
428+
```java
429+
@VectorStore
430+
public static ResourceDescriptor vectorStore() {
431+
return ResourceDescriptor.Builder.newBuilder(ElasticsearchVectorStore.class.getName())
432+
.addInitialArgument("embedding_model", "embeddingModel")
433+
.addInitialArgument("host", "http://localhost:9200")
434+
.addInitialArgument("index", "my_documents")
435+
.addInitialArgument("vector_field", "content_vector")
436+
.addInitialArgument("dims", 1536)
437+
// Optional authentication
438+
// .addInitialArgument("username", "elastic")
439+
// .addInitialArgument("password", "secret")
440+
.build();
441+
}
442+
```
443+
444+
{{< /tab >}}
445+
446+
{{< /tabs >}}
447+
268448
## Custom Providers
269449

270450
{{< hint warning >}}
@@ -277,6 +457,10 @@ If you want to use vector stores not offered by the built-in providers, you can
277457

278458
The base class handles text-to-vector conversion and provides the high-level query interface. You only need to implement the core vector search functionality.
279459

460+
{{< tabs "Custom Vector Store" >}}
461+
462+
{{< tab "Python" >}}
463+
280464
```python
281465
class MyVectorStore(BaseVectorStore):
282466
# Add your custom configuration fields here
@@ -294,4 +478,42 @@ class MyVectorStore(BaseVectorStore):
294478
# - kwargs: Vector store-specific parameters
295479
# - Returns: List of Document objects matching the search criteria
296480
pass
297-
```
481+
```
482+
483+
{{< /tab >}}
484+
485+
{{< tab "Java" >}}
486+
487+
```java
488+
public class MyVectorStore extends BaseVectorStore {
489+
490+
public MyVectorStore(
491+
ResourceDescriptor descriptor,
492+
BiFunction<String, ResourceType, Resource> getResource) {
493+
super(descriptor, getResource);
494+
}
495+
496+
@Override
497+
public Map<String, Object> getStoreKwargs() {
498+
// Return vector store-specific configuration
499+
// These parameters are merged with query-specific parameters
500+
Map<String, Object> kwargs = new HashMap<>();
501+
kwargs.put("index", "my_index");
502+
return kwargs;
503+
}
504+
505+
@Override
506+
public List<Document> queryEmbedding(float[] embedding, int limit, Map<String, Object> args) {
507+
// Core method: perform vector search using pre-computed embedding
508+
// - embedding: Pre-computed embedding vector for semantic search
509+
// - limit: Maximum number of results to return
510+
// - args: Vector store-specific parameters
511+
// - Returns: List of Document objects matching the search criteria
512+
return null;
513+
}
514+
}
515+
```
516+
517+
{{< /tab >}}
518+
519+
{{< /tabs >}}
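
The Java `queryEmbedding` stub above leaves the search itself open. As an illustration of what such a method might do internally, here is a brute-force cosine-similarity search over an in-memory map. It is a self-contained sketch that does not extend `BaseVectorStore`: the class and method names are hypothetical, it returns document ids rather than `Document` objects, and a real store would delegate to an index instead of scanning every vector.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InMemorySearchSketch {

    /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns up to `limit` ids, most similar first; `limit` is assumed non-negative. */
    static List<String> topK(Map<String, float[]> index, float[] query, int limit) {
        List<String> ids = new ArrayList<>(index.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> cosine(index.get(id), query)).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(limit, ids.size())));
    }

    public static void main(String[] args) {
        Map<String, float[]> index = new LinkedHashMap<>();
        index.put("a", new float[] {1f, 0f});
        index.put("b", new float[] {0f, 1f});
        index.put("c", new float[] {0.9f, 0.1f});
        System.out.println(topK(index, new float[] {1f, 0f}, 2)); // prints [a, c]
    }
}
```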
