MLE-26953 Chunks now capture model name #606
Copyright Validation Results: ✅ Valid Files
✅ All files have valid copyright headers!
Pull request overview
This PR adds support for capturing and storing the model name in chunks when generating embeddings. The change leverages the LangChain4j API's getModelName() method to retrieve the model name from the embedding model and stores it alongside the embedding vector in both JSON and XML chunk documents.
Changes:
- Modified the `Chunk` interface and its implementations to accept a `modelName` parameter in the `addEmbedding` method
- Updated `EmbeddingGenerator` to retrieve and pass the model name when adding embeddings to chunks
- Added comprehensive test coverage to verify model name storage across different document formats and configurations
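The interface change listed above can be pictured with a minimal sketch. It is not the PR's actual code: `ChunkModelNameSketch` and `MapChunk` are hypothetical names, the map stands in for a JSON chunk object, the `embedding` key is a guess, and skipping a null model name is an assumption. Only the `addEmbedding` signature and the `model-name` field name come from the PR description.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ChunkModelNameSketch {

    /** Hypothetical stand-in for the PR's Chunk interface. */
    interface Chunk {
        void addEmbedding(float[] embedding, String modelName);
    }

    /** Map-backed chunk that mimics a JSON chunk object (assumed layout). */
    static class MapChunk implements Chunk {
        final Map<String, Object> fields = new LinkedHashMap<>();

        MapChunk(String text) {
            fields.put("text", text);
        }

        @Override
        public void addEmbedding(float[] embedding, String modelName) {
            fields.put("embedding", embedding);
            // Assumption: omit the field when the model does not report a name.
            if (modelName != null) {
                fields.put("model-name", modelName);
            }
        }
    }

    public static void main(String[] args) {
        MapChunk chunk = new MapChunk("hello world");
        // In the PR the name would come from the embedding model; here it is a literal.
        chunk.addEmbedding(new float[] {0.1f, 0.2f}, "example-model");
        System.out.println(chunk.fields.get("model-name"));
    }
}
```

Passing the name as a second argument keeps the vector and its provenance together in a single call, so a chunk cannot end up with an embedding whose model is unknown unless the caller explicitly passes null.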
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| Chunk.java | Updated interface signature to include modelName parameter |
| DOMChunk.java | Added XML element creation for model-name in the embedding namespace |
| JsonChunk.java | Added model-name field to JSON chunk objects |
| ChunkInputs.java | Added modelName field with getter/setter methods |
| DocumentInputs.java | Updated to pass modelName when setting embeddings on chunks |
| DocumentPipeline.java | Updated wrapper to forward modelName parameter |
| EmbeddingGenerator.java | Captures model name from embedding model and passes it to chunks |
| XmlChunkDocumentProducer.java | Updated method calls to include modelName parameter |
| JsonChunkDocumentProducer.java | Updated method calls to include modelName parameter |
| AbstractEmbeddingTest.java | New test base class with shared test constants including expected model name |
| AddEmbeddingsToXmlTest.java | Extended test coverage to verify model name storage in XML documents |
| AddEmbeddingsToJsonTest.java | Extended test coverage to verify model name storage in JSON documents |
| AddEmbeddingsFromTextTest.java | Extended test coverage to verify model name storage in text-based embeddings |
| TestEmbeddingModel.java | Updated test implementation to match new interface signature |
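For the `DOMChunk.java` row, the following sketch shows one way a `model-name` element could be created in a namespace with the standard Java DOM API. The namespace URI, the class name `DomModelNameSketch`, the `embedding` element name, and the way the vector is serialized are all assumptions made for illustration; the PR only states that a `model-name` element is added in the embedding namespace.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomModelNameSketch {

    // Hypothetical namespace URI; the real one is not given in this PR page.
    static final String EMBEDDING_NS = "http://example.org/embedding";

    /** Appends an embedding element and a model-name element to the chunk element. */
    static Element addEmbedding(Document doc, Element chunk, float[] embedding, String modelName) {
        StringBuilder values = new StringBuilder();
        for (int i = 0; i < embedding.length; i++) {
            if (i > 0) {
                values.append(' ');
            }
            values.append(embedding[i]);
        }
        Element embeddingEl = doc.createElementNS(EMBEDDING_NS, "embedding");
        embeddingEl.setTextContent(values.toString());
        chunk.appendChild(embeddingEl);

        Element modelEl = doc.createElementNS(EMBEDDING_NS, "model-name");
        modelEl.setTextContent(modelName);
        chunk.appendChild(modelEl);
        return modelEl;
    }

    /** Creates an empty namespace-aware document. */
    static Document newDocument() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Element chunk = doc.createElement("chunk");
        doc.appendChild(chunk);
        Element modelEl = addEmbedding(doc, chunk, new float[] {0.1f, 0.2f}, "example-model");
        System.out.println(modelEl.getLocalName() + " = " + modelEl.getTextContent());
    }
}
```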
 * @param embedding
 * @param modelName
The parameter documentation is incomplete. Add descriptions explaining what each parameter represents. For example: '@param embedding the vector embedding data as a float array' and '@param modelName the name of the model used to generate the embedding'.
Suggested change:
- * @param embedding
- * @param modelName
+ * @param embedding the vector embedding data associated with this chunk
+ * @param modelName the name of the model used to generate the embedding
Force-pushed from 5674a29 to 8b2ed75
This uses `embeddingModel.getModelName()` from the LangChain4j API. It will let the Nuclia integration easily add the model name found in each chunk response.
Force-pushed from 8b2ed75 to 293d991