-
Notifications
You must be signed in to change notification settings - Fork 2k
Add observability support to VectorStore #1205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add observability support to VectorStore #1205
Conversation
83e0413 to
75c5a9a
Compare
| DB_OPERATION_NAME { | ||
| @Override | ||
| public String asString() { | ||
| return "db.operation.name"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding this attributes coming from the OTel Conventions (db.*) to the AiObservationAttributes enum.
| QUERY { | ||
| @Override | ||
| public String asString() { | ||
| return "db.query.text"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see that this is used to add the text that is eventually translated into embeddings, and not the actual database query. Perhaps we should rename this to something else? How about db.vector.query.content? I expect the instrumentation for the lower-level database to use the db.query.text attribute for the db-specific query like SELECT *, embedding <=> ? AS distance FROM public.vector_store WHERE embedding <=> ? < ? ORDER BY distance LIMIT ? (getting this from PGVector when using the "net.ttddyy.observation:datasource-micrometer-spring-boot" library)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
db.vector.query.content sounds quite reasonable to me.
| DELETE_REQUEST { | ||
| @Override | ||
| public String asString() { | ||
| return "db.vector.delete.request"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to include the content of the documents we add/delete? I'm not sure about a possible use case for this data, it's typically not something included in database instrumentation. I can see it's also not included in LLM products like OpenLLMetry/TraceLoop and OpenLit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The add/delete content is optional (e.g. requires filters) and is disabled by default.
My though was that if you want to debug/trace or provide providence for you app you might need to enable logging/trace this data as well.
But if you and Mark consider that this is too rare need I'm Ok to drop it?
| QUERY_RESPONSE { | ||
| @Override | ||
| public String asString() { | ||
| return "db.vector.query.response"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is used for the documents returned in the response, I would suggest changing it to db.vector.query.response.documents, and establish a namespace (db.vector.query.response.*) we extend in the future to observe response metadata, response dimension, response vectors, and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sounds like a good idea.
| QUERY_METADATA_FILTER { | ||
| @Override | ||
| public String asString() { | ||
| return "db.query.metadata.filter"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would keep using the vector namespace here and name the attribute db.vector.query.metadata.filter or even just db.vector.query.filter
| } | ||
|
|
||
| // DELETE | ||
| private List<String> deleteRequest; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This list of fields might grow a lot once Spring AI add support for more operations (like for updating embeddings for documents that changed). Consider if they can be grouped into some Records perhaps, or even maybe have a base VectorStoreObservationContext (that might be enough for add/delete operations if we remove the need to track the Documents) and specialised implementation whenever relevant (e.g. QueryVectorStoreObservationContext for semantic search)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Conventions and Documentations can remain the same, all grouped
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually intended this initially but then seemed like unnecessary complexity because of the shared context attributes.
I will rethink it again.
| * @author Christian Tzolov | ||
| * @since 1.0.0 | ||
| */ | ||
| public class VectorStoreAddRequestContentObservationFilter implements ObservationFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would consider not having this and remove the related attribute, unless it's needed for some specific use case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as #1205 (comment)
| * @author Christian Tzolov | ||
| * @since 1.0.0 | ||
| */ | ||
| public class VectorStoreDeleteRequestContentObservationFilter implements ObservationFilter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like for the "add" scenario, I would consider not having this and remove the related attribute, unless it's needed for some specific use case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as #1205 (comment)
| .withDimensions(this.embeddingDimensions()) | ||
| .withCollectionName(this.vectorTableName) | ||
| .withNamespace(this.schemaName) | ||
| .withSimilarityMetric(this.getDistanceType().name()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The OTel Conventions have defined a list of well known values for the similarity metric. I would suggest defining a VectorStoreSimilarityMetric enum in the same package as AiOperationType and use those values if possible. That will allow LLM observability backend to recognise the known values. Here, we would need to map between the DB specific name and the convention name.
| .withNamespace(this.schemaName) | ||
| .withSimilarityMetric(this.getDistanceType().name()) | ||
| .withIndexName(this.createIndexMethod.name()) | ||
| .withModel(this.embeddingModel.getClass().getSimpleName()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This attribute should be the model name used for computing embedding (e.g. text-embedding-3-small). It's optional, so if we cannot get the value for now, I would leave this field empty. Also, since we have already instrumented the EmbeddingModel, we already have the model name in the trace as part of the embedding span, so we might even argue whether we want/need this here. It might be needed for cases where it's the vector database itself providing the embedding model, but as far as I know it's not something supported by the Spring AI APIs yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left this open, until the Options PR is merged.
But even now it the EmbeddingModel does not expose getDefaultOptions method (similar to the ChatOptions) to show, at least, the default method name.
So we should either add the get options to the EmbeddingModel or drop this attribute all together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've decide to remove the Model attribute for now. We can consider adding it later if needed.
| @Override | ||
| public VectorStoreObservationContext.Builder createObservationContextBuilder(String operationName) { | ||
|
|
||
| return VectorStoreObservationContext.builder("pg_vector", operationName) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider introducing a VectorStoreProvider enum in the same package as AiProvider, and have the "pg_vector" value as part of that enum (but still leaving the field type as String, similar to how we handle the model provider in the LLM observations).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will create an issue to consider this change once this PR is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added entry for the Simple Vector Store as well.
| @ConfigurationProperties(VectorStoreObservationProperties.CONFIG_PREFIX) | ||
| public class VectorStoreObservationProperties { | ||
|
|
||
| public static final String CONFIG_PREFIX = "spring.ai.vector.store.observations"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config properties for vector stores use the prefix spring.ai.vectorstore.*. Should this be using the same namespace?
| public static final String CONFIG_PREFIX = "spring.ai.vector.store.observations"; | |
| public static final String CONFIG_PREFIX = "spring.ai.vectorstore.observations"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, matching the extra 'dot' is not needed, it implies a sort of grouping under 'vector' as if there could be something else other than 'store'
7804ffa to
b58801b
Compare
- Introduce AbstractObservationVectorStore for instrumentation. Add observe instrumentation for the add, delete and the similaritySearch methods. - Add VectorStoreObservationContext for capturing operation details. - Implement DefaultVectorStoreObservationConvention for naming and tagging - Create filters for add and delete request content observation Implement VectorStoreQueryResponseContentObservationFilter. - Update VectorStore interface with getName() method, returning the class name by default. - Add VectorStoreObservationDocumentation for defining observation keys. - Create VectorStoreObservationAutoConfiguration for auto-configuring observations. - Add VectorStoreObservationProperties for configuring observation behavior. They are used to control the optional, additional, observationcontent filters. - Update PgVectorStoreAutoConfiguration to support observations. - Modify PgVectorStore to extend AbstractObservationVectorStore - Add observation support to PgVectorStore's Builder. Resolves spring-projects#1204
… QUERY values. Add tests for the VectorStore contex, convention and filters.
Add PgVectorObservationIT
b58801b to
a0a71d2
Compare
|
Looks great! |
Resolves #1204