Skip to content

Conversation

@tzolov
Copy link
Contributor

@tzolov tzolov commented Aug 11, 2024

  • Introduce AbstractObservationVectorStore for instrumentation. Add observe instrumentation for the add, delete and the similaritySearch methods.
  • Add VectorStoreObservationContext for capturing operation details.
  • Implement DefaultVectorStoreObservationConvention for naming and tagging
  • Create filters for add and delete request content observation Implement VectorStoreQueryResponseContentObservationFilter.
  • Update VectorStore interface with getName() method, returning the class name by default.
  • Add VectorStoreObservationDocumentation for defining observation keys.
  • Create VectorStoreObservationAutoConfiguration for auto-configuring observations.
  • Add VectorStoreObservationProperties for configuring observation behavior. They are used to control the optional, additional, observationcontent filters.
  • Update PgVectorStoreAutoConfiguration to support observations.
  • Modify PgVectorStore to extend AbstractObservationVectorStore
  • Add observation support to PgVectorStore's Builder.

Resolves #1204

@tzolov tzolov added this to the 1.0.0-M2 milestone Aug 11, 2024
@tzolov tzolov changed the title [WIP] Add observability support to VectorStore Add observability support to VectorStore Aug 11, 2024
@tzolov tzolov marked this pull request as ready for review August 11, 2024 08:30
@tzolov tzolov force-pushed the vector-store-observation-support branch from 83e0413 to 75c5a9a Compare August 11, 2024 16:51
DB_OPERATION_NAME {
@Override
public String asString() {
return "db.operation.name";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding this attributes coming from the OTel Conventions (db.*) to the AiObservationAttributes enum.

QUERY {
@Override
public String asString() {
return "db.query.text";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see that this is used to add the text that is eventually translated into embeddings, and not the actual database query. Perhaps we should rename this to something else? How about db.vector.query.content? I expect the instrumentation for the lower-level database to use the db.query.text attribute for the db-specific query like SELECT *, embedding <=> ? AS distance FROM public.vector_store WHERE embedding <=> ? < ? ORDER BY distance LIMIT ? (getting this from PGVector when using the "net.ttddyy.observation:datasource-micrometer-spring-boot" library)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db.vector.query.content sounds quite reasonable to me.

DELETE_REQUEST {
@Override
public String asString() {
return "db.vector.delete.request";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to include the content of the documents we add/delete? I'm not sure about a possible use case for this data, it's typically not something included in database instrumentation. I can see it's also not included in LLM products like OpenLLMetry/TraceLoop and OpenLit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The add/delete content is optional (e.g. requires filters) and is disabled by default.
My though was that if you want to debug/trace or provide providence for you app you might need to enable logging/trace this data as well.
But if you and Mark consider that this is too rare need I'm Ok to drop it?

QUERY_RESPONSE {
@Override
public String asString() {
return "db.vector.query.response";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is used for the documents returned in the response, I would suggest changing it to db.vector.query.response.documents, and establish a namespace (db.vector.query.response.*) we extend in the future to observe response metadata, response dimension, response vectors, and so on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that sounds like a good idea.

QUERY_METADATA_FILTER {
@Override
public String asString() {
return "db.query.metadata.filter";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep using the vector namespace here and name the attribute db.vector.query.metadata.filter or even just db.vector.query.filter

}

// DELETE
private List<String> deleteRequest;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This list of fields might grow a lot once Spring AI add support for more operations (like for updating embeddings for documents that changed). Consider if they can be grouped into some Records perhaps, or even maybe have a base VectorStoreObservationContext (that might be enough for add/delete operations if we remove the need to track the Documents) and specialised implementation whenever relevant (e.g. QueryVectorStoreObservationContext for semantic search)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conventions and Documentations can remain the same, all grouped

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually intended this initially but then seemed like unnecessary complexity because of the shared context attributes.
I will rethink it again.

* @author Christian Tzolov
* @since 1.0.0
*/
public class VectorStoreAddRequestContentObservationFilter implements ObservationFilter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would consider not having this and remove the related attribute, unless it's needed for some specific use case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as #1205 (comment)

* @author Christian Tzolov
* @since 1.0.0
*/
public class VectorStoreDeleteRequestContentObservationFilter implements ObservationFilter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like for the "add" scenario, I would consider not having this and remove the related attribute, unless it's needed for some specific use case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as #1205 (comment)

.withDimensions(this.embeddingDimensions())
.withCollectionName(this.vectorTableName)
.withNamespace(this.schemaName)
.withSimilarityMetric(this.getDistanceType().name())
Copy link
Contributor

@ThomasVitale ThomasVitale Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OTel Conventions have defined a list of well known values for the similarity metric. I would suggest defining a VectorStoreSimilarityMetric enum in the same package as AiOperationType and use those values if possible. That will allow LLM observability backend to recognise the known values. Here, we would need to map between the DB specific name and the convention name.

See: https://github.com/open-telemetry/semantic-conventions/pull/1231/files#diff-c3afcbe6f6acbeb5582972b0c76d7aebe0cfc01313c0ed92919b2c18ffe76785R216

.withNamespace(this.schemaName)
.withSimilarityMetric(this.getDistanceType().name())
.withIndexName(this.createIndexMethod.name())
.withModel(this.embeddingModel.getClass().getSimpleName());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This attribute should be the model name used for computing embedding (e.g. text-embedding-3-small). It's optional, so if we cannot get the value for now, I would leave this field empty. Also, since we have already instrumented the EmbeddingModel, we already have the model name in the trace as part of the embedding span, so we might even argue whether we want/need this here. It might be needed for cases where it's the vector database itself providing the embedding model, but as far as I know it's not something supported by the Spring AI APIs yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this open, until the Options PR is merged.
But even now it the EmbeddingModel does not expose getDefaultOptions method (similar to the ChatOptions) to show, at least, the default method name.
So we should either add the get options to the EmbeddingModel or drop this attribute all together.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've decide to remove the Model attribute for now. We can consider adding it later if needed.

@Override
public VectorStoreObservationContext.Builder createObservationContextBuilder(String operationName) {

return VectorStoreObservationContext.builder("pg_vector", operationName)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider introducing a VectorStoreProvider enum in the same package as AiProvider, and have the "pg_vector" value as part of that enum (but still leaving the field type as String, similar to how we handle the model provider in the LLM observations).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will create an issue to consider this change once this PR is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added entry for the Simple Vector Store as well.

@ConfigurationProperties(VectorStoreObservationProperties.CONFIG_PREFIX)
public class VectorStoreObservationProperties {

public static final String CONFIG_PREFIX = "spring.ai.vector.store.observations";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The config properties for vector stores use the prefix spring.ai.vectorstore.*. Should this be using the same namespace?

Suggested change
public static final String CONFIG_PREFIX = "spring.ai.vector.store.observations";
public static final String CONFIG_PREFIX = "spring.ai.vectorstore.observations";

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, matching the extra 'dot' is not needed, it implies a sort of grouping under 'vector' as if there could be something else other than 'store'

@tzolov tzolov force-pushed the vector-store-observation-support branch from 7804ffa to b58801b Compare August 15, 2024 13:37
tzolov added 8 commits August 16, 2024 08:38
 - Introduce AbstractObservationVectorStore for instrumentation.
   Add observe instrumentation for the add, delete and the similaritySearch methods.
 - Add VectorStoreObservationContext for capturing operation details.
 - Implement DefaultVectorStoreObservationConvention for naming and tagging
 - Create filters for add and delete request content observation
   Implement VectorStoreQueryResponseContentObservationFilter.
 - Update VectorStore interface with getName() method, returning the class name by default.
 - Add VectorStoreObservationDocumentation for defining observation keys.
 - Create VectorStoreObservationAutoConfiguration for auto-configuring observations.
 - Add VectorStoreObservationProperties for configuring observation behavior.
   They are used to control the optional, additional, observationcontent filters.
 - Update PgVectorStoreAutoConfiguration to support observations.
 - Modify PgVectorStore to extend AbstractObservationVectorStore
 - Add observation support to PgVectorStore's Builder.

 Resolves spring-projects#1204
… QUERY values.

Add tests for the VectorStore contex, convention and filters.
@tzolov tzolov force-pushed the vector-store-observation-support branch from b58801b to a0a71d2 Compare August 16, 2024 06:38
@ThomasVitale
Copy link
Contributor

Looks great!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Observability] Add observability support for VectorStores

3 participants