-
Notifications
You must be signed in to change notification settings - Fork 2k
Cassandra vector store builder refactoring #1912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cassandra vector store builder refactoring #1912
Conversation
- Deprecate CassandraVectorStoreConfig - Move its contents to a top level Builder class within the CassandraVectorStore Restructuring packages for Cassandra vector store and chat memory classes
|
@michaelsembwever We are trying to make the vector store builder pattern consistent. See the recently merged chroma vector store for a template. Cassandra vector store seems to be more involved. Since you are the original author of the cassandra vector store, can you take a look at this PR and see what you think? Thank you! |
|
|
||
| import org.springframework.ai.chat.memory.CassandraChatMemory; | ||
| import org.springframework.ai.chat.memory.CassandraChatMemoryConfig; | ||
| import org.springframework.ai.chat.memory.cassandra.CassandraChatMemory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are all vectorstore modules moving their classes to subpackages ? (i don't see that yet with pgvector. has there been some discussion/agreement to this somewhere we can link to ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we are in the process of moving the vector store classes under that structure - in this case - org.springframework.ai.chat.memory.cassandra and springframework.ai.vectorstore.cassandra. The issue was that the vector store classes currently use the same package name in all the different modules - springframework.ai.vectorstore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was hoping that there was some link you could provide to where the discussion had been had and landed on that conclusion and action. but it's ok, i trust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
next time we will reach out to you, i wish we had a more centralized way to communicate changes to frequent contributors. we are working on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i can't read the diff here (because the change is clobbering the move).
would it be possible to do the move in a separate commit, to make the reviewing simpler…?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't see any diff there^ 😔
| * https://github.com/apache/cassandra-java-driver/tree/4.x/manual/core/configuration | ||
| * | ||
| */ | ||
| public static class CassandraBuilder extends AbstractVectorStoreBuilder<CassandraBuilder> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does CassandraBuilder have to be an inner class ?
no strong opinion, but i gut instinct is it's big enough to warrant being a separate java file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can consider that - but wanted to make the usage consistent across all the vector stores.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would suggest that, given the lines of code and that the builder has a little more logic in it the a super simple builder. (i had previously intentionally kept the classes separate for the sake of encapsulation between startup/configuration versus runtime behaviour).
it's not a hill i will die on. if you take past PoVs into account, then i'm happy it's your call to make.
| String[] parts = id.split("§¶"); | ||
| String title = parts[0]; | ||
| int chunk_no = 0 < parts.length ? Integer.parseInt(parts[1]) : 0; | ||
| int chunk_no = parts.length > 0 ? Integer.parseInt(parts[1]) : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will revert that.
| String[] parts = id.split("§¶"); | ||
| String title = parts[0]; | ||
| int chunk_no = 0 < parts.length ? Integer.parseInt(parts[1]) : 0; | ||
| int chunk_no = parts.length > 0 ? Integer.parseInt(parts[1]) : 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will revert that.
| .disallowSchemaChanges(!properties.isInitializeSchema()) | ||
| .returnEmbeddings(properties.getReturnEmbeddings()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a behaviour change here, in that we're no longer respecting the builder's default, imposing the autoConfiguration's default over it. (the builder's default should be canonical)
i'm not sure how important this is… 🤷
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, we should be consistent. I'll create a new issue to capture this and discuss.
|
@sobychacko the overall direction of the refactoring looks good to me, nice work. Threw in a few questions. The reviewing need to separate config-build vs package renaming is the big one. |
|
@michaelsembwever Yes, we are moving to the builder pattern for all the vector store classes to make it consistent. Started with chroma,and there are a few PR's out there that's performing this migration on the other vector stores. Before the M5 release, we hope to complete this migration. |
|
Thanks for your patience and understanding @michaelsembwever I've created #1949 to track the remaining comment in the discussion. I've merged this after updating ref doc and class javadoc. Merged in 25123a5 |
|
@michaelsembwever linking the original epic here: #1825 |
Restructuring packages for Cassandra vector store and chat memory classes