feat: add distributed embedding table #42258
Additional Comments (1)
ee/hogai/llm_traces_summaries/tools/embed_summaries.py, line 82 (logic): After the migration, new embeddings will be written to sharded_posthog_document_embeddings, but this query still references the old posthog_document_embeddings table, so newly written embeddings will not be found.
6 files reviewed, 1 comment
Pull request overview
This PR migrates the document embeddings table from a single replicated architecture to a distributed sharded architecture to support higher frequency data ingestion.
- Introduces sharded table architecture with separate read and write distributed tables
- Updates all references to use the new distributed table for reads
- Provides migration to transition from old to new table structure while preserving existing data
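The split into a write-side and a read-side distributed table over one sharded table can be sketched as DDL generation. This is a minimal Python sketch, not the PR's actual SQL-generation code. It assumes ClickHouse's CREATE TABLE ... AS ... ENGINE = Distributed(cluster, database, table[, sharding_key]) form. The cluster and database name "posthog", the writable table name writable_posthog_document_embeddings, and the document_id column hashed by cityHash64 are hypothetical, because the PR summary does not spell them out.

```python
from typing import Optional


def distributed_ddl(
    table: str,
    cluster: str,
    database: str,
    sharded_table: str,
    sharding_key: Optional[str] = None,
) -> str:
    """Build a CREATE TABLE statement for a Distributed table over a sharded table.

    The Distributed table stores no data itself; it forwards reads (and, when a
    sharding key is given, inserts) to the local sharded table on each shard.
    """
    engine_args = [f"'{cluster}'", f"'{database}'", f"'{sharded_table}'"]
    if sharding_key is not None:
        # The sharding key is an expression, so it is not quoted.
        engine_args.append(sharding_key)
    return (
        f"CREATE TABLE IF NOT EXISTS {database}.{table} "
        f"AS {database}.{sharded_table} "
        f"ENGINE = Distributed({', '.join(engine_args)})"
    )


# Hypothetical names: only the sharded_ and distributed_ table names appear in the PR.
CLUSTER = "posthog"
DATABASE = "posthog"
SHARDED = "sharded_posthog_document_embeddings"

# Write path: the sharding key decides which shard each inserted row lands on.
writable_sql = distributed_ddl(
    "writable_posthog_document_embeddings",
    CLUSTER,
    DATABASE,
    SHARDED,
    sharding_key="cityHash64(document_id)",
)

# Read path: queries fan out to every shard and the results are merged.
read_sql = distributed_ddl(
    "distributed_posthog_document_embeddings", CLUSTER, DATABASE, SHARDED
)

print(writable_sql)
print(read_sql)
```

By default, ClickHouse rejects inserts into a multi-shard Distributed table that has no sharding key. That is why the sketch attaches a key only to the writable table. The PR summary does not say whether its read table also carries one.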
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| products/error_tracking/backend/embedding.py | Adds sharded and distributed table definitions with a cityHash64 sharding key; updates the table SQL generation functions to create the sharded architecture |
| posthog/hogql_queries/test/test_document_embeddings_query_runner.py | Updates test queries to use distributed_posthog_document_embeddings table |
| posthog/hogql/database/schema/document_embeddings.py | Updates schema references to point to distributed table for HogQL queries |
| posthog/clickhouse/schema.py | Adds imports for new distributed and writable table SQL functions |
| posthog/clickhouse/migrations/0182_shard_document_embeddings.py | Migration script that drops old MV/Kafka/writable tables and creates new sharded infrastructure |
| posthog/clickhouse/migrations/max_migration.txt | Updates migration tracker to 0182 |
| posthog/clickhouse/test/snapshots/test_schema.ambr | Auto-generated snapshot updates reflecting new table schemas |
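The cityHash64 sharding key determines which shard receives each inserted row. This is a minimal sketch of the routing rule ClickHouse documents for Distributed tables. The remainder of the sharding key modulo the total shard weight selects the shard whose half-open weight interval contains it. The hash here is a blake2b-based stand-in, not cityHash64, so actual shard assignments will differ. The document IDs and the three equally weighted shards are hypothetical.

```python
import hashlib
from collections import Counter


def stand_in_hash64(value: str) -> int:
    """64-bit hash used as a stand-in for ClickHouse's cityHash64.

    NOT cityHash64: real shard assignments differ. Only the routing logic in
    pick_shard is meant to mirror the Distributed engine.
    """
    digest = hashlib.blake2b(value.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def pick_shard(sharding_key: int, weights: list) -> int:
    """Return the shard index for a sharding-key value, given per-shard weights.

    The remainder of sharding_key modulo the total weight falls into exactly one
    shard's interval [prev_weights, prev_weights + weight).
    """
    total = sum(weights)
    remainder = sharding_key % total
    upper = 0
    for index, weight in enumerate(weights):
        upper += weight
        if remainder < upper:
            return index
    raise AssertionError("unreachable when total weight is positive")


# Hypothetical document IDs; the PR does not say which column is hashed.
doc_ids = [f"doc-{i}" for i in range(1000)]
placement = Counter(pick_shard(stand_in_hash64(d), [1, 1, 1]) for d in doc_ids)
print(sorted(placement.items()))
```

The shard is a pure function of the key. As a result, every row for a given key lands on the same shard, and a change in the number or weights of shards changes the modulus and therefore the placement of existing keys.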
Branch updated from 2ef3659 to ecbfcaf.
We want to use this table for higher-frequency data.