@oliverb123
Contributor

We want to use the document embeddings table for higher-frequency data.

@oliverb123 oliverb123 marked this pull request as ready for review November 28, 2025 15:07
Copilot AI review requested due to automatic review settings November 28, 2025 15:07
@oliverb123 oliverb123 requested review from a team as code owners November 28, 2025 15:07
Copilot finished reviewing on behalf of oliverb123 November 28, 2025 15:10
Contributor

@greptile-apps greptile-apps bot left a comment

Additional Comments (1)

  1. ee/hogai/llm_traces_summaries/tools/embed_summaries.py, line 82 (link)

    logic: After the migration, new embeddings will be written to sharded_posthog_document_embeddings, but this query still references the old posthog_document_embeddings table. This will cause new embeddings to not be found.
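One way to catch this kind of stale reference is a small check over the query text. The sketch below is illustrative only: `references_legacy_table` is a hypothetical helper, not something in the PostHog codebase, and it assumes reads should go through `distributed_posthog_document_embeddings`, as the pull request overview describes.

```python
import re

# Table names taken from this review thread; the helper itself is hypothetical.
LEGACY_TABLE = "posthog_document_embeddings"
DISTRIBUTED_TABLE = "distributed_posthog_document_embeddings"

# Match the legacy name only when it is not part of a longer identifier,
# so "distributed_posthog_document_embeddings" is not flagged.
_LEGACY_PATTERN = re.compile(
    rf"(?<![A-Za-z0-9_]){re.escape(LEGACY_TABLE)}(?![A-Za-z0-9_])"
)


def references_legacy_table(sql: str) -> bool:
    """Return True if the SQL text still names the pre-migration table."""
    return _LEGACY_PATTERN.search(sql) is not None


def point_at_distributed_table(sql: str) -> str:
    """Rewrite bare legacy-table references to the distributed table."""
    return _LEGACY_PATTERN.sub(DISTRIBUTED_TABLE, sql)
```

A check like this could run in a test over the query strings in `embed_summaries.py`; whether a plain rename is the right fix depends on how that query is built.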

6 files reviewed, 1 comment

Contributor

Copilot AI left a comment

Pull request overview

This PR migrates the document embeddings table from a single replicated architecture to a distributed sharded architecture to support higher frequency data ingestion.

  • Introduces sharded table architecture with separate read and write distributed tables
  • Updates all references to use the new distributed table for reads
  • Provides migration to transition from old to new table structure while preserving existing data
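
The layout described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual DDL (which lives in `products/error_tracking/backend/embedding.py`): the column list, the `ORDER BY` key, the `ReplicatedMergeTree` engine choice, the cluster and database names, the `writable_` table name, and the use of `document_id` as the `cityHash64` argument are all assumptions made for the example.

```python
# Table names for the sharded and distributed tables come from this thread;
# WRITABLE_TABLE and everything about the schema below are assumptions.
SHARDED_TABLE = "sharded_posthog_document_embeddings"
DISTRIBUTED_TABLE = "distributed_posthog_document_embeddings"
WRITABLE_TABLE = "writable_posthog_document_embeddings"

# Hypothetical, simplified column set.
COLUMNS = """
    team_id Int64,
    document_id String,
    embedding Array(Float64),
    timestamp DateTime64(3, 'UTC')
""".strip()


def sharded_table_sql(cluster: str) -> str:
    """Local per-shard table that physically stores the rows."""
    return f"""
CREATE TABLE IF NOT EXISTS {SHARDED_TABLE} ON CLUSTER '{cluster}'
(
    {COLUMNS}
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{{shard}}/{SHARDED_TABLE}', '{{replica}}')
ORDER BY (team_id, document_id)
""".strip()


def distributed_table_sql(name: str, cluster: str, database: str = "default") -> str:
    """Distributed table that routes reads or writes to the sharded table."""
    return f"""
CREATE TABLE IF NOT EXISTS {name} ON CLUSTER '{cluster}'
(
    {COLUMNS}
)
ENGINE = Distributed('{cluster}', '{database}', '{SHARDED_TABLE}', cityHash64(document_id))
""".strip()


if __name__ == "__main__":
    cluster = "posthog"  # assumed cluster name
    print(sharded_table_sql(cluster))
    print(distributed_table_sql(DISTRIBUTED_TABLE, cluster))  # read path
    print(distributed_table_sql(WRITABLE_TABLE, cluster))     # write path
```

Both distributed tables point at the same sharded table, and the sharding key sends every row with the same key to the same shard. Keeping separate read and write tables lets the two paths be changed independently.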

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| products/error_tracking/backend/embedding.py | Adds sharded and distributed table definitions with cityHash64 sharding key; updates table SQL generation functions to create sharded architecture |
| posthog/hogql_queries/test/test_document_embeddings_query_runner.py | Updates test queries to use distributed_posthog_document_embeddings table |
| posthog/hogql/database/schema/document_embeddings.py | Updates schema references to point to distributed table for HogQL queries |
| posthog/clickhouse/schema.py | Adds imports for new distributed and writable table SQL functions |
| posthog/clickhouse/migrations/0182_shard_document_embeddings.py | Migration script that drops old MV/Kafka/writable tables and creates new sharded infrastructure |
| posthog/clickhouse/migrations/max_migration.txt | Updates migration tracker to 0182 |
| posthog/clickhouse/test/snapshots/test_schema.ambr | Auto-generated snapshot updates reflecting new table schemas |

@oliverb123 oliverb123 force-pushed the distributed-embedding-table branch from 2ef3659 to ecbfcaf Compare November 28, 2025 15:28
@oliverb123 oliverb123 merged commit e90f0bf into master Nov 28, 2025
177 of 178 checks passed
@oliverb123 oliverb123 deleted the distributed-embedding-table branch November 28, 2025 16:25