@514Ben (Contributor) commented Jan 22, 2026

This pull request expands the Moose data ingestion documentation with a guide to real-time Change Data Capture (CDC) from SAP HANA, covering CDC concepts, architectural patterns, and setup steps. It explains when to use CDC instead of traditional batch loading and gives hands-on instructions for building real-time data pipelines.

Major documentation improvements:

New CDC ingestion guide and setup instructions:

  • Added a step-by-step guide for setting up real-time CDC from SAP HANA to ClickHouse, including prerequisites, pipeline installation, configuration, model generation, CDC infrastructure initialization, pipeline operation, monitoring, troubleshooting, and performance considerations.
  • Clarified Moose's support for multiple ingestion patterns (batch, CDC, streaming, API) and linked to the new CDC section for real-time requirements.

CDC concepts and architecture:

  • Introduced a new section explaining Change Data Capture (CDC), including how it works, the differences between CDC and traditional ETL, and various CDC implementation patterns (trigger-based, log-based, query-based).
  • Detailed the SAP HANA CDC architecture, highlighting the flow from source tables through triggers and CDC tables to the Moose workflow and ClickHouse, along with benefits and trade-offs.
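To make the trigger-based pattern concrete, here is a minimal sketch that uses SQLite (from the Python standard library) as a stand-in for SAP HANA. The `orders` and `orders_cdc` tables, the trigger names, and the `op` codes are all hypothetical and are not taken from the guide; the point is only that triggers on a source table can append one row per change to a separate CDC table.

```python
import sqlite3

# In-memory SQLite database standing in for the source system (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);

-- Hypothetical CDC table: one row per captured change, in commit order.
CREATE TABLE orders_cdc (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op        TEXT NOT NULL,      -- 'I', 'U' or 'D'
    id        INTEGER NOT NULL,
    status    TEXT
);

CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO orders_cdc (op, id, status) VALUES ('I', NEW.id, NEW.status);
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_cdc (op, id, status) VALUES ('U', NEW.id, NEW.status);
END;
CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO orders_cdc (op, id, status) VALUES ('D', OLD.id, NULL);
END;
""")

# Ordinary writes to the source table; the triggers record each one.
conn.execute("INSERT INTO orders (id, status) VALUES (1, 'new')")
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
conn.execute("DELETE FROM orders WHERE id = 1")

changes = conn.execute(
    "SELECT op, id, status FROM orders_cdc ORDER BY change_id"
).fetchall()
print(changes)
```

The source table ends up empty, but the CDC table still holds the full history of the row, which is what a downstream consumer would read.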

Decision guidance and performance:

  • Provided a comparison table and guidance for choosing between CDC and batch loading based on latency, efficiency, complexity, and use case. [1] [2]
  • Added notes on performance, resource usage, and troubleshooting common CDC issues.

These changes give users who need real-time data synchronization concrete setup steps and explain the CDC patterns those steps rely on.


Note

Expands the data ingestion docs with a full real-time CDC path from SAP HANA to ClickHouse and adds CDC concepts to the architecture section.

  • Adds "Option 3: Real-Time CDC from SAP HANA" with prerequisites, installation via 514 Labs registry, env config, model generation, trigger setup, pipeline run/monitoring, verification, troubleshooting, and performance notes
  • Introduces CDC overview in Part 3 (how it works, CDC vs ETL, implementation patterns, SAP HANA trigger-based architecture with ReplacingMergeTree) including benefits/trade-offs
  • Clarifies Moose supports multiple ingestion patterns (batch, CDC, streaming, API) and positions this guide around batch while linking to the CDC path
  • Provides decision guidance with a CDC vs batch comparison table and when-to-use recommendations
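The summary mentions ReplacingMergeTree without showing why it suits CDC. A rough sketch of the deduplication idea in plain Python, assuming a version column and a soft-delete flag (neither is confirmed by this PR summary, and `Row` and `collapse` are hypothetical names): rows sharing a key collapse to the one with the highest version, so replayed or updated change records converge to a single current row.

```python
from typing import NamedTuple


class Row(NamedTuple):
    key: int
    version: int          # assumed monotonically increasing per key
    status: str
    is_deleted: bool = False


def collapse(rows):
    """Keep the highest-version row per key, then drop soft-deleted keys."""
    latest = {}
    for row in rows:
        current = latest.get(row.key)
        # On a version tie the later-inserted row wins.
        if current is None or row.version >= current.version:
            latest[row.key] = row
    live = (row for row in latest.values() if not row.is_deleted)
    return sorted(live, key=lambda row: row.key)


rows = [
    Row(1, 1, "new"),
    Row(1, 2, "shipped"),
    Row(2, 1, "new"),
    Row(2, 2, "", is_deleted=True),
]
current_state = collapse(rows)
print(current_state)
```

In ClickHouse this collapsing happens during background merges rather than at insert time, so queries that need exact results typically have to account for not-yet-merged duplicates.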

Written by Cursor Bugbot for commit 5f2586e.

vercel bot commented Jan 22, 2026

The latest updates on your projects.

| Project | Deployment | Review | Updated (UTC) |
| --- | --- | --- | --- |
| docs-v2 | Ready | Preview, Comment | Jan 28, 2026 4:04pm |

coderabbitai bot (Contributor) commented Jan 22, 2026

Walkthrough

A new real-time CDC integration section was added to the data warehouses guide, introducing SAP HANA change-data-capture with rationale, prerequisites, step-by-step setup, architecture overview, operational guidance, and troubleshooting. The guide also now lists multiple ingestion patterns and points to the CDC section for real-time needs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| CDC Documentation: apps/framework-docs-v2/content/guides/data-warehouses.mdx | Added "Option 3: Real-Time CDC from SAP HANA" (~+345 lines) including rationale, prerequisites, installation and configuration steps, model generation, trigger/CDC infra, startup/monitoring/verification, architecture description (CDC triggers/tables, Moose workflow, Temporal, ClickHouse), operational guidance (initial load, sync, pruning, troubleshooting), CDC vs batch comparison, and duplicated insertion for additional context. Also added a brief overview of supported ingestion patterns near the data-warehouse intro. |

Sequence Diagram(s)

sequenceDiagram
    participant SAP as SAP HANA
    participant Triggers as CDC Triggers
    participant CDC_Tables as CDC Tables
    participant Moose as Moose Workflow
    participant Temporal as Temporal
    participant CH as ClickHouse
    participant Monitor as Monitoring

    SAP->>Triggers: Write change events (INSERT/UPDATE/DELETE)
    Triggers->>CDC_Tables: Persist change records
    CDC_Tables->>Moose: Poll/stream CDC records
    Moose->>Temporal: Enqueue processing tasks
    Temporal->>Moose: Execute workflow steps
    Moose->>CH: Apply changes (upsert/prune)
    Moose->>Monitor: Emit metrics/logs
    CH->>Monitor: Expose ingestion metrics
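
The "Poll/stream CDC records" and "Apply changes" steps in the diagram can be sketched as a single polling pass. This is an illustration only: the record shape, the `change_id` watermark, and the `sync_once` helper are assumptions, and a plain dict stands in for the ClickHouse target. It does not reflect how the Moose workflow is actually implemented.

```python
def sync_once(cdc_rows, target, last_change_id):
    """Apply CDC rows newer than the watermark; return the new watermark."""
    pending = sorted(
        (row for row in cdc_rows if row["change_id"] > last_change_id),
        key=lambda row: row["change_id"],
    )
    for row in pending:
        if row["op"] == "D":
            target.pop(row["id"], None)      # delete, tolerating replays
        else:
            target[row["id"]] = row["data"]  # insert and update are both upserts
        last_change_id = row["change_id"]
    return last_change_id


cdc_rows = [
    {"change_id": 1, "op": "I", "id": 1, "data": {"status": "new"}},
    {"change_id": 2, "op": "U", "id": 1, "data": {"status": "shipped"}},
    {"change_id": 3, "op": "I", "id": 2, "data": {"status": "new"}},
    {"change_id": 4, "op": "D", "id": 2, "data": None},
]

target = {}
watermark = sync_once(cdc_rows, target, last_change_id=0)
# A second pass with the stored watermark finds nothing new to apply.
watermark_again = sync_once(cdc_rows, target, watermark)
print(target, watermark, watermark_again)
```

Tracking a watermark is one common way to make each polling interval pick up only unseen changes; rows at or below it are then candidates for pruning.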

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Suggested reviewers

  • callicles

Poem

CDC whispers through the stream,
SAP's small changes stitch the seam,
Moose and Temporal hum in time,
ClickHouse keeps the skyline prime. ✨

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | Title clearly summarizes the main change: integrating SAP HANA CDC into the data warehouse documentation guide. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description check | ✅ Passed | The PR description comprehensively details the changes: new SAP HANA CDC guide, CDC concepts, architecture patterns, and decision guidance between CDC and batch loading. It directly relates to the file changes. |


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/framework-docs-v2/content/guides/data-warehouses.mdx (1)

1-5: Add required frontmatter (title/description).
This guide is missing the mandatory frontmatter for content/guides/*.mdx.

✅ Suggested fix
+---
+title: "Building Your First Data Warehouse"
+description: "Build a ClickHouse-based analytics warehouse with batch loading and optional SAP HANA CDC."
+---
+
 # Building Your First Data Warehouse
As per coding guidelines, guides must include frontmatter with `title` and `description`.
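
A check for this rule could look roughly like the sketch below. It is not the repository's actual validation; `missing_frontmatter_fields` is a hypothetical helper, and it only handles simple `key: value` frontmatter lines between two `---` delimiters at the top of the file.

```python
import re

# Frontmatter must be the very first thing in the file.
FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)


def missing_frontmatter_fields(mdx_text, required=("title", "description")):
    """Return the required frontmatter keys that the MDX text lacks."""
    match = FRONTMATTER.match(mdx_text)
    if match is None:
        return list(required)
    keys = set()
    for line in match.group(1).splitlines():
        key, sep, _value = line.partition(":")
        if sep:
            keys.add(key.strip())
    return [name for name in required if name not in keys]


before = "# Building Your First Data Warehouse\n"
after = (
    "---\n"
    'title: "Building Your First Data Warehouse"\n'
    'description: "Build a ClickHouse-based analytics warehouse."\n'
    "---\n"
    "\n"
    "# Building Your First Data Warehouse\n"
)
print(missing_frontmatter_fields(before))
print(missing_frontmatter_fields(after))
```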
🤖 Fix all issues with AI agents
In `@apps/framework-docs-v2/content/guides/data-warehouses.mdx`:
- Around line 1148-1156: Replace the unhyphenated compound modifiers in the
text: change "60 second interval" to "60-second interval" and "7 day default" to
"7-day default" so compound adjectives are hyphenated correctly; locate the
phrases "60 second interval" and "7 day default" in the Ongoing Sync / Resource
Usage bullet list and update them accordingly.
- Around line 992-1009: The Python code block containing class Ekko and its
__moose_config__ OlapTable declaration needs the `@test` directive so the
snippet is validated; update the fenced code block opening from ```python to
```python `@test` (the block that starts with "from moose_lib import OlapTable,
Key" and defines class Ekko and __moose_config__) so the documentation test
harness will run this snippet.
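
For the second item, a quick way to find other unvalidated snippets is to scan the opening fences. The sketch below assumes the directive appears as a separate `@test` token after the language in the fence's info string, as the prompt above implies; the real documentation test harness may parse fences differently, and `untested_blocks` is a hypothetical helper.

```python
FENCE = "`" * 3  # built indirectly so this sketch can itself sit inside a fence


def untested_blocks(mdx_text, languages=("python", "typescript")):
    """Return (line_number, language) for opening fences that lack @test."""
    findings = []
    inside = False
    for number, line in enumerate(mdx_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside:
            inside = False  # this fence closes the current block
            continue
        inside = True
        info = stripped[len(FENCE):].split()
        if info and info[0] in languages and "@test" not in info[1:]:
            findings.append((number, info[0]))
    return findings


doc = "\n".join([
    "Intro",
    FENCE + "python",
    "from moose_lib import OlapTable, Key",
    FENCE,
    FENCE + "python @test",
    "x = 1",
    FENCE,
    FENCE + "bash",
    "ls",
    FENCE,
])
findings = untested_blocks(doc)
print(findings)
```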
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 979f410 and 499e3d5.

📒 Files selected for processing (1)
  • apps/framework-docs-v2/content/guides/data-warehouses.mdx
🧰 Additional context used
📓 Path-based instructions (2)
apps/framework-docs-v2/content/**/*.mdx

📄 CodeRabbit inference engine (apps/framework-docs-v2/CLAUDE.md)

apps/framework-docs-v2/content/**/*.mdx: Use {{ include "shared/path.mdx" }} directives to reuse content fragments, which are processed via processIncludes() during build
Validate code snippets in documentation with the @test directive for TypeScript and Python code blocks
TypeScript code snippets in documentation should be validated for syntax with brace matching; Python snippets should be validated for indentation

Files:

  • apps/framework-docs-v2/content/guides/data-warehouses.mdx
apps/framework-docs-v2/content/guides/**/*.mdx

📄 CodeRabbit inference engine (apps/framework-docs-v2/CLAUDE.md)

Guide MDX files in content/guides/ must include frontmatter with title and description fields

Files:

  • apps/framework-docs-v2/content/guides/data-warehouses.mdx
🪛 LanguageTool
apps/framework-docs-v2/content/guides/data-warehouses.mdx

[grammar] ~1149-~1149: Use a hyphen to join words.
Context: ...Ongoing Sync**: - Sub-minute latency (60 second interval) - Scales to thousands o...

(QB_NEW_EN_HYPHEN)


[grammar] ~1155-~1155: Use a hyphen to join words.
Context: ...size - Pruning keeps CDC tables small (7 day default) - Redis memory: ~1MB per 10...

(QB_NEW_EN_HYPHEN)
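
Both findings follow the same pattern (a number, a singular time unit, then a noun), so they can be fixed mechanically. A narrow sketch, assuming only the two nouns that appear in these findings; the `hyphenate` helper and the unit list are illustrative, and a broader rule would need a proper grammar checker:

```python
import re

UNITS = "second|minute|hour|day|week|month|year"
# Match "<number> <unit> " only when a known noun follows, so that
# ordinary phrases such as "wait 60 seconds" are left alone.
PATTERN = re.compile(rf"\b(\d+) ({UNITS}) (?=(interval|default)\b)")


def hyphenate(text):
    """Join number and unit with a hyphen when they modify a noun."""
    return PATTERN.sub(r"\1-\2 ", text)


print(hyphenate("Sub-minute latency (60 second interval)"))
print(hyphenate("Pruning keeps CDC tables small (7 day default)"))
```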


514Ben and others added 4 commits January 27, 2026 17:13
The SAP HANA CDC implementation uses database triggers and doesn't require Redis for state tracking. Updated the prerequisites to list the CREATE TABLE and CREATE TRIGGER permissions that are actually needed, replacing the misleading "CDC permissions" reference.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Updated documentation to use the new, more intuitive command flags:
- --generate-models (instead of --recreate-moose-models)
- --create-database-triggers (instead of --init-cdc)
- --init-all (new quick start option to run both steps)

Added Quick Start tip showing how to run both model generation and trigger creation in a single command.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
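
The relationship between the three flags described in this commit can be sketched with `argparse`. This is only a model of the documented behavior, not the pipeline's actual CLI code: the `cdc-setup` program name, the `planned_steps` helper, and the step ordering are assumptions.

```python
import argparse


def build_parser():
    # "cdc-setup" is a placeholder program name, not the real entry point.
    parser = argparse.ArgumentParser(prog="cdc-setup")
    parser.add_argument("--generate-models", action="store_true")
    parser.add_argument("--create-database-triggers", action="store_true")
    parser.add_argument("--init-all", action="store_true",
                        help="run model generation and trigger creation")
    return parser


def planned_steps(argv):
    """Translate flags into the ordered list of setup steps to run."""
    args = build_parser().parse_args(argv)
    steps = []
    if args.generate_models or args.init_all:
        steps.append("generate-models")
    if args.create_database_triggers or args.init_all:
        steps.append("create-database-triggers")
    return steps


print(planned_steps(["--init-all"]))
print(planned_steps(["--create-database-triggers"]))
```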
@514Ben 514Ben force-pushed the 514Ben/add-sap-hana-cdc-to-data-warehouse-guide branch from aa2602c to 000e2de Compare January 27, 2026 22:13
@514Ben 514Ben enabled auto-merge January 28, 2026 16:01