Commit fdd2727

Merge pull request #36 from Lightricks/aheden/build-data-spec-skill
feat: add build-data-spec skill + fix segmentation CTE case sensitivity
2 parents 9f82501 + d9e3ce6 commit fdd2727

6 files changed: +724 -2 lines changed
Lines changed: 152 additions & 0 deletions
@@ -0,0 +1,152 @@
---
name: build-data-spec
description: "Build a structured data spec document for analytics topics by exploring the dbt codebase, identifying relevant events/models/columns, and producing a ready-to-use markdown reference. Use when: (1) starting a new data analysis, (2) documenting events for a feature or domain, (3) creating a reference for an agent or analyst."
tags: [analytics, dbt, documentation, data-spec]
metadata:
  author: aheden
  version: "1.0"
---

# Build Data Spec

## Target Repository

```
~/dwh-data-model-transforms
```

Remote: `origin` -> `github.com/Lightricks/dwh-data-model-transforms.git`
Default branch: `develop`

**All file reads, searches, and explorations must target this directory**, regardless of which repo this skill is invoked from. Use absolute paths (e.g., `~/dwh-data-model-transforms/models/`) or set working directory before running commands.

## When to Use

- User wants to analyze a feature, domain, or event category
- User needs a reference document for an agent or analyst
- User says "pull all events for X", "create a data spec", "document events for Y"
- Starting an analysis that requires understanding which tables, columns, and filters to use


## Output

A markdown file saved to `~/ltx-analytics-agents/docs/{feature}_spec.md`.

**Naming convention:** Use the feature name in snake_case (e.g., `brand_kits_spec.md`, `gen_space_spec.md`, `failed_generations_spec.md`). This filename must match what other agents (e.g., dashboard-builder) use to look up the spec.
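
The naming convention above can be sketched in Python. This is an illustrative helper, not part of the skill: `spec_filename` and `spec_path` are hypothetical names, and the handling of camelCase input is an assumption about how feature names might be phrased by a user.

```python
import re
from pathlib import Path

# Output directory as stated in the Output section above.
SPEC_DIR = Path("~/ltx-analytics-agents/docs").expanduser()


def spec_filename(feature: str) -> str:
    """Turn a free-form feature name into '{feature}_spec.md' in snake_case."""
    # Assumption: split camelCase boundaries first ("GenSpace" -> "Gen Space").
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", feature.strip())
    # Collapse every run of non-alphanumeric characters into a single underscore.
    snake = re.sub(r"[^a-z0-9]+", "_", spaced.lower()).strip("_")
    if not snake:
        raise ValueError("feature name must contain at least one letter or digit")
    return f"{snake}_spec.md"


def spec_path(feature: str) -> Path:
    """Full path where the spec for this feature would be saved."""
    return SPEC_DIR / spec_filename(feature)
```

Deriving the filename in one place is one way to keep the producer of a spec and any consumer (such as dashboard-builder) looking at the same path.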

## Workflow

### Phase 1: Scope the Topic

Clarify with the user:

1. **What to analyze** -- the feature, domain, or event category (e.g., "failed generations", "brand kit usage", "export events")
2. **Which product** -- LTX Studio, LTX Model, API
3. **Breadth** -- single feature deep-dive or cross-feature overview

If the user's request is broad, propose a focused scope before proceeding.

### Phase 2: Explore the Codebase

Search systematically across all layers in `~/dwh-data-model-transforms`. Use parallel exploration agents for speed.

**Search targets (in priority order):**

All paths below are relative to `~/dwh-data-model-transforms/`.

| Layer | Where to Look | What to Extract |
|-------|--------------|-----------------|
| **Event registry** | `docs/event-registry.yaml` | Canonical event names, key properties, status |
| **Mart models** | `models/**/marts/` | Final columns, filters, action_name/action_category mapping |
| **Intermediate models** | `models/**/intermediate/` | Business logic, joins, derived columns |
| **Base models** | `models/base/` | Raw source columns, process_started/ended pairs |
| **Macros** | `macros/` | Extraction logic, parsing, field derivation |
| **Source definitions** | `models/sources.yml` | Raw event table names |
| **Existing specs** | `docs/*_spec.md` | Related specs to cross-reference |

**Search strategies:**

- Filename search: `Glob` for model names containing the topic keyword
- Content search: `Grep` for column names, event names, action categories
- Semantic search: "How does X work?" scoped to relevant directories
- YAML search: Look at `.yml` files alongside `.sql` for column descriptions and tests

**Read priority:** Always read the SQL model files, not just YMLs. The SQL reveals:
- Actual column derivation logic (CASE statements, COALESCEs, joins)
- Filter conditions that define the event scope
- Macro calls that generate columns
- Incremental predicates and partition fields
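
As a rough stand-in for the filename and content searches listed above, the following Python sketch walks `models/` in a local checkout. `Glob` and `Grep` in the skill refer to the agent's own tools; `find_models_by_name` and `grep_models` are hypothetical helpers that only approximate them, and restricting the search to `.sql` and `.yml` files is an assumption.

```python
import re
from pathlib import Path

MODEL_SUFFIXES = {".sql", ".yml"}


def _model_files(repo: Path) -> list[Path]:
    """All .sql/.yml files under <repo>/models, in a stable order."""
    return sorted(
        p for p in (repo / "models").rglob("*")
        if p.is_file() and p.suffix in MODEL_SUFFIXES
    )


def find_models_by_name(repo: Path, keyword: str) -> list[Path]:
    """Filename search: model files whose name contains the topic keyword."""
    return [p for p in _model_files(repo) if keyword.lower() in p.name.lower()]


def grep_models(repo: Path, pattern: str) -> dict[Path, list[int]]:
    """Content search: map each matching file to its 1-based matching line numbers."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits: dict[Path, list[int]] = {}
    for path in _model_files(repo):
        lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
        matched = [n for n, line in enumerate(lines, start=1) if regex.search(line)]
        if matched:
            hits[path] = matched
    return hits
```

For example, `grep_models(Path("~/dwh-data-model-transforms").expanduser(), r"action_category")` would list the files and lines that mention that column, which can then be read in full as the read-priority note recommends.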

### Phase 3: Read Key Models

For each relevant model found in Phase 2, read the full `.sql` file to extract:

1. **TL;DR block** -- model purpose and key features
2. **Config block** -- partition_by, cluster_by, schema, tags
3. **Column definitions** -- all SELECT columns with their derivation logic
4. **Filter conditions** -- WHERE clauses that scope the data
5. **Join logic** -- how tables connect (especially start/end event joins)
6. **Macro calls** -- which macros generate columns (read the macro too)

Also read the `.yml` file for:
- Column descriptions (especially "In this table:" context)
- Accepted values tests (reveal valid column values)
- Data quality tests (reveal important constraints)

### Phase 4: Compile the Spec Document

Write `~/ltx-analytics-agents/docs/{feature}_spec.md` following the structure in [references/spec-template.md](references/spec-template.md).

**Required sections:**

1. **Title + metadata** -- topic, last updated date
2. **Overview** -- what the spec covers, key definitions
3. **Primary tables** -- fully-qualified BigQuery table names, partition/cluster info
4. **Key columns** -- organized by category (error/result, context, timing, parameters, user)
5. **Filtering patterns** -- ready-to-use WHERE clauses for common scenarios
6. **Sample analysis queries** -- 4-6 BigQuery queries answering likely questions
7. **Model lineage** -- ASCII diagram showing source -> base -> intermediate -> mart flow
8. **Key macros** -- macros involved in column derivation
9. **Important notes** -- gotchas, caveats, edge cases

**Writing guidelines:**

- Use fully-qualified BigQuery table names (`` `project.schema.table` ``)
- Include column types (STRING, BOOLEAN, INT64, TIMESTAMP, FLOAT64)
- Show accepted values inline when known from tests
- Always include `NOT is_lt_team` in example queries
- Use partition column in WHERE for cost efficiency
- Provide both simple filters and full analysis queries

### Phase 5: Validate Completeness

Before finalizing, check:

- [ ] Every column referenced in queries exists in the column tables
- [ ] Filtering patterns cover the main use cases the user described
- [ ] Sample queries are syntactically valid BigQuery SQL
- [ ] Lineage diagram traces from source through to mart
- [ ] Important notes capture non-obvious behavior (nullability, edge cases, timing windows)
- [ ] Table names use correct project/schema from model config blocks
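
Part of this validation can be automated. The sketch below covers only one narrow check, taken from the writing guidelines: that every `sql` block in a spec includes `NOT is_lt_team`. The function names are hypothetical, and the sketch assumes queries are written as fenced blocks tagged `sql`; it does not verify column existence or BigQuery syntax.

```python
import re

# Built with "`" * 3 so this example does not contain a literal fence itself.
FENCE = "`" * 3
SQL_BLOCK = re.compile(FENCE + r"sql\s*\n(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)
TEAM_FILTER = re.compile(r"\bnot\s+is_lt_team\b", re.IGNORECASE)


def sql_blocks(spec_markdown: str) -> list[str]:
    """Return the body of every fenced sql block in the spec, in document order."""
    return [body.strip() for body in SQL_BLOCK.findall(spec_markdown)]


def queries_missing_team_filter(spec_markdown: str) -> list[str]:
    """Return the sql blocks that lack the NOT is_lt_team filter."""
    return [q for q in sql_blocks(spec_markdown) if not TEAM_FILTER.search(q)]
```

An empty list from `queries_missing_team_filter` means that check passed; the remaining checklist items still need a manual read.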

## Existing Specs as Reference

Current data spec documents in the project:

| File | Topic | Good Example Of |
|------|-------|-----------------|
| `docs/gen_space_events_spec.md` | Gen Space activity | Filtering patterns, page_workspace breakdown |
| `docs/brand_kits_events_spec.md` | Brand Kit events | Event-to-column mapping, action_category usage |
| `docs/gen_space_lightbox_actions_spec.md` | Lightbox/asset actions | UI-to-event mapping, cross-feature coverage |
| `docs/ltxstudio_failed_generations_spec.md` | Failed generations | Error analysis, multi-layer column tracking |

Read these for style and depth calibration when creating a new spec.

## Checklist

- [ ] Topic scoped and confirmed with user
- [ ] Codebase explored across all layers (mart -> intermediate -> base -> macro)
- [ ] Key model SQL files read (not just YMLs)
- [ ] Spec document written with all required sections
- [ ] Queries validated for correct table names and syntax
- [ ] File saved to `~/ltx-analytics-agents/docs/{feature}_spec.md`
Lines changed: 188 additions & 0 deletions
@@ -0,0 +1,188 @@
# Data Spec Template

Use this template when creating a new data spec document. Adapt sections as needed for the specific topic -- not every spec needs every section, and some topics will need additional sections.

---

## File Naming

```
docs/{feature}_spec.md
```

Examples:
- `docs/ltxstudio_failed_generations_spec.md`
- `docs/gen_space_events_spec.md`
- `docs/brand_kits_events_spec.md`

---

## Template

```markdown
# {Feature/Topic} - Data Spec

**Last Updated**: {Month Year}
**Context**: {Optional link to Figma, ticket, or brief context}

---

## Overview

{1-3 paragraphs explaining what this spec covers, key definitions, and the core question it helps answer.}

---

## Primary Table(s)

### Recommended: {Layer} Layer

\`\`\`
{fully-qualified BigQuery table name}
\`\`\`

{1-2 sentences on what it contains and why it's recommended.}

**Partition**: `{column}` ({type}, {granularity})
**Cluster**: `{col1}`, `{col2}`, `{col3}`

### Alternative: {Layer} Layer (optional section)

\`\`\`
{alternative table if applicable}
\`\`\`

{When to use this instead.}

---

## Key Columns

### {Category 1} Columns (e.g., Error & Result, Event Identity, etc.)

| Column | Type | Description | Available In |
|--------|------|-------------|-------------|
| `column_name` | STRING | Description of what it contains | Both tables / Mart only |

### {Category 2} Columns (e.g., Context, Timing, Parameters)

| Column | Type | Description |
|--------|------|-------------|
| `column_name` | TYPE | Description |

{Repeat for as many categories as needed. Common groupings:}
{- Error/Result columns}
{- Event identity columns (action_name, action_category)}
{- Timing columns}
{- Feature parameter columns}
{- User & context columns}

---

## Filtering Patterns

### {Scenario 1} (e.g., "All X events")

\`\`\`sql
SELECT *
FROM \`project.schema.table\`
WHERE {conditions}
  AND NOT is_lt_team
  AND date(action_ts) >= CURRENT_DATE() - 30
\`\`\`

### {Scenario 2}

\`\`\`sql
...
\`\`\`

{Include 3-5 filtering patterns covering the most common analysis scenarios.}

---

## Sample Analysis Queries

### {Analysis 1} (e.g., "Daily event counts")

\`\`\`sql
SELECT
  date(action_ts) AS dt,
  {dimensions},
  COUNT(*) AS total,
  {metrics}
FROM \`project.schema.table\`
WHERE {conditions}
  AND NOT is_lt_team
  AND date(action_ts) >= CURRENT_DATE() - 30
GROUP BY 1, 2
ORDER BY 1 DESC
\`\`\`

{Include 4-6 queries answering the most likely analysis questions.}
{Each query should have a descriptive heading.}

---

## Model Lineage

\`\`\`
Sources (BigQuery raw events)
  |
  +-- {source_table} --> {base_model}
  |                          |
  |                          v
  |                     {intermediate_model}
  |                          |
  |                          v
  |                     {mart_model}  [MART]
\`\`\`

---

## Key Macros (optional -- include when macros are involved)

| Macro | Location | Purpose |
|-------|----------|---------|
| `macro_name` | `macros/path/to/macro.sql` | What it does |

---

## Important Notes

1. **{Gotcha 1}**: Explanation of non-obvious behavior.

2. **{Gotcha 2}**: Edge case or caveat.

{Include 3-6 notes covering:}
{- Column semantics that are easy to confuse}
{- Filtering gotchas (e.g., excluded event types)}
{- Null behavior and edge cases}
{- Internal user exclusion reminders}
{- Timing windows or data freshness caveats}
```

---

## Style Guidelines

1. **Table names**: Always fully qualified with backtick escaping in queries (`` \`project.schema.table\` ``)
2. **Column types**: Use BigQuery types (STRING, INT64, FLOAT64, BOOLEAN, TIMESTAMP, DATE, ARRAY)
3. **Accepted values**: Show inline when known (e.g., `'success'` or `'failure'`)
4. **Queries**: Always include `NOT is_lt_team` and partition-based date filter
5. **Headings**: Use `###` for sub-sections within main `##` sections
6. **Tables**: Use markdown tables for structured column/event documentation
7. **Code blocks**: Use `sql` language tag for queries, plain ``` for table names and diagrams
8. **Lineage**: ASCII art with arrows for flow direction
9. **Tone**: Direct and reference-like -- this is a lookup document, not a tutorial

## Calibration

Look at these existing specs for style and depth:

- `docs/gen_space_events_spec.md` -- medium complexity, good filtering patterns
- `docs/brand_kits_events_spec.md` -- feature-specific, event-to-column mapping
- `docs/ltxstudio_failed_generations_spec.md` -- multi-layer analysis, error handling
- `docs/gen_space_lightbox_actions_spec.md` -- UI-to-event mapping, comprehensive

Match the depth and structure of the most similar existing spec.

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ If the request doesn't clearly match an agent, ask the user which they need.
 |---|---|---|
 | Create a PR, open a pull request, ship it, submit for review | **Create PR** | `.claude/skills/create-pr/SKILL.md` |
 | Create a new skill, automate a repeatable task, document a team pattern | **Create Skill** | `.claude/skills/create-skill/SKILL.md` |
+| Build a data spec, document events for a feature, pull all events for X | **Build Data Spec** | `.claude/skills/build-data-spec/SKILL.md` |

 ## Shared Knowledge

agents/dashboard-builder/SKILL.md

Lines changed: 9 additions & 1 deletion
@@ -72,7 +72,13 @@ PHASE 4: VALIDATE → QA Report + Slack notification
    - Extract any context that helps understand the feature's purpose, target users, or success criteria
    - Note any stakeholder comments about what metrics they care about

-6. **Present Feature Brief** using template from `agents/dashboard-builder/templates/feature-brief.md`:
+6. **Build or load a Data Spec:**
+   - Check if `docs/{feature}_spec.md` already exists
+   - If it exists, read it -- it contains verified table names, key columns, filtering patterns, and sample queries
+   - If it doesn't exist, follow the build-data-spec skill (`.claude/skills/build-data-spec/SKILL.md`) to create one by exploring the dbt codebase at `~/dwh-data-model-transforms`
+   - The data spec feeds directly into Phase 2: use its table names, column details, and filtering patterns when building the Hex Context Brief
+
+7. **Present Feature Brief** using template from `agents/dashboard-builder/templates/feature-brief.md`:
    - List all events with verification status (✅ registry / ⚠️ code only / ❓ unverified)
    - Propose funnel steps
    - Propose retention activation + return events
@@ -350,6 +356,8 @@ After each metric executes in Hex, review the results and check for:
 ### Agent-specific
 | File | Read in phase |
 |------|---------------|
+| `docs/{feature}_spec.md` (if exists) | Phase 1, Phase 2 |
+| `.claude/skills/build-data-spec/SKILL.md` | Phase 1 (if spec missing) |
 | `agents/dashboard-builder/hex-prompts/patterns.md` | Phase 3 |
 | `agents/dashboard-builder/templates/feature-brief.md` | Phase 1 |
 | `agents/dashboard-builder/templates/dashboard-spec.md` | Phase 2 |
