| 1 | +--- |
| 2 | +name: build-data-spec |
| 3 | +description: "Build a structured data spec document for analytics topics by exploring the dbt codebase, identifying relevant events/models/columns, and producing a ready-to-use markdown reference. Use when: (1) starting a new data analysis, (2) documenting events for a feature or domain, (3) creating a reference for an agent or analyst." |
| 4 | +tags: [analytics, dbt, documentation, data-spec] |
| 5 | +metadata: |
| 6 | + author: aheden |
| 7 | + version: "1.0" |
| 8 | +--- |
| 9 | + |
| 10 | +# Build Data Spec |
| 11 | + |
| 12 | +## Target Repository |
| 13 | + |
| 14 | +``` |
| 15 | +~/dwh-data-model-transforms |
| 16 | +``` |
| 17 | + |
| 18 | +Remote: `origin` -> `github.com/Lightricks/dwh-data-model-transforms.git` |
| 19 | +Default branch: `develop` |
| 20 | + |
| 21 | +**All file reads, searches, and explorations must target this directory**, regardless of which repo this skill is invoked from. Use absolute paths (e.g., `~/dwh-data-model-transforms/models/`) or set the working directory to this repository before running commands. |
| 22 | + |
| 23 | +## When to Use |
| 24 | + |
| 25 | +- User wants to analyze a feature, domain, or event category |
| 26 | +- User needs a reference document for an agent or analyst |
| 27 | +- User says "pull all events for X", "create a data spec", "document events for Y" |
| 28 | +- Starting an analysis that requires understanding which tables, columns, and filters to use |
| 29 | + |
| 30 | + |
| 31 | +## Output |
| 32 | + |
| 33 | +A markdown file saved to `~/ltx-analytics-agents/docs/{feature}_spec.md`. |
| 34 | + |
| 35 | +**Naming convention:** Use the feature name in snake_case (e.g., `brand_kits_spec.md`, `gen_space_spec.md`, `failed_generations_spec.md`). This filename must match what other agents (e.g., dashboard-builder) use to look up the spec. |
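The naming convention can be sketched in Python as below. This is a minimal illustration, not part of the skill itself: the helper name `spec_path` and its normalization rule (lowercase, then collapse non-alphanumeric runs into underscores) are assumptions; only the `~/ltx-analytics-agents/docs/{feature}_spec.md` pattern comes from the text above.

```python
import re
from pathlib import Path

# Output directory named in this skill (left unexpanded; call
# .expanduser() on the result before writing to disk).
DOCS_DIR = Path("~/ltx-analytics-agents/docs")


def spec_path(feature: str) -> Path:
    """Return the spec path for a feature name (hypothetical helper)."""
    # Assumed normalization: lowercase, then collapse every run of
    # non-alphanumeric characters into a single underscore.
    slug = re.sub(r"[^a-z0-9]+", "_", feature.strip().lower()).strip("_")
    if not slug:
        raise ValueError("feature name must contain letters or digits")
    return DOCS_DIR / f"{slug}_spec.md"
```

Deriving the filename in one place would help keep the spec writer and any consumer that looks the file up in agreement on the name.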
| 36 | + |
| 37 | +## Workflow |
| 38 | + |
| 39 | +### Phase 1: Scope the Topic |
| 40 | + |
| 41 | +Clarify with the user: |
| 42 | + |
| 43 | +1. **What to analyze** -- the feature, domain, or event category (e.g., "failed generations", "brand kit usage", "export events") |
| 44 | +2. **Which product** -- LTX Studio, LTX Model, or API |
| 45 | +3. **Breadth** -- single feature deep-dive or cross-feature overview |
| 46 | + |
| 47 | +If the user's request is broad, propose a focused scope before proceeding. |
| 48 | + |
| 49 | +### Phase 2: Explore the Codebase |
| 50 | + |
| 51 | +Search systematically across all layers in `~/dwh-data-model-transforms`. Use parallel exploration agents for speed. |
| 52 | + |
| 53 | +**Search targets (in priority order):** |
| 54 | + |
| 55 | +All paths below are relative to `~/dwh-data-model-transforms/`. |
| 56 | + |
| 57 | +| Layer | Where to Look | What to Extract | |
| 58 | +|-------|--------------|-----------------| |
| 59 | +| **Event registry** | `docs/event-registry.yaml` | Canonical event names, key properties, status | |
| 60 | +| **Mart models** | `models/**/marts/` | Final columns, filters, action_name/action_category mapping | |
| 61 | +| **Intermediate models** | `models/**/intermediate/` | Business logic, joins, derived columns | |
| 62 | +| **Base models** | `models/base/` | Raw source columns, process_started/ended pairs | |
| 63 | +| **Macros** | `macros/` | Extraction logic, parsing, field derivation | |
| 64 | +| **Source definitions** | `models/sources.yml` | Raw event table names | |
| 65 | +| **Existing specs** | `docs/*_spec.md` | Related specs to cross-reference | |
| 66 | + |
| 67 | +**Search strategies:** |
| 68 | + |
| 69 | +- Filename search: `Glob` for model names containing the topic keyword |
| 70 | +- Content search: `Grep` for column names, event names, action categories |
| 71 | +- Semantic search: ask questions such as "How does X work?", scoped to the relevant directories |
| 72 | +- YAML search: Look at `.yml` files alongside `.sql` for column descriptions and tests |
| 73 | + |
| 74 | +**Read priority:** Always read the SQL model files, not just YMLs. The SQL reveals: |
| 75 | +- Actual column derivation logic (CASE statements, COALESCEs, joins) |
| 76 | +- Filter conditions that define the event scope |
| 77 | +- Macro calls that generate columns |
| 78 | +- Incremental predicates and partition fields |
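The filename and content searches above could be approximated outside the agent tooling with a short Python sketch. It stands in for the `Glob` and `Grep` tools rather than reproducing them; the function name `find_matches` and the choice of file suffixes are assumptions made for illustration.

```python
from pathlib import Path

# Assumed set of file types worth searching: models, YAML docs, specs.
SEARCH_SUFFIXES = {".sql", ".yml", ".yaml", ".md"}


def find_matches(root: Path, keyword: str) -> dict[str, list[Path]]:
    """Split files under root into filename hits and content hits."""
    needle = keyword.lower()
    hits: dict[str, list[Path]] = {"filename": [], "content": []}
    for path in sorted(root.rglob("*")):
        if not path.is_file() or path.suffix not in SEARCH_SUFFIXES:
            continue
        if needle in path.name.lower():
            # The filename already matches, so the content is not scanned.
            hits["filename"].append(path)
        # Case-insensitive substring match; undecodable bytes are skipped.
        elif needle in path.read_text(errors="ignore").lower():
            hits["content"].append(path)
    return hits
```

For example, `find_matches(Path("~/dwh-data-model-transforms").expanduser(), "brand_kit")` would list candidate models to read in full in the next phase.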
| 79 | + |
| 80 | +### Phase 3: Read Key Models |
| 81 | + |
| 82 | +For each relevant model found in Phase 2, read the full `.sql` file to extract: |
| 83 | + |
| 84 | +1. **TL;DR block** -- model purpose and key features |
| 85 | +2. **Config block** -- partition_by, cluster_by, schema, tags |
| 86 | +3. **Column definitions** -- all SELECT columns with their derivation logic |
| 87 | +4. **Filter conditions** -- WHERE clauses that scope the data |
| 88 | +5. **Join logic** -- how tables connect (especially start/end event joins) |
| 89 | +6. **Macro calls** -- which macros generate columns (read the macro too) |
| 90 | + |
| 91 | +Also read the `.yml` file for: |
| 92 | +- Column descriptions (especially "In this table:" context) |
| 93 | +- Accepted values tests (reveal valid column values) |
| 94 | +- Data quality tests (reveal important constraints) |
| 95 | + |
| 96 | +### Phase 4: Compile the Spec Document |
| 97 | + |
| 98 | +Write `~/ltx-analytics-agents/docs/{feature}_spec.md` following the structure in [references/spec-template.md](references/spec-template.md). |
| 99 | + |
| 100 | +**Required sections:** |
| 101 | + |
| 102 | +1. **Title + metadata** -- topic, last updated date |
| 103 | +2. **Overview** -- what the spec covers, key definitions |
| 104 | +3. **Primary tables** -- fully-qualified BigQuery table names, partition/cluster info |
| 105 | +4. **Key columns** -- organized by category (error/result, context, timing, parameters, user) |
| 106 | +5. **Filtering patterns** -- ready-to-use WHERE clauses for common scenarios |
| 107 | +6. **Sample analysis queries** -- 4-6 BigQuery queries answering likely questions |
| 108 | +7. **Model lineage** -- ASCII diagram showing source -> base -> intermediate -> mart flow |
| 109 | +8. **Key macros** -- macros involved in column derivation |
| 110 | +9. **Important notes** -- gotchas, caveats, edge cases |
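A draft spec can be checked against the required sections listed above with a small Python sketch. It assumes each section name appears verbatim (case-insensitively) in a Markdown heading; the actual template in `references/spec-template.md` may word its headings differently, so treat `REQUIRED_SECTIONS` and `missing_sections` as hypothetical.

```python
# Section names taken from the required-sections list; item 1
# (title + metadata) is not a named section, so it is left out.
REQUIRED_SECTIONS = [
    "Overview",
    "Primary tables",
    "Key columns",
    "Filtering patterns",
    "Sample analysis queries",
    "Model lineage",
    "Key macros",
    "Important notes",
]


def missing_sections(spec_markdown: str) -> list[str]:
    """Return required section names that no heading line mentions."""
    # Any line starting with '#' is treated as a heading; this simple
    # rule would also pick up '#' comments inside fenced code blocks.
    headings = [
        line.lstrip("#").strip().lower()
        for line in spec_markdown.splitlines()
        if line.startswith("#")
    ]
    return [
        name
        for name in REQUIRED_SECTIONS
        if not any(name.lower() in heading for heading in headings)
    ]
```

An empty return value means every listed section name was found in some heading.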
| 111 | + |
| 112 | +**Writing guidelines:** |
| 113 | + |
| 114 | +- Use fully-qualified BigQuery table names (`` `project.schema.table` ``) |
| 115 | +- Include column types (STRING, BOOLEAN, INT64, TIMESTAMP, FLOAT64) |
| 116 | +- Show accepted values inline when known from tests |
| 117 | +- Always include `NOT is_lt_team` in example queries |
| 118 | +- Filter on the partition column in WHERE so BigQuery prunes partitions and scans less data |
| 119 | +- Provide both simple filters and full analysis queries |
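The `NOT is_lt_team` guideline lends itself to an automated check. The sketch below assumes the spec places its queries in fenced blocks tagged `sql`, and the helper name `queries_missing_team_filter` is hypothetical. It only looks for the literal condition, so it does not verify that the condition sits in the right WHERE clause.

```python
import re

FENCE = "`" * 3  # built indirectly so this example can sit inside a fence
SQL_BLOCK = re.compile(FENCE + r"sql\n(.*?)" + FENCE, re.DOTALL)
TEAM_FILTER = re.compile(r"\bNOT\s+is_lt_team\b", re.IGNORECASE)


def queries_missing_team_filter(spec_markdown: str) -> list[str]:
    """Return the SQL blocks that lack a `NOT is_lt_team` condition."""
    blocks = SQL_BLOCK.findall(spec_markdown)
    return [sql for sql in blocks if not TEAM_FILTER.search(sql)]
```

Running this over a finished spec and reviewing any returned blocks is a cheap way to catch an example query that forgot the filter.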
| 120 | + |
| 121 | +### Phase 5: Validate Completeness |
| 122 | + |
| 123 | +Before finalizing, check: |
| 124 | + |
| 125 | +- [ ] Every column referenced in queries exists in the column tables |
| 126 | +- [ ] Filtering patterns cover the main use cases the user described |
| 127 | +- [ ] Sample queries are syntactically valid BigQuery SQL |
| 128 | +- [ ] Lineage diagram traces from source through to mart |
| 129 | +- [ ] Important notes capture non-obvious behavior (nullability, edge cases, timing windows) |
| 130 | +- [ ] Table names use correct project/schema from model config blocks |
| 131 | + |
| 132 | +## Existing Specs as Reference |
| 133 | + |
| 134 | +Current data spec documents in the project: |
| 135 | + |
| 136 | +| File | Topic | Good Example Of | |
| 137 | +|------|-------|-----------------| |
| 138 | +| `docs/gen_space_events_spec.md` | Gen Space activity | Filtering patterns, page_workspace breakdown | |
| 139 | +| `docs/brand_kits_events_spec.md` | Brand Kit events | Event-to-column mapping, action_category usage | |
| 140 | +| `docs/gen_space_lightbox_actions_spec.md` | Lightbox/asset actions | UI-to-event mapping, cross-feature coverage | |
| 141 | +| `docs/ltxstudio_failed_generations_spec.md` | Failed generations | Error analysis, multi-layer column tracking | |
| 142 | + |
| 143 | +Read these for style and depth calibration when creating a new spec. |
| 144 | + |
| 145 | +## Checklist |
| 146 | + |
| 147 | +- [ ] Topic scoped and confirmed with user |
| 148 | +- [ ] Codebase explored across all layers (mart -> intermediate -> base -> macro) |
| 149 | +- [ ] Key model SQL files read (not just YMLs) |
| 150 | +- [ ] Spec document written with all required sections |
| 151 | +- [ ] Queries validated for correct table names and syntax |
| 152 | +- [ ] File saved to `~/ltx-analytics-agents/docs/{feature}_spec.md` |