Distributed AI Guidance Strategy

This document describes the distributed guidance approach implemented in the data engineering cookiecutter template, which gives generated projects an AI-first, ready-to-work foundation.

Strategy Overview

Problem Solved

Single guidance file limitations: One large CLAUDE.md becomes unwieldy and lacks context-specific guidance
Generic advice: Without layer-specific context, AI assistants provide generic rather than domain-focused help
Discovery friction: Developers working in specific areas (dags/, dbt/, etc.) need immediate, relevant guidance

Solution: Two-Layer Guidance Distribution

Outer Layer (Template Repository)

File: /CLAUDE.md
Focus: Cookiecutter template mechanics, DevContainer setup, template development
Audience: Template maintainers and developers working on the template itself

Inner Layer (Generated Project)

Files: Multiple CLAUDE.md files distributed throughout project structure
Focus: Domain-specific data engineering patterns and layer-specific guidance
Audience: Data engineers working on the generated project

Implementation Structure

# Template Repository (Outer Layer)
data-eng-template/
├── CLAUDE.md                           # Template development guidance
├── cookiecutter.json                   # Variables with author_name, db_* fields
└── {{cookiecutter.repo_slug}}/         # Generated project structure
    ├── CLAUDE.md                       # Project-level guidance
    ├── dags/CLAUDE.md                  # Airflow DAG patterns
    ├── dbt/CLAUDE.md                   # dbt modeling guidance  
    ├── transforms/CLAUDE.md            # SQLModel/Pydantic patterns
    └── scripts/CLAUDE.md               # Operational utilities guidance

# Generated Project (Inner Layer)
awesome-data-project/
├── CLAUDE.md                           # "Awesome Data Project - Data Engineering Project"
├── dags/CLAUDE.md                      # "Airflow DAG Development Guidance"
├── dbt/CLAUDE.md                       # "dbt Modeling Guidance for Awesome Data Project"
├── transforms/CLAUDE.md                # "SQLModel and Pydantic Data Models"
└── scripts/CLAUDE.md                   # "Helper Scripts for Awesome Data Project"

Content Distribution Strategy

Outer Layer Content (Template Development)

Cookiecutter variable management and template structure
DevContainer configuration and service orchestration
Template testing and validation strategies
Cross-cutting architectural decisions (tool versions, integration patterns)
File editing workarounds and template-specific issues
ChatGPT collaboration workflow for template improvements

Inner Layer Content (Domain-Specific)

Project Root (CLAUDE.md)

Project mission and medallion architecture overview
Development environment setup (Airflow UI, database connections)
Cross-layer workflow patterns and testing strategies
Layer navigation guide (pointing to specific CLAUDE.md files)

DAGs Layer (dags/CLAUDE.md)

Airflow-specific patterns: DAG organization, naming conventions
Task group patterns, error handling, and retry logic
Dataset dependencies and cross-DAG coordination
Integration with dbt runs and data quality checks

dbt Layer (dbt/CLAUDE.md)

Medallion architecture modeling conventions (bronze/silver/gold)
Naming patterns: _bronze__source__table, _silver__domain__entity
SCD Type 2 implementation, incremental model strategies
Testing patterns and data quality validation

Transforms Layer (transforms/CLAUDE.md)

SQLModel and Pydantic patterns for type-safe data validation
Bronze/Silver/Gold model characteristics and validation levels
Database integration patterns and performance optimization
API interface design for data consumption

Scripts Layer (scripts/CLAUDE.md)

Operational script standards and error handling patterns
Development, operations, and maintenance script organization
Configuration management and logging standardization
Database utilities and performance monitoring tools

Key Benefits Achieved

1. Contextual AI Assistance

Problem: Generic guidance regardless of working context
Solution: Layer-specific guidance provides relevant context when Claude is asked for help
Example: A request made while working in the dags/ directory draws on Airflow-specific patterns rather than generic advice
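
To make the idea of layer-specific context concrete, here is a minimal sketch of how the guidance files relevant to a working directory could be collected, nearest first. The function name `find_guidance_files` is hypothetical and the walk-up rule is an assumption for illustration; it is not a description of how Claude itself loads CLAUDE.md files.

```python
import tempfile
from pathlib import Path


def find_guidance_files(working_dir: Path, project_root: Path) -> list[Path]:
    """Return CLAUDE.md files from working_dir up to project_root, nearest first."""
    working_dir = working_dir.resolve()
    project_root = project_root.resolve()
    # Raises ValueError if working_dir is not inside project_root.
    working_dir.relative_to(project_root)

    found = []
    current = working_dir
    while True:
        candidate = current / "CLAUDE.md"
        if candidate.is_file():
            found.append(candidate)
        if current == project_root:
            break
        current = current.parent
    return found


if __name__ == "__main__":
    # Build a throwaway project with two guidance layers and list them.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "awesome-data-project"
        (root / "dags").mkdir(parents=True)
        (root / "CLAUDE.md").write_text("Project-level guidance\n")
        (root / "dags" / "CLAUDE.md").write_text("Airflow DAG patterns\n")
        for path in find_guidance_files(root / "dags", root):
            print(path.relative_to(root.resolve()))
```

In this sketch the layer file in dags/ is listed before the project root file, which mirrors the intent that the most specific guidance takes priority.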

2. Scalable Documentation

Problem: Single large file becomes unmaintainable
Solution: Distributed files can be independently maintained and expanded
Example: Adding new dbt patterns only requires updating dbt/CLAUDE.md

3. Discovery and Navigation

Problem: Finding relevant guidance in large documents
Solution: Guidance co-located with code provides immediate assistance
Example: A developer working in the transforms/ directory finds SQLModel guidance in that same directory

4. AI-First Development Experience

Problem: AI assistants lack project-specific context
Solution: Each CLAUDE.md provides full context for stateless AI interactions
Example: Each file includes project variables, tool versions, and domain context

5. Template Evolution Support

Problem: Template changes require manual documentation updates
Solution: Cookiecutter variables ensure generated guidance stays current
Example: {{cookiecutter.airflow_version}} automatically reflects chosen version

Implementation Lessons Learned

1. Cookiecutter Template Syntax Conflicts

Challenge: dbt Jinja syntax ({{ ref('model') }}) conflicts with cookiecutter templates
Solution: Use {% raw %}...{% endraw %} blocks or separate detailed examples into non-templated files
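
As an illustration of the raw-block workaround, the sketch below wraps dbt-style expressions in raw blocks while leaving cookiecutter variables untouched. The helper `escape_dbt_jinja` is hypothetical and not part of the template. It assumes dbt expressions never start with `cookiecutter.`, it does not handle `{% ... %}` statements, and it is not idempotent (running it twice would wrap the expression twice).

```python
import re

# Matches {{ ... }} expressions that do NOT start with a cookiecutter variable.
# Those are assumed to be dbt Jinja that cookiecutter must leave alone.
DBT_EXPRESSION = re.compile(r"\{\{(?!\s*cookiecutter\.)(.*?)\}\}", re.DOTALL)


def escape_dbt_jinja(text: str) -> str:
    """Wrap non-cookiecutter {{ ... }} expressions in raw blocks."""
    return DBT_EXPRESSION.sub(
        lambda match: "{% raw %}" + match.group(0) + "{% endraw %}",
        text,
    )


if __name__ == "__main__":
    line = "select * from {{ ref('orders') }} -- {{ cookiecutter.project_name }}"
    print(escape_dbt_jinja(line))
```

Applied to a guidance file before it is committed to the template, a helper like this would let cookiecutter render the project name while passing the `ref` call through to the generated file unchanged.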

2. Variable Completeness

Challenge: Missing cookiecutter variables cause generation failures
Solution: Comprehensive cookiecutter.json with all necessary project variables:

{
  "project_name": "Awesome Data Project",
  "repo_slug": "awesome-data-project", 
  "author_name": "Data Engineering Team",
  "db_name": "{{ cookiecutter.repo_slug.replace('-', '_') }}",
  "db_user": "postgres",
  "db_password": "postgres"
}
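
One way to catch missing variables before generation fails is to compare the names referenced in the template against the keys of cookiecutter.json. The following is a rough sketch under that assumption. The function `find_undefined_variables` is hypothetical, it only inspects Jinja tags in file contents and relative paths, and it will also flag variables that appear inside raw blocks.

```python
import json
import re
from pathlib import Path

# Jinja expression or statement tags, and variable references inside them.
JINJA_TAG = re.compile(r"\{\{.*?\}\}|\{%.*?%\}", re.DOTALL)
VARIABLE = re.compile(r"\bcookiecutter\.([A-Za-z_][A-Za-z0-9_]*)")


def find_undefined_variables(template_root: Path) -> dict[str, set[str]]:
    """Map each undefined variable name to the files that reference it."""
    config_path = template_root / "cookiecutter.json"
    defined = set(json.loads(config_path.read_text(encoding="utf-8")))

    undefined: dict[str, set[str]] = {}
    for path in sorted(template_root.rglob("*")):
        relative = path.relative_to(template_root)
        if not path.is_file() or ".git" in relative.parts:
            continue
        # File and directory names can reference variables too.
        sources = [relative.as_posix()]
        try:
            sources.append(path.read_text(encoding="utf-8"))
        except UnicodeDecodeError:
            pass  # binary file: check the path only
        for source in sources:
            for tag in JINJA_TAG.findall(source):
                for name in VARIABLE.findall(tag):
                    if name not in defined:
                        undefined.setdefault(name, set()).add(relative.as_posix())
    return undefined
```

Calling `find_undefined_variables(Path("."))` from the template root returns an empty dictionary when every referenced variable has a matching key.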

3. Simplicity vs Comprehensiveness

Challenge: Balancing detailed guidance with template generation complexity
Solution: Start with simplified guidance that works, then add detailed examples separately

4. Testing Strategy

Challenge: Validating distributed guidance approach
Solution:

Test cookiecutter generation: cookiecutter . --no-input
Verify all CLAUDE.md files are created with correct variable substitution
Validate project structure matches intended distribution
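
The checks above can be sketched as a small script run against the output directory of the generation command. The file list mirrors the structure shown earlier in this document. The function name `check_generated_project` is hypothetical, and the unresolved-variable check is a simple pattern match that would also flag guidance which deliberately shows cookiecutter syntax.

```python
import re
from pathlib import Path

# Guidance files the generated project is expected to contain.
EXPECTED_GUIDANCE = [
    "CLAUDE.md",
    "dags/CLAUDE.md",
    "dbt/CLAUDE.md",
    "transforms/CLAUDE.md",
    "scripts/CLAUDE.md",
]

# A cookiecutter expression that survived rendering.
UNRESOLVED = re.compile(r"\{\{\s*cookiecutter\.")


def check_generated_project(project_dir: Path) -> list[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    for relative in EXPECTED_GUIDANCE:
        path = project_dir / relative
        if not path.is_file():
            problems.append(f"missing: {relative}")
            continue
        if UNRESOLVED.search(path.read_text(encoding="utf-8")):
            problems.append(f"unresolved variable: {relative}")
    return problems
```

For the default values shown earlier, this would be called as `check_generated_project(Path("awesome-data-project"))` after generation, failing the test run if the returned list is not empty.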

Usage Patterns

For Template Developers

Work in main template directory with outer layer CLAUDE.md
Focus on cookiecutter mechanics, DevContainer configuration
Test template generation frequently
Update inner layer guidance when adding new patterns

For Data Engineers (Generated Project Users)

Start with project root CLAUDE.md for overall context
Navigate to layer-specific CLAUDE.md files when working in specific areas
Use layer-specific context when asking Claude for assistance
Reference detailed patterns in domain-specific guidance files

For AI Assistants (Claude)

Template Work: Use outer layer guidance for cookiecutter, DevContainer, and template concerns
Generated Project Work: Use inner layer guidance for domain-specific data engineering patterns
Context Awareness: Each CLAUDE.md provides full stateless context for the specific layer
Navigation: Follow guidance file references for comprehensive patterns

Success Metrics

The distributed guidance strategy successfully achieved:

✅ Template Generation: Cookiecutter generates projects with complete guidance distribution
✅ Variable Resolution: All {{cookiecutter.*}} variables properly resolve in generated files
✅ Context Separation: Clear distinction between template and domain concerns
✅ Layer-Specific Guidance: Each directory provides relevant, focused assistance
✅ AI-Ready Foundation: Generated projects include comprehensive AI interaction context

Future Enhancements

1. Detailed Pattern Libraries

Create comprehensive CLAUDE_DETAILED.md files with escaped dbt syntax
Include complete code examples and advanced patterns
Provide industry best practices and troubleshooting guides

2. Dynamic Content Generation

Use cookiecutter hooks to generate layer-specific content based on chosen features
Customize guidance based on selected tools and configurations
Include environment-specific deployment guidance
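
As a rough sketch of what hook-based customization might look like: cookiecutter runs hooks/post_gen_project.py from the root of the generated project, so a hook could delete guidance files for features the user did not select. The feature flags `use_dbt` and `use_transforms` below are hypothetical and do not exist in the cookiecutter.json shown earlier. In a real hook, the flag values would come from template variables rendered into the script.

```python
from pathlib import Path

# Hypothetical mapping from a feature flag to the guidance file it controls.
GUIDANCE_BY_FEATURE = {
    "use_dbt": "dbt/CLAUDE.md",
    "use_transforms": "transforms/CLAUDE.md",
}


def prune_guidance(project_dir: Path, features: dict[str, bool]) -> list[str]:
    """Delete guidance files for disabled features; return what was removed."""
    removed = []
    for feature, relative in GUIDANCE_BY_FEATURE.items():
        path = project_dir / relative
        # Flags that are absent are treated as enabled, so nothing is
        # deleted by accident.
        if not features.get(feature, True) and path.is_file():
            path.unlink()
            removed.append(relative)
    return removed
```

Treating missing flags as enabled is a deliberate choice in this sketch: a misconfigured hook leaves extra guidance in place rather than silently removing it.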

3. Interactive Documentation

Link to external resources and documentation
Provide runnable examples and quick-start commands
Include troubleshooting decision trees and diagnostic tools

Conclusion

The distributed AI guidance strategy transforms the data engineering template from a basic scaffolding tool into a comprehensive AI-first, ready-to-work foundation. By separating template concerns from domain expertise and distributing guidance contextually throughout the project structure, developers get immediate, relevant assistance exactly where they need it.

This approach scales with project complexity and supports template evolution. It also embeds data engineering patterns directly in the project structure, where both developers and AI assistants can find them.
