This document describes the implemented distributed guidance approach for the data engineering cookiecutter template, creating an AI-first, ready-to-work foundation.
- Single guidance file limitations: One large CLAUDE.md becomes unwieldy and lacks context-specific guidance
- Generic advice: Without layer-specific context, AI assistants provide generic rather than domain-focused help
- Discovery friction: Developers working in specific areas (dags/, dbt/, etc.) need immediate, relevant guidance
Outer Layer (Template Repository)
- File: /CLAUDE.md
- Focus: Cookiecutter template mechanics, DevContainer setup, template development
- Audience: Template maintainers and developers working on the template itself
Inner Layer (Generated Project)
- Files: Multiple CLAUDE.md files distributed throughout the project structure
- Focus: Domain-specific data engineering patterns and layer-specific guidance
- Audience: Data engineers working on the generated project
# Template Repository (Outer Layer)
data-eng-template/
├── CLAUDE.md # Template development guidance
├── cookiecutter.json # Variables with author_name, db_* fields
└── {{cookiecutter.repo_slug}}/ # Generated project structure
    ├── CLAUDE.md # Project-level guidance
    ├── dags/CLAUDE.md # Airflow DAG patterns
    ├── dbt/CLAUDE.md # dbt modeling guidance
    ├── transforms/CLAUDE.md # SQLModel/Pydantic patterns
    └── scripts/CLAUDE.md # Operational utilities guidance
# Generated Project (Inner Layer)
awesome-data-project/
├── CLAUDE.md # "Awesome Data Project - Data Engineering Project"
├── dags/CLAUDE.md # "Airflow DAG Development Guidance"
├── dbt/CLAUDE.md # "dbt Modeling Guidance for Awesome Data Project"
├── transforms/CLAUDE.md # "SQLModel and Pydantic Data Models"
└── scripts/CLAUDE.md # "Helper Scripts for Awesome Data Project"
- Cookiecutter variable management and template structure
- DevContainer configuration and service orchestration
- Template testing and validation strategies
- Cross-cutting architectural decisions (tool versions, integration patterns)
- File editing workarounds and template-specific issues
- ChatGPT collaboration workflow for template improvements
Project Root (CLAUDE.md)
- Project mission and medallion architecture overview
- Development environment setup (Airflow UI, database connections)
- Cross-layer workflow patterns and testing strategies
- Layer navigation guide (pointing to specific CLAUDE.md files)
DAGs Layer (dags/CLAUDE.md)
- Airflow-specific patterns: DAG organization, naming conventions
- Task group patterns, error handling, and retry logic
- Dataset dependencies and cross-DAG coordination
- Integration with dbt runs and data quality checks
dbt Layer (dbt/CLAUDE.md)
- Medallion architecture modeling conventions (bronze/silver/gold)
- Naming patterns: _bronze__source__table, _silver__domain__entity
- SCD Type 2 implementation, incremental model strategies
- Testing patterns and data quality validation
Transforms Layer (transforms/CLAUDE.md)
- SQLModel and Pydantic patterns for type-safe data validation
- Bronze/Silver/Gold model characteristics and validation levels
- Database integration patterns and performance optimization
- API interface design for data consumption
Scripts Layer (scripts/CLAUDE.md)
- Operational script standards and error handling patterns
- Development, operations, and maintenance script organization
- Configuration management and logging standardization
- Database utilities and performance monitoring tools
- Problem: Generic guidance regardless of working context
- Solution: Layer-specific guidance provides relevant context when Claude is asked for help
- Example: Working in the dags/ directory gets Airflow-specific patterns, not generic advice
- Problem: Single large file becomes unmaintainable
- Solution: Distributed files can be independently maintained and expanded
- Example: Adding new dbt patterns only requires updating dbt/CLAUDE.md
- Problem: Finding relevant guidance in large documents
- Solution: Guidance co-located with code provides immediate assistance
- Example: A developer in the transforms/ directory immediately sees SQLModel guidance
- Problem: AI assistants lack project-specific context
- Solution: Each CLAUDE.md provides full context for stateless AI interactions
- Example: Each file includes project variables, tool versions, and domain context
- Problem: Template changes require manual documentation updates
- Solution: Cookiecutter variables ensure generated guidance stays current
- Example: {{cookiecutter.airflow_version}} automatically reflects the chosen version
Challenge: dbt Jinja syntax ({{ ref('model') }}) conflicts with cookiecutter templates
Solution: Use {% raw %}...{% endraw %} blocks or separate detailed examples into non-templated files
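The raw-block approach can be illustrated with a small script. This is a minimal sketch, not part of the template: `escape_dbt_jinja` is a hypothetical helper that wraps every `{{ ... }}` expression in a raw block unless it refers to a cookiecutter variable, so dbt calls survive generation while cookiecutter variables are still substituted.

```python
import re

# Matches any Jinja expression such as {{ ref('model') }} or {{ cookiecutter.repo_slug }}.
JINJA_EXPR = re.compile(r"\{\{.*?\}\}")


def escape_dbt_jinja(text: str) -> str:
    """Wrap non-cookiecutter Jinja expressions in raw blocks (hypothetical helper)."""

    def wrap(match: re.Match) -> str:
        expr = match.group(0)
        inner = expr[2:-2].strip()
        # Leave cookiecutter variables alone so they are still substituted.
        if inner.startswith("cookiecutter."):
            return expr
        return "{% raw %}" + expr + "{% endraw %}"

    return JINJA_EXPR.sub(wrap, text)


line = "select * from {{ ref('orders') }} -- project: {{ cookiecutter.repo_slug }}"
escaped = escape_dbt_jinja(line)
print(escaped)
```

The sketch only handles `{{ ... }}` expressions. It does not cover dbt statement blocks written with `{% ... %}`, and running it twice on the same text would wrap expressions twice, so it is best treated as a one-off authoring aid.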
Challenge: Missing cookiecutter variables cause generation failures
Solution: Comprehensive cookiecutter.json with all necessary project variables:
{
  "project_name": "Awesome Data Project",
  "repo_slug": "awesome-data-project",
  "author_name": "Data Engineering Team",
  "db_name": "{{ cookiecutter.repo_slug.replace('-', '_') }}",
  "db_user": "postgres",
  "db_password": "postgres"
}
Challenge: Balancing detailed guidance with template generation complexity
Solution: Start with simplified guidance that works, then add detailed examples separately
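For the missing-variable challenge, one way to catch problems before generation is to compare the variables a guidance file references against the keys in cookiecutter.json. The following is a sketch under assumptions: `missing_variables` and the `sample` text are hypothetical, and the context simply mirrors the JSON shown in this document, which may be an excerpt of the real file.

```python
import json
import re

# Matches references inside expressions, e.g. {{ cookiecutter.repo_slug }}.
VAR_REF = re.compile(r"\{\{\s*cookiecutter\.([A-Za-z_][A-Za-z0-9_]*)")


def missing_variables(template_text: str, context: dict) -> set:
    """Return cookiecutter variable names used in the text but absent from the context."""
    referenced = set(VAR_REF.findall(template_text))
    return referenced - set(context)


# Context mirroring the cookiecutter.json shown in this document.
context = json.loads("""
{
  "project_name": "Awesome Data Project",
  "repo_slug": "awesome-data-project",
  "author_name": "Data Engineering Team",
  "db_name": "{{ cookiecutter.repo_slug.replace('-', '_') }}",
  "db_user": "postgres",
  "db_password": "postgres"
}
""")

# Hypothetical guidance snippet; airflow_version is not defined in the context above.
sample = "# {{ cookiecutter.project_name }} runs Airflow {{ cookiecutter.airflow_version }}"
missing = missing_variables(sample, context)
print(missing)
```

The pattern only looks at `{{ ... }}` expressions, so variables used solely inside `{% if ... %}` statements would need an extra pattern.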
Challenge: Validating distributed guidance approach
Solution:
- Test cookiecutter generation: cookiecutter . --no-input
- Verify all CLAUDE.md files are created with correct variable substitution
- Validate project structure matches intended distribution
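The last two checks above could be automated along these lines. This is a sketch rather than the template's actual test suite: `check_generated_project` is a hypothetical function, the expected file list is taken from the layout described in this document, and the demo runs against a throwaway directory instead of real cookiecutter output.

```python
import re
import tempfile
from pathlib import Path

# Guidance files the generated project is expected to contain (from the layout in this document).
EXPECTED_GUIDANCE = [
    "CLAUDE.md",
    "dags/CLAUDE.md",
    "dbt/CLAUDE.md",
    "transforms/CLAUDE.md",
    "scripts/CLAUDE.md",
]

# A leftover expression like {{ cookiecutter.x }} suggests a variable was not substituted.
UNRESOLVED = re.compile(r"\{\{\s*cookiecutter\.")


def check_generated_project(project_dir: Path) -> list:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    for relative in EXPECTED_GUIDANCE:
        path = project_dir / relative
        if not path.is_file():
            problems.append(f"missing: {relative}")
        elif UNRESOLVED.search(path.read_text(encoding="utf-8")):
            problems.append(f"unresolved variable in: {relative}")
    return problems


# Demo against a temporary directory standing in for a generated project.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    for relative in EXPECTED_GUIDANCE[:-1]:  # deliberately skip scripts/CLAUDE.md
        target = project / relative
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("# Guidance for Awesome Data Project\n", encoding="utf-8")
    problems = check_generated_project(project)

print(problems)
```

In practice the function would be pointed at the directory produced by `cookiecutter . --no-input`, and an empty result would indicate that every guidance file exists and contains no leftover cookiecutter expressions.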
- Work in the main template directory with the outer layer CLAUDE.md
- Focus on cookiecutter mechanics, DevContainer configuration
- Test template generation frequently
- Update inner layer guidance when adding new patterns
- Start with the project root CLAUDE.md for overall context
- Navigate to layer-specific CLAUDE.md files when working in specific areas
- Use layer-specific context when asking Claude for assistance
- Reference detailed patterns in domain-specific guidance files
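The root-then-layer navigation described above can be expressed as a small lookup. This is an illustrative sketch only: `guidance_chain` is a hypothetical helper, not something the template ships, and it assumes guidance files are always named CLAUDE.md.

```python
import tempfile
from pathlib import Path


def guidance_chain(project_root: Path, working_dir: Path) -> list:
    """List CLAUDE.md files from the project root down to working_dir (hypothetical helper)."""
    root = project_root.resolve()
    # Raises ValueError if working_dir is not inside the project root.
    relative = working_dir.resolve().relative_to(root)
    directories = [root]
    for part in relative.parts:
        directories.append(directories[-1] / part)
    return [
        (directory / "CLAUDE.md").relative_to(root).as_posix()
        for directory in directories
        if (directory / "CLAUDE.md").is_file()
    ]


# Demo: a throwaway project with root, dags/ and dbt/ guidance files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for layer in ("", "dags", "dbt"):
        (root / layer).mkdir(parents=True, exist_ok=True)
        (root / layer / "CLAUDE.md").write_text("# guidance\n", encoding="utf-8")
    workdir = root / "dags" / "ingestion"
    workdir.mkdir()
    chain = guidance_chain(root, workdir)

print(chain)
```

For a working directory under dags/, the result lists the project-level file first and the Airflow guidance second, which matches the reading order suggested above; dbt/CLAUDE.md is left out because it is not on the path.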
- Template Work: Use outer layer guidance for cookiecutter, DevContainer, and template concerns
- Generated Project Work: Use inner layer guidance for domain-specific data engineering patterns
- Context Awareness: Each CLAUDE.md is self-contained, so a stateless assistant gets the full context for that layer
- Navigation: Follow guidance file references for comprehensive patterns
The distributed guidance strategy successfully achieved:
✅ Template Generation: Cookiecutter generates projects with complete guidance distribution
✅ Variable Resolution: All {{cookiecutter.*}} variables properly resolve in generated files
✅ Context Separation: Clear distinction between template and domain concerns
✅ Layer-Specific Guidance: Each directory provides relevant, focused assistance
✅ AI-Ready Foundation: Generated projects include comprehensive AI interaction context
- Create comprehensive CLAUDE_DETAILED.md files with escaped dbt syntax
- Include complete code examples and advanced patterns
- Provide industry best practices and troubleshooting guides
- Use cookiecutter hooks to generate layer-specific content based on chosen features
- Customize guidance based on selected tools and configurations
- Include environment-specific deployment guidance
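As a rough idea of what hook-driven guidance could look like, the sketch below shows the core of a possible `hooks/post_gen_project.py`. Everything here is an assumption: `prune_guidance` is a hypothetical function, and a feature flag such as `use_dbt` does not exist in the cookiecutter.json shown in this document. Cookiecutter renders hook scripts through Jinja and runs post-generation hooks from the generated project's root, so a real hook could fill the `enabled` mapping from template variables.

```python
import tempfile
from pathlib import Path


def prune_guidance(project_dir: Path, enabled: dict) -> list:
    """Delete CLAUDE.md files for layers that are switched off; return the pruned layers."""
    removed = []
    for layer, is_enabled in sorted(enabled.items()):
        guidance = project_dir / layer / "CLAUDE.md"
        if not is_enabled and guidance.is_file():
            guidance.unlink()
            removed.append(layer)
    return removed


# In a real hook, the flags might come from rendered variables, for example
# a hypothetical "use_dbt" choice compared against "yes". Here they are hard-coded.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp)
    for layer in ("dags", "dbt"):
        (project / layer).mkdir()
        (project / layer / "CLAUDE.md").write_text("# guidance\n", encoding="utf-8")
    removed = prune_guidance(project, {"dags": True, "dbt": False})
    dags_kept = (project / "dags" / "CLAUDE.md").is_file()
    dbt_kept = (project / "dbt" / "CLAUDE.md").is_file()

print(removed, dags_kept, dbt_kept)
```

Removing files after generation keeps the template tree itself simple; the alternative of conditional directory names in the template is also possible but tends to be harder to read.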
- Link to external resources and documentation
- Provide runnable examples and quick-start commands
- Include troubleshooting decision trees and diagnostic tools
The distributed AI guidance strategy transforms the data engineering template from a basic scaffolding tool into a comprehensive AI-first, ready-to-work foundation. By separating template concerns from domain expertise and distributing guidance contextually throughout the project structure, developers get immediate, relevant assistance exactly where they need it.
This approach scales with project complexity and supports template evolution. It improves the developer experience by embedding data engineering patterns directly in the project structure, where both developers and AI assistants can find them.