DataRecce
diff --git a/.claude/agents/content-reviewer.md b/.claude/agents/content-reviewer.md
Lines changed: 50 additions & 0 deletions
diff --git a/.claude/agents/intention-recorder.md b/.claude/agents/intention-recorder.md
Lines changed: 34 additions & 0 deletions
diff --git a/analyze_image_usage.py b/analyze_image_usage.py
Lines changed: 74 additions & 0 deletions
diff --git a/claude/terminology.md b/claude/terminology.md
Lines changed: 175 additions & 0 deletions
@@ -0,0 +1,50 @@
+---
+name: content-reviewer
+description: Use this agent when you need to review, edit, or draft documentation content following established writing principles. Examples: <example>Context: User has written a technical guide and wants it reviewed for clarity and adherence to writing standards. user: "I've finished writing the API documentation. Can you review it for clarity and consistency?" assistant: "I'll use the content-reviewer agent to review your API documentation against our writing principles and provide editing suggestions."</example> <example>Context: User needs help drafting new documentation content. user: "I need to create a user guide for our new feature" assistant: "Let me use the content-reviewer agent to help you draft a user guide that follows our established writing principles and documentation standards."</example> <example>Context: User wants to improve existing documentation. user: "This README file feels unclear and hard to follow" assistant: "I'll use the content-reviewer agent to analyze the README and suggest improvements based on our writing principles."</example>
+model: sonnet
+---
+
+You are a professional content reviewer and editor specializing in technical documentation. Your expertise lies in applying established writing principles to create clear, accessible, and effective documentation.
+
+Your core responsibilities:
+
+**Content Review & Analysis**:
+- Evaluate existing content against writing principles for clarity, structure, and effectiveness
+- Identify areas where content fails to meet established standards
+- Assess audience appropriateness and technical accuracy
+- Check for consistency in tone, style, and formatting
+
+**Editorial Excellence**:
+- Apply writing principles systematically to improve content quality
+- Ensure logical flow and coherent structure throughout documents
+- Optimize for readability while maintaining technical precision
+- Eliminate redundancy, ambiguity, and unnecessary complexity
+
+**Content Creation & Drafting**:
+- Create new documentation content following established writing principles
+- Structure information hierarchically for maximum comprehension
+- Adapt writing style to match intended audience and purpose
+- Integrate examples, code snippets, and visual elements effectively
+
+**Quality Assurance Process**:
+1. **Initial Assessment**: Analyze content purpose, audience, and current state
+2. **Principle Application**: Apply relevant writing principles systematically
+3. **Structural Review**: Evaluate organization, flow, and information hierarchy
+4. **Language Optimization**: Refine clarity, conciseness, and accessibility
+5. **Consistency Check**: Ensure uniform style, tone, and formatting
+6. **Final Validation**: Verify all improvements align with writing principles
+
+**Writing Principles Integration**:
+- Always reference and apply the specific writing principles provided in project context
+- Explain how suggested changes align with established principles
+- Prioritize clarity and user comprehension over technical complexity
+- Maintain consistency with existing documentation standards
+- Review documents one at a time, in the order they appear in `mkdocs.yml`; finish each one before moving on to the next.
+
+**Output Standards**:
+- Provide specific, actionable feedback with clear rationale
+- Offer concrete examples of improvements
+- Explain how changes support better user experience
+- Include both high-level structural suggestions and detailed line edits when appropriate
+
+You approach every piece of content with the goal of making it more accessible, accurate, and effective for its intended audience while strictly adhering to established writing principles.
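The `content-reviewer` agent is asked to review pages in the order they appear in `mkdocs.yml`. A minimal sketch of deriving that order, assuming the `nav:` key has already been loaded into Python (for example with a YAML parser) and uses the common MkDocs shapes: a bare page path, a `{title: path}` mapping, or a `{title: [children]}` section. `flatten_nav` is a hypothetical helper, not part of MkDocs; external links are skipped because there is no local page to review.

```python
from typing import Iterator, Union

# A nav entry is a page path ("index.md"), a {title: path} mapping,
# a {title: [child entries]} section, or a list of such entries.
NavEntry = Union[str, dict, list]


def flatten_nav(nav: NavEntry) -> Iterator[str]:
    """Yield local page paths depth-first, in the order they appear in the nav."""
    if isinstance(nav, str):
        # Skip external links; only local pages are reviewed.
        if not nav.startswith(("http://", "https://")):
            yield nav
    elif isinstance(nav, list):
        for entry in nav:
            yield from flatten_nav(entry)
    elif isinstance(nav, dict):
        for value in nav.values():
            yield from flatten_nav(value)


if __name__ == "__main__":
    # Hypothetical nav, shaped like the `nav:` key of an mkdocs.yml file
    nav = [
        {"Home": "index.md"},
        {"Getting Started": [
            {"Installation": "get-started/installation.md"},
            {"Quick Start": "get-started/quick-start.md"},
        ]},
        {"Blog": "https://example.com/blog"},
        "faq.md",
    ]
    for position, page in enumerate(flatten_nav(nav), start=1):
        print(f"{position}. {page}")
```

Because Python dicts preserve insertion order and YAML loaders build them in file order, the traversal follows the nav as written.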
@@ -0,0 +1,34 @@
+---
+name: intention-recorder
+description: Use this agent when you need to document and track the user's intentions behind documentation creation, modification, or review processes. This agent should be called after conversations with content-reviewer or general agents to capture the underlying purpose and goals behind documentation work. Examples: <example>Context: User has been working with content-reviewer agent on improving documentation and wants to record their intentions. user: "I just finished reviewing the API documentation with the content-reviewer agent. Can you help me record what I was trying to achieve?" assistant: "I'll use the intention-recorder agent to capture and document your goals and intentions from that documentation review session."</example> <example>Context: User has had multiple conversations about documentation and wants to track their evolving intentions. user: "After my chats with the general agent about project structure and content-reviewer about documentation quality, I want to record my overall intentions for this documentation effort." assistant: "Let me use the intention-recorder agent to systematically capture and organize your intentions from those conversations."</example>
+model: sonnet
+---
+
+You are an Intention Documentation Specialist, expert at capturing, analyzing, and recording the underlying purposes and goals behind user actions and decisions. Your role is to help users articulate and document their true intentions, especially after conversations with content-reviewer and general agents about documentation work.
+
+Your core responsibilities:
+1. **Intention Extraction**: Carefully analyze user conversations and interactions to identify underlying motivations, goals, and purposes
+2. **Context Analysis**: Review chat history and previous agent interactions to understand the full context of user intentions
+3. **Structured Documentation**: Create clear, organized records of user intentions that can be referenced later
+4. **Goal Clarification**: Help users articulate intentions they may not have fully expressed or realized
+5. **Pattern Recognition**: Identify recurring themes and evolving intentions across multiple conversations
+
+Your approach:
+- Ask clarifying questions to ensure you capture the complete picture of user intentions
+- Distinguish between stated goals and underlying motivations
+- Organize intentions hierarchically (primary goals, secondary objectives, supporting actions)
+- Include context about what prompted each intention
+- Note any constraints, preferences, or quality standards mentioned
+- Record both immediate and long-term intentions
+- Cross-reference with previous documentation work and agent conversations
+
+Output format:
+- Create structured intention records with clear categories
+- Include timestamps and context references
+- Organize by priority and relationship to other goals
+- Provide actionable summaries that can guide future work
+- Maintain a clear audit trail of intention evolution
+
+Write your output to a separate, internally shared folder so that colleagues can review it alongside this project.
+
+You excel at helping users maintain clarity about their documentation goals and ensuring their true intentions are preserved and actionable for future reference.
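The `intention-recorder` agent asks for structured records with categories, timestamps, context references, priority, and an audit trail, but does not define a schema. A minimal sketch of one possible record shape, assuming a Python dataclass rendered to Markdown; `IntentionRecord`, its field names, the level values, and `to_markdown` are all hypothetical illustrations, not a format the agent definition prescribes.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class IntentionRecord:
    """One captured intention; every field name here is illustrative."""
    goal: str                      # what the user is trying to achieve
    level: str                     # e.g. "primary", "secondary", "supporting"
    prompted_by: str               # context that triggered this intention
    constraints: List[str] = field(default_factory=list)
    parent: Optional[str] = None   # goal this one supports, if any
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    history: List[str] = field(default_factory=list)  # audit trail of revisions

    def revise(self, new_goal: str, reason: str) -> None:
        """Update the goal and keep the previous wording in the audit trail."""
        self.history.append(f"{self.goal} -> {new_goal} ({reason})")
        self.goal = new_goal

    def to_markdown(self) -> str:
        """Render the record as a Markdown block for the review folder."""
        lines = [
            f"### {self.goal}",
            f"- **Level**: {self.level}",
            f"- **Prompted by**: {self.prompted_by}",
            f"- **Recorded**: {self.recorded_at.isoformat(timespec='seconds')}",
        ]
        if self.parent:
            lines.append(f"- **Supports**: {self.parent}")
        for constraint in self.constraints:
            lines.append(f"- **Constraint**: {constraint}")
        for change in self.history:
            lines.append(f"- **Revised**: {change}")
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example record
    record = IntentionRecord(
        goal="Make the getting-started guide usable by data analysts",
        level="primary",
        prompted_by="content-reviewer session on the getting-started pages",
        constraints=["Follow claude/terminology.md"],
    )
    record.revise("Make installation steps usable by data analysts", "scope narrowed")
    print(record.to_markdown())
```

Keeping revisions in `history` rather than overwriting them is one way to meet the "audit trail of intention evolution" requirement.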
@@ -0,0 +1,74 @@
+#!/usr/bin/env python3
+import os
+import re
+from collections import defaultdict
+
+def find_image_references(docs_path):
+    """Find all image references in markdown files and map them to sections."""
+    image_usage = defaultdict(list)
+    section_images = defaultdict(set)
+    
+    # Pattern to match image references
+    image_pattern = r'!\[.*?\]\((.*?assets/images/.*?)\)'
+    
+    for root, dirs, files in os.walk(docs_path):
+        for file in files:
+            if file.endswith('.md'):
+                file_path = os.path.join(root, file)
+                rel_path = os.path.relpath(file_path, docs_path)
+                
+                # Determine section from path; files directly under docs/ belong to 'root'
+                path_parts = rel_path.split(os.sep)
+                section = path_parts[0] if len(path_parts) > 1 else 'root'
+                
+                with open(file_path, 'r', encoding='utf-8') as f:
+                    content = f.read()
+                    matches = re.findall(image_pattern, content)
+                    
+                    for match in matches:
+                        # Normalize the path (drop leading ../ and ./ prefixes)
+                        normalized_path = match
+                        while normalized_path.startswith(('../', './')):
+                            normalized_path = normalized_path.split('/', 1)[1]
+                        
+                        image_usage[normalized_path].append({
+                            'file': rel_path,
+                            'section': section,
+                            'original_path': match
+                        })
+                        section_images[section].add(normalized_path)
+    
+    return image_usage, section_images
+
+def main():
+    docs_path = './docs'
+    image_usage, section_images = find_image_references(docs_path)
+    
+    print("=== IMAGE USAGE ANALYSIS ===\n")
+    
+    print("Images by section:")
+    for section, images in sorted(section_images.items()):
+        print(f"\n{section}:")
+        for img in sorted(images):
+            print(f"  - {img}")
+    
+    print("\n\n=== DETAILED IMAGE USAGE ===\n")
+    
+    for image_path, usages in sorted(image_usage.items()):
+        print(f"{image_path}:")
+        sections = set()
+        for usage in usages:
+            print(f"  Used in: {usage['file']} (section: {usage['section']})")
+            sections.add(usage['section'])
+        
+        if len(sections) == 1:
+            section = next(iter(sections))
+            print(f"  → Should move to: assets/images/{section}/")
+        else:
+            print(f"  → Used in multiple sections: {', '.join(sorted(sections))}")
+            print("  → Keep in: assets/images/shared/ or assess individual usage")
+        print()
+
+if __name__ == "__main__":
+    main()
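`analyze_image_usage.py` only prints recommendations. A minimal sketch of turning the same rule (single-section images go to `assets/images/<section>/`, multi-section images go to `assets/images/shared/`) into an explicit old-to-new path plan, assuming the input maps each normalized image path to the set of sections that reference it. That mapping can be derived from the script's `image_usage` as `{path: {u['section'] for u in usages} for path, usages in image_usage.items()}`. `plan_moves` is a hypothetical helper; it does not move files or rewrite Markdown links.

```python
import posixpath
from typing import Dict, Iterable


def plan_moves(image_sections: Dict[str, Iterable[str]]) -> Dict[str, str]:
    """Map each image path to its recommended destination (hypothetical helper).

    Images used in exactly one section go to assets/images/<section>/;
    images used in several sections go to assets/images/shared/.
    Images already at their destination are left out of the plan.
    """
    plan: Dict[str, str] = {}
    claimed: Dict[str, str] = {}  # destination -> source, to catch filename clashes
    for image_path, sections in sorted(image_sections.items()):
        unique = set(sections)
        folder = next(iter(unique)) if len(unique) == 1 else "shared"
        target = posixpath.join("assets/images", folder, posixpath.basename(image_path))
        if target in claimed:
            raise ValueError(
                f"{claimed[target]} and {image_path} would both move to {target}"
            )
        claimed[target] = image_path
        if target != image_path:
            plan[image_path] = target
    return plan


if __name__ == "__main__":
    # Hypothetical input: normalized image path -> sections that reference it
    usage = {
        "assets/images/lineage.png": {"features"},
        "assets/images/logo.png": {"get-started", "features"},
        "assets/images/features/diff.png": {"features"},
    }
    for source, destination in plan_moves(usage).items():
        print(f"{source} -> {destination}")
```

The clash check matters because flattening folders can map two different images with the same filename onto one destination; the sketch stops rather than silently overwriting.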
@@ -0,0 +1,175 @@
+# Recce Documentation Terminology Guide
+
+This guide helps maintain consistent, data-team-friendly language across all Recce documentation.
+
+## Core Philosophy
+
+**Data teams think data-first, code-second.** Our terminology should reflect their mental models and avoid software engineering jargon that creates confusion.
+
+## Preferred Terminology
+
+### Recce-Specific Terms
+
+| **Use This** | **Not This** | **Context** |
+|-------------|-------------|-------------|
+| **Recce instance** | Recce server, Recce app | The UI launched by `recce server` |
+| **data validation** | data testing, data quality checks | Primary concept for what Recce does |
+| **validation results** | diff output, comparison data | What users see in Recce |
+| **impact analysis** | dependency analysis, lineage | Understanding downstream effects |
+| **data changes** | code changes, model changes | What users are validating |
+| **validation workflow** | testing workflow, QA process | How teams use Recce |
+| **diff** | comparison, delta | Data teams know "diff" from git; use it freely |
+
+### Data vs Software Terms
+
+| **Data Team Friendly** | **Software Term** | **Why Different** |
+|------------------------|-------------------|-------------------|
+| **data warehouse** | database | Data teams distinguish warehouses from operational databases |
+| **development stage** | environment | "Environment" confuses (warehouse vs dev/prod) |
+| **data models** | components | dbt models vs software components |
+| **release changes** | deploy | Data teams "release" changes, don't "deploy" infrastructure |
+| **validation checks** | unit tests | Data quality checks vs code functionality tests |
+| **automated validation** | CI/CD pipeline | Data processing vs deployment automation |
+| **change review** | code review | Reviewing data changes vs code changes |
+| **diff** | comparison | Data teams understand diff from git/version control |
+
+### Business Impact Language
+
+| **Business Focused** | **Technical Focused** | **Impact** |
+|---------------------|----------------------|------------|
+| **build trust** | ensure quality | Emphasizes outcome over process |
+| **catch issues early** | prevent bugs | Prevention focus, business consequences |
+| **confident releases** | successful deployments | User empowerment over technical success |
+| **team collaboration** | workflow integration | People-first over tool-first |
+| **validate changes** | test modifications | Active validation vs passive testing |
+
+## Terms That Confuse Data Teams
+
+### 🚨 High Confusion Terms
+
+**Environment**
+- **Data team thinks**: The data warehouse itself (e.g., Snowflake vs BigQuery)
+- **Software team thinks**: dev/staging/prod deployment target
+- **✅ Use instead**: "development stage" or "dbt target"
+
+**Deploy**
+- **Data team thinks**: Infrastructure deployment (not their job)
+- **Software team thinks**: Release code changes
+- **✅ Use instead**: "release changes" or "make live"
+
+**Pipeline**
+- **Data team thinks**: Data transformation workflow (dbt run)
+- **Software team thinks**: CI/CD automation workflow
+- **✅ Use instead**: "data pipeline" for transformation workflows, "automation workflow" for CI/CD
+
+**Testing**
+- **Data team thinks**: Data quality validation
+- **Software team thinks**: Unit/integration tests for code
+- **✅ Use instead**: "validation" or "data quality checks"
+
+### ⚠️ Medium Confusion Terms
+
+**Model**
+- **Data context**: dbt data model (SQL transformation)
+- **Software context**: Software component or data structure
+- **✅ Clarify**: Always use "dbt model" or "data model"
+
+**Schema** 
+- **Data context**: Database schema (namespace for tables)
+- **Software context**: Data structure definition
+- **✅ Clarify**: "database schema" vs "data structure"
+
+**Target**
+- **Data context**: dbt profile target (dev/prod warehouse config)
+- **Software context**: Deployment target or goal
+- **✅ Clarify**: "dbt target" when referring to profiles.yml
+
+## Terminology Alert System
+
+When reviewing documentation, flag confusing terms with this format:
+
+```
+⚠️ **Terminology Alert**: [TERM]
+- **Confusion risk**: [Why data teams might misunderstand]
+- **Current usage**: [How it appears in content]
+- **Suggested clarification**: [Better phrasing or explanation]
+- **Context needed**: [When to add explanation]
+```
+
+### Examples
+
+```
+⚠️ **Terminology Alert**: "Deploy your changes"
+- **Confusion risk**: Data teams think infrastructure deployment
+- **Current usage**: "Deploy your dbt changes to production"
+- **Suggested clarification**: "Release your data changes to production"
+- **Context needed**: Always in data change contexts
+```
+
+```
+⚠️ **Terminology Alert**: "Test environment" 
+- **Confusion risk**: Could mean test warehouse vs test deployment stage
+- **Current usage**: "Run Recce in your test environment"
+- **Suggested clarification**: "Run Recce against your development data warehouse"
+- **Context needed**: When referring to data warehouse setup
+```
+
+## Clarification Patterns
+
+### Pattern 1: Define on First Use
+```markdown
+Recce validates your **data changes** (modifications to dbt models, seeds, or configurations) before they impact production.
+```
+
+### Pattern 2: Use Data Analogies
+```markdown
+Just like code reviews catch bugs before production, data validation catches issues before they affect business metrics.
+```
+
+### Pattern 3: Contrast Software vs Data
+```markdown
+While software teams deploy applications, data teams release model changes to their warehouse.
+```
+
+### Pattern 4: Add Contextual Clarifiers
+```markdown
+Configure your dbt target (the warehouse connection in profiles.yml) to point to your development warehouse.
+```
+
+## Maintenance Guidelines
+
+### Adding New Terms
+When introducing new terminology:
+1. **Check for confusion potential** - Could data teams misunderstand?
+2. **Define immediately** - Explain on first use
+3. **Use consistently** - Same term for same concept throughout
+4. **Add to this guide** - Update the preferred terminology table
+
+### Regular Reviews
+- **Monthly**: Review user questions for terminology confusion
+- **Quarterly**: Update based on support feedback and user research
+- **Major releases**: Ensure new features use data-team-friendly language
+
+### Quality Checks
+Before publishing, verify:
+- [ ] **No undefined jargon** - All technical terms explained
+- [ ] **Consistent usage** - Same term used throughout
+- [ ] **Data team perspective** - Language matches their mental models
+- [ ] **Context provided** - Clarification when terms could be ambiguous
+
+## Quick Reference: Common Replacements
+
+| **Instead of...** | **Use...** | **Context** |
+|------------------|------------|-------------|
+| "Deploy changes" | "Release changes" | Data modifications |
+| "Test your models" | "Validate your models" | Data quality checking |
+| "Environment setup" | "Warehouse connection setup" | Warehouse configuration |
+| "CI/CD pipeline" | "Automated validation workflow" | Recce automation |
+| "Unit tests" | "Model validation checks" | dbt testing |
+| "Production deployment" | "Production release" | Making changes live |
+| "Development environment" | "Development warehouse" | Where you develop |
+| "Code review" | "Change review" | Reviewing data modifications |
+
+---
+
+**Remember**: When in doubt, choose the term that a data analyst (not a software engineer) would immediately understand. Clarity builds trust and reduces barriers to adoption.
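The Terminology Alert format and the Quick Reference table lend themselves to a first-pass automated scan. A minimal sketch, assuming the replacements are kept in a Python dict (only a subset of the table is copied here); `REPLACEMENTS` and `terminology_alerts` are hypothetical names. Matching is case-insensitive on whole phrases, and the sketch fills only the fields that can be filled mechanically: confusion risk and context still need a reviewer's judgment.

```python
import re
from typing import Dict, List

# A subset of the "Quick Reference: Common Replacements" table
REPLACEMENTS: Dict[str, str] = {
    "deploy changes": "release changes",
    "test your models": "validate your models",
    "ci/cd pipeline": "automated validation workflow",
    "production deployment": "production release",
    "development environment": "development warehouse",
    "code review": "change review",
}


def terminology_alerts(text: str) -> List[str]:
    """Return one partial Terminology Alert per phrase match, line by line."""
    alerts = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for phrase, suggestion in REPLACEMENTS.items():
            pattern = r"\b" + re.escape(phrase) + r"\b"
            match = re.search(pattern, line, re.IGNORECASE)
            if match:
                alerts.append(
                    f'⚠️ **Terminology Alert**: "{match.group(0)}"\n'
                    f"- **Current usage**: line {line_no}: {line.strip()}\n"
                    f'- **Suggested clarification**: "{suggestion}"'
                )
    return alerts


if __name__ == "__main__":
    sample = "Deploy changes after code review.\nSet up your development environment."
    for alert in terminology_alerts(sample):
        print(alert, end="\n\n")
```

A flag is a prompt to review, not an automatic rewrite: a phrase such as "code review" may be correct when the text really is about reviewing code.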