Commit 5aea0f8

docs: reduce doc verbosity to increase readability
Signed-off-by: Zack Koppert <zkoppert@github.com>
1 parent 9995d9d commit 5aea0f8

2 files changed (+37 -337 lines)

ARCHITECTURE.md

Lines changed: 1 addition & 335 deletions
@@ -56,17 +56,6 @@ The InnerSource Measurement Tool is designed to analyze GitHub repositories and
- Handle default values and required parameters
- Support multiple authentication methods

**Key Classes:**

- `EnvVars`: Immutable configuration object (sketched below)
- Helper functions for type conversion and validation

**Design Patterns:**

- Configuration Object Pattern
- Builder Pattern (for environment variable parsing)
- Validation Chain Pattern

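As an illustration of the Configuration Object Pattern, a minimal sketch of what an immutable `EnvVars` object could look like. The field names, default value, and `from_env()` helper are illustrative assumptions rather than the module's actual API; only the `GH_TOKEN`, `REPOSITORY`, and `CHUNK_SIZE` variable names come from the data-flow examples later in this document.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen=True makes the configuration object immutable
class EnvVars:
    """Illustrative configuration object; fields are hypothetical."""

    gh_token: str
    owner: str
    repo: str
    chunk_size: int

    @classmethod
    def from_env(cls) -> "EnvVars":
        # REPOSITORY is expected as "owner/repo"; CHUNK_SIZE falls back to a default
        owner, repo = os.environ["REPOSITORY"].split("/", 1)
        return cls(
            gh_token=os.environ["GH_TOKEN"],
            owner=owner,
            repo=repo,
            chunk_size=int(os.environ.get("CHUNK_SIZE", "1000")),
        )
```
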
### 2. Authentication Manager (`auth.py`)

**Responsibilities:**

@@ -76,16 +65,6 @@ The InnerSource Measurement Tool is designed to analyze GitHub repositories and
- Manage JWT tokens for GitHub App authentication
- Provide unified authentication interface

**Key Functions:**

- `auth_to_github()`: Main authentication orchestrator
- `get_github_app_installation_token()`: JWT token exchange

**Design Patterns:**

- Strategy Pattern (for different authentication methods; see the sketch below)
- Factory Pattern (for creating GitHub clients)

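To make the Strategy Pattern concrete, a sketch of how `auth_to_github()` might dispatch between the two methods, assuming a github3.py client (the `github_connection.repository(owner, repo)` call later in this document suggests that library). The parameter names and fallback order are assumptions:

```python
import github3


def auth_to_github(token=None, app_id=None, app_installation_id=None, app_private_key=None):
    """Illustrative strategy selection; the real orchestration may differ."""
    if app_id and app_installation_id and app_private_key:
        # GitHub App strategy: authenticate as the app installation
        # (app_private_key is the PEM key as bytes)
        gh = github3.GitHub()
        gh.login_as_app_installation(app_private_key, app_id, app_installation_id)
        return gh
    if token:
        # Token strategy: a PAT or the Actions-provided GITHUB_TOKEN
        return github3.login(token=token)
    raise ValueError("No authentication method configured")
```
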
### 3. Core Analysis Engine (`measure_innersource.py`)

**Responsibilities:**

@@ -95,19 +74,6 @@ The InnerSource Measurement Tool is designed to analyze GitHub repositories and
- Manage chunked processing for large repositories
- Calculate InnerSource metrics and ratios

**Key Algorithms:**

- Team boundary detection
- Contribution aggregation
- Chunked data processing
- Progress tracking and error handling

**Design Patterns:**

- Pipeline Pattern (for staged processing)
- Iterator Pattern (for chunked processing)
- Observer Pattern (for progress tracking)

### 4. Report Generation (`markdown_writer.py`)

**Responsibilities:**

@@ -117,11 +83,6 @@ The InnerSource Measurement Tool is designed to analyze GitHub repositories and
- Handle edge cases and missing data
- Provide consistent report structure

**Design Patterns:**

- Template Method Pattern
- Null Object Pattern (for handling missing data)

### 5. Utility Functions (`markdown_helpers.py`)

**Responsibilities:**

@@ -130,11 +91,6 @@ The InnerSource Measurement Tool is designed to analyze GitHub repositories and
- Split large files intelligently
- Preserve content integrity during splits

**Design Patterns:**

- Strategy Pattern (for file splitting; sketched below)
- Utility/Helper Pattern

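To illustrate the splitting strategy, a minimal sketch that only breaks a markdown report at section headings so no block is cut mid-content; the function name, the heading-based strategy, and the size limit are illustrative assumptions:

```python
def split_markdown_at_headings(text: str, max_chars: int = 65000) -> list[str]:
    """Illustrative splitter: flush the current chunk only at '## ' boundaries."""
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.splitlines(keepends=True):
        # Start a new chunk at a section boundary once the size limit is reached
        if line.startswith("## ") and size >= max_chars:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(line)
        size += len(line)
    if current:
        chunks.append("".join(current))
    return chunks
```
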
## Data Flow Architecture

### Processing Pipeline

@@ -215,190 +171,7 @@ The InnerSource Measurement Tool is designed to analyze GitHub repositories and
└─────────────────┘
```

### Data Structures

#### Configuration Data Flow

```python
Environment Variables → EnvVars Object → Component Configuration

Examples:
GH_TOKEN → env_vars.gh_token → auth_to_github(token=...)
REPOSITORY → env_vars.owner, env_vars.repo → github_connection.repository(owner, repo)
CHUNK_SIZE → env_vars.chunk_size → chunked_processing(chunk_size=...)
```

#### Analysis Data Flow

```python
GitHub API → Raw Data → Processed Data → Aggregated Results → Report

Examples:
repo.commits() → commit_list → commit_author_counts → contribution_totals → markdown_report
repo.pull_requests() → pr_list → pr_author_counts → innersource_metrics → formatted_output
repo.issues() → issue_list → issue_author_counts → team_analysis → final_report
```

## Key Algorithms

### Team Boundary Detection Algorithm

The team boundary detection algorithm is central to the tool's functionality:

```python
def detect_team_boundaries(original_author: str, org_data: dict) -> set:
    """
    Detect team boundaries using organizational hierarchy

    Algorithm:
    1. Start with original commit author
    2. Add their direct manager
    3. Add all peers (people with same manager)
    4. Recursively add anyone who reports to team members
    5. Continue until no new members are found
    """
    team_members = {original_author}

    # Add the original author's manager, then all peers (same manager).
    # The peer scan stays inside this guard so `manager` is only used
    # when the author actually appears in org_data.
    if original_author in org_data:
        manager = org_data[original_author]["manager"]
        team_members.add(manager)

        for user, data in org_data.items():
            if data["manager"] == manager:
                team_members.add(user)

    # Recursive expansion
    changed = True
    while changed:
        initial_size = len(team_members)

        # Add anyone who reports to current team members
        for user, data in org_data.items():
            if data["manager"] in team_members:
                team_members.add(user)

        # Check if we added anyone new
        changed = len(team_members) > initial_size

    return team_members
```

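For clarity, a small worked example; the `{user: {"manager": ...}}` shape of `org_data` is inferred from the code above, and the names are made up:

```python
# Hypothetical org data: each user maps to their manager
org_data = {
    "alice": {"manager": "dana"},
    "bob": {"manager": "dana"},
    "carol": {"manager": "alice"},
    "erin": {"manager": "frank"},
}

# alice's team: alice, her manager dana, her peer bob, and carol (reports to alice)
print(detect_team_boundaries("alice", org_data))
# {'alice', 'dana', 'bob', 'carol'}  (a set, so order varies)
```
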
### Chunked Processing Algorithm

For memory efficiency with large repositories:

```python
def process_in_chunks(iterator, chunk_size: int, processor_func):
    """
    Process large datasets in memory-efficient chunks

    Benefits:
    - Prevents memory overflow
    - Provides progress feedback
    - Allows for configurable memory usage
    - Handles API rate limiting gracefully
    """
    results = {}
    total_processed = 0

    while True:
        # Collect chunk
        chunk = []
        for _ in range(chunk_size):
            try:
                chunk.append(next(iterator))
            except StopIteration:
                break

        if not chunk:
            break

        # Process chunk
        chunk_results = processor_func(chunk)

        # Merge results
        for key, value in chunk_results.items():
            results[key] = results.get(key, 0) + value

        total_processed += len(chunk)
        print(f"Processed {total_processed} items...")

    return results
```

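A toy usage example; the plain Python iterator and counting processor stand in for a real GitHub API iterator such as `repo.commits()` (progress lines are printed along the way):

```python
from collections import Counter


def count_authors(chunk):
    # Stand-in processor: count occurrences of each author in the chunk
    return Counter(chunk)


commit_authors = iter(["alice", "bob", "alice", "carol", "alice", "bob"])
totals = process_in_chunks(commit_authors, chunk_size=2, processor_func=count_authors)
print(totals)  # {'alice': 3, 'bob': 2, 'carol': 1}
```
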
### Contribution Aggregation Algorithm

```python
def aggregate_contributions(commit_counts, pr_counts, issue_counts):
    """
    Aggregate different types of contributions

    Combines:
    - Commit authorship
    - Pull request creation
    - Issue creation

    Returns unified contribution counts per user
    """
    all_users = set(commit_counts.keys()) | set(pr_counts.keys()) | set(issue_counts.keys())

    aggregated = {}
    for user in all_users:
        aggregated[user] = (
            commit_counts.get(user, 0) +
            pr_counts.get(user, 0) +
            issue_counts.get(user, 0)
        )

    return aggregated
```

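A quick worked example with made-up counts:

```python
commits = {"alice": 10, "bob": 4}
prs = {"alice": 2, "carol": 1}
issues = {"bob": 3}

print(aggregate_contributions(commits, prs, issues))
# {'alice': 12, 'bob': 7, 'carol': 1}  (key order varies: set iteration)
```
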
## Performance Considerations

### Memory Management

1. **Chunked Processing**: Large datasets are never loaded entirely into memory
2. **Lazy Evaluation**: Use GitHub API iterators instead of loading full lists
3. **Result Streaming**: Process and aggregate results incrementally
4. **Garbage Collection**: Explicitly manage object lifecycles for large datasets

### API Rate Limiting

1. **Authentication Strategy**: Use GitHub App tokens for higher limits (5,000/hour vs 1,000/hour)
2. **Request Batching**: Minimize API calls through efficient query patterns
3. **Respectful Processing**: Honor rate limits and provide backoff mechanisms (see the sketch below)
4. **Progress Tracking**: Provide feedback during long-running operations

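One possible shape for the backoff mechanism, sketched against the standard `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that the GitHub API returns; the helper name and threshold are assumptions:

```python
import time

import requests


def get_with_backoff(url: str, token: str, min_remaining: int = 10) -> requests.Response:
    """Illustrative helper: wait out the rate-limit window when nearly exhausted."""
    response = requests.get(url, headers={"Authorization": f"Bearer {token}"})
    remaining = int(response.headers.get("X-RateLimit-Remaining", min_remaining))
    if remaining < min_remaining:
        reset_at = int(response.headers.get("X-RateLimit-Reset", time.time()))
        time.sleep(max(0.0, reset_at - time.time()) + 1)  # sleep until the window resets
    return response
```
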
### Scalability Patterns

1. **Horizontal Scaling**: Tool can be run on multiple repositories simultaneously
2. **Configurable Resources**: Chunk size can be adjusted based on available memory
3. **Incremental Processing**: Future enhancement for processing only recent changes
4. **Caching Strategy**: Store intermediate results to avoid reprocessing

## Error Handling Strategy

### Graceful Degradation

```python
import requests


def handle_api_errors(func):
    """
    Decorator for graceful API error handling
    """
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except requests.exceptions.RequestException as e:
            print(f"API request failed: {e}")
            return None
        except Exception as e:
            print(f"Unexpected error: {e}")
            return None
    return wrapper
```

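Usage is the standard decorator pattern; `fetch_contributors` and the URL are hypothetical:

```python
@handle_api_errors
def fetch_contributors(url: str):
    # Any RequestException raised here is caught and converted to None
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.json()


data = fetch_contributors("https://api.github.com/repos/OWNER/REPO/contributors")
if data is None:
    print("Skipping contributor analysis")  # degrade gracefully instead of crashing
```
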
## Error Handling

### Error Categories

@@ -408,110 +181,3 @@ def handle_api_errors(func):
4. **Data Errors**: Missing org-data.json or invalid format
5. **Processing Errors**: Unexpected data structures or edge cases

## Testing Strategy

### Test Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                           Test Suite                            │
│                                                                 │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────┐    │
│  │    Unit     │ │ Integration │ │ End-to-End  │ │  Edge   │    │
│  │    Tests    │ │    Tests    │ │    Tests    │ │  Cases  │    │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────┘    │
│                                                                 │
│  ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────┐    │
│  │ Performance │ │  Security   │ │ Reliability │ │  Mock   │    │
│  │    Tests    │ │    Tests    │ │    Tests    │ │  Tests  │    │
│  └─────────────┘ └─────────────┘ └─────────────┘ └─────────┘    │
└─────────────────────────────────────────────────────────────────┘
```

### Test Coverage Strategy

1. **Unit Tests**: Each function and class method
2. **Integration Tests**: Component interactions
3. **Configuration Tests**: Environment variable handling
4. **Mock Tests**: GitHub API interactions (see the sketch below)
5. **Edge Case Tests**: Empty repositories, missing data, error conditions

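As a sketch of the mock-test approach (item 4), two pytest-style tests; the first exercises `aggregate_contributions` from the Key Algorithms section, and the second uses a `MagicMock` double in place of a real github3-style repository:

```python
from unittest.mock import MagicMock


def test_aggregate_contributions():
    # Pure-function unit test: no API access required
    assert aggregate_contributions({"alice": 1}, {"alice": 2}, {}) == {"alice": 3}


def test_commits_with_mocked_repository():
    # Mock test: stand in for GitHub API interactions
    repo = MagicMock()
    repo.commits.return_value = iter(["c1", "c2"])
    assert len(list(repo.commits())) == 2
```
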
## Security Considerations

### Authentication Security

1. **Token Management**: Never log or expose authentication tokens
2. **Permission Principle**: Use minimum required permissions
3. **Token Rotation**: Support for token refresh and rotation
4. **Secure Storage**: Environment variables for sensitive data

### Data Privacy

1. **No Data Persistence**: Tool doesn't store user data beyond processing
2. **Minimal Data Access**: Only accesses necessary repository information
3. **User Privacy**: Respects GitHub's privacy settings and permissions
4. **Audit Trail**: Provides logs for security auditing

## Future Enhancements

### Planned Features

1. **Incremental Processing**: Process only recent changes
2. **Historical Analysis**: Track InnerSource trends over time
3. **Additional Metrics**: More sophisticated collaboration measurements
4. **Multiple Platforms**: Support for GitLab, Bitbucket, etc.
5. **Real-time Processing**: Webhook-based analysis

### Architecture Extensions

1. **Plugin System**: Allow custom analysis algorithms
2. **Database Integration**: Store historical data for trending
3. **API Interface**: REST API for programmatic access
4. **Dashboard UI**: Web interface for visualization
5. **Notification System**: Alerts for significant changes

## Deployment Architecture

### Container Strategy

```dockerfile
# Multi-stage build for optimization
FROM python:3.10-slim as builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.10-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.10/site-packages /usr/local/lib/python3.10/site-packages
COPY . .
CMD ["python", "measure_innersource.py"]
```

### GitHub Actions Integration

The tool is designed to run seamlessly in GitHub Actions with:

- Minimal resource requirements
- Efficient processing patterns
- Clear progress reporting
- Graceful error handling

## Monitoring and Observability

### Metrics Collection

1. **Processing Time**: Track analysis duration
2. **Memory Usage**: Monitor resource consumption
3. **API Usage**: Track rate limit consumption
4. **Error Rates**: Monitor failure patterns
5. **Success Metrics**: Track successful analyses

### Logging Strategy

1. **Structured Logging**: JSON format for machine processing (see the sketch below)
2. **Log Levels**: DEBUG, INFO, WARNING, ERROR
3. **Contextual Information**: Include request IDs and user context
4. **No Sensitive Data**: Sanitize logs of tokens and personal information

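As one way to realize the structured-logging points above, a minimal JSON formatter on top of the standard `logging` module; the exact field set is an illustrative assumption:

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object for machine processing."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # never log tokens or personal data
        })


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.basicConfig(level=logging.INFO, handlers=[handler])
logging.getLogger(__name__).info("Processed %d items", 500)
```
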
This architecture provides a solid foundation for the InnerSource measurement tool while maintaining flexibility for future enhancements and scaling requirements.
