Skip to content

Major Upgrade: Plugin System, Caching, Async Execution, Cloud Connectors & REST API#73

Open
maljefairi wants to merge 16 commits intobusiness-science:masterfrom
maljefairi:master
Open

Major Upgrade: Plugin System, Caching, Async Execution, Cloud Connectors & REST API#73
maljefairi wants to merge 16 commits intobusiness-science:masterfrom
maljefairi:master

Conversation

@maljefairi
Copy link

Summary

This PR delivers a comprehensive upgrade to the AI Data Science Team project with enterprise-grade features:

  • 310 tests passing (up from 0) with 70%+ coverage target
  • 5 new major modules for extensibility and scalability
  • Modern Python packaging with pyproject.toml and optional dependencies
  • CI/CD pipeline with GitHub Actions

New Features

1. Plugin System (ai_data_science_team/plugins/)

  • Base classes: AgentPlugin, ToolPlugin, WorkflowPlugin
  • Plugin registry with decorators (@register_agent, @register_tool)
  • Dynamic plugin loader for files, directories, and Python modules
  • 27 unit tests

2. Caching Layer (ai_data_science_team/cache/)

  • MemoryBackend: In-memory LRU cache with TTL support
  • DiskBackend: Persistent disk-based cache with pickle serialization
  • DataFrame-aware cache key generation for pandas operations
  • Decorators: @cached, @cache_result, @invalidate_cache
  • 25 unit tests

3. Async/Parallel Execution (ai_data_science_team/async_ops/)

  • AsyncExecutor: Async execution with concurrency control
  • ParallelExecutor: Thread/process pool for CPU/IO-bound tasks
  • parallel_map, parallel_apply for DataFrames
  • run_agents_parallel for concurrent agent execution
  • Utilities: async_retry, timeout, RateLimiter, CircuitBreaker
  • 37 unit tests

4. Cloud Connectors (ai_data_science_team/connectors/)

  • SnowflakeConnector: Full Snowflake support
  • BigQueryConnector: Google BigQuery with dataset management
  • RedshiftConnector: Amazon Redshift with COPY from S3
  • PostgresConnector: PostgreSQL with bulk operations
  • S3Connector: AWS S3 for CSV/Parquet/JSON
  • Connection pooling and URL-based creation
  • 36 unit tests

5. REST API Server (ai_data_science_team/api/)

  • FastAPI application with OpenAPI docs at /docs
  • Endpoints for all agents: /agents/invoke, /agents/clean, /agents/eda, /agents/sql, /agents/visualize
  • Pipeline execution with dependency handling
  • Async task management with status tracking
  • CLI: ai-ds-team-api --host 0.0.0.0 --port 8000
  • 32 unit tests

Infrastructure Improvements

Testing

  • Comprehensive test suite with 310 tests
  • pytest configuration with markers and coverage
  • Tests for agents, tools, utils, and all new modules

CI/CD (.github/workflows/)

  • Multi-OS testing (Ubuntu, macOS, Windows)
  • Python 3.9-3.12 matrix
  • Automated releases on tags

Packaging (pyproject.toml)

  • Modern Python packaging standards
  • Optional dependency groups:
    • [api]: FastAPI, uvicorn, httpx
    • [cloud]: Snowflake, BigQuery, Redshift, S3
    • [machine_learning]: H2O, MLflow
    • [dev]: pytest, black, ruff, mypy
    • [all]: Everything

Bug Fixes

  • Fixed invalid langchain >= 1.0.0 dependency (doesn't exist)
  • Corrected to langchain>=0.2.0,<1.0.0

Installation

# Core only
pip install ai-data-science-team

# With API server
pip install ai-data-science-team[api]

# With cloud connectors  
pip install ai-data-science-team[cloud]

# Everything
pip install ai-data-science-team[all]

Test plan

  • All 310 unit tests passing
  • Plugin system tested with mock plugins
  • Cache system tested with TTL and LRU eviction
  • Async execution tested with timeouts and concurrency
  • Connectors tested with mocked connections
  • API tested with FastAPI TestClient
  • Manual testing with real LLM (requires API keys)

🤖 Generated with Claude Code

sidra and others added 16 commits January 26, 2026 21:16
…kaging

This commit introduces a major upgrade to the project's quality and infrastructure:

**Test Infrastructure (0% → 115+ tests)**
- Add pytest configuration with markers for slow, integration, and API tests
- Create comprehensive test suite covering:
  - Data cleaning agent logic and edge cases
  - Data wrangling operations
  - Data visualization with Plotly
  - Data loader tools
  - EDA tools
  - Sandbox code execution
  - Output parsers
- Add shared fixtures for sample data, mock LLMs, and temp files
- Add integration tests for end-to-end workflows

**Dependency & Packaging Fixes**
- Fix invalid langchain version constraint (>=1.0.0 doesn't exist)
- Update requirements.txt with correct version ranges
- Add pyproject.toml for modern Python packaging
- Add optional dependency groups: dev, docs, machine_learning, data_science

**CI/CD Pipeline**
- Add GitHub Actions workflow for multi-OS/Python testing
- Add release workflow for PyPI publishing
- Add code coverage reporting with Codecov integration
- Add type checking with mypy

**Developer Experience**
- Add pre-commit hooks for code quality (ruff, black, mypy)
- Add CLI tool stub (ai-ds-team command)
- Add comprehensive upgrade roadmap (UPGRADE_PLAN.md)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update agent tests to use _params instead of params
- Fix method names (get_recommended_cleaning_steps)
- Remove tests for non-existent parameters
- Remove EDA tool tests that require LangChain tool schema

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Update data_loader tests to use actual LangChain tool names
- Update sandbox test to use run_code_sandboxed_subprocess
- All 153 tests now pass with 0 skips

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New features:
- AgentPlugin, ToolPlugin, WorkflowPlugin base classes
- PluginRegistry for centralized plugin management
- PluginLoader for dynamic loading from files/directories/modules
- @register_agent, @register_tool, @register_workflow decorators
- PluginMetadata for plugin versioning and documentation
- 27 new tests for plugin system

Example usage:
    from ai_data_science_team.plugins import register_agent, AgentPlugin

    @register_agent("my_custom_agent")
    class MyCustomAgent(AgentPlugin):
        def create_agent(self, model, **kwargs):
            # Build your agent here
            return compiled_graph

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements a flexible caching layer with:
- MemoryBackend: In-memory LRU cache with TTL support
- DiskBackend: Persistent disk-based cache with pickle serialization
- DataFrame-aware cache key generation for pandas operations
- @cached decorator for function memoization
- @cache_result decorator for fixed-key caching
- @invalidate_cache decorator for cache busting
- Namespace support for cache isolation
- Comprehensive statistics tracking (hits, misses, hit rate)

Includes 25 unit tests covering all cache functionality.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements comprehensive async/parallel execution capabilities:

Executors:
- AsyncExecutor: Async execution with concurrency control and timeout
- ParallelExecutor: Thread/process pool execution for CPU/IO-bound tasks
- TaskResult: Rich result objects with status, duration, and error tracking

Parallel Operations:
- parallel_map: Apply functions to items in parallel
- parallel_apply: Parallel DataFrame processing with partitioning
- run_agents_parallel: Run multiple AI agents concurrently
- gather_results: Collect results from multiple async operations
- run_pipeline_parallel: Execute data pipelines with dependency graphs

Utilities:
- async_retry/retry: Configurable retry decorators with backoff
- timeout: Async timeout decorator
- RateLimiter: Token bucket rate limiting for API calls
- BatchProcessor: Automatic batch collection and processing
- CircuitBreaker: Fault tolerance pattern implementation
- batch_process: Parallel batch processing helper

Includes 37 unit tests covering all components.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements unified connector interfaces for:

Cloud Data Warehouses:
- SnowflakeConnector: Full Snowflake support with warehouse/schema switching
- BigQueryConnector: Google BigQuery with dataset management and S3 select
- RedshiftConnector: Amazon Redshift with COPY from S3 support

Databases:
- PostgresConnector: PostgreSQL with bulk COPY support
- S3Connector: Amazon S3 for CSV/Parquet/JSON read/write

Core Features:
- ConnectionConfig: Unified configuration with env var support
- QueryResult: Rich result objects with execution metrics
- MockConnector: In-memory connector for testing
- ConnectorPool: Connection pooling for efficient reuse
- get_connector_from_url: URL-based connector creation

Factory Functions:
- get_connector(): Type-based connector instantiation
- register_connector(): Custom connector registration
- list_connectors(): Available connector discovery

Includes 36 unit tests (2 skipped without boto3).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements a comprehensive REST API for AI Data Science Team:

Core Infrastructure:
- FastAPI application factory with lifespan management
- CORS middleware for cross-origin requests
- OpenAPI documentation at /docs and /redoc
- CLI for running the server (ai-ds-team-api)

Endpoints:
- GET /health - Health check with component status
- GET /agents - List available agents with capabilities
- POST /agents/invoke - Generic agent invocation (sync/async)
- POST /agents/clean - Data cleaning endpoint
- POST /agents/eda - Exploratory data analysis
- POST /agents/sql - Natural language to SQL
- POST /agents/visualize - Visualization generation
- POST /pipelines/run - Multi-step pipeline execution
- GET/POST /tasks - Task management
- POST /data/upload - Data upload

Pydantic Models:
- AgentRequest/Response for agent invocation
- TaskStatus enum and TaskResponse
- Specialized request/response models for each agent type
- PipelineRequest for multi-step workflows

Features:
- Async task execution with background tasks
- Task status tracking and cancellation
- Pipeline execution with dependency handling
- Data upload and retrieval

Includes 32 unit tests with full coverage.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add [api] extras: fastapi, uvicorn, httpx, pydantic
- Add [cloud] extras: snowflake-connector, bigquery, redshift, boto3
- Add ai-ds-team-api CLI entry point for API server
- Add asyncio pytest configuration
- Update [all] extras to include api and cloud

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add comprehensive rules to prevent accidental commits of:
- API keys and secrets (*.pem, *.key, secrets.json, api_key*)
- Cloud credentials (credentials.json, service_account*.json)
- Database configs (database.ini, connection_string*)
- SSH keys (id_rsa, *.ppk)
- OAuth tokens (token.json, oauth_token*)
- Environment files (.env.*, .env.local, .env.production)
- History files that may contain secrets
- Large data files (*.csv, *.parquet, *.xlsx)
- Model files (*.pkl, *.h5, *.pt)
- Cache directories
- User-specific configurations

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add USAGE_GUIDE.md covering:
- Installation (basic and optional dependencies)
- Running the API server
- Running Streamlit apps
- Running tests
- Using agents with Ollama (local) or OpenAI (cloud)
- Using the cache system
- Using async/parallel execution
- Using cloud connectors
- Using the plugin system
- Environment variables configuration

Highlights Ollama as the recommended option for local/private usage
with no API key required.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add math expression preprocessing (e.g., "3*3" returns "9")
- Enhance progress indicators with stage icons and elapsed time
- Add dataset search/filter for chat, sidebar, and Pipeline Studio
- Add chart export buttons (PNG, SVG, JSON) for all charts
- Implement centralized error logging with reference IDs
- Extend undo/redo to support delete, update, and set_active actions
- Add unit tests for UI helper functions (31 tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add comprehensive CSS styling with gradients and card-based layout
- Create welcome screen with feature cards for new users
- Add quick start guide with examples and tips
- Implement header with status badges
- Modernize sidebar with section headers and icons
- Add collapsible Advanced Settings section
- Support dark/light mode via CSS variables
- Improve form inputs, buttons, and alerts styling
- Add smooth animations and transitions

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Remove aggressive CSS overrides that broke layout
- Switch to native Streamlit components for welcome screen
- Simplify header to use st.success/warning/info
- Keep minimal safe CSS (buttons, dialogs only)
- Fix text overlapping issues throughout the app

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Ollama improvements:
- Auto-fetch available models from Ollama server
- Show dropdown with all available models instead of text input
- Add refresh button to reload model list
- Show connection status with model count
- Fallback to text input if connection fails

Projects CRUD:
- Add "Create New Project" section with name input
- Add search filter for projects
- Add rename functionality via popover
- Add archive/unarchive toggle
- Add delete with confirmation popover
- Improve project list formatting with dates
- Clear notice after displaying

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@fawadsaddat
Copy link

We are looking for each and every moments of this Great Sidra Chain Project

@DuyHai81
Copy link

great sir

@mdancho84
Copy link
Collaborator

Whoah!! Let me take a look this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants