Smart CodeGen ML Agent

A Python library for building multi-agent systems that automate machine learning code generation and execution. The system takes CSV files and user questions as input, generates ML code suited to the dataset, executes it automatically, and returns the results. Unlike typical LLM-based solutions that only simulate answers, it actually runs the generated code on the data, so results do not depend solely on LLM calculations. This approach keeps data private, reduces token usage, and resolves queries efficiently. The library is modular and reusable: it can be imported as a package and integrated into other projects.

Features

Automated ML Workflow Generation – Takes CSV files and user queries to dynamically generate machine learning pipelines.
True Code Execution – Executes the generated code on the dataset, ensuring accurate, verifiable results (not just simulated answers).
Multi-Agent System – Uses specialized agents (e.g., code generator, executor, supervisor) to collaborate and handle different tasks efficiently.
Data Privacy First – Keeps computation local to avoid exposing sensitive data to third-party services.
Reduced Token Usage – Minimizes dependency on large language models, saving costs and improving efficiency.
Reusable & Modular – Can be imported as a Python package and easily integrated into existing projects.
Query-to-Result Pipeline – Directly answers natural language questions about the data by generating and running ML code.
Error Handling & Validation – A supervisory agent checks and debugs the generated code so that it runs without failures.
Extensible – Developers can plug in new agents, models, or tools to customize workflows.
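
To make the generator/executor/supervisor split more concrete, here is a minimal sketch of how such a loop could work. It is not the library's implementation: `generate_code`, `execute_code`, and `supervise` are hypothetical names, and the generator is a stub standing in for an LLM call.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def generate_code(question: str, error: Optional[str] = None) -> str:
    """Stub generator (hypothetical): a real agent would call an LLM here."""
    if error is None:
        # The first attempt contains a bug on purpose so the retry path runs.
        return "print(undefined_name)"
    return "print(sum([1, 2, 3]))"


def execute_code(code: str, timeout: int = 30) -> Tuple[bool, str]:
    """Run the code in a separate Python process and capture its output."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "generated.py"
        script.write_text(code)
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    ok = proc.returncode == 0
    return ok, proc.stdout if ok else proc.stderr


def supervise(question: str, max_attempts: int = 3) -> str:
    """Feed each error back to the generator until the code runs."""
    error = None
    for _ in range(max_attempts):
        ok, output = execute_code(generate_code(question, error))
        if ok:
            return output.strip()
        error = output
    raise RuntimeError(f"No working code after {max_attempts} attempts: {error}")
```

In this sketch the answer comes from the executed process rather than from the model, which is the distinction the feature list draws between true execution and simulated answers.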

Installation

From Source

git clone https://gitlab.mindfire.co.in/dipikad/smart-codegen-ml-agent.git
cd smart-codegen-ml-agent
pip install -e .

Quick Start

Here's a basic example of how to use the ML Analysis Agent:

from ml_analysis_agent import MLAnalysisAgent
from ml_analysis_agent.utils.input_helpers import get_user_input
from ml_analysis_agent.config.ml_config import AWSMLConfig

def main():
    # Get the configuration from the user, or set token, region, and model_name directly
    token, region, model_name = get_user_input()

    aws_ml_config = AWSMLConfig(aws_token=token, aws_region=region, model_name=model_name)
    
    # Initialize the agent
    agent = MLAnalysisAgent(ml_config=aws_ml_config)
    try:
        # Load your data once
        agent.load_data("csv_file_path")
        result = agent.ask("user_query")
        print(result)
    finally:
        # Clean up only when you're completely done
        agent.cleanup()

if __name__ == "__main__":
    main()

Using Context Manager (Automatic Cleanup)

# aws_ml_config is an AWSMLConfig instance, created as in the Quick Start example
with MLAnalysisAgent(ml_config=aws_ml_config) as agent:
    agent.load_data("data.csv")
    result = agent.ask("your question")
    print(result)
# Automatic cleanup when exiting context
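
For readers unfamiliar with the pattern, the sketch below shows how a class can support `with` so that cleanup always runs, even when the body raises. `TempWorkspace` is a hypothetical stand-in, not the library's `MLAnalysisAgent`; it only assumes that cleanup means deleting a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


class TempWorkspace:
    """Hypothetical stand-in for an agent that owns temporary files."""

    def __init__(self):
        self.workdir = Path(tempfile.mkdtemp(prefix="agent_"))

    def cleanup(self):
        # ignore_errors makes a second call harmless.
        shutil.rmtree(self.workdir, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.cleanup()
        return False  # do not swallow exceptions raised inside the block


with TempWorkspace() as ws:
    (ws.workdir / "scratch.txt").write_text("intermediate output")
    created = ws.workdir.exists()

# The directory is gone once the block has exited.
removed = not ws.workdir.exists()
```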

Using Environment Variables

Set the environment variables required by the MLConfig child class you use.

# Example for the AWS client
export AWS_BEARER_TOKEN_BEDROCK=<token>
export AWS_DEFAULT_REGION=<region>    # default: us-west-2
export MODEL_NAME=<model_name>        # default: us.anthropic.claude-sonnet-4-20250514-v1:0

Or set the values in a .env file:

AWS_BEARER_TOKEN_BEDROCK=<token>
AWS_DEFAULT_REGION=<region>
MODEL_NAME=<model_name>

from ml_analysis_agent import MLAnalysisAgent
from ml_analysis_agent.config.ml_config import AWSMLConfig

def main():
    # With no arguments, AWSMLConfig reads the environment variables set above
    aws_ml_config = AWSMLConfig()
    
    # Initialize the agent
    agent = MLAnalysisAgent(ml_config=aws_ml_config)
    try:
        # Load your data once
        agent.load_data("csv_file_path")
        result = agent.ask("user_query")
        print(result)
    finally:
        # Clean up only when you're completely done
        agent.cleanup()

if __name__ == "__main__":
    main()
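
The defaults listed above suggest a fallback order of explicit argument, then environment variable, then built-in default. The sketch below illustrates that order with the standard library only. `resolve_aws_settings` is a hypothetical helper, and whether `AWSMLConfig` resolves its values in exactly this way is an assumption.

```python
import os
from typing import Mapping, Optional

DEFAULT_REGION = "us-west-2"
DEFAULT_MODEL = "us.anthropic.claude-sonnet-4-20250514-v1:0"


def resolve_aws_settings(
    token: Optional[str] = None,
    region: Optional[str] = None,
    model_name: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Hypothetical helper: explicit argument > environment variable > default."""
    token = token or env.get("AWS_BEARER_TOKEN_BEDROCK")
    if not token:
        # The token has no documented default, so fail early and clearly.
        raise ValueError("AWS_BEARER_TOKEN_BEDROCK is not set")
    return {
        "aws_token": token,
        "aws_region": region or env.get("AWS_DEFAULT_REGION") or DEFAULT_REGION,
        "model_name": model_name or env.get("MODEL_NAME") or DEFAULT_MODEL,
    }
```

Passing `env` as a parameter keeps the helper easy to test without touching the real process environment.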

Command Line Interface (CLI)

The package includes a command-line interface. Set up the environment variables first, then run:

# Basic usage
ml-analysis --data your_data.csv --query "your query?"

# Interactive mode (without query)
ml-analysis --data your_data.csv

CLI Options

--data, -d: Path to your data file (CSV)
--query, -q: Single query to execute (non-interactive mode)

Interactive Mode

If you run the CLI without a query, it enters interactive mode where you can:

Load different data files using the change-data command
Ask multiple questions about your data
Type quit or press Ctrl+C to exit
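
As an illustration of how the two options and the interactive fallback fit together, here is a minimal `argparse` sketch. It is not the package's actual CLI code: `answer` is a hypothetical placeholder for the agent call, and both the required `--data` flag and the `change-data <path>` syntax are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ml-analysis")
    # Assumption: --data is mandatory in both modes.
    parser.add_argument("--data", "-d", required=True, help="Path to a CSV file")
    parser.add_argument("--query", "-q", help="Single query (non-interactive mode)")
    return parser


def answer(data_path: str, question: str) -> str:
    """Hypothetical placeholder for loading the data and asking the agent."""
    return f"[{data_path}] {question}"


def run(argv=None, read_line=input, write=print) -> None:
    args = build_parser().parse_args(argv)
    if args.query:
        # Non-interactive mode: answer once and exit.
        write(answer(args.data, args.query))
        return
    data_path = args.data
    while True:  # interactive mode
        try:
            line = read_line("> ").strip()
        except (KeyboardInterrupt, EOFError):
            break
        if line == "quit":
            break
        if line.startswith("change-data "):
            data_path = line.split(maxsplit=1)[1]
            continue
        if line:
            write(answer(data_path, line))
```

Injecting `read_line` and `write` is a design choice made for this sketch so the loop can be exercised without a terminal.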

Documentation

Key Methods

MLAnalysisAgent(ml_config)
- Initialize the agent with an LLM configuration
- Parameters:
  - ml_config: Instance of a child class of MLConfig
    - Must implement get_llm_model, which returns a LangChain LLM model instance
    - Example: AWSMLConfig, which takes the arguments below
      - aws_token: AWS Bedrock token
      - aws_region: AWS region (e.g., "us-west-2")
      - model_name: Name of the model to use
load_data(filepath)
- Load your dataset
- Parameters:
  - filepath: Path to your CSV file
ask(question)
- Ask questions about your data to generate and run ML analysis
- Parameters:
  - question: Your question in natural language
- Returns: Analysis results and predictions
cleanup()
- Clean up temporary files and resources
- Call this when you're done using the agent
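
The configuration contract described for the constructor (a child class of MLConfig that provides get_llm_model) can be sketched as follows. This is a rough illustration only: the `MLConfig` defined here is a local stand-in whose shape is assumed, and `EchoLLM` and `OfflineMLConfig` are hypothetical names. A real child class would subclass the library's own base class and return a LangChain model instead of the fake one.

```python
from abc import ABC, abstractmethod


class MLConfig(ABC):
    """Local stand-in for the library's base class (assumed shape)."""

    @abstractmethod
    def get_llm_model(self):
        """Return the LLM object the agent should use."""


class EchoLLM:
    """Hypothetical fake model, used only so this sketch runs offline."""

    def invoke(self, prompt: str) -> str:
        return f"echo: {prompt}"


class OfflineMLConfig(MLConfig):
    """Example child class: the one required piece is get_llm_model."""

    def __init__(self, model_name: str = "echo"):
        self.model_name = model_name

    def get_llm_model(self):
        return EchoLLM()


config = OfflineMLConfig()
reply = config.get_llm_model().invoke("describe the dataset")
```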

Supported ML Tasks

Data Analysis
- Descriptive statistics
- Data exploration
- Basic visualization code generation
- Feature analysis
Predictive Modeling
- Regression problems
- Basic classification tasks
- Feature importance analysis
- Model generation and evaluation
Code Generation
- Automated ML code creation
- Data preprocessing scripts
- Model training code
- Prediction generation
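
To give a sense of what generated code for a simple regression question might look like, such as predicting a house price from its area, here is a small sketch. It is only an illustration: the column names `area` and `price` and the three toy rows are made up, and the agent's real output will depend on the dataset and the model.

```python
import csv
import io

# Toy data invented for this sketch; a real run would read the user's CSV file.
SAMPLE_CSV = """area,price
1000,150000
1500,200000
2500,300000
"""


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
areas = [float(r["area"]) for r in rows]
prices = [float(r["price"]) for r in rows]

slope, intercept = fit_line(areas, prices)
predicted = slope * 2000 + intercept
print(f"Estimated price for 2,000 sq. ft.: {predicted:,.0f}")
```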

Example Questions

You can ask questions like:

"What will be the price of a house with 2,000 sq.ft. area?"
"Can you predict the price of a house with 3 bedrooms and 2 bathrooms?"
"What will be the estimated price of a 10-year-old house with 1,800 sq.ft. area?"
