A Python library for building multi-agent systems that automate machine learning code generation and execution. The system takes CSV files and user questions as input, generates ML code tailored to the dataset, executes it automatically, and returns the results. Unlike typical LLM-based solutions that only simulate answers, this system actually executes code on the data, so results come from real computation rather than from LLM calculations alone. This approach keeps data private, reduces token usage, and resolves queries efficiently. The library is modular and reusable: anyone can import it as a package and integrate it into their own projects.
- Automated ML Workflow Generation – Takes CSV files and user queries to dynamically generate machine learning pipelines.
- True Code Execution – Executes the generated code on the dataset, ensuring accurate, verifiable results (not just simulated answers).
- Multi-Agent System – Uses specialized agents (e.g., code generator, executor, supervisor) to collaborate and handle different tasks efficiently.
- Data Privacy First – Keeps computation local to avoid exposing sensitive data to third-party services.
- Reduced Token Usage – Minimizes dependency on large language models, saving costs and improving efficiency.
- Reusable & Modular – Can be imported as a Python package and easily integrated into existing projects.
- Query-to-Result Pipeline – Directly answers natural language questions about the data by generating and running ML code.
- Error Handling & Validation – A supervisory agent debugs the generated code so that it runs without failures.
- Extensible – Developers can plug in new agents, models, or tools to customize workflows.
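The generate, execute, and supervise roles listed above can be pictured with a small loop. This is only a sketch of the general pattern, not the library's implementation: `generate_code`, `execute_code`, and `supervise` are hypothetical names, and the generator here is a hard-coded stand-in for an LLM call that fails once so the retry path is visible.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def generate_code(question: str, error: Optional[str] = None) -> str:
    """Hypothetical stand-in for an LLM-backed code generator agent."""
    if error is None:
        # First attempt: deliberately broken, to exercise the retry path.
        return "print(undefined_name)"
    # Later attempts: a "repaired" script (a real generator would use the error text).
    return "print(sum([1, 2, 3]))"


def execute_code(code: str) -> Tuple[bool, str]:
    """Executor: run the script in a separate Python process and capture its output."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "generated.py"
        script.write_text(code)
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=60,
        )
    if proc.returncode == 0:
        return True, proc.stdout.strip()
    return False, proc.stderr.strip()


def supervise(question: str, max_attempts: int = 3) -> str:
    """Supervisor: feed execution errors back to the generator until a run succeeds."""
    error = None
    for _ in range(max_attempts):
        code = generate_code(question, error)
        ok, output = execute_code(code)
        if ok:
            return output
        error = output
    raise RuntimeError(f"No working code after {max_attempts} attempts: {error}")


if __name__ == "__main__":
    print(supervise("What is the sum of the values?"))
```

The answer comes from the executed script's output, not from the model's own arithmetic, which is the distinction the feature list draws between true execution and simulated answers.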
git clone https://gitlab.mindfire.co.in/dipikad/smart-codegen-ml-agent.git
cd smart-codegen-ml-agent
pip install -e .

Here's a basic example of how to use the ML Analysis Agent:
from ml_analysis_agent import MLAnalysisAgent
from ml_analysis_agent.utils.input_helpers import get_user_input
from ml_analysis_agent.config.ml_config import AWSMLConfig

def main():
    # Get the configuration from the user, or set token, region, and model_name directly
    token, region, model_name = get_user_input()
    aws_ml_config = AWSMLConfig(aws_token=token, aws_region=region, model_name=model_name)
    # Initialize the agent
    agent = MLAnalysisAgent(ml_config=aws_ml_config)
    try:
        # Load your data once
        agent.load_data("csv_file_path")
        result = agent.ask("user_query")
        print(result)
    finally:
        # Clean up only when you're completely done
        agent.cleanup()

if __name__ == "__main__":
    main()

The agent can also be used as a context manager:

with MLAnalysisAgent(ml_config=aws_ml_config) as agent:
    agent.load_data("data.csv")
    result = agent.ask("your question")
    print(result)
# Automatic cleanup when exiting context

Set environment variables as required by the MLConfig child class you use:
# Example for AWS client
export AWS_BEARER_TOKEN_BEDROCK=<token>
export AWS_DEFAULT_REGION=<region>  # default: us-west-2
export MODEL_NAME=<model_name>  # default: us.anthropic.claude-sonnet-4-20250514-v1:0

Or set the values in .env:

AWS_BEARER_TOKEN_BEDROCK=<token>
AWS_DEFAULT_REGION=<region>
MODEL_NAME=<model_name>
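To make the role of these three variables concrete, the sketch below reads them from a mapping and applies the defaults listed above. It is illustrative only: `parse_dotenv` and `read_aws_settings` are hypothetical helpers, not part of the package, and treating a missing token as an error is an assumption about how the configuration should behave.

```python
import os

# Defaults as stated in the environment variable notes above.
DEFAULT_REGION = "us-west-2"
DEFAULT_MODEL = "us.anthropic.claude-sonnet-4-20250514-v1:0"


def parse_dotenv(text):
    """Hypothetical minimal .env parser: KEY=VALUE lines, '#' comments and blanks skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def read_aws_settings(env=None):
    """Hypothetical helper: collect the three settings, falling back to the defaults."""
    env = os.environ if env is None else env
    token = env.get("AWS_BEARER_TOKEN_BEDROCK")
    if not token:
        # Assumption: no default exists for the token, so a missing value is an error.
        raise ValueError("AWS_BEARER_TOKEN_BEDROCK is not set")
    return {
        "aws_token": token,
        "aws_region": env.get("AWS_DEFAULT_REGION", DEFAULT_REGION),
        "model_name": env.get("MODEL_NAME", DEFAULT_MODEL),
    }


if __name__ == "__main__":
    settings = read_aws_settings(parse_dotenv("AWS_BEARER_TOKEN_BEDROCK=example-token\n"))
    print(settings["aws_region"], settings["model_name"])
```

The returned keys match the keyword arguments that the earlier example passes to AWSMLConfig (aws_token, aws_region, model_name).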
With these variables set, AWSMLConfig can be created without arguments:

from ml_analysis_agent import MLAnalysisAgent
from ml_analysis_agent.config.ml_config import AWSMLConfig

def main():
    # Values are read from the environment variables required by the MLConfig child class
    aws_ml_config = AWSMLConfig()
    # Initialize the agent
    agent = MLAnalysisAgent(ml_config=aws_ml_config)
    try:
        # Load your data once
        agent.load_data("csv_file_path")
        result = agent.ask("user_query")
        print(result)
    finally:
        # Clean up only when you're completely done
        agent.cleanup()

if __name__ == "__main__":
    main()

The package includes a command-line interface for easy interaction once the environment variables are set:
# Basic usage
ml-analysis --data your_data.csv --query "your query?"
# Interactive mode (without query)
ml-analysis --data your_data.csv
- --data, -d: Path to your data file (CSV)
- --query, -q: Single query to execute (non-interactive mode)
If you run the CLI without a query, it enters interactive mode where you can:
- Load different data files using the change-data command
- Ask multiple questions about your data
- Type quit or press Ctrl+C to exit
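The two flags and the interactive commands described above could be wired up roughly as follows. This is a sketch with argparse, not the package's actual CLI code: making --data required, and having change-data take the new path as its argument, are both assumptions, and `build_parser` and `handle_line` are hypothetical names.

```python
import argparse


def build_parser():
    """Parser mirroring the documented options: --data/-d and --query/-q."""
    parser = argparse.ArgumentParser(prog="ml-analysis")
    # Assumption: the data path is mandatory in both modes.
    parser.add_argument("--data", "-d", required=True, help="Path to your data file (CSV)")
    parser.add_argument(
        "--query", "-q", default=None, help="Single query to execute (non-interactive mode)"
    )
    return parser


def handle_line(line):
    """Classify one line of interactive input as (action, argument)."""
    text = line.strip()
    parts = text.split(maxsplit=1)
    if text == "quit":
        return ("quit", None)
    if parts and parts[0] == "change-data":
        # Assumption: the new file path follows the command on the same line.
        return ("change-data", parts[1] if len(parts) > 1 else "")
    return ("ask", text)


if __name__ == "__main__":
    args = build_parser().parse_args()
    if args.query is not None:
        print(f"Would run a single query on {args.data}: {args.query}")
    else:
        print(f"Would enter interactive mode on {args.data}")
```

When --query is omitted, args.query is None, which is what selects interactive mode in this sketch.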
- MLAnalysisAgent(ml_config)
  - Initialize the agent with a model configuration
  - Parameters:
    - ml_config: An instance of a child class of MLConfig
      - Should provide a get_llm_model function that returns a LangChain LLM model instance
      - Example: AWSMLConfig, which takes the arguments below
        - aws_token: AWS Bedrock token
        - aws_region: AWS region (e.g., "us-west-2")
        - model_name: Name of the model to use
- load_data(filepath)
  - Load your dataset
  - Parameters:
    - filepath: Path to your CSV file
- ask(question)
  - Ask questions about your data to generate and run ML analysis
  - Parameters:
    - question: Your question in natural language
  - Returns: Analysis results and predictions
- cleanup()
  - Clean up temporary files and resources
  - Call this when you're done using the agent
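The configuration contract described above (a child class of MLConfig that provides get_llm_model) can be illustrated with local stand-ins. Everything in this sketch is hypothetical: the `MLConfig` class below is a placeholder defined here so the example runs on its own, and the package's real base class may differ. `EchoModel` merely imitates a model object and is not a LangChain class.

```python
from abc import ABC, abstractmethod


class MLConfig(ABC):
    """Local stand-in for the package's MLConfig base class (the real one may differ)."""

    @abstractmethod
    def get_llm_model(self):
        """Return an LLM model instance."""


class EchoModel:
    """Hypothetical placeholder for a model object; it only echoes the prompt."""

    def invoke(self, prompt):
        return f"echo: {prompt}"


class EchoMLConfig(MLConfig):
    """Example child class: holds settings and builds the model on request."""

    def __init__(self, model_name="echo-model"):
        self.model_name = model_name

    def get_llm_model(self):
        return EchoModel()


if __name__ == "__main__":
    config = EchoMLConfig()
    print(config.get_llm_model().invoke("describe the dataset"))
```

In a real project the child class would subclass the package's own MLConfig and return an actual LangChain model from get_llm_model, then be passed to the agent as ml_config.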
- Data Analysis
  - Descriptive statistics
  - Data exploration
  - Basic visualization code generation
  - Feature analysis
- Predictive Modeling
  - Regression problems
  - Basic classification tasks
  - Feature importance analysis
  - Model generation and evaluation
- Code Generation
  - Automated ML code creation
  - Data preprocessing scripts
  - Model training code
  - Prediction generation
You can ask questions like:
- "What will be the price of a house with 2,000 sq.ft. area?"
- "Can you predict the price of a house with 3 bedrooms and 2 bathrooms?"
- "What will be the estimated price of a 10-year-old house with 1,800 sq.ft. area?"
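For a question like the first one, the generated code would need to fit a model on the CSV and evaluate it at the requested value. The sketch below shows one simple way that could look, using a single-feature least-squares fit in plain Python. It is not the code the agent actually generates: the column names `area_sqft` and `price` are hypothetical, and the four sample rows are made up purely for illustration.

```python
import csv
import io

# Made-up sample rows for illustration; a real run would read the user's CSV file.
SAMPLE_CSV = """area_sqft,price
1000,150000
1500,200000
2500,300000
3000,350000
"""


def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_price(csv_text, area):
    """Fit price against area on the CSV text and predict the price for one area."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    areas = [float(row["area_sqft"]) for row in rows]
    prices = [float(row["price"]) for row in rows]
    slope, intercept = fit_line(areas, prices)
    return slope * area + intercept


if __name__ == "__main__":
    print(f"Estimated price for 2,000 sq.ft.: {predict_price(SAMPLE_CSV, 2000):,.0f}")
```

Questions that involve several features, such as bedrooms and bathrooms, would call for a multi-feature model rather than this one-variable fit.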