# Smart-AI-Agent

A unified AI orchestration framework that selects the best large language model (LLM) for a given user task using Camel-AI, the OpenAI API, and Ollama. It evaluates, ranks, and logs LLM performance over time to dynamically improve routing for future tasks.
## Features

- ✅ Dynamic Task Understanding: Automatically identifies task type (e.g., summarization, coding, translation).
- 🧠 Model Selection & Routing: Uses Camel-AI to choose between models from OpenAI, Ollama, and others.
- 📊 Ranking & Feedback Loop: Tracks model performance per task type to rank them for future requests.
- ⚙️ Plug-and-Play Model Management: Seamlessly integrates local (Ollama) and cloud-based models (OpenAI).
- 🔁 Self-Optimizing: Learns which model performs best for specific task categories over time.
## Architecture

```text
User Task
  │
  └─► Task Analyzer (Camel-AI)
        │
        ├─► Model Selector
        │     ├─ OpenAI GPT-4 / GPT-4o
        │     ├─ Ollama (Mistral, LLaMA, etc.)
        │     └─ Other plug-in models
        │
        └─► Execution Engine → Response
              │
              └─► Ranking & Logging Module
```
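For orientation, below is a minimal, self-contained sketch of what this routing flow could look like in plain Python. The function names, the keyword-based task detection, and the static preference table are illustrative assumptions only; the project's actual logic lives in `agents/`, `models/`, and `app/pipeline.py`.

```python
# Illustrative sketch of the flow above. Names and heuristics here are
# hypothetical and do not mirror the project's real modules or API.

def analyze_task(prompt: str) -> str:
    """Crude task-type detection; the real system would use a Camel-AI agent."""
    lowered = prompt.lower()
    if "translate" in lowered:
        return "translation"
    if "summarize" in lowered or "summary" in lowered:
        return "summarization"
    if "code" in lowered or "function" in lowered:
        return "coding"
    return "general"

# Static preference table; in the real system this would be derived from the
# ranking log rather than hard-coded.
PREFERRED_MODELS = {
    "translation": "openai:gpt-4o",
    "summarization": "ollama:mistral",
    "coding": "openai:gpt-4o",
    "general": "ollama:llama3",
}

def select_model(task_type: str) -> str:
    """Pick the currently preferred model for a task type."""
    return PREFERRED_MODELS.get(task_type, "ollama:llama3")

if __name__ == "__main__":
    task = "Translate this English paragraph to French."
    task_type = analyze_task(task)
    print(f"{task_type} -> {select_model(task_type)}")
```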
## Tech Stack

- Camel-AI – Agent framework for task decomposition and model orchestration.
- OpenAI API – Access to top-tier GPT models.
- Ollama – Local model runner for LLaMA, Mistral, etc.
- LangChain (optional) – Chaining tasks and memory.
- Python – Core implementation.
## Installation

1. Clone the repository and install the Python dependencies:

```bash
git clone https://github.com/Dhiraj309/Smart-AI-Agent.git
cd Smart-AI-Agent
pip install -r requirements.txt
```

2. Start the Ollama server and pull the local models:

```bash
ollama serve
ollama pull mistral
ollama pull llama3
```
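Optionally, you can sanity-check that the Ollama server and the pulled models respond before running the project. The snippet below talks to Ollama's default local endpoint (`http://localhost:11434`) using the `requests` package and is independent of this repository's code.

```python
# Quick sanity check of the local Ollama server and a pulled model.
import requests

# List the models that have been pulled locally.
tags = requests.get("http://localhost:11434/api/tags", timeout=10).json()
print([m["name"] for m in tags.get("models", [])])

# Run a tiny non-streaming prompt against mistral.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "mistral", "prompt": "Say hello in one word.", "stream": False},
    timeout=120,
)
print(resp.json()["response"])
```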
## Configuration

Create a `.env` file in the project root with your OpenAI key:

```env
OPENAI_API_KEY=your-openai-key
```
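If you load the key in your own scripts, one common approach is `python-dotenv`, as in the sketch below (an assumption; check `requirements.txt` for what the project actually uses):

```python
# Load OPENAI_API_KEY from the .env file (requires the python-dotenv package).
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env from the current working directory
api_key = os.getenv("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set -- check your .env file.")
```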
## Usage

Run the main entry point:

```bash
python main.py
```

Or route a task programmatically:

```python
from router import route_task

result = route_task("Translate this English paragraph to French.")
print(result)
```
Sample output:

```json
{
  "best_model": "openai:gpt-4o",
  "response": "Voici le paragraphe traduit...",
  "alternatives": {
    "ollama:mistral": "Voici...",
    "ollama:llama3": "Ceci est..."
  },
  "ranking_log": "Logged performance and accuracy score."
}
```
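Since the result is a plain dictionary, downstream code can inspect the ranked alternatives directly. The snippet below is a hypothetical consumer, not part of the project's API: it falls back to the first non-empty alternative if the top-ranked response comes back empty.

```python
# Hypothetical post-processing of a routing result like the one shown above.
result = {
    "best_model": "openai:gpt-4o",
    "response": "Voici le paragraphe traduit...",
    "alternatives": {
        "ollama:mistral": "Voici...",
        "ollama:llama3": "Ceci est...",
    },
    "ranking_log": "Logged performance and accuracy score.",
}

answer = result["response"]
if not answer.strip():
    # Fall back to the first non-empty alternative response.
    answer = next(
        (text for text in result["alternatives"].values() if text.strip()), ""
    )

print(f"[{result['best_model']}] {answer}")
```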
## Ranking & Feedback Loop

- Metrics include:
  - Accuracy (via eval benchmarks)
  - Response time
  - User feedback (thumbs up/down)
- Rankings are stored in a local DB (`sqlite`, `json`, or Postgres); see the sketch below.
- Adaptive learning: the next task considers ranking history.
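As a rough illustration of the SQLite option, a ranking store could be as small as the sketch below. The table name, columns, and helper functions are assumptions made for illustration; the project's own benchmarking and logging utilities live under `benchmark/` and `utils/`.

```python
# Hypothetical SQLite-backed ranking store; schema and helpers are illustrative.
import sqlite3
from typing import Optional

conn = sqlite3.connect("rankings.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS model_rankings (
           task_type TEXT,
           model     TEXT,
           score     REAL,
           latency_s REAL
       )"""
)

def log_result(task_type: str, model: str, score: float, latency_s: float) -> None:
    """Append one evaluation record after a task has been executed."""
    conn.execute(
        "INSERT INTO model_rankings VALUES (?, ?, ?, ?)",
        (task_type, model, score, latency_s),
    )
    conn.commit()

def best_model(task_type: str) -> Optional[str]:
    """Return the model with the highest average score for this task type."""
    row = conn.execute(
        """SELECT model FROM model_rankings
           WHERE task_type = ?
           GROUP BY model
           ORDER BY AVG(score) DESC
           LIMIT 1""",
        (task_type,),
    ).fetchone()
    return row[0] if row else None

log_result("translation", "openai:gpt-4o", 0.92, 1.4)
print(best_model("translation"))
```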
## Project Structure

```text
│   benchmark_summary.csv
│   Llama3ModelFile
│   main.py
│   README.md
│   requirements.txt
│   setup_llama3.sh
│
├───.ipynb_checkpoints
│       benchmark_log-checkpoint.jsonl
│       benchmark_summary-checkpoint.csv
│       Llama3ModelFile-checkpoint
│       main-checkpoint.py
│       setup_llama3-checkpoint.sh
│
├───agents
│   │   critic_agent.py
│   │   judge_agent.py
│   │   task_agent.py
│   │
│   └───.ipynb_checkpoints
│           critic_agent-checkpoint.py
│           judge_agent-checkpoint.py
│           task_agent-checkpoint.py
│
├───app
│       chat_app.py
│       pipeline.py
│
├───benchmark
│   │   evaluator.py
│   │
│   └───.ipynb_checkpoints
│           evaluator-checkpoint.py
│
├───models
│   │   runner.py
│   │
│   └───.ipynb_checkpoints
│           runner-checkpoint.py
│
└───utils
    │   benchmark_utils.py
    │   logger.py
    │
    └───.ipynb_checkpoints
            logger-checkpoint.py
```
## Roadmap

- Add LangChain-style memory and retrieval
- Web UI dashboard for ranking visualization
- Fine-tuning feedback weights per user
- Model cost analysis for budget-sensitive routing
## Contributing

Pull requests are welcome! For major changes, open an issue first to discuss.