
The Jedi Council – Unified LLM Prompting & Benchmarking Framework


“When in doubt, consult the Council.”

This project provides a clean, extensible framework for interacting with multiple LLMs (OpenAI, Anthropic, Mistral, Gemini) through a unified interface. It supports structured responses, token usage tracking, cost estimation, retry logic, and easy extensibility for adding more providers like LLaMA or Cohere.


Features

  • ✅ Unified interface for calling different LLMs (GPT, Claude, Mistral, Gemini)
  • ✅ Automatic retry with exponential backoff (see the sketch after this list)
  • ✅ Cost estimation based on provider pricing
  • ✅ Structured logging + latency tracking
  • ✅ Extensible to support more LLM providers
  • CLI runner, streaming, parallelism (coming soon)
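
The retry behavior is the standard exponential-backoff pattern. As a generic illustration of that pattern (a sketch, not the framework's actual implementation):

import random
import time

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry call() with exponential backoff plus jitter on transient errors."""
    for attempt in range(max_retries):
        try:
            return call()
        except Exception:
            if attempt == max_retries - 1:
                raise
            # Double the wait each attempt; jitter avoids synchronized retries.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))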

Installation

1. Clone and install in editable mode

git clone https://github.com/avdhoot0303/jedi-llm-gateway.git
cd jedi-llm-gateway
pip install -e .

This uses pyproject.toml to install dependencies and enables live development: code changes take effect without reinstalling.

2. Set up your API keys

Create a .env file in the root folder:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
MISTRAL_API_KEY=sk-mistral-...
GOOGLE_API_KEY=AIza...

Make sure the .env file is loaded if you're running scripts directly; otherwise, export the environment variables in your shell before execution.
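
If you use python-dotenv (an assumption here; any .env loader works), loading the keys is one call:

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the working directory into os.environ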


Inspiration

This tool evolved from a research pipeline designed to automate UI tasks using LLMs. Benchmarking multiple models across tasks, with traceable cost, latency, and output format, required a centralized interface.

Usage

  1. Import the wrapper

    from jedi_council.core import TheJediCouncil
  2. Initialize a model

    council = TheJediCouncil(model="gpt-4o")

    ✅ Supported models:

    • OpenAI: gpt-4o, gpt-4, gpt-3.5-turbo
    • Anthropic: claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307
    • Mistral: mistral-small-latest, mistral-7b, mistral-large-latest
    • Gemini: gemini-1.5-pro-latest, gemini-1.0-pro
  3. Send your first message

    response = council.get_wisdom("What is the capital of Naboo?")
    print(response.text) 
  4. See structured metadata and cost

    # See structured metadata and cost
    print("Wisdom:", response.text)
    print("Usage:", f"{response.usage.input_tokens} in, {response.usage.output_tokens} out")
    print("Cost: $", f"{response.usage.cost:.4f}")
    print("Latency:", f"{int(response.latency_ms)}ms") 

⚙️ Custom Configuration

You can pass system prompts, temperature, and more:

council = TheJediCouncil(
  model="gpt-4o",
  system_prompt="You are a wise Jedi master.",
  temperature=0.2,
)
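
The configured council is queried the same way; a prompt along these lines produces the run shown under Example Output:

response = council.get_wisdom("What is the philosophy behind the Jedi Code?")
print("Wisdom:", response.text)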

Example Output

Wisdom: The philosophy behind the Jedi Code is rooted in principles of peace, self-discipline, and harmony with the Force...
Usage: 16 in, 323 out  
Cost: $ 0.0049  
Latency: 9981ms

Example Script

Try running the following to test all configured models:

python example.py

🧰 CLI Prompt Runner (Beta)

Try this to query a model from the terminal:

python run_prompt.py --model gpt-4o --prompt "Tell me a Yoda quote"
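
run_prompt.py ships with the repo. If you want to build a similar wrapper, a minimal argparse sketch over the documented API could look like this (the flag names mirror the command above; everything else is illustrative, not the repo's actual script):

import argparse

from jedi_council.core import TheJediCouncil

parser = argparse.ArgumentParser(description="Query a model from the terminal")
parser.add_argument("--model", required=True, help="Model name, e.g. gpt-4o")
parser.add_argument("--prompt", required=True, help="Prompt text to send")
args = parser.parse_args()

council = TheJediCouncil(model=args.model)
response = council.get_wisdom(args.prompt)
print(response.text)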

Benchmarking Support

To compare model performance across real tasks, you can run:

python benchmark/benchmarking_suite.py

This will:

  • Run a suite of predefined tasks across all available LLMs
  • Log model outputs, token usage, latency, and cost
  • Save detailed results in logs/benchmark_results.csv

You can analyze the results using pandas or any visualization tool:

import pandas as pd

df = pd.read_csv("logs/benchmark_results.csv")
print(df.groupby("model")["cost"].mean())

You can also break latency down by model and task (plotting requires matplotlib):

import matplotlib.pyplot as plt

df.groupby(["model", "task_name"])["latency_ms"].mean().unstack().plot(kind="bar")
plt.show()

Sample Benchmarking Output

Here's a sample summary of average latency (in ms) for different models across task categories:

Task                  GPT-4o  Claude-3  Gemini  Mistral
Ambiguity Resolution   13528      4175    4390     5308
Code Generation        17870      4091    3741    10309
Logical Reasoning      13463      1630    1890     4002
Summarization          10132       814     608      799
Time Zone Reasoning    11085      1019     777     1712

In this run, Gemini and Claude-3 are generally the fastest responders, while GPT-4o is consistently the slowest across every task category.
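
To reproduce a summary like this from your own run, pivot the benchmark CSV (assuming the model, task_name, and latency_ms columns used earlier):

import pandas as pd

df = pd.read_csv("logs/benchmark_results.csv")
# Mean latency per (task, model) cell, rounded to whole milliseconds.
summary = df.pivot_table(index="task_name", columns="model",
                         values="latency_ms", aggfunc="mean").round(0)
print(summary)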


Roadmap

  • Dry-run support (simulate requests without actual calls)
  • CLI prompt runner:
    python run_prompt.py --model gpt-4o --prompt "Say hello"
  • Streaming & parallel inference
  • LLaMA and Cohere provider integration
  • Output visualizations (token bar, cost heatmap, model latency ranking); ✅ CSV logging already landed

🤝 Contributing

We welcome contributions from Jedi and Padawans alike! If you have ideas for new features, improvements, or new LLM integrations, feel free to open an issue or a pull request.

How to Contribute

  1. ⭐ Star this repository to support the project.
  2. Fork the repo and create a new branch:
    git checkout -b feature/your-feature-name
  3. Write clear, tested code and include helpful commit messages.
  4. Make sure all tests pass (see Running Tests below).

🧪 Running Tests

Run unit tests locally:

pytest --cov=jedi_council --cov-report=term-missing

Make sure new features include test coverage.

  5. Submit a pull request and describe your changes in detail.

May the source be with you.


🙏 Acknowledgements

  • Built by Avdhoot Patil as part of advanced LLM experimentation and research.
  • Inspired by research needs in UI automation, prompt optimization, and agent evaluation.
  • Big thanks to contributors and the open-source community for tools like openai, anthropic, google-generativeai, and mistralai.

👾 Community

Join the Jedi Dev Discord: [coming soon]


📜 License

This project is licensed under the MIT License — use freely, contribute kindly.
