Inference-time scaling for LLMs-as-a-judge.
Updated Nov 5, 2025 · Jupyter Notebook
An end-to-end AI agent project that transcribes audio files, embeds user queries, and searches both Qdrant and the web via the Brave API. A Streamlit interface powered by OpenAI GPT models delivers actionable health insights from both the archive and the latest research.
🤖 A conversational chatbot powered by Meta-Llama-3-8B via the Hugging Face API, with TrustGuard safety validation using an LLM-as-a-judge.
Prompt Design & LLM Judge
A Streamlit web app that uses a Groq-powered LLM (Llama 3) to act as an impartial judge for evaluating and comparing two model outputs. Supports custom criteria, presets like creativity and brand tone, and returns structured scores, explanations, and a winner. Built end-to-end with Python, Groq API, and Streamlit.
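The pairwise-judge pattern this app describes can be sketched in a few lines. The snippet below is a minimal, illustrative outline (all function and field names are hypothetical, not the app's actual API): it builds an impartial comparison prompt for the judge model and parses its structured JSON verdict, deriving the winner from the scores as a consistency check.

```python
import json

def build_judge_prompt(task, output_a, output_b, criteria):
    # Assemble an impartial pairwise-comparison prompt (names are illustrative).
    return (
        "You are an impartial judge. Compare two responses to the task below.\n"
        f"Task: {task}\n\n"
        f"Response A:\n{output_a}\n\n"
        f"Response B:\n{output_b}\n\n"
        f"Criteria: {', '.join(criteria)}\n"
        'Reply with JSON only: {"score_a": 1-10, "score_b": 1-10, '
        '"explanation": "...", "winner": "A" | "B" | "tie"}'
    )

def parse_verdict(raw):
    # Parse the judge's JSON reply; recompute the winner from the scores
    # rather than trusting the model's own "winner" field blindly.
    verdict = json.loads(raw)
    a, b = verdict["score_a"], verdict["score_b"]
    verdict["winner"] = "A" if a > b else "B" if b > a else "tie"
    return verdict

# Example with a canned judge reply (in the real app this would come
# from a chat-completion call to the Groq-hosted Llama 3 model):
reply = '{"score_a": 8, "score_b": 6, "explanation": "A matches the brand tone.", "winner": "A"}'
print(parse_verdict(reply)["winner"])  # → A
```

Recomputing the winner from the numeric scores guards against the common failure mode where an LLM judge's free-text verdict contradicts the scores it just assigned.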
Extensible benchmarking suite for evaluating AI coding agents on web search tasks. Compare native search vs MCP servers (You.com, expanding) across multiple agents (Claude Code, Gemini, Droid, Codex, expanding) with automated Docker workflows and statistical analysis.
StructAI is a robust toolkit for LLM interaction, offering structured outputs, context management, and parallel execution.
Pondera is a lightweight, YAML-first framework to evaluate AI models and agents with pluggable runners and an LLM-as-a-judge.
Process-level rubric-based reward engine for Code Agent trajectories. CLI + MCP ready.
Agent QA Mentor: an agentic QA pipeline that evaluates tool-using AI agent trajectories (scores, issue codes, safety/hallucination detection), rewrites prompts with targeted fixes, and stores long-term memory for continuous improvement—plus a CI-style eval gate and demo notebook.
LLM-as-a-Judge system for rubric-based, explainable evaluation of large language model outputs.
Red-team framework for discovering alignment failures in frontier language models.
OpenJudges is an interactive CLI tool that uses LLMs as judges to evaluate AI responses against user-specified criteria.