
Verdict

The first universal judge for Claude Code & Cowork. Auto-evaluates skill and agent execution quality with 7-dimension scoring, configurable rubrics, and dual-mode operation.


Features

  • Dual-Mode Operation -- Runs automatically via hooks after every skill/agent execution, or on-demand with the /judge command.
  • Dual-Platform -- Works on both Claude Code and Claude Cowork.
  • 7-Dimension Scoring -- Every execution is scored across correctness, completeness, adherence, actionability, efficiency, safety, and consistency.
  • Configurable Rubrics -- Ships with sensible defaults; override them per skill or per team with custom rubric files.
  • Persistent Scorecards -- All scores are saved to skills/judge/scores/ as JSON for historical tracking and benchmarking (see the illustrative scorecard below).
  • Blocking on Critical Failures -- Optionally blocks the workflow when a score falls below the configured threshold.
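
For illustration, a persisted scorecard might record the per-dimension scores, the weighted composite, and the resulting grade. The field names below are assumptions for the sake of example, not the plugin's actual schema (the composite follows from the default weights described under Scoring System):

    {
      "skill": "code-review",
      "timestamp": "2025-01-15T10:30:00Z",
      "dimensions": {
        "correctness": 9.0,
        "completeness": 8.0,
        "adherence": 9.0,
        "actionability": 8.0,
        "efficiency": 7.0,
        "safety": 10.0,
        "consistency": 8.0
      },
      "composite": 8.5,
      "grade": "A-"
    }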

Requirements

  • Python 3.9+ -- Used by the scoring engine (stdlib only, no pip packages needed)
  • jq -- JSON parsing in hook scripts (brew install jq / apt-get install jq)
  • bc -- Float comparison in hook scripts (brew install bc / apt-get install bc)

Quick Start

Install

Option A: From the official Claude Plugin Directory (once listed)

/plugin install verdict

Option B: Direct from GitHub

/plugin marketplace add sattyamjjain/verdict
/plugin install verdict@verdict

Configure

Edit judge-config.json at the project root to control auto-judge behavior, scoring weights, and defaults.
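
As a sketch, a minimal judge-config.json might look like the following. The values are illustrative, not the shipped defaults; each key is documented in the Configuration Reference below:

    {
      "auto_judge": {
        "enabled": true,
        "always": ["code-review"],
        "never": [],
        "threshold": 7.0,
        "block_on_critical": false
      },
      "manual_judge": {
        "default_rubric": "default",
        "verbose": true,
        "save_scores": true
      }
    }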

Use

Automatic mode -- After installing, every skill or agent execution that matches the auto_judge.always list is evaluated automatically. No action required.

Manual mode -- Run the /judge slash command to evaluate any prior execution on demand:

/judge                  # Judge the last execution
/judge --rubric strict  # Use a specific rubric
/scorecard              # View cumulative scores
/benchmark              # Run benchmark suite
/judge-config           # View or update configuration

Architecture

.claude-plugin/
  plugin.json             # Plugin manifest
  marketplace.json        # Marketplace listing metadata
skills/judge/
  SKILL.md                # Core skill definition
  scripts/                # Python scoring engine and utilities
  rubrics/                # Rubric definitions (default, code-review, security, etc.)
  scores/                 # Persisted score JSON files
  references/             # Benchmark standards and scoring methodology
agents/
  judge-agent.md          # Autonomous judge agent definition
hooks/
  hooks.json              # Hook definitions for auto-judge
  judge-on-stop.sh        # Stop hook script
  judge-on-subagent-stop.sh # SubagentStop hook script
  common.sh               # Shared hook utilities
commands/
  judge.md                # /judge command
  scorecard.md            # /scorecard command
  benchmark.md            # /benchmark command
  judge-config.md         # /judge-config command
judge-config.json         # Root configuration file
LICENSE                   # MIT license
CHANGELOG.md              # Version history

Scoring System

Each execution is scored on 7 weighted dimensions (total weight = 1.0):

| Dimension     | Weight | Description                                       |
| ------------- | ------ | ------------------------------------------------- |
| Correctness   | 0.25   | Does the output match expected behavior?          |
| Completeness  | 0.20   | Were all requested tasks addressed?               |
| Adherence     | 0.15   | Does it follow the skill/agent instructions?      |
| Actionability | 0.15   | Are the results directly usable?                  |
| Efficiency    | 0.10   | Was the work done without unnecessary steps?      |
| Safety        | 0.10   | Are there security or correctness risks?          |
| Consistency   | 0.05   | Is the output consistent with prior executions?   |

The weighted composite score is computed as the dot product of dimension scores (each 0--10) and their weights, yielding a final score on a 0--10 scale.
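
As a minimal sketch (not the plugin's actual scoring code), that computation with the default weights looks like this in Python:

    # Default weights from the table above; illustrative, not the engine's source.
    WEIGHTS = {
        "correctness": 0.25, "completeness": 0.20, "adherence": 0.15,
        "actionability": 0.15, "efficiency": 0.10, "safety": 0.10,
        "consistency": 0.05,
    }

    def composite_score(scores: dict[str, float]) -> float:
        """Weighted dot product of per-dimension scores (each 0-10)."""
        return sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)

    scores = {"correctness": 9.0, "completeness": 8.0, "adherence": 9.0,
              "actionability": 8.0, "efficiency": 7.0, "safety": 10.0,
              "consistency": 8.0}
    print(round(composite_score(scores), 2))  # 8.5 -> grade A-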

Verdict Grades

| Score Range | Grade |
| ----------- | ----- |
| 9.5 -- 10.0 | A+    |
| 9.0 -- 9.4  | A     |
| 8.5 -- 8.9  | A-    |
| 8.0 -- 8.4  | B+    |
| 7.5 -- 7.9  | B     |
| 7.0 -- 7.4  | B-    |
| 6.5 -- 6.9  | C+    |
| 6.0 -- 6.4  | C     |
| 5.5 -- 5.9  | C-    |
| 4.0 -- 5.4  | D     |
| 0.0 -- 3.9  | F     |
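
Expressed as code, the mapping might look like this (a sketch consistent with the table above; the plugin's own implementation may differ):

    def grade(score: float) -> str:
        """Map a 0-10 composite score to a letter grade per the table above."""
        bands = [(9.5, "A+"), (9.0, "A"), (8.5, "A-"), (8.0, "B+"),
                 (7.5, "B"), (7.0, "B-"), (6.5, "C+"), (6.0, "C"),
                 (5.5, "C-"), (4.0, "D")]
        for floor, letter in bands:
            if score >= floor:
                return letter
        return "F"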

Configuration Reference

All configuration lives in judge-config.json:

auto_judge

| Key               | Type     | Description                                                |
| ----------------- | -------- | ---------------------------------------------------------- |
| enabled           | boolean  | Master switch for automatic judging.                       |
| always            | string[] | Skill names that are always auto-judged.                   |
| never             | string[] | Skill names that are never auto-judged.                    |
| threshold         | number   | Minimum composite score to pass (0--10).                   |
| block_on_critical | boolean  | If true, block the workflow when the score is below threshold. |

manual_judge

| Key            | Type    | Description                              |
| -------------- | ------- | ---------------------------------------- |
| default_rubric | string  | Name of the default rubric file to use.  |
| verbose        | boolean | Show a detailed per-dimension breakdown. |
| save_scores    | boolean | Persist scores to disk.                  |

scoring.dimensions

A map of dimension name to weight (float). Weights must sum to 1.0.
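
With the default weights from the Scoring System section, the corresponding section of judge-config.json would look like this (assuming lowercase dimension keys):

    "scoring": {
      "dimensions": {
        "correctness": 0.25,
        "completeness": 0.20,
        "adherence": 0.15,
        "actionability": 0.15,
        "efficiency": 0.10,
        "safety": 0.10,
        "consistency": 0.05
      }
    }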

License

MIT -- see LICENSE for details.
