
Conversation

codegen-sh bot commented Dec 7, 2025

🏛️ Multi-Agent Council Orchestrator

Implements a powerful multi-agent collaboration system using the Codegen Agent API, following the patterns from llm-council and OpenAI's Pro Mode.

What This PR Adds

Core Council System (3-stage process):

  1. Stage 1: Parallel Candidate Generation

    • Launches N agents × M candidates in parallel
    • Each agent/model generates multiple responses
    • Tracks agent run IDs and web URLs
  2. Stage 2: Peer Ranking (optional)

    • Anonymizes candidate responses (Response A, B, C...; sketched after this list)
    • Each agent ranks all candidates
    • Parses structured rankings with fallback
    • Calculates aggregate rankings across all judges
  3. Stage 3: Synthesis

    • Simple mode: Combine all candidates in one shot (for <20 candidates)
    • Tournament mode: Group candidates → synthesize each group → synthesize the group winners (for 20+ candidates)
    • Returns final synthesized answer
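
A rough illustration of the Stage 2 anonymization step (the candidate fields here are illustrative stand-ins, not the PR's actual data model):

# Hypothetical candidate objects; the PR's real models will differ.
candidates = [
    {"model": "gpt-4o", "text": "Use vectorized numpy operations."},
    {"model": "claude-3-5-sonnet-20241022", "text": "Profile first, then optimize hot paths."},
]

labels = [chr(65 + i) for i in range(len(candidates))]  # A, B, C, ...
label_to_model = {
    f"Response {label}": cand["model"] for label, cand in zip(labels, candidates)
}

# What each judge sees: anonymized responses with no model names attached.
anonymized = "\n\n".join(
    f"Response {label}:\n{cand['text']}" for label, cand in zip(labels, candidates)
)
print(anonymized)
print(label_to_model)  # kept aside to de-anonymize the rankings later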

Usage

# Basic usage with default models
codegen council run --prompt "How do I optimize Python code?"

# Custom models and candidates
codegen council run \
  --prompt "Explain quantum computing" \
  --models "gpt-4o,claude-3-5-sonnet-20241022,gemini-2.0-flash-exp" \
  --candidates 5

# Skip ranking for faster execution
codegen council run \
  --prompt "What is AI?" \
  --no-ranking

# Use different synthesis model
codegen council run \
  --prompt "Design a system architecture" \
  --synthesis-model "gpt-4o"

Key Features

Codegen Agent API Integration

  • Uses existing Agent.run() and AgentTask infrastructure (usage sketched below)
  • No external API calls; fully integrated with the Codegen backend
  • Respects org/token management from CLI
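
For orientation, the underlying SDK calls look roughly like this (the import path and exact signatures are assumptions inferred from the names above, not verified against this PR):

# Sketch of the Agent API this orchestrator builds on; adjust the
# import path and fields to match the actual SDK.
from codegen import Agent

agent = Agent(org_id="YOUR_ORG_ID", token="YOUR_API_TOKEN")
task = agent.run(prompt="What is 2+2?")  # returns an AgentTask

task.refresh()                 # poll the backend for a status update
if task.status == "completed":
    print(task.result)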

Parallel Execution

  • Concurrent agent runs with configurable workers (default: 50; see the sketch below)
  • Progress tracking with status polling
  • Graceful handling of failed runs
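
A minimal sketch of the fan-out, assuming agent objects that expose the Agent.run() entry point described above (the helper name launch_parallel_runs is hypothetical):

from concurrent.futures import ThreadPoolExecutor, as_completed

def launch_parallel_runs(agents, prompt, max_workers=50):
    """Run every agent on the prompt concurrently; collect failures separately."""
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_agent = {executor.submit(a.run, prompt): a for a in agents}
        for future in as_completed(future_to_agent):
            agent = future_to_agent[future]
            try:
                results.append((agent, future.result()))
            except Exception as exc:
                failures.append((agent, exc))  # degrade gracefully, keep going
    return results, failures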

Rich CLI Output

  • Beautiful tables showing all candidates
  • Aggregate ranking visualization
  • Links to all agent run web URLs
  • Synthesis method and details

Tournament Synthesis

  • Automatically used for large councils (>20 candidates)
  • Groups candidates → synthesizes each group → synthesizes the group winners (sketched below)
  • Scales to 100+ candidates efficiently
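
The grouping logic, in sketch form (group size and helper names are assumptions, not taken from the PR):

def chunk(items, size):
    """Split items into consecutive groups of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def tournament_synthesize(candidates, synthesize, group_size=10):
    # Round 1: synthesize each group of candidates into one answer.
    winners = [synthesize(group) for group in chunk(candidates, group_size)]
    # Round 2: synthesize the group winners into the final answer.
    return synthesize(winners)

# Trivial stand-in for the real synthesis call, just to show the shape:
final = tournament_synthesize(
    [f"candidate {i}" for i in range(25)],
    synthesize=lambda group: f"merged({len(group)})",
)
print(final)  # merged(3): 25 candidates -> 3 group winners -> 1 answer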

Test Coverage

  • Unit tests with mocked agent runs (shape sketched below)
  • Tests for ranking parsing, synthesis prompts, aggregate calculations
  • Integration test template included (marked as skip)
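
The general shape of such a test, reusing the hypothetical launch_parallel_runs sketch above (names are illustrative, not copied from tests/council/test_orchestrator.py):

from unittest.mock import MagicMock

def test_failed_runs_are_collected_separately():
    ok = MagicMock()
    ok.run.return_value = "fine"                   # a successful agent run
    broken = MagicMock()
    broken.run.side_effect = RuntimeError("boom")  # a failing run

    results, failures = launch_parallel_runs([ok, broken], "prompt")

    assert len(results) == 1
    assert len(failures) == 1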

Files Added

  • src/codegen/council/__init__.py - Module exports
  • src/codegen/council/models.py - Data models (AgentConfig, CouncilConfig, CouncilResult, etc.)
  • src/codegen/council/orchestrator.py - Core orchestration logic (503 lines)
  • src/codegen/cli/commands/council/main.py - CLI command implementation
  • tests/council/test_orchestrator.py - Unit tests

Files Modified

  • src/codegen/cli/cli.py - Added council_app to main CLI

Architecture Decisions

  1. Codegen Agent API Only (not external providers)

    • Reuses existing authentication flow
    • Consistent API surface
    • Simpler token management
    • Can add direct provider calls later if needed
  2. Synchronous with Polling (not async/streaming)

    • Matches existing Agent SDK patterns
    • Simple to understand and debug
    • Can add streaming in follow-up
  3. Structured Prompt Engineering

    • Clear ranking format with "FINAL RANKING:" marker
    • Robust parsing with regex fallbacks (sketched after this list)
    • Synthesis prompts that discourage meta-commentary
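
A sketch of how such parsing and aggregation can work (the exact prompt wording and parser internals in the PR may differ):

import re

def parse_ranking(text, num_candidates):
    """Pull a 'FINAL RANKING: B, A, C' line out of a judge's reply."""
    match = re.search(r"FINAL RANKING:\s*(.+)", text)
    if not match:
        return None  # caller falls back to another heuristic
    labels = re.findall(r"[A-Z]", match.group(1))
    return labels[:num_candidates] or None

def aggregate_rankings(rankings):
    """Average each label's position across judges; lower is better."""
    positions = {}
    for ranking in rankings:
        for pos, label in enumerate(ranking):
            positions.setdefault(label, []).append(pos)
    return sorted(positions, key=lambda lb: sum(positions[lb]) / len(positions[lb]))

judges = ["FINAL RANKING: B, A, C", "Reasoning first...\nFINAL RANKING: B, C, A"]
print(aggregate_rankings([parse_ranking(t, 3) for t in judges]))
# ['B', 'A', 'C']: B wins outright; A and C tie on average position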

Future Enhancements (not in this PR)

  • Chain runner for sequential multi-agent workflows
  • Pre-set agent recipes (PRD, Research, Implement, Test)
  • Async/streaming execution for real-time progress
  • Direct provider support (OpenAI, Anthropic, xAI) alongside Codegen API
  • Web UI / TUI visualization
  • Checkpoint/resume for long-running councils

Testing

# Run tests
pytest tests/council/test_orchestrator.py -v

# Test CLI help
codegen council --help
codegen council run --help

# Smoke test (requires auth)
codegen council run --prompt "What is 2+2?" --models gpt-4o --candidates 1 --no-ranking

Related

Based on patterns from:

  • llm-council - 3-stage deliberation
  • OpenAI Pro Mode - Tournament synthesis for large-scale generation

Ready for review! This is Phase 1 of the multi-agent upgrade. Chain runner and recipes will follow in separate PRs.




Summary by cubic

Adds a multi-agent council orchestrator using the Codegen Agent API and a new codegen council CLI. It generates candidates in parallel, optionally ranks them, and synthesizes a final answer (simple or tournament) to improve results on complex prompts.

  • New Features

    • 3-stage workflow: parallel candidate generation → anonymous peer ranking → synthesis (simple or tournament).
    • CLI command: codegen council run with flags for models, candidates, ranking toggle, synthesis model, org ID, and poll interval.
    • Parallel execution with up to 50 workers, status polling, and graceful failure handling.
    • Aggregate ranking calculation and CLI output with run IDs and web URLs.
    • New data models and orchestrator module, plus unit tests for ranking parsing, synthesis prompts, and aggregation (integration test stub included).
  • Migration

    • No breaking changes. Requires authentication and an org ID (use codegen login or pass --org-id).

Written for commit fda8dac. Summary will update automatically on new commits.

Implements 3-stage council process using Codegen Agent API:
- Stage 1: Generate N candidates from multiple models in parallel
- Stage 2 (optional): Peer ranking with anonymized evaluation
- Stage 3: Synthesis (simple or tournament-based for large councils)

Features:
- CLI command: codegen council run --prompt ... --models gpt-4o,claude-3-5-sonnet
- Full tracking of agent run IDs and web URLs for all stages
- Aggregate ranking calculation across all judges
- Tests included with mocked agent runs

Co-authored-by: Zeeeepa <[email protected]>
coderabbitai bot commented Dec 7, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.


Comment @coderabbitai help to get the list of available commands and usage tips.

cubic-dev-ai bot left a comment


2 issues found across 7 files

Prompt for AI agents (all 2 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/codegen/council/orchestrator.py">

<violation number="1" location="src/codegen/council/orchestrator.py:158">
P2: Label generation with `chr(65 + i)` only produces valid letters A-Z for 26 candidates. With more candidates, non-letter characters are generated that won't match the `[A-Z]` regex pattern in `_parse_ranking_from_text`. Consider using multi-character labels (e.g., AA, AB) for larger councils.</violation>

<violation number="2" location="src/codegen/council/orchestrator.py:336">
P1: Tasks returned from `_launch_parallel_runs` are in completion order (due to `as_completed`), not submission order. When zipped with `run_configs`, this causes incorrect model attribution. Store the config with each task or preserve submission order.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

    ) -> Tuple[List[RankingResult], Dict[str, str]]:
        """Stage 2: Each agent ranks the anonymized candidates."""
        # Create anonymous labels (Response A, Response B, etc.)
        labels = [chr(65 + i) for i in range(len(candidates))]  # A, B, C, ...
cubic-dev-ai bot commented Dec 7, 2025


P2: Label generation with chr(65 + i) only produces valid letters A-Z for 26 candidates. With more candidates, non-letter characters are generated that won't match the [A-Z] regex pattern in _parse_ranking_from_text. Consider using multi-character labels (e.g., AA, AB) for larger councils.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/codegen/council/orchestrator.py, line 158:

<comment>Label generation with `chr(65 + i)` only produces valid letters A-Z for 26 candidates. With more candidates, non-letter characters are generated that won't match the `[A-Z]` regex pattern in `_parse_ranking_from_text`. Consider using multi-character labels (e.g., AA, AB) for larger councils.</comment>

<file context>
@@ -0,0 +1,504 @@
+    ) -> Tuple[List[RankingResult], Dict[str, str]]:
+        """Stage 2: Each agent ranks the anonymized candidates."""
+        # Create anonymous labels (Response A, Response B, etc.)
+        labels = [chr(65 + i) for i in range(len(candidates))]  # A, B, C, ...
+        label_to_model = {
+            f"Response {label}": cand.model
</file context>
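
For councils beyond 26 candidates, spreadsheet-style base-26 labels would keep the scheme parseable. The helper below is a hypothetical fix along the lines the comment suggests, not code from the PR; the parsing regex would also need to accept [A-Z]+:

def make_label(i: int) -> str:
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA', like spreadsheet columns."""
    label = ""
    i += 1  # shift to 1-based for the base-26 arithmetic
    while i > 0:
        i, rem = divmod(i - 1, 26)
        label = chr(65 + rem) + label
    return label

labels = [make_label(i) for i in range(len(candidates))]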

                future = executor.submit(agent.run, prompt)
                future_to_config[future] = (model, prompt)

            for future in as_completed(future_to_config):
cubic-dev-ai bot commented Dec 7, 2025


P1: Tasks returned from _launch_parallel_runs are in completion order (due to as_completed), not submission order. When zipped with run_configs, this causes incorrect model attribution. Store the config with each task or preserve submission order.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/codegen/council/orchestrator.py, line 336:

<comment>Tasks returned from `_launch_parallel_runs` are in completion order (due to `as_completed`), not submission order. When zipped with `run_configs`, this causes incorrect model attribution. Store the config with each task or preserve submission order.</comment>

<file context>
@@ -0,0 +1,504 @@
+                future = executor.submit(agent.run, prompt)
+                future_to_config[future] = (model, prompt)
+            
+            for future in as_completed(future_to_config):
+                try:
+                    task = future.result()
</file context>
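
One way to fix the attribution is to read the stored config back out of future_to_config for each completed future, rather than zipping results with run_configs afterward. A rough continuation of the snippet above, under that assumption:

results, failures = [], []
for future in as_completed(future_to_config):
    model, prompt = future_to_config[future]  # config recorded at submit time
    try:
        task = future.result()
        results.append((model, prompt, task))
    except Exception as exc:
        # Keep failures attributed to the right model/prompt as well.
        failures.append((model, prompt, exc))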
