
Conversation

codegen-sh bot commented Dec 7, 2025

🏛️ Multi-Agent Council Orchestrator

Implements a powerful multi-agent collaboration system using the Codegen Agent API, following the patterns from llm-council and OpenAI's Pro Mode.

What This PR Adds

Core Council System (3-stage process):

  1. Stage 1: Parallel Candidate Generation

    • Launches N agents × M candidates in parallel
    • Each agent/model generates multiple responses
    • Tracks agent run IDs and web URLs
  2. Stage 2: Peer Ranking (optional)

    • Anonymizes candidate responses (Response A, B, C...; sketched after this list)
    • Each agent ranks all candidates
    • Parses structured rankings with fallback
    • Calculates aggregate rankings across all judges
  3. Stage 3: Synthesis

    • Simple mode: Combine all candidates in one shot (for <20 candidates)
    • Tournament mode: Group candidates → synthesize each group → synthesize the group winners (for 20+ candidates)
    • Returns final synthesized answer
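
A rough illustration of the Stage 2 anonymization step (the candidate fields here are illustrative stand-ins, not the PR's actual data model):

# Hypothetical candidate objects; the PR's real models will differ.
candidates = [
    {"model": "gpt-4o", "text": "Use vectorized numpy operations."},
    {"model": "claude-3-5-sonnet-20241022", "text": "Profile first, then optimize hot paths."},
]

labels = [chr(65 + i) for i in range(len(candidates))]  # A, B, C, ...
label_to_model = {
    f"Response {label}": cand["model"] for label, cand in zip(labels, candidates)
}

# What each judge sees: anonymized responses with no model names attached.
anonymized = "\n\n".join(
    f"Response {label}:\n{cand['text']}" for label, cand in zip(labels, candidates)
)
print(anonymized)
print(label_to_model)  # kept aside to de-anonymize the rankings later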

Usage

# Basic usage with default models
codegen council run --prompt "How do I optimize Python code?"

# Custom models and candidates
codegen council run \
  --prompt "Explain quantum computing" \
  --models "gpt-4o,claude-3-5-sonnet-20241022,gemini-2.0-flash-exp" \
  --candidates 5

# Skip ranking for faster execution
codegen council run \
  --prompt "What is AI?" \
  --no-ranking

# Use different synthesis model
codegen council run \
  --prompt "Design a system architecture" \
  --synthesis-model "gpt-4o"

Key Features

Codegen Agent API Integration

  • Uses existing Agent.run() and AgentTask infrastructure (usage sketched below)
  • No external API calls; fully integrated with the Codegen backend
  • Respects org/token management from CLI
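
For orientation, the underlying SDK calls look roughly like this (the import path and exact signatures are assumptions inferred from the names above, not verified against this PR):

# Sketch of the Agent API this orchestrator builds on; adjust the
# import path and fields to match the actual SDK.
from codegen import Agent

agent = Agent(org_id="YOUR_ORG_ID", token="YOUR_API_TOKEN")
task = agent.run(prompt="What is 2+2?")  # returns an AgentTask

task.refresh()                 # poll the backend for a status update
if task.status == "completed":
    print(task.result)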

Parallel Execution

  • Concurrent agent runs with configurable workers (default: 50; see the sketch below)
  • Progress tracking with status polling
  • Graceful handling of failed runs
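
A minimal sketch of the fan-out, assuming agent objects that expose the Agent.run() entry point described above (the helper name launch_parallel_runs is hypothetical):

from concurrent.futures import ThreadPoolExecutor, as_completed

def launch_parallel_runs(agents, prompt, max_workers=50):
    """Run every agent on the prompt concurrently; collect failures separately."""
    results, failures = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_agent = {executor.submit(a.run, prompt): a for a in agents}
        for future in as_completed(future_to_agent):
            agent = future_to_agent[future]
            try:
                results.append((agent, future.result()))
            except Exception as exc:
                failures.append((agent, exc))  # degrade gracefully, keep going
    return results, failures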

Rich CLI Output

  • Beautiful tables showing all candidates
  • Aggregate ranking visualization
  • Links to all agent run web URLs
  • Synthesis method and details

Tournament Synthesis

  • Automatically used for large councils (>20 candidates)
  • Groups candidates → synthesizes each group → synthesizes the group winners (sketched below)
  • Scales to 100+ candidates efficiently
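
The grouping logic, in sketch form (group size and helper names are assumptions, not taken from the PR):

def chunk(items, size):
    """Split items into consecutive groups of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def tournament_synthesize(candidates, synthesize, group_size=10):
    # Round 1: synthesize each group of candidates into one answer.
    winners = [synthesize(group) for group in chunk(candidates, group_size)]
    # Round 2: synthesize the group winners into the final answer.
    return synthesize(winners)

# Trivial stand-in for the real synthesis call, just to show the shape:
final = tournament_synthesize(
    [f"candidate {i}" for i in range(25)],
    synthesize=lambda group: f"merged({len(group)})",
)
print(final)  # merged(3): 25 candidates -> 3 group winners -> 1 answer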

Test Coverage

  • Unit tests with mocked agent runs (shape sketched below)
  • Tests for ranking parsing, synthesis prompts, aggregate calculations
  • Integration test template included (marked as skip)
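
The general shape of such a test, reusing the hypothetical launch_parallel_runs sketch above (names are illustrative, not copied from tests/council/test_orchestrator.py):

from unittest.mock import MagicMock

def test_failed_runs_are_collected_separately():
    ok = MagicMock()
    ok.run.return_value = "fine"                   # a successful agent run
    broken = MagicMock()
    broken.run.side_effect = RuntimeError("boom")  # a failing run

    results, failures = launch_parallel_runs([ok, broken], "prompt")

    assert len(results) == 1
    assert len(failures) == 1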

Files Added

  • src/codegen/council/__init__.py - Module exports
  • src/codegen/council/models.py - Data models (AgentConfig, CouncilConfig, CouncilResult, etc.)
  • src/codegen/council/orchestrator.py - Core orchestration logic (503 lines)
  • src/codegen/cli/commands/council/main.py - CLI command implementation
  • tests/council/test_orchestrator.py - Unit tests

Files Modified

  • src/codegen/cli/cli.py - Added council_app to main CLI

Architecture Decisions

  1. Codegen Agent API Only (not external providers)

    • Reuses existing authentication flow
    • Consistent API surface
    • Simpler token management
    • Can add direct provider calls later if needed
  2. Synchronous with Polling (not async/streaming)

    • Matches existing Agent SDK patterns
    • Simple to understand and debug
    • Can add streaming in follow-up
  3. Structured Prompt Engineering

    • Clear ranking format with "FINAL RANKING:" marker
    • Robust parsing with regex fallbacks (sketched after this list)
    • Synthesis prompts that discourage meta-commentary
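
A sketch of how such parsing and aggregation can work (the exact prompt wording and parser internals in the PR may differ):

import re

def parse_ranking(text, num_candidates):
    """Pull a 'FINAL RANKING: B, A, C' line out of a judge's reply."""
    match = re.search(r"FINAL RANKING:\s*(.+)", text)
    if not match:
        return None  # caller falls back to another heuristic
    labels = re.findall(r"[A-Z]", match.group(1))
    return labels[:num_candidates] or None

def aggregate_rankings(rankings):
    """Average each label's position across judges; lower is better."""
    positions = {}
    for ranking in rankings:
        for pos, label in enumerate(ranking):
            positions.setdefault(label, []).append(pos)
    return sorted(positions, key=lambda lb: sum(positions[lb]) / len(positions[lb]))

judges = ["FINAL RANKING: B, A, C", "Reasoning first...\nFINAL RANKING: B, C, A"]
print(aggregate_rankings([parse_ranking(t, 3) for t in judges]))
# ['B', 'A', 'C']: B wins outright; A and C tie on average position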

Future Enhancements (not in this PR)

  • Chain runner for sequential multi-agent workflows
  • Pre-set agent recipes (PRD, Research, Implement, Test)
  • Async/streaming execution for real-time progress
  • Direct provider support (OpenAI, Anthropic, xAI) alongside Codegen API
  • Web UI / TUI visualization
  • Checkpoint/resume for long-running councils

Testing

# Run tests
pytest tests/council/test_orchestrator.py -v

# Test CLI help
codegen council --help
codegen council run --help

# Smoke test (requires auth)
codegen council run --prompt "What is 2+2?" --models gpt-4o --candidates 1 --no-ranking

Related

Based on patterns from:

  • llm-council - 3-stage deliberation
  • OpenAI Pro Mode - Tournament synthesis for large-scale generation

Ready for review! This is Phase 1 of the multi-agent upgrade. Chain runner and recipes will follow in separate PRs.




Summary by cubic

Adds a multi-agent council orchestrator using the Codegen Agent API and a new codegen council CLI. It generates candidates in parallel, optionally ranks them, and synthesizes a final answer (simple or tournament) to improve results on complex prompts.

  • New Features

    • 3-stage workflow: parallel candidate generation → anonymous peer ranking → synthesis (simple or tournament).
    • CLI command: codegen council run with flags for models, candidates, ranking toggle, synthesis model, org ID, and poll interval.
    • Parallel execution with up to 50 workers, status polling, and graceful failure handling.
    • Aggregate ranking calculation and CLI output with run IDs and web URLs.
    • New data models and orchestrator module, plus unit tests for ranking parsing, synthesis prompts, and aggregation (integration test stub included).
  • Migration

    • No breaking changes. Requires authentication and an org ID (use codegen login or pass --org-id).

Written for commit fda8dac. Summary will update automatically on new commits.

Implements 3-stage council process using Codegen Agent API:
- Stage 1: Generate N candidates from multiple models in parallel
- Stage 2 (optional): Peer ranking with anonymized evaluation
- Stage 3: Synthesis (simple or tournament-based for large councils)

Features:
- CLI command: codegen council run --prompt ... --models gpt-4o,claude-3-5-sonnet
- Full tracking of agent run IDs and web URLs for all stages
- Aggregate ranking calculation across all judges
- Tests included with mocked agent runs

Co-authored-by: Zeeeepa <[email protected]>
coderabbitai bot commented Dec 7, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.


Comment @coderabbitai help to get the list of available commands and usage tips.

cubic-dev-ai bot left a comment


2 issues found across 7 files

Prompt for AI agents (all 2 issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/codegen/council/orchestrator.py">

<violation number="1" location="src/codegen/council/orchestrator.py:158">
P2: Label generation with `chr(65 + i)` only produces valid letters A-Z for 26 candidates. With more candidates, non-letter characters are generated that won't match the `[A-Z]` regex pattern in `_parse_ranking_from_text`. Consider using multi-character labels (e.g., AA, AB) for larger councils.</violation>

<violation number="2" location="src/codegen/council/orchestrator.py:336">
P1: Tasks returned from `_launch_parallel_runs` are in completion order (due to `as_completed`), not submission order. When zipped with `run_configs`, this causes incorrect model attribution. Store the config with each task or preserve submission order.</violation>
</file>

Reply to cubic to teach it or ask questions. Re-run a review with @cubic-dev-ai review this PR

    ) -> Tuple[List[RankingResult], Dict[str, str]]:
        """Stage 2: Each agent ranks the anonymized candidates."""
        # Create anonymous labels (Response A, Response B, etc.)
        labels = [chr(65 + i) for i in range(len(candidates))]  # A, B, C, ...
cubic-dev-ai bot commented Dec 7, 2025


P2: Label generation with chr(65 + i) only produces valid letters A-Z for 26 candidates. With more candidates, non-letter characters are generated that won't match the [A-Z] regex pattern in _parse_ranking_from_text. Consider using multi-character labels (e.g., AA, AB) for larger councils.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/codegen/council/orchestrator.py, line 158:

<comment>Label generation with `chr(65 + i)` only produces valid letters A-Z for 26 candidates. With more candidates, non-letter characters are generated that won't match the `[A-Z]` regex pattern in `_parse_ranking_from_text`. Consider using multi-character labels (e.g., AA, AB) for larger councils.</comment>

<file context>
@@ -0,0 +1,504 @@
+    ) -> Tuple[List[RankingResult], Dict[str, str]]:
+        """Stage 2: Each agent ranks the anonymized candidates."""
+        # Create anonymous labels (Response A, Response B, etc.)
+        labels = [chr(65 + i) for i in range(len(candidates))]  # A, B, C, ...
+        label_to_model = {
+            f"Response {label}": cand.model
</file context>
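
For councils beyond 26 candidates, spreadsheet-style base-26 labels would keep the scheme parseable. The helper below is a hypothetical fix along the lines the comment suggests, not code from the PR; the parsing regex would also need to accept [A-Z]+:

def make_label(i: int) -> str:
    """0 -> 'A', 25 -> 'Z', 26 -> 'AA', like spreadsheet columns."""
    label = ""
    i += 1  # shift to 1-based for the base-26 arithmetic
    while i > 0:
        i, rem = divmod(i - 1, 26)
        label = chr(65 + rem) + label
    return label

labels = [make_label(i) for i in range(len(candidates))]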

                future = executor.submit(agent.run, prompt)
                future_to_config[future] = (model, prompt)

            for future in as_completed(future_to_config):
cubic-dev-ai bot commented Dec 7, 2025


P1: Tasks returned from _launch_parallel_runs are in completion order (due to as_completed), not submission order. When zipped with run_configs, this causes incorrect model attribution. Store the config with each task or preserve submission order.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/codegen/council/orchestrator.py, line 336:

<comment>Tasks returned from `_launch_parallel_runs` are in completion order (due to `as_completed`), not submission order. When zipped with `run_configs`, this causes incorrect model attribution. Store the config with each task or preserve submission order.</comment>

<file context>
@@ -0,0 +1,504 @@
+                future = executor.submit(agent.run, prompt)
+                future_to_config[future] = (model, prompt)
+            
+            for future in as_completed(future_to_config):
+                try:
+                    task = future.result()
</file context>
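
One way to fix the attribution is to read the stored config back out of future_to_config for each completed future, rather than zipping results with run_configs afterward. A rough continuation of the snippet above, under that assumption:

results, failures = [], []
for future in as_completed(future_to_config):
    model, prompt = future_to_config[future]  # config recorded at submit time
    try:
        task = future.result()
        results.append((model, prompt, task))
    except Exception as exc:
        # Keep failures attributed to the right model/prompt as well.
        failures.append((model, prompt, exc))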
