⚠️ Disclaimer: This is an open-source community implementation of the PARL (Parallel-Agent Reinforcement Learning) technique based on the Kimi K2.5 technical report. This is NOT an official implementation from Kimi AI or any affiliated organization. This project is maintained independently by The Swarm Corporation and the open-source community.
Open-source implementation of PARL (Parallel-Agent Reinforcement Learning), a novel training paradigm that enables AI models to decompose complex tasks into parallel subtasks and coordinate multiple agents simultaneously.
PARL is a training methodology that addresses the critical challenge of serial collapse in multi-agent systems, where models default to sequential execution despite having parallel computational capacity. By implementing staged reward shaping and a latency-oriented evaluation metric, PARL trains models to efficiently orchestrate up to 100 sub-agents across 1,500+ coordinated steps.
- Staged Reward Shaping: Dynamic reward annealing that encourages parallelism early in training and gradually shifts focus toward task success (see the sketch after this list)
- Instantiation Reward: Incentivizes subagent creation and concurrent execution
- Critical Steps Metric: Latency-oriented evaluation inspired by parallel computation's critical path concept
- Differentiable Components: Fully compatible with gradient-based optimization
- Orchestrator-Subagent Architecture: Trainable coordinator with frozen execution agents
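The staged reward shaping is easiest to picture through the annealing schedule itself. Below is a minimal sketch assuming a linear schedule; the parameter names mirror `PARLReward`'s constructor arguments (shown in the quick start), but the exact interpolation open-parl uses is an assumption here.

```python
# Minimal sketch of staged reward annealing, assuming a LINEAR schedule.
# Parameter names mirror PARLReward's constructor; the interpolation form
# is an illustrative assumption, not necessarily open-parl's exact behavior.

def annealed_lambda(step: int,
                    lambda_init: float = 0.1,
                    lambda_final: float = 0.0,
                    total_training_steps: int = 10_000) -> float:
    """Interpolate the parallelism weight from lambda_init to lambda_final."""
    progress = min(step / total_training_steps, 1.0)
    return lambda_init + (lambda_final - lambda_init) * progress

# Early training rewards parallelism heavily; by the end, only task success matters.
print(annealed_lambda(0))       # 0.1
print(annealed_lambda(5_000))   # 0.05
print(annealed_lambda(10_000))  # 0.0
```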
```
┌─────────────────────────────────────────────┐
│             Orchestrator Agent              │
│       (Trainable Central Coordinator)       │
│  - Decomposes tasks into subtasks           │
│  - Manages parallel execution               │
│  - Coordinates subagent workflows           │
└──────────────┬──────────────────────────────┘
               │
     ┌─────────┼──────────┬──────────┐
     │         │          │          │
┌────▼───┐ ┌───▼────┐ ┌───▼────┐ ┌───▼────┐
│Subagent│ │Subagent│ │Subagent│ │Subagent│
│   1    │ │   2    │ │   3    │ │  ...N  │
└────────┘ └────────┘ └────────┘ └────────┘
 (Frozen)   (Frozen)   (Frozen)   (Frozen)
```
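To make the diagram concrete, here is a hypothetical sketch of the fan-out pattern: a trainable coordinator dispatches subtasks to frozen subagents concurrently and waits for the slowest one. `run_subtask` and `orchestrate` are illustrative stand-ins, not part of the open-parl API.

```python
# Hypothetical sketch of the orchestrator-subagent pattern above.
# run_subtask and orchestrate are stand-ins, NOT part of the open-parl API.
from concurrent.futures import ThreadPoolExecutor

def run_subtask(subtask: str) -> str:
    # A frozen subagent would execute the subtask here (e.g., an LLM call).
    return f"result({subtask})"

def orchestrate(task: str, subtasks: list[str]) -> list[str]:
    # `subtasks` would come from the orchestrator's decomposition of `task`.
    # Dispatch them concurrently; the stage ends when the slowest finishes.
    with ThreadPoolExecutor(max_workers=len(subtasks)) as pool:
        return list(pool.map(run_subtask, subtasks))

print(orchestrate("summarize repo", ["read docs", "scan code", "draft summary"]))
```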
PARL implements the three-term reward from the Kimi K2.5 technical report (a worked example follows the term definitions below):

```
r_PARL(x, y) = λ1 · r_parallel + λ2 · r_finish + r_perf(x, y)
```
Where:
- r_parallel (instantiation reward): Incentivizes subagent instantiation; mitigates serial collapse.
- r_finish (sub-agent finish rate): Rewards completed subtasks; prevents spurious parallelism (spawning many subagents without meaningful decomposition).
- r_perf(x,y) (task-level outcome): Evaluates overall success and quality of solution y for task x.
- λ1 and λ2: Annealed to zero over training so the final policy optimizes r_perf.
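Putting the terms together, here is a worked single-episode example. The ratio forms chosen for r_parallel and r_finish are assumptions consistent with the descriptions above, not necessarily the exact formulas `PARLReward` implements.

```python
# Worked example of the three-term reward for a single episode. The ratio
# forms of r_parallel and r_finish are illustrative assumptions, not
# necessarily PARLReward's exact formulas.
num_subagents, max_subagents = 25, 100
completed, assigned = 20, 25
success = 1.0

r_parallel = num_subagents / max_subagents  # instantiation reward: 0.25
r_finish = completed / assigned             # sub-agent finish rate: 0.8
r_perf = success                            # task-level outcome: 1.0

lambda1, lambda2 = 0.05, 0.05               # mid-training, partially annealed
r_parl = lambda1 * r_parallel + lambda2 * r_finish + r_perf
print(r_parl)  # 0.0125 + 0.04 + 1.0 = 1.0525
```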
Per the paper, total critical steps are defined as:
```
CriticalSteps = Σ_t ( S_main^(t) + max_i S_sub,i^(t) )
```
- S_main^(t): Steps taken by the main agent in stage t (typically 1).
- S_sub,i^(t): Steps taken by the i-th subagent in that parallel group.
- The duration of stage t is governed by the longest-running subagent in that cohort.
This metric captures the true execution time (critical path) and incentivizes effective parallelization.
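A small worked example of how critical steps accumulate across stages:

```python
# Worked example of the Critical Steps metric for one episode with three
# parallel stages; only the slowest subagent in each stage contributes.
main_steps = [1, 1, 1]      # S_main^(t): one orchestrator step per stage
sub_steps = [
    [4, 2, 7],              # stage 1: slowest subagent takes 7 steps
    [3, 3],                 # stage 2: slowest takes 3
    [5],                    # stage 3: a single subagent, 5 steps
]

critical_steps = sum(m + max(s) for m, s in zip(main_steps, sub_steps))
print(critical_steps)  # (1 + 7) + (1 + 3) + (1 + 5) = 18
```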
```bash
pip3 install -U open-parl
```

```python
import torch
from parl import PARLReward, CriticalStepsMetric

# Initialize the reward function
reward_fn = PARLReward(
    lambda_init=0.1,
    lambda_final=0.0,
    total_training_steps=10000,
    device='cuda' if torch.cuda.is_available() else 'cpu'
)
# Prepare episode data (a batch of 3 episodes)
num_subagents = torch.tensor([25, 30, 40])  # Subagents spawned per episode
completed_subtasks = torch.tensor([20, 28, 35]) # Completed subtasks
assigned_subtasks = torch.tensor([25, 30, 40]) # Assigned subtasks
trajectory_features = torch.randn(3, 64)  # Placeholder trajectory embeddings
success = torch.tensor([1.0, 1.0, 0.0])   # Task-level success per episode
training_step = 5000                      # Current step, used to anneal λ1/λ2
# Compute rewards (completed_subtasks/assigned_subtasks optional; default r_finish=1)
rewards = reward_fn.compute_full_reward(
    num_subagents=num_subagents,
    trajectory_features=trajectory_features,
    success=success,
    training_step=training_step,
    max_subagents=100,
    completed_subtasks=completed_subtasks,
    assigned_subtasks=assigned_subtasks,
)
print(f"Total Reward: {rewards['total_reward']}")
print(f"λ1 (r_parallel): {rewards['lambda1']:.4f}, λ2 (r_finish): {rewards['lambda2']:.4f}")
print(f"Instantiation: {rewards['instantiation_component']}")
print(f"Finish: {rewards['finish_component']}, Task: {rewards['task_component']}")
# Evaluate using Critical Steps metric (S_main typically 1 per stage)
critical_steps_metric = CriticalStepsMetric()
main_steps = torch.ones(3, 5) # Main agent steps per stage (typically 1)
sub_steps = torch.rand(3, 5, 10)  # Subagent steps, shape (batch, stages, subagents)
critical_steps = critical_steps_metric(main_steps, sub_steps)
print(f"Critical Steps: {critical_steps}")| Example | File | Description | Requirements |
|---|---|---|---|
| Basic Usage | basic_usage.py |
Core PARL reward function and Critical Steps metric without external models | open-parl only |
| Quick Start (HF) | quickstart_hf.py |
Minimal integration with Hugging Face GLM-4.7-Flash model | transformers, torch, accelerate |
| Full HF Integration | huggingface_integration.py |
Complete orchestrator class with training loop, batching, and reward analysis | transformers, torch, accelerate |
For detailed API documentation, see docs/API.md.
Run the example training simulation:
```bash
python -m parl.main
```

This will demonstrate reward evolution across training stages and critical steps computation.
Run the comprehensive test suite:
```bash
# Using pytest
pytest tests/ -v

# With coverage report
pytest tests/ --cov=parl --cov-report=html

# Run a specific test file
pytest tests/test_parl.py -v
```

This is an unofficial open-source implementation based on the technical report:
"Kimi K2.5: Visual Agentic Intelligence" Kimi AI Research Team, 2026
For technical details and experimental results from the original research, see: Kimi K2.5 Technical Report
Note: This implementation is not affiliated with, endorsed by, or officially connected to Kimi AI. It is an independent open-source project developed by the community based on publicly available information.
If you use PARL in your research, please cite:
```bibtex
@article{kimik2.5_2026,
  title={Kimi K2.5: Visual Agentic Intelligence},
  author={Kimi AI Research Team},
  journal={Technical Report},
  year={2026},
  url={https://www.kimi.com/blog/kimi-k2-5}
}
```

We welcome contributions! Please follow these steps:
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
Please ensure your code passes all tests and follows PEP 8 style guidelines.
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
- This is an unofficial open-source implementation inspired by the Kimi K2.5 technical report
- Built on PyTorch's efficient tensor operations
- Thanks to the open-source ML community
- This project is not affiliated with or endorsed by Kimi AI
- Repository: github.com/The-Swarm-Corporation/PARL
- Issues: github.com/The-Swarm-Corporation/PARL/issues
Made with ⚡ by The Swarm Corporation