
Feature: Add platform integration. #1896

Merged
d42me merged 25 commits into main from feature/add-platform-integration
Mar 20, 2026

@d42me (Collaborator) commented Feb 26, 2026

Closes 3032

Example link for e2e training run: https://app.primeintellect.ai/dashboard/training/u5uvj6rxwiy32o533q783k5n


Note

Medium Risk
Adds automatic platform run registration/finalization and new config fields, which changes how monitoring interacts with external APIs and environment variables. Failures are handled by disabling uploads, but misconfiguration or API changes could silently drop monitoring data.

Overview
Enables PRIME-RL to auto-register training runs on the Prime Intellect platform when orchestrator.prime_monitor is enabled, falling back to prime login credentials when PRIME_API_KEY is not set.
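The credential fallback described above can be sketched as follows. This is a minimal illustration, not prime-rl's code: `resolve_api_key` and the `load_login_key` hook are hypothetical names, and this PR does not say where `prime login` stores its credentials, so that lookup is injected as a callable.

```python
from typing import Callable, Mapping, Optional


def resolve_api_key(
    env: Mapping[str, str],
    load_login_key: Callable[[], Optional[str]],
) -> Optional[str]:
    """Prefer PRIME_API_KEY; otherwise fall back to stored `prime login` credentials.

    `load_login_key` is a hypothetical hook standing in for however the prime
    CLI persists its credentials; it is not a real prime-rl API.
    """
    key = env.get("PRIME_API_KEY")
    if key:  # an empty string is treated the same as "not set"
        return key
    return load_login_key()
```

For example, `resolve_api_key(os.environ, load_login_key=lambda: None)` returns `None` when neither source has a key, which a caller could treat as the signal to disable uploads.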

Adds new PrimeMonitorConfig options (run_name, team_id, frontend_url), defaults the monitoring API base path to /api/v1/rft, and auto-derives run_name from the shared W&B run name when unset.
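A rough sketch of the new options and the run_name derivation, using a plain dataclass in place of the real pydantic model. `PrimeMonitorConfigSketch`, `resolve_run_name`, the `str` type for `team_id`, the host in the default base URL, and the `frontend_url` default (inferred from the example dashboard link above) are all illustrative assumptions; only the field names and the /api/v1/rft path come from the PR.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Assumed host; the PR only states that the base path defaults to /api/v1/rft.
DEFAULT_BASE_URL = "https://api.primeintellect.ai/api/v1/rft"


@dataclass(frozen=True)
class PrimeMonitorConfigSketch:
    """Illustrative stand-in for PrimeMonitorConfig (hypothetical, not the real model)."""

    base_url: str = DEFAULT_BASE_URL
    run_name: Optional[str] = None
    team_id: Optional[str] = None  # assumed to be a string identifier
    frontend_url: str = "https://app.primeintellect.ai"  # assumed default


def resolve_run_name(
    cfg: PrimeMonitorConfigSketch, wandb_run_name: Optional[str]
) -> PrimeMonitorConfigSketch:
    """Fill run_name from the shared W&B run name only when the user left it unset."""
    if cfg.run_name is None and wandb_run_name:
        return replace(cfg, run_name=wandb_run_name)
    return cfg
```

Returning a new frozen instance instead of mutating keeps an explicitly configured `run_name` from ever being overwritten by the W&B name.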

Updates PrimeMonitor to create a RUN_ID via an external-runs registration call, print a dashboard link, and finalize the run status (completed/failed) on success or on close(), with safeguards for forked processes.
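The lifecycle in the paragraph above (register, print a dashboard link, finalize once, guard against forked processes) might look roughly like this. `RunLifecycleSketch` and its `register`/`finalize` callables are hypothetical stand-ins for the external-runs HTTP calls. The dashboard URL pattern is inferred from the example link above, and comparing PIDs is one common way to detect a fork, not necessarily the one PrimeMonitor uses. We also assume that reaching `close()` without a prior success marks the run as failed.

```python
import os
from typing import Callable


class RunLifecycleSketch:
    """Register once, finalize at most once, and only from the process that registered.

    `register` and `finalize` are hypothetical hooks for the external-runs API calls.
    """

    def __init__(
        self,
        register: Callable[[], str],
        finalize: Callable[[str, str], None],
        frontend_url: str,
    ) -> None:
        self._finalize = finalize
        self._owner_pid = os.getpid()  # the process that owns the platform run
        self._finalized = False
        self.run_id = register()
        print(f"Dashboard: {frontend_url}/dashboard/training/{self.run_id}")

    def finish(self, success: bool) -> None:
        # A forked child inherits this object; skipping here stops the child's
        # exit from marking the parent's run as completed or failed.
        if self._finalized or os.getpid() != self._owner_pid:
            return
        self._finalized = True
        self._finalize(self.run_id, "completed" if success else "failed")

    def close(self) -> None:
        # Assumption: closing without an earlier successful finish() means failure.
        self.finish(success=False)
```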

Written by Cursor Bugbot for commit f3b0879. This will update automatically on new commits.

@d42me d42me requested review from mikasenghaas and samsja February 26, 2026 06:20
@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 3 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for all 3 issues found in the latest run.

  • ✅ Fixed: Missing CHANGELOG entry for new config fields
    • Added CHANGELOG entry documenting the new PlatformConfig class and platform field on RLConfig with all field names and default values.
  • ✅ Fixed: Unhandled HTTP exceptions in cleanup finalize calls
    • Wrapped httpx.put call in finalize_run with try/except httpx.HTTPError to catch transport-level exceptions and log a warning instead of raising.
  • ✅ Fixed: Platform run never finalized if pre-try code fails
    • Moved write_subconfigs call inside the try block so failures after register_run are covered by the except handlers that call _finalize_platform_run.
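The third fix applies a general rule: once a run is registered, every later step belongs inside the block whose handlers finalize it. A minimal sketch, with hypothetical `register`, `setup`, `train`, and `finalize` callables standing in for register_run, write_subconfigs, the process launch, and _finalize_platform_run:

```python
from typing import Callable


def run_with_finalization(
    register: Callable[[], str],
    setup: Callable[[], None],
    train: Callable[[], None],
    finalize: Callable[[str, bool], None],
) -> None:
    """Finalize a registered run exactly once, whichever step fails.

    All helpers here are hypothetical stand-ins, not prime-rl functions.
    """
    run_id = register()
    try:
        setup()  # e.g. writing subconfigs; a failure here is now covered too
        train()
    except BaseException:  # includes KeyboardInterrupt, so Ctrl-C also marks the run failed
        finalize(run_id, False)
        raise
    finalize(run_id, True)
```

If `setup()` ran before the `try`, an exception there would leave the run registered on the platform but never finalized, which is exactly the bug Bugbot reported.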

To push these changes, comment:

@cursor push 2567b61b44
Preview (2567b61b44)
diff --git a/CHANGELOG.md b/CHANGELOG.md
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,7 @@
 
 Documenting changes which affect configuration usage patterns (added/moved/removed/renamed fields, notable logic changes).
 
+- **`platform`**: Added `PlatformConfig` for Prime Intellect platform integration. When set, creates a run on the platform dashboard and streams metrics/samples via PrimeMonitor. Fields: `base_url` (default: `https://api.primeintellect.ai`), `run_name`, `wandb_project`, `wandb_entity`, `team_id` (2026-02-26)
 - **`model.lora`**: Moved from `model.experimental.lora` to `model.lora` (no longer experimental) (#1440, 2025-12-16)
 - Auto-set `api_server_count=1` on inference when LoRA is enabled, because vLLM doesn't support hotloading for multiple API servers (#1422, 2025-12-17)
 - **`inference.model.rope_scaling`**: Added RoPE scaling configuration passthrough to vLLM (#1447 2025-12-17)

diff --git a/src/prime_rl/entrypoints/rl.py b/src/prime_rl/entrypoints/rl.py
--- a/src/prime_rl/entrypoints/rl.py
+++ b/src/prime_rl/entrypoints/rl.py
@@ -158,10 +158,6 @@
         else:
             config.orchestrator.prime_monitor.base_url = f"{config.platform.base_url}/api/internal/rft"
 
-    # Write all resolved subconfigs to disk
-    config_dir = Path(".pydantic_config") / uuid.uuid4().hex
-    write_subconfigs(config, config_dir)
-
     # Start processes
     processes: list[Popen] = []
     monitor_threads: list[Thread] = []
@@ -169,6 +165,10 @@
     stop_events: dict[str, Event] = {}
 
     try:
+        # Write all resolved subconfigs to disk
+        config_dir = Path(".pydantic_config") / uuid.uuid4().hex
+        write_subconfigs(config, config_dir)
+
         # Optionally, start inference process
         if config.inference:
             inference_cmd = ["uv", "run", "inference", "@", (config_dir / "inference.toml").as_posix()]

diff --git a/src/prime_rl/utils/platform.py b/src/prime_rl/utils/platform.py
--- a/src/prime_rl/utils/platform.py
+++ b/src/prime_rl/utils/platform.py
@@ -120,12 +120,16 @@
     status_label = "completed" if success else "failed"
     logger.info(f"Finalizing platform run {run_id} as {status_label}")
 
-    response = httpx.put(
-        f"{config.base_url}/api/v1/rft/external-runs/{run_id}/status",
-        headers={"Authorization": f"Bearer {api_key}"},
-        json=payload,
-        timeout=30,
-    )
+    try:
+        response = httpx.put(
+            f"{config.base_url}/api/v1/rft/external-runs/{run_id}/status",
+            headers={"Authorization": f"Bearer {api_key}"},
+            json=payload,
+            timeout=30,
+        )
+    except httpx.HTTPError as e:
+        logger.warning(f"Failed to finalize platform run {run_id}: {e}")
+        return
 
     if response.status_code != 200:
         logger.warning(f"Failed to finalize platform run {run_id} (HTTP {response.status_code}): {response.text}")
This Bugbot Autofix run was free. To enable autofix for future PRs, go to the Cursor dashboard.

…integration

Made-with: Cursor

# Conflicts:
#	CHANGELOG.md
@samsja (Member) left a comment

We should merge the two concepts of prime monitor that currently exist into one config and one runtime class.

  • Config: Merge PlatformConfig into PrimeMonitorConfig by adding the optional run_name and team_id fields directly on it. Remove PlatformConfig, the prime_platform field on RLConfig/OrchestratorConfig, and the auto_setup_prime_platform validator. Users just configure prime_monitor with the lifecycle fields when they want registration.

  • Runtime: Move the register_run and finalize_run logic into PrimeMonitor itself: register in init (when no RUN_ID is already set), finalize in save_final_summary/close. Delete utils/platform.py and remove the manual registration/finalization calls from the orchestrator loop.
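The "register in init when no RUN_ID is already set" rule from the Runtime bullet could be sketched like this. `get_or_register_run_id` is a hypothetical helper, and the returned ownership flag reflects our own assumption that a run registered elsewhere should also be finalized elsewhere.

```python
from typing import Callable, Mapping, Tuple


def get_or_register_run_id(
    env: Mapping[str, str], register: Callable[[], str]
) -> Tuple[str, bool]:
    """Reuse an existing RUN_ID or register a new run.

    Returns (run_id, owns_run). `register` is a hypothetical hook for the
    external-runs registration call; it is only invoked when RUN_ID is unset.
    """
    existing = env.get("RUN_ID")
    if existing:
        return existing, False  # someone else created the run (assumed: they finalize it)
    return register(), True
```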

@d42me d42me requested a review from JannikSt March 18, 2026 23:18
@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

@d42me d42me merged commit d5c7240 into main Mar 20, 2026
9 checks passed
@d42me d42me deleted the feature/add-platform-integration branch March 20, 2026 01:34

3 participants