Fix: Deleted experiments activate by EoghanOConnor · Pull Request #24 · redhat-et/rhdp-rca-plugin

EoghanOConnor · 2026-03-30T08:06:19Z

Fix for issue: #22

Restore if status == deleted.
If not mlflow autolog claude tracking_url experiment_name (This line creates experiment if it doesnt exist)
Corrected README as command already given.

Summary by CodeRabbit

Bug Fixes
- When MLflow Claude tracing is enabled, the system now detects and restores a soft‑deleted MLflow experiment before starting tracing, preventing missing logs and ensuring tracing continues automatically.
Documentation
- Updated tracing docs to replace the previous autologging instruction with a concise "start Claude" step and note that disabling the env var turns off tracing.

coderabbitai · 2026-03-30T08:06:43Z

📝 Walkthrough

Walkthrough

Added an inline Python step to the SessionStart hook that checks an MLflow experiment by name and restores it if its lifecycle_stage is deleted; the hook then runs mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME (chained with &&).

Changes

Cohort / File(s)	Summary
MLflow Experiment Restoration Logic `.claude/settings.example.json`, `skills/root-cause-analysis/SKILL.md`	Replaced direct `mlflow autolog claude` invocation with a compound shell command that first runs an inline `python3 -c` snippet to create `MlflowClient`, call `get_experiment_by_name(...)`, and `client.restore_experiment(...)` only if the experiment exists and `lifecycle_stage == 'deleted'`, then (via `&&`) runs `mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME`.
Documentation `docs/mlflow-tracing.md`	Replaced the previous autologging instruction with a "Start claude" step and adjusted guidance to start Claude directly (documentation-only change).

Sequence Diagram(s)

sequenceDiagram
    participant Hook as Hook (SessionStart)
    participant InlinePy as Inline Python (MlflowClient)
    participant MLflow as MLflow Tracking Server
    participant Autolog as mlflow autolog claude

    Hook->>InlinePy: run inline Python snippet (connect to http://127.0.0.1:$MLFLOW_PORT)
    InlinePy->>MLflow: get_experiment_by_name($MLFLOW_EXPERIMENT_NAME)
    alt experiment exists and lifecycle_stage == "deleted"
        InlinePy->>MLflow: restore_experiment(experiment_id)
        MLflow-->>InlinePy: restoration result
    else experiment missing or not deleted
        InlinePy-->>Hook: no-op (no restore)
    end
    Note right of Hook: shell uses && so next step runs only if InlinePy succeeds
    Hook->>Autolog: run `mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME`
    Autolog->>MLflow: send logs/telemetry to experiment
    MLflow-->>Autolog: ack

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'Fix: Deleted experiments activate' directly relates to the main change: restoring deleted MLflow experiments so they can be used in the tracing setup.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/settings.example.json:
- Line 38: The MlflowClient() instantiation in the inline Python restore command
may point at a different MLflow server than the subsequent "mlflow autolog
claude" call; update the restore logic so MlflowClient is constructed with the
same tracking URI used by autolog (e.g., read from MLFLOW_TRACKING_URI or build
"http://127.0.0.1:$MLFLOW_PORT") and pass that as the tracking_uri argument to
MlflowClient(tracking_uri=...), ensuring the restore
(client.get_experiment_by_name / client.restore_experiment) operates on the same
server as mlflow autolog claude.

In `@skills/root-cause-analysis/SKILL.md`:
- Line 84: The MlflowClient() call in the shell command should use the same
tracking URI as the subsequent "mlflow autolog claude -u
http://127.0.0.1:$MLFLOW_PORT" invocation; update the MlflowClient instantiation
to pass tracking_uri set to the same URI (e.g.,
MlflowClient(tracking_uri="http://127.0.0.1:$MLFLOW_PORT") or the equivalent
MLFLOW_TRACKING_URI env var) so the client and autolog use the same tracking
server when checking/restoring the experiment.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 944d4b69-c498-4a53-9006-7ecb2a603a29

📥 Commits

Reviewing files that changed from the base of the PR and between c5a9de8 and 2fad5b4.

📒 Files selected for processing (2)

.claude/settings.example.json
skills/root-cause-analysis/SKILL.md

.claude/settings.example.json

skills/root-cause-analysis/SKILL.md

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/root-cause-analysis/SKILL.md`:
- Line 84: The inline python call that references $MLFLOW_EXPERIMENT_NAME inside
a single-quoted Python string can break if the name contains quotes; change the
command so the Python snippet reads environment variables itself (e.g., use
os.environ.get('MLFLOW_EXPERIMENT_NAME') and os.environ.get('MLFLOW_PORT'))
instead of interpolating shell vars into the Python code, and keep the
surrounding shell conditional (MLFLOW_CLAUDE_TRACING_ENABLED) and the subsequent
mlflow autolog claude call unchanged; update the python3 -c invocation to import
os, read MLFLOW_EXPERIMENT_NAME/MLFLOW_PORT from os.environ, and use those
values when creating MlflowClient and restoring the experiment.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ff7d81d7-bc0c-4277-9664-bfd3531f7324

📥 Commits

Reviewing files that changed from the base of the PR and between 349e77a and 5f6a974.

📒 Files selected for processing (3)

.claude/settings.example.json
docs/mlflow-tracing.md
skills/root-cause-analysis/SKILL.md

🚧 Files skipped from review as they are similar to previous changes (1)

.claude/settings.example.json

coderabbitai · 2026-03-30T10:36:05Z

skills/root-cause-analysis/SKILL.md

          {
            "type": "command",
-            "command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi"
+            "command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then python3 -c \"import mlflow; client=mlflow.tracking.MlflowClient(tracking_uri='http://127.0.0.1:$MLFLOW_PORT'); exp=client.get_experiment_by_name('$MLFLOW_EXPERIMENT_NAME'); client.restore_experiment(exp.experiment_id) if exp and exp.lifecycle_stage == 'deleted' else None\" && mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi"


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail export MLFLOW_PORT=5000 export MLFLOW_EXPERIMENT_NAME="team's-exp" echo "Expect failure with direct interpolation into single-quoted Python string:" if python3 -c "exp='$MLFLOW_EXPERIMENT_NAME'; print(exp)" >/dev/null 2>&1; then echo "Unexpected: interpolation variant succeeded" exit 1 else echo "Confirmed: interpolation variant fails when value contains single quote" fi echo "Expect success when reading env var inside Python:" python3 -c "import os; exp=os.environ['MLFLOW_EXPERIMENT_NAME']; print(exp)" >/dev/null echo "Confirmed: env-based variant succeeds"

Repository: redhat-et/rhdp-rca-plugin

Length of output: 304

🏁 Script executed:

cd /tmp && git -C redhat-et/rhdp-rca-plugin ls-files | grep -i skill

Repository: redhat-et/rhdp-rca-plugin

Length of output: 147

🏁 Script executed:

head -90 /tmp/redhat-et/rhdp-rca-plugin/skills/root-cause-analysis/SKILL.md | tail -20

Repository: redhat-et/rhdp-rca-plugin

Length of output: 194

🏁 Script executed:

git ls-files | grep -i skill

Repository: redhat-et/rhdp-rca-plugin

Length of output: 2360

🏁 Script executed:

fd SKILL.md

Repository: redhat-et/rhdp-rca-plugin

Length of output: 423

🏁 Script executed:

wc -l skills/root-cause-analysis/SKILL.md

Repository: redhat-et/rhdp-rca-plugin

Length of output: 108

🏁 Script executed:

sed -n '75,95p' skills/root-cause-analysis/SKILL.md

Repository: redhat-et/rhdp-rca-plugin

Length of output: 1442

🏁 Script executed:

sed -n '84p' skills/root-cause-analysis/SKILL.md

Repository: redhat-et/rhdp-rca-plugin

Length of output: 515

Avoid shell interpolation inside the inline Python snippet.

$MLFLOW_EXPERIMENT_NAME is injected directly into a Python single-quoted string; names containing ' will break python3 -c and skip tracing setup. Read env vars inside Python using os.environ[] instead.

Proposed fix

- "command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then python3 -c \"import mlflow; client=mlflow.tracking.MlflowClient(tracking_uri='http://127.0.0.1:$MLFLOW_PORT'); exp=client.get_experiment_by_name('$MLFLOW_EXPERIMENT_NAME'); client.restore_experiment(exp.experiment_id) if exp and exp.lifecycle_stage == 'deleted' else None\" && mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi" + "command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then python3 -c \"import os, mlflow; uri=f'http://127.0.0.1:{os.environ[\\\"MLFLOW_PORT\\\"]}'; client=mlflow.tracking.MlflowClient(tracking_uri=uri); exp=client.get_experiment_by_name(os.environ['MLFLOW_EXPERIMENT_NAME']); client.restore_experiment(exp.experiment_id) if exp and exp.lifecycle_stage == 'deleted' else None\" && mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@skills/root-cause-analysis/SKILL.md` at line 84, The inline python call that references $MLFLOW_EXPERIMENT_NAME inside a single-quoted Python string can break if the name contains quotes; change the command so the Python snippet reads environment variables itself (e.g., use os.environ.get('MLFLOW_EXPERIMENT_NAME') and os.environ.get('MLFLOW_PORT')) instead of interpolating shell vars into the Python code, and keep the surrounding shell conditional (MLFLOW_CLAUDE_TRACING_ENABLED) and the subsequent mlflow autolog claude call unchanged; update the python3 -c invocation to import os, read MLFLOW_EXPERIMENT_NAME/MLFLOW_PORT from os.environ, and use those values when creating MlflowClient and restoring the experiment.

jerry-ng2 · 2026-03-30T19:00:41Z

I wasn't able to reproduce the error but the experiment recovery process works fine. Maybe @anyuzoey @PalmPalm7 could try it on their end.

EoghanOConnor · 2026-04-02T06:15:17Z

I wasn't able to reproduce the error but the experiment recovery process works fine. Maybe @anyuzoey @PalmPalm7 could try it on their end.

To reproduce.
First create an experiment.
Then delete it via the UI
Start Claude with the deleted experiment (set it in the settings)
Say "Hi" for example
And there should be no sign of the experiment on the UI.

Then to fix it use the fix found here

jerry-ng2

Bug was reproduced and the fix worked! LGTM

EoghanOConnor requested a review from anyuzoey March 30, 2026 08:06

EoghanOConnor self-assigned this Mar 30, 2026

coderabbitai bot reviewed Mar 30, 2026

View reviewed changes

.claude/settings.example.json Outdated Show resolved Hide resolved

skills/root-cause-analysis/SKILL.md Outdated Show resolved Hide resolved

EoghanOConnor force-pushed the activate_deleted_experiment branch from 3b1c828 to 349e77a Compare March 30, 2026 08:42

Fix: Deleted experiments activate

5f6a974

EoghanOConnor force-pushed the activate_deleted_experiment branch from 349e77a to 5f6a974 Compare March 30, 2026 10:31

coderabbitai bot reviewed Mar 30, 2026

View reviewed changes

EoghanOConnor requested review from PalmPalm7 and jerry-ng2 March 30, 2026 13:38

jerry-ng2 approved these changes Apr 2, 2026

View reviewed changes

jerry-ng2 merged commit cb4decf into redhat-et:main Apr 2, 2026
6 checks passed

jerry-ng2 mentioned this pull request Apr 2, 2026

Failed to set deleted experiment as active in Claude with MLFLOW #22

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix: Deleted experiments activate#24

Fix: Deleted experiments activate#24
jerry-ng2 merged 1 commit intoredhat-et:mainfrom
EoghanOConnor:activate_deleted_experiment

EoghanOConnor commented Mar 30, 2026 •

edited

Loading

Uh oh!

coderabbitai bot commented Mar 30, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Mar 30, 2026

Uh oh!

jerry-ng2 commented Mar 30, 2026

Uh oh!

EoghanOConnor commented Apr 2, 2026

Uh oh!

jerry-ng2 left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

EoghanOConnor commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Mar 30, 2026

Choose a reason for hiding this comment

Uh oh!

jerry-ng2 commented Mar 30, 2026

Uh oh!

EoghanOConnor commented Apr 2, 2026

Uh oh!

jerry-ng2 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

EoghanOConnor commented Mar 30, 2026 •

edited

Loading

coderabbitai bot commented Mar 30, 2026 •

edited

Loading