Skip to content

Fix: Deleted experiments activate#24

Merged
jerry-ng2 merged 1 commit intoredhat-et:mainfrom
EoghanOConnor:activate_deleted_experiment
Apr 2, 2026
Merged

Fix: Deleted experiments activate#24
jerry-ng2 merged 1 commit intoredhat-et:mainfrom
EoghanOConnor:activate_deleted_experiment

Conversation

@EoghanOConnor
Copy link
Copy Markdown
Collaborator

@EoghanOConnor EoghanOConnor commented Mar 30, 2026

Fix for issue: #22

Restore if status == deleted.
If not mlflow autolog claude tracking_url experiment_name (This line creates experiment if it doesnt exist)
Corrected README as command already given.

Summary by CodeRabbit

  • Bug Fixes

    • When MLflow Claude tracing is enabled, the system now detects and restores a soft‑deleted MLflow experiment before starting tracing, preventing missing logs and ensuring tracing continues automatically.
  • Documentation

    • Updated tracing docs to replace the previous autologging instruction with a concise "start Claude" step and note that disabling the env var turns off tracing.

@EoghanOConnor EoghanOConnor requested a review from anyuzoey March 30, 2026 08:06
@EoghanOConnor EoghanOConnor self-assigned this Mar 30, 2026
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Walkthrough

Added an inline Python step to the SessionStart hook that checks an MLflow experiment by name and restores it if its lifecycle_stage is deleted; the hook then runs mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME (chained with &&).

Changes

Cohort / File(s) Summary
MLflow Experiment Restoration Logic
​.claude/settings.example.json, skills/root-cause-analysis/SKILL.md
Replaced direct mlflow autolog claude invocation with a compound shell command that first runs an inline python3 -c snippet to create MlflowClient, call get_experiment_by_name(...), and client.restore_experiment(...) only if the experiment exists and lifecycle_stage == 'deleted', then (via &&) runs mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME.
Documentation
docs/mlflow-tracing.md
Replaced the previous autologging instruction with a "Start claude" step and adjusted guidance to start Claude directly (documentation-only change).

Sequence Diagram(s)

sequenceDiagram
    participant Hook as Hook (SessionStart)
    participant InlinePy as Inline Python (MlflowClient)
    participant MLflow as MLflow Tracking Server
    participant Autolog as mlflow autolog claude

    Hook->>InlinePy: run inline Python snippet (connect to http://127.0.0.1:$MLFLOW_PORT)
    InlinePy->>MLflow: get_experiment_by_name($MLFLOW_EXPERIMENT_NAME)
    alt experiment exists and lifecycle_stage == "deleted"
        InlinePy->>MLflow: restore_experiment(experiment_id)
        MLflow-->>InlinePy: restoration result
    else experiment missing or not deleted
        InlinePy-->>Hook: no-op (no restore)
    end
    Note right of Hook: shell uses && so next step runs only if InlinePy succeeds
    Hook->>Autolog: run `mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME`
    Autolog->>MLflow: send logs/telemetry to experiment
    MLflow-->>Autolog: ack
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix: Deleted experiments activate' directly relates to the main change: restoring deleted MLflow experiments so they can be used in the tracing setup.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.claude/settings.example.json:
- Line 38: The MlflowClient() instantiation in the inline Python restore command
may point at a different MLflow server than the subsequent "mlflow autolog
claude" call; update the restore logic so MlflowClient is constructed with the
same tracking URI used by autolog (e.g., read from MLFLOW_TRACKING_URI or build
"http://127.0.0.1:$MLFLOW_PORT") and pass that as the tracking_uri argument to
MlflowClient(tracking_uri=...), ensuring the restore
(client.get_experiment_by_name / client.restore_experiment) operates on the same
server as mlflow autolog claude.

In `@skills/root-cause-analysis/SKILL.md`:
- Line 84: The MlflowClient() call in the shell command should use the same
tracking URI as the subsequent "mlflow autolog claude -u
http://127.0.0.1:$MLFLOW_PORT" invocation; update the MlflowClient instantiation
to pass tracking_uri set to the same URI (e.g.,
MlflowClient(tracking_uri="http://127.0.0.1:$MLFLOW_PORT") or the equivalent
MLFLOW_TRACKING_URI env var) so the client and autolog use the same tracking
server when checking/restoring the experiment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 944d4b69-c498-4a53-9006-7ecb2a603a29

📥 Commits

Reviewing files that changed from the base of the PR and between c5a9de8 and 2fad5b4.

📒 Files selected for processing (2)
  • .claude/settings.example.json
  • skills/root-cause-analysis/SKILL.md

@EoghanOConnor EoghanOConnor force-pushed the activate_deleted_experiment branch from 3b1c828 to 349e77a Compare March 30, 2026 08:42
@EoghanOConnor EoghanOConnor force-pushed the activate_deleted_experiment branch from 349e77a to 5f6a974 Compare March 30, 2026 10:31
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@skills/root-cause-analysis/SKILL.md`:
- Line 84: The inline python call that references $MLFLOW_EXPERIMENT_NAME inside
a single-quoted Python string can break if the name contains quotes; change the
command so the Python snippet reads environment variables itself (e.g., use
os.environ.get('MLFLOW_EXPERIMENT_NAME') and os.environ.get('MLFLOW_PORT'))
instead of interpolating shell vars into the Python code, and keep the
surrounding shell conditional (MLFLOW_CLAUDE_TRACING_ENABLED) and the subsequent
mlflow autolog claude call unchanged; update the python3 -c invocation to import
os, read MLFLOW_EXPERIMENT_NAME/MLFLOW_PORT from os.environ, and use those
values when creating MlflowClient and restoring the experiment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: ff7d81d7-bc0c-4277-9664-bfd3531f7324

📥 Commits

Reviewing files that changed from the base of the PR and between 349e77a and 5f6a974.

📒 Files selected for processing (3)
  • .claude/settings.example.json
  • docs/mlflow-tracing.md
  • skills/root-cause-analysis/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • .claude/settings.example.json

{
"type": "command",
"command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi"
"command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then python3 -c \"import mlflow; client=mlflow.tracking.MlflowClient(tracking_uri='http://127.0.0.1:$MLFLOW_PORT'); exp=client.get_experiment_by_name('$MLFLOW_EXPERIMENT_NAME'); client.restore_experiment(exp.experiment_id) if exp and exp.lifecycle_stage == 'deleted' else None\" && mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

export MLFLOW_PORT=5000
export MLFLOW_EXPERIMENT_NAME="team's-exp"

echo "Expect failure with direct interpolation into single-quoted Python string:"
if python3 -c "exp='$MLFLOW_EXPERIMENT_NAME'; print(exp)" >/dev/null 2>&1; then
  echo "Unexpected: interpolation variant succeeded"
  exit 1
else
  echo "Confirmed: interpolation variant fails when value contains single quote"
fi

echo "Expect success when reading env var inside Python:"
python3 -c "import os; exp=os.environ['MLFLOW_EXPERIMENT_NAME']; print(exp)" >/dev/null
echo "Confirmed: env-based variant succeeds"

Repository: redhat-et/rhdp-rca-plugin

Length of output: 304


🏁 Script executed:

cd /tmp && git -C redhat-et/rhdp-rca-plugin ls-files | grep -i skill

Repository: redhat-et/rhdp-rca-plugin

Length of output: 147


🏁 Script executed:

head -90 /tmp/redhat-et/rhdp-rca-plugin/skills/root-cause-analysis/SKILL.md | tail -20

Repository: redhat-et/rhdp-rca-plugin

Length of output: 194


🏁 Script executed:

git ls-files | grep -i skill

Repository: redhat-et/rhdp-rca-plugin

Length of output: 2360


🏁 Script executed:

fd SKILL.md

Repository: redhat-et/rhdp-rca-plugin

Length of output: 423


🏁 Script executed:

wc -l skills/root-cause-analysis/SKILL.md

Repository: redhat-et/rhdp-rca-plugin

Length of output: 108


🏁 Script executed:

sed -n '75,95p' skills/root-cause-analysis/SKILL.md

Repository: redhat-et/rhdp-rca-plugin

Length of output: 1442


🏁 Script executed:

sed -n '84p' skills/root-cause-analysis/SKILL.md

Repository: redhat-et/rhdp-rca-plugin

Length of output: 515


Avoid shell interpolation inside the inline Python snippet.

$MLFLOW_EXPERIMENT_NAME is injected directly into a Python single-quoted string; names containing ' will break python3 -c and skip tracing setup. Read env vars inside Python using os.environ[] instead.

Proposed fix
-            "command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then python3 -c \"import mlflow; client=mlflow.tracking.MlflowClient(tracking_uri='http://127.0.0.1:$MLFLOW_PORT'); exp=client.get_experiment_by_name('$MLFLOW_EXPERIMENT_NAME'); client.restore_experiment(exp.experiment_id) if exp and exp.lifecycle_stage == 'deleted' else None\" && mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi"
+            "command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then python3 -c \"import os, mlflow; uri=f'http://127.0.0.1:{os.environ[\\\"MLFLOW_PORT\\\"]}'; client=mlflow.tracking.MlflowClient(tracking_uri=uri); exp=client.get_experiment_by_name(os.environ['MLFLOW_EXPERIMENT_NAME']); client.restore_experiment(exp.experiment_id) if exp and exp.lifecycle_stage == 'deleted' else None\" && mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@skills/root-cause-analysis/SKILL.md` at line 84, The inline python call that
references $MLFLOW_EXPERIMENT_NAME inside a single-quoted Python string can
break if the name contains quotes; change the command so the Python snippet
reads environment variables itself (e.g., use
os.environ.get('MLFLOW_EXPERIMENT_NAME') and os.environ.get('MLFLOW_PORT'))
instead of interpolating shell vars into the Python code, and keep the
surrounding shell conditional (MLFLOW_CLAUDE_TRACING_ENABLED) and the subsequent
mlflow autolog claude call unchanged; update the python3 -c invocation to import
os, read MLFLOW_EXPERIMENT_NAME/MLFLOW_PORT from os.environ, and use those
values when creating MlflowClient and restoring the experiment.

@jerry-ng2
Copy link
Copy Markdown
Collaborator

I wasn't able to reproduce the error but the experiment recovery process works fine. Maybe @anyuzoey @PalmPalm7 could try it on their end.

@EoghanOConnor
Copy link
Copy Markdown
Collaborator Author

I wasn't able to reproduce the error but the experiment recovery process works fine. Maybe @anyuzoey @PalmPalm7 could try it on their end.

To reproduce.
First create an experiment.
Then delete it via the UI
Start Claude with the deleted experiment (set it in the settings)
Say "Hi" for example
And there should be no sign of the experiment on the UI.

Then to fix it use the fix found here

Copy link
Copy Markdown
Collaborator

@jerry-ng2 jerry-ng2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug was reproduced and the fix worked! LGTM

@jerry-ng2 jerry-ng2 merged commit cb4decf into redhat-et:main Apr 2, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants