Fix: Deleted experiments activate#24
Conversation
📝 WalkthroughWalkthroughAdded an inline Python step to the SessionStart hook that checks an MLflow experiment by name and restores it if its lifecycle_stage is Changes
Sequence Diagram(s)sequenceDiagram
participant Hook as Hook (SessionStart)
participant InlinePy as Inline Python (MlflowClient)
participant MLflow as MLflow Tracking Server
participant Autolog as mlflow autolog claude
Hook->>InlinePy: run inline Python snippet (connect to http://127.0.0.1:$MLFLOW_PORT)
InlinePy->>MLflow: get_experiment_by_name($MLFLOW_EXPERIMENT_NAME)
alt experiment exists and lifecycle_stage == "deleted"
InlinePy->>MLflow: restore_experiment(experiment_id)
MLflow-->>InlinePy: restoration result
else experiment missing or not deleted
InlinePy-->>Hook: no-op (no restore)
end
Note right of Hook: shell uses && so next step runs only if InlinePy succeeds
Hook->>Autolog: run `mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME`
Autolog->>MLflow: send logs/telemetry to experiment
MLflow-->>Autolog: ack
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.claude/settings.example.json:
- Line 38: The MlflowClient() instantiation in the inline Python restore command
may point at a different MLflow server than the subsequent "mlflow autolog
claude" call; update the restore logic so MlflowClient is constructed with the
same tracking URI used by autolog (e.g., read from MLFLOW_TRACKING_URI or build
"http://127.0.0.1:$MLFLOW_PORT") and pass that as the tracking_uri argument to
MlflowClient(tracking_uri=...), ensuring the restore
(client.get_experiment_by_name / client.restore_experiment) operates on the same
server as mlflow autolog claude.
In `@skills/root-cause-analysis/SKILL.md`:
- Line 84: The MlflowClient() call in the shell command should use the same
tracking URI as the subsequent "mlflow autolog claude -u
http://127.0.0.1:$MLFLOW_PORT" invocation; update the MlflowClient instantiation
to pass tracking_uri set to the same URI (e.g.,
MlflowClient(tracking_uri="http://127.0.0.1:$MLFLOW_PORT") or the equivalent
MLFLOW_TRACKING_URI env var) so the client and autolog use the same tracking
server when checking/restoring the experiment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 944d4b69-c498-4a53-9006-7ecb2a603a29
📒 Files selected for processing (2)
.claude/settings.example.jsonskills/root-cause-analysis/SKILL.md
3b1c828 to
349e77a
Compare
349e77a to
5f6a974
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@skills/root-cause-analysis/SKILL.md`:
- Line 84: The inline python call that references $MLFLOW_EXPERIMENT_NAME inside
a single-quoted Python string can break if the name contains quotes; change the
command so the Python snippet reads environment variables itself (e.g., use
os.environ.get('MLFLOW_EXPERIMENT_NAME') and os.environ.get('MLFLOW_PORT'))
instead of interpolating shell vars into the Python code, and keep the
surrounding shell conditional (MLFLOW_CLAUDE_TRACING_ENABLED) and the subsequent
mlflow autolog claude call unchanged; update the python3 -c invocation to import
os, read MLFLOW_EXPERIMENT_NAME/MLFLOW_PORT from os.environ, and use those
values when creating MlflowClient and restoring the experiment.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ff7d81d7-bc0c-4277-9664-bfd3531f7324
📒 Files selected for processing (3)
.claude/settings.example.jsondocs/mlflow-tracing.mdskills/root-cause-analysis/SKILL.md
🚧 Files skipped from review as they are similar to previous changes (1)
- .claude/settings.example.json
| { | ||
| "type": "command", | ||
| "command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi" | ||
| "command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then python3 -c \"import mlflow; client=mlflow.tracking.MlflowClient(tracking_uri='http://127.0.0.1:$MLFLOW_PORT'); exp=client.get_experiment_by_name('$MLFLOW_EXPERIMENT_NAME'); client.restore_experiment(exp.experiment_id) if exp and exp.lifecycle_stage == 'deleted' else None\" && mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi" |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
export MLFLOW_PORT=5000
export MLFLOW_EXPERIMENT_NAME="team's-exp"
echo "Expect failure with direct interpolation into single-quoted Python string:"
if python3 -c "exp='$MLFLOW_EXPERIMENT_NAME'; print(exp)" >/dev/null 2>&1; then
echo "Unexpected: interpolation variant succeeded"
exit 1
else
echo "Confirmed: interpolation variant fails when value contains single quote"
fi
echo "Expect success when reading env var inside Python:"
python3 -c "import os; exp=os.environ['MLFLOW_EXPERIMENT_NAME']; print(exp)" >/dev/null
echo "Confirmed: env-based variant succeeds"Repository: redhat-et/rhdp-rca-plugin
Length of output: 304
🏁 Script executed:
cd /tmp && git -C redhat-et/rhdp-rca-plugin ls-files | grep -i skillRepository: redhat-et/rhdp-rca-plugin
Length of output: 147
🏁 Script executed:
head -90 /tmp/redhat-et/rhdp-rca-plugin/skills/root-cause-analysis/SKILL.md | tail -20Repository: redhat-et/rhdp-rca-plugin
Length of output: 194
🏁 Script executed:
git ls-files | grep -i skillRepository: redhat-et/rhdp-rca-plugin
Length of output: 2360
🏁 Script executed:
fd SKILL.mdRepository: redhat-et/rhdp-rca-plugin
Length of output: 423
🏁 Script executed:
wc -l skills/root-cause-analysis/SKILL.mdRepository: redhat-et/rhdp-rca-plugin
Length of output: 108
🏁 Script executed:
sed -n '75,95p' skills/root-cause-analysis/SKILL.mdRepository: redhat-et/rhdp-rca-plugin
Length of output: 1442
🏁 Script executed:
sed -n '84p' skills/root-cause-analysis/SKILL.mdRepository: redhat-et/rhdp-rca-plugin
Length of output: 515
Avoid shell interpolation inside the inline Python snippet.
$MLFLOW_EXPERIMENT_NAME is injected directly into a Python single-quoted string; names containing ' will break python3 -c and skip tracing setup. Read env vars inside Python using os.environ[] instead.
Proposed fix
- "command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then python3 -c \"import mlflow; client=mlflow.tracking.MlflowClient(tracking_uri='http://127.0.0.1:$MLFLOW_PORT'); exp=client.get_experiment_by_name('$MLFLOW_EXPERIMENT_NAME'); client.restore_experiment(exp.experiment_id) if exp and exp.lifecycle_stage == 'deleted' else None\" && mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi"
+ "command": "if [ \"$MLFLOW_CLAUDE_TRACING_ENABLED\" = \"true\" ]; then python3 -c \"import os, mlflow; uri=f'http://127.0.0.1:{os.environ[\\\"MLFLOW_PORT\\\"]}'; client=mlflow.tracking.MlflowClient(tracking_uri=uri); exp=client.get_experiment_by_name(os.environ['MLFLOW_EXPERIMENT_NAME']); client.restore_experiment(exp.experiment_id) if exp and exp.lifecycle_stage == 'deleted' else None\" && mlflow autolog claude -u http://127.0.0.1:$MLFLOW_PORT -n $MLFLOW_EXPERIMENT_NAME; fi"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@skills/root-cause-analysis/SKILL.md` at line 84, The inline python call that
references $MLFLOW_EXPERIMENT_NAME inside a single-quoted Python string can
break if the name contains quotes; change the command so the Python snippet
reads environment variables itself (e.g., use
os.environ.get('MLFLOW_EXPERIMENT_NAME') and os.environ.get('MLFLOW_PORT'))
instead of interpolating shell vars into the Python code, and keep the
surrounding shell conditional (MLFLOW_CLAUDE_TRACING_ENABLED) and the subsequent
mlflow autolog claude call unchanged; update the python3 -c invocation to import
os, read MLFLOW_EXPERIMENT_NAME/MLFLOW_PORT from os.environ, and use those
values when creating MlflowClient and restoring the experiment.
|
I wasn't able to reproduce the error but the experiment recovery process works fine. Maybe @anyuzoey @PalmPalm7 could try it on their end. |
To reproduce. Then to fix it use the fix found here |
jerry-ng2
left a comment
There was a problem hiding this comment.
Bug was reproduced and the fix worked! LGTM
Fix for issue: #22
Restore if status == deleted.
If not mlflow autolog claude tracking_url experiment_name (This line creates experiment if it doesnt exist)
Corrected README as command already given.
Summary by CodeRabbit
Bug Fixes
Documentation