feat: Add MLFlow support to LMEvalJob configuration #606
ruivieira wants to merge 1 commit into trustyai-explainability:main from
Conversation
- Introduced MLFlowOutput type for specifying MLFlow tracking parameters.
- Updated Outputs struct to include MLFlow configuration.
- Enhanced LMEvalJob controller to handle MLFlow parameters and export results.
- Added command-line flags for MLFlow settings in the driver.
- Implemented MLFlow export in the driver.
- Updated CRD with MLFlow fields and validation requirements.
Reviewer's Guide

Adds first-class MLFlow output support to LMEvalJob by extending the CRD/Outputs schema, wiring MLFlow configuration from the CR to driver CLI flags, and having the driver invoke a Python export script to push metrics/artifacts to an MLFlow tracking server after successful job completion.

Sequence diagram for MLFlow export on LMEvalJob completion

sequenceDiagram
actor User as "LMEvalJob author"
participant K8sAPI as "Kubernetes API server"
participant Controller as "LMEvalJob controller"
participant Pod as "LMEvalJob Pod (driver)"
participant Driver as "Driver process"
participant Script as "mlflow_export.py"
participant MLFlow as "MLFlow tracking server"
User->>K8sAPI: "Apply LMEvalJob CR with outputs.mlflow configured"
K8sAPI->>Controller: "Send LMEvalJob add/update event"
Controller->>Controller: "Read spec.outputs.mlflow and build MLFlow params JSON"
Controller->>Pod: "Create Pod with driver CLI flags for MLFlow (tracking URI, experiment name, run ID, export types, source info, params JSON)"
Pod->>Driver: "Start driver with MLFlow flags parsed into DriverOption"
Driver->>Driver: "Run evaluation and write metrics and artifacts to output directory"
Driver->>Driver: "updateCompleteStatus() invoked after evaluation"
Driver->>Driver: "Check for errors and presence of MLFlow configuration"
alt "MLFlow tracking URI and export types are configured and no evaluation error"
Driver->>Driver: "exportMLFlow() builds argument list for mlflow_export.py and environment variables"
Driver->>Script: "Execute mlflow_export.py with output directory, tracking URI, experiment name, run ID, export types, source tags, params JSON"
Script->>MLFlow: "Create or select experiment and run using tracking URI and experiment name"
Script->>MLFlow: "Log evaluation metrics and upload artifacts based on export types"
MLFlow-->>Script: "Acknowledge logged metrics and stored artifacts"
Script-->>Driver: "Exit successfully with combined output"
Driver->>Driver: "Log MLFlow export completed successfully"
else "MLFlow not configured or export script fails"
Driver->>Driver: "Skip export or log MLFlow export error without failing the job"
end
Driver-->>Pod: "Job completes and status includes evaluation results"
Pod-->>Controller: "Pod status updated to completed"
Controller-->>User: "LMEvalJob status updated and results available, metrics and artifacts visible in MLFlow UI"
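To make the completion-time branching above concrete, here is a minimal Go sketch of the gating logic only. Names such as `finishJob` and `mlflowConfig` are hypothetical stand-ins, not the driver's actual types; in the PR the real logic lives in `updateCompleteStatus` and `exportMLFlow`.

```go
package main

import (
	"errors"
	"log"
)

// mlflowConfig is an illustrative stand-in for the MLFlow-related driver
// options (tracking URI plus requested export types).
type mlflowConfig struct {
	TrackingURI string
	ExportTypes []string
}

// finishJob sketches the completion flow shown in the diagram: export to
// MLFlow only when the evaluation succeeded and MLFlow is configured, and
// never fail the job because the export failed.
func finishJob(evalErr error, cfg mlflowConfig, export func(mlflowConfig) error) {
	if evalErr != nil {
		log.Printf("evaluation failed, skipping MLFlow export: %v", evalErr)
		return
	}
	if cfg.TrackingURI == "" || len(cfg.ExportTypes) == 0 {
		log.Println("MLFlow not configured, skipping export")
		return
	}
	if err := export(cfg); err != nil {
		// Export errors are logged but do not change the job outcome.
		log.Printf("MLFlow export error: %v", err)
		return
	}
	log.Println("MLFlow export completed successfully")
}

func main() {
	cfg := mlflowConfig{TrackingURI: "http://mlflow.example.com", ExportTypes: []string{"metrics", "artifacts"}}
	finishJob(nil, cfg, func(mlflowConfig) error { return errors.New("tracking server unreachable") })
}
```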
Updated class diagram for MLFlow-related LMEvalJob and driver types

classDiagram
class LMEvalJobSpec {
+string "Model"
+TaskList "TaskList"
+Outputs "Outputs"
}
class Outputs {
+*string "PersistentVolumeClaimName"
+*PersistentVolumeClaimManaged "PersistentVolumeClaimManaged"
+*MLFlowOutput "MLFlow"
+bool "HasExistingPVC()"
+bool "HasManagedPVC()"
+bool "HasMLFlow()"
}
class PersistentVolumeClaimManaged {
+string "Size"
}
class MLFlowExportType {
<<enumeration>>
"MLFlowMetricsExport = \"metrics\""
"MLFlowArtifactsExport = \"artifacts\""
}
class MLFlowOutput {
+string "TrackingUri"
+*string "ExperimentName"
+*string "RunId"
+[]MLFlowExportType "Export"
+bool "HasMLFlowMetrics()"
+bool "HasMLFlowArtifacts()"
}
class DriverOption {
+string "OutputPath"
+bool "DetectDevice"
+strArrayArg "TaskRecipes"
+strArrayArg "CustomArtifacts"
+strArrayArg "TaskNames"
+bool "AllowOnline"
+string "MLFlowTrackingUri"
+string "MLFlowExperimentName"
+string "MLFlowRunId"
+[]string "MLFlowExportTypes"
+string "MLFlowSourceName"
+string "MLFlowSourceType"
+string "MLFlowParamsJSON"
}
class driverImpl {
+DriverOption "Option"
+void "updateCompleteStatus(error)"
+error "exportMLFlow()"
}
class LMEvalJobController {
+string "buildMLFlowParams(LMEvalJob)"
+[]string "generateCmd(serviceOptions, LMEvalJob, PermissionConfig)"
}
class LMEvalJob {
+string "Name"
+string "Namespace"
+LMEvalJobSpec "Spec"
}
LMEvalJob --> "1" LMEvalJobSpec : "spec"
LMEvalJobSpec --> "1" Outputs : "outputs"
Outputs --> "0..1" MLFlowOutput : "mlflow"
Outputs --> "0..1" PersistentVolumeClaimManaged : "pvcManaged"
MLFlowOutput --> "*" MLFlowExportType : "export"
LMEvalJobController --> LMEvalJob : "reads"
LMEvalJobController --> Outputs : "checks HasMLFlow()"
LMEvalJobController --> MLFlowOutput : "maps fields to CLI flags"
LMEvalJobController --> DriverOption : "populates via generateCmd()"
driverImpl --> DriverOption : "uses"
driverImpl --> MLFlowOutput : "indirectly via driver CLI flags and export script"
class strArrayArg {
+[]string "Values"
+void "Set(string)"
+string "String()"
}
DriverOption --> strArrayArg : "aggregates"
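For readers who prefer code to diagrams, the class diagram translates into roughly the following Go shape. This is a sketch inferred from the diagram and PR description; the JSON tags, the `hasExport` helper, and the exact field layout are assumptions, not the source.

```go
package v1alpha1

// Sketch of the MLFlow-related API additions implied by the diagram above.
// Field and method names are taken from the diagram; JSON tags are assumed.

type MLFlowExportType string

const (
	MLFlowMetricsExport   MLFlowExportType = "metrics"
	MLFlowArtifactsExport MLFlowExportType = "artifacts"
)

type MLFlowOutput struct {
	TrackingUri    string             `json:"trackingUri"`
	ExperimentName *string            `json:"experimentName,omitempty"`
	RunId          *string            `json:"runId,omitempty"`
	Export         []MLFlowExportType `json:"export,omitempty"`
}

func (m *MLFlowOutput) HasMLFlowMetrics() bool   { return m.hasExport(MLFlowMetricsExport) }
func (m *MLFlowOutput) HasMLFlowArtifacts() bool { return m.hasExport(MLFlowArtifactsExport) }

func (m *MLFlowOutput) hasExport(t MLFlowExportType) bool {
	for _, e := range m.Export {
		if e == t {
			return true
		}
	}
	return false
}

type PersistentVolumeClaimManaged struct {
	Size string `json:"size,omitempty"`
}

type Outputs struct {
	PersistentVolumeClaimName    *string                       `json:"pvcName,omitempty"`
	PersistentVolumeClaimManaged *PersistentVolumeClaimManaged `json:"pvcManaged,omitempty"`
	MLFlow                       *MLFlowOutput                 `json:"mlflow,omitempty"`
}

func (o *Outputs) HasExistingPVC() bool { return o != nil && o.PersistentVolumeClaimName != nil }
func (o *Outputs) HasManagedPVC() bool  { return o != nil && o.PersistentVolumeClaimManaged != nil }
func (o *Outputs) HasMLFlow() bool      { return o != nil && o.MLFlow != nil }
```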
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of the listed files. The full list of commands accepted by this bot can be found here.
Walkthrough

MLFlow integration has been added to the LMEvalJob system, enabling optional metric and artifact export. Changes span type definitions, API schema, CLI flags, driver orchestration logic, and controller pod creation logic, including conditional PVC mounting and MLFlow parameter marshaling.
Sequence Diagram

sequenceDiagram
participant Controller as LMEvalJob<br/>Controller
participant Driver as Driver<br/>Process
participant MLFlow as MLFlow<br/>Tracking
participant PVC as PVC<br/>Storage
Note over Controller,PVC: Evaluation Execution
Controller->>Driver: Execute with job config
Driver->>PVC: Write results (if PVC configured)
Note over Controller,PVC: Completion Handling
Driver->>Driver: updateCompleteStatus()
alt MLFlow Export Configured
Driver->>MLFlow: exportMLFlow() invokes<br/>Python script with params
MLFlow-->>Driver: Export metrics/artifacts
Driver->>Driver: Log result (non-fatal if error)
else No MLFlow
Driver->>Driver: Skip MLFlow export
end
Driver-->>Controller: Job complete
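A rough sketch of the controller-side flag wiring implied by this walkthrough is shown below. The `--mlflow-export-type`, `--mlflow-source-name`, `--mlflow-source-type`, and `--mlflow-params-json` flag names appear in this PR; the remaining flag names, the `mlflowSpec` type, and `buildMLFlowArgs` are illustrative assumptions rather than the real `generateCmd`.

```go
package main

import (
	"fmt"
	"strings"
)

// mlflowSpec is an illustrative stand-in for the CR fields involved.
type mlflowSpec struct {
	TrackingURI    string
	ExperimentName string
	RunID          string
	Export         []string
	ParamsJSON     string
}

// buildMLFlowArgs sketches how the controller could map spec.outputs.mlflow
// onto driver CLI flags.
func buildMLFlowArgs(namespace, name string, m mlflowSpec) []string {
	args := []string{"--mlflow-tracking-uri", m.TrackingURI}
	if m.ExperimentName != "" {
		args = append(args, "--mlflow-experiment-name", m.ExperimentName)
	}
	if m.RunID != "" {
		args = append(args, "--mlflow-run-id", m.RunID)
	}
	for _, e := range m.Export {
		args = append(args, "--mlflow-export-type", e) // repeatable flag, one per export type
	}
	args = append(args,
		"--mlflow-source-name", namespace+"/"+name,
		"--mlflow-source-type", "LMEvalJob",
	)
	if m.ParamsJSON != "" {
		args = append(args, "--mlflow-params-json", m.ParamsJSON)
	}
	return args
}

func main() {
	args := buildMLFlowArgs("test", "evaljob-sample", mlflowSpec{
		TrackingURI: "http://mlflow.example.svc:5000",
		Export:      []string{"metrics", "artifacts"},
	})
	fmt.Println(strings.Join(args, " "))
}
```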
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Hey there - I've reviewed your changes - here's some feedback:
- In exportMLFlow, consider using exec.CommandContext with the driver context instead of exec.Command so the MLFlow export process is terminated if the job context is cancelled or times out.
- The MLFlow export currently no-ops when trackingUri is set but export types are empty, even though the CRD allows export to be omitted; consider either providing a sensible default (e.g., metrics+artifacts) or logging a warning so users understand why nothing was exported.
- buildMLFlowParams dumps many CR fields directly into MLFlow params (including modelArgs/genArgs); it may be worth filtering or redacting known sensitive keys (e.g., API keys, tokens) before marshaling to JSON and sending to MLFlow.
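For the second point above (a tracking URI set but `export` left empty), here is a minimal sketch of one possible defaulting behavior. It is illustrative only; the PR as written simply skips the export in that case.

```go
package main

import (
	"fmt"
	"log"
)

// defaultExportTypes sketches the defaulting suggested in the second bullet
// above: when a tracking URI is configured but no export types were set,
// fall back to exporting both metrics and artifacts and log why.
func defaultExportTypes(trackingURI string, exportTypes []string) []string {
	if trackingURI != "" && len(exportTypes) == 0 {
		log.Println("outputs.mlflow.export not set; defaulting to metrics and artifacts")
		return []string{"metrics", "artifacts"}
	}
	return exportTypes
}

func main() {
	fmt.Println(defaultExportTypes("http://mlflow.example.com", nil))
}
```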
In `controllers/lmes/driver/driver.go` (lines 778-779):

    // Execute MLFlow export script
    cmd := exec.Command("python", args...)
issue (bug_risk): Running the MLFlow export script without a timeout can cause hangs.
Because exec.Command isn’t tied to a context, a stuck mlflow_export.py (e.g., due to tracking server issues) could block updateCompleteStatus indefinitely and delay job completion. Consider using exec.CommandContext with d.Option.Context (or a derived context with a timeout) so the export can be cancelled cleanly and doesn’t block the driver.
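A minimal sketch of the suggested change, assuming a derived timeout; the script path, arguments, and timeout value below are placeholders rather than the driver's actual values.

```go
package main

import (
	"context"
	"log"
	"os/exec"
	"time"
)

func main() {
	// Placeholder script path and arguments; the real driver builds these
	// from its MLFlow options.
	args := []string{"/opt/app-root/src/mlflow_export.py", "--tracking-uri", "http://mlflow.example.com"}

	// Bound the export with a derived context so a stuck mlflow_export.py
	// cannot block job completion indefinitely (10 minutes is arbitrary).
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()

	cmd := exec.CommandContext(ctx, "python", args...)
	out, err := cmd.CombinedOutput()
	if err != nil {
		// Keep this non-fatal, matching the driver's current behavior.
		log.Printf("MLFlow export failed: %v, output: %s", err, out)
		return
	}
	log.Printf("MLFlow export completed: %s", out)
}
```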
In `controllers/lmes/lmevaljob_controller.go` (lines 1428-1437):

    func buildMLFlowParams(job *lmesv1alpha1.LMEvalJob) string {
        if job == nil {
            return ""
        }

        params := map[string]any{
            "crName":    job.Name,
            "namespace": job.Namespace,
            "model":     job.Spec.Model,
            "tasks":     concatTasks(job.Spec.TaskList),
🚨 suggestion (security): MLFlow params payload may become large or include sensitive fields; consider limiting or filtering.
Since buildMLFlowParams serializes multiple structs (ModelArgs, GenArgs, CustomArtifacts, etc.) into one JSON value for --mlflow-params-json, this may become quite large and/or contain secrets (tokens, internal URLs). Please consider restricting which fields are included, redacting known sensitive keys, and/or enforcing a size limit to avoid leaking sensitive data or hitting OS argument/env size limits.
Suggested implementation:
const mlflowParamsMaxBytes = 8 * 1024
func buildMLFlowParams(job *lmesv1alpha1.LMEvalJob) string {
if job == nil {
return ""
}
params := map[string]any{
"crName": job.Name,
"namespace": job.Namespace,
"model": job.Spec.Model,
"tasks": concatTasks(job.Spec.TaskList),
}
sanitized := sanitizeMLFlowParams(params)
data, err := json.Marshal(sanitized)
if err != nil {
// In case of unexpected serialization issues, avoid passing partial or unsafe data.
return ""
}
if len(data) > mlflowParamsMaxBytes {
// If the payload is too large, fall back to a minimal, non-sensitive payload
// to avoid hitting OS argument/env limits.
truncated := map[string]any{
"crName": job.Name,
"namespace": job.Namespace,
"model": job.Spec.Model,
"tasks": concatTasks(job.Spec.TaskList),
"truncated": true,
"truncation_reason": "mlflow params exceeded maximum allowed size",
"original_size_bytes": len(data),
"max_size_bytes": mlflowParamsMaxBytes,
}
data, err = json.Marshal(truncated)
if err != nil {
return ""
}
}
return string(data)
}
// sanitizeMLFlowParams walks the provided value and redacts fields whose keys
// are known or likely to contain sensitive information (tokens, secrets, etc.).
func sanitizeMLFlowParams(v any) any {
switch typed := v.(type) {
case map[string]any:
out := make(map[string]any, len(typed))
for k, val := range typed {
if isSensitiveKey(k) {
out[k] = "***REDACTED***"
continue
}
out[k] = sanitizeMLFlowParams(val)
}
return out
case []any:
out := make([]any, len(typed))
for i, val := range typed {
out[i] = sanitizeMLFlowParams(val)
}
return out
default:
// For other scalar types, return as-is.
return v
}
}
func isSensitiveKey(key string) bool {
k := strings.ToLower(key)
// Exact matches for common sensitive keys.
switch k {
case "token", "access_token", "refresh_token", "api_key", "apikey", "password", "secret", "client_secret", "auth_token", "authorization":
return true
}
// Heuristic: redact fields containing these substrings.
sensitiveSubstrings := []string{
"token",
"secret",
"password",
"passwd",
"api_key",
"apikey",
"credential",
"auth",
}
for _, sub := range sensitiveSubstrings {
if strings.Contains(k, sub) {
return true
}
}
return false
}

1. Ensure the following imports are present at the top of controllers/lmes/lmevaljob_controller.go:
   - "encoding/json"
   - "strings"
   For example, inside the existing import block:
   - Add: encoding/json
   - Add: strings
2. If there is already a constant or configuration system for limits, you may want to:
   - Replace the hard-coded mlflowParamsMaxBytes (8 * 1024) with a value derived from configuration or a shared constant to align with project conventions.
3. If other parts of the code build the params map with additional nested structs (ModelArgs, GenArgs, CustomArtifacts, etc.):
   - Make sure they are converted to map[string]any / []any or JSON-serializable types before being passed into sanitizeMLFlowParams so that the redaction logic can traverse them correctly.
Actionable comments posted: 1
🧹 Nitpick comments (3)
config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml (1)
279-305: Tighten mlflow schema: dedupe export, require at least one item, basic string guards, and lock object shape. Adds safer validation without changing behavior. Defaults can mirror the example (metrics+artifacts).
    mlflow:
      description: Export results to MLFlow tracking server
      properties:
        experimentName:
          description: ExperimentName is the name of the MLFlow experiment
          type: string
    +     minLength: 1
        export:
          description: Export defines what to export to MLFlow (metrics, artifacts, or both)
          items:
            description: MLFlowExportType defines what to export to MLFlow
            enum:
            - metrics
            - artifacts
            type: string
    -     type: array
    +     type: array
    +     minItems: 1
    +     uniqueItems: true
    +     x-kubernetes-list-type: set
    +     default:
    +     - metrics
    +     - artifacts
        runId:
          description: RunId is the specific MLFlow run ID to use (optional)
          type: string
    +     minLength: 1
        trackingUri:
          description: TrackingUri is the MLFlow tracking server URI
    +     format: uri
          pattern: ^https?://[a-zA-Z0-9.-]+(:[0-9]+)?(/.*)?$
          type: string
      required:
      - trackingUri
    - type: object
    + type: object
    + additionalProperties: false

Optional note: if you plan to support non-HTTP backends (e.g., file:// or IPv6 hosts), we can relax the trackingUri pattern later.
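If this tightening were done on the Go API types rather than by editing the generated YAML, it would typically be expressed with kubebuilder markers. The sketch below is illustrative: the field names come from this PR, the markers mirror the suggested additions, and the CRD would need to be regenerated.

```go
package v1alpha1

// Sketch of equivalent kubebuilder validation markers on the API type
// (illustrative, not the current source).

// +kubebuilder:validation:Enum=metrics;artifacts
type MLFlowExportType string

type MLFlowOutput struct {
	// +kubebuilder:validation:MinLength=1
	// +kubebuilder:validation:Pattern=`^https?://[a-zA-Z0-9.-]+(:[0-9]+)?(/.*)?$`
	TrackingUri string `json:"trackingUri"`

	// +kubebuilder:validation:MinLength=1
	// +optional
	ExperimentName *string `json:"experimentName,omitempty"`

	// +kubebuilder:validation:MinLength=1
	// +optional
	RunId *string `json:"runId,omitempty"`

	// +kubebuilder:validation:MinItems=1
	// +listType=set
	// +optional
	Export []MLFlowExportType `json:"export,omitempty"`
}
```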
api/lmes/v1alpha1/lmevaljob_types.go (1)
396-405: MLFlow API surface is coherent; consider clarifying/strengthening Export semantics. The MLFlow additions (
`MLFlowExportType`, `MLFlowOutput`, `Outputs.MLFlow`, and the `HasMLFlow*` helpers) are structurally sound and integrate cleanly with the rest of the spec and controller helpers. One behavioral nuance to double-check: if a user sets
`trackingUri` but leaves `export` empty, `HasMLFlow` will be true but both `HasMLFlowMetrics` and `HasMLFlowArtifacts` return false, and the driver will receive no `--mlflow-export-type` flags. This effectively disables export while still requiring a valid `trackingUri`. If that's unintended, you might:
- enforce
`export` as required when `mlflow` is present (CRD validation / webhook), or
exportas “export both metrics and artifacts” (e.g., fill defaults before wiring down to the driver).Otherwise, the design looks good.
Also applies to: 407-421, 423-433, 637-666
controllers/lmes/driver/driver.go (1)
731-820: exportMLFlow implementation is straightforward; consider minor robustness tweaksThe
731-820: exportMLFlow implementation is straightforward; consider minor robustness tweaks. The
- short‑circuits when tracking URI is empty or no export types are provided,
- passes all MLFlow options via both CLI args and environment,
- uses
exec.Command("python", args...)(no shell), avoiding injection, and- logs combined output on success or failure.
Two small robustness ideas you might consider (non‑blocking):
- Guard against the export script not being present or executable by enriching the error message with a hint (path and maybe
os.IsNotExistcheck) so operators can diagnose image issues quickly.- If
MLFlowExportTypescould ever be user‑provided outside the CRD path, you might validate values (e.g. restrict tometrics/artifacts) before passing them through.Otherwise this looks good.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
api/lmes/v1alpha1/lmevaljob_types.go(2 hunks)api/lmes/v1alpha1/zz_generated.deepcopy.go(2 hunks)cmd/lmes_driver/main.go(2 hunks)config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml(1 hunks)controllers/lmes/driver/driver.go(3 hunks)controllers/lmes/lmevaljob_controller.go(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
controllers/lmes/lmevaljob_controller.go (4)
api/lmes/v1alpha1/lmevaljob_types.go (7)
Outputs(423-433)LMEvalJob(704-710)TaskList(244-254)CustomArtifacts(140-153)CustomTasks(239-242)GitSource(216-230)ChatTemplate(508-511)controllers/job_mgr/job_mgr_controller.go (1)
LMEvalJob(46-48)controllers/lmes/constants.go (2)
AllowOnline(37-37)AllowCodeExecution(38-38)api/lmes/v1alpha1/groupversion_info.go (1)
KindName(30-30)
cmd/lmes_driver/main.go (3)
controllers/lmes/constants.go (2)
OutputPath(28-28)AllowOnline(37-37)controllers/lmes/driver/driver.go (1)
DefaultPort(48-48)api/lmes/v1alpha1/lmevaljob_types.go (1)
CustomArtifacts(140-153)
api/lmes/v1alpha1/zz_generated.deepcopy.go (1)
api/lmes/v1alpha1/lmevaljob_types.go (2)
MLFlowOutput(408-421)MLFlowExportType(398-398)
controllers/lmes/driver/driver.go (2)
cmd/lmes_driver/main.go (1)
OutputPath(37-37)controllers/lmes/constants.go (1)
OutputPath(28-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Sourcery review
- GitHub Check: deploy
- GitHub Check: build
🔇 Additional comments (5)
api/lmes/v1alpha1/zz_generated.deepcopy.go (1)
421-449: MLFlow DeepCopy implementation looks correct and consistent. The new
`MLFlowOutput.DeepCopyInto`/`DeepCopy` and the `Outputs.MLFlow` deep copy block follow the same patterns as other types in this file and avoid aliasing pointer/slice fields. No issues found. Also applies to: 538-555
controllers/lmes/lmevaljob_controller.go (2)
912-920: PVC volume/volumeMount gating now matches outputs configuration. Conditionally adding the
`outputs` volume and mount only when `Outputs` exists and a managed or existing PVC is configured is consistent with the reconciliation logic and avoids mounting unused PVCs. No issues here. Also applies to: 930-950
1580-1603: MLFlow CLI wiring from Outputs to driver looks correct. The MLFlow block in
`generateCmd` correctly:
- checks
`Outputs != nil` and `HasMLFlow()` before accessing the pointer,
--mlflow-export-typeflags,- sets
--mlflow-source-nameto<namespace>/<name>and--mlflow-source-typetoKindName, and- appends
--mlflow-params-jsononly when JSON construction succeeds.This aligns with the driver’s
DriverOptionfields and the CLI flags, and it gracefully degrades if param marshaling fails.controllers/lmes/driver/driver.go (1)
76-84: Completion hook for MLFlow export is well‑placed and non‑disruptiveWiring MLFlow export from
updateCompleteStatusonly when:
- the run itself had no error, and
- result collection succeeded,
and then treating MLFlow failures as non‑fatal (log only, keep job as succeeded) is a sensible contract for an optional integration. The new
DriverOptionMLFlow fields are also straightforward and match how the controller/CLI populate them.Also applies to: 445-472
cmd/lmes_driver/main.go (1)
52-83: CLI MLFlow flags and wiring into DriverOption look consistentThe new
mlflow-*flags andmlflowExportTypesaggregation integrate cleanly:
strArrayArgis an appropriate choice for repeatable--mlflow-export-typeflags,- all MLFlow options are passed through to
DriverOptionand then to the driver, and- existing flows (
copy,get-status,shutdown) are unaffected.No issues spotted on the CLI side.
Also applies to: 130-153
In `controllers/lmes/lmevaljob_controller.go`:

    func buildMLFlowParams(job *lmesv1alpha1.LMEvalJob) string {
        if job == nil {
            return ""
        }

        params := map[string]any{
            "crName":    job.Name,
            "namespace": job.Namespace,
            "model":     job.Spec.Model,
            "tasks":     concatTasks(job.Spec.TaskList),
        }

        if len(job.Spec.TaskList.TaskNames) > 0 {
            params["taskNames"] = job.Spec.TaskList.TaskNames
        }

        if len(job.Spec.TaskList.TaskRecipes) > 0 {
            params["taskRecipes"] = job.Spec.TaskList.TaskRecipes
        }

        if job.Spec.TaskList.CustomArtifacts != nil {
            params["customArtifacts"] = job.Spec.TaskList.CustomArtifacts
        }

        if job.Spec.TaskList.HasCustomTasksWithGit() {
            params["customTasksGit"] = job.Spec.TaskList.CustomTasks.Source.GitSource
        }

        if len(job.Spec.ModelArgs) > 0 {
            params["modelArgs"] = job.Spec.ModelArgs
        }

        if len(job.Spec.GenArgs) > 0 {
            params["genArgs"] = job.Spec.GenArgs
        }

        if job.Spec.NumFewShot != nil {
            params["numFewShot"] = job.Spec.NumFewShot
        }

        if job.Spec.Limit != "" {
            params["limit"] = job.Spec.Limit
        }

        if job.Spec.LogSamples != nil {
            params["logSamples"] = job.Spec.LogSamples
        }

        if job.Spec.BatchSize != nil {
            params["batchSize"] = job.Spec.BatchSize
        }

        if job.Spec.AllowOnline != nil {
            params["allowOnline"] = job.Spec.AllowOnline
        }

        if job.Spec.AllowCodeExecution != nil {
            params["allowCodeExecution"] = job.Spec.AllowCodeExecution
        }

        if job.Spec.SystemInstruction != "" {
            params["systemInstruction"] = job.Spec.SystemInstruction
        }

        if job.Spec.ChatTemplate != nil {
            params["chatTemplate"] = job.Spec.ChatTemplate
        }

        paramsJSON, err := json.Marshal(params)
        if err != nil {
            ctrl.Log.WithName("mlflow").Error(err, "failed to marshal MLFlow parameters for LMEvalJob", "jobName", job.Name)
            return ""
        }

        return string(paramsJSON)
    }
Be careful exporting potentially sensitive fields in MLFlow params JSON
buildMLFlowParams helpfully captures rich context for MLFlow (model, tasks, recipes, custom artifacts, args, flags, etc.), but note that both ModelArgs and GenArgs are included verbatim. In lm-eval, --model_args is commonly used to carry provider credentials (e.g. api_key=... for OpenAI/HF), so this will propagate those secrets into:
- the
--mlflow-params-jsonCLI argument, and - the
`MLFLOW_PARAMS_JSON` environment variable consumed by the export script (and potentially recorded as params/tags in MLFlow).
That’s a significant expansion of the exposure surface for API keys and similar secrets.
Consider at least one of:
- Explicitly filtering/redacting known sensitive keys (e.g.
`api_key`, `openai_api_key`, `hf_token`, `password`, etc.) from `modelArgs`/`genArgs` before putting them into `params`.
modelArgs/genArgsin MLFlow params opt‑in via a separate flag or field. - Documenting clearly that MLFlow export will store these values so operators can avoid putting secrets into
modelArgs.
I’d treat some mitigation here as important before widespread use of the feature.
🤖 Prompt for AI Agents
controllers/lmes/lmevaljob_controller.go around lines 1428-1503:
buildMLFlowParams currently includes job.Spec.ModelArgs and job.Spec.GenArgs
verbatim which can leak secrets (API keys/passwords) into MLFlow; change it to
sanitize those maps before adding them to params by redacting known sensitive
keys (e.g. api_key, openai_api_key, hf_token, password, secret, token,
access_key, secret_key) — implement a small redactMap(map[string]any) ->
map[string]any that returns a shallow copy replacing values of matching keys
with "<REDACTED>" (case-insensitive) and use that output when setting
"modelArgs" and "genArgs"; update unit tests or add a test for redaction and
adjust any callers/log messages accordingly.
@ruivieira: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Given the following CR:
LMEvalJob metrics and artifacts are stored in MLFlow.
Screenshots
Requires opendatahub-io/lm-evaluation-harness#58
Summary by Sourcery
Add configurable MLFlow integration for LMEvalJob outputs and driver execution.