feat: Add MLFlow support to LMEvalJob configuration#606

Open
ruivieira wants to merge 1 commit into trustyai-explainability:main from ruivieira:mlflow

Conversation

@ruivieira
Member

@ruivieira ruivieira commented Nov 23, 2025

  • Introduced MLFlowOutput type for specifying MLFlow tracking parameters.
  • Updated Outputs struct to include MLFlow configuration.
  • Enhanced LMEvalJob controller to handle MLFlow parameters and export results.
  • Added command-line flags for MLFlow settings in the driver.
  • Implemented MLFlow export in the driver.
  • Updated CRD with MLFlow fields and validation requirements.

Given the following CR:

apiVersion: trustyai.opendatahub.io/v1alpha1
kind: LMEvalJob
metadata:
  name: evaljob
  namespace: test
spec:
  model: local-completions
  taskList:
    taskNames:
      - arc_easy
  logSamples: true
  limit: '0.5'
  batchSize: '1'
  allowOnline: true
  allowCodeExecution: true
  modelArgs:
    - name: model
      value: tinyllama
    - name: base_url
      value: http://vllm-server:8000/v1/completions
    - name: num_concurrent
      value: "1"
    - name: max_retries
      value: "3"
    - name: tokenized_requests
      value: "False"
    - name: tokenizer
      value: google/flan-t5-small
  outputs:
    mlflow:
      trackingUri: http://mlflow
      experimentName: evaljob-sample-3
      export:
        - metrics
        - artifacts

LMEvalJob metrics and artifacts are stored in MLFlow.
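The export itself is performed by a Python helper (mlflow_export.py) invoked by the driver. A minimal sketch of what such a script could do; the function names and the results.json layout below are illustrative assumptions, not the actual file from this PR:

```python
# Illustrative sketch only: collect_metrics and the results.json layout are
# assumptions, not the actual mlflow_export.py shipped with this PR.
import json
from pathlib import Path


def collect_metrics(results: dict) -> dict:
    """Flatten an lm-eval-style {"results": {task: {metric: value}}} payload
    into MLFlow-loggable "task/metric" -> float pairs, skipping non-numeric
    values such as the "alias" field."""
    flat = {}
    for task, metrics in results.get("results", {}).items():
        for name, value in metrics.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                flat[f"{task}/{name}"] = float(value)
    return flat


def export(output_dir: str, tracking_uri: str, experiment: str, export_types: list) -> None:
    """Push results to an MLFlow tracking server (requires the mlflow package)."""
    import mlflow  # deferred so the pure helper above stays importable

    mlflow.set_tracking_uri(tracking_uri)
    mlflow.set_experiment(experiment)
    results = json.loads(Path(output_dir, "results.json").read_text())
    with mlflow.start_run():
        if "metrics" in export_types:
            mlflow.log_metrics(collect_metrics(results))
        if "artifacts" in export_types:
            mlflow.log_artifacts(output_dir)
```

With the CR above, export would be called with "metrics" and "artifacts" in export_types, so both branches run.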

Screenshots

(two screenshot images omitted)

Requires opendatahub-io/lm-evaluation-harness#58

Summary by Sourcery

Add configurable MLFlow integration for LMEvalJob outputs and driver execution.

New Features:

  • Allow LMEvalJob resources to define MLFlow tracking configuration, including tracking URI, experiment name, run ID, and export types.
  • Enable the driver to accept MLFlow configuration via command-line flags and export evaluation results to an MLFlow tracking server.
  • Attach rich LMEvalJob context as JSON parameters when exporting runs to MLFlow.

Enhancements:

  • Ensure PVC volumes and mounts are only created when PVC-based outputs are actually configured in the LMEvalJob spec.
  • Extend the LMEvalJob Outputs API and CRD schema with MLFlow-specific fields and validation, including enum-based export types.

Summary by CodeRabbit

Release Notes

New Features

  • Added MLFlow integration to enable exporting evaluation results to MLFlow for experiment tracking and management.
  • Introduced configuration options to selectively export metrics and artifacts to MLFlow with customizable tracking URI, experiment name, and run ID settings.


@ruivieira ruivieira self-assigned this Nov 23, 2025
@ruivieira ruivieira added the kind/enhancement New feature or request label Nov 23, 2025
@sourcery-ai
Contributor

sourcery-ai bot commented Nov 23, 2025

Reviewer's Guide

Adds first-class MLFlow output support to LMEvalJob by extending the CRD/Outputs schema, wiring MLFlow configuration from the CR to driver CLI flags, and having the driver invoke a Python export script to push metrics/artifacts to an MLFlow tracking server after successful job completion.

Sequence diagram for MLFlow export on LMEvalJob completion

sequenceDiagram
    actor User as "LMEvalJob author"
    participant K8sAPI as "Kubernetes API server"
    participant Controller as "LMEvalJob controller"
    participant Pod as "LMEvalJob Pod (driver)"
    participant Driver as "Driver process"
    participant Script as "mlflow_export.py"
    participant MLFlow as "MLFlow tracking server"

    User->>K8sAPI: "Apply LMEvalJob CR with outputs.mlflow configured"
    K8sAPI->>Controller: "Send LMEvalJob add/update event"
    Controller->>Controller: "Read spec.outputs.mlflow and build MLFlow params JSON"
    Controller->>Pod: "Create Pod with driver CLI flags for MLFlow (tracking URI, experiment name, run ID, export types, source info, params JSON)"

    Pod->>Driver: "Start driver with MLFlow flags parsed into DriverOption"
    Driver->>Driver: "Run evaluation and write metrics and artifacts to output directory"

    Driver->>Driver: "updateCompleteStatus() invoked after evaluation"
    Driver->>Driver: "Check for errors and presence of MLFlow configuration"
    alt "MLFlow tracking URI and export types are configured and no evaluation error"
        Driver->>Driver: "exportMLFlow() builds argument list for mlflow_export.py and environment variables"
        Driver->>Script: "Execute mlflow_export.py with output directory, tracking URI, experiment name, run ID, export types, source tags, params JSON"
        Script->>MLFlow: "Create or select experiment and run using tracking URI and experiment name"
        Script->>MLFlow: "Log evaluation metrics and upload artifacts based on export types"
        MLFlow-->>Script: "Acknowledge logged metrics and stored artifacts"
        Script-->>Driver: "Exit successfully with combined output"
        Driver->>Driver: "Log MLFlow export completed successfully"
    else "MLFlow not configured or export script fails"
        Driver->>Driver: "Skip export or log MLFlow export error without failing the job"
    end

    Driver-->>Pod: "Job completes and status includes evaluation results"
    Pod-->>Controller: "Pod status updated to completed"
    Controller-->>User: "LMEvalJob status updated and results available, metrics and artifacts visible in MLFlow UI"

Updated class diagram for MLFlow-related LMEvalJob and driver types

classDiagram
    class LMEvalJobSpec {
        +string "Model"
        +TaskList "TaskList"
        +Outputs "Outputs"
    }

    class Outputs {
        +*string "PersistentVolumeClaimName"
        +*PersistentVolumeClaimManaged "PersistentVolumeClaimManaged"
        +*MLFlowOutput "MLFlow"
        +bool "HasExistingPVC()"
        +bool "HasManagedPVC()"
        +bool "HasMLFlow()"
    }

    class PersistentVolumeClaimManaged {
        +string "Size"
    }

    class MLFlowExportType {
        <<enumeration>>
        "MLFlowMetricsExport = \"metrics\""
        "MLFlowArtifactsExport = \"artifacts\""
    }

    class MLFlowOutput {
        +string "TrackingUri"
        +*string "ExperimentName"
        +*string "RunId"
        +[]MLFlowExportType "Export"
        +bool "HasMLFlowMetrics()"
        +bool "HasMLFlowArtifacts()"
    }

    class DriverOption {
        +string "OutputPath"
        +bool "DetectDevice"
        +strArrayArg "TaskRecipes"
        +strArrayArg "CustomArtifacts"
        +strArrayArg "TaskNames"
        +bool "AllowOnline"
        +string "MLFlowTrackingUri"
        +string "MLFlowExperimentName"
        +string "MLFlowRunId"
        +[]string "MLFlowExportTypes"
        +string "MLFlowSourceName"
        +string "MLFlowSourceType"
        +string "MLFlowParamsJSON"
    }

    class driverImpl {
        +DriverOption "Option"
        +void "updateCompleteStatus(error)"
        +error "exportMLFlow()"
    }

    class LMEvalJobController {
        +string "buildMLFlowParams(LMEvalJob)"
        +[]string "generateCmd(serviceOptions, LMEvalJob, PermissionConfig)"
    }

    class LMEvalJob {
        +string "Name"
        +string "Namespace"
        +LMEvalJobSpec "Spec"
    }

    LMEvalJob --> "1" LMEvalJobSpec : "spec"
    LMEvalJobSpec --> "1" Outputs : "outputs"
    Outputs --> "0..1" MLFlowOutput : "mlflow"
    Outputs --> "0..1" PersistentVolumeClaimManaged : "pvcManaged"
    MLFlowOutput --> "*" MLFlowExportType : "export"

    LMEvalJobController --> LMEvalJob : "reads"
    LMEvalJobController --> Outputs : "checks HasMLFlow()"
    LMEvalJobController --> MLFlowOutput : "maps fields to CLI flags"
    LMEvalJobController --> DriverOption : "populates via generateCmd()"

    driverImpl --> DriverOption : "uses"
    driverImpl --> MLFlowOutput : "indirectly via driver CLI flags and export script"

    class strArrayArg {
        +[]string "Values"
        +void "Set(string)"
        +string "String()"
    }

    DriverOption --> strArrayArg : "aggregates"

File-Level Changes

Wire MLFlow configuration through the driver and invoke an MLFlow export script after successful evaluation.
  • Extend DriverOption with MLFlow-related fields for tracking URI, experiment name, run ID, export types, source metadata, and JSON-encoded params.
  • Add exportMLFlow method that builds a python mlflow_export.py command with flags/env vars based on DriverOption, executes it, and logs failures without failing the job.
  • Invoke exportMLFlow from updateCompleteStatus only when the evaluation finished without error and MLFlow is configured.
controllers/lmes/driver/driver.go
Expose MLFlow settings via the driver CLI and pass them into DriverOption.
  • Add mlflow-related CLI flags (tracking URI, experiment, run ID, export type, source name/type, params JSON) plus a strArrayArg for repeated export types.
  • Populate the new MLFlow fields in DriverOption from parsed flags when constructing the driver.
cmd/lmes_driver/main.go
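The driver is Go, where the repeated export-type flag is backed by a strArrayArg implementing flag.Value. The same CLI surface modeled in Python for illustration; the exact flag spellings are assumptions based on the PR description:

```python
# Python model of the driver's MLFlow CLI surface. The Go driver uses a
# strArrayArg flag.Value for the repeatable export-type flag; argparse's
# action="append" behaves the same way. Flag spellings are assumed.
import argparse


def build_parser() -> argparse.ArgumentParser:
    p = argparse.ArgumentParser(prog="lmes_driver")
    p.add_argument("--mlflow-tracking-uri", default="")
    p.add_argument("--mlflow-experiment-name", default="")
    p.add_argument("--mlflow-run-id", default="")
    # action="append" mirrors strArrayArg: the flag may be passed repeatedly.
    p.add_argument("--mlflow-export-type", dest="export_types",
                   action="append", choices=["metrics", "artifacts"],
                   default=None)
    return p
```

A CR exporting both kinds would thus produce --mlflow-export-type metrics --mlflow-export-type artifacts on the driver command line.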
Add MLFlow output configuration to the LMEvalJob API/CRD and helper methods for checking MLFlow exports.
  • Introduce MLFlowExportType enum and MLFlowOutput struct with tracking URI, optional experiment/run ID, and export list, plus HasMLFlow/HasMLFlowMetrics/HasMLFlowArtifacts helpers.
  • Extend Outputs struct to include an optional MLFlow field and corresponding DeepCopy implementations.
  • Update the generated CRD schema to describe the mlflow outputs block, including validation for trackingUri and allowed export values.
api/lmes/v1alpha1/lmevaljob_types.go
api/lmes/v1alpha1/zz_generated.deepcopy.go
config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml
Propagate MLFlow configuration from LMEvalJob spec into driver CLI arguments and avoid requiring PVC outputs when only MLFlow is used.
  • Tighten PVC volume/volumeMount creation so it only occurs when HasCustomOutput and either HasManagedPVC or HasExistingPVC are true, allowing MLFlow-only outputs without PVCs.
  • Add buildMLFlowParams helper to derive a JSON-encoded parameter map from the LMEvalJob spec (model, tasks, args, limits, flags, templates, etc.) for tagging MLFlow runs.
  • Extend generateCmd to recognize Outputs.MLFlow, emit corresponding --mlflow-* flags (tracking URI, experiment, run ID, export types, source name/type, params JSON), using namespace/name as source identifiers.
controllers/lmes/lmevaljob_controller.go


@openshift-ci

openshift-ci bot commented Nov 23, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Contributor

coderabbitai bot commented Nov 23, 2025

Walkthrough

MLFlow integration has been added to the LMEvalJob system, enabling optional metric and artifact export. Changes span type definitions, API schema, CLI flags, driver orchestration logic, and controller pod creation logic, including conditional PVC mounting and MLFlow parameter marshaling.

Changes

Type definitions and deep copy support
api/lmes/v1alpha1/lmevaljob_types.go, api/lmes/v1alpha1/zz_generated.deepcopy.go
Added MLFlowExportType enum (metrics, artifacts) and MLFlowOutput configuration struct with fields for tracking URI, experiment name, run ID, and export list. Introduced helper methods (HasMLFlow, HasMLFlowMetrics, HasMLFlowArtifacts) and corresponding deep copy support.
Driver integration
cmd/lmes_driver/main.go, controllers/lmes/driver/driver.go
Added CLI flags for MLFlow configuration (tracking URI, experiment name, run ID, source name/type, export types, params JSON). Extended DriverOption struct with seven MLFlow-related fields and implemented exportMLFlow method to execute MLFlow exports with Python script invocation.
Controller logic
controllers/lmes/lmevaljob_controller.go
Introduced conditional PVC mounting based on outputs configuration, added buildMLFlowParams helper to marshal MLFlow parameters into JSON, and integrated MLFlow flags and parameter passing into command generation. Wired MLFlow export invocation into updateCompleteStatus completion flow.
CRD schema
config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml
Added mlflow object definition under outputs with required trackingUri (HTTP(S) URL pattern), optional experimentName, runId, and export array fields.

Sequence Diagram

sequenceDiagram
    participant Controller as LMEvalJob<br/>Controller
    participant Driver as Driver<br/>Process
    participant MLFlow as MLFlow<br/>Tracking
    participant PVC as PVC<br/>Storage

    Note over Controller,PVC: Evaluation Execution
    Controller->>Driver: Execute with job config
    Driver->>PVC: Write results (if PVC configured)
    
    Note over Controller,PVC: Completion Handling
    Driver->>Driver: updateCompleteStatus()
    alt MLFlow Export Configured
        Driver->>MLFlow: exportMLFlow() invokes<br/>Python script with params
        MLFlow-->>Driver: Export metrics/artifacts
        Driver->>Driver: Log result (non-fatal if error)
    else No MLFlow
        Driver->>Driver: Skip MLFlow export
    end
    Driver-->>Controller: Job complete

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Areas requiring extra attention:

  • MLFlow parameter marshaling logic in buildMLFlowParams to ensure correct JSON serialization across optional fields
  • Conditional PVC mounting logic in pod creation to verify all mounting scenarios are correctly handled
  • exportMLFlow implementation in driver.go for proper environment variable setup and Python script execution
  • Integration points between controller and driver to verify MLFlow data flows correctly through the DriverOption struct
  • CRD schema validation to ensure trackingUri regex pattern is appropriate for MLFlow URIs
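The trackingUri pattern quoted in the CRD review (^https?://[a-zA-Z0-9.-]+(:[0-9]+)?(/.*)?$) can be sanity-checked directly; note that it rejects file:// backends and bracketed IPv6 hosts:

```python
import re

# trackingUri validation pattern from this PR's CRD schema.
TRACKING_URI_RE = re.compile(r"^https?://[a-zA-Z0-9.-]+(:[0-9]+)?(/.*)?$")


def is_valid_tracking_uri(uri: str) -> bool:
    """True when the URI would pass the CRD's pattern validation."""
    return TRACKING_URI_RE.match(uri) is not None
```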

Poem

🐰 A dash of MLFlow, metrics take flight,
Artifacts export, oh what a sight!
PVCs conditional, smart and lean,
The finest tracking you've ever seen!
Parameters marshal in JSON dreams,
Integration streams! 📊✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required 80.00% threshold. Run @coderabbitai generate docstrings to improve coverage.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped because CodeRabbit's high-level summary is enabled.
  • Title check (✅ Passed): the title accurately and concisely summarizes the main change, adding MLFlow support to LMEvalJob configuration, which is reflected throughout all modified files.


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • In exportMLFlow, consider using exec.CommandContext with the driver context instead of exec.Command so the MLFlow export process is terminated if the job context is cancelled or times out.
  • The MLFlow export currently no-ops when trackingUri is set but export types are empty, even though the CRD allows export to be omitted; consider either providing a sensible default (e.g., metrics+artifacts) or logging a warning so users understand why nothing was exported.
  • buildMLFlowParams dumps many CR fields directly into MLFlow params (including modelArgs/genArgs); it may be worth filtering or redacting known sensitive keys (e.g., API keys, tokens) before marshaling to JSON and sending to MLFlow.
## Individual Comments

### Comment 1
<location> `controllers/lmes/driver/driver.go:778-779` </location>
<code_context>
+		args = append(args, "--params-json", d.Option.MLFlowParamsJSON)
+	}
+
+	// Execute MLFlow export script
+	cmd := exec.Command("python", args...)
+
+	// Set environment variables for the script
</code_context>

<issue_to_address>
**issue (bug_risk):** Running the MLFlow export script without a timeout can cause hangs.

Because exec.Command isn’t tied to a context, a stuck mlflow_export.py (e.g., due to tracking server issues) could block updateCompleteStatus indefinitely and delay job completion. Consider using exec.CommandContext with d.Option.Context (or a derived context with a timeout) so the export can be cancelled cleanly and doesn’t block the driver.
</issue_to_address>
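The Go fix suggested above is exec.CommandContext; the same guard expressed in Python as a hedged sketch (the actual driver is Go, and the 300-second default here is an arbitrary illustration):

```python
# Hedged sketch: run the export subprocess under a hard timeout so a stuck
# tracking server cannot block completion. Mirrors the driver's
# log-and-continue behaviour by returning (ok, output) instead of raising.
import subprocess
import sys


def run_export(script_args, timeout_s=300):
    try:
        proc = subprocess.run(
            [sys.executable, *script_args],  # sys.executable stands in for "python"
            capture_output=True, text=True, timeout=timeout_s,
        )
        return proc.returncode == 0, proc.stdout + proc.stderr
    except subprocess.TimeoutExpired:
        return False, f"mlflow export timed out after {timeout_s}s"
```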

### Comment 2
<location> `controllers/lmes/lmevaljob_controller.go:1428-1437` </location>
<code_context>
+func buildMLFlowParams(job *lmesv1alpha1.LMEvalJob) string {
</code_context>

<issue_to_address>
**🚨 suggestion (security):** MLFlow params payload may become large or include sensitive fields; consider limiting or filtering.

Since buildMLFlowParams serializes multiple structs (ModelArgs, GenArgs, CustomArtifacts, etc.) into one JSON value for --mlflow-params-json, this may become quite large and/or contain secrets (tokens, internal URLs). Please consider restricting which fields are included, redacting known sensitive keys, and/or enforcing a size limit to avoid leaking sensitive data or hitting OS argument/env size limits.

Suggested implementation:

```golang
const mlflowParamsMaxBytes = 8 * 1024

func buildMLFlowParams(job *lmesv1alpha1.LMEvalJob) string {
	if job == nil {
		return ""
	}

	params := map[string]any{
		"crName":    job.Name,
		"namespace": job.Namespace,
		"model":     job.Spec.Model,
		"tasks":     concatTasks(job.Spec.TaskList),
	}

	sanitized := sanitizeMLFlowParams(params)

	data, err := json.Marshal(sanitized)
	if err != nil {
		// In case of unexpected serialization issues, avoid passing partial or unsafe data.
		return ""
	}

	if len(data) > mlflowParamsMaxBytes {
		// If the payload is too large, fall back to a minimal, non-sensitive payload
		// to avoid hitting OS argument/env limits.
		truncated := map[string]any{
			"crName":                job.Name,
			"namespace":             job.Namespace,
			"model":                 job.Spec.Model,
			"tasks":                 concatTasks(job.Spec.TaskList),
			"truncated":             true,
			"truncation_reason":     "mlflow params exceeded maximum allowed size",
			"original_size_bytes":   len(data),
			"max_size_bytes":        mlflowParamsMaxBytes,
		}

		data, err = json.Marshal(truncated)
		if err != nil {
			return ""
		}
	}

	return string(data)
}

// sanitizeMLFlowParams walks the provided value and redacts fields whose keys
// are known or likely to contain sensitive information (tokens, secrets, etc.).
func sanitizeMLFlowParams(v any) any {
	switch typed := v.(type) {
	case map[string]any:
		out := make(map[string]any, len(typed))
		for k, val := range typed {
			if isSensitiveKey(k) {
				out[k] = "***REDACTED***"
				continue
			}
			out[k] = sanitizeMLFlowParams(val)
		}
		return out
	case []any:
		out := make([]any, len(typed))
		for i, val := range typed {
			out[i] = sanitizeMLFlowParams(val)
		}
		return out
	default:
		// For other scalar types, return as-is.
		return v
	}
}

func isSensitiveKey(key string) bool {
	k := strings.ToLower(key)

	// Exact matches for common sensitive keys.
	switch k {
	case "token", "access_token", "refresh_token", "api_key", "apikey", "password", "secret", "client_secret", "auth_token", "authorization":
		return true
	}

	// Heuristic: redact fields containing these substrings.
	sensitiveSubstrings := []string{
		"token",
		"secret",
		"password",
		"passwd",
		"api_key",
		"apikey",
		"credential",
		"auth",
	}

	for _, sub := range sensitiveSubstrings {
		if strings.Contains(k, sub) {
			return true
		}
	}

	return false
}

```

1. Ensure the following imports are present at the top of controllers/lmes/lmevaljob_controller.go:
   - "encoding/json"
   - "strings"

   For example, inside the existing import block:
   - Add: encoding/json
   - Add: strings

2. If there is already a constant or configuration system for limits, you may want to:
   - Replace the hard-coded mlflowParamsMaxBytes (8 * 1024) with a value derived from configuration or a shared constant to align with project conventions.

3. If other parts of the code build the params map with additional nested structs (ModelArgs, GenArgs, CustomArtifacts, etc.):
   - Make sure they are converted to map[string]any / []any or JSON-serializable types before being passed into sanitizeMLFlowParams so that the redaction logic can traverse them correctly.
</issue_to_address>



Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml (1)

279-305: Tighten mlflow schema: dedupe export, require at least one item, basic string guards, and lock object shape.

Adds safer validation without changing behavior. Defaults can mirror the example (metrics+artifacts).

                   mlflow:
                     description: Export results to MLFlow tracking server
                     properties:
                       experimentName:
                         description: ExperimentName is the name of the MLFlow experiment
                         type: string
+                        minLength: 1
                       export:
                         description: Export defines what to export to MLFlow (metrics,
                           artifacts, or both)
                         items:
                           description: MLFlowExportType defines what to export to
                             MLFlow
                           enum:
                           - metrics
                           - artifacts
                           type: string
-                        type: array
+                        type: array
+                        minItems: 1
+                        uniqueItems: true
+                        x-kubernetes-list-type: set
+                        default:
+                        - metrics
+                        - artifacts
                       runId:
                         description: RunId is the specific MLFlow run ID to use (optional)
                         type: string
+                        minLength: 1
                       trackingUri:
                         description: TrackingUri is the MLFlow tracking server URI
+                        format: uri
                         pattern: ^https?://[a-zA-Z0-9.-]+(:[0-9]+)?(/.*)?$
                         type: string
                     required:
                     - trackingUri
-                    type: object
+                    type: object
+                    additionalProperties: false

Optional note: if you plan to support non-HTTP backends (e.g., file:// or IPv6 hosts), we can relax the trackingUri pattern later.

api/lmes/v1alpha1/lmevaljob_types.go (1)

396-405: MLFlow API surface is coherent; consider clarifying/strengthening Export semantics

The MLFlow additions (MLFlowExportType, MLFlowOutput, Outputs.MLFlow, and the HasMLFlow* helpers) are structurally sound and integrate cleanly with the rest of the spec and controller helpers.

One behavioral nuance to double‑check: if a user sets trackingUri but leaves export empty, HasMLFlow will be true while both HasMLFlowMetrics and HasMLFlowArtifacts will return false, so the driver will receive no --mlflow-export-type flags. This effectively disables export while still requiring a valid trackingUri. If that’s unintended, you might:

  • enforce export as required when mlflow is present (CRD validation / webhook), or
  • treat an empty export as “export both metrics and artifacts” (e.g., fill defaults before wiring down to the driver).

Otherwise, the design looks good.

Also applies to: 407-421, 423-433, 637-666

controllers/lmes/driver/driver.go (1)

731-820: exportMLFlow implementation is straightforward; consider minor robustness tweaks

The exportMLFlow method correctly:

  • short‑circuits when tracking URI is empty or no export types are provided,
  • passes all MLFlow options via both CLI args and environment,
  • uses exec.Command("python", args...) (no shell), avoiding injection, and
  • logs combined output on success or failure.

Two small robustness ideas you might consider (non‑blocking):

  • Guard against the export script not being present or executable by enriching the error message with a hint (path and maybe os.IsNotExist check) so operators can diagnose image issues quickly.
  • If MLFlowExportTypes could ever be user‑provided outside the CRD path, you might validate values (e.g. restrict to metrics/artifacts) before passing them through.

Otherwise this looks good.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b3ba151 and 725ecc2.

📒 Files selected for processing (6)
  • api/lmes/v1alpha1/lmevaljob_types.go (2 hunks)
  • api/lmes/v1alpha1/zz_generated.deepcopy.go (2 hunks)
  • cmd/lmes_driver/main.go (2 hunks)
  • config/crd/bases/trustyai.opendatahub.io_lmevaljobs.yaml (1 hunks)
  • controllers/lmes/driver/driver.go (3 hunks)
  • controllers/lmes/lmevaljob_controller.go (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
controllers/lmes/lmevaljob_controller.go (4)
api/lmes/v1alpha1/lmevaljob_types.go (7)
  • Outputs (423-433)
  • LMEvalJob (704-710)
  • TaskList (244-254)
  • CustomArtifacts (140-153)
  • CustomTasks (239-242)
  • GitSource (216-230)
  • ChatTemplate (508-511)
controllers/job_mgr/job_mgr_controller.go (1)
  • LMEvalJob (46-48)
controllers/lmes/constants.go (2)
  • AllowOnline (37-37)
  • AllowCodeExecution (38-38)
api/lmes/v1alpha1/groupversion_info.go (1)
  • KindName (30-30)
cmd/lmes_driver/main.go (3)
controllers/lmes/constants.go (2)
  • OutputPath (28-28)
  • AllowOnline (37-37)
controllers/lmes/driver/driver.go (1)
  • DefaultPort (48-48)
api/lmes/v1alpha1/lmevaljob_types.go (1)
  • CustomArtifacts (140-153)
api/lmes/v1alpha1/zz_generated.deepcopy.go (1)
api/lmes/v1alpha1/lmevaljob_types.go (2)
  • MLFlowOutput (408-421)
  • MLFlowExportType (398-398)
controllers/lmes/driver/driver.go (2)
cmd/lmes_driver/main.go (1)
  • OutputPath (37-37)
controllers/lmes/constants.go (1)
  • OutputPath (28-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Sourcery review
  • GitHub Check: deploy
  • GitHub Check: build
🔇 Additional comments (5)
api/lmes/v1alpha1/zz_generated.deepcopy.go (1)

421-449: MLFlow DeepCopy implementation looks correct and consistent

The new MLFlowOutput.DeepCopyInto/DeepCopy and the Outputs.MLFlow deep copy block follow the same patterns as other types in this file and avoid aliasing pointer/slice fields. No issues found.

Also applies to: 538-555

controllers/lmes/lmevaljob_controller.go (2)

912-920: PVC volume/volumeMount gating now matches outputs configuration

Conditionally adding the outputs volume and mount only when Outputs exists and a managed or existing PVC is configured is consistent with the reconciliation logic and avoids mounting unused PVCs. No issues here.

Also applies to: 930-950


1580-1603: MLFlow CLI wiring from Outputs to driver looks correct

The MLFlow block in generateCmd correctly:

  • checks Outputs != nil and HasMLFlow() before accessing the pointer,
  • passes tracking URI, optional experiment/run IDs, and each export type as repeatable --mlflow-export-type flags,
  • sets --mlflow-source-name to <namespace>/<name> and --mlflow-source-type to KindName, and
  • appends --mlflow-params-json only when JSON construction succeeds.

This aligns with the driver’s DriverOption fields and the CLI flags, and it gracefully degrades if param marshaling fails.

controllers/lmes/driver/driver.go (1)

76-84: Completion hook for MLFlow export is well‑placed and non‑disruptive

Wiring MLFlow export from updateCompleteStatus only when:

  • the run itself had no error, and
  • result collection succeeded,

and then treating MLFlow failures as non‑fatal (log only, keep job as succeeded) is a sensible contract for an optional integration. The new DriverOption MLFlow fields are also straightforward and match how the controller/CLI populate them.

Also applies to: 445-472

cmd/lmes_driver/main.go (1)

52-83: CLI MLFlow flags and wiring into DriverOption look consistent

The new mlflow-* flags and mlflowExportTypes aggregation integrate cleanly:

  • strArrayArg is an appropriate choice for repeatable --mlflow-export-type flags,
  • all MLFlow options are passed through to DriverOption and then to the driver, and
  • existing flows (copy, get-status, shutdown) are unaffected.

No issues spotted on the CLI side.

Also applies to: 130-153
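For readers unfamiliar with repeatable flags: a strArrayArg-style type is just a flag.Value whose Set appends. A self-contained sketch in that spirit (names are illustrative; the driver's actual implementation may differ):

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// strArrayArg collects every occurrence of a repeatable string flag.
type strArrayArg []string

func (s *strArrayArg) String() string { return strings.Join(*s, ",") }

// Set is called once per flag occurrence and appends the value.
func (s *strArrayArg) Set(v string) error {
	*s = append(*s, v)
	return nil
}

func main() {
	var exportTypes strArrayArg
	fs := flag.NewFlagSet("driver", flag.ContinueOnError)
	fs.Var(&exportTypes, "mlflow-export-type", "what to export to MLFlow (repeatable)")
	_ = fs.Parse([]string{
		"--mlflow-export-type", "metrics",
		"--mlflow-export-type", "artifacts",
	})
	fmt.Println(exportTypes) // [metrics artifacts]
}
```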

Comment on lines +1428 to +1503
func buildMLFlowParams(job *lmesv1alpha1.LMEvalJob) string {
    if job == nil {
        return ""
    }

    params := map[string]any{
        "crName":    job.Name,
        "namespace": job.Namespace,
        "model":     job.Spec.Model,
        "tasks":     concatTasks(job.Spec.TaskList),
    }

    if len(job.Spec.TaskList.TaskNames) > 0 {
        params["taskNames"] = job.Spec.TaskList.TaskNames
    }

    if len(job.Spec.TaskList.TaskRecipes) > 0 {
        params["taskRecipes"] = job.Spec.TaskList.TaskRecipes
    }

    if job.Spec.TaskList.CustomArtifacts != nil {
        params["customArtifacts"] = job.Spec.TaskList.CustomArtifacts
    }

    if job.Spec.TaskList.HasCustomTasksWithGit() {
        params["customTasksGit"] = job.Spec.TaskList.CustomTasks.Source.GitSource
    }

    if len(job.Spec.ModelArgs) > 0 {
        params["modelArgs"] = job.Spec.ModelArgs
    }

    if len(job.Spec.GenArgs) > 0 {
        params["genArgs"] = job.Spec.GenArgs
    }

    if job.Spec.NumFewShot != nil {
        params["numFewShot"] = job.Spec.NumFewShot
    }

    if job.Spec.Limit != "" {
        params["limit"] = job.Spec.Limit
    }

    if job.Spec.LogSamples != nil {
        params["logSamples"] = job.Spec.LogSamples
    }

    if job.Spec.BatchSize != nil {
        params["batchSize"] = job.Spec.BatchSize
    }

    if job.Spec.AllowOnline != nil {
        params["allowOnline"] = job.Spec.AllowOnline
    }

    if job.Spec.AllowCodeExecution != nil {
        params["allowCodeExecution"] = job.Spec.AllowCodeExecution
    }

    if job.Spec.SystemInstruction != "" {
        params["systemInstruction"] = job.Spec.SystemInstruction
    }

    if job.Spec.ChatTemplate != nil {
        params["chatTemplate"] = job.Spec.ChatTemplate
    }

    paramsJSON, err := json.Marshal(params)
    if err != nil {
        ctrl.Log.WithName("mlflow").Error(err, "failed to marshal MLFlow parameters for LMEvalJob", "jobName", job.Name)
        return ""
    }

    return string(paramsJSON)
}

⚠️ Potential issue | 🟠 Major

Be careful exporting potentially sensitive fields in MLFlow params JSON

buildMLFlowParams helpfully captures rich context for MLFlow (model, tasks, recipes, custom artifacts, args, flags, etc.), but note that both ModelArgs and GenArgs are included verbatim. In lm-eval, --model_args is commonly used to carry provider credentials (e.g. api_key=... for OpenAI/HF), so this will propagate those secrets into:

  • the --mlflow-params-json CLI argument, and
  • the MLFLOW_PARAMS_JSON environment variable consumed by the export script (and potentially recorded as params/tags in MLFlow).

That’s a significant expansion of the exposure surface for API keys and similar secrets.

Consider at least one of:

  • Explicitly filtering/redacting known sensitive keys (e.g. api_key, openai_api_key, hf_token, password, etc.) from modelArgs/genArgs before putting them into params.
  • Making inclusion of modelArgs/genArgs in MLFlow params opt‑in via a separate flag or field.
  • Documenting clearly that MLFlow export will store these values so operators can avoid putting secrets into modelArgs.

I’d treat some mitigation here as important before widespread use of the feature.

🤖 Prompt for AI Agents
controllers/lmes/lmevaljob_controller.go around lines 1428-1503:
buildMLFlowParams currently includes job.Spec.ModelArgs and job.Spec.GenArgs
verbatim which can leak secrets (API keys/passwords) into MLFlow; change it to
sanitize those maps before adding them to params by redacting known sensitive
keys (e.g. api_key, openai_api_key, hf_token, password, secret, token,
access_key, secret_key) — implement a small redactMap(map[string]any) ->
map[string]any that returns a shallow copy replacing values of matching keys
with "<REDACTED>" (case-insensitive) and use that output when setting
"modelArgs" and "genArgs"; update unit tests or add a test for redaction and
adjust any callers/log messages accordingly.
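A redactMap along the lines the prompt describes could look like the sketch below. Note that the real ModelArgs/GenArgs are lists of name/value Args rather than maps, so the production version would iterate those instead; the key list and substring-matching rule here are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// sensitiveKeys is an illustrative deny-list; extend as needed.
var sensitiveKeys = []string{
	"api_key", "openai_api_key", "hf_token", "password",
	"secret", "token", "access_key", "secret_key",
}

// redactMap returns a shallow copy of in, replacing values whose key
// matches a sensitive entry (case-insensitive substring) with "<REDACTED>".
func redactMap(in map[string]any) map[string]any {
	out := make(map[string]any, len(in))
	for k, v := range in {
		lk := strings.ToLower(k)
		redacted := false
		for _, s := range sensitiveKeys {
			if strings.Contains(lk, s) {
				redacted = true
				break
			}
		}
		if redacted {
			out[k] = "<REDACTED>"
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	fmt.Println(redactMap(map[string]any{"model": "tinyllama", "api_key": "s3cr3t"}))
	// map[api_key:<REDACTED> model:tinyllama]
}
```

Substring matching is deliberately broad (it also catches e.g. openai_api_key via api_key); exact-match against a larger list is a reasonable alternative if false positives matter.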

@openshift-ci

openshift-ci bot commented Nov 24, 2025

@ruivieira: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/trustyai-service-operator-e2e
Commit: 725ecc2
Required: true
Rerun command: /test trustyai-service-operator-e2e

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

kind/enhancement New feature or request
