@ruivieira
Member

@ruivieira ruivieira commented Jul 30, 2025

Adds support for configuring LMEval online mode and code execution permissions through DataScienceCluster ConfigMaps.

See RHOAIENG-30963.

Depends (functionally) on opendatahub-io/opendatahub-operator#2225, but can be merged without it.

Summary by Sourcery

Add support for configuring LMEval online mode and code execution permissions via a DataScienceCluster ConfigMap and update pod creation logic to respect these overrides.

New Features:

  • Introduce DataScienceCluster ConfigMap constants and a getDSCLMEvalSettings method to parse allowOnline and allowCodeExecution flags.
  • Enhance CreatePod and generateCmd to consult DSC ConfigMap settings, giving them precedence over operator defaults for LMEval online mode and code execution.
  • Allow CreatePod to accept a nil reconciler for contexts outside the controller and update job manager to use this signature.

Tests:

  • Add unit tests for getDSCLMEvalSettings to validate parsing of DSC ConfigMap values.
  • Extend CreatePod tests to verify behavior when DSC settings override operator and job spec permissions.
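
The parsing step described in the summary above could look roughly like the sketch below. It is not the code from this PR: the key names and the helper names `parseOptionalBool` and `parseLMEvalSettings` are hypothetical, and the ConfigMap is reduced to its `Data` map so the example runs without Kubernetes dependencies. A nil pointer is used to mean "not set", matching the `(*bool, *bool, error)` return shape shown in the reviewer's guide.

```go
package main

import (
	"fmt"
	"strconv"
)

// Hypothetical key names; the PR defines its own constants in constants.go.
const (
	keyAllowOnline        = "lmeval.allowOnline"
	keyAllowCodeExecution = "lmeval.allowCodeExecution"
)

// parseOptionalBool returns nil when the key is absent, so callers can
// tell "not set" apart from an explicit false.
func parseOptionalBool(data map[string]string, key string) (*bool, error) {
	raw, ok := data[key]
	if !ok {
		return nil, nil
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return nil, fmt.Errorf("invalid value %q for %s: %w", raw, key, err)
	}
	return &v, nil
}

// parseLMEvalSettings stands in for the parsing half of getDSCLMEvalSettings.
// It takes the ConfigMap's Data map directly (an assumption for this sketch).
func parseLMEvalSettings(data map[string]string) (allowOnline, allowCodeExecution *bool, err error) {
	if allowOnline, err = parseOptionalBool(data, keyAllowOnline); err != nil {
		return nil, nil, err
	}
	if allowCodeExecution, err = parseOptionalBool(data, keyAllowCodeExecution); err != nil {
		return nil, nil, err
	}
	return allowOnline, allowCodeExecution, nil
}

func main() {
	online, code, err := parseLMEvalSettings(map[string]string{keyAllowOnline: "true"})
	fmt.Println(*online, code == nil, err)
}
```

Returning pointers rather than plain booleans lets the caller fall back to the operator default only when a flag is genuinely missing.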

@ruivieira ruivieira self-assigned this Jul 30, 2025
@ruivieira ruivieira added the kind/enhancement (New feature or request) and lm-eval (Issues related to LM-Eval) labels Jul 30, 2025
@sourcery-ai
Contributor

sourcery-ai bot commented Jul 30, 2025

Reviewer's Guide

This PR introduces support for overriding LMEval’s online mode and code execution permissions through a DataScienceCluster ConfigMap. It adds constants and a new helper to read the ConfigMap, refactors the CreatePod and generateCmd signatures to accept the reconciler and the DSC flags, and applies precedence logic (ConfigMap → operator defaults → job spec) to determine and log the effective permissions when constructing pods.

Sequence diagram for pod creation with DSC ConfigMap precedence

sequenceDiagram
    participant Reconciler as LMEvalJobReconciler
    participant ConfigMap as DataScienceCluster ConfigMap
    participant Operator as Operator Defaults
    participant Job as LMEvalJob.Spec
    participant Pod as Pod
    Reconciler->>ConfigMap: getDSCLMEvalSettings()
    alt ConfigMap found and valid
        ConfigMap-->>Reconciler: allowOnline, allowCodeExecution
    else ConfigMap not found/invalid
        ConfigMap-->>Reconciler: nil, nil
    end
    Reconciler->>Operator: get AllowOnline, AllowCodeExecution
    Reconciler->>Job: get AllowOnline, AllowCodeExecution
    Note right of Reconciler: Precedence: ConfigMap > Operator > Job.Spec
    Reconciler->>Pod: CreatePod(..., effective permissions)

Class diagram for LMEvalJobReconciler and ConfigMap integration

classDiagram
    class LMEvalJobReconciler {
        +getDSCLMEvalSettings(ctx, log) (*bool, *bool, error)
    }
    class serviceOptions {
        +AllowOnline: bool
        +AllowCodeExecution: bool
        +PodImage: string
        +ImagePullPolicy: string
        +DriverPort: int
    }
    class LMEvalJob {
        +Spec: LMEvalJobSpec
        +Status: LMEvalJobStatus
    }
    class LMEvalJobSpec {
        +AllowOnline: *bool
        +AllowCodeExecution: *bool
        +Pod: PodSpec
    }
    class corev1.ConfigMap {
        +Data: map[string]string
    }
    LMEvalJobReconciler --|> serviceOptions : uses
    LMEvalJobReconciler --|> LMEvalJob : manages
    LMEvalJobReconciler --> corev1.ConfigMap : reads
    LMEvalJob --> LMEvalJobSpec : has
    LMEvalJobSpec --> PodSpec : has

Flow diagram for permission resolution logic

flowchart TD
    A[Start Pod Creation] --> B{DSC ConfigMap exists?}
    B -- Yes --> C{Valid allowOnline/allowCodeExecution?}
    C -- Yes --> D[Use ConfigMap values]
    C -- No --> E[Use operator defaults]
    B -- No --> E
    D --> F{Job.Spec overrides?}
    E --> F
    F -- Yes --> G[Apply Job.Spec if allowed by effective permissions]
    F -- No --> H[Use effective permissions]
    G --> I[Create Pod]
    H --> I

File-Level Changes

Change: Introduce DataScienceCluster ConfigMap parsing for LMEval settings
Details:
  • Define DSCConfigMapName and key constants
  • Implement getDSCLMEvalSettings to fetch and parse allowOnline and allowCodeExecution
  • Add unit tests covering ConfigMap presence, absence, invalid values, and CreatePod behavior
Files: constants.go, lmevaljob_controller.go, dsc_config_test.go

Change: Refactor CreatePod and generateCmd signatures for DSC integration
Details:
  • Extend CreatePod to take *LMEvalJobReconciler
  • Add dscAllowOnline parameter to generateCmd
  • Update all CreatePod and generateCmd calls in controllers, job_mgr, and tests to pass reconciler or nil
Files: lmevaljob_controller.go, job_mgr_controller.go, lmevaljob_controller_test.go

Change: Apply permission precedence logic in CreatePod for online mode and code execution
Details:
  • Retrieve DSC settings when reconciler available
  • Determine codeExecutionAllowed and onlineModeAllowed based on ConfigMap vs operator config
  • Adjust envVars and command flags accordingly, with logs indicating the chosen source
Files: lmevaljob_controller.go

@ruivieira ruivieira moved this to In Review in TrustyAI planning Jul 30, 2025
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @ruivieira - I've reviewed your changes - here's some feedback:

  • getDSCLMEvalSettings currently ignores the "opendatahub.io/config-source" annotation and will parse any ConfigMap with the DSC name—add an annotation check so unannotated maps are skipped as your tests expect.
  • CreatePod calls getDSCLMEvalSettings on every invocation, causing repeated API requests—consider fetching and caching the DSC settings once per reconciliation and passing them in.
  • The permission evaluation for online mode and code execution is embedded deeply in CreatePod, making its signature bulkier—extract that logic into a helper or serviceOptions extension to simplify CreatePod.
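
As an illustration of the first review point above, an annotation gate could be sketched as follows. This is a sketch under stated assumptions, not the PR's code: `configMap` is a hypothetical stand-in for `corev1.ConfigMap`, `isDSCManaged` is a hypothetical helper, and the check only tests that the `opendatahub.io/config-source` annotation is present, because the expected annotation value is not stated in this conversation.

```go
package main

import "fmt"

// configMap is a hypothetical stand-in for corev1.ConfigMap, reduced to the
// two fields this sketch needs.
type configMap struct {
	Annotations map[string]string
	Data        map[string]string
}

// Annotation key quoted in the review comment.
const configSourceAnnotation = "opendatahub.io/config-source"

// isDSCManaged reports whether the ConfigMap carries the config-source
// annotation. Only presence is checked (an assumption for this sketch).
func isDSCManaged(cm *configMap) bool {
	if cm == nil {
		return false
	}
	// Reading from a nil map is safe in Go and yields ok == false.
	_, ok := cm.Annotations[configSourceAnnotation]
	return ok
}

func main() {
	unannotated := &configMap{Data: map[string]string{"allowOnline": "true"}}
	annotated := &configMap{
		Annotations: map[string]string{configSourceAnnotation: "example"},
		Data:        map[string]string{"allowOnline": "true"},
	}
	fmt.Println(isDSCManaged(unannotated), isDSCManaged(annotated))
}
```

A caller could run this gate once per reconciliation, parse the settings only when it passes, and hand the result to CreatePod, which would also address the second review point about repeated API requests.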


@ruivieira ruivieira marked this pull request as draft July 30, 2025 10:49
@ruivieira ruivieira moved this from In Review to In Progress in TrustyAI planning Jul 30, 2025
@ruivieira
Member Author

ruivieira commented Jul 30, 2025

@sourcery-ai Your review does not appear to be correct.
The DSC ConfigMap overrides the TrustyAI operator's default configuration, but both of these only set the permissions for online mode and code execution. The user's LMEval settings then determine which mode is actually used, within the limits of those permissions.
As such, "Precedence: ConfigMap > Operator > Job.Spec" does not make sense.
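
One way to read the two-stage model described in this comment is sketched below. It is an interpretation, not code from the PR: `effectivePermission` and `resolveMode` are hypothetical names, and the assumption is that a job's request is honored only when the resolved permission allows it.

```go
package main

import "fmt"

// effectivePermission resolves what the cluster permits: the DSC ConfigMap
// value wins when it is set, otherwise the operator default applies.
func effectivePermission(dscOverride *bool, operatorDefault bool) bool {
	if dscOverride != nil {
		return *dscOverride
	}
	return operatorDefault
}

// resolveMode decides what a job actually gets: the feature is enabled only
// if the job asks for it and the permission allows it.
func resolveMode(jobRequest *bool, permitted bool) bool {
	return permitted && jobRequest != nil && *jobRequest
}

func main() {
	yes, no := true, false

	// DSC forbids online mode, so the job's request is not honored.
	fmt.Println(resolveMode(&yes, effectivePermission(&no, true)))

	// No DSC override: the operator default permits it and the job asks for it.
	fmt.Println(resolveMode(&yes, effectivePermission(nil, true)))

	// Permitted, but the job does not request it.
	fmt.Println(resolveMode(nil, effectivePermission(&yes, false)))
}
```

Under this reading the job spec is not a third tier in a precedence chain. It selects a mode inside the boundary that the ConfigMap or the operator defaults define.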

@ruivieira ruivieira marked this pull request as ready for review July 30, 2025 10:52
@github-actions

github-actions bot commented Jul 30, 2025

PR image build and manifest generation completed successfully!

📦 PR image: quay.io/trustyai/trustyai-service-operator:701b1c7e2acf67fc96cd5a0061dd57ce36b414e5

📦 LMES driver image: quay.io/trustyai/ta-lmes-driver:latest

📦 LMES job image: quay.io/trustyai/ta-lmes-job:latest

📦 Guardrails orchestrator image: quay.io/trustyai/ta-guardrails-orchestrator:latest

🗂️ CI manifests

      devFlags:
        manifests:
          - contextDir: config
            sourcePath: ''
            uri: https://api.github.com/repos/trustyai-explainability/trustyai-service-operator-ci/tarball/operator-701b1c7e2acf67fc96cd5a0061dd57ce36b414e5

@openshift-ci

openshift-ci bot commented Jul 30, 2025

@ruivieira: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/trustyai-service-operator-e2e
Commit: 701b1c7
Details: link
Required: true
Rerun command: /test trustyai-service-operator-e2e


DefaultBatchSize = "1"
DefaultDetectDevice = true
ServiceName = "LMES"
// DataSienceCluster ConfigMap constants
Contributor

Typo here

@openshift-ci

openshift-ci bot commented Sep 18, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: sheltoncyril

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-robot
Collaborator

PR needs rebase.

Labels

kind/enhancement (New feature or request), lgtm, lm-eval (Issues related to LM-Eval), needs-rebase, ok-to-test

Projects

Status: In Progress

3 participants