feat: Add MLFlow integration#646

Merged
ruivieira merged 19 commits into trustyai-explainability:main from ruivieira:evalhub-mlflow
Feb 16, 2026

Conversation

ruivieira (Member) commented Feb 14, 2026

Summary by Sourcery

Integrate EvalHub with MLFlow by configuring RBAC, environment variables, and pod volumes required for Kubernetes-authenticated access.

New Features:

  • Grant EvalHub service accounts MLFlow access via namespace-scoped RoleBindings to the built-in edit ClusterRole.
  • Expose MLFlow-related configuration to the EvalHub deployment via environment variables for CA cert path, workspace, and projected token path.
  • Mount service CA and a projected MLFlow service account token into the EvalHub pod to support secure communication with MLFlow.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added MLFlow integration support with automatic token and certificate management.
    • Enabled service certificate authority (CA) injection for secure communication.
  • Chores

    • Restructured role-based access control (RBAC) permissions for improved security and granularity.
    • Enhanced service account configuration for better resource management and isolation.

ruivieira self-assigned this Feb 14, 2026
ruivieira added the kind/enhancement (New feature or request) label Feb 14, 2026
sourcery-ai bot (Contributor) commented Feb 14, 2026

Reviewer's Guide

Adds MLFlow integration by granting EvalHub service accounts the necessary Kubernetes RBAC via RoleBindings to the built-in "edit" ClusterRole, and wiring MLFlow-related environment variables and projected service account token/CA volumes into the EvalHub deployment.

Sequence diagram for MLFlow kubernetes-auth SubjectAccessReview with new RoleBinding

sequenceDiagram
    actor User
    participant EvalHubPod
    participant MlflowServer
    participant KubeApi
    participant RbacAuthz

    User->>EvalHubPod: Invoke MLFlow-tracked operation
    EvalHubPod->>EvalHubPod: Read MLFLOW_TOKEN_PATH
    EvalHubPod->>MlflowServer: HTTP request with SA token
    MlflowServer->>KubeApi: SubjectAccessReview for token in workspace namespace
    KubeApi->>RbacAuthz: Evaluate permissions using RoleBindings and ClusterRoles
    RbacAuthz->>RbacAuthz: Find RoleBinding evalhub-mlflow-proxy/jobs
    RbacAuthz->>RbacAuthz: Resolve ClusterRole edit
    RbacAuthz-->>KubeApi: Allow if edit grants requested verbs
    KubeApi-->>MlflowServer: SubjectAccessReview allowed
    MlflowServer-->>EvalHubPod: Request succeeds
    EvalHubPod-->>User: Operation completed

Flow diagram for createServiceAccount with MLFlow RoleBindings

graph TD
    Start["Start createServiceAccount"] --> CreateSA["Create main ServiceAccount"]
    CreateSA --> CreateJobsSA["Create jobs ServiceAccount"]
    CreateJobsSA --> RBProxy["Create evalhub proxy RoleBinding (existing)"]
    RBProxy --> RBJobsProxy["Create jobs proxy RoleBinding (existing)"]
    RBJobsProxy --> RBMlflowProxy["createMLFlowAccessRoleBinding for main SA (suffix proxy)"]
    RBMlflowProxy --> RBMlflowJobs["createMLFlowAccessRoleBinding for jobs SA (suffix jobs)"]
    RBMlflowJobs --> End["Done"]

File-Level Changes

Change: Grant MLFlow access permissions to EvalHub service accounts via RoleBindings to the built-in "edit" ClusterRole.
Details:
  • Invoke a new helper to create MLFlow access RoleBindings for both the main and jobs service accounts during service account creation.
  • Introduce a constant for the MLFlow access ClusterRole and implement helper logic to create or reconcile RoleBindings, including owner references, logging, and subject/RoleRef update handling.
Files: controllers/evalhub/service_accounts.go

Change: Wire MLFlow authentication and TLS configuration into the EvalHub deployment via env vars and volumes.
Details:
  • Add MLFlow-related environment variables (CA cert path, workspace namespace, token path) to the EvalHub container spec.
  • Mount a ConfigMap-based volume for the service CA and a projected volume exposing a service account token for MLFlow, and attach corresponding volume mounts to the container.
  • Define constants for service CA and MLFlow token volume names, paths, filenames, and token expiration to standardize configuration.
Files: controllers/evalhub/deployment.go, controllers/evalhub/constants.go
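For readers who want to see how a workload might consume the mounted token and CA bundle, here is a minimal Go sketch. It is not taken from the PR: only the environment variable names MLFLOW_TOKEN_PATH and MLFLOW_CA_CERT_PATH come from the description above, while the helper names, the timeout, and the example URL are assumptions made for illustration. The token is re-read on every request because projected service account tokens are rotated on disk.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// bearerValue turns the raw contents of a token file into an
// Authorization header value. (Hypothetical helper, not from the PR.)
func bearerValue(raw []byte) (string, error) {
	token := strings.TrimSpace(string(raw))
	if token == "" {
		return "", errors.New("empty service account token")
	}
	return "Bearer " + token, nil
}

// newMLFlowRequest reads the projected token from MLFLOW_TOKEN_PATH each
// time it is called, so a rotated token is picked up automatically.
func newMLFlowRequest(method, url string) (*http.Request, error) {
	raw, err := os.ReadFile(os.Getenv("MLFLOW_TOKEN_PATH"))
	if err != nil {
		return nil, fmt.Errorf("reading token: %w", err)
	}
	auth, err := bearerValue(raw)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", auth)
	return req, nil
}

// newMLFlowHTTPClient builds a client that trusts only the CA bundle
// found at MLFLOW_CA_CERT_PATH.
func newMLFlowHTTPClient() (*http.Client, error) {
	pem, err := os.ReadFile(os.Getenv("MLFLOW_CA_CERT_PATH"))
	if err != nil {
		return nil, fmt.Errorf("reading CA bundle: %w", err)
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pem) {
		return nil, errors.New("no certificates found in CA bundle")
	}
	return &http.Client{
		Timeout: 10 * time.Second, // assumed value
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
		},
	}, nil
}

func main() {
	client, err := newMLFlowHTTPClient()
	if err != nil {
		fmt.Println("CA bundle not available:", err)
		return
	}
	// The URL below is a placeholder, not a real MLFlow endpoint.
	req, err := newMLFlowRequest(http.MethodGet, "https://mlflow.example.svc:8443/health")
	if err != nil {
		fmt.Println("token not available:", err)
		return
	}
	fmt.Printf("ready to send %s %s (timeout %s)\n", req.Method, req.URL, client.Timeout)
}
```

Outside a pod with the two mounts present, the program only reports which file is missing; it does not contact any server.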

coderabbitai bot (Contributor) commented Feb 14, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Walkthrough

This PR refactors EvalHub RBAC from proxy-centric to API and MLFlow-centric authorization with per-instance scoping. It introduces Service CA and MLFlow token handling in deployments, replaces monolithic resource-manager roles with granular least-privilege roles, and updates tests and manifests accordingly.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Constants & Configuration: controllers/evalhub/constants.go, controllers/evalhub/configmap.go | Added Service CA and MLFlow token configuration constants; enhanced proxy config with "name" field in resourceAttributes for per-instance identity. |
| Deployment Configuration: controllers/evalhub/deployment.go, controllers/evalhub/deployment_test.go | Added MLFLOW_CA_CERT_PATH, MLFLOW_WORKSPACE, MLFLOW_TOKEN_PATH environment variables; injected service-ca and mlflow-token volume mounts; added corresponding volumes (ConfigMap-based and projected ServiceAccountToken). Test expectations updated for increased volume and mount counts; service account naming changed from -proxy to -api suffix. |
| RBAC & Service Account Reconciliation: controllers/evalhub/service_accounts.go, controllers/evalhub/evalhub_controller.go | Major refactor: renamed proxy-related service account generation to API-oriented (-api suffix); introduced per-instance Roles for API access and jobs API access; added MLFlow access RoleBindings for both API and jobs service accounts; replaced monolithic resource-manager RBAC with granular role-binding creation; implemented equality utilities for RoleBinding/ClusterRoleBinding comparison and idempotent updates; simplified cleanup to single auth-reviewer ClusterRoleBinding deletion. |
| RBAC Manifest Reorganization: config/rbac/kustomization.yaml, config/rbac/role.yaml | Replaced four legacy RBAC entries (evalhub_proxy_role, evalhub_jobs_proxy_role, evalhub_resource_manager_role, evalhub_resource_manager_binding) with nine new granular entries under evalhub/ path; added "roles" resource permission under rbac.authorization.k8s.io; removed "update" verb from rolebindings. |
| New Granular RBAC Roles: config/rbac/evalhub/*_role.yaml | Added six new ClusterRoles: evalhub-auth-reviewer (authentication reviews), evalhub-jobs-writer (batch/jobs CRUD), evalhub-job-config (configmaps CRUD), evalhub-mlflow-access (mlflow experiments full CRUD), evalhub-mlflow-jobs-access (mlflow experiments read-only for jobs), evalhub-service-proxy (services/proxy access). Each defines least-privilege permissions for specific responsibilities. |
| New RBAC Bindings: config/rbac/evalhub/*_binding.yaml | Added five new ClusterRoleBindings: evalhub-jobs-writer-binding, evalhub-mlflow-access-binding, evalhub-mlflow-jobs-binding, evalhub-service-proxy-binding. These bind ClusterRoles to the controller-manager service account to enable operator-created namespace-scoped RoleBindings. |
| Deleted Legacy RBAC Manifests: config/rbac/evalhub_jobs_proxy_role.yaml, config/rbac/evalhub_resource_manager_role.yaml | Removed obsolete proxy and resource-manager ClusterRole definitions; functionality split across new granular roles. |
| Updated Legacy Auth-Reviewer Role: config/rbac/evalhub/evalhub_auth_reviewer_role.yaml | Removed trustyai.opendatahub.io rules (evalhubs/proxy); retained authentication.k8s.io rule for subjectaccessreviews; rebranded metadata from evalhub-proxy-role to evalhub-auth-reviewer-role. |
| Updated Job Config Binding: config/rbac/evalhub/evalhub_job_config_binding.yaml | Changed binding from evalhub-resource-manager-binding to evalhub-job-config-binding; updated ClusterRole reference from evalhub-resource-manager to evalhub-job-config. |
| Unit Tests: controllers/evalhub/unit_test.go | Renamed proxy-related RoleBinding tests to API access semantics; added tests for per-instance API access Role and RoleBinding creation; updated MLFlow access RoleBinding tests with per-instance names; expanded volume/mount expectations; adjusted cleanup tests for auth-reviewer CRB deletion. |
| Integration Tests: controllers/evalhub/proxy_rbac_test.go | Rebranded test suite from "Proxy RBAC" to "API RBAC"; added per-instance API Role and namespace-scoped RoleBinding tests; introduced split resource-manager tests (jobs-writer, job-config); added proxy config name field validation; adjusted all expectations to use -api suffix and per-instance scoping. |
| RBAC Manifest Validation Tests: controllers/evalhub/rbac_manifests_test.go | Added new test validating evalhub_job_config_role.yaml ClusterRole permissions for configmaps (create, delete, get, update verbs). |

Sequence Diagram(s)

sequenceDiagram
    participant Operator as Operator Reconciler
    participant K8sAPI as Kubernetes API
    participant RBAC as RBAC System
    participant Pod as EvalHub Pod
    
    Operator->>K8sAPI: Create per-instance API Access Role
    K8sAPI-->>Operator: Role created
    
    Operator->>K8sAPI: Create API Access RoleBinding<br/>(binds Role to API SA)
    K8sAPI-->>Operator: RoleBinding created
    
    Operator->>K8sAPI: Create MLFlow Access RoleBinding<br/>(binds ClusterRole to API SA)
    K8sAPI-->>Operator: RoleBinding created
    
    Operator->>K8sAPI: Create Deployment with volumes<br/>(service-ca, mlflow-token)
    K8sAPI-->>Operator: Deployment created
    
    K8sAPI->>Pod: Inject service-ca volume<br/>and mlflow-token volume
    K8sAPI->>Pod: Mount volumes as read-only
    K8sAPI->>Pod: Set env vars (MLFLOW_CA_CERT_PATH, MLFLOW_TOKEN_PATH)
    
    Pod->>RBAC: Check permissions for operations
    RBAC-->>Pod: Authorized via per-instance Role<br/>and MLFlow ClusterRole
    
    Pod->>Pod: Access service-ca from mount
    Pod->>Pod: Access mlflow-token from mount

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • feat: Add RBAC, SAs and Roles for EvalHub #633: Directly related — main PR refactors and extends the EvalHub controller code paths introduced in this PR (proxy→API renaming, per-instance RBAC, service-CA and MLFlow handling).
  • feat: Add EvalHub controller #611: Related — refactors EvalHub controller files modified in this PR (service account naming, RBAC structure, deployment configuration) to introduce API and MLFlow-specific authorization constructs.

Suggested labels

ok-to-test, feature

Suggested reviewers

  • tarilabs
  • RobGeada

Poem

🐰 Per-instance roles now bloom so bright,
With MLFlow tokens held tight,
Service CA secrets safely stowed,
No proxy overhead to bear the load,
EvalHub hops forth with least-privilege flight! 🚀

🚥 Pre-merge checks | ✅ 3 | ❌ 1
❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 76.47% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: Add MLFlow integration' directly and clearly summarizes the main objective of the changeset, which is integrating MLFlow with EvalHub. |
| Merge Conflict Detection | ✅ Passed | ✅ No merge conflicts detected when merging into main |

sourcery-ai bot (Contributor) left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • Using the built-in edit ClusterRole for MLFlow access is quite broad; if MLFlow only needs a subset of verbs/resources for its SubjectAccessReview checks, consider defining a narrower Role/ClusterRole to reduce the permission surface.
  • The ServiceAccountTokenProjection for the MLFlow token volume does not set an Audience; consider specifying a dedicated audience expected by MLFlow to avoid overly generic tokens that could be reused by other consumers.
Prompt for AI Agents
Please address the comments from this code review:

## Individual Comments

### Comment 1
<location> `controllers/evalhub/service_accounts.go:261` </location>
<code_context>

+// MLFlow access uses the built-in "edit" ClusterRole which provides the permissions
+// that MLFlow's kubernetes-auth plugin checks via SubjectAccessReview.
+const mlflowAccessClusterRoleName = "edit"
+
+// createMLFlowAccessRoleBinding creates a RoleBinding for a ServiceAccount to the "edit"
</code_context>

<issue_to_address>
**🚨 suggestion (security):** Binding to the built-in "edit" ClusterRole may grant broader permissions than strictly required for MLFlow access.

Relying on the broad "edit" role is convenient but likely over-privileged for MLFlow’s SubjectAccessReview needs. Consider defining a dedicated ClusterRole with only the specific verbs/resources required by the kubernetes-auth plugin and binding to that instead to limit blast radius if MLFlow credentials are compromised.

Suggested implementation:

```golang
// MLFlow access uses a dedicated ClusterRole which provides the minimal permissions
// that MLFlow's kubernetes-auth plugin checks via SubjectAccessReview.
const mlflowAccessClusterRoleName = "mlflow-access"

```

```golang
// createMLFlowAccessRoleBinding creates a RoleBinding for a ServiceAccount to the
// mlflow-access ClusterRole in the instance namespace. This allows the ServiceAccount
// to pass MLFlow's kubernetes-auth SubjectAccessReview checks in the workspace namespace
// while avoiding the broader permissions of the built-in "edit" role.

```

To fully implement the least-privilege approach:
1. Define a `ClusterRole` named `mlflow-access` in your RBAC manifests (YAML) with only the specific verbs/resources required by the MLFlow kubernetes-auth plugin’s SubjectAccessReview checks.
2. Ensure that any existing references in manifests or code that assumed binding to the built-in `edit` ClusterRole are updated to use the new `mlflow-access` role instead.
</issue_to_address>

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@controllers/evalhub/service_accounts.go`:
- Around line 85-95: The RoleBindings for MLFlow are created unconditionally —
wrap the two calls to createMLFlowAccessRoleBinding (for serviceAccountName and
jobsServiceAccountName) in the same MLFlow-enabled check used elsewhere (e.g.,
the deployment MLFlow gate in deployment.go); only call
r.createMLFlowAccessRoleBinding(ctx, instance, serviceAccountName, "proxy") and
r.createMLFlowAccessRoleBinding(ctx, instance, jobsServiceAccountName, "jobs")
when the MLFlow feature flag/config check (the function or boolean you use to
decide MLFlow deployment) returns true so RBAC is only applied when MLFlow is
configured.
- Around line 259-261: The constant mlflowAccessClusterRoleName currently points
to the built-in "edit" ClusterRole which is overly permissive and does not grant
MLFlow the specific SubjectAccessReview permission it needs; create a minimal
custom ClusterRole (e.g., name it
"trustyai-service-operator-evalhub-mlflow-access") that grants
authorization.k8s.io/subjectaccessreviews:create (and optionally
authentication.k8s.io/tokenreviews:create) and replace the value of
mlflowAccessClusterRoleName with that custom role name; ensure any RBAC
manifests/creation logic uses the new ClusterRole name so the service account
binds to the minimal custom role rather than "edit".
🧹 Nitpick comments (3)
controllers/evalhub/deployment.go (2)

122-133: MLFlow environment variables are injected unconditionally.

MLFLOW_CA_CERT_PATH, MLFLOW_WORKSPACE, and MLFLOW_TOKEN_PATH are always set regardless of whether MLFlow integration is actually needed. This is consistent with the unconditional volume mounts (flagged above). If you gate the volumes behind an MLFlow-enabled check, these env vars should be gated similarly.


146-155: MLFlow volume mounts also unconditionally added — keep in sync with any gating applied to the volumes.

Same concern as the volumes and env vars above. If the volumes are gated or made optional, these mounts should follow the same pattern.
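A minimal sketch of what a single gate could look like, so that env vars, volumes, and mounts cannot drift apart. It uses local stand-in types instead of the Kubernetes API types to stay self-contained. The `enabled` flag, the function name, and the mount paths are hypothetical; only the env var names and the service-ca / mlflow-token volume names come from this PR.

```go
package main

import "fmt"

// EnvVar is a local stand-in for corev1.EnvVar.
type EnvVar struct{ Name, Value string }

// podAdditions groups everything MLFlow adds to the pod spec, so one
// check controls all three pieces together.
type podAdditions struct {
	Env         []EnvVar
	VolumeNames []string
	MountPaths  map[string]string // volume name -> mount path
}

// mlflowAdditions returns an empty result when MLFlow is disabled.
// All paths below are illustrative assumptions, not the PR's constants.
func mlflowAdditions(enabled bool, workspace string) podAdditions {
	if !enabled {
		return podAdditions{}
	}
	const (
		caDir    = "/etc/evalhub/service-ca"   // assumed
		tokenDir = "/var/run/secrets/mlflow"   // assumed
	)
	return podAdditions{
		Env: []EnvVar{
			{Name: "MLFLOW_CA_CERT_PATH", Value: caDir + "/service-ca.crt"},
			{Name: "MLFLOW_WORKSPACE", Value: workspace},
			{Name: "MLFLOW_TOKEN_PATH", Value: tokenDir + "/token"},
		},
		VolumeNames: []string{"service-ca", "mlflow-token"},
		MountPaths: map[string]string{
			"service-ca":   caDir,
			"mlflow-token": tokenDir,
		},
	}
}

func main() {
	for _, enabled := range []bool{false, true} {
		add := mlflowAdditions(enabled, "my-namespace")
		fmt.Printf("enabled=%v env=%d volumes=%d mounts=%d\n",
			enabled, len(add.Env), len(add.VolumeNames), len(add.MountPaths))
	}
}
```

The deployment builder would then append `add.Env`, the volumes, and the mounts from the same value, so a change to the gate applies to all of them at once.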

controllers/evalhub/service_accounts.go (1)

266-331: Consider extracting common RoleBinding reconciliation into a shared helper.

The create-or-update-RoleBinding pattern (check existence → create if missing → update subjects or delete+recreate if RoleRef changed) is repeated across createResourceManagementRoleBinding, createJobsResourceManagementRoleBinding, createJobsProxyRoleBinding, and now createMLFlowAccessRoleBinding. A shared helper like reconcileRoleBinding(ctx, instance, name, subjects, roleRef) would reduce ~50 lines of duplication per call site and make future additions less error-prone.
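The decision logic behind such a helper can be sketched without any Kubernetes imports. The types below are simplified local stand-ins for the rbacv1 types, and `planRoleBinding` is a hypothetical name; the real helper would also perform the client calls. The recreate branch exists because a RoleBinding's roleRef is immutable once created.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified stand-ins for rbacv1.Subject, rbacv1.RoleRef, rbacv1.RoleBinding.
type Subject struct{ Kind, Name, Namespace string }
type RoleRef struct{ APIGroup, Kind, Name string }
type RoleBinding struct {
	Name     string
	Subjects []Subject
	RoleRef  RoleRef
}

// Action is the step the reconciler should take for one RoleBinding.
type Action int

const (
	NoOp Action = iota
	Create
	UpdateSubjects
	Recreate
)

func (a Action) String() string {
	return [...]string{"NoOp", "Create", "UpdateSubjects", "Recreate"}[a]
}

// planRoleBinding compares the existing binding (nil if absent) with the
// desired one and returns the single action needed to converge.
func planRoleBinding(existing *RoleBinding, desired RoleBinding) Action {
	switch {
	case existing == nil:
		return Create
	case existing.RoleRef != desired.RoleRef:
		// roleRef cannot be updated in place: delete, then create.
		return Recreate
	case !reflect.DeepEqual(existing.Subjects, desired.Subjects):
		return UpdateSubjects
	default:
		return NoOp
	}
}

func main() {
	desired := RoleBinding{
		Name:     "example-mlflow-api", // illustrative name
		Subjects: []Subject{{Kind: "ServiceAccount", Name: "example-api", Namespace: "ns"}},
		RoleRef:  RoleRef{APIGroup: "rbac.authorization.k8s.io", Kind: "ClusterRole", Name: "evalhub-mlflow-access"},
	}
	stale := desired
	stale.RoleRef.Name = "edit"
	fmt.Println(planRoleBinding(nil, desired), planRoleBinding(&stale, desired), planRoleBinding(&desired, desired))
}
```

Keeping the comparison in one pure function also makes it straightforward to unit test each branch without a fake client.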

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@controllers/evalhub/service_accounts.go`:
- Around line 85-95: The comment above the MLFlow RoleBinding creation is stale
(it mentions the "edit" ClusterRole) — update the comment to reflect that
createMLFlowAccessRoleBinding now binds to the custom ClusterRole
"evalhub-mlflow-access" (used in createMLFlowAccessRoleBinding). Edit the
comment to describe that MLFlow's kubernetes-auth plugin validates tokens via
SubjectAccessReview against the workspace namespace and that the custom
"evalhub-mlflow-access" ClusterRole provides the required permissions for both
service accounts (serviceAccountName and jobsServiceAccountName).
🧹 Nitpick comments (1)
controllers/evalhub/service_accounts.go (1)

259-334: Consider extracting a shared reconcileRoleBinding helper to reduce duplication.

createMLFlowAccessRoleBinding, createResourceManagementRoleBinding, createJobsResourceManagementRoleBinding, and createJobsProxyRoleBinding all follow the identical get-or-create/update/delete-recreate pattern, differing only in the RoleBinding name, labels, subjects, and roleRef. A single parameterized helper would eliminate ~80 lines of near-duplicate code and make it easier to consistently fix any reconciliation bugs across all bindings.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@controllers/evalhub/deployment_test.go`:
- Around line 374-375: The test assertions for evalhub container VolumeMounts
are outdated: update the Expect(...VolumeMounts).To(HaveLen(...)) checks so they
match the current mounts (evalhub-config, service-ca, mlflow-token and optional
DB secret). Specifically, change the assertion that currently expects 2 mounts
(the Expect(container.VolumeMounts).To(HaveLen(2)) in the DB-configured test) to
expect 4 mounts, and change the assertion that expects 1 mount (the
Expect(container.VolumeMounts).To(HaveLen(1)) in the non-DB test) to expect 3
mounts; leave the existing deployment.Spec.Template.Spec.Volumes expectation
as-is. Use the container variable/identifier used in the tests to locate the two
assertions to update.
🧹 Nitpick comments (1)
controllers/evalhub/service_accounts.go (1)

260-335: Consider extracting a generic RoleBinding reconciliation helper.

createMLFlowAccessRoleBinding, createResourceManagementRoleBinding, createJobsResourceManagementRoleBinding, and createJobsProxyRoleBinding share nearly identical get→create / compare→update / delete→recreate logic. A single reconcileRoleBinding(ctx, instance, name, namespace, labels, subjects, roleRef) helper would eliminate ~100 lines of duplication and make future additions (like this one) trivial.

ppadashe-psp (Collaborator) left a comment

lgtm

tarilabs (Member) left a comment

adds the necessary Operator and EH Job grants
/lgtm

thanks @ruivieira

openshift-ci bot removed the lgtm label Feb 15, 2026

ruivieira (Member, Author) commented

@coderabbitai full review

coderabbitai bot (Contributor) commented Feb 15, 2026

✅ Actions performed

Full review triggered.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
controllers/evalhub/service_accounts.go (1)

331-394: ⚠️ Potential issue | 🔴 Critical

Auth-reviewer CRB correctly omits owner references — but label issue remains.

Good: The comment at line 367 correctly explains why owner references cannot be set on cluster-scoped resources. The subjects/roleRef comparison and update logic is sound.

However, this is where the app.kubernetes.io/name label is set to the full CRB name (line 343), which is the root cause of the pipeline failures flagged above.

🤖 Fix all issues with AI agents
In `@controllers/evalhub/proxy_rbac_test.go`:
- Around line 17-23: The test failures are caused by cluster role binding names
(produced by generateAuthReviewerClusterRoleBindingName) exceeding Kubernetes'
63-char label limit and being placed verbatim into the app.kubernetes.io/name
label in service_accounts.go; fix service_accounts.go so the code that assigns
the app.kubernetes.io/name label for service accounts (the label assignment
logic near where service accounts are created) enforces Kubernetes label
length/format constraints by truncating or hashing the generated name to a
DNS-1123 compatible string no longer than 63 characters (preserve uniqueness,
e.g., keep a human-readable prefix and append a short hash), and use that
normalized value instead of the raw generateAuthReviewerClusterRoleBindingName
output.

In `@controllers/evalhub/service_accounts.go`:
- Around line 33-35: generateAuthReviewerClusterRoleBindingName builds a long
string (<name>-<namespace>-auth-reviewer-crb) that is later used as the
app.kubernetes.io/name label in createAuthReviewerClusterRoleBinding, which can
exceed Kubernetes' 63-char label limit; modify the code so the full CRB
metadata.name still uses generateAuthReviewerClusterRoleBindingName but the
label uses a safe, truncation-or-hash-derived value (e.g., take the first 63
characters or use a stable short hash of the full name) before setting
app.kubernetes.io/name in createAuthReviewerClusterRoleBinding; ensure the
label-producing logic is deterministic and referenced by both
createAuthReviewerClusterRoleBinding and any other places reading that label.
🧹 Nitpick comments (4)
controllers/evalhub/rbac_manifests_test.go (2)

38-49: Minor: only the first rule containing configmaps is checked.

If the manifest ever has multiple rules referencing configmaps (unlikely but possible), only the first one's verbs are validated. Current behavior is fine given the manifest structure, but worth noting.


54-65: Consider using reflect.DeepEqual or slices.Equal for the sorted comparison.

The manual element-by-element comparison works correctly but could be simplified.

♻️ Optional simplification
+	"reflect"
...
 	sort.Strings(gotVerbs)
 	sort.Strings(wantVerbs)
 
-	if len(gotVerbs) != len(wantVerbs) {
-		t.Fatalf("unexpected verbs for configmaps: got=%v want=%v", gotVerbs, wantVerbs)
-	}
-	for i := range wantVerbs {
-		if gotVerbs[i] != wantVerbs[i] {
-			t.Fatalf("unexpected verbs for configmaps: got=%v want=%v", gotVerbs, wantVerbs)
-		}
+	if !reflect.DeepEqual(gotVerbs, wantVerbs) {
+		t.Fatalf("unexpected verbs for configmaps: got=%v want=%v", gotVerbs, wantVerbs)
 	}
controllers/evalhub/service_accounts.go (2)

38-128: createServiceAccount has grown into a large orchestration function creating ~8 RBAC resources.

This function now creates a ServiceAccount, a ClusterRoleBinding, 2 Roles, and 5 RoleBindings. If any step fails mid-way, partial RBAC state is left behind (though reconciliation will retry). Consider extracting an ensureRBACForInstance helper to improve readability and make the orchestration explicit.

Not a blocker given the reconciler's idempotent retry design, but worth considering for maintainability.


504-531: equalPolicyRules uses positional comparison — fragile if rule order changes.

The function compares rules by index position. If the desired rules are reordered (e.g., during a refactor), existing roles will be unnecessarily updated. For the current usage where both sides are constructed in the same code this is acceptable, but consider sorting before comparison for robustness.
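One way to make the comparison independent of rule order, shown as a sketch rather than as the function in this PR. `PolicyRule` here is a trimmed local stand-in for rbacv1.PolicyRule with only three fields, and the function names are hypothetical. Each rule is reduced to a canonical string, and the two sorted key lists are compared.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PolicyRule is a reduced stand-in for rbacv1.PolicyRule.
type PolicyRule struct {
	APIGroups []string
	Resources []string
	Verbs     []string
}

// sortedCopy returns a sorted copy so the caller's slice is not mutated.
func sortedCopy(in []string) []string {
	out := append([]string(nil), in...)
	sort.Strings(out)
	return out
}

// ruleKey builds a canonical string for one rule. It assumes the values
// contain neither "," nor "|", which holds for Kubernetes API groups,
// resource names, and verbs.
func ruleKey(r PolicyRule) string {
	return strings.Join([]string{
		strings.Join(sortedCopy(r.APIGroups), ","),
		strings.Join(sortedCopy(r.Resources), ","),
		strings.Join(sortedCopy(r.Verbs), ","),
	}, "|")
}

// equalPolicyRulesUnordered reports whether a and b contain the same
// rules, ignoring the order of rules and of entries inside each rule.
func equalPolicyRulesUnordered(a, b []PolicyRule) bool {
	if len(a) != len(b) {
		return false
	}
	ka := make([]string, len(a))
	kb := make([]string, len(b))
	for i := range a {
		ka[i] = ruleKey(a[i])
		kb[i] = ruleKey(b[i])
	}
	sort.Strings(ka)
	sort.Strings(kb)
	for i := range ka {
		if ka[i] != kb[i] {
			return false
		}
	}
	return true
}

func main() {
	jobs := PolicyRule{APIGroups: []string{"batch"}, Resources: []string{"jobs"}, Verbs: []string{"get", "create"}}
	cms := PolicyRule{APIGroups: []string{""}, Resources: []string{"configmaps"}, Verbs: []string{"get"}}
	fmt.Println(equalPolicyRulesUnordered([]PolicyRule{jobs, cms}, []PolicyRule{cms, jobs}))
}
```

The trade-off is a few allocations per comparison, which is negligible for the handful of rules a per-instance Role carries.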

Comment on lines +17 to 23
+var _ = Describe("EvalHub API RBAC", func() {
 	const (
-		testNamespacePrefix     = "evalhub-proxy-rbac-test"
+		testNamespacePrefix     = "evalhub-api-rbac-test"
 		operatorNamespacePrefix = "operator-system"
-		evalHubName             = "proxy-rbac-evalhub"
+		evalHubName             = "api-rbac-evalhub"
 		configMapName           = "trustyai-service-operator-config"
 	)

⚠️ Potential issue | 🔴 Critical

All pipeline failures in this file trace to app.kubernetes.io/name label exceeding 63 chars.

The test namespace names include nanosecond timestamps (line 38), making them ~40 chars long. Combined with evalHubName in generateAuthReviewerClusterRoleBindingName, the CRB name exceeds 63 characters and is then set verbatim as the app.kubernetes.io/name label in service_accounts.go (line 343). Kubernetes rejects label values >63 chars.

This is addressed in the service_accounts.go review comment — the fix needs to happen there, not in the tests. The test is correctly exposing a real production-time risk.

Comment on lines +33 to 35
func generateAuthReviewerClusterRoleBindingName(instance *evalhubv1alpha1.EvalHub) string {
	return instance.Name + "-" + instance.Namespace + "-auth-reviewer-crb"
}

⚠️ Potential issue | 🔴 Critical

CRB name used as label value can exceed Kubernetes' 63-character limit — root cause of all pipeline failures.

generateAuthReviewerClusterRoleBindingName produces <name>-<namespace>-auth-reviewer-crb. This value is then set as the app.kubernetes.io/name label (line 343), but Kubernetes label values are capped at 63 characters. With even moderately sized instance names and namespace names this limit is easily breached, as confirmed by the pipeline failures.

The .metadata.name field allows up to 253 characters, so the CRB name itself is fine — only the label needs to be truncated or computed differently.

Proposed fix
 func generateAuthReviewerClusterRoleBindingName(instance *evalhubv1alpha1.EvalHub) string {
 	return instance.Name + "-" + instance.Namespace + "-auth-reviewer-crb"
 }
+
+// truncateLabel truncates a string to the Kubernetes label value limit (63 chars).
+func truncateLabel(s string, maxLen int) string {
+	if len(s) <= maxLen {
+		return s
+	}
+	return s[:maxLen]
+}

Then in createAuthReviewerClusterRoleBinding (line 343):

 			Labels: map[string]string{
 				"app":                        "eval-hub",
-				"app.kubernetes.io/name":     clusterRoleBindingName,
+				"app.kubernetes.io/name":     truncateLabel(clusterRoleBindingName, 63),
 				"app.kubernetes.io/instance": instance.Name,
 				"app.kubernetes.io/part-of":  "eval-hub",
 				"app.kubernetes.io/version":  constants.Version,
 			},

openshift-ci bot commented Feb 15, 2026

@ruivieira: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/trustyai-service-operator-e2e | 0f29606 | link | true | /test trustyai-service-operator-e2e |

tarilabs (Member) left a comment

thanks @ruivieira

minor comments below for your 👀

openshift-ci bot added the lgtm label Feb 16, 2026

openshift-ci bot commented Feb 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ppadashe-psp, tarilabs

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Co-authored-by: Matteo Mortari <matteo.mortari@gmail.com>
openshift-ci bot removed the lgtm label Feb 16, 2026

openshift-ci bot commented Feb 16, 2026

New changes are detected. LGTM label has been removed.

ruivieira merged commit 353b718 into trustyai-explainability:main Feb 16, 2026
9 of 10 checks passed
ruivieira deleted the evalhub-mlflow branch February 16, 2026 16:36

Labels

Projects

Status: In Progress

3 participants