feat(evalhub): Providers as ConfigMaps #650
ruivieira merged 5 commits into trustyai-explainability:main
Conversation
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
📝 Walkthrough

Adds an optional `Providers` list to the EvalHub CR: matching provider ConfigMaps are discovered by label in the operator namespace, copied into the target namespace, and mounted into the deployment via a projected volume.
Sequence Diagram(s)

sequenceDiagram
participant R as Reconciler
participant CR as EvalHub CR
participant CMop as Provider ConfigMaps (operator ns)
participant NS as Target Namespace
participant K8s as Kubernetes API (Deployment)
R->>CR: read instance.Spec.Providers
R->>CMop: list ConfigMaps by provider labels (operator ns)
CMop-->>R: matching ConfigMaps list
loop for each provider CM
R->>NS: create/update copy of provider ConfigMap
NS-->>R: confirm created/updated
end
R->>R: collect provider CM names
R->>K8s: build DeploymentSpec (include projected providers volume if any)
K8s-->>R: Deployment created/updated
R->>CR: update Status.ActiveProviders
CR-->>R: status persisted
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~55 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (10)
hack/sync-evalhub-providers.py (2)
78-91: ConfigMap manifest built via string interpolation is fragile. If a `filename` or `safe_id` ever contains YAML-special characters (`:`, `#`, etc.), the output will be malformed. Since the upstream data source is not under this repo's control, consider using the `yaml` library (already imported) to serialize the entire ConfigMap structure, or at minimum validate/quote the interpolated values. This is low-risk given the current known provider IDs but worth hardening since the script is designed to auto-sync from an external repo.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hack/sync-evalhub-providers.py` around lines 78 - 91, The ConfigMap is built via fragile string interpolation (variables cm_name, cm_file, filename, safe_id, provider_yaml) which can break with YAML-special characters; instead build a Python dict representing the ConfigMap (metadata.name, metadata.labels with PROVIDER_TYPE_LABEL/PROVIDER_NAME_LABEL set to system and safe_id, and data mapping filename -> provider_yaml) and serialize it with the imported yaml.safe_dump to produce a stable manifest, then write that serialized string to OUTPUT_DIR / cm_file; if you prefer minimal change, at least validate/quote filename and safe_id (e.g., ensure they are strings and use yaml.safe_dump on those scalar values) before interpolation.
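A minimal sketch of both hardening steps for the sync script (serializing the manifest with `yaml.safe_dump`, plus the fuller RFC 1123 sanitization suggested in the next comment); the label keys and name prefix are illustrative, not necessarily the script's actual constants:

```python
import re
import yaml  # PyYAML, which the sync script already imports

# Illustrative label keys; the real script defines its own constants.
PROVIDER_TYPE_LABEL = "trustyai.opendatahub.io/evalhub-provider-type"
PROVIDER_NAME_LABEL = "trustyai.opendatahub.io/evalhub-provider-name"

def sanitize_id(provider_id: str) -> str:
    """RFC 1123 name: lowercase alphanumerics and hyphens, max 63 chars, non-empty."""
    s = re.sub(r"[^a-z0-9-]+", "-", provider_id.lower())  # replace invalid runs
    s = re.sub(r"-{2,}", "-", s).strip("-")[:63]          # collapse, trim, truncate
    return s or "provider"                                # fallback if nothing survives

def build_configmap(provider_id: str, filename: str, provider_yaml: str) -> str:
    safe_id = sanitize_id(provider_id)
    manifest = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": f"evalhub-provider-{safe_id}",
            "labels": {PROVIDER_TYPE_LABEL: "system", PROVIDER_NAME_LABEL: safe_id},
        },
        "data": {filename: provider_yaml},
    }
    # safe_dump quotes YAML-special characters (':', '#', ...) automatically
    return yaml.safe_dump(manifest, sort_keys=False)
```

Round-tripping the output through `yaml.safe_load` in a unit test is a cheap way to confirm the manifest stays well-formed regardless of upstream provider IDs.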
63-64: Incomplete RFC 1123 sanitization for Kubernetes resource names. Only underscores are replaced with hyphens. Provider IDs containing uppercase letters, dots, or other non-`[a-z0-9-]` characters will produce invalid K8s resource names. Consider a more thorough sanitization, e.g.:

Proposed fix

-    # Sanitize for K8s resource names (RFC 1123: only lowercase alphanumeric and hyphens)
-    safe_id = provider_id.replace("_", "-")
+    import re
+    # Sanitize for K8s resource names (RFC 1123: only lowercase alphanumeric and hyphens)
+    safe_id = re.sub(r"[^a-z0-9-]", "-", provider_id.lower()).strip("-")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hack/sync-evalhub-providers.py` around lines 63 - 64, The current sanitization only replaces underscores in provider_id to produce safe_id but doesn't enforce RFC1123 rules; update the logic in hack/sync-evalhub-providers.py where safe_id is derived from provider_id to: lowercase the string, replace any character not in [a-z0-9-] with hyphens (use a single re.sub with a negated class), collapse consecutive hyphens to one, strip leading/trailing hyphens, truncate to 63 characters, and ensure the result is non-empty (fallback to a safe default like "provider") so the final safe_id complies with Kubernetes resource name constraints.

controllers/evalhub/suite_test.go (1)
132-154: Extract the default provider list to a shared constant to avoid duplication. The list `["garak", "guidellm", "lighteval", "lm-evaluation-harness"]` is defined independently in both the test helper function (suite_test.go:135) and the CRD annotation (api/evalhub/v1alpha1/evalhub_types.go:50). When a provider is added or removed, both locations must be updated separately. Define a shared constant (e.g., in a config or constants package) that both the CRD annotation and test code can reference to keep them in sync.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controllers/evalhub/suite_test.go` around lines 132 - 154, The default provider list is duplicated; extract the slice currently assigned to defaultProviders in createDefaultProviderConfigMaps into a shared constant (for example DefaultProviders) in a common package (e.g., config or constants) and update both createDefaultProviderConfigMaps (in which defaultProviders is used) and the CRD annotation in api/evalhub/v1alpha1/evalhub_types.go to import and reference that shared constant so additions/removals are made in one place.

controllers/evalhub/evalhub_controller_test.go (1)
45-48: Remove unused provider ConfigMap setup or add clarifying comment. The test creates provider ConfigMaps in BeforeEach (lines 45-48), but `createEvalHubInstance` (suite_test.go:98) explicitly sets `Providers: []string{}`, which overrides the CRD default. The `reconcileProviderConfigMaps` function returns immediately when the Providers list is empty (configmap.go:292-294), so these ConfigMaps are never consumed. Either remove this setup to reduce test noise, or add a comment clarifying why they're needed (e.g., if they're being set up for future test variations or to document the CRD default behavior).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controllers/evalhub/evalhub_controller_test.go` around lines 45 - 48, The test currently creates provider ConfigMaps via createDefaultProviderConfigMaps even though createEvalHubInstance explicitly sets Providers: []string{} and reconcileProviderConfigMaps returns early when Providers is empty; remove the loop that creates those ConfigMaps to eliminate dead setup, or alternatively add a clarifying comment above the loop explaining that the ConfigMaps are created only to document the CRD default/for potential future tests despite Providers being explicitly empty in createEvalHubInstance, referencing createDefaultProviderConfigMaps, createEvalHubInstance, reconcileProviderConfigMaps, and the Providers field so reviewers can see why the change was made.

config/base/params.env (1)
20-23: Consider pinning image tags for reproducibility. Three of the four provider images use `:latest`, while `lm-evaluation-harness` is pinned to `odh-3.4-ea2`. If these defaults are only for local development and are overridden by the ODH meta-operator in production, this is fine. Otherwise, unpinned tags can cause non-reproducible deployments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/base/params.env` around lines 20 - 23, Several provider image entries (evalhub-provider-garak-image, evalhub-provider-guidellm-image, evalhub-provider-lighteval-image) are set to :latest which makes deployments non-reproducible; change these to pinned image tags or documented variables. Update the values for evalhub-provider-garak-image, evalhub-provider-guidellm-image, and evalhub-provider-lighteval-image to specific, immutable tags (or replace with clearly named environment variables that are overridden in production), and add a short comment noting that lm-evaluation-harness is already pinned (odh-3.4-ea2) to keep consistency and reproducibility.

controllers/evalhub/deployment_test.go (1)
374-375: Volume count assertions will need companion tests for the providers-present path. Line 375 asserts 6 volumes (5 base + DB secret) and line 430 asserts 5 base volumes — both with nil `providerCMNames`. When providers are supplied, a `providers` projected volume is added, which would change these counts. Ensure that a separate test (here or elsewhere) validates the volume/mount setup with a non-empty `providerCMNames` slice.

Also applies to: 429-430
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controllers/evalhub/deployment_test.go` around lines 374 - 375, The current tests assert exact volume counts with providerCMNames == nil but lack a companion case when providerCMNames is non-empty; add a test that sets providerCMNames to a non-empty slice, builds the Deployment (same setup code that produces the deployment variable), and assert that deployment.Spec.Template.Spec.Volumes length increases by 1 compared to the nil-provider case, that there is a projected Volume named "providers" (or the configured providers volume name), and that the corresponding container VolumeMount exists in deployment.Spec.Template.Spec.Containers[*].VolumeMounts; also validate any expected projected items (ConfigMap names) are present in the projected sources.

controllers/evalhub/build_test.go (1)
71-71: Add a unit test for `buildDeploymentSpec` with non-nil `providerCMNames`. The three test cases in build_test.go (lines 71, 186, 204) all pass nil for `providerCMNames`, testing only the baseline no-providers path. While provider volume/mount integration is verified in unit_test.go via `reconcileDeployment`, there is no direct unit test of `buildDeploymentSpec` with actual provider ConfigMap names. Add a test case that calls `buildDeploymentSpec` with a non-empty `providerCMNames` slice to verify that the providers volume and corresponding container mounts are correctly configured.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controllers/evalhub/build_test.go` at line 71, Add a unit test that calls reconciler.buildDeploymentSpec(ctx, evalHub, providerCMNames) with a non-empty slice (e.g., []string{"provider-a","provider-b"}) and assert the returned deploymentSpec contains a providers volume and matching container volumeMounts; specifically verify that the volumes list includes a ConfigMap volume for the provider CM names and that the main container in deploymentSpec.Spec.Template.Spec.Containers has corresponding volumeMounts with the expected mountPath/name for each provider. Keep the test alongside existing cases in build_test.go and reuse the same evalHub/reconciler setup so it only validates providerCMNames handling.

controllers/evalhub/configmap.go (2)
284-355: Stale provider ConfigMaps are not cleaned up when providers are removed from the spec. If a user removes a provider from `spec.Providers`, the previously copied ConfigMap (e.g., `<instance>-provider-<old>`) remains in the namespace. While it will be garbage-collected when the entire EvalHub CR is deleted (via owner reference), it persists as long as the CR exists. The stale ConfigMap also remains in the projected volume definition from a previous reconcile until the deployment is rebuilt.

This is a low-risk issue since the deployment is rebuilt each reconcile (removing the stale volume source), but the orphaned ConfigMap itself remains. Consider adding cleanup logic that lists ConfigMaps with an owner reference to this CR and deletes those not in the current `providerCMNames` set.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controllers/evalhub/configmap.go` around lines 284 - 355, reconcileProviderConfigMaps currently only creates/updates provider ConfigMaps but never removes ones that were previously copied; modify reconcileProviderConfigMaps to, after building cmNames, list ConfigMaps in instance.Namespace filtered by an ownerReference to the EvalHub instance (or by name prefix instance.Name+"-provider-"), and delete any ConfigMap whose name matches the provider-target pattern but is not present in cmNames; use r.List to fetch candidates, check ownerRef (or use controllerutil.SetControllerReference already used when creating), and call r.Delete(ctx, cm) for each stale ConfigMap, returning any error encountered.
311-316: Multiple matching ConfigMaps are silently ignored — only the first is used. If more than one ConfigMap in the operator namespace matches both labels (`provider-type=system` and `provider-name=<name>`), the code silently picks `Items[0]`. This could lead to non-deterministic behavior since the list order is not guaranteed by the Kubernetes API. Consider logging a warning when `len(sourceList.Items) > 1`.

Suggested addition after Line 314

 if len(sourceList.Items) == 0 {
 	return nil, fmt.Errorf("provider %q not found: no ConfigMap with label %s=%s in namespace %s", providerName, providerNameLabel, providerName, r.Namespace)
 }
+if len(sourceList.Items) > 1 {
+	log.Info("Multiple ConfigMaps found for provider, using first match",
+		"provider", providerName, "count", len(sourceList.Items))
+}

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controllers/evalhub/configmap.go` around lines 311 - 316, When multiple ConfigMaps match the labels the code currently just uses sourceList.Items[0]; add a warning when len(sourceList.Items) > 1 to make this visible. After the existing length check and before assigning src := &sourceList.Items[0], log a warning that includes providerNameLabel, providerName, r.Namespace and the number of matches (len(sourceList.Items)), then continue to use the first item as before; reference sourceList.Items, src, providerNameLabel, providerName and r.Namespace when constructing the message.

config/configmaps/evalhub/provider-lm-evaluation-harness.yaml (1)
1-2027: Large ConfigMap may approach etcd object size limits. This single ConfigMap embeds ~2000+ lines of YAML data. Kubernetes etcd has a default 1.5 MB limit per object. While this file is likely under that threshold as text, it's worth monitoring — especially if the benchmark list grows further. Consider whether the provider could split benchmarks into a separate ConfigMap if the list continues to expand.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@config/configmaps/evalhub/provider-lm-evaluation-harness.yaml` around lines 1 - 2027, The ConfigMap evalhub-provider-lm-evaluation-harness contains a very large embedded key lm_evaluation_harness.yaml which may hit etcd object size limits; refactor by extracting the lengthy benchmarks: section into one or more separate ConfigMaps (e.g., evalhub-provider-lm-evaluation-harness-benchmarks or per-category ConfigMaps) and update the provider to load/merge those ConfigMaps at runtime (refer to the top-level resource name evalhub-provider-lm-evaluation-harness and the lm_evaluation_harness.yaml key when making the change); ensure the main ConfigMap keeps only small runtime metadata and a pointer to the separate benchmarks ConfigMap(s), and add a quick validation check in the deployment/CI to warn if any resulting ConfigMap exceeds the etcd ~1.5MB threshold.
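The CI validation check suggested in the prompt above could be sketched as a small helper (the limit constant and glob pattern are assumptions, not existing project code):

```python
import pathlib

ETCD_OBJECT_LIMIT = 1_500_000  # bytes; etcd's default request limit is roughly 1.5 MB

def oversized_manifests(directory: str, limit: int = ETCD_OBJECT_LIMIT) -> list[str]:
    """Return manifest files in `directory` whose on-disk size exceeds `limit`."""
    return sorted(
        str(p)
        for p in pathlib.Path(directory).glob("*.yaml")
        if p.stat().st_size > limit
    )
```

Wiring something like this into the sync script or a CI step gives an early warning before a grown benchmark list hits the etcd ceiling.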
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/configmaps/evalhub/provider-garak.yaml`:
- Around line 24-26: In the provider-garak ConfigMap remove the placeholder env
entry (the env: - name: VAR_NAME value: VALUE) unless garak actually requires an
environment variable; if it does, replace VAR_NAME and VALUE with a meaningful
variable name and real default value and add a comment explaining its
purpose—locate the env block in the provider-garak ConfigMap and either delete
the entire env section or update the VAR_NAME/VALUE to a real key/value and
document it.
In `@controllers/evalhub/deployment.go`:
- Around line 110-113: The env var PROVIDERS_CONFIG_PATH is being added
unconditionally even when providerCMNames is empty; change the logic so that
append to defaultEnvVars for PROVIDERS_CONFIG_PATH with value providersMountPath
happens only when len(providerCMNames) > 0, mirroring how the corresponding
volumeMount/volume are created; locate the code that builds defaultEnvVars
(where PROVIDERS_CONFIG_PATH is currently added) and wrap that append in an
if-check using providerCMNames to ensure the env var is only set when the
volume/mount are also created.
tarilabs
left a comment
some comments to guide the PR review if needed
// Providers is the list of OOTB provider names to mount into the deployment.
// Each name must match a provider-name label on a ConfigMap in the operator namespace.
// +kubebuilder:default:={"garak","guidellm","lighteval","lm-evaluation-harness"}
default OOTB providers, so we don't have to force EH CR instance author to write them 🙃
evalhub-provider-garak-image=quay.io/evalhub/garak:latest
evalhub-provider-guidellm-image=quay.io/evalhub/community-guidellm:latest
evalhub-provider-lighteval-image=quay.io/evalhub/community-lighteval:latest
evalhub-provider-lm-evaluation-harness-image=quay.io/opendatahub/ta-lmes-job:odh-3.4-ea2
we'll inject as needed here or in the ODH overlay params.env, as usual
type: builtin
runtime:
  k8s:
    image: $(evalhub-provider-garak-image)
for one example, this is what the script makes sure to write in the manifest CM
@@ -1,4 +1,5 @@
resources:
- lmeval.yaml
- evalhub/
I needed to wire this, naturally
- ../rbac
- ../manager
- ../prometheus
- ../configmaps
...and I needed to wire this too.
default:
- garak
- guidellm
- lighteval
- lm-evaluation-harness
kubebuilder makes the defaults for the K8s API schema, as expected.
- evalhub/evalhub_job_config_binding.yaml
- evalhub/evalhub_job_config_role.yaml
- evalhub/evalhub_jobs_writer_binding.yaml
- evalhub/evalhub_jobs_writer_role.yaml
- evalhub/evalhub_mlflow_access_binding.yaml
- evalhub/evalhub_mlflow_access_role.yaml
- evalhub/evalhub_mlflow_jobs_binding.yaml
- evalhub/evalhub_mlflow_jobs_role.yaml
I wanted to preserve lexicographic ordering as in IDE 😅
controllers/evalhub/configmap.go
Outdated
// Each source ConfigMap is discovered by the labels:
//   - eval-hub.github.io/provider-type=system
//   - eval-hub.github.io/provider-name=<name>
the labels to be used,
instead of relying on "de-prefix CM name to assume it's the provider final name"
#!/usr/bin/env python3
"""
Fetches provider YAML files from the eval-hub upstream repository and generates
Kubernetes ConfigMap manifests for the operator to deploy.

Usage:
    hack/sync-evalhub-providers.py [branch]

Arguments:
    branch    Git branch to fetch from (default: main)
"""
the script I've used to generate the config/configmaps/evalhub provider CM YAMLs
@tarilabs LGTM, thanks!
Just a note that we need the params.env also changed in the odh and rhoai overlays.
ODH one needs to be updated with each ODH release, so for now, we could have quay.io/opendatahub/ta-lmes-job:odh-3.4-ea2 in preparation for the next release. The other providers can remain as latest (or if we cut a release in contrib, we can update them)
RHOAI ones can keep exactly like this, since the values are injected.
Please note that the image is injected by key, not value. i.e. evalhub-provider-lm-evaluation-harness-image would not be changed on RHOAI, only lmes-pod-image=quay.io/trustyai/ta-lmes-job:latest
Signed-off-by: tarilabs <matteo.mortari@gmail.com>
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/overlays/rhoai/params.env`:
- Around line 21-24: The env var evalhub-provider-lm-evaluation-harness-image
currently points to quay.io/opendatahub/ta-lmes-job:odh-3.4-ea2 which breaks the
RHOAI overlay pattern; update evalhub-provider-lm-evaluation-harness-image to
use the trustyai namespace and the correct tag (e.g.,
quay.io/trustyai/ta-lmes-job:<appropriate-tag>) so it matches other RHOAI
images, and confirm which tag to use (latest or a release-specific tag); also
note the three evalhub community images (evalhub-provider-garak-image,
evalhub-provider-guidellm-image, evalhub-provider-lighteval-image) should be
tracked for future replacement with RHOAI/trustyai equivalents if/when
available.
evalhub-provider-garak-image=quay.io/evalhub/garak:latest
evalhub-provider-guidellm-image=quay.io/evalhub/community-guidellm:latest
evalhub-provider-lighteval-image=quay.io/evalhub/community-lighteval:latest
evalhub-provider-lm-evaluation-harness-image=quay.io/opendatahub/ta-lmes-job:odh-3.4-ea2
Line 24 uses an ODH-scoped registry and EA tag in the RHOAI overlay — likely a placeholder needing update
Two concerns:
1. Namespace + tag mismatch on line 24: Every component that has both an ODH and RHOAI image in this repo uses `quay.io/trustyai/` in the RHOAI overlay (e.g., `lmes-pod-image=quay.io/trustyai/ta-lmes-job:latest` on line 7 vs `quay.io/opendatahub/ta-lmes-job:latest` in the ODH overlay). Line 24 breaks this pattern by pointing to `quay.io/opendatahub/ta-lmes-job:odh-3.4-ea2` — the `opendatahub` namespace and the `odh-3.4-ea2` early-access tag are both inconsistent with the rest of the RHOAI overlay. This appears to be a temporary placeholder that still needs a `quay.io/trustyai/ta-lmes-job` equivalent (with an appropriate tag). Please confirm.

2. Community images on lines 21-23: `quay.io/evalhub/community-guidellm` and `quay.io/evalhub/community-lighteval` are community images shared identically with the ODH overlay. If RHOAI-certified or `trustyai`-namespaced equivalents are planned, these should be tracked for future update.
RHOAI ones can keep exactly like this, since the values are injected
for the scope of this PR, I believe it's fine as-is, but the reviewer could comment; thanks for bringing this to our attention
@tarilabs, understood! Since these values are injected during the Kustomize build process, the current configuration makes sense for this PR. I appreciate the clarification about the substitution mechanism. The reviewer can assess whether any follow-up is needed. Thank you for considering the feedback! 👍
🧠 Learnings used
Learnt from: tarilabs
Repo: trustyai-explainability/trustyai-service-operator PR: 650
File: config/configmaps/evalhub/provider-garak.yaml:24-26
Timestamp: 2026-02-18T18:07:41.943Z
Learning: In the trustyai-service-operator EvalHub provider ConfigMaps (config/configmaps/evalhub/provider-*.yaml), the provider configurations are sourced from the upstream eval-hub repository via hack/sync-evalhub-providers.py script. The project policy is to preserve the upstream provider configs as-is, including any placeholder environment variables or template content, to maintain consistency with the upstream source.
@tarilabs: The following test failed.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ruivieira

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing /approve in a comment.
- Fix ConfigMap struct tags: sigs.k8s.io/yaml uses json tags, not yaml tags, causing PascalCase keys (ReadyFile instead of ready_file) which EvalHub's Viper/mapstructure could not match, crashing on startup
- Add service, env_mappings, database, and prometheus sections to the generated ConfigMap so EvalHub gets a complete config
- Replace CONFIG_PATH/PROVIDERS_CONFIG_PATH env vars with a single EVAL_HUB_CONFIG_DIR pointing to /etc/evalhub/config
- Update provider ConfigMap labels from eval-hub.github.io/* to trustyai.opendatahub.io/evalhub-* to match operator label selectors
- Update sync-evalhub-providers.py to use the new label scheme
New changes are detected. LGTM label has been removed.
🧹 Nitpick comments (3)
hack/sync-evalhub-providers.py (3)
34-41: Consider adding a timeout to network requests.

The urlopen calls have no timeout, which means the script could hang indefinitely if the GitHub API or raw content endpoints are slow or unresponsive. This is especially relevant for a utility script that might run in CI/CD pipelines.

♻️ Suggested improvement

```diff
 def fetch_json(url: str):
-    with urllib.request.urlopen(url) as resp:
+    with urllib.request.urlopen(url, timeout=30) as resp:
         return json.load(resp)


 def fetch_text(url: str) -> str:
-    with urllib.request.urlopen(url) as resp:
+    with urllib.request.urlopen(url, timeout=30) as resp:
         return resp.read().decode()
```

Regarding the Ruff S310 warnings about URL scheme auditing: since the URLs are constructed from hardcoded https:// patterns (GitHub API and raw content), the risk of file:// or other dangerous schemes is minimal.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hack/sync-evalhub-providers.py` around lines 34 - 41, The network calls in fetch_json and fetch_text use urllib.request.urlopen without a timeout, risking indefinite hangs; update both functions (fetch_json and fetch_text) to pass a timeout parameter to urllib.request.urlopen (e.g., a constant DEFAULT_TIMEOUT or an optional function parameter) so requests fail fast, and adjust callers if you expose a parameter; optionally catch urllib.error.URLError/timeout exceptions where these functions are used to surface a clear error in CI.
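A runnable variant of the suggested fix, with the optional error handling folded in. DEFAULT_TIMEOUT and the SystemExit wrapper are choices of this sketch, not part of the script:

```python
import json
import urllib.error
import urllib.request

DEFAULT_TIMEOUT = 30  # seconds; an assumed default, not from the script


def fetch_json(url: str, timeout: float = DEFAULT_TIMEOUT):
    """Fetch and decode JSON, failing fast instead of hanging forever."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, TimeoutError) as exc:
        # Surface a clear error for CI instead of a bare traceback.
        raise SystemExit(f"fetch failed for {url}: {exc}") from exc
```

The same pattern applies to fetch_text; a data: URL exercises the happy path without touching the network.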
63-64: RFC 1123 sanitization may be incomplete for edge cases.

The comment mentions RFC 1123 compliance, but only underscores are replaced with hyphens. Other invalid characters (uppercase letters, dots, consecutive hyphens, etc.) would pass through unmodified.
This is likely fine for the known upstream provider IDs, but if upstream adds providers with unusual naming, the script could generate invalid K8s resource names.
Consider a more robust sanitization if you want to be defensive:
```python
import re

safe_id = re.sub(r'[^a-z0-9-]', '-', provider_id.lower()).strip('-')
safe_id = re.sub(r'-+', '-', safe_id)  # collapse consecutive hyphens
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hack/sync-evalhub-providers.py` around lines 63 - 64, The current RFC 1123 sanitization only replaces underscores and can leave uppercase letters, dots, or repeated hyphens, so update the sanitization for provider_id used to compute safe_id in hack/sync-evalhub-providers.py: lower the provider_id, replace any character not in a-z0-9- with a hyphen, trim leading/trailing hyphens, and collapse consecutive hyphens so safe_id always meets RFC1123 (lowercase alphanumerics and single hyphens).
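The sanitization suggestion can be packaged as a small self-contained helper. The name rfc1123_name and the 63-character cap are additions of this sketch:

```python
import re


def rfc1123_name(provider_id: str) -> str:
    """Reduce an arbitrary provider id to an RFC 1123 label:
    lowercase alphanumerics separated by single hyphens."""
    s = re.sub(r"[^a-z0-9-]", "-", provider_id.lower())
    s = re.sub(r"-+", "-", s).strip("-")
    return s[:63]  # RFC 1123 labels are capped at 63 characters


print(rfc1123_name("LM_Evaluation.Harness"))  # → lm-evaluation-harness
```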
113-115: File cleanup before successful generation could leave empty directory on failure.

The script deletes all existing provider-*.yaml files before attempting to fetch and generate new ones. If the network calls fail after cleanup, the output directory will be left without any provider ConfigMaps.

Consider a safer approach: generate to temporary files first, then atomically replace after success.
♻️ Suggested approach
```diff
+    # Generate to a temporary list first
+    new_files = []
     for filename in filenames:
         if filename in EXCLUDE_FILES:
             print(f"Skipping {filename} (excluded)")
             continue
         print(f"Processing {filename}...")
         result = process_provider(filename, branch)
         if result:
             cm_file, provider_id = result
-            cm_files.append(cm_file)
-            provider_ids.append(provider_id)
+            new_files.append((cm_file, provider_id))

+    # Only clean up after successful generation
+    for old in OUTPUT_DIR.glob("provider-*.yaml"):
+        old.unlink()
+
+    cm_files = [f[0] for f in new_files]
+    provider_ids = [f[1] for f in new_files]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@hack/sync-evalhub-providers.py` around lines 113 - 115, The current loop that unlinks all existing provider-*.yaml files via OUTPUT_DIR.glob("provider-*.yaml") is unsafe because it deletes before generation; change the flow to first write generated ConfigMaps into temporary files or a temporary directory (e.g., OUTPUT_DIR / "tmp-*", or write files with a ".tmp" suffix), verify all network fetches and file writes succeed, then atomically replace the old files by using os.replace/Path.replace or by moving the temp dir contents into OUTPUT_DIR and deleting leftovers; ensure the old provider-*.yaml are removed only after successful replacement and add cleanup of the temp files on failure so the output directory is never left empty on network errors.
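The write-then-swap idea can also be sketched as a standalone helper (write_atomically is a hypothetical name; the script itself structures this differently):

```python
import os
import tempfile
from pathlib import Path


def write_atomically(out_dir: Path, files: dict[str, str]) -> None:
    """Stage every file in a temp location first, then swap them in,
    so a mid-generation failure never leaves out_dir half-empty."""
    staged = []
    for name, content in files.items():
        fd, tmp = tempfile.mkstemp(dir=out_dir, suffix=".tmp")
        with os.fdopen(fd, "w") as fh:
            fh.write(content)
        staged.append((tmp, out_dir / name))
    # Every write succeeded: drop stale outputs, then promote the temps.
    keep = {dest for _, dest in staged}
    for old in out_dir.glob("provider-*.yaml"):
        if old not in keep:
            old.unlink()
    for tmp, dest in staged:
        os.replace(tmp, dest)  # atomic within one filesystem on POSIX
```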
ℹ️ Review info
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
- config/configmaps/evalhub/provider-garak.yaml
- config/configmaps/evalhub/provider-guidellm.yaml
- config/configmaps/evalhub/provider-lighteval.yaml
- config/configmaps/evalhub/provider-lm-evaluation-harness.yaml
- controllers/evalhub/configmap.go
- controllers/evalhub/constants.go
- controllers/evalhub/deployment.go
- hack/sync-evalhub-providers.py
🚧 Files skipped from review as they are similar to previous changes (2)
- config/configmaps/evalhub/provider-garak.yaml
- controllers/evalhub/constants.go
🧹 Nitpick comments (1)
controllers/evalhub/unit_test.go (1)
1345-1353: Add ownership assertions for copied provider ConfigMaps.

Lines 1345-1353 validate copied data, but not ownership. Add owner-reference checks to lock in GC/reconciliation guarantees.
Proposed test addition
```diff
 // Verify the ConfigMap was created in the instance namespace with correct data
 copiedCM := &corev1.ConfigMap{}
 err = fakeClient.Get(ctx, types.NamespacedName{
     Name:      evalHubName + "-provider-testprovider",
     Namespace: instanceNamespace,
 }, copiedCM)
 require.NoError(t, err)
 assert.Equal(t, sourceProvider.Data["testprovider.yaml"], copiedCM.Data["testprovider.yaml"])
+
+// Verify owner reference is set to EvalHub instance
+require.Len(t, copiedCM.OwnerReferences, 1)
+assert.Equal(t, evalHubName, copiedCM.OwnerReferences[0].Name)
+assert.Equal(t, "EvalHub", copiedCM.OwnerReferences[0].Kind)
+require.NotNil(t, copiedCM.OwnerReferences[0].Controller)
+assert.True(t, *copiedCM.OwnerReferences[0].Controller)
 })
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@controllers/evalhub/unit_test.go` around lines 1345 - 1353, The test currently verifies copied ConfigMap data but not ownership—update the unit test so after fetching copiedCM you assert its ownerReferences include an owner pointing to the EvalHub (use evalHubName) and that the reference is a controller owner; specifically check copiedCM.OwnerReferences is non-empty and contains an entry with Name == evalHubName (and/or Kind matching your EvalHub type) and Controller != nil && *Controller == true (and if an evalHub variable with UID exists assert OwnerReference.UID == evalHub.ObjectMeta.UID); add these assertions immediately after the existing data equality check that uses copiedCM and sourceProvider.
7d63227 into trustyai-explainability:main
Provider CMs live in the opendatahub trusty operator namespace and are mounted at /etc/evalhub/config/providers (later /etc/evalhub, when we're ready) in the evalhub-controlplane namespace.

DEMO
Pictured: a "prefixed" CM for each provider is available in the opendatahub trusty operator namespace (this is created by the ODH meta-operator)
Pictured: when the Operator reconciles the EH CR instance, it copies over only the required CMs for the providers into the destination namespace
Pictured: each CM naturally has the EH CR instance as its Owner
Pictured: the EH server Pod gets the volume mount for the required Provider CMs
Pictured: an example CM content, showing that params.env has been injected as we expect on the runtime end too :)
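The volume mount in the screenshots corresponds roughly to this shape in the generated Deployment. This is a hedged sketch: the ConfigMap names and the exact projected layout are illustrative, not the operator's literal output; only EVAL_HUB_CONFIG_DIR and the mount path come from the PR itself:

```yaml
# Illustrative sketch only: resource names are assumptions, not operator output.
spec:
  containers:
    - name: evalhub
      env:
        - name: EVAL_HUB_CONFIG_DIR
          value: /etc/evalhub/config
      volumeMounts:
        - name: providers
          mountPath: /etc/evalhub/config/providers
          readOnly: true
  volumes:
    - name: providers
      projected:
        sources:
          - configMap:
              name: evalhub-sample-provider-garak
          - configMap:
              name: evalhub-sample-provider-lm-evaluation-harness
```

A projected volume merges the per-provider ConfigMaps into one directory, which is what lets the server discover all active providers under a single config path.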
Conclusions
When we're ready, we can also switch Config.yaml into /etc/evalhub (or the like) and then have the EH Go Server pick up the config from there, q.e.d.
Summary by CodeRabbit
New Features
Chores