feat: upgrade ai-architecture-charts llama-stack version to 0.7.4 and support deployment with llamastack operator #287

Merged
jianrongzhang89 merged 28 commits into rh-ai-quickstart:dev from jianrongzhang89:arch-chart-operator-support
Apr 30, 2026
Conversation

@jianrongzhang89
Collaborator

@jianrongzhang89 jianrongzhang89 commented Apr 20, 2026

Summary

  • Upgrade the ai-architecture-charts llama-stack version to 0.7.4, which supports deploying llama-stack in both Helm chart and operator modes. The distribution-starter image is upgraded to 0.6.0.
  • Remove redundant llama-stack-instance Helm templates in favor of operator-based deployment.
  • Upgrade the Helm operator to support deploying RAG with llama-stack via LlamaStack operator mode. A new "Managed By LlamaStack Operator" flag is added to enable this mode (it is off by default):
Screenshot 2026-04-29 at 8 53 01 AM

Testing

Deployed and tested in the cluster in both modes (llama-stack chart and llama-stack operator) successfully:

  1. make install in RHOAI 3.X and 2.X clusters
  2. make uninstall
  3. Install via the operator: on an RHOAI 3.x cluster in both LlamaStack chart mode and operator mode (using the "Managed By LlamaStack Operator" flag), and on an RHOAI 2.x cluster in LlamaStack chart mode.
  4. Uninstall with the operator

The vLLM page features Analyze with AI and the AI assistant, Analyze OpenShift with AI assistance, and Chat with Prometheus (with the Llama-3.1-8B-Instruct model) all work as expected.

  • ✅ Helm chart mode: vLLM Analyze with AI works correctly
  • ✅ Operator mode: vLLM Analyze with AI works correctly with updated MCP server
  • ✅ Model registration: llama-3-1-8b-instruct/meta-llama/Llama-3.1-8B-Instruct properly registered
  • ✅ Chat completions: OpenAI-compatible /v1/chat/completions endpoint works

Checklist

  • Verify on the cluster
  • Update tests if applicable and run make test
  • Add screenshots (if applicable)
  • Update readme (if applicable)

@jianrongzhang89 jianrongzhang89 changed the title refactor: migrate llama-stack to operator mode with chart fork [WIP] migrate llama-stack to operator mode with chart fork Apr 20, 2026
@jianrongzhang89 jianrongzhang89 changed the title [WIP] migrate llama-stack to operator mode with chart fork feat: upgrade ai-archiecture-charts llama-stack version to 0.7.4 and support deployment with llamastack operator Apr 22, 2026
@jianrongzhang89 jianrongzhang89 marked this pull request as ready for review April 22, 2026 20:24
@redhatHameed
Collaborator

@jianrongzhang89 have these changes been tested with the operator installation method?

@sgahlot
Collaborator

sgahlot commented Apr 23, 2026

@jianrongzhang89 has it been tested on both RHOAI 2.x as well as RHOAI 3.x clusters?

Collaborator

@tsisodia10 tsisodia10 left a comment


looks good!

@jianrongzhang89 jianrongzhang89 changed the title feat: upgrade ai-archiecture-charts llama-stack version to 0.7.4 and support deployment with llamastack operator [WIP] feat: upgrade ai-archiecture-charts llama-stack version to 0.7.4 and support deployment with llamastack operator Apr 28, 2026
jianrongzhang89 and others added 17 commits April 28, 2026 17:20
- Upgrade llm-service subchart from 0.5.4 to 0.5.8
- Upgrade llama-stack subchart from 0.5.3 to 0.6.0
- Update LlamaStack endpoint from /v1/openai/v1 to /v1 across all configs
- Update deployment templates, Makefile, documentation, and tests

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add --max-old-space-size=4096 flag to prevent OOM errors during webpack build.
This resolves exit code 137 (killed by OOM) when building React UI in Docker.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
… 0.6.0

- Upgrade llama-stack subchart from 0.6.0 to 0.7.3
- Update llama-stack-instance appVersion to 0.6.0
- Update distribution-starter image tag from 0.2.22 to 0.6.0
- Update Chart.lock with new dependency versions

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add RUN_CONFIG_PATH=/app-config/config.yaml to env lists in both
llama-stack and llama-stack-instance configurations. This was being
lost due to Helm array replacement behavior when parent chart
overrides subchart env values.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
LlamaStack 0.6.0+ returns model IDs in provider-prefixed format:
"{provider_id}/{model_id}" (e.g., "llama-3-1-8b-instruct/meta-llama/Llama-3.1-8B-Instruct")

Add provider-prefixed ID as the first candidate when calling the
chat completions API, constructed from serviceName and summarize_model_id.
This fixes 404 errors when requesting chat completions from llama-stack.

Fallback priority:
1. provider-prefixed (new format)
2. serviceName
3. modelName
4. summarize_model_id (original format)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
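The fallback priority above can be sketched in a few lines of Python (a minimal illustration of the ordering logic; the function and parameter names are assumptions, not the repository's actual code):

```python
def model_id_candidates(service_name, model_name, summarize_model_id):
    """Return prioritized candidate model IDs for llama-stack calls.

    llama-stack 0.6.0+ registers models under "{provider_id}/{model_id}",
    so the provider-prefixed form is tried first, then plainer fallbacks.
    """
    candidates = []
    if service_name and summarize_model_id:
        # 1. provider-prefixed (new 0.6.0+ format)
        candidates.append(f"{service_name}/{summarize_model_id}")
    # 2-4. serviceName, modelName, original model_id, skipping duplicates
    for cid in (service_name, model_name, summarize_model_id):
        if cid and cid not in candidates:
            candidates.append(cid)
    return candidates
```

The caller then tries each candidate in order until the chat completions API stops returning 404.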
Move NODE_OPTIONS environment variable to the beginning of Dockerfile
before any COPY/RUN commands to ensure it's always available and not
affected by Docker layer caching.

Also add NODE_OPTIONS to Makefile's local yarn build step to prevent
OOM errors when building outside Docker.

This resolves recurring exit code 137 (OOM) errors during React UI builds.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Increase --max-old-space-size from 4096MB to 8192MB (8GB) to handle
larger webpack builds and prevent OOM errors during compilation.

Updated in:
- Dockerfile.react-ui (ENV NODE_OPTIONS)
- Makefile (local build step)
- package.json (build scripts)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Revert --max-old-space-size from 8192MB back to 4096MB (4GB).
The 8GB setting cannot be used on systems with only 2GB total RAM.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…y with llama-stack 0.6.0

Update llama-stack-instance subchart configuration to work properly when
deployed via the llama-stack operator (LlamaStackDistribution CR).

API changes:
- Remove 'telemetry' API (not supported in llama-stack 0.6.0)
- Add 'post_training' API (required in 0.6.0)

Provider configuration format updates for 0.6.0:
- vLLM provider: change 'url' to 'base_url'
- files provider: metadata_store now uses 'backend: sql_default' + 'table_name'
  (instead of 'type: sqlite' + 'db_path')
- eval/datasetio providers: kvstore uses 'backend: kv_default' + 'namespace'
  (instead of 'type: sqlite' + 'db_path')
- vector_io provider: rename 'kvstore' field to 'persistence'
- agents provider: restructure to use 'persistence.agent_state' and
  'persistence.responses' with backend/namespace format
- metadata store: use 'backend' + 'namespace' (not 'type'/'db_path')
- Add post_training provider (torchtune-cpu)

Operator compatibility fix:
- Change RUN_CONFIG_PATH from '/app-config/config.yaml' to
  '/etc/llama-stack/config.yaml' to match where the llama-stack operator
  mounts the configuration ConfigMap

These changes enable proper model registration when llama-stack is deployed
via the operator, fixing the "Model not found" errors.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
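The 0.6.0 provider-format changes listed above can be illustrated with a config fragment like the following (field names are taken from the list above; the surrounding structure and provider-type identifiers are assumptions, not the exact repository config):

```yaml
providers:
  inference:
    - provider_id: vllm
      provider_type: remote::vllm
      config:
        base_url: http://vllm:8000/v1   # was 'url' before 0.6.0
  vector_io:
    - provider_id: pgvector
      provider_type: remote::pgvector
      config:
        persistence:                    # was 'kvstore' before 0.6.0
          backend: kv_default
          namespace: vector_io
```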
Refactor provider-prefixed model ID handling into a centralized function
`get_llamastack_model_id_candidates()` that is used by both:
1. `summarize_with_llm()` - for vLLM "Analyze with AI" functionality
2. `LlamaChatBot` - for console plugin chat with Prometheus

This ensures consistent model ID resolution across all llama-stack API calls,
fixing "Model not found" errors in the console plugin chat functionality.

Changes:
- Add `get_llamastack_model_id_candidates()` function in `llm_client.py`
  that returns prioritized list of candidate model IDs:
  1. provider-prefixed: "llama-3-1-8b-instruct/meta-llama/Llama-3.1-8B-Instruct"
  2. serviceName: "llama-3-1-8b-instruct"
  3. modelName: alternate ID if present
  4. original model_id: user-facing key

- Update `summarize_with_llm()` to use centralized function
- Update `LlamaChatBot._extract_model_name()` to use centralized function

This fix ensures both the analyze and chat features work correctly with
llama-stack 0.6.0 in both Helm chart and operator deployment modes.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1. Fix parent RAG chart RUN_CONFIG_PATH for operator mode
   - Update deploy/helm/rag/values.yaml to use /etc/llama-stack/config.yaml
   - This ensures the operator mount path is used when llama-stack-instance
     is enabled via the RAG umbrella chart
   - Previously, the parent chart's env list would override the subchart's
     correct default, breaking operator deployments

2. Remove dead code after candidate-ID refactor
   - Remove unused model_id_to_use variable in summarize_with_llm()
   - This was computed but never used after refactoring to use
     get_llamastack_model_id_candidates()
   - Avoids confusion in code review and keeps lint tools quiet

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Address review comment about behavioral asymmetry:
- LlamaChatBot was using only candidates[0] (first model ID)
- summarize_with_llm tries all candidates with fallback

This could lead to chat-only failures if the first candidate doesn't work
but a later candidate would.

Changes:
- Split _extract_model_name() into two methods:
  - _get_model_id_candidates(): Returns full ordered list
  - _extract_model_name(): Returns first candidate (for backward compat)

- Add retry logic in chat() method:
  - On first iteration, try each candidate until one succeeds
  - Remember working_model_id for subsequent iterations
  - Only retry on "model not found" errors (404, "not found" in message)
  - Propagate other errors immediately

- Pass working_model_id to _try_direct_tool_call() for consistency

This ensures LlamaChatBot has the same resilience as summarize_with_llm()
when dealing with llama-stack 0.6.0 provider-prefixed model IDs.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
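The retry behavior described in this commit can be sketched roughly as follows (hypothetical names; `call_chat` stands in for the actual llama-stack client call):

```python
def chat_with_fallback(candidates, call_chat):
    """Try each candidate model ID until one succeeds.

    Retries only on "model not found"-style errors (404 or "not found"
    in the message); any other error propagates immediately. Returns
    the response and the working ID, which the caller can remember
    for subsequent iterations instead of re-probing every time.
    """
    last_error = None
    for model_id in candidates:
        try:
            return call_chat(model_id), model_id
        except Exception as err:
            message = str(err).lower()
            if "404" in message or "not found" in message:
                last_error = err  # model-not-found: try next candidate
                continue
            raise  # unrelated failure: propagate immediately
    raise last_error or RuntimeError("no model ID candidates provided")
```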
Add tests for get_llamastack_model_id_candidates() covering:
- Provider-prefixed ID generation (e.g., "provider/model-id")
- ModelName inclusion in candidate list
- Fallback behavior when serviceName is missing
- Unknown model handling
- Duplicate prevention logic
- Priority ordering verification
- External model compatibility

Also improve docstring in LlamaChatBot._get_model_id_candidates()
to clarify it's a thin wrapper around shared function.

Addresses review feedback on PR rh-ai-quickstart#286 regarding test coverage gaps.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed mock path from 'src.core.llm_client.get_model_config' to
'src.core.model_config_manager.get_model_config' because the function
uses a local import: `from .model_config_manager import get_model_config`

All 7 tests now pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ndidates

Removed duplicate `from .model_config_manager import get_model_config`
inside get_llamastack_model_id_candidates() since it's already imported
at module scope (line 15).

Benefits:
- Avoids duplicate symbol lookups
- Consistent with how summarize_with_llm() uses get_model_config
- Allows consistent patching across all functions in llm_client.py

Updated test mocks to patch 'src.core.llm_client.get_model_config'
(where symbol is used) rather than the original module.

All 7 tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
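The patching rule behind these two commits is the standard `unittest.mock` guidance: patch the name where it is looked up, not where it is defined. A self-contained illustration with placeholder module names (not the repository's actual modules):

```python
import sys
import types
from unittest.mock import patch

# Stand-in for model_config_manager: defines get_model_config.
configmod = types.ModuleType("configmod")
configmod.get_model_config = lambda: {"serviceName": "real"}
sys.modules["configmod"] = configmod

# Stand-in for llm_client: imports the symbol at module scope.
client = types.ModuleType("client")
exec(
    "from configmod import get_model_config\n"
    "def service_name():\n"
    "    return get_model_config()['serviceName']\n",
    client.__dict__,
)
sys.modules["client"] = client

# Patching the ORIGINAL module has no effect: client already holds
# its own module-level reference to the function.
with patch("configmod.get_model_config", return_value={"serviceName": "mocked"}):
    unaffected = client.service_name()   # still "real"

# Patching where the symbol is USED intercepts the call.
with patch("client.get_model_config", return_value={"serviceName": "mocked"}):
    affected = client.service_name()     # now "mocked"
```

This is why moving the import to module scope in `llm_client.py` also required updating the test mocks to patch `src.core.llm_client.get_model_config`.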
The llama-stack chart (v0.7.3) from ai-architecture-charts
automatically injects POSTGRES_* env vars when pgvector.enabled=true
(which is the default). Our values.yaml was duplicating these,
causing Kubernetes warnings about hidden definitions.

Removed duplicate env var definitions since the base chart already
handles this via its pgvector integration feature.

Fixes warnings:
  spec.template.spec.containers[0].env[10]: hides previous definition of "POSTGRES_USER"
  spec.template.spec.containers[0].env[11]: hides previous definition of "POSTGRES_PASSWORD"
  (and POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DBNAME)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
… fork

Remove llama-stack-instance subchart and enhance llama-stack chart to support
both operator-based (LlamaStackDistribution CRD) and traditional Helm deployment
modes using the managedByOperator flag.

Key Changes:
- Remove llama-stack-instance subchart entirely (all files deleted)
- Update Chart.yaml to use llama-stack chart from fork (operator branch)
- Add automated fork management in Makefile (clone, package, extract)
- Consolidate all llama-stack configuration into single values section
- Add operator mode configuration (userConfig, network) to prevent nil errors
- Remove LLAMA_STACK_CHART_PREFIX variable (no longer needed)
- Update helm_llama_stack_args to pass managedByOperator flag

Deployment Modes:
- Default (USE_LLAMA_STACK_OPERATOR=false): deploys as Deployment
  Service: llamastack, via llama-stack.managedByOperator=false

- Operator (USE_LLAMA_STACK_OPERATOR=true): creates LlamaStackDistribution CRD
  Service: llamastack-service, via llama-stack.managedByOperator=true
  Requires: RHOAI 3.x with LlamaStack operator enabled

Fork Details:
- Repository: https://github.com/jianrongzhang89/ai-architecture-charts
- Branch: operator
- Version: 0.7.3
- Auto-managed by: make depend (clones to ../ai-architecture-charts-fork)

Migration Path:
Once llama-stack 0.6.0+ is released with operator support, update Chart.yaml
to use official repository and remove fork-related Makefile code.
See deploy/helm/rag/LLAMA_STACK_FORK_USAGE.md for details.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
jianrongzhang89 and others added 8 commits April 28, 2026 17:20
Add network.allowedFrom configuration to allow pods in the same namespace
to connect to llama-stack when deployed via LlamaStackDistribution CRD.

This fixes the NetworkPolicy that was blocking the MCP server from
accessing the llamastack-service, causing 'Local model is not available'
errors when chatting with Prometheus.

Issue: LlamaStack operator creates a restrictive NetworkPolicy that only
allows access from specific labeled pods. The MCP server was being blocked.

Solution: Configure allowedFrom.namespaces to include the release namespace,
allowing all pods in the same namespace to connect.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Change RHOAI_VERSION auto-detection to only run when the variable is
undefined, not just when it's not from command line. This fixes an issue
where recursive make calls (e.g., during install-mcp-server ->
generate-model-config) would re-detect RHOAI_VERSION even though it was
already exported from the parent make.

Issue: When the Makefile is re-parsed in a recursive make call, exported
variables have origin='environment', not 'command line'. The previous
condition 'ifneq ($(origin RHOAI_VERSION),command line)' would always
be true for exported variables, causing auto-detection to run again and
potentially fail, resulting in RHOAI_VERSION=2 even on RHOAI 3.x clusters.

This caused the validation error:
  USE_LLAMA_STACK_OPERATOR=true requires RHOAI_VERSION=3

Solution: Only auto-detect when origin is 'undefined', which means the
variable is truly not set. If it's 'environment' (exported from parent)
or 'command line' (explicit override), skip auto-detection.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Change llama-stack NetworkPolicy allowedFrom configuration to be set
dynamically by the Makefile instead of using a Helm template in values.yaml.

Issue: Helm templates in values.yaml are not evaluated - they're just data.
When we set 'allowedFrom.namespaces: ["{{ .Release.Namespace }}"]' in
values.yaml, the literal string "{{ .Release.Namespace }}" was passed
to the LlamaStackDistribution CR, causing a validation error:
  'a valid label must be an empty string or consist of alphanumeric
  characters...'

Solution: Remove the template from values.yaml and instead pass the actual
namespace value via --set flag in helm_llama_stack_args when operator mode
is enabled:
  --set llama-stack.network.allowedFrom.namespaces[0]=$(NAMESPACE)

This ensures the LlamaStackDistribution CR receives the actual namespace
name (e.g., 'jianrong') instead of a template string.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add quotes around the llama-stack.network.allowedFrom.namespaces[0]
argument to prevent shell interpretation of square brackets.

Without quotes, the shell would interpret [] as glob pattern characters
which could cause errors or unexpected behavior.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace forked llama-stack chart with official release 0.7.4 from
rh-ai-quickstart/ai-architecture-charts.

- Update Chart.yaml to use llama-stack 0.7.4 from official repository
- Remove fork setup logic from Makefile (setup-llama-stack-fork target)
- Simplify depend target to use standard helm dependency update
- Replace LLAMA_STACK_FORK_USAGE.md with LLAMA_STACK_USAGE.md
- Update Chart.lock to reference official chart

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
helm list (without --all) hides failed/uninstalling releases, so
helm upgrade --install would find a stuck release and fail with
"has no deployed releases". Now we detect any stale release via
--all, call helm uninstall, and poll until it disappears (with a
fallback that force-deletes the Helm secret if the release is stuck).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Make operator mode vs Helm charts deployment mode terminology
explicit and consistent in the info messages printed during
USE_LLAMA_STACK_OPERATOR auto-detection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The short-circuit check in install_operator only matched the exact
startingCSV version. When a newer CSV (e.g. v0.144.0-2) was already
Succeeded but startingCSV referenced an older version (v0.144.0-1),
status.installedCSV was never populated by OLM, causing a 10-minute
hang.

Also removes the STARTING_CSV early-exit block that skipped subscription
creation when the CSV was already Succeeded. This block caused
verify-operators-ready failures because it left no subscription for
logging/loki after their subscriptions were deleted — check_operator is
the correct gatekeeper and install_operator should always create the
subscription when reached.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@jianrongzhang89 jianrongzhang89 force-pushed the arch-chart-operator-support branch from 5ca66b1 to a96ee21 on April 28, 2026 21:20
@jianrongzhang89 jianrongzhang89 force-pushed the arch-chart-operator-support branch from a96ee21 to 4eee744 on April 29, 2026 12:44
@jianrongzhang89 jianrongzhang89 changed the title [WIP] feat: upgrade ai-archiecture-charts llama-stack version to 0.7.4 and support deployment with llamastack operator [WIP] feat: upgrade ai-architecture-charts llama-stack version to 0.7.4 and support deployment with llamastack operator Apr 29, 2026
@jianrongzhang89
Collaborator Author

@jianrongzhang89 have these changes been tested with the operator installation method?

@redhatHameed yes, operator installation is tested: on an RHOAI 3.x cluster, the AI Observability Summarizer operator was tested with both LlamaStack chart mode and LlamaStack operator mode; on an RHOAI 2.x cluster, with LlamaStack chart mode. All tests passed.

@jianrongzhang89 jianrongzhang89 changed the title [WIP] feat: upgrade ai-architecture-charts llama-stack version to 0.7.4 and support deployment with llamastack operator feat: upgrade ai-architecture-charts llama-stack version to 0.7.4 and support deployment with llamastack operator Apr 29, 2026
Comment thread deploy/helm/aiobs-stack/templates/llamastack-config.yaml
Comment thread deploy/helm/rag/LLAMA_STACK_USAGE.md Outdated
@jianrongzhang89
Collaborator Author

@jianrongzhang89 has it been tested on both RHOAI 2.x as well as RHOAI 3.x clusters?

@sgahlot testing completed successfully on both clusters.

Collaborator

@makon57 makon57 left a comment


lgtm

@sgahlot
Collaborator

sgahlot commented Apr 30, 2026

Testing Results — Cycles 1-2

Tested on ROSA cluster with RHOAI 3.x, using PR #287 branch (arch-chart-operator-support).

Test Plan

| Cycle | Deploy Method | LlamaStack Mode | Status |
| --- | --- | --- | --- |
| 1 | make install USE_LLAMA_STACK_OPERATOR=true | Operator | ✅ Pass |
| 2 | Operator install (OLM) | Operator | ✅ Pass |

Cycle 1 — make install with LlamaStack Operator

Deployment verification:

  • All pods running across 4 namespaces (ai-observability, openshift-logging, observability-hub, openshift-cluster-observability-operator)
  • LlamaStack confirmed using operator mode:
    • LlamaStackDistribution CR in Ready phase (operator v0.4.0, server v0.6.0)
    • Deployment owned by LlamaStackDistribution (ownerReference)
    • Labels: managed-by: llama-stack-operator

Korrel8r verification:

  • korrel8r_get_correlated invoked for all test pods — returned substantial correlated data:
    • jianrong/alert-example: 212K chars (logs + traces)
    • ai-observability/bad-image-test: 120K chars (correlated metrics)
    • ai-observability/crash-loop-test: 130K chars (correlated metrics)
    • jianrong/my-app-example: 18K chars (correlated metrics)
  • Loki CRB fix (commit afb7376) confirmed working — CRBs correctly reference openshift-logging-loki-stack-tenant-logs

Uninstall: Clean — all application and infrastructure resources removed.

Cycle 2 — Operator Install with LlamaStack Operator

Deployment verification:

  • All pods running across all 4 namespaces
  • LlamaStack confirmed using operator mode (same verification as cycle 1)

Korrel8r verification:

  • korrel8r_get_correlated invoked — initial broad query (k8s:Pod:{}) timed out (8s timeout, expected for cluster-wide query), then successfully used get_correlated_logs with specific pod queries
  • Loki CRB fix confirmed — CRBs correctly reference openshift-logging-cluster-ai-observability-loki-tenant-logs (operator path with alias)
  • 3-query diagnostic flow (error pods → detailed logs → investigate further) produced comprehensive analysis using Prometheus metrics + korrel8r log correlation

Uninstall: Clean — all application and infrastructure resources removed.

Loki CRB Fix Verification (commit afb7376)

The fix correctly handles both deployment paths:

  • make install: CRBs → openshift-logging-loki-stack-tenant-logs
  • Operator: CRBs → openshift-logging-cluster-ai-observability-loki-tenant-logs

The Helm sub-chart alias (loki in aiobs-stack/Chart.yaml) overrides .Chart.Name in the loki sub-chart's fullname helper, producing -loki- instead of -loki-stack- in the operator path.

Collaborator

@sgahlot sgahlot left a comment


lgtm

@jianrongzhang89 jianrongzhang89 merged commit 99ca59c into rh-ai-quickstart:dev Apr 30, 2026
2 checks passed
@jianrongzhang89 jianrongzhang89 deleted the arch-chart-operator-support branch April 30, 2026 14:35