feat: upgrade ai-architecture-charts llama-stack version to 0.7.4 and support deployment with llamastack operator #287
Conversation
@jianrongzhang89 have these changes been tested with the operator installation method?
@jianrongzhang89 has it been tested on both clusters?
- Upgrade llm-service subchart from 0.5.4 to 0.5.8
- Upgrade llama-stack subchart from 0.5.3 to 0.6.0
- Update LlamaStack endpoint from /v1/openai/v1 to /v1 across all configs
- Update deployment templates, Makefile, documentation, and tests
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
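Illustratively, the endpoint change means OpenAI-compatible clients now talk to the LlamaStack service at a `/v1` base path instead of `/v1/openai/v1`. A minimal sketch, assuming an in-cluster service named `llamastack`, llama-stack's default port 8321, and the OpenAI Python client — all of these are assumptions for the example, not values taken from this PR:

```python
# Illustrative only: service name, port, and model ID are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://llamastack:8321/v1",  # previously .../v1/openai/v1
    api_key="not-needed",  # dummy value; the SDK requires some key to be set
)

resp = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)
```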
Add --max-old-space-size=4096 flag to prevent OOM errors during webpack build. This resolves exit code 137 (killed by OOM) when building React UI in Docker. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
… 0.6.0
- Upgrade llama-stack subchart from 0.6.0 to 0.7.3
- Update llama-stack-instance appVersion to 0.6.0
- Update distribution-starter image tag from 0.2.22 to 0.6.0
- Update Chart.lock with new dependency versions
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add RUN_CONFIG_PATH=/app-config/config.yaml to env lists in both llama-stack and llama-stack-instance configurations. This was being lost due to Helm array replacement behavior when parent chart overrides subchart env values. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
LlamaStack 0.6.0+ returns model IDs in provider-prefixed format:
"{provider_id}/{model_id}" (e.g., "llama-3-1-8b-instruct/meta-llama/Llama-3.1-8B-Instruct")
Add provider-prefixed ID as the first candidate when calling the
chat completions API, constructed from serviceName and summarize_model_id.
This fixes 404 errors when requesting chat completions from llama-stack.
Fallback priority:
1. provider-prefixed (new format)
2. serviceName
3. modelName
4. summarize_model_id (original format)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
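To make the fallback order above concrete, here is a minimal sketch of how such a candidate list could be assembled; the helper name and signature are illustrative, not the actual code added in this PR:

```python
def build_model_id_candidates(summarize_model_id, service_name=None, model_name=None):
    """Return candidate model IDs in the fallback priority described above."""
    candidates = []
    # 1. provider-prefixed, e.g. "llama-3-1-8b-instruct/meta-llama/Llama-3.1-8B-Instruct"
    if service_name:
        candidates.append(f"{service_name}/{summarize_model_id}")
    # 2. serviceName, 3. modelName, 4. original summarize_model_id
    for candidate in (service_name, model_name, summarize_model_id):
        if candidate and candidate not in candidates:
            candidates.append(candidate)
    return candidates

# Example:
# build_model_id_candidates("meta-llama/Llama-3.1-8B-Instruct", "llama-3-1-8b-instruct")
# -> ["llama-3-1-8b-instruct/meta-llama/Llama-3.1-8B-Instruct",
#     "llama-3-1-8b-instruct",
#     "meta-llama/Llama-3.1-8B-Instruct"]
```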
Move NODE_OPTIONS environment variable to the beginning of Dockerfile before any COPY/RUN commands to ensure it's always available and not affected by Docker layer caching. Also add NODE_OPTIONS to Makefile's local yarn build step to prevent OOM errors when building outside Docker. This resolves recurring exit code 137 (OOM) errors during React UI builds. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Increase --max-old-space-size from 4096MB to 8192MB (8GB) to handle larger webpack builds and prevent OOM errors during compilation.
Updated in:
- Dockerfile.react-ui (ENV NODE_OPTIONS)
- Makefile (local build step)
- package.json (build scripts)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Revert --max-old-space-size from 8192MB back to 4096MB (4GB). The 8GB setting cannot be used on systems with only 2GB total RAM. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…y with llama-stack 0.6.0
Update llama-stack-instance subchart configuration to work properly when deployed via the llama-stack operator (LlamaStackDistribution CR).
API changes:
- Remove 'telemetry' API (not supported in llama-stack 0.6.0)
- Add 'post_training' API (required in 0.6.0)
Provider configuration format updates for 0.6.0:
- vLLM provider: change 'url' to 'base_url'
- files provider: metadata_store now uses 'backend: sql_default' + 'table_name' (instead of 'type: sqlite' + 'db_path')
- eval/datasetio providers: kvstore uses 'backend: kv_default' + 'namespace' (instead of 'type: sqlite' + 'db_path')
- vector_io provider: rename 'kvstore' field to 'persistence'
- agents provider: restructure to use 'persistence.agent_state' and 'persistence.responses' with backend/namespace format
- metadata store: use 'backend' + 'namespace' (not 'type'/'db_path')
- Add post_training provider (torchtune-cpu)
Operator compatibility fix:
- Change RUN_CONFIG_PATH from '/app-config/config.yaml' to '/etc/llama-stack/config.yaml' to match where the llama-stack operator mounts the configuration ConfigMap
These changes enable proper model registration when llama-stack is deployed via the operator, fixing the "Model not found" errors.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Refactor provider-prefixed model ID handling into a centralized function `get_llamastack_model_id_candidates()` that is used by both:
1. `summarize_with_llm()` - for vLLM "Analyze with AI" functionality
2. `LlamaChatBot` - for console plugin chat with Prometheus
This ensures consistent model ID resolution across all llama-stack API calls, fixing "Model not found" errors in the console plugin chat functionality.
Changes:
- Add `get_llamastack_model_id_candidates()` function in `llm_client.py` that returns a prioritized list of candidate model IDs:
  1. provider-prefixed: "llama-3-1-8b-instruct/meta-llama/Llama-3.1-8B-Instruct"
  2. serviceName: "llama-3-1-8b-instruct"
  3. modelName: alternate ID if present
  4. original model_id: user-facing key
- Update `summarize_with_llm()` to use the centralized function
- Update `LlamaChatBot._extract_model_name()` to use the centralized function
This fix ensures both the analyze and chat features work correctly with llama-stack 0.6.0 in both Helm chart and operator deployment modes.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1. Fix parent RAG chart RUN_CONFIG_PATH for operator mode
- Update deploy/helm/rag/values.yaml to use /etc/llama-stack/config.yaml
- This ensures the operator mount path is used when llama-stack-instance
is enabled via the RAG umbrella chart
- Previously, the parent chart's env list would override the subchart's
correct default, breaking operator deployments
2. Remove dead code after candidate-ID refactor
- Remove unused model_id_to_use variable in summarize_with_llm()
- This was computed but never used after refactoring to use
get_llamastack_model_id_candidates()
- Avoids confusion in code review and keeps lint tools quiet
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Address review comment about behavioral asymmetry:
- LlamaChatBot was using only candidates[0] (first model ID)
- summarize_with_llm tries all candidates with fallback
This could lead to chat-only failures if the first candidate doesn't work but a later candidate would.
Changes:
- Split _extract_model_name() into two methods:
  - _get_model_id_candidates(): returns the full ordered list
  - _extract_model_name(): returns the first candidate (for backward compat)
- Add retry logic in chat() method:
  - On the first iteration, try each candidate until one succeeds
  - Remember working_model_id for subsequent iterations
  - Only retry on "model not found" errors (404, "not found" in message)
  - Propagate other errors immediately
- Pass working_model_id to _try_direct_tool_call() for consistency
This ensures LlamaChatBot has the same resilience as summarize_with_llm() when dealing with llama-stack 0.6.0 provider-prefixed model IDs.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
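A simplified sketch of the retry behaviour this describes (the real logic lives in `LlamaChatBot.chat()`; the helper names and the string-matching heuristic below are illustrative, following the "404 / not found" rule stated above):

```python
def _is_model_not_found(err: Exception) -> bool:
    """Retry only on 'model not found' style errors, per the commit message."""
    msg = str(err).lower()
    return "404" in msg or "not found" in msg

def chat_with_fallback(candidates, call_chat_completion):
    """Try each candidate model ID until one works; return (working_model_id, response)."""
    last_error = None
    for model_id in candidates:
        try:
            response = call_chat_completion(model_id)
            return model_id, response   # remembered as working_model_id afterwards
        except Exception as err:
            if _is_model_not_found(err):
                last_error = err
                continue                # try the next candidate
            raise                       # propagate unrelated errors immediately
    raise last_error or RuntimeError("no model ID candidates provided")
```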
Add tests for get_llamastack_model_id_candidates() covering:
- Provider-prefixed ID generation (e.g., "provider/model-id")
- ModelName inclusion in candidate list
- Fallback behavior when serviceName is missing
- Unknown model handling
- Duplicate prevention logic
- Priority ordering verification
- External model compatibility
Also improve the docstring in LlamaChatBot._get_model_id_candidates() to clarify it's a thin wrapper around the shared function.
Addresses review feedback on PR rh-ai-quickstart#286 regarding test coverage gaps.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Changed mock path from 'src.core.llm_client.get_model_config' to 'src.core.model_config_manager.get_model_config' because the function uses a local import: `from .model_config_manager import get_model_config` All 7 tests now pass. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ndidates
Removed duplicate `from .model_config_manager import get_model_config` inside get_llamastack_model_id_candidates() since it's already imported at module scope (line 15).
Benefits:
- Avoids duplicate symbol lookups
- Consistent with how summarize_with_llm() uses get_model_config
- Allows consistent patching across all functions in llm_client.py
Updated test mocks to patch 'src.core.llm_client.get_model_config' (where the symbol is used) rather than the original module. All 7 tests pass.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
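As an illustration of that patching strategy, a test can patch the symbol where it is looked up (`src.core.llm_client.get_model_config`) rather than in its defining module. The fake config values and the assumed one-argument signature of `get_llamastack_model_id_candidates()` are for the example only:

```python
from unittest.mock import patch

# Fake model config for the example; the real structure comes from model_config_manager.
FAKE_CONFIG = {
    "serviceName": "llama-3-1-8b-instruct",
    "modelName": "meta-llama/Llama-3.1-8B-Instruct",
}

def test_provider_prefixed_candidate_first():
    # Patch where the symbol is used: llm_client imports get_model_config at module scope.
    with patch("src.core.llm_client.get_model_config", return_value=FAKE_CONFIG):
        from src.core.llm_client import get_llamastack_model_id_candidates
        candidates = get_llamastack_model_id_candidates("meta-llama/Llama-3.1-8B-Instruct")
        assert candidates[0] == "llama-3-1-8b-instruct/meta-llama/Llama-3.1-8B-Instruct"
```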
The llama-stack chart (v0.7.3) from ai-architecture-charts automatically injects POSTGRES_* env vars when pgvector.enabled=true (which is the default). Our values.yaml was duplicating these, causing Kubernetes warnings about hidden definitions.
Removed the duplicate env var definitions since the base chart already handles this via its pgvector integration feature.
Fixes warnings:
  spec.template.spec.containers[0].env[10]: hides previous definition of "POSTGRES_USER"
  spec.template.spec.containers[0].env[11]: hides previous definition of "POSTGRES_PASSWORD"
  (and POSTGRES_HOST, POSTGRES_PORT, POSTGRES_DBNAME)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
… fork
Remove the llama-stack-instance subchart and enhance the llama-stack chart to support both operator-based (LlamaStackDistribution CRD) and traditional Helm deployment modes using the managedByOperator flag.
Key Changes:
- Remove llama-stack-instance subchart entirely (all files deleted)
- Update Chart.yaml to use the llama-stack chart from the fork (operator branch)
- Add automated fork management in Makefile (clone, package, extract)
- Consolidate all llama-stack configuration into a single values section
- Add operator mode configuration (userConfig, network) to prevent nil errors
- Remove LLAMA_STACK_CHART_PREFIX variable (no longer needed)
- Update helm_llama_stack_args to pass the managedByOperator flag
Deployment Modes:
- Default (USE_LLAMA_STACK_OPERATOR=false): deploys as a Deployment; Service: llamastack, via llama-stack.managedByOperator=false
- Operator (USE_LLAMA_STACK_OPERATOR=true): creates a LlamaStackDistribution CRD; Service: llamastack-service, via llama-stack.managedByOperator=true; requires RHOAI 3.x with the LlamaStack operator enabled
Fork Details:
- Repository: https://github.com/jianrongzhang89/ai-architecture-charts
- Branch: operator
- Version: 0.7.3
- Auto-managed by: make depend (clones to ../ai-architecture-charts-fork)
Migration Path: Once llama-stack 0.6.0+ is released with operator support, update Chart.yaml to use the official repository and remove fork-related Makefile code. See deploy/helm/rag/LLAMA_STACK_FORK_USAGE.md for details.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add network.allowedFrom configuration to allow pods in the same namespace to connect to llama-stack when deployed via LlamaStackDistribution CRD. This fixes the NetworkPolicy that was blocking the MCP server from accessing the llamastack-service, causing 'Local model is not available' errors when chatting with Prometheus. Issue: LlamaStack operator creates a restrictive NetworkPolicy that only allows access from specific labeled pods. The MCP server was being blocked. Solution: Configure allowedFrom.namespaces to include the release namespace, allowing all pods in the same namespace to connect. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Change RHOAI_VERSION auto-detection to only run when the variable is undefined, not just when it's not from command line. This fixes an issue where recursive make calls (e.g., during install-mcp-server -> generate-model-config) would re-detect RHOAI_VERSION even though it was already exported from the parent make. Issue: When the Makefile is re-parsed in a recursive make call, exported variables have origin='environment', not 'command line'. The previous condition 'ifneq ($(origin RHOAI_VERSION),command line)' would always be true for exported variables, causing auto-detection to run again and potentially fail, resulting in RHOAI_VERSION=2 even on RHOAI 3.x clusters. This caused the validation error: USE_LLAMA_STACK_OPERATOR=true requires RHOAI_VERSION=3 Solution: Only auto-detect when origin is 'undefined', which means the variable is truly not set. If it's 'environment' (exported from parent) or 'command line' (explicit override), skip auto-detection. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Change llama-stack NetworkPolicy allowedFrom configuration to be set
dynamically by the Makefile instead of using a Helm template in values.yaml.
Issue: Helm templates in values.yaml are not evaluated - they're just data.
When we set 'allowedFrom.namespaces: ["{{ .Release.Namespace }}"]' in
values.yaml, the literal string "{{ .Release.Namespace }}" was passed
to the LlamaStackDistribution CR, causing a validation error:
'a valid label must be an empty string or consist of alphanumeric
characters...'
Solution: Remove the template from values.yaml and instead pass the actual
namespace value via --set flag in helm_llama_stack_args when operator mode
is enabled:
--set llama-stack.network.allowedFrom.namespaces[0]=$(NAMESPACE)
This ensures the LlamaStackDistribution CR receives the actual namespace
name (e.g., 'jianrong') instead of a template string.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add quotes around the llama-stack.network.allowedFrom.namespaces[0] argument to prevent shell interpretation of square brackets. Without quotes, the shell would interpret [] as glob pattern characters which could cause errors or unexpected behavior. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace the forked llama-stack chart with official release 0.7.4 from rh-ai-quickstart/ai-architecture-charts.
- Update Chart.yaml to use llama-stack 0.7.4 from the official repository
- Remove fork setup logic from Makefile (setup-llama-stack-fork target)
- Simplify the depend target to use standard helm dependency update
- Replace LLAMA_STACK_FORK_USAGE.md with LLAMA_STACK_USAGE.md
- Update Chart.lock to reference the official chart
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
helm list (without --all) hides failed/uninstalling releases, so helm upgrade --install would find a stuck release and fail with "has no deployed releases". Now we detect any stale release via --all, call helm uninstall, and poll until it disappears (with a fallback that force-deletes the Helm secret if the release is stuck). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
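A rough sketch of that detect-uninstall-poll flow, written in Python purely for illustration (the actual change lives in the Makefile/shell); it assumes `helm` is on PATH, and the release and namespace names are placeholders:

```python
import subprocess
import time

def ensure_release_gone(release: str, namespace: str, timeout: int = 120) -> None:
    """If a stale release exists in any status, uninstall it and wait until it disappears."""
    def list_all() -> list[str]:
        # --all includes failed/uninstalling releases that plain `helm list` hides.
        out = subprocess.run(
            ["helm", "list", "--all", "-n", namespace, "-q"],
            capture_output=True, text=True, check=True,
        ).stdout
        return out.split()

    if release in list_all():
        subprocess.run(["helm", "uninstall", release, "-n", namespace], check=False)
        deadline = time.time() + timeout
        while release in list_all():
            if time.time() > deadline:
                # The commit's fallback (force-deleting the Helm release secret)
                # is not shown here; surface the stuck state instead.
                raise TimeoutError(f"release {release} still present after uninstall")
            time.sleep(5)

# ensure_release_gone("my-release", "my-namespace")  # placeholder names
```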
Make operator mode vs Helm charts deployment mode terminology explicit and consistent in the info messages printed during USE_LLAMA_STACK_OPERATOR auto-detection. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The short-circuit check in install_operator only matched the exact startingCSV version. When a newer CSV (e.g. v0.144.0-2) was already Succeeded but startingCSV referenced an older version (v0.144.0-1), status.installedCSV was never populated by OLM, causing a 10-minute hang. Also removes the STARTING_CSV early-exit block that skipped subscription creation when the CSV was already Succeeded. This block caused verify-operators-ready failures because it left no subscription for logging/loki after their subscriptions were deleted — check_operator is the correct gatekeeper and install_operator should always create the subscription when reached. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@redhatHameed yes, the operator installation has been tested: on an RHOAI 3.x cluster, the AI Observability Summarizer Operator was tested successfully with both llamastack chart mode and llamastack operator mode; on an RHOAI 2.x cluster, it was tested successfully with llamastack chart mode.
@sgahlot testing completed successfully on both clusters.
Testing Results — Cycles 1-2
Tested on a ROSA cluster with RHOAI 3.x, using the PR #287 branch.
Test Plan
Cycle 1 —
Summary
Testing
Deployed and tested in the cluster successfully with both modes (llamastack chart and llamastack operator):
On the vLLM page, Analyze with AI and the AI assistant, Analyze OpenShift and its AI assistance, and Chat with Prometheus all work as expected with the Llama-3.1-8B-Instruct model.
- llama-3-1-8b-instruct/meta-llama/Llama-3.1-8B-Instruct properly registered
- /v1/chat/completions endpoint works
Checklist
make test