Commit 374c7ff

feat: Enhance session handling and observability improvements
- Refactored session management to improve clarity and efficiency, including the removal of self-referential parent-session-id annotations.
- Updated session workspace path handling to be relative to the content service's StateBaseDir, simplifying path management.
- Introduced graceful shutdown for the content service, enhancing reliability during server termination.
- Enhanced observability stack with new Grafana dashboard configurations and metrics for session lifecycle tracking.
- Cleaned up unused code and improved logging for better debugging and maintenance.

chore: Update .gitignore and remove obsolete deployment documentation
- Added build log and log file patterns to .gitignore to prevent accidental commits.
- Deleted outdated deployment documentation files: DEPLOYMENT_CHANGES.md, DIFF_IMPROVEMENTS.md, S3_MIGRATION_GAPS.md, and OPENSHIFT_SETUP.md, which are no longer relevant to the current architecture.
- Cleaned up observability-related files, including Grafana and Prometheus configurations, to streamline the observability stack.

feat: Enhance operator metrics and session handling
- Introduced Prometheus metrics for monitoring session lifecycle, including startup duration, phase transitions, and error tracking.
- Updated session handling to record metrics during reconciliation, including session creation and completion.
- Refactored session management logic to ensure consistent behavior across API and kubectl session creations.
- Increased QPS and Burst settings for Kubernetes client to improve performance under load.
- Added a new Service and ServiceMonitor for exposing operator metrics in the ambient-code namespace.

feat: Refactor AgenticSession handling to use Pods instead of Jobs
- Updated the operator to create and manage Pods directly for AgenticSessions, improving startup speed and reducing complexity.
- Changed environment variable references and logging to reflect the transition from Jobs to Pods.
- Adjusted cleanup logic to handle Pods appropriately, including service creation and monitoring.
- Modified deployment configurations to ensure compatibility with the new Pod-based architecture.

feat: Implement S3 storage configuration for session artifacts
- Added support for S3-compatible storage in the settings section, allowing users to configure S3 endpoint, bucket, region, access key, and secret key.
- Updated the operator to persist session state and artifacts in S3, replacing the previous temporary content pod mechanism.
- Removed deprecated references to temporary content pods and PVCs, transitioning to an EmptyDir storage model with S3 integration.
- Enhanced the operator's handling of S3 configuration, ensuring proper validation and logging for S3 settings.
- Updated Makefile to include new build targets for state-sync image and MinIO setup.

feat: Enhance operator deployment with controller-runtime features
- Added command-line arguments for metrics and health probe endpoints, enabling better observability.
- Implemented concurrent reconciliation with a configurable maximum, improving performance.
- Updated Dockerfile to use ENTRYPOINT for better argument handling.
- Enhanced health checks with HTTP probes for liveness and readiness.
- Updated README to reflect new configuration options and features.

feat: Enhance observability stack deployment and cleanup in Makefile
- Added new targets for deploying and cleaning up the observability stack, including OpenTelemetry and Grafana.
- Introduced commands for accessing Grafana and Prometheus dashboards.
- Updated .gitignore to include secrets template for MinIO credentials.
- Removed deprecated image-prepuller DaemonSet and associated metrics service from manifests.
- Updated Makefile to reflect changes in observability management and improve user experience.

refactor: Clean up observability stack and enhance session handling
- Removed obsolete observability stack deployment commands from Makefile.
- Updated session handling in the operator to improve clarity and efficiency.
- Introduced a new state sync image in deployment scripts and updated related configurations.
- Refactored metrics handling for session lifecycle, ensuring consistent error tracking and performance monitoring.
- Cleaned up unused code and improved readability across multiple files.

feat: Refactor S3 storage configuration in settings and operator
- Replaced S3_ENABLED with STORAGE_MODE to allow selection between shared and custom storage options.
- Updated settings section to include radio buttons for storage mode selection, enhancing user experience.
- Modified operator session handling to read and apply storage mode, ensuring proper configuration for S3 settings.
- Improved logging for storage mode usage, clarifying the configuration process for users.

1 parent 41bac1f commit 374c7ff

47 files changed, +6042 -1966 lines changed

.gitignore

Lines changed: 8 additions & 0 deletions
@@ -136,3 +136,11 @@ e2e/langfuse/.env.langfuse-keys
 # Test Reporting
 logs/
 reports/
+
+# Secrets (should use .example templates)
+**/minio-credentials-secret.yaml
+
+# Build artifacts and logs
+build.log
+*.log
+!components/**/*.log
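
The new log patterns interact with the existing `logs/` rule in a way that is easy to misread: `!components/**/*.log` re-includes `.log` files under `components/`, but git cannot re-include a file whose parent directory is itself excluded. The script below is a minimal sketch for checking this locally. It assumes `git` is installed, uses only the patterns visible in this hunk, and the sample file paths are hypothetical.

```shell
#!/bin/sh
# Sketch: check how the ignore patterns from this hunk interact.
# Assumes git is installed; all sample paths below are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Only the patterns visible in the hunk; the rest of .gitignore is omitted.
cat > .gitignore <<'EOF'
logs/
reports/
build.log
*.log
!components/**/*.log
EOF

# Create sample files so directory-only patterns such as `logs/` can match.
mkdir -p components/backend/logs
touch app.log build.log \
      components/backend/server.log \
      components/backend/logs/run.log

# Print "ignored" or "not ignored" based on git check-ignore's exit code
# (0 means the path is ignored, 1 means it is not).
ignore_status() {
    if git check-ignore -q "$1"; then
        echo "ignored"
    else
        echo "not ignored"
    fi
}

for path in app.log build.log \
            components/backend/server.log \
            components/backend/logs/run.log; do
    printf '%s: %s\n' "$path" "$(ignore_status "$path")"
done
```

Under git's documented ignore rules, the root-level `app.log` and `build.log` should be reported as ignored, `components/backend/server.log` as not ignored, and `components/backend/logs/run.log` as ignored, because its parent directory matches `logs/`.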

Makefile

Lines changed: 58 additions & 3 deletions
@@ -1,10 +1,11 @@
-.PHONY: help setup build-all build-frontend build-backend build-operator build-runner deploy clean
+.PHONY: help setup build-all build-frontend build-backend build-operator build-runner build-state-sync deploy clean
 .PHONY: local-up local-down local-clean local-status local-rebuild local-reload-backend local-reload-frontend local-reload-operator local-sync-version
 .PHONY: local-dev-token
 .PHONY: local-logs local-logs-backend local-logs-frontend local-logs-operator local-shell local-shell-frontend
 .PHONY: local-test local-test-dev local-test-quick test-all local-url local-troubleshoot local-port-forward local-stop-port-forward
 .PHONY: push-all registry-login setup-hooks remove-hooks check-minikube check-kubectl
 .PHONY: e2e-test e2e-setup e2e-clean deploy-langfuse-openshift
+.PHONY: setup-minio minio-console minio-logs minio-status
 .PHONY: validate-makefile lint-makefile check-shell makefile-health
 .PHONY: _create-operator-config _auto-port-forward _show-access-info _build-and-load
 
@@ -36,6 +37,7 @@ FRONTEND_IMAGE ?= vteam_frontend:latest
 BACKEND_IMAGE ?= vteam_backend:latest
 OPERATOR_IMAGE ?= vteam_operator:latest
 RUNNER_IMAGE ?= vteam_claude_runner:latest
+STATE_SYNC_IMAGE ?= vteam_state_sync:latest
 
 # Build metadata (captured at build time)
 GIT_COMMIT := $(shell git rev-parse HEAD 2>/dev/null || echo "unknown")
@@ -91,7 +93,7 @@ help: ## Display this help message
 
 ##@ Building
 
-build-all: build-frontend build-backend build-operator build-runner ## Build all container images
+build-all: build-frontend build-backend build-operator build-runner build-state-sync ## Build all container images
 
 build-frontend: ## Build frontend image
 	@echo "$(COLOR_BLUE)$(COLOR_RESET) Building frontend with $(CONTAINER_ENGINE)..."
@@ -145,6 +147,13 @@ build-runner: ## Build Claude Code runner image
 		-t $(RUNNER_IMAGE) -f claude-code-runner/Dockerfile .
 	@echo "$(COLOR_GREEN)$(COLOR_RESET) Runner built: $(RUNNER_IMAGE)"
 
+build-state-sync: ## Build state-sync image for S3 persistence
+	@echo "$(COLOR_BLUE)$(COLOR_RESET) Building state-sync with $(CONTAINER_ENGINE)..."
+	@echo " Git: $(GIT_BRANCH)@$(GIT_COMMIT_SHORT)$(GIT_DIRTY)"
+	@cd components/runners/state-sync && $(CONTAINER_ENGINE) build $(PLATFORM_FLAG) $(BUILD_FLAGS) \
+		-t vteam_state_sync:latest .
+	@echo "$(COLOR_GREEN)$(COLOR_RESET) State-sync built: vteam_state_sync:latest"
+
 ##@ Git Hooks
 
 setup-hooks: ## Install git hooks for branch protection
@@ -164,13 +173,59 @@ registry-login: ## Login to container registry
 
 push-all: registry-login ## Push all images to registry
 	@echo "$(COLOR_BLUE)$(COLOR_RESET) Pushing images to $(REGISTRY)..."
-	@for image in $(FRONTEND_IMAGE) $(BACKEND_IMAGE) $(OPERATOR_IMAGE) $(RUNNER_IMAGE); do \
+	@for image in $(FRONTEND_IMAGE) $(BACKEND_IMAGE) $(OPERATOR_IMAGE) $(RUNNER_IMAGE) $(STATE_SYNC_IMAGE); do \
 		echo " Tagging and pushing $$image..."; \
 		$(CONTAINER_ENGINE) tag $$image $(REGISTRY)/$$image && \
 		$(CONTAINER_ENGINE) push $(REGISTRY)/$$image; \
 	done
 	@echo "$(COLOR_GREEN)$(COLOR_RESET) All images pushed"
 
+##@ MinIO S3 Storage
+
+setup-minio: ## Set up MinIO and create initial bucket
+	@echo "$(COLOR_BLUE)$(COLOR_RESET) Setting up MinIO for S3 state storage..."
+	@./scripts/setup-minio.sh
+	@echo "$(COLOR_GREEN)$(COLOR_RESET) MinIO setup complete"
+
+minio-console: ## Open MinIO console (port-forward to localhost:9001)
+	@echo "$(COLOR_BLUE)$(COLOR_RESET) Opening MinIO console at http://localhost:9001"
+	@echo " Login: admin / changeme123 (or your configured credentials)"
+	@kubectl port-forward svc/minio 9001:9001 -n $(NAMESPACE)
+
+minio-logs: ## View MinIO logs
+	@kubectl logs -f deployment/minio -n $(NAMESPACE)
+
+minio-status: ## Check MinIO status
+	@echo "$(COLOR_BOLD)MinIO Status$(COLOR_RESET)"
+	@kubectl get deployment,pod,svc,pvc -l app=minio -n $(NAMESPACE)
+
+##@ Observability
+
+deploy-observability: ## Deploy observability (OTel + OpenShift Prometheus)
+	@echo "$(COLOR_BLUE)$(COLOR_RESET) Deploying observability stack..."
+	@kubectl apply -k components/manifests/observability/
+	@echo "$(COLOR_GREEN)$(COLOR_RESET) Observability deployed (OTel + ServiceMonitor)"
+	@echo " View metrics: OpenShift Console → Observe → Metrics"
+	@echo " Optional Grafana: make add-grafana"
+
+add-grafana: ## Add Grafana on top of observability stack
+	@echo "$(COLOR_BLUE)$(COLOR_RESET) Adding Grafana..."
+	@kubectl apply -k components/manifests/observability/overlays/with-grafana/
+	@echo "$(COLOR_GREEN)$(COLOR_RESET) Grafana deployed"
+	@echo " Create route: oc create route edge grafana --service=grafana -n $(NAMESPACE)"
+
+clean-observability: ## Remove observability components
+	@echo "$(COLOR_BLUE)$(COLOR_RESET) Removing observability..."
+	@kubectl delete -k components/manifests/observability/overlays/with-grafana/ 2>/dev/null || true
+	@kubectl delete -k components/manifests/observability/ 2>/dev/null || true
+	@echo "$(COLOR_GREEN)$(COLOR_RESET) Observability removed"
+
+grafana-dashboard: ## Open Grafana (create route first)
+	@echo "$(COLOR_BLUE)$(COLOR_RESET) Opening Grafana..."
+	@oc create route edge grafana --service=grafana -n $(NAMESPACE) 2>/dev/null || echo "Route already exists"
+	@echo " URL: https://$$(oc get route grafana -n $(NAMESPACE) -o jsonpath='{.spec.host}')"
+	@echo " Login: admin/admin"
+
 ##@ Local Development (Minikube)
 
 local-up: check-minikube check-kubectl ## Start local development environment (minikube)
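
One detail in the Makefile hunks may deserve a follow-up. `build-state-sync` tags the image with the literal `vteam_state_sync:latest`, while `push-all` iterates over `$(STATE_SYNC_IMAGE)`, so overriding `STATE_SYNC_IMAGE` would make the build and push steps refer to different tags. The four new observability targets also do not appear in the `.PHONY` lines visible in this diff. The fragment below is a sketch of one possible adjustment, assuming both are unintentional. It is not part of commit 374c7ff.

```make
# Sketch only, not part of this commit.
# Declare the new observability targets as phony, like the MinIO ones.
.PHONY: deploy-observability add-grafana clean-observability grafana-dashboard

# Use the STATE_SYNC_IMAGE variable so build and push agree on the tag.
build-state-sync: ## Build state-sync image for S3 persistence
	@echo "$(COLOR_BLUE)$(COLOR_RESET) Building state-sync with $(CONTAINER_ENGINE)..."
	@echo " Git: $(GIT_BRANCH)@$(GIT_COMMIT_SHORT)$(GIT_DIRTY)"
	@cd components/runners/state-sync && $(CONTAINER_ENGINE) build $(PLATFORM_FLAG) $(BUILD_FLAGS) \
		-t $(STATE_SYNC_IMAGE) .
	@echo "$(COLOR_GREEN)$(COLOR_RESET) State-sync built: $(STATE_SYNC_IMAGE)"
```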
