Releases: kubeshop/testkube
2.7.0
Changelog
New Features
- ea4a649: feat: Add runnerId field to TestWorkflowServiceNotificationsRequest and TestWorkflowParallelStepNotificationsRequest proto messages (#7088) (@Copilot)
- 59ad1a6: feat: Print runner capabilities to the log during start time (#7107) (@Copilot)
Bug fixes
- b180228: fix(mcp): return explicit message when execution metrics are unavailable (#7105) (@topliceanurazvan)
- d9eaf33: fix: API server panic caused by concurrent websocket writes (#7082) (@dom-raven)
- 2f7d660: fix: [TKC-4813] skip webhook informers when executor CRDs are missing (@povilasv)
- cf5b6c8: fix: [TKC-5040] avoid idle timeout errors after log stream completion (@povilasv)
- a03904d: fix: [TKC-5081] sync templates before workflows during SuperAgent migration (#7100) (@dejanzele)
- 9acafd5: fix: include assigned runner ID in error message (#7087) (@Copilot)
- 33dde15: fix: spurious error logs from websocket listener when no clients are connected (#7079) (@Copilot)
Other work
- 9274810: mcp: fix dataPoint unmarshal for InfluxDB string-encoded values (#7106) (@topliceanurazvan)
2.6.3
Testkube 2.6.3
This patch release includes critical fixes for webhook event handling, addressing race conditions and duplicate event processing.
🐛 Bug Fixes
- Webhook Event Race Condition with Multiple Listeners and Become Events (#7056)
Fixed a critical race condition that occurred when multiple webhook listeners were configured with different event type filters.
Root Cause:
- The event emitter shared event type pointers across concurrent goroutines
- When multiple webhooks matched different event types (e.g., webhook A listens to one event type and webhook B to another), the goroutines racing to filter event types would overwrite each other's field values
- This caused webhooks to receive incorrect event types or miss events entirely
Solution:
- Modified event emission to create independent copies of events for each webhook listener
- Each goroutine now gets its own event instance with correctly filtered event types
- Fixed event type filtering to use value copies instead of slice element references
- Added comprehensive test coverage for concurrent webhook scenarios:
- Single webhook with multiple event types
- Multiple webhooks with overlapping event types
- Three webhooks with different event type patterns
Technical Details:
- A shallow copy of the event is safe because only one field (a per-goroutine event type pointer) is modified
- Event type filtering now uses per-goroutine pointers instead of shared slice element references, avoiding pointer aliasing
- Enhanced dummy listener implementation for better test coverage
Impact:
- Prevents webhooks from receiving incorrect event types
- Ensures reliable event delivery in multi-webhook configurations
- Improves event routing accuracy for complex webhook setups
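The copy-per-listener approach described above can be sketched roughly as follows. This is a minimal illustration, not the Testkube implementation: the `Event`, `EventType`, `Listener`, and `Delivery` types and the `Emit` function are hypothetical names invented for the example.

```go
package main

import "sync"

// EventType and Event are hypothetical stand-ins for the real event model.
type EventType string

type Event struct {
	ID   string
	Type *EventType // the only field that differs per listener
}

// Listener is a hypothetical webhook listener with an event type filter.
type Listener struct {
	Name  string
	Types []EventType
}

// Delivery records which event type a listener received.
type Delivery struct {
	Listener string
	Type     EventType
}

// Emit fans an event out to every listener whose filter matches one of the
// candidate types. Each goroutine receives its own shallow copy of the event
// and a pointer to its own EventType value, so no pointer is shared.
func Emit(event Event, candidates []EventType, listeners []Listener) []Delivery {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []Delivery
	)
	for _, l := range listeners {
		for _, want := range l.Types {
			for _, got := range candidates {
				if want != got {
					continue
				}
				matched := got  // value copy, not a reference to a slice element
				copied := event // shallow copy of the event
				copied.Type = &matched
				wg.Add(1)
				go func(name string, e Event) {
					defer wg.Done()
					mu.Lock()
					defer mu.Unlock()
					out = append(out, Delivery{Listener: name, Type: *e.Type})
				}(l.Name, copied)
			}
		}
	}
	wg.Wait()
	return out
}
```

Because `matched` is a fresh variable on every match, a listener filtering on one event type can never observe a value written for another listener.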
- Event Idempotency Using TTL Cache (#7052)
Implemented event deduplication to prevent duplicate webhook notifications and processing of the same event multiple times.
Root Cause:
- Events could be redelivered or replayed due to various reasons (network issues, message bus redelivery/retries, system restarts)
- The event emitter had no mechanism to track which events had already been processed
- This resulted in duplicate webhook calls and duplicate event processing
Solution:
- Added a TTL-based cache for tracking processed events, using event.Id as the key
- Cache uses configurable capacity (default: 100,000 events) and TTL (default: 1 hour)
- Implemented an atomic check-and-store operation for idempotency checking:
- First encounter of an event ID stores it in cache and processes the event
- Subsequent encounters within TTL window skip processing
- Cache lifecycle is tied to the emitter's lifecycle:
- Started when emitter begins listening
- Stopped when emitter is shut down
Technical Details:
- Added a dependency that provides an efficient TTL cache
- Cache capacity passed as parameter to constructor for configurability
- Event idempotency check happens before event dispatch to listeners
- Comprehensive test coverage including:
- Duplicate event detection within TTL
- Cache initialization and cleanup
Configuration:
- Default TTL: 1 hour (sufficient for most retry scenarios)
- Default capacity: 100,000 events (handles high-throughput environments)
- Both values configurable through constructor parameters
Impact:
- Eliminates duplicate webhook notifications
- Prevents redundant processing of the same events
- Improves system reliability and efficiency
- Reduces noise in webhook endpoints and logs
📝 Changelog
Bug Fixes
- 238b766: fix: webhook event race condition with multiple listeners and become events (#7056) (@Copilot, @vsukhin)
- 0e225e4: fix: add event idempotency using TTL cache on event.Id with configurable TTL and capacity (#7052) (@Copilot, @vsukhin)
Other Changes
Commits
- 58aa160: chore: release 2.6.3
Full Changelog: 2.6.2...2.6.3
2.6.2
Testkube 2.6.2
This patch release includes critical bug fixes for utilization metrics and MinIO storage configuration.
🐛 Bug Fixes
- Panic on Nil Timestamp in Utilization Metrics Grouping (#7057)
Fixed a critical panic that occurred when processing utilization metrics with nil timestamps.
Root Cause:
- The metrics grouping function dereferenced the metric timestamp without checking whether it was nil
- This caused a nil pointer dereference panic when metrics without timestamps were processed
Solution:
- Added a nil check for the timestamp before processing
- Metrics without timestamps are now safely skipped during grouping
- Added comprehensive test coverage with test data file
Impact:
- Prevents agent crashes when processing incomplete metric data
- Improves system stability and reliability
- MinIO Client Certificate Loading with Empty Paths (#7045)
Fixed an issue where MinIO client initialization failed when using GCS S3-compatible endpoints with SSL enabled but without client certificates.
Root Cause:
- The MinIO client setup unconditionally attempted to load client certificates
- When the client certificate or key path was an empty string (common for GCS and other S3-compatible services), MinIO client initialization would fail
- GCS S3-compatible endpoints require SSL but don't use client certificate authentication
Solution:
- Added validation to load client certificates only when both the certificate path and the key path are non-empty
- Server CA certificates are still loaded when provided, independent of client certificates
- Added comprehensive test suite covering all TLS configuration scenarios:
- SSL with client certificates
- SSL without client certificates (GCS use case)
- SSL with CA file only
- Insecure mode
- No SSL
Impact:
- Enables proper use of GCS S3-compatible storage endpoints
- Fixes artifact storage configuration for Google Cloud Storage users
- Improves compatibility with various S3-compatible storage providers
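The conditional loading logic might look roughly like this with Go's standard `crypto/tls` package. `BuildTLSConfig` and its parameter names are assumptions made for the example and do not reflect the actual Testkube function or option names.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"os"
)

// BuildTLSConfig is a hypothetical helper. It loads a client certificate only
// when both paths are set, and loads a CA bundle independently of that.
func BuildTLSConfig(certFile, keyFile, caFile string, insecure bool) (*tls.Config, error) {
	cfg := &tls.Config{InsecureSkipVerify: insecure}

	// Client certificate authentication is optional: skip it when either
	// path is empty (the GCS S3-compatible case).
	if certFile != "" && keyFile != "" {
		cert, err := tls.LoadX509KeyPair(certFile, keyFile)
		if err != nil {
			return nil, err
		}
		cfg.Certificates = []tls.Certificate{cert}
	}

	// The server CA is handled separately from client certificates.
	if caFile != "" {
		pem, err := os.ReadFile(caFile)
		if err != nil {
			return nil, err
		}
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(pem) {
			return nil, errors.New("no CA certificates found in " + caFile)
		}
		cfg.RootCAs = pool
	}
	return cfg, nil
}
```

Calling `BuildTLSConfig("", "", "", false)` returns a usable configuration with no client certificate, which corresponds to the "SSL without client certificates" scenario in the test list above.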
📝 Changelog
Bug Fixes
- af1f6ab: fix: panic on nil timestamp in utilization metrics grouping (#7057) (@Copilot, @vsukhin)
- 0de5549: fix: MinIO client certificate loading with empty paths (#7045) (@Copilot, @vsukhin)
Commits
- 0083e3c: chore: release 2.6.2
Full Changelog: 2.6.1...2.6.2
2.6.1
Testkube 2.6.1
This patch release includes important security updates and bug fixes for webhook event handling.
🔒 Security Updates
- CVE Fixes (February 2025) (#7036)
- Updated Alpine base image to 3.23.3
- Updated dependencies to address reported CVEs
- Addresses security vulnerabilities identified in February 2025
🐛 Bug Fixes
- [TKC-4867] Webhook Event Issues - Misleading Logs and Incorrect Event Types (#7032)
This comprehensive fix addresses multiple issues with webhook event handling:
Become Event Handling:
- Fixed silent failures for become events where webhooks appeared to succeed with status_code:0
- Added debug logging when webhooks are skipped due to state not changing
- Introduced a dedicated variable for clearer tracking of skipped webhooks
- Fixed metrics and telemetry collection to skip non-executed webhooks
- Added comprehensive tests for become event state transitions
Execution Queueing Events:
- Fixed incorrect end events being emitted during execution queueing
- The queueing code was incorrectly sending end events for executions whose status was assigned (not finished)
- Removed the switch statement that emitted end events during the queueing phase
- End events are now sent only when executions actually complete
Technical Improvements:
- Enhanced test precision with strict mock expectations
- Clearer conditional logic distinguishing between webhook skip scenarios and HTTP errors
- Improved code maintainability and debugging capabilities
📝 Changelog
Security
- 20803ec: chore: fix CVE reported feb 2025 (#7036) (@caiomede-tk)
Bug Fixes
- 9644254: fix: [TKC-4867] webhook event issues - misleading logs and incorrect event types (#7032) (@Copilot, @vsukhin)
Commits
- f110f2c: chore: release 2.6.1
Full Changelog: 2.6.0...2.6.1
2.6.0
Changelog
New Features
- 2eb9d75: feat: add bulk fetching tools for workflow definitions and executions… (#7006) (@devcatalin)
- cae0bff: feat: add support for bulk endpoint checks in MCP server tools registration (#7013) (@devcatalin)
- 3c2023e: feat: formatters for MCP tool responses to reduce tokens by 70%+ (#7005) (@devcatalin)
Bug fixes
- ebf5202: fix: fmt (#7015) for copilot (@vsukhin)
- 9e70e90: fix: ignore not finished event for become state (#7016) (@vsukhin)
- f5df72d: fix: improve concurrency handling in GetWorkflowDefinitions and GetExecutions (#7012) (@devcatalin)
- c7c5fbb: fix: k8s event label validation errors (#7010) (@Copilot)
- 6762c02: fix: make workflowName optional in list_executions MCP tool (#6999) (@topliceanurazvan)
- f70013c: fix: panic in abort testworkflow executions handler (#7003) (@Copilot)
- 1bb0f7a: fix: skip verify var in runner chart (#6998) (@ypoplavs)
2.5.7
Changelog
Bug fixes
- 22fb804: fix: [TKC-4750] Improve self-registration error handling (#6960) (@povilasv)
- 1bd0911: fix: add timeout for stuck log streaming (#6961) (@povilasv)
- fd20cdb: fix: cron scheduler exec context to avoid canceled watcher ctx (#6977) (@povilasv)
- d76e327: fix: inline global template in OSS CP for unresolved workflows (#6962) (@vsukhin)
- 2fb24da: fix: support global template for OSS CP in not resolved worklows (#6959) (@vsukhin)
2.5.6
2.5.5
2.5.4
2.5.3
Changelog
New Features
- e114099: feat: detect failure when service exits with non-zero code; cleanup services logic and add tests (#6909) (@dejanzele)
Bug fixes
- f930ad4: fix(devbox): add lease permissions for agent leader election (#6933) (@dejanzele)
- a08c263: fix: add mode=replace query param for TestTrigger full replacement (#6932) (@topliceanurazvan)
- f716012: fix: use full replacement for YAML updates in TestTrigger handler (#6931) (@topliceanurazvan)