Releases: lablup/backend.ai

25.14.0rc1

11 Sep 09:27
cae34ad
Pre-release

Breaking Changes

  • Implement password hashing system with multiple algorithms (#5753)

Features

  • Add data migration script from VFolder to RBAC tables (#5340)
  • Migrate existing user/project records to RBAC data (#5417)
  • Expand RBAC tables by adding permission_groups table to group permissions with the same target (#5465)
  • Add reservoir_registries DB table, service, and CRUD GQL mutations (#5644)
  • Add artifact_registries DB table to store common information of the various artifact registries (#5656)
  • Implement Reservoir registry, and Sync APIs between Managers (#5660)
  • Add config to set GPFS fileset name prefix (#5684)
  • Add RBAC repository functions to manage scopes and entity DB records (#5699)
  • Allow __typename type query for advanced GraphQL features (GQL Federation, @connection directive) by introducing a custom introspection rule (#5705)
  • Align the reported agent memory size so that the reserved memory absorbs tiny memory size deviations originating from various firmware/kernel settings, making the memory size of same-hardware agents consistent and preventing inadvertent resource allocation failures in large clusters (#5729)
  • Set an expiry on the set records of Bgtask metadata IDs (#5736)
  • Add defaultArtifactRegistry GQL resolver to fetch the default artifact registry information (#5739)
  • Add Artifact, ArtifactRegistry REST API (#5747)
  • Add session-mode-based (Client, Proxy) error handling in FetchContextManager (#5774)
  • Add VFolder delete Background Task (#5778)
  • Apply cache layer for resource presets (#5781)
  • Add PBKDF2-SHA3-256 password hashing implementation and update supported algorithms (#5785)
  • Remove obsolete max slot limit validation during session creation (#5807)
  • Implement health check functionality for route management (#5811)
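
The password hashing entries above (#5753, #5785) can be pictured with a small sketch. This is not Backend.AI's implementation: the `algorithm$iterations$salt$digest` storage format, the `ALGORITHMS` table, and both function names are assumptions made for illustration. It only shows how a stored hash can name its own algorithm so that several algorithms, including PBKDF2 with SHA3-256, can coexist.

```python
import hashlib
import hmac
import secrets

# Hypothetical mapping from a stored algorithm label to a hashlib digest name.
ALGORITHMS = {
    "pbkdf2_sha256": "sha256",
    "pbkdf2_sha3_256": "sha3_256",
}


def hash_password(password: str, algorithm: str = "pbkdf2_sha3_256",
                  iterations: int = 100_000) -> str:
    """Return '<algorithm>$<iterations>$<salt hex>$<digest hex>' (assumed format)."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        ALGORITHMS[algorithm], password.encode("utf-8"), salt, iterations
    )
    return f"{algorithm}${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the digest with the parameters embedded in the stored value."""
    algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    candidate = hashlib.pbkdf2_hmac(
        ALGORITHMS[algorithm],
        password.encode("utf-8"),
        bytes.fromhex(salt_hex),
        int(iterations),
    )
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Because the algorithm label travels with the hash, records written with an older algorithm can still be verified after the default changes.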

Improvements

  • Migrate legacy redis clients to valkey clients in App Proxy (#5741)

Fixes

  • Add missing restart: unless-stopped policy to all services in docker compose file (#5694)
  • Add missing session live stat update (#5762)
  • Make overlay network creation idempotent so that retries triggered by errors after this step do not cause an infinite retry loop (#5765)
  • Clean up dangling docker networks (#5770)
  • Add missing update when App Proxy registers an endpoint (#5783)
  • Restrict limit to scan_artifacts API (#5801)
  • Update client SDK to reflect UUID-only restriction in session dependencies (#5809)
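
The idempotent-creation fix above (#5765) follows a common pattern, sketched here against an in-memory stand-in. `FakeNetworkBackend`, `NetworkAlreadyExists`, and `ensure_network` are hypothetical names and do not mirror the agent's code or the Docker API. The point is only that a retry which reaches the creation step again reuses the existing network instead of failing and retrying forever.

```python
class NetworkAlreadyExists(Exception):
    """Raised by the fake backend when a network name is already taken."""


class FakeNetworkBackend:
    """In-memory stand-in for a container network API (hypothetical)."""

    def __init__(self) -> None:
        self.networks: dict[str, str] = {}

    def create(self, name: str) -> str:
        if name in self.networks:
            raise NetworkAlreadyExists(name)
        network_id = f"net-{len(self.networks) + 1}"
        self.networks[name] = network_id
        return network_id

    def lookup(self, name: str) -> str:
        return self.networks[name]


def ensure_network(backend: FakeNetworkBackend, name: str) -> str:
    """Create the network, or return the existing one if it is already there."""
    try:
        return backend.create(name)
    except NetworkAlreadyExists:
        # A previous attempt got this far; reuse its result instead of failing.
        return backend.lookup(name)
```

Calling `ensure_network` twice with the same name returns the same identifier and leaves a single network behind.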

Miscellaneous

  • Refactor scheduler handlers: Split into individual files and create a base handler class (#5766)

Full Changelog

Check out the full changelog until this release (25.14.0rc1).

Full Commit Logs

Check out the full commit logs between release (25.13.4) and (25.14.0rc1).

25.13.4

03 Sep 11:13
a0e3352

Fixes

  • Add missing scheduler options to AllowedScalingGroup and update related components (#5730)

Full Changelog

Check out the full changelog until this release (25.13.4).

Full Commit Logs

Check out the full commit logs between release (25.13.3) and (25.13.4).

25.13.3

03 Sep 06:32
9f92014

Fixes

  • Improve HTTP request proxying in the webserver to be transparent with content-encoding (#5709)
  • Add null-user check in resource usage query (#5712)
  • Ensure id parameter of chown function is an int (#5713)
  • Refresh agent fields in kernel when rescheduling (#5717)
  • Fix issue where App-Proxy failed to query worker circuits due to incorrect variable reference (#5718)
  • Add missing network cleanup when creating overlay network (#5721)

Full Changelog

Check out the full changelog until this release (25.13.3).

Full Commit Logs

Check out the full commit logs between release (25.13.2) and (25.13.3).

25.13.2

02 Sep 08:21
7365036

Features

  • The mouse-selected or copy-mode selected texts in the intrinsic ttyd app with tmux are now directly copied to the user-side clipboard, without needing to set mouse=off in the tmux session (#5688)
  • Replace the Redis keys command with scan_iter in the manager CLI (#5704)

Fixes

  • Add missing all-smi manpage file in the wheel packages (#5685)
  • Update RedisProfileTarget to handle cases where 'addr' is missing or None in the input data, preventing errors during address parsing (#5695)
  • Fix a duplicate-join issue during serialization with pydantic by removing the join filter from the TOMLStringListField's _transform method (#5700)
  • Fix coordinator not performing health check for all endpoints (#5702)
  • Fix session creation failing with not allowed scaling group error (#5706)
  • Enhance endpoint creation logic to update existing records and handle circuits (#5707)

Full Changelog

Check out the full changelog until this release (25.13.2).

Full Commit Logs

Check out the full commit logs between release (25.13.1) and (25.13.2).

25.13.1

29 Aug 14:49
969ffea

Fixes

  • Fix session ordering in session_pending_queue query resolver (#5682)
  • Ensure the Redis address is nullable (#5683)

Full Changelog

Check out the full changelog until this release (25.13.1).

Full Commit Logs

Check out the full commit logs between release (25.13.0) and (25.13.1).

25.13.0

29 Aug 11:54

Features

  • Introduce strawberry, and strawberry-based ArtifactRegistry GQL types (#5232)
  • Add ModelDeployment, ModelRevision strawberry GQL types migrated from existing federated graphene schema (#5249)
  • Open-source and integrate Backend.AI App Proxy into the main codebase (#5275)
  • Add storages API to storage proxy (#5286)
  • Add OpenTelemetry and service discovery configuration to appproxy (#5296)
  • Implement connection monitoring and reconnection logic in ValkeyStandaloneClient (#5298)
  • Implement Sokovan orchestrator architecture (#5361)
  • Add HuggingFace scanner, and API to storage proxy (#5362)
  • Split out container log processing to a more concrete ValkeyContainerLogClient (based on ValkeyClient with default behavior) and use a separate Redis instance dedicated for log streaming (#5375)
  • Implement scheduling prioritizers (#5378)
  • Add validators for scheduling (#5380)
  • Ship all-smi so that users can execute it inside any session container (#5381)
  • Implement sokovan scheduler agent selectors (#5383)
  • Integrate Agent selector with allocator in sokovan orchestrator (#5393)
  • Add UserNode as a field of ComputeSessionNode (#5403)
  • Enhance Scheduler allocation logic and add comprehensive tests (#5404)
  • Add allocation methods in scheduler repository (#5406)
  • Add TTL support to Redis key operations in AppProxy (#5416)
  • Unify separate GraphQL subgraph endpoints into single Apollo Router supergraph with web-server proxy integration to enable single endpoint access for clients (#5419)
  • Integrate sokovan orchestrator in manager (#5421)
  • Add source field to roles table to distinguish system-defined roles from custom-defined roles, enabling automatic permission grants for system roles when new entity types or operations are introduced (#5440)
  • Add phase tracking in scheduling (#5441)
  • Implement scheduler coordinator in sokovan orchestrator (#5455)
  • Change the behavior so that "terminating" sessions are terminated in batch processing (#5467)
  • Implement session sweeping functionality and related handlers (#5485)
  • Inject storages config to storage-proxy (#5491)
  • Add object_storages table to DB (#5498)
  • Add request_timeout configuration for Redis clients (#5502)
  • Add decrement_keypair_concurrencies method and update session termination logic (#5504)
  • Add hugging_registries DB table, and GQL schema (#5508)
  • Replace the existing ArtifactGroup model with Artifact, and replace Artifact with ArtifactRevision (#5510)
  • Integrate Artifact service to Manager (#5514)
  • Add Valkey client for Background Task Manager (#5519)
  • Improve logging.BraceStyleAdapter to support user-defined kwargs and access to extra data including contextual fields. (#5523)
  • Add Background Task heartbeat loop to refresh TTL (#5531)
  • Modify value reading to avoid cache-based scheduling (#5533)
  • Implement scheduling controller (#5547)
  • Implement kernel state engine (#5551)
  • Add Background Task retry loop (#5555)
  • Allow specifying multiple endpoint addresses in the etcd config (#5564)
  • Update session limits to allow None and 0 as indicators for unlimited concurrent sessions (#5567)
  • Add configuration option for Sokovan orchestrator usage (#5568)
  • Implement health monitoring for scheduling operations (#5569)
  • Enhance session management by adding checks for truly stuck pulling and creating sessions (#5570)
  • Add Valkey Client TLS configuration (#5573)
  • Implement Generalized pagination on Strawberry GQL API (#5575)
  • Implement session transition hooks for various session types (#5579)
  • Implement deployment management with Sokovan integration (#5580)
  • Implement batch scheduling events and event propagation through Event Hub (#5589)
  • Apply centralized distributed locking for Sokovan scheduling operations (#5592)
  • Implement cache-through pattern for keypair concurrency management in SchedulerRepository (#5594)
  • Apply READ COMMITTED isolation level for scheduler operations (#5600)
  • Add Volume Pool field to RootContext of Storage-Proxy (#5603)
  • Add Bgtask handler Registry (#5606)
  • Implement Valkey-based leader election in manager (#5607)
  • Apply retry feature to VFolder clone bgtask (#5611)
  • Add object_storage_meta DB table for managing buckets (#5617)
  • Add operation metrics observer for session termination tracking (#5623)
  • Implement EventPropagatorMetricObserver for tracking event propagator metrics (#5630)
  • Apply cache propagator when broadcasting scheduling event (#5638)
  • Implement deployment controller and integrate with sokovan orchestrator (#5639)
  • Added automated GraphQL supergraph generation using rover CLI to CI pipeline for improved schema management (#5645)
  • Add --wait option to backend.ai events command for easier scripting and automation (#5650)
  • Implement session wait logic in AgentRegistry for improved scheduling handling (#5659)
  • Manage object storage buckets using storage_namespace (#5667)
  • Add scheduling detail info for pending sessions (#5676)
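
The session-limit entry in the list above (#5567) treats both None and 0 as "unlimited". A minimal sketch of that rule follows; the function names and the `active_sessions` counter are assumptions for illustration and are not taken from the manager code.

```python
from typing import Optional


def is_unlimited(limit: Optional[int]) -> bool:
    """None and 0 both mean that no concurrency cap applies."""
    return limit is None or limit == 0


def can_create_session(active_sessions: int, limit: Optional[int]) -> bool:
    """Allow a new session unless a positive limit has already been reached."""
    if is_unlimited(limit):
        return True
    return active_sessions < limit
```

With this rule, a positive limit of 3 admits a third session but rejects a fourth, while None and 0 never reject.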

Fixes

  • Correct the asyncio connection sharing pattern in alembic env.py so that the alembic-rebase.py script and other alembic-based automation can be used seamlessly (#5151)
  • Use persistent aiohttp.ClientSession instances per route in App Proxy circuits to benefit from keep-alive connections and resource reuse (#5287)
  • Add missing resolver of VFolder permissions field in Compute session node (#5322)
  • Let inspect.signature handle stringified types generated by __future__ annotations by setting the eval_str option to True (#5325)
  • Handle None user when setting up the request context in the auth middleware (#5327)
  • Add missing database transaction retry logic when setting network ID of new sessions (#5329)
  • Apply memoization to the scheduler plugin loaders to reduce runtime overheads when running the scheduler loop (#5342)
  • Fix broken Agent and Webserver in the HA development environment (#5343)
  • Add missing components in HA development environment (#5345)
  • Make --log-level and --debug flag behavior and description consistent across all `start-...

25.11.3

18 Aug 15:27

No significant changes.

Full Changelog

Check out the full changelog until this release (25.11.3).

Full Commit Logs

Check out the full commit logs between release (25.11.2) and (25.11.3).

24.09.12

07 Jul 12:53

Features

  • Add expiration time to login history Redis keys to reduce Redis memory usage. (#4939)
  • Built-in WSProxy exposes advertised address (#4975)

Fixes

  • Fix missing status code when the Accept header is not set to application/json in the wsproxy exception middleware (#4788)
  • Fix Agent Memory plugin to handle multiple IO device stats (#4789)
  • Fix invalid state error when setting kernel termination future (#4791)
  • Fix wrong Accept header in HarborRegistryV2._process_oci_index() (#4807)
  • Prevent model service creation with project type vfolder (#4852)
  • Handle NoSuchProcess properly when gathering process memory stats (#4922)
  • Skip kernel destruction when the agent shuts down (#4923)
  • Check whether the Agent is a daemon process before querying docker netstat (#4929)
  • Fix wrong indentation in the Agent container stat function (#4946)
  • Calculate correct VFolder permissions when admins query (#4963)
  • Fix issue preventing admins from leaving invited vfolders (#4964)

Full Changelog

Check out the full changelog until this release (24.09.12).

Full Commit Logs

Check out the full commit logs between release (24.09.11) and (24.09.12).

25.13.0rc1

08 Aug 01:29
Pre-release

Features

  • Introduce strawberry, and strawberry-based ArtifactRegistry GQL types (#5232)
  • Add ModelDeployment, ModelRevision strawberry GQL types migrated from existing federated graphene schema (#5249)
  • Open-source and integrate Backend.AI App Proxy into the main codebase (#5275)
  • Add storages API to storage proxy (#5286)
  • Add OpenTelemetry and service discovery configuration to appproxy (#5296)
  • Implement connection monitoring and reconnection logic in ValkeyStandaloneClient (#5298)
  • Implement Sokovan orchestrator architecture (#5361)
  • Implement scheduling prioritizers (#5378)
  • Add validators for scheduling (#5380)
  • Ship all-smi so that users can execute it inside any session container (#5381)
  • Implement sokovan scheduler agent selectors (#5383)
  • Integrate Agent selector with allocator in sokovan orchestrator (#5393)
  • Add UserNode as a field of ComputeSessionNode (#5403)
  • Enhance Scheduler allocation logic and add comprehensive tests (#5404)
  • Add allocation methods in scheduler repository (#5406)
  • Add TTL support to Redis key operations in AppProxy (#5416)
  • Integrate sokovan orchestrator in manager (#5421)

Fixes

  • Correct the asyncio connection sharing pattern in alembic env.py so that the alembic-rebase.py script and other alembic-based automation can be used seamlessly (#5151)
  • Add missing resolver of VFolder permissions field in Compute session node (#5322)
  • Let inspect.signature handle stringified types generated by __future__ annotations by setting the eval_str option to True (#5325)
  • Handle None user when setting up the request context in the auth middleware (#5327)
  • Add missing database transaction retry logic when setting network ID of new sessions (#5329)
  • Apply memoization to the scheduler plugin loaders to reduce runtime overheads when running the scheduler loop (#5342)
  • Fix broken Agent and Webserver in the HA development environment (#5343)
  • Add missing components in HA development environment (#5345)
  • Make --log-level and --debug flag behavior and description consistent across all start-server commands (#5366)
  • Defer imports in the CLI and server entrypoints to reduce CLI startup times and avoid unnecessary cross-component imports (#5372)
  • Fix and improve the optimization of glob-based BUILD file scanning when loading CLI entrypoints, reducing the CLI command initialization latency by about 15% (e.g., 3.5 sec -> 3.0 sec) (#5377)
  • Fix missing event_logs table creation when populating the database schema with mgr schema oneshot, which may have caused issues in fresh installations (#5391)
  • Add Docker image rescan exception handling logic when the image config is None (#5394)

Miscellaneous

  • Refactor the import structure for RepositoryArgs by moving it to a dedicated ai.backend.manager.repositories.types module (#5409)

Full Changelog

Check out the full changelog until this release (25.13.0rc1).

Full Commit Logs

Check out the full commit logs between release (25.12.1) and (25.13.0rc1).

25.12.1

25 Jul 10:56
762d8ed

Features

  • Agent heartbeat handler queries Kernel ids instead of Agent id (#4766)
  • Implement ActionValidator (#5244)
  • Implement reconnection logic in ValkeySentinelClient (#5276)

Improvements

  • Apply simple model query pattern for readability (#4767)

Fixes

  • Fix model service creation failure when service-definition.toml is missing (#5264)
  • Fix model service deletion failure for non super-admin users (#5266)
  • Fix broken VFolder Clone service (#5269)
  • Fix a problem with deserializing dataclass (#5271)
  • Fix broken VFolder GetTaskLogs service (#5272)
  • Add missing TRACE log-level option in ai.backend.logging package (#5274)
  • Fix status_data not being initialized properly when creating a multi-node session (#5280)
  • Apply a workaround to avoid segfault upon fast termination of mgr etcd CLI commands that query and update etcd configurations (#5283)

Full Changelog

Check out the full changelog until this release (25.12.1).

Full Commit Logs

Check out the full commit logs between release (25.12.0) and (25.12.1).