Releases: OpenCSGs/csghub-server
v1.14.0-ce
✨ New Features
- Deployment Workflow Migration: Refactored the core Deploy Scheduler and successfully migrated its logic to use Temporal Workflow for improved reliability and scalability. ([#495])
- Finetuning Job Support: Added support to submit fine-tune tasks directly as standard jobs. ([#522])
- AI Gateway Integration: The AI Gateway now supports MCP (Model Context Protocol) for enhanced model serving capabilities. ([#529])
- Jupyter Notebook Support: Application space deployment now supports Jupyter Notebook environments.
- Internal Notification Tool: Introduced a new tool to generate basic internal notification messages. ([#551])
🚀 Enhancements & Bug Fixes
- Dependency Updates: Bumped `argo-workflows` from 3.5.13 to 3.6.12. Upgraded `casdoor` SDK to v1.22.0 and updated user info retrieval to use UUID. ([#481], [#534])
- API & Core: Enhanced middleware robustness. Ensured header keys are handled as case-sensitive. Enhanced concurrency in `CheckRepoFiles` and improved test coverage. Enabled soft delete functionality for licenses. ([#547], [#482], [#530], [#553])
- Deployment & Runner: Improved runner network discovery logic to read `clusterIP` if no ingress IP is found. Added `StorageClass` support in cluster report events. ([#484], [#525])
- Finetuning: Added `model/dataset` fields to finetune job operations for richer metadata. ([#527])
- Integration: Added `User-Email` to request headers for DataFlow and Label Studio integrations. ([#492])
- Git/LFS Fixes: Fixed a client bug preventing the deletion of remote branches. Fixed a cache bug in LFS sync upload ID. ([#521], [#489])
- Temporal Workflow Fixes: Fixed command errors and a worker panic that occurred when the sensitive checker was not enabled. Added sensitive checker initialization to the Temporal worker. ([#501], [#506], [#519])
- Multi-Sync & Rproxy Fixes: Corrected the logic for setting the service host name in the `rproxy` header. Fixed a bug where multi-sync MCP servers failed to display file lists and READMEs. ([#485], [#533])
- General Bug Fixes: Fixed bugs related to getting evaluations with access control, AI Gateway external LLM support, and repository creation. ([#529], [#532], [#536])
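The runner network discovery change listed under Deployment & Runner (use `clusterIP` when no ingress IP is found) can be pictured with the sketch below. It is only an illustration of the fallback order, not the project's code: the `ServiceAddrs` type and the `ResolveEndpoint` function are hypothetical names, and the struct is a simplified stand-in for a Kubernetes Service.

```go
package main

// ServiceAddrs is a hypothetical, simplified view of a Kubernetes Service:
// the ingress IPs from its load-balancer status and its in-cluster ClusterIP.
type ServiceAddrs struct {
	IngressIPs []string
	ClusterIP  string
}

// ResolveEndpoint returns the first non-empty ingress IP and falls back to
// the ClusterIP when none is set. ok is false if no usable address exists.
func ResolveEndpoint(s ServiceAddrs) (addr string, ok bool) {
	for _, ip := range s.IngressIPs {
		if ip != "" {
			return ip, true
		}
	}
	// "None" marks a headless Service, which has no cluster IP to dial.
	if s.ClusterIP != "" && s.ClusterIP != "None" {
		return s.ClusterIP, true
	}
	return "", false
}
```

Calling `ResolveEndpoint(ServiceAddrs{ClusterIP: "10.0.0.5"})` yields the cluster IP because the ingress list is empty. An address from `IngressIPs` always takes precedence when one is present.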
Full Changelog: v1.12.0-ce...v1.14.0-ce
v1.12.1-ce
What's Changed
- Add initialization for sensitive checkerfix fix init sensitive checke… by @phantom-rabbit in #507
- Resolving sensitive check disable causes worker startup panic. by @CementZhang in #520
Full Changelog: v1.12.0-ce...v1.12.1-ce
v1.12.0-ce
New Features
- Atomic Repository Creation: We've implemented atomic creation of repositories. This enhancement ensures that the `git` operation and the database persistence complete as a single, indivisible unit. This prevents the interruptions or inconsistencies that could previously occur between the file system operation and the database record, making the system more robust and reliable.
- Automatic Runner Discovery and Cluster Auto-Scaling: The system now supports automatic discovery for remote runners. The `csghub-server` will automatically detect and register runner services as they come online or go offline. This functionality enables cluster-level auto-scaling, allowing the platform to dynamically adjust resources based on demand.
- Streamable Protocol for `mcp` Space Execution: We have introduced a streamable protocol to execute `mcp` (Model Context Protocol) spaces. This new protocol replaces Server-Sent Events (SSE) to deliver better performance and significantly improve compatibility with reverse proxies.
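One way to picture the atomic repository creation described above is the sketch below. It uses a simple compensating rollback (undo the Git step if the database step fails), which is an assumption made for illustration and may differ from what the server actually does. The `GitStore` and `RepoDB` interfaces, `CreateRepo`, and the in-memory fakes are all hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

// GitStore and RepoDB are hypothetical interfaces for this sketch only.
type GitStore interface {
	Create(path string) error
	Delete(path string) error
}

type RepoDB interface {
	Insert(path string) error
}

// CreateRepo creates the Git repository first, then the database record.
// If the record cannot be written, it removes the Git repository again so
// the two stores do not disagree about whether the repository exists.
func CreateRepo(git GitStore, db RepoDB, path string) error {
	if err := git.Create(path); err != nil {
		return fmt.Errorf("create git repo: %w", err)
	}
	if err := db.Insert(path); err != nil {
		insertErr := fmt.Errorf("insert record: %w", err)
		// Compensate: undo the Git side. Report both errors if the undo fails.
		if delErr := git.Delete(path); delErr != nil {
			return errors.Join(insertErr, fmt.Errorf("rollback git repo: %w", delErr))
		}
		return insertErr
	}
	return nil
}

// memGit and memDB are in-memory fakes used to exercise the sketch.
type memGit map[string]bool

func (m memGit) Create(path string) error { m[path] = true; return nil }
func (m memGit) Delete(path string) error { delete(m, path); return nil }

type memDB map[string]bool

func (m memDB) Insert(path string) error { m[path] = true; return nil }

// failingDB is a RepoDB whose Insert always fails.
type failingDB struct{}

func (failingDB) Insert(string) error { return errors.New("db unavailable") }
```

With `failingDB{}` as the database, `CreateRepo` returns an error and the `memGit` map ends up empty again, which is the "single, indivisible unit" behavior from the caller's point of view. A crash between the two steps is not covered by this sketch and would need a recovery pass or a real two-phase protocol.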
Enhancement
- Tracing: Enhanced the request handler with robust tracing, logging, and retry.
- MultiSync: Allowed filtering of local-only repositories.
- Bug Fix: Fixed an invalid variable name in the header used to proxy csghub-dataflow requests in the k8s ingress.
What's Changed
- Update mcp deploy to use git copy repo instead git mirror by @HaiHui886 in #461
- feat(http): Enhance HttpClient with robust tracing, logging, and retry by @CementZhang in #463
- Update mcp space to use streamable protocol and enhance logs by @HaiHui886 in #464
- Enhance runner cluster report to server by @CementZhang in #465
- Return 404 when file not found by @pulltheflower in #462
- Update commit files once for new space and use 2pc for create repo by @HaiHui886 in #466
- Fix local repo index filter bug by @pulltheflower in #468
- Add log for handle nil hardware case by @HaiHui886 in #470
- Update Dockerfiles and scripts by @HaiHui886 in #471
- Update docker-compose file for local development by @HaiHui886 in #472
- Use initial configuration files instead of env by @HaiHui886 in #473
- Update API response types for consistency and clarity by @QinYuuuu in #469
- Support remote runner hot-plug feature by @HaiHui886 in #477
- Sync changes by @pulltheflower in #474
- Enhance node readiness checks and add error handling by @QinYuuuu in #478
- Add new error codes for authentication, git, and request handling by @QinYuuuu in #479
- Merge fix to 1.12 ce by @HaiHui886 in #483
- Merge rproxy fix from main to 1.12 by @HaiHui886 in #486
- db88bca by @phantom-rabbit in #499
Full Changelog: v1.11.0-ce...v1.12.0-ce
v1.11.0-ce
✨ New Features
- Runner Service Refactored: The Runner service has been completely refactored. It now runs seamlessly both inside and outside a Kubernetes cluster and automatically reports its endpoint and configuration data to the CSGHub server. The Kubernetes cluster is now pluggable at runtime. Most importantly, if you configure a service account for the Runner service in Kubernetes, you no longer need a kubeconfig file, which is a significant security improvement.
- Go Error Documentation Tool: We've added a new command-line tool that scans Go code for custom error comments and automatically generates error documentation. This tool also includes numerous pre-defined errors and multilingual translations.
- Internationalized Notifications: Notification messages now support i18n (internationalization).
🚀 Enhancements & Bug Fixes
- Distributed Lock for DB Migration: Added a distributed lock to prevent conflicts during database migrations.
- Git Callback Fix: Fixed a bug where the Git callback was not triggered if a repository contained no LFS files.
- Localized Git Errors: Error messages for Git failures are now localized.
- Temporal Workflow Context Fix: Resolved an issue where an incorrect context was causing the Temporal workflow to terminate unexpectedly.
- MCP Repository Mirroring: It's now possible to mirror and sync MCP repositories.
- New File Deletion API: A new API has been added to delete files from repositories.
What's Changed
- Update docker-image.yml by @MasonXon in #438
- Sync ee database migrations by @pulltheflower in #439
- Add mirror routes for mcp by @pulltheflower in #437
- Update go.mod by @pulltheflower in #440
- perf: use BatchGet for better pagination performance in recom component by @Rader in #444
- [dataviewer] Convert and preview json/csv file fail by @HaiHui886 in #446
- Add error documentation generation and fixes by @QinYuuuu in #447
- Update error handling in discussion API by @QinYuuuu in #448
- Add i18n support for tag categories by @QinYuuuu in #449
- Change token usage types from int to string by @HaiHui886 in #450
- Update workflow image tag by @MasonXon in #403
- add delete repo file api by @HaiHui886 in #451
- Enhance error handling and tag management features by @QinYuuuu in #452
- update notification related functions and add remark to the collection repositories by @luojun96 in #454
- Sync recent changes by @pulltheflower in #455
- Refactor runner code to support run in k8s by @HaiHui886 in #459
- Space build&deploy incorrect status bug fix by @HaiHui886 in #460
Full Changelog: v1.10.0-ce...v1.11.0-ce
v1.10.0-ce
New Features
- DataFlow Tool: Introduced a one-click data processing feature with the new DataFlow tool. This tool can read datasets from CSGHub and save processed datasets back to CSGHub. For more details, please refer to the csghub-dataflow repository.
Enhancements
- Model Inference: Improved model inference capabilities to support the ERNIE and Hunyuan models, as well as the ability to perform score model inference.
- Sync Code Refactoring: Refactored and enhanced the synchronization code to allow users to cancel model or dataset syncing tasks.
- AI Gateway Updates: Modified the response data structure in the AI Gateway to ensure compatibility with the OpenAI API when no running model is found.
- Internationalization (i18n) Support: Expanded i18n support; all server-side APIs now offer customizable error messages that can be translated into three languages: Simplified Chinese (zh-CN), American English (en-US), and Traditional Chinese (zh-HK).
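As a rough illustration of translatable error messages across the three languages listed above, the sketch below looks up a message by error code and language tag and falls back to a default language. The error code, the message texts, the `Localize` function, and the choice of `en-US` as the fallback are all assumptions made for this example, not the server's actual catalog or API.

```go
package main

// messages maps a hypothetical error code to its text per language tag.
var messages = map[string]map[string]string{
	"REPO-NOT-FOUND": {
		"en-US": "repository not found",
		"zh-CN": "仓库不存在",
		"zh-HK": "倉庫不存在",
	},
}

// defaultLang is the fallback language assumed by this sketch.
const defaultLang = "en-US"

// Localize returns the message for code in lang. It falls back to the
// default language, and finally to the bare code if nothing matches.
func Localize(code, lang string) string {
	byLang, ok := messages[code]
	if !ok {
		return code
	}
	if msg, ok := byLang[lang]; ok {
		return msg
	}
	if msg, ok := byLang[defaultLang]; ok {
		return msg
	}
	return code
}
```

Returning the bare code for unknown entries keeps responses usable even when a translation is missing, at the cost of exposing an untranslated identifier to the client.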
What's Changed
- [dataflow] Simplify dataflow routes by consolidating endpoints by @SeanHH86 in #401
- Fix_err_i18n by @QinYuuuu in #402
- update i18n for multi api by @QinYuuuu in #404
- feat(mirror): remove currentUser admin check from mirror methods by @QinYuuuu in #405
- Fix error judgement bug in multi-sync component by @pulltheflower in #408
- feat(auth): Added authentication middleware to routes and Removed redundant user checks in handlers by @QinYuuuu in #410
- feat: add version API by @phantom-rabbit in #412
- Remove default value of public root domain by @pulltheflower in #411
- Mirror refactor by @pulltheflower in #416
- fix git http error compare for main by @QinYuuuu in #414
- Merge code 7 29 by @ganisback in #419
- Add mirror task status to model show API by @pulltheflower in #421
- Remove unused file by @pulltheflower in #420
- fix add code repo and space repo tag parse by @phantom-rabbit in #424
- Improve error handling for model retrieval in AI Gateway by @QinYuuuu in #426
- Add error handling for invalid Parquet files by @QinYuuuu in #427
- Enhance error handling for Git and deployment retrieval by @QinYuuuu in #425
- Optimize 404 error handling for collections and model runs by @QinYuuuu in #431
- Enhance output of git lfs push by @pulltheflower in #417
- Change user nickname of multi-sync user by @pulltheflower in #422
- Add lfs check feature to main by @pulltheflower in #430
- Error code support for git by @pulltheflower in #434
- Add organization transfer feature by @pulltheflower in #436
- update notification services by @luojun96 in #428
- Update organization member management and error handling by @QinYuuuu in #433
Full Changelog: v1.9.0-ce...v1.10.0-ce
v1.9.5
What's Changed
- Fix ssh clone bug v1.9.0 by @pulltheflower in #429
Full Changelog: v1.9.4-ce...v1.9.5-ce
v1.9.4
What's Changed
- Add lfs check feature by @pulltheflower in #423
Full Changelog: v1.9.3-ce...v1.9.4-ce
v1.9.3
What's Changed
- Merge image namespace to 1.9.2 by @ganisback in #418
Full Changelog: v1.9.2-ce...v1.9.3-ce
v1.9.2
What's Changed
Full Changelog: v1.9.1-ce...v1.9.2-ce
v1.9.1-ce
What's Changed
- fix cluster initial issue by @ganisback in #413
Full Changelog: v1.9.0-ce...v1.9.1-ce