Releases: determined-ai/determined
0.26.7
Release Notes
Changelog
- a125dd4 chore: bump version: 0.26.7-rc1 -> 0.26.7
- cc161d5 docs: add release notes for 0.26.7 (#8601)
- ed4d11a chore: bump version: 0.26.7-rc0 -> 0.26.7-rc1
- a138b0a fix: Change slot number back to 2 in Keras example (#8595)
- 47c885b chore: fix api state (#8589)
- dc175a1 chore: bump version: 0.26.7-dev0 -> 0.26.7-rc0
- 0584703 chore: lock published urls to preserve redirects
- 40d6c32 chore: lock api state for backward compatibility check
- 6e224ce chore: add docs dropdown link for new version
- e1c1749 fix(tasks): persist rendezvous readiness (#8545)
- 150eae4 chore: bump version: 0.26.6-dev0 -> 0.26.7-dev0
- 660de26 feat: Update the "Continue Single trial experiment" workflow (#8526)
- 940d1d7 fix: master cannot download s3 from us-east-1 (#8558)
- dac3fe8 build: avoid installing playwright by `make build` (#8569)
- 005634b ci: unit-test login scenarios, and others (#8471)
- 4a808c3 chore: remove --dry-run option from det deploy aws [MLG-983] (#8542)
- aa71929 chore: implement GetSlot, GetSlots, and GetAgent for K8s rm (#8464)
- 791e532 chore: release notes for 0.26.6 (#8566)
- 51f9f71 chore: fix bad arg validation for det deploy aws up (#8576)
- e5566ee docs: Rename submit experiment (#8570)
- c05eab1 chore: include cluster name in stack deletion confirmation (#8575)
- 2a55fc1 docs: Update oidc group claim name notes (#8573)
- c9ba474 chore: add lore help and experimental disclaimer (#8563)
- 8527011 feat: filter out inactive users from list and add option to see all users (#8421)
- 431dec0 chore: Model metadata sections move to Surface [WEB-1813] (#8557)
- 6263695 Revert "docs: Add a link to mldes trial (#8561)" (#8572)
- 3371746 chore: Port smaller Antd.Modals to standard Hew Modal (#8567)
- 799d6d4 ci: make backend own codeowner of go.mod / go.sum (#8568)
- 6232ce2 chore: bump version: 0.26.5-dev0 -> 0.26.6-dev0
- faa9dee docs: add release notes for 0.26.5 (#8564)
- 05c5320 test: store test results for `test-e2e-gke`. (#8560)
- 921826a docs: Fix workspaces projects left nav (#8565)
- 4bccbce fix: prevent workspace list race condition (#8524)
- ef9a686 test: add more Go db tests to internal/db (#8553)
- 17666a4 docs: Add a link to mldes trial (#8561)
- f461c74 Mention det python sdk demo (#8550)
- 8d3f6fb chore: Rename show_ssh_command in CLI and add message for VSCode WSL Users (#8387)
- f988566 chore: replace readme logo with svg [WEB-314] (#8559)
- f543c41 chore: remove OIDC Config from OSS (#8534)
- 12e31af build(deps): bump golang.org/x/net from 0.7.0 to 0.17.0 in /master (#8124)
- f9fc1d1 docs: Describe remote user management webui (#8479)
- bb37b53 test: skip all failing `mmdetection` tests. (#8540)
- 3fff399 test: fix failing nightly test splits (#8549)
- ff97304 test: add more tests to db/postgres_experiments.go (#8537)
- c08613a fix: correct docs link in empty project [WEB-1879] (#8548)
- b9caae5 docs: Improve webhooks tasklog (#8544)
- 816d203 chore: Use Hew SplitPane [WEB-1682] (#8482)
- 76bdfe9 test: make `DET_MASTER` configurable in the perf test `Makefile`. (#8533)
- a6ae0c7 feat: add the slots property to the props (#8498)
- ef4b787 fix(api): delete experiment error handling corrections (#8510)
- 203477d chore: remove CI go unit tests and rename CI integration target (#8530)
- 65227fa ci: add Go coverage regex match so we can require some functions to be tested (#8514)
- 22af6f7 fix: allow slots per trial to be 0 [WEB-1871] (#8521)
- 0771185 test: add more Go db tests (#8519)
- 1c9575f fix: callback webhook action modals to re-fetch webhooks after successful call [WEB-1869] (#8522)
- 3d18c13 fix: trial spinner (#8528)
- ad55716 feat: Enable Retry for multi-trial exp with errored trials (#8518)
- 88784cc fix: Model delete redirect only needed if inside the model [WEB-1867] (#8512)
- 9a09d15 feat: add lore redirect (#8492)
- 6b4f52e test: remove `cifar10_tf_keras` tests and examples. (#8444)
- 0590e80 fix: ensure filter columns are valid when selecting special columns (#8517)
- 9f14c6e fix: typo (#8516)
- 0d59196 chore: add ElementsMatch to usergroup test (#8501)
- 5fff579 feat: add experiment id to SDK Trial objects (#8499)
- 8e6252f refactor: replace resource pool looping with direct call (#8503)
- b37403f fix: fetch projects after archive/unarchive (#8504)
- 7da2777 chore(deps): bump actions/setup-java from 3 to 4 (#8507)
- 7f5de0f fix(agentrm): resource pools must filter agents.list() by name (#8509)
- e33ebb7 chore: remove defunct yogadl dependency (#8450)
- 72ef508 test: fix failing test_delete_experiment_removes_tensorboard_files (#8511)
- 9abdbf1 fix: Overwrite omnibar antd modal with Hew standard modal [WEB-1830] (#8476)
- 17f1efb chore: re-enable CI metrics upload (#8506)
- c8804e5 chore: take out submodule update cmd (#8502)
- 42b38fb feat: Enable multi-trial "retry" for errored/cancelled exp (#8495)
- 0b855c7 feat: hide hyperparameter search for unmanaged multi tiral experiments (#8497)
- 7077ab4 fix: Update Trial Download Error Message (#8398)
- 05e5974 ci: add date to python cache, fix moto linting issue (#8493)
- bf79a43 ci: fix Go build cache by not making it run get-deps (#8483)
- ec90590 feat: allow creation of tasklog webhooks in webui + docs (#8434)
- 324f148 chore: add UpdateUsergroupMembership (#8489)
- 66b7225 chore: run migration moving trials to view and rename [DET-9989] (#8440)
0.26.6
0.26.5
Release Notes
Changelog
- cfa7730 chore: bump version: 0.26.5-rc3 -> 0.26.5
- 2755e5e docs: add release notes for 0.26.5 (#8564)
- 617fb0c chore: bump version: 0.26.5-rc2 -> 0.26.5-rc3
- 5eff3b4 chore: bump version: 0.26.5-rc1 -> 0.26.5-rc2
- 54aa7a8 feat: add the slots property to the props (#8498)
- 52adafb fix: allow slots per trial to be 0 [WEB-1871] (#8521)
- 6eb6bf0 fix(api): delete experiment error handling corrections (#8510)
- 54f2cec fix: trial spinner (#8528)
- 5cfc4cc chore: bump version: 0.26.5-rc0 -> 0.26.5-rc1
- 5f4a7f4 fix: ensure filter columns are valid when selecting special columns (#8517)
- 189b4c1 fix(agentrm): resource pools must filter agents.list() by name (#8509)
- f47bde5 ci: add date to python cache, fix moto linting issue (#8493)
- 22d6d25 chore: bump version: 0.26.5-dev0 -> 0.26.5-rc0
- 275ea84 chore: lock api state for backward compatibility check
- cea9ff4 fix: Experiment state now is an ExperimentState (#8457)
- 06b7b79 fix: tqdm logs within wrap_rank [MLG-1236] (#8488)
- f50f7db fix: use explicit e.state in bulk experiment delete query (#8491)
- 0f65698 fix(rm): tasks shouldn't hang on restore failures (#8486)
- f847b26 chore: Revert "test: quarantine GPU execution of test_task_logs (#8261)" (#8484)
- f085e10 fix: failing custom searcher due to ExtraEnvVars being overwritten (#8490)
- 88f64f6 chore: update libraries (#8463)
- 2e48a6f chore: migrate detaileduser and experimentitem types to io-ts (#8477)
- d8ed945 chore: add aliases to det dev commads (#8156)
- a2730cb feat: adding PACHD_ADDRESS and DEX_TOKEN to task env (#8473)
- 837bc29 chore: Update Hew Version to 0.6.12 (#8481)
- 1ee9b81 chore: clean up version dropdown update script (#8415)
- 398f879 docs: Add requirement and known issue for singularity-suid (#8478)
- 64e299e fix: wrong skip experiment config regex for log policies (#8475)
- 0b4e1d2 chore: cleanup some spurious cluster logs (#8468)
- 8c9dfbf fix: add delete cascade to generic metrics (#8469)
- 0b28148 ci: register unit pytest marks (#8470)
- 815f5ae fix: Kill task permission on interactive page (#8358)
- e67807d chore: preserve CI logs when bringing an AWS cluster down (#8461)
- 50d40bd chore: update trial complete or early exit to always notify searcher (#8466)
- 6bdf061 Update k8s install info (#8465)
- e06b472 chore: export AddUserTx (#8458)
- 16ea5a0 chore: introduce and use observables with improved update checking (#8405)
- 5805df2 chore: set up ownership for .circleci [skip ci] (#8402)
- c2a211d fix(api): handle delete experiment failures correctly (#8459)
- dfb4dc5 chore(actors): remove pkg/actor (#8452)
- 167e237 chore: add error check to KillNTSC (#8441)
- 4edbc7f chore: log RestoreAllCommands error (#8454)
- 9d71abd Fix minor issues including hard coded reference (#8427)
- 756a79c chore: bump CI node version to 20.9.0 (#8455)
- 3090e42 chore: use ResourcePool info for consistent capacity calculation [WEB-1796] (#8447)
- 192a2b3 chore(actors): remove pkg/actor usage from agentrm (#8395)
- 5ded38d chore: bump version: 0.26.4-dev0 -> 0.26.5-dev0
- 4339f67 docs: add release notes for 0.26.4 (#8451)
- a25e4f5 fix: add back pin icon in experiment list header (#8429)
- adb5191 ci: store npm log artifacts (#8449)
- e29006b fix: det slot task name for no-permissions RBAC users (#8416)
- 695a648 fix: SDK list_checkpoints not defaulting to searcher metric sort (#8448)
- 1ddad7d feat: add Topology into the RP details page (#8276)
- 2f7dda6 ci: cache install Python (#8426)
- cde18df fix: Calculate allocation bar stats same as overview [WEB-1822] (#8431)
- 9a48ff1 docs: Update upgrade instructions (#8346)
- e9a199a fix: k8s autoscaling nodes not counted towards RP (#8439)
- bd19e7a chore: command actor refactor & add intg test [DET-9660] (#8136)
- 99c4cea test: create a test for delete-tensorboards via `det e delete` (#8336)
- 7b7d1eb feat: Add remote user settings to Users table [WEB-1798] (#8397)
- 8051039 ci: fix linting with responses==0.24.1 (#8436)
- b087c10 chore: add version dropdown url for previous release (#8437)
- 8e9c505 test: fix model registry rbac wrong user regression (#8420)
- 292c75d fix: new experiment list tooltip styling (#8433)
- 1b47bf0 ci: delete broken fixture (#8428)
- bf07e61 fix: Wrap older modals in theme class [WEB-1824] (#8432)
- 0c8fad9 chore: filterformstore comment re: change tracking (#8386)
- 1041e56 fix: replace antd select with hew select (#8424)
- aa34aa7 feat: add workspace/project creation/deletion (#8430)
- 2e0a5a2 feat: client gets list_models, too. (#8425)
- 69df80e Revert "feat: Client gets list_models, too."
- d1343ca feat: Client gets list_models, too.
- a1e660d chore: converting SearchGroupsWithoutPersonalGroups into tx (#8419)
- 6ba688d test: fix TestAddAndRemoveBindings flake (#8423)
- d4c4195 chore: update Column and Row from Hew (#8412)
- d1d09e7 docs: Update non root container instructions (#8273)
0.26.4
Release Notes
Changelog
- bf665ae chore: bump version: 0.26.4-rc4 -> 0.26.4
- 2f86950 docs: add release notes for 0.26.4 (#8451)
- f2ef0fe chore: bump version: 0.26.4-rc3 -> 0.26.4-rc4
- f0a37a9 fix: Calculate allocation bar stats same as overview [WEB-1822] (#8431)
- 9acfbf2 chore: bump version: 0.26.4-rc2 -> 0.26.4-rc3
- 9dd0211 fix: k8s autoscaling nodes not counted towards RP (#8439)
- 3bf6647 chore: bump version: 0.26.4-rc1 -> 0.26.4-rc2
- 47397a4 fix: new experiment list tooltip styling (#8433)
- 680ac02 ci: fix linting with responses==0.24.1 (#8436)
- d4200d2 chore: add version dropdown url for previous release (#8437)
- 0a4b6bc test: fix model registry rbac wrong user regression (#8420)
- 242ff97 fix: Wrap older modals in theme class [WEB-1824] (#8432)
- e3c109a chore: bump version: 0.26.4-rc0 -> 0.26.4-rc1
- 22e18ae fix: replace antd select with hew select (#8424)
- 00d349a feat: add workspace/project creation/deletion (#8430)
- 9f727fe feat: client gets list_models, too. (#8425)
- b8c1be7 chore: update Column and Row from Hew (#8412)
- 4e6fd52 chore: bump version: 0.26.4-dev0 -> 0.26.4-rc0
- e9a457d chore: lock published urls to preserve redirects
- 2fae9ba chore: add docs dropdown link for new version
- 6c3bf84 chore: make insert-dropdown-url.sh executable (#8418)
- b5ca7f4 chore: fail deployment if launching part of the service fails (#8409)
- 8498674 fix: allow --json in det master config CLI command (#8413)
- d123932 fix: Place modal inside of ResourcePoolCard (#8414)
- ff19924 chore: Add eslint rule for ?? operator (#8410)
- d56b3ae chore: convert DOS line endings to Unix (#8411)
- c1219eb fix: Hide stats card when 0 on cluster page (#8359)
- da77efb fix: added permission check on GetAllocation (#8281)
- 3b0550c chore: Bumpenvs 0.26.4 (#8407)
- e48d03d fix: user flag to prompt for password during user requests (#8158)
- 513e6d7 fix: Project and Workspace cards wrap modal divs (#8378)
- 2497d84 chore: export AddUserTx (#8403)
- ad764f0 refactor: implement Glossary component from Hew (#8385)
- 52326d1 feat: change cli command for patch master log config DET[9720] (#8054)
- 1e9155d chore(type): stricter tsconfig (#8349)
- 16f18cc chore: revert task obfuscation lint failures (#8406)
- dde3156 chore: Implement Theming updates in Determined [WEB-1726] (#8388)
- 4edfc3c ci: move packaging test to test-e2e-longrunning (#8381)
- d3c208a ci: cache go modules deps and build cache (#8383)
- 8924996 chore: temporarily disable CI upload job (#8399)
- 356f651 Revert "chore: temporarily disable upload_test_results job step"
- 6dd9701 chore: temporarily disable upload_test_results job step
- ba49dbd ci: up parallelism for slowest test_e2e premerge tests (#8374)
- 5f3e556 ci: finish removing growforest (#8389)
- 62084e2 fix: NTSC task and slot viewing obscured for RBAC users with no Viewer Permissions (#8311)
- 0254f7d chore: fix nil ptr on allocation.Proto() (#8372)
- 119e759 chore: fix profiler test in CI (#8382)
- b428d5e feat: add hide column header menu item to explist (#8342)
- 7ae0501 chore: update the lore service port (#8375)
- 052cf8d feat: Cluster historical usage charts move to UI Kit LineChart [WEB-1786] [WEB-1764] (#8327)
- 819948d feat: clear filter from experiment table header (#8376)
- a590999 test: fix slow delete_checkpoint test (#8377)
- b0505db chore: Job/task displays Running instead of Scheduled (#8335)
- 1d64941 chore: short dsat e2e tests (#8288)
- 6afa836 chore: fix CI mnist_pytorch (#8364)
- 4d3eaab chore: Update Horovod Cycle Time (#8362)
- d3b01cb docs: Add det pach tutorial (#8082)
- 7cebc30 fix: adjust card size on workspaces page (#8370)
- 5c93cb0 chore: enable more Go linters (#8333)
- a279967 fix: aws deployment can deploy priority scheduler (#8345)
- 3d9293c fix: fixed bug in error handling in experiment.go (#8339)
- 194bfd5 fix: Cell can be undefined in experiment list table (#8360)
- 1da92aa chore: bump environment images to ubuntu 18.04 [MLG-1194] (#8356)
- 990c56f chore: add list_experiments to experimental.client (#8361)
- 3a7d9ea fix(tests): lower e2e_gpu_quarantine parallelism (#8363)
- 4c48458 fix: patched remote users were able to login with password (#8337)
- baf5c96 chore: port over PyTorch example to use Trainer API [MLG-1181] (#8292)
- 235bd8f feat: delete TB files from the SDK (#8329)
- 2fe3d99 chore: update Typography from UI kit (#8323)
- 2b23674 fix: prevent carriage return in env from crashing deepspeed launcher (#8321)
- 461c307 chore: Remove DesignKit since it's now maintained in Hew [WEB-1790] (#8338)
- 5ee87ec fix: Set group name and number columns to handle Safari [DET-9948] [DET-9949] (#8355)
- 10deef9 fix(experiments): transient errors shouldn't leave trial hung (#8352)
- 512b9f3 chore: remove accidental mock commit (#8354)
- 9d17dbf feat: Show "-" for null values in data cells for experiment list (#8343)
- ea50987 fix: properly interpret flag values (#8326)
- 8b6fc68 fix: Allow SAML and OIDC logins to work differently [WEB-1797] (#8308)
- 274288e docs: fix linting failure (#8351)
- 73bf0e8 docs: log policies (#8302)
- 8418029 chore: ft slot capacity check for each trial [DET-9897] (#8213)
- 494ca57 fix: replace TODO with ctx for deleteTensorboard (#8332)
- cfde2f6 docs: Docs Version Dropdown Automation (#8340)
- 8e69941 chore: Remove examples/legacy (#8153)
- af995ba fix: cli is not a library! (#7891)
- bf0a03d test: fix `ray.air.session` import. (#8344)
- 9bb10cc ci: mypy fix for responses>=0.24.0 (#8341)
- b924b25 fix: add pin icon in dropdown (#8324)
- 62b7f3b chore: remove fit-content from `TimeAgoc` (#8328)
- 86d6962 chore: update determined-ui to hew (#8334)
- f580385 fix: metric group charts have more than one color (#8304)
- 1966373 feat: Add tensorboard delete command to CLI (#8227)
- 656c8b2 chore: bump version: 0.26.3-dev0 -> 0.26.4-dev0
- af43248 docs: add release notes for 0.26.3 (#8322)
- b262a3d chore: Update lore.yaml to use the new version
- d64a0ac chore: use a single .golangci.yml file (#8320)
- ad94d20 chore: Add progress bar from UI Kit [WEB-1675] (#8181)
- b3b5be0 feat: implement CodeSample from UI Kit [WEB-1677] (#8270)
- d723b7f docs: fix typo in user edit release note (#8319)
- 6e5d840 chore: initial experiment actor refactor (#8229)
- 8a1ff58 chore: use a single root level go mod (#8285)
- 3511abf chore: delete dead code (#8313)
- d0e6375 chore: add a new deployment type for aws (#8279)
- 3929e8c chore(actors): remove ctx usage in agent_state.go (#8267)
- 5bf1b87 ci: delete broken wait_for helper (#8312)
- 50535f1 test: quarantine GPU execution of test_task_logs (#8261)
- d5b8e80 chore: deployment's --dry-run option doesn't print template (#8303)
- ac89d44 fix: allow experiments with directory checkpoint storage to parse (#8310)
- 306c0c3 fix: Project info not presists when forking (#8307)
- dc1b131 chore: sort out issues after bringing EE e2e_tests into OSS (#8084)
- d182abe chore: slurm support for blocklist (#1111)
- efdf62b fix: return correct location URL for /Users SCIM API endpoint (#1115)
- 37a84d1 fix: ruamel.yaml fixes for EE
- 0ce925a chore: Update nightly tests that use legacy cifar10_pytorch (#1102)
- 6cad296 fix: update for error message change in product (#1098)
- 1e302b5 chore: update e2e tests affected by examples_pruning (#1100)
- ad3dcda chore: cleanup model registry rbac test
- a3ffb5d test: enable command run tests for PBS (#1073)
- 9dd0e42 test: enable command and deepspeed tests run on slurm/pbs (#1044)
- ea4f4c4 chore(templates): ee fixes for template rbac
- c48e48d fix: Test test_slurm_verify_home fails with podman and it shouldn't [FE-136] (#1028)
- 760a738 test: Add pytorch2 distributed e2e tests on slurm [FE-168] (#1007)
- b5aee79 chore: use longer running no op experiment when seeding workspace (#994)
- facbda9 test: run test_hpc_job_pending_reason only on gcp vm (#996)
- 393d0b5 ci: FE-133 Configure non agent slurm/pbs tests to skip without explicitly listing test names in circleci. (#977)
- fd15535 ci: add ee-only files to the import-restrictions linter exclusions.
- 9cf2a26 test: slurm/pbs test for pending reason (FE-90) (#960)
- b3c2ca3 chore(actors): allocation.go, ee side
- eb7d1a1 test: [ALLGCP] Add e2e test for HPC that verifies that user HOME is preserved (#972)
- f3a8b0e test: fix test_slurm.py lint error (#949)
- 71896f3 chore: FE-91: Update base images (slurm/pbs) to include a populated singularity_image_cache (#943)
- 5891567 feat: add rbac to `api/v1/master/config` [DET-9633] (#931)
- 0a5c32e ci: FE-72: Add test-e2e-pbs-*-gcp tests (#941)
- 4c233c0 feat: add rbac for strict job queue control (#927)
- 6e23aa2 chore: removed admin dependency from delete model/version (#912)
- 7c6c59e feat: rbac for templates (#909)
- e89cc08 ci: DET 9622: (ee) test_slurm.py::test_cifar10_pytorch_distributed failures (#919)
- c6ee094 fix: test_rbac goes to wrong url (#918)
- 6b34e0d fix: DET-9483 successfully run e2e_slurm_preemption tests as part of nightly workflow (#903)
- 4f6277d ci: FE-14 Migrate test-e2e-slurm to GCP slurmcluster (#879)
- f4507f7 tests: fix a miss indentation leading to missing project err (#878)
- 5cad9bb chore: fix a missing check for global permissions in jq (#874)
- bca3848 feat: add rbac support for reading job queue (#871)
- 5c79474 chore: update how we wait for tasks to be ready (#863)
- b292862 test: fix `test_master_host` [DET-9482]. (#851)
- 5375a08 ci: quarantine flaky slurm tests (#850)
- 8e44c6c fix: Patch groups test [DET-9473] (#845)
- 49d2e08 fix: fix bug with launching tensorboards on trials (#842)
- d4dcbe5 test: Fix and add e2e_slurm_preemption tests to nightly workflow [D...
0.26.3
Release Notes
Changelog
- bd74446 chore: bump version: 0.26.3-rc3 -> 0.26.3
- 162de31 docs: add release notes for 0.26.3 (#8322)
- a472745 chore: bump version: 0.26.3-rc2 -> 0.26.3-rc3
- bab1dad fix: allow experiments with directory checkpoint storage to parse (#8310)
- ec438f2 fix: adjust width size in group table (#8309)
- 0d49b53 chore: fix job service panic when workspace does not exist (#8306)
- 13525e8 fix: check externalConfig is enabled before setting det_jwt as auth header (#8298)
- 101f279 fix: undefined handling in `CreateGroupModal` (#8301)
- b9e64ab docs: quick fix for version dropdown (#8300)
- 48e915f chore: bump version: 0.26.3-rc1 -> 0.26.3-rc2
- 3c2cb53 ci: update wrapper config to always run
- 55c8a3a chore: bump version: 0.26.3-rc0 -> 0.26.3-rc1
- c138065 chore: bump version: 0.26.3-dev0 -> 0.26.3-rc0
- 862b41e chore: bump version: 0.26.2-dev0 -> 0.26.3-dev0
- b2e4b02 docs: add release notes for 0.26.2 (#8245)
- 989a0e3 fix: update bumpversion cfg for new CircleCI config (#8293)
- 54a12b8 fix: fix docs linting (#8291)
- 108cca7 docs: add documentation for Keras and PyTorch profilers [MLG-1094] (#8253)
- a7dfd47 chore: lock published urls to preserve redirects
- 8ae316c chore: lock api state for backward compatibility check
- d0c8273 chore: refactor filterformstore (#8239)
- 2992b82 feat: Updated LineChart in UI Kit [WEB-1700] (#8105)
- c6c3235 fix: login error message (#8240)
- 949ecf9 feat: support PyTorch Profiler in DeepSpeed trials [MLG-1095] (#8251)
- b2ea839 fix: use proper default cpu env image in the helm chart. (#8287)
- ccc78dd chore: install request as dev dep to fix proxy.js (#8286)
- af40f2c fix: only train for one batch in PyTorch Trainer test mode (#8260)
- cbab7e3 fix: dsat with all yaml formats (#8284)
- d17a2cc fix: migrate CLI from deprecated SDK methods (#8282)
- f43acd8 feat: directory checkpoint storage [DET-9594] (#8255)
- f375fbb feat: log policies (#8145)
- 862951f chore(actors): remove slots, slot proxy hacks from agentrm (#8266)
- 8d79c17 feat: webhook type task logs (#8175)
- b086982 test: fix delete experiments potential flake (#8283)
- fe06a0a fix: update copy for Agent UID/GID modal in the user mgmt UI. (#8278)
- 1fbb9f0 chore: restore set resource pool using job service (#8280)
- 0bea20b chore(actors): remove actors from resource aggregation (#8265)
- 76b4bfe feat: update add members to group (#8262)
- f24748c docs: Fix a 404 error (#8277)
- e4701b2 fix: remove Topology from the ResourcePoolDetail page (#8274)
- bec5edd chore(actors): refactor k8s rm without actors (#8264) [DET-9658]
- 46b9360 fix: icon size in TaskBar (#8263)
- 5c12463 test: don't commit Go mocks (#8258)
- a33d2b8 chore: ci should always setup python venv for caching (#8257)
- c09c529 ci: wart removal (#8147)
- 65219ca docs: fix inaccuracies in `bind_mounts` docs. (#8254)
- d3bd2cb feat: unify new/edit group modals [WEB-1741] (#8236)
- 3179505 build(deps): bump actions/setup-node from 3 to 4 (#8230)
- e0a7efc fix: support ruamel.yaml>=0.18.0 (#8237)
- 2365df6 docs: check for dropped urls in PRs (#8247)
- f9c6402 fix: deepspeed e2e_tests fail when environment variable contains a newline character [FE-256] (#8154)
- ec7e004 fix: prevent passing login r= param to relayState (#8244)
- ab4cf83 feat: Update "Add Members" to Workspace experience (#8195)
- 25adf7d fix: dont suggest registering or deleting a checkpoint that is already deleted (#8246)
- c52ef71 feat: add new CLI command to edit multiple fields at once (#8075)
- 1d86343 chore: move files from /Clusters into /Cluster [WEB-1730] (#8231)
- 15ee78f docs: Add article for using detached mode (#8217)
- 115d7c2 fix: GetTasks doesn't respect rbac (#8233)
- 47b37eb chore: update Avatar component in UI kit [WEB-1055, WEB-1734] (#8178)
- 4c59393 chore: indirect import for jobs to avoid import cycle (#8238)
- aa77386 chore: add back missing commit in Python SDK (#8206)
- 9a80d99 ci: alternate mechanism for running nightly tests (#8221)
- 27a279b refactor: use Nameplate component in NavigationSidebar [WEB-1057] (#8152)
- c0e3062 revert: fix: support ruamel.yaml==0.18.0 (#8235)
- ae839a8 fix: support ruamel.yaml==0.18.0 (#8228)
- bb7020a fix: inaccurate task queue time [DET-9912] (#8225)
- 12e23b0 fix: pin ruamel.yaml<0.18.0 (#8232)
- 164b920 feat: Update users and groups tab to reflect count (#8224)
- 2ac9661 chore: shorten distributed-quarantine name (#8234)
- c964987 docs: Apply minor edits (#8215)
- f61d250 chore: support redirect on auth failure for echo routes (#8196)
- f251add docs: update the status of TLS security for notebooks. (#8191)
- 54551c5 refactor: `CliError` does not need `e_stack` property. (#8179)
- 7c6b2d0 chore: deprecate `apex` support. (#7526)
- bfda78d docs: quick fix for version dropdown (#8223)
- 5929161 docs: Fix broken Slack links (#8220)
- b076179 chore: refactor actor system out of internal/job (#8174)
- facdc30 fix: remove 'contents' parameter from remove_notes (#8209)
- 448162d docs: Edit setup checklist (#8214)
- 164579f feat: update group table (#8194)
- 8f5e883 fix: Multi-trial visualizations switch from metricType to group string [DET-9896] (#8137)
- 834bd2f chore: remove WebSocket actor (#7552)
- b6e28d7 fix: prevent extra updates when observing settings store (#8212)
- 707f77d fix: Trigger function updates new user modified_at without error (#8210)
- f9c8a86 chore(actors): refactor k8s' resource_pool.go without actors (#8186) [DET-9657]
- ea3d6bf docs: sort articles by weight custom extension (#8208)
- 464fb54 docs: Create new advanced setup section (#8203)
- dd1c6f0 docs: toctree tile css (#8207)
- 53b9bdb fix: CLI uses default where pagination not included in args [DET-9908] (#8192)
- 079304a fix: filter agents/nodes by poolName (#8205)
- 3af965f chore: move ui kit to separate repo (#8104)
- 8735acf docs: remove references to PyTorchTrialContext.from_config() (#8187)
- 03383f2 fix: fix issue with db migrations (#8193)
- 2ff18bc fix: Display byte axis values using humanReadableBytes [DET-9906] (#8189)
- bd5e628 feat: add "Topology" section to the cluster UI (#8108)
- 6dfde99 fix: icons should appear in safari (#8190)
- 36ad3ca chore(actors): refactor pods.go without actors (#8170) [DET-9901]
0.26.2
Release Notes
Changelog
- 85b5135 chore: bump version: 0.26.2-rc4 -> 0.26.2
- 25e578d docs: add release notes for 0.26.2 (#8245)
- 1f7945e chore: bump version: 0.26.2-rc3 -> 0.26.2-rc4
- 87c11fb fix: inaccurate task queue time [DET-9912] (#8225)
- 07ea3b2 fix: pin ruamel.yaml<0.18.0 (#8232)
- 6e8a762 docs: quick fix for version dropdown (#8223)
- 8896d23 fix: remove 'contents' parameter from remove_notes (#8209)
- b3fdf9e chore: bump version: 0.26.2-rc2 -> 0.26.2-rc3
- 5585f38 fix: prevent extra updates when observing settings store (#8212)
- 826f633 chore: bump version: 0.26.2-rc1 -> 0.26.2-rc2
- 829c3bf fix: Trigger function updates new user modified_at without error (#8210)
- 0b0d268 fix: CLI uses default where pagination not included in args [DET-9908] (#8192)
- b0071c2 chore: bump version: 0.26.2-rc0 -> 0.26.2-rc1
- f049bd6 fix: fix issue with db migrations (#8193)
- 1f4bbe2 fix: icons should appear in safari (#8190)
- 5d3a80d chore: bump version: 0.26.2-dev0 -> 0.26.2-rc0
- 57fee58 chore: lock published urls to preserve redirects
- 883135a chore: various deprecations to standardize SDK get/list/iter (#8165)
- 0ab908e chore: add release notes for Python SDK (#8184)
- 2c09e10 docs: fix redirects (#8188)
- bfd20f1 docs: update experiment config reference for records_per_epoch in PyTorchTrials (#8185)
- 1fb50cd fix: pin icon should adapt to theme colors (#8183)
- aff81a5 chore: migrate SDK to a generic OrderBy [MLG-1056] (#8171)
- 88549ee Revert "feat: use max_results in SDK's list_trials (#8173)" (#8182)
- c29d086 fix: tensorboard sync for profiler data, Core API v2 managed mode [MLG-1063] (#8163)
- 8324bf2 fix: Change oicd client secret env var name to comply with naming convention (#8113)
- 3946185 docs: Restore path to singularity file (#8180)
- d12373f chore: reinstate core_api example e2e tests (#8148)
- 16ef0bb feat: use max_results in SDK's list_trials (#8173)
- ee2e633 refactor: add delete cascade to tables affected by experiment deletion (#8016)
- 404831b chore: final trial actor refactor (#8164)
- eb98aeb fix: case insensitive member search (#8166)
- b76dd90 fix: Learning curve point click (#8155)
- 95f2ad7 chore: task resources actor refactor (#8157)
- 1bff62e chore: configure a training port offset (#8125)
- e4d5ad1 docs: Modify the info architecture (#8100)
- 556022a chore: change CSS values (#8151)
- 88d9533 fix: Theme dropped after page refreshing (#8139)
- 515f128 refactor: update UI Kit Icon component [WEB-1699] (#8122)
- 82b4d31 chore: Post pruning hotfixes (#8141)
- 67f6dc1 Python SDK v1 (#8005)
- 3a24611 docs: fix python-sdk reference syntax (#8146)
- e7247a5 docs: Restore core api integer incrementing tut (#8115)
- 2a7f141 chore: add option to proxy requests to internal service (#8044)
- d40dee3 fix: Move setting of playwright browsers path (#8144)
- 31c2515 chore: update metrics documentation (#8118)
- b973356 chore: Add Message component to UI kit [WEB-1056] (#8133)
- 5df3d57 chore: remove actors from resource manager interface (#8126)
- f9a61b4 chore: upgrade to node v20 and npm audit [WEB-1662] (#8036)
- 38ff7b5 docs: update link in prometheus docs (#8138)
- ea8d760 chore: Examples pruning (#8140)
- 8b9ec03 fix: move couldn't-connect-to-master message into ship_logs.py (#8127)
- f52f7c6 docs: Apply style guide edits (#8134)
- 2f65eec chore: bump version: 0.26.1-dev0 -> 0.26.2-dev0
- ee3c478 docs: add release notes for 0.26.1 (#8131)
- 5d55da1 ci: dannys/reenable docs autoassign (#8012)
- 017e0f2 docs: Remove ref to fluent bit (#8128)
- 42535c2 chore: add hf context to all dist e2e [DET-9893] (#8119)
- 7e6689b chore: make agent.yaml readable by default (#8095)
- 8d2b5ce fix: hotfixes for ship_logs.py (#8116)
- 233cd95 feat: add allocation exit status to db and implementation for get allocation (#7897)
- 22f1f92 chore: remove task allocation group actor ref (#7853)
- 42bfb72 fix: Return ResourcePools with a fixed order (#8103)
- 3b42b91 chore: write log shipper in python [MLG-993] (#7974)
- ae7c415 fix: DET_CERT_MASTER_FILE=noverify det shell open (#8110)
- 3e1b83e docs: redirects.py moves subdirectories properly (#8111)
- 29aa772 fix: fix primary key for allocation_accelerators table (#8106)
- f9e9a45 build: print sphinx-build command on make build (#8109)
- a51ef7c fix: update WorkspaceMemberAddModal when new user or group is created [WEB-1111] (#8069)
- ff9b77a fix: trigger `autoupdate_users_modified_at` by `username` change (#8093)
- 4899df7 feat: Edit experiment from list (#8086)
- d638303 feat: Dont allow users to add deactivated users to a workspace (#8073)
- f559eee fix: single point different axis ranges (#8096)
- 0c54010 fix: Empty / NotEmpty operators for descriptions, tags [WEB-1751] (#8090)
- 469c2ae perf: improve task stats IMAGEPULL performance (#8067)
- e7751fe chore: auth task logs [DET-7554] (#8089)
- 588d817 chore: cleanup allocation exit logic (#8088)
- e2b51e7 feat: add implementation for get and set acceleration data api and intg test [DET-9748] (#7856)
- 5483d12 fix: ignore `last_auth_at` to update `modified_at` on users table (#8091)
- 0f8c1d4 fix: Trial data loading state (#8083)
- 6a91c3d fix: Metric type is blank in comparison chart (#8085)
- 1819810 chore: allow attaching select dropdown to select container (#7940)
- e5b2e35 refactor: removed reference removed agents/./slots/. endpoint (#7424)
- ecc0f6b chore: support "formData" in swagger bindings parsing (#8078)
- 9dfc0a3 feat: Hide deactivated user in ws members list (#8077)
- 89c8da2 feat: introduce batch actions in user management page (#8056)
- ca63335 docs(performance): add sections to help with getting started (#8079)
- 9ea05a1 chore: change master config to k8s secret (#8053)
- 58fab08 fix: Experiment name settable in fork config (#8081)
- 6006d42 fix: Show experiment loading state (#8066)
- 8073a2d fix: order of API paths in proto for AssignMultipleGroups (#8080)
- 5be764b feat(performance): implement slack report send [INFENG-234] (#8045)
- 18d7286 fix: shell open gives unfriendly message on terminated (#8074)
- fc2e8bf test: fix test_pytorch_parallel logs check (#8064)
- ca763a4 fix: `det deploy aws list` results were incomplete. (#8062)
- c8edfab chore: cleanup pod informer logging (#8072)
- 14bd162 docs: quick fix for version dropdown (#8070)
- bcfdeac chore(tests): workaround data races in uptrace/bun (#8065)
- 5d56e95 chore(tests): fix data races in webhook integrations (#8061)
- ce5bcf2 ci: backend owns e2e_tests folders cluster, command, and template (#8037)
- 3075820 chore(tests): fix data races in streaming API intg tests (#8059)
0.26.1
Release Notes
Changelog
- a6b26b0 chore: bump version: 0.26.1-rc3 -> 0.26.1
- 4bd3dcb docs: add release notes for 0.26.1 (#8131)
- de1526b chore: bump version: 0.26.1-rc2 -> 0.26.1-rc3
- 6e19285 fix: Return ResourcePools with a fixed order (#8103)
- 355cb62 chore: bump version: 0.26.1-rc1 -> 0.26.1-rc2
- 3f97397 fix: Trial data loading state (#8083)
- 740e730 fix: trigger `autoupdate_users_modified_at` by `username` change (#8093)
- 288db9f fix: single point different axis ranges (#8096)
- d972837 fix: Empty / NotEmpty operators for descriptions, tags [WEB-1751] (#8090)
- 3c5b281 fix: ignore `last_auth_at` to update `modified_at` on users table (#8091)
- b086b84 chore: bump version: 0.26.1-rc0 -> 0.26.1-rc1
- 2283723 fix: Metric type is blank in comparison chart (#8085)
- 90dfb54 fix: Experiment name settable in fork config (#8081)
- ce7549a fix: shell open gives unfriendly message on terminated (#8074)
- 663f4d8 docs: quick fix for version dropdown (#8070)
- 180d1b3 chore: bump version: 0.26.1-dev0 -> 0.26.1-rc0
- 1a96b8a chore: lock published urls to preserve redirects
- d1c3fa4 Revert bump version (#8063)
- 9b3b65a chore(tests): fix data race in grpclog init during integrations (#8058)
- ef0ef50 chore: lock published urls to preserve redirects
- b241b78 chore: bump version: 0.26.1-dev0 -> 0.27.0-dev0
- b14cc40 test: stable diffusion example tests [MLG-903] (#7855)
- 4cb3151 docs: Improve upgrade instructions (#8032)
- aa736a2 fix: detached mode tensorboard storage support [MLG-872] (#7992)
- d11f6c9 fix: fake cert gen.sh generated broken certs (#8055)
- a8a3399 fix: bring back logging for `core.train.report_*` calls. (#7975)
- f4524ac fix: pass final forked config to server for new experiment (#8051)
- 5ff6e9c feat: update InlineForm inputs (#8033)
- 9c05b00 feat: updating agent user group affects users.modified_at (#8052)
- 9414539 fix: fasterrcnn image not found [MLG-516] (#8047)
- 6aca426 chore(tests): fix data races in k8s tests (#8049)
- 764f407 chore(tests): fix data races in telemetry tests (#8048)
- 90fb051 chore: rename 'cancel' to 'stop' for experiments and trials [WEB-291] (#8038)
- a4c244e fix: command resolve pool defaulting to workspace 0 instead of 1 (#8050)
- ccefc6d chore: allow passing clusterID in master config (#8042)
- 1214a2b fix: Return ResourcePools rather than names (#7990)
- 7aa8541 fix: govcloud agent AMIs are out of sync with bumpenvs [MLG-986] (#7983)
- be1f734 chore: add DatePicker to UI Kit [WEB-1674] (#8040)
- 5bd7b8a feat: SDK can list workspaces. (#7765)
- c0467b0 ci(performance): create initial gha workflow [INFENG-224] (#7969)
- 9dcedad fix: k8s custom pod spec affinity would get ignored (#8043)
- f4d3c47 fix: Note UI respects project permissions (#8028)
- 338d2f6 fix: rename `last_login` to `last_auth_at` (#8022)
- 34373ac feat: Batch actions for multiple users into one request [WEB-1640] (#7971)
- 1d85304 test: fix shell open test flake (#8035)
- 827d9a6 feat: Filter user list by role id for EE (#7988)
- e57b8d6 fix: Hide checkpoint deletion btn when already deleted (#8039)
- 04609e0 fix: Remove the unexposed GetJobQStats from RM interface and all RMs (#8030)
- d00ddc9 chore: Add error case and tests to Loadable [WEB-1333][WEB-1711] (#8025)
- 78dfc83 fix: Support longer titles on HParam scatter plots (#8031)
- fd9b406 fix: nil ptr for Proto() on users who haven't logged in (#8029)
- 846fbc3 fix: add css formating for miltiple input errors (#8011)
- 42a4911 chore: adding an example for distributed batch inference for mnist (#7976)
- c83dfe6 docs: fair-share scheduling policy [skip ci] (#7981)
- ccfda89 fix: Add fetch to resource pool bindings page (#8023)
- 4b085ff chore: move Loadable to kit [WEB-1688] (#7973)
- 67fdacb fix: sort files for conflict resolution in sharded checkpoints (#8014)
- 10fcb10 feat: add tooltip linebreak to the project card (#7995)
- 4b4ad8d fix: add existing "non-setting" query parameter into the settingsToQuery function (#8013)
- f021ac2 test: fix master intg user flake (#8019)
- 8937f2b fix: align default markdown font-family to theme [WEB-617] (#8009)
- e03e469 feat: "det notebook|shell|tensorboard open" doesn't error when task not ready (#8008)
- 109e69f fix: tensorboard deletion when det e delete [DET-9844] (#7997)
- e3ffc4f docs: Fix minor issues (#7999)
- 2e77f52 fix: regular integer spacing of chart ticks [WEB-1714] (#8010)
- f1ae472 feat: add `last seen` column in user management table (#7991)
- d2a5505 refactor: remove external dependencies from UI kit [WEB-1689] (#7968)
- e86abd5 fix: fix sorting in GetUsers endpoint (#8001)
- 69921eb feat: commands download user files at startup so k8s can support larger context directories [DET-8830] (#7889)
- 86e6f47 fix: report searcher progress according to reporting period (#8006)
- 5b2f238 ci: more precisely select files for splitting in E2E tests (#7989)
- f918188 fix: show deep files in experiment code viewer (#7945)
- 1d2b60b fix: data fetch shouldn't interrupt editing model version description [WEB-1703] (#8000)
- 7dff1f2 chore: bun debug mode off (#7996)
- 0241795 fix: Version dropdown in docs is scrollable (#7994)
- 46da826 ci: disable docs review action [skip ci] (#7982)
- 390e0ac chore: add tests for postgres_users.go (#7875)
- a60c0d7 fix: Hide action menu on dashboard project cards (#7986)
- a2fb7ef chore: bump version: 0.26.0-dev0 -> 0.26.1-dev0
- 7eaf361 docs: add release notes for 0.26.0 (#7987)
- 25439fc chore: put grpc panics and other logs into 'master logs' (#7965)
- 8c373cf fix: Force GCP node name length to be less than maximum length (#7964)
- 7f21b4f chore(templates): refactor templates to their own package (#7876)
- b9fb3eb chore: Add toast to UI kit (#7950)
- d45e22b feat: Support filter by status and role for users (#7953)
- 5b5fb2f chore: fix det deploy aws requiring --db-size (#7984)
- e254bf3 docs: Update custom pod specs page (#7970)
- 3d1ec1f fix: Update Tasks Stats Causes Deadlock [DET-9853] (#7980)
- 5c5e0ec feat: single experiment continue [DET-9703] (#7764)
- 34c5bdb docs: document prometheus auth (#7957)
- 6e3cad3 chore(codeowners): map performance dir to web team (#7916)
- 9a5cfd7 fix: handle nil actor message and nil actor errors in agent RM (#7951)
- d25612b feat: Add instance flavor and size arguments for det deploy aws [INFENG-227] (#7931)
- e031a2f fix: update `modified_at` by insert in user table (#7949)
- 2d65e82 fix: Change measure of text lines in log containers [WEB-1664] (#7860)
- 14a779c chore: add last_login column to users table/model (#7948)
- f4e3638 chore: fix docs reference in cli [MLG-891] (#7926)
- a4fdd0a chore: Improve failure diagnostics in shell test [FE-216] (#7932)
- b576b6b chore: less verbose mockery output (#7822)
- 9df980c chore(performance): add initial Makefile and README (#7914)
- 47a4070 chore: handle case where steps completed is more than max length (#7816)
- 1193acd chore: log k8s nil event objects at trace level and ignore (#7962)
- 1c86762 chore: max_slots_per_pod can be per resource pool [DET-9771] (#7923)
- 3267b1f fix: return searcher_metric_value as-is (#7961)
- b5845b7 fix: singularity agent env variable (#7960)
- 19d703b fix: mitigate user settings race conditions (#7905)
- f4382a7 fix(db): handle erroneous nulls from the summary metric migration (#7958)
- 65cabae fix(experiments): don't transition experiment to "" state on crash (#7956)
- aa30e86 docs: quick fix for version dropdown (#7952)
- 18e2f1f fix(scheduler): tolerate missing groups in priority scheduling by skipping them (#7947)
- 4b19928 chore: bunify & tidy up internal/user (#7886)
- 3f067a3 fix(allocation): allocation lifetimes should contain resource lifetimes (#7944)
- b6b5a84 Remove references to --auto-bind-mount (#7910)
- 2ba2580 chore(deps): bump tibdex/github-app-token from 2.0.0 to 2.1.0 (#7938)
- 2baaf30 fix: clear selection after action (#7921)
- 4db1c08 refactor: css in docs (#7934)
- a392dc8 fix: button in 404 page (#7936)
- 0e3f4d0 fix: avoid hiding tabs in single trial experiment [WEB-1651] (#7941)
0.26.0
Release Notes
Changelog
- 29705a8 chore: bump version: 0.26.0-rc3 -> 0.26.0
- 084e485 docs: add release notes for 0.26.0 (#7987)
- 2882e78 chore: bump version: 0.26.0-rc2 -> 0.26.0-rc3
- 623774e fix: Update Tasks Stats Causes Deadlock [DET-9853] (#7980)
- c11c5e4 fix: handle nil actor message and nil actor errors in agent RM (#7951)
- df6c317 fix: update `modified_at` by insert in user table (#7949)
- 9a795c1 chore: bump version: 0.26.0-rc1 -> 0.26.0-rc2
- d330154 chore: log k8s nil event objects at trace level and ignore (#7962)
- a8465d8 fix(db): handle erroneous nulls from the summary metric migration (#7958)
- ab9d933 fix(experiments): don't transition experiment to "" state on crash (#7956)
- 7b506f9 fix(allocation): allocation lifetimes should contain resource lifetimes (#7944)
- 2b12176 chore: bump version: 0.26.0-rc0 -> 0.26.0-rc1
- 78482b2 docs: quick fix for version dropdown (#7952)
- 0893e30 fix: clear selection after action (#7921)
- a9a382f fix: button in 404 page (#7936)
- 7f85454 chore: bump version: 0.26.0-dev0 -> 0.26.0-rc0
- 9ac8f82 chore: lock published urls to preserve redirects
- 6ba27fe chore: lock api state for backward compatibility check
- b52b3a6 chore: bump version: 0.25.2-dev0 -> 0.26.0-dev0
- c2cea7d chore: include api op and param description in py bindings (#7798)
- c98cc07 feat: Allow passing in swagger json as an argument (#7843)
- 482285f docs: Add another top nav link (#7933)
- d9e1bb5 chore: track dead code [WEB-258] (#7924)
- 537cc3d docs: Update launcher version to 3.3.8 for consistency with docs (#7915)
- 5b859c6 fix(cli): det model describe should call GET /model not GET /models (#7912)
- 262c33a docs: Clarify weighted fair-share scheduling policy (#7913)
- 4a486bb feat: Add performance tests for endpoints used in the WebUI initial load [WEB-1459] (#7906)
- cc360ac feat: Add workspaces to the SDK client (#7883)
- c87ca94 fix: api_command.go does not merge map values when overrides TaskContainerDefaults [FE-114] (#7887)
- dc40688 chore: update docs ownership per discussion [INFENG-225] [skip ci] (#7907)
- ee79213 test: fix e2e_tests ray dependency. (#7925)
- 688ff63 fix: align items in task list (#7894)
- 694f44a feat: submit forms in modals by pressing enter [WEB-1130] (#7857)
- 376ea50 fix: Display data point in line chart when epoch is 0 (#7898)
- 8eff8ac chore: update user docs (#7902)
- 4441ac8 feat: Input should capture the Esc button and Clicks while focused [WEB-1251] (#7859)
- 0365ca7 revert: "chore(actors): remove pkg/actors usage from pods.go (#7658) [DET-9652]" (#7908)
- c1a0cf4 chore(actors): remove pkg/actors usage from pods.go (#7658) [DET-9652]
- ff2e16d fix: NTSC use workspace's agent group info (#7892)
- 8bef0d4 chore: no code owners for auto-generated files (#7896)
- 5278758 chore: increase Go's max line length to 120 (#7903)
- b09334d feat: Add display name to user list in cli [MLG-930] (#7901)
- 4301fc2cc feat: move UI related files to the UI kit. (#7852)
- 3f9a980 feat: Hide code related actions based on model definition size (#7854)
- dde10f1 Revert "ci: temporarily move e2e to only nightly [skip ci] (#7837)"
- 35aa028 feat: add an API to get an allocation's exit status (#7731)
- e56ed43 chore: prompt for docs in github question template (#7895)
- 4d827cd fix: remove unused go code (#7893)
- b93f0c9 feat: disable actions of unmanaged experiments/trials (#7874)
- 3b1bebf fix: Metadata deleting last row, cancelling delete [WEB-1655] (#7805)
- 9cddca9 Refactor: Use userSettings store in learning curve (#7783)
- 4a6afb1 feat: add config option to omit default resource pools (#7885)
- e5555ca fix: redefine user columns updated in postgres_users toUpdate (#7890)
- 29561a8 test: quarantine nightly cifar10-keras convergence test (#7780)
- 1cfc7f3 fix: Project move/delete updates UI state [WEB-1668] (#7870)
- 00dfcca test: enable command run tests for hpc (#7880)
- 90d66b8 chore: disable interactive matching for dev bindings (#7747)
- b6632d1 chore: postgres_users.go bun migration [DET-8238] (#7769)
- 826f2b4 feat: containerize performance tests [INFENG-222] (#7863)
- 66f6f4a ci: fix webui test results upload (#7877)
- 64ec5ed Revert "feat: add config option to omit default resource pools (#7696)" (#7878)
- b95d57f fix: use setPartial in experiment list setting (#7873)
- e18d5cf feat: add config option to omit default resource pools (#7696)
- 5e69b6c feat: k8s agent enable disable [DET-9750] (#7779)
- d2e5abb docs: Remove black borders on gif (#7872)
- 144dd0f test: remove deepspeed marks from dsat tests (#7871)
- 40b4341 docs: Add gif to the Readme (#7865)
- a5b29cf docs: Adjust diagrams replacing fluentbit icon (#7867)
- 096935f chore: update user docs (#7864)
- 6d5ad2d docs: Add page for using Determined Agent on Slurm/PBS (#7866)
- 79060ba chore: Remove imagenet (#7664)
- cdceeac feat: Display unmanaged experiments with label (#7861)
- 16a6262 fix: Make models list editing work via ModelActionDropdown [WEB-1603] (#7799)
- 84a6612 fix: doc url in jupyter config modal (#7862)
- 5e04699 fix: fix how we are calling the bert embedding example (#7851)
- d8c7bd2 docs: Clarify meaning of trial api (#7818)
- 813ed36 fix: error message for `det agent [enable|disable]`. (#7839)
- 9132dcb feat: expose `externalExperimentId` and `externalTrialId` (#7840)
- 9874951 chore: bump version: 0.25.1-dev0 -> 0.25.2-dev0
- a5bdfa7 docs: add release notes for 0.25.1 (#7850)
- 38dc440 chore: trial actor refactor (#7821)
- f3aaf4d fix: pass configString once to createexperimentmodal (#7849)
- 346d4aa chore(deps): bump tibdex/github-app-token from 1.8.2 to 2.0.0 (#7847)
- 93e5341 feat: helm ca.cert injection, cluster-wide non-namespaced res creation flag, password change and minor-fix (#7808)
- 442bac6 feat: backend support for inference metric tracking part 2 (#7592)
- 06080b9 feat: allow metrics with duplicate keys and the same value [MLG-890]. (#7820)
- e42c973 feat: enable display of metrics with floating point epoch [MLG-857] (#7829)
- 3c9e0e2 feat: add new API endpoint to get and post accelerator data (#7723)
- febbe18 fix: enable RP bindings management for workspace admins (#7834)
- e843173 ci: temporarily move e2e to only nightly [skip ci] (#7837)
- 4956673 fix: display `progress` value as it is (#7836)
- 4d74e95 refactor: flipped k8's enable reattach to always true [DET-9726] (#7692)
- 4010b74 chore: nil exception on GetResourcePoolsRequest error (#7835)
- 12d393f fix: dupe checkpoints (#7833)
- 6b12390 fix: log viewer not updating when page switched (#7823)
- 590ea21 ci: fix check-rebaseable syntax [ci skip] (#7826)
- a99385d ci: Add a newline to the output for pre-check (#7824)
- c2ce179 chore: support binary output via dev curl (#7778)
- d78bafe fix: SSO button text color (#7819)
- b4f6f0c fix: correct useResize hook to return proper element sizes [WEB-1656] (#7807)
- e31a077 chore: Split out partial updates into setPartial (#7815)
- 8bcee31 chore: make pre-commit dev setup opt-in. (#7774)
- 675de43 chore: minor copy change (#7810)
- dceb00c chore: agent device discovery too greedy (#7802)
- 1284914 chore(deps): bump actions/checkout from 3 to 4 (#7786)
- 9651c9b fix: progress filter in exp (#7811)
- c284b09 fix: lower severity of allocation log changed when debugging (#7803)
- ff830f9 fix: Learning curve will send falsey metricType (#7809)
- 3c2ab1e chore(deps): bump tibdex/github-app-token from 1.8.0 to 1.8.2 (#7772)
- da23134 docs: HPC launcher doc tweaks, add image scheme docker-archive:// (#7812)
- 1a89c56 docs: Add sections on HPC upgrade and package verificaiton (#7804)
- a51892e fix: Avoid dropdown repeating in ExpList fields dropdown [WEB-1598] (#7800)
- fab413b chore: tools/k8s doesn't use coscheduler (#7795)
- 2b95373 docs: Update the installation guide (#7762)
- 8291a18 ci: quarantine some flaky nightlies (#7725)
0.25.1
Release Notes
Changelog
- 39a421a chore: bump version: 0.25.1-rc2 -> 0.25.1
- 61c11df docs: add release notes for 0.25.1 (#7850)
- e0d0ed2 chore: bump version: 0.25.1-rc1 -> 0.25.1-rc2
- 74eeb77 fix: enable RP bindings management for workspace admins (#7834)
- 1d8e3d2 fix: display `progress` value as it is (#7836)
- 2c86593 chore: bump version: 0.25.1-rc0 -> 0.25.1-rc1
- 117b173 fix: log viewer not updating when page switched (#7823)
- b93bc72 fix: SSO button text color (#7819)
- cfdacb4 fix: correct useResize hook to return proper element sizes [WEB-1656] (#7807)
- 81b673d fix: progress filter in exp (#7811)
- 29ad1d1 fix: Learning curve will send falsey metricType (#7809)
- b0a7e4e fix: Avoid dropdown repeating in ExpList fields dropdown [WEB-1598] (#7800)
- ebd1906 docs: Update the installation guide (#7762)
- 1bf08e5 chore: bump version: 0.25.1-dev0 -> 0.25.1-rc0
- 7f7e89b chore: lock published urls to preserve redirects
- 59ebdf0 chore: lock api state for backward compatibility check
- 9b6c6c7 fix: Get distributed jobs working with devcluster [FE-181] (#7785)
- d786078 chore: revert trial actor refactor (#7797)
- f4ca02a docs: quick fix for version dropdown (#7796)
- 72d34d9 chore: reduce master log noise (#7794)
- 12a513c chore: Create/document a mechanism to run the nightly tests on a PR [FE-146] (#7750)
- af24954 Revert "chore: track dead code [WEB-258] (#7767)" (#7793)
- c815f76 fix: handles custom TLS certs in enrich_task_logs.py [DET-9803] (#7782)
- 5c83901 chore: remove empty `determined/common/api/checkpoint/`. (#7776)
- a2b873f chore: suppress the daemonize message on HPC jobs (#7775)
- 5f2f6b8 fix: glitchy width in code editor (#7771)
- bdeb0ea chore: track dead code [WEB-258] (#7767)
- 2167292 chore: trial actor refactor (#7559)
- 06e361e fix: correct date range for avg queued time charts [WEB-1621] (#7754)
- 7c765ae refactor: remove fluent bit & replace with slurm log shipper [DET-9704] (#7639)
- 214198d fix: include `unmanaged` field in `GetExperiment`. (#7768)
- f8caa0e feat: Create performance tests [WEB-1458] (#7741)
- b0badb2 fix: Handle chart x-axis with all points at x=0 [WEB-1622] (#7760)
- a6d0fba chore: rearrange log level constants (#7752)
- b22f652 chore: ignore flake8 import restrictions pre-commit check (#7759)
- ef8a295 chore: bump version: 0.25.0-dev0 -> 0.25.1-dev0
- 9333c9d docs: add release notes for 0.25.0 (#7756)
- 8af14ab chore(actors): refactor pod.go (#7617)
- 046e060 test: make error checking case insensitive fixing rbac test (#7749)
- 6c530d3 build: fix `go-version-check` command (#7751)
- 418931b refactor: make glide-table conform to standard event handler pattern and fix paginated row selection bug [WEB-1471, WEB-1561] (#7704)
- 93d861d chore: use Message for no data in ComparisonView (#7654)
- 4d25428 docs: tweak brew instructions (#7743)
- cf57ce4 chore: upgrade go 1.20 to 1.21 (#7657)
- 0553f19 fix: make rbac messages consistent (#7745)
- e1675b2 fix: not all resource pools should be labeled "default" [WEB-1600] (#7744)
- 2e907ce fix: resource pool card workspace tweaks (#7732)
- 5154c3b fix: proxy tunnel server should use `SO_REUSEADDR`. (#7735)
- e845836 fix: React build issue (#7742)
- 2fc5f57 fix: Faster polling for first experiment metrics [WEB-1576] (#7740)
- ad46c44 chore: add a new assertion method to check command exit status and report any errors (#7737)
- 418d5ae fix: allow deletion of workspaces when case-insensitive matches exist (#7738)
- 690d451 docs: Reorganize model dev guide sidenav (#7713)
- 9936984 fix: properly display group metrics in metrics tab charts [WEB-1604] (#7727)
- 15e150b fix: allow zeroes for user agent id and group agent id (#7730)
- 8c84750 fix: catch correct import error and set tensorboard logging to false for --test --local (#7715)
- b6f4f30 fix: allow NodeInformer to fail with permission error [DET-9772] (#7703)
- 00a0bc3 fix(cli): not found errors should retain useful context (#7733)
- 688ea88 fix: fix failing e2e_cpu tests (#7734)
- 613e0ce chore: New constructor for Determined objects using existing session. (#7663)
- 108ffea fix: backfilled tasks weren't seen as trial tasks (#7729)
- 1fcd2f9 docs: Add user guides to the Documentation section (#7721)
- d20f577 fix: changing x axis type should reset any current custom zoom (#7728)
- ef5ae83 chore: update determined cli to handle timestamp format for external jobs (#7668)
- 2eadef1 feat: show external jobs on the resource pool page (#7666)
- 2a570bb chore: crash cluster given RM crash (#7621)
- b66ff4a fix: correct GPU name for A100-80GB. (#7724)
- fdddcbf chore: Add nightly tests to release branches (#7720)
- ae6c927 fix: reset chart min/max when changing xaxisdomain (#7719)
- 5e6af2a fix: properly encode metric to keys for LineChart and ParallelCoordinates (#7714)
0.25.0
Release Notes
Changelog
- fea5014 chore: bump version: 0.25.0-rc7 -> 0.25.0
- 3201f27 docs: add release notes for 0.25.0 (#7756)
- 29fbea2 chore: bump version: 0.25.0-rc6 -> 0.25.0-rc7
- 16509f6 test: make error checking case insensitive fixing rbac test (#7749)
- 79a5faa chore: bump version: 0.25.0-rc5 -> 0.25.0-rc6
- 9fde7c7 fix: not all resource pools should be labeled "default" [WEB-1600] (#7744)
- 154168f fix: resource pool card workspace tweaks (#7732)
- d3a42d7 chore: bump version: 0.25.0-rc4 -> 0.25.0-rc5
- 41f9251 fix: React build issue (#7742)
- 1c81e4c chore: bump version: 0.25.0-rc3 -> 0.25.0-rc4
- c4443b3 fix: make rbac messages consistent (#7745)
- 1a83243 fix: allow deletion of workspaces when case-insensitive matches exist (#7738)
- f298ebf fix: properly display group metrics in metrics tab charts [WEB-1604] (#7727)
- 450d1b5 fix: allow zeroes for user agent id and group agent id (#7730)
- fd13b29 chore: bump version: 0.25.0-rc2 -> 0.25.0-rc3
- c121387 chore: bump version: 0.25.0-rc1 -> 0.25.0-rc2
- 1cea783 fix: allow NodeInformer to fail with permission error [DET-9772] (#7703)
- 5d5718f fix(cli): not found errors should retain useful context (#7733)
- 98e0621 fix: backfilled tasks weren't seen as trial tasks (#7729)
- 306baaa fix: changing x axis type should reset any current custom zoom (#7728)
- 406656f chore: bump version: 0.25.0-rc0 -> 0.25.0-rc1
- 00f3af9 fix: correct GPU name for A100-80GB. (#7724)
- 4dd33e4 fix: properly encode metric to keys for LineChart and ParallelCoordinates (#7714)
- a028cd2 chore: Add nightly tests to release branches (#7720)
- 17796e3 fix: reset chart min/max when changing xaxisdomain (#7719)
- 05f808b chore: bump version: 0.25.0-dev0 -> 0.25.0-rc0
- 1cb537e chore: lock published urls to preserve redirects
- 7196181 chore: lock api state for backward compatibility check
- 6e39429 chore: bump version: 0.24.0-dev0 -> 0.25.0-dev0
- 1c8ce3f fix: add missing workspace_id from get_templates (#7706)
- efdc70b fix: code cleanup for mapx unit test (#7710)
- 0a34529 fix: add unit test cases for mapx methods Values and Clear (#7699)
- 1a4bee4 feat: `det deploy gcp` support for a2-ultragpu and g2-standard. (#7702)
- 2fd2535 fix: users can see inaccessible RPs (#7707)
- e83660c fix: rp bindings intg test failure (#7701)
- f1e9b72 chore: Remove estimatortrial (#7700)
- e2f2173 feat: replace clone function with structuredClone and add polyfill (#7624)
- f528cc6 fix: botched rebase/rename in the detached mode. (#7695)
- 08de858 fix: Continue Trial modal does not reset mode [WEB-1566] (#7688)
- f9c5600 fix: error message in jupyter (#7693)
- 2607fd1 fix: patch workspace has duplicate update statements (#7697)
- bf1b87b fix: correct outstanding error in mapx (#7698)
- cdc41e9 fix: add type check to pod spec merge (#7691)
- e340d50 chore: add Values and Clear methods for mapx (#7669)
- 6bc3c68 docs: algolia scraper to scrape only xml (#7690)
- 0977986 docs: fix new release notes (#7694)
- 76134b2 chore: dev cli support for calling master apis (#7462)
- 43c715b docs: add release notes for 0.24.0 (#7680)
- c513e70 chore: add new RBAC permission view external jobs (#7671)
- 128b106 docs: work around bug causing version dropdown to fail (#7685)
- 26559ed feat: check if default resource pools are bound (#7687)
- 7d8fce5 docs: improve writing of the github readme (#7689)
- 857309f feat: add rp bindings permissions (#7673)
- 5c68568 chore: api intg tests [DET-9725] (#7589)
- 1ad812d docs: Improve the GitHub Readme (#7613)
- 8239b19 fix: default pools editable and submittable (#7647) (#7672)
- 4f60d64 chore: unpin click version (#7684)
- ec90842 chore(deps): bump arduino/setup-protoc from 1 to 2 (#7537)
- b4cbe9c chore: enable mask closable by default for drawers (#7676)
- b1f02ed chore: limit reported slots (#7683)
- 75c1f17 fix: tensorflow version for macos (#7679)
- 0a5b406 fix: allow special characters in user manangement filter (#7681)
- 1af32e3 feat: Update user.modified_at when user added or removed from groups (#7665)
- 23a8224 fix: Case-insensitive client-side username search [DET-9770] (#7677)
- a368a55 chore: limit reported slots (#7648)
- 60a07e4 fix: make -C master clean build [DET-9333] (#7660)
- fcf7807 chore: custom metrics group in new experiment list (#7518)
- dc006b1 docs: Fix formatting (#7670)
- 6ee0521 docs: Introduce users to pachyderm w det (#7661)
- 4ef8b20 feat: detached mode v1 / core api v2. (#7060)
- d580ecf fix: allow checkpoints to be GCed without validation metrics and add tests (#7653)
- 204caa5 fix: optional chaining in `extractMetricValue` (#7662)
- 0ead6ac chore: telemetry actor refactor [DET-9663] (#7585)
- e437bd8 docs: Point to pytorch distributed launcher (#7649)
- 08ff4be docs: fix epoch metrics article (#7643)
- 395aa40 docs: Update resource pool to workspace mapping (#7642)
- 28cbdc2 fix: properly show the pagination for experiment list paged view (#7638)
- 68624f6 chore: avoid creating new table columns for non-legacy metrics (#7656)
- e558063 chore: Add eslint rule for imports to take one line [WEB-1567] (#7650)
- 0bc5dce ci: bump everything to torch==1.11 (#7599)
- 4eb6ebe fix: Project delete/move triggers update of workspace projects list [WEB-1497] [WEB-1377] (#7646)
- 145dd63 ci: indicate GHA run URL when reporting a cherry-pick conflict (#7635)
- 8d2b531 chore: Show Tooltip instead of actions for Default Resource Pools [WEB-1554] (#7644)
- 0bab558 fix: select component width (#7640)
- 32fac33 chore: use eslint rule to avoid relative imports through parent [WEB-1496] (#7637)
- dfd5475 chore: remove unused parseFloat for decoding string metric values (#7641)
- e472ea7 fix: alphabetical binding workspaces and search copy change [WEB-1552, WEB-1553] (#7633)
- 7b96933 fix: properly clear out the settings from the database [WEB-1559] (#7636)
- e9e66b1 fix: fix incorrect return type for downsampled metrics (#7618)
- 930fc9d feat: custom metric groups (formally known as types) [WEB-1469] (#7570)
- 75e93d9 docs: bump rstfmt version (#7611)
- 34c5b5a fix: trigger jobs fetchAll on pagination changes [WEB-1546] (#7602)
- 29e63af Remove say workaround and update version (#7628)
- 0a24176 chore: fix pod-spec merge logic (#7574)
- ce3136a feat: Don't show charts where all series are Loaded(no data) [WEB-1524] (#7609)
- d3c027b feat: OptionsMenu moved to left group (#7623)
- 4d002c4 docs: Add article on how to view epoch metrics (#7504)
- 4327a25 fix: rp binding resolving resource pools (#7629)
- f79e95e ci: fix release branch selection when cherry-picking EE PRs (#7630)
- d6a5c79 fix: Handle metric names finish loading, but still empty (#7634)
- 2844566 chore: support mobile view in UIKit [WEB-1314] (#7626)
- 3ec9c49 fix: button filter text (#7632)
- 0d293b5 fix: ChartGroup vertical spacing (#7631)
- 6fd4b21 feat: replace custom `isEqual` to lodash `isEqual` (#7625)
- fa91629 feat: add searcher metric sorting (#7614)
- 5ff0b7d fix: avoid converting workspace name to sentence casing [WEB-1548] (#7622)
- 37259d4 fix: treat searcher metrics value as a number in the ui (#7612)
- bbe70e0 feat: Resource pool tab for workspace (#7582)
- 369ddf3 feat: Copy cell value from experiment list table (#7604)
- 3cb434d fix(actors): trial lifetime must contain allocation lifetime, still (#7615)
- 033a9f6 fix: Single-point tooltip closes when mouse exits chart [WEB-1541] (#7595)
- b8f95ad docs: Add css rule to turn off scrolling when clicking on section links (#7610)
- 9994aa3 fix: code editor height issues (#7573)
- acdd6c4 docs: improve a release note (#7601)
- 37cc9f0 chore: add agent --image-root (#7597)
- 69ab985 refactor: trial's can have one or many tasks [DET-9647] (#7355)
- 7c076b2 ci: fix remote name in PR tracking script (#7607)
- b2ee9b3 fix(actors): create valid fake group actor for checkpoint GC, don't leak it (#7606)
- 776be10 fix: fix checkpoint gc which was incorrectly deleting some checkpoints (#7523)