Commit 8a29d6f

Add 0.10.0 entry
Author: Joe Stubbs
1 parent b4d0b21 commit 8a29d6f

1 file changed

+51
-15
lines changed

CHANGELOG.md

Lines changed: 51 additions & 15 deletions
@@ -1,13 +1,34 @@
 # Change Log
 All notable changes to this project will be documented in this file.
 
+## 0.10.0 - 2018-08-01
+### Added
+- New endpoints in the Admin API, `/actors/v2/admin/workers` and `/actors/v2/admin/executions`, for retrieving data
+about workers and executions, respectively.
+- New `abacosamples/agave_submit_jobs` sample image for submitting a job from an actor.
+
+### Changed
+- Fix issue where Spawner process would crash when receiving a Timeout error from the Docker daemon when a compute node
+was under heavy load.
+- Hardening of various worker actions when compute node is under heavy load, including hardening of stats collection,
+results socket creation and teardown, and actor container stopping. Adds significant improvements to exception handling
+and retry logic in these failure cases, and puts actor in error states when unrecoverable errors are encountered. Among
+other things, these improvements should prevent multiple actor containers from running concurrently under the same
+worker.
+- Numerous improvements to documentation.
+
+### Removed
+- No change
+
+
 ## 0.9.0 - 2018-07-06
 ### Added
 - Extended support for tenant-specific identity configurations; specifically, enabling use/non-use of
 TAS integration at the tenant level as well as use of global UID and GID.
 
 ### Changed
-- Fixed a reliance
+- Fixed a reliance on the existence of the Internal/everyone role in the JWT; now, if no roles are present in the JWT,
+Abaco inserts the "everyone" role enabling basic access and functionality.
 
 ### Removed
 - No change
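
The 0.9.0 role fix in this hunk says that when a JWT carries no roles, the "everyone" role is inserted so basic access still works. A minimal sketch of that fallback might look like the following. The function name `effective_roles` and the `roles` claim key are hypothetical, chosen only for illustration; this diff does not show the actual claim name or code path.

```python
def effective_roles(claims, roles_key="roles"):
    """Return the roles to use for a request.

    `claims` is a dict of decoded JWT claims. `roles_key` is a
    hypothetical claim name; the real one is not shown in this diff.
    """
    roles = claims.get(roles_key) or []
    # Fall back to the basic "everyone" role when the JWT has no roles.
    return list(roles) if roles else ["everyone"]
```

With this shape, a missing claim and an empty list are treated the same way, so callers never see an empty role set.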
@@ -18,10 +39,15 @@ TAS integration at the tenant level as well as use of global UID and GID.
 - Added support for a tenant-specific global_mounts config.
 
 ### Changed
-- Changed RabbitMQ connection handling across all channel objects to greatly reduce cpu load on RabbitMQ server as well as on worker nodes in the cluster.
-- Implemented a stop-no-delete command on the command channel to prevent a race condition when updating an actor's image that could cause the new worker to be killed.
-- Fixed an issue where Docker fails to report container execution finish time when the compute server is under heavy load. In this case, we now return finish_time as computed from the start_time and the run_time (calculated by Abaco).
-- Fixed issues with Actor update: 1) owner can no longer change in case a different user from the original owner updates the actor image, 2) last_update_time is always updated, and 3) ensure updater has permanent permissions for the actor.
+- Changed RabbitMQ connection handling across all channel objects to greatly reduce cpu load on RabbitMQ server as well
+as on worker nodes in the cluster.
+- Implemented a stop-no-delete command on the command channel to prevent a race condition when updating an actor's image
+that could cause the new worker to be killed.
+- Fixed an issue where Docker fails to report container execution finish time when the compute server is under heavy
+load. In this case, we now return finish_time as computed from the start_time and the run_time (calculated by Abaco).
+- Fixed issues with Actor update: 1) owner can no longer change in case a different user from the original owner
+updates the actor image, 2) last_update_time is always updated, and 3) ensure updater has permanent permissions for the
+actor.
 
 ### Removed
 - No change
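
The `finish_time` fallback described in this hunk (use `start_time` plus `run_time` when Docker does not report a finish time) can be sketched roughly as below. The function `resolve_finish_time` and its parameter names are hypothetical, and the sketch assumes `run_time` is measured in seconds, which the changelog does not state.

```python
from datetime import datetime, timedelta, timezone


def resolve_finish_time(start_time, run_time_seconds, reported_finish_time=None):
    """Pick a finish time for an execution.

    Prefer the value reported by the container runtime; if it is
    missing, derive one from the start time and the measured run time.
    Assumes run_time_seconds is a duration in seconds (an assumption).
    """
    if reported_finish_time is not None:
        return reported_finish_time
    return start_time + timedelta(seconds=run_time_seconds)


# Example: no finish time was reported, so it is derived.
start = datetime(2018, 6, 1, 12, 0, 0, tzinfo=timezone.utc)
derived = resolve_finish_time(start, 90)
```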
@@ -34,26 +60,33 @@ TAS integration at the tenant level as well as use of global UID and GID.
 - Additional fields for each execution are now returned in the executions summary.
 
 ### Changed
-- The routines used when executing an actor container have been simplified to provide better performance and to prevent some issues such as stats collection generating a UnixHTTPConnectionPool Readtime when compute server is under load.
-- Added several safety guards to the health checker code to prevent crashes of the health checker when working with unexpected data (e.g. when a worker's last_execution is not defined)
+- The routines used when executing an actor container have been simplified to provide better performance and to prevent
+some issues such as stats collection generating a UnixHTTPConnectionPool Readtime when compute server is under load.
+- Added several safety guards to the health checker code to prevent crashes of the health checker when working with
+unexpected data (e.g. when a worker's last_execution is not defined)
 - Fixed bug due to message formatting issue in message returned from a POST to the /workers endpoint.
 
 ### Removed
-- The 'ids' collection has been removed from the executions endpoint response in favor of an 'executions' collection providing additional fields for each execution.
+- The 'ids' collection has been removed from the executions endpoint response in favor of an 'executions' collection
+providing additional fields for each execution.
 
 
 ## 0.6.0 - 2018-03-08
 ### Added
 - Add support for binary messages through a FIFO mount to the actor.
-- Add support for a "results" endpoint associated with each execution. Results are read from a Unix Domain socket mounted into the actor container and streamed to a Results queue specific to the execution.
+- Add support for a "results" endpoint associated with each execution. Results are read from a Unix Domain socket
+mounted into the actor container and streamed to a Results queue specific to the execution.
 - Read host id from the environment to support dynamic assignment such as when deploying with kubernetes.
-- Add create_time attribute to workers and fix issue with health agents shutting down new workers too quickly if the worker had not processed an execution.
+- Add create_time attribute to workers and fix issue with health agents shutting down new workers too quickly if the
+worker had not processed an execution.
 - An actor's state object can now be an arbitrary JSON-serializable object (not just a dictionary).
 
 ### Changed
-- Messages to add multiple new workers are now sent as multiple messages to the command queue to add 1 worker. This distributes commands across multiple spawners better.
+- Messages to add multiple new workers are now sent as multiple messages to the command queue to add 1 worker. This
+distributes commands across multiple spawners better.
 - Default expiration time for Results channels has been increased from 100s to 20 minutes.
-- Fixed a bug in the auth check that caused certain POST requests to fail with "not authorized" errors when the payload was not a JSON-dictionary.
+- Fixed a bug in the auth check that caused certain POST requests to fail with "not authorized" errors when the payload
+was not a JSON-dictionary.
 - Fixed an issue preventing an actor's state object from being updated correctly.
 
 ### Removed
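
The 0.6.0 change to worker commands in this hunk (one request for N workers becomes N messages that each add 1 worker) can be illustrated with a small sketch. The function name `split_worker_request` and the message fields `actor_id` and `num` are hypothetical; the real command message format is not shown in this diff.

```python
def split_worker_request(actor_id, num_workers):
    """Split a request for several workers into single-worker messages.

    Each returned dict stands in for one message on the command queue,
    so different spawners can each pick up part of the work.
    The field names are illustrative assumptions.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return [{"actor_id": actor_id, "num": 1} for _ in range(num_workers)]


# A request for three workers becomes three independent messages.
messages = split_worker_request("actor-123", 3)
```

Because every message asks for exactly one worker, any consumer of the queue can take the next message without coordinating with the others.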
@@ -64,9 +97,12 @@ TAS integration at the tenant level as well as use of global UID and GID.
 ### Added
 - Fixed issue where permissions errors were giving a confusing message about "unrecognized exception".
 - Fixed bug causing a worker to be added to the workers_store with the wrong worker_id in a narrow case.
-- Fixed an issue where the put_sync in the health check was causing messages to be left on the queue when the worker had already stopped.
-- Fixed issue where requests to update an actor (i.e., PUT requests) were ignoring certain fields (e.g., default_environment)
-- Fixed bug preventing the Agave OAuth client from being properly instantiated within the actor container when the actor was launched via a nonce.
+- Fixed an issue where the put_sync in the health check was causing messages to be left on the queue when the worker
+had already stopped.
+- Fixed issue where requests to update an actor (i.e., PUT requests) were ignoring certain fields (e.g.,
+default_environment)
+- Fixed bug preventing the Agave OAuth client from being properly instantiated within the actor container when the
+actor was launched via a nonce.
 - Add shutdown_all_workers convenience utility.
 - Several tests added, specifically to validate behavior when invalid inputs were provided.
 