# Change Log
All notable changes to this project will be documented in this file.

## 0.10.0 - 2018-08-01
### Added
- New endpoints in the Admin API, `/actors/v2/admin/workers` and `/actors/v2/admin/executions`, for retrieving data
about workers and executions, respectively.
- New `abacosamples/agave_submit_jobs` sample image for submitting a job from an actor.
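A minimal Python sketch of calling the two new Admin API endpoints. It only builds the requests with the standard library; the base URL, the bearer-token `Authorization` header, and the `admin_request` helper are assumptions for illustration, not documented parts of the Abaco API.

```python
import urllib.request

# Hypothetical deployment URL; substitute the base URL of your Abaco tenant.
BASE_URL = "https://api.example.org"

# The two Admin API paths added in 0.10.0.
ADMIN_PATHS = {
    "workers": "/actors/v2/admin/workers",
    "executions": "/actors/v2/admin/executions",
}


def admin_request(resource: str, token: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request for an Admin API resource.

    Assumes bearer-token authentication; the actual auth scheme depends on the deployment.
    """
    url = base_url.rstrip("/") + ADMIN_PATHS[resource]
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


if __name__ == "__main__":
    req = admin_request("workers", token="my-token")
    print(req.full_url)  # https://api.example.org/actors/v2/admin/workers
```

Sending the request would then be `urllib.request.urlopen(req)`; the response format of these endpoints is not described in this entry.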
### Changed
- Fixed an issue where the Spawner process would crash upon receiving a Timeout error from the Docker daemon when a
compute node was under heavy load.
- Hardened various worker actions for when a compute node is under heavy load, including stats collection, results
socket creation and teardown, and actor container stopping. These changes significantly improve exception handling and
retry logic in these failure cases, and put the actor in an error state when unrecoverable errors are encountered.
Among other things, these improvements should prevent multiple actor containers from running concurrently under the
same worker.
- Numerous improvements to documentation.

### Removed
- No change


## 0.9.0 - 2018-07-06
### Added
- Extended support for tenant-specific identity configurations; specifically, enabling or disabling TAS integration at
the tenant level, as well as the use of a global UID and GID.

### Changed
- Fixed a reliance on the existence of the Internal/everyone role in the JWT; now, if no roles are present in the JWT,
Abaco inserts the "everyone" role, enabling basic access and functionality.
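The role fallback above can be pictured as a small default. This sketch is hypothetical: the claim key `roles`, the role string `Internal/everyone`, the handling of comma-separated strings, and the `effective_roles` helper are all assumptions, since this entry does not say how Abaco parses the JWT.

```python
from typing import Any, List, Mapping

# Hypothetical names: the real JWT claim key and role string may differ.
ROLES_CLAIM = "roles"
EVERYONE_ROLE = "Internal/everyone"


def effective_roles(claims: Mapping[str, Any]) -> List[str]:
    """Return the caller's roles, or only the "everyone" role if none are present."""
    roles = claims.get(ROLES_CLAIM) or []
    if isinstance(roles, str):
        # Assumption: roles may arrive as one comma-separated string.
        roles = [r.strip() for r in roles.split(",") if r.strip()]
    # No roles in the token: grant basic access instead of rejecting the request.
    return list(roles) if roles else [EVERYONE_ROLE]


print(effective_roles({}))  # ['Internal/everyone']
```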
### Removed
- No change

…

- Added support for a tenant-specific global_mounts config.

### Changed
- Changed RabbitMQ connection handling across all channel objects to greatly reduce CPU load on the RabbitMQ server as
well as on worker nodes in the cluster.
- Implemented a stop-no-delete command on the command channel to prevent a race condition during actor image updates
that could cause the new worker to be killed.
- Fixed an issue where Docker fails to report the container execution finish time when the compute server is under heavy
load. In this case, we now return a finish_time computed from the start_time and the run_time (calculated by Abaco).
- Fixed issues with actor updates: 1) the owner no longer changes when a user other than the original owner updates the
actor image, 2) last_update_time is always updated, and 3) the updater is ensured to have permanent permissions for the
actor.
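The finish_time fallback is simple arithmetic: finish_time = start_time + run_time. A sketch, assuming run_time is measured in seconds and timestamps are timezone-aware `datetime` objects; the function name `effective_finish_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def effective_finish_time(
    docker_finish_time: Optional[datetime],
    start_time: datetime,
    run_time_seconds: float,
) -> datetime:
    """Prefer Docker's reported finish time; otherwise derive it.

    When Docker does not report a finish time (e.g., under heavy load),
    fall back to start_time + run_time, where run_time is measured by Abaco.
    """
    if docker_finish_time is not None:
        return docker_finish_time
    return start_time + timedelta(seconds=run_time_seconds)


start = datetime(2018, 7, 1, 12, 0, 0, tzinfo=timezone.utc)
print(effective_finish_time(None, start, 90.5).isoformat())
# 2018-07-01T12:01:30.500000+00:00
```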
### Removed
- No change

…

- Additional fields for each execution are now returned in the executions summary.

### Changed
- The routines used when executing an actor container have been simplified to improve performance and to prevent some
issues, such as stats collection raising a UnixHTTPConnectionPool read timeout when the compute server is under load.
- Added several safety guards to the health checker code to prevent crashes when it encounters unexpected data (e.g.,
when a worker's last_execution is not defined).
- Fixed a bug caused by a formatting issue in the message returned from a POST to the /workers endpoint.

### Removed
- The 'ids' collection has been removed from the executions endpoint response in favor of an 'executions' collection
providing additional fields for each execution.


## 0.6.0 - 2018-03-08
### Added
- Add support for binary messages through a FIFO mount to the actor.
- Add support for a "results" endpoint associated with each execution. Results are read from a Unix domain socket
mounted into the actor container and streamed to a Results queue specific to the execution.
- Read the host id from the environment to support dynamic assignment, such as when deploying with Kubernetes.
- Add create_time attribute to workers and fix an issue with health agents shutting down new workers too quickly if the
worker had not processed an execution.
- An actor's state object can now be an arbitrary JSON-serializable object (not just a dictionary).
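On the actor side, the results mechanism amounts to writing bytes to the mounted socket. The sketch below assumes a Unix domain datagram socket at `/_abaco_results.sock`; both the socket type and the path are assumptions to verify against your deployment or client library, and `send_result` is a hypothetical helper.

```python
import socket

# Assumed path of the results socket mounted into the actor container.
RESULTS_SOCKET_PATH = "/_abaco_results.sock"


def send_result(data: bytes, socket_path: str = RESULTS_SOCKET_PATH) -> None:
    """Write one result to the execution's results socket.

    Assumes a Unix domain datagram socket; Abaco reads what arrives on the
    socket and streams it to the execution's Results queue.
    """
    if not isinstance(data, bytes):
        raise TypeError("results must be bytes")
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
        client.connect(socket_path)
        client.send(data)
```

Inside an actor, a call such as `send_result(b"42")` would then publish one result for the current execution.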
### Changed
- Requests to add multiple new workers are now sent to the command queue as multiple messages, each adding 1 worker.
This distributes commands better across multiple spawners.
- Default expiration time for Results channels has been increased from 100s to 20 minutes.
- Fixed a bug in the auth check that caused certain POST requests to fail with "not authorized" errors when the payload
was not a JSON dictionary.
- Fixed an issue preventing an actor's state object from being updated correctly.
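The fan-out of add-worker requests can be sketched as splitting one request for N workers into N single-worker messages. The message fields (`actor_id`, `num`), the JSON encoding, the `add_workers_commands` helper, and the in-process `queue.Queue` standing in for the command channel are all assumptions for illustration.

```python
import json
import queue
from typing import List


def add_workers_commands(actor_id: str, num_workers: int) -> List[str]:
    """Split one "add N workers" request into N single-worker commands.

    Each message asks for exactly one worker, so several spawners consuming
    the same command queue can each pick up part of the work.
    """
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    return [json.dumps({"actor_id": actor_id, "num": 1}) for _ in range(num_workers)]


# In-process stand-in for the command channel.
command_queue = queue.Queue()
for msg in add_workers_commands("hypothetical-actor-id", 3):
    command_queue.put(msg)
print(command_queue.qsize())  # 3
```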
### Removed
…

### Added
- Fixed an issue where permissions errors were giving a confusing message about "unrecognized exception".
- Fixed a bug causing a worker to be added to the workers_store with the wrong worker_id in a narrow case.
- Fixed an issue where the put_sync in the health check was causing messages to be left on the queue when the worker
had already stopped.
- Fixed an issue where requests to update an actor (i.e., PUT requests) were ignoring certain fields (e.g.,
default_environment).
- Fixed a bug preventing the Agave OAuth client from being properly instantiated within the actor container when the
actor was launched via a nonce.
- Add shutdown_all_workers convenience utility.
- Several tests added, specifically to validate behavior when invalid inputs were provided.