Add 0.10.0 entry

Joe Stubbs · Joe Stubbs · commit 8a29d6ff3fa3 · 2018-08-02T10:52:50.000-05:00
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,13 +1,34 @@
 # Change Log
 All notable changes to this project will be documented in this file.
 
+## 0.10.0 - 2018-08-01
+### Added
+- New endpoints in the Admin API, `/actors/v2/admin/workers` and `/actors/v2/admin/executions`, for retrieving data
+about workers and executions, respectively.
+- New `abacosamples/agave_submit_jobs` sample image for submitting a job from an actor.
+
+### Changed
+- Fixed an issue where the Spawner process would crash on receiving a Timeout error from the Docker daemon when a compute
+node was under heavy load.
+- Hardening of various worker actions when a compute node is under heavy load, including hardening of stats collection,
+results socket creation and teardown, and actor container stopping. Adds significant improvements to exception handling
+and retry logic in these failure cases, and puts the actor in an error state when unrecoverable errors are encountered.
+Among other things, these improvements should prevent multiple actor containers from running concurrently under the
+same worker.
+- Numerous improvements to documentation.
+
+### Removed
+- No change
+
+
 ## 0.9.0 - 2018-07-06
 ### Added
 - Extended support for a tenant-specific identity configurations; specifically, enabling use/non-use of
 TAS integration at the tenant level as well as use of global UID and GID.
 
 ### Changed
-- Fixed a reliance
+- Fixed a reliance on the existence of the Internal/everyone role in the JWT; now, if no roles are present in the JWT,
+Abaco inserts the "everyone" role, enabling basic access and functionality.
 
 ### Removed
 - No change
@@ -18,10 +39,15 @@ TAS integration at the tenant level as well as use of global UID and GID.
 - Added support for a tenant-specific global_mounts config.
 
 ### Changed
-- Changed RabbitMQ connection handling across all channel objects to greatly reduce cpu load on RabbitMQ server as well as on worker nodes in the cluster.
-- Implemented a stop-no-delete command on the command channel to prevent a race condition when updating an actor's image that could cause the new worker to be killed.
-- Fixed an issue where Docker fails to report container execution finish time when the compute server is under heavy load. In this case, we note return finish_time as computed from the start_time and the run_time (calculated by Abaco).
-- Fixed issues with Actor update: 1) owner can no longer change in case a different user from the original owner updates the actor image, 2) last_update_time is always updated, and 3) ensure updater has permanent permissions for the actor.
+- Changed RabbitMQ connection handling across all channel objects to greatly reduce CPU load on the RabbitMQ server as
+well as on worker nodes in the cluster.
+- Implemented a stop-no-delete command on the command channel to prevent a race condition when updating an actor's image
+that could cause the new worker to be killed.
+- Fixed an issue where Docker fails to report container execution finish time when the compute server is under heavy
+load. In this case, we now return finish_time as computed from the start_time and the run_time (calculated by Abaco).
+- Fixed issues with Actor update: 1) the owner no longer changes when a user other than the original owner updates the
+actor image, 2) last_update_time is always updated, and 3) the updater is guaranteed to have permanent permissions for
+the actor.
 
 ### Removed
 - No change
@@ -34,26 +60,33 @@ TAS integration at the tenant level as well as use of global UID and GID.
 - Additional fields for each execution are now returned in the executions summary.
 
 ### Changed
-- The routines used when executing an actor container have been simplified to provide better performance and to prevent some issues such as stats collection generating a UnixHTTPConnectionPool Readtime when compute server is under load.
-- Added several safety guards to the health checker code to prevent crashes of the health checker when working with unexpected data (e.g. when a worker's last_execution is not defined)
+- The routines used when executing an actor container have been simplified to provide better performance and to prevent
+some issues, such as stats collection raising a UnixHTTPConnectionPool read timeout when the compute server is under load.
+- Added several safety guards to the health checker code to prevent crashes of the health checker when working with
+unexpected data (e.g., when a worker's last_execution is not defined).
 - Fixed bug due to message formatting issue in message returned from a POST to the /workers endpoint.
 
 ### Removed
-- The 'ids' collection has been removed from the executions endpoint response in favor of an 'executions' collections providing additional fields for each execution.
+- The 'ids' collection has been removed from the executions endpoint response in favor of an 'executions' collection
+providing additional fields for each execution.
 
 
 ## 0.6.0 - 2018-03-08
 ### Added
 - Add support for binary messages through a FIFO mount to the actor.
-- Add support for a "results" endpoint associated with each execution. Results are read from a Unix Domain socket mounted into the actor container and streamed to a Results queue specific to the execution.
+- Add support for a "results" endpoint associated with each execution. Results are read from a Unix Domain socket
+mounted into the actor container and streamed to a Results queue specific to the execution. (Uses a Unix domain socket.)
 - Read host id from the environment to support dynamic assignment such as when deploying with kubernetes.
-- Add create_time attribute to workers and fix issue with health agents shutting down new workers too quickly if the worker had not processed an execution.
+- Add create_time attribute to workers and fix issue with health agents shutting down new workers too quickly if the
+worker had not processed an execution.
 - An actor's state object can now be an arbitrary JSON-serializable object (not just a dictionary).
 
 ### Changed
-- Messages to add multiple new workers are now sent as multiple messages to the command queue to add 1 worker. This distributed commands across multiple spawners better.
+- Messages to add multiple new workers are now sent as multiple messages to the command queue, each adding 1 worker. This
+distributes commands across multiple spawners more evenly.
 - Default expiration time for Results channels has been increased from 100s to 20 minutes.
-- Fixed a bug in the auth check that caused certain POST requests to fail with "not authorized" errors when the payload was not a JSON-dictionary.
+- Fixed a bug in the auth check that caused certain POST requests to fail with "not authorized" errors when the payload
+was not a JSON dictionary.
 - Fixed an issue preventing an actor's state object from being updated correctly.
 
 ### Removed
@@ -64,9 +97,12 @@ TAS integration at the tenant level as well as use of global UID and GID.
 ### Added
 - Fixed issue where permissions errors were giving a confusing message about "unrecognized exception".
 - Fixed bug causing a worker to be added to the workers_store with the wrong worker_id in a narrow case.
-- Fixed an issue where the put_sync in the health check was causing messages to be left on the queue when the worker had already stopped.
-- Fixed issue where requests to update an actor (i.e., PUT requests) were ignoring certain fields (e.g., default_environment)
-- Fixed bug preventing the Agave OAuth client from being properly instantiated within the actor container when the actor was launched via a nonce.
+- Fixed an issue where the put_sync in the health check was causing messages to be left on the queue when the worker
+had already stopped.
+- Fixed issue where requests to update an actor (i.e., PUT requests) were ignoring certain fields (e.g.,
+default_environment).
+- Fixed bug preventing the Agave OAuth client from being properly instantiated within the actor container when the
+actor was launched via a nonce.
 - Add shutdown_all_workers convenience utility.
 - Several tests added, specifically to validate behavior when invalid inputs were provided.