AgendaReclaimer

Timed Process: Agenda Reclaimer

General

Progress monitoring will be performed inside a Lambda.

Source

Environment Vars

All environment variables are required.

Variable	Description / Notes
IDENTITY_URL	The URL of the identity instance to use.
IDM_USER	The IDM user to use.
IDM_ENCRYPTED_PASS	The encrypted password (assumes a KMS-encrypted PasswordAES string).
AGENDA_PROGRESS_URL	The URL of the AgendaProgress endpoint
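
Since every variable is required, a start-up check can fail fast when one is missing. The following is a minimal sketch, not the project's own code: the function name load_settings is hypothetical, and decryption of IDM_ENCRYPTED_PASS is intentionally left out.

```python
import os
from typing import Mapping

# Variable names are taken from the table above.
REQUIRED_VARS = (
    "IDENTITY_URL",
    "IDM_USER",
    "IDM_ENCRYPTED_PASS",
    "AGENDA_PROGRESS_URL",
)


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Return the required variables, or raise if any is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    # IDM_ENCRYPTED_PASS is returned still encrypted; decrypting it
    # (for example through KMS) is outside the scope of this sketch.
    return {name: env[name] for name in REQUIRED_VARS}
```

Passing the environment as an argument keeps the check easy to exercise without touching the real process environment.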

Parameters

Field	Type	Required	Default	Description / Notes
maximumExecutionSeconds	int	N	60	The maximum time, in seconds, that the execution should run. The exit is fuzzy: the process will not stop at exactly the specified time. This value should be less than the Lambda's own execution timeout!
tableName	String	Y		The name of the table to scan items from
idFieldName	String	Y		The name of the id field in the table to scan
timeFieldName	String	Y		The name of the time field in the table to scan
reclaimAgeMinutes	int	N	0	The minimum age, in minutes, an object must reach to be considered for reclaim. Objects are reclaimed when both conditions hold: (1) ProcessingState = Executing; (2) now - reclaimAgeMinutes > timeField value.
scanDelayMillis	long	N	0	The number of milliseconds to delay between scan calls (DynamoDB has capacity limits to contend with).
targetBatchSize	int	N	50	The number of objects to attempt to gather before attempting to reclaim.
objectScanLimit	int	N	50	The number of objects DynamoDB is allowed to scan before returning (a call may return anywhere from 0 to objectScanLimit items).
logReclaimOnly	bool	N	false	Flag indicating whether the reclaim actions should only be logged (and not performed).
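
To show how these parameters could interact, here is a hedged sketch of a scan loop. It is an illustration under assumptions, not the actual Lambda: scan_page stands in for a paged table scan, reclaim stands in for the reclaim call, and both signatures (like the function name run_reclaimer) are hypothetical.

```python
import time
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: scan_page(limit, start_key) -> (items, next_key).
# next_key is None once the whole table has been scanned.
ScanPage = Callable[[int, Optional[str]], Tuple[List[dict], Optional[str]]]


def run_reclaimer(
    scan_page: ScanPage,
    reclaim: Callable[[List[dict]], None],
    maximum_execution_seconds: int = 60,
    scan_delay_millis: int = 0,
    target_batch_size: int = 50,
    object_scan_limit: int = 50,
    log_reclaim_only: bool = False,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Scan page by page and reclaim in batches; return the number of items handled."""
    deadline = clock() + maximum_execution_seconds
    batch: List[dict] = []
    handled = 0
    start_key: Optional[str] = None

    def flush() -> None:
        nonlocal batch, handled
        if not batch:
            return
        if log_reclaim_only:
            print(f"would reclaim {len(batch)} item(s)")
        else:
            reclaim(batch)
        handled += len(batch)
        batch = []

    while True:
        # The deadline is only checked between scan calls, so a run can
        # overshoot it slightly (the "fuzzy" exit).
        if clock() >= deadline:
            break
        items, start_key = scan_page(object_scan_limit, start_key)
        batch.extend(items)
        if len(batch) >= target_batch_size:
            flush()
        if start_key is None:
            break
        if scan_delay_millis > 0:
            sleep(scan_delay_millis / 1000.0)

    flush()  # reclaim whatever was gathered before the loop ended
    return handled
```

Because a scan call may return fewer items than its limit (possibly none), the batch is filled across several calls and flushed once targetBatchSize is reached.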

Sample Cloudwatch Json

{
"agendaProgressEndpointURL":"https://www.myurl.com",
"idFieldName":"id",
"maximumExecutionSeconds":120,
"objectScanLimit":25,
"reclaimAgeMinutes":4320,
"scanDelayMillis":500,
"tableName":"POP-ProgressAgenda-dev",
"targetBatchSize":50,
"timeFieldName":"updatedTime"
}
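
A payload like the one above could be parsed into a typed structure that applies the documented defaults. This is only a sketch: the class ReclaimerParams and the function parse_params are hypothetical names, and silently ignoring keys that are not in the Parameters table (such as agendaProgressEndpointURL) is an assumption about the desired behavior.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ReclaimerParams:
    # Required fields (no default in the Parameters table).
    tableName: str
    idFieldName: str
    timeFieldName: str
    # Optional fields with their documented defaults.
    maximumExecutionSeconds: int = 60
    reclaimAgeMinutes: int = 0
    scanDelayMillis: int = 0
    targetBatchSize: int = 50
    objectScanLimit: int = 50
    logReclaimOnly: bool = False


REQUIRED = ("tableName", "idFieldName", "timeFieldName")


def parse_params(payload: str) -> ReclaimerParams:
    """Build ReclaimerParams from a JSON string, ignoring unknown keys."""
    raw = json.loads(payload)
    missing = [name for name in REQUIRED if name not in raw]
    if missing:
        raise ValueError("Missing required parameters: " + ", ".join(missing))
    known = {f.name for f in fields(ReclaimerParams)}
    return ReclaimerParams(**{k: v for k, v in raw.items() if k in known})
```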

Progress Timeout Fail

General

When the updated time on an incomplete (started) AgendaProgress is older than some threshold, the work should be marked as failed (with an indicator of timeout).

Consider adding a response/evaluation so that an executor that somehow comes back to life can tell the progress was failed due to a timeout.

Evaluation

Check for EXECUTING AgendaProgress objects with an updateTime < threshold time.
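
The check can be sketched as a small predicate. The dictionary layout is an assumption made for illustration: the key processingState and the default time field updatedTime are hypothetical names, and timestamps are taken to be datetime values.

```python
from datetime import datetime, timedelta, timezone


def is_timed_out(progress: dict, now: datetime, reclaim_age_minutes: int,
                 time_field: str = "updatedTime") -> bool:
    """True when the progress is still executing and was last updated before the threshold."""
    if str(progress.get("processingState", "")).lower() != "executing":
        return False
    threshold = now - timedelta(minutes=reclaim_age_minutes)
    return progress[time_field] < threshold


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
stale = {"processingState": "Executing", "updatedTime": now - timedelta(days=4)}
print(is_timed_out(stale, now, reclaim_age_minutes=4320))  # 4320 minutes = 3 days
```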

Action

Mark the AgendaProgress as follows:

ProcessingState: Complete
ProcessingStateMessage: failed (CompleteStateMessage)
Add a DiagnosticEvent describing the timeout and associated details.
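
As an illustration of the marking step, the sketch below returns an updated copy of a progress record. All key names (processingState, processingStateMessage, diagnosticEvents) and the event layout are assumptions for this example; only the target values Complete and failed come from the list above.

```python
from datetime import datetime, timezone
from typing import Optional


def fail_for_timeout(progress: dict, threshold: datetime,
                     now: Optional[datetime] = None) -> dict:
    """Return a copy of progress marked Complete/failed with a timeout DiagnosticEvent."""
    now = now or datetime.now(timezone.utc)
    event = {
        # Hypothetical event layout; only the idea of recording the
        # timeout and its details comes from the text.
        "message": "AgendaProgress timed out",
        "threshold": threshold.isoformat(),
        "detectedAt": now.isoformat(),
    }
    updated = dict(progress)
    updated["processingState"] = "Complete"
    updated["processingStateMessage"] = "failed"
    updated["diagnosticEvents"] = list(progress.get("diagnosticEvents", [])) + [event]
    return updated
```

Returning a copy instead of mutating the input keeps the original record available for logging when reclaim actions are only being logged.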

Re-evaluate the TaskCallback evaluation of a failed item. Make sure it is not only scanning the OperationProgress elements for failure and DiagnosticEvents.

The associated Task should be marked as failed.

Progress Timeout Reclaim / Retry

This is an extended implementation of the progress timeout above.

Related: Agenda Retry

General

When the updated time on an incomplete (started) AgendaProgress is older than some threshold, we should attempt to restart the work.

OR

If an Agenda fails, attempt to retry it.

Evaluation

Check for AgendaProgress objects that fail/timeout.

Timeout

Retry if:

Executing AgendaProgress objects with an updateTime < threshold time.
De-queued AgendaProgress objects with an updateTime < threshold time (this state does not exist yet).

Failure

Retry if:

AgendaProgress indicates failure
The maximum number of attempts has not been reached yet (implies a counter on the AgendaProgress object)
Failure is retryable (this requires the handlers and executor to provide some hints)
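
The three failure conditions can be combined into one predicate. This is a sketch under assumptions: the attempt counter (attemptCount), the retry hint (retryable), and the limit max_attempts do not exist yet according to the text, so all of these names are hypothetical.

```python
def should_retry(progress: dict, max_attempts: int = 3) -> bool:
    """True when a failed AgendaProgress still qualifies for another attempt."""
    failed = (
        progress.get("processingState") == "Complete"
        and progress.get("processingStateMessage") == "failed"
    )
    attempts_left = progress.get("attemptCount", 0) < max_attempts
    # A missing hint is treated as "not retryable" so that unknown
    # failures are not retried in a loop.
    retryable = bool(progress.get("retryable", False))
    return failed and attempts_left and retryable
```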

Action

Reclaimer
- Update the Agenda Progress to indicate it is a retry attempt
- Re-add to ready-agenda (should have some kind of indicator of priority to jump past other work when the queue is next populated)
Executor
- Pull the existing Agenda Progress for existing operation progress (is there a flag indicating this?)
- Populate the in-memory payloads for the completed operations
- Restart any failed operations (is any information passed to the handler? It should be, for the sake of things like jobId)
Handler
- Detect existing progress and proceed accordingly
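
The Reclaimer and Executor steps above might look roughly like the following. This is a speculative sketch, not the project's design: the heap-based ready queue, the priority values, and every field name (agendaId, attemptCount, state, message, name, payload) are assumptions made for illustration.

```python
import heapq
import itertools
from typing import Dict, List, Tuple

RETRY_PRIORITY = 0   # lower values sort first, so retries jump ahead of normal work
NORMAL_PRIORITY = 1
_sequence = itertools.count()  # tie-breaker that keeps FIFO order within a priority


def enqueue(ready_queue: list, agenda_id: str, priority: int = NORMAL_PRIORITY) -> None:
    """Add an Agenda id to the ready queue (a heap of (priority, sequence, id) tuples)."""
    heapq.heappush(ready_queue, (priority, next(_sequence), agenda_id))


def reclaim_for_retry(progress: dict, ready_queue: list) -> dict:
    """Reclaimer step: record the retry attempt and re-add the Agenda with priority."""
    updated = dict(progress)
    updated["attemptCount"] = progress.get("attemptCount", 0) + 1
    enqueue(ready_queue, progress["agendaId"], RETRY_PRIORITY)
    return updated


def plan_resume(operations: List[dict]) -> Tuple[Dict[str, object], List[str]]:
    """Executor step: reuse payloads of completed operations and list the rest to restart."""
    payloads: Dict[str, object] = {}
    to_restart: List[str] = []
    for op in operations:
        if op.get("state") == "Complete" and op.get("message") == "succeeded":
            payloads[op["name"]] = op.get("payload")
        else:
            to_restart.append(op["name"])
    return payloads, to_restart
```

The handler side is not sketched here, because how a handler detects existing progress depends on the handler itself.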

Development / Issues To Resolve

There is no AgendaProgress ProcessingState for "removed from queue but not yet started" - for the sake of timeout this gap should be covered.
The first iteration could simply run the Agenda again rather than attempting to continue from the point of failure (or the two could be parallel efforts).

