| 1 | +# Async ENI Task Queue Design |
| 2 | + |
| 3 | +## Overview |
| 4 | + |
| 5 | +The Async ENI Task Queue decouples blocking ENI attachment operations from the main Node reconciliation loop in the Terway controller. This keeps the controller responsive and stops slow Aliyun API calls from tying up reconciliation workers. |
| 6 | + |
| 7 | +## Key Components |
| 8 | + |
| 9 | +### 1. ENITaskQueue (`pkg/controller/multi-ip/node/eni_task_queue.go`) |
| 10 | + |
| 11 | +The core component that manages the lifecycle of async ENI operations. |
| 12 | + |
| 13 | +- **In-Memory Queue**: Stores task state (`ENITaskRecord`) keyed by ENI ID. |
| 14 | +- **Async Processing**: Uses goroutines to handle individual attach tasks. |
| 15 | +- **Notification**: Signals the controller via a channel when a task completes. |
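
The three responsibilities listed above can be pictured with a minimal sketch. The field names, the `Put`/`Get`/`Notify` helpers, and the payload of the notification channel are all assumptions made for illustration; the real `ENITaskQueue` in `eni_task_queue.go` may be shaped differently.

```go
package main

import (
	"fmt"
	"sync"
)

// TaskStatus and ENITaskRecord are simplified stand-ins; the real fields are assumptions.
type TaskStatus string

const StatusPending TaskStatus = "Pending"

type ENITaskRecord struct {
	ENIID  string
	Status TaskStatus
}

// ENITaskQueue keeps task records in memory, keyed by ENI ID.
type ENITaskQueue struct {
	mu     sync.Mutex
	tasks  map[string]*ENITaskRecord
	notify chan string // hypothetical payload: the node name to re-reconcile
}

func NewENITaskQueue() *ENITaskQueue {
	return &ENITaskQueue{
		tasks:  make(map[string]*ENITaskRecord),
		notify: make(chan string, 16),
	}
}

// Put stores or replaces the record for an ENI.
func (q *ENITaskQueue) Put(rec *ENITaskRecord) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.tasks[rec.ENIID] = rec
}

// Get returns a copy of the record so callers cannot mutate queue state.
func (q *ENITaskQueue) Get(eniID string) (ENITaskRecord, bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	rec, ok := q.tasks[eniID]
	if !ok {
		return ENITaskRecord{}, false
	}
	return *rec, true
}

// Notify signals the controller without blocking the worker goroutine.
// It reports false when the channel buffer is full and the signal is dropped.
func (q *ENITaskQueue) Notify(node string) bool {
	select {
	case q.notify <- node:
		return true
	default:
		return false
	}
}

func main() {
	q := NewENITaskQueue()
	q.Put(&ENITaskRecord{ENIID: "eni-1", Status: StatusPending})
	rec, ok := q.Get("eni-1")
	fmt.Println(rec.Status, ok)
	fmt.Println("notified:", q.Notify("node-a"))
}
```

The non-blocking send in `Notify` is one possible design choice: a dropped signal is tolerable in this sketch only because the controller also reconciles periodically.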
| 16 | + |
| 17 | +### 2. Executor (`pkg/eni/ops/executor.go`) |
| 18 | + |
| 19 | +Provides the low-level ENI operations. |
| 20 | + |
| 21 | +- **AttachAsync**: Initiates the attach operation via Aliyun API (non-blocking). |
| 22 | +- **CheckStatus**: Checks the current status of an ENI. |
| 23 | +- **Wait Logic**: Handles backoff and polling for status changes. |
| 24 | + |
| 25 | +## Workflow |
| 26 | + |
| 27 | +### 1. Submission |
| 28 | + |
| 29 | +When the controller determines a new ENI is needed: |
| 30 | + |
| 31 | +1. It creates the ENI via OpenAPI (a blocking call, acceptable because creation is fast). |
| 32 | +2. It calls `SubmitAttach` to queue the attach operation. |
| 33 | +3. The task is added to the map with `Pending` status. |
| 34 | +4. A background goroutine is started for the task. |
| 35 | +5. The Node CR status is optimistically updated to `Attaching`. |
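
Steps 2 to 4 can be sketched as follows. This is an illustration under assumptions: the real `SubmitAttach` signature is not shown in this document, so the boolean return value, the `process` callback (standing in for `processAttachTask`), and the `Status`/`Wait` helpers are hypothetical. The Node CR update from step 5 happens in the controller and is left out.

```go
package main

import (
	"fmt"
	"sync"
)

type TaskStatus string

const (
	StatusPending   TaskStatus = "Pending"
	StatusRunning   TaskStatus = "Running"
	StatusCompleted TaskStatus = "Completed"
)

// ENITaskQueue here tracks only a status per ENI ID.
type ENITaskQueue struct {
	mu      sync.Mutex
	tasks   map[string]TaskStatus
	wg      sync.WaitGroup
	process func(eniID string) // hypothetical stand-in for processAttachTask
}

func NewENITaskQueue(process func(eniID string)) *ENITaskQueue {
	return &ENITaskQueue{tasks: make(map[string]TaskStatus), process: process}
}

func (q *ENITaskQueue) setStatus(eniID string, s TaskStatus) {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.tasks[eniID] = s
}

// Status returns the recorded status, or "" if the ENI is unknown.
func (q *ENITaskQueue) Status(eniID string) TaskStatus {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.tasks[eniID]
}

// SubmitAttach records the task as Pending and starts a goroutine for it.
// It returns false when a Pending or Running task already exists for the ENI.
func (q *ENITaskQueue) SubmitAttach(eniID string) bool {
	q.mu.Lock()
	if s, ok := q.tasks[eniID]; ok && (s == StatusPending || s == StatusRunning) {
		q.mu.Unlock()
		return false
	}
	q.tasks[eniID] = StatusPending
	q.mu.Unlock()

	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		q.setStatus(eniID, StatusRunning)
		q.process(eniID)
		q.setStatus(eniID, StatusCompleted)
	}()
	return true
}

// Wait blocks until every started task goroutine has returned.
func (q *ENITaskQueue) Wait() { q.wg.Wait() }

func main() {
	release := make(chan struct{})
	q := NewENITaskQueue(func(string) { <-release }) // simulated slow attach
	fmt.Println("first submit accepted:", q.SubmitAttach("eni-1"))
	fmt.Println("duplicate accepted:", q.SubmitAttach("eni-1"))
	close(release)
	q.Wait()
	fmt.Println("final status:", q.Status("eni-1"))
}
```

Checking for an existing task and inserting the `Pending` record under the same lock is what keeps two concurrent submissions for one ENI from both starting a goroutine.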
| 36 | + |
| 37 | +### 2. Processing (`processAttachTask`) |
| 38 | + |
| 39 | +The background goroutine performs the following steps: |
| 40 | + |
| 41 | +1. **Status Check**: Verifies the current ENI status. If already `InUse`, marks as `Completed` (handles controller restarts). |
| 42 | +2. **Initiate Attach**: Calls `AttachAsync` if needed. |
| 43 | +3. **Wait**: Sleeps for an initial delay (based on ENI type). |
| 44 | +4. **Poll**: Polls the API until the status becomes `InUse` or the timeout elapses. |
| 45 | +5. **Completion**: Updates the task status to `Completed`, `Failed`, or `Timeout` and notifies the controller. |
| 46 | + |
| 47 | +### 3. Reconciliation (`syncTaskQueueStatus`) |
| 48 | + |
| 49 | +In the main `Reconcile` loop: |
| 50 | + |
| 51 | +1. The controller calls `GetCompletedTasks`. |
| 52 | +2. Completed tasks are **removed** from the queue. |
| 53 | +3. The Node CR is updated with the result (e.g., `InUse` status, IP details). |
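
A minimal sketch of the drain step, with assumed names and fields. It treats every terminal state (`Completed`, `Failed`, `Timeout`) as "completed" for the purpose of removal; the document does not say whether failed and timed-out tasks are drained the same way, so that part is an assumption.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

type TaskStatus string

const (
	StatusRunning   TaskStatus = "Running"
	StatusCompleted TaskStatus = "Completed"
	StatusFailed    TaskStatus = "Failed"
	StatusTimeout   TaskStatus = "Timeout"
)

// ENITaskRecord is a simplified stand-in for the real record type.
type ENITaskRecord struct {
	ENIID  string
	Status TaskStatus
}

type ENITaskQueue struct {
	mu    sync.Mutex
	tasks map[string]ENITaskRecord
}

// GetCompletedTasks returns every finished task and removes it from the queue,
// so each result is handed to the reconciler exactly once.
func (q *ENITaskQueue) GetCompletedTasks() []ENITaskRecord {
	q.mu.Lock()
	defer q.mu.Unlock()
	var done []ENITaskRecord
	for id, rec := range q.tasks {
		switch rec.Status {
		case StatusCompleted, StatusFailed, StatusTimeout:
			done = append(done, rec)
			delete(q.tasks, id) // deleting during range is safe in Go
		}
	}
	// Sort for a stable order; map iteration order is random.
	sort.Slice(done, func(i, j int) bool { return done[i].ENIID < done[j].ENIID })
	return done
}

// Len reports how many tasks are still tracked.
func (q *ENITaskQueue) Len() int {
	q.mu.Lock()
	defer q.mu.Unlock()
	return len(q.tasks)
}

func main() {
	q := &ENITaskQueue{tasks: map[string]ENITaskRecord{
		"eni-1": {ENIID: "eni-1", Status: StatusCompleted},
		"eni-2": {ENIID: "eni-2", Status: StatusRunning},
		"eni-3": {ENIID: "eni-3", Status: StatusFailed},
	}}
	for _, rec := range q.GetCompletedTasks() {
		// A real reconciler would update the Node CR here.
		fmt.Printf("update Node CR: %s -> %s\n", rec.ENIID, rec.Status)
	}
	fmt.Println("still queued:", q.Len())
}
```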
| 54 | + |
| 55 | +## State Machine |
| 56 | + |
| 57 | +- **Pending**: Task submitted, waiting to start. |
| 58 | +- **Running**: Goroutine started, operation in progress. |
| 59 | +- **Completed**: Operation successful. |
| 60 | +- **Failed**: Operation failed (API error). |
| 61 | +- **Timeout**: Operation timed out. |
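
The states can be encoded as a small transition table. The allowed transitions below are inferred from the state descriptions (`Pending` to `Running`, then `Running` to one of the three terminal states); they are an assumption, not taken from the implementation.

```go
package main

import "fmt"

type TaskStatus string

const (
	StatusPending   TaskStatus = "Pending"
	StatusRunning   TaskStatus = "Running"
	StatusCompleted TaskStatus = "Completed"
	StatusFailed    TaskStatus = "Failed"
	StatusTimeout   TaskStatus = "Timeout"
)

// transitions lists the assumed legal next states for each non-terminal state.
var transitions = map[TaskStatus][]TaskStatus{
	StatusPending: {StatusRunning},
	StatusRunning: {StatusCompleted, StatusFailed, StatusTimeout},
}

// CanTransition reports whether moving from one state to another is allowed.
func CanTransition(from, to TaskStatus) bool {
	for _, next := range transitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

// IsTerminal reports whether a state has no outgoing transitions.
// It is only meaningful for the five states defined above.
func IsTerminal(s TaskStatus) bool {
	return len(transitions[s]) == 0
}

func main() {
	fmt.Println(CanTransition(StatusPending, StatusRunning))
	fmt.Println(CanTransition(StatusPending, StatusCompleted))
	fmt.Println(IsTerminal(StatusTimeout))
}
```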
| 62 | + |
| 63 | +## Reliability & Idempotency |
| 64 | + |
| 65 | +- **Duplicate Submission**: `SubmitAttach` ignores tasks that are already `Pending` or `Running`. |
| 66 | +- **Controller Restarts**: `processAttachTask` first checks the actual ENI status. If the ENI was attached during a previous run but the result was never recorded, the task detects this and completes immediately without re-issuing the attach. |
| 67 | +- **Reconciliation Loop**: The controller periodically reconciles and checks for completed tasks, ensuring the CR state eventually matches the actual state. |