Commit 2445789

feat: Create edge-function-shutdown-reasons-explained.mdx (supabase#39980)
* Create edge-function-shutdown-reasons-explained.mdx
* Make reviewdog happy
* Update Rule003Spelling.toml
* prettier
1 parent a107333 commit 2445789

2 files changed

+262
-0
lines changed

Lines changed: 251 additions & 0 deletions
@@ -0,0 +1,251 @@
1+
---
2+
title = "Edge Function shutdown reasons explained"
3+
topics = [ "functions" ]
4+
keywords = [ "shutdown", "termination", "event loop", "wall clock", "cpu time", "memory", "early drop" ]
5+
database_id = ""
6+
7+
[[errors]]
8+
http_status_code = 546
9+
message = "Edge Function 'wall clock time limit reached'"
10+
11+
[[errors]]
12+
message = "Edge Function terminated due to memory limit"
13+
14+
[[errors]]
15+
message = "Edge Function terminated due to CPU limit"
16+
---
17+
18+
Learn about the different reasons why an Edge Function (worker) might shut down and how to handle each scenario effectively.
19+
20+
## Understanding shutdown events
21+
22+
When an Edge Function stops executing, the runtime emits a `ShutdownEvent` with a specific `reason` that explains why the worker ended. Understanding these shutdown reasons helps you diagnose issues, optimize your functions, and build more resilient serverless applications.
23+
24+
These events are surfaced through logs and observability tools, allowing you to monitor function behavior and take appropriate action such as implementing retries, adjusting resource limits, or optimizing your code.
25+
26+
## Shutdown reasons
27+
28+
### EventLoopCompleted
29+
30+
**What it means:** The function's event loop finished naturally. There are no more pending tasks, timers, or microtasks. The worker completed all scheduled work successfully.
31+
32+
**When it happens:** This is the normal, graceful shutdown scenario. All synchronous code has executed, and all awaited promises have resolved.
33+
34+
**What to do:** Nothing special. This indicates successful completion. No retry is needed.
35+
36+
### WallClockTime
37+
38+
**What it means:** The worker exceeded the configured wall-clock timeout. This measures the total elapsed real time from when the function started, including time spent waiting for I/O, external API calls, and sleeps.
39+
40+
**When it happens:** Your function is taking too long to complete, regardless of how much actual computing it's doing. Currently, the wall clock limit is set at 400 seconds.
41+
42+
**What to do:**
43+
44+
- Break down long-running tasks into smaller functions
45+
- Use streaming responses for operations that take time
46+
- Consider moving long-running work to background jobs or queues
47+
- Ensure your code handles partial completion gracefully
48+
- Make operations idempotent so they can safely retry from the beginning
49+
50+
**Related:** See [Edge Function 'wall clock time limit reached'](./edge-function-wall-clock-time-limit-reached-Nk38bW) for more details.
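
One way to handle partial completion is to stop voluntarily before the wall-clock limit and hand back whatever is left. The following is a minimal sketch, not a platform API: `processWithinBudget`, `budgetMs`, and the injectable `now` clock are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper: process items until a time budget is spent,
// then return the unprocessed items so a follow-up invocation can continue.
interface BudgetResult<T> {
  processed: number
  remaining: T[]
}

async function processWithinBudget<T>(
  items: T[],
  budgetMs: number,
  handler: (item: T) => Promise<void>,
  now: () => number = Date.now // injectable clock, useful in tests
): Promise<BudgetResult<T>> {
  const deadline = now() + budgetMs
  let processed = 0
  for (const item of items) {
    // Check the clock before starting each item, never in the middle of one.
    if (now() >= deadline) break
    await handler(item)
    processed++
  }
  return { processed, remaining: items.slice(processed) }
}
```

The caller can enqueue `remaining` for another invocation. Choosing a budget well below the platform limit leaves room for the last item to finish.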
51+
52+
### CPUTime
53+
54+
**What it means:** The worker consumed more CPU time than allowed. CPU time measures actual processing cycles used by your code, excluding time spent waiting for I/O or sleeping. Currently limited to 200 milliseconds.
55+
56+
**When it happens:** Your function is performing too much computation. This includes complex calculations, data processing, encryption, or other CPU-intensive operations.
57+
58+
**What to do:**
59+
60+
- Profile your code to identify CPU-intensive sections
61+
- Optimize algorithms for better performance
62+
- Consider caching computed results
63+
- Move heavy processing to background jobs or external services
64+
- Break large datasets into smaller chunks
65+
- Use more efficient data structures
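
As an illustration of caching computed results, here is a small memoization sketch. `memoize` and `keyOf` are hypothetical names, and the cache only helps for as long as the same worker instance stays alive; it is an in-memory cache, not durable storage.

```typescript
// Hypothetical helper: cache the results of a pure, expensive function by key.
function memoize<A, R>(
  fn: (arg: A) => R,
  keyOf: (arg: A) => string = String
): (arg: A) => R {
  const cache = new Map<string, R>()
  return (arg: A): R => {
    const key = keyOf(arg)
    // Reuse the stored result instead of spending CPU time again.
    if (cache.has(key)) return cache.get(key) as R
    const value = fn(arg)
    cache.set(key, value)
    return value
  }
}
```

This only pays off when the same inputs recur and the function has no side effects.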
66+
67+
### Memory
68+
69+
**What it means:** The worker's memory usage exceeded the allowed limit. The `ShutdownEvent` includes detailed memory data showing total memory, heap usage, and external allocations.
70+
71+
**When it happens:** Your function is consuming too much RAM. This commonly happens when buffering large files, loading entire datasets into memory, or creating many objects without cleanup.
72+
73+
**What to do:**
74+
75+
- Use streaming instead of buffering entire files or responses
76+
- Process data in chunks rather than loading everything at once
77+
- Be mindful of closures and variables that might prevent garbage collection
78+
- Avoid accumulating large arrays or objects
79+
- Review the `memory_used` fields in logs to identify whether heap or external memory is the issue
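
To make the chunked approach concrete, the sketch below counts lines in a byte stream while holding only one chunk in memory at a time. `countLines` is a hypothetical helper, and it assumes the input is exposed as an async iterable of `Uint8Array` chunks.

```typescript
// Hypothetical helper: count newline bytes chunk by chunk, without buffering the stream.
async function countLines(chunks: AsyncIterable<Uint8Array>): Promise<number> {
  const NEWLINE = 0x0a
  let lines = 0
  for await (const chunk of chunks) {
    // Each chunk becomes eligible for garbage collection after this loop body.
    for (let i = 0; i < chunk.length; i++) {
      if (chunk[i] === NEWLINE) lines++
    }
  }
  return lines
}
```

Memory use stays proportional to the chunk size rather than to the total payload size.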
80+
81+
### EarlyDrop
82+
83+
**What it means:** The runtime detected that the function has completed all its work and can be shut down early, before reaching any resource limits. This is the most common shutdown reason and typically indicates efficient function execution.
84+
85+
**When it happens:** Your function has finished processing the request, sent the response, and has no remaining async work (pending promises, timers, or callbacks). The runtime recognizes that the worker can be safely terminated without waiting for timeouts or other limits.
86+
87+
**Why this is good:** `EarlyDrop` means your function is running efficiently. It completed quickly, didn't exhaust resources, and the runtime could reclaim the worker for other requests. Most well-designed functions should end with `EarlyDrop`.
88+
89+
**What to do:**
90+
91+
- Nothing: this is the desired outcome for most functions
92+
- If you're seeing `EarlyDrop` but expected more work to happen, check for:
93+
- Promises that weren't properly awaited
94+
- Event listeners or timers that weren't cleaned up
95+
- Background tasks that should have completed but didn't
96+
- If background work must finish, keep the worker alive until it does: await those promises before the handler returns, or hand them to `EdgeRuntime.waitUntil()` so they can continue after the response is sent
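
A common cause of unexpected `EarlyDrop` is a promise that nothing is waiting on. The sketch below shows one way to avoid that by registering every background promise and draining them before returning. `TaskTracker` is a hypothetical class written for this example, not part of the runtime.

```typescript
// Hypothetical tracker: collect background promises so none are left floating.
class TaskTracker {
  private settled: Promise<void>[] = []
  private failures: unknown[] = []

  // Register a background promise; a rejection is recorded instead of lost.
  track(task: Promise<unknown>): void {
    this.settled.push(
      task.then(
        () => undefined,
        (reason: unknown) => {
          this.failures.push(reason)
        }
      )
    )
  }

  // Wait for every tracked task to settle, then report what failed.
  async drain(): Promise<unknown[]> {
    await Promise.all(this.settled)
    const failures = this.failures
    this.settled = []
    this.failures = []
    return failures
  }
}
```

Calling `await tracker.drain()` before the handler returns means the event loop still has work pending until all tracked tasks have settled.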
97+
98+
### TerminationRequested
99+
100+
**What it means:** An external request explicitly asked the runtime to terminate the worker. This could be from orchestration systems, manual intervention, platform updates, deployments, or a user-initiated cancellation.
101+
102+
**When it happens:** The platform needs to stop your function immediately, often during deployments or infrastructure maintenance.
103+
104+
**What to do:**
105+
106+
- Implement graceful cleanup code, but expect it might not always run
107+
- Design for abrupt termination. Use durable storage to persist work in progress
108+
- Make operations atomic or use transactions where possible
109+
- Track execution state externally so incomplete work can be detected and resumed
110+
- Test your function's behavior when forcibly stopped
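
Tracking state externally could look like the following sketch, where a job resumes from the last saved position after being stopped. `CheckpointStore` and `runJob` are hypothetical; a real store would be backed by durable storage such as a database table.

```typescript
// Hypothetical durable store interface.
interface CheckpointStore {
  load(jobId: string): Promise<number> // index of the next batch, 0 if none saved
  save(jobId: string, nextIndex: number): Promise<void>
}

// Process batches in order, recording progress after each one so that
// a later invocation can pick up where a terminated one stopped.
async function runJob<T>(
  jobId: string,
  batches: T[],
  store: CheckpointStore,
  processBatch: (batch: T) => Promise<void>
): Promise<void> {
  const start = await store.load(jobId)
  for (let i = start; i < batches.length; i++) {
    await processBatch(batches[i])
    await store.save(jobId, i + 1)
  }
}
```

If the worker stops between `processBatch` and `save`, that batch runs again on resume, which is why batch processing should also be idempotent.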
111+
112+
## Additional diagnostic events
113+
114+
While not shutdown reasons themselves, these events provide important context:
115+
116+
- **BootEvent / BootFailure:** Indicates whether a worker started successfully or failed during initialization
117+
- **UncaughtException:** Signals an unhandled error with the exception message and CPU time used
118+
- **LogEvent:** Runtime-emitted logs with severity levels (Debug, Info, Warning, Error)
119+
- **WorkerMemoryUsed:** Detailed memory snapshot including total, heap, external memory, and memory checker data
120+
121+
## Shutdown event metadata
122+
123+
Each `ShutdownEvent` includes valuable diagnostic information:
124+
125+
- `reason`: One of the shutdown reasons described above
126+
- `cpu_time_used`: Amount of CPU time consumed (in milliseconds)
127+
- `memory_used`: Memory snapshot at shutdown with breakdown of total, heap, and external memory
128+
- `execution_id`: Unique identifier for tracking this specific execution across logs and retries
129+
130+
## Best practices for resilient functions
131+
132+
### 1. Design for idempotency
133+
134+
Make your functions safe to run multiple times with the same input. Use execution IDs from metadata to detect duplicate runs and avoid repeating side effects.
135+
136+
```typescript
137+
// Store execution_id to detect retries
138+
const executionId = Deno.env.get('SB_EXECUTION_ID') // assumed variable name; confirm it in your environment
139+
const alreadyProcessed = await checkIfProcessed(executionId)
140+
141+
if (alreadyProcessed) {
142+
return new Response('Already processed', { status: 200 })
143+
}
144+
```
145+
146+
### 2. Implement checkpointing
147+
148+
Save progress frequently to durable storage.
149+
150+
```typescript
151+
// Save progress incrementally
152+
for (const batch of dataBatches) {
153+
await processBatch(batch)
154+
await saveProgress(batch.id) // assumes each batch carries its own id
155+
}
156+
```
157+
158+
### 3. Use streaming for large data
159+
160+
Avoid loading entire files or responses into memory. Stream data to reduce memory footprint.
161+
162+
```typescript
163+
// Stream responses instead of buffering
164+
return new Response(readableStream, {
165+
headers: { 'Content-Type': 'application/json' },
166+
})
167+
```
168+
169+
### 4. Monitor and alert
170+
171+
Track shutdown reasons in your observability system. Set up alerts for:
172+
173+
- Frequent `Memory` shutdowns: investigate memory usage patterns
174+
- Frequent `CPUTime` shutdowns: optimize computational work
175+
- Frequent `WallClockTime` shutdowns: reduce latency or break up work
176+
- Frequent `EarlyDrop` or `TerminationRequested`: check platform scaling and deployment patterns
177+
178+
Access your function logs at [Functions Logs](/dashboard/project/_/functions).
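
As a rough sketch of the alerting logic, the function below takes shutdown reasons already extracted from logs and flags any reason whose share exceeds a chosen threshold. `reasonsOverThreshold` and the threshold values are assumptions for illustration; how reasons are collected depends on your observability setup.

```typescript
type ShutdownReason =
  | 'EventLoopCompleted'
  | 'WallClockTime'
  | 'CPUTime'
  | 'Memory'
  | 'EarlyDrop'
  | 'TerminationRequested'

// Return the reasons whose share of all shutdowns exceeds their threshold (0..1).
function reasonsOverThreshold(
  reasons: ShutdownReason[],
  thresholds: Partial<Record<ShutdownReason, number>>
): ShutdownReason[] {
  const counts = new Map<ShutdownReason, number>()
  for (const reason of reasons) {
    counts.set(reason, (counts.get(reason) ?? 0) + 1)
  }
  const flagged: ShutdownReason[] = []
  counts.forEach((count, reason) => {
    const limit = thresholds[reason]
    // Reasons without a configured threshold are never flagged.
    if (limit !== undefined && count / reasons.length > limit) flagged.push(reason)
  })
  return flagged
}
```

Using shares rather than raw counts keeps the alert meaningful as traffic grows or shrinks.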
179+
180+
### 5. Handle cleanup gracefully
181+
182+
While you can't always rely on cleanup code running, implement it anyway for the cases where graceful shutdown is possible.
183+
184+
```typescript
185+
// Cleanup handler (may not always run)
186+
addEventListener('unload', () => {
187+
// Close connections, flush buffers, etc.
188+
cleanup()
189+
})
190+
```
191+
192+
## Example shutdown event payloads
193+
194+
**Normal completion:**
195+
196+
```json
197+
{
198+
"event": {
199+
"Shutdown": {
200+
"reason": "EventLoopCompleted",
201+
"cpu_time_used": 12,
202+
"memory_used": {
203+
"total": 1048576,
204+
"heap": 512000,
205+
"external": 1000
206+
}
207+
}
208+
},
209+
"metadata": {
210+
"execution_id": "4b6a4e2e-7c4d-4f8b-9e1a-2d3c4e5f6a7b"
211+
}
212+
}
213+
```
214+
215+
**Wall-clock timeout:**
216+
217+
```json
218+
{
219+
"event": {
220+
"Shutdown": {
221+
"reason": "WallClockTime",
222+
"cpu_time_used": 50,
223+
"memory_used": {
224+
"total": 2097152,
225+
"heap": 1024000,
226+
"external": 5000
227+
}
228+
}
229+
},
230+
"metadata": {
231+
"execution_id": "5c7b5f3f-8d5e-4a9c-8f2b-3e4d5f6a7b8c"
232+
}
233+
}
234+
```
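
To work with payloads like these in code, a typed parser can help. The sketch below assumes the shape shown in the two examples above; `ShutdownPayload` and `summarizeShutdown` are hypothetical names, and real payloads may carry additional or different fields.

```typescript
// Shape inferred from the example payloads; treat it as an assumption.
interface ShutdownPayload {
  event: {
    Shutdown: {
      reason: string
      cpu_time_used: number // milliseconds
      memory_used: { total: number; heap: number; external: number }
    }
  }
  metadata: { execution_id: string }
}

// Turn a raw JSON log entry into a one-line summary.
function summarizeShutdown(raw: string): string {
  const payload = JSON.parse(raw) as ShutdownPayload
  const { reason, cpu_time_used, memory_used } = payload.event.Shutdown
  const heapShare = Math.round((memory_used.heap / memory_used.total) * 100)
  return `${payload.metadata.execution_id}: ${reason}, cpu=${cpu_time_used}ms, heap=${heapShare}% of total`
}
```

A high heap share points at JavaScript objects, while a large `external` value points at buffers allocated outside the heap.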
235+
236+
## Troubleshooting checklist
237+
238+
Use this quick reference when investigating shutdown issues:
239+
240+
| Shutdown Reason | Primary Action |
241+
| ---------------------------------------------- | --------------------------------------------------------------------------------------- |
242+
| Many `Memory` shutdowns | Switch to streaming; process data in chunks; investigate heap vs external allocations |
243+
| Many `CPUTime` shutdowns | Optimize algorithms; cache results; move heavy compute to background workers |
244+
| Many `WallClockTime` shutdowns | Reduce I/O waits; use async operations; break into smaller functions |
245+
| Frequent `EarlyDrop` or `TerminationRequested` | Check platform scaling policies; review deployment logs; implement better checkpointing |
246+
247+
## Additional resources
248+
249+
- [Debugging Edge Functions](/docs/guides/functions/logging)
250+
- [Edge Function limits](/docs/guides/functions/limits)
251+
- [Inspecting Edge Function environment variables](./inspecting-edge-function-environment-variables-wg5qOQ)

supa-mdx-lint/Rule003Spelling.toml

Lines changed: 11 additions & 0 deletions
@@ -27,6 +27,8 @@ allow_list = [
   "[Bb]ackend",
   "[Bb]ackoff",
   "[Bb]lockchains?",
+  "BootEvent",
+  "BootFailure",
   "[Bb]reakpoints?",
   "[Bb]uilt-ins?",
   "[Cc]hangelogs?",
@@ -37,6 +39,7 @@ allow_list = [
   "[Cc]ooldowns?",
   "[Cc]oroutines?",
   "ComposeAuth",
+  "CPUTime",
   "[Cc]ron",
   "[Cc]rypto",
   "[Cc]ryptography",
@@ -49,10 +52,12 @@ allow_list = [
   "[Dd]evs?",
   "[Dd]iff(s|ing|ed)?",
   "[Dd]ropdown",
+  "EarlyDrop",
   "[Ee]nqueues?",
   "[Ee]ntrypoints?",
   "[Ee]nums?",
   "[Ee]nv",
+  "EventLoopCompleted",
   "[Ee]x",
   "[Ee]xecutables?",
   "[Ff]atals",
@@ -69,10 +74,12 @@ allow_list = [
   "[KMG]bps",
   "[KMG]iB",
   "[Ll]iveness",
+  "LogEvent",
   "[Mm]atryoshka",
   "[Mm]essageBird",
   "MetaMask",
   "[Mm]icroservices?",
+  "microtasks",
   "[Mm]iddlewares?",
   "[Mm]onorepos?",
   "[Mm]ultimodal",
@@ -113,14 +120,18 @@ allow_list = [
   "[Ss]ubfolders?",
   "[Ss]ubmodules?",
   "[Ss]wappiness",
+  "TerminationRequested",
   "[Tt]imebox(ed)?",
   "[Tt]odos?",
   "[Tt]radeoffs?",
+  "UncaughtException",
   "[Uu]nlink(ing|s|ed)?",
   "[Uu]pserts?",
   "[Uu]ptime",
   "[Ww]aitlists?",
+  "WallClockTime",
   "[Ww]ebhooks?",
+  "WorkerMemoryUsed",
   "Airbyte",
   "AndroidX",
   "AppleAuthentication",
