Commit 797cab9

revert: restore TAPI wiki files to keep PR focused
Will clean up wiki/ in a separate PR
1 parent fde90a5 commit 797cab9

6 files changed: 1864 additions, 0 deletions
wiki/tapi-backoff-plan.md

Lines changed: 395 additions & 0 deletions
@@ -0,0 +1,395 @@
# Implementation Plan: TAPI Backoff & Rate Limiting for React Native SDK

## Overview

Implement exponential backoff and 429 rate-limiting strategy per the TAPI Backoff SDD to handle API overload during high-traffic events (holidays, Super Bowl, etc.). The implementation adds:

1. **Global rate limiting** for 429 responses (blocks entire upload pipeline)
2. **Per-batch exponential backoff** for transient errors (5xx, 408, etc.)
3. **Upload gate pattern** (no timers, state-based flow control)
4. **Configurable via Settings CDN** (dynamic updates without deployments)
5. **Persistent state** across app restarts (AsyncStorage)

## Key Architectural Decisions

### Decision 1: Two-Component Architecture

- **UploadStateMachine**: Manages global READY/WAITING states for 429 rate limiting
- **BatchUploadManager**: Handles per-batch retry metadata and exponential backoff
- Both use Sovran stores for persistence, integrate into SegmentDestination

### Decision 2: Upload Gate Pattern

- No timers/schedulers to check state
- Check `canUpload()` at flush start, return early if in WAITING state
- State transitions on response (429 → WAITING, success → READY)

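
The gate check described in Decision 2 could look roughly like the sketch below. It is a minimal illustration, not the SDK's implementation: `GateState`, `canUploadAt`, and `flushSketch` are hypothetical names, and the current time is passed in explicitly so that no timer is involved.

```typescript
// Hypothetical shape; it mirrors two fields of the UploadStateData type in Step 1.
type GateState = {
  state: 'READY' | 'WAITING';
  waitUntilTime: number; // timestamp ms
};

// Pure gate check: nothing is scheduled, the caller supplies "now".
function canUploadAt(gate: GateState, nowMs: number): boolean {
  if (gate.state === 'READY') return true;
  // WAITING: the gate reopens once the wait deadline has passed.
  return nowMs >= gate.waitUntilTime;
}

// Flush entry point sketch: return early while the gate is closed.
function flushSketch(
  gate: GateState,
  nowMs: number,
  upload: () => void
): 'uploaded' | 'skipped' {
  if (!canUploadAt(gate, nowMs)) return 'skipped';
  upload();
  return 'uploaded';
}
```

Because the check only runs when a flush is attempted, a closed gate costs nothing between flushes; the trade-off is that uploads resume on the first flush after the deadline rather than at the deadline itself.
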
### Decision 3: Sequential Batch Processing

- Change from `Promise.all()` (parallel) to `for...of` loop (sequential)
- Required by SDD: "429 responses cause immediate halt of upload loop"
- Transient errors (5xx) don't block remaining batches

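
A sequential loop with these semantics might be sketched as follows. The names `SendBatch`, `LoopResult`, and `uploadSequentially` are hypothetical, batches are simplified to string arrays, and the sender is assumed to resolve to an HTTP status code; the real integration lives in `SegmentDestination`.

```typescript
// Hypothetical sender: resolves to the HTTP status code for one batch.
type SendBatch = (batch: string[]) => Promise<number>;

type LoopResult = { sent: number; failed: number; haltedBy429: boolean };

async function uploadSequentially(
  batches: string[][],
  send: SendBatch
): Promise<LoopResult> {
  const result: LoopResult = { sent: 0, failed: 0, haltedBy429: false };
  for (const batch of batches) {
    // Awaiting inside the loop makes uploads strictly one at a time.
    const statusCode = await send(batch);
    if (statusCode === 429) {
      // Rate limited: stop immediately, remaining batches are not attempted.
      result.haltedBy429 = true;
      break;
    }
    if (statusCode >= 200 && statusCode < 300) {
      result.sent += 1;
    } else {
      // Other failures (for example 5xx) are counted and the loop continues.
      result.failed += 1;
    }
  }
  return result;
}
```

With `Promise.all()` every request would already be in flight by the time the first 429 arrived, which is why the halt requirement implies a sequential loop.
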
### Decision 4: Authentication & Headers

- **Authorization header**: Add `Basic ${base64(writeKey + ':')}` header
- **Keep writeKey in body**: Backwards compatibility with TAPI
- **X-Retry-Count header**: Send per-batch count when available, global count for 429

### Decision 5: Logging Strategy

- Verbose logging for all retry events (state transitions, backoff delays, drops)
- Use existing `analytics.logger.info()` and `.warn()` infrastructure
- Include retry count, backoff duration, and error codes in logs

## Implementation Steps

### Step 1: Add Type Definitions

**File**: `/packages/core/src/types.ts`

Add new interfaces to existing types:

```typescript
// HTTP Configuration from Settings CDN
export type HttpConfig = {
  rateLimitConfig?: RateLimitConfig;
  backoffConfig?: BackoffConfig;
};

export type RateLimitConfig = {
  enabled: boolean;
  maxRetryCount: number;
  maxRetryInterval: number; // seconds
  maxTotalBackoffDuration: number; // seconds
};

export type BackoffConfig = {
  enabled: boolean;
  maxRetryCount: number;
  baseBackoffInterval: number; // seconds
  maxBackoffInterval: number; // seconds
  maxTotalBackoffDuration: number; // seconds
  jitterPercent: number; // 0-100
  retryableStatusCodes: number[];
};

// Update SegmentAPISettings to include httpConfig
export type SegmentAPISettings = {
  integrations: SegmentAPIIntegrations;
  edgeFunction?: EdgeFunctionSettings;
  middlewareSettings?: {
    routingRules: RoutingRule[];
  };
  metrics?: MetricsOptions;
  consentSettings?: SegmentAPIConsentSettings;
  httpConfig?: HttpConfig; // NEW
};

// State machine persistence
export type UploadStateData = {
  state: 'READY' | 'WAITING';
  waitUntilTime: number; // timestamp ms
  globalRetryCount: number;
  firstFailureTime: number | null; // timestamp ms
};

// Per-batch retry metadata
export type BatchMetadata = {
  batchId: string;
  events: SegmentEvent[]; // Store events to match batches
  retryCount: number;
  nextRetryTime: number; // timestamp ms
  firstFailureTime: number; // timestamp ms
};

// Error classification result
export type ErrorClassification = {
  isRetryable: boolean;
  errorType: 'rate_limit' | 'transient' | 'permanent';
  retryAfterSeconds?: number;
};
```

### Step 2: Add Default Configuration

**File**: `/packages/core/src/constants.ts`

Add default httpConfig:

```typescript
export const defaultHttpConfig: HttpConfig = {
  rateLimitConfig: {
    enabled: true,
    maxRetryCount: 100,
    maxRetryInterval: 300,
    maxTotalBackoffDuration: 43200, // 12 hours
  },
  backoffConfig: {
    enabled: true,
    maxRetryCount: 100,
    baseBackoffInterval: 0.5,
    maxBackoffInterval: 300,
    maxTotalBackoffDuration: 43200,
    jitterPercent: 10,
    retryableStatusCodes: [408, 410, 429, 460, 500, 502, 503, 504, 508],
  },
};
```
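
Since the Settings CDN may return only part of `httpConfig` (or none of it), the defaults presumably need to be merged with whatever arrives. The plan does not spell out the merge rule, so the sketch below is an assumption: a per-section shallow merge in which CDN values win. `resolveHttpConfig`, `PartialHttpCfg`, and the other type names are hypothetical and redeclared here only to keep the example self-contained.

```typescript
type RateLimitCfg = {
  enabled: boolean;
  maxRetryCount: number;
  maxRetryInterval: number; // seconds
  maxTotalBackoffDuration: number; // seconds
};

type BackoffCfg = {
  enabled: boolean;
  maxRetryCount: number;
  baseBackoffInterval: number; // seconds
  maxBackoffInterval: number; // seconds
  maxTotalBackoffDuration: number; // seconds
  jitterPercent: number; // 0-100
  retryableStatusCodes: number[];
};

type ResolvedHttpCfg = { rateLimitConfig: RateLimitCfg; backoffConfig: BackoffCfg };

// What the CDN might send: every section and every field is optional.
type PartialHttpCfg = {
  rateLimitConfig?: Partial<RateLimitCfg>;
  backoffConfig?: Partial<BackoffCfg>;
};

// Same values as defaultHttpConfig in Step 2.
const httpDefaultsSketch: ResolvedHttpCfg = {
  rateLimitConfig: {
    enabled: true,
    maxRetryCount: 100,
    maxRetryInterval: 300,
    maxTotalBackoffDuration: 43200,
  },
  backoffConfig: {
    enabled: true,
    maxRetryCount: 100,
    baseBackoffInterval: 0.5,
    maxBackoffInterval: 300,
    maxTotalBackoffDuration: 43200,
    jitterPercent: 10,
    retryableStatusCodes: [408, 410, 429, 460, 500, 502, 503, 504, 508],
  },
};

// Assumed merge rule: shallow merge per section, CDN fields override defaults.
function resolveHttpConfig(fromCdn?: PartialHttpCfg): ResolvedHttpCfg {
  return {
    rateLimitConfig: { ...httpDefaultsSketch.rateLimitConfig, ...fromCdn?.rateLimitConfig },
    backoffConfig: { ...httpDefaultsSketch.backoffConfig, ...fromCdn?.backoffConfig },
  };
}
```

For example, `resolveHttpConfig({ backoffConfig: { jitterPercent: 20 } })` would change only the jitter and keep every other default.
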

### Step 3: Enhance Error Classification

**File**: `/packages/core/src/errors.ts`

Add new functions to existing error handling:

```typescript
/**
 * Classifies HTTP errors per TAPI SDD tables
 */
export const classifyError = (
  statusCode: number,
  retryableStatusCodes: number[] = [408, 410, 429, 460, 500, 502, 503, 504, 508]
): ErrorClassification => {
  // 429 rate limiting (checked first, so it never falls through to 'transient')
  if (statusCode === 429) {
    return {
      isRetryable: true,
      errorType: 'rate_limit',
    };
  }

  // Retryable transient errors
  if (retryableStatusCodes.includes(statusCode)) {
    return {
      isRetryable: true,
      errorType: 'transient',
    };
  }

  // Non-retryable (400, 401, 403, 404, 413, 422, 501, 505, etc.)
  return {
    isRetryable: false,
    errorType: 'permanent',
  };
};

/**
 * Parses Retry-After header value
 * Supports both seconds (number) and HTTP date format
 * Returns the delay in seconds, capped at maxRetryInterval
 */
export const parseRetryAfter = (
  retryAfterValue: string | null,
  maxRetryInterval: number = 300
): number | undefined => {
  if (!retryAfterValue) return undefined;

  // Try parsing as a non-negative integer (seconds).
  // A strict digit check avoids parseInt accepting negative values or
  // strings that merely start with digits.
  const trimmed = retryAfterValue.trim();
  if (/^\d+$/.test(trimmed)) {
    return Math.min(parseInt(trimmed, 10), maxRetryInterval);
  }

  // Try parsing as HTTP date
  const retryDate = new Date(trimmed);
  if (!isNaN(retryDate.getTime())) {
    const secondsUntil = Math.ceil((retryDate.getTime() - Date.now()) / 1000);
    return Math.min(Math.max(secondsUntil, 0), maxRetryInterval);
  }

  return undefined;
};
```

### Step 4: Update HTTP API Layer

**File**: `/packages/core/src/api.ts`

Modify uploadEvents to support retry headers and return full Response:

```typescript
export const uploadEvents = async ({
  writeKey,
  url,
  events,
  retryCount = 0, // NEW: for X-Retry-Count header
}: {
  writeKey: string;
  url: string;
  events: SegmentEvent[];
  retryCount?: number; // NEW
}): Promise<Response> => {
  // Return type changed from void so callers can inspect status and headers

  // Create Authorization header (Basic auth format).
  // Note: this assumes a global btoa exists in the target runtime; some
  // React Native/JS engine versions may need a base64 polyfill instead.
  const authHeader = 'Basic ' + btoa(writeKey + ':');

  const response = await fetch(url, {
    method: 'POST',
    body: JSON.stringify({
      batch: events,
      sentAt: new Date().toISOString(),
      writeKey: writeKey, // Keep in body for backwards compatibility
    }),
    headers: {
      'Content-Type': 'application/json; charset=utf-8',
      'Authorization': authHeader, // NEW
      'X-Retry-Count': retryCount.toString(), // NEW
    },
  });

  return response; // Return full response (not just void)
};
```
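
Depending on the React Native version and JS engine, a global `btoa` may not be defined. If that turns out to be the case, the header could be built with a small encoder like the sketch below. `base64EncodeAscii` and `basicAuthHeader` are hypothetical helpers, and the encoder assumes the write key contains only single-byte (ASCII/Latin-1) characters.

```typescript
const B64_ALPHABET =
  'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Encodes a string of single-byte characters as standard base64 with padding.
function base64EncodeAscii(input: string): string {
  let out = '';
  for (let i = 0; i < input.length; i += 3) {
    const remaining = input.length - i;
    // Take up to three bytes; missing bytes are treated as zero.
    const b0 = input.charCodeAt(i) & 0xff;
    const b1 = remaining > 1 ? input.charCodeAt(i + 1) & 0xff : 0;
    const b2 = remaining > 2 ? input.charCodeAt(i + 2) & 0xff : 0;
    const triple = (b0 << 16) | (b1 << 8) | b2;
    // Emit four 6-bit groups, replacing unused groups with '=' padding.
    out += B64_ALPHABET.charAt((triple >> 18) & 63);
    out += B64_ALPHABET.charAt((triple >> 12) & 63);
    out += remaining > 1 ? B64_ALPHABET.charAt((triple >> 6) & 63) : '=';
    out += remaining > 2 ? B64_ALPHABET.charAt(triple & 63) : '=';
  }
  return out;
}

// Basic auth with the write key as the user name and an empty password.
function basicAuthHeader(writeKey: string): string {
  return 'Basic ' + base64EncodeAscii(writeKey + ':');
}
```
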

### Step 5: Create Upload State Machine

**New File**: `/packages/core/src/backoff/UploadStateMachine.ts`

(See full implementation in code)
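
The full implementation is in the referenced file. As a rough orientation only, the READY/WAITING transitions could be modelled as pure functions over the persisted state, as in the sketch below. `on429` and `onUploadSuccess` are hypothetical names, persistence through Sovran is left out, and resetting to READY once a limit is exceeded is an assumption (the plan only says the limits are enforced).

```typescript
// Mirrors UploadStateData from Step 1.
type UploadStateSketch = {
  state: 'READY' | 'WAITING';
  waitUntilTime: number; // timestamp ms
  globalRetryCount: number;
  firstFailureTime: number | null; // timestamp ms
};

// Subset of RateLimitConfig used here; all values are in seconds.
type RateLimitLimits = {
  maxRetryCount: number;
  maxRetryInterval: number;
  maxTotalBackoffDuration: number;
};

const READY_SKETCH: UploadStateSketch = {
  state: 'READY',
  waitUntilTime: 0,
  globalRetryCount: 0,
  firstFailureTime: null,
};

// Transition for a 429 response: close the gate until the (capped) Retry-After elapses.
function on429(
  prev: UploadStateSketch,
  retryAfterSeconds: number,
  nowMs: number,
  limits: RateLimitLimits
): UploadStateSketch {
  const firstFailureTime = prev.firstFailureTime ?? nowMs;
  const retryCount = prev.globalRetryCount + 1;
  const elapsedSeconds = (nowMs - firstFailureTime) / 1000;

  // Assumption: exceeding either limit gives up on waiting and resets the machine.
  if (
    retryCount > limits.maxRetryCount ||
    elapsedSeconds > limits.maxTotalBackoffDuration
  ) {
    return { ...READY_SKETCH };
  }

  const waitSeconds = Math.min(retryAfterSeconds, limits.maxRetryInterval);
  return {
    state: 'WAITING',
    waitUntilTime: nowMs + waitSeconds * 1000,
    globalRetryCount: retryCount,
    firstFailureTime,
  };
}

// Transition for a successful upload: reopen the gate and clear the counters.
function onUploadSuccess(): UploadStateSketch {
  return { ...READY_SKETCH };
}
```

Keeping the transitions pure makes the retry-count and duration limits straightforward to unit test with fixed timestamps.
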

### Step 6: Create Batch Upload Manager

**New File**: `/packages/core/src/backoff/BatchUploadManager.ts`

(See full implementation in code)
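
The backoff calculation itself is in the referenced file. One common way to combine the `BackoffConfig` fields is `delay = min(base * 2^retryCount, max)` plus a random jitter of up to `jitterPercent` of that delay. The sketch below follows that formula; the exact jitter rule is an assumption, and `calculateBackoffSeconds` with its injectable `random` parameter is a hypothetical signature chosen to keep the example testable.

```typescript
// Subset of BackoffConfig used by the calculation; intervals are in seconds.
type BackoffParams = {
  baseBackoffInterval: number;
  maxBackoffInterval: number;
  jitterPercent: number; // 0-100
};

// retryCount is the number of failures so far (0 for the first retry).
// random must return a value in [0, 1); it defaults to Math.random.
function calculateBackoffSeconds(
  retryCount: number,
  cfg: BackoffParams,
  random: () => number = Math.random
): number {
  // Exponential growth: base, 2*base, 4*base, ...
  const exponential = cfg.baseBackoffInterval * Math.pow(2, retryCount);
  // Cap the delay before applying jitter.
  const capped = Math.min(exponential, cfg.maxBackoffInterval);
  // Assumed jitter rule: add between 0 and jitterPercent% of the capped delay.
  const jitter = capped * (cfg.jitterPercent / 100) * random();
  return capped + jitter;
}
```

With the defaults from Step 2 (base 0.5 s, cap 300 s), the un-jittered delays run 0.5, 1, 2, 4, ... seconds and reach the 300 s cap at the tenth retry.
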

### Step 7: Create Barrel Export

**New File**: `/packages/core/src/backoff/index.ts`

```typescript
export { UploadStateMachine } from './UploadStateMachine';
export { BatchUploadManager } from './BatchUploadManager';
```

### Step 8: Integrate into SegmentDestination

**File**: `/packages/core/src/plugins/SegmentDestination.ts`

Major modifications to integrate state machine and batch manager:

(See full implementation in code)

## Testing Strategy

### Unit Tests

1. **Error Classification** (`/packages/core/src/__tests__/errors.test.ts`)

   - classifyError() for all status codes in SDD tables
   - parseRetryAfter() with seconds, HTTP dates, invalid values

2. **Upload State Machine** (`/packages/core/src/backoff/__tests__/UploadStateMachine.test.ts`)

   - canUpload() returns true/false based on state and time
   - handle429() sets waitUntilTime and increments counter
   - Max retry count enforcement
   - Max total backoff duration enforcement
   - State persistence across restarts

3. **Batch Upload Manager** (`/packages/core/src/backoff/__tests__/BatchUploadManager.test.ts`)

   - calculateBackoff() produces correct exponential values with jitter
   - handleRetry() increments retry count and schedules next retry
   - Max retry count enforcement
   - Max total backoff duration enforcement
   - Batch metadata persistence

### Integration Tests

**File**: `/packages/core/src/plugins/__tests__/SegmentDestination.test.ts`

Add test cases for:

- 429 response halts upload loop (remaining batches not processed)
- 429 response blocks future flush() calls until waitUntilTime
- Successful upload after 429 resets state machine
- Transient error (500) retries per-batch without blocking other batches
- Non-retryable error (400) drops batch immediately
- X-Retry-Count header sent with correct value
- Authorization header contains base64-encoded writeKey
- Sequential batch processing (not parallel)
- Legacy behavior when `rateLimitConfig.enabled` and `backoffConfig.enabled` are `false` (`HttpConfig` itself has no `enabled` field)

## Verification Steps
314+
315+
### End-to-End Testing
316+
317+
1. **Mock TAPI responses** in test environment
318+
2. **Verify state persistence**: Trigger 429, close app, reopen → should still be in WAITING state
319+
3. **Verify headers**: Intercept HTTP requests and check for Authorization and X-Retry-Count headers
320+
4. **Verify sequential processing**: Queue 3 batches, return 429 on first → only 1 fetch call should occur
321+
5. **Verify logging**: Check logs for "Rate limited", "Batch uploaded successfully", "retry scheduled" messages
322+
323+
### Manual Testing Checklist
324+
325+
- [ ] Test with real TAPI endpoint during low-load period
326+
- [ ] Trigger 429 by sending many events quickly
327+
- [ ] Verify retry happens after Retry-After period
328+
- [ ] Verify batches are dropped after max retry count
329+
- [ ] Verify batches are dropped after max total backoff duration (12 hours)
330+
- [ ] Test app restart during WAITING state (should persist)
331+
- [ ] Test legacy behavior with httpConfig.enabled = false
332+
- [ ] Verify no breaking changes to existing event tracking
333+
334+
## Critical Files
335+
336+
### New Files (3)
337+
338+
1. `/packages/core/src/backoff/UploadStateMachine.ts` - Global rate limiting state machine
339+
2. `/packages/core/src/backoff/BatchUploadManager.ts` - Per-batch retry and backoff
340+
3. `/packages/core/src/backoff/index.ts` - Barrel export
341+
342+
### Modified Files (5)
343+
344+
1. `/packages/core/src/types.ts` - Add HttpConfig, UploadStateData, BatchMetadata types
345+
2. `/packages/core/src/errors.ts` - Add classifyError() and parseRetryAfter()
346+
3. `/packages/core/src/api.ts` - Add retryCount param and Authorization header
347+
4. `/packages/core/src/plugins/SegmentDestination.ts` - Integrate state machine and batch manager
348+
5. `/packages/core/src/constants.ts` - Add defaultHttpConfig
349+
350+
### Test Files (3 new + 1 modified)
351+
352+
1. `/packages/core/src/backoff/__tests__/UploadStateMachine.test.ts`
353+
2. `/packages/core/src/backoff/__tests__/BatchUploadManager.test.ts`
354+
3. `/packages/core/src/__tests__/errors.test.ts` - Add classification tests
355+
4. `/packages/core/src/plugins/__tests__/SegmentDestination.test.ts` - Add integration tests
356+
357+
## Rollout Strategy
358+
359+
1. **Phase 1**: Implement and test in development with `enabled: false` (default to legacy behavior)
360+
2. **Phase 2**: Enable in staging with verbose logging, monitor for issues
361+
3. **Phase 3**: Update Settings CDN to include httpConfig with `enabled: true`
362+
4. **Phase 4**: Monitor production metrics (retry count, drop rate, 429 frequency)
363+
5. **Phase 5**: Tune configuration parameters based on real-world data
364+
365+
## Success Metrics
366+
367+
- Reduction in 429 responses from TAPI during high-load events
368+
- Reduction in repeated failed upload attempts from slow clients
369+
- No increase in event loss rate (should be same or better)
370+
- Successful state persistence across app restarts
371+
- No performance degradation in normal operation
372+
373+
## Implementation Status
374+
375+
**Status**: ✅ COMPLETE (2026-02-09)
376+
377+
All implementation steps have been completed:
378+
379+
- ✅ Type definitions added
380+
- ✅ Default configuration added
381+
- ✅ Error classification functions implemented
382+
- ✅ API layer updated with retry headers
383+
- ✅ Upload State Machine implemented
384+
- ✅ Batch Upload Manager implemented
385+
- ✅ Integration into SegmentDestination complete
386+
- ✅ TypeScript compilation successful
387+
388+
**Next Steps**:
389+
390+
1. Write unit tests for error classification functions
391+
2. Write unit tests for UploadStateMachine
392+
3. Write unit tests for BatchUploadManager
393+
4. Add integration tests to SegmentDestination.test.ts
394+
5. Perform end-to-end testing
395+
6. Update Settings CDN configuration

wiki/tapi-backoff-sdd-v2.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
fatal: path 'wiki/tapi-backoff-sdd-v2.md' exists on disk, but not in 'tapi-docs'
