fix(source-twilio): retry HTTP 401 on transient CloudFront errors to avoid sync failure #75214
devin-ai-integration[bot] wants to merge 6 commits into master
…oudFront auth failures Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
Deploy preview for airbyte-docs ready! ✅ Preview built with commit 758a50a.
Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
/publish-connectors-prerelease
…o IGNORE Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
…anager.devin.ai/proxy/github.com/airbytehq/airbyte into devin/1773934746-twilio-401-transient
IGNORE breaks the check test because Twilio's real 401 responses (including genuine auth failures) also have X-Cache: Error from cloudfront. RETRY preserves correct check behavior - after retries are exhausted, the error is raised and the check properly fails for invalid credentials. Co-Authored-By: gl_anatolii.yatsuk <gl_anatolii.yatsuk@airbyte.io>
↪️ Triggering Reason: Draft PR with CI passing, pre-release published. Ready for live validation of the CloudFront 401 error handling fix.
Fix Validation Evidence
Outcome: Could not Run Tests. Live connection testing was blocked pending human approval; regression tests were partially completed.

Evidence Summary
Pre-flight checks passed: the fix is non-breaking, reversible, and safe. Regression tests on pre-release … The CHECK operation passing is particularly significant: it validates that the RETRY approach still fails correctly for genuinely invalid credentials while enabling retries for transient CloudFront 401s. Live connection testing was blocked because the human approval required for connection pinning was not received within the session timeframe. An internal test connection was identified and qualified, and is ready for testing once approval is obtained. Production sync logs were analyzed and confirmed the exact error pattern (HTTP 401 + …).

Next Steps
Connector & PR Details
Connector: …

Evidence Plan
Proving Criteria
A sync that was previously failing with HTTP 401 + …
For regression testing: a healthy connection continues to sync successfully with the pre-release version.

Disproving Criteria
Cases Attempted
Case 1: Regression Tests (CI)
Case 2: Internal Connection (planned, not executed)
Pre-flight Checks
Design Intent Check: The RETRY approach is intentional and well-reasoned. An earlier commit used IGNORE, but that breaks the check operation for genuinely invalid credentials. RETRY preserves correct check behavior while handling transient CloudFront errors via the CDK's built-in retry mechanism.

Detailed Evidence Log
Note: Connection IDs and detailed logs are recorded in the linked private issue.
What
Sporadic HTTP 401 errors from Twilio's API have been observed in production, caused by transient CloudFront CDN edge failures (X-Cache: Error from cloudfront) rather than actual credential issues. These currently cause the entire sync to fail because the CDK's default error mapping treats 401 as a non-retryable config_error. Subsequent partitions (time slices) succeed with the same credentials, confirming the issue is infrastructure-level and time-bound.
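The failure signature described above can be made concrete as a predicate over the parsed JSON body and the response headers. The Python below is a minimal sketch, not the manifest's actual predicate (which is an interpolated expression in manifest.yaml); the helper name is_transient_cloudfront_401 and the sample body values are assumptions. It checks the two conditions listed in the How section: Twilio's status: 401 body field and the X-Cache header value.

```python
from typing import Any, Mapping

CLOUDFRONT_ERROR = "Error from cloudfront"


def is_transient_cloudfront_401(body: Any, headers: Mapping[str, str]) -> bool:
    """Hypothetical helper mirroring the two conditions of the PR's predicate."""
    # Assumed Twilio error shape: a JSON object with a numeric "status" field.
    if not isinstance(body, dict) or body.get("status") != 401:
        return False
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-cache") == CLOUDFRONT_ERROR


# Illustrative Twilio-style error body; the field values are assumed.
twilio_401 = {"code": 20003, "status": 401}

print(is_transient_cloudfront_401(twilio_401, {"X-Cache": "Error from cloudfront"}))  # True
print(is_transient_cloudfront_401(twilio_401, {"X-Cache": "Miss from cloudfront"}))  # False
print(is_transient_cloudfront_401({"status": 404}, {"x-cache": "Error from cloudfront"}))  # False
```

Because genuine credential failures also pass through CloudFront and carry the same header, this predicate cannot distinguish the two cases on its own; that is why the fix relies on retrying rather than ignoring.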
How
Added an HttpResponseFilter with action: RETRY and a predicate to the base_requester's error_handler in the declarative manifest. The predicate matches when both conditions are true:
- The response body has status: 401 (Twilio's error format).
- The X-Cache header equals Error from cloudfront (the CloudFront CDN failure signature).

When matched, the request is retried with exponential backoff. If retries are exhausted, the error is raised as a system_error (instead of the default config_error), which allows the sync to continue with the remaining partitions rather than failing entirely.

Why RETRY instead of IGNORE
IGNORE was attempted but breaks the connector's check operation: Twilio's genuine 401 responses (invalid credentials) also pass through CloudFront and carry X-Cache: Error from cloudfront. With IGNORE, the check incorrectly succeeds for invalid credentials, failing the test_check['invalid_config'] standard test. RETRY preserves correct check behavior: after retries are exhausted, the 401 is still raised and the check properly fails for invalid credentials.

Review guide
manifest.yaml: the only functional change. The new response filter is at lines 30–34, placed between the 429 (rate limit) and 404 (ignore) filters.

Checklist for reviewer
- The X-Cache: Error from cloudfront header is present on both transient and genuine Twilio 401s. This means genuine auth failures will also be retried before eventually failing, adding retry delay (~seconds) to the failure path. Verify this trade-off is acceptable.
- … RETRY, invalid credentials are retried and then correctly fail. Confirm this is acceptable behavior (it adds a few seconds of retry delay to the check failure path).
- _matches_filter evaluates http_codes OR predicate OR error_message_contains. Since no http_codes are set here, only the predicate is evaluated, and both conditions within it must be true.
- The X-Cache header value (Error from cloudfront) matches production logs: the original incident had this exact string.

User Impact
Syncs that previously failed entirely due to transient Twilio/CloudFront 401 errors will now retry the affected request and, if retries are exhausted, continue with remaining partitions instead of aborting. Genuine authentication failures still fail correctly (after retry delay).
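The retry-then-fail behavior described above, together with the IGNORE versus RETRY trade-off for the check operation, can be modeled in a small Python sketch. Everything here is hypothetical: send_with_retry, check_succeeds, scripted, and the fake responses are illustrative names, and max_retries=5 with a 1-second base delay are assumed values, not the CDK's actual backoff settings. The sketch does not model the system_error classification or partition-level continuation; it only shows which 401s are retried and how a genuine failure still surfaces.

```python
import time
from typing import Callable, Dict, Tuple

# Simplified stand-in for an HTTP response: (status code, JSON body, headers).
Response = Tuple[int, Dict, Dict[str, str]]


class AuthError(Exception):
    """Hypothetical error raised when a 401 is not retried or exhausts its retries."""


def matches(body: Dict, headers: Dict[str, str]) -> bool:
    # Both conditions from the PR's predicate: Twilio body status 401 AND the CloudFront header.
    return body.get("status") == 401 and headers.get("X-Cache") == "Error from cloudfront"


def send_with_retry(
    send: Callable[[], Response],
    max_retries: int = 5,      # assumed value, not the CDK default
    base_delay: float = 1.0,   # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """RETRY semantics: retry matching 401s with exponential backoff, then raise."""
    attempt = 0
    while True:
        status, body, headers = send()
        if status != 401:
            return status, body, headers
        # Non-matching 401s fail immediately; matching ones fail once retries run out.
        if not matches(body, headers) or attempt >= max_retries:
            raise AuthError(f"HTTP 401 after {attempt + 1} attempt(s)")
        sleep(base_delay * 2 ** attempt)  # 1 s, 2 s, 4 s, ... with the assumed base_delay
        attempt += 1


def check_succeeds(send: Callable[[], Response], action: str) -> bool:
    """Hypothetical check() outcome for a filter action of "RETRY" or "IGNORE"."""
    if action == "IGNORE":
        status, body, headers = send()
        # IGNORE swallows a matching 401, so nothing tells check() the credentials are bad.
        return status == 200 or (status == 401 and matches(body, headers))
    try:
        send_with_retry(send, sleep=lambda _seconds: None)
        return True
    except AuthError:
        return False


def scripted(*responses: Response) -> Callable[[], Response]:
    """Return a fake send() that yields the given responses in order."""
    remaining = iter(responses)
    return lambda: next(remaining)


CLOUDFRONT_401: Response = (401, {"status": 401}, {"X-Cache": "Error from cloudfront"})
OK: Response = (200, {"items": []}, {"X-Cache": "Miss from cloudfront"})

# Transient edge failure: two CloudFront 401s, then success.
delays = []
status, _, _ = send_with_retry(scripted(CLOUDFRONT_401, CLOUDFRONT_401, OK), sleep=delays.append)
print(status, delays)  # 200 [1.0, 2.0]

# Genuine bad credentials: every response is a CloudFront-tagged 401.
delays = []
try:
    send_with_retry(lambda: CLOUDFRONT_401, sleep=delays.append)
except AuthError as exc:
    print(exc, sum(delays))  # HTTP 401 after 6 attempt(s) 31.0

print(check_succeeds(lambda: CLOUDFRONT_401, "IGNORE"))  # True (wrong for invalid credentials)
print(check_succeeds(lambda: CLOUDFRONT_401, "RETRY"))   # False (correct)
```

Under these assumed settings a genuine credential failure surfaces only after the full backoff sequence; the real delay depends on the backoff strategy configured in the manifest.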
Can this PR be safely reverted and rolled back?
Link to Devin session: https://app.devin.ai/sessions/d2bdc8f33f1740f0a55d6ba23f7340d3