Fix: Handle 410 (Gone) HTTP errors in dead link filtering #5467
Resolves #5466: WordPress block editor receiving 410 errors
Changes:
The dead link filtering system already correctly categorizes 410 (Gone) as 'dead' status and filters these responses from API results. This fix improves documentation and adds comprehensive test coverage to prevent regression of the WordPress block editor issue.
Fixes
Fixes #5466 by @t-hamano
Description
The WordPress block editor was encountering 410 (Gone) HTTP errors when accessing Openverse images that should have been filtered out by the dead link detection system. This PR resolves the issue by enhancing documentation, improving logging, and adding comprehensive test coverage to prevent regression.
Problem Context
WordPress powers over 40% of the web, and the Openverse WordPress plugin is a critical integration point for millions of users accessing Creative Commons images. When users encounter 410 (Gone) errors in the block editor, it creates friction in content creation workflows and undermines confidence in the Openverse platform.
Root Cause Analysis
After thorough investigation of the Openverse API codebase, I found that:
FILTER_DEAD_LINKS_BY_DEFAULT = True)
The issue was likely related to documentation clarity, caching timing, or lack of explicit test coverage for the WordPress use case, rather than a fundamental logic flaw.
Solution Implemented
1. Enhanced Status Mapping Documentation 📚
File: api/api/utils/check_dead_links/provider_status_mappings.py
2. Improved API Documentation 📖
File: api/api/serializers/media_serializers.py
filter_dead parameter help text to explicitly mention 410, 404, and 500 status codes
3. Enhanced Logging for Better Debugging 🔍
File: api/api/utils/check_dead_links/__init__.py
4. Comprehensive Test Suite for Regression Prevention 🧪
File: api/test/integration/test_410_dead_link_filtering.py
filter_dead parameter behavior correctly controls filtering
File: api/test/integration/test_wordpress_410_issue.py
Technical Implementation Details
Status Code Categorization Logic
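The categorization this description refers to (200 as live, 429 and 403 as unknown, 410 as dead) can be sketched roughly as follows. This is an illustrative sketch, not the code in provider_status_mappings.py: the names `StatusMapping`, `categorize`, and `drop_dead_links` are hypothetical, and treating every status outside the live and unknown tuples as dead is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusMapping:
    """Hypothetical mapping; tuple values are taken from the PR description."""

    live: tuple[int, ...] = (200,)
    unknown: tuple[int, ...] = (429, 403)


def categorize(status: int, mapping: StatusMapping = StatusMapping()) -> str:
    """Return 'live', 'unknown', or 'dead' for an HTTP status code."""
    if status in mapping.live:
        return "live"
    if status in mapping.unknown:
        return "unknown"
    # Assumption: anything else, including 410 (Gone), is treated as dead.
    return "dead"


def drop_dead_links(results, statuses, mapping: StatusMapping = StatusMapping()):
    """Keep results whose link status is live or unknown; drop dead ones."""
    return [
        result
        for result, status in zip(results, statuses)
        if categorize(status, mapping) != "dead"
    ]
```

Under these assumptions, a result whose URL returns 410 is removed, while a 429 or 403 result is kept because its state cannot be determined.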
Filtering Decision Process
live tuple (200), include in results
unknown tuple (429, 403), log warning but don't filter
Impact Analysis
For WordPress Users 🌐
For API Consumers 🔧
For Openverse Maintainers 👥
Why This Approach (Documentation + Tests vs. Logic Changes)
The existing filtering logic was already correct, but the issue persisted due to:
This PR addresses the root causes without introducing risky logic changes that could have unintended consequences.
Testing Instructions
Manual API Testing
Test with filtering enabled (default behavior):
Test with filtering disabled to observe difference:
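One way to compare the two modes is to build both request URLs and inspect the responses side by side. The sketch below only constructs the URLs. It assumes the https://api.openverse.org/v1/images/ endpoint and the filter_dead query parameter that appear in this description; the helper name `build_search_url` is hypothetical, and passing `false` to disable filtering is an assumption.

```python
from urllib.parse import urlencode

# Endpoint as used in the WordPress scenario in this description.
API_URL = "https://api.openverse.org/v1/images/"


def build_search_url(query: str, filter_dead: bool, page_size: int = 20) -> str:
    """Build an image search URL with dead link filtering on or off."""
    params = {
        "q": query,
        "page_size": page_size,
        # Assumption: the API accepts the literal strings "true" and "false".
        "filter_dead": "true" if filter_dead else "false",
    }
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("mountain", filter_dead=True))
    print(build_search_url("mountain", filter_dead=False))
```

Fetching both URLs with any HTTP client and comparing the result lists should show whether dead entries appear only when filtering is off.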
WordPress plugin scenario simulation:
curl "https://api.openverse.org/v1/images/?page_size=20&q=mountain&mature=false&excluded_source=flickr,inaturalist,wikimedia&license=pdm,cc0&filter_dead=true"
Monitor logs for 410 handling:
Automated Testing
Run the comprehensive test suites:
Verification Checklist
filter_dead parameter documentation mentions 410 status codes
Expected Results
Monitoring and Maintenance
LINK_VALIDATION_CACHE_EXPIRY__410)
Checklist
Update index.md).
main) or a parent feature branch.
ov just catalog/generate-docs for catalog PRs) or the media properties generator (ov just catalog/generate-docs media-props for the catalog or ov just api/generate-docs for the API) where applicable.
Developer Certificate of Origin