Open
Labels
⏰-spill-over: Issues that were picked up in past sprints, but couldn't complete
⚙️ Under Review: Feature requests that are currently under review
✨ Enhancement: Improvement on an existing feature
Description
The Docker API server lags behind the Python library. This issue tracks adding endpoints/parameters to expose the following library features:
1. Adaptive crawling
- AdaptiveCrawler, AdaptiveConfig, CrawlState, CrawlStrategy, StatisticalStrategy
- Missing: endpoints to run/tune adaptive crawls
2. C4A Script language
- c4a_compile, c4a_validate, c4a_compile_file, CompilationResult, ValidationResult, ErrorDetail
- Missing: submit/validate/execute script endpoints
3. URL seeding
- AsyncUrlSeeder, SeedingConfig
- Missing: sitemap/common-crawl/discovery endpoints
4. Chunking
- ChunkingStrategy, RegexChunking
- Missing: chunking configuration
5. Browser adapters
- BrowserAdapter, PlaywrightAdapter, UndetectedAdapter
- Missing: adapter/stealth selection
6. Proxy rotation
- ProxyRotationStrategy, RoundRobinProxyStrategy
- Missing: rotation strategy selection (beyond raw proxy)
7. Dispatchers
- SemaphoreDispatcher, BaseDispatcher
- Missing: dispatcher selection (only MemoryAdaptive used internally)
8. Link preview
- LinkPreview, LinkPreviewConfig
- Missing: link preview/scoring endpoint
9. Profiling/monitoring
- BrowserProfiler, CrawlerMonitor
- Missing: profiling/monitoring endpoints
10. HTTP-only crawling
- HTTPCrawlerConfig
- Missing: HTTP crawler methods/params (non-browser). API uses browser-based crawling with LXMLWebScrapingStrategy
11. Virtual scroll
- VirtualScrollConfig
- Missing: infinite-scroll capture configuration
12. Undetected/stealth browser
- UndetectedAdapter; browser_config/browser_type='undetected'; stealth options
- Missing: explicit stealth mode controls
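Several of the gaps above (proxy rotation, dispatchers, browser adapters) come down to letting a request pick a strategy by name. One possible shape for that is a small name-to-factory registry, sketched below for proxy rotation. The class is a stand-in that only borrows the library's name so the example runs without the library installed; `next_proxy`, `PROXY_STRATEGIES`, `build_proxy_strategy`, and the `"round_robin"` key are all hypothetical and not taken from the existing server code.

```python
from typing import Callable, Dict, List


class RoundRobinProxyStrategy:
    """Stand-in for the library class of the same name (not the real implementation).

    Cycles through the configured proxies in order.
    """

    def __init__(self, proxies: List[str]) -> None:
        if not proxies:
            raise ValueError("at least one proxy is required")
        self._proxies = list(proxies)
        self._next = 0

    def next_proxy(self) -> str:
        # Hypothetical method name, chosen for this sketch only.
        proxy = self._proxies[self._next]
        self._next = (self._next + 1) % len(self._proxies)
        return proxy


# Hypothetical registry: the string a client sends -> factory for the strategy.
PROXY_STRATEGIES: Dict[str, Callable[[List[str]], RoundRobinProxyStrategy]] = {
    "round_robin": RoundRobinProxyStrategy,
}


def build_proxy_strategy(name: str, proxies: List[str]) -> RoundRobinProxyStrategy:
    """Resolve a strategy name from a request into a strategy instance."""
    try:
        factory = PROXY_STRATEGIES[name]
    except KeyError:
        known = ", ".join(sorted(PROXY_STRATEGIES))
        raise ValueError(
            f"unknown proxy strategy {name!r}; expected one of: {known}"
        ) from None
    return factory(proxies)
```

The same pattern could plausibly cover dispatcher and adapter selection: an unknown name fails with a clear validation error instead of silently falling back to the default.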
Acceptance criteria
1. New/extended endpoints and/or request schemas added
- New endpoints: Add missing API routes (e.g., /adaptive/crawl, /deep-crawl, /c4a-script/compile, /hub/crawlers)
- Extended schemas: Enhance existing endpoints to accept new parameters (e.g., add virtual_scroll_config to /crawl, add table_extraction_strategy options)
- Request schemas: Update schemas.py to include new request models for the missing features
2. Docs and examples updated
- API documentation: Update the docs to show new endpoints and parameters
- Parameter documentation: Add descriptions, examples, and validation rules for new fields
- Examples: Add working code examples showing how to use each new feature.
3. Minimal e2e tests per feature group
- Test coverage: Create integration tests that verify each new feature works end-to-end
- Happy path: Test successful usage of each feature
- Validation: Test error handling (invalid parameters, edge cases, etc.)
- Feature groups: Organize tests by category (adaptive crawling, deep crawling, C4A scripts, etc.)
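To make criteria 1 and 3 more concrete, here is a rough sketch of one extended request schema together with a happy-path test and two validation tests for it. It uses stdlib dataclasses and `unittest` only so that it runs without dependencies; the real schemas.py may well use a different modelling library. The model names and the fields `container_selector`, `scroll_count`, and `wait_after_scroll` are assumptions made for illustration, not the confirmed shape of VirtualScrollConfig or of the /crawl request.

```python
import unittest
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class VirtualScrollRequest:
    """Hypothetical request model; field names are illustrative only."""

    container_selector: str
    scroll_count: int = 10
    wait_after_scroll: float = 0.5

    def __post_init__(self) -> None:
        if not isinstance(self.container_selector, str) or not self.container_selector.strip():
            raise ValueError("container_selector must be a non-empty CSS selector")
        if self.scroll_count < 1:
            raise ValueError("scroll_count must be >= 1")
        if self.wait_after_scroll < 0:
            raise ValueError("wait_after_scroll must be >= 0")


@dataclass
class CrawlRequest:
    """Hypothetical /crawl body extended with an optional virtual scroll section."""

    urls: List[str]
    virtual_scroll_config: Optional[VirtualScrollRequest] = None

    def __post_init__(self) -> None:
        if not self.urls:
            raise ValueError("urls must contain at least one URL")


def parse_crawl_request(payload: Dict[str, Any]) -> CrawlRequest:
    """Turn a decoded JSON body into a validated request, raising ValueError on bad input."""
    raw_scroll = payload.get("virtual_scroll_config")
    scroll = None
    if raw_scroll is not None:
        try:
            scroll = VirtualScrollRequest(**raw_scroll)
        except TypeError as exc:  # unknown/missing fields or a non-mapping value
            raise ValueError(f"invalid virtual_scroll_config: {exc}") from None
    return CrawlRequest(urls=list(payload.get("urls", [])), virtual_scroll_config=scroll)


class VirtualScrollSchemaTests(unittest.TestCase):
    """Feature group: virtual scroll."""

    def test_happy_path(self) -> None:
        req = parse_crawl_request({
            "urls": ["https://example.com"],
            "virtual_scroll_config": {"container_selector": "#feed", "scroll_count": 3},
        })
        self.assertEqual(req.virtual_scroll_config.scroll_count, 3)

    def test_config_is_optional(self) -> None:
        req = parse_crawl_request({"urls": ["https://example.com"]})
        self.assertIsNone(req.virtual_scroll_config)

    def test_rejects_invalid_scroll_count(self) -> None:
        with self.assertRaises(ValueError):
            parse_crawl_request({
                "urls": ["https://example.com"],
                "virtual_scroll_config": {"container_selector": "#feed", "scroll_count": 0},
            })

    def test_rejects_unknown_field(self) -> None:
        with self.assertRaises(ValueError):
            parse_crawl_request({
                "urls": ["https://example.com"],
                "virtual_scroll_config": {"container_selector": "#feed", "bogus": 1},
            })


def run_suite() -> unittest.TestResult:
    """Run this feature group's tests and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(VirtualScrollSchemaTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A real e2e test would send the same payloads to the running Docker API server over HTTP rather than calling the parser directly; the one-test-class-per-feature-group layout would stay the same.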