feat(validator): optimize parallel execution #228
…cheduling

Introduce `execution_type()` method to distinguish CPU-bound and I/O-bound backends for optimal parallel scheduling:
- Add `execution_type()` to `BaseValidatorBackend` (defaults to "cpu").
- Override `execution_type()` in Spectral, Redocly, and SpecLynx backends to return "io".
- Replace `ProcessPoolExecutor` with `ThreadPoolExecutor` for I/O-bound backends.
- Run I/O backends in parallel threads while CPU backends execute sequentially in main thread to avoid GIL contention.
- Update tests to validate GIL-aware scheduling behavior and mixed backend execution.
- Extend test coverage for CPU-only, I/O-only, and mixed backend scenarios.
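A minimal sketch of what the `execution_type()` hook described above might look like. The class bodies, the `validate()` signature, and the `SubprocessBackend` / `PurePythonBackend` names are assumptions for illustration, not the code from this PR; only the method name, the "cpu" default, and the "io" override come from the commit message.

```python
from abc import ABC, abstractmethod
from typing import Literal

# The two tiers named in this commit; "cpu-heavy" arrives in a later commit.
ExecutionType = Literal["cpu", "io"]


class BaseValidatorBackend(ABC):
    """Simplified stand-in for the real base class; only the new hook matters here."""

    @abstractmethod
    def validate(self, document: str) -> list[str]:
        """Return diagnostic messages (hypothetical signature)."""

    def execution_type(self) -> ExecutionType:
        # Default from the commit message: a backend is CPU-bound unless it says otherwise.
        return "cpu"


class SubprocessBackend(BaseValidatorBackend):
    """Hypothetical backend that waits on an external tool, so it is I/O-bound."""

    def validate(self, document: str) -> list[str]:
        return []  # a real backend would invoke its external tool here

    def execution_type(self) -> Literal["io"]:
        return "io"


class PurePythonBackend(BaseValidatorBackend):
    """Hypothetical backend that inherits the "cpu" default."""

    def validate(self, document: str) -> list[str]:
        return []
```

With a hook like this, the orchestrator can group backends by tier without knowing anything else about them.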
Introduce three-tier parallel scheduling to optimize validation performance:
- Add "cpu-heavy" execution type for long-running pure-Python backends that benefit from multi-process parallelism.
- Implement `ProcessPoolExecutor` with spawn context for cpu-heavy backends to achieve true multi-core execution without GIL contention.
- Update `execution_type()` return type annotations to use `Literal` for type safety in I/O backends.
- Refactor orchestrator to schedule io, cpu, and cpu-heavy backends simultaneously using `ExitStack`.
- Extend test coverage with cpu-heavy backend mocks, three-tier execution tests, and diagnostics aggregation validation.
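The three-tier scheduling described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the orchestrator from the PR: `run_three_tier` and the `(execution type, callable)` task shape are hypothetical, and the real code works with backend objects rather than bare callables.

```python
import multiprocessing
from concurrent.futures import Future, ProcessPoolExecutor, ThreadPoolExecutor
from contextlib import ExitStack
from typing import Callable, Sequence

# Hypothetical task shape: (execution type, zero-argument callable returning diagnostics).
Task = tuple[str, Callable[[], list[str]]]


def run_three_tier(tasks: Sequence[Task]) -> list[str]:
    io_tasks = [fn for kind, fn in tasks if kind == "io"]
    heavy_tasks = [fn for kind, fn in tasks if kind == "cpu-heavy"]
    cpu_tasks = [fn for kind, fn in tasks if kind == "cpu"]

    diagnostics: list[str] = []
    futures: list[Future] = []

    # ExitStack lets each pool be created only when its tier is non-empty,
    # while still guaranteeing that every pool that was opened is shut down.
    with ExitStack() as stack:
        if io_tasks:
            threads = stack.enter_context(ThreadPoolExecutor(max_workers=len(io_tasks)))
            futures += [threads.submit(fn) for fn in io_tasks]
        if heavy_tasks:
            # Work sent to a spawn-context pool must be picklable,
            # so these callables have to be importable module-level functions.
            procs = stack.enter_context(
                ProcessPoolExecutor(
                    max_workers=len(heavy_tasks),
                    mp_context=multiprocessing.get_context("spawn"),
                )
            )
            futures += [procs.submit(fn) for fn in heavy_tasks]

        # Both pools are already busy; the plain CPU tier runs in the main thread.
        for fn in cpu_tasks:
            diagnostics.extend(fn())

        # Collect pooled results in submission order.
        for future in futures:
            diagnostics.extend(future.result())

    return diagnostics
```

Submitting the pooled work before running the sequential tier is what makes the three tiers overlap: the main thread does CPU work while the threads and worker processes make progress in the background.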
char0n
left a comment
Looks good to me. That's what we wanted.
Pull request overview
This PR optimizes OpenAPIValidator.validate() parallel scheduling by introducing backend “execution tiers” (I/O vs CPU vs CPU-heavy) so subprocess/network validators can run concurrently in threads while long-running pure-Python validators can use multi-process execution.
Changes:
- Add `execution_type()` to `BaseValidatorBackend` and implement it across built-in backends.
- Refactor the validator orchestrator to schedule I/O backends on a `ThreadPoolExecutor`, CPU-heavy backends on a `ProcessPoolExecutor` (spawn), and CPU backends sequentially.
- Expand parallel-execution tests to cover the three-tier scheduling behavior and mixed backend aggregation.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| packages/jentic-openapi-validator/src/jentic/apitools/openapi/validator/core/openapi_validator.py | Implements three-tier scheduling and executor orchestration. |
| packages/jentic-openapi-validator/src/jentic/apitools/openapi/validator/backends/base.py | Introduces execution_type() on the backend interface. |
| packages/jentic-openapi-validator/src/jentic/apitools/openapi/validator/backends/default/__init__.py | Declares default backend as CPU tier. |
| packages/jentic-openapi-validator/src/jentic/apitools/openapi/validator/backends/openapi_spec.py | Declares spec backend as CPU tier. |
| packages/jentic-openapi-validator-spectral/src/jentic/apitools/openapi/validator/backends/spectral/__init__.py | Declares Spectral backend as I/O tier. |
| packages/jentic-openapi-validator-redocly/src/jentic/apitools/openapi/validator/backends/redocly/__init__.py | Declares Redocly backend as I/O tier. |
| packages/jentic-openapi-validator-speclynx/src/jentic/apitools/openapi/validator/backends/speclynx/__init__.py | Declares SpecLynx backend as I/O tier. |
| packages/jentic-openapi-validator/tests/test_validate_parallel.py | Adds coverage for tiered scheduling, including cpu-heavy process pool behavior. |
…rs param

Cap `ProcessPoolExecutor` `max_workers` to `min(num_backends, cpu_count)` to prevent oversubscription, and expose a separate `max_process_workers` parameter for explicit control. Also add a warning for unknown `execution_type()` values and use `Literal` return type for type safety.
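As a rough illustration of the capping rule and the unknown-tier warning, here is a small sketch. The helper names `default_process_workers` and `normalize_execution_type` are hypothetical, and falling back to "cpu" for an unknown value is an assumption; the commit message only says that a warning is emitted.

```python
import os
import warnings

KNOWN_TYPES = ("io", "cpu", "cpu-heavy")


def default_process_workers(num_backends: int) -> int:
    # os.cpu_count() may return None, so fall back to 1 in that case.
    # The outer max() keeps the result valid even for an empty backend list.
    return max(1, min(num_backends, os.cpu_count() or 1))


def normalize_execution_type(value: str) -> str:
    if value in KNOWN_TYPES:
        return value
    # Assumption: unknown tiers are treated as plain "cpu" after warning.
    warnings.warn(
        f"Unknown execution_type {value!r}; treating backend as 'cpu'",
        stacklevel=2,
    )
    return "cpu"
```

Capping at the backend count avoids starting worker processes that would never receive work, and capping at the CPU count avoids more CPU-bound workers than cores.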
Switch time.time() to time.monotonic() for elapsed-time measurements to avoid flakiness from system clock adjustments. Reduce cpu-heavy test delays (1.5-2.0s → 0.5-0.6s) to cut test runtime roughly in half while still reliably proving concurrency. Guard the fork-context fixture with a try/except so tests skip on platforms without fork support.
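The clock change can be illustrated with a small timing helper. `measure` is a hypothetical name used only for this sketch; the point is that `time.monotonic()` never goes backwards, whereas `time.time()` can jump if the system clock is adjusted mid-test.

```python
import time


def measure(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds) using a monotonic clock."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    elapsed = time.monotonic() - start
    return result, elapsed
```

Only the difference between two `time.monotonic()` readings is meaningful; the absolute value has no defined reference point.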
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.
Validate max_workers and max_process_workers are positive when provided, use explicit `is not None` check instead of truthiness for the process pool fallback, and document thread-safety requirements for I/O backends.
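A short sketch of the two checks mentioned above, using hypothetical helper names (`check_positive`, `pick_process_workers`); the actual parameter handling in the PR may be structured differently.

```python
from typing import Optional


def check_positive(name: str, value: Optional[int]) -> None:
    # None means "not provided" and is allowed; anything else must be > 0.
    if value is not None and value <= 0:
        raise ValueError(f"{name} must be a positive integer, got {value!r}")


def pick_process_workers(max_process_workers: Optional[int], default: int) -> int:
    # `max_process_workers or default` would silently replace an explicit 0
    # with the default; `is not None` only falls back when nothing was passed.
    return max_process_workers if max_process_workers is not None else default
```

Running the validation first means an explicit 0 is rejected with a clear error instead of being quietly swapped for the default.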
… behavior

The mock backends use `time.sleep`, which releases the GIL, so the timing tests prove process-pool dispatch and concurrent scheduling rather than true CPU-bound multi-core parallelism. Updated docstrings to reflect this and reference the structural test that validates `ProcessPoolExecutor` configuration.
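To see why sleep-based mocks cannot demonstrate multi-core parallelism, consider this sketch: the same sleeping tasks overlap even in plain threads, because `time.sleep` releases the GIL while waiting. The names `sleepy_backend` and `run_in_threads` are made up for this example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def sleepy_backend(delay: float) -> float:
    # time.sleep releases the GIL, so other threads run while this one waits.
    time.sleep(delay)
    return delay


def run_in_threads(delays):
    """Run one sleeping task per thread and return (results, wall-clock seconds)."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        results = list(pool.map(sleepy_backend, delays))
    return results, time.monotonic() - start
```

Three 0.2 s sleeps finish in roughly 0.2 s rather than 0.6 s here, with no process pool involved. A timing assertion on such mocks therefore shows only that the work was dispatched concurrently.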
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
…ests

Replace the `use_fork_for_process_pool` fixture with `process_pool_as_thread_pool`, which substitutes `ProcessPoolExecutor` with a `ThreadPoolExecutor` wrapper. This avoids both the spawn-can't-import issue and the fork-in-multithreaded-process deadlock risk that produced DeprecationWarnings in the three-tier tests.
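One way such a wrapper could look is sketched below. `ThreadBackedProcessPool` is a hypothetical name, and the actual fixture in the PR may be written differently; the idea is only that the stand-in accepts `ProcessPoolExecutor`'s constructor arguments and runs the work in threads.

```python
from concurrent.futures import ThreadPoolExecutor


class ThreadBackedProcessPool(ThreadPoolExecutor):
    """Test double: same constructor shape as ProcessPoolExecutor, threads underneath."""

    def __init__(self, max_workers=None, mp_context=None, initializer=None,
                 initargs=(), *, max_tasks_per_child=None):
        # mp_context and max_tasks_per_child have no meaning for threads,
        # so they are accepted for signature compatibility and then ignored.
        super().__init__(
            max_workers=max_workers,
            initializer=initializer,
            initargs=initargs,
        )
```

In a pytest fixture, a class like this could be installed with `monkeypatch.setattr(...)` on the `ProcessPoolExecutor` name in the orchestrator module. Because nothing is pickled or re-imported, mock backends defined inside a test module work without a spawn or fork context.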
includes and replaces #226 (adds cpu-heavy execution tier for long-running backends)
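Introduce `execution_type()` method to distinguish CPU-bound and I/O-bound backends for optimal parallel scheduling:
- Add `execution_type()` to `BaseValidatorBackend` (defaults to "cpu").
- Override `execution_type()` in Spectral, Redocly, and SpecLynx backends to return "io".
- Replace `ProcessPoolExecutor` with `ThreadPoolExecutor` for I/O-bound backends.
- Run I/O backends in parallel threads while CPU backends execute sequentially in main thread to avoid GIL contention.
- Update tests to validate GIL-aware scheduling behavior and mixed backend execution.
- Extend test coverage for CPU-only, I/O-only, and mixed backend scenarios.

Introduce three-tier parallel scheduling to optimize validation performance:
- Add "cpu-heavy" execution type for long-running pure-Python backends that benefit from multi-process parallelism.
- Implement `ProcessPoolExecutor` with spawn context for cpu-heavy backends to achieve true multi-core execution without GIL contention.
- Update `execution_type()` return type annotations to use `Literal` for type safety in I/O backends.
- Refactor orchestrator to schedule io, cpu, and cpu-heavy backends simultaneously using `ExitStack`.
- Extend test coverage with cpu-heavy backend mocks, three-tier execution tests, and diagnostics aggregation validation.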