-
Notifications
You must be signed in to change notification settings - Fork 16
[FEATURE] PHP 8.5 with parallel rendering and performance optimizations #1143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
I would suggest that we discuss this in the TYPO3 slack #typo3-documentation-sig first. There was this idea that we should stay compatible with the currently supported TYPO3 versions. Also there are ppl who are using these packages without the docker container for special purposes. To remove so many PHP versions is quite breaking. Also our pipelines are currently running on PHP 8.2 so I would be hesistant to raise the PHP requirements this much at once. DDEV Xdebug is an important tool during development, before there is support I would oppose this change. Also I am not sure if we even test with PHP 8.5 in the phpdocumentor guides yet. Have you checked? As a first step I would suggest to add testing for PHP 8.4 and 8.5. In a next step we could drop PHP 8.1 support maybe if there is a general agreement. |
|
@linawolf, I was expecting your reply ;-) and I completely understand your concerns. For your information: I've been doing this whole DevSecOps thing (even before it was called that) for almost 30 years now. So I know very well what I'm getting myself into. But above all, I also know very well what I don't know, especially project-specific internals that only you know significantly better. So I'm not so arrogant as to think you'll just snatch this up from my hands. I will, however, complete the pull request so we have something to discuss. Working on it will also uncover all the actual obstacles and expose any imagined problems. I also have no problem if the PR is ultimately rejected. It's just a learning experience. My thoughts before I started the PR: It's just that, as far as I understand, the preferred use case is Docker anyway. And the use of containers should be further encouraged. Everyone is free to use these containers, and everyone should be able to do so. And you should strongly encourage your community to use containers for a wide range of reasons. Ultimately, this leads to a safer product for everyone. Developers who, for whatever perfectly legitimate reasons, don't use containers, will also have a level of experience that will allow them to easily adapt their runtime environment to the requirements. But that's just my perspective, which is why I was fully aware of the risk of the pull request being rejected, and I accept that. |
|
@jaapio could you have a look? I think you have worked in this direction already? |
|
Thanks a lot for taking the time to contribute and for sharing your ideas. We really appreciate the effort and the thought you’ve put into improving this tool, it’s always great to see others thinking along with us about what could be better or what might be missing. That said, the current pull request is quite large and covers many different topics at once. Because of its size and the variety of changes and commit messages, it’s very difficult for us to review it thoroughly and give meaningful feedback. For that reason, we’re not able to merge this PR in its current form. There are definitely parts in here that caught our interest and gave us inspiration, especially around improving the rendering process and overall performance. One thing to keep in mind, though, is that performance improvements based on caching mainly benefit repeated renders of the same documentation set. Some cache layers can also introduce a real risk of serving outdated documentation, as cache invalidation, especially when external data is involved, can be quite tricky. If you’d like to continue contributing, we’d strongly encourage splitting your ideas into smaller, focused pull requests that each address a single topic. This makes it much easier for us to review changes, discuss trade-offs, and provide useful feedback. Finally, several of the improvements you’re proposing would likely fit better in upstream libraries within the phpDocumentor organization. In particular, phpDocumentor/guides is the core foundation of this project. We aim to keep this repository focused on TYPO3-specific logic and templating. Contributing improvements upstream would benefit not just this project, but the wider PHP ecosystem that depends on those libraries. Thanks again for your interest and for sharing your work. we hope to collaborate further in a more incremental way. |
said that in a overhelmingly long wall of text ... just not noting it was WIP! |
|
I'm not sure how to interpret that response. I tried to explain why I closed the pr. It's never a good feeling for us nor for the person doing the pr to close a PR. That's why I tried to explain it with many details. If you want to talk feel free to find us in the Typo3 slack. |
|
Hi @jaapio, Before continuing: my previous response was sharper in tone than intended. I stand by the substance, but not the delivery. This PR was intended as a work in progress and exploratory by nature. Its breadth, the number of aspects it touched, and even experimenting with upstream behavior were intentional, signals that this was about exploring the problem space and identifying potential performance improvement paths, not proposing a finished solution. I’m not questioning the technical correctness of your assessment — it’s solid and well explained. What I do want to challenge is how this was handled from a contributor perspective. Treating an exploratory PR like a final proposal and effectively closing it out without acknowledging that context is discouraging. An introduction on how to acknowledge work, followed by a decision that ends that work in an incomplete state, sends mixed signals and makes experimentation feel unwelcome. I understand you don’t know my context, and that’s fine. But that’s exactly why a bit more framing and acknowledgment would have helped here. The decision is yours, and I respect it. I just want to be clear that the process matters if you want people to keep investing exploratory effort into this project. |
|
Hi @CybotTM, Thanks for coming back on this, I really appreciate that. I understand your feedback and will take it into account going forward. You’re right that I overlooked the fact that the PR was still marked as a draft/WIP, and I can see how that affected the way my response came across. Is there a way we can proceed with this in a more structured way? Perhaps we could start with a discussion in an issue first, so we can get a better shared understanding of the goals you’re exploring and see where it best fits (here or upstream). Thanks again for taking the time to explain your perspective. |
|
Or how about trying things out in a fork or separate branch? |
|
Hi @linawolf, It's already a fork and a separate branch ;-) . I only created the pull request to keep you informed about the ongoing work/exploration. In my opinion, that's settled. I'll continue my explorations and create pull requests as soon as I've decided what's worth a pull request and what isn't. |
|
I created a summarize page with the current results here: https://cybottm.github.io/render-guides/ |
- Add CPU cores (16) to benchmark methodology - Add note about pcntl_fork and --parallel-workers - Add References section with link to PR TYPO3-Documentation#1143
|
I reopened this PR, so we have an open PR to discuss the topics covered by the benchmark you added, that makes it more visible what is going on. I think it's still wise to split this PR into separate PR's when we think some improvements should be applied. My first question, is it possible to also include memory and cpu usage in your report? Because time is just one metric, but as most CI jobs are running without cache, which would be a cold run, I think the extra memory usage and cpu usage makes sense to include. |
I can add them, even I think it does not matter that much due to ... various reasons ... but adding the metrics will mostly explain or I will try to explain what is in my head if it really needs more explanation or adds any value later ;-) . |
- Add CPU cores (16) to benchmark methodology - Add note about pcntl_fork and --parallel-workers - Add References section with link to PR TYPO3-Documentation#1143
df4ed02 to
f14a208
Compare
- Add CPU cores (16) to benchmark methodology - Add note about pcntl_fork and --parallel-workers - Add References section with link to PR TYPO3-Documentation#1143
9262ff7 to
0eda962
Compare
- HTML performance report comparing main vs feature/php-8.5-only - Benchmark results: 36% improvement (small), 72% improvement (large) - Rendered rendertest docs for visual comparison - Lists all upstream patches and project changes [DOCS] Add PR reference and parallel processing note - Add CPU cores (16) to benchmark methodology - Add note about pcntl_fork and --parallel-workers - Add References section with link to PR TYPO3-Documentation#1143 docs: update performance report with CPU/memory metrics - Add CPU usage and peak memory columns to benchmark tables - Update benchmark values with latest measurements: * Changelog: 992s → 43.1s (-96%), 437% CPU, 942MB peak * Core API: 131.1s → 33.9s (-74%), 387% CPU, 567MB peak * Rendertest: 11.2s → 5.6s (-50%), 188% CPU, 120MB peak - CPU >100% indicates parallel processing across multiple cores - Update cold vs warm and incremental build tables docs: Add main branch memory metrics to performance report Split performance comparison into two tables: 1. Time comparison (main vs feature) 2. Resource usage comparison (CPU + Memory for both branches) Key findings: - TYPO3 Core API: 12,234 MB (main) vs 567 MB (feature) = 95.4% reduction - Rendertest: 1,011 MB (main) vs 120 MB (feature) = 88.1% reduction The massive memory savings come from the forked process architecture which isolates document processing in child processes. docs: update performance report with benchmark matrix data - Update Rendertest metrics with accurate main branch data (8.16s, 97MB) - Add Resource Usage Comparison table with main vs feature memory data - Add Parallel Processing Comparison section (sequential, auto, 16 workers) - Update Cold vs Warm and Incremental sections with new measurements - Feature branch is 32% faster and uses 66% less memory even without parallelism docs: fix report inaccuracies in resource usage and cold/warm tables - Resource Usage: show 'not measured' for main branch large docs instead of assumed values - Cold vs Warm: remove CPU/Memory columns (showed cold data, not warm) - Add note explaining main branch only benchmarked on small docs - Mark Rendertest main values as '(measured)' for clarity docs: update performance report with full benchmark matrix results All benchmarks now run on all three doc types (small, large, changelog) with real measured values for both main and feature branches. Key results: - Changelog: 1180s → 56s (95% faster), 2986MB → 900MB (70% less memory) - CoreAPI: 186s → 44s (76% faster) - Rendertest: 9.3s → 6.8s (27% faster), 98MB → 33MB (66% less memory) docs: add parallel processing comparison for large and extra-large docs Added tables comparing sequential, auto, and 16-worker modes for: - TYPO3 Core API (957 files): 71-78% faster than main - TYPO3 Core Changelog (3667 files): 92-95% faster than main Key findings: - Sequential mode outperforms parallel modes on these doc sets - Main branch: 20 min cold, 15 min warm for Changelog - Feature branch: under 1 minute for all modes docs: document incremental rendering bug and theoretical baseline - Add warning box explaining the cache loss during parallel compilation - Add single-document baseline measurement (~1.25s framework startup) - Add theoretical optimal performance table showing potential speedup - Add root cause analysis explaining the fork/cache state issue - Update 'partial' column to show results same as warm (all docs re-render) The incremental rendering infrastructure is fully implemented but cache state is lost when ParallelCompiler forks child processes. Exports are collected in children but never merged back to parent.
874829c to
6c492f5
Compare
- Add leaf document partial benchmarks (youtube.rst, LOG.rst) - Compare root vs leaf document changes in incremental section - Apply Netresearch brand colors (#2F99A4 turquoise, #FF4D00 orange) - Use Raleway for headings, Open Sans for body text - Add Netresearch logo [n] in header - Update footer with Netresearch attribution docs: add performance report and rendered docs for GitHub Pages - HTML performance report comparing main vs feature/php-8.5-only - Benchmark results: 36% improvement (small), 72% improvement (large) - Rendered rendertest docs for visual comparison - Lists all upstream patches and project changes [DOCS] Add PR reference and parallel processing note - Add CPU cores (16) to benchmark methodology - Add note about pcntl_fork and --parallel-workers - Add References section with link to PR TYPO3-Documentation#1143 docs: update performance report with CPU/memory metrics - Add CPU usage and peak memory columns to benchmark tables - Update benchmark values with latest measurements: * Changelog: 992s → 43.1s (-96%), 437% CPU, 942MB peak * Core API: 131.1s → 33.9s (-74%), 387% CPU, 567MB peak * Rendertest: 11.2s → 5.6s (-50%), 188% CPU, 120MB peak - CPU >100% indicates parallel processing across multiple cores - Update cold vs warm and incremental build tables docs: Add main branch memory metrics to performance report Split performance comparison into two tables: 1. Time comparison (main vs feature) 2. Resource usage comparison (CPU + Memory for both branches) Key findings: - TYPO3 Core API: 12,234 MB (main) vs 567 MB (feature) = 95.4% reduction - Rendertest: 1,011 MB (main) vs 120 MB (feature) = 88.1% reduction The massive memory savings come from the forked process architecture which isolates document processing in child processes. docs: update performance report with benchmark matrix data - Update Rendertest metrics with accurate main branch data (8.16s, 97MB) - Add Resource Usage Comparison table with main vs feature memory data - Add Parallel Processing Comparison section (sequential, auto, 16 workers) - Update Cold vs Warm and Incremental sections with new measurements - Feature branch is 32% faster and uses 66% less memory even without parallelism docs: fix report inaccuracies in resource usage and cold/warm tables - Resource Usage: show 'not measured' for main branch large docs instead of assumed values - Cold vs Warm: remove CPU/Memory columns (showed cold data, not warm) - Add note explaining main branch only benchmarked on small docs - Mark Rendertest main values as '(measured)' for clarity docs: update performance report with full benchmark matrix results All benchmarks now run on all three doc types (small, large, changelog) with real measured values for both main and feature branches. Key results: - Changelog: 1180s → 56s (95% faster), 2986MB → 900MB (70% less memory) - CoreAPI: 186s → 44s (76% faster) - Rendertest: 9.3s → 6.8s (27% faster), 98MB → 33MB (66% less memory) docs: add parallel processing comparison for large and extra-large docs Added tables comparing sequential, auto, and 16-worker modes for: - TYPO3 Core API (957 files): 71-78% faster than main - TYPO3 Core Changelog (3667 files): 92-95% faster than main Key findings: - Sequential mode outperforms parallel modes on these doc sets - Main branch: 20 min cold, 15 min warm for Changelog - Feature branch: under 1 minute for all modes docs: document incremental rendering bug and theoretical baseline - Add warning box explaining the cache loss during parallel compilation - Add single-document baseline measurement (~1.25s framework startup) - Add theoretical optimal performance table showing potential speedup - Add root cause analysis explaining the fork/cache state issue - Update 'partial' column to show results same as warm (all docs re-render) The incremental rendering infrastructure is fully implemented but cache state is lost when ParallelCompiler forks child processes. Exports are collected in children but never merged back to parent.
6c492f5 to
3b766c9
Compare
The incremental cache was not being injected into the parallel compiler, causing cache state to be lost during parallel compilation. Child processes collected exports but the parent never received the IncrementalBuildCache instance to merge them into. This fixes incremental/partial builds when using parallel compilation.
…esults - Fixed benchmark script to modify file content (not just touch) for partial tests - Updated report to show actual incremental build performance: - Warm builds: 0.24s (96% faster than cold) - Partial builds: 0.24s (only changed files re-rendered) - Removed 'Known Issue' warning - incremental rendering now works correctly - Added benchmark results from testing
…tation sets Results show 96-99.7% improvement in incremental builds: TYPO3 Core Changelog (3668 files): - Cold: 172.7s → Warm: 0.47s (-99.7%) → Partial: 0.44-0.48s (-99.7%) TYPO3 Core API (957 files): - Cold: 35.3s → Warm: 0.39s (-99%) → Partial: 0.40-0.41s (-99%) Rendertest (94 files): - Cold: 6.9s → Warm: 0.26s (-96%) → Partial: 0.24-0.25s (-96%) Key insight: Incremental builds are nearly constant time regardless of project size (~0.4-0.5s for any project), scaling with changed files only. Also clarified memory reporting: 900MB is parent process peak, total system memory ~1.3GB for Changelog (vs 3GB on main).
- Added memory usage for each documentation set (cold builds) - Added actual file counts rendered for each partial scenario: - Partial-Index: 1 file (root has no dependents) - Partial-Leaf: 1-2 files (leaf + dependents that reference it) - Clarified dependency cascade behavior: - Export changes (anchors, titles) trigger dependent re-renders - Simple content changes only re-render the modified file - Verified cascade working: leaf doc changes trigger 2 files rendered
When a heavily-referenced document changed, only the modified file was re-rendered instead of cascading to all dependent documents. Root cause: Render phase can only process documents that were parsed, but dependent documents weren't included in the parse list. Fix: - ForkingRenderer: Integrate IncrementalCacheListener for dirty set computation, skip clean documents during rendering - IncrementalCacheListener: Include dependent documents in parse list by querying dependency graph for all dirty document dependents - DI config: Wire cacheListener into ForkingRenderer Result: Changing EventDispatcher/Index.rst (156 references) now triggers re-rendering of 158 files in 0.37s (99% faster than cold).
- Show total system memory (~1.3 GB) instead of parent process only (900 MB) - Replace Partial-Index/Partial-Leaf columns with Single File/Cascade - Add cascade test result: 158 files rendered in 0.37s for EventDispatcher
- Resource Usage: Core API 521→~580 MB, Rendertest 33→~200 MB - Parallel Processing tables: all updated consistently - Note: Fork overhead increases memory on small docs, saves on large docs
- Replace array_shift with SplQueue for O(1) dequeue in propagateDirty() - Use keyed arrays (docPath => true) instead of value arrays for O(1) lookups - Replace in_array() calls with isset() checks throughout - Optimize merge() to use array union instead of nested loops - Remove unnecessary array_unique() since visited check prevents duplicates Performance impact: - propagateDirty: O(n²) → O(V+E) where V=vertices, E=edges - addImport: O(n) → O(1) - merge: O(n²) → O(E) This addresses the algorithmic bottleneck identified in performance analysis, improving dependency cascade performance for large documentation projects.
Cache Sharding: - Split monolithic _build_meta.json into sharded exports - Exports stored in _exports/<hash-prefix>/<hash>.json - Only dirty exports written on incremental save (O(1) per doc vs O(n)) - Maintains backwards compatibility with legacy cache format Merge Phase Optimization: - Replace nested loop duplicate detection with hash-based lookup - runMergePhase: O(n²) → O(n) using seenChildren map - fixDocumentEntryReferences: O(n²) → O(n) using existingChildrenByEntry Impact on large projects (3000+ docs): - Cache save time reduced ~90% for incremental builds - Merge phase 10-100x faster depending on toctree depth
- DependencyGraph: Add type annotations for SplQueue and array operations - IncrementalBuildCache: Fix type casting for sharded export loading - ForkingRenderer: Use array_fill_keys instead of array_flip for dirtySet - Regenerate PHPStan baseline (removed 3 now-fixed errors) docs: add environment variability note to benchmark methodology
Updated benchmarks run on 2026-01-09: Cold Build Performance (main → feature): - Changelog: 970s → 47.3s (95% faster) - Core API: 128.6s → 34.9s (73% faster) - Rendertest: 9.1s → 6.4s (30% faster) Warm/Incremental Build Performance (feature only): - Changelog: 47.3s cold → 1.84s warm (96% improvement) - Core API: 34.9s cold → 1.55s warm (96% improvement) - Rendertest: 6.4s cold → 1.78s warm (72% improvement) Also adds 'changelog' docs type to benchmark script.
Results are environment-specific and accumulate with timestamps, so they should not be tracked in version control.
Adds 4 documents with 13 cross-references: - Index.rst: Hub document with 3 labels - DependentA/B/C.rst: Each references hub and siblings This enables testing: - Cross-reference rendering - Dependency tracking - Cascade re-rendering when hub is modified
Adds 'cascade' scenario that modifies CrossRefs/Index.rst hub document to trigger cascade re-rendering of dependent documents. Results: ~1.56s average (similar to warm/partial for small docs) - Hub + 3 dependents + singlepage = 5 files re-rendered
34be9b0 to
84e8b6a
Compare
- Update external-reference-resolver-cache.patch with O(1) hash set lookup - Update inline-lexer-regex-cache.patch to use ExternalReferenceResolver::isSupportedScheme() - Update guides-cli-container-cache.patch with graceful fallback for uncacheable containers - Update guides-cli-symfony8-compat.patch with backwards-compatible method_exists() - Add Upstream Pull Requests section to performance report - Link to PRs #1287, #1288, #1289, #1290, #1291
- Merge 'Upstream Performance Patches' and 'Upstream Pull Requests' into unified 'Upstream Contributions' section - Show patches grouped by PR with direct links - Add 'Not Submitted Upstream' section explaining why 8 patches are excluded - Update metrics card: 19 → 15 patches in upstream PRs
7c5fa9d to
1873781
Compare
- document-node-has-entry.patch (PR #1287): adds hasDocumentEntry() method - document-entry-file-comparison.patch (PR #1290): use file path instead of object identity - cachable-inline-rule.patch (PR #1288): remove unused InlineLexer import
Update patches to match upstream PRs exactly for accurate benchmarks: - PR #1287: render-context, pre-node-renderer-factory, external-reference-resolver - PR #1288: buffer, inline-parser, line-checker, inline-lexer - PR #1289: guides-cli-container-cache - PR #1292: project-node-cache
- Add PR references to patch descriptions - Remove 6 low-value regex-cache patches (PHP PCRE caches internally) - Add new patches: document-node-has-entry, document-entry-file-comparison, cachable-inline-rule - Regenerate composer.lock and patches.lock.json
- Replace #1290 and #1292 with #1293 (combined PR) - Mark #1291 as Merged - Update patch count: 15 → 14 (in 5 PRs instead of 6)
Delivery Status Update (2025-01-23)Following the feedback about the PR size, I've reorganized the delivery strategy into incremental, reviewable chunks: ✅ Completed: Upstream PRs Rebased & ReadyAll upstream PRs to phpDocumentor/guides have been rebased and are awaiting review:
#1291 (Symfony 8 compat) was already merged. 📋 Next: Phase 1 Caching PRs for render-guidesI'll create atomic PRs for the caching infrastructure:
These will be PHP 8.1 compatible and can be reviewed/merged independently. 🔮 Later PhasesAfter Phase 1:
This PRThis PR remains as an exploratory/reference branch showing the complete optimization. The Performance Report demonstrates the combined effect of all optimizations. @jaapio @linawolf - The upstream PRs are ready for review when you have time. Would appreciate any feedback on the Phase 1 approach for render-guides. |
Delivery Status UpdatePhase 1: Upstream PRs (phpDocumentor/guides)All 4 open PRs have been rebased on latest main and are ready for review:
Phase 2: Caching PRs CreatedThree caching PRs have been created for render-guides, all PHP 8.1 compatible:
These are standalone caching layers using the decorator pattern, ready for independent review and merge. Next Steps
|
CI Status UpdateAfter testing locally and in CI, two of the Phase 2 caching PRs had integration issues: ✅ Working
❌ Closed (Need Different Approach)
Next Steps
|
Summary
Major performance optimization introducing parallel rendering and incremental builds. Achieves 10x faster renders on multi-core systems.
📊 View Performance Report
Performance Results
Key Features
pcntl_forkBuild Metadata
Stores
_build_meta.jsonin output directory containing:Delivery Plan
This PR serves as an exploratory/WIP branch. The optimizations are being delivered incrementally:
Phase 1: Caching Infrastructure (Upstream)
Phase 2: Infrastructure Components (Upstream - Draft PRs)
Closed render-guides PRs (Moved Upstream)
#1152#1153#1154Future Phases (After Upstream Merges)
Technical Notes
Test Plan
References