-
Notifications
You must be signed in to change notification settings - Fork 1
50 performance optimizations #52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- Lines 948-960: Replaced heap allocation (zend_vm_stack_new_page) with stack allocation - Line 949: Allocated VM stack memory directly on the C stack using char vm_stack_memory[ZEND_FIBER_VM_STACK_SIZE] - Lines 950-955: Manually initialized the VM stack structure with proper pointers and boundaries - Zero malloc calls: Eliminated the heap allocation for VM stack during coroutine startup Memory Optimization Benefits: 1. Eliminated heap allocation: No more zend_vm_stack_new_page() malloc call 2. Stack-based allocation: VM stack memory is now allocated on the C stack 3. Automatic cleanup: Stack memory is automatically freed when function exits 4. Better performance: Reduced malloc/free overhead and memory fragmentation 5. Cache locality: Stack allocation provides better memory locality Technical Implementation: - Used ZEND_FIBER_VM_STACK_SIZE constant (typically 4KB) for stack size - Properly initialized VM stack structure with top, end, and prev pointers - Maintained compatibility with existing VM stack operations - Preserved all existing functionality while optimizing memory allocation
Changes - Added embedded zend_async_waker_t waker field to async_coroutine_t - Set coroutine->coroutine.waker = &coroutine->waker during initialization - Removed zend_async_waker_new() and zend_async_waker_destroy() calls - No API changes - existing code works unchanged Benefits - Eliminates malloc/free for waker allocation (~16 bytes overhead saved) - Better cache locality and performance - Reduces memory fragmentation Trade-off - Increases coroutine size by ~120 bytes - Worthwhile for frequent waker usage patterns
- Added zend_async_waker_t waker; field to struct _async_coroutine_s - Added descriptive comment explaining the embedded approach 2. Updated initialization (coroutine.c:1123): - Set coroutine->coroutine.waker = &coroutine->waker; to point to embedded waker 3. Removed dynamic allocation: - Eliminated calls to zend_async_waker_new() and zend_async_waker_destroy() - Added comments explaining that waker is embedded and doesn't need separate destruction 4. Optimized field access (throughout coroutine.c): - Replaced all instances of coroutine->coroutine.waker->field with coroutine->waker.field for better performance - Updated in functions like getSuspendLocation(), isQueued(), coroutine_info(), async_coroutine_object_gc(), etc.
…rd declarations - Move all PHP METHOD functions to dedicated section at end of file - Add forward declarations to resolve function dependencies - Maintain embedded waker optimization and direct field access - Improve code organization with clear C API vs PHP method separation
…d recreated on each coroutine switch.
…ansferring the exception from the Waker to the current context. Previously, this was handled by the switching code.
* fix for async_coroutine_cancel + zend_async_waker_define
* fix execute_next_coroutine_from_fiber logic for context switching.
…tion; now the code is required to explicitly clear the WAKER state after a SUSPEND() operation.
…fer control to scheduler coroutine.
…leanup caused by the switch_to_scheduler() function. ZEND_ASYNC_CURRENT_COROUTINE = NULL; — explicit cleanup of the current coroutine.
add ZVAL_UNDEF(&X) after zval_ptr_dtor(&X)
The first VM stack segment is allocated on the fiber’s C stack and becomes invalid once fiber_entry returns. Previously, fiber_context_cleanup() could still access it, leading to Invalid read errors. Now additional heap-allocated stack pages are freed immediately after return_to_main(transfer), and EG(vm_stack) pointers are reset. The first segment is left untouched and is released automatically with the fiber stack.
The first VM stack segment is allocated on the fiber’s C stack and becomes invalid once fiber_entry returns. Previously, fiber_context_cleanup() could still access it, leading to Invalid read errors. Now additional heap-allocated stack pages are freed immediately after return_to_main(transfer), and EG(vm_stack) pointers are reset. The first segment is left untouched and is released automatically with the fiber stack.
…to 50-performance-optimizations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements "50 performance optimizations" for the async PHP extension, focusing on fiber context management and coroutine switching performance improvements. The changes introduce a fiber context pooling system to reduce allocation overhead, optimize circular buffer operations with inline functions for hot paths, and restructure the scheduler for better performance.
Key changes include:
- Fiber context pooling: Implementation of a reusable fiber context pool with
ASYNC_FIBER_POOL_SIZE
configuration - Circular buffer optimizations: Inline specialized versions for pointer operations with bitwise arithmetic
- Scheduler restructuring: Major refactoring of context switching logic and fiber management
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
Show a summary per file
File | Description |
---|---|
scope.c | Replace zend_async_waker_destroy calls with zend_async_waker_clean for optimized cleanup |
scheduler.h | Add fiber context pool configuration and new fiber context creation API |
scheduler.c | Major refactoring with fiber context pooling, optimized switching logic, and reorganized code structure |
run-tests.sh | Add -m flag to test runner for memory leak checking |
php_async.h | Add fiber_context_pool to global state |
libuv_reactor.c | Optimize reactor execution by skipping uv_run() when no handles are active |
internal/circular_buffer.h | Add inline optimized functions for pointer operations with specialized push/pop |
internal/circular_buffer.c | Move circular_buffer_is_not_empty to inline implementation |
docs/context-switching-flow.puml | Add PlantUML documentation for fiber workflow |
coroutine.h | Restructure with new async_fiber_context_t and embedded waker design |
coroutine.c | Complete rewrite with embedded wakers, fiber context management, and reorganized structure |
benchmarks/*.php | Add comprehensive benchmarking suite for performance testing |
async_API.c | Add missing zend_async_waker_clean calls after suspend operations |
async.c | Replace zend_async_waker_destroy with zend_async_waker_clean |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
/* Fast specialized version for pointer push (8 bytes) */ | ||
static zend_always_inline zend_result circular_buffer_push_ptr(circular_buffer_t *buffer, void *ptr) { | ||
// Check if buffer is full using bitwise AND (capacity is power of 2) | ||
if (EXPECTED(((buffer->head + 1) & (buffer->capacity - 1)) != buffer->tail)) { |
Copilot
AI
Sep 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bitwise AND operation assumes buffer capacity is always a power of 2, but there's no validation to ensure this constraint. If capacity is not a power of 2, the modulo operation will be incorrect and could cause buffer corruption.
Copilot uses AI. Check for mistakes.
if (EXPECTED(buffer->head != buffer->tail)) { | ||
// Direct pointer read - no memcpy overhead | ||
*ptr = *(void**)((char*)buffer->data + buffer->tail * sizeof(void*)); | ||
buffer->tail = (buffer->tail + 1) & (buffer->capacity - 1); |
Copilot
AI
Sep 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same issue as above - bitwise AND operation for modulo only works correctly when capacity is a power of 2. This could lead to incorrect tail advancement if capacity is not a power of 2.
buffer->tail = (buffer->tail + 1) & (buffer->capacity - 1); | |
buffer->tail = (buffer->tail + 1) % buffer->capacity; |
Copilot uses AI. Check for mistakes.
|
||
static zend_always_inline void fiber_pool_init(void) | ||
{ | ||
circular_buffer_ctor(&ASYNC_G(fiber_context_pool), ASYNC_FIBER_POOL_SIZE, sizeof(async_fiber_context_t*), NULL); |
Copilot
AI
Sep 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ASYNC_FIBER_POOL_SIZE is defined as 4, which is a power of 2, but the circular buffer inline optimizations in circular_buffer.h assume all capacities are powers of 2. If this size is ever changed to a non-power-of-2 value, the bitwise operations will fail.
Copilot uses AI. Check for mistakes.
{ | ||
circular_buffer_t *buffer = &ASYNC_G(fiber_context_pool); | ||
|
||
if (buffer->capacity > 0 && false == circular_buffer_is_full(buffer)) { |
Copilot
AI
Sep 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The comparison false == circular_buffer_is_full(buffer)
uses Yoda condition style inconsistently with the rest of the codebase. Consider using !circular_buffer_is_full(buffer)
for better readability and consistency.
if (buffer->capacity > 0 && false == circular_buffer_is_full(buffer)) { | |
if (buffer->capacity > 0 && !circular_buffer_is_full(buffer)) { |
Copilot uses AI. Check for mistakes.
ZEND_ASSERT(ZEND_ASYNC_WAKER_NOT_IN_QUEUE(&coroutine->waker) && | ||
"Coroutine waker must be dequeued before destruction"); |
Copilot
AI
Sep 3, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion checks waker status but the embedded waker is initialized in the constructor. If a coroutine is destroyed before being properly dequeued, this assertion will fail. The assertion should handle the case where the waker was never queued.
ZEND_ASSERT(ZEND_ASYNC_WAKER_NOT_IN_QUEUE(&coroutine->waker) && | |
"Coroutine waker must be dequeued before destruction"); | |
ZEND_ASSERT((ZEND_ASYNC_WAKER_NOT_IN_QUEUE(&coroutine->waker) || !coroutine->waker.was_ever_queued) && | |
"Coroutine waker must be dequeued before destruction or never queued"); |
Copilot uses AI. Check for mistakes.
…_FIBER_VM_STACK_SIZE] visibility
JIT PHP in tracing mode can compile functions on demand. When memory runs out, JIT crashes with an error because it tries to compile a closure. This code attempts to work around the issue so that the test runs correctly. So we use: $function = function(bool $out = true) { if($out) echo "Shutdown function called\n"; }; $function(false); register_shutdown_function($function);
No description provided.