Optimize force_publish_missed_schedules and confirm_scheduled_posts queries #376
This update optimizes the queries used by the `force_publish_missed_schedules` and `confirm_scheduled_posts` functions, which are triggered by the `a8c_cron_control_force_publish_missed_schedules` and `a8c_cron_control_confirm_scheduled_posts` internal events, respectively.

In its current form, the query can take several seconds to run on a table with millions of rows. Since it runs every two minutes (every 10 minutes for `confirm_scheduled_posts`), it sometimes pollutes the slow query logs for some customers. The original query looks like this:

When tested on a table with over 19M rows, this query had an estimated cost of `11411403.00` and relied on the `type_status_date` index. However, it didn't fully take advantage of the index's structure. The access type for the main query was `index`, meaning MySQL was scanning the full index, with a filtering efficiency of only `3.33%`. (When MySQL uses `index` as the access type, it reads the entire index sequentially instead of scanning the table's rows directly. This differs from `range` or `ref`, which selectively scan only the parts of an index that match the conditions.) This inefficiency stemmed from applying the `post_status` and `post_date` filters too late, leading to unnecessary overhead and slow performance.

The optimized query introduces a subquery to retrieve distinct `post_type` values:

This result is then joined with the main table (`wp_posts`) to filter rows more effectively. In the test, this reduces the query cost to `1316524.20` (an `88%` improvement). The subquery creates a temporary table with only the distinct `post_type` values, scanning just `3,483` rows. The main query then uses a `ref` access type on the `type_status_date` index, reading only rows where `post_status` and `post_date` match the criteria. This drops the number of rows examined per scan to `3,577` and raises the filtering efficiency to `33.33%`, making the query significantly faster.

This improvement is primarily due to better use of the `type_status_date` index, which includes `post_type`, `post_status`, `post_date`, and `ID`, in that order. Because the original query has no condition on `post_type`, the index's leading column, MySQL cannot seek into the index and falls back to scanning all of it. Joining against the distinct `post_type` values supplies an equality match on that leading column for each type, so the main query avoids reading irrelevant index entries. Additionally, applying the `post_status` filter earlier in the main query leverages the index's second column, enabling MySQL to filter rows more effectively and reduce unnecessary scans.

The result is a query that performs much better on large datasets, cutting resource usage and execution time without changing its functionality. In testing, the execution time for the query on the same 19-million-row table dropped from `4.51s` to just `0.001s`. This makes the functionality more scalable on tables with tens of millions of rows and reduces the load on customer databases.

I also realized that test cases didn't exist for the `force_publish_missed_schedules` and `confirm_scheduled_posts` functions, so I added them as part of this PR.