@duyhuynhdev

  • Update the doubling logic according to the first approach, as the second approach still has corner cases where a single log event can dominate others if it has a large event count.
  • Remove the timer callback and use the profiler flush callback instead. In addition, expose the new maxsample setting to support the flush callback.
  • Update the relevant classes and unit tests to align with the new workflow and doubling logic.
  • Add new unit tests to cover the updated behaviour.
  • Fix CI complaints, including unsorted language files.

@duyhuynhdev duyhuynhdev force-pushed the refine-the-doubling-logic-#418 branch 3 times, most recently from 67e5801 to 70663e9 Compare December 10, 2025 06:52
Contributor

@bwalkerl bwalkerl left a comment

Thanks @duyhuynhdev

This is a big change so we're going to need to test this thoroughly. I've started with comments from a code review, but I haven't tested this personally yet.

My main concerns so far are how this affects task detection and whether we are losing any sample data when we change the structure while combining components.

Comment on lines 140 to 118
// We want to prevent doubling up of processing, so skip if an existing process is still executing.
// The profile logs will be kept and processed the next time.
self::$logs[] = $log;
$this->logcount += $log->count();
// Doubling sampling period if it reaches the limit.
if ($this->logcount >= $this->samplelimit) {
$this->on_reach_limit($manager);
$this->logcount = $this->logcount - $this->samplelimit;
}
if (self::$alreadyprofiling) {
debugging('tool_excimer: starting cron_processor::on_interval when previous call has not yet finished');
if ($isfinal) {
// The final flush call when profiler is destroyed.
if ($log->count() < $this->samplelimit) {
// This should never happen.
debugging('tool_excimer: alreadyprofiling is true during final on_interval.');
}
return;
}
self::$alreadyprofiling = true;
Contributor

@bwalkerl bwalkerl Dec 18, 2025

The logic to prevent doubling up of processing may no longer be needed now that we've switched from timers to flush callbacks (I'm not sure whether profiling is paused during the callbacks). I think it's worth looking into further, because it would be great if we could remove self::$logs entirely.

Author

At this stage, the exact underlying scenario isn’t clear because the documentation doesn’t cover this case. However, it likely happens when maxSamples is set too low, causing callbacks to be triggered repeatedly while processing is still running. In this scenario, self::$logs can be used to store newly arriving samples.

Contributor

@bwalkerl bwalkerl Dec 19, 2025

Previously there were edge cases of the 10s interval overlapping, see #377

With your changes these should be much rarer as in the worst case the number of possible samples in on_interval should be very low. I'm not even sure if the profiler would be taking samples here.

I'm interested in what happens when max samples is set too low. The previous issue this fixed shouldn't be a concern if it only happens with bad config, but is there a chance of endless loops if this is set to something ridiculous like 1?

Author

Yes, it should be possible, because we don’t have any constraint on the maxSamples value. That’s why I introduced the $logs array as a waiting list. New logs from the callback are collected in this array until the current process completes. I’m not sure how often overlaps will occur with the new logic, but $logs won’t impact the existing flow. On the contrary, it helps mitigate overlapping if it does occur.
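
As an illustration of the waiting-list idea discussed above, here is a minimal sketch of a re-entrancy guard that queues batches arriving while an earlier batch is still being processed. The class, method and property names are hypothetical and are not the plugin's real API; the `$during` hook exists only to simulate a nested flush.

```php
<?php
// Hypothetical sketch: names are illustrative, not the plugin's real API.
final class flush_queue_sketch {
    /** @var array[] Batches waiting to be processed. */
    private array $pending = [];
    /** @var bool True while the drain loop is running. */
    private bool $processing = false;
    /** @var int[] Sizes of the processed batches, kept for inspection. */
    public array $processed = [];

    /**
     * Queue a batch, then drain the queue unless a drain is already in progress.
     *
     * @param array $batch Samples delivered by one flush callback.
     * @param callable|null $during Optional hook run once while processing, to simulate a nested flush.
     */
    public function on_flush(array $batch, ?callable $during = null): void {
        $this->pending[] = $batch;
        if ($this->processing) {
            // Re-entrant call: the outer drain loop will pick this batch up.
            return;
        }
        $this->processing = true;
        try {
            while ($this->pending) {
                $next = array_shift($this->pending);
                if ($during !== null) {
                    $hook = $during;
                    $during = null; // Only run the hook once.
                    $hook($this);
                }
                $this->processed[] = count($next);
            }
        } finally {
            $this->processing = false;
        }
    }
}

$queue = new flush_queue_sketch();
// The hook triggers a second flush while the first batch is being processed.
$queue->on_flush([1, 2, 3], function (flush_queue_sketch $q): void {
    $q->on_flush([4]);
});
```

In this sketch the nested batch is processed right after the first one instead of being dropped. Note that the drain loop only terminates if processing a batch does not always produce another batch, which is the endless-loop concern raised for very small maxSamples values.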

public function process($log, manager $manager, bool $isfinal = false) {
// We want to prevent overlapping of processing, so skip if an existing process is still executing.
// The profile logs will be kept and processed the next time.
self::$logs[] = $log;
Contributor

See comments about overlapping processing in cron.


// Doubling sampling period if it reaches the limit.
$this->logcount += $log->count();
if ($this->partialsave && $this->logcount >= $this->samplelimit) {
Contributor

Do we want this applied to non-partial saves too? I believe it was previously applied there as well.

Author

With non-partial saves we only receive the samples once, which means we don't set the callback for them (as before), so the sampling period doubling cannot be applied in this case. However, the merge is still applied when we process the samples.

Contributor

Thanks, that clarifies things and looks correct.

In that case, what happens if partial save is enabled and this is called by the flush callback when the ExcimerProfiler object is destroyed? Restarting it there seems bad.

Author

$this->logcount is used to manage increases to the samplingperiod. Even in the final flush callback, the only affected element is the samplingperiod, so it should not impact any profile, because the samplerate is no longer derived from the samplingperiod.

Contributor

@bwalkerl bwalkerl Dec 24, 2025

I'm less concerned about the profile and more about the harm of restarting the profiler while it's being destroyed. It could be handled gracefully behind the scenes, but we should be able to detect this ourselves and explicitly handle the logic for not restarting it in this case.

Author

Thanks, I separated the on_flush and profile process functions.
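
For readers following this thread, the concern about the final flush can be sketched roughly as follows: the event count is always recorded, but the period is only doubled (and the profiler restarted) when the flush is not the final one. This is a simplified illustration under assumed names (`period_doubler_sketch`, `$isfinal`, `$restarts`), not the code that was actually committed.

```php
<?php
// Hypothetical sketch: names are illustrative, not the plugin's real API.
final class period_doubler_sketch {
    /** @var int Current sampling period, in arbitrary units. */
    public int $period;
    /** @var int How many times the profiler would have been restarted. */
    public int $restarts = 0;
    /** @var int Events counted since the last doubling. */
    private int $count = 0;
    /** @var int Event count at which the period is doubled. */
    private int $limit;

    public function __construct(int $period, int $limit) {
        $this->period = $period;
        $this->limit = $limit;
    }

    /**
     * Record a flushed batch and double the period when the limit is reached.
     *
     * @param int $events Number of events in the flushed log.
     * @param bool $isfinal True when the flush is triggered by the profiler being destroyed.
     */
    public function on_flush(int $events, bool $isfinal): void {
        $this->count += $events;
        if ($isfinal) {
            // The profiler is going away: keep the data, but never restart it here.
            return;
        }
        if ($this->count >= $this->limit) {
            $this->count -= $this->limit;
            $this->period *= 2;
            $this->restarts++; // Stand-in for restarting the profiler with the new period.
        }
    }
}

$doubler = new period_doubler_sketch(10, 100);
$doubler->on_flush(120, false); // Reaches the limit: the period doubles to 20.
$doubler->on_flush(90, true);   // Final flush: over the limit again, but no restart.
```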

Contributor

@bwalkerl bwalkerl left a comment

I have a couple more comments from testing. The transformations of the graphs after merging also don't look right to me (tested with partial saves to see before and after), so I will need to test that more.

Comment on lines 83 to 75
- $manager->get_timer()->setCallback(function () use ($manager) {
+ $manager->get_profiler()->setFlushCallback(function ($log) use ($manager) {
      // Once overlapping has happened once, we prevent all future partial saving.
      if (!$this->hasoverlapped) {
-         $this->process($manager, false);
+         $this->process($log, $manager);
      }
- });
+ }, $this->maxsamples);
Contributor

@bwalkerl bwalkerl Dec 18, 2025

With this change partial save never marks the profile as finished.

I'm also thinking that with this patch it might be OK to re-enable partial saves by default (and lighten the old warnings), as the number of samples won't rise quickly when the DB is having issues, which was the main reason we disabled it.

Author

Yeah, it is a bug. Let me fix it.

Contributor

After the update partial save is no longer saving partial profiles every time there is a flush callback.

Author

I will update it in the next commit.

Contributor

Now it's updating (or rather, will once you fix the typo ')'), but it will save the last profile twice. We want to avoid this redundancy - we shouldn't be adding extra DB updates when they're not needed.

Author

Hey Ben, can you explain more about the typo and saving the last profile twice issue?

$string['field_month'] = 'Month';
$string['field_name'] = 'Name';
$string['field_numsamples'] = 'Number of samples';
$string['field_numsamples_value'] = '{$a->samples} samples ({$a->events} events) @ ~{$a->samplerate}ms';
Contributor

@bwalkerl bwalkerl Dec 18, 2025

We need to make this consistent with the hover display on the graph, which currently only uses 'samples' for events. We probably need to iron out the terminology, since samples will have more meaning than events to most end users.

I also think it would be better to keep the old display without events when the number of events is 0 (old profiles etc.).

Author

The number of events should not be 0. If there is no numevents, it will use numsamples instead.

'events' => number_format($data['numevents'] ?? $data['numsamples'], 0, $decsep, $thousandssep),

Contributor

Old profiles from before the upgrade are showing as 0 in the database for me, but this isn't a big concern as old profiles will be flushed out eventually. The main issue here is making the terminology consistent with the graph.
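
A note on why a stored 0 still shows as 0: PHP's `??` operator only falls back when the left-hand value is missing or null, so a numevents of 0 passes through unchanged. If a fallback for old profiles were wanted, it could look something like the sketch below. The helper name `events_for_display` is hypothetical; the array keys mirror the snippet quoted earlier in this thread.

```php
<?php
/**
 * Hypothetical helper: pick the event count to display for a profile row.
 *
 * @param array $data Profile data with 'numsamples' and, optionally, 'numevents'.
 * @return int The event count, or the sample count when no events were recorded.
 */
function events_for_display(array $data): int {
    // A missing or null numevents becomes 0 here.
    $events = (int) ($data['numevents'] ?? 0);
    // Treat 0 the same as "not recorded" and fall back to the sample count.
    return $events > 0 ? $events : (int) $data['numsamples'];
}

$oldprofile = ['numsamples' => 40, 'numevents' => 0];
// Plain ?? keeps the stored 0 rather than falling back.
$withcoalesce = $oldprofile['numevents'] ?? $oldprofile['numsamples'];
$withfallback = events_for_display($oldprofile);
```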

@duyhuynhdev duyhuynhdev force-pushed the refine-the-doubling-logic-#418 branch from 70663e9 to 6137ab7 Compare December 22, 2025 10:43
@duyhuynhdev duyhuynhdev force-pushed the refine-the-doubling-logic-#418 branch from 6137ab7 to 52cfb71 Compare December 22, 2025 22:33
Contributor

@bwalkerl bwalkerl left a comment

Thanks for the updates @duyhuynhdev

I've left a couple more comments, plus responses to some of your other comments that have actions.

$trace = $this->samples[$i + 1]['trace'];
}
$newsamples[] = [
'eventcount' => ceil(($this->samples[$i]['eventcount'] + $this->samples[$i + 1]['eventcount']) / 2),
Contributor

Is ceiling preferred here? Always rounding up will increase the margin of error. The number of samples won't be perfect unless we allow decimals, but maybe we can introduce some logic here to keep it more balanced.

I also still think we need more comments about what we're doing here.

Author

Sorry. I've updated it to the round() function. I intended to use the round function but somehow used the ceil() function instead.

Contributor

@bwalkerl bwalkerl Dec 28, 2025

Round makes more sense, but it's going to run into the same problem, because any fractional part here will always be exactly .5, which round() also rounds up.

Author

Is rounding to the nearest 0.5 a problem? For example, rounding 3.5 to 4.0 or 3.4 to 3.0 is acceptable to me, because the small difference should not materially affect the analysis.

Contributor

By itself it's not an issue, but since this will always round up, the number of samples is going to be slightly overestimated, which can have flow-on effects on duration estimates.

I'm not sure how much of a problem this is in practice - if there's only one or two with more samples it's not an issue, but if say a tenth of the samples have this then it could have a larger impact.
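
One possible way to keep the merged counts balanced, sketched here purely as an illustration and not as what this PR implements, is to carry the leftover from an odd sum into the next pair instead of rounding every pair up. The function name `merge_pairs_balanced` is hypothetical, and the sketch assumes an even number of plain integer event counts (a trailing unpaired value is ignored).

```php
<?php
/**
 * Hypothetical sketch: average adjacent event counts without a systematic upward bias.
 *
 * @param int[] $counts Event counts of consecutive samples (even length assumed).
 * @return int[] One averaged count per pair.
 */
function merge_pairs_balanced(array $counts): array {
    $merged = [];
    $carry = 0; // 0 or 1 raw event left over from a previous odd sum.
    for ($i = 0; $i + 1 < count($counts); $i += 2) {
        $sum = $counts[$i] + $counts[$i + 1] + $carry;
        $merged[] = intdiv($sum, 2); // Round down...
        $carry = $sum % 2;           // ...and pass the remainder on to the next pair.
    }
    return $merged;
}

$counts = [1, 2, 1, 2, 1, 2, 1, 2]; // 12 events in total, so the merged total should be 6.

// Rounding every pair with round() turns each 1.5 into 2.
$roundedup = [];
for ($i = 0; $i + 1 < count($counts); $i += 2) {
    $roundedup[] = (int) round(($counts[$i] + $counts[$i + 1]) / 2);
}

$balanced = merge_pairs_balanced($counts);
```

With these inputs the per-pair rounding gives a merged total of 8, while the carry-based version gives 6, which is exactly half of the original 12 events.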

@duyhuynhdev duyhuynhdev force-pushed the refine-the-doubling-logic-#418 branch from d246be2 to ecee240 Compare December 29, 2025 23:04