Skip to content

Conversation

@kilograham
Copy link
Contributor

Fix some parts of #1812

The problem with the rewrite of the alarm pool code is that adding an alarm forces an IRQ which cases a SEV. We attempted to fix this in best_effort_wfe_or_timeout by consuming the event then doing a WFE again. This is clearly wrong if the event was set prior to the call.

The best we can do (since we cant tell whether an WFE will block) is to only add the alarm if there isn't an IRQ (we only check this alarm pool alarm) already set to happen on or before the target time (which will cause a SEV). This means that the first call will to this method will always return, but when used in a loop, the second will do the right thing.

This is OK as the prescribed behavior is to use the method in a polling loop

@kilograham kilograham added this to the 2.0.1 milestone Aug 14, 2024
// already. This means that repeated calls to this function with the same timeout will work correctly
// after the first one! This is fine, because we ask callers to use a polling loop on another
// event variable when using this function
if (ta_wakes_up_on_or_before(alarm_pool_get_default()->timer, alarm_pool_get_default()->timer_alarm_num,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the other core (or a preemptive IRQ on the current core) modifies this default timer alarm in the middle of this function call? This call here doesn't look atomic, and the timer may be updated (cancelled, moved further into the future) before the __wfe() just below, which would make the __wfe() wait longer than the requested timeout?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh true; that's what i get for trying to write code with COVID!

Copy link
Contributor Author

@kilograham kilograham Aug 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although actually the circumstances where it could not get woken up are really just the other core canceling an alarm (or adding the first one further out in the future)... i think adding a SEV at the end of the alarm (IRQ) handler should solve that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's what i get for trying to write code with COVID!

Oh no! You should just rest.

although actually the circumstances where it could not get woken up are really just the other core canceling an alarm (or adding the first one further out in the future)... i think adding a SEV at the end of the alarm (IRQ) handler should solve that.

I think the alarm needs to guarantee that, if its IRQ gets enabled then:

  1. when the IRQ fires it will do a __sev()
  2. the alarm time will never be delayed or cancelled, it can only ever be brought forward (earlier) in time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah actually I think I misunderstood what you meant when I read it yesterday… when the irq handler sets a new time, it should never make it later than the current time - it should always slow the previous target to happen first

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when the irq handler sets a new time, it should never make it later than the current time

Yes.

Also, it should never cancel/disable an IRQ if it's already set up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sorry, this is in the context of the alarm for the alarm pool which is never disabled any more - it is doomed to fire every 2^32 us

@lurch
Copy link
Contributor

lurch commented Aug 15, 2024

This is targeting master - should it be targeting develop?

@kilograham kilograham changed the base branch from master to develop August 15, 2024 16:12
@kilograham kilograham self-assigned this Sep 3, 2024
peterharperuk
peterharperuk previously approved these changes Sep 27, 2024
Copy link
Contributor

@peterharperuk peterharperuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote some test code to reproduce the scenario I think this is trying to fix. Without this change I can see the problem happening, a sev gets swallowed by best_effort_wfe_or_timeout.
After these changes I see that best_effort_wfe_or_timeout will return early due to the event.

@kilograham
Copy link
Contributor Author

I should probably merge this into develop, and add another issue for additional tests

@peterharperuk
Copy link
Contributor

I can add a test

Copy link
Contributor

@peterharperuk peterharperuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm very happy with this :)

@kilograham kilograham merged commit bd5523c into develop Sep 30, 2024
2 checks passed
@kilograham kilograham deleted the best_effort_wfe branch September 30, 2024 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants