Automatic poll tick estimation (WIP)#232

Open
nelsonje wants to merge 1 commit into master from nelsonje/scalingFix2

@nelsonje
Member

My previous scaling fix didn't work well on Cray XC machines, apparently because the optimal poll tick value there was much lower than on our IB clusters. This is an attempt to set the poll tick value dynamically.

The idea behind this change is that a Grappa program's execution should balance polling the network and deaggregating messages against doing work and aggregating new messages. With the old fixed poll_ticks setting, we specified only the rate at which to poll. As job sizes grow, each poll takes longer, so a fixed poll rate leaves less and less time for non-polling work. This change tries to keep the balance between polling and non-polling work constant. A new flag, FLAGS_poll_factor, controls this balance.
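
As a rough illustration of the argument above (a sketch under assumed notation, not something taken from the patch): let $c(n)$ be the time one poll takes on $n$ nodes, $T$ the fixed poll period of the old scheme, and $f$ the value of the new flag. Assuming the old period runs from the start of one poll to the start of the next, and the new delay is counted from the end of a poll, the fraction of time left for non-polling work is roughly:

$$
\text{old: } \frac{T - c(n)}{T} = 1 - \frac{c(n)}{T},
\qquad
\text{new: } \frac{f\,c(n)}{c(n) + f\,c(n)} = \frac{f}{1 + f}
$$

Under these assumptions the old fraction shrinks as $c(n)$ grows with job size, while the new one does not depend on $n$ at all. A hypothetical $f = 4$ would leave about 80% of the time for non-polling work.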

It does a few things:

  • Reverts the previous poll_ticks estimation mechanism.
  • Makes sure polling cannot monopolize all scheduling decisions when the poll_ticks value is too small (which leads to livelock), by ensuring that some non-polling thread gets a chance to run after the polling thread runs.
  • Measures how long each polling worker execution takes, then schedules the next polling thread execution FLAGS_poll_factor * that time later. (Note that if there isn't that much work to do, the idle thread will run and poll as well.)
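
The last two points can be sketched roughly as follows. This is a minimal, standalone illustration and not the code in this patch: `PollPacer` and its members are hypothetical names, `poll_factor` stands in for FLAGS_poll_factor, time is modeled as a plain tick counter, and the delay is assumed to be counted from the end of the poll.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pacing helper; none of these names come from the Grappa source.
struct PollPacer {
    double poll_factor;                 // stands in for FLAGS_poll_factor
    std::uint64_t next_poll_tick;       // earliest tick at which polling may run again
    bool worker_ran_since_poll;         // livelock guard: has non-polling work run?

    explicit PollPacer(double factor)
        : poll_factor(factor), next_poll_tick(0), worker_ran_since_poll(true) {}

    // Called when the polling worker finishes a run that spanned [start, end].
    // The next poll is scheduled poll_factor * (measured duration) after `end`.
    void poll_finished(std::uint64_t start, std::uint64_t end) {
        const std::uint64_t duration = end - start;
        next_poll_tick = end + static_cast<std::uint64_t>(poll_factor * duration);
        worker_ran_since_poll = false;
    }

    // Called whenever some non-polling thread has been given a chance to run.
    void worker_finished() { worker_ran_since_poll = true; }

    // The polling worker may run only if its delay has elapsed AND a
    // non-polling thread has run since the previous poll.
    bool should_poll(std::uint64_t now) const {
        return worker_ran_since_poll && now >= next_poll_tick;
    }
};
```

With a hypothetical factor of 4, a poll that ran from tick 100 to tick 150 would push the next poll out to tick 350, and the poller would still be held back past that point until some other thread had run. The idle-thread polling mentioned in the note above is deliberately left out of this sketch.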

@bmyerz
Member

bmyerz commented Oct 13, 2015

so it appears the two main changes are:

  • poll ticks start at end of polling worker
  • catchall case to make sure polling worker doesn't run twice

@nelsonje nelsonje changed the title Automatic poll tick estimation Automatic poll tick estimation (WIP) Oct 13, 2015