Summary:
Pull Request resolved: #3428
This diff adds the basic building blocks for a zero-overhead RecMetrics implementation. Follow-up patches will integrate it with users of torchrec.
One of the main pain points of using RecMetricModule is that metric updates and computes are done synchronously. In training jobs, there have been cases where metric updates take more than 20% of a training iteration. Metric computations, although less frequent, can take more than a couple of seconds.
CPUOffloadedRecMetricModule aims to perform all metric updates/computes asynchronously, completely removing them from the critical path.
This patch adds:
- CPUOffloadedRecMetricModule: RecMetricModule that offloads metric update() and compute() to CPU using background threads and dual queues.
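
The background-thread, dual-queue idea can be illustrated with a minimal pure-Python sketch. This is not the torchrec implementation: the class name OffloadedMeanMetric, its methods, and the way compute requests are routed through the update queue are all assumptions made for illustration, and the device-to-CPU transfer of model outputs is left out entirely.

```python
import queue
import threading
from concurrent.futures import Future


class OffloadedMeanMetric:
    """Hypothetical stand-in for an offloaded metric; not a torchrec class."""

    _STOP = object()  # sentinel used to shut both worker threads down

    def __init__(self):
        self._total = 0.0
        self._count = 0
        # Queue 1: pending updates (and compute requests, to preserve ordering).
        self._update_queue = queue.Queue()
        # Queue 2: state snapshots waiting for the (possibly slow) compute step.
        self._compute_queue = queue.Queue()
        self._update_thread = threading.Thread(target=self._update_loop, daemon=True)
        self._compute_thread = threading.Thread(target=self._compute_loop, daemon=True)
        self._update_thread.start()
        self._compute_thread.start()

    def update(self, values):
        # Returns immediately; the caller (the training loop) never blocks here.
        self._update_queue.put(list(values))

    def compute(self):
        # Returns a Future; the result is produced off the critical path.
        future = Future()
        self._update_queue.put(future)
        return future

    def _update_loop(self):
        # Only this thread mutates the metric state, so no lock is needed.
        while True:
            item = self._update_queue.get()
            if item is self._STOP:
                self._compute_queue.put(self._STOP)
                return
            if isinstance(item, Future):
                # Snapshot the state so later updates cannot affect this compute.
                self._compute_queue.put((self._total, self._count, item))
            else:
                self._total += sum(item)
                self._count += len(item)

    def _compute_loop(self):
        while True:
            item = self._compute_queue.get()
            if item is self._STOP:
                return
            total, count, future = item
            future.set_result(total / count if count else 0.0)

    def shutdown(self):
        self._update_queue.put(self._STOP)
        self._update_thread.join()
        self._compute_thread.join()


metric = OffloadedMeanMetric()
metric.update([1.0, 2.0, 3.0])  # enqueued, processed in the background
metric.update([5.0])
pending = metric.compute()      # does not block
result = pending.result(timeout=5)  # block only when the value is needed
metric.shutdown()
```

In this sketch, sending compute requests through the update queue guarantees that a compute sees every update enqueued before it, while the second queue keeps a slow compute from delaying later updates.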
Reviewed By: iamzainhuda
Differential Revision: D83773529
fbshipit-source-id: bfe4517b51fd693f9a891d13075663ec123b6ff4