Commit f44611b

jeffkbkim authored and meta-codesync[bot] committed
3/N: CPUOffloadedRecMetricModule (#3428)
Summary:
Pull Request resolved: #3428

This diff adds the basic building blocks for a zero-overhead RecMetrics implementation. Follow-up patches will contain the integration with users of torchrec.

One of the main pain points of using RecMetricModule is that metric updates and computes are done synchronously. In training jobs, there have been cases where metric updates take more than 20% of a training iteration. Metric computations, although less frequent, can take more than a couple of seconds.

CPUOffloadedRecMetricModule aims to perform all metric updates and computes asynchronously, completely removing them from the critical path.

This patch adds:
- CPUOffloadedRecMetricModule: a RecMetricModule that offloads metric update() and compute() to CPU using background threads and dual queues.

Reviewed By: iamzainhuda

Differential Revision: D83773529

fbshipit-source-id: bfe4517b51fd693f9a891d13075663ec123b6ff4
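For readers unfamiliar with the pattern, the "background threads and dual queues" idea can be sketched roughly as follows. This is a toy illustration only, not the torchrec implementation: the class `OffloadedMeanMetric`, its running-mean state, and the choice to route compute requests through the update queue (so that a compute sees exactly the updates enqueued before it) are all assumptions made for the example. It uses only the Python standard library and no tensors.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Sequence


class OffloadedMeanMetric:
    """Hypothetical toy metric: a running mean maintained off the caller's thread."""

    _STOP = object()  # sentinel used to shut both worker threads down

    def __init__(self) -> None:
        # Queue 1 carries update payloads and compute markers, in call order.
        self._update_queue: queue.Queue = queue.Queue()
        # Queue 2 carries state snapshots waiting to be turned into results.
        self._compute_queue: queue.Queue = queue.Queue()
        self._total = 0.0
        self._count = 0
        self._update_thread = threading.Thread(target=self._update_loop, daemon=True)
        self._compute_thread = threading.Thread(target=self._compute_loop, daemon=True)
        self._update_thread.start()
        self._compute_thread.start()

    def update(self, values: Sequence[float]) -> None:
        """Enqueue a batch and return immediately; no metric work happens here."""
        self._update_queue.put(("update", list(values)))

    def compute(self) -> Future:
        """Enqueue a compute request; the caller may wait on the returned Future."""
        future: Future = Future()
        self._update_queue.put(("compute", future))
        return future

    def _update_loop(self) -> None:
        while True:
            item = self._update_queue.get()
            if item is self._STOP:
                self._compute_queue.put(self._STOP)
                return
            kind, payload = item
            if kind == "update":
                self._total += sum(payload)
                self._count += len(payload)
            else:
                # Snapshot the state so later updates cannot affect this compute.
                self._compute_queue.put((payload, self._total, self._count))

    def _compute_loop(self) -> None:
        while True:
            item = self._compute_queue.get()
            if item is self._STOP:
                return
            future, total, count = item
            future.set_result(total / count if count else 0.0)

    def shutdown(self) -> None:
        """Drain both queues and stop the worker threads."""
        self._update_queue.put(self._STOP)
        self._update_thread.join()
        self._compute_thread.join()


metric = OffloadedMeanMetric()
metric.update([1.0, 2.0, 3.0])
result = metric.compute()   # returns at once with a Future
metric.update([100.0])      # enqueued after the compute marker
print(result.result())      # 2.0
metric.shutdown()
```

In this sketch the calling thread only pays for a queue `put`, which is the sense in which the work leaves the critical path. A real module would additionally need to copy tensors from device to host before enqueueing them, which this example does not model.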
1 parent 1acdd13 commit f44611b

2 files changed (+1156, -0 lines)

