I would like to propose incorporating an essential evaluation metric for 3D talking heads into the TorchMetrics library: Upper Face Dynamic Deviation (FDD).
Motivation
Apart from LVE (Lip Vertex Error), TorchMetrics currently offers no dedicated metrics for evaluating 3D talking heads. I think this metric would also fit in the library's multimodal folder.
Pitch
This metric is widely used in speech-driven facial animation research. It measures how the facial dynamics of generated motion sequences vary compared with the ground truth. Specifically, it indicates how close the standard deviation of upper-face motion in sequences generated from test-set audio is to the variation observed in the corresponding ground-truth sequences.
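A minimal sketch of one common formulation follows. For each upper-face vertex v in the set S_U, take the L2 norm of its displacement from a neutral template in every frame, compute the standard deviation of that norm over time, and average over S_U. FDD is then the ground-truth value minus the predicted value: FDD = (1/|S_U|) Σ_{v∈S_U} [dev(‖δ_v^GT‖) − dev(‖δ_v^pred‖)]. The function name, the `(T, V, 3)` input layout, the `template` argument, the `upper_face_idx` vertex mask, the use of population standard deviation, and the sign convention are all assumptions for illustration. This is not an existing TorchMetrics API.

```python
import torch


def upper_face_dynamics_deviation(
    preds: torch.Tensor,
    target: torch.Tensor,
    template: torch.Tensor,
    upper_face_idx: torch.Tensor,
) -> torch.Tensor:
    """Hypothetical FDD sketch for a single motion sequence.

    preds, target: (T, V, 3) vertex positions for T frames and V vertices.
    template: (V, 3) neutral face mesh; motion is assumed to be measured
        as displacement from this template.
    upper_face_idx: (K,) indices of upper-face vertices (assumed mask).
    Returns a scalar: mean temporal std of the ground truth minus that of
    the prediction (sign convention is an assumption).
    """
    if preds.shape != target.shape:
        raise ValueError("preds and target must have the same shape")

    def mean_temporal_std(seq: torch.Tensor) -> torch.Tensor:
        # Displacement of each upper-face vertex from the template: (T, K, 3)
        motion = seq[:, upper_face_idx] - template[upper_face_idx]
        # Per-frame L2 norm of each vertex displacement: (T, K)
        dist = motion.norm(dim=-1)
        # Population standard deviation over the time axis: (K,)
        std = ((dist - dist.mean(dim=0)) ** 2).mean(dim=0).sqrt()
        # Average over the upper-face vertices: scalar
        return std.mean()

    return mean_temporal_std(target) - mean_temporal_std(preds)


if __name__ == "__main__":
    T, V = 4, 3
    template = torch.zeros(V, 3)
    target = torch.zeros(T, V, 3)
    target[1::2, 0, 0] = 1.0  # vertex 0 alternates between 0 and 1 along x
    preds = torch.zeros(T, V, 3)  # a completely static prediction
    idx = torch.tensor([0])
    print(upper_face_dynamics_deviation(preds, target, template, idx))
```

In the usage example, the static prediction has no upper-face variation while the ground truth does, so the result is positive. For a full test set, one would presumably average the per-sequence values, which a TorchMetrics `Metric` could accumulate in its state across `update` calls.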