One other idea to consider, which could be quite powerful: is it possible to extract facial expressions to drive an avatar? Look at the Face Mesh example in Google's MediaPipe.
Extract a fixed list of facial expressions (happy, sad, curious, surprised, etc.) and map each one to a predefined face animation. Here is a three.js example to use as a starting point: https://threejs.org/examples/#webgl_animation_skinning_morph
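A rough sketch of the expression-to-animation mapping, assuming we get per-frame blendshape scores in [0, 1] keyed by name. MediaPipe's newer Face Landmarker can output blendshapes; the legacy Face Mesh returns only landmarks, so those scores would have to be derived first. The blendshape names, thresholds, rules and morph target names below are all hypothetical placeholders, not values from MediaPipe or the three.js example. The output is shaped like three.js's `morphTargetInfluences` array, indexed via the mesh's `morphTargetDictionary`.

```typescript
// The fixed list of expressions from the proposal, plus a neutral fallback.
type Expression = "neutral" | "happy" | "sad" | "surprised" | "curious";

// Hypothetical per-frame input: blendshape name -> score in [0, 1].
// Names follow ARKit-style naming (e.g. "mouthSmileLeft"); treat them as assumptions.
type BlendshapeScores = Record<string, number>;

function score(s: BlendshapeScores, name: string): number {
  return s[name] ?? 0; // missing blendshapes count as 0
}

// Average a Left/Right pair so slightly asymmetric faces still register.
function pair(s: BlendshapeScores, base: string): number {
  return (score(s, base + "Left") + score(s, base + "Right")) / 2;
}

// Simple rule-based classifier: compute a strength per expression, keep the
// strongest, and return "neutral" if none exceeds the (hypothetical) threshold.
function classifyExpression(s: BlendshapeScores, threshold = 0.3): Expression {
  const candidates: [Expression, number][] = [
    ["happy", pair(s, "mouthSmile")],
    ["sad", (pair(s, "mouthFrown") + score(s, "browInnerUp")) / 2],
    ["surprised", (score(s, "jawOpen") + pair(s, "eyeWide")) / 2],
    // One raised eyebrow as a crude stand-in for "curious".
    ["curious", Math.abs(score(s, "browOuterUpLeft") - score(s, "browOuterUpRight"))],
  ];
  let best: Expression = "neutral";
  let bestValue = threshold;
  for (const [label, value] of candidates) {
    if (value > bestValue) {
      best = label;
      bestValue = value;
    }
  }
  return best;
}

// Hypothetical morph target names; check the avatar's morphTargetDictionary for real ones.
const EXPRESSION_TO_MORPH: Record<Expression, string | null> = {
  neutral: null,
  happy: "Happy",
  sad: "Sad",
  surprised: "Surprised",
  curious: "Curious",
};

// Build a morphTargetInfluences-style array: 1 for the chosen target, 0 elsewhere.
function influencesFor(expr: Expression, dict: Record<string, number>, count: number): number[] {
  const out = new Array<number>(count).fill(0);
  const name = EXPRESSION_TO_MORPH[expr];
  if (name !== null && name in dict) out[dict[name]] = 1;
  return out;
}

// Exponential smoothing toward the target so the face blends instead of snapping.
function smoothToward(current: number[], target: number[], alpha = 0.2): number[] {
  return current.map((c, i) => c + ((target[i] ?? 0) - c) * alpha);
}
```

In a three.js render loop, you would probably call `influencesFor(classifyExpression(scores), mesh.morphTargetDictionary, mesh.morphTargetInfluences.length)`, pass the result through `smoothToward`, and copy it back into `mesh.morphTargetInfluences`. A threshold-based classifier is only a starting point. A small trained classifier over the blendshape vector could replace `classifyExpression` without changing the mapping step.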