Can be done analogously to Enzyme.active_reg_inner. This means type inference will return a constant type here, which should make traced_type and everything built on it much faster and inferable. Moreover, we can leverage a global cache for all type queries. The cache will be invalidated whenever a new method of traced_type_inner is defined.
This should speed up all calls to traced_type.
It does require traced_type to be computable purely as a function of its type arguments, not of runtime values.
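
To make the caching idea concrete, here is a rough sketch of a type-keyed global cache. It is not the Enzyme.active_reg_inner mechanism and does not by itself make inference return a constant; it is a plain runtime memoization with manual invalidation. Every name in it (ToyTraced, toy_traced_type_inner, toy_traced_type, TOY_CACHE, invalidate_toy_cache!) is hypothetical and stands in for the real traced_type machinery.

```julia
# Hypothetical stand-in for a traced wrapper type.
struct ToyTraced{T} end

# Hypothetical stand-in for traced_type_inner. It depends only on its
# type argument, which is the requirement stated above.
toy_traced_type_inner(::Type{T}) where {T<:Number} = ToyTraced{T}
toy_traced_type_inner(::Type{Vector{T}}) where {T} = Vector{toy_traced_type_inner(T)}
toy_traced_type_inner(::Type{T}) where {T} = T  # fallback: leave the type unchanged

# Global cache shared by all type queries, guarded by a lock.
const TOY_CACHE = Dict{Type,Type}()
const TOY_CACHE_LOCK = ReentrantLock()

# Cached entry point: compute once per input type, then reuse the result.
function toy_traced_type(::Type{T}) where {T}
    lock(TOY_CACHE_LOCK) do
        get!(() -> toy_traced_type_inner(T), TOY_CACHE, T)
    end
end

# Manual invalidation (an assumption of this sketch): call this after
# defining a new toy_traced_type_inner method so stale entries are dropped.
invalidate_toy_cache!() = lock(() -> empty!(TOY_CACHE), TOY_CACHE_LOCK)
```

In the actual proposal the invalidation would be tied to method definitions of traced_type_inner rather than triggered by hand, and the lookup would be resolved at compile time rather than through a dictionary at runtime.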
@avik-pal do we need the sharded value for traced_type_inner or does the sharded type suffice?