Will you consider supporting video input?

For some inputs, such as a suitcase (e.g., a random product image picked from Amazon), the method does not seem to generate the prismatic part in the final URDF, so the output is not actually sim-ready. Would it help if the method could also take human video as input, as previous SOTA methods do?
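For anyone who wants to check their own output, here is a minimal sketch of how one might confirm whether a generated URDF contains a prismatic joint. It relies only on the URDF convention that each `<joint>` element carries a `type` attribute. The `EXAMPLE_URDF` string and the names `joint_types` and `has_prismatic` are hypothetical and used for illustration only; they are not output or API from this project.

```python
import xml.etree.ElementTree as ET


def joint_types(urdf_text):
    """Return a {joint_name: joint_type} mapping for a URDF string."""
    root = ET.fromstring(urdf_text)
    # Only look at <joint> elements that are direct children of <robot>,
    # so <joint> references nested inside <transmission> are ignored.
    return {j.get("name"): j.get("type") for j in root.findall("joint")}


def has_prismatic(urdf_text):
    """True if at least one joint in the URDF is of type 'prismatic'."""
    return "prismatic" in joint_types(urdf_text).values()


# Hypothetical URDF for illustration: a suitcase with a hinged lid only.
EXAMPLE_URDF = """<robot name="suitcase">
  <link name="body"/>
  <link name="lid"/>
  <joint name="lid_hinge" type="revolute">
    <parent link="body"/>
    <child link="lid"/>
  </joint>
</robot>"""

if __name__ == "__main__":
    print(joint_types(EXAMPLE_URDF))
    print("has prismatic joint:", has_prismatic(EXAMPLE_URDF))
```

To run this on a file instead, read its contents into a string first and pass that to `has_prismatic`.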