It seems that in some cases, such as a suitcase (e.g., a randomly picked product image from Amazon), the method cannot generate the prismatic part in the final URDF, so the output is not actually sim-ready. Would it help if the method could take human video as input, as done by the previous SOTA?