The current mlua thread API seems to push values onto the main Lua stack first and then use lua_xmove to move them from the main stack to the thread's state.
Would it not be more efficient to modify the IntoLua trait to accept an additional lua_State parameter, so that values are pushed directly onto the thread's stack? That would avoid the intermediate push onto the main stack and the subsequent lua_xmove, reducing the number of stack operations mlua performs.
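
To make the question concrete, here is a minimal toy model of the two strategies. It does not use mlua or the Lua C API at all: `ToyState`, `IntoToyLua`, `push_via_main`, and `push_direct` are hypothetical names invented for illustration, and counting one "write" per stack slot filled is my own simplification of the real cost. It only shows the shape of the proposed change, where the conversion trait receives the target state as a parameter.

```rust
// Toy model only: none of these types are mlua's API. They imitate two Lua
// stacks so the two strategies can be compared by counting slot writes.

#[derive(Debug, Clone, PartialEq)]
enum Value {
    Integer(i64),
    Str(String),
}

/// Stand-in for a lua_State: a value stack plus a counter of slot writes.
#[derive(Default)]
struct ToyState {
    stack: Vec<Value>,
    writes: usize,
}

impl ToyState {
    fn push(&mut self, v: Value) {
        self.stack.push(v);
        self.writes += 1;
    }
}

/// Rough analogue of lua_xmove(from, to, n): removes the top `n` values from
/// `from` and appends them to `to` in the same order.
/// Panics if `from` holds fewer than `n` values.
fn xmove(from: &mut ToyState, to: &mut ToyState, n: usize) {
    let start = from.stack.len() - n;
    to.stack.extend(from.stack.drain(start..));
    to.writes += n; // each moved value is written once into the target
}

/// Hypothetical conversion trait that takes the target state explicitly,
/// which is the shape of the change proposed in the question.
trait IntoToyLua {
    fn push_into(self, state: &mut ToyState);
}

impl IntoToyLua for Value {
    fn push_into(self, state: &mut ToyState) {
        state.push(self);
    }
}

impl IntoToyLua for i64 {
    fn push_into(self, state: &mut ToyState) {
        state.push(Value::Integer(self));
    }
}

/// Strategy described in the question: push onto main, then xmove to the thread.
fn push_via_main<T: IntoToyLua>(main: &mut ToyState, thread: &mut ToyState, args: Vec<T>) {
    let n = args.len();
    for arg in args {
        arg.push_into(main);
    }
    xmove(main, thread, n);
}

/// Proposed strategy: push each value straight onto the thread's stack.
fn push_direct<T: IntoToyLua>(thread: &mut ToyState, args: Vec<T>) {
    for arg in args {
        arg.push_into(thread);
    }
}
```

In this model, passing three arguments through `push_via_main` performs six slot writes in total (three on the main stack, three on the thread), while `push_direct` performs three, and both leave the thread's stack in the same state. Whether the difference is measurable in mlua itself is a separate question that this sketch does not answer.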