Hi! I'm a student learning CS285 online. Thank you for your great and generous work!
While doing homework1 and running the same code on two different machines, one Linux and one Windows, I got two different actor results (but the expert results are the same).
After looking into the details: because of the fixed random seed, the data batches used to update the parameters in every training iteration are exactly the same on the two machines. Differences start to show up after even the first gradient descent step.
So my question is just whether the differences come from machine-specific behavior or whether there's some other reason. What do you guys think?
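For context, here is a tiny pure-Python sketch of what I suspect is going on (just my guess, not confirmed): floating-point addition is not associative, so if the two platforms' math libraries or BLAS builds reduce sums in different orders, the very first gradient step can already differ by a few ULPs, and that tiny discrepancy then compounds over training.

```python
# Floating-point addition is not associative: the same sum evaluated in a
# different order can round differently. Different platforms (BLAS builds,
# CPU instruction sets, CPU vs GPU kernels) may use different reduction
# orders, which is enough to make two "identical" gradient steps diverge.
a = (0.1 + 0.2) + 0.3   # 0.6000000000000001
b = 0.1 + (0.2 + 0.3)   # 0.6
print(a == b)           # False
print(a - b)            # ~1.1e-16

# A per-step discrepancy this small is harmless on its own, but it can
# compound over thousands of parameter updates into visibly different actors.
```

If this is the cause, the results on each machine individually should still be reproducible run-to-run, just not bit-identical across the two platforms.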