Skip to content

TiPI fixes#29

Open
scheunemann wants to merge 12 commits intogeorgmartius:masterfrom
scheunemann:TiPI-fixes
Open

TiPI fixes#29
scheunemann wants to merge 12 commits intogeorgmartius:masterfrom
scheunemann:TiPI-fixes

Conversation

@scheunemann
Copy link
Contributor

@scheunemann scheunemann commented Nov 8, 2018

I think the update order of the a_buffer is off. For testing, I added ode_robots/simulations/humanoid-TiPI as from here and plugged in the pimax from selforg/controller.

Always trigger learn() (minor)

I think it is beneficial that step() always enters learn() for ensuring that buffers like, e.g.,L_buffer get filled properly. Otherwise the data contained is not correct when swapping between eps[C|A] == 0 and eps[C|A] > 0 during runtime (I am actually doing this in my experiments). Done here.

Filling a_buffer

I think after reading an adding sensor values to the buffer s_buffer[t], learning should be triggered (learn()) and only then $a_t$ should be stored in the respective buffer. I added an assert here which fails in the original version.

Considering the original equation before learn(), e.g.,Matrix a = (C*(s_smooth) + h).map(g); with time indices, the value computed there was a_t = tanh(C_{t-1} * s_t + h_{t-1}) putting it after learn() before increasing $t$ makes it $a_t = tanh(C_t * s_t + h_t)$ and $a_{t-1} = tanh(C_{t-1} * s_{t-1} + h_{t-1})$ for the next step. The assert won't fail.

Caclulating the sum (A20/28)

For $\tau>2$ I think there are two issues related to the the sum (A20) or (28) in the special one-layer NN case. Firstly, the sum is meant to calculate $\Delta C$ but with adding $C_{t-1}$ what gets calculated is $C_t$. This $C_t$ then gets passed on to the calculation for, e.g., (t-2). I changed that in (694ab54) and left a comment.

$L(t-l)$ is calculated with $\frac{\partial \phi(s_{t-l}, a_{t-1})}{\partial s_{t-l}}$ and thus dependend on $C_{t-l}$. That is also how $\delta u$, which is dependent on $L^{(l-1)}(t-l)$, is calculated.
I adapted the sum so $\gamma$ being dependend on $C_{t-l}$.

it's a bit tough to express everything in markdown with using these TeX-links. If you are interested I am happy to provide a PDF or have a chat over that. Also, I can make a PR with only the controller-changes if you prefer so.

scheunemann and others added 12 commits August 15, 2018 15:27
get updates from original master
…n fact $a_{t-1} = K(s_t)$ and thus doesn't equal $a_{t-1} = C_{t-1} * s_{t-1} + h_{t-1}$
…then will be wrong for learning, hence always enter `learn()`
… iteration seems to compute the weight update with adding $C_{t-1}$ to the change within the sum. Results are the same as before for $\tau=2$.
…arameters C at time (t-l). $\partial \psi(s_{t-l}) / \partial s_{t-l}$ is depended on $a_{t-l}$ and therefore on $C_{t-l}$.

Again, nothing changes for $\tau=2$.

const Matrix& metric = useMetric ? gs.map(one_over).map(sqr) : gs.mapP(1, constant);

C += ((( dmu * (ds[l]^T) - (epsrel & al) * (sl^T)) & metric) * epsCN
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line actually calculates the weights at time t rather then the delta. These total weights at time t then get passed for calculating the term for the next step, e.g, t-2.

@scheunemann scheunemann changed the title Minor TiPI fixes TiPI fixes Nov 9, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants