As in the subject. Do you use all of the latents z_L, z_{L-1}, ..., z_1 for training and for inference? A more detailed question: how do you calculate -log(p(z))? Is it equal to -\sum_{i=1}^L log(p(z_i))? Thanks.