In the RNN classification example (using the characters of a name to predict the name's language), the train function re-zeros the hidden state (and the gradients) every epoch. I was wondering why this is done, instead of carrying over the final hidden state from the previous epoch?
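
For reference, here is roughly the pattern I mean, sketched from memory of the tutorial (the model definition, the sizes 57/128/18, and the manual SGD step are my recollection of the tutorial's code, not an exact quote):

```python
import torch
import torch.nn as nn

class RNN(nn.Module):
    # Minimal char-RNN as in the tutorial: one linear step per character.
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.hidden_size = hidden_size
        self.i2h = nn.Linear(input_size + hidden_size, hidden_size)
        self.i2o = nn.Linear(input_size + hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        combined = torch.cat((input, hidden), 1)
        hidden = self.i2h(combined)
        output = self.softmax(self.i2o(combined))
        return output, hidden

    def initHidden(self):
        # Fresh all-zeros hidden state.
        return torch.zeros(1, self.hidden_size)

rnn = RNN(57, 128, 18)  # n_letters, n_hidden, n_categories (as I recall)
criterion = nn.NLLLoss()
learning_rate = 0.005

def train(category_tensor, line_tensor):
    hidden = rnn.initHidden()  # <-- hidden state re-zeroed here, every call
    rnn.zero_grad()            # <-- gradients re-zeroed here too

    # Feed the name one character at a time, carrying hidden state
    # only within this single example.
    for i in range(line_tensor.size()[0]):
        output, hidden = rnn(line_tensor[i], hidden)

    loss = criterion(output, category_tensor)
    loss.backward()

    # Manual SGD update on the parameters.
    for p in rnn.parameters():
        p.data.add_(p.grad.data, alpha=-learning_rate)

    return output, loss.item()
```

So the `hidden = rnn.initHidden()` line is the part my question is about: why start from zeros on every call rather than reusing the `hidden` that the previous call ended with?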