Why does the output have one more channel than the number of landmarks?

In the model code you have:

> self.add_module('l' + str(hg_module), nn.Conv2d(256, num_landmarks+1, kernel_size=1, stride=1, padding=0))

and in `evaler.py` the last channel isn't used:

> pred_heatmap = outputs[-1][:, :-1, :, :][i].detach().cpu()

So I guess the extra channel is used somewhere in the loss?
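
To make the shapes concrete, here is a minimal sketch of what I mean. The value of `num_landmarks`, the batch size, and the 64x64 feature map are my own assumptions for illustration, not taken from the repo, and `extra_channel` is just a hypothetical name for the slice that the evaluation code drops.

```python
import torch
import torch.nn as nn

num_landmarks = 68  # assumed value, for illustration only
batch_size = 2      # assumed

# Same layer definition as in the model: a 1x1 conv with num_landmarks + 1 output channels
head = nn.Conv2d(256, num_landmarks + 1, kernel_size=1, stride=1, padding=0)

features = torch.randn(batch_size, 256, 64, 64)  # assumed feature map size
outputs = [head(features)]  # stand-in for the list of per-stack outputs

i = 0
# Same slicing as in evaler.py: keep every channel except the last one
pred_heatmap = outputs[-1][:, :-1, :, :][i].detach().cpu()
# The channel that the slicing above discards (hypothetical name)
extra_channel = outputs[-1][:, -1, :, :][i].detach().cpu()

print(outputs[-1].shape)    # torch.Size([2, 69, 64, 64])
print(pred_heatmap.shape)   # torch.Size([68, 64, 64])
print(extra_channel.shape)  # torch.Size([64, 64])
```

So the model produces 69 maps for 68 landmarks in this sketch, and only the first 68 reach the evaluation.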