Hi @unsky
Could you please check whether I read your Focal Loss formulas correctly?
For CE, delta is:
if (i == j) then delta = 1-p
if (i != j) then delta = -p
For Focal Loss (when gamma=2), delta is:
if (i == j) then delta = (1-p)* alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)
if (i != j) then delta = (-p)* alpha * (1 - pt) * (2 * pt * log(pt) + pt - 1)
Where:
i = the ground-truth class id (the label), j = the output class index
pt = softmax(i), the probability of the correct class
p = softmax(j), the probability of class j
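
One way to sanity-check the formulas above is to compare them with a finite-difference gradient with respect to the logits. The sketch below is only a rough check, assuming the loss is FL = -alpha * (1 - pt)^gamma * log(pt) applied to a plain softmax; the helper names, alpha = 0.25 and the sample logits are made up for illustration and are not taken from any implementation. Under these assumptions the CE delta matches the negative gradient, while the Focal Loss expression as written matches the positive gradient, so if delta is meant to be the negative gradient in both cases, the Focal Loss lines may need a leading minus sign.

```python
import math

def softmax(z):
    # Numerically stable softmax over a list of logits.
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def cross_entropy(z, i):
    # CE = -log(pt), where pt is the probability of the true class i.
    return -math.log(softmax(z)[i])

def focal_loss(z, i, alpha=0.25, gamma=2.0):
    # Assumed form: FL = -alpha * (1 - pt)^gamma * log(pt).
    pt = softmax(z)[i]
    return -alpha * (1.0 - pt) ** gamma * math.log(pt)

def ce_delta(z, i):
    # CE delta from the post: (1 - p) if i == j, else (-p).
    p = softmax(z)
    return [(1.0 if j == i else 0.0) - p[j] for j in range(len(z))]

def fl_expr(z, i, alpha=0.25):
    # Focal Loss expression from the post (gamma = 2), exactly as written.
    p = softmax(z)
    pt = p[i]
    factor = alpha * (1.0 - pt) * (2.0 * pt * math.log(pt) + pt - 1.0)
    return [((1.0 if j == i else 0.0) - p[j]) * factor for j in range(len(z))]

def numeric_grad(f, z, eps=1e-6):
    # Central finite differences of f with respect to each logit.
    grad = []
    for j in range(len(z)):
        zp, zm = list(z), list(z)
        zp[j] += eps
        zm[j] -= eps
        grad.append((f(zp) - f(zm)) / (2.0 * eps))
    return grad

z = [1.0, -0.5, 2.0, 0.3]  # hypothetical logits
i = 2                      # hypothetical ground-truth class id

g_ce = numeric_grad(lambda v: cross_entropy(v, i), z)
g_fl = numeric_grad(lambda v: focal_loss(v, i), z)

# Largest mismatch for each comparison; both should be close to zero.
print(max(abs(a + b) for a, b in zip(ce_delta(z, i), g_ce)))  # CE delta vs -dCE/dz
print(max(abs(a - b) for a, b in zip(fl_expr(z, i), g_fl)))   # FL expr vs +dFL/dz
```

Since (2 * pt * log(pt) + pt - 1) is negative for 0 < pt < 1, the Focal Loss expression has the opposite sign to the CE delta for every class, which is what the two comparisons pick up.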