PyTorch BCE Loss and BCE with logits
A little finding from today’s work:
What is the difference between a sigmoid layer + BCE loss and BCE with logits loss?
The answer is numerical stability.
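A quick way to see it, as a rough sketch (the exact printed numbers depend on the PyTorch version, but the pattern holds):

```python
import torch
import torch.nn as nn

# A logit a model can easily produce: sigmoid(20) rounds to exactly 1.0
# in float32, so 1 - p underflows to 0 inside the loss.
logit = torch.tensor([20.0])
target = torch.tensor([0.0])

# Option A: explicit sigmoid followed by BCELoss.
# log(1 - 1.0) = log(0) shows up; BCELoss clamps the log at -100,
# so we get the clamp value rather than the true loss.
prob = torch.sigmoid(logit)
print(nn.BCELoss()(prob, target))             # tensor(100.)

# Option B: BCEWithLogitsLoss works on the raw logit and stays accurate.
print(nn.BCEWithLogitsLoss()(logit, target))  # ≈ tensor(20.)
```

Both compute the same mathematical quantity; the difference is only in how the intermediate values are represented in floating point.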
There are two numerical issues to consider in an operation like softmax (a short demonstration follows the list):
- Underflow: a number so close to zero that it is rounded to 0. Later operations may then break, e.g., dividing by 0 or taking $\log(0)$.
- Overflow: a number so large that it is rounded to $\infty$.
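Both are easy to trigger with float32 tensors (a minimal illustration; the exact thresholds depend on the dtype):

```python
import torch

x = torch.tensor([200.0, -200.0])   # float32 by default
print(torch.exp(x))                  # tensor([inf, 0.]) -> overflow and underflow
print(torch.log(torch.exp(x)))       # tensor([inf, -inf]); later ops inherit the damage
```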
Let’s look at softmax as an example:

$\mathrm{softmax}(x)_i=\frac{\exp(x_i)}{\sum_j \exp(x_j)}$

Suppose every $x_i$ equals the same constant $c$; the result should be $\frac{1}{n}$ for every component. Numerically, though, it breaks at both extremes: if $c$ is very negative, every $\exp(x_i)$ underflows to 0 and the denominator becomes 0; if $c$ is very large, every $\exp(x_i)$ overflows and we divide $\infty$ by $\infty$. Either way the output is NaN.
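A minimal sketch of both failure modes and the usual fix, which is to subtract the maximum before exponentiating (the shift cancels in the ratio, so the math is unchanged):

```python
import torch

def naive_softmax(x):
    e = torch.exp(x)
    return e / e.sum()

def stable_softmax(x):
    # Subtract the max first: the largest exponent becomes exp(0) = 1,
    # so the denominator can neither underflow to 0 nor overflow to inf.
    e = torch.exp(x - x.max())
    return e / e.sum()

x = torch.full((4,), -500.0)   # every x_i is the same very negative constant
print(naive_softmax(x))         # tensor([nan, nan, nan, nan]) -- 0 / 0
print(stable_softmax(x))        # tensor([0.2500, 0.2500, 0.2500, 0.2500])

x = torch.full((4,), 500.0)     # every x_i is the same very large constant
print(naive_softmax(x))         # tensor([nan, nan, nan, nan]) -- inf / inf
print(stable_softmax(x))        # tensor([0.2500, 0.2500, 0.2500, 0.2500])
```

PyTorch’s own `torch.softmax(x, dim=0)` stays finite on both of these inputs as well.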
The same problem appears in the log-sum-exp (LSE) function, which is the log of softmax’s denominator:

$\mathrm{LSE}(x_1, x_2, \ldots, x_n)=\log(\exp(x_1)+\exp(x_2)+\cdots+\exp(x_n))$
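Evaluating this formula naively overflows as soon as any $x_i$ is large. The standard fix is to pull the maximum out first, $\mathrm{LSE}(x)=\max_i x_i+\log\sum_j \exp(x_j-\max_i x_i)$; this is what `torch.logsumexp` does, and the PyTorch docs note that `BCEWithLogitsLoss` fuses the sigmoid and the loss into one operation precisely so it can use this log-sum-exp trick. A sketch:

```python
import torch

def naive_lse(x):
    return torch.log(torch.exp(x).sum())

def stable_lse(x):
    # LSE(x) = max(x) + log(sum(exp(x - max(x))))
    # After the shift the largest term is exp(0) = 1, so the sum can
    # neither overflow to inf nor underflow to 0.
    m = x.max()
    return m + torch.log(torch.exp(x - m).sum())

x = torch.tensor([1000.0, 999.0, 998.0])
print(naive_lse(x))               # tensor(inf) -- exp(1000) overflows
print(stable_lse(x))              # tensor(1000.4076)
print(torch.logsumexp(x, dim=0))  # same value from the built-in
```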