Sigmoid in cross-entropy and mean-squared-error

In an interview I was asked: what happens if we use squared loss (after a sigmoid) instead of cross entropy in a binary classification problem?

First of all, if we treat a binary classification problem as plain regression (no sigmoid at all), we fail because the outputs are unbounded. But what happens if we keep the sigmoid and only replace the cross-entropy (CE) loss with squared loss? Let's see.

Take a single training example $(x, y)$ with label $y \in \{0, 1\}$. The model predicts $\hat{y} = \sigma(z)$, where $z = w^\top x + b$, and the two candidate losses are

$$L_{CE} = -\big[\,y \log \hat{y} + (1 - y) \log (1 - \hat{y})\,\big], \qquad L_{MSE} = \tfrac{1}{2}\,(y - \hat{y})^2.$$
Differentiating with respect to $z$, and using the sigmoid derivative $\sigma'(z) = \sigma(z)\big(1 - \sigma(z)\big) = \hat{y}\,(1 - \hat{y})$, we get

$$\frac{\partial L_{CE}}{\partial z} = \hat{y} - y, \qquad \frac{\partial L_{MSE}}{\partial z} = (\hat{y} - y)\,\hat{y}\,(1 - \hat{y}).$$
Then with SGD we can update the parameters:

$$w \leftarrow w - \eta\,\frac{\partial L}{\partial z}\,x, \qquad b \leftarrow b - \eta\,\frac{\partial L}{\partial z}.$$
You can see that with squared loss, when $\hat{y} \to 0$ or $\hat{y} \to 1$ the factor $\hat{y}\,(1 - \hat{y})$ goes to zero, so the gradient vanishes and the parameters fail to update even if $\hat{y} \neq y$, i.e. even when the prediction is confidently wrong. CE doesn't have that kind of saturation problem: its gradient $\hat{y} - y$ stays large exactly when the prediction is far from the label.
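To make the saturation concrete, here is a minimal NumPy sketch (the variable names are my own) comparing the two gradients at a confidently wrong prediction:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# True label is 1, but the model is confidently wrong: z is very negative,
# so y_hat = sigmoid(z) is close to 0.
y = 1.0
z = -8.0
y_hat = sigmoid(z)

# Gradients of each loss w.r.t. the pre-activation z, as derived above.
grad_ce_z = y_hat - y                             # close to -1: strong signal
grad_mse_z = (y_hat - y) * y_hat * (1.0 - y_hat)  # extra y_hat*(1-y_hat) factor

print(f"y_hat    = {y_hat:.6f}")
print(f"CE grad  = {grad_ce_z:.6f}")
print(f"MSE grad = {grad_mse_z:.6f}")
```

Running this prints a CE gradient of roughly $-1$ but an MSE gradient of roughly $-3 \times 10^{-4}$: the squared loss has almost no learning signal exactly where it needs it most.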

Finally, it reminds me of something said in the Deep Learning book (Goodfellow, Bengio, and Courville): 'you must have some log-form loss to cancel the exponential part when your output is a sigmoid'.
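A one-line derivation (my own, just to illustrate that quote) shows how the log cancels the exponential. For a positive example ($y = 1$),

$$-\log \sigma(z) = \log\big(1 + e^{-z}\big), \qquad \frac{d}{dz}\Big[-\log \sigma(z)\Big] = \frac{-e^{-z}}{1 + e^{-z}} = \sigma(z) - 1,$$

so the gradient tends to $-1$ rather than $0$ as $z \to -\infty$: the log strips away exactly the exponential saturation that doomed the squared loss.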

GBDT

I was asked about the intuition and theory behind GBDT in an interview.

Here is an explanation of boosted trees and GBDT.

Boosted tree

Imagine you want to train a regressor whose objective function is MSE:

$$L = \sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^2.$$
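For MSE, the negative gradient of the loss with respect to the current prediction $\hat{y}_i$ is (up to a constant) the residual $y_i - \hat{y}_i$, so boosting amounts to repeatedly fitting a new tree to the residuals of the current ensemble. Below is a minimal sketch of that loop, assuming scikit-learn's `DecisionTreeRegressor` as the base learner (the helper names `fit_gbdt_mse` and `predict_gbdt` are hypothetical, my own):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbdt_mse(X, y, n_trees=100, lr=0.1, max_depth=3):
    """Fit a tiny boosted-tree regressor for the MSE objective.

    With L = sum_i (y_i - f(x_i))^2, the negative gradient w.r.t. the
    current prediction f(x_i) is proportional to the residual
    y_i - f(x_i), so each new tree is fit to the current residuals.
    """
    f0 = y.mean()                        # initial constant prediction
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(n_trees):
        residual = y - pred              # negative gradient for MSE
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residual)
        pred += lr * tree.predict(X)     # shrinkage / learning rate
        trees.append(tree)
    return f0, trees

def predict_gbdt(f0, trees, X, lr=0.1):
    pred = np.full(X.shape[0], f0, dtype=float)
    for tree in trees:
        pred += lr * tree.predict(X)
    return pred

# Toy usage: learn y = sin(x) from noisy samples.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
f0, trees = fit_gbdt_mse(X, y)
print("train MSE:", np.mean((y - predict_gbdt(f0, trees, X)) ** 2))
```

GBDT generalizes this picture: for an arbitrary differentiable loss, each tree is fit to the negative gradient of the loss at the current predictions, with MSE as the special case where that gradient is just the residual.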

Reblogging

It has been a long time since I last blogged. Reading 《暗时间》 made me feel that starting to write again would be very useful, so here we go; it's also a chance to practice my English.