Using probability as a shovel, we’ll dig a little deeper into binary cross-entropy loss (you know, the thing that we minimize to train logistic regression models).
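As a quick refresher before we start digging, here is a minimal sketch of the loss in plain Python. It assumes the usual textbook form, the average of -(y log p + (1 - y) log(1 - p)) over the examples. The function name `binary_cross_entropy` and the clipping constant `eps` are illustrative choices of mine, not something taken from a particular library.

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities.

    `eps` is an assumed clipping constant that keeps log() away from zero.
    """
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip the probability so that log(0) never happens.
        p = min(max(p, eps), 1.0 - eps)
        # Per-example loss: -log(p) when y == 1, -log(1 - p) when y == 0.
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

# A coin-flip prediction (p = 0.5) costs log(2) per example.
print(binary_cross_entropy([1, 0], [0.5, 0.5]))
```

Confident, correct predictions push the loss toward zero, while confident, wrong ones blow it up.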