I was recently reading the Knowledge Distillation paper, and encountered the term smooth probabilities. The term was used to describe what happens when the logits are divided by a temperature before the softmax.

Neural networks typically produce class probabilities by using a "softmax" output layer that converts the logit, $z_i$, computed for each class into a probability, $q_i$, by comparing $z_i$ with the other logits:

$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

where $T$ is a temperature that is normally set to 1. Using a higher value for $T$ produces a softer probability distribution over classes.
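To make the formula concrete, here is a minimal sketch in Python (using NumPy; the function name `softmax_with_temperature` and the example logits are my own, not from the paper) showing how dividing the logits by $T$ before the softmax changes the resulting distribution:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Convert logits to probabilities, dividing by temperature T first."""
    scaled = np.asarray(logits, dtype=float) / T
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [5.0, 2.0, 1.0]
print(softmax_with_temperature(logits, T=1.0))   # sharp:  ~[0.94, 0.05, 0.02]
print(softmax_with_temperature(logits, T=5.0))   # softer: ~[0.50, 0.27, 0.22]
print(softmax_with_temperature(logits, T=20.0))  # nearly uniform
```

At $T = 1$ the largest logit dominates; as $T$ grows, the probabilities move toward uniform, which is the "softer" distribution the paper describes.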
What does that mean intuitively?