I was recently reading the Knowledge Distillation paper, and encountered the term smooth probabilities. The term was used to describe what happens when the logits are divided by a temperature before the softmax.

Neural networks typically produce class probabilities by using a "softmax" output layer that converts the logit, $z_i$, computed for each class into a probability, $q_i$, by comparing $z_i$ with the other logits:

$$q_i = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}$$

where $T$ is a temperature that is normally set to 1. Using a higher value for $T$ produces a softer probability distribution over classes.
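To make the formula concrete, here is a minimal sketch in Python (using NumPy; the function name `softmax_with_temperature` and the example logits are my own, not from the paper) showing how dividing the logits by $T$ before the softmax changes the resulting distribution:

```python
import numpy as np

def softmax_with_temperature(logits, T=1.0):
    """Convert logits to probabilities, dividing by temperature T first."""
    scaled = np.asarray(logits, dtype=float) / T
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [5.0, 2.0, 1.0]
print(softmax_with_temperature(logits, T=1.0))   # sharp:  ~[0.94, 0.05, 0.02]
print(softmax_with_temperature(logits, T=5.0))   # softer: ~[0.50, 0.27, 0.22]
print(softmax_with_temperature(logits, T=20.0))  # nearly uniform
```

At $T = 1$ the largest logit dominates; as $T$ grows, the probabilities move toward uniform, which is the "softer" distribution the paper describes.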
What does that mean intuitively?