Every time an LLM picks the next word, it is sampling from a probability distribution. Every time a classifier says "90% cat, 10% dog," it is outputting probabilities. Understanding probability is ...
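
To make both ideas concrete, here is a minimal Python sketch. It turns a few raw scores into probabilities (the kind of output a classifier reports) and then samples one item from that distribution (roughly what happens when a model picks the next word). The vocabulary, the scores, and the use of softmax are assumptions made up for illustration; they do not come from any real model.

```python
import math
import random


def softmax(scores):
    """Turn arbitrary real-valued scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and scores, invented for this example only.
vocab = ["cat", "dog", "car"]
scores = [2.0, 1.0, 0.1]

# A probability distribution over the vocabulary.
probs = softmax(scores)

# Sample one word in proportion to its probability.
rng = random.Random(0)  # fixed seed so repeated runs give the same draw
next_word = rng.choices(vocab, weights=probs, k=1)[0]

print(dict(zip(vocab, (round(p, 3) for p in probs))))
print("sampled:", next_word)
```

Higher-scoring words are more likely to be drawn, but any word with nonzero probability can still be sampled, which is why the same prompt can yield different outputs.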