@johnpettigrew @elset @infobeautiful it's a parameter on the tail end of the model. After it decides on a probability distribution over the next word/token, it still has to choose which one to actually spit out. Temperature reshapes that distribution: higher values flatten it so the model is less likely to simply pick the most probable token, giving the output some extra randomness.
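A minimal sketch of what that looks like in code, assuming the model hands us raw logits (the helper name and inputs here are illustrative, not any particular library's API):

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0):
    """Divide logits by temperature, softmax, then sample one index.

    Low temperature sharpens the distribution toward the most likely
    token; high temperature flattens it, making other tokens more
    likely to be picked.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # draw a single token index according to the reshaped probabilities
    return random.choices(range(len(probs)), weights=probs, k=1)[0]
```

With logits like `[2.0, 1.0, 0.1]`, a very low temperature almost always returns index 0 (the argmax), while a high temperature spreads picks across all three.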