With GPT tokenizers (like the BPE tokenizers OpenAI uses), does the number of tokens needed to represent a word correlate with that word's frequency in the training data?
In other words, could token counts be used to reverse-engineer word frequencies in the otherwise hidden training data?