Bleg for help: How might I get an equivalent of the Google Books Ngram Viewer to tell me how frequent given terms might be in an LLM training data set?
I'd be happiest with a tool like the Ngram Viewer, even if constrained to a single open-weight model.
@Tarah Yes, but you're busy finishing $thing, and so I put it on wide scan.
(Not sure how public that thing is right now.)