Mastodawn

Bleg for help: How might I get an equivalent of the Google Books Ngram Viewer to tell me how frequent given terms might be in an LLM training data set?

I'd be happiest with a tool like the Ngram Viewer, even if constrained to a single open-weight model.

Tarah Wheeler Mar 8

@adamshostack dude, you know I do that, right?

Adam Shostack

@Tarah Yes, but you're busy finishing $thing, and so I put it on wide scan.

(Not sure how public that thing is right now.)

Tarah Wheeler Mar 9

@adamshostack not very $public but I need additional cases for the discussion anyway. Let’s see what you have, and I’ll tell you if the method will work.