I have scanned 128 books, for a total of 55,909 unique words (still not a lot).
The goal is to differentiate the relative reading ease between books, looking ONLY at vocabulary. Sentence length and word count are comprehension metrics, not reading ease.
My first approach does not produce a unique 'fingerprint' for each book. Most books' word frequencies seem to mirror the macro list of all words.
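One way to get a fingerprint anyway, sketched under assumptions: instead of ranking a book's words by raw frequency (which mostly reproduces the macro list), score each word by how over-represented it is in that book compared with the whole corpus. The smoothed log-ratio below is a standard "keyness"-style measure swapped in for illustration, not the approach described above; `tokenize` and `fingerprint` are hypothetical names, and the tokenizer is deliberately simplistic.

```python
from collections import Counter
import math
import re


def tokenize(text: str) -> list[str]:
    # Assumed, simplistic tokenizer: lowercase, keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def fingerprint(book: Counter, corpus: Counter, top_n: int = 10) -> list[tuple[str, float]]:
    """Return the words most over-represented in `book` relative to `corpus`.

    Score = log(p_book / p_corpus) with add-one smoothing. Words that simply
    follow the macro list score near (or below) zero, so only words that are
    distinctive for this book rise to the top.
    """
    book_total = sum(book.values())
    corpus_total = sum(corpus.values())
    vocab = len(corpus)
    scores = {}
    for word, count in book.items():
        p_book = (count + 1) / (book_total + vocab)
        p_corpus = (corpus[word] + 1) / (corpus_total + vocab)
        scores[word] = math.log(p_book / p_corpus)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


if __name__ == "__main__":
    # Toy stand-ins for scanned books (hypothetical data).
    books = {
        "a": Counter(tokenize("the cat sat on the mat")),
        "b": Counter(tokenize("the dog ran to the park")),
    }
    corpus = sum(books.values(), Counter())  # the "macro list"
    for name, counts in books.items():
        print(name, fingerprint(counts, corpus, top_n=4))
```

In the toy run, "the" scores below zero for both books because it is just as common everywhere, which is exactly the macro-list noise the first approach was picking up.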
Lol, "Green Eggs and Ham" is getting the same score as "Persuasion" by Jane Austen.
The score ignores sentence length and structure, but I still feel like there must be an oversight in the averaging for those two to be scored equally.
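The exact averaging isn't stated, so this is a guess at the oversight: if the score is a mean of corpus rank over every token, the endless repeats of "the", "and", "I" in both books swamp the handful of rare words, and very different books land on similar numbers. The sketch below (hypothetical function names, ranks taken from the combined corpus) compares that token-weighted mean with a type-weighted mean, where each distinct word counts once.

```python
from collections import Counter


def rank_table(corpus: Counter) -> dict[str, int]:
    # Rank 1 = most frequent word in the whole corpus (the "macro list").
    return {w: i for i, (w, _) in enumerate(corpus.most_common(), start=1)}


def token_mean_rank(book: Counter, ranks: dict[str, int]) -> float:
    # Every occurrence counts, so heavily repeated common words dominate the average.
    total = sum(book.values())
    return sum(ranks[w] * c for w, c in book.items()) / total


def type_mean_rank(book: Counter, ranks: dict[str, int]) -> float:
    # Each distinct word counts once, so rare vocabulary carries real weight.
    return sum(ranks[w] for w in book) / len(book)


if __name__ == "__main__":
    # Hypothetical toy counts, not real data from the scanned books.
    easy = Counter({"the": 10, "cat": 2})
    hard = Counter({"the": 50, "a": 30, "obstinate": 1, "perspicacious": 1})
    ranks = rank_table(easy + hard)
    for name, book in (("easy", easy), ("hard", hard)):
        print(name, token_mean_rank(book, ranks), type_mean_rank(book, ranks))
```

With these toy counts the token-weighted means come out close together while the type-weighted means are clearly apart. One trade-off: a type-weighted mean rises with vocabulary size, which may be a feature for reading ease; a percentile of the types' ranks is another option if that bias is unwanted.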
With an embarrassingly small sample of 12 books, the top 100 most frequent English words are exactly the ones I would have guessed.
These are used FAR more often than the next 100 words, so I am not expecting this list to change much with more books.
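To put a number on "FAR more often", a minimal sketch: measure what share of all word occurrences the top 100 words cover versus the next 100, and how much the top-100 list overlaps between a small sample and a larger one. `band_shares` and `top_overlap` are hypothetical helpers; words tied at the cutoff can make the overlap wobble slightly.

```python
from collections import Counter


def band_shares(corpus: Counter, band: int = 100) -> tuple[float, float]:
    """Fraction of all tokens covered by the top `band` words and by the next `band`."""
    total = sum(corpus.values())
    ranked = corpus.most_common(2 * band)
    top = sum(c for _, c in ranked[:band])
    nxt = sum(c for _, c in ranked[band:2 * band])
    return top / total, nxt / total


def top_overlap(a: Counter, b: Counter, n: int = 100) -> float:
    """Fraction of the top-n words shared by two frequency tables (e.g. 12 books vs 128)."""
    top_a = {w for w, _ in a.most_common(n)}
    top_b = {w for w, _ in b.most_common(n)}
    return len(top_a & top_b) / n
```

If the list really is stable, `top_overlap(small_sample, full_corpus)` should stay close to 1.0 as books are added, and the first share from `band_shares` should stay well above the second.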