LLM query:

I've found that ChatGPT can parody some styles well (e.g. Lovecraftian horror, Dr Seuss), but with others it has got nothing (e.g. Hunter S Thompson, Brecht, Riddley Walker). Why is this?

My suspicion is that it is NOT due to the amount of original text, but to the amount of fanfiction, which is essentially a human-written enlargement of the original training data.

Reckons?

#chatGPT #LLMs #AI

@tomstafford Cool hypothesis! 😎 I don’t think the datasets have been made public. Could try it with authors that have a distinct style but no fan fiction. #Hemingway? #SalmanRushdie?
@henry Yep, that'd be the way to test it: find matched authors with similar amounts of original text and equally distinctive styles [no idea how you'd gauge that], some with fan fiction and some without, then challenge ChatGPT to do parodies.
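
A rough sketch of how that matched-pair test could be set up, in case anyone wants to try it. Everything here is an assumption for illustration: the `Author` and `MatchedPair` types, the placeholder author names, the hand-labelled `has_fanfiction` flag, and the prompt wording are all hypothetical. It only builds the list of prompts; sending them to ChatGPT and judging the parodies is left out, since nobody in the thread knows how to score that yet.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Author:
    name: str
    has_fanfiction: bool  # assumption: labelled by hand, not measured


@dataclass(frozen=True)
class MatchedPair:
    # Two authors judged (by hand) to have similar amounts of original
    # text and similarly distinctive styles, differing in fan fiction.
    with_fanfic: Author
    without_fanfic: Author


def parody_prompt(author: Author, topic: str) -> str:
    """Build one parody request; the wording is only a placeholder."""
    return f"Write a short parody in the style of {author.name} about {topic}."


def build_trials(pairs, topics):
    """Return one trial per (author, topic), keeping both halves of a pair together."""
    trials = []
    for pair, topic in product(pairs, topics):
        for author in (pair.with_fanfic, pair.without_fanfic):
            trials.append(
                {
                    "author": author.name,
                    "has_fanfiction": author.has_fanfiction,
                    "topic": topic,
                    "prompt": parody_prompt(author, topic),
                }
            )
    return trials


if __name__ == "__main__":
    # Placeholder names: swap in real matched authors once they are chosen.
    pairs = [MatchedPair(Author("Author A", True), Author("Author B", False))]
    for trial in build_trials(pairs, ["a trip to the supermarket"]):
        print(trial["has_fanfiction"], "|", trial["prompt"])
```

Using the same topics for both authors in a pair means any difference in parody quality is less likely to come from the subject matter.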

@tomstafford Ayup, reckon you're on to something. Somewhere between #modelcollapse and #datapoisoning sits "you trained your AI on human behaviour by letting it read fanfic?!" 👀

https://www.economist.com/science-and-technology/2023/04/05/it-doesnt-take-much-to-make-machine-learning-algorithms-go-awry

It doesn’t take much to make machine-learning algorithms go awry: the rise of large-language models could make the problem worse (The Economist)

@tomstafford Fascinating hypothesis! Hope you keep us in the loop if you learn anything!