
AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably - Scientific Reports
As AI-generated text continues to evolve, distinguishing it from human-authored content has become increasingly difficult. This study examined whether non-expert readers could reliably differentiate between AI-generated poems and those written by well-known human poets. We conducted two experiments with non-expert poetry readers and found that participants performed below chance levels in identifying AI-generated poems (46.6% accuracy, χ²(1, N = 16,340) = 75.13, p < 0.0001). Notably, participants were more likely to judge AI-generated poems as human-authored than actual human-authored poems (χ²(2, N = 16,340) = 247.04, p < 0.0001). We found that AI-generated poems were rated more favorably in qualities such as rhythm and beauty, and that this contributed to their mistaken identification as human-authored. Our findings suggest that participants employed shared yet flawed heuristics to differentiate AI from human poetry: the simplicity of AI-generated poems may be easier for non-experts to understand, leading them to prefer AI-generated poetry and misinterpret the complexity of human poems as incoherence generated by AI.
it falls prey to every fallacy of AI creativity research (and AI research in general), e.g., that "AI" is a monolithic technology, that "AI" is independent of human intention, that "AI"'s telos is to produce artifacts "indistinguishable" from "humans," that the ability to "replicate" certain genres of art (especially genres positioned as highly "creative," like poetry) is a benchmark along that telos, etc.
the paper really should be called "People who don't give a shit one way or another react ambivalently to output of billion-dollar machine designed by hucksters to trick people into thinking its outputs are plausible exemplars of textual artifacts in a specified genre" (the study participants were crowd-sourced online and paid less than a living wage)
even setting aside the ways in which the researchers don't bother to question pre-existing distinctions between "poetry experts" and "non-experts" (not to mention "poetry" and "non-poetry"), it's remarkable how they ignore context as a factor. "guessing the conditions of a textual artifact's production when it is stripped of context" is a *particular kind* of reading, and brings along its own frames and assumptions...
likewise: evaluating a text according to arbitrary criteria using a Likert scale is a *particular kind* of reading. as is expressing a binary preference between two texts. these are all very unusual frames for textual interpretation (and especially the interpretation of poems!). someone's reading practices in these situations aren't necessarily indicative of their practices in other interpretive contexts!
I do think there's an insight in their discussion, i.e., that the reading practices people bring to text right now are in flux *specifically because* of the proliferation of LLM-generated text. so what the study is really witnessing is not evidence that "AI-generated" and "human-generated" poems are "indistinguishable," but evidence of the ways in which people approach text (and poems in particular) when the concept of authorship is politicized in the particular way that it is politicized now
ANYWAY, I don't care how many survey responses you get, the "AI-generated" poems in the study are *definitely not* "indistinguishable" from the "human" poems. almost all of the "AI" poems have identical structure (AABB rhyming stanzas) and similar topics; they resemble the "real" poems of a particular poet only in that they occasionally incorporate lexical items that are vaguely related to that poet's work. for example, this is supposed to be a Ginsberg poem. (end of thread)