Real or Slop? — PL Papers Edition

7/10, avg time 31s, total 5m21s. But I also made the mistake of seeing an incorrectly generated citation and immediately assuming that meant it was AI; another paper I felt should have cited Bowman but did not, so I thought it might be AI, but alas.
@koronkebitch yeesh, I did badly on this. 6/10 avg time 1m22s.
@rntz @koronkebitch 5/10 😭
@regehr @rntz @koronkebitch 5/10 😓 time pressure does not help...

@burakemir @regehr @rntz @koronkebitch 7/10 avg 51s tot 8m38s (and I would have gotten 8/10 if I had not mistakenly rejected the first real paper because its authors were anonymized)

Still. Bad human. No cookie.

@koronkebitch 7/10 26s I got baited

@koronkebitch somehow I got 10/10 in 1m38s

I think this says more about my propensity for slop than it does about PL knowledge

welp, back to my fraught OOPSLA all-nighter

@stschaef good luck! excited to read whatever you come up with, no matter the state <3
@koronkebitch 10/10 2s average and nobody knows how I did it
@koronkebitch FYI, on my Android device, the browser displays the PDF as a file name and a download button, and the file name gives away the answer.
@koronkebitch this is so scary
@jonmsterling yup we were having an existential crisis in the lab yesterday (will surely continue today)
@koronkebitch I did poorly with an avg time of 43s, but that was partly due to some faulty assumptions.
There were definitely human papers that surprised me with how shallow they were, or with weird page counts. The only obvious tells for AI were silly explanations and ideas that obviously don't work.

@koronkebitch 9 / 10 correct, 15s avg time, 2m 38s total time. not too bad.

I very quickly skimmed the papers and looked for any sign of life or humor. If they mentioned a language in the abstract, I checked whether the paper was consistent with that. My mistake was a real paper that I thought was AI. Tough test though.

@joomy I wonder if we are all failing on the same one 😬
@koronkebitch a heuristic that has worked well for me is whether the paper has a reference section at the end. Though I guess it’s only a matter of time till that heuristic stops working.
@d10c yup...
@d10c recently reviewed a paper with 100% hallucinated refs
@koronkebitch No thanks, 4/10, 1m14s avg. Help welcome.
@koronkebitch 9/10 in 7 minutes (I had one paper I falsely accused of being AI); a key indicator seemed to be that slop papers claim eight or nine main contributions. I'm no PL person, but I am a CS academic, and I feel like it's hard to write one paper that honestly does three or more things.
@koronkebitch @hallasurvivor This was super fun! I got 7/10 and made errors both ways. PL is not at all my field, I wonder if I'd do better or worse for algebraic topology papers (I'd hope better, but who knows).

@koronkebitch I've read almost no PL papers before but I got 8/10 with an avg time of 1m 37s

I was looking for obvious tells like broken LaTeX, over-explanation, and overly commented code snippets, but that made me miss a few because I misidentified notation I didn't understand as broken math.

@koronkebitch 10/10 by downloading the papers and asking Opus 4.6 to decide (GPT-5.4 got 9/10) 🤪
@koronkebitch Sadly, this means that one could make the slop papers even more convincing by letting Claude recursively improve and judge its own output. But I don’t believe this process would converge to a convincing paper:
@koronkebitch Both models missed obvious signs. The slop papers have a very carelessly put-together layout, with plenty of whitespace, equations running into the margin, and no beautiful diagrams or figures, but neither model noticed this. That makes it possible to identify AI-generated papers purely visually. Additionally, neither model commented on mistakes in the AI-generated calculi, lemmas, or proofs, even though such mistakes likely exist and could be found by a human reviewer.
@koronkebitch More generally, recent experiments using such feedback loops (like Anthropic's C compiler) have spent tens of thousands of dollars in API credits and still yielded output that is not at the level of novel research. While the first few iterations are always impressive, LLMs do not seem to be good at long-context tasks that require deep thought about a system as a whole.