๐—”๐˜‚๐˜๐—ผ๐—บ๐—ฎ๐˜๐—ถ๐—ฐ ๐—ฟ๐—ฒ๐˜ƒ๐—ถ๐—ฒ๐˜„๐—ฒ๐—ฟ๐˜€ ๐—ฐ๐—ฎ๐—ป ๐—บ๐—ถ๐˜€๐˜€ ๐—ณ๐˜‚๐—ป๐—ฑ๐—ฎ๐—บ๐—ฒ๐—ป๐˜๐—ฎ๐—น ๐—ฟ๐—ฒ๐—ฎ๐˜€๐—ผ๐—ป๐—ถ๐—ป๐—ด ๐—ฒ๐—ฟ๐—ฟ๐—ผ๐—ฟ๐˜€.

👀 LLM-generated reviews may look convincing, but how reliable are they in practice?

In our recent TACL paper, we introduce a 𝗰𝗼𝗻𝘁𝗿𝗼𝗹𝗹𝗲𝗱 𝗰𝗼𝘂𝗻𝘁𝗲𝗿𝗳𝗮𝗰𝘁𝘂𝗮𝗹 𝗲𝘃𝗮𝗹𝘂𝗮𝘁𝗶𝗼𝗻 𝗳𝗿𝗮𝗺𝗲𝘄𝗼𝗿𝗸 to systematically test automatic reviewers.

๐—ช๐—ต๐—ฎ๐˜ ๐˜„๐—ฒ ๐—ณ๐—ถ๐—ป๐—ฑ:
๐Ÿ“Š They rely heavily on surface-level signals
โš ๏ธ They often miss mismatches between claims and actual results

๐—ช๐—ต๐˜† ๐—ถ๐˜ ๐—บ๐—ฎ๐˜๐˜๐—ฒ๐—ฟ๐˜€:
As LLMs are increasingly integrated into peer review workflows at major AI conferences, these limitations directly affect research quality and evaluation fairness.

๐—ช๐—ต๐—ฎ๐˜ ๐—ต๐—ฒ๐—น๐—ฝ๐˜€:
โœ… Humanโ€“LLM collaboration shows the strongest potential
โœ… Repeated evaluation of review-specific skills is essential
โœ… Controlled benchmarks are needed to assess reasoning, not just fluency

🔗 Project: https://ukplab.github.io/tacl2026-counter-review-logic
📄 Paper: https://arxiv.org/abs/2508.21422
👨‍💻 Code: https://github.com/UKPLab/arxiv2025-counter-review-logic

Work by Nils Dycke & Iryna Gurevych (Ubiquitous Knowledge Processing (UKP) Lab, Technische Universität Darmstadt, and National Research Center for Applied Cybersecurity ATHENE)

See you at #EACL2026 in Rabat 🕌!

#UKPLab #LLMs #PeerReview #AIforScience #TrustworthyAI #NLP #Evaluation