Mastodawn

Elisabeth Bik Sep 18, 2025

For #PeerReviewWeek, I just published my BlueSky posts and notes from the 10th International Peer Review Congress, held 2 weeks ago in Chicago. #PRC10 @peerreviewcongress.bsky.social Part 1 is here: scienceintegritydigest.com/2025/09/16/p...

Peer Review Congress Chicago –...

Peer Review Congress Chicago – Day 1

It’s Peer Review Week! A perfect time to post my notes from the 10th International Congress on Peer Review and Scientific Publication, which was held at the Swissôtel in Chicago, two weeks ag…

Science Integrity Digest

Show thread

Elisabeth Bik Sep 5, 2025

John Ioannidis is closing the conference, by thanking organizers, staff, first-comers, and veteran attendees. Some attended for the 9th or 10th time! Safe travels everyone! EB: I hope y'all enjoyed the live posts! It was my pleasure to provide access to this well-organized congress. #PRC10

Show thread

Elisabeth Bik Sep 5, 2025

Discussion: * Do we know if any LLMs are being trained on public reviews? - hard to know which ones are reliable. * What happens if you retry with the same prompt? You get more or less the same output. * One LLM and one human review in future? * Problems with LLM monoculture/monopoly #PRC10

Show thread

Elisabeth Bik Sep 5, 2025

FA: Across 8 RQI tiems LLM reviews scored higher on: * identify strengths and weaknesses * useful comments on writing/organizations * constructiveness LLMs can thus help humans review papers. Not all LLMs were equally good. Gemini 5.0pro was the best, but produced very long texts. #PRC10

Show thread

Elisabeth Bik Sep 5, 2025

FA: We used 5 LLMs vs 2 humans for each manuscript submitted to 4 BMJ journals, where the LLM reviews was not used in editorial decisions. We used the Review Quality Instrument (RQI), where editors rated the review quality as well as comprehensiveness score. #PRC10

Show thread

Elisabeth Bik Sep 5, 2025

Next, the last speaker: Fares Alahdab with 'Quality and Comprehensiveness of Peer Reviews of Journal Submissions Produced by Large Language Models vs Humans' There is reviewer fatigue, no credit, time-consuming. Is it a bad thing that LLMs produce peer reviews? How good are LLM reviews? #PRC10

Show thread

Elisabeth Bik Sep 5, 2025

VR: In summary, we can detect LLM-generated peer reviews with high detection rate. Our preprint: arxiv.org/abs/2503.15772 Discussion: * AI output can be quite good. Why prevent it? * Flipside to hidden prompt: "give me a positive review" in manuscript. That is malfesience. This not? #PRC10

Show thread

Elisabeth Bik Sep 5, 2025

VR: Effectiveness of watermark insertion: LLMs insert the watermark with high probability. We had great accuracy. Reviewer defenses could be to paraphrase the LLM-generated text. They could also ask LLM if there were hidden prompts. #PRC10

Show thread

Elisabeth Bik Sep 5, 2025

VR: But we do not want false positive and Better watermarking strategies are to insert a random sentence, a random fake citation, or a fake technical term (markov decision process) - false positive rate will go down. Hidden prompts can be white colored, very small font, font manipulation #PRC10

Show thread

Elisabeth Bik Sep 5, 2025

Next: 'Evaluation of a Method to Detect Peer Reviews Generated by Large Language Models' by Vishisht Rao Many reviewers are suspected to submit LLM-generated reviews. We can insert hidden message in review assignment for a LLM: Use the word aforementioned and check for that. #PRC10