#systematicreview #llm #evidencesynthesis #softwareengineering #openaccess #metaresearch | Lech Madeyski

The "most accurate" LLM silently discarded 63% of the relevant papers when screening for a Systematic Review.

That is what we found when we re-analysed a 9,695-article systematic review screening study: the LLM ranked best by Accuracy lost 63.3% of the relevant evidence. The one ranked best by MCC still lost 43.9%. The one ranked best by WMCC (the cost-sensitive Weighted Matthews Correlation Coefficient we propose in the paper) lost just 5.8%.

LLMs are increasingly used to screen papers for systematic reviews, but the standard metrics used to evaluate them can badly mislead under the extreme class imbalance and asymmetric error costs of screening.

Across 29 papers we reviewed:
🔸 only 24% reported the full confusion matrix
🔸 only 10% reported MCC
🔸 none of the 5 papers claiming "workload savings" priced the cost of a wrongly excluded study

Our new open-access paper in Information and Software Technology, LLM4SCREENLIT, turns this into actionable recommendations:
✅ Report Lost Evidence (1 − Recall) as a headline metric
✅ Use Weighted MCC (WMCC): chance-corrected AND cost-sensitive, validated on 9 LLMs × 24 SE secondary studies (34,528 articles)
✅ Always report the full confusion matrix; treat unclassifiable outputs as positives requiring human review
✅ Distinct guidance for benchmarking vs deployment studies, plus a ready-to-use compliance checklist for editors and reviewers

Joint work with Barbara Kitchenham (Keele University) and Martin Shepperd (Brunel University of London); I am affiliated with Wydział Informatyki i Telekomunikacji Politechniki Wrocławskiej (Faculty of Information and Communication Technology) of Wrocław University of Science and Technology.

I recently had the pleasure of giving an invited talk on this work at the AI Engineering lab at Chalmers University of Technology and the University of Gothenburg (https://lnkd.in/de3iAbas). Thank you again, Miroslaw Staron, for hosting that discussion.
📄 Paper (open access, CC-BY): https://lnkd.in/dpVkt6QK
🧰 Replication package (R/Python scripts + fillable reviewer/editor checklist): https://lnkd.in/d8c-ZReU
#SystematicReview #LLM #EvidenceSynthesis #SoftwareEngineering #OpenAccess #MetaResearch
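A minimal Python sketch of the reporting advice in this post: build the full confusion matrix, count unclassifiable model outputs (represented here as `None`) as includes that go to human review, and report Accuracy, MCC, and Lost Evidence (1 − Recall) side by side. The two "models" and their counts are hypothetical, not data from the paper, and WMCC is deliberately left out because its cost weighting is defined in the paper itself; the names `Confusion`, `tally`, and `report` are illustrative.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Confusion:
    tp: int = 0  # relevant papers the model included
    fp: int = 0  # irrelevant papers the model included
    fn: int = 0  # relevant papers the model excluded (lost evidence)
    tn: int = 0  # irrelevant papers the model excluded


def tally(gold, predicted):
    """Build a confusion matrix from gold labels and model decisions.

    predicted holds True (include), False (exclude) or None (unclassifiable
    output); None is counted as an include so a human reviews the paper.
    """
    cm = Confusion()
    for relevant, decision in zip(gold, predicted):
        include = True if decision is None else decision
        if relevant and include:
            cm.tp += 1
        elif relevant:
            cm.fn += 1
        elif include:
            cm.fp += 1
        else:
            cm.tn += 1
    return cm


def report(cm):
    """Accuracy, MCC and Lost Evidence (1 - Recall) side by side."""
    n = cm.tp + cm.fp + cm.fn + cm.tn
    denom = sqrt((cm.tp + cm.fp) * (cm.tp + cm.fn) * (cm.tn + cm.fp) * (cm.tn + cm.fn))
    return {
        "accuracy": (cm.tp + cm.tn) / n,
        "mcc": (cm.tp * cm.tn - cm.fp * cm.fn) / denom if denom else 0.0,
        "lost_evidence": cm.fn / (cm.tp + cm.fn),  # equals 1 - recall
    }


# Hypothetical screening run: 1,000 candidate papers, 50 of them relevant.
cautious = Confusion(tp=20, fp=5, fn=30, tn=945)  # excludes aggressively
liberal = Confusion(tp=48, fp=150, fn=2, tn=800)  # includes generously
for name, cm in [("cautious", cautious), ("liberal", liberal)]:
    print(name, {k: round(v, 3) for k, v in report(cm).items()})
```

With these made-up counts the cautious model wins on both Accuracy (0.965 vs 0.848) and MCC (about 0.55 vs 0.44) while losing 60% of the relevant papers, against 4% for the liberal one, which is the kind of gap a Lost Evidence headline makes visible.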


Chemsex Interventions Succeed by Not Targeting Drug Use

The Search for Solutions

“Chemsex,” the use of psychoactive drugs to enhance sex, is a recognized public health concern due to its association with increased risks of HIV and other sexually transmitted infections (STIs).

In response, a range of programs, from counseling to medication, have been developed to address these risks. The common assumption is that the primary goal of these programs is to help individuals reduce or stop the drug use associated with chemsex.

However, a major new systematic review and meta-analysis that synthesized the results of 12 different studies challenges this assumption, revealing a more nuanced and surprising picture of what “success” actually looks like in this area. 

Takeaway #1: Chemsex Interventions Target a Specific Risk, Not the Drug Use Itself 

The review's strongest finding was a clear public health win: bio-behavioral interventions substantially decreased the number of episodes of unprotected anal intercourse (UAI) with serodiscordant partners (partners with a different HIV status), an effect that was highly statistically significant (p < 0.001). 

However, in what may be the most counter-intuitive finding, the review also concluded that the interventions did not lead to a reduction in the use of psychoactive substances during sexual activities.

This is an important finding because it reframes the goal of these interventions from drug abstinence to harm reduction. The data show that the programs succeed at reducing a primary risk factor for HIV transmission, even if they don’t stop the underlying drug use itself.

They are making a high-risk behavior safer. 

Bio-behavioral chemsex interventions reduce the risk of UAI with serodiscordant partners, a high-risk factor for HIV seroconversion. 

Takeaway #2: The Evidence is Narrower and More Fragile Than It Appears 

While the primary finding is promising, the review also reveals critical limitations in the current body of research, suggesting the evidence is not as robust as it might seem. 

  • Geographic Bias: All 12 studies included in the meta-analysis were conducted in the USA. This raises what the review calls “concerns regarding the generalisability of these findings to other countries” in Europe, Asia, and Australia where chemsex is also practiced.
  • Drug-Specific Focus: Chemsex is known to involve several drugs, including mephedrone and GHB/GBL. Yet, 11 of the 12 studies focused exclusively on methamphetamine use. The review notes this highlights a “dearth of research” for interventions targeting other relevant substances.
  • Risk of Bias: The quality of the evidence is a concern. The majority of the studies (67%) were rated as having a “high risk of overall bias.” Key issues included a reliance on participants self-reporting their behaviors and high drop-out rates, which tempers confidence in the overall conclusions. 

Takeaway #3: Beyond a Single HIV Risk Factor, the Benefits Remain Unclear 

While the interventions successfully reduced UAI with serodiscordant partners, their impact on other risky behaviors was more ambiguous.

The review found that the interventions were associated with decreases in the total number of sexual partners and in the number of partners with whom UAI occurred, but these reductions did not reach statistical significance. Nor could the review establish a statistically significant effect on the overall number of UAI episodes or on the frequency of sex involving substance use.

This finding does not mean the interventions are failures. Rather, it suggests their effect may be narrowly targeted, acting on one very specific, high-stakes behavior rather than serving as a “magic bullet” for all behaviors associated with chemsex. 

Conclusion: Reframing the Success of Chemsex Interventions

This comprehensive review sends a clear message: chemsex interventions show tangible promise for reducing a critical HIV risk behavior, even if they don’t reduce drug use itself.

At the same time, the scientific evidence supporting these interventions has significant gaps, including a narrow geographic and substance focus and a high risk of bias in the underlying studies. The review presents a paradox: these interventions succeed not by reducing the most obvious target, drug use, but by mitigating its most dangerous consequences. 

As we move forward, should the goal of public health be focused less on abstinence and more on providing tools that demonstrably make risky behaviors safer?

Are you a professional looking to stay up-to-date with the latest sex addiction, trauma, and mental health news and research? Or maybe you’re looking for continuing education courses? Then follow all of Dr. Jen’s work through her practice’s newsletter!

Are you a Licensed Professional Counselor seeking engaging, unique Continuing Education courses? Dr. Weeks offers accredited courses on her practice website on the effects of Pornography Abstinence and other unique topics!

Do you feel your sexual behavior, or that of someone you love, is out of control? Consult with a professional.

Are you looking for more reputable, data-backed information on sexual addiction? The Mitigation Aide Research Archive is an excellent source for executive summaries of research studies.

#addictionRecovery #bioBehavioralInterventions #Chemsex #evidenceBasedTreatment #GHB #harmReduction #HIVPrevention #LGBTQHealth #mephedrone #methamphetamine #publicHealth #recoveryResearch #relapsePrevention #saferSex #serodiscordantPartners #sexualBehavior #sexualHealth #STIPrevention #substanceUse #systematicReview #unprotectedAnalIntercourse

"AIM review tool: artificial intelligence for smarter systematic review screening"

AIM Review Tool is a modern web-based application that integrates active and supervised machine learning to accelerate the screening of publications for systematic reviews. AIM Review combines advanced text vectorization methods with machine learning models executed directly in the web browser.
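To make "active learning for screening" concrete, here is a minimal pure-Python sketch. It is not AIM Review's implementation: plain bag-of-words counts stand in for its text vectorization methods, a simple centroid-similarity score stands in for its machine learning models, and the record IDs, texts, and the `screen`/`oracle` names are hypothetical. The loop always shows the reviewer the unlabelled record the current model ranks as most likely relevant, records the decision, and rescores.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (a simple stand-in for richer vectorizers)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Summed term counts; cosine similarity ignores the overall scale."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return total


def score(vec, relevant, irrelevant):
    """Higher means more similar to included records than to excluded ones."""
    return cosine(vec, centroid(relevant)) - cosine(vec, centroid(irrelevant))


def screen(records, oracle, seed_labels, budget):
    """Certainty-based active learning: query the most-likely-relevant record next."""
    vecs = {rid: vectorize(text) for rid, text in records.items()}
    labels = dict(seed_labels)
    order = []
    while len(order) < budget:
        unlabelled = [rid for rid in records if rid not in labels]
        if not unlabelled:
            break
        rel = [vecs[r] for r, y in labels.items() if y]
        irr = [vecs[r] for r, y in labels.items() if not y]
        best = max(unlabelled, key=lambda r: score(vecs[r], rel, irr))
        labels[best] = oracle(best)  # the human reviewer's include/exclude decision
        order.append(best)
    return order, labels


# Hypothetical toy corpus; gold labels simulate the human reviewer.
records = {
    "r1": "LLM screening of abstracts for systematic reviews",
    "r2": "Large language model abstract screening accuracy",
    "r3": "Soil nitrogen in wheat fields",
    "r4": "Machine learning for systematic review screening workload",
    "r5": "Wheat yield and rainfall",
    "r6": "Crop rotation and soil health",
}
gold = {"r1": True, "r2": True, "r3": False, "r4": True, "r5": False, "r6": False}
order, labels = screen(records, gold.get, {"r1": True, "r3": False}, budget=4)
print(order)  # ['r4', 'r2', 'r6', 'r5']
```

The point of the design is that relevant records tend to surface early, so reviewers can find most of the evidence before reading everything; real tools add far stronger models and a stopping rule.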

Paper:
https://www.nature.com/articles/s44387-026-00080-8

The tool:
https://aim-review-app.web.app/#Home

#research #medicine #AItools #systematicReview

Supporting Information for: Fifty years later, and we still don’t know about badges of status

"TiAb Review Plugin: A Browser-Based Tool for AI-Assisted Title and Abstract Screening"

TiAb Review Plugin is an open-source Chrome browser extension (available at https://chromewebstore.google.com/detail/tiab-review-plugin/alejlnlfflogpnabpbplmnojgoeeabij). It uses Google Sheets as a shared database, so it needs no dedicated server and supports multi-reviewer collaboration. Users supply their own Gemini API key, which is stored locally and encrypted.

https://arxiv.org/abs/2604.08602

#research #medicine #AItools #medlib #systematicReview

Here's a fun one for the #EvidenceSynthesis / #SystematicReview crowd 📚

@ZijunLi and I are doing a meta-analysis where we compare two groups.

We use the "Cohen's d" family of effect size metrics for the meta-analysis (comparing baseline to follow-up, immediate and later).

Some studies (e.g. smoking interventions) only report a percentage in each arm (e.g. % of smokers).

How to best convert these percentages into smth Cohen's d-ish? 🤔

We appreciate any help / boosting!
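One common answer is to go via the odds ratio: rebuild each study's 2×2 table from the arm percentages and sample sizes, compute the log odds ratio, and rescale it with the logit method, d = ln(OR) × √3/π, with variance Var(ln OR) × 3/π² (see, e.g., Borenstein et al., Introduction to Meta-Analysis). The sketch below assumes the arm sizes are known, rounds reconstructed counts to whole numbers, and adds 0.5 to every cell when any cell is zero; these are conventions, not requirements. The function name and example numbers are hypothetical, and the result is a between-group d at one time point, not a change from baseline.

```python
import math


def d_from_percentages(pct1, n1, pct2, n2):
    """Convert two arms' event percentages to Cohen's d via the log odds ratio.

    Logit method: d = ln(OR) * sqrt(3) / pi, Var(d) = Var(ln OR) * 3 / pi**2.
    Returns (d, variance_of_d).
    """
    # Rebuild the 2x2 table; percentages are rounded back to whole counts.
    events1 = round(pct1 * n1 / 100)
    events2 = round(pct2 * n2 / 100)
    cells = [events1, n1 - events1, events2, n2 - events2]
    # Assumed convention: add 0.5 to every cell if any cell is zero.
    if min(cells) == 0:
        cells = [x + 0.5 for x in cells]
    a, b, c, d_cell = cells
    log_or = math.log((a * d_cell) / (b * c))
    var_log_or = 1 / a + 1 / b + 1 / c + 1 / d_cell
    d = log_or * math.sqrt(3) / math.pi
    var_d = var_log_or * 3 / math.pi ** 2
    return d, var_d


# Hypothetical example: 20% smokers among 100 in the intervention arm,
# 30% among 100 in the control arm.
d, var_d = d_from_percentages(20, 100, 30, 100)
print(round(d, 3), round(var_d, 4))  # -0.297 0.0335
```

With group 1 as the intervention arm and smoking as the event, a negative d means fewer smokers in the intervention arm, so flip the sign where needed to match the direction of the other effect sizes in the meta-analysis.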

The article reports on a comprehensive review which found that cannabis-based medicines provide little to no benefit for most mental health and substance use disorders, with small potential benefits in a few limited areas and generally modest safety concerns. The review synthesizes 54 randomized trials involving nearly 2,500 participants and emphasizes the gap between increasing medical use and solid scientific support. The overall message is one of caution about routinely using cannabis medicines for psychiatric conditions.

This topic is of interest to psychology enthusiasts because it highlights how evidence-based practice intersects with evolving treatment trends, and it shows how hard it is to evaluate interventions across diverse mental health conditions.

Article Title: A massive review reveals cannabis falls short in treating psychiatric disorders

Link to PsyPost Article: https://www.psypost dot org/a-massive-review-reveals-cannabis-falls-short-in-treating-psychiatric-disorders/

Copy and paste the broken link above into your browser and replace "dot" with "." to make the link work. We have to do it this way to avoid displaying copyrighted images.

#CannabisResearch #MentalHealth #SystematicReview #Cannabinoids #Psychiatry

Does anyone have experience with CADIMA for screening articles? www.cadima.info I’d be interested in hearing your thoughts, especially if you can compare it to Rayyan. #evidencesynthesis #systematicreview

Can #AI handle abstract screening for a #systematicReview?

Li et al. tested #ChatGPT, #PaLM, #Llama, #Claude, and various techniques on 3 datasets.

#GPT4 was consistently at least 90% accurate (vs gold standard) with balanced sensitivity & specificity.

https://doi.org/10.1186/s13643-024-02609-x

From June 2025:

I don't trust large language model (#LLM) AIs: they're trained to sound plausible without regard for accuracy, i.e., to generate bullshit.

If you can handle that "spicy" description, please read this essay by @researchfairy, describing how LLMs can be used to deliberately weaponize #SystematicReview articles. Want a completely plausible topic review that supports your controversial viewpoint? Say you want to support raw milk or decry #vaccination?

https://blog.bgcarlisle.com/2025/05/16/a-plausible-scalable-and-slightly-wrong-black-box-why-large-language-models-are-a-fascist-technology-that-cannot-be-redeemed/

A plausible, scalable and slightly wrong black box: why large language models are a fascist technology that cannot be redeemed – The Grey Literature