This question is for folks who have done some kind of computing research.
Did you ever get formal training in how to do a literature review? What about informal training?
Some options, in case that lowers the barrier to entering the conversation:
@cxli For context: the #acmdl frictions make systematic reviews painful. It feels borderline unusable as a research tool and is incomplete.
#googlescholar is more complete, but the accuracy of the metadata drops off. I've found that results for historical searches (e.g., pre-1950) are mostly misdated.
I was curious whether this is corroborated by research and came across: https://pmc.ncbi.nlm.nih.gov/articles/PMC7079055/
...
@cxli Interestingly, this study (conducted in 2019) reports that the #ACMDL allows bulk download. I don't know if this feature is just hard to find or if it's been removed since then.
(Maybe @JonathanAldrich would know?)
@JonathanAldrich @cxli What I'm wondering is whether people like me are even the target audience for ACM DL subscriptions. If yes, then surely others would be interested in these features! If no, I'd like to know what our alternatives are.
I'd love to hear any insights you have on this, @JonathanAldrich! I really appreciate having some insight into the mechanics of these orgs.
@JonathanAldrich @cxli Hm. I suspect a lot of the ACM members who don't want their work to be training data are also proponents of open access. I don't know if these options are as mutually exclusive as they appear.
I'm also not convinced that firms selling LLM services would have a competitive advantage over what a usable ACMDL UI could provide, but maybe I'm alone here?
@JonathanAldrich @cxli re:LLMs. I guess I might consider using an LLM to verify aspects of the review, but not for the primary research.
Here's an example task I recently tried to do: I wanted to catalogue the benchmarks used in ASPLOS 2026 papers. My query was very simple: just the papers from the proceedings that use the word "benchmark" somewhere. I wanted a table of the names of the suites, domain, units (or "entity types"), size, dates of introduction, and a few other things.
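To make the task above concrete, here is a rough sketch of the filtering step and the empty table I had in mind. It assumes the full text of each paper is already available locally as a string, which is exactly the part that is hard to get today. The names `BenchmarkRow`, `papers_mentioning_benchmark`, and `empty_catalogue` are made up for illustration, and the sample texts are placeholders, not real papers.

```python
import re
from dataclasses import dataclass

# One row of the catalogue. Every field except `paper` is filled in by hand
# (or by a later extraction step); this sketch only sets up the table.
@dataclass
class BenchmarkRow:
    paper: str
    suite: str = ""
    domain: str = ""
    units: str = ""        # the "entity types" the suite is made of
    size: str = ""
    introduced: str = ""   # date the suite was introduced

# Matches "benchmark", "Benchmarks", "benchmarking", and so on.
BENCHMARK = re.compile(r"\bbenchmark", re.IGNORECASE)

def papers_mentioning_benchmark(papers: dict[str, str]) -> list[str]:
    """Return the titles of papers whose text mentions "benchmark"."""
    return sorted(title for title, text in papers.items() if BENCHMARK.search(text))

def empty_catalogue(papers: dict[str, str]) -> list[BenchmarkRow]:
    """Create one blank catalogue row per matching paper."""
    return [BenchmarkRow(paper=title) for title in papers_mentioning_benchmark(papers)]

if __name__ == "__main__":
    # Placeholder texts standing in for the proceedings.
    sample = {
        "Paper A": "We evaluate on a well-known benchmark suite.",
        "Paper B": "This paper is purely theoretical.",
        "Paper C": "Benchmarks are described in Section 5.",
    }
    for row in empty_catalogue(sample):
        print(row)
```

The filter itself is trivial; the friction is in getting the `papers` dictionary populated from the proceedings in the first place.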
@JonathanAldrich @cxli Ah so here is an ACM-published paper that includes a lit review: https://dl.acm.org/doi/pdf/10.1145/3406544
I would love it if the authors' annotations were available through the #ACMDL and linked to papers, supporting queries like, "get all of the empirical papers that don't involve human subjects."
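As a sketch of what that kind of query could look like if the annotations existed as structured data: the `Annotation` record, its fields, and the DOIs below are all hypothetical. As far as I know, nothing like this is exposed by the ACM DL today.

```python
from dataclasses import dataclass

# Hypothetical per-paper annotation, as a review's authors might record it.
@dataclass(frozen=True)
class Annotation:
    doi: str
    empirical: bool
    human_subjects: bool

def empirical_without_human_subjects(annotations: list[Annotation]) -> list[str]:
    """Return DOIs of empirical papers that do not involve human subjects."""
    return [a.doi for a in annotations if a.empirical and not a.human_subjects]

if __name__ == "__main__":
    # Placeholder DOIs, not real papers.
    annotations = [
        Annotation("10.0000/example.1", empirical=True, human_subjects=False),
        Annotation("10.0000/example.2", empirical=True, human_subjects=True),
        Annotation("10.0000/example.3", empirical=False, human_subjects=False),
    ]
    print(empirical_without_human_subjects(annotations))
```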

This systematic literature review investigates the influential factors guiding researchers' active engagement in open science through research data sharing and subsequent reuse, spanning various scientific disciplines. The review addresses key objectives and questions, including identifying distinct sample types, data collection methods, critical factors, and existing gaps within the body of literature concerning data sharing and reuse in open science. The methodology employed in the review was detailed, outlining a series of systematic steps. These steps encompass the systematic search and selection of relevant studies, rigorous data extraction and analysis, comprehensive evaluation of selected studies, and transparent reporting of the resulting findings. The review's evaluation process was governed by well-defined inclusion and exclusion criteria, encompassing publication dates, language, study design, and research outcomes. Furthermore, it adheres to the PRISMA 2020 flow diagram, effectively illustrating the progression of records through the review stages, highlighting the number of records identified, screened, included, and excluded. The findings include a concise tabular representation summarizing data extracted from the 51 carefully selected studies incorporated within the review. The table provides essential details, including study citations, sample sizes, data collection methodologies, and key factors influencing open science data sharing and reuse. Additionally, common themes and categories among these influential factors are identified, shedding light on overarching trends in the field. In conclusion, this systematic literature review offers valuable insights into the multifaceted landscape of open science participation, emphasizing the critical role of research data sharing and reuse. It is a comprehensive resource for researchers and practitioners interested in further understanding the dynamics and factors shaping the open science ecosystem.
@JonathanAldrich @etosch @cxli possibly unpopular take: if LLMs should be trained on anything, it should be scientific papers, so if this is ACM's reasoning for not supporting automated workflows, it's doubly harmful
(yes, I know: they want to get paid for it)
@ricci @JonathanAldrich @cxli Counterpoint: what is the purpose of LLMs?
I think I get what's implied --- scientific papers meet a quality metric for training data. However, if your goal is to use LLMs for customer support, they are absolutely the wrong training data!
@etosch @JonathanAldrich @cxli oops I was going to follow up on this and forgot. Yeah part of my thinking was that they should generally contain on average information that's more likely to be correct than random Internet text. But also I was thinking about availability of text: there's copyright and economic questions around things like published books, but most academics are *happy* to get their papers out there as widely as possible. They're written with the explicit purpose of getting information out there and we're not expecting to get paid for them so some of the thorny issues around other sources of text are not present.
But yeah let's not use them for training customer service LLMs
@cxli FWIW the tl;dr version of this article is in the Discussion section:
> Overall, we found that only 14 of the 28 academic search systems examined are well-suited to evidence synthesis in the form of systematic reviews...[and]...can be used as principal search systems: ACM Digital Library, BASE, ClinicalTrials.gov, Cochrane Library, EbscoHost ..., OVID ..., ProQuest ..., PubMed, ScienceDirect, Scopus, TRID, Virtual Health Library, Web of Science ..., and Wiley Online Library.