Mastodawn

@cxli Interestingly, this study (conducted in 2019) reports that the #ACMDL allows bulk download. I don't know if this feature is just hard to find or if it's been removed since then.

(Maybe @JonathanAldrich would know?)

Jonathan Aldrich Apr 14

@etosch @cxli I don't know the history but right now I think they are doing it as a defense against unauthorized LLM training and other things that act like DDOS. It can cause problems for certain kinds of academic use; given this, I'm honestly not sure it's worth the cost.

@JonathanAldrich @cxli I've had several research threads over the past 3-4 years that have more or less stalled out because while the DL seems like the best resource for them, it's just too labor intensive to manually search, click, download, refine the search, exclude papers already read, etc.

Jonathan Aldrich Apr 14

@etosch @cxli What is the alternative to manual search? You want to script the search? Would you use LLMs or other tools to read the downloaded papers and automatically decide if they are relevant to your systematic review?

@JonathanAldrich @cxli re:LLMs. I guess I might consider using an LLM to verify aspects of the review, but not for the primary research.

Here's an example task I recently tried to do: I wanted to catalogue the benchmarks used in ASPLOS 2026 papers. My query was very simple: just the papers from the proceedings that use the word "benchmark" somewhere. I wanted a table of the names of the suites, domain, units (or "entity types"), size, dates of introduction, and a few other things.

@JonathanAldrich @cxli The problem was that just programmatically extracting a list of the names of the papers and their DOIs doesn't seem possible without (I'm assuming) breaking the ACM DL's terms of service.

@JonathanAldrich @cxli Instead I get my paginated list of papers and I have to click through each of them, copy their names and DOIs, then open them, and then fill out my table.
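As a rough illustration of the table this task is after, here is a minimal sketch of one row as a record type, plus a CSV export. The class name `BenchmarkEntry`, the helper `to_csv`, and all field names are hypothetical, chosen only to mirror the columns listed above (suite name, domain, units, size, date of introduction). The "few other things" are left out because the thread does not name them.

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class BenchmarkEntry:
    """One row of the benchmark catalogue (hypothetical field names)."""
    paper_title: str
    paper_doi: str
    suite_name: str
    domain: str
    units: str        # the "entity types" the suite is made of
    size: str
    introduced: str   # date the suite was introduced


def to_csv(entries):
    """Serialize catalogue rows to CSV text, header row first."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=[f.name for f in fields(BenchmarkEntry)],
        lineterminator="\n",
    )
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buf.getvalue()
```

Filling the rows is still the manual part described above; the sketch only fixes the shape of the result so that entries copied by hand end up in one consistent file.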

Jonathan Aldrich

@etosch @cxli Yeah I can see why you'd want support for automation. I think it's totally reasonable, but I don't think the ACM has been thinking about these uses.

@JonathanAldrich @cxli Follow up question: do you happen to know if the ACM employs librarians or archivists?

Jonathan Aldrich Apr 15

@etosch @cxli Not sure about staff. I know we have some on the Publications Board. If you are interested I can try to make a connection.

@JonathanAldrich @cxli Yeah, actually that would be very cool, thank you!

Jonathan Aldrich Apr 15

@etosch @cxli Ok, reflecting on this, providing more service from the DL is in the purview of the ACM DL Board, not the Publications Board (which really does publications policy). Some members of the DL Board with specific library expertise include Stephen Downie (UIUC, Ph.D. in Lib/info sci), Michael Ley (creator of DBLP), and Phoebe Ayers (Librarian at MIT).

@JonathanAldrich @cxli For precedent, or as an example: this 2009 paper (Empirical evaluation in Computer Science research published by ACM, Wainer et al., https://www.sciencedirect.com/science/article/pii/S0950584909000093) replicates Tichy et al.'s 1995 review of empiricism in CS. Wainer et al. randomly select 200 papers and annotate them (with some exclusions). 1/🧵

@JonathanAldrich @cxli The annotations/classifications they do could benefit from automation, but they don't require LLMs. I'm sure you could get very high accuracy from a small set of features using traditional ML. Hell, if the ACM really wanted to support researchers, they could provide classification/annotation as a service for these kinds of reviews.

@JonathanAldrich @cxli Ah so here is an ACM-published paper that includes a lit review: https://dl.acm.org/doi/pdf/10.1145/3406544

I would love it if the authors' annotations were available through the #ACMDL and linked to papers, supporting queries like, "get all of the empirical papers that don't involve human subjects."
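To make the quoted query concrete, here is a minimal sketch of what it could look like if annotations were exposed as simple per-paper records. Everything here is an assumption for illustration: the `Annotation` type, its two boolean labels, and the placeholder DOIs are hypothetical, and nothing in the thread says the ACM DL offers such data or such an interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical per-paper annotation record linked to a DOI."""
    doi: str
    empirical: bool        # paper reports an empirical evaluation
    human_subjects: bool   # study involves human subjects


def empirical_without_human_subjects(annotations):
    """Return DOIs of papers labelled empirical that do not involve human subjects."""
    return [a.doi for a in annotations if a.empirical and not a.human_subjects]


# Placeholder records only; these are not real papers or real annotations.
EXAMPLE = [
    Annotation("10.0000/example.1", empirical=True, human_subjects=False),
    Annotation("10.0000/example.2", empirical=True, human_subjects=True),
    Annotation("10.0000/example.3", empirical=False, human_subjects=False),
]

print(empirical_without_human_subjects(EXAMPLE))
```

The point of the sketch is that once annotations are structured and keyed by DOI, a query like this is a one-line filter rather than a manual pass over every paper.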

@JonathanAldrich @cxli btw, while I'm not sure such a tool is necessary or worth the cost, here is a paper I found that uses an AI research assistant with features someone might want: https://link.springer.com/article/10.1007/s10115-024-02284-3

Factors influencing open science participation through research data sharing and reuse among researchers: a systematic literature review - Knowledge and Information Systems