Mastodawn

@cxli Interestingly, this study (conducted in 2019) reports that the #ACMDL allows bulk download. I don't know if this feature is just hard to find or if it's been removed since then.

(Maybe @JonathanAldrich would know?)

@etosch @cxli I don't know the history but right now I think they are doing it as a defense against unauthorized LLM training and other things that act like DDOS. It can cause problems for certain kinds of academic use; given this, I'm honestly not sure it's worth the cost.

@JonathanAldrich @cxli I've had several research threads over the past 3-4 years that have more or less stalled out. The DL seems like the best resource for them, but it's just too labor-intensive to manually search, click, download, refine the search, exclude papers already read, etc.

Emma Tosch, Thought Follower

@JonathanAldrich @cxli I'm curious what their threat model is for LLMs (aside from DDOS) and how that relates to their costs and revenue. Like, what I really want is a database connection and _maybe_ some UI and querying features. I'd prefer to work locally, but I could also see value in working in an ACM-hosted private notebook (which could become public upon publication). I _do not_ want an "AI research assistant." I would accept certain constrained AI/ML tools, if I understood their affordances.

@JonathanAldrich @cxli What I'm wondering is whether people like me are even the target audience for ACM DL subscriptions. If yes, then surely others would be interested in these features! If no, I'd like to know what our alternatives are.

I'd love to hear any insights you have on this, @JonathanAldrich! I really appreciate getting a window into the mechanics of these orgs.

@etosch @cxli I can't speak definitively, but beyond DDOS (which seems implausible anyway) I think the threat model is simply that a lot of ACM members don't want to allow LLMs to train on their papers. And while ACM wouldn't necessarily mind allowing training on the remainder, ACM would like to get paid for it, enabling us to provide more services to members at lower cost.

@etosch @cxli And the people who don't want LLMs to train on their papers generally feel that way for ethical reasons. It's not an issue for me, but I respect those who feel that way and I think ACM wants to also respect those wishes. Of course, when your whole library is Open Access (and honestly, even if it is not), this is very hard to enforce. It may be a losing battle.

@JonathanAldrich @cxli whoops, you mentioned open access here; this is what happens when I reply to messages one at a time. 🙃

@JonathanAldrich @cxli Hm. I suspect a lot of the ACM members who don't want their work to be training data are also proponents of open access. I don't know if these options are as mutually exclusive as they appear.

I'm also not convinced that firms selling LLM services would have a competitive advantage over what a usable ACM DL UI could provide, but maybe I'm alone here?

@etosch @cxli Yeah, llms.txt is supposed to allow training limits to coexist with open source (and indeed robots.txt could also be used). But compliance is voluntary. ACM's rate-limiting tools are a backstop (and lawsuits could be another), but it's hard to be sure how effective they are.