This question is for folks who have done some kind of computing research.

Did you ever get formal training in how to do a literature review? What about informal training?

Some options, in case that lowers the barrier to entering the conversation:

Learned in a formal course: 7.3%
Learned from peers: 27.3%
Learned from advisor: 34.5%
Other: 30.9%
As a follow-up question: what platform do you use for search?
Google Scholar: 76.9%
JSTOR/EBSCOhost/via library: 0%
ArXiv: 7.7%
Other: 15.4%
@etosch
for computers: acm
for everything else: library search
in general: following citation chains from any work that i'm starting from
@cxli I originally had ACM in here but could only have four options, so I replaced it with other!

@cxli For context: the #acmdl frictions make systematic reviews painful. It feels borderline unusable as a research tool and is incomplete.

#googlescholar is more complete, but the accuracy of the metadata drops off. I've found that results for historical searches (e.g., before 1950) are mostly incorrectly dated.

I was curious whether this is corroborated by research and came across: https://pmc.ncbi.nlm.nih.gov/articles/PMC7079055/

@cxli Interestingly, this study (conducted in 2019) reports that the #ACMDL allows bulk download. I don't know if this feature is just hard to find or if it's been removed since then.

(Maybe @JonathanAldrich would know?)

@etosch @cxli I don't know the history, but right now I think they are doing it as a defense against unauthorized LLM training and other things that act like a DDoS. It can cause problems for certain kinds of academic use; given this, I'm honestly not sure it's worth the cost.

@JonathanAldrich @etosch @cxli possibly unpopular take: if LLMs should be trained on anything, it should be scientific papers, so if this is ACM's reasoning for not supporting automated workflows, it's doubly harmful

(yes, I know: they want to get paid for it)

@ricci @JonathanAldrich @cxli Counterpoint: what is the purpose of LLMs?

I think I get what's implied --- scientific papers meet a quality metric for training data. However, if your goal is to use LLMs for customer support, they are absolutely the wrong training data!

@ricci @JonathanAldrich @cxli That said, I assume you mean that if LLMs are to be an expert system, they should be trained on appropriate domain expertise. I still think conference papers are the wrong training data. There is an enormous amount of implicit knowledge and norms in academic conference papers that isn't explicitly encoded anywhere. If we want a tool that's functionally closer to LLMs, IMO we need a formal target and more traditional (and constrained!) ML to achieve that.

@etosch @JonathanAldrich @cxli oops I was going to follow up on this and forgot. Yeah, part of my thinking was that papers should, on average, contain information that's more likely to be correct than random Internet text. But I was also thinking about availability of text: there are copyright and economic questions around things like published books, but most academics are *happy* to get their papers out there as widely as possible. They're written with the explicit purpose of getting information out there, and we're not expecting to get paid for them, so some of the thorny issues around other sources of text are not present.

But yeah let's not use them for training customer service LLMs