I'm cautiously optimistic about transparency tools we can use to build bespoke LLM products for the humanities & social sciences.

These screenshots show what the https://www.langchain.com/langsmith tool exposes behind the scenes of a simple historical query to the https://github.com/assafelovic/gpt-researcher research tool.

gpt-researcher runs locally on my machine and calls the OpenAI and Tavily services, but the code is open source, so those dependencies could be swapped out.
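For anyone wanting to reproduce this kind of trace: a rough sketch of the environment setup, assuming the LangSmith tracing variables documented by LangChain at the time of writing (variable names and key formats here are illustrative, not authoritative — check the current docs for your versions):

```shell
# Hypothetical minimal setup for tracing a local gpt-researcher run.
# Keys shown are placeholders; swap providers as the open source code allows.
export OPENAI_API_KEY="sk-..."        # LLM provider used by gpt-researcher
export TAVILY_API_KEY="tvly-..."      # web search service used by gpt-researcher
export LANGCHAIN_TRACING_V2="true"    # turn on LangSmith tracing for LangChain calls
export LANGCHAIN_API_KEY="ls__..."    # LangSmith account key (paid/free tier dependent)

# Then run gpt-researcher as usual; runs should appear in the LangSmith UI.
```

This is a config fragment rather than a program — the point is simply that the tracing layer is switched on by environment, not by modifying the research tool itself.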

#digitalhumanities #digitalsocialscience #digitalhistory
#langchain

It's also worth noting that we get more information from the command line, and of course from the code itself, with LangChain (and presumably other frameworks), which amounts to documentation of the tool's state at the time of the query.

This suggests there is some hope for defining and implementing transparency standards (research software engineering / technical and scholarly).

#digitalhumanities #digitalsocialscience #digitalhistory
#langchain
#rse

@jamessmithies This seems to rely heavily on LangSmith, which LangChain markets as their key (paid) offering for companies and developers; and Tavily relies on (paid) OpenAI API access, as you note. So how realistic is it that this could be an open source toolset?
@[email protected] @[email protected] Personally, I'm also concerned that the neural networks inside some of these systems are effectively black boxes that will not be opened anytime soon, if ever. Having access to source code does not give you insight into what the neural networks in the guts of the code are actually doing computationally. Especially if they have an architecture that hasn't been fully shared and hundreds of billions of internal parameters trained on data no single person could ever comprehend.

This is a potential vector for all sorts of bad things:
- I fully expect that within a decade we'll be regularly hearing about cybersecurity incidents that are traceable back to opaque neural networks; we are already seeing the contours of how some of these might look
- An engineered artifact has not been properly engineered until its scope and failure modes are understood, which is by definition not possible when there are black boxes inside the system
- I don't see how you can possibly write a scientific paper with results that can be replicated or at least understood if the system under test has an opaque black box in it. "To replicate these results, insert <<black box full of 1 trillion floating point numbers>> into your system and press 'OK'" doesn't feel like science to me.

@abucci @dingemansemark Agreed. It’s opacity all the way down. My perspective at the moment is that we need to start prototyping, primarily to develop our test / assessment / problem identification capability. Models & tools that enable that are useful. I’m also (I’m a historian, rather than a scientist) interested in where they fit in the research workflow: ideation / hypothesis dev (possibly), research / source analysis (possibly), editing (possibly), oracle (no).
@[email protected] @[email protected] Makes sense. Do you think it's feasible for fields like history to develop enough guardrails to productively use LLM-based systems? I have no idea frankly, and I'm coming at it as a computer scientist with all the biases that brings, but I'm very curious.
@abucci @dingemansemark I’m empirically minded enough to be unsure. How well will RAG over prejudiced 19th century source documents perform? Could an ‘expert’ history AI deal with that to professional standards? Will there be specialist Marxist history tools, post-colonial history tools or specialists in time periods? Will citation be adequate? It might all fall apart with some poking around. And yes, security: time for prototyping and testing, not product dev.
@abucci @jamessmithies I agree with those points (and have made some of them in print: https://pure.mpg.de/rest/items/item_3526897_1/component/file_3526898/content ). I think that too much research, by uncritically jumping onto the generative AI bandwagon, risks reproducing the status quo and/or buying into a gigantic regression to the mean. Prototyping can help to make risks and weak spots visible, but only if it's done out in the open & in reproducible ways
@[email protected] Thanks so much for your work on this, Mark; I think it's great. I've shared this one article half a dozen times since I first saw you post it. @[email protected]
@abucci (yes I should have added an 'as you know' — no implication intended that you didn't)
@[email protected] Oh, I wasn't trying to be snarky; I am genuinely appreciative. I think it's very important work.
@dingemansemark @abucci That’s an incredibly useful paper, thank you. There’s a mountain of work to do.
@dingemansemark Good point, I should have been clear about that in my post. LangSmith might not be an option beyond prototyping for me. But the principle remains, I hope: a degree of transparency is possible, and I’d stand by my sense that a degree of cautious optimism is warranted.