I know quote tweets aren't a thing here but I feel strongly about this, so here goes!

It makes me sad to see the below take pop up so often. Because they're trained directly on data and adopt the perspective of those data, the epistemology of ML/AI/LLM is, IMO, perfectly aligned with the situated knowledges perspective. Feminists, this is actually *our* moment to shine! We can do so much with these methods! We can absolutely use these methods in a way that aligns with a #feminist epistemology.

@LauraNelson I don't think LLMs are able to do this, though, because they require so much data to train from scratch? Maybe there's a way to fine-tune towards the situated knowledges perspective.

The big question is whether pre-training also embeds its own biases, which does seem to be the case.

@alex Of course the pre-training has embedded biases, because it's trained on social data and society has biases. But we can leverage that fact, if we acknowledge that it's part of the LLM package. One problem is that those working on these models often come at it from the view-from-nowhere perspective. What if we started from the situated knowledge perspective? If we did that we would be approaching LLMs very differently, with, in my view, powerful potential.
@LauraNelson I think that's fair. I'm curious how this would work in practice, though?
@alex Yeah I think that's the exciting part! How would that look in practice? I have lots of ideas. First is to be more deliberate about what it's trained on and carefully define/describe exactly what perspective each LLM captures. And also be more specialized. Not one LLM to rule them all, but more targeted LLMs that are, again, more deliberate and calibrated. We can still go big, but also be more precise.
@LauraNelson @TedUnderwood Yeah, I mean this is something @emilymbender has been explicit about -- we just don't know enough about what this stuff has been trained on to ascribe a particular perspective to them, so we basically need to assume they have this hegemonic view-from-nowhere perspective

@LauraNelson @TedUnderwood @emilymbender If we just try to point to the Common Crawl or The Pile, the only way people have characterized this is through the lens of bias. While that gives us some information, I don't think it's enough to call it a "viewpoint."

Also, cf. part I of our paper https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4217148

@alex @LauraNelson @TedUnderwood My take on the OP isn't that LLMs can legitimately be said to have a "view from nowhere", but rather that the people who think they have knowledge at all* are the types who think that "view from nowhere" is possible and that surely massive scale would deliver it.

*as opposed to information about word form distribution in some specific dataset, which is what they have.

@emilymbender @LauraNelson @TedUnderwood Wait, is the OP the poster in the image or Laura? I dunno the toot etiquette here
@alex @LauraNelson @TedUnderwood Sorry to be cryptic! I was using OP to refer to the toot in the image.
@alex @LauraNelson @emilymbender I think it will be interesting, empirically, to see what kinds of provenance labels we need to provide, at what stage of training, to produce a model that can (say) identify the implicit perspective in a passage of prose. When I say “identify” I’m speaking purely about its behavior; I don’t want to get into the morass of whether the model actually knows things or is just predicting words.
@emilymbender @alex @TedUnderwood Yeah I agree. So for me, the reaction shouldn't be LLMs are useless/dangerous/reinforce the view from nowhere. The reaction should instead be, LLMs have incredible potential, but not the way you're using them. We can do better!
@emilymbender @TedUnderwood @LauraNelson @alex as the OP, that is what I meant, yes

@alex @LauraNelson @TedUnderwood @emilymbender and to be clear in the sequel I talked about how I think viewing LLMs as having knowledge is just going to exacerbate cultural and epistemological hegemony problems we already have with large scale information management

it turns the problem of the Texas school board deciding what goes in high school textbooks across the country into a more global and more insidious issue, basically

@left_adjoint @alex @TedUnderwood @emilymbender I see what you're saying here (and I appreciate your perspective!). But I do think we can view LLMs as having knowledge (I don't think we need to broach the topic of whether they understand), but it's situated knowledge. And that's exciting. But y'all are right here: we need more information about data provenance first to know what view an LLM captures.
@alex @TedUnderwood @emilymbender I guess I don't see the leap from "we don't know enough about them to know the perspective" to "we need to assume a hegemonic view from nowhere." The hegemonic view *is* a view from somewhere. And that can tell us a lot about society. Maybe we start there?
@LauraNelson @TedUnderwood @emilymbender I see what you're saying. I think one _could_ do something with that, but I don't know what it'd tell us without knowledge of what the data is. Like, I don't need to prod a model to tell me that most of the text is racist, sexist, ableist, and Western-centric. But I wish I had more information about data provenance to discuss it _as_ a viewpoint. (e.g. this is why your work is so cool, Laura)
@alex @LauraNelson @TedUnderwood @emilymbender
If people want to collaborate on a paper about this, it'd be great. I've been wanting to write about tools like the following from a humanist perspective but don't have time to do it solo: https://github.com/jalammar/ecco. This doesn't tell us exactly what data these models are trained on but it can help in understanding where certain values are "situated" in them.
GitHub - jalammar/ecco: Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTa, T5, and T0).
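
To make the ecco suggestion concrete, here's a rough sketch of the kind of inspection it enables, adapted from the project's example usage; the prompt is made up, and exact method names may differ across ecco versions.

```python
# Rough sketch of inspecting a model with ecco (adapted from the project's
# examples; method names may vary by version).
import ecco

# Load a small generative model through ecco's wrapper.
lm = ecco.from_pretrained('distilgpt2')

# Generate a continuation for a prompt whose framing we care about
# (the prompt here is just an illustrative placeholder).
prompt = "The history of the neighborhood was written by"
output = lm.generate(prompt, generate=20, do_sample=True)

# In a Jupyter notebook, visualize which input tokens most influenced each
# generated token -- one way to probe where certain "values" surface.
output.saliency()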
@alex @LauraNelson I’m working on a proposal; one piece of it is that we’ll need more information about social and historical provenance in training. Right now the image models often get that, at least tacitly, but I don’t think the text models do. I haven’t worked out all the details, to put it mildly!
@LauraNelson @alex Given limited resources I’ll approach it through fine-tuning, but part of the goal, I think, is to figure out how much these models are (or aren’t) constrained by the lack of perspectival information in their original training
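Purely as an illustration of what that kind of fine-tuning could look like in practice (not Ted's actual setup), here's a minimal sketch assuming HuggingFace transformers and datasets: each training passage gets an explicit provenance/perspective tag prepended, so the model is conditioned on where the text comes from. The records, field names, and tag format are all hypothetical.

```python
# Minimal sketch, assuming HuggingFace transformers + datasets: fine-tune a small
# causal LM on passages carrying an explicit provenance/perspective tag.
# The records, field names, and tag format below are hypothetical.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

records = [
    {"perspective": "beer-forum-2008", "text": "Pours a hazy amber with a thick head..."},
    {"perspective": "zine-archive-1994", "text": "We photocopied the first issue overnight..."},
]
dataset = Dataset.from_list(records)

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("distilgpt2")

def tag_and_tokenize(example):
    # Make the "situatedness" explicit in the input itself.
    tagged = f"<perspective={example['perspective']}> {example['text']}"
    return tokenizer(tagged, truncation=True, max_length=512)

tokenized = dataset.map(tag_and_tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="situated-lm", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The interesting comparison, per the toot above, would be how generations change when you swap the tag, and how much the base model's untagged pre-training still dominates.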
@TedUnderwood @alex My gut hunch is that the text models do get that too, but we haven't found a way to make that explicit yet. But this iteration of text generating models is so new. I think we'll (you'll, hopefully?) figure it out.
@LauraNelson @alex I've seen this (taking a more situated perspective) play out in a number of artistic contexts, e.g. Stephanie Dinkins training a model on her own oral history recordings, queer.ai training on queer lit resources, and Lillian-Yvonne Bertram prompting GPT-3 zero-shot (or maybe fine-tuning it, I forget) on Gwendolyn Brooks. There are also a few projects popping up where people train or fine-tune a model on their own diaries.
@katyilonka @alex I love this! I'll have to check all of these out.

@LauraNelson okay so getting back to your point: I agree with the goal of a true epistemic pluralism in how we handle large-scale information management, let's call it a kind of cyborg epistemology, but I also kinda agree with folks like @alex and @emilymbender that LLMs, by their structure, aren't going to be good tools for the job

I don't even think LLMs are useless here, but I do think that the model of "ask a question, get an answer" is probably a dead-end in terms of what the tools we need actually should look like

@left_adjoint @alex @emilymbender Now I love this discussion, and I love the idea of cyborg epistemology. But I also think there's great potential with LLMs and their structure, even in the question/answer format (and other formats!). We just have to rip it out of the hands of the absolutist, view-from-nowhere folk.

@LauraNelson @alex @emilymbender and to elaborate slightly further I guess what I kind of mean is that I think whatever tools are needed to build the knowledge base for intentionally situated LLMs are probably going to be more useful than an LLM trained on that corpus itself, y'know?

I think the final interface to this corpus can be far more interactive and illustrative and show us connections better than a chatgpt interface could, if you feel me?

@left_adjoint @alex @emilymbender Yeah I think your latter point is exactly what I'm thinking. We can make a far better and more informative interface than ChatGPT, which is itself gimmicky and a proof of concept. ChatGPT is a start though, and one I think, with many changes along the lines of what you're suggesting, could actually be informative in the way I'm (we're?) thinking.

@LauraNelson it's cool to know that we're closer to the same vision here than not

I really don't know practically where to start, though, beyond vague pictures in my head of queries returning big knowledge graphs you can interact with somehow

something kind of like "proof objects" in automated theorem proving, where the proof can be calculated algorithmically but you can inspect the steps that lead to the conclusion and how the final artifact is built

@left_adjoint Transparency and clarity are absolutely needed. I'm also quite vague here (these versions of LLMs are so new!), but I'm thinking something that shows how the knowledge graph or Q&A or whatever changes via different perspectives? Like Q&A systems could provide multiple answers, one via X perspective and one via Y perspective. With X and Y clearly defined and inspectable. That way the answer via perspective X is not seen as absolute but as one of many.
@LauraNelson @left_adjoint This tangent about different perspectives of knowledge bases reminds me of these experiments from Eunsol Choi's group at UT Austin, led by Hung-Ting Chen: https://arxiv.org/abs/2210.13701
Rich Knowledge Sources Bring Complex Knowledge Conflicts: Recalibrating Models to Reflect Conflicting Evidence

Question answering models can use rich knowledge sources -- up to one hundred retrieved passages and parametric knowledge in the large-scale language model (LM). Prior work assumes information in such knowledge sources is consistent with each other, paying little attention to how models blend information stored in their LM parameters with that from retrieved evidence documents. In this paper, we simulate knowledge conflicts (i.e., where parametric knowledge suggests one answer and different passages suggest different answers) and examine model behaviors. We find retrieval performance heavily impacts which sources models rely on, and current models mostly rely on non-parametric knowledge in their best-performing settings. We discover a troubling trend that contradictions among knowledge sources affect model confidence only marginally. To address this issue, we present a new calibration study, where models are discouraged from presenting any single answer when presented with multiple conflicting answer candidates in retrieved evidences.

@maria_antoniak @left_adjoint This is an interesting solution: "To address this issue, we present a new calibration study, where models are discouraged from presenting any single answer when presented with multiple conflicting answer candidates in retrieved evidences."

Providing multiple answers, and explaining why. I hope more people are exploring that option.
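
As a toy illustration of that "don't commit to one answer" idea (not the paper's method, just a sketch with hypothetical inputs): group retrieved passages by the answer they support and surface every answer with non-trivial support, along with its evidence, instead of a single top pick.

```python
# Toy sketch: surface multiple conflicting answers with their supporting passages,
# rather than committing to one. The (answer, passage_id, score) triples are assumed
# to come from running some reader model over each retrieved passage.
from collections import defaultdict

def aggregate_answers(scored_candidates, min_share=0.2):
    support = defaultdict(float)
    evidence = defaultdict(list)
    for answer, passage_id, score in scored_candidates:
        support[answer] += score
        evidence[answer].append(passage_id)
    total = sum(support.values()) or 1.0
    # Keep every answer whose share of total support crosses the threshold,
    # so conflicting evidence yields several answers, each with its sources.
    return [
        {"answer": a, "share": s / total, "passages": evidence[a]}
        for a, s in sorted(support.items(), key=lambda kv: -kv[1])
        if s / total >= min_share
    ]

# e.g. aggregate_answers([("1912", "p1", 0.9), ("1915", "p2", 0.8), ("1912", "p3", 0.4)])
# returns both "1912" and "1915", with the passages behind each.
```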

@LauraNelson I agree with your view, but I disagree that any concepts at all, of any sort, emerge from auto-associative predictive statistics on tokens.
@LauraNelson I feel like someone should inform the person in the screenshot of the concept of an average, or even the concept of sampling from a distribution. Even the maximalist "GPT is already an AGI" crowd don't think it's a view-from-nowhere, all-knowing eye.
@notkavi I think the OP is very aware of all that. They know their stuff. And they're right: some of the ways in which LLMs are discussed do absolutely wrongly assume the view-from-nowhere epistemology.
@LauraNelson hmm ok I think I was picturing the wrong kind of thing that they were criticizing. I can see how this is a reasonable criticism of the whole "LLMs as a replacement for Google" idea
@notkavi There's a real danger that people in positions of power will misinterpret/overinterpret LLMs, and this will cause real harm to a lot of real people. Caution is merited.
@LauraNelson hey, so when i was first getting into nlp, language models, as used around my research group, modelled snapshots of language use in a particular setting. the interpretation read into them was that they "captured linguistic conventions in a community" where these conventions were contingent on the community and subject to change across time; an unfulfilled dream was to figure out ways of measuring how exactly individual language-users could cause these conventions to evolve.

@LauraNelson e.g., here's my phd advisor's paper on linguistic change in online communities -- really, beer-review forums. https://www.cs.cornell.edu/~cristian/Linguistic_change_files/linguistic_change_lifecycle.pdf

here, language models are bespoke and bigram-level, used to model month-by-month patterns of language used by this group of people who reviewed craft beers on the internet. baked into the research question is the intuition that these patterns change over time.
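
For anyone who hasn't worked with these, a minimal sketch of that bespoke, bigram-level setup (not the paper's actual code; tokenization and smoothing are simplified): fit a bigram model on one month of a community's posts, then measure how surprising another slice of text looks under it.

```python
# Minimal sketch of a "bespoke, bigram-level" language model: train on one month's
# posts, then score other text by per-token cross-entropy (higher = more surprising
# under that month's conventions). Not the paper's code; smoothing is simplistic.
import math
from collections import Counter

def bigram_counts(tokenized_posts):
    unigrams, bigrams = Counter(), Counter()
    for tokens in tokenized_posts:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded[:-1])              # context counts
        bigrams.update(zip(padded, padded[1:]))   # (context, next-token) counts
    return unigrams, bigrams

def cross_entropy(tokenized_posts, unigrams, bigrams, vocab_size, alpha=1.0):
    """Per-token cross-entropy under an add-alpha smoothed bigram model."""
    neg_log_prob, n_tokens = 0.0, 0
    for tokens in tokenized_posts:
        padded = ["<s>"] + tokens + ["</s>"]
        for prev, cur in zip(padded, padded[1:]):
            p = (bigrams[(prev, cur)] + alpha) / (unigrams[prev] + alpha * vocab_size)
            neg_log_prob -= math.log2(p)
            n_tokens += 1
    return neg_log_prob / n_tokens

# Usage (hypothetical data): jan_posts and feb_user_posts are lists of token lists.
# uni, bi = bigram_counts(jan_posts)
# print(cross_entropy(feb_user_posts, uni, bi, vocab_size=len(uni)))
```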

@LauraNelson the cool finding is that as you write more beer reviews, you might start to settle into your habitual ways of writing these reviews, even as the community moves on without you.

(side note, i feel maybe a variant or reversal of this about my current mastodon instance? anyway, not important.)

@LauraNelson so, a few thoughts here. first, at least in the data-science-y space, measurements of "adherence to a community norm" have a tendency to get baked into ways of sorting and ranking people. i don't love the question of whether an individual "uses language in a typical way" _even if_ you add lots of qualifications to the word "typical". "you use language weirdly, therefore you must be thinking of leaving this community, let's somehow fix that."

@LauraNelson but maybe there's something here? i.e., in the vein of, "here we map the gradual ossification of bureaucratic-speak in this organization, showing the development of linguistic devices that help it shirk responsibility for the various crises it's implicated in."

@LauraNelson second, the "LMs have knowledge" trend is totally bizarre to me, given the "community norm" interpretation that i was socialized into. maybe that's why people make distinctions between NLP and comp-soc-sci/text-as-data.

(i mentioned both interpretations to an anthro professor recently, who seemed a bit weirded out by the idea of writing down a unified interpretation of "what it means to find patterns in text", so...that too.)

@LauraNelson third, LMs -- even small-ish ones -- rely on having sufficient data. there are only so many ways you can slice a corpus by time and space until you fall through the statistical ice. i think that leads to lots of the "future qualitative work could..." paragraphs in papers.
@LauraNelson i guess that's where the idea of fine-tuning comes in, but i'm not convinced -- if you think that approach leads to better understanding of a smaller-data-setting (because the soup of LLM training data gets you at least partway there), aren't you sort of inheriting the view-from-nowhere-esque assumption that you're trying to problematize?
@tisjune Ahh these thoughts are all fascinating! First, "LMs have knowledge" should maybe instead be "LMs contain knowledge" or "LMs encode information that could be transformed into knowledge" (that last one is very clunky tho). And also, really large LMs might be able to be used to distinguish between perspectives contained within them (provided there's enough data for a particular perspective). I'm sure I don't know the tech well enough, but maybe that could be a direction to take them?