The crybabies who freak out about *The Communist Manifesto* appearing on university curricula clearly never read it - chapter one is basically a long hymn to capitalism's flexibility and inventiveness, its ability to change form, adapt itself to everything the world throws at it, and come out on top:

https://www.marxists.org/archive/marx/works/1848/communist-manifesto/ch01.htm#007

1/

If you'd like an essay-formatted version of this thread to read or share, here's a link to it on pluralistic.net, my surveillance-free, ad-free, tracker-free blog:

https://pluralistic.net/2023/08/18/openwashing/#you-keep-using-that-word-i-do-not-think-it-means-what-you-think-it-means

2/

Today, leftists signal this protean capacity of capital with the -washing suffix: #greenwashing, #genderwashing, #queerwashing, #wokewashing - all the ways capital cloaks itself in liberatory, progressive values, while still serving as a force for extraction, exploitation, and political corruption.

3/

A smart capitalist is someone who, sensing outrage at a world run by 150 old white guys in boardrooms, proposes replacing half of them with women, queers, and people of color. This is a superficial maneuver, sure, but it's an incredibly effective one.

In "Open (For Business): Big Tech, Concentrated Power, and the Political Economy of Open AI," a new working paper, @Mer__edith, @davidthewid and #SarahBMyers document a new kind of -washing: #openwashing:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4543807

4/

Openwashing is the trick that large "AI" companies use to evade regulation and neutralize critics, by casting themselves as forces of ethical capitalism, committed to the virtue of #openness. No one should be surprised to learn that the products of the "open" wing of an industry whose products are neither "artificial" nor "intelligent" are also not "open." Every word AI hucksters say is a lie, including "and" and "the."

5/

So what work does the "open" in "open AI" do? "Open" here is supposed to invoke the "open" in "#OpenSource," a movement that emphasizes a software development methodology that promotes code #transparency, #reusability and #extensibility, which are three important virtues.

But "open source" itself is an offshoot of a more foundational movement, the #FreeSoftware movement, whose goal is to promote *freedom*, and whose *method* is openness.

6/

The point of #SoftwareFreedom was #TechnologicalSelfDetermination, the right of technology users to decide not just what their technology *does*, but who it does it *to* and who it does it *for*:

https://locusmag.com/2022/01/cory-doctorow-science-fiction-is-a-luddite-literature/

The open source split from free software was ostensibly driven by the need to reassure investors and businesspeople so they would join the movement.

7/

The "free" in free software is (deliberately) ambiguous, a bit of wordplay that sometimes misleads people into thinking it means "#FreeAsInBeer" when really it means "#FreeAsInSpeech" (in Romance languages, these distinctions are captured by translating "free" as "libre" rather than "gratis").

8/

The idea behind open source was to rebrand free software in a less ambiguous - and more instrumental - package that stressed cost-savings and software quality, as well as "ecosystem benefits" from a co-operative form of development that recruited tinkerers, independents, and rivals to contribute to a robust infrastructural commons.

9/

But "open" doesn't merely resolve the linguistic ambiguity of libre vs gratis - it does so by removing the "liberty" from "libre," the "freedom" from "free." "Open" changes the pole-star that movement participants follow as they set their course. Rather than asking "Which course of action makes us more free?" they ask, "Which course of action makes our software better?"

10/

Thus, by dribs and drabs, freedom leeches out of openness. Today's tech giants have mobilized "open" to create a two-tier system: the largest tech firms enjoy broad freedom themselves - they alone get to decide how their software stack is configured. But for all of us who rely on that (increasingly unavoidable) software stack, all we have is "open": the ability to peer inside that software and see how it works, and perhaps suggest improvements to it:

https://www.youtube.com/watch?v=vBknF2yUZZ8

11/

In the Big Tech net, it's freedom for them, openness for us. "Openness" - transparency, reusability and extensibility - is valuable, but don't mistake it for technological self-determination. As the tech sector becomes ever-more concentrated, the limits of openness become more apparent.

But even by those standards, the openness of "open AI" is thin gruel indeed (that goes triple for the company that calls itself "#OpenAI," which is a *particularly* egregious openwasher).

12/

The paper's authors start by suggesting that the "open" in "open AI" is meant to imply that an "open AI" can be scratch-built by competitors (or even hobbyists) - but this isn't true. Not only is the material that "open AI" companies publish insufficient for reproducing their products, but even if those gaps were plugged, the resource burden of doing so is so intense that only the largest companies could bear it.
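
To make the scale concrete, here's a back-of-the-envelope sketch in Python. Every number in it is an illustrative assumption - a GPT-3-class model, the common ~6 FLOPs-per-parameter-per-token rule of thumb, rented A100-class GPUs - not a claim about any particular company's bill:

```python
# All figures below are assumptions for illustration, not measurements.
params = 175e9   # assumed parameter count (GPT-3 scale)
tokens = 300e9   # assumed training-set size in tokens

# Rule of thumb: ~6 FLOPs per parameter per token for one training run.
total_flops = 6 * params * tokens            # ~3.2e23 FLOPs

peak_flops = 312e12                          # A100-class peak, FLOP/s
utilization = 0.40                           # optimistic sustained fraction
gpu_seconds = total_flops / (peak_flops * utilization)
gpu_hours = gpu_seconds / 3600               # ~700,000 GPU-hours

price_per_hour = 2.00                        # assumed rental $/GPU-hour
print(f"{gpu_hours:,.0f} GPU-hours, ~${gpu_hours * price_per_hour:,.0f}")
# => on the order of $1-2M in rented compute for *one* run - before
#    data grooming, failed runs, evaluation, or anyone's salary.
```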

13/

Beyond this, the "open" parts of "open AI" are insufficient for achieving the other claimed benefits of "open AI": they don't promote auditing, or safety, or competition. Indeed, they often cut *against* these goals.

"Open AI" is a wordgame that exploits the malleability of "open," but also the ambiguity of the term "AI": "a grab bag of approaches, not... a technical term of art, but more ... marketing and a signifier of aspirations."

14/

Hitching this vague term to "open" creates all kinds of bait-and-switch opportunities.

That's how you get #Meta claiming that #LLaMa2 is "open source," despite being licensed in a way that is absolutely incompatible with any widely accepted definition of the term:

https://blog.opensource.org/metas-llama-2-license-is-not-open-source/

15/

LLaMa-2 is a particularly egregious openwashing example, but there are plenty of other ways that "open" is misleadingly applied to AI: sometimes it means you can see the source code, sometimes that you can see the training data, and sometimes that you can tune a model, all to different degrees, alone and in combination.

But even the most "open" systems can't be independently replicated, due to raw computing requirements.

16/

This isn't the fault of the AI industry - the computational intensity is a fact, not a choice - but when the AI industry claims that "open" will "democratize" AI, they are hiding the ball. People who hear these "#democratization" claims (especially policymakers) are thinking about entrepreneurial kids in garages, but unless these kids have access to multi-billion-dollar data centers, they can't be "disruptors" who topple tech giants with cool new ideas.

17/

At best, they can hope to pay rent to those giants for access to their compute grids, in order to create products and services at the margin that *rely* on existing products, rather than displacing them.

The "open" story, with its claims of democratization, is an especially important one in the context of regulation.

18/

In Europe, where a variety of AI regulations have been proposed, the AI industry has co-opted the open source movement's hard-won narrative battles about the harms of ill-considered regulation.

For open source (and free software) advocates, many tech regulations aimed at taming large, abusive companies - like requirements to surveil and control users to extinguish toxic behavior - wreak collateral damage on the free, open, user-centric alternatives that are superior to Big Tech.

19/

This leads to the paradoxical effect of regulations intended to "punish" Big Tech that end up simply shaving an infinitesimal percentage off the giants' profits, while destroying the small co-ops, nonprofits and startups before they can grow into viable alternatives.

The years-long fight to get regulators to understand this risk has been waged by principled actors working for subsistence nonprofit wages or for free.

20/

Now the AI industry is capitalizing on lawmakers' hard-won consideration for collateral damage by claiming to be "open AI" and thus vulnerable to overbroad regulation.

But the "open" projects that lawmakers have been coached to value are precious because they deliver a level playing field, competition, innovation and democratization - all things that "open AI" fails to deliver.

21/

The regulations the AI industry is fighting also don't necessarily raise the free-speech concerns that are core to protecting free software:

https://www.eff.org/deeplinks/2015/04/remembering-case-established-code-speech

Just think about LLaMa-2. You can download it for free, along with the model weights it relies on - but not detailed specs for the data that was used in its training.

22/

And the source code is licensed under a homebrewed license cooked up by Meta's lawyers, one that only glancingly resembles anything from the #OpenSourceDefinition:

https://opensource.org/osd/

Core to Big Tech companies' "open AI" offerings are tools, like Meta's #PyTorch and Google's #TensorFlow. These tools are indeed "open source," licensed under real OSS terms.

23/

But they are designed and maintained by the companies that sponsor them, and are optimized for the proprietary back-ends each company offers in its own cloud. When programmers train themselves to develop in these environments, they are gaining expertise in adding value to a monopolist's ecosystem, locking themselves in with their own expertise. This is a classic example of software freedom for tech giants and open source for the rest of us.
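
A minimal sketch of how that coupling shows up in code. The PyTorch calls here are real; the torch_xla lines (Google's extension for its Cloud TPUs) are left as comments because they only run on that stack - which is exactly the point:

```python
import torch
import torch.nn as nn

# The "open" framework looks portable at first glance:
model = nn.Linear(512, 10)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

# ...but targeting Google's Cloud TPUs means adopting torch_xla and
# reshaping the training loop around it:
#
#   import torch_xla.core.xla_model as xm
#   device = xm.xla_device()
#   model = model.to(device)
#   ...
#   xm.optimizer_step(optimizer)   # replaces optimizer.step()
#
# Your code - and your expertise - now assumes one vendor's cloud.
```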

24/

One way to understand how "open" can produce a lock-in that "free" might prevent is to think of #Android: Android is an open platform in the sense that its source code is freely licensed, but the existence of Android doesn't make it any easier to challenge the mobile OS duopoly with a new mobile OS, nor does it make it easier to switch from Android to #iOS and vice versa.

25/

Another example: #MongoDB, a free/open database tool that was adopted by #Amazon, which subsequently forked the codebase and tuned it to work on its proprietary cloud infrastructure.

The value of open tooling as a sticky trap - creating a pool of developers who end up as sharecroppers glued to a specific company's closed infrastructure - is well understood and openly acknowledged by "open AI" companies.

26/

#Zuckerberg boasts about how #PyTorch ropes developers into Meta's stack, "when there are opportunities to make integrations with products, [so] it’s much easier to make sure that developers and other folks are compatible with the things that we need in the way that our systems work."

Tooling is a relatively obscure issue, primarily debated by developers. A much broader debate has raged over training data - how it is acquired, labeled, sorted and used.

27/

Many of the biggest "open AI" companies are totally opaque when it comes to training data. Google and OpenAI won't even say how many pieces of data went into their models' training - let alone which data they used.

Other "open AI" companies use publicly available datasets like #ThePile and #CommonCrawl. But you can't replicate their models by shoveling these datasets into an algorithm. Each one has to be groomed - labeled, sorted, de-duplicated, and otherwise filtered.

28/

Many "open" models merge these datasets with other, proprietary sets, in varying (and secret) proportions.

Quality filtering and labeling for training data is incredibly expensive and labor-intensive, and involves some of the most exploitative and traumatizing #clickwork in the world, as poorly paid workers in the Global South make pennies for reviewing data that includes graphic violence, rape, and gore.

29/

Not only is the product of this #DataPipeline kept a secret by "open" companies, the very nature of the pipeline is likewise cloaked in mystery, in order to obscure the exploitative labor relations it embodies (the joke that "AI" stands for "absent Indians" comes out of the South Asian clickwork industry).

30/

The most common "open" in "open AI" is a model that arrives built and trained, which is "open" in the sense that end-users can "fine-tune" it - usually while running it on the manufacturer's own proprietary cloud hardware, under that company's supervision and surveillance. These tunable models are undocumented blobs, not the rigorously peer-reviewed transparent tools celebrated by the open source movement.
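
Structurally, that "tunable" openness looks something like this sketch. The `vendorai` client and every call on it are hypothetical - invented here for illustration, not any real company's API - but the shape of the transaction is the point:

```python
import vendorai  # HYPOTHETICAL client library - not a real package

client = vendorai.Client(api_key="...")

# You upload *your* data to *their* servers...
dataset = client.upload("my_examples.jsonl")

# ...queue a fine-tune of *their* undocumented base model, on *their*
# hardware, under *their* terms of service...
job = client.fine_tune(base_model="vendor-base-v2", data=dataset)

# ...and get back a handle, not weights: you can call the tuned model,
# but you can't inspect it, move it, or audit what it learned.
model_id = job.wait()
print(client.generate(model=model_id, prompt="Hello"))
```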

31/

If "open" was a way to transform "free software" from an ethical proposition to an efficient methodology for developing high-quality software; then "open AI" is a way to transform "open source" into a rent-extracting black box.

Some "open AI" has slipped out of the corporate silo. Meta's #LLaMa was leaked by early testers, republished on #4chan, and is now in the wild.

32/

Some exciting stuff has emerged from this, but despite this work happening outside of Meta's control, it is not without benefits to Meta. As an infamous leaked Google memo explains:

> Paradoxically, the one clear winner in all of this is Meta. Because the leaked model was theirs, they have effectively garnered an entire planet's worth of free labor.

33/

> Since most open source innovation is happening on top of their architecture, there is nothing stopping them from directly incorporating it into their products.

https://www.searchenginejournal.com/leaked-google-memo-admits-defeat-by-open-source-ai/486290/

34/

Thus, "open AI" is best understood as "as free product development" for large, well-capitalized AI firms, done by tinkerers who will not be able to escape these giants' proprietary compute silos and opaque training corpuses, and whose work product is guaranteed to be compatible with the giants' own systems.

The instrumental story about the virtues of "open" often invokes #auditability: the fact that anyone can look at the source code makes it easier for bugs to be identified.

35/

But as open source projects have learned the hard way, the fact that anyone *can* audit your widely used, high-stakes code doesn't mean that anyone *will*.

The #Heartbleed vulnerability in #OpenSSL was a wake-up call for the open source movement - a bug that endangered every secure webserver connection in the world, which had hidden in plain sight for years.

36/

The result was an admirable, successful effort to build *institutions* whose job it is to actually make use of open source transparency to conduct regular, deep, systemic audits.

In other words, "open" is a necessary, but insufficient, precondition for auditing. But when the "open AI" movement touts its "safety" thanks to its "auditability," it fails to describe any steps it is taking to replicate these auditing institutions - how they'll be constituted, funded and directed.

37/

The story starts and ends with "transparency" and then makes the unjustifiable leap to "safety," without any intermediate steps about how the one will turn into the other.

It's a #MagicUnderpantsGnome story, in other words:

Step One: Transparency

Step Two: ??

Step Three: Safety

https://www.youtube.com/watch?v=a5ih_TQWqCA

38/

Meanwhile, OpenAI itself has gone on record as objecting to "burdensome mechanisms like licenses or audits" as an impediment to "innovation" - all the while arguing that these "burdensome mechanisms" should be mandatory for rival offerings that are more advanced than its own. To call this a "transparent ruse" is to do violence to good, hardworking transparent ruses all the world over:

https://openai.com/blog/governance-of-superintelligence

39/

Some "open AI" is much more open than the industry dominating offerings. There's #EleutherAI, a donor-supported nonprofit whose model comes with documentation and code, licensed #Apache2. There are also some smaller academic offerings: #Vicuna (UCSD/CMU/Berkeley); #Koala (Berkeley) and #Alpaca (Stanford).

These are indeed more open (though Alpaca - which ran on a laptop - had to be withdrawn because it "hallucinated" so profusely).

40/

@pluralistic Oh hey, that's actually the reason why my homebrewed attempts at training a from-scratch Stable Diffusion clone on public domain data (sourced from Wikimedia Commons) just gave back a model that draws nothing but maps and can't understand prompts.

@pluralistic So... I think this part of the draft is inaccurate. Amazon never "forked the codebase". See the receptive cordial discourse 🧵 over on The Other Place.

Many other arguments in this paper are compelling to me! I personally think we need "open source" to stand for Freedom!
https://twitter.com/_msw_/status/1692290430547968213

> @mer__edith @davidthewid @sarahbmyers "By forking this open source project, hosting it on their own proprietary infrastructure, and whitelabeling it as its own, Amazon was able to exploit and integrate the work of open source developers into its own service offerings." This is inaccurate. AWS never "forked" MongoDB.

@pluralistic To me, this means we must push back when corporate interests co-opt copyleft and push to change the definition of "open source" (like trying to get the SSPLv1 license through the OSI review process).
https://sfconservancy.org/blog/2020/jan/06/copyleft-equality/

@pluralistic that's not accurate. Amazon didn't fork it. Amazon is the scum of the earth but Mongo did the relicensing all on their own when they realized being open source wasn't profitable enough for their shareholders.

@pluralistic why is it that this paradox always seems to happen in #BigTech #Regulation? Public sentiment against tech companies gets channeled into laws that just entrench their position!

Cynically, it's convenient for governments to have a handful of big tech companies that they can convince to do their bidding.

But surely not all policymakers are this cynical. How do the well-intentioned ones who support such bills not see that these laws will have the opposite of their intended effect?

@matthew

Nothing new: government only handles the violence and outsources everything else to obedient servants who do the nitty-gritty stuff, like collecting taxes and ensuring the "legitimacy" of the ruling class.
Tech is just the latest frontier to be regulated to death, to ensure no one gets any bright ideas that threaten the status quo.

Maybe we had an opportunity, and maybe some people have woken up to the Most Dangerous Superstition (that the State is needed).