zooming out a little bit, it does feel alarming to me that a lot of people whose stated politics are progressive or socialist or both are willing to give huge tech companies an easy ride for fully seizing the means of production for everyone, no matter where you personally work

@jcoglan what's the alternative?

It turns out LLMs are pretty easy to build now that we know how to do it. 5TB of data (not difficult to obtain) and a few million dollars in compute electricity turns out to do the job.

@simon I'm hearing "what's the alternative" a lot recently and if I took that attitude to very many things I would have to stop believing in anything
@jcoglan my chosen alternative is to try and teach people how to use these things productively and responsibly in a way that adds more value than it takes away

@simon @jcoglan I’m already getting stopped at “5TB data is easy to obtain” (without consent). There is no “responsibly” for me after that.

But even if it were, there are so many more things wrong with all this that I have a hard time understanding how anyone uses them at all outside of their manager telling them to because investments were made.

But that’s me, I’ve also never ridden an Uber. I must be holding things wrong.

@janl @simon @jcoglan serious question: what consent is required to scan every digitized work of art that is in public domain or to read the data from CommonCrawl?
@raphael @simon @jcoglan you are carving out an exception that is not relevant to my argument. It is extremely well documented that most popular LLMs have been trained on otherwise copyrighted materials and reproduce those in ways that are likely not covered by fair use (but I don’t have much hope for a legal argument, so moral it remains).

@janl @simon @jcoglan

But then your argument is not against LLMs in general, just this bad crop given by Big Tech.

@raphael @simon @jcoglan I struggle hard to separate the tool from the maker here. I think doing so is disingenuous even in the best light.

@janl @simon

I don't think the problem is in the tool itself. The troubling part to me is what @jcoglan mentioned.

If all the "anti-AI" crowd focused their criticism and opposition on the corporations that are trying to monopolize and seek rent out of the whole world's information, it would be easier (I think) to gather more people on their side.

@raphael @simon @jcoglan and I think excepting the raw technology from this equation is not helping anyone but technologists who’d prefer not to be considering moral arguments 🤷‍♀️

@janl @simon @jcoglan

Call me an "Information wants to be free" maximalist if you want, but I have way fewer moral qualms about someone crawling all the internet and releasing a universally available LLM (open weights, no usage restrictions) than I have about anything coming out of, e.g., Apple.

@raphael I think we are done here.
@janl Sorry, did I touch a nerve here?