Mastodawn

G a b r i e l l e Jun 15

One thing about "AI" is with the technology OpenAI has (large neural network plus manual tagging) you could've made the best search engine ever. There could be a Copilot where you describe what you wanted and it finds an example of it in the corpus of open source software. You could go from a fuzzy image description to a stock image. These would be better than buggy code and fucked up images. But they wouldn't do that because the *service* OpenAI provides is obscuring that the content is stolen.

mcc Dec 4, 2025

If you describe a situation and OpenAI finds for you an existing image, you know what image database the image came from, and you know whether you're allowed to use it, and you know you're committing a crime, and OpenAI wants to relieve you of this final burden.

margot Dec 4, 2025

@mcc makes sense, the former sounds like an actual business model, not a way to accumulate so much wealth you can use it to grasp the levers of power

Olivier Galibert Dec 4, 2025

@mcc not sure. LLMs are very bad at keeping information on their sources, it seems

mcc Dec 4, 2025

@galibert they'd have to have trained it to do something different than they in fact trained it to do

@galibert @mcc LLMs are not inherently bad at that, it's just the way they're trained.

CubeOfCheese Dec 5, 2025

@mcc oh that's a really interesting perspective

OpenAI
We commit crimes so you don't have to

The Duke of Fall

@mcc As the kids say: talk your shit, mcc. 🤘🏽

Nelson Lopez Dec 4, 2025

@mcc actually that just sounds so much better, like... some sort of stack overflow for finding code in github specifically, you just ask it what you want and it'd give you a trillion billion examples from a lot of different repositories

and it wouldn't even go against the spirit of open source unless you steal entire systems without proper acknowledgement

solo Dec 4, 2025

@mcc tbh, a search tool like that would even be 100% allowed as courts have decided that smth like that counts as being transformative enough to fall under fair use

0gust1 Dec 4, 2025

@mcc 100% agree.

I have always thought that:

- LLMs are lossy compressed hyperbooks.
- Companies misleadingly slapped an « oracle » UX over it.
- It has more potential as a navigational artifact than a knowledge artifact.

mcc Dec 4, 2025

@0gust1 The way I usually frame it is that machine learning can work-with-heavy-caveats for identification and categorization, but generation is a completely different problem and it does not work* for that.

* Except for certain problems of pure aesthetics, and the corporate LLMs/image models fail at those aesthetics.

Kelsey Jordahl Dec 4, 2025

@mcc likewise, they could also build into the LLMs the ability to cite sources reliably and trace the origins of facts and text. Some of them try to do that, but in my experience they are terribly unreliable (as in the majority of cited sources either don't exist or are irrelevant to the topic at hand). "Citation needed" is a wikipedia cliché, but crucial for building human knowledge and broken (by intent as you say) for LLMs.

mcc Dec 4, 2025

@kajord i think generation plus citation *together* is probably much harder to do reliably than either separately, but also, i believe their business incentives are way, way against it

sabik Dec 5, 2025

@mcc @kajord
I mean, there's RAG

Wulfy—Speaker to the machines Dec 4, 2025

Not sure when you've used #AI 👉properly👈.

In my experience, the more vocal an opponent of AI is, the further back in time their use (or lack of use) goes.
With the most ardent opponents having never used the models, yet having the most emphatic (and increasingly inaccurate) opinions.

Attached media, a public query from today, with sources dropdown at the bottom.

Approx 30% of web searches come from the engines nowadays.

(Edit: Hahaha, insta blocked by poster, I guess folks don't like to be called out on saying patently provable falsehoods 🤡

The poster made a comment exposing their ignorance of features of existing AI. This one has 33,000 followers; the question is "How many others like them have zero idea about the systems they critique?")
#llm #ai #luddites

Furbland's Very Value = 71 Account™ Dec 4, 2025

@n_dimension @mcc have you perhaps considered that:
1. you are very much mansplaining, mcc knows far more about this than you do
2. maybe fedi is not the right social media for you. go set up an account on farcaster or whatever the grifty techbros are using nowadays
3. even IF AI provided reputable content and sources as described by mcc, that still doesn’t solve all the ethical and environmental issues

that is all.

Wulfy—Speaker to the machines Dec 4, 2025

@GroupNebula563 @mcc

1. She literally said an untruth.
The exact definition of "not knowing more than I do"

2. Thanks for gatekeeping. Keep it up.

3. The post wasn't about ethics, it was about exposing ignorance of how the system evolved.

Thanks for engaging

Magnus Ahltorp Dec 4, 2025

@n_dimension @GroupNebula563 Your post was so incoherent that it was not possible to know what it was about, except that it legitimised wholesale infringement in a locked product.

Wulfy—Speaker to the machines Dec 4, 2025

@ahltorp @GroupNebula563

What are you talking about?

The folk hero mcc made a statement demonstrating she has not seen an LLM for at least 6 months.

Then I demonstrated she made an error.
WTF is incoherent about it?
Maybe it's the folks who DON'T use AI who are losing thinking skills.

Where are you confused?
I'll walk you through

Furbland's Very Value = 71 Account™ Dec 5, 2025

@n_dimension @ahltorp magnus, and anyone else reading this post: I checked their profile and they’re a UFO conspiracy theorist. I don’t think there’s any winning this argument. block, report, and move on

Wulfy—Speaker to the machines Dec 5, 2025

@GroupNebula563 @ahltorp

WTF are you talking about.

What are you going to report me for?
Pointing out that another user outright misrepresented a technology feature?

As to the UFO conspiracy theory.
It's you who is the conspiracy theorist.

Furbland's Very Value = 71 Account™ Dec 5, 2025

@n_dimension @ahltorp all right I think we’re well and truly done here

@n_dimension That's just an LLM googling. It doesn't have the sources, it uses tool calls to use search engines and scrape web pages.

An LLM using a search engine under the hood is not proof that an LLM can replace a search engine.

And it doesn't solve fundamental problems (that can only be solved with a very different kind of training and different tools) such as making shit up and not giving credit to the source material of the training data (except for very well known things and only when you ask explicitly).

Wulfy—Speaker to the machines Dec 4, 2025

The sources are at the bottom of the dialogue.
You click it shows sources.
You need a computer to see it.
A computer is like a slate tablet only it uses electricity.
Your library has one.

@n_dimension Those are not sources from the training data. Those are sources extracted from a literal google search made by the LLM, with keywords chosen by the LLM. That's not what mcc is talking about. That's just tool calling. Do you know what tool calling is?
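
A minimal sketch of the tool-calling flow described above, assuming a toy setup: `choose_keywords`, `search_tool`, and `FAKE_INDEX` are hypothetical stand-ins, not any vendor's API. It only illustrates that the "sources" shown to the user are the URLs returned by the search call made during that request, not references into the training data.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    url: str
    snippet: str


# Hypothetical stand-in for a web index; a real system would call a search engine.
FAKE_INDEX = [
    SearchHit("https://example.com/clip",
              "CLIP maps images and text into a shared embedding space."),
    SearchHit("https://example.com/rag",
              "Retrieval-augmented generation feeds retrieved documents to a model."),
]


def choose_keywords(question: str) -> list[str]:
    """Stand-in for the model picking search keywords (words longer than 3 letters)."""
    words = (w.strip("?.,!").lower() for w in question.split())
    return [w for w in words if len(w) > 3]


def search_tool(keywords: list[str]) -> list[SearchHit]:
    """Stand-in for the search tool call: substring match over the fake index."""
    return [h for h in FAKE_INDEX if any(k in h.snippet.lower() for k in keywords)]


def answer_with_sources(question: str) -> dict:
    hits = search_tool(choose_keywords(question))
    # The "sources" are exactly the URLs the tool returned for this request.
    # Nothing here traces back to whatever the model was trained on.
    return {
        "answer": " ".join(h.snippet for h in hits),
        "sources": [h.url for h in hits],
    }


result = answer_with_sources("What is retrieval-augmented generation?")
print(result["sources"])
```

If the search step returns nothing, the sources list is empty, whatever the model may have absorbed in training.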

rakoo Dec 5, 2025

@n_dimension @mcc

but those citations are generated, no?

Jennifer Moore 😷 Dec 5, 2025

This is neither an image search nor an example of open source software.

@n_dimension you're a huge asshole and you will continue to be hated and blocked by many for this type of behavior

Joel VanderWerf Dec 4, 2025

@mcc Creativity laundering.

CodeByJeff Dec 4, 2025

@mcc I doubt that

I think we are all forgetting the years and years of general enshittification of the web - all the crap, out of date examples; pages written more for clicks than helpfulness; transition to video for everything; etc, etc

I feel this was the correct technology path to follow, but everything they did about how they went about it is an immoral mess

- with a focus on enshittifying it straight out of the box

David =?🏴‍☠️ Dec 4, 2025

@mcc haven’t you described perplexity.ai ?

Furbland's Very Value = 71 Account™ Dec 4, 2025

@david01928 @mcc nope, perplexity is just another LLM masquerading as a “search engine” that barely does anything useful

@david01928 That's just an LLM using a regular search engine and some other tricks, and pretty much all LLMs nowadays can do that through tool calling... but that's a very poor (and extremely limited) imitation of what mcc is actually talking about.

vader Dec 4, 2025

@mcc I still don't understand how the content is stolen.

Furbland's Very Value = 71 Account™ Dec 4, 2025

@vader @mcc basically, the LLM is using content without consent. not obeying licenses, not attributing (or misattributing) the garbage it spits out, and actively avoiding attempts to curtail this behavior. I suggest you read this amazing article: https://aworkinglibrary.com/writing/toolmen

Toolmen (A Working Library): "Even the best weapon is an unhappy tool."

ceets Dec 4, 2025

Yeah it's completely ignoring use licenses that humans would have to comply with.

vader Dec 16, 2025

@ct @mcc @GroupNebula563 Like what?

Furbland's Very Value = 71 Account™ Dec 17, 2025

@vader @ct @mcc go ask ChatGPT or another one of those bullshit generators you’re so fond of, they’re going to be more willing to waste energy on you than we are

vader Dec 17, 2025

@GroupNebula563 @ct @mcc Deflections are a sign of the unintelligent who can't have a knowledgeable discourse. I'm sorry you're unequipped to have this conversation, but if you can't, maybe let ceets here explain their point of view. Potentially they actually have data and information that would be good for discussion.

vader Dec 16, 2025

@GroupNebula563 @mcc That's not how LLMs work. They learn by ingesting materials, creating tokens, and learning patterns. Then they create their own "garbage" based on all that they have learned. I work in the industry. I don't need to read that article. By your logic, every author ever would need to attribute every single book they've read that could have ever influenced them.

Furbland's Very Value = 71 Account™ Dec 17, 2025

@vader @mcc oop, we got a mansplainer. the problem here is that humans are… well… humans. they are capable of transformative thought and can come up with original ideas. LLMs, as you said previously, cannot. all they do is stitch things together based on what’s in their database. mcc ALSO works in the industry (the industry of *actual computing*, not bubbles that will burst in a matter of years), and (no offense) probably knows far more about it than you do. (1/2)

Furbland's Very Value = 71 Account™ Dec 17, 2025

@vader @mcc (2/2) if you write a song with a violin in it, you do not have to *credit* the creator of the instrument (this is what humans do). if you stitch a bunch of parts of existing songs together without the consent or knowledge of the original writers or record labels and call it your own song, then you absolutely have to give credit and in some cases even that isn’t enough. anyways, maybe fedi isn’t the best fit for you. maybe go back to X (formerly Twitter) or whatever

George B Dec 4, 2025

The links at the end of each line in the AI summary at the top of Google search results are often better than the search results themselves (and always better than the AI summary since they are an authoritative source)

mcc Dec 4, 2025

@gbargoud maybe they should have just incorporated that engine into the search results. as it is i'll never see it because i switched away from google completely solely in order to avoid the AI summary box

George B Dec 4, 2025

Yeah the UX for that is horrible and easy to miss but it shows just how great they could be if they were used as an index instead of a weird regurgitator like you suggested.

Su_G Dec 5, 2025

@mcc
Resonates: “… the *service* OpenAI provides is obscuring that the content is stolen.“ 😐

Talen Lee Dec 5, 2025

@mcc the *one thing* I keep coming back to with these tools, watching students use them, is that the inference ability of these tools to do things like translate or refine interpretation of the student's intention is great

and then instead of being a search engine it's kinda useless

miki Dec 5, 2025

@mcc You absolutely *can do that*.

One of my AI use cases is "tip of the tongue" searches, things like "find me a movie where the dog dies, shortly followed by...." or "find me that comment from Hackernews on a story about Google buying some fitness startup that linked to books about x"

Modern LLMs will search and link to sources.

Ati Dec 5, 2025

@mcc Kagi does it. In a non open source way sadly, but as a proof of concept it is here, working. Now we need an open source alternative.

Marika Dec 6, 2025

@mcc the sad thing is, this is exactly what language models were invented for in the first place, in the field of information retrieval. Ironically, one of the best models to map image descriptions to existing images is OpenAI's CLIP model. The technology is there, and it's crazy good, but instead of making human knowledge more accessible than ever, we poison the Internet with nonsense, making actual information even harder to find.
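
A minimal sketch of description-to-item retrieval that keeps provenance attached to each result. This is not CLIP: plain bag-of-words cosine similarity stands in for learned embeddings, and every corpus entry, URL, and license below is made up for illustration.

```python
import math
from collections import Counter

# Hypothetical corpus: descriptions, URLs, and licenses are invented.
CORPUS = [
    {"description": "red fox jumping over snow in winter",
     "source": "https://example.com/img/1", "license": "CC-BY-4.0"},
    {"description": "city skyline at night with light trails",
     "source": "https://example.com/img/2", "license": "CC0-1.0"},
    {"description": "brown dog running on a beach",
     "source": "https://example.com/img/3", "license": "All rights reserved"},
]


def vectorize(text):
    """Bag-of-words term counts (a crude stand-in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def find(query, corpus=CORPUS):
    """Return the best-matching existing item, with its source and license."""
    q = vectorize(query)
    return max(corpus, key=lambda item: cosine(q, vectorize(item["description"])))


best = find("a fox in the snow")
print(best["source"], best["license"])
```

Because the result is an existing item rather than a generated one, the caller can see where it came from and under which license before using it.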

Cassandrich Dec 6, 2025

@mcc "the *service* OpenAI provides is obscuring that the content is stolen."

👆👆👆

This. So much this. The folks bamboozled by the output of the automated plagiarism machines just don't grasp how utterly vast the corpus of stuff out there is, and how getting what they hoped out of the machine was just the result of something recognizably similar already existing.

THE service is sufficiently obscuring that similarity to create plausible deniability of plagiarism.

@mcc This reminds me of my cousin who loves AI, but only uses it for finding the poorly named programs in his obscure work server OS, and he's right, that is what LLMs are good for.

@mcc Yup. It's an accountability firewall. They provide two advantages to customers (as distinct from users) -

"We couldn't do this without a lot of stolen data and obviously we weren't going to take on the liability but we can just pay OpenAI to do it for us!"

AND -

"We can't be blamed for the decisions made at our request on our behalf by the LLM!"