Question: Do you think search engines such as @duckduckgo, @google, @Bing, @brave and others should filter out spam generated by LLMs that pollutes search results? Please boost for reach. TIA.
yes
57%
no
1.7%
give filter option to user
36.2%
why would they care?
5.1%
Poll ended.
Here is an example of such LLM-generated spam. The laziness of Alibaba Cloud: they want to avoid paying any content writer, but happily scrape content from all over the internet to sell their shitty cloud service. Most cloud service providers pay someone to write content. Each of their pages has this warning at the end.

@nixCraft
This is ...
Won't use it.
Not because the disclaimer is AI-made.
Not because a regular person has to ask a professional lawyer to validate it.
Not because of the "it's all up to you" attitude.

But for just not sticking to the basic thing: respect towards people.

What do they expect?
That we give in rather than:
- check the rules
- ask a professional
- feel sorry for the creators, who got drawn into these terms of use without being asked
- still not be sure whether it is legal
- pay for the professional

@nixCraft So, a disclaimer from a company that explicitly denies it is the company's opinion (see the middle of the text). That's worse than laziness; it's willful blindness.

@nixCraft Excuse me WTF?!

Are they creating product descriptions with AI and just saying "well, whatever we wrote here, we didn't write it, so you should double-check everything in it"... or something?

@nixCraft @duckduckgo @google @Bing @brave a good start would be to remove the shitty AI from the search suggestions. Big G has gone down the shitter lately, especially when using it in Finnish. It has been suggesting completely nonsensical word suffixes and nonexistent words for the last few months. 💩

@nixCraft @duckduckgo @google @Bing @brave

Answer 3 is a subset of answer 1. Do you think using a poll interface instead of checkboxes will bias the results?

yes
12.5%
no
37.5%
maybe
25%
I don't know
25%
Poll ended.
@nixCraft @duckduckgo @google @Bing @brave there's a Computerphile video about a way to detect whether something is LLM-generated; it's quite interesting. I think an option to filter results like that is critical for maintaining some relative quality of information online. And it should be opt-out rather than opt-in: the average layperson won't know or care to enable it either way, but without it they would be victims of misinformation without knowing.
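For a flavour of what such a detector might measure: one commonly cited statistical signal is "burstiness", the variation in sentence length (human writing tends to vary more; LLM output is often more uniform). The toy sketch below is my own illustration of that idea, not the method from the video, and is far too crude for a real filter, which would typically use a language model's perplexity scores:

```python
import statistics

def burstiness(text: str) -> float:
    """Toy heuristic: ratio of sentence-length standard deviation to mean.
    Higher values suggest more varied, "bursty", human-like writing.
    This is an illustration only, not a reliable detector."""
    # Naive sentence split on terminal punctuation.
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s.strip() for s in normalized.split(".") if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0  # not enough sentences to measure variation
    return statistics.stdev(lengths) / statistics.mean(lengths)

# Varied sentence lengths (more human-like):
varied = ("Short one. Then a much longer, rambling sentence "
          "with lots of extra words in it. Okay.")
# Perfectly uniform sentence lengths:
uniform = ("This sentence has exactly seven words here. "
           "That sentence has exactly seven words too. "
           "Every sentence has exactly seven words always.")
```

Here `burstiness(varied)` comes out well above `burstiness(uniform)`; a real system would combine many such signals, and even then false positives remain a serious problem, as noted elsewhere in this thread.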

@nixCraft @duckduckgo @google @Bing @brave Search engines should only return results where the source material has a citation attached (and the citation exists).

Change my mind.

@davidr @nixCraft @duckduckgo @google @Bing @brave So if I'm searching for an answer to a computer problem, and someone has solved it and details it in their blog — what would the relevant citation be?
@fishidwardrobe @nixCraft @duckduckgo @google @Bing @brave The manufacturer's documentation, source code or similar.

@davidr @nixCraft @duckduckgo @google @Bing @brave Software doesn't have a "manufacturer" and sadly often has little documentation. Source code is only intelligible to a few.

For argument's sake, let's say the blog post links to the issue on GitHub – which has sat unsolved for six months and is basically an unanswered question, so it doesn't prove anything. The blogger writes, "I couldn't find anything about this anywhere, but here's what seemed to fix it for me."

Where's the citation?

@fishidwardrobe @nixCraft @duckduckgo @google @Bing @brave You are putting the onus on the wrong person. You should be asking the polluters of our hard-won small nugget of true knowledge why they should be allowed in, not allowing them in by default unless I come up with a perfect system.

@davidr @nixCraft @duckduckgo @google @Bing @brave So when you said "Search engines should only return results where the source material has a citation attached (and the citation exists)." was that putting the onus on the LLM folks? Or is that putting the onus on everyone else?

You seem to be asking everyone else to prove they have a right to a search result?

@fishidwardrobe @nixCraft @duckduckgo @google @Bing @brave No, it's putting the onus on knowledge pollution engines to prove they have a positive contribution to make.

@davidr @nixCraft @duckduckgo @google @Bing @brave But you said that *everyone* had to provide a citation. "Search engines should only return results where the source material has a citation attached (and the citation exists)."

I'm not arguing in favour of that (or LLMs). I'm saying that sometimes a citation is not possible.

@nixCraft @duckduckgo @google @Bing @brave I mean, how is it any different than the spam generated for SEO farming? Heck, at least stuff made by an LLM might be marginally more sensible or helpful than the loads of pages that are just SEO barf to get a higher page ranking.
@nixCraft @duckduckgo @google @Bing @brave not sure how anyone could tell the difference 😢
@nixCraft @duckduckgo @google @Bing @brave I don't think search engines should filter anything. There is a distinct difference between a search engine and a social media platform, in that you're not expecting your search results to be biased in any way. But I would definitely appreciate a toggle button; that looks like the best option: if you want the filter, you just enable it. I'd even think it would be fine to enable it by default, as long as it's easily visible and usable.
@duckduckgo @Bing @nixCraft @google @brave AI "detectors" generate so many false positives that legitimate content would be wrongly blocked
@nixCraft @duckduckgo @google @Bing @brave
Yes is the only acceptable option.
If I want an LLM's opinion, I can just ask one directly.
@nixCraft @clive They shouldn’t focus on filtering but on ranking by usefulness, placing trustworthy and unique content higher than random spammy pages
@nixCraft @duckduckgo @google @Bing @brave Badly scraped Reddit and Stackoverflow comments already pollute most programming related searches and I’m afraid LLM generated content is only going to make it worse.
@nixCraft @duckduckgo @google @Bing @brave Oops, misread the question. I thought it was asking if they would, not if they should.

@nixCraft @duckduckgo @google @Bing @brave

The problem is, a technological problem would require a technological solution, and the deployment of even more not-really-AI AI.

@nixCraft @duckduckgo @google @Bing @brave They can't, and Google + MSFT don't care, but it would be nice if they could.

@nixCraft @clive

It's more complicated than LLM-generated junk. All kinds of sort-of facts are out there — it's hard to know what information you CAN trust.

I’ve read/seen pieces in otherwise reputable media… that aren’t quite right. Should that be allowed? Peer review is a *sometimes* filter for wrong things. And then there are newly discovered, paradigm-changing facts (fomites won’t give you covid!) in a sea of dated (now considered wrong) info.

Curate the whole internet? Who gets to do that?

@nixCraft @duckduckgo @google @Bing @brave since #Quality is the main factor, OFC they should filter with a default-on toggle.

This way those that want said garbage can search it anyway.

@nixCraft @duckduckgo @google @Bing @brave it will be difficult to differentiate between spammy, useless content generated by an LLM and useful content created through collaboration between humans and LLMs.