@marcusdeh @Pajo_16 @nixCraft Even that seems shaky these days - I'll try putting date restrictions on searches and regularly get stuff that the results page says is a week old, but actually dates from 2011. (Or vice versa.)
I'm not sure if they're just not respecting search syntax, something's breaking on the search engines' side of things, or if people are figuring out ways to make pages appear to be a different age than they actually are.
If you're looking for blogs and other personal sites I recommend bookmarking these search engines:
- https://search.marginalia.nu/
- https://ichi.do/
- https://clew.se/
- https://searchmysite.net/
- https://wiby.me/
Would also recommend checking out this excellent piece for alternative search options: https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/
search.marginalia.nu is a small independent do-it-yourself search engine for surprising but content-rich websites that never ask you to accept cookies or subscribe to newsletters. The goal is to bring you the sort of grass fed, free range HTML your grandma used to write.
Blogrolls are a great option. :)
Mine's here if you want a good starting place: https://benjaminhollon.com/blogroll/
Then many of those sites have their own blogrolls; and so on and so on and so on.
> Is there any way to find these sites?
One alternative, independent search engine is #Mojeek, which has its own index. Using it, you may be able to find things that Google/Microsoft decided to remove from their search results: https://www.mojeek.com/
@nixCraft
Tech corporations are strip-mining the commons in every possible way. It's despicable.
What will they do when they've finished this process? What will be left? It's unsustainable in the long term.
@nixCraft
Edit: As pointed out by others, still delete your answers as a form of protest if you wish. OpenAI may still get the data, but it will harm SO.
Edit 2: welp, looks like that might be off the table either way
https://m.benui.ca/@ben/112396505994216742
To be completely fair, I would be incredibly surprised (and I am trying to be charitable due to lack of concrete evidence) if OpenAI hadn't already scanned every single SO question and answer ever made. This deal was probably made so they would have ChatGPT answers on popular questions and stuff like that, which of course is still bad.
Stack Overflow announced that they are partnering with OpenAI, so I tried to delete my highest-rated answers. Stack Overflow does not let you delete questions that have accepted answers and many upvotes because it would remove knowledge from the community. So instead I changed my highest-rated answers to a protest message. Within an hour mods had changed the questions back and suspended my account for 7 days.
That's not the point. Going forward stack overflow will be polluted with a bunch of AI "hallucinated" garbage, where hallucinated means "made shit up in order to produce a plausible answer".
StackOverflow dumps have been available to everyone for a long time.
https://stackoverflow.blog/2022/10/20/introducing-the-overflow-offline-project/
@chickfilla @nixCraft when you post something to Stack Overflow, you are licensing it with a Creative Commons license.
This open license is explicitly meant to facilitate sharing of knowledge and does not require permission from the author.
When someone decides to release content using an open license (which is great), they can't really complain when other people take advantage of said license.
I shared several of my programs as open source software. I won't get mad if people use them.
@lazza @nixCraft Likewise, if I release my contribution out in the open and then I remove it, regardless of whether someone has a copy of it or not, I have the right to do so.
Nobody is arguing they shouldn't, nor that they can't. This is more about boycotting SO. Just because you can do something, it doesn't mean you should, and more importantly, it doesn't mean you can't be criticized for it.
@lazza @chickfilla @nixCraft Creative Commons (aside from CC0) also requires attribution for derivative works. An LLM trained on CC material does not attribute its sources when it's invoked. So it's not compliant.
This is simple licence washing, and they get away with it because people let them.
@rubenerd @chickfilla @nixCraft the press release states that:
"This integration will [...] provide attribution to the Stack Overflow community within ChatGPT"
This relates to one side of the agreement (ChatGPT). The other product involved (OverflowAI) has this screenshot on its website.
If this is real, I would argue that attribution is being provided.
Real Talk: if Stack Overflow dies, we'll all be out of our tech jobs. The most common questions can't be answered by reading manpages.
@phaysis @nixCraft while I agree this would hurt a lot of developers, I don't think it's a healthy mindset to have. If your job depends on Stack Overflow answers to be done right, then you probably do need to spend more time reading manuals.
Sure, most common questions are not directly answered by manuals (and sure, many man pages are not very helpful), but that's because they are not meant for that. Ideally you should arrive at your answers by getting a better understanding of the systems you are trying to work with. It usually takes more time, but it also leads to a more rewarding experience that pays off more in the long term.
Not to say there isn't a place for forums; after all, there are times when we don't even know where to start looking. But if your job depends on readily available answers to very specific questions scattered throughout a site, you might be doing it wrong, imho.
Let's be real: they already did it a long time ago.
This is just to make it "legal".
Their answers are still bad
If you use copyleft content that requires derivative works to be equally licensed to train your AI, your AI is a derived work, so your AI should be distributed under that copyleft license.
Are most AIs massive copyright and copyleft violations? It's pretty obvious. When will those copyrights and copylefts be enforced? When somebody strong or brave enough decides to sue one of the main AI developers.
Wikipedia and Stack Overflow content included.