Mastodawn

some_guy

Microsoft’s AI boss thinks it’s perfectly OK to steal content if it’s on the open web

https://lemmy.sdf.org/post/18848765

apfelwoiSchoppen Jun 29, 2024

DMCA for them, no DMCA for us.

Hobbes_Dent Jun 29, 2024

So let it train on Windows source and keep your mouth shut when someone uses it to mimic Windows but better.

blindbunny Jun 29, 2024

You’re always morally justified to steal from Microsoft

Vanth Jun 29, 2024

Fair use once it’s posted on the web? Thank you very much for the framework to pirate anything and everything.

catloaf Jun 29, 2024

fun fact, windows is posted on the web: www.microsoft.com/en-us/…/windows10

Download Windows 10

Refurbished Refurbisher Jun 29, 2024

Microsoft would prefer that you pirate Windows rather than use Linux, as it further entrenches their dominance in the market.

They mainly make their money off of business licenses anyway, similar to Adobe and Autodesk.

Grandwolf319 Jun 30, 2024

If that’s the case, then why not release a free home version??

Refurbished Refurbisher Jun 30, 2024

They already have a free version of Windows. Just don’t activate it.

WalnutLum Jun 29, 2024

My one dark hope is AI will be enough of an impetus for somebody to update DMCA

ZILtoid1991 Jun 29, 2024

If that gets updated, then it will favor big corporations.

crusa187 Jun 29, 2024

Only because our “representatives” let them write the law entirely. Imagine if Congress wasn’t filled to the brim with 80 year old fundraisers…

afraid_of_zombies Jul 1, 2024

When is the last time a crisis resulted in a better solution for the general public?

snekerpimp Jun 29, 2024

So if I see it on the “open web”, I’m free to use it however I please? Oh, I get thrown in jail and everything I own taken away.

If companies are people per “Citizens United”, why doesn’t the same apply to them?

Ænima Jun 29, 2024

And if a company makes a negligent decision, which kills a million people over time, why is no one being put on death row? They can and do have it both ways, but I can still wish for a just world where if companies are people, they can be put to death for mass casualties caused by their decisions.

Sanctus Jun 29, 2024

The web isn’t open because I have to pay to access it.

dinckel Jun 29, 2024

Just more proof that the more 0’s you have in your valuation, the less the laws apply to you

Bilb! Jun 29, 2024

I agree

Victoria Antoinette Jun 29, 2024

copying isn’t stealing

ayaya Jun 29, 2024

If the model isn’t overfitted it’s also not even copying. By their nature LLMs are transformative which is the whole point of fair use.

profdc9 Jun 29, 2024

So if I have an LLM read a book and paraphrase its contents, that’s not stealing?

kureta Jun 29, 2024

A_Very_Big_Fan Jun 29, 2024

Arthur Dent has his home demolished while humans simultaneously have Earth demolished by an alien race called Vogons, but he and Ford Prefect escape by hitchhiking onto the Vogon ship. They’re discovered and thrown into space, but miraculously saved by Ford’s relative (can’t remember how they’re related) and his ship The Heart of Gold, which is powerful but unpredictable. They wind up on a mythical planet due to that unpredictability, and learn that Earth was a designer planet created to calculate the ultimate answer to the ultimate question of life, the universe, and everything. (The famous “42” thing). The whole crew escapes the planet and decides to go to The Restaurant at the End of The Universe to eat and watch the universe end.

Have I just stolen The Hitchhiker’s Guide to the Galaxy and given it to you?

oo1 Jun 30, 2024

You’ve probably not infringed the copyright, though only a court can decide that, if you were to be challenged by the rights holder.

I think there are lots of factors in your defence:

you’re not selling it, and your use is an example for education
I don’t think you’re reducing the market value for the original(s) in any way
you’ve not included substantial verbatim sections of the original works, but I think you have used more than just facts and ideas (not sure though).

But add in some more quotes, flesh it out, and then try to sell it . . . each step weakens the ‘fair use’ defence.

This is the problem for the LLM: it can be used for many things, and if it has no filter or limit, then eventually the collective derived works might add up to commercial, substantial reuse, and might include enough to have copied a substantial portion of the original. Very hard to determine I’d think. Each individual use might be fair, but did the LLM itself go too far at some point?

The copyright holder probably struggles to challenge the LLM on the basis of all the things infinite monkeys might use it for in future.

A_Very_Big_Fan Jun 30, 2024

This is the problem for the LLM, it can be used for many things, and if it has no filter or limit

I agree with pretty much everything before this but that particular comment was just talking about summaries, which imo is a lot more cut and dry. (SparkNotes, for example)

An LLM by itself is unlimited and unfiltered, but it’s not impossible to limit one and sell it. For all the shit OpenAI deserves to get, I have to give them one thing, their copyright restriction system seems to be on par with YouTube. I paid for a month of it when GPT4 came out and tried my hardest to bypass it, but it won’t even give me copyrighted texts when the words are all replaced with synonyms or jumbled around.

I think if someone’s offering their LLM as a service and has a system like that in place, they aren’t stealing any more than YouTube is stealing. Otherwise I agree that there’s a strong argument for copyright infringement.

ayaya Jun 30, 2024

Again, even an exact copy is not stealing. It’s copyright infringement. Theft is a different crime.

But paraphrasing is not copyright infringement either. It’s no different than Wikipedia having a synopsis for every single episode of a TV series. Telling someone about what a work contains for informational purposes is perfectly fine.

Copyright infringement - Wikipedia

kibiz0r Jun 29, 2024

Pirating Windows for your own personal, private use, which will never directly make you a single dollar: HIGHLY ILLEGAL

Scraping your creative works so they can make billions by selling automated processes that compete against your work: Perfectly fine and normal!

yesman Jun 29, 2024

Do people still pirate Windows? You can download the iso directly from Microsoft’s website and you don’t need a registration key anymore.

Scrollone Jun 29, 2024

You do need a registration key, but now it’s tied to the hardware so it activates as soon as you connect to the network, no need to actually type the registration key.

Sckharshantallas Jun 30, 2024

They’re saying Windows will lock away some customization, but you don’t need a key to use it nowadays.

experbia Jun 29, 2024

bunch of fuckin art pirates. crying about software piracy while they have their own bots pirating everyone’s art.

kibiz0r Jun 29, 2024

It’s not even piracy though. I never saw anyone torrent Windows_XP_Home_Cracked.iso and go “Hey guys, check out this operating system I made!”

Brickardo Jun 29, 2024

Does Netflix count as the open web? It definitely feels like so, but I’m ready for a wealth hoarder to tell me otherwise!

WallEx Jun 29, 2024

So it’s no longer intellectual property if it’s on the internet? The nerve of this guy…

So you could just copy and use every single helpful support article from Microsoft?

Oh shit, there aren’t any

GBU_28 Jun 29, 2024

Essentially the joke everyone made about nfts.

CriticalMiss Jun 29, 2024

Sure bud, pirating some Microsoft Studio video games and Windows ISOs right now. What? I found them on the open web!

probableprotogen Jun 29, 2024

Honestly just pirate their games since they keep buying every fucking studio they can get their grubby hands on

rottingleaf Jun 30, 2024

Starlancer was nice I think

bruhduh Jun 29, 2024

I mean, Xbox one/series recently got proof of concept jailbreak, so… I think many people are on board with your thought

MonkderDritte Jun 29, 2024

There is a thing called usage licenses.

ElectroLisa Jun 29, 2024

Aight, I’ma steal leaked Windows XP source code :3

Paragone Jun 29, 2024

Is his personal-information on the dark-web?

Is he saying that if his personal-information is on the dark-web, then it’s perfectly-OK for everybody & their robot to be using it??

XOR is he saying that there are 2 kinds of law:

1 for protecting his entitlement,

the other for disallowing rights from the lives he consumes, through his beloved herd/corporation/pseudo-person?

( obviously, he’s already answered the latter )

Buffalox Jun 29, 2024

copying is not theft

Womble Jun 29, 2024

Didn’t you hear? We stan draconian IP laws now because AI bad.

Snot Flickerman Jun 29, 2024

Is it that or is it that the laws are selectively applied on little guys and ignored once you make enough money? It certainly looks that way. Once you’ve achieved a level of “fuck you money” it doesn’t matter how unscrupulously you got there. I’m not sure letting the big guys get away with it while little guys still get fucked over is as big of a win as you think it is?

Examples:

The Pirate Bay: Only made enough money to run the site and keep the admins living a middle class lifestyle.

VERDICT: Bad, wrong, and evil. Must be put in jail.

OpenAI: Claims to be non-profit, then spins off for-profit wing. Makes a mint in a deal with Microsoft.

VERDICT: Only the goodest of good people and we must allow them to continue doing so.

The IP laws are stupid but letting fucking rich twats get away with it while regular people will still get fucked by the same rules is kind of a fucking stupid ass hill to die on.

But sure, if we allow the giant companies to do it, SOMEHOW the same rules will “trickle down” to regular people. I think I’ve heard that story before… No, they only make exceptions for people who can basically print money. They’ll still fuck you and me six ways to Sunday for the same.

I mean, the guys who ran Jetflicks, a pirate streaming site, are being hit with potentially 48 year sentences. Longer than a lot of way more serious fucking crimes. I’ve literally seen murderers get half that.

But yeah, somehow, the same rules will end up being applied to us? My ass. They’re literally jailing people for it right now. If that wasn’t the case, maybe this argument would have legs.

But AI companies? Totes okay, bro.

Five men face prison time for illegal streaming service Jetflicks

FBI investigation found site amassed a catalogue larger than all the big streaming platforms combined

The Guardian

Grimy Jun 29, 2024

The laws are currently the same for everyone when it comes to what you can use to train an AI with. I, as an individual, can use whatever public facing data I wish to build or fine tune AI models, same as Microsoft.

If we make copyright laws even stronger, the only ones getting locked out of the game are the little guys. Microsoft, Google and company can afford to pay ridiculous prices for datasets. What they don’t own mainly comes from aggregators like Reddit, Getty, Instagram and Stack.

Boosting copyright laws would essentially kill all legal forms of open source AI. It would force the open source scene to go underground as a pirate network and lead to the scenario you mentioned.

Womble Jun 30, 2024

Yes, it is a travesty that people are being hounded for sharing information, but the solution to that isn’t to lock up information tighter by restricting access to the open web and saying that if you download something we put up to be freely accessed and then use it in a way we don’t like, you owe us.

0x0 Jul 1, 2024

letting fucking rich twats get away with it

That’s law in general…

cmhe Jun 29, 2024

“Copying is theft” has been the argument of corporations for ages, but if they want our data and information to integrate into their business, then suddenly they have the rights to it.

If copying is not theft, then we have the rights to copy their software and AI models, as well, since it is available on the open web.

They got themselves into quite a contradiction.

Buffalox Jun 29, 2024

If copying is not theft, then we have the rights to copy their software

No we don’t. Copying copyrighted material is copyright infringement, which is illegal. That does not make it theft though.
Oversimplifying the issue makes for an uninformed debate.

cactusupyourbutt Jun 29, 2024

any content you produce is automatically copyrighted

masterspace Jun 30, 2024

You realize that half of Lemmy is tying themselves in inconsistent logical knots trying to escape the reverse conundrum?

Copying isn’t stealing and never was. Our IP system that artificially restricts information has never made sense in the digital age, and yet now everyone is on here cheering copyright on.

BoxOfFeet Jul 1, 2024

You wouldn’t download a car!

ZILtoid1991 Jun 29, 2024

Issue is power imbalance.

There’s a clear difference between a guy in his basement on his personal computer sampling music the original musicians almost never saw a single penny from, and a megacorp trying to drive out creative professionals from the industry in the hopes they can then proceed to hike up the prices to use their generative AI software.

GamingChairModel Jun 29, 2024

Yeah, I’m not a fan of AI but I’m generally of the view that anything posted on the internet, visible without a login, is fair game for indexing by a search engine, snapshotting a backup (like the Internet Archive’s Wayback Machine), or running user extensions on (including ad blockers).

petrol_sniff_king Jun 30, 2024

None of those things replace that content, though.

Look, I dunno if this is legally a copyright issue, but as a society, I think a lot of people have decided they’re willing to yield to social media and search engine indexers, but not to AI training, you know? The same way I might consent to eating a mango but not a banana.

sugar_in_your_tea Jun 30, 2024

Yes, it kind of is. A search engine just looks for keywords and links, and that’s all it retains after crawling a site. It’s not producing any derivative works, it’s merely looking up an index of keywords to find matches.

An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues. Whether a particular generated result violates copyright depends on the license of the works it’s based on and how much of those works it uses. So it’s complicated, but there’s very much a copyright argument there.

Halosheep Jun 30, 2024

My brain also takes information and creates derivative works from it.

Shit, am I also a data thief?

sugar_in_your_tea Jun 30, 2024

That depends, do you copy verbatim? Or do you process and understand concepts, and then create new works based on that understanding? If you copy verbatim, that’s plagiarism and you’re a thief. If you create your own answer, it’s not.

Current AI doesn’t actually “understand” anything, and “learning” is just grabbing input data. If you ask it a question, it’s not understanding anything, it just matches search terms to the part of the training data that matches, and regurgitates a mix of it, and usually omits the sources. That’s it.

It’s a tricky line in journalism since so much of it is borrowed, and it’s likewise tricky w/ AI, but the main difference IMO is attribution, good journalists cite sources, AI rarely does.

TheRealKuni Jun 30, 2024

An LLM can essentially reproduce a work, and the whole point is to generate derivative works. So by its very nature, it runs into copyright issues.

Derivative works are not copyright infringement. If LLMs are spitting out exact copies, or near-enough-to-exact copies, that’s one thing. But as you said, the whole point is to generate derivative works.

sugar_in_your_tea Jun 30, 2024

Derivative works are not copyright infringement

They absolutely are, unless it’s covered by “fair use.” A “derivative work” doesn’t mean you created something that’s inspired by a work, but that you’ve modified the work and then distributed the modified version.