Microsoft CEO of AI: Online content is 'freeware' for models • The Register
@GolfNovemberUniform @possiblylinux127 Content on the open web is not freeware, and it is not even fully open as 'fair use for any purpose'. It is often licensed under CC, which requires attribution and may restrict use to non-commercial purposes. LLMs may also ingest full copyrighted works. The knowledge web is not only American, so different laws apply. The kind of ignorance this man shows is worthy of a first-year student who thinks 'because it's on Google I can use it for anything'.
“See u like AI because I’m selfish. Also those bad things are in the past, I’m using an ethical AI system now! But also, who gives a fuck because I only care about myself!”
Yeah, you get it, guy! Maybe you can be Trump's secretary of technology!
All of the resources and energy spent to get you this product you like. You can’t discount what it took to create something just because the final product is small and efficient. Take a look at the manufacturing footprint of nearly all complex hardware.
I’m not saying you created the AI but you are one of its supporters, without which there would be no AI.
If this had all just been pitched as developing a new plain-English coding language, I think the hype following it would be far more appropriate, but then the funding wouldn't follow to support the massive development costs of AI.
It's become a circle of hype chasing money chasing hype.
It's not you that is the problem, so to speak; it's the collective "you's" who all think the same way.
I’m not discounting it. Improving productivity for office workers by 1% across the world is a massive amount.
The power used to train the AI is a lot, but after that, using the AI takes far less electricity. If an AI spikes my GPU for 10 seconds to type something that would have taken me 30 minutes, I’ve saved on electricity:
As AI systems proliferate, their greenhouse gas emissions are an increasingly important concern for human societies. We analyze the emissions of several AI systems (ChatGPT, BLOOM, DALL-E2, Midjourney) relative to those of humans completing the same tasks. We find that an AI writing a page of text emits 130 to 1500 times less CO2e than a human doing so. Similarly, an AI creating an image emits 310 to 2900 times less. Emissions analyses do not account for social impacts such as professional displacement, legality, and rebound effects. In addition, AI is not a substitute for all human tasks. Nevertheless, at present, the use of AI holds the potential to carry out several major activities at much lower emission levels than can humans.
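The back-of-envelope arithmetic behind the "10 seconds of GPU vs 30 minutes of typing" claim can be sketched like this. All the power figures are assumptions for illustration, not measurements of any particular setup:

```python
# Hypothetical power draws (assumptions, not measurements):
GPU_WATTS = 300    # assumed draw of a desktop GPU under inference load
PC_WATTS = 100     # assumed draw of a PC idling while I type for 30 minutes

ai_energy_wh = GPU_WATTS * (10 / 3600)   # 10 seconds of inference, in watt-hours
human_energy_wh = PC_WATTS * (30 / 60)   # 30 minutes at the keyboard, in watt-hours

print(f"AI:    {ai_energy_wh:.2f} Wh")   # ~0.83 Wh
print(f"Human: {human_energy_wh:.2f} Wh")  # 50 Wh
```

Under these assumed numbers the interactive use is roughly 60x cheaper, which is the shape of the argument; the large one-off training cost then gets amortized over every such use.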
my AI is so good, it generated one that’s 100% identical
plus my AI uses 99% less electricity than Microsoft’s
Also, this groundbreaking AI model I made to do this was, umm, accidentally erased, and I also forgot how to make it.
Jury: “seems reasonable”
I’m fine with that, but let’s put some rules against this.
What you’re asking for is literally impossible.
A neural network is basically nothing more than a set of weights. If one word makes a weight go up by 0.0001 and then another word makes it go down by 0.0001, and you do that billions of times for billions of weights, how do you determine what in the data created those weights? Every single thing that’s in the training data had some kind of effect on everything else.
It’s like combining billions of buckets of water together in a pool and then taking out 1 cup from that and trying to figure out which buckets contributed to that cup. It doesn’t make any sense.
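The point above can be shown with a toy sketch (invented numbers, a handful of weights instead of billions): every document contributes tiny signed nudges to every weight, and once training is over only the sums survive, so the final weights carry no record of which document did what.

```python
import random

random.seed(42)

n_weights = 5
weights = [0.0] * n_weights
# A per-document ledger like this only exists *during* training if you
# deliberately keep it; the model itself never stores it.
ledger = {"doc_a": [0.0] * n_weights,
          "doc_b": [0.0] * n_weights,
          "doc_c": [0.0] * n_weights}

for step in range(10_000):
    doc = random.choice(list(ledger))
    for i in range(n_weights):
        nudge = random.uniform(-1e-4, 1e-4)  # tiny signed update
        weights[i] += nudge
        ledger[doc][i] += nudge

# Each weight is just the sum of thousands of nudges; the per-document
# breakdown is not recoverable from `weights` alone.
for i in range(n_weights):
    total = sum(ledger[d][i] for d in ledger)
    assert abs(total - weights[i]) < 1e-9
print(weights)
```

The ledger in the sketch is the thing that does not exist in a real trained model, which is why "tell me which weights came from my article" is not an answerable question after the fact.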
Sorry, I misinterpreted what you meant. You said “any AI models” so I thought you were talking about the model itself should somehow know where the data came from. Obviously the companies training the models can catalog their data sources.
But besides that, if you work on AI you should know better than anyone that removing training data is counter to the goal of fixing overfitting. You need more data to make the model more generalized. All you’d be doing is making it more likely to reproduce existing material because it has less to work off of. That’s worse for everyone.
Yeah but anything you create automatically has a copyright, so for example this comment is not in the public domain. Its use is limited to the context I am using it in. That is, I expect it to be copied for federation purposes, but I wouldn’t say that AI is covered in this context.
At least that’s the EU stance afaik. Like if I saw this comment on a billboard somewhere I’d see that as a breach of copyright and even privacy.
Microsoft is in a death spiral.
Even my coworkers who are complete idiots with technology, who actively sabotage themselves every time they touch any piece of hardware and software, have soured entirely on nearly every Microsoft product across the board.
It’s funny how quickly people change their minds when they don’t understand the technology on a deeper level. It’s just: “this is frustrating, now I hate it” and no further thought.
From the article:
Also, in 2022, several unidentified developers sued OpenAI and GitHub based on claims that the organizations used publicly posted programming code to train generative models in violation of software licensing terms.
They can argue about it not being a copy all they want. If there is a single GPL licenced line of code scraped then anything they produce is a derivative work & must be licenced GPL.
nice.
I’ll play the uninformed devil’s advocate here:
I’m torn about my personal opinion about copyrights and software licensing in general. I think the main problem is the huge power imbalance between people and corporations, not so much the fact a company analyzed a bunch of available data to solve programming problems.
They don’t copy the data and sell it verbatim to others which would be a legal issue and in my mind also a moral issue, as they don’t add any additional value.
1: yes
2: Normally, derivative works are patched or modified versions of the original. I think the common English meaning would apply & ChatGPT et al. are fucked. I doubt there is precedent for this yet.