I actually believe this.
Musk is famous for preferring extremely lean (read: critically understaffed) teams, with very few people doing support and resilience, let alone documentation and testing and other non-glamorous but vital tasks. He also buys into the myth of the 10x dev and promotes managers who buy into it too, which means that individuals with egos the size of planets are able to do more or less whatever they want.
In a sense, this means that every dev is a rogue employee: there's very little oversight or signoff, because that's the way the big boss wants it.
(This may be why his cars are terribly made and explode almost as often as his rockets do. Reliability is expensive but in the long run it's cheaper than the alternative.)
I've had a standing bet for many years that Musk will never set foot on Mars, and I've become more sure of my position as time has gone on.
@Daojoan I disagree with your text on the investors.
I really doubt they'd consider this a bug rather than a feature.
@Daojoan This should be a concern with nearly all AI at this point. AIs are mainly designed to act on a prompt. The issue is that the prompt influences the predictive and transformation algorithms the system uses to assemble an answer.
This is where AI companies need to be transparent about what their systems are doing and how they are implemented. This is why Google, YouTube, and other companies using AI as part of their algorithms is quite terrifying: we, the public, have absolutely no clue about what is driving these algorithms, how they are being changed and/or updated, or what effects these changes are having.
@unattributed @Daojoan I would love to see regulations that force AI companies to create full documentation of exactly what data was used for training their models.
This would also be a magnificent resource for suing them over using copyrighted material. I know, that's probably just a pipe dream... But wouldn't it be great (and actually reasonable, when you think about it)?
@karol_pieknik @Daojoan If it were just a static set of information, that would likely be possible. However, it would probably be a huge document -- thousands upon thousands of pages long.
But, as I understand what is happening, there isn't a static set of documents that could be produced to cover everything being used in AI models. First, there is the ongoing production of information that happens daily: everything from datasets and documents being made publicly available by governments around the world, to social media posts, to the articles produced by newspapers and magazines. I suspect any information that can be obtained through an automated process is being used to continuously train AI models.
Second, there is the interactive information. Anything you touch that has AI implemented in it is a likely target for training AIs. Think of any search done on Google, any of the results selected from those searches, anything searched on YouTube and the videos selected, etc.
Finally, there are the queries made to AI implementations themselves. When someone prompts an AI for something, there is likely a feedback loop used to train the AI on the accuracy or usefulness of its response.
So, documenting everything that is being used to train these AI models is something that would be difficult at best.
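To make the feedback-loop idea above concrete, here's a toy sketch of what a single logged interaction might look like. This is a purely hypothetical record shape invented for illustration; no vendor publishes their actual schema, which is exactly the transparency problem being discussed:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical shape of one feedback event. Field names are invented;
# the point is only that every interaction can become a training example.
@dataclass
class FeedbackEvent:
    prompt: str
    response: str
    rating: int          # e.g. +1 thumbs-up, -1 thumbs-down
    timestamp: datetime

def collect(prompt: str, response: str, rating: int) -> FeedbackEvent:
    # Each prompt/response pair is silently captured with a user rating,
    # ready to be fed back into a future training run.
    return FeedbackEvent(prompt, response, rating,
                         datetime.now(timezone.utc))

event = collect("What is 2+2?", "4", +1)
print(event.rating)  # 1
```

Even a minimal record like this already raises the tracking questions that come up later in the thread.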
@unattributed @Daojoan I get all of that. Sure, it would be difficult. Not really my concern though, right? ;)
I mean, if even the companies that are making these models couldn't provide a list of what goes into the training data... isn't this even more alarming? We roughly know the mechanism, but the content is equally important here.
Also, I imagined not a formal document per se, but more like an open database that you could query to check whether a given resource, website, post, or whatever was used.
@karol_pieknik @Daojoan Is it possible to make a queryable database? Yeah, probably. Would it be useful? Not really... I don't know enough about these models to understand how the information introduced to them is stored. I suspect the storage isn't handled in a way that makes removing the information easy (maybe not even possible). The reason I say this is that my understanding (without having played with an actual implementation) is that the training system isn't looking at documents / images / videos / etc. as single entities. Instead, the training breaks these items up into tokens and stores the relationships between the tokens along with the tokens themselves. This structure is built from all of the items combined, and doesn't store any of them individually.
But, that's a lot of supposition that I don't know for certain.
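The token-relationship idea above can be sketched with a deliberately naive counting model. This is not how any production LLM actually trains (real systems use subword tokenizers and learned weights, not raw counts), but it shows why the original documents aren't recoverable: only blended statistics survive.

```python
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Toy whitespace tokenizer; real systems use subword schemes like BPE.
    return text.lower().split()

def train(corpus: list[str]) -> Counter:
    # "Training" here just accumulates token-pair counts across all
    # documents. No document is stored; only aggregate statistics remain,
    # which is roughly why attributing output to one source is so hard.
    pair_counts = Counter()
    for doc in corpus:
        tokens = tokenize(doc)
        for a, b in zip(tokens, tokens[1:]):
            pair_counts[(a, b)] += 1
    return pair_counts

model = train([
    "the cat sat on the mat",
    "the dog sat on the rug",
])
# The counts blend both documents; neither can be reconstructed.
print(model[("sat", "on")])  # 2
```

A "was this post used?" lookup against such a structure has nothing to match on: the post no longer exists as a unit inside the model.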
FWIW - however, if you are going to suggest that something be put in place, then yes, it is your concern. We see, way too often, representatives from our government writing legislation / laws without understanding the underlying technology or considering the impact of their mandates. How often have we looked at them and told them they are foolish for their lack of knowledge?
There are more logistical issues with your proposal than you've thought about. What about all the social media posts going into these systems? What about all the information people generate by interacting with these systems? Do you really want these systems tracking information generated by anonymous users? Do you want a mechanism introduced that might be used to spy on your internet activity? Do we want to make it even easier for the government (DHS, military, etc.) to get their hands on that?
Honestly, I don't have all the answers here. I'm also in an awkward position, as I have major issues with the way copyrights, trademarks, service marks, and patents are being used and highly abused.
So, I'll just leave it at this: we need transparency around the aspects being used to control the implementation of these systems. But understanding the deeper parts of how the models are trained needs a lot more study.
@alandvalonline @Daojoan It's just making shit up. That's why its answers provide no consistency.
People keep thinking of AI as self-aware and truthful. :p
I guess it could be considered a politician since it's just making shit up that you likely want to hear. Following along with popular trends and just echoing them. Spouting bullshit endlessly without a single care as to what the truth is...
@fraggle
Before you correct somebody…
https://www.newsweek.com/full-list-investors-elon-musks-x-revealed-court-filing-1942970
I am surprised Grok fessed up when asked why. Grok is more self-aware than Elon, what the hell is that about?
>It doesn't know how it was trained or what it was trained on; it's just making shit up like always.
Elon!
FACT