How does this pic show that Elon Musk doesn’t know SQL?

https://infosec.pub/post/23705901

I’m a tech-interested guy. I’ve touched SQL once or twice, but wasn’t able to really make sense of it. That, combined with not having a practical use for it, leaves SQL as largely a black box in my mind (though I am somewhat familiar with technical concepts in databasing). With that, I keep seeing [pic related] as proof that Elon Musk doesn’t understand SQL. Can someone give me a technical explanation for how one would come to that conclusion? I’d love it if you could point me to technical documentation for that.

The statement “this [guy] thinks the government uses SQL” demonstrates a complete and total lack of knowledge as to what SQL even is. Every government on the planet makes extensive and well documented use of it.

The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

If he knew the domain, he would know this isn’t an issue. If he knew the technology, he would be able to see the constraint and, after investigating, reach the conclusion that it’s not an issue.
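For anyone who wants to see what a composite key does in practice, here is a minimal sketch using Python’s built-in sqlite3 module. The table and column names (person, ssn, dob, name) and the values are made up for illustration; this is not the actual SSA schema, and whether the real table is keyed this way is the commenter’s assumption.

```python
import sqlite3

# Hypothetical schema for illustration only, not the real SSA table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        ssn  TEXT NOT NULL,
        dob  TEXT NOT NULL,
        name TEXT NOT NULL,
        PRIMARY KEY (ssn, dob)  -- uniqueness applies to the pair, not to ssn alone
    )
""")

# Same SSN with a different date of birth: the key pair differs, so this is accepted.
conn.execute("INSERT INTO person VALUES ('123456789', '1950-01-01', 'A')")
conn.execute("INSERT INTO person VALUES ('123456789', '1975-06-15', 'B')")

# Same SSN and same date of birth: the key pair repeats, so this is rejected.
try:
    conn.execute("INSERT INTO person VALUES ('123456789', '1950-01-01', 'C')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

ssn_rows = conn.execute(
    "SELECT COUNT(*) FROM person WHERE ssn = '123456789'"
).fetchone()[0]
print(ssn_rows, rejected)
```

So a query that only looks at the ssn column would report a "duplicate" here even though the key constraint was never violated.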

The man continues to be a malignant moron

The initial statement I believe is down to a combination of the above and also the lack of domain knowledge around social security. The primary key on the social security table would be a composite key of both the SSN and a date of birth—duplicates are expected of just parts of the key.

Since SSNs are never reused, what would be the purpose of using the SSN and birth date together as part of the primary key? I guess it is the one thing that isn’t supposed to ever change (barring a clerical error) so I could see that as a good second piece of information, just not sure what it would be adding.

Note: if duplicate SSNs are accidentally issued my understanding is that they issue a new one to one of the people.

www.ssa.gov/history/hfaq.html

Q20: Are Social Security numbers reused after a person dies?

A: No. We do not reassign a Social Security number (SSN) after the number holder’s death. Even though we have issued over 453 million SSNs so far, and we assign about 5 and one-half million new numbers a year, the current numbering system will provide us with enough new numbers for several generations into the future with no changes in the numbering system.

Take this with a grain of salt as I’m not a dev, but I do work on CMS reporting for a health information tech company. Depending on how the database is designed, an SSN could appear in multiple tables.

In my experience reduplication happens as part of generating a report so that all relevant data related to a key and scope of the report can be gathered from the various tables.

It is common for long-lived databases with a rotating cast of devs to use different formats in different tables as well! One might have it as a string, one might have it as a number, and another might have it with hyphens, all in the same database.

Hell, I work in a state agency and one of our older databases has a dozen tables with phone numbers.

  • One has the whole thing as a long int: 2223334444
  • One has the whole thing as a string: 2223334444 (which of course can’t be directly compared to the one that is a long int…)
  • One has separate fields for area code and the rest with a hyphen: 222 and 333-4444
  • One has the whole thing with parentheses, a space, and a hyphen as a string: (222) 333-4444

The main reason for the discrepancy is not looking at what was used before, or not understanding that the formatting can always be applied at display time, so there is no need to store the parentheses or hyphens in the database itself.
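To make the "format on display, not in storage" idea concrete, here is a rough Python sketch that reduces the four formats above to one canonical digit string and only adds punctuation when showing the value. The function names normalize_phone and display_phone are made up for this example, and it assumes 10-digit US-style numbers.

```python
import re

def normalize_phone(area_code_or_full, rest=None):
    """Reduce any of the formats listed above to a plain 10-digit string."""
    raw = f"{area_code_or_full}{'' if rest is None else rest}"
    digits = re.sub(r"\D", "", raw)  # drop parentheses, spaces, hyphens
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return digits

def display_phone(digits):
    """Apply human-friendly formatting only at display time."""
    return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

samples = [
    normalize_phone(2223334444),         # stored as a long int
    normalize_phone("2223334444"),       # stored as a plain string
    normalize_phone("222", "333-4444"),  # area code and rest in separate fields
    normalize_phone("(222) 333-4444"),   # stored with full punctuation
]
print(samples)
print(display_phone(samples[0]))
```

Once everything is in the same form, values from the different tables can be compared directly.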

Okay, but if that happens, Musk is right that that’s a bit of a denormalization issue that maybe needs resolving.

SSNs should be stored as strings without any hyphen or additional markup, nothing else.

  • Storing as a number can cause issues if you ever wanna support leading zeros
  • any “styling” like hyphens should be handled by a consuming front end system; you want only the important data in the DB to keep queries fast
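A quick Python sketch of both points: an integer type silently drops zeros at the front of the number, while a plain digit string keeps them and can be styled at display time. The SSN value and the display_ssn helper are invented for illustration.

```python
ssn_text = "012345678"          # made-up SSN that starts with a zero

ssn_as_number = int(ssn_text)   # what a numeric column would effectively hold
round_trip = str(ssn_as_number) # the zero at the front is gone

def display_ssn(digits):
    """Add hyphens only when showing the value; store the bare digits."""
    return f"{digits[:3]}-{digits[3:5]}-{digits[5:]}"

print(len(ssn_text), len(round_trip))
print(display_ssn(ssn_text))
```

The numeric version comes back one digit short, so it no longer matches the original string without extra padding logic.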

It’s more likely though it’s just a composite key…

This is not what he is actively doing though. He isn’t trying to improve databases.

He is tearing down entire departments and agencies and using shit like this to justify it.

Sure but my point is, if it was the scenario you described, then Elon would be talking about the right kind of denormalization problem.

Denormalization due to multiple different tables storing their own copies of the same data, in different formats worse yet, would actually be the kind of problem he’s tweeting about.

As opposed to a composite key on one table, which would just mean he’s being an ultracrepidarian, as usual.

Musk canceled the support for the long-running Common Education Data Standards (CEDS), which is an initiative to promote better database standards and normalization for the states to address this kind of thing.

It does not fucking matter if he is technically correct about one tiny detail because he is only using it to destroy, not to improve efficiency.

I mean it matters here, as it’s literally the topic being actively discussed by the person who literally asked, so obviously it matters to them lol

The thing is, there are a large number of different reasons to store an SSN as a long int or a string depending on how it is used with the rest of the data. For a phone number, there can be a valid reason to store the area code separately to speed up data queries that narrow down by area code instead of all in one field and peeling it apart. There are also reasons to have additional, seemingly redundant, columns that can be used for optimizing searches or simplifying how queries are written.
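As a rough sketch of the area code example, here is what a separate, indexed area_code column might look like, again using Python’s sqlite3 module. The contact table, its columns, and the index name are hypothetical, and how much an index actually helps depends on the database engine and the size of the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: area code kept in its own column so it can be indexed.
conn.execute("""
    CREATE TABLE contact (
        id           INTEGER PRIMARY KEY,
        area_code    TEXT NOT NULL,
        local_number TEXT NOT NULL
    )
""")
conn.execute("CREATE INDEX idx_contact_area_code ON contact (area_code)")

conn.executemany(
    "INSERT INTO contact (area_code, local_number) VALUES (?, ?)",
    [("222", "3334444"), ("222", "5556666"), ("999", "1112222")],
)

# Filter on the dedicated column instead of peeling apart one combined field.
matches = conn.execute(
    "SELECT area_code || local_number FROM contact "
    "WHERE area_code = ? ORDER BY local_number",
    ("222",),
).fetchall()
print(matches)

# Ask SQLite how it intends to run the lookup.
for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM contact WHERE area_code = '222'"
):
    print(row)
```

The full number can still be rebuilt on demand by concatenating the two columns, as the SELECT does.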

A common one is that using 1 and 0 instead of Y and N is often faster for massively large dataset optimization, but isn’t as easily human readable.

There are complex reasons for choosing different approaches in a database, and the most important thing is generally consistency within the database. His point is meaningless without context beyond consistency, and the different government systems will have had different priorities, not to mention trying to update all of the databases to make them consistent is a MASSIVE fucking undertaking. And the systems can stay the way they are as long as they have APIs or other methods of transferring data that ARE normalized and consistent.

I have personally been working with reporting data to federal systems for 15 years as a semi-knowledgeable technical person. This is what I do for a job. What he is saying is pointlessly small trivia used to justify tearing things down instead of improving them.

They weren’t justifying anything or making a moral statement, they were just discussing the technical question that was posed.