Let's talk about "#AI", and about #LLM and #MachineLearning, which I don't put in quotes.
First, I am not anti-science, I am anti-JUNKscience and MARKETING, and there is a difference.
Why can I discuss a field I'm not in with some knowledge? I spent over 30 years in #geophysics, #SignalProcessing, and in #geology and #hydrology and #hydrogeology modelling. People doing this kind of work (along with #meteorology and #Climatology) are the progenitors of the current science. 1/
First, there is no such thing as "#ArtificialIntelligence". It's not artificial and it isn't intelligent. As a pursuit, it is quite real, and what it produces is not intelligence, but detailed models based on huge amounts of related archived data. The important word here is "model". So every time you see output from #Gemini or #ChatGPT, put in your mind that it is a MODEL of a paragraph on a topic, a MODEL of a picture of a sparrow on a pear, a MODEL of #python code. 2/
Which brings us to an important concept: "All models are wrong, some are useful." This is a decades-old aphorism, but what does it mean? The gist is captured by another aphorism: "The map is not the territory".
A model is a snapshot, based on current information, current assumptions/algorithms, and current allocation of processing resources. "Reality", which is what the models try to represent, is constantly and unpredictably changing. Our data on reality also changes, which changes the modelling 3/
Large dataset modelling has come a long way since I first walked into the #LANDSAT/EROS facility in Sioux Falls, SD way back in the early '80s. Mouth open, eyes wide. Big tape drive computers whirring and clicking. The operators describing how they process signal returns into forest models, carbon densities, urban densities, surface temperatures, etc. I was just a student, but fascinated by what they were able to do. Slight historical digression. 4/
Back to topic.
Modelling has changed and improved a lot, mostly due to more powerful computers allowing more brute force analysis. The math tools really haven't changed that much. Early commercial modelling included programs like #Surfer decades ago and used #Kriging (#Gaussian process regression) of geostatistical data. Most current geospatial modelling uses this technique, often enhanced with #BayesianDataAnalysis (read the book by that name). So now the table is mostly set. 5/
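For the curious, the core of kriging / Gaussian process regression fits in a few lines of plain #python. This is my own toy sketch, not any package's API; the "elevation" samples and the kernel length scale are made up for illustration:

```python
import numpy as np

def rbf(a, b, length=1.0):
    # Squared-exponential (Gaussian) kernel between two 1-D point sets
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

# Hypothetical elevation samples along a transect (position in km, height in m)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 12.0, 11.0, 14.0, 13.0])

xs = np.linspace(0.0, 4.0, 9)        # locations to estimate
jitter = 1e-6                        # tiny nugget for numerical stability
K = rbf(x, x) + jitter * np.eye(len(x))
Ks = rbf(xs, x)

mean = Ks @ np.linalg.solve(K, y)    # kriged estimate at each location
cov = rbf(xs, xs) - Ks @ np.linalg.solve(K, Ks.T)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # predictive uncertainty
```

Note that `std` is essentially zero at the sample points and grows between and beyond them; that uncertainty estimate is exactly the kind of number the thread keeps coming back to.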
The problems historically are:
1. Data heterogeneity - this is why people often preprocess data with Bayesian analysis to determine the reliability (a technical term) and confidence or credibility intervals for the model. An example: you have the string 5, 5, 5, 5, 5, 5, 5. What is the credibility if you guess the next number is 5? Pretty high, right? Now your string is 2, 87, 36, 11, 7, 43, 24, 9, 75. What is the credibility for your guess at the next number in the string? This can be quantified 6/
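That contrast is easy to quantify. A toy #python sketch of my own (a crude mean-plus-or-minus-spread interval, standing in for a proper Bayesian posterior predictive):

```python
import statistics

steady = [5, 5, 5, 5, 5, 5, 5]
noisy = [2, 87, 36, 11, 7, 43, 24, 9, 75]

def interval(data, z=1.96):
    # Crude normal approximation: mean +/- z * standard deviation.
    # A real Bayesian treatment yields a posterior predictive
    # distribution; this only sketches the same idea.
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return (m - z * s, m + z * s)

print(interval(steady))  # (5.0, 5.0): zero spread, the next value is nailed down
print(interval(noisy))   # roughly (-24, 90): a guess of "5" tells you almost nothing
```

Same guess, wildly different credibility, and it falls straight out of the data.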
Data heterogeneity is why bad data can really screw a model, especially when it doesn't have a lot of homogeneous data for that specific model. That paragraph, code snippet, or face. Bad data can overwhelm "good" data, skewing the model. This is often the problem with #LLM output & can't be overcome with more processing or algorithms; it is a math problem. The real issue is how it is marketed. They KNOW this is an issue & could easily provide you, the user, with confidence values. They don't 7/
Historical problems
2. Edge cases, boundary conditions, data gaps - the further you get from existing data, the less reliable the model output. Every model has a limit. What you include or exclude from the model assumptions and data set affects the outcome. A face is never deep purple? An ocean wave is never higher than 100ft? 200ft? A certain noun is never preceded by a certain adjective? All can be true, until they aren't. 8/
Final historical problem
3. Overall data reliability and relevance - This is the primary and most important problem. Models must assume the data they use is both reliable and relevant. All the big players are scrambling to gather more data in order to solve the heterogeneity and boundary-condition problems, but the primary problem is reliability and relevance. All of these data sets are gathered from our online society. How reliable is that? How relevant? 9/
Every "#AI" or #LLM will put out #racist and/or #bigoted models, from facial recognition to societal predictions. This is predictable, because our society itself is racist and bigoted; it literally can't produce models which defy the data. This is why we can't rely on it for policy, for health care decisions, for law enforcement, benefit determinations, informing legislation, or any other use related to society. Fundamentally unusable. 10/
AI6YR Ben (@[email protected])

😬 😡 "Customs and Border Protection (CBP) plans on photographing every single person who leaves the US by car, an agency spokesperson told Wired. The agency says it will start using facial recognition technology at official border crossings to match all outbound travelers’ faces to their passports, visas, or other travel documents, though there’s no public timeline for when this will happen." https://www.theverge.com/policy/664433/cbp-photos-facial-recognition-travelers-leaving-us #politics #fascism #surveillance #bigbrother #dictatorship