Google's current #AIIsGoingGreat moment really checks all the bad #AI boxes. Starting with the dismissive "examples we've seen are generally very uncommon queries and aren’t representative of most people’s experiences" - Sure *sometimes* the answers are complete BS and possibly dangerous, but what about the times they aren't? Checkmate, Luddites!
Straight from Google CEO Sundar Pichai's mouth: 'these "hallucinations" are an "inherent feature" of AI large language models (LLM), which is what drives AI Overviews, and this feature "is still an unsolved problem"'
but they're gonna keep band-aiding until it's good, promise! "Are we making progress? Yes, we are … We have definitely made progress when we look at metrics on factuality year on year. We are all making it better, but it's not solved"
Today's #AIIsGoingGreat - Meta's chatbot helpfully "confirms" a scammer's number is a legitimate Facebook support number
(of course, #LLMs just predict likely sequences of text, and for a question like this, "yes" is one of the high probability answers. There's no indication any of the companies hyping LLMs as a source of information have any serious solution for this kind of thing)
https://www.cbc.ca/news/canada/manitoba/facebook-customer-support-scam-1.7219581
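A minimal sketch of that point (toy scores, not a real model): the model turns scores over candidate continuations into probabilities and picks a likely one, and truth never enters the computation.

```python
import math

# Toy illustration: an LLM scores candidate next tokens, then samples
# from the softmax of those scores. Nothing here checks what's true.
logits = {
    "Yes": 4.1,           # fluent, agreeable continuation
    "No": 2.7,
    "I don't know": 1.2,  # honest, but rarely the "likely" text
}

total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok!r}: {p:.2f}")
# 'Yes' wins on probability alone; the scammer's number was never
# checked against anything.
```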
Kyle Orland hammers on my oft-repeated complaint (https://mastodon.social/@reedmideke/110063208987793683) that filtering your information through an #LLM *removes* useful context: "When Google's AI Overview synthesizes a new summary of the web's top results, on the other hand, all of this personal reliability and relevance context is lost. The Reddit troll gets mixed in with the serious cooking expert"
https://arstechnica.com/ai/2024/06/googles-ai-overviews-misunderstand-why-people-use-google/
Today's #AIIsGoingGreat features Zoom CEO Eric Yuan blazed out of his mind on his own supply: "Today for this session, ideally, I do not need to join. I can send a digital version of myself to join so I can go to the beach. Or I do not need to check my emails; the digital version of myself can read most of the emails. Maybe one or two emails will tell me, “Eric, it’s hard for the digital version to reply. Can you do that?”"
"I truly hate reading email every morning, and ideally, my AI version for myself reads most of the emails. We are not there yet"
OK, points for recognizing we're "not there yet", in roughly the same sense the legend of Icarus foresaw intercontinental jet travel but was "not there yet"
That "#ChatGPT is bullshit" paper I boosted earlier does a nice job of laying out why the "hallucination" terminology is harmful "what occurs in the case of an #LLM delivering false utterances is not an unusual or deviant form of the process it usually goes through… The very same process occurs when its outputs happen to be true"
https://link.springer.com/article/10.1007/s10676-024-09775-5

Recently, there has been considerable interest in large language models: machine learning systems which produce human-like text and dialogue. Applications of these systems have been plagued by persistent inaccuracies in their output; these are often called “AI hallucinations”. We argue that these falsehoods, and the overall activity of large language models, is better understood as bullshit in the sense explored by Frankfurt (On Bullshit, Princeton, 2005): the models are in an important way indifferent to the truth of their outputs. We distinguish two ways in which the models can be said to be bullshitters, and argue that they clearly meet at least one of these definitions. We further argue that describing AI misrepresentations as bullshit is both a more useful and more accurate way of predicting and discussing the behaviour of these systems.

Today's #AIIsGoingGreat:
1) Spammers flood web with content farms using #LLMs (based on tech pioneered by Google)
2) Google (reportedly) de-lists sites for having too much spammy LLM content
3) People paying freelancers to write real content run it through unreliable "#AI detectors" to avoid #2
4) Human writers get fired for false positives in #3 (see the base-rate sketch below)
https://gizmodo.com/ai-detectors-inaccurate-freelance-writers-fired-1851529820
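To see why step 4 follows almost mechanically from step 3, here's a back-of-envelope sketch with hypothetical numbers (the linked piece doesn't give exact rates):

```python
# Hypothetical rates for illustration; real detector error rates vary
# and vendors rarely publish them.
human_articles = 1000       # genuinely human-written submissions
false_positive_rate = 0.05  # detector flags 5% of human text as "AI"

falsely_flagged = human_articles * false_positive_rate
print(f"{falsely_flagged:.0f} human writers flagged as AI-generated")  # 50

# Even a "95% accurate" detector, run over everything a publisher
# commissions, mechanically produces a steady stream of innocent
# writers who look guilty.
```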
"Seriously, what’s the business model here? To offer ChatGPT and its entire featureset to people, for free, inside of Apple devices, in the hopes that OpenAI can upsell them?" - Hmm, an alternative is OpenAI believes it will be so popular/essential that they can upsell *Apple* when the contract renewal or V2 comes around. Burning VC billions to corner the market and then jacking up the price is a time-honored Valley tradition
Last week, Apple announced “Apple Intelligence,” a suite of features coming to iOS 18 (the next version of the iPhone’s software) in a presentation that FastCompany called “uninspired,” Futurism called “boring,” and Axios claimed “failed to excite investors.” The presentation, given at Apple’s Worldwide Developers conference (usually referred to as WWDC)…
In today's #AIIsGoingGreat: Adobe apparently adds the AI mark of shame to images that have been touched by its "AI"-based tools, no matter how minor the effect. And then Meta uses that mark to (sometimes?) flag the image as "Made with AI"
Arguably correct, but also almost entirely useless… it's not like an image is more or less fake if you use generative fill to cover up a spot rather than the regular old clone tool or airbrush
https://www.theverge.com/2024/6/24/24184795/meta-instagram-incorrect-made-by-ai-photo-labels
Today's #AIIsGoingGreat features @MozillaAI "To avoid confirmation bias and subjective interpretation, we decided to leverage language models for a more objective analysis of the data"
Aside from the obvious [citation f-ing needed] on LLMs providing "more objective analysis" what exactly was the input? Oh … "After each conversation, we wrote up summary notes" … definitely no room for bias and subjective interpretation to be introduced there
Accurately predicting the future would be an important milestone in the capabilities of artificial intelligence. However, research on the ability of large language models to provide probabilistic predictions about future events remains nascent. To empirically test this ability, we enrolled OpenAI's state-of-the-art large language model, GPT-4, in a three-month forecasting tournament hosted on the Metaculus platform. The tournament, running from July to October 2023, attracted 843 participants and covered diverse topics including Big Tech, U.S. politics, viral outbreaks, and the Ukraine conflict. Focusing on binary forecasts, we show that GPT-4's probabilistic forecasts are significantly less accurate than the median human-crowd forecasts. We find that GPT-4's forecasts did not significantly differ from the no-information forecasting strategy of assigning a 50% probability to every question. We explore a potential explanation, that GPT-4 might be predisposed to predict probabilities close to the midpoint of the scale, but our data do not support this hypothesis. Overall, we find that GPT-4 significantly underperforms in real-world predictive tasks compared to median human-crowd forecasts. A potential explanation for this underperformance is that in real-world forecasting tournaments, the true answers are genuinely unknown at the time of prediction; unlike in other benchmark tasks like professional exams or time series forecasting, where strong performance may at least partly be due to the answers being memorized from the training data. This makes real-world forecasting tournaments an ideal environment for testing the generalized reasoning and prediction capabilities of artificial intelligence going forward.
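For context on the "no-information" baseline: binary forecasts are commonly scored with the Brier score (mean squared error between stated probability and the 0/1 outcome), and assigning 50% to everything locks in a score of exactly 0.25 no matter what happens. A minimal sketch with made-up resolutions:

```python
# Brier score: mean squared error between forecast probability and
# outcome (0 or 1). Lower is better.
def brier(forecasts, outcomes):
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

outcomes = [1, 0, 0, 1, 0, 0, 1, 0]   # made-up question resolutions
coin_flip = [0.5] * len(outcomes)     # the "no-information" strategy
print(brier(coin_flip, outcomes))     # always exactly 0.25

# A forecaster with real information should land well below 0.25;
# the paper finds GPT-4 didn't significantly beat this baseline.
```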
#OpenAI "made staff sign employee agreements that required them to waive their federal rights to whistleblower compensation … threatened employees with criminal prosecutions if they reported violations of law to federal authorities under trade secret laws" -
"No reporting crimes" clause in contract has people asking a lot of question already answered by the contract
"big Wall Street investment banks including Goldman Sachs and Barclays, as well as VCs such as Sequoia Capital, have issued reports raising concerns about the sustainability of the AI gold rush, arguing that the technology might not be able to make the kind of money to justify the billions being invested into it" 🥳
https://wapo.st/3zVS5hR
#AIIsGoingGreat, featuring the old Silicon Valley "sell at a loss until everyone is hooked" strategy: "[OpenAI] Total revenue has been $283 million per month, or $3.5 to $4.5 billion a year. This would leave a $5 billion shortfall"
Also "OpenAI gets a heavily discounted rate of $1.30 per A100 server per hour. OpenAI has 350,000 such servers, with 290,000 of those used just for ChatGPT" 🤯 https://pivot-to-ai.com/2024/07/24/openai-could-lose-5-billion-in-2024/
"The Dynamics 365 Field Service management system has also integrated Microsoft’s Copilot AI to help generate work orders based on customer requests. Copilot can also summarize ongoing work orders and update existing requests" - I for one cannot think of anything which could possibly go wrong using spicy autocomplete to write work orders for "services like machine maintenance, repair, cleaning, or home healthcare"
https://www.404media.co/how-a-microsoft-app-is-powering-employee-surveillance/
#AIIsGoingGreat
"[Leopold Aschenbrenner] emphasizes this as a critical moment, claiming “the free world’s very survival” is “at stake.” That reaching “superintelligence” first will give the U.S. or China “a decisive economic and military advantage” that determines global hegemony. He is also raising millions of dollars for an investment fund behind this thesis"
Courtesy of @ct_bergstrom*, Today's #AIIsGoingGreat features people who have somehow convinced themselves it's a good use of time to investigate whether an ouroboros of BS generators can do scientific research https://arxiv.org/abs/2408.06292
* https://mastodon.social/@ct_bergstrom@fediscience.org/112957271701969972
One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aides to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models to perform research independently and communicate their findings. We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation. In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion, acting like the human scientific community. We demonstrate its versatility by applying it to three distinct subfields of machine learning: diffusion modeling, transformer-based language modeling, and learning dynamics. Each idea is implemented and developed into a full paper at a cost of less than $15 per paper. To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores. The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer. This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world's most challenging problems. Our code is open-sourced at https://github.com/SakanaAI/AI-Scientist
"To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores"
To do this, they "compared the artificially generated decisions with ground truth data for 500 ICLR 2022 papers extracted from the publicly available OpenReview dataset"
Which seems like a serious logical error…
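One reason that kind of validation flatters, sketched with illustrative numbers (the acceptance rate is my ballpark, not from the paper): with an imbalanced accept/reject split, even a reviewer that reads nothing scores well on raw agreement. And that's before asking whether ICLR 2022's public reviews were simply in the models' training data.

```python
# Illustrative numbers: why raw agreement with accept/reject decisions
# is a weak validation metric for an automated reviewer.
n_papers = 500        # size of the OpenReview sample they used
accept_rate = 0.32    # rough ballpark for ICLR; assumed, not from the paper

# A "reviewer" that rejects everything, having read nothing:
always_reject_accuracy = 1 - accept_rate
print(f"always-reject agreement: {always_reject_accuracy:.0%}")  # 68%
# Any claimed "near-human performance" needs to clear dumb baselines
# like this one before it means anything.
```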
How to run Doom badly using only a billion or so times the compute resources required by the original
(snark aside, this is pretty cool)
@reedmideke Whether AI or not, I’ve wondered if you could run very specialized video compression on things like recordings/streams of video games on Twitch/YouTube.
A PUBG map is 64 square kilometers with a lot of repeated buildings, and most of it is basically 2D. Compressing it like it’s live video doesn’t take advantage of how predictable it is.
But bandwidth is probably too cheap to be worth the effort.
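A hypothetical back-of-envelope for that idea (all numbers assumed): if the viewer's client already had the map assets, the stream could send camera state instead of pixels, and the headroom is huge.

```python
# Hypothetical numbers throughout; this is the idea, not a protocol.
fps = 60
pose_bytes = 7 * 4                  # camera position (3 floats) + quaternion (4)
camera_bps = fps * pose_bytes * 8   # bits/second for the camera alone

video_bps = 8_000_000               # a typical ~8 Mbps 1080p60 stream

print(f"camera state: {camera_bps / 1e3:.1f} kbps")   # ~13 kbps
print(f"ratio: {video_bps / camera_bps:.0f}x")        # ~600x
# Other players and projectiles add more state, but the gap stays
# enormous. Which is also why cheap bandwidth makes it not worth doing.
```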