Google+ comparison is very apt, but also that opening example really hits the problem I've been yelling about since the #LLM hype cycle started: The fundamental mismatch between a system that randomly makes shit up and the uses it's being hyped for https://www.computerworld.com/article/2117752/google-gemini-ai.html
Gemini is the new Google+

Google's cutting-edge AI technology has a familiar connection to the past — and in this case, that isn't a good thing.

Computerworld
This, right here: "Erm, right. So you can rely on these systems for information - but then you need to go search somewhere else and see if they’re making something up? In that case, wouldn’t it be faster and more effective to, I don’t know, simply look it up yourself in the first place?"

Google's current #AIIsGoingGreat moment really checks all the bad #AI boxes. Starting with the dismissive "examples we've seen are generally very uncommon queries and aren’t representative of most people’s experiences" - Sure *sometimes* the answers are complete BS and possibly dangerous, but what about the times they aren't? Checkmate, Luddites!

https://arstechnica.com/information-technology/2024/05/googles-ai-overview-can-give-false-misleading-and-dangerous-answers/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social

Google’s “AI Overview” can give false, misleading, and dangerous answers

From glue-on-pizza recipes to recommending "blinker fluid," Google's AI sourcing needs work.

Ars Technica
And as always, they insist they are fixing it: "We conducted extensive testing before launching this new experience and will use these isolated examples as we continue to refine our systems overall" - with *zero* indication they have a technical or even theoretical path to solving the general problem that #LLMs don't have any concept of truth
And then, the whole thing is made worse by positioning it as a replacement for search, in the top spot with Google branding. The "eat rocks" article ranks high in the regular organic search results for the same query, but users have a lot more clues that it was a joke
Why am I so sure #AI companies have no serious technical or theoretical solution to the underlying problem that #LLMs have no concept of truth? The fact their approach so far is manually band-aiding results that go viral or put them in legal jeopardy is a pretty big hint! https://www.theverge.com/2024/5/24/24164119/google-ai-overview-mistakes-search-race-openai
Google scrambles to manually remove weird AI answers in search

The company confirmed it is ‘taking swift action’ to remove some of the AI tool’s bizarre responses.

The Verge
👉 "Gary Marcus, an AI expert and an emeritus professor of neural science at New York University, told The Verge that a lot of AI companies are “selling dreams” that this tech will go from 80 percent correct to 100 percent. Achieving the initial 80 percent is relatively straightforward since it involves approximating a large amount of human data, Marcus said, but the final 20 percent is extremely challenging. In fact, Marcus thinks that last 20 percent might be the hardest thing of all"
What they're doing now seems like selling a calculator, and when a screenshot of it saying 2+2=5 goes viral on social media, they add a statement like "if x=2 and y=2 return 4" at the top of the program and say "see, we fixed it!"
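To make the analogy concrete, here's that band-aid strategy as a toy Python sketch (purely illustrative; the "viral fixes" table and everything else here is made up, not anyone's actual code):

```python
# Toy illustration of patching viral failures one at a time instead of
# fixing the underlying system. Everything here is hypothetical.

VIRAL_FIXES = {
    ("2", "+", "2"): "4",  # hard-coded after the screenshot went viral
}

def broken_calculator(a, op, b):
    # stand-in for an unreliable system with no concept of arithmetic
    if (a, op, b) in VIRAL_FIXES:
        return VIRAL_FIXES[(a, op, b)]  # "see, we fixed it!"
    return "5"  # every un-patched case is still wrong

print(broken_calculator("2", "+", "2"))  # "4" -- only because it's on the list
print(broken_calculator("3", "+", "3"))  # "5" -- the general problem remains
```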

Straight from Google CEO Sundar Pichai's mouth: 'these "hallucinations" are an "inherent feature" of AI large language models (LLM), which is what drives AI Overviews, and this feature "is still an unsolved problem"'

but they're gonna keep band-aiding until it's good, promise! "Are we making progress? Yes, we are … We have definitely made progress when we look at metrics on factuality year on year. We are all making it better, but it’s not solved"

https://futurism.com/the-byte/ceo-google-ai-hallucinations

#AIIsGoingGreat

CEO of Google Says It Has No Solution for Its AI Providing Wildly Incorrect Information

Google CEO Sundar Pichai says problems with its AI can't be solved because hallucinations are an inherent problem in these AI tools.

Futurism
Just occurred to me Mitchell and Webb predicted our current pizza-gluing, gasoline spaghetti #AIIsGoingGreat moment 16 years ago https://www.youtube.com/watch?v=B_m17HK97M8
Mitchell & Webb: Cheesoid

YouTube

Today's #AIIsGoingGreat - Meta's chatbot helpfully "confirms" a scammer's number is a legitimate Facebook support number

(of course, #LLMs just predict likely sequences of text, and for a question like this, "yes" is one of the high probability answers. There's no indication any of the companies hyping LLMs as a source of information have any serious solution for this kind of thing)
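(To make that concrete, here's a toy sketch of next-token sampling, with completely made-up probabilities; no real model has a three-token vocabulary, but the mechanism is the same)

```python
# Toy next-token sampling sketch. The probabilities are invented for
# illustration; they do not come from any actual model.
import random

# Hypothetical distribution over continuations of:
# "Is this a legitimate Facebook support number?"
next_token_probs = {
    "Yes": 0.6,  # affirmative continuations are common in training text
    "No": 0.3,
    "I": 0.1,    # e.g., "I can't verify that"
}

tokens = list(next_token_probs)
weights = list(next_token_probs.values())
print(random.choices(tokens, weights=weights, k=1)[0])  # truth never enters into it
```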

https://www.cbc.ca/news/canada/manitoba/facebook-customer-support-scam-1.7219581

Winnipeg man caught in scam after AI told him fake Facebook customer support number was legitimate | CBC News

A Winnipeg man who says he was scammed out of hundreds of dollars when he called what he thought was a Facebook customer support hotline wants to warn others about what can go wrong. 

CBC

Kyle Orland hammers on my oft-repeated complaint (https://mastodon.social/@reedmideke/110063208987793683) that filtering your information through an #LLM *removes* useful context: "When Google's AI Overview synthesizes a new summary of the web's top results, on the other hand, all of this personal reliability and relevance context is lost. The Reddit troll gets mixed in with the serious cooking expert"
https://arstechnica.com/ai/2024/06/googles-ai-overviews-misunderstand-why-people-use-google/

#AIIsGoingGreat

Google’s AI Overviews misunderstand why people use Google

Answers that are factually "wrong" are only part of the problem.

Ars Technica
On the same note, today's #AIIsGoingGreat courtesy of @ppossej, observing Microsoft Copilot helpfully "summarizing" a phishing email. Even leaving aside the obvious problem here, what exactly is the supposed value of having an already short email filtered through spicy autocomplete? https://mastodon.social/@[email protected]ocial/112555512126646188
I initially dismissed today's #AIIsGoingGreat (HT @zhuowei) as a joke, but no*: "aiBIOS leverages an LLM to integrate AI capabilities into Insyde Software’s flagship firmware solution, InsydeH2O® UEFI BIOS. It provides the ability to interpret the PC user’s request, analyze their specific hardware, and parse through the LLM’s extensive knowledge base of BIOS and computer terminology to make the appropriate changes to the BIOS Setup"
* not an intentional one, anyway
https://www.insyde.com/press_news/press-releases/insyde%C2%AE-software-brings-higher-intelligence-pcs-aibios%E2%84%A2-technology-be

Today's #AIIsGoingGreat features Zoom CEO Eric Yuan blazed out of his mind on his own supply: "Today for this session, ideally, I do not need to join. I can send a digital version of myself to join so I can go to the beach. Or I do not need to check my emails; the digital version of myself can read most of the emails. Maybe one or two emails will tell me, “Eric, it’s hard for the digital version to reply. Can you do that?”"

https://www.theverge.com/2024/6/3/24168733/zoom-ceo-ai-clones-digital-twins-videoconferencing-decoder-interview

The CEO of Zoom wants AI clones in meetings

Zoom founder Eric Yuan on AI-powered “digital twins,” taking on Microsoft and Google, and the future of remote work.

The Verge

"I truly hate reading email every morning, and ideally, my AI version for myself reads most of the emails. We are not there yet"

OK, points for recognizing we're "not there yet", in roughly the same sense the legend of Icarus foresaw intercontinental jet travel but was "not there yet"

Actually interesting thing in that Eric Yuan interview "every day, I personally spend a lot of time on talking with our customer’s prospects. Guess what? First question they all always ask me now is “What’s your AI strategy? What do you do to embrace AI?…”"
- even if exaggerated, seems like a good indicator of how deeply C-suite types have bought into the hype, which in turn means they all need an "AI strategy" no matter how ludicrous
and the thing is, in terms of their personal incentives, they're probably not wrong. The analysts and shareholders and trade press want the new shiny thing, and if their current business gets caught on the wrong side of the bubble, they keep whatever bonuses they got in the interim and it probably won't hurt their future career prospects much
Bonus #AIIsGoingGreat from NYT with a deep look at (now defunct) skeevy news outlet BNN Breaking: "employees were asked to put articles from other news sites into the [#LLM] tool so that it could paraphrase them, and then to manually “validate” the results by checking them for errors… Employees did not want their bylines on stories generated purely by A.I., but Mr. Chahal insisted on this. Soon, the tool randomly assigned their names to stories"
https://www.nytimes.com/2024/06/06/technology/bnn-breaking-ai-generated-news.html?u2g=i&unlocked_article_code=1.x00.zn0r.s2tFDDFWR0fo&smid=url-share
#GiftArticle #GiftLink #BNN
The Rise and Fall of BNN Breaking, an AI-Generated News Outlet

BNN Breaking had millions of readers, an international team of journalists and a publishing deal with Microsoft. But it was full of error-ridden content.

The New York Times
#BNN founder Gurbaksh Chahal seems to be an all-around charming fellow "In 2013, he attacked his girlfriend at the time, and was accused of hitting and kicking her more than 100 times, generating significant media attention because it was recorded by a video camera he had installed in the bedroom … After an arrest involving another domestic violence incident with a different partner in 2016, he served six months in jail"
Some might argue that making the #AI acronym do double duty with "Apple Intelligence" is a recipe for confusion, but after all the hype I find it refreshingly honest to position the product as "about as smart as a piece of fruit"

That "#ChatGPT is bullshit" paper I boosted earlier does a nice job of laying out why the "hallucination" terminology is harmful "what occurs in the case of an #LLM delivering false utterances is not an unusual or deviant form of the process it usually goes through… The very same process occurs when its outputs happen to be true"

https://link.springer.com/article/10.1007/s10676-024-09775-5

ChatGPT is bullshit - Ethics and Information Technology

Recently, there has been considerable interest in large language models: machine learning systems which produce human-like text and dialogue. Applications of these systems have been plagued by persistent inaccuracies in their output; these are often called “AI hallucinations”. We argue that these falsehoods, and the overall activity of large language models, is better understood as bullshit in the sense explored by Frankfurt (On Bullshit, Princeton, 2005): the models are in an important way indifferent to the truth of their outputs. We distinguish two ways in which the models can be said to be bullshitters, and argue that they clearly meet at least one of these definitions. We further argue that describing AI misrepresentations as bullshit is both a more useful and more accurate way of predicting and discussing the behaviour of these systems.

SpringerLink
The authors rightly object to "confabulation" for similar reasons "This term also suggests that there is something exceptional occurring when the LLM makes a false utterance, i.e., that in these occasions - and only these occasions - it “fills in” a gap in memory with something false. This too is misleading. Even when the ChatGPT does give us correct answers, its process is one of predicting the next token"
They are far from the first to make the connection between #LLMs and Frankfurtian bullshit, but humor aside, they do make a compelling case that the terminology matters https://link.springer.com/article/10.1007/s10676-024-09775-5#Sec12
"Calling their mistakes ‘hallucinations’ isn’t harmless: it lends itself to the confusion that the machines are in some way misperceiving but are nonetheless trying to convey something that they believe or have perceived. This, as we’ve argued, is the wrong metaphor. The machines are not trying to communicate something they believe or perceive. Their inaccuracy is not due to misperception or hallucination … they are not trying to convey information at all. They are bullshitting"

Today's #AIIsGoingGreat:
1) Spammers flood web with content farms using #LLMs (based on tech pioneered by Google)
2) Google (reportedly) de-lists sites for having too much spammy LLM content
3) People paying freelancers to write real content run it through unreliable "#AI detectors" to avoid #2
4) Human writers get fired for false positives in #3

https://gizmodo.com/ai-detectors-inaccurate-freelance-writers-fired-1851529820

AI Detectors Get It Wrong. Writers Are Being Fired Anyway

AI is already stealing writers’ work. Now they’re losing jobs over false accusations of using it.

Gizmodo
Much like the #AI industry, the AI detector industry tries to have it both ways, hyping the purported value while covering their ass with disclaimers:
Jonathan Gillham, CEO of Originality.AI, says "we advise against the tool being used within academia, and strongly recommend against being used for disciplinary action" while their marketing materials claim the product is “essential” in the classroom https://gizmodo.com/ai-detectors-inaccurate-freelance-writers-fired-1851529820

"Seriously, what’s the business model here? To offer ChatGPT and its entire featureset to people, for free, inside of Apple devices, in the hopes that OpenAI can upsell them?" - Hmm, an alternative is OpenAI believes it will be so popular/essential that they can upsell *Apple* when the contract renewal or V2 comes around. Burning VC billions to corner the market and then jacking up the price is a time-honored Valley tradition

https://www.wheresyoured.at/untitled/

Let Tim Cook

Last week, Apple announced “Apple Intelligence,” a suite of features coming to iOS 18 (the next version of the iPhone’s software) in a presentation that FastCompany called “uninspired,” Futurism called “boring,” and Axios claimed “failed to excite investors.”  The presentation, given at Apple’s Worldwide Developers conference (usually referred

Ed Zitron's Where's Your Ed At
I very much doubt that will happen, but OpenAI are deep enough into their "AGI is just around the corner" kool-aid they might well believe it
In today's #AIIsGoingGreat, we learn that "This technology is proven to have some of the most comprehensive capabilities in the industry, fast and accurate in some of the most demanding conditions" is IBM marketing-speak for "not able to hold down a job at McDonald's" https://www.bbc.com/news/articles/c722gne7qngo
McDonald's removes AI drive-throughs after order errors

The voice recognition system seems not to have recognised what customers were really ordering.

Another great example of how #LLM #AI produces extremely confident BS when presented with something (in this case, a rare name) not represented in the training data https://kansasreflector.com/2024/06/22/artificial-intelligence-has-spread-lies-about-my-good-name-and-im-here-to-settle-the-score/
Artificial intelligence has spread lies about my good name, and I'm here to settle the score • Kansas Reflector

Artificial intelligence lies. Everyone knows this by now, of course. Programs such as Chat GPT and Google's "AI overviews" generate nonsense.

Kansas Reflector

In today's #AIIsGoingGreat: Adobe apparently adds the AI mark of shame to images that have been touched by its "AI"-based tools, no matter how minor the effect. And then Meta uses that mark to (sometimes?) flag the image as "Made with AI"

Arguably correct, but also almost entirely useless… it's not like an image is more or less fake if you use generative fill to cover up a spot rather than the regular old clone tool or airbrush

https://www.theverge.com/2024/6/24/24184795/meta-instagram-incorrect-made-by-ai-photo-labels

Meta is incorrectly marking real photos as ‘Made by AI’

Meta is incorrectly adding its “Made by AI” label to real images photographers have taken. The problem seems to affect photos made with various image editing tools.

The Verge
But where do you draw the line, and how to capture that in metadata?

Today's #AIIsGoingGreat features @MozillaAI "To avoid confirmation bias and subjective interpretation, we decided to leverage language models for a more objective analysis of the data"

Aside from the obvious [citation f-ing needed] on LLMs providing "more objective analysis," what exactly was the input? Oh … "After each conversation, we wrote up summary notes" … definitely no room for bias and subjective interpretation to be introduced there

https://blog.mozilla.ai/uncovering-genai-trends-using-local-language-models-to-explore-35-organizations/

Uncovering GenAI Trends: Using Local Language Models to Explore 35 Organizations

Mozilla.ai spoke with 35 organizations in various sectors, including finance and government to learn how they are using large language models.

Mozilla.ai Blog
They go on to provide the output of three models, which seems fairly generic and bland, with the occasional grammatical oddity, but without the input, we have no way to judge how accurate or insightful they were. We just get @MozillaAI's subjective "They identified the majority of trends and patterns among the 35 organizations we studied… This exercise showcased how well local language models can extract valuable insights from large text datasets"
They also give us this, which, I dunno, all seems pretty obvious and not at all surprising?
Could have been a much more interesting post if @MozillaAI had tried to rigorously test their assertion that LLMs provide "a more objective analysis of the data"
Today's #AIIsGoingGreat is a system which, for a mere hundred million dollars in training costs, forecasts future events about as well as a coin toss: "Focusing on binary forecasts, we show that GPT-4's probabilistic forecasts are significantly less accurate than the median human-crowd forecasts. We find that GPT-4's forecasts did not significantly differ from the no-information forecasting strategy of assigning a 50% probability to every question" https://arxiv.org/abs/2310.13014
Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament

Accurately predicting the future would be an important milestone in the capabilities of artificial intelligence. However, research on the ability of large language models to provide probabilistic predictions about future events remains nascent. To empirically test this ability, we enrolled OpenAI's state-of-the-art large language model, GPT-4, in a three-month forecasting tournament hosted on the Metaculus platform. The tournament, running from July to October 2023, attracted 843 participants and covered diverse topics including Big Tech, U.S. politics, viral outbreaks, and the Ukraine conflict. Focusing on binary forecasts, we show that GPT-4's probabilistic forecasts are significantly less accurate than the median human-crowd forecasts. We find that GPT-4's forecasts did not significantly differ from the no-information forecasting strategy of assigning a 50% probability to every question. We explore a potential explanation, that GPT-4 might be predisposed to predict probabilities close to the midpoint of the scale, but our data do not support this hypothesis. Overall, we find that GPT-4 significantly underperforms in real-world predictive tasks compared to median human-crowd forecasts. A potential explanation for this underperformance is that in real-world forecasting tournaments, the true answers are genuinely unknown at the time of prediction; unlike in other benchmark tasks like professional exams or time series forecasting, where strong performance may at least partly be due to the answers being memorized from the training data. This makes real-world forecasting tournaments an ideal environment for testing the generalized reasoning and prediction capabilities of artificial intelligence going forward.

arXiv.org
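For context on that no-information baseline: score binary forecasts with the Brier score (mean squared error between predicted probability and outcome, lower is better), and always answering 50% gets you exactly 0.25 no matter how the questions resolve. A minimal sketch, with made-up outcomes:

```python
# Brier score of the no-information strategy: assign 0.5 to every binary
# question. The outcomes below are invented for illustration.

def brier(forecasts, outcomes):
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(outcomes)

outcomes = [1, 0, 0, 1, 0]          # hypothetical resolved yes/no questions
coin_toss = [0.5] * len(outcomes)   # the no-information baseline
print(brier(coin_toss, outcomes))   # 0.25, regardless of the outcomes
```

That 0.25 floor is the bar the paper says GPT-4's forecasts didn't significantly beat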
#AIIsGoingGreat "Companies are acting like generative AI is going to change the world and are acting as such, while the reality is that this is a technology that is currently deeply unreliable and may not change much of anything at all" https://www.404media.co/goldman-sachs-ai-is-overhyped-wildly-expensive-and-unreliable/
Goldman Sachs: AI Is Overhyped, Wildly Expensive, and Unreliable

One of the world's largest investment banks wonders if generative AI will be worth the huge investment and hype: "will this large spend ever pay off?"

404 Media

#OpenAI "made staff sign employee agreements that required them to waive their federal rights to whistleblower compensation … threatened employees with criminal prosecutions if they reported violations of law to federal authorities under trade secret laws" -
"No reporting crimes" clause in contract has people asking a lot of question already answered by the contract

https://wapo.st/3WgZGAh

#GiftArticle #GiftLink

OpenAI illegally barred staff from airing safety risks, whistleblowers say

OpenAI whistleblowers filed a complaint with the SEC where they allege the AI company is silencing employees from sharing concerns about its AI technology.

The Washington Post
Also, did they use ChatGPT to write that shit? Because I'm pretty sure a competent attorney would advise you that a "no reporting crimes" clause is unlikely to work in your favor if crimes are, in fact, reported
"The summary by ChatGPT is pretty often empty waffle. It almost feels like a psychic con (I think I read this analogy somewhere) with its always-true generalisations that do not make an actual point" - Yep, Barnum effect satisfies a goal of "words that sound like they go together in this context" just as well as meaningful statements https://ea.rna.nl/2024/05/27/when-chatgpt-summarises-it-actually-does-nothing-of-the-kind/
When ChatGPT summarises, it actually does nothing of the kind.

One of the use cases I thought was reasonable to expect from ChatGPT and Friends (LLMs) was summarising. It turns out I was wrong. What ChatGPT isn’t summarising at all, it only looks like it…

R&A IT Strategy & Architecture
I'm not sure I entirely buy the author's identification of "shortening" as a distinct behavior, but the point is well made, once again, that LLMs lack actual understanding "To truly summarise, you need to be able to detect that from 40 sentences, 35 are leading up to the 36th, 4 follow it with some additional remarks, but it is that 36th that is essential for the summary and that without that 36th, the content is lost" https://ea.rna.nl/2024/05/27/when-chatgpt-summarises-it-actually-does-nothing-of-the-kind/
And again, even if it *sometimes* gives you a decent summary, the *only way to be sure it did* is to actually read the original material yourself, in full ¯\_(ツ)_/¯

"big Wall Street investment banks including Goldman Sachs and Barclays, as well as VCs such as Sequoia Capital, have issued reports raising concerns about the sustainability of the AI gold rush, arguing that the technology might not be able to make the kind of money to justify the billions being invested into it" 🥳
https://wapo.st/3zVS5hR

#GiftArticle #GiftLink

Big Tech says AI is booming. Wall Street is starting to see a bubble.

The industry has rushed headlong into AI, and stock market investors are following them. But a growing group of Wall Street analysts are skeptical about profitability.

The Washington Post

#AIIsGoingGreat, featuring the old Silicon Valley "sell at a loss until everyone is hooked" strategy "[OpenAI] Total revenue has been $283 million per month, or $3.5 to $4.5 billion a year. This would leave a $5 billion shortfall"

Also "OpenAI gets a heavily discounted rate of $1.30 per A100 server per hour. OpenAI has 350,000 such servers, with 290,000 of those used just for ChatGPT" 🤯 https://pivot-to-ai.com/2024/07/24/openai-could-lose-5-billion-in-2024/

OpenAI could lose $5 billion in 2024

OpenAI is hemorrhaging cash. It could lose about $5 billion this year and may have to raise more funding. [The Information, paywalled; Data Center Dynamics] The Information report is based on OpenA…

Pivot to AI
I can't help but think we could do something useful or at least fun with that kind of compute power
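For a sense of scale, here's the back-of-envelope math implied by those quoted figures (assuming the discounted rate and server count are accurate and apply around the clock):

```python
# Rough annual compute cost from the figures quoted above; purely
# back-of-envelope, assuming 24/7 utilization at the discounted rate.
servers = 350_000
rate_per_server_hour = 1.30  # USD per A100 server per hour (quoted figure)
hours_per_year = 24 * 365

annual_cost = servers * rate_per_server_hour * hours_per_year
print(f"${annual_cost / 1e9:.1f}B per year")  # ~$4.0B/year on compute alone
```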

"The Dynamics 365 Field Service management system has also integrated Microsoft’s Copilot AI to help generate work orders based on customer requests. Copilot can also summarize ongoing work orders and update existing requests" - I for one cannot think of anything which could possibly go wrong using spicy autocomplete to write work orders for "services like machine maintenance, repair, cleaning, or home healthcare"

https://www.404media.co/how-a-microsoft-app-is-powering-employee-surveillance/
#AIIsGoingGreat

How a Microsoft App is Powering Employee Surveillance

Microsoft Dynamics 365 uses AI-generated performance metrics to single out individual workers, a new report has found.

404 Media
#AIIsGoingGreat Operators of bullshit-generating machine *shocked* to find bullshit going on in their establishment https://www.theverge.com/2024/7/30/24210108/meta-trump-shooting-ai-hallucinations
Meta apologizes after its AI chatbot said Trump shooting didn’t happen

After Meta’s AI assistant was caught denying the attempted assassination of former President Donald Trump, the company is blaming the technology behind its chatbot and others.

The Verge
#AIIsGoingGreat … so great the term is becoming toxic to consumers. Who could have predicted that transparent hype chasing and adding dumb chatbots where no one asked for them would end this way? https://futurism.com/the-byte/study-consumers-turned-off-products-ai
Study Finds Consumers Are Actively Turned Off by Products That Use AI

Researchers have found that including the words "artificial intelligence" in product marketing is a major turn-off for consumers.

Futurism

"[Leopold Aschenbrenner] emphasizes this as a critical moment, claiming “the free world’s very survival” is “at stake.” That reaching “superintelligence” first will give the U.S. or China “a decisive economic and military advantage” that determines global hegemony. He is also raising millions of dollars for an investment fund behind this thesis"

https://www.lawfaremedia.org/article/ai-timelines-and-national-security--the-obstacles-to-agi-by-2027

AI Timelines and National Security: The Obstacles to AGI by 2027

Leopold Aschenbrenner’s “Situational Awareness” builds claims of artificial general intelligence’s imminence on assumptions that demand further scrutiny.

Lawfare
#AIIsGoingGreat "Do not hallucinate. Do not make up factual information" - Welp, problem solved! https://www.theverge.com/2024/8/5/24213861/apple-intelligence-instructions-macos-15-1-sequoia-beta
‘You are a helpful mail assistant,’ and other Apple Intelligence instructions

Here are some of the prompts that Apple Intelligence is using to guide AI models in the macOS 15.1 Sequoia developer beta.

The Verge
Today's #AIIsGoingGreat is brought to you by @jasonkoebler who followed the money all the way to the bottom of Facebook's AI slop pit https://www.404media.co/where-facebooks-ai-slop-comes-from/
Where Facebook's AI Slop Comes From

Facebook itself is paying creators in India, Vietnam, and the Philippines for bizarre AI spam that they are learning to make from YouTube influencers and guides sold on Telegram.

404 Media

Courtesy of @ct_bergstrom*, today's #AIIsGoingGreat features people who have somehow convinced themselves it's a good use of time to investigate whether an ouroboros of BS generators can do scientific research https://arxiv.org/abs/2408.06292

* https://mastodon.social/@ct_bergstrom@fediscience.org/112957271701969972

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aides to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models to perform research independently and communicate their findings. We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation. In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion, acting like the human scientific community. We demonstrate its versatility by applying it to three distinct subfields of machine learning: diffusion modeling, transformer-based language modeling, and learning dynamics. Each idea is implemented and developed into a full paper at a cost of less than $15 per paper. To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores. The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer. This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world's most challenging problems. Our code is open-sourced at https://github.com/SakanaAI/AI-Scientist

arXiv.org

"To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores"

To do this, they "compared the artificially generated decisions with ground truth data for 500 ICLR 2022 papers extracted from the publicly available OpenReview dataset"

Which seems like a serious logical error…

They wrote a system which (according to them) scores real, honestly written human papers similarly to how humans would score them. They then use that to evaluate probabilistically generated imitations of human-written papers, implicitly assuming an imitation scored by this method is "as good" as a similarly scored human-written paper. But IMO this does not follow…
Their system confirms the imitation *looks like* a similarly scored human-written paper, but all that says is it's a good imitation, not that it actually has characteristics we value, like accuracy and logical consistency
Notably, they do not appear to have confirmed the "near-human performance" for imitation papers. This could have been done by having real humans critically evaluate the imitations (knowing it was an LLM output and focusing on the actual logic and accuracy of the content), and then comparing the scores with the "automated reviewer", but that would be a lot of work ¯\_(ツ)_/¯
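If anyone did want to run that check, the comparison itself is trivial; a sketch (all scores here are invented placeholders):

```python
# Hypothetical validation: have human experts score the LLM-generated
# papers on substance, then measure agreement with the automated
# reviewer's scores. All numbers below are invented placeholders.
from statistics import correlation  # Pearson's r; Python 3.10+

human_scores = [3.0, 4.5, 2.0, 5.5, 3.5]      # expert ratings of generated papers
automated_scores = [5.0, 5.5, 4.5, 6.0, 5.0]  # automated reviewer, same papers

print(f"Pearson r = {correlation(human_scores, automated_scores):.2f}")
```

A weak correlation there would confirm the suspicion that the "automated reviewer" rewards surface imitation rather than substance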
Think your spicy autocomplete can actually do research? OK, tell your chatbot to go to https://www.stsci.edu/ftp/presto/ops/program-lists/HST-TAC.html, find a random completed observation, download the proposal and the FITS files and write the paper
HST Time Allocation Committee Programs

How to run Doom badly using only a billion or so times the compute resources required by the original

(snark aside, this is pretty cool)

https://arstechnica.com/information-technology/2024/08/new-ai-model-can-hallucinate-a-game-of-1993s-doom-in-real-time/

New AI model can hallucinate a game of 1993’s Doom in real time

Dobos: “Why write rules for software by hand when AI can just think every pixel for you?”…

Ars Technica
You know #AIIsGoingGreat when convicted fraudsters Jacob Wohl and Jack Burkman jump on the bandwagon using fake names to found an AI lobbying startup https://www.politico.com/news/2024/09/02/jacob-wohl-jack-burkman-ai-lobbying-pseudonyms-00176917

@reedmideke whether AI or not, I’ve wondered if you could run very specialized video compression on things like recordings/streams of video games on Twitch/Youtube.

A PUBG map is 64 square kilometers and it has a lot of buildings repeated, most of it is basically 2D. Compressing it like it’s live video doesn’t take advantage of how predictable it is.

But bandwidth is probably too cheap to be worth the effort.