But where do you draw the line, and how to capture that in metadata?

Today's #AIIsGoingGreat features @MozillaAI "To avoid confirmation bias and subjective interpretation, we decided to leverage language models for a more objective analysis of the data"

Aside from the obvious [citation f-ing needed] on LLMs providing "more objective analysis" what exactly was the input? Oh … "After each conversation, we wrote up summary notes" … definitely no room for bias and subjective interpretation to be introduced there

https://blog.mozilla.ai/uncovering-genai-trends-using-local-language-models-to-explore-35-organizations/

Uncovering GenAI Trends: Using Local Language Models to Explore 35 Organizations

Mozilla.ai spoke with 35 organizations in various sectors, including finance and government to learn how they are using large language models.

Mozilla.ai Blog
They go on provide the output of three models, which seem fairly generic and bland with the occasional grammatical oddity, but without the input, we have no way to judge how accurate or insightful they were. We just get @MozillaAI's subjective "They identified the majority of trends and patterns among the 35 organizations we studied… This exercise showcased how well local language models can extract valuable insights from large text datasets"
They also give us this, which, I dunno, all seem pretty obvious and not at all surprising?
Could have been a much more interesting post if @MozillaAI had tried to rigorously test their assertion that LLMs provide "a more objective analysis of the data"
Today's #AIIsGoingGreat is a system which, for a mere hundred million dollars in training costs, forecasts future events about as well as a coin toss: "Focusing on binary forecasts, we show that GPT-4's probabilistic forecasts are significantly less accurate than the median human-crowd forecasts. We find that GPT-4's forecasts did not significantly differ from the no-information forecasting strategy of assigning a 50% probability to every question" https://arxiv.org/abs/2310.13014
Large Language Model Prediction Capabilities: Evidence from a Real-World Forecasting Tournament

Accurately predicting the future would be an important milestone in the capabilities of artificial intelligence. However, research on the ability of large language models to provide probabilistic predictions about future events remains nascent. To empirically test this ability, we enrolled OpenAI's state-of-the-art large language model, GPT-4, in a three-month forecasting tournament hosted on the Metaculus platform. The tournament, running from July to October 2023, attracted 843 participants and covered diverse topics including Big Tech, U.S. politics, viral outbreaks, and the Ukraine conflict. Focusing on binary forecasts, we show that GPT-4's probabilistic forecasts are significantly less accurate than the median human-crowd forecasts. We find that GPT-4's forecasts did not significantly differ from the no-information forecasting strategy of assigning a 50% probability to every question. We explore a potential explanation, that GPT-4 might be predisposed to predict probabilities close to the midpoint of the scale, but our data do not support this hypothesis. Overall, we find that GPT-4 significantly underperforms in real-world predictive tasks compared to median human-crowd forecasts. A potential explanation for this underperformance is that in real-world forecasting tournaments, the true answers are genuinely unknown at the time of prediction; unlike in other benchmark tasks like professional exams or time series forecasting, where strong performance may at least partly be due to the answers being memorized from the training data. This makes real-world forecasting tournaments an ideal environment for testing the generalized reasoning and prediction capabilities of artificial intelligence going forward.

arXiv.org
#AIIsGoingGreat "Companies are acting like generative AI is going to change the world and are acting as such, while the reality is that this is a technology that is currently deeply unreliable and may not change much of anything at all" https://www.404media.co/goldman-sachs-ai-is-overhyped-wildly-expensive-and-unreliable/
Goldman Sachs: AI Is Overhyped, Wildly Expensive, and Unreliable

One of the world's largest investment banks wonders if generative AI will be worth the huge investment and hype: "will this large spend ever pay off?"

404 Media

#OpenAI "made staff sign employee agreements that required them to waive their federal rights to whistleblower compensation … threatened employees with criminal prosecutions if they reported violations of law to federal authorities under trade secret laws" -
"No reporting crimes" clause in contract has people asking a lot of question already answered by the contract

https://wapo.st/3WgZGAh

#GiftArticle #GiftLink

OpenAI illegally barred staff from airing safety risks, whistleblowers say

OpenAI whistleblowers filed a complaint with the SEC where they allege the AI company is silencing employees from sharing concerns about its AI technology.

The Washington Post
Also, did they use ChatGPT to write that shit? Because I'm pretty sure a competent attorney would advise you that a "no reporting crimes" clause is unlikely to work in your favor if crimes are, in fact, reported
"The summary by ChatGPT is pretty often empty waffle. It almost feels like a psychic con (I think I read this analogy somewhere) with its always-true generalisations that do not make an actual point" - Yep, Barnum effect satisfies a goal of "words that sound like they go together in this context" just as well as meaningful statements https://ea.rna.nl/2024/05/27/when-chatgpt-summarises-it-actually-does-nothing-of-the-kind/
When ChatGPT summarises, it actually does nothing of the kind.

One of the use cases I thought was reasonable to expect from ChatGPT and Friends (LLMs) was summarising. It turns out I was wrong. What ChatGPT isn’t summarising at all, it only looks like it…

R&A IT Strategy & Architecture
I'm not sure I entirely buy the author's identification of "shortening" as a distinct behavior, but the point is well made, once again, that LLMs lack actual understanding "To truly summarise, you need to be able to detect that from 40 sentences, 35 are leading up to the 36th, 4 follow it with some additional remarks, but it is that 36th that is essential for the summary and that without that 36th, the content is lost" https://ea.rna.nl/2024/05/27/when-chatgpt-summarises-it-actually-does-nothing-of-the-kind/
When ChatGPT summarises, it actually does nothing of the kind.

One of the use cases I thought was reasonable to expect from ChatGPT and Friends (LLMs) was summarising. It turns out I was wrong. What ChatGPT isn’t summarising at all, it only looks like it…

R&A IT Strategy & Architecture
And again, even if it *sometimes* gives you a decent summary, the *only way to be sure it did* is to actually read the original material yourself, in full ¯\_(ツ)_/¯

"big Wall Street investment banks including Goldman Sachs and Barclays, as well as VCs such as Sequoia Capital, have issued reports raising concerns about the sustainability of the AI gold rush, arguing that the technology might not be able to make the kind of money to justify the billions being invested into it" 🥳
https://wapo.st/3zVS5hR

#GiftArticle #GiftLink

Big Tech says AI is booming. Wall Street is starting to see a bubble.

The industry has rushed head-long into AI, and stock market investors are following them. But a growing group of Wall Street analysts are skeptical profitability.

The Washington Post

#AIIsGoingGreat, featuring the old Silicon Valley "sell at a loss until everyone is hooked" strategy "[OpenAI] Total revenue has been $283 million per month, or $3.5 to $4.5 billion a year. This would leave a $5 billion shortfall"

Also "OpenAI gets a heavily discounted rate of $1.30 per A100 server per hour. OpenAI has 350,000 such servers, with 290,000 of those used just for ChatGPT" 🤯 https://pivot-to-ai.com/2024/07/24/openai-could-lose-5-billion-in-2024/

OpenAI could lose $5 billion in 2024

OpenAI is hemorrhaging cash. It could lose about $5 billion this year and may have to raise more funding. [The Information, paywalled; Data Center Dynamics] The Information report is based on OpenA…

Pivot to AI
I can't help but think we could do something useful or at least fun with that kind of compute power

"The Dynamics 365 Field Service management system has also integrated Microsoft’s Copilot AI to help generate work orders based on customer requests. Copilot can also summarize ongoing work orders and update existing requests" - I for one cannot think of anything which could possibly go wrong using spicy autocomplete to write work orders for "services like machine maintenance, repair, cleaning, or home healthcare"

https://www.404media.co/how-a-microsoft-app-is-powering-employee-surveillance/
#AIIsGoingGreat

How a Microsoft App is Powering Employee Surveillance

Microsoft Dynamics 365 uses AI-generated performance metrics to single out individual workers, a new report has found.

404 Media
#AIIsGoingGreat Operators of bullshit generating machine *shocked* to find bullshit going on in their establishment https://www.theverge.com/2024/7/30/24210108/meta-trump-shooting-ai-hallucinations
Meta apologies after its AI chatbot said Trump shooting didn’t happen

After Meta’s AI assistant was caught denying the attempted assassination of former President Donald Trump, the company is blaming the technology behind its chatbot and others.

The Verge
#AIIsGoingGreat … so great the term is becoming toxic to consumers. Who could have predicted that transparent hype chasing and adding dumb chatbots where no one asked for them would end this way? https://futurism.com/the-byte/study-consumers-turned-off-products-ai
Study Finds Consumers Are Actively Turned Off by Products That Use AI

Researchers have found that including the words "artificial intelligence" in product marketing is a major turn-off for consumers.

Futurism

"[Leopold Aschenbrenner] emphasizes this as a critical moment, claiming “the free world’s very survival” is “at stake.” That reaching “superintelligence” first will give the U.S. or China “a decisive economic and military advantage” that determines global hegemony. He is also raising millions of dollars for an investment fund behind this thesis"

https://www.lawfaremedia.org/article/ai-timelines-and-national-security--the-obstacles-to-agi-by-2027

AI Timelines and National Security: The Obstacles to AGI by 2027

Leopold Aschenbrenner’s “Situational Awareness” builds claims of artificial general intelligence’s imminence on assumptions that demand further scrutiny.

Default
#AIIsGoingGreat "Do not hallucinate. Do not make up factual information" - Welp, problem solved! https://www.theverge.com/2024/8/5/24213861/apple-intelligence-instructions-macos-15-1-sequoia-beta
‘You are a helpful mail assistant,’ and other Apple Intelligence instructions

Here are some of the prompts that Apple Intelligence is using to guide AI models in the macOS 15.1 Sequoia developer beta.

The Verge
Today's #AIIsGoingGreat is brought to you by @jasonkoebler who followed the money all the way to the bottom of Facebook's AI slop pit https://www.404media.co/where-facebooks-ai-slop-comes-from/
Where Facebook's AI Slop Comes From

Facebook itself is paying creators in India, Vietnam, and the Philippines for bizarre AI spam that they are learning to make from YouTube influencers and guides sold on Telegram.

404 Media

Courtesy of @ct_bergstrom*, Today's #AIIsGoingGreat features people who have somehow convinced themselves it's a good use of time to investigate whether an ouroboros of BS generators can do scientific research https://arxiv.org/abs/2408.06292

* https://mastodon.social/@ct_bergstrom@fediscience.org/112957271701969972

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

One of the grand challenges of artificial general intelligence is developing agents capable of conducting scientific research and discovering new knowledge. While frontier models have already been used as aides to human scientists, e.g. for brainstorming ideas, writing code, or prediction tasks, they still conduct only a small part of the scientific process. This paper presents the first comprehensive framework for fully automatic scientific discovery, enabling frontier large language models to perform research independently and communicate their findings. We introduce The AI Scientist, which generates novel research ideas, writes code, executes experiments, visualizes results, describes its findings by writing a full scientific paper, and then runs a simulated review process for evaluation. In principle, this process can be repeated to iteratively develop ideas in an open-ended fashion, acting like the human scientific community. We demonstrate its versatility by applying it to three distinct subfields of machine learning: diffusion modeling, transformer-based language modeling, and learning dynamics. Each idea is implemented and developed into a full paper at a cost of less than $15 per paper. To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores. The AI Scientist can produce papers that exceed the acceptance threshold at a top machine learning conference as judged by our automated reviewer. This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world's most challenging problems. Our code is open-sourced at https://github.com/SakanaAI/AI-Scientist

arXiv.org

"To evaluate the generated papers, we design and validate an automated reviewer, which we show achieves near-human performance in evaluating paper scores"

To do this, they "compared the artificially generated decisions with ground truth data for 500 ICLR 2022 papers extracted from the publicly available OpenReview dataset"

Which seems like serious logical error…

They wrote a system which (according to them) scores real, honestly written human papers similarly to how humans would score them. They then use that to evaluate probabilistically generated imitations of human written papers, implicitly assuming an imitation scored by this method is "as good" as a similarly scored human written paper. But IMO this does not follow…
Their system confirms the imitation *looks like* a similarly scored human written paper, but all that says is it's a good imitation, not that it actually has characteristics we value, like accuracy and logical consistency
Notably, they do not appear to have confirmed the "near-human performance" for imitation papers. This could have been done by having real humans critically evaluate the imitations (knowing it was an LLM output and focusing on the actual logic and accuracy of the content), and then comparing the scores with the "automated reviewer", but that would be a lot of work ¯\_(ツ)_/¯
Think your spicy autocomplete can actually do research? OK, tell your chatbot to go to https://www.stsci.edu/ftp/presto/ops/program-lists/HST-TAC.html, find a random completed observation, download the proposal and the FITS files and write the paper
HST Time Allocation Committee Programs

How to run Doom badly using only a billion or so times the compute resources required by the original

(snark aside, this is pretty cool)

https://arstechnica.com/information-technology/2024/08/new-ai-model-can-hallucinate-a-game-of-1993s-doom-in-real-time/

New AI model can hallucinate a game of 1993’s Doom in real time

Dobos: “Why write rules for software by hand when AI can just think every pixel for you?”…

Ars Technica
You know #AIIsGoingGreat when convicted fraudsters Jacob Wohl and Jack Burkman jump on the bandwagon using fake names to found an AI lobbying startup https://www.politico.com/news/2024/09/02/jacob-wohl-jack-burkman-ai-lobbying-pseudonyms-00176917

#AIIsGoingGreat "Reviewers told the report’s authors that AI summaries often missed emphasis, nuance and context; included incorrect information or missed relevant information; and sometimes focused on auxiliary points or introduced irrelevant information. Three of the five reviewers said they guessed that they were reviewing AI content" - Pretty much what you'd expect from using autocomplete to generate a summary-shaped thing without actual understanding

https://www.crikey.com.au/2024/09/03/ai-worse-summarising-information-humans-government-trial/

AI worse than humans in every way at summarising information, government trial finds

A test of AI for Australia's corporate regulator found that the technology might actually make more work for people, not less.

Crikey
#AIIsGoingGreat "The reviewers’ overall feedback was that they felt AI summaries may be counterproductive and create further work because of the need to fact-check and refer to original submissions which communicated the message better and more concisely" - 💯 my perennial #LLM gripe: Even if it's *mostly* good, you always need a subject matter expert to be sure it hasn't drifted off into total BS. In which case, why not just have the SME to do the job? 🤔 https://www.crikey.com.au/2024/09/03/ai-worse-summarising-information-humans-government-trial/
AI worse than humans in every way at summarising information, government trial finds

A test of AI for Australia's corporate regulator found that the technology might actually make more work for people, not less.

Crikey

Incidentally that ASIC report does pretty much what I flamed* @[email protected] for not doing when they hyped using AI to summarize: Compare the results of humans doing the same task, as evaluated by humans https://www.aph.gov.au/DocumentStore.ashx?id=b4fd6043-6626-4cbe-b8ee-a5c7319e94a0

* https://mastodon.social/@reedmideke/112680749168177728

Recurring phenomena I've noticed with gee-whiz AI results is that they frequently use some automated metric to score the result, rather than having humans critically evaluating the final product. Presumably, because
1) Humans are expensive
2) The product is often obviously crap to humans, but not by whatever metric they chose
FBI busts musician’s elaborate AI-powered $10M streaming-royalty heist

Feds say it’s the first US criminal case involving artificially inflated music streaming.

Ars Technica
Double plus supplemental #AIIsGoingGreat: Hobo standing in the middle of the street yelling about how spicy autocomplete sent them to a non-existent shelter it hallucinated, again https://www.gov.ca.gov/2024/09/05/governor-newsom-seeks-to-harness-the-power-of-genai-to-address-homelessness-other-challenges/
Governor Newsom seeks to harness the power of GenAI to address homelessness, other challenges | Governor of California

Governor of California
Today's #AIIsGoingGreat (courtesy of @davidgerard) features former Kubient CEO Paul Roberts. Specifically, it features him pleading guilty to accounting fraud in relation to a scheme to falsely portray the startup's "AI click fraud detection" technology as something that actually worked and generated revenue https://pivot-to-ai.com/2024/09/18/kubients-adtech-use-case-for-ai-an-excuse-for-a-fraud/
Kubient’s adtech use case for AI: an excuse for a fraud

Tiny adtech company Kubient shut down in late 2023 when CEO Paul Roberts was caught faking revenue numbers on his AI fraud detection software to lure investment in. He pleaded guilty on Monday. [Do…

Pivot to AI
My big takeaway from Altman's "OMG SUPERINTELLIGENCE IS RIGHT AROUND THE CORNER" blog is that his company is losing money hand over fist while trying to close a $6.5 billion funding round in an environment where the likes of Goldman Sachs are saying "hey guys, this AI thing isn't really paying off and looks like an overhyped bubble"
https://arstechnica.com/information-technology/2024/09/ai-superintelligence-looms-in-sam-altmans-new-essay-on-the-intelligence-age/
OpenAI CEO: We may have AI superintelligence in “a few thousand days”

Altman says “deep learning worked” and will lead to “massive prosperity.”…

Ars Technica

Infosec people: Untrusted, unsanitized inputs have been the bane of our existence for the last 40 years
Tech CEOs: We're betting billions of dollars the next big thing is a black box filled with pure essence of untrusted, unsanitizable inputs

https://arstechnica.com/security/2024/09/false-memories-planted-in-chatgpt-give-hacker-persistent-exfiltration-channel/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social

#AIIsGoingGreat

Hacker plants false memories in ChatGPT to steal user data in perpetuity

Emails, documents, and other untrusted content can plant malicious memories.

Ars Technica

Microsoft: If we add just one more <s>overbalanced wheel</s> layer of BS generators to our <s>over-unity machine</s> AI, it will really work this time for sure!

https://www.theverge.com/2024/9/24/24253452/microsoft-correction-ai-safety-tool-fix-errors

#AIIsGoingGreat

Microsoft claims its AI safety tool not only finds errors but also fixes them

Microsoft is launching a new correction feature in its Azure AI Studio that can identify, flag, and correct inaccurate outputs from AI models.

The Verge

OG #ChatGPTLawyer-as-a-service bro Joshua Browder of DoNotPay gets a slap on the wrist from the FTC. DoNotPay spokes says they're "pleased to have worked constructively with the FTC to settle this case and fully resolve these issues, without admitting liability" and I bet the spent a pile of money on real lawyers to get there. Oh, and they also paid the FTC $193,000

https://arstechnica.com/tech-policy/2024/09/startup-behind-worlds-first-robot-lawyer-to-pay-193k-for-false-ads-ftc-says/?utm_brand=arstechnica&utm_social-type=owned&utm_source=mastodon&utm_medium=social

DoNotPay has to pay $193K for falsely touting untested AI lawyer, FTC says

You can't "sue anyone with a click of a button" without testing it first, FTC says.

Ars Technica
Meanwhile Zuck says that since Meta is ripping off billions of people, the fact they ripped of any specific individual is trifling and insignificant https://www.theverge.com/2024/9/25/24254042/mark-zuckerberg-creators-value-ai-meta
Mark Zuckerberg: creators and publishers ‘overestimate the value’ of their work for training AI

Meta CEO Mark Zuckerberg says the company could strike partnerships for “valuable” content to train AI tools, but that it could walk away from paying others.

The Verge
But today's #AIIsGoingGreat star is undoubtedly HP, who are doing their part to pop the AI bubble by associating with it with their ink extortion racket https://www.theverge.com/2024/9/25/24254129/hp-print-ai-beta-launch-printers
Finally, HP is adding AI to its printers

HP is launching new Print AI features that can optimize webpages and spreadsheets for printing as well as customize photos for greeting cards.

The Verge
Today's #AIIsGoingGreat features erstwhile expert witness Charles Ranson who "was adamant in his testimony that the use of Copilot or other artificial intelligence tools, for drafting expert reports is generally accepted in the field of fiduciary services and represents the future of analysis of fiduciary decisions;" but "could not name any publications regarding its use or any other sources to confirm that it is a generally accepted methodology"
https://arstechnica.com/tech-policy/2024/10/judge-confronts-expert-witness-who-used-copilot-to-fake-expertise/
Expert witness used Copilot to make up fake damages, irking judge

Judge calls for a swift end to experts secretly using AI to sway cases.

Ars Technica
"Despite his reliance on artificial intelligence, Mr. Ranson could not recall what input or prompt he used to assist him with the Supplemental Damages Report. He also could not state what sources Copilot relied upon and could not explain any details about how Copilot works or how it arrives at a given output. There was no testimony on whether these Copilot calculations considered any fund fees or tax implications" https://law.justia.com/cases/new-york/other-courts/2024/2024-ny-slip-op-24258.html
Matter of Weber

Matter of Weber - 2024 NY Slip Op 24258

Justia Law
While the immediate fault is obviously Ranson's, this is also an entirely foreseeable result of tech companies marketing these things magic answer boxes, no matter how many CYA disclaimers they put in the fine print
A product so good it sells itself (if you give it away free and throw in a $2.5 million cash sweetener) https://www.theverge.com/2024/10/22/24276747/microsoft-openai-news-outlets-10-million-ai-tools
Microsoft and OpenAI are giving news outlets $10 million to use AI tools

Microsoft and OpenAI are offering news outlets like The Seattle Times and The Minnesota Star Tribune up to $10 million to experiment with and use AI tools.

The Verge

"The White House is directing the Pentagon and intelligence agencies to increase their adoption of artificial intelligence" 🤨
"The memo also specifically requires agencies to monitor the risk AI systems can pose when it comes to privacy, discrimination and human rights" - I'd hope they're also required to monitor the risk it makes shit up
(yeah, a lot of militarily relevant AI isn't genAI but still)

https://www.washingtonpost.com/technology/2024/10/24/white-house-ai-nation-security-memo/

White House orders Pentagon and intel agencies to increase use of AI

The Biden administration will use a national security memo to direct agencies to embrace artificial intelligence, as the United States competes with China.

The Washington Post
Cybercheck has secured murder convictions. It appears to just run websites through a chatbot

Cybercheck, from Global Intelligence, claims it can find the key evidence to nail down a case. Cybercheck reports have been involved in at least two murder convictions. Cybercheck hands the police …

Pivot to AI

What could be better than having your medical visits transcribed by an #AI prone to making shit up? Deleting the original so no one can prove it "It’s impossible to compare Nabla’s AI-generated transcript to the original recording because Nabla’s tool erases the original audio for “data safety reasons,” Raison said"

https://apnews.com/article/ai-artificial-intelligence-health-business-90020cdf5fa16c79ca2e5b6c4c9bbb14

#AIIsGoingGreat

Researchers say AI transcription tool used in hospitals invents things no one ever said

Whisper is a popular transcription tool powered by artificial intelligence, but it has a major flaw. It makes things up that were never said. Whisper was created by OpenAI. It's being used in many industries worldwide to translate and transcribe interviews, generate text in popular consumer technologies and create subtitles for videos. OpenAI has promoted Whisper as having near “human level robustness and accuracy." But more than a dozen computer scientists and software developers tell The Associated Press that isn’t always the case and that it's prone to making up chunks of text and even entire sentences. An OpenAI spokesperson says the company studies how to reduce that and updates its models incorporating feedback received.

AP News

So at first glance, this is just a typical #AIIsGoingGreat - Alaska Education Commissioner Deena Bishop used spicy autocomplete and it made shit up like it so often does, but also… the excuse about the bogus citations being "placeholders" seems like a clear admission she started with the desired policy (restrict smartphones in schools) and then tried to generate a post-hoc justification, without even doing a basic literature review

https://alaskabeacon.com/2024/10/28/alaska-education-department-published-false-ai-generated-academic-citations-in-cell-policy-document/

False citations show Alaska education official relied on generative AI, raising broader questions • Alaska Beacon

Department of Education and Early Development Commissioner Bishop said the false citations were in a draft she used generative AI to create.

Alaska Beacon

Today's #AIIsGoingGreat: German journalist Martin Bernklau discovers Microsoft #Copilot says he committed crimes he reported on, and also helpfully provides directions to his home. Microsoft subsequently seems to have taken the typical band-aid approach and blocked his name… because, of course, none of these companies setting billions on fire to chase #AI hype have any idea how to solve the general case of LLMs making shit up

https://www.abc.net.au/news/2024-11-04/ai-artificial-intelligence-hallucinations-defamation-chatgpt/104518612

AI hallucinations caused artificial intelligence to falsely described these people as criminals

Unprecedented legal battles are testing if parent companies of tools like ChatGPT can be liable for defamation when innocent people are incorrectly described as criminals.

ABC News
Admit I've been a skeptic, but it looks like the payoff for the billions of dollars the tech industry dumped into AI is here: "Microsoft is adding AI-powered themes to Outlook … this AI-powered feature will require a Copilot Pro or business license to add a more personalized look to Microsoft’s email client… You’ll be able to create a theme based on the weather or locations, and they can dynamically update every few hours, each day, weekly, or monthly" https://www.theverge.com/2024/11/7/24290273/microsoft-outlook-ai-themes-copilot
#AIIsGoingGreat
Microsoft Outlook now has dynamic AI-powered themes

Microsoft is adding AI-powered themes to its Outlook email client. You’ll need a Copilot license to use them, and they can dynamically update.

The Verge
In today's #AIIsGoingGreat (ht @daedalus), a franchisee of Australian real estate firm LJ Hooker demonstrates what a crock of shit "have an #LLM write it and a human check it" usually is: If it saves you time, it's a pretty good indication your humans are not actually checking it in a meaningful way
https://www.theguardian.com/australia-news/2024/nov/11/lj-hooker-branch-used-ai-to-generate-real-estate-listing-with-non-existent-schools
LJ Hooker branch used AI to generate real estate listing with non-existent schools

Agency apologises after an ad said a house in Farley, NSW, was close to two ‘excellent’ schools even though there are none in the town

The Guardian

Also real estate dude's process is a pretty perfect anti-usecase: "Huynh said he would usually input the address of a rental property and the basic description such as how many bedrooms and bathrooms it had into ChatGPT"
At the very best, all an #LLM can add is irrelevant fluff or widely known facts about the general region. It cannot reliably add factual information about individual houses or neighborhoods, and more often it'll just make shit up

#AIIsGoingGreat

Oh, team involved in that "AI scientist" preprint I dunked on earlier* included "researchers from the buzzy Tokyo-based startup Sakana AI"

Anyway they allow that their "scientist" making up 10% of the numbers in its "papers" is "probably unacceptable" and then go on to talk about how it could be improved without addressing the possibility that making shit up is an inherent characteristic of LLMs https://spectrum.ieee.org/ai-for-science-2

* https://mastodon.social/@reedmideke/112957617464258809

Will the "AI Scientist" Bring Anything to Science?

<p> A tool to take over the scientific process continues a controversial trend </p>

IEEE Spectrum

Today's #AIIsGoingGreat "…results from a hard-coded filter that puts the brakes on the AI model's output before returning it to the user" - Demonstrating once again that despite setting hundreds of billions of dollars on fire, #LLM #AI companies have no idea how to solve the "hallucination" (aka making shit up) problem in the general case. Their best solution is hard coded checks for individual phrases that might expose them to excessive legal costs

https://arstechnica.com/information-technology/2024/12/certain-names-make-chatgpt-grind-to-a-halt-and-we-know-why/

Certain names make ChatGPT grind to a halt, and we know why

Filter resulting from subject of settled defamation lawsuit could cause trouble down the road.

Ars Technica
It shouldn't need to be said that there's no conceivable way band-aiding results that trigger legal threats will scale to make #LLM chatbots a generally reliable source of information, but some trillion dollar stock valuations suggest it does in fact need to be said, loudly and repeatedly

Today's #AIIsGoingGreat: Hard to see how drowning volunteer developers in #AI slop vulnerability reports could possibly go wrong. Great work everyone, throw another billion on the #LLM BS machine bonfire to celebrate!

https://sethmlarson.dev/slop-security-reports

New era of slop security reports for open source

I'm on the security report triage team for CPython, pip, urllib3, Requests, and a handful of other open source projects. I'm also in a trusted position such that I get "tagged in" to other open sou...

sethmlarson.dev
Today's #AIIsGoingGreat: Hard to see how anything could go wrong with a health insurer filtered their SOPs through a bullshit generating machine (Optum claim it was just a POC that wasn't used operationally, but even getting that far ain't a great sign) https://techcrunch.com/2024/12/13/unitedhealthcares-optum-left-an-ai-chatbot-used-by-employees-to-ask-questions-about-claims-exposed-to-the-internet/
UnitedHealth's Optum left an AI chatbot, used by employees to ask questions about claims, exposed to the internet | TechCrunch

Optum's AI chatbot was found exposed online at a time when the healthcare giant faces scrutiny for its use of AI to allegedly deny patient claims.

TechCrunch

#AIIsGoingGreat: 'correspondence seen by TechCrunch shows that previously, the guidelines read: “If you do not have critical expertise (e.g. coding, math) to rate this prompt, please skip this task.”
But now the guidelines read: “You should not skip prompts that require specialized domain knowledge.” Instead, contractors are being told to “rate the parts of the prompt you understand” and include a note that they don’t have domain knowledge'

https://techcrunch.com/2024/12/18/exclusive-googles-gemini-is-forcing-contractors-to-rate-ai-responses-outside-their-expertise/

Exclusive: Google's Gemini is forcing contractors to rate AI responses outside their expertise

Internal guidelines passed down from Google led to concerns that the AI model could be prone to inaccurate outputs on topics like healthcare.

TechCrunch
Hard to imagine google has a human go through every response and deal with the notes, so presumably they're using AI for that part too…