It's not just that Siri is "bad": the issue is that as people get used to basic LLM features, it increasingly feels like a product from 10 years ago.

Comparison of asking Siri vs. Notion AI to find the contents of a note.

The 'Use Model' action in Shortcuts is a stopgap for power users, not something most people can approach. Apple's Foundation models also feel like models from two or three years ago.

Apple needs to throw away Siri and replace it with an LLM with App Intents tool-calling ASAP.

@viticci It’s worse than that though. Doing something quickly, in reflexive reaction to the state of the competition, almost certainly won’t result in anything long-term. They need a competitive personal AI that’s the foundation for decades to come.
@gruber Oh I don't disagree! Just very skeptical of what they can do realistically here for another year. Their new Foundation models are still a far cry from any other modern LLM. One has to wonder whether it's a matter of talent, lack of inference infrastructure, politics, or all of the above. Maybe new management can turn it around?

@viticci @gruber This may be an implementation detail, like providing the right "tools," not a failing of the models themselves. I'm working on a tool that can query drafts with the Foundation Model, and it handles this type of stuff pretty well.

You have to limit scope to avoid the 4k token limit – but still…
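That scope-limiting can be done before the model ever sees the text: trim what you pass in so the prompt stays under the context window. A minimal sketch of one way to do it, assuming a rough four-characters-per-token heuristic and hypothetical `packExcerpts`/`estimatedTokens` helpers (the heuristic ratio and the reserved headroom are illustrative assumptions, not documented figures; only the 4k limit comes from the thread):

```swift
import Foundation

// Rough heuristic: ~4 characters per token. This is an assumption,
// not an official tokenizer; real counts vary by model and language.
func estimatedTokens(_ text: String) -> Int {
    max(1, text.count / 4)
}

// Greedily pack draft excerpts into the prompt without exceeding the
// context budget; excerpts that don't fit are simply dropped.
func packExcerpts(_ excerpts: [String],
                  budget: Int = 4_096,
                  reserved: Int = 512) -> [String] {
    var remaining = budget - reserved  // leave room for question + reply
    var packed: [String] = []
    for excerpt in excerpts {
        let cost = estimatedTokens(excerpt)
        if cost <= remaining {
            packed.append(excerpt)
            remaining -= cost
        }
    }
    return packed
}
```

An oversized note gets dropped entirely rather than overflowing the window, which is crude but keeps the request within budget; a real implementation would split long excerpts instead of discarding them.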

@agiletortoise @gruber This is what I find most baffling. If it works reasonably well for developers, why isn't Apple doing this anywhere in its own apps?
@viticci @gruber It might be a scope problem. I imagine providing too many tools to the model has diminishing returns, and they can't always make everything available (events, reminders, notes, etc.) – so I can imagine it's challenging to suss out what is applicable to a particular request.
@agiletortoise @viticci @gruber You imagine correctly. At least, the on-device model turns to complete crap at tool calling after adding more than a few. Not surprising: GPT-4 Turbo (old, but still much more powerful than the on-device AFM) couldn't really handle more than four or five.
@hunter @agiletortoise @viticci @gruber Our experience too. We had hopes of doing some sort of automation with it, but have had to settle for a RAG setup that answers questions. It's basically summarizing search, and it can handle that, but it's far less powerful than even the cheapest OpenAI models. They should probably just outsource this stuff like they did search. Sounds like that's what they're planning.
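A "summarizing search" RAG setup like that is mostly retrieval plus a prompt: rank documents against the question, then hand the top matches to the model as context. A minimal sketch, using naive keyword-overlap scoring as a stand-in for a real embedding index (the `overlapScore`/`buildPrompt` helpers, the scoring method, and the document set are illustrative assumptions, not the poster's actual implementation):

```swift
import Foundation

// Score a document by how many query words it shares. A crude
// stand-in for embedding similarity in a real RAG pipeline.
func overlapScore(query: String, document: String) -> Int {
    let queryWords = Set(query.lowercased().split(separator: " "))
    let docWords = Set(document.lowercased().split(separator: " "))
    return queryWords.intersection(docWords).count
}

// Retrieve the top-k matching documents, then build the prompt the
// model actually sees: retrieved context first, question last.
func buildPrompt(query: String, documents: [String], topK: Int = 2) -> String {
    let ranked = documents
        .map { (doc: $0, score: overlapScore(query: query, document: $0)) }
        .filter { $0.score > 0 }
        .sorted { $0.score > $1.score }
        .prefix(topK)
        .map(\.doc)
    let context = ranked.joined(separator: "\n---\n")
    return "Answer using only this context:\n\(context)\n\nQuestion: \(query)"
}
```

Because the model only summarizes whatever retrieval surfaces, the setup degrades gracefully on a weak model: it answers questions over search results but never plans or acts, which matches the "way less powerful" experience described above.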
@drewmccormack @hunter @agiletortoise @viticci @gruber I again asked Siri on my iPhone 16 Pro Max running the latest iOS 26 beta, “what year was the iPhone 7 released?” and this is the garbage response. Siri just isn’t capable of doing much at all this far on. They’d be much better off letting someone else do this stuff for them…
@drewmccormack @hunter @agiletortoise @viticci @gruber You would think it would at least be able to give me an answer based on one of the two dates it lists. Which one is correct? Personally, I'm at least used to Siri giving me some kind of answer, even if it's wrong. In this case, Siri didn't even try; it just posted three links and told me to search Google myself.
@drewmccormack @hunter @agiletortoise @viticci @gruber I know you hear this a lot, but please do file feedbacks with examples.

@hunter @agiletortoise @viticci @gruber
It sounds like they are at least a generation behind if they are performing at a level similar to pre-2024 models.

https://www.understandingai.org/p/reinforcement-learning-explained

Reinforcement learning, explained with a minimum of math and jargon

To create reliable agents, AI companies had to go beyond predicting the next token.

Understanding AI