@cesar_s Open model inference CAN run on mobile devices, sure. In fact, I built prototypes last year!
What I was referring to, though, was the efficiency needed to make them actually usable. It's not just generating a little text or an image; it's about having the model process everything you do on that device (reading emails, messages, docs etc., watching your screen and so on). To get the required speed (on today's devices) without draining the battery in 20 minutes, we're still dependent on highly optimised OS-level models integrated by the manufacturers (e.g. Apple, Samsung, Google).
So what I was saying in my keynote was that I'm pausing work on an app that bundled its own LLM … for the next few years it's going to make a lot more sense for an app to leverage the built-in operating-system LLMs.
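For concreteness, here's a minimal sketch of what "leveraging the built-in OS LLM" can look like on Apple platforms, using the Foundation Models framework Apple introduced at WWDC 2025. The email-summarising use case and the `summarise` function are just illustrative:

```swift
import FoundationModels

// Minimal sketch: ask the OS-provided on-device model to summarise some text.
// Assumes a device with Apple Intelligence enabled (iOS 26+ / macOS 26+).
func summarise(_ emailBody: String) async throws -> String {
    // The system model may be unavailable (unsupported device,
    // model not yet downloaded, Apple Intelligence turned off, etc.)
    guard case .available = SystemLanguageModel.default.availability else {
        return "On-device model unavailable"
    }

    // Sessions are lightweight; the model weights live in the OS,
    // not in the app bundle.
    let session = LanguageModelSession()
    let response = try await session.respond(
        to: "Summarise this email in one sentence: \(emailBody)"
    )
    return response.content
}
```

That's the design point: the weights ship with the OS and run on hardware the manufacturer has already tuned them for, so apps share one optimised model instead of each bundling (and burning battery on) their own.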