@davidgerard
I haven’t been able to find them again, but about 20 years ago I read some papers (which were ten to twenty years old even then) about voice command. They didn’t have magical AI, so they simulated it: a person performed the task on the speaker’s behalf, and the time that person spent doing it was subtracted from the total. They then compared this against the speaker doing the same task with a direct-manipulation UI. The conclusion was that, for almost all tasks, the GUI outperformed the voice UI by a massive amount. There are two cases where a voice interface works better:
First, when your hands aren’t available. If you’re cooking or performing surgery, you don’t want to stop what you’re doing, wash your hands, do the computer task, and then resume. The same goes for changing the music while driving: voice lets you keep your focus on the important task while doing a second, lower-priority task at the same time.
Second, when the person performing the task has a lot of agency. This is why big ships use voice commands from the captain: the captain isn’t telling the pilot to press a couple of buttons; they’re describing a desired outcome that the pilot will apply a load of domain expertise to achieve.
The second case is what people want from an ‘AI’ voice interface, but even with AGI it would be largely infeasible to meet users’ requirements.
The problem is that a lot of films and TV series have used voice command as a narrative device. Someone talking to a computer, and the computer magically skipping over twenty steps of ambiguity, is great for storytelling. Showing them working through a GUI or CLI would be tedious.
There’s an episode of TNG where one of the crew is kidnapped by aliens and isn’t sure whether it was a dream, so he tries to reproduce the scene in the holodeck. His first prompt asks the computer for a table, and what appears is nothing like the table he remembers, but two or three tweaks later it’s identical to the one in the dream. This works only because the script says so and the props department used the same table in both scenes. Even with a computer at least as intelligent as a human, it’s unrealistic. Imagine a human who had complete control over drawing a 3D image and could render your instructions instantly: how many prompts would it take to reproduce exactly something you’d seen? A hundred? More?