Let's talk about 'Move Everything':

An unofficial framework for running custom instruments, effects, and controllers on Ableton Move.
Move Everything adds a Shadow UI that runs alongside stock Move, enabling additional synths, FX, and other tools to run in parallel with the usual UI.

One person, Charles Vestal, has managed to slipstream a screen reader directly onto Move using either Flight or eSpeak-NG, taking advantage of Ableton's own screen-reader data. He's also made the Wi-Fi PIN, which their web-based offering doesn't read, speak aloud on-device.
The guy is an actual genius.
It does way more than I can begin to mention here, but for the first time ever, we have a groovebox with a screen reader that is 100% stand-alone: no need to be tied to a phone or computer of any kind.
Links:
Move Everything: https://github.com/charlesvestal/move-anything
Installers: https://github.com/charlesvestal/move-everything-installer/releases/tag/v0.2.8
Documentation: https://github.com/charlesvestal/move-anything/blob/main/MANUAL.md
CC @pkirn

@FreakyFwoof @pkirn Woooooah, wicked. How's the responsiveness of the screen reader? What voice does it use? Does it have speech interrupt? There's a note on the repo landing page about it not being suitable as a daily driver yet, how's stability been for you?
@Scott @pkirn As I say, ESpeak-NG or Flight. It's not as responsive as NVDA with Tyler's helper tweak but it's more than good enough.
@FreakyFwoof Somehow I missed both of those words, too excited lol. @pkirn
@Scott @FreakyFwoof @pkirn Don't know if it will go anywhere, but I pointed Charles at DECtalk, too, because why not? We already know it performs well on a Raspberry Pi.
@BorrisInABox @Scott @pkirn Let's not get him to add Piper though haha
@FreakyFwoof @Scott @pkirn Well... Piper on a Pi is actually not the worst thing I've ever used. Yes, it would suck, but it would run... as long as you're not doing anything else at all.
@BorrisInABox @FreakyFwoof @Scott @pkirn Easy solution: put a caching layer between the synth and Move. If the file exists, play it. If not, generate it with a small delay. I was already thinking about doing something like this. I didn't consider Piper, but I bet it could work.
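For anyone curious, the caching idea above could be sketched like this: hash the phrase, look for a cached WAV, and only synthesize on a miss. The cache path, the espeak-ng command, and the injectable `synthesize` hook are all my own assumptions for illustration, not Move Everything's actual API.

```python
# Hedged sketch of a TTS caching layer: play cached audio if it exists,
# otherwise generate it once and reuse it forever after.
import hashlib
import subprocess
import tempfile
from pathlib import Path

# Ephemeral dir for the demo; a real cache would live on persistent storage.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="tts-cache-"))

def cached_wav(text, synthesize=None):
    """Return a WAV path for `text`, generating it only on a cache miss."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    wav = CACHE_DIR / (key + ".wav")
    if not wav.exists():
        if synthesize is None:
            # Any CLI synth would do here; espeak-ng is just one example.
            synthesize = lambda t, p: subprocess.run(
                ["espeak-ng", "-w", str(p), t], check=True
            )
        synthesize(text, wav)
    return wav
```

The second request for the same phrase plays straight from disk, so only first-time phrases pay the synthesis delay, which is exactly the tradeoff being proposed for a slow engine like Piper.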
@simon @FreakyFwoof @Scott @pkirn I think that's basically what he did with flight.
@BorrisInABox @FreakyFwoof @Scott @pkirn Unless I'm reading this wrong, it looks like the move-everything TTS has a built-in 300 ms buffer to avoid speaking events too rapidly. IMO this is *way* too long, and it's probably part of the reason the TTS lags. I don't know what it takes to build this thing (I do have a lot of Raspberry Pis, so maybe I can), but I'm either going to experiment with a much lower buffer or suggest making it configurable.
Should be pretty simple to hack whatever local or remote TTS we want in there as well. Even without modifying it, we could just create a fake flight executable that calls Voxin on a remote server or whatever.
But yeah, if you notice 300 ms of latency, that's seemingly by design.
Source: https://github.com/charlesvestal/move-anything/blob/main/docs/tts-architecture.md
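A configurable version of that buffer might look something like this: events arriving inside the window get dropped, and the window length is a parameter instead of a hard-coded constant. This is only a sketch of the configurable-interval idea; how the real implementation buffers or coalesces events is described in the linked doc and may well differ.

```python
# Hedged sketch of a speech debounce with a configurable interval,
# in the spirit of the 300 ms buffer described in tts-architecture.md.
import time
from typing import Optional

class SpeechDebouncer:
    def __init__(self, interval_s: float = 0.3):
        self.interval_s = interval_s
        self._last = float("-inf")  # time of the last message we let through

    def should_speak(self, now: Optional[float] = None) -> bool:
        """True if enough time has passed since the last spoken event."""
        if now is None:
            now = time.monotonic()
        if now - self._last >= self.interval_s:
            self._last = now
            return True
        return False
```

With something like `interval_s=0.05` you'd get the snappier feel screen-reader users tend to prefer, while `0.3` keeps the original repeat suppression.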
@simon @BorrisInABox @FreakyFwoof @Scott @pkirn You’re right! This was by design to prevent repeated messages (you can see in my original screen reader demo video), but it sounds like the feedback is that this should be adjustable. Happy to do so
@charlesv Hey Charles, configurable responsiveness would be interesting to explore but isn't essential IMO. If it's easy to do, I can imagine it being useful to folks with some amount of vision who might want speech as a sort of secondary confirmation. From the perspective of a full-time screen reader user though, I always prefer the most responsive feel I can get, even when that means a tradeoff of some stuttering as controls are being adjusted. If you're not in the habit of hearing robotic jibber-jabber underpinning your choices all day long it might sound distracting, but for me, the initial utterance tells me the button press or knob twist has registered. I can usually get a little extra assurance that I've hit the correct button from hearing the first couple of syllables, then my brain kinda tunes out the rest of the speech noises until I need a full report. Thanks heaps for working on this! @simon @BorrisInABox @FreakyFwoof @pkirn
@Scott @charlesv @simon @BorrisInABox @pkirn Yeah, but I know that Boris and Kara don't want to hear all the turns as you and I do, so I suggested it as a configurable option.
@FreakyFwoof @Scott @charlesv @simon @pkirn I do like that kind of responsiveness, but not always. So yeah, configurable is good.
@BorrisInABox @FreakyFwoof @charlesv @simon @pkirn Hey different question, what's the project format like? Are there ways to get stuff off the Move and into any DAWs other than Live? I like Live here, but REAPER is still where I am most of the time.
@Scott @BorrisInABox @FreakyFwoof @simon @pkirn You really have to think of these like external sound modules connected to your Move, just without the cables. That means that sound wouldn't get to your DAW unless you actually record it. I don't do anything with the Live or Move project formats, but it will have the MIDI data in the clips. As far as exporting to other formats... that's outside the scope of what I'm working on right now.
@charlesv @Scott @BorrisInABox @FreakyFwoof @simon This does, however, raise the possibility of a hardware device with an entirely open ecosystem that could work on desktop just as it does on the gadget. So yes, all these solutions exist, absolutely! Just be aware the Ableton ecosystem is a proprietary one.
@charlesv @Scott @BorrisInABox @FreakyFwoof @simon That's not even a criticism of Ableton's proprietary nature -- I mean, the v1 devices all came out of Robert's head, basically, just as the v1 warp engine was derived from Gerhard's granular work! But it does raise more questions about what an open mobile/desktop hardware/DAW solution might look like.