NVDA: AI image descriptions progress, discussion #19807 on GitHub:
https://github.com/nvaccess/nvda/discussions/19807
The latest good NVDA installer with AI image descriptions can be found here:
https://download.nvaccess.org/snapshots/try/try-image-desc/
After early alpha testing feedback, on-device AI image descriptions were removed from 2026.1.
This was mainly due to the following reasons:
low quality of descriptions
lag when enabling the feature
lag while the feature is running
no option for higher-quality NPU/GPU descriptions
no VQA (Visual Question Answering), e.g. asking follow-up questions about the image
no translations: descriptions are only in English
To reintroduce the feature in alpha, we want to fix the following things first:
have a simple, basic but accurate describer that can run on mid-range CPUs.
We have made progress on this by recently improving the model slightly.
minimize lag when enabling and running the feature: ensure NVDA remains responsive
ideally, a way to convey the confidence of a description
The next biggest priorities are:
Add a more intensive model that runs on NPUs and GPUs and offers VQA
Add models to translate output
After that:
We would like to offer a wider range of models via a model manager
A big technical challenge here is the lag introduced by importing numpy, which the Python onnxruntime package requires.
We investigated creating a C++ layer, but the implementation is still experimental and not working for ARM64EC: microsoft/onnxruntime#15403
We could consider offloading onnxruntime, numpy and the describer to a separate process, similar to the 32-bit shim.
#nvda #screenReader #imageDescription #Blind #llm #AI #openSource
