Does anybody have first hand #DigitalHumanities experience with SAM3?

The web demo is crazy good, also with historical data, but we found that the performance unfortunately doesn't really translate to actually using the model (we had similar experiences with earlier versions).
At the same time, I'm a fan of SAM, so it would be great to get it to live up to its promise 😅

#DHd2026 #DH afterthoughts

@sarahalang I know @aboutgeo has worked with SAM, although I don't know whether with version 3 specifically?

@felwert @sarahalang we've been using SAM2 in https://immarkus.xmarkus.org. It worked OK for us. SAM3 is something I'd like to try, but the project hasn't had the time/resources yet.

Overall, I guess it depends on:

i) your expectations :-)
ii) the material you're working with (obviously)
iii) whether you're using the prompted or unprompted mode (we're only using prompted in IMMARKUS)
iv) most importantly, the model size/variant

In our case, we used the "tiny" variant only, because it fits entirely into the browser – no server-side embedding needed. The larger variants would give better results, I'm sure. But then you'd need a separate server component that you have to maintain, fund, and keep running... not DH-friendly, as we know ;-)
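To put rough numbers on the size argument: a quick back-of-envelope sketch of what shipping a checkpoint to the browser would cost in download size. The parameter counts below are approximate figures from memory, not from the official model cards, so treat them as assumptions to verify.

```python
# Back-of-envelope: can a SAM 2 checkpoint realistically ship to the browser?
# Parameter counts are APPROXIMATE (from memory) -- check the official
# model cards before relying on them.
APPROX_PARAMS = {
    "sam2-tiny": 39e6,    # ~39 M parameters (assumed)
    "sam2-large": 224e6,  # ~224 M parameters (assumed)
}

def download_mb(params: float, bytes_per_param: int = 4) -> float:
    """Rough checkpoint size in MB (fp32 by default, pass 2 for fp16)."""
    return params * bytes_per_param / 1e6

for name, p in APPROX_PARAMS.items():
    print(f"{name}: ~{download_mb(p):.0f} MB fp32, ~{download_mb(p, 2):.0f} MB fp16")
```

Even at fp16, the large variant is an order of magnitude heavier than the tiny one as a one-time browser download – which is why the tiny variant was the pragmatic choice for a purely client-side setup.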

IMMARKUS

An image annotation environment for the MARKUS platform. Developed by Prof. Dr. Hilde De Weerdt, Dr. Rainer Simon, Dr. Lee Sunkyu, Dr. Iva Stojević, Meret Meister, and Xi Wangzhi with funding from the European Research Council under the Horizon 2020 programme, grant agreement No. 101019509.

@felwert @sarahalang FWIW the IMMARKUS docs have a small screencast demo here: https://github.com/rsimon/immarkus/wiki/05-Annotating-Images#auto-select

(But you can also just try it yourself by loading your own images into https://immarkus.xmarkus.org)

@aboutgeo @felwert Thanks, that's really helpful!

@sarahalang @felwert Feel free to get in touch if you want. I'm interested in SAM3 myself, and keen to hear about others' experiences with SAM (2 or 3) in DH!

@sarahalang I've recently come across an article on RF-DETR (https://arxiv.org/abs/2511.09554) which might also be worth looking at.
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers

Open-vocabulary detectors achieve impressive performance on COCO, but often fail to generalize to real-world datasets with out-of-distribution classes not typically found in their pre-training. Rather than simply fine-tuning a heavy-weight vision-language model (VLM) for new domains, we introduce RF-DETR, a light-weight specialist detection transformer that discovers accuracy-latency Pareto curves for any target dataset with weight-sharing neural architecture search (NAS). Our approach fine-tunes a pre-trained base network on a target dataset and evaluates thousands of network configurations with different accuracy-latency tradeoffs without re-training. Further, we revisit the "tunable knobs" for NAS to improve the transferability of DETRs to diverse target domains. Notably, RF-DETR significantly improves over prior state-of-the-art real-time methods on COCO and Roboflow100-VL. RF-DETR (nano) achieves 48.0 AP on COCO, beating D-FINE (nano) by 5.3 AP at similar latency, and RF-DETR (2x-large) outperforms GroundingDINO (tiny) by 1.2 AP on Roboflow100-VL while running 20x as fast. To the best of our knowledge, RF-DETR (2x-large) is the first real-time detector to surpass 60 AP on COCO. Our code is available at https://github.com/roboflow/rf-detr