Show HN: Gemini can now natively embed video, so I built sub-second video search

Gemini Embedding 2 can project raw video directly into a 768-dimensional vector space alongside text. No transcription, no frame captioning, no intermediate text. A query like "green car cutting me off" is directly comparable to a 30-second video clip at the vector level.
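"Comparable at the vector level" just means nearest-neighbor search over embeddings. A minimal sketch with toy stand-in vectors — a hypothetical `embed()` step is replaced by hard-coded lists, since the real model would map both the text query and each video chunk into the same 768-dim space:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy 3-dim stand-ins; in reality these come from the multimodal
# embedding model, one vector per video chunk and one for the query.
query_vec = [0.9, 0.1, 0.0]          # "green car cutting me off"
clips = {
    "clip_017.mp4": [0.8, 0.2, 0.1], # chunk with a car cutting in
    "clip_042.mp4": [0.0, 0.1, 0.9], # chunk of an empty parking lot
}

best = max(clips, key=lambda c: cosine(query_vec, clips[c]))
print(best)  # clip_017.mp4
```

The whole search step is just ranking stored chunk vectors by similarity to one query vector, which is exactly what a vector store like ChromaDB does at scale.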

I used this to build a CLI that indexes hours of footage into ChromaDB, then searches it with natural language and auto-trims the matching clip. Demo video on the GitHub README.
Indexing costs ~$2.50/hr of footage. Still-frame detection skips idle chunks, so mostly-static security-camera or sentry-mode footage is much cheaper to index.
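I don't know the repo's exact still-frame heuristic, but one common approach is to compare consecutive sampled frames and skip a chunk when nothing moves, so you never pay to embed it. A sketch with synthetic 4-pixel "frames" (lists of luma values) standing in for real decoded frames:

```python
def mean_abs_diff(frame_a, frame_b):
    """Average per-pixel difference between two frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)

def is_idle(frames, threshold=2.0):
    """A chunk is 'idle' if no pair of consecutive sampled frames differs much.

    threshold is a made-up tuning knob, not a value from the repo.
    """
    return all(
        mean_abs_diff(f1, f2) < threshold
        for f1, f2 in zip(frames, frames[1:])
    )

# Synthetic chunks: one static, one with a bright object passing through.
static_chunk = [[10, 10, 10, 10]] * 5
moving_chunk = [[10, 10, 10, 10], [10, 200, 10, 10], [10, 10, 10, 10]]

print(is_idle(static_chunk))  # True  -> skip, don't embed
print(is_idle(moving_chunk))  # False -> embed this chunk
```

Since the embedding call is the dominant cost, filtering chunks this cheaply before embedding is what makes hours of sentry-mode footage affordable.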

https://github.com/ssrajadh/sentrysearch

Very impressive! A webhook could be configured to trigger an alarm if a semantic match to any category of activities is detected, and then you basically have a virtual security guard and private investigator. Well played.
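The commenter's idea reduces to: embed each watched category of activity once, then score every new chunk against those category vectors and fire a webhook when one crosses a threshold. A toy sketch — the category vectors, threshold, and the `alerts` list (standing in for an HTTP POST to a webhook URL) are all illustrative, not from the project:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# In reality these would be embeddings of text like "person climbing fence";
# here they are toy 2-dim stand-ins.
ALERT_CATEGORIES = {
    "intruder": [1.0, 0.0],
    "fire": [0.0, 1.0],
}
THRESHOLD = 0.8      # made-up tuning knob
alerts = []          # stand-in for POSTing to a configured webhook

def check_chunk(chunk_vec):
    """Score one freshly embedded chunk against every watched category."""
    for name, cat_vec in ALERT_CATEGORIES.items():
        score = cosine(chunk_vec, cat_vec)
        if score >= THRESHOLD:
            alerts.append((name, round(score, 2)))  # would fire webhook here

check_chunk([0.95, 0.05])  # close to "intruder" -> triggers an alert
check_chunk([0.5, 0.5])    # ambiguous, below threshold for both
print(alerts)
```

The per-chunk check itself is cheap; as the reply below notes, the cost is in continuously embedding the incoming video.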

Thanks! Yeah, that would be pretty cool, but continuous indexing would get expensive right now: the model is still in public preview and there are no local alternatives AFAIK.

This very well might be a reality in a couple years though!