Show HN: Gemini can now natively embed video, so I built sub-second video search

Gemini Embedding 2 can project raw video directly into a 768-dimensional vector space alongside text. No transcription, no frame captioning, no intermediate text. A query like "green car cutting me off" is directly comparable to a 30-second video clip at the vector level.
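
To make "comparable at the vector level" concrete, here is a minimal sketch of ranking clips against a text query by cosine similarity. It assumes the embeddings have already been computed; the 4-dimensional vectors and clip file names below are made-up stand-ins (real vectors would have 768 dimensions), and `rank_clips` is a hypothetical helper, not something from the repo.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as no match
    return dot / (norm_a * norm_b)

def rank_clips(query_vec, clip_vecs):
    """Return (clip_id, score) pairs sorted from best to worst match."""
    scored = [(cid, cosine_similarity(query_vec, vec))
              for cid, vec in clip_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-ins for a text-query embedding and two video-clip embeddings.
query = [0.9, 0.1, 0.0, 0.2]
clips = {
    "clip_0000.mp4": [0.1, 0.8, 0.3, 0.0],
    "clip_0030.mp4": [0.8, 0.2, 0.1, 0.3],
}

ranking = rank_clips(query, clips)
best_clip_id, best_score = ranking[0]
print(best_clip_id, round(best_score, 3))
```

A vector store does the same nearest-neighbor comparison at scale, so the query text and the video chunks only need to come from the same embedding model.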

I used this to build a CLI that indexes hours of footage into ChromaDB, then searches it with natural language and auto-trims the matching clip. Demo video on the GitHub README.
Indexing costs ~$2.50/hr of footage. Still-frame detection skips idle chunks, so security camera / sentry mode footage is much cheaper.
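
A rough sketch of why skipping idle chunks cuts the bill. This is not the repo's actual detection logic: it assumes 30-second chunks, compares one sampled grayscale frame from the start and end of each chunk, uses a hypothetical mean-absolute-difference threshold, and assumes the ~$2.50/hr figure scales linearly with the footage actually indexed.

```python
COST_PER_HOUR_USD = 2.50  # rate quoted in the post

def is_idle(frame_a, frame_b, threshold=2.0):
    """True if two grayscale frames (flat lists of 0-255 values) barely differ.

    `threshold` is a hypothetical cutoff on the mean absolute pixel difference.
    """
    if not frame_a or len(frame_a) != len(frame_b):
        raise ValueError("frames must be non-empty and the same size")
    mean_diff = sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)
    return mean_diff < threshold

def estimate_cost(chunks, chunk_seconds=30):
    """Count non-idle chunks and estimate the cost of indexing only those.

    `chunks` is a list of (first_frame, last_frame) pairs, one per chunk.
    """
    active = sum(1 for first, last in chunks if not is_idle(first, last))
    indexed_hours = active * chunk_seconds / 3600
    return active, indexed_hours * COST_PER_HOUR_USD

# One hour of toy footage: 120 chunks, only 12 of which contain motion.
still = [10] * 16
moved = [10] * 8 + [200] * 8
footage = [(still, still)] * 108 + [(still, moved)] * 12

active_chunks, cost_usd = estimate_cost(footage)
full_cost_usd = len(footage) * 30 / 3600 * COST_PER_HOUR_USD
print(active_chunks, round(cost_usd, 2), round(full_cost_usd, 2))
```

Under these assumptions the bill tracks the amount of motion in the footage, which is why mostly static security or sentry recordings come out much cheaper than the headline rate.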

https://github.com/ssrajadh/sentrysearch

Very interesting (not for a dashcam, but for home monitoring).
Where is the Exit to this dystopia?
The Matrix style human pods: we live in blissful ignorance in the Matrix, while the LLMs extract more and more compute power from us so some CEO somewhere can claim they have now replaced all humans with machines in their business.
I was thinking more of the season 3 episode of Doctor Who titled Gridlock where everyone lives in flying cars circling a giant expressway underground, while all the upper class people on the surface died years ago from a pandemic.
Ever get the feeling that the universe is reading your mind? Maybe there's some truth to that after all.
In The Matrix the exits were pay phones, which perhaps explains why our overlords are removing them.
I don’t think this means we’re in a dystopia
You might not have been paying attention
Well, with data analysis powers like this, a few treasonous words in front of a Flock camera will show you the way.
Very impressive! A webhook could be configured to trigger an alarm if a semantic match to any category of activities is detected, and then you basically have a virtual security guard and private investigator. Well played.

Thanks! Yeah that would be pretty cool, but continuous indexing would be pretty expensive now, because the model's in public preview and there are no local alternatives afaik.

This very well might be a reality in a couple years though!

This is a really cool implementation—embeddings still often feel like magic to me. That said, this exact use case is sort of also my biggest point of concern with where AI takes us, much more so than most of the common AI risks you hear lots of chatter about. We live in a world absolutely loaded with cameras now but ultimately retain some semblance of semi-anonymity/privacy in public by virtue of the fact that nobody can actually watch or review all of the video from those cameras except when there is a compelling reason to do so, but these technologies are making that a much more realistic proposition.

The presence of cameras everywhere is considerably more concerning than the status quo, to me at least, when there is an AI watching and indexing every second of every feed, where camera owners or manufacturers or governments could set simple natural language parameters for highly specific people or activities to be notified about. There are obviously compelling and easy-to-sell cases here that will surely drive adoption as it becomes cost effective: get an alert for a crime in progress, get an alert when a neighbor doesn't clean up after his dog, get an alert when someone has fallen... but the potential implications of living in a panopticon like this, if not well regulated, are pretty ugly.

Totally valid concern. Right now the cost ($2.50/hr) and latency make continuous real-time indexing impractical, but that won't always be the case. This is one of the reasons I'd want to see open-weight local models for this: they would keep the indexing on your own hardware, with no footage leaving your machine. But you're right that the broader trajectory here is worth thinking carefully about.
I work in content/video intelligence. Gemini is great for this type of use case out of the box.