ah shit my attempt to tape text embedding vector search to the side of GTS is actually sort of working. currently prototyping with pgvector and local Ollama running EmbeddingGemma. creating embeddings and indexing them at a few hundred posts per second is using essentially none of my M1 laptop's CPU.
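for anyone curious what the moving parts look like, here's a rough sketch (not the actual prototype code). assumptions, all mine: Ollama is on its default local port with a model tagged `embeddinggemma`, the embedding size is 768, and the `status_embeddings` table is made up for illustration and has nothing to do with GTS's real schema. the SQL strings are meant to be run through whatever Postgres driver you like.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"  # assumed: Ollama's default local address
MODEL = "embeddinggemma"                         # assumed model tag
DIM = 768                                        # assumed embedding size; check your model

# hypothetical table for illustration, not GTS's real schema
SCHEMA_SQL = f"""
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS status_embeddings (
    status_id TEXT PRIMARY KEY,
    embedding vector({DIM}) NOT NULL
);
"""

# <=> is pgvector's cosine distance operator (smaller means more similar)
SEARCH_SQL = """
SELECT status_id, embedding <=> %s::vector AS distance
FROM status_embeddings
ORDER BY distance
LIMIT %s;
"""


def embed_request(texts):
    """Build the HTTP request for a batch of texts (POST, because data is set)."""
    body = json.dumps({"model": MODEL, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )


def embed(texts):
    """Send the batch to the local Ollama server and return one vector per text."""
    with urllib.request.urlopen(embed_request(texts)) as resp:
        return json.load(resp)["embeddings"]


def to_pgvector(vec):
    """Format a list of numbers as a pgvector text literal like [0.1,0.2]."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"
```

usage would be roughly: `vec = embed(["some query"])[0]`, then execute `SEARCH_SQL` with parameters `(to_pgvector(vec), 10)`.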

the prototype is probably flexible enough to switch to something even more basic like Word2Vec or GloVe for the low end of GTS deployments. figuring out how to get the sqlite-vec extension into GTS WASM SQLite is left as an exercise for the reader.

really i'm just messing around here as i get back into coding for fun, but this could be the start of semantic search, or a custom feed where you give it a list of exemplar posts and it shows you new ones that come in close to one of them.
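the exemplar feed idea boils down to something like this sketch: compare each incoming post's embedding against every exemplar and keep the post if its best match is close enough. the function names, the cosine similarity measure, and the 0.8 threshold are all illustrative assumptions, and a real version would push the comparison into the vector index instead of looping in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(post_vec, exemplars):
    """Return (exemplar_id, similarity) for the exemplar closest to the post."""
    scored = ((eid, cosine_similarity(post_vec, vec)) for eid, vec in exemplars.items())
    return max(scored, key=lambda pair: pair[1])


def feed_filter(incoming, exemplars, threshold=0.8):
    """Yield (post_id, exemplar_id, similarity) for posts near any exemplar."""
    for post_id, vec in incoming:
        exemplar_id, sim = best_match(vec, exemplars)
        if sim >= threshold:
            yield post_id, exemplar_id, sim


if __name__ == "__main__":
    # toy 2-d vectors standing in for real embeddings
    exemplars = {"cats": [1.0, 0.0], "bikes": [0.0, 1.0]}
    incoming = [("a", [0.9, 0.1]), ("b", [0.5, 0.5]), ("c", [0.0, 2.0])]
    for hit in feed_filter(incoming, exemplars):
        print(hit)
```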

GitHub - pgvector/pgvector: Open-source vector similarity search for Postgres

i'm about to describe some pie in the sky but: what if a relay could do expensive processing like calculating standardized post text and image embeddings (or even just fetching link preview cards), and then consumers that decide to trust that relay could skip recomputing/refetching all that stuff, so they'd only need to calc query embeddings locally (and local posts obvi). some guy could put an old gaming PC in his garage and then hundreds of Fedi servers could do less work.
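very hand-wavy sketch of the consumer side of that: look up a post's embedding from relays you've chosen to trust, keyed on post URI plus model id so everyone agrees on what "standardized" means, and only compute locally for your own posts, for queries, or on a miss. the `Relay` and `Consumer` classes are invented stand-ins; no such relay protocol exists as far as this thread is concerned, and a real one would be a network call, not a dict.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (post URI, embedding model id)


@dataclass
class Relay:
    """In-memory stand-in for a relay that publishes precomputed embeddings."""
    name: str
    published: Dict[Key, List[float]] = field(default_factory=dict)

    def lookup(self, uri: str, model: str) -> Optional[List[float]]:
        return self.published.get((uri, model))


@dataclass
class Consumer:
    """A server that trusts some relays and falls back to local compute."""
    model: str
    trusted: List[Relay]
    embed_locally: Callable[[str], List[float]]
    local_calls: int = 0  # counts how much work we did ourselves

    def post_embedding(self, uri: str, text: str, is_local: bool = False) -> List[float]:
        # local posts are always embedded here; remote ones try relays first
        if not is_local:
            for relay in self.trusted:
                vec = relay.lookup(uri, self.model)
                if vec is not None:
                    return vec
        self.local_calls += 1
        return self.embed_locally(text)

    def query_embedding(self, query: str) -> List[float]:
        # queries never leave the server, so they are always computed locally
        self.local_calls += 1
        return self.embed_locally(query)
```

keying on the model id matters because vectors from different models aren't comparable, so a consumer can only skip work if it runs the same model for its query embeddings.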

how's that Mastodon thing for "Fediverse providers" going anyway

also why aren't we using torrents for post media. did people forget torrents exist again

Fediverse Discovery Providers

A project exploring better search and discovery on the Fediverse as an optional, decentralized and pluggable service.

@vyr Launching your pie higher into the sky: surely there’s some way that sites could sign their open graph metadata so that only the originating server (or even client) would need to fetch the preview card and every other server could verify its integrity.

AFAIK at least Bluesky and iMessage fetch link previews client side on the sender’s device and blindly trust whatever the client sends, so a standard would have wide applicability (not just fedi) and big sites might actually want to adopt it

@dale_price in principle yes, in practice signing stuff is a colossal pain in the ass because then you need to deal with managing keys, figuring out when keys are valid and when their signatures are valid, handling revocations, etc., plus even defining what you're signing is tricky (look at HTTP Signatures as an example)

something like Subresource Integrity might be more realistic, but even then you also need to figure out a way to request only the OGP gunk from a page (maybe content type negotiation?) in such a way that you can reliably get a checksum for the exact bytes, and then figure out how that extends to preview images which might themselves be content type negotiated or otherwise vary depending on what's retrieving them

and then you need to get enough sites to do it to matter, which is a harder problem than all of the engineering put together
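to make the "checksum for the exact bytes" part concrete, here's a minimal sketch of an SRI-style integrity string (algorithm name, a hyphen, then the base64 digest) computed over a blob of preview metadata. how you'd actually obtain a stable byte sequence for the OGP data is the unsolved part and is just assumed here; the function names are made up.

```python
import base64
import hashlib
import hmac

ALLOWED = ("sha256", "sha384", "sha512")  # the hash functions SRI uses


def sri_digest(data: bytes, algo: str = "sha384") -> str:
    """Return an SRI-style string such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algo, data).digest()
    return f"{algo}-{base64.b64encode(digest).decode('ascii')}"


def verify_sri(data: bytes, integrity: str) -> bool:
    """Check that data hashes to the given integrity string."""
    algo, _, _ = integrity.partition("-")
    if algo not in ALLOWED:
        return False
    # constant-time comparison of the recomputed and claimed strings
    return hmac.compare_digest(sri_digest(data, algo), integrity)


if __name__ == "__main__":
    # assumed: the origin serves exactly these bytes to every fetcher
    card = b'{"og:title": "Example headline", "og:url": "https://example.org/a"}'
    tag = sri_digest(card)
    print(tag, verify_sri(card, tag), verify_sri(card + b" ", tag))
```

note that a single extra byte of whitespace breaks verification, which is exactly why content negotiation and varying responses make this painful in practice.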

@vyr yeah I think what I’m imagining is similar to HTTP signatures, but as a new html meta tag containing the signature for other meta tags’ attributes, combined with SRI on the og:image/video/audio tags.

I’m imagining that “stop people from posting links to your articles with fake headlines in the preview cards” (on bluesky at least) would be enough of an incentive for some news sites, but still definitely easier said than done