Marginalia

@marginalia
I've built the indie internet #searchengine #marginaliasearch. Working full time on this project. Kind of a librarian of the weird corners of the web.
Website, blog, etc.: https://www.marginalia.nu/
Search Engine: https://search.marginalia.nu/
Encyclopedia: https://encyclopedia.marginalia.nu/
Git: https://git.marginalia.nu/

Did some local LLM-based labeling of the HN comment corpus, tasking the model to classify how likely it is that each comment is:

* Using AI-like grammar (e.g. it's not X, it's Y)
* Using AI-like markup (e.g. em-dashes)
* Trying to shill something
* Trying to influence public opinion

Still a bit of a work in progress, but here is some preview data.

Did some statistics, and it seems posts from newly registered accounts on HN are nearly 10x more likely to use em-dashes, arrows, and similar typography than posts from established accounts.

https://www.marginalia.nu/weird-ai-crap/hn/
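
For anyone wanting to try a comparison like this on their own data, here is a minimal sketch of one way to compute the ratio. It is not the method used for the numbers above: the character set, the function names, and the toy comments are all assumptions made for illustration.

```python
import re

# Em-dash, en-dash, and a few arrow characters.
# This character set is an assumption, not the one used in the analysis above.
TYPOGRAPHY = re.compile("[\u2014\u2013\u2192\u2190\u21d2]")

def typography_rate(comments):
    """Fraction of comments containing at least one flagged character."""
    if not comments:
        return 0.0
    hits = sum(1 for text in comments if TYPOGRAPHY.search(text))
    return hits / len(comments)

def rate_ratio(new_comments, established_comments):
    """How many times more often the first group uses the flagged characters."""
    base = typography_rate(established_comments)
    if base == 0:
        return float("inf")  # no baseline usage to compare against
    return typography_rate(new_comments) / base

# Made-up toy data, not taken from the HN corpus.
new = ["Great tool \u2014 try it", "a \u2192 b", "plain text", "also plain"]
old = ["plain"] * 19 + ["one dash \u2014 here"]

print(rate_ratio(new, old))
```

With real data the groups would be split by account age, and the ratio is only meaningful when both groups are large enough that a handful of comments cannot swing it.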

It's pretty clear when I flipped the switch. You can see the query rate drop drastically, and query time increase quite a lot, as many clanker queries are nonsense with few or no results.

Also, since building this, I've been scratching my head as to how someone thought this was good UX for presenting certificate information.

Exposed even more of the search engine's historical website data in the site viewer. It's gathered and used to detect major changes to websites, but it's neat, so now everyone can peruse it.

Started putting some more information in the site viewer tool. Needs more polish, but it's accessible now.

I've found the best way of ensuring data quality is to make the data visible. More eyeballs = more bug reports = fewer bugs.

New blog post about index compression and taming tail latencies.

https://www.marginalia.nu/log/a_131_index_compression/

Turns out the search engine wasn't correctly counting how many additional results are available per domain, showing a much smaller number than the actual count.

Easy fix.
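
To illustrate what "additional results per domain" means here, a rough sketch follows. It is not the search engine's actual code: the function name and the example URLs are hypothetical. The count is taken as all matching results for a domain minus the ones already shown.

```python
from collections import Counter
from urllib.parse import urlsplit

def additional_per_domain(matching_urls, shown_urls):
    """For each domain with shown results, count the matches not yet shown."""
    total = Counter(urlsplit(u).hostname for u in matching_urls)
    shown = Counter(urlsplit(u).hostname for u in shown_urls)
    return {d: total[d] - shown[d] for d in shown if total[d] > shown[d]}

# Hypothetical example data.
matching = [
    "https://a.example/1",
    "https://a.example/2",
    "https://a.example/3",
    "https://b.example/1",
]
shown = ["https://a.example/1", "https://b.example/1"]

print(additional_per_domain(matching, shown))
```

An undercount like the one described could come from computing the total over a truncated candidate list instead of the full match set, though that is only a guess at the kind of bug involved.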

Published a short writeup about the new ranking changes from yesterday.

https://www.marginalia.nu/log/a_130_trust_in_ranking/

Basically, like this. One of the results has some ads, but it's like... all of it is written by people!