Mastodawn

Marginalia

@marginalia

651 Followers

7 Following

393 Posts

I've built the indie internet #searchengine #marginaliasearch. Working full time on this project. Kind of a librarian of the weird corners of the web.

Website, blog, etc.	https://www.marginalia.nu/
Search Engine	https://search.marginalia.nu/
Encyclopedia	https://encyclopedia.marginalia.nu/
Git	https://git.marginalia.nu/

Marginalia Feb 27

Did some local LLM-based labeling of the HN comment corpus, tasking the model to classify how likely it is that each comment is:

* Using AI-like grammar (e.g. it's not X, it's Y)
* Using AI-like markup (e.g. em-dashes)
* Trying to shill something
* Trying to influence public opinion

Still a bit of a work in progress, but here's some preview data.

Marginalia Feb 25

Did some statistics, and it seems posts from newly registered accounts on HN are nearly 10x more likely to use em-dashes, arrows, and similar typography than posts from established accounts.

https://www.marginalia.nu/weird-ai-crap/hn/

Marginalia Feb 22

It's pretty clear when I flipped the switch. You can see the query rate drop drastically and the query time increase quite a lot, since many clanker queries are nonsense with few or no results.

Marginalia Feb 16

Also since building this, I've been scratching my head as to how someone thought this was good UX for presenting certificate information.

Marginalia Feb 16

Exposed even more of the search engine's historical website data in the site viewer. It's gathered and used to detect major changes to websites, but it's neat, so now everyone can peruse it.

Marginalia Feb 15

Started putting some more information in the site viewer tool. Needs more polish, but it's accessible now.

I've found the best way of ensuring data quality is to make the data visible. More eyeballs = more bug reports = fewer bugs.

Marginalia Feb 13

New blog post about index compression and taming tail latencies.

https://www.marginalia.nu/log/a_131_index_compression/

Marginalia Feb 2

Turns out the search engine wasn't correctly counting how many additional results are available per domain, showing a much smaller number than what was accurate.

Easy fix.

Marginalia Jan 31

Published a short writeup about the new ranking changes from yesterday.

https://www.marginalia.nu/log/a_130_trust_in_ranking/

Marginalia Jan 30

Basically, like this. One of the results has some ads, but it's like... all of it is written by people!