Crazy idea for #Emacs enthusiasts

On the #LispyGopherClimate podcast today, @screwlisp, @kentpitman, and I had a fascinating conversation with @someodd.

During our conversation I remember her dropping this idea that the #GopherProtocol was all about menus. I remember this because she had said something like it in her Bartleby RFC document which I had read earlier, “But gopher is hierarchical. That’s the whole point. It’s a tree of menus, not a stream of content.” (I copy-pasted the section from which that quote comes below).

Just two weeks prior on the #LispyGopherClimate podcast we had a discussion with @chiply about “incremental completing read,” which was directly related to @karthink’s blog post on the Emacs Avy package.

So here is my crazy synthesis of the two: Emacs Avy as a Gopher client!!!

The incremental completing read pattern goes “Filter a list of results -> Select an item -> perform an action on the item.” The action could be to read the page, or to open a link that may trigger an “applet” action. I can see a whole new way to browse the Internet: no search engines, no LLM chat, just type what you think may exist and narrow down the list of all the content until you find something that you might want to read!
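To make the “filter, select, act” loop concrete, here is a minimal sketch in Python (not Emacs Lisp, and not using Avy or any Emacs API). It assumes the menu line layout from RFC 1436: one type character, then display string, selector, host, and port separated by tabs, with a lone "." ending the menu. The names `MenuItem`, `parse_menu`, `narrow`, `act`, and the `SAMPLE` menu on example.org are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class MenuItem:
    item_type: str  # "0" = text file, "1" = submenu (RFC 1436 item types)
    display: str
    selector: str
    host: str
    port: int


def parse_menu(raw: str) -> list[MenuItem]:
    """Parse a raw Gopher menu into items, skipping malformed lines."""
    items = []
    for line in raw.splitlines():
        if line == ".":  # a lone dot terminates the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:
            continue
        display, selector, host, port = fields[:4]
        items.append(MenuItem(line[0], display, selector, host, int(port)))
    return items


def narrow(items: list[MenuItem], typed: str) -> list[MenuItem]:
    """Filter step: keep items whose display text contains every typed word."""
    words = typed.lower().split()
    return [it for it in items if all(w in it.display.lower() for w in words)]


def act(item: MenuItem) -> str:
    """Action step: describe what a client would do with the selected item."""
    verb = "open menu" if item.item_type == "1" else "read"
    return f"{verb} {item.selector} on {item.host}:{item.port}"


# Hypothetical menu, made up for this example.
SAMPLE = (
    "1Open source projects\t/opensource\texample.org\t70\r\n"
    "0Notes on menus\t/notes/menus.txt\texample.org\t70\r\n"
    ".\r\n"
)

candidates = narrow(parse_menu(SAMPLE), "notes menus")
if candidates:
    print(act(candidates[0]))  # select the first match, then act on it
```

A real client would fetch the menu over a socket and recurse into submenus; this sketch only covers the narrowing of a single menu that has already been downloaded.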

Is this post an attempt at humor, or am I just rambling? A little of each. I do want to try to build this thing, if only to see how funny it would be to try to browse the Gopher network that way.

(Quoting @someodd )

I’ve been thinking a lot about how people in gopherspace – myself included – try too hard to make gopher be like the web. We abuse directories so we can have files with links. We call our writing “phlogs” which is just “blog” with a different letter, and then our phlogs end up looking like imitations of blogs anyway. Reverse chronological. Post after post. A timeline.

But gopher is hierarchical. That’s the whole point. It’s a tree of menus, not a stream of content. And the biggest abuse of gopher I see is people trying to flatten that hierarchy, trying to make it not-hierarchical, because that’s what the web trained us to expect.

So I started asking: what would sharing information look like if gopher had won? If the web never happened and something other than blogs took off? You wouldn’t have “posts.” You’d have a library. Subjects on shelves. You’d browse by walking through the stacks, not by scrolling a timeline.

That’s what bartleby is trying to be. Not a blog engine that speaks gopher, but a tool that takes the hierarchy seriously. Collections are the primary axis, not dates. Recent acquisitions exist, but they’re the display by the door, not the organizing principle.

gopher-proxy – /0//regarding_someodd/opensource/bartleby/bartleby-rfc.txt

@ramin_hal9001 @kentpitman @chiply @karthink @someodd @screwlisp as someone who maintains a local knowledge base, I can tell you that "just narrow" won't work. Personally, you can maintain a controlled vocabulary for keyword search, but even for a single person that breaks over time as your knowledge evolves. LLM tech is actually useful here. Not per se, but for semantic search (so-called embeddings). It is pretty good at capturing relevant results. But even that, if we scale up, is tricky.

@yantar92

«…you can maintain a controlled vocabulary for keyword search…that breaks…as…knowledge evolves. LLM tech is…useful here…for semantic search…pretty good at capturing relevant results. But even that, if we scale up, is tricky.…»

The history of search follows a parallel track. Keywording of web pages was useful but could not keep up. Full text search won out because it could address scale.

I sum this up differently than many people seem to, though. I think we sacrificed the notion of a right answer for a sort order, so that we could accomplish the search ourselves on well-ordered data. It still makes us a bit vulnerable to the sort predicate, but at least we can inspect what's going on.

Note that there was for a time a flirtation with "I'm feeling Lucky" to just say "give me the first item, ignore the others". It seems obvious to me that this did not win, since even the option went away.

But now LLMs offer us ONLY "I'm feeling lucky" (dressed up as "I'm trusting you to work in my best interest" -- what could possibly go wrong?). One cannot inspect the near misses.

Even on technological grounds, this clearly has a cost. RAG makes search faster and for "ordinary things" it may be better. But the web used to be a place where you could search for the obscure thing and it would search everything. Now it narrows you to "just the likely places" before even starting the search. In effect, the new "SEO" will now be about making sure you're in the RAG set, but, importantly, the democratizing effect of a search that would at least search everything is gone.

If you have a hobby site that is the only source of something but your metaphor is ill-chosen, you'll get searched in the wrong set because the coarse categorization is wrong from the outset, and you are expected to pay money to even be in the game. That's a big step backward.

cc @ramin_hal9001 @chiply @karthink @someodd @screwlisp

@kentpitman @ramin_hal9001 @chiply @karthink @someodd @screwlisp but RAG does a full search. It is just a ranking.

@yantar92

I asked ChatGPT 5.5 what it thought of this question we were discussing and it says what I was trying to say in a way that satisfies me and maybe gets my point across better than I was doing, especially the second paragraph:

«At a coarse level, RAG is not simply “the LLM searches everything.” A RAG system first uses a retrieval layer to narrow a large corpus down to a small set of candidate chunks likely to be relevant to the query. In vector or hybrid RAG, that narrowing is often based not on literal keyword matching alone, but on learned representations: embeddings, semantic similarity, metadata filters, and sometimes rerankers. The generator then answers using only that selected context, plus whatever is in the prompt/model.

So yes: retrieval involves ranking, but the ranking is doing architectural work. It is not merely producing an ordered list for the model to inspect exhaustively; it is selecting what enters the model’s context window at all. In that sense, RAG is better understood as relevance-based subsetting followed by generation, often with ranking and reranking inside the subsetting step.»

cc @ramin_hal9001 @chiply @karthink @someodd @screwlisp

@kentpitman @ramin_hal9001 @chiply @karthink @someodd @screwlisp sure. For the purposes of an LLM, the RAG ranked list should be trimmed. But RAG is nothing but a similarity score. It is a number assigned to each searched entry in the DB. It does not have to be trimmed.
*Edit*: to be precise, the abbreviation RAG applies to LLM retrieval in particular. But what I am referring to is one of the steps, which is similarity ranking. A better term would be vector search.
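The distinction between scoring every entry and trimming the list can be sketched as follows. This is a toy illustration, not how any particular vector database works: the three-dimensional vectors and entry names are made up, and real embeddings come from a trained model with hundreds of dimensions. The function names `cosine` and `rank` and the `top_k` parameter are assumptions for this example.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, db, top_k=None):
    """Score every entry in db; trim to top_k only if the caller asks for it."""
    scored = sorted(
        ((cosine(query_vec, vec), name) for name, vec in db.items()),
        reverse=True,
    )
    return scored if top_k is None else scored[:top_k]


# Invented vectors standing in for real embeddings.
DB = {
    "gopher menus": [1.0, 0.0, 0.0],
    "emacs avy": [0.0, 1.0, 0.0],
    "hobby site": [0.2, 0.1, 0.97],
}
QUERY = [1.0, 0.2, 0.1]

full = rank(QUERY, DB)             # every entry gets a score
trimmed = rank(QUERY, DB, top_k=1) # what an LLM context window would receive
print([name for _, name in full])
print([name for _, name in trimmed])
```

With `top_k=None` the full ordered list stays inspectable, near misses included. The trimming is a separate decision made by whoever consumes the ranking.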

@yantar92

Right, and what I'm saying is that certain kinds of content don't compete because they are screened "for efficiency" in an initial "likely relevance" pass without being given the same sense of focus.

This favors "obvious searches" and disfavors "searches for obscure things".

I asked GPT 5.5 again to comment on this second round of what you said and what I was going to reply (which I have not edited subsequent to asking it) and it offered this summary:

«In practice, ranking becomes filtering once the system only surfaces the top candidates. Vector/semantic search is great for “things like this,” but it can be worse for obscure exact needles: a rare phrase, quote, error string, or idiosyncratic reference may not score highly under the embedding model, even though literal search would have found it. So the issue is not whether every entry can theoretically get a similarity score; it’s whether the target survives retrieval into the surfaced candidate set.»
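A small sketch of the "survives retrieval" point: once only the top k candidates are surfaced, a low-scoring document is effectively filtered out, even though a literal substring search over the whole corpus finds it. The corpus, the error string, and especially the similarity scores below are invented for illustration, not measured from any embedding model. `surfaced` and `literal_search` are hypothetical helper names.

```python
def surfaced(scores: dict[str, float], k: int) -> list[str]:
    """Return the ids of the k highest scoring documents."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def literal_search(corpus: dict[str, str], phrase: str) -> list[str]:
    """Return ids of all documents containing the exact phrase (case-insensitive)."""
    needle = phrase.lower()
    return [doc_id for doc_id, text in corpus.items() if needle in text.lower()]


# Made-up corpus with one obscure needle in it.
CORPUS = {
    "install-guide": "How to install and configure the client.",
    "faq": "Common questions about connection problems.",
    "old-forum-post": "I got 'selector not found: /stacks/rare' and fixed it by hand.",
}

# Made-up similarity scores for the query "selector not found: /stacks/rare".
# The assumption is that the embedding model rates the generic pages higher.
SCORES = {"install-guide": 0.81, "faq": 0.78, "old-forum-post": 0.35}

top2 = surfaced(SCORES, k=2)
exact = literal_search(CORPUS, "selector not found: /stacks/rare")
print("surfaced:", top2)
print("literal match:", exact)
print("needle survived retrieval:", exact[0] in top2)
```

The target does have a score here. It simply sits below the cutoff, which is the sense in which ranking turns into filtering.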

(I said I would quote it directly «as long as you're not tailoring it on some theory that i've requested you to agree with me. i'm just seeking neutral points of view here» and it confirmed «That’s a fair use of it, and yes — the point is neutral rather than tailored to make your side “win.”»)

cc @ramin_hal9001 @chiply @karthink @someodd @screwlisp

@kentpitman @ramin_hal9001 @chiply @karthink @someodd @screwlisp I agree. AFAIK, real vector searches often employ a mixed ranking on keyword search + similarity. That said, keyword matches alone are not good enough because terminology is not always the same. Terminology also changes over the years. So, you need to maintain keyword similarity or aliases on top to make search work. Maybe @publicvoit can comment.
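One way to picture a mixed ranking with aliases on top, as a rough sketch only: the keyword part counts a query term as matched if the term or any registered alias appears in the text, and the final score is a weighted blend with a similarity score supplied from elsewhere. The alias table, the 0.5 default weight, and the function names `expand`, `keyword_score`, and `hybrid_score` are all assumptions for illustration. Production systems typically use more elaborate scoring than this.

```python
def expand(term: str, aliases: dict[str, set[str]]) -> set[str]:
    """Return the term plus any aliases registered for it (all lowercase)."""
    return {term} | aliases.get(term, set())


def keyword_score(query_terms: list[str], text: str, aliases: dict[str, set[str]]) -> float:
    """Fraction of query terms found in the text, directly or via an alias."""
    if not query_terms:
        return 0.0
    lowered = text.lower()
    hits = sum(
        any(form in lowered for form in expand(term, aliases))
        for term in query_terms
    )
    return hits / len(query_terms)


def hybrid_score(keyword: float, similarity: float, weight: float = 0.5) -> float:
    """Blend keyword and similarity scores; weight is the keyword share."""
    return weight * keyword + (1 - weight) * similarity


# Hypothetical alias table: terminology drifts, so old and new terms are linked.
ALIASES = {"phlog": {"gopher blog", "gopher log"}}

text = "My gopher blog about Lisp"
with_aliases = keyword_score(["phlog"], text, ALIASES)
without_aliases = keyword_score(["phlog"], text, {})
print(with_aliases, without_aliases)
print(hybrid_score(with_aliases, similarity=0.5))
```

The alias table is exactly the part that needs ongoing maintenance as vocabulary changes, which is the cost described above.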

@yantar92 @ramin_hal9001 @chiply @karthink @someodd @screwlisp @publicvoit

The thing that bugs me is that it used to be that you could easily find a page that, for example, spoke about the banking industry as a metaphor for some other thing, let's say cooking. But now when you say "find me this exact quote" it first undoes the quote and says "oh, this is a quote about the banking industry, those are usually found over here in the corpus of literature about banks". Then it doesn't find exactly the thing that would have been so distinctive, because the keywords on the item will say it is about "cooking", not "banking". So even if items are well-keyworded, unless someone tagged the post to be about banking (when metaphor really is not "aboutness"), calling up a quote like that will not find it.

@kentpitman @yantar92 @ramin_hal9001 @chiply @karthink @someodd @screwlisp (I haven't read through the whole thread)
In general, you need to distinguish between different methods of #informationretrieval.

#Search vs. #navigation vs. #tags/labels vs. others + combinations such as teleportation ...

Furthermore, you need to distinguish between personal retrieval, where you yourself did some filing/#categorization/tagging process, and a process where you need to retrieve something from a corpus that was not curated by you yourself but by one or many peers instead (social #tagging, company file server, ...).

And yes, in any case, your personal mental model changes over time. Therefore, it's difficult to do successful retrieval even for your personal files especially when you did not follow certain principles during the filing process.

For example, that's why tagging is not as simple as most people think it is: https://karl-voit.at/2022/01/29/How-to-Use-Tags/
or https://karl-voit.at/2020/12/27/tagging-natural-objects/

#publicvoit #PIM
