I'm on here looking for text indexers and everything is 'lightning fast exoscale terafloops that scales to enterprise quantawarbles with polytopplic performanations' and it would be great if this industry could breathe into a bag until it remembers that one person with one computer is a constituency that matters.
If your “open source software” requires a datacenter-scale strata and is optimized for, or maybe only meaningful to, datacenter-scale problems, it is not open source in any way that matters. “Free as in corporate risk management” and “free as in labor arbitrage” are not aspirations.
@mhoye This is one of the problems with Kubernetes.

@dmaonR

Free as in your first hit is always free.

@dmaonR respectfully, my dinky homelab k3s cluster, running on a couple of Raspberries Pi, begs to differ. Perfectly feasible and practical to run k8s on personal-scale hardware.

Unless I've misunderstood what problem you are referring to? In which case, apologies.

(Ok so *technically* I have added a beefy PowerEdge node to that cluster too - but that's because I *wanted* to, not because of scale requirements/limitations)

@mhoye what’s your alternative for people who make data center scale software? Are you saying they should not open source what they make?
@mhoye Can I quote you on that?
@drwho I said it, so why not.

@mhoye Do you want recommendations for your text-indexers question?

In the full text search department:
Take a look at Meilisearch (https://github.com/meilisearch/meilisearch)
or Apache Solr (https://solr.apache.org/guide/solr/latest/index.html), which is like Elasticsearch but without all the licensing kerfuffle

@4censord I do, thank you. My goals here are minimalism of implementation and wholly-local computation.

@mhoye ah, neither of these is what I would consider minimalist. Meilisearch is smaller than Solr, but Solr especially has all the enterprise features of an enterprise Java project from the late 2000s.

Also, both run as a separate service that your application connects to, instead of being built into your application.

If you would rather have something that is built into your application, maybe the SQLite full-text search module is better suited. (https://www.sqlite.org/fts5.html)
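
A minimal sketch of what the FTS5 suggestion could look like from Python's built-in `sqlite3` module, assuming the SQLite library it links against was compiled with FTS5 (common, but not guaranteed). The table name `docs`, its columns, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample documents; real ones would come from your own files.
DOCS = [
    ("intro.md", "SQLite is a small, self-contained database engine."),
    ("search.md", "Full text search lets you query documents by the words they contain."),
    ("notes.md", "A database index speeds up search over large tables."),
]

def build_index(conn, docs):
    # FTS5 virtual table: 'body' is searchable, 'path' is stored but not indexed.
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(path UNINDEXED, body)")
    conn.executemany("INSERT INTO docs (path, body) VALUES (?, ?)", docs)
    conn.commit()

def search(conn, query):
    # MATCH runs the full-text query; ORDER BY rank puts the best matches first.
    rows = conn.execute(
        "SELECT path FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    )
    return [path for (path,) in rows]

conn = sqlite3.connect(":memory:")
build_index(conn, DOCS)
# Space-separated terms are ANDed together, in any order.
print(search(conn, "database search"))
```

Everything lives in one database file (or in memory, as here), so there is no separate service to run.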

@mhoye @4censord and do you need an app that does it for you with some configuration? Or more of a library you can call any which way you want?
@dolanor @4censord Ideally the outcome of this is a self-hosted webpage that gives me a reasonably good search experience for the extensive documentation already on my computer. I'd prefer not to run a web server locally - that seems unnecessary - and even a periodic manual refresh rather than anything automatic is probably fine.
@mhoye @dolanor hmm, that does sound like the use case for something like the SQLite full text search module. (https://www.sqlite.org/fts5.html)
Or, if you are ok with having to use Java, check if Apache Lucene is an option, that's the search engine behind Apache Solr (https://lucene.apache.org/core/)

@4censord @dolanor yeah, I think my initial approach will be something like pandoc -> SQLite plus a front end.
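
One way the "convert, then index into SQLite" idea might be sketched, with a full rebuild standing in for the periodic manual refresh. This assumes the documents have already been converted to plain-text `.txt` files (the pandoc step is not shown) and that FTS5 is available; `rebuild_index`, `search_snippets`, and the sample files are hypothetical names.

```python
import sqlite3
import tempfile
from pathlib import Path

def rebuild_index(db_path, doc_root, pattern="*.txt"):
    """Drop and rebuild the whole index: a manual refresh, nothing incremental."""
    conn = sqlite3.connect(db_path)
    conn.execute("DROP TABLE IF EXISTS docs")
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(path UNINDEXED, body)")
    for path in sorted(Path(doc_root).rglob(pattern)):
        # Assumption: files are already plain text, e.g. converted beforehand.
        body = path.read_text(encoding="utf-8", errors="replace")
        conn.execute("INSERT INTO docs (path, body) VALUES (?, ?)", (str(path), body))
    conn.commit()
    return conn

def search_snippets(conn, query, limit=10):
    # snippet() returns a short excerpt around the match; column 1 is 'body'.
    sql = (
        "SELECT path, snippet(docs, 1, '[', ']', '...', 8) "
        "FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (query, limit)).fetchall()

# Small demo on throwaway files.
with tempfile.TemporaryDirectory() as root:
    Path(root, "a.txt").write_text("Pandoc converts documents between markup formats.", encoding="utf-8")
    Path(root, "b.txt").write_text("SQLite stores the whole index in one file.", encoding="utf-8")
    conn = rebuild_index(":memory:", root)
    hits = search_snippets(conn, "index")
    for path, excerpt in hits:
        print(path, "->", excerpt)
    conn.close()
```

A front end would then only need to run `search_snippets` and render the excerpts; how to do that without a local web server is a separate question this sketch does not answer.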

@mhoye @4censord by self hosted webpage without a web server, do you mean without some traditional webserver like Apache/nginx? Or no web server at all?

And if so, should the generated webpage hold all the data in it, to be searched via JavaScript in the page itself?

@mhoye Aside from being (I am reasonably certain) a line from Red Dwarf, you make a very good point.

When my parents moved from Windows to Linux some years back, it was not open source or Creative Commons that convinced them it was a good move, it was cost, ease of use and ease of maintenance.

@plwt Ubuntu no longer has the word "human" on their homepage and I put a few seconds into feeling sad about that every other week.
@mhoye True, but all is not completely lost - https://ubuntu.com/community I understand that they are looking to rejuvenate their community and there is certainly one person there (I was fortunate to meet before they joined Canonical) who has the experience to see that happen.
@mhoye It's difficult to know without knowing more about the application and data you have in mind.
@mhoye If you just have a load of plain text files that aren't huge you may get on just fine with *nix terminal tools or PowerShell. It may not be worth the hassle of a separate index. But if you are asking the question, I doubt it's that simple.
@mhoye My one person / one computer search toolbox:
- Just use egrep, or an equivalent linear search tool.
- If egrep is too slow, it might be due to searching lots of separate files, because filesystems are slow. Try pre-processing by concatenating all the files into one.
So far I have not needed a third tool.
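
The two steps above can be sketched in Python for anyone without egrep to hand: concatenate everything into one file, then scan it linearly with a regular expression. The tab-separated record layout and the names `concatenate` and `linear_search` are assumptions made for this example, and it presumes file names contain no tab characters.

```python
import re
import tempfile
from pathlib import Path

def concatenate(doc_root, out_file, pattern="*.txt"):
    """Copy every matching file into one big file, one record per source line."""
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(Path(doc_root).rglob(pattern)):
            text = path.read_text(encoding="utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                # Record layout (assumed): file name, line number, line text.
                out.write(f"{path.name}\t{number}\t{line}\n")

def linear_search(big_file, regex):
    """Scan the combined file once, matching only against the line text."""
    matcher = re.compile(regex)
    hits = []
    with open(big_file, encoding="utf-8") as handle:
        for record in handle:
            source, number, line = record.rstrip("\n").split("\t", 2)
            if matcher.search(line):
                hits.append((source, int(number), line))
    return hits

# Small demo on throwaway files; the combined file's name does not match *.txt.
with tempfile.TemporaryDirectory() as root:
    Path(root, "a.txt").write_text("first line\ngrep is fast\n", encoding="utf-8")
    Path(root, "b.txt").write_text("indexes cost effort\n", encoding="utf-8")
    combined = Path(root, "combined.tsv")
    concatenate(root, combined)
    hits = linear_search(combined, r"fast|effort")
    print(hits)
```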
@jef @mhoye back in the day I’d have said glimpse, https://manpages.ubuntu.com/manpages/focal/man1/glimpse.1.html but it doesn’t look like there’s any active maintenance
@dan131riley @jef Glimpse, Whoosh and a few others have been mostly abandoned, it looks like.
@jef I could gin something up with egrep without a ton of effort, sure, but that fundamentally presupposes that I already know what I'm looking for, as represented on disk in string form. I'm also interested in ease of access for people who haven't been neck deep in the shell for decades, but maybe more importantly, in _casual_ discoverability. I mean, who just opens a dictionary, looks up the word they were after and closes it again?
@jef @mhoye I use grep or ripgrep a lot of the time but it really sucks for the case of "find these three words in any order in the same paragraph" and I would like to do things like that sometimes.
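
For the "these three words in any order in the same paragraph" case, a rough sketch without any index at all: split the text into paragraphs and keep those whose word set contains every wanted word. It assumes paragraphs are separated by blank lines and that simple `\w+` tokenization is good enough; `paragraphs_with_all` is a name invented here.

```python
import re

def paragraphs(text):
    # Assumption: paragraphs are separated by one or more blank lines.
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def paragraphs_with_all(text, words):
    """Return paragraphs containing every word, in any order, case-insensitively."""
    wanted = {w.lower() for w in words}
    hits = []
    for para in paragraphs(text):
        tokens = set(re.findall(r"\w+", para.lower()))
        if wanted <= tokens:  # every wanted word appears somewhere in the paragraph
            hits.append(para)
    return hits

sample = (
    "The quick brown fox.\nIt jumps over the dog.\n\n"
    "A lazy dog sleeps.\n\n"
    "Fox and dog: a brown pair,\nquick to run."
)
matches = paragraphs_with_all(sample, ["dog", "quick", "brown"])
print(len(matches))
```

This is still a linear scan, so it trades speed for simplicity; an index such as FTS5 answers the same kind of query if each paragraph is stored as its own row.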
@mhoye if it's data usable by a single person, sqlite is almost always the easiest choice and is usually good enough. for text indexing you'd need a fts5 table, i've never used this extension though so take this advice with a grain of salt.
@Pashhur @mhoye I've used FTS5 and its predecessors, they take just a little bit of schema creation work but then they just work perfectly, speedily, every time. You're offered a lot of very rich query functionality but you don't actually need to learn or use any of it to get good results out of the box.
@mhoye I have been happy with Xapian, as visible in email clients/backends like notmuch. Inspired by sup, although I'm not so confident that I know what their reverse-index library is.
@mhoye A friend of mine built minisearch, which might fit what you're looking for: https://github.com/lucaong/minisearch
@mhoye Have a look at Recoll - www.recoll.org. Linux or Windows.
@alastair @mhoye Xapian (the back end for recoll) looks potentially interesting, will have to take a look at it...tx!
@mhoye When I see software that touts itself as scalable, that makes me think that it can easily grow with whatever the user needs. Works great for big organizations. But when the software is shrunk down to the size of the user's needs and it doesn't work as intended? Then it may not be scalable after all.
@mhoye the Quickwit marketing text says all the enterprise things you dislike but the docs say that it's a single binary you can download and run on your machine: https://quickwit.io/
@tedmielczarek @mhoye this works super well on a single workstation.

The component that makes it work is also open source: https://github.com/quickwit-oss/tantivy

@tedmielczarek @mhoye Based on the same underlying indexer (Tantivy), this one has an HTTP API: https://github.com/lnx-search/lnx
@tedmielczarek @mhoye Tantivy does look useful (and might save me from trying to revive glimpse in my retirement years). I should confess that at work I do run a full ELK stack...

@mhoye

I read part of your sentence as 'polytopic trepanation' and I thought of AI scraping data straight from the source, and I will be under my bed, crying softly if you don't mind.

@mhoye I'm all in on “lightweight alternatives to Elasticsearch / Solr”. Let's see if I starred something useful…
https://github.com/valeriansaliou/sonic (Rust)
https://github.com/zincsearch/zincsearch (Go)
https://github.com/askorama/orama (JS)
https://github.com/CloudCannon/pagefind (specifically for static sites)
https://github.com/kbrsh/wade (Rust, library like Lucene)
@jnv @mhoye Does Meilisearch (https://github.com/meilisearch/meilisearch) fit into this?
GitHub - projectEndings/staticSearch: A codebase to support a pure JSON search engine requiring no backend for any XHTML5 document collection

@mhoye Would you consider SQLite? I use its full text search for some projects.
https://www.sqlite.org/fts5.html
@mfinkle I'm strongly inclined that way, definitely.