@nobody Thanks for the suggestion, but I think that's going to be too slow. I'm using phrase prefix hashes, so lookups are O(1)-ish unless there's a typo. The plan is to detect typos by getting too few good results and then falling back on a Levenshtein distance search, with a maximum distance of 1 for queries under 4 characters and 2 for 4 characters or more.
This gives me a lookup time of 1-4 ms with 25k entries (an entry being a sentence plus a word) and around 900k phrase prefix hashes, on a 15-year-old notebook.
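Roughly, the lookup strategy above could be sketched like this (a minimal illustration only; the names `lookup`, `prefix_index`, and `max_distance` are made up for the example, not my actual API):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def max_distance(query: str) -> int:
    # Distance 1 for queries under 4 characters, 2 for 4 or more.
    return 1 if len(query) < 4 else 2

def lookup(query: str, prefix_index: dict, words: list, min_results: int = 1):
    # Fast path: O(1)-ish hit on the phrase prefix hash table.
    hits = prefix_index.get(query, [])
    if len(hits) >= min_results:
        return hits
    # Fallback: fuzzy scan with the length-dependent distance cap.
    cap = max_distance(query)
    return [w for w in words if levenshtein(query, w) <= cap]
```

The point of the length-dependent cap is that a distance of 2 on a 3-character query would match almost anything, so short queries get the stricter limit.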
@nobody ah! It's Omni Box search :)
That's currently the performance with a 100k dataset, which results in 3.69M phrase prefixes that need to be indexed by the hashtable :)
@nobody finished my performance rewrite today.
It's now 39.8x faster in my hardcore test.
The hardcore test feeds in enough entries to basically rewrite the whole db up to 2 times, shuts the db down, opens it again, and validates every single entry with a search.
Adding 20 million entries, searching 18.7 million entries, and doing 751 cold starts now takes 33.9 minutes instead of 22.5 hours :)
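For anyone curious, the shape of that test loop is roughly this (a toy sketch: the `Db` class here is a stand-in that just persists a dict across "restarts", not my actual database):

```python
class Db:
    """Toy stand-in for the real db: a persistent set we can close and reopen."""
    storage = {}                              # simulates data left on disk

    def __init__(self):                       # "cold start": reload from disk
        self.data = dict(Db.storage)

    def add(self, key):
        self.data[key] = True

    def search(self, key):
        return key in self.data

    def close(self):                          # flush state back to "disk"
        Db.storage = dict(self.data)

def hardcore_test(n_entries=1000, n_cold_starts=3):
    entries = [f"entry-{i}" for i in range(n_entries)]
    db = Db()
    for e in entries:                         # feed entries (rewrites the db)
        db.add(e)
    db.close()
    for _ in range(n_cold_starts):            # shut down, reopen, validate all
        db = Db()
        assert all(db.search(e) for e in entries)
        db.close()
    return True
```

The real run does this with 20 million adds and 751 cold starts, which is why the wall-clock numbers above are in minutes and hours rather than milliseconds.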