The #sqlite3 docs say that indexed by should not be used for query design -- only to prevent regressions. But what is the alternative? I have a #database where most queries are similar to the following:

select count(distinct tags.file)
from tags indexed by tags_tag_nocase
join meta on tags.file = meta.file
where meta.trash = 0
  and (tags.tag = 'foo' or tags.tag like 'foo.%')

Without any indexed by clause, the best I can get with this query is about 170ms. With the clause shown, it takes about 80ms. I've also tried peppering in likely() and unlikely() hints, but that wasn't an improvement

Some key facts about the data:

  • about 64K rows in tags, and 10K in meta
  • At most 10% of rows in tags can be expected to fulfill either of the tags.tag clauses
  • 90% of rows in meta have meta.trash=0
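For anyone who wants to poke at this: here's a rough repro sketch comparing the plans SQLite picks with and without the hint. The schema is guessed from the query and the index name (I'm assuming `tag` is declared COLLATE NOCASE; adjust to taste):

```python
# Repro sketch -- schema and index definition are my assumptions,
# not the real DB. EXPLAIN QUERY PLAN shows which index each plan uses.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tags (file TEXT, tag TEXT COLLATE NOCASE);
    CREATE TABLE meta (file TEXT PRIMARY KEY, trash INTEGER);
    CREATE INDEX tags_tag_nocase ON tags (tag);
""")

sql = """
    SELECT count(DISTINCT tags.file)
    FROM tags {hint}
    JOIN meta ON tags.file = meta.file
    WHERE meta.trash = 0 AND (tags.tag = 'foo' OR tags.tag LIKE 'foo.%')
"""
for hint in ("", "INDEXED BY tags_tag_nocase"):
    print("hint:", hint or "(none)")
    for row in con.execute("EXPLAIN QUERY PLAN " + sql.format(hint=hint)):
        print("   ", row[-1])  # the plan detail text
```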

#sqlite

#sqlite3 #database #sqlite I have a strong preference for solutions I can make work with pypika: parts of the codebase aren't conducive to prepared statements, so pypika is what I use for query building. I haven't yet looked into how to make indexed by and likely/unlikely work with pypika

Side note: I swear I did something earlier today that made the query complete in 40ms (still with the indexed by) and I'm losing my marbles trying to recreate it

a correction:

There are some values of `foo` where much more than 10% of rows in `tags` match, but for these cases, the inclusion or exclusion of `indexed by` doesn't affect performance. But for the more common (and more restrictive) values of `foo`, it's still a major benefit

also I monkey-patched in an `indexed_by` builder method for PyPika tables

well

I ran ANALYZE. Now indexed by no longer affects performance, but the query is slower than it was with indexed by before: about 120ms
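(For reference: ANALYZE writes per-index row estimates into sqlite_stat1, which the planner reads when a connection loads the schema, so you can inspect what it recorded and, if it made things worse, delete those rows and reopen the connection to revert. A sketch, again with my guessed schema:)

```python
# Sketch: inspect (and revert) what ANALYZE recorded.
# Schema is a guess; point con at the real DB file instead.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tags (file TEXT, tag TEXT COLLATE NOCASE);
    CREATE INDEX tags_tag_nocase ON tags (tag);
    INSERT INTO tags VALUES ('a.txt', 'foo'), ('b.txt', 'bar');
""")

con.execute("ANALYZE")
# stat is "rows-in-table rows-per-distinct-key" for each index
stats = list(con.execute("SELECT tbl, idx, stat FROM sqlite_stat1"))
for tbl, idx, stat in stats:
    print(tbl, idx, stat)

# Revert to pre-ANALYZE planning: clear the stats, then reopen the
# connection so the planner reloads the schema without them.
con.execute("DELETE FROM sqlite_stat1")
con.commit()
```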

I might scream

#sqlite3 #database