I don't have too many biology followers here, but I'm restarting some of my DNA research, specifically into archaea. Recent archaeal discoveries seem to be scattered all over the place. Is anyone aware of a recent overview piece, specifically on Lokiarchaeota and on whether we are entering a post-Woese period? Thanks!
So sometimes you get these quick wins where suddenly everything works out. By combining my DNA and SQLite tooling, I was able to use the SQLite *spelling* module to fuzzily identify common proteins. This one appears to be present in most #archaea and is only known as "DUF5786 family protein".
I'm a huge @duckdb fanboy, but even so it keeps amazing me. This is a 282 GB database of all bacterial genes & protein products. Here I ask DuckDB to find proteins *similar* to a specific one. It takes 9 seconds, over 282 GB, without indexes.
@bert_hubert @duckdb What function is that? Can you also query using that?
@michaelbarton @duckdb https://duckdb.org/docs/sql/functions/char.html#jaccards1-s2 has a list of text functions. You can query and sort your results on how close they are to some string.
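To make "sort your results on how close they are to some string" concrete, here is a minimal pure-Python sketch of the idea. It assumes a character-set definition of Jaccard similarity, so DuckDB's built-in `jaccard(s1, s2)` may differ in detail. The product names and the helper `rank_by_similarity` are made up for illustration and are not from the actual database.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the character sets of a and b, between 0.0 and 1.0.

    Assumption: similarity is computed over sets of characters and is
    case-sensitive. This is an illustration, not DuckDB's implementation.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # two empty strings are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def rank_by_similarity(names, target, limit=3):
    """Return up to `limit` names, most similar to `target` first (hypothetical helper)."""
    return sorted(names, key=lambda name: jaccard(name, target), reverse=True)[:limit]


# Hypothetical protein product names, for illustration only.
products = [
    "DUF5786 family protein",
    "DUF5787 family protein",
    "30S ribosomal protein S2",
    "hypothetical protein",
]

for name in rank_by_similarity(products, "DUF5786 family protein"):
    print(f"{jaccard(name, 'DUF5786 family protein'):.2f}  {name}")
```

In SQL the same shape would be a query that orders by the similarity function, descending, with a `LIMIT`.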
@bert_hubert @duckdb Ooh nice, thanks. Didn’t know about that. I heard you can write your own in Python too?