#clickhouse rocks

select
query, detectLanguage(query)
from mytable;

1. how to add typescript to javascript en
2. javascript and cookies en
3. r/javascript subreddit un
4. create grid javascript en
5. how to create a grid in javascript en
6. javascript clean architecture en

https://clickhouse.com/docs/sql-reference/functions/nlp-functions


Do you also want the probability with which the language was detected?
Then use `detectLanguageMixed` instead of `detectLanguage`:

select
detectLanguageMixed(query), query
from mytable;

1. {'en':0.97} how to add typescript to javascript
2. {'en':0.95} javascript and cookies
3. {} r/javascript subreddit
4. {'en':0.95} create grid javascript
5. {'en':0.97} how to create a grid in javascript

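Strictly speaking, `detectLanguageMixed` is meant for texts that mix several languages, and as far as I understand the docs the number is the share of the text in that language rather than a true probability. It is the only function I found that returns a number at all.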

Having all this built right into the #database (#clickhouse in this case) is amazing, I think.
I would still keep an eye on the performance impact, since calling `detectLanguage` on millions of rows in every query is probably not that efficient. I still have to read up on whether it does any caching.
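
One way to sidestep the per-query cost, whatever the caching story turns out to be, might be to compute the language once and store it. Below is a minimal sketch, not something I have benchmarked. It assumes `mytable` is a MergeTree table with a String column `query`; the column name `lang` is made up for illustration. As far as I know the NLP functions are still experimental and have to be switched on first, possibly in the server profile too so that background operations can evaluate them.

```sql
-- NLP functions are experimental and must be enabled explicitly.
SET allow_experimental_nlp_functions = 1;

-- Assumption: mytable is a MergeTree table with a String column `query`.
-- `lang` is a hypothetical column name; it is filled in at insert time.
ALTER TABLE mytable
    ADD COLUMN lang String MATERIALIZED detectLanguage(query);

-- Optionally compute the value for rows that already exist.
ALTER TABLE mytable MATERIALIZE COLUMN lang;

-- Later queries read the stored value instead of re-running detection.
SELECT lang, count() AS queries
FROM mytable
GROUP BY lang
ORDER BY queries DESC;
```

The trade-off is a bit of extra storage and slightly slower inserts in exchange for cheap reads, which seems reasonable when the same text gets classified over and over.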