BILD Citizens' Pranks & the End of the Fun Society

Theft does sometimes pay after all. Otherwise, though, it is pretty much forbidden. [Image: BILD-Zeitung] I no longer remember where I saw and photographed this while out and about. And I wonder what misery it is that people today are forced to steal a BILD am Sonntag, unless … We already experienced the end of the fun society nearly 20 years ago. I asked Google's Ngram Viewer for occurrences in books and periodicals. At the same time as […]

https://www.kritische-masse.de/logbuch/2026/03/bild-buergerstreiche-das-ende-der-spassgesellschaft/

Due to its rising popularity in formal use in many Indian documents, "erstwhile" may be less archaic than you might have supposed. #archaic #language #ngram

The #ngram exercises are also great for learning #ergol, but only once you have thoroughly drilled the muscle memory elsewhere.

A year after I first became able to touch-type sentences in ergol, I still struggle with sequences involving the ring finger and little finger, but only with the right hand.

It is probably a long-standing matter of hand support, from a drawing habit in which those fingers mostly served as a rest to steady my line.

Recently I've combined various functions that I've been using in other projects (e.g. my personal PKM toolchain) and published them as a new library, https://thi.ng/text-analysis, for better re-use:

- customizable, composable & extensible tokenization (transducer based)
- ngram generation
- Porter-stemming & stopword removal
- vocabulary (bi-directional index) creation
- dense & sparse multi-hot vector encoding/decoding
- histograms (incl. sorted versions)
- tf-idf (term frequency & inverse document frequency), multiple strategies
- k-means clustering (with k-means++ initialization & customizable distance metrics)
- similarity/distance functions (dense & sparse versions)
- central terms extraction

The attached code example (also in the project readme) uses this package to create a clustering of all ~210 #ThingUmbrella packages, based on their assigned tags/keywords...

The library is not intended to be a full-blown NLP solution, but I keep on finding myself running into these functions/concepts quite often, and maybe you'll find them useful too...
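For readers less familiar with some of the concepts in the feature list above, here is a rough Python sketch of n-gram generation, a vocabulary index, dense multi-hot encoding, and one common tf-idf variant. This is not the thi.ng/text-analysis API (that library is TypeScript); all function names and the sample keyword lists are made up for illustration, and the tf-idf formula shown is just one of several possible strategies.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters/digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous windows of n tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocab(docs):
    """Map each distinct token to an index (sorted for reproducibility)."""
    terms = sorted({t for d in docs for t in d})
    return {term: i for i, term in enumerate(terms)}


def multi_hot(tokens, vocab):
    """Dense multi-hot vector: 1 where the document contains the term."""
    vec = [0] * len(vocab)
    for t in tokens:
        if t in vocab:
            vec[vocab[t]] = 1
    return vec


def tf_idf(docs):
    """Assumed variant: tf = count / doc length, idf = log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        counts = Counter(d)
        scores.append(
            {t: (c / len(d)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


# Hypothetical keyword lists, invented for this example.
docs = [tokenize(s) for s in (
    "vector math geometry",
    "text tokenizer ngram",
    "text vector encoding",
)]
vocab = build_vocab(docs)
print(ngrams(docs[1], 2))
print(multi_hot(docs[0], vocab))
print(tf_idf(docs)[1])
```

Terms that occur in fewer documents (here "ngram" and "tokenizer") get a higher score than terms shared across documents (here "text"), which is the property that makes tf-idf vectors useful as input to clustering.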

#Text #Analysis #Cluster #KMeans #TFIDF #Ngram #Vector #TypeScript #JavaScript

Spider v0.9.0 released:

Updates:
- "-url-match" flag to filter URLs by keyword
- Several small bug fixes
- Go bumped to v1.24.3

https://forum.hashpwn.net/post/606

#infosec #spider #urlcrawl #hashpwn #wordlist #ngram

Fellow finicky writers: Do you prefer "advance notice" or "advanced notice"?

Both are attested. But FYI, #ngram says that "advance notice" is much more common, even if it's in decline.
https://books.google.com/ngrams/graph?content=advance+notice%2C+advanced+notice&year_start=1800&year_end=2022&corpus=en&smoothing=3
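If you want to run this kind of comparison for other phrase pairs, a link like the one above can be assembled programmatically. The following Python sketch only mirrors the query parameters visible in that link (`content`, `year_start`, `year_end`, `corpus`, `smoothing`); the helper name `ngram_url` is made up, and nothing here is an official Google API.

```python
from urllib.parse import urlencode

NGRAM_GRAPH_URL = "https://books.google.com/ngrams/graph"


def ngram_url(phrases, year_start=1800, year_end=2022, corpus="en", smoothing=3):
    """Build a Ngram Viewer graph link from a list of phrases.

    The defaults are simply the values seen in the link in the post.
    """
    params = {
        "content": ", ".join(phrases),  # phrases are comma-separated
        "year_start": year_start,
        "year_end": year_end,
        "corpus": corpus,
        "smoothing": smoothing,
    }
    # urlencode turns spaces into "+" and commas into "%2C".
    return NGRAM_GRAPH_URL + "?" + urlencode(params)


print(ngram_url(["advance notice", "advanced notice"]))
```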

🚀 Spider v0.8.0

New features include:

"-file" to generate n-grams from local plaintext files

"-timeout" for URL crawling

"-sort" to output n-grams by frequency

https://forum.hashpwn.net/post/52
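To illustrate the general idea behind generating n-grams from text and listing them by frequency, here is a small Python sketch. It is not Spider's code (Spider is written in Go) and does not reproduce its output format; the function names, the word-splitting rule, and the sample sentence are assumptions made for this example.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count word n-grams (joined with spaces) in a piece of text."""
    words = re.findall(r"\w+", text.lower())
    grams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return Counter(grams)


def sorted_by_frequency(counts):
    """Most frequent first; ties are broken alphabetically for stable output."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample text; for a local plaintext file, read its contents
# into a string first and pass that string instead.
sample = "the quick fox and the quick dog and the cat"
for gram, count in sorted_by_frequency(ngram_counts(sample, 2)):
    print(count, gram)
```

Sorting by frequency puts the most common phrases at the top of the list, which is presumably the point of ordering a wordlist this way.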

#spider #webcrawler #wordlist #ngram #infosec #hashcracking #golang #hashpwn

Listen no time to read: where do we put the comma?

You'll find out when you look under the cut. 😉 As a teaser: this is about ЮMoney's tool for transcribing audio from internal calls into text, and about something else for our customers. 😎👇

https://habr.com/ru/companies/yoomoney/articles/896096/

#whisper #llmмодели #искусственный_интеллект #ai #саммаризация #диаризация #идентификация #транскрибация_звонков #ngram

How we transcribe audio from internal calls into text

My name is Max, and I am an analyst at ЮMoney. Recently my team had two goals: ● Improve the quality of interaction between users and the business through analytics on audio data. ● Reduce the time...

Habr

#Google Books Is Indexing #AI-Generated Books

👉 #GoogleBooks is indexing low-quality, AI-generated books that will turn up in search results and could affect the Google #Ngram viewer, an important tool that researchers use to track #language use throughout history.

https://timesofindia.indiatimes.com/technology/tech-news/google-books-important-source-for-academics-may-have-a-bot-problem/articleshow/109089043.cms

#GoogleNgram #NgramViewer #linguistics #diachrony #diachroniclinguistics #research #languages #aigeneratedcontent #AIgeneratedBooks

Google Books, important source for academics, may have a ‘bot’ problem - Times of India

Google Books faces issues with low-quality AI books affecting Ngram viewer. Recent additions not impacting Ngram results but may in future updates. Go…

The Times of India

Google Books reportedly indexing bad AI-written works

Google Books indexes published content dating back to the 1500s. A report found it began indexing AI-written work that could impact the language research tool Ngram.

The Verge