🚀 TopicWatchdog – Week 3: Stable Topics with BERTopic

KMeans worked, but cluster IDs kept jumping across retrains. This week I added a Python BERTopic stage with a BigQuery registry → stable topic IDs!

🟢 UMAP + HDBSCAN
🟢 Stable IDs via registry
🟢 Auto-labels with Gemini
🟢 Looker Studio dashboards

📊 3,802 topics → 2,472 mapped; top clusters: migration, economy, climate, politics.

👉 Blog: https://dracoblue.net/dev/topicwatchdog-stable-topics-with-bertopic/

#TopicWatchdog #BERTopic #BigQuery #Clustering #MachineLearning #FediScience

Week 3: Stable Topics with BERTopic (dracoblue.net)

In Week 1 (extraction) and Week 2 (embeddings + KMeans in BigQuery ML) we laid the groundwork. This week I built a Python BERTopic stage whose IDs stay stable across runs by mapping BERTopic’s internal clusters to stable topic IDs in BigQuery. I use Go...
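The post does not show how BERTopic's internal clusters are mapped to stable IDs. One minimal way to picture such a registry is sketched here. It assumes that each run produces one centroid embedding per BERTopic topic and that the registry (a BigQuery table in the post, a plain dict here) stores one centroid per stable ID. `assign_stable_ids`, the 0.8 cosine threshold, greedy matching and the `topic_NNNN` naming are all hypothetical, not the author's implementation.

```python
# Hypothetical sketch, not the TopicWatchdog implementation: keep topic IDs
# stable across retrains by matching each run's topic centroids against a
# registry of previously seen centroids (a BigQuery table in the post,
# a plain dict here).
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def assign_stable_ids(
    run_centroids: Dict[int, Vector],
    registry: Dict[str, Vector],
    threshold: float = 0.8,  # assumed cut-off, would need tuning on real data
) -> Dict[int, str]:
    """Map this run's internal topic IDs to stable IDs, updating `registry`.

    Internal topic -1 (BERTopic's outlier topic) is skipped.
    """
    topics = {t: v for t, v in run_centroids.items() if t != -1}

    # Score every (internal, stable) pair and match greedily, best first,
    # so each stable ID is claimed by at most one topic per run.
    pairs: List[Tuple[float, int, str]] = sorted(
        ((cosine(vec, ref), tid, sid)
         for tid, vec in topics.items()
         for sid, ref in registry.items()),
        reverse=True,
    )
    mapping: Dict[int, str] = {}
    claimed: Set[str] = set()
    for score, tid, sid in pairs:
        if score < threshold:
            break  # all remaining pairs score below the threshold
        if tid in mapping or sid in claimed:
            continue
        mapping[tid] = sid
        claimed.add(sid)

    # Topics without a good match become new registry entries.
    counter = len(registry)
    for tid in sorted(topics):
        if tid in mapping:
            continue
        while True:
            counter += 1
            new_id = f"topic_{counter:04d}"
            if new_id not in registry:
                break
        registry[new_id] = topics[tid]
        mapping[tid] = new_id
    return mapping
```

Calling it a second time with the same centroids returns the same mapping and adds no new entries, which is the stability property the post is after. A real version would probably also refresh matched centroids (for example with a moving average) so that slowly drifting topics keep their ID. That step is left out here.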

"An automatic customer service agent: the tricky parts of building an NLP pipeline" by Davide Arella & Alessandro Ercolani #PagoPA

#Codemotion #CodemotionMilan23

#AutomaticCustomerCare #NLP #AI #BERTopic #RAG

🎉 Thrilled to unveil an integration between BERTopic and the @huggingface hub! #BERTopic is a cutting-edge topic modelling library by @MaartenGr.

You can now train a topic model and share it on the Hugging Face hub in a few lines of code!

Read more: https://huggingface.co/blog/bertopic

Introducing BERTopic Integration with the Hugging Face Hub

Does anyone have example code showing how to approach #semisupervised #topicmodelling with #BERTopic on a dataset of articles and book-length documents, for document-level #classification and analysis? I am interested in analyzing how document topics change over time.

https://maartengr.github.io/BERTopic/getting_started/semisupervised/semisupervised.html

Semi-supervised Topic Modeling - BERTopic

Leveraging BERT and a class-based TF-IDF to create easily interpretable topics.
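Here is a minimal sketch of the data preparation around the linked guide, under stated assumptions. Book-length texts are split into fixed-size word chunks, on the assumption that embedding models truncate long inputs. Each chunk inherits its document's label, or -1 when the label is unknown (the convention the semi-supervised guide uses). Chunk topics are then folded back into one topic per document by majority vote. `Doc`, `chunk_docs`, `doc_topics`, `topic_share_by_year` and the 200-word chunk size are hypothetical. The `chunks` and `y` lists are what you would pass to BERTopic's `fit_transform(chunks, y=y)`. In the usage, a made-up topic list stands in for the fitted model's output.

```python
# Hypothetical helpers around semi-supervised BERTopic on long documents.
# Assumption: chunking and majority voting are one possible approach,
# not something prescribed by BERTopic itself.
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Doc:
    doc_id: str
    text: str
    year: int
    label: Optional[int] = None  # known class index, or None if unlabeled


def chunk_docs(docs: List[Doc], max_words: int = 200):
    """Split docs into word chunks; return parallel lists for fitting."""
    chunks, y, years, owners = [], [], [], []
    for d in docs:
        words = d.text.split()
        for start in range(0, max(len(words), 1), max_words):
            chunks.append(" ".join(words[start:start + max_words]))
            # Semi-supervised BERTopic uses -1 for "label unknown".
            y.append(d.label if d.label is not None else -1)
            years.append(d.year)
            owners.append(d.doc_id)
    return chunks, y, years, owners


def doc_topics(owners: List[str], chunk_topics: List[int]) -> Dict[str, int]:
    """Majority vote of chunk topics per document, ignoring outliers (-1).

    Documents whose chunks are all outliers are left out of the result.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for owner, topic in zip(owners, chunk_topics):
        if topic != -1:
            votes[owner][topic] += 1
    return {owner: c.most_common(1)[0][0] for owner, c in votes.items()}


def topic_share_by_year(doc_topic: Dict[str, int],
                        doc_year: Dict[str, int]) -> Dict[int, Dict[int, float]]:
    """Fraction of documents assigned to each topic, per year."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for doc_id, topic in doc_topic.items():
        counts[doc_year[doc_id]][topic] += 1
    return {year: {t: n / sum(c.values()) for t, n in c.items()}
            for year, c in sorted(counts.items())}
```

BERTopic also has a built-in `topics_over_time` method, which bins topic frequencies by timestamp at the chunk level. The per-year share helper is a simpler document-level view once each document has a single topic.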

One week on Mastodon and it feels good; it seems great for geeking out about #geospatial #geoviz #dataviz #coding.

I would like to connect further with people in #NLP #NaturalLanguageProcessing #NLPTransformers #TopicModelling #BERTopic.

Also I would like to connect with #Geospatial people in #Australia.

Pass me a follow 🤓👋