TopicWatchdog, Week 3: Stable Topics with BERTopic
KMeans worked, but cluster IDs kept jumping across retrains. This week I added a Python BERTopic stage with a BigQuery registry → stable topic IDs!
🟢 UMAP + HDBSCAN
🟢 Stable IDs via registry
🟢 Auto-labels with Gemini
🟢 Looker Studio dashboards
3,802 topics → 2,472 mapped; top clusters: migration, economy, climate, politics.
Blog: https://dracoblue.net/dev/topicwatchdog-stable-topics-with-bertopic/
#TopicWatchdog #BERTopic #BigQuery #Clustering #MachineLearning #FediScience
Week 3: Stable Topics with BERTopic (dracoblue.net)
In Week 1 (extraction) and Week 2 (embeddings + KMeans in BigQuery ML) we laid the groundwork. This week I built a Python BERTopic stage whose IDs stay stable across runs by mapping BERTopic's internal clusters to stable topic IDs in BigQuery. I use Go...
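
To make the registry idea concrete, here is a minimal sketch of one way such a mapping could work. It is not taken from the blog post: it assumes each cluster is represented by a centroid embedding, that a cluster matches a known topic when cosine similarity clears a threshold, and that an in-memory dict stands in for the BigQuery registry table. The name `assign_stable_ids` and the `threshold` value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_stable_ids(run_centroids, registry, threshold=0.85):
    """Map this run's cluster IDs to stable topic IDs.

    run_centroids: {run_cluster_id: centroid} from the current retrain.
    registry:      {stable_id: centroid}; a stand-in for the registry table
                   (assumption), updated in place when a new topic appears.
    """
    mapping = {}
    claimed = set()  # each stable ID is handed out at most once per run
    next_id = max(registry, default=0) + 1
    for cluster_id, centroid in run_centroids.items():
        best_id, best_sim = None, threshold
        for stable_id, known in registry.items():
            if stable_id in claimed:
                continue
            sim = cosine(centroid, known)
            if sim >= best_sim:
                best_id, best_sim = stable_id, sim
        if best_id is None:
            # No sufficiently similar known topic: register a new stable ID.
            best_id = next_id
            next_id += 1
            registry[best_id] = centroid
        claimed.add(best_id)
        mapping[cluster_id] = best_id
    return mapping


if __name__ == "__main__":
    registry = {1: [1.0, 0.0], 2: [0.0, 1.0]}
    # The retrain numbered its clusters differently, but the centroids barely moved.
    this_run = {0: [0.1, 0.9], 1: [0.95, 0.05]}
    print(assign_stable_ids(this_run, registry))
```

The matching here is greedy in iteration order, which keeps the sketch short; a real pipeline might prefer a globally optimal assignment and would persist the registry between runs.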
