Akshay (@akshay_pachaar)

Flash-KMeans, an IO-aware implementation for running exact K-means fast on GPUs, has been released. It is designed to directly attack the memory bottlenecks of modern GPUs, and is reported to achieve up to a 30x speedup over cuML and 200x over other baselines.

https://x.com/akshay_pachaar/status/2035036758170378307

#kmeans #gpu #optimization #flashkmeans #machinelearning

Akshay 🚀 (@akshay_pachaar) on X

K-Means is simple. Making it fast on GPU isn't. Flash-KMeans is an IO-aware implementation of exact k-means that rethinks the algorithm around modern GPU bottlenecks. By attacking the memory bottlenecks directly, Flash-KMeans achieves: - 30x speedup over cuML - 200x speedup

🤖: Spoiler Alert! Flash-KMeans promises to be a "memory-efficient" magic trick, unless you count the mental gymnastics required to understand it. 🤯 Just what the world needs, another K-Means #variant to make your brain cells do a triple axel! 🧠💥
https://arxiv.org/abs/2603.09229 #FlashKMeans #MemoryEfficient #KMeans #DataScience #MachineLearning #AI #HackerNews #ngated
Flash-KMeans: Fast and Memory-Efficient Exact K-Means

$k$-means has historically been positioned primarily as an offline processing primitive, typically used for dataset organization or embedding preprocessing rather than as a first-class component in online systems. In this work, we revisit this classical algorithm under the lens of modern AI system design and enable $k$-means as an online primitive. We point out that existing GPU implementations of $k$-means remain fundamentally bottlenecked by low-level system constraints rather than theoretical algorithmic complexity. Specifically, the assignment stage suffers from a severe IO bottleneck due to the massive explicit materialization of the $N \times K$ distance matrix in High Bandwidth Memory (HBM). Simultaneously, the centroid update stage is heavily penalized by hardware-level atomic write contention caused by irregular, scatter-style token aggregations. To bridge this performance gap, we propose flash-kmeans, an IO-aware and contention-free $k$-means implementation for modern GPU workloads. Flash-kmeans introduces two core kernel-level innovations: (1) FlashAssign, which fuses distance computation with an online argmin to completely bypass intermediate memory materialization; (2) sort-inverse update, which explicitly constructs an inverse mapping to transform high-contention atomic scatters into high-bandwidth, segment-level localized reductions. Furthermore, we integrate algorithm-system co-designs, including chunked-stream overlap and cache-aware compile heuristics, to ensure practical deployability. Extensive evaluations on NVIDIA H200 GPUs demonstrate that flash-kmeans achieves up to 17.9$\times$ end-to-end speedup over best baselines, while outperforming industry-standard libraries like cuML and FAISS by 33$\times$ and over 200$\times$, respectively.
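The abstract's two kernel ideas can be illustrated in plain NumPy. This is a CPU-level sketch of the concepts only: the function names, chunk size, and the prefix-sum segment reduction are my own illustration, not the paper's CUDA kernels.

```python
import numpy as np

def assign_chunked(X, C, chunk=1024):
    # Fused-assignment idea: stream points in chunks so the full
    # N x K distance matrix is never materialized at once, mirroring
    # the IO-aware motivation behind FlashAssign.
    N = X.shape[0]
    labels = np.empty(N, dtype=np.int64)
    c_sq = (C * C).sum(axis=1)                  # ||c||^2 per centroid
    for s in range(0, N, chunk):
        xb = X[s:s + chunk]
        # argmin over ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2;
        # ||x||^2 is constant per row, so it can be dropped
        labels[s:s + chunk] = (c_sq - 2.0 * xb @ C.T).argmin(axis=1)
    return labels

def update_sorted(X, labels, K):
    # Contention-free-update idea: sort points by cluster id and
    # reduce contiguous segments (here via prefix sums) instead of
    # doing scatter-adds, mirroring the sort-inverse update.
    order = np.argsort(labels, kind="stable")
    Xs, ls = X[order], labels[order]
    csum = np.vstack([np.zeros((1, X.shape[1])), np.cumsum(Xs, axis=0)])
    starts = np.searchsorted(ls, np.arange(K), side="left")
    ends = np.searchsorted(ls, np.arange(K), side="right")
    sums = csum[ends] - csum[starts]            # per-cluster sums
    counts = np.bincount(labels, minlength=K)
    return sums / np.maximum(counts, 1)[:, None]
```

On a GPU the chunking corresponds to keeping tiles of the distance computation in on-chip memory, and the segment reduction replaces atomic writes with coalesced ones; the NumPy version only shows the data-flow restructuring.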


@zefu I find the tool works best for images with a decent contrast and/or color hue range. I also recommend not choosing more than 5-8 colors to avoid too many similar ones. Also bear in mind that k-means clustering relies on random initializations and so running the process multiple times for the same image can lead to slightly different results (just press "update" a few times and see if there're any decent changes)...

Another tip: I personally like palettes which also include some desaturated colors, so try reducing the "min chroma" slider value (a change will recompute automatically). If you only want richer colors, bump up the value, but it really all depends on the image... The two variations attached here use min chroma 5 and 0...
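For anyone curious how such a palette falls out of k-means: here's a minimal Lloyd's-iteration sketch over RGB pixels in NumPy. It's illustrative only, not the thi.ng/pixel-dominant-colors implementation; the function name, parameters, and the optional explicit-init argument are my own. With a random seed, reruns give slightly different palettes, as described above.

```python
import numpy as np

def dominant_colors(pixels, k=5, iters=20, seed=0, init=None):
    # pixels: (N, 3) array of RGB values.
    # Plain Lloyd's k-means: alternate assigning each pixel to its
    # nearest centroid and moving each centroid to the mean of its
    # assigned pixels. `init` lets you pass explicit start centroids;
    # otherwise k distinct pixels are sampled at random.
    rng = np.random.default_rng(seed)
    if init is None:
        init = pixels[rng.choice(len(pixels), size=k, replace=False)]
    centroids = np.asarray(init, dtype=float).copy()
    labels = np.zeros(len(pixels), dtype=np.int64)
    for _ in range(iters):
        # squared distance from every pixel to every centroid
        d = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):                    # skip empty clusters
                centroids[j] = members.mean(axis=0)
    return centroids, labels
```

The returned centroids are the palette colors; sorting clusters by member count gives the "dominance" ordering.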

https://demo.thi.ng/umbrella/dominant-colors/

#ThingUmbrella #DominantColors #KMeans

@zefu I should update the readme to explain how these palettes were created. They're a manually curated selection from running hundreds of images through this tool (it doesn't look like much, but it's been super helpful over the years) and then handpicking my favorites:

https://demo.thi.ng/umbrella/dominant-colors/

This uses k-means clustering for segmentation, also available as library:

https://thi.ng/pixel-dominant-colors

#ThingUmbrella #Color #KMeans #Tool

K-means by another means - Hackin’ and Tinkerin’

Success! There is machine learning happening on my Apple ][+.

Hackin' and Tinkerin'

Testing the Water presents: k means b2b Rosa, James Marrs (Live), Mag @ Spanners - 12 Sep feat. k means, Mag (4)

#SESH #kmeans #Mag4

https://sesh.sx/events/12208918

PTS ϟ DJ Paypal, Yokel, k means x i-sha @ Strange Brew - 23 Aug feat. DJ Paypal, k means, i-sha

#SESH #DJPaypal #kmeans #isha

https://sesh.sx/events/12207092

HyAB k-means for color quantization

Recently I've combined various functions which I've been using in other projects (e.g. my personal PKM toolchain) and published them as new library https://thi.ng/text-analysis for better re-use:

- customizable, composable & extensible tokenization (transducer based)
- ngram generation
- Porter-stemming & stopword removal
- vocabulary (bi-directional index) creation
- dense & sparse multi-hot vector encoding/decoding
- histograms (incl. sorted versions)
- tf-idf (term frequency & inverse document frequency), multiple strategies
- k-means clustering (with k-means++ initialization & customizable distance metrics)
- similarity/distance functions (dense & sparse versions)
- central terms extraction
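The k-means++ seeding mentioned in the list above is simple to sketch: the first centre is chosen uniformly at random, and each further centre is drawn with probability proportional to its squared distance from the nearest centre picked so far. A generic NumPy sketch (my own illustration, not the thi.ng/text-analysis code):

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    # k-means++ seeding: spread the initial centres out by sampling
    # each new centre proportionally to its squared distance from
    # the closest already-chosen centre.
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        C = np.asarray(centers)
        # squared distance from every point to its nearest centre
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).min(axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.asarray(centers)
```

Compared to uniform random initialization, this makes the "slightly different results each run" effect much less dramatic, since far-apart clusters almost always receive a seed each.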

The attached code example (also in the project readme) uses this package to create a clustering of all ~210 #ThingUmbrella packages, based on their assigned tags/keywords...

The library is not intended to be a full-blown NLP solution, but I keep running into these functions/concepts quite often, and maybe you'll find them useful too...

#Text #Analysis #Cluster #KMeans #TFIDF #Ngram #Vector #TypeScript #JavaScript

Playing around with features for the next version of #chromamagic. Producing a reduced palette that's actually useful for painters is trickier than you might think. #octtree #kmeans #imagequantization #oilpainting #watercolor