Given how vocal I am against the "AI" industry, some of my followers might be surprised to learn that I'm now a co-author on a machine learning paper.
That paper has been submitted to the proceedings of an upcoming conference under their "Responsible AI" track, but it has nothing to do with LLMs or really anything that has recently been pushed by the industry's hype-machine. While formal review is pending, a pre-print, "Tiny, Hardware-Independent, Compression-based Classification", is available on arXiv.
Our paper expands on a technique I've been using to classify my emails for more than two years called "NCD-KNN" (Normalized Compression Distance with K-Nearest Neighbours). This method uses commonly available compression utilities like GZIP to estimate the relative "distance" between an input and a set of labeled examples, ultimately categorizing that input according to the labels of the K-nearest examples.
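To give a flavour of the idea (this is a minimal sketch of the general NCD-KNN technique, not the paper's implementation; the example texts and labels are made up), here's how GZIP can drive a classifier in a few lines of Python:

```python
import gzip
from collections import Counter

def ncd(x: bytes, y: bytes) -> float:
    # Normalized Compression Distance:
    #   NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    # where C(.) is the compressed size, here via gzip.
    cx = len(gzip.compress(x))
    cy = len(gzip.compress(y))
    cxy = len(gzip.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(sample: str, examples: list[tuple[str, str]], k: int = 3) -> str:
    # Label `sample` by majority vote among its k nearest labeled examples.
    s = sample.encode()
    distances = sorted((ncd(s, text.encode()), label) for text, label in examples)
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

examples = [
    ("congratulations you have won a free prize claim your money now", "spam"),
    ("claim your free money now you have won a prize congratulations", "spam"),
    ("the quarterly meeting is scheduled for monday please review the agenda", "work"),
    ("please review the agenda for the quarterly meeting on monday", "work"),
]
print(classify("you have won free money claim your prize", examples))
```

The intuition: if two texts share a lot of structure, compressing their concatenation gains more from back-references than compressing two unrelated texts, so the distance comes out smaller. Note that naive NCD values from real compressors are only approximations and can misbehave at the edges, which is part of what the paper addresses.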
We solved a fundamental problem that could produce negative distances under specific circumstances we identified, addressed other theoretical limitations that prevented broader use, and extended NCD to non-linear classification with Support Vector Machines (SVMs).
My co-authors are not on Fedi, but if any of this interests you, feel free to Ask Me Anything.
#AMA #ML #AI #SVM #NCD #KNN