A charming SVM report: a newcomer (Jeanne‑Marie) discovers an EeePC running GNU/Linux. Gentle pacing, surprises, and little tips for getting started. Worth a watch if you enjoy discovery, free software, and simple hardware! #EeePC #GNULinux #Linux #OpenSource #Framasoft #SVM #French
https://peertube.giftedmc.com/videos/watch/3a63563b-1f08-4145-9c8c-d7b21aabfe08
EeePC on GNU/Linux

PeerTube

Given how vocal I am against the "AI" industry, some of my followers might be surprised to learn that I'm now a co-author on a machine learning paper.

That paper has been submitted to the proceedings of an upcoming conference under their "Responsible AI" track, but it has nothing to do with LLMs or really anything that has recently been pushed by the industry's hype-machine. A pre-print is available on arxiv.org ("Tiny, Hardware-Independent, Compression-based Classification") while its formal review is pending.

Our paper expands on a technique I've been using to classify my emails for more than two years called "NCD-KNN" (Normalized Compression Distance with K-Nearest Neighbours). This method uses commonly available compression utilities like GZIP to estimate the relative "distance" between an input and a set of labeled examples, ultimately categorizing that input according to the labels of the K-nearest examples.
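The method described above can be sketched in a few lines using only the standard library. This is a minimal illustration of the NCD-KNN idea, not the paper's implementation; the helper names (`clen`, `ncd`, `classify`) and the toy examples are mine.

```python
import gzip
from collections import Counter

def clen(data: bytes) -> int:
    # Compressed length C(x) under gzip.
    return len(gzip.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    # Normalized Compression Distance:
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    cx, cy, cxy = clen(x), clen(y), clen(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def classify(text: str, examples: list[tuple[str, str]], k: int = 3) -> str:
    # Label the input by majority vote among the k nearest labeled examples.
    dists = sorted(
        (ncd(text.encode(), sample.encode()), label)
        for sample, label in examples
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

examples = [
    ("meeting agenda for Tuesday standup", "work"),
    ("quarterly report draft attached", "work"),
    ("limited time offer, click now to win", "spam"),
    ("you have been selected for a free prize", "spam"),
]
print(classify("click here to claim your free prize now", examples))
```

Because gzip's fixed header overhead dominates on very short strings, real uses of this technique work better on longer inputs such as full email bodies.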

We solved some fundamental problems that could produce negative distances under specific circumstances we identified, addressed other theoretical limitations that prevented its broader use, and extended NCD to non-linear classification with Support Vector Machines (SVMs).
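One generic way to plug a compression distance into an SVM is to convert distances into similarities and pass them to the classifier as a precomputed kernel. The `exp(-gamma * d)` transform below is a common off-the-shelf trick, not necessarily the paper's construction, and all data and parameters here are illustrative.

```python
import gzip
import numpy as np
from sklearn.svm import SVC

def ncd(x: bytes, y: bytes) -> float:
    # Normalized Compression Distance using gzip as the compressor.
    cx = len(gzip.compress(x))
    cy = len(gzip.compress(y))
    cxy = len(gzip.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def ncd_kernel(A, B, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * NCD(A[i], B[j])).
    return np.exp(-gamma * np.array([[ncd(a, b) for b in B] for a in A]))

docs = [
    b"buy cheap pills now",
    b"win a free prize today",
    b"minutes from the board meeting",
    b"project status update attached",
]
labels = [0, 0, 1, 1]  # 0 = spam, 1 = work

# fit() takes the (n_train, n_train) Gram matrix; predict() takes
# an (n_test, n_train) matrix of similarities against the training set.
clf = SVC(kernel="precomputed")
clf.fit(ncd_kernel(docs, docs), labels)

pred = clf.predict(ncd_kernel([b"free pills, win now"], docs))
```

Note that NCD is not a true metric (as the paper shows), so the resulting kernel is not guaranteed to be positive semi-definite; in practice SVC still trains on it.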

My co-authors are not on Fedi, but if any of this interests you, feel free to Ask Me Anything.

#AMA #ML #AI #SVM #NCD #KNN

Tiny, Hardware-Independent, Compression-based Classification

The recent developments in machine learning have highlighted a conflict between online platforms and their users in terms of privacy. The importance of user privacy and the struggle for power over user data have been intensified as regulators and operators attempt to police online platforms. As users have become increasingly aware of privacy issues, client-side data storage, management, and analysis have become a favoured alternative to large-scale centralised machine learning. However, state-of-the-art machine learning methods require vast amounts of labelled user data, making them unsuitable for models that reside client-side and only have access to a single user's data. State-of-the-art methods are also computationally expensive, which degrades the user experience on compute-limited hardware and reduces battery life. A recent alternative approach has proven remarkably successful in classification tasks across a wide variety of data -- using a compression-based distance measure (called normalised compression distance) to measure the distance between generic objects in classical distance-based machine learning methods. In this work, we demonstrate that the normalised compression distance is actually not a metric; develop it for the wider context of kernel methods to allow modelling of complex data; and present techniques to improve the training time of models that use this distance measure. We demonstrate that the normalised compression distance works as well as, and sometimes better than, other metrics and kernels -- while requiring only marginally more computation and despite the lack of formal metric properties. The end result is a simple model with remarkable accuracy even when trained on a very small number of samples, allowing for models that are small and effective enough to run entirely on a client device using only user-supplied data.

arXiv.org

Use SVC for classes, SVR for numbers; Fortnite is harder.

#svm #machinelearning #gaming
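In scikit-learn terms, the quip above maps to two estimators with the same interface: SVC for discrete class labels, SVR for continuous targets. A minimal sketch with toy data (no claims about Fortnite):

```python
import numpy as np
from sklearn.svm import SVC, SVR

X = np.array([[0.0], [1.0], [2.0], [3.0]])

# Classification: targets are categories.
clf = SVC().fit(X, [0, 0, 1, 1])

# Regression: targets are numbers.
reg = SVR().fit(X, [0.0, 1.0, 2.0, 3.0])

print(clf.predict([[2.5]]))  # a class label
print(reg.predict([[2.5]]))  # a real number
```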

Why Andrej Karpathy uses SVM in 2026 (and you should too)

Hundreds of machine learning papers are published on arXiv every day. Reading them all is impossible, and missing something important is painful. Andrej Karpathy, former Director of AI at Tesla and co-creator of Stanford's CS231n course, solved this problem in an unexpected way. He did not pick BERT, GPT, or some fashionable transformer. He settled on good old SVM, an algorithm that has been around for decades. And you know what? It works so well that it is even used in academic systems. In this article, we break down how his solution works, why the "primitive" approach beats complex neural networks, and when you too should choose an SVM over a transformer. Let's dig in!
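The recipe described above, TF-IDF features plus a linear SVM trained on the papers a user has liked, can be sketched as follows. This mirrors the shape of the approach (as in Karpathy's arxiv-sanity), but the data, parameters, and variable names are all illustrative, not his actual code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

abstracts = [
    "support vector machines for text classification",
    "kernel methods and margin maximization",
    "diffusion models for image synthesis",
    "neural radiance fields for view synthesis",
]
liked = [1, 1, 0, 0]  # 1 = papers the user liked, 0 = everything else

# Represent each abstract as a sparse TF-IDF vector.
vec = TfidfVectorizer()
X = vec.fit_transform(abstracts)

# A linear SVM separating "liked" from the rest.
clf = LinearSVC(C=0.1).fit(X, liked)

# Rank new abstracts by decision score: higher means closer to the
# user's liked papers.
new = [
    "large margin classifiers and kernel theory",
    "generative diffusion image models",
]
scores = clf.decision_function(vec.transform(new))
```

The appeal of this setup is exactly what the article argues: it trains in milliseconds, is trivially interpretable through its feature weights, and needs no GPU.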

https://habr.com/ru/articles/990386/

#SVM #Andrej_Karpathy #TFIDF #machine_learning #Support_Vector_Machine #neural_networks #classification_algorithms

Why Andrej Karpathy uses SVM in 2026 (and you should too)

Hundreds of new machine learning papers appear on arXiv every day. Reading them all is impossible, and missing something important is painful. Andrej Karpathy solved this problem with SVM + TF-IDF. And...

Habr