When Springer renames a journal from Information Retrieval to something like "Discover Computing", it sucks! Now it looks like our article with @lpag and Iacopo Vagliano appeared there. And I don't even know what "Discover Computing" is, nor does my AI companion believe it's a thing.
Springer, stop doing this. Your journal just lost its impact factor. It is 0.
🧼💸 A hands-on experiment by an online merchant selling 🇨🇳 Made-in-China vs. 🇺🇸 Made-in-US shower heads, with the latter priced to reflect US-based production costs.
The result: "I was surprised, and not surprised," van Meer [the merchant] said. "I was expecting the cheaper, Made-in-Asia [version] to quote-unquote 'win.' But I was not expecting that the results were this off balance. We sold zero 'Made in the USA' versions."
🔗 https://www.npr.org/2025/05/20/nx-s1-5403514/tariff-made-in-usa-label-texas-experiment
🤖 Why Classic ML Still Matters (Even in the Age of ChatGPT + AI)
Teaching Data Mining & ML, I often hear:
“Do we still need this stuff?”
3 recent papers say: YES ✅
🔍 Discrete Key-Value Bottleneck
https://arxiv.org/abs/2207.11240
➡️ Uses k-means for codebook—hidden in refs 👀
🌲 CascadeXML
https://arxiv.org/abs/2211.00640
➡️ Transformer + hierarchical k-means label tree 🧩
🧠 Mirage
https://arxiv.org/abs/2310.09486
➡️ Graph distillation via frequent pattern mining 🕵️‍♂️
Not old. But 💡 Fast. Smart. Scalable.
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant. Challenges emerge with non-stationary training data streams such as continual learning. One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning. Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes. Our paradigm will be to encode; process the representation via a discrete bottleneck; and decode. Here, the input is fed to the pre-trained encoder, the output of the encoder is used to select the nearest keys, and the corresponding values are fed to the decoder to solve the current task. The model can only fetch and re-use a sparse number of these key-value pairs during inference, enabling localized and context-dependent model updates. We theoretically investigate the ability of the discrete key-value bottleneck to minimize the effect of learning under distribution shifts and show that it reduces the complexity of the hypothesis class. We empirically verify the proposed method under challenging class-incremental learning scenarios and show that the proposed model - without any task boundaries - reduces catastrophic forgetting across a wide variety of pre-trained models, outperforming relevant baselines on this task.
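To make the mechanism concrete, here is a minimal sketch (my own toy code, not the authors' implementation) of the encode / nearest-key lookup / value-fetch step the abstract describes. The dimensions, the random stand-in encoder, and the single-codebook simplification are assumptions for illustration; per the k-means point above, the keys could be initialized by clustering encoder outputs, while the values are what gets tuned for a new task.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumptions for illustration).
D_ENC, N_PAIRS, D_VAL = 16, 32, 8

# Frozen pre-trained encoder, here just a fixed random projection.
W_enc = rng.normal(size=(D_ENC, D_ENC))

# Discrete bottleneck: N_PAIRS separate key-value codes.
# Keys could be initialized via k-means over encoder outputs;
# the values are the part updated to solve the current task.
keys = rng.normal(size=(N_PAIRS, D_ENC))
values = rng.normal(size=(N_PAIRS, D_VAL)) * 0.01

def bottleneck_forward(x):
    """Encode -> select the nearest key -> fetch its value for the decoder."""
    z = x @ W_enc                             # representation from the frozen encoder
    dists = np.linalg.norm(keys - z, axis=1)  # distance of z to every key
    idx = int(np.argmin(dists))               # discrete, sparse selection
    return values[idx], idx                   # only this value is fetched (and updated)

x = rng.normal(size=D_ENC)  # dummy input
v, idx = bottleneck_forward(x)
print(f"selected key {idx}, value shape {v.shape}")
```

Because each input only fetches (and, during training, only updates) the value attached to its nearest key, updates stay localized and context-dependent, which is the property the abstract credits for reducing catastrophic forgetting.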
Get this right: In the US, classes are back online — not due to a pandemic, but because students fear being snatched off the street and deported 😟📚
Meanwhile, in Germany, some still act like it’s a game.
To students & scholars: you’re welcome here.
To Europe’s academics: wake up. This is not a drill. ⏰🌍
The AI that DOGE allegedly uses is obviously just a simple list of regular expressions (a.k.a. a gazetteer approach), aimed at deleting any web content, firing any people, etc., that match trans*, gay*, dei*, ...
see also the "Enola Gay" story,
https://www.youtube.com/watch?v=CQ90aLXmbVY
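To illustrate why the "Enola Gay" story happens, here is a tiny toy sketch (my own made-up pattern list and documents, not anything DOGE actually runs) of naive gazetteer-style wildcard matching and the false positives it produces:

```python
import re

# Toy gazetteer of flagged wildcard patterns, mimicking trans*, gay*, dei*
# from the post above. Patterns and documents are invented for illustration.
FLAGGED = [re.compile(p, re.IGNORECASE)
           for p in (r"\btrans\w*", r"\bgay\w*", r"\bdei\b")]

documents = [
    "The Enola Gay dropped the first atomic bomb in 1945.",     # false positive on "Gay"
    "Quarterly public transit budget report",                    # false positive on "transit"
    "Diversity, equity and inclusion (DEI) training schedule",   # the intended kind of match
]

for doc in documents:
    hits = [m.group(0) for pat in FLAGGED for m in pat.finditer(doc)]
    if hits:
        print(f"FLAGGED {hits}: {doc}")
```

A bare pattern list has no notion of context, so "Enola Gay" and "transit" get swept up right along with the intended targets.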