Research question for machine learning folks. Are there ever situations in which it is desirable to divide learning up in parallel across several models, where each model can only locally learn some component of a computation but never the full computation?

So the differentiation algorithm would compose the components into some global knowledge, but would be the only one with access to that global knowledge. And the training data would be decomposed in such a way that only well-defined fragments of data are accessible to each model?

We have a class of functions for which we can do this, I think, but we aren't sure who cares, and knowing who cares would give us relevant work to look at and help us know what we might want to do next.

It's tingling the neurons in my brain associated with all of threshold schemes, cryptosystems, differential privacy, parallel learning, federated learning, and complexity theory, but I don't have anything more specific there than vibes.

@TaliaRinger Not really my corner of ML but yes, I remember the differential privacy folks talking about stuff like that. Also depending on the solution it seems like something that could be useful for domain adaptation or as a tool to investigate the decomposability of a problem?
@TaliaRinger At the risk of sounding stupid, isn't this the logic behind boosting/bagging? Many weak learners making a strong learner? Or am I misinterpreting what you mean?
@GUIpsp That's one benefit, but I think here I'm also interested in what kinds of guarantees one might want about such a system
@TaliaRinger oh no, you mean it in the privacy sense not the performance sense. Apple famously does some fancy neural network FHE thing
@GUIpsp performance questions might still be interesting though
@TaliaRinger @GUIpsp Mixture of Experts (MoE) would be relevant in this context (performance, but also larger/more capable/accurate model fitting into smaller amount of DRAM/SRAM; e.g., "Qwen3.5-397B-A17B" means out of 397 billion total parameters only 17 billion are activated). Good background:
- The MoE 101 Guide by Daria Soboleva (https://soboleva-daria.github.io/): https://www.cerebras.ai/moe-guide
- Lecture 4 from Stanford CS336 Language Modeling from Scratch (https://cs336.stanford.edu/spring2025/): https://www.youtube.com/watch?v=LPv1KfUXLCo&list=PLoROMvodv4rOY23Y0BoGoBGgQ1zmU_MT_&index=5
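The sparse-activation idea behind names like "A17B" (only a fraction of the total parameters run per token) can be sketched in a few lines. This is a minimal illustrative top-k router with toy linear "experts", not anything from the guides linked above; all names and shapes here are made up for the example.

```python
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """Sparse Mixture-of-Experts forward pass (illustrative sketch).

    A gate scores all experts, but only the top-k actually run,
    so most expert parameters stay inactive for any given input.
    """
    scores = gate_w @ x                   # one gate score per expert
    top = np.argsort(scores)[-k:]         # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()              # softmax over the selected experts only
    # Weighted combination of just the selected experts' outputs.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 4, 8
gate_w = rng.normal(size=(n_experts, d))
# Each "expert" is just a fixed linear map in this toy example.
mats = [rng.normal(size=(d, d)) for _ in range(n_experts)]
experts = [lambda x, M=M: M @ x for M in mats]

y = moe_forward(rng.normal(size=d), gate_w, experts, k=2)
print(y.shape)  # (4,)
```

With n_experts = 8 and k = 2, only a quarter of the expert parameters are touched per input, which is the memory/compute win the reply describes.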
@TaliaRinger The problem is to learn to bid for electricity in a group, where only the group as a whole is big enough for day-ahead market access, and each member adds its bid into the group's bid.

If the members are close enough, the group can share its weather and market models, and each member only needs to learn how it differs from the average.

This sharing might count as access to global knowledge, but all contributions to the group are made with privacy.

@TaliaRinger Sounds like federated learning. I've heard of use cases for ML at the edge, and maybe institutions with proprietary data they don't want to share but are willing to mix into a shared model.
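The "mix into a shared model without sharing data" pattern is usually federated averaging: each party takes gradient steps on its own private data, and only the resulting weights are pooled. A minimal sketch with toy linear regression clients (all names and numbers here are invented for the example):

```python
import numpy as np

def local_step(w, X, y, lr=0.1):
    """One gradient step of least-squares regression on a client's private data."""
    grad = 2 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

def fed_avg(clients, w, rounds=100):
    """Federated averaging: clients train locally; only weights are shared."""
    for _ in range(rounds):
        # Each client updates the current global weights on its own data...
        local = [local_step(w, X, y) for X, y in clients]
        # ...and the server averages the results; raw data never leaves a client.
        w = np.mean(local, axis=0)
    return w

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])
# Three institutions, each holding 20 private examples of the same task.
clients = []
for _ in range(3):
    X = rng.normal(size=(20, 2))
    clients.append((X, X @ true_w))

w = fed_avg(clients, np.zeros(2))
```

Since the clients here share the same underlying task, the averaged model recovers it; the guarantees question in the thread is about what you can still say when they don't.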