El Mahdi El Mhamdi

773 Followers
41 Following
14 Posts

Physicist, asst. professor of mathematics and data science at the École Polytechnique. AI/ML security researcher.

Author of "Le Fabuleux Chantier" (EDP Sciences 2019), works: algorithms such as Krum and Bulyan and the formalism of robust distributed learning.

Secular republican who grew up under a medieval monarchy.

Website: https://elmahdielmhamdi.com
Scholar: https://scholar.google.com/citations?hl=en&user=kNA-WLQAAAAJ

Months before it started, it was clear that the #AIActionSummit was going to be an influence operation on French assets by the United Arab Emirates' regime (one that started before 2021).

Happy to have boycotted every bit of it. Sad for France's sovereignty… but will not weep for it either.

https://x.com/L_badikho/status/1888951806723588439



A minor difference in the way a sentence is written (with strictly no effect on its meaning) can send an AI to a completely opposite interpretation. Yet people are rushing to put these AIs behind CV screening and other critical tasks… Some are already talking about autonomous weapons or "solving the Middle East" !?

You can fix this "manually" but it will keep reappearing. See section 7.2 in the paper above.

On the large AI models, this preprint synthesises what we know so far https://arxiv.org/abs/2209.15259

In short: it is mathematically impossible to have AIs combining the following properties:

1) High number of parameters
2) Robustness to poisoning (e.g. fake data)
3) Privacy-preserving

On the Impossible Safety of Large AI Models

Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However, they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging features of many of today's machine learning settings. Namely, high accuracy seems to require memorizing large training datasets, which are often user-generated and highly heterogeneous, with both sensitive information and fake users. We then survey statistical lower bounds that, we argue, constitute a compelling case against the possibility of designing high-accuracy LAIMs with strong security guarantees.
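To make the poisoning point (property 2) concrete, here is a toy illustration, not taken from the paper: a single fake contribution can drag a plain average arbitrarily far, which is exactly the kind of vulnerability the surveyed lower bounds formalize.

```python
# Toy illustration (not from the paper): one poisoned input can move a
# plain average arbitrarily far, while the coordinate-wise median stays
# close to the honest values. Numbers are made up for the example.
import numpy as np

honest = np.array([0.9, 1.1, 1.0, 0.95])   # honest contributions
poison = np.array([1e6])                    # one fake user

data = np.concatenate([honest, poison])
print("mean  :", data.mean())               # ~200000.8, ruined
print("median:", np.median(data))           # 1.0, still sensible
```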


I have been avoiding Zoom for teaching, professional and personal meetings as much as I could since the start of the pandemic, using and recommending Jitsi instead.

Zoom's newly updated terms just give you new reasons to reconsider, if you had been accepting Zoom so far.

"if a machine is expected to be infallible, it cannot also be intelligent."

This quote from Alan Turing was on the cover of my PhD thesis (https://infoscience.epfl.ch/record/275538?ln=en), right after a parent's quote on nurturing a child's curiosity.

Turing's nuance contrasts with most of today's techno-absolutism…

Robust Distributed Learning

Whether it occurs in artificial or biological substrates, learning is a distributed phenomenon in at least two aspects. First, meaningful data and experiences are rarely found in one location, hence learners have a strong incentive to work together. Second, a learner is itself a distributed system, made of more basic processes; the change in the connections between these basic processes is what allows learning. This generic view encompasses a large set of learning situations, from brains, to metabolic networks in the organism, to the data centers where several machines collaborate to recommend personalized content for a billion-user social media. In both aforementioned aspects, a learning system's ability to cope with the failure of some of its components is crucial. This thesis explores the robustness of learning systems from these two aspects. The first aspect is coarse-grained, as the unit of failure is a whole learner. The second aspect is fine-grained, as the unit of failure is a basic component of the learner (e.g. a neuron or a synapse).

The first and larger part of this thesis focuses on the coarse-grained aspect. Specifically, we study the robustness of distributed Stochastic Gradient Descent (SGD is the workhorse algorithm behind today's most celebrated successes in machine learning). We begin by proving that the standard deployment of SGD today is brittle, as this deployment typically consists of averaging the inputs from each learner. This leads to harmful consequences, as the data used in machine learning comes from different and potentially unreliable sources. To account for the various types of failures (data poisoning from hackers, software bugs, communication delays, etc.), we adopt the general abstraction of arbitrary failures in distributed systems, namely, the Byzantine failures abstraction. We provide a sufficient condition for SGD to be Byzantine resilient and present three algorithms that satisfy our condition under different configurations. The key algorithms introduced by this thesis are (1) Krum, a gradient aggregation rule (GAR) that we prove to be a robust alternative to averaging in synchronous settings; (2) Bulyan, a meta-algorithm that we prove to strengthen any given GAR in very high dimensional situations; and (3) Kardam, a gradient filtering scheme that we prove to be Byzantine resilient in the more challenging asynchronous setting. For each of our algorithms, we also provide a few variants as well as a discussion of their practical limitations.

The second part of this thesis goes down to the fine-grained aspect. We focus on the special case of (artificial) neural networks. We view these networks as weighted directed graphs and prove upper bounds on the forward propagated error when the basic components (neurons and synapses) fail. We also discuss the limitations of these bounds, how they could apply to future neuromorphic hardware, and how they could inform other systems such as biological (metabolic) networks.
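For intuition on the coarse-grained part, here is a minimal sketch of the Krum selection rule as the abstract describes it (my paraphrase in NumPy, not the reference implementation): each submitted gradient is scored by its summed squared distance to its n - f - 2 closest peers, and the best-scoring gradient is kept.

```python
# Sketch of the Krum selection rule (paraphrased, not the reference code).
# Among n submitted gradients, at most f of which are Byzantine, each
# gradient is scored by the sum of squared distances to its n - f - 2
# nearest peers; the gradient with the lowest score is selected.
import numpy as np

def krum(gradients: np.ndarray, f: int) -> np.ndarray:
    n = len(gradients)
    assert n > 2 * f + 2, "Krum needs n > 2f + 2 workers"
    # Pairwise squared distances between all submitted gradients.
    diffs = gradients[:, None, :] - gradients[None, :, :]
    dists = (diffs ** 2).sum(axis=-1)
    scores = []
    for i in range(n):
        others = np.delete(dists[i], i)          # distances to peers
        closest = np.sort(others)[: n - f - 2]   # n - f - 2 nearest
        scores.append(closest.sum())
    return gradients[int(np.argmin(scores))]

# One Byzantine worker submits a huge gradient; Krum ignores it,
# while a plain average would be dragged far off.
grads = np.vstack([np.random.normal(1.0, 0.1, size=(6, 3)),
                   np.full((1, 3), 1e6)])
print("average:", grads.mean(axis=0))
print("krum   :", krum(grads, f=1))
```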


As excited as I am by the mathematical research in this direction (and I'm looking for a PhD student to work on computability aspects of the median), I'm not optimistic, given how far regulation is lagging behind.

If you work on AI regulation, learn about these results and spread the word: most AI systems are voting systems (where voting happens through data and behaviour injection); they should be regulated as such and required to use robust methods when aggregating votes.

4/4

Along the path, I was lucky to bring @lenhoang back to research. An expert in voting systems turned science communicator, he moved with me between robust ML and voting systems, of which social media are the most consequential incarnation.

If you are a policy maker, skip to the next paragraph; if you are a mathematician, check out our latest results, just presented at the AI & Statistics Conference: https://arxiv.org/abs/2106.02394. There are plenty of beautiful mathematical results to chase in this direction.

On the Strategyproofness of the Geometric Median

The geometric median, an instrumental component of the secure machine learning toolbox, is known to be effective when robustly aggregating models (or gradients) gathered from potentially malicious (or strategic) users. What is less known is the extent to which the geometric median incentivizes dishonest behaviors. This paper addresses this fundamental question by quantifying its strategyproofness. While we observe that the geometric median is not even approximately strategyproof, we prove that it is asymptotically α-strategyproof: when the number of users is large enough, a user that misbehaves can gain at most a multiplicative factor α, which we compute as a function of the distribution followed by the users. We then generalize our results to the case where users actually care more about specific dimensions, determining how this impacts α. We also show how skewed geometric medians can be used to improve strategyproofness.
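For readers who want to play with it, here is a sketch of the geometric median computed with Weiszfeld's classic iteration (a standard algorithm, not code from the paper); the toy data at the end illustrates the bounded pull a single strategic point has on the median, compared with the mean.

```python
# Sketch: geometric median via Weiszfeld's iteration (a standard
# algorithm, not taken from the paper). The geometric median minimizes
# the sum of Euclidean distances to the input points.
import numpy as np

def geometric_median(points: np.ndarray, iters: int = 100,
                     eps: float = 1e-9) -> np.ndarray:
    x = points.mean(axis=0)  # start from the plain average
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        d = np.maximum(d, eps)           # avoid division by zero
        w = 1.0 / d                      # inverse-distance weights
        x = (w[:, None] * points).sum(axis=0) / w.sum()
    return x

# A single strategic/malicious point has bounded pull on the median,
# while it shifts the mean in proportion to how far it moves.
pts = np.vstack([np.random.normal(0.0, 1.0, size=(50, 2)),
                 np.array([[1e4, 1e4]])])
print("mean            :", pts.mean(axis=0))
print("geometric median:", geometric_median(pts))
```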


Algorithms can't face 100k+ *human* trolls, and democracies can't easily hire a million trolls of their own.

This message was hard to convey in conferences and academic talks 8 to 5 years ago, despite the established Russian operations around Brexit and the US elections. US companies were (against US interests) successful in lobbying to minimise the role of these operations, including among academic researchers, who bought the minimisation argument.
2/.

A message I've been repeating since 2015 is that the most consequential voting systems we have today on earth are social media algorithms.

Every like, follow, comment or watchtime is a vote, shaping the global informational diet. Dictators understood that better than citizens.

China's infowar counts on 20 million "soldiers", incl. 2 million full-timers. I've seen proportional figures for Morocco. Regimes we don't expect are influencing more developed countries.
1/.

Nothing can replace one's real parents, but this was the closest I've come to my parents in these four years.

Nothing replaces seeing your parents, but this was the closest I've ever been to that: meeting @Free_Omar_Radi's parents as they represented him for the @RSF_inter award for independence.