📊 It's Thurs-Fri-day [citation needed], time for a (bi)weekly #quiz! It's been a while, so let's take it slowly... lest we make mistakes. Perfect segue: today, Online #Learning Mistake Bounds!

What?

Well, you want to learn a classifier, here a function f: {0,1}ⁿ→{0,1}...

1/ #weaeklyquiz

... and you observe a sequence of inputs, in arbitrary, possibly adversarial order (no distributional assumption, nothing!). At time t, you're given xₜ in {0,1}ⁿ, need to predict f(xₜ).

If you're wrong? You lose one point (that counts as "one mistake"), but you are immediately given the true value f(xₜ).

2/

Your goal is to minimize the total number of mistakes M you will ever make, on any sequence of inputs, possibly going on forever. Again, no assumption on the distribution of the inputs, nothing!

On the bright side? You don't have computational limitations, time or memory-wise.

3/

So, the first and obvious question: can we do anything AT ALL? After all, the "challenge" xₜ∈{0,1}ⁿ at each step can be chosen adversarially based on the past, and the game goes on forever...

Q1: If f: {0,1}ⁿ→{0,1} is arbitrary, can we achieve any finite mistake bound M?

4/

No, M=∞ 🥔 (9.5%)
Yes, M=2ⁿ 🥥 (81%)
Yes! M=O(n) 🍰 (0%)
Show me the stats 🥦 (9.5%)

Fine. But what if f was not arbitrary? If I promised you the unknown classifier had some reasonable structure (not some completely horrendous, pathological Boolean function), could we do something good?

For instance, if I told you f was just an OR of some of the n variables?

5/

Let's just suppose f is a monotone disjunction (a giant OR ⋁ of up to n of the variables, without negating any of them). Should be easier? Maybe?

Q2: If f: {0,1}ⁿ→{0,1} is a monotone disjunction, can we achieve any finite mistake bound M?

6/

Yes, M=2ⁿ 🥥 (0%)
Yes! M=O(n) 🍰 (100%)
Yes! M=O(log n) 🍦 (0%)
Show me the stats 🥦 (0%)

Alright, alright. But even monotone disjunctions are a bit much: what are the odds our classifier could actually use ALL n variables, when n is big?

What if I promised you it was sparse, i.e., depended on only k << n of the variables (we just don't know which)?

7/

If f is promised to be sparse, can we guarantee we'll only ever make a very small number of mistakes, no matter what, in this online learning framework?

Q3: If f: {0,1}ⁿ→{0,1} is a k-sparse monotone disjunction, can we achieve any good mistake bound M?

8/

Yes! M=O(kn) 🍰 (7.7%)
Yes! M=O(k log n) 🍦 (69.2%)
Yes! M=O(k) 🍪 (7.7%)
Show me the stats 🥦 (15.4%)

To conclude: let's go beyond monotone disjunctions, sparse or not. Let's do... lists! Lists are good!

A decision list (DL, but not Deep Learning 👀) is a classifier of the form "if x₄=1 then true; otherwise if x₃₉=0 then false; otherwise if [...]; otherwise true."

9/

(Technically, that's a 1-DL since each "if" involves only 1 literal, but let's not nitpick). Every disjunction is a DL, but DLs are more expressive.

The length of a DL is the number of if/then clauses in it.

So, let's see...

Q4: If f: {0,1}ⁿ→{0,1} is a length-ℓ DL, can we achieve any finite mistake bound M?

10/

Yes! M=O(ℓn) 🍰 (23.1%)
Yes! M=O(ℓ log n) 🍦 (38.5%)
Yes! M=O(ℓ) 🍪 (7.7%)
Show me the stats 🥦 (30.8%)

Alright, that's all for today! Hope you enjoyed this #weaeklyquiz. As usual, answers and discussions in ≈3 days.

In the meantime, feel free to comment or ask questions below!

11/end

📊 Answers and discussion for last week's quiz on online learning: mistake bounds, and where to get them!

Link to the quiz: https://mathstodon.xyz/@ccanonne/109757821363734340

You want to learn some classifier f: {0,1}ⁿ→{0,1} given an (arbitrary) sequence of inputs, making few mistakes *in total* over this sequence, no matter how long it is (even infinite): you get an input x, you predict f(x); if you're right, good; if you're wrong, you're told you made a mistake (and given the true value). Rinse and repeat.

1/


Also, you're not guaranteed to learn f exactly! All you are asked to do is answer according to some g which agrees with f on the sequence of inputs you're given. E.g., if the sequence is *always* the same x, repeatedly... you won't learn all of the 2ⁿ values f takes on {0,1}ⁿ, but you sure can guarantee to make ≤ 1 mistake on this sequence!

2/

So, Q1: given that the sequence of queries (inputs) can be infinite and arbitrary, and so is the classifier (Boolean function) f to learn... can we guarantee anything at all on the # of mistakes we'll make?

https://mathstodon.xyz/@ccanonne/109757852367750506

As 81% of you answered: yes! 🥥 At most 2ⁿ which, admittedly, isn't GREAT, but is finite.

There are two ways to see it:

3/


First, well, given that you have no computational (time or memory) limits, you can just store all *distinct* inputs xₜ seen so far and the value f(xₜ) (which you know afterwards, mistake or not).

This means that you won't make a mistake on the same input twice. But those inputs are from {0,1}ⁿ, so there are 2ⁿ distinct choices of them at most. So M(f)≤2ⁿ mistakes (but a LOT of memory used to remember all that).

4/
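In code, this "remember everything" strategy could look like the sketch below (my own rendering, not from the thread; names are mine):

```python
# A minimal sketch of the memorization learner: store every input seen
# with its true label, so you never err twice on the same input.
def memorizing_learner():
    seen = {}                      # maps inputs (tuples in {0,1}ⁿ) to f(x)
    def predict(x):
        return seen.get(x, 0)      # arbitrary default guess on unseen inputs
    def update(x, label):
        seen[x] = label            # record the true value revealed after the round
    return predict, update
```

At most one mistake per distinct input, hence ≤ 2ⁿ mistakes in total (at the price of up to 2ⁿ memory).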

Second approach? Forget about # of possible inputs: consider the number of possible *functions.*

There are 2^(2ⁿ) possible functions (classifiers). Now, here's the claim:

If you are promised that the classifier f belongs to a function class C, then one can always achieve mistake bound M(f) ≤ log₂ |C|

Yup.🤯 And log₂ (2^(2ⁿ)) = 2ⁿ.

Now, how? With the "halving" algorithm.

5/

What's the halving algo, you ask? Simple:
- maintain a set H of "still OK" classifiers, initially H=C (recall: you're promised f∊C)
- on input x, take a vote among H: compute g(x)∊{0,1} for all g∊H, output the majority value.
- if that was a mistake? Remove all g that erred from H.

6/

Why does that work? Well, we can get our mistake bound by combining 3 facts:

* at every mistake, at least 50% of the g's from H must have been wrong (we took the majority!), so |H| is at least halved
* the "true" f will never be removed from H (it doesn't err!), so |H|≥1 always
* at first, H=C

so... we can make at most log₂ |C| mistakes!

7/

This is, of course (1) computationally inefficient (go over all g in H at every step), and (2) brittle to errors (if f is not really in C as promised, then doom).

This can be fixed: (1) randomization (don't take the majority, but instead answer according to a randomly chosen g from H), and... (2) the Multiplicative Weights Update (MWU) algorithm! See, e.g., these slides: https://unisyd-my.sharepoint.com/:p:/g/personal/clement_canonne_sydney_edu_au/EaQorLRv5g9FqJzPF2k0Z5UBr0hGGvWfoaJNmkkZGHBbSw?e=9kMTQ7

8/


Great, we're done with Q1: to summarize, we can always guarantee a mistake bound

M(f) ≤ min(|X|, log₂ |C|)

if f: X→{0,1} is promised to be in some class \(C \subseteq \{0,1\}^X\).

This may not be tight, of course. But what can you do? 🤷

9/

Onto Q2! Q2 asked about a specific class of functions C, the monotone disjunctions: f is of the form f(x) = \(\bigvee_{k∊S} x_k\), with S⊆{1,..,n}.

https://mathstodon.xyz/@ccanonne/109757866288172941

As 100% (holy cow!) of you answered, under this particular promise, we can achieve M(f) = O(n). Namely, M(f) ≤ n.

10/


Now, based on what we saw, this shouldn't be too surprising: |C|=2ⁿ (as many possible functions as subsets S), so the halving algo achieves mistake bound log₂ |C|=n.

But that's computationally inefficient.

Turns out, we can achieve the same bound w/ an efficient algo!

11/

The algorithm is very simple: keep a set of all variables (initially T = {1,...,n}). Answer according to \(g(x)= \bigvee_{k∊T} x_k\).

Any time you make a mistake (can only be saying "1" but it should be "0": can you see why?), remove from T every variable that is 1 in the input.

Clearly, that's efficient (evaluating g takes O(n) time, not exponential time as with the halving algo). But why does it work?

12/

Again, we can analyze this by combining a few simple facts.

If S corresponds to the *true* f:
- no variable from S is ever removed
- initially |T|=n
- each mistake removes at least one variable from T

This gives the bound!

13/
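This elimination learner fits in a few lines (again my own rendering, names mine):

```python
# Sketch of the elimination learner for monotone disjunctions: predict with
# the OR over a shrinking set T of candidate variables, pruning T on every
# (necessarily false-positive) mistake.
def disjunction_learner(n):
    T = set(range(n))                        # initially, all n variables
    def predict(x):
        return int(any(x[i] for i in T))
    def update(x, true_label):
        # the only possible mistake: we said 1 but the truth is 0
        if true_label == 0 and predict(x) == 1:
            T.difference_update(i for i in range(n) if x[i] == 1)
    return predict, update
```

Each mistake removes at least one variable, and no variable of the true S is ever removed (f(x)=0 forces every variable of S to be 0 in x), hence ≤ n mistakes.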

Cool. Onto Q3: what if the monotone disjunction is promised to be *sparse* (|S|≤k for some k<<n)? Can we achieve a better mistake bound than O(n)?

https://mathstodon.xyz/@ccanonne/109757871611250656

Answer: Yes! 🍾 As 69% of you answered, one can get M(f)=O(k log n). Much better, if k << n/log n!

How?

14/


Dang, looks like I broke this branch of the thread.

Here is the rest: https://mathstodon.xyz/@ccanonne/109787226544370495

Sadly, the efficient algo above doesn't give that. 🧐 But the halving algo does! 🎉 Since log₂|C| = log₂ (n choose k) = O(k log n).

Sadly, that's computationally inefficient! 🧐 But there is an EFFICIENT algo which does it! 🎉

It's the so-called WINNOW algo, due to Littlestone: https://en.wikipedia.org/wiki/Winnow_(algorithm)

💡 Interestingly, the Winnow algo is a special case of the [wait for it]... ... MWU algorithm mentioned a few toots earlier. Yes, multiplicative weight updates for the win! 🏆

15/
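For the curious, a sketch of Winnow in its standard textbook form (promotion by doubling, elimination on false positives, threshold n; this is the usual presentation, not code from the thread):

```python
def winnow(n):
    # Multiplicative weights over the n variables (Littlestone's Winnow1).
    w = [1.0] * n
    def predict(x):
        # fire iff the total weight of active variables reaches the threshold n
        return int(sum(wi for wi, xi in zip(w, x) if xi) >= n)
    def update(x, true_label):
        guess = predict(x)
        if guess == 0 and true_label == 1:       # missed a relevant variable
            for i in range(n):
                if x[i]:
                    w[i] *= 2                    # promote active variables
        elif guess == 1 and true_label == 0:     # fired on irrelevant variables
            for i in range(n):
                if x[i]:
                    w[i] = 0                     # demote (eliminate) them
    return predict, update
```

Each relevant weight can only double O(log n) times before reaching the threshold, which is where the O(k log n) mistake bound for k-sparse disjunctions comes from.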