Welcome to Monday!* The sun is shining, the random Gaussian matrices are chirping in the low-dimensional trees 🌳... time for the answers and discussion for last week's quiz 📊 on dimensionality reduction!

Recall: you have n vectors in a d-dim Euclidean space, but d is very large so we would like to get n vectors in dimension k<<d, preserving pairwise distances. Can we?

* I swear!
https://mathstodon.xyz/@ccanonne/109323090322352432

1/

Clément Canonne (@[email protected])

Thursfriday is here, time for a (bi)weækly quiz! 📊 Let's talk about what happens when we take big things and turn them into small things without making big things that were far become small things that are close and vice versa. Alright, let's talk dimensionality reduction. 1/

Mathstodon

The first question, Q1, asked whether we could do it while preserving the distances *exactly*. 32% answered correctly: that's unfortunately too much to ask. 🙍

https://mathstodon.xyz/@ccanonne/109323110454628004

No, Virginia, there is no Santa Claus.

Here is a simple argument (thanks to @thesasho): take n=d+1 points, consisting of the origin and an orthonormal basis of ℝᵈ.

As you preserve the distances, your n projected points will have to be (up to translation) the origin of ℝᵏ and d unit vectors. Also, since...
2/

Clément Canonne (@[email protected])

Of course, this brings the question: can it be done? 🤔 Q1: is there a way to "map" n arbitrary d-dimensional vectors to k-dim vectors, while preserving all their pairwise Euclidean distances, for $k \ll d$? (And if so, does k depend on d, n, both?) 4/ [ ] Yes, for some k = k(n) [ ] Yes, for some k=k(d,n) [ ] No [ ] Show me

... you preserve the distances, you also preserve the inner products, as:
\[\langle u,v\rangle = \frac{1}{2}( \|u\|^2 + \|v\|^2- \|u-v\|^2)
\]
But then the projections of these d points form an orthonormal system in ℝᵏ, so... k ≥ d.

Not a great dimensionality reduction. 🧐
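A tiny numpy sanity check of this argument (a sketch; the points here are just the standard basis of ℝᵈ, with the origin handled by the "up to translation" step above):

```python
import numpy as np

d = 10
# Point set: the standard (orthonormal) basis of R^d.
basis = np.eye(d)

# Distances determine inner products via the polarization identity:
# <u, v> = (||u||^2 + ||v||^2 - ||u - v||^2) / 2
gram = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        u, v = basis[i], basis[j]
        gram[i, j] = (u @ u + v @ v - (u - v) @ (u - v)) / 2

# The Gram matrix is the identity, so it has rank d: any image of these
# points with the same pairwise distances must span d dimensions.
print(np.linalg.matrix_rank(gram))  # prints 10
```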

Let's take a step back though: maybe we don't need *exact* distances. Can we *nearly* preserve all pairwise distances, up to a 1±ε multiplicative factor?

3/

That's what Q2 asked. The answer? A resounding YES, as 39% of you wrote. k doesn't even need to depend on d!

So what can we do exactly? There are many refinements of this, but the shiny, magnificent hammer we have at our disposal is the (rightfully) celebrated Johnson–Lindenstrauss lemma 🔨, which gives a computationally efficient, simple way to do so: pick a random k×d matrix with i.i.d. Gaussian entries (suitably scaled), that's your projection matrix to k-dim space. #coollemma

https://mathstodon.xyz/web/@ccanonne/109323116275539500

4/

Clément Canonne (@[email protected])

But... maybe we don't need to preserve the distance *exactly*. Maybe preserving them up to, say, a factor arbitrarily close to 1 is enough? Q2: is there a way to "map" n arbitrary d-dim vectors to k-dim vectors, with $k \ll d$, while preserving all pairwise Euclidean distances up to 1±ε? 5/ [ ] Yes, for k=k(n,ε) [ ] Yes, for k=k(n,d,ε) [ ] Nope... nope. Nope. [ ] Show me!

With probability 1-δ, as long as k≍(log(n/δ))/ε², all pairwise Euclidean distances between your n points (and so also inner products) will be preserved up to 1±ε.
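In code, the whole JL recipe is a few lines of numpy (a sketch; the constant 8 in the choice of k is just one illustrative choice, not the optimal constant):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 50, 10_000, 0.5
# Target dimension k ≍ log(n)/eps^2, with 8 as an (assumed) constant.
k = int(np.ceil(8 * np.log(n) / eps**2))

X = rng.normal(size=(n, d))               # n arbitrary points in R^d
A = rng.normal(size=(k, d)) / np.sqrt(k)  # the random JL projection matrix
Y = X @ A.T                               # projected points in R^k

# Check the distortion of every pairwise distance.
ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i in range(n) for j in range(i + 1, n)]
print(f"{min(ratios):.3f} .. {max(ratios):.3f}")  # within [1-eps, 1+eps] w.h.p.
```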

Magic! 🧙 And that's not all... not only does that answer Q3 (vindicating 31% of you), it gives even more: it's a *linear* mapping, which is wonderful for many applications.

https://mathstodon.xyz/web/@ccanonne/109323121220341568

It's also optimal, as Kasper Green Larsen and Jelani Nelson showed: https://arxiv.org/abs/1609.02094

5/

Clément Canonne (@[email protected])

Alright, that's good, but let's ask for more, right? Why not? Q3: What's the best bound on k we know, i.e., how low-dimensional exactly can we get this done, and is that mapping (computationally) *efficient*? 6/ [ ] k≍(log n)/ε², efficient [ ] k≍(log n)/ε², inefficient [ ] k≍(log(dn))/ε², efficient [ ] Will you just show me, please?

And even more: you don't even need Gaussian entries! Any subgaussian r.v. would do, so you can use independent random ±1 entries 🎲 if you prefer. Goodbye, annoying sampling considerations! 🎉
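A minimal numpy sketch of the ±1 (Rademacher) variant — only the projection matrix changes, with the same 1/√k scaling; the constant 8 in k is again just an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, eps = 50, 10_000, 0.5
k = int(np.ceil(8 * np.log(n) / eps**2))  # k ≍ log(n)/eps^2

X = rng.normal(size=(n, d))
# Random +/-1 entries instead of Gaussians: no Gaussian sampling needed.
A = rng.choice([-1.0, 1.0], size=(k, d)) / np.sqrt(k)
Y = X @ A.T

ratios = np.array([np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
                   for i in range(n) for j in range(i + 1, n)])
print(f"{ratios.min():.3f} .. {ratios.max():.3f}")  # again within 1 ± eps
```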

So, what's the catch?

Is there a catch?

Well, maybe we need a lot of randomness to do that, say Ω(dk) random bits (the size of the matrix), which could be a lot to ask?

That could be a catch. 🤔

But no: as 12% of you answered for Q4, O(log(d) log(k)) random bits suffice!
https://mathstodon.xyz/@ccanonne/109323128131547867

🍰There is no catch! 🦄

6/

Clément Canonne (@[email protected])

And finally... one last question. All the things above were achieved by randomized mappings 🎲, which brings the issue of how much #randomness we need to do that mapping. Randomness doesn't grow on 🌳 trees, you know! Q4: How many random bits 🎲 do we need to achieve this dimensionality reduction? 7/ [ ] O(dk) [ ] O(d²) [ ] O(log(d) log(k)) [ ] I hereby request that you show me

This last part was proved by Daniel Kane and Jelani Nelson:
📝 https://arxiv.org/abs/1006.3585

Even better(er): if the points are given in advance and the embedding is allowed to depend on them (as opposed to a general-purpose, oblivious mapping), it can be done deterministically (Engebretsen, Indyk, O'Donnell '02; Sivakumar '02):
📝 https://dl.acm.org/doi/10.5555/545381.545476
📝 https://dl.acm.org/doi/10.1145/509907.509996

Even for oblivious mappings, Kane, Meka, and Nelson '11 give an improved bound on the randomness required: \(\tilde{O}(\log d+\log n)\). 🎲
7/

A Derandomized Sparse Johnson-Lindenstrauss Transform

tl;dr: for Euclidean distances, dimensionality reduction is possible, efficient, elegant, has a bunch of extra nice properties (linear mapping!), and is randomness-efficient. 🍪

The Johnson–Lindenstrauss lemma (and its refinements) is a truly wonderful result, which has found many applications all over the place.
https://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma

8/

Now, of course, that's for Euclidean distances: for other distances, dimensionality reduction and #metricembeddings can be... trickier. But let's just celebrate the JL lemma for now! 🔨

Or you could also check out these #lecturenotes (2018) by Jiří Matoušek and Moses Charikar:
📝 https://web.stanford.edu/class/cs369m/lectures.html

That's all for today: if you have questions or comments, please post them below. And see you in 1.5 weeks for another quiz!

9/end

@ccanonne thank you, very nice overview. I guess for an embedding with the \(\ell_p\)-norm, AFAIR, the dimension scales like \(O((\log n)^{p/2})\), for \(p\geq 2\).
@ccanonne and you can even do an embedding by only recording the sign of the random projections, by an application of the arcsin law, with Gaussian projection though. In this case the Hamming distance of projected points remains close to the angular distance between them. This is at the core of one-bit compressive sensing.
@lowrankjack any "quick and easy" reference? I always found the literature on this a bit annoying to navigate, with a lot of background knowledge assumed or varying notation.
@ccanonne I gave a talk on how to "quantize" JL lemma in 2014, slides are available here https://laurentjacques.gitlab.io/event/when-buffon-s-needle-problem-helps-in-quantizing-the-johnson-lindenstrauss-lemma/ I recall visually the main principles of the one-bit (sign) case from slide 22. There is a very good review on the topic by Sjoerd Dirksen https://www.semanticscholar.org/paper/Quantized-Compressed-Sensing%3A-A-Survey-Dirksen/dda6061642b78c8f21bca1185e035f410a48db79 #OneBit #CompressiveSensing #RandomProjection #JohnsonLindenstrauss

@lowrankjack @ccanonne If you mean embedding an \(n\)-point subset of \(\ell_p\) into \(\ell_p^k\), then I don't think we know any sub-polynomial in \(n\) bound on \(k\) for any \(p > 2\). See, e.g., the intro to this nice paper https://arxiv.org/abs/2109.06602.

Or do you mean something else?

$\varepsilon$-isometric dimension reduction for incompressible subsets of $\ell_p$

@thesasho @ccanonne thanks for the link. I was referring to the fact that, given an m×n Gaussian random matrix A, with high probability \(\|Ax\|_p\) is close, with controlled distortion, to \(\|x\|_2\) for all n-dimensional points x in a finite set of N points, provided m is large compared to \(m_0 = O((\log N)^{p/2})\). This comes from a concentration result of Ledoux regarding Lipschitz functions. It extends to restricted isometries, btw.
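A quick numerical illustration of this ℓ₂-into-ℓ_p embedding (a sketch with parameters of my own choosing; the normalization c = (m·E|g|^p)^{1/p} comes from the fact that each coordinate (Ax)ᵢ ~ N(0, ‖x‖₂²)):

```python
import numpy as np
from math import gamma, pi, sqrt

rng = np.random.default_rng(2)
n, N, m, p = 100, 20, 5000, 3   # illustrative parameters

A = rng.normal(size=(m, n))     # m x n Gaussian random matrix
X = rng.normal(size=(N, n))     # N points in R^n

# (Ax)_i ~ N(0, ||x||_2^2), so ||Ax||_p ≈ c * ||x||_2 with c = (m * E|g|^p)^(1/p),
# where E|g|^p = 2^(p/2) * Gamma((p+1)/2) / sqrt(pi) is the Gaussian absolute moment.
c = (m * 2 ** (p / 2) * gamma((p + 1) / 2) / sqrt(pi)) ** (1 / p)

# The ratio ||Ax||_p / (c * ||x||_2) concentrates around 1 for every point.
ratios = np.array([np.linalg.norm(A @ x, ord=p) / (c * np.linalg.norm(x))
                   for x in X])
print(f"{ratios.min():.3f} .. {ratios.max():.3f}")  # all close to 1
```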
@lowrankjack @ccanonne I see, thanks! So you are embedding \(n\) points in \(\ell_2\) into \(\ell_p^k\), that makes sense.