One of the most surprising privacy results of the last 5 years is the LMW “doubly efficient PIR” paper. The basic idea is that I can load an item from a public database without the operator seeing which item I’m loading & without it having to touch every item in the DB each time.
Short background: Private Information Retrieval isn’t a new idea. It lets me load items from a (remote) public database without the operator learning what item I’m asking for. But traditionally there’s a *huge* performance hit for doing this.
Think about it this way: if the database is public and you’re the operator, then if I ask for entry #22: no matter what crypto we use, you’re going to know which sectors on disk you read from. If it’s a small subset, you can just query yourself on every file til you find one that matches.
The only way to get around this (apparently) is for the crypto protocol to make the operator read and compute over *every single* file on disk just to serve one file. This sucks. And at a high level, it seems to indicate that a “private Internet” will always suck.
After all: the Internet is mostly loading stuff from servers. Computing is mostly loading stuff from RAM or disk. If we want to make those loads happen privately, then every single one (intuitively) now has to have the same cost as reading everything. Yuck.
(Imagine trying to stream “A Christmas Prince” on Netflix, but not wanting anyone to know you have bad taste. To serve this privately, the server has to load and calculate over every single movie in the Netflix catalog! That’s going to be terrible.)
There’s a loophole, though. If the database is static, and people make many different queries to it, you can theoretically sneak around the badness as follows. First do one big “pre-processing” on the whole database. Then individual queries can just look at a small piece of that.
This assumes that the operator can somehow take in a public database, then (using a fully public unkeyed process) “scramble it up” so well that when someone asks for a few disk sectors, even the operator has no idea what entries are stored there.
If you asked me whether such a fully public unkeyed process was possible, my intuition would have said “no way”. Or maybe it would have said “sure, but it will imply implausibly strong cryptographic obfuscation and trusted setup.” (And some early attempts used this.)
Which brings me to the LMW DEPIR paper. What they show is that there’s an extremely “simple” form of oblivious locally-decodable code that has exactly the property you need. It’s based on Ring LWE and while it isn’t *super* efficient right now, it’s not crazy either.
The encoding scheme uses very simple ingredients: it encodes the DB into multivariate polynomial over a ring. The client then queries on a Ring LWE-encrypted index, which allows it to obtain the polynomial at that index. A clever lookup-table approach lets the server evaluate the polynomial without touching O(|DB|) bits of memory or disk.
The neat thing about this result is that it also implies that we can build fully-homomorphic encryption over RAM machines that avoids worst-case costs. Here the “database” is a computer’s RAM and nobody can see what values it accesses.
Anyway this series of tweets hardly does justice to how surprising and cool this result is. Here is the paper itself. If we look back in 30 years I think this will be one of the most significant cryptography results of our time. https://eprint.iacr.org/2022/1703
A (single server) private information retrieval (PIR) allows a client to read data from a public database held on a remote server, without revealing to the server which locations she is reading. In a doubly efficient PIR (DEPIR), the database is first preprocessed, but the server can subsequently answer any client's query in time that is sub-linear in the database size. Prior work gave a plausible candidate for a public-key variant of DEPIR, where a trusted party is needed to securely preprocess the database and generate a corresponding public key for the clients; security relied on a new non-standard code-based assumption and a heuristic use of ideal obfuscation. In this work we construct the stronger unkeyed notion of DEPIR, where the preprocessing is a deterministic procedure that the server can execute on its own. Moreover, we prove security under just the standard ring learning-with-errors (RingLWE) assumption. For a database of size $N$ and any constant $\varepsilon>0$, the preprocessing run-time and size is $O(N^{1+\varepsilon})$, while the run-time and communication-complexity of each PIR query is $polylog(N)$. We also show how to update the preprocessed database in time $O(N^\varepsilon)$. Our approach is to first construct a standard PIR where the server's computation consists of evaluating a multivariate polynomial; we then convert it to a DEPIR by preprocessing the polynomial to allow for fast evaluation, using the techniques of Kedlaya and Umans (STOC '08). Building on top of our DEPIR, we construct general fully homomorphic encryption for random-access machines (RAM-FHE), which allows a server to homomorphically evaluate an arbitrary RAM program $P$ over a client's encrypted input $x$ and the server's preprocessed plaintext input $y$ to derive an encryption of the output $P(x,y)$ in time that scales with the RAM run-time of the computation rather than its circuit size. Prior work only gave a heuristic candidate construction of a restricted notion of RAM-FHE. In this work, we construct RAM-FHE under the RingLWE assumption with circular security. For a RAM program $P$ with worst-case run-time $T$, the homomorphic evaluation runs in time $T^{1+\varepsilon} \cdot polylog(|x| + |y|)$.
@matthew_d_green I can't wait to see what comes out of this research, as it gets applied to real services!
Honestly though its going over my head.
[2022 LMW DEPIR paper] implies that we can build fully-homomorphic encryption over RAM machines that avoids worst-case costs.
https://eprint.iacr.org/2022/1703
Now that's something!
With DEPIR the cost for a query goes down to O(polylog(N)) for RAM size N. Updating is O(Nε).
I'm not sure how this relates to the complexity of homomorphic encryption (making computations without letting the operator know what we did), which is exponential in program length and data size in the schemes I know of.
Thanks, @matthew_d_green for sharing!
A (single server) private information retrieval (PIR) allows a client to read data from a public database held on a remote server, without revealing to the server which locations she is reading. In a doubly efficient PIR (DEPIR), the database is first preprocessed, but the server can subsequently answer any client's query in time that is sub-linear in the database size. Prior work gave a plausible candidate for a public-key variant of DEPIR, where a trusted party is needed to securely preprocess the database and generate a corresponding public key for the clients; security relied on a new non-standard code-based assumption and a heuristic use of ideal obfuscation. In this work we construct the stronger unkeyed notion of DEPIR, where the preprocessing is a deterministic procedure that the server can execute on its own. Moreover, we prove security under just the standard ring learning-with-errors (RingLWE) assumption. For a database of size $N$ and any constant $\varepsilon>0$, the preprocessing run-time and size is $O(N^{1+\varepsilon})$, while the run-time and communication-complexity of each PIR query is $polylog(N)$. We also show how to update the preprocessed database in time $O(N^\varepsilon)$. Our approach is to first construct a standard PIR where the server's computation consists of evaluating a multivariate polynomial; we then convert it to a DEPIR by preprocessing the polynomial to allow for fast evaluation, using the techniques of Kedlaya and Umans (STOC '08). Building on top of our DEPIR, we construct general fully homomorphic encryption for random-access machines (RAM-FHE), which allows a server to homomorphically evaluate an arbitrary RAM program $P$ over a client's encrypted input $x$ and the server's preprocessed plaintext input $y$ to derive an encryption of the output $P(x,y)$ in time that scales with the RAM run-time of the computation rather than its circuit size. Prior work only gave a heuristic candidate construction of a restricted notion of RAM-FHE. In this work, we construct RAM-FHE under the RingLWE assumption with circular security. For a RAM program $P$ with worst-case run-time $T$, the homomorphic evaluation runs in time $T^{1+\varepsilon} \cdot polylog(|x| + |y|)$.