Johannes Gasteiger


né Klicpera.

Safety for ML. ML for graphs.
Research Scientist at Google Research. Previously TUM, DeepMind, and FAIR.
Opinions my own.

Having difficulty keeping up with the latest AI safety research?

Great news: My new blog will help with just that!

"AI Safety at the Frontier" covers each month's (subjectively) best papers.

In July '24, I discuss Safetywashing, SAD, AgentDojo and much more: https://open.substack.com/pub/aisafetyfrontier/p/paper-highlights-july-24?r=1v4l78&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

Paper Highlights, July '24

Safetywashing in benchmarks, SAD, AgentDojo, vision-language model attacks, latent adversarial training, brittle steering vectors, the 2-dimensional truth, legible LLM solutions, and more LLM debate.

Since IBMB only looks at the output nodes and their surroundings, its runtime is actually independent of the overall graph size! So the speedup becomes even greater for sparser training sets, such as hand-labeled nodes.

Again, note the log x-axis.
7/8

Many more experiments and details in the paper!

Paper: https://openreview.net/forum?id=b9g0vxzYa_
Code: https://github.com/TUM-DAML/ibmb
(And hopefully soon part of PyG.)

Work with @ChendiQian and @guennemann.
8/8

Influence-Based Mini-Batching for Graph Neural Networks

Influence-based mini-batching enables large-scale inference and training for graph neural networks by maximizing the influence of selected nodes on the output.


And the fixed batches allow us to precompute them and cache them in a nice block of consecutive memory, substantially speeding up training as well.

Note again the log x-axis.
6/8

This results in a fixed set of batches. You might think "WTF, SGD with fixed batches?!" And you're almost right. Almost.

Adaptive optimization and momentum (Adam) can handle these sparse gradients quite well. For the remaining problems we propose a batch scheduling scheme.
5/8
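
A minimal sketch of what training on a fixed set of precomputed batches might look like. The simple schedule here (reshuffling the batch order each epoch) is an illustrative assumption, not the paper's exact scheduling scheme; `train_step` is a hypothetical callback.

```python
import random

def train_with_fixed_batches(batches, train_step, epochs=10, seed=0):
    """Sketch: fixed, precomputed batches reused every epoch.

    A simple batch schedule (random order each epoch) reduces the bias
    of always visiting the same batches in the same order. Illustrative
    only -- not the scheduling scheme from the IBMB paper.
    """
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(batches)))
        rng.shuffle(order)  # reshuffle the fixed batches each epoch
        for i in order:
            train_step(batches[i])
```

Because the batches never change, they can be built once before training and cached, which is where the speedup from consecutive-memory caching comes from.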

This results in very efficient batches and an up to 130x speedup over the baseline.

These plots show the accuracy vs. speed trade-off for 3 datasets, 3 GNNs, and multiple mini-batching methods when varying their hyperparameters. Note the log x-axis.
4/8

IBMB uses influence scores to select the most important neighbors, instead of a random set.

It works in 2 steps:
1. Partition the output nodes (for which we want predictions) into batches.
2. For each mini-batch, select the auxiliary nodes that help most with predictions.
2/8
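
The two steps above can be sketched roughly as follows. This is a hypothetical illustration, not the paper's code: the influence matrix is assumed to be given, Step 1 uses naive chunking where the paper uses smarter partitioning, and Step 2 picks the top-scoring auxiliary nodes per batch.

```python
import numpy as np

def make_ibmb_batches(output_nodes, influence, batch_size, num_aux):
    """Hypothetical sketch of IBMB's two steps (not the paper's code).

    influence[i, j] = influence score of node j on output node i.
    """
    # Step 1: partition the output nodes into fixed batches.
    # (The paper uses heuristics like graph partitioning; here: chunking.)
    batches = [output_nodes[i:i + batch_size]
               for i in range(0, len(output_nodes), batch_size)]

    result = []
    for batch in batches:
        # Step 2: select the auxiliary nodes with the highest total
        # influence on this batch's output nodes.
        scores = influence[batch].sum(axis=0)
        aux = np.argsort(scores)[::-1][:num_aux]
        result.append((batch, aux))
    return result
```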

Luckily, the influence scores simplify to personalized PageRank (PPR) if we make some assumptions.

Step 2 then becomes an application of PPR or topic-sensitive PageRank.

Step 1 is trickier and requires falling back on heuristics like PPR or graph partitioning.
3/8
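
For intuition, here is a textbook power-iteration sketch of personalized PageRank with the walk restarting at a batch's output nodes. It illustrates the quantity used as an influence score; it is not the (much more scalable) approximation used in the paper, and the dense adjacency matrix is purely for illustration.

```python
import numpy as np

def personalized_pagerank(adj, seed_nodes, alpha=0.25, iters=50):
    """Power-iteration sketch of personalized PageRank (illustrative only).

    adj: dense adjacency matrix; seed_nodes: the batch's output nodes.
    Returns a score per node, usable as an influence score for
    selecting auxiliary nodes.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    trans = adj / np.maximum(deg, 1)  # row-stochastic transition matrix
    # The teleport vector restarts the walk at the seed (output) nodes.
    teleport = np.zeros(n)
    teleport[seed_nodes] = 1.0 / len(seed_nodes)
    ppr = teleport.copy()
    for _ in range(iters):
        ppr = alpha * teleport + (1 - alpha) * ppr @ trans
    return ppr
```

Nodes close to the seeds get high scores, so thresholding or top-k selection on `ppr` yields the auxiliary nodes for that batch.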

You can sample nodes for scalable #GNN #training. But how do you do #scalable #inference?

In our latest paper (Oral at #LogConference) we introduce influence-based mini-batching (#IBMB) for both fast inference and training, achieving up to 130x and 17x speedups, respectively!

1/8 in 🧵

I’m thrilled to share that I’m starting a new position as Research Scientist in Bryan Perozzi's team at #Google Research this week!

Looking forward to all the exciting research and opportunities for positive impact!