Johannes Gasteiger


né Klicpera.

Safety for ML. ML for graphs.
Research Scientist at Google Research. Previously TUM, DeepMind, and FAIR.
Opinions my own.

Having difficulty keeping up with the latest AI safety research?

Great news: My new blog will help with just that!

"AI Safety at the Frontier" covers each month's (subjectively) best papers.

In July '24, I discuss Safetywashing, SAD, AgentDojo and much more: https://open.substack.com/pub/aisafetyfrontier/p/paper-highlights-july-24?r=1v4l78&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true

Paper Highlights, July '24

Safetywashing in benchmarks, SAD, AgentDojo, vision-language model attacks, latent adversarial training, brittle steering vectors, the 2-dimensional truth, legible LLM solutions, and more LLM debate.

Since IBMB only looks at the output nodes and their surroundings, its runtime is actually independent of the overall graph size! So the speedup becomes even greater for sparser training sets, such as hand-labeled nodes.

Again, note the log x-axis.
7/8

Many more experiments and details in the paper!

Paper: https://openreview.net/forum?id=b9g0vxzYa_
Code: https://github.com/TUM-DAML/ibmb
(And hopefully soon part of PyG.)

Work with @ChendiQian and @guennemann.
8/8

Influence-Based Mini-Batching for Graph Neural Networks

Influence-based mini-batching enables large-scale inference and training for graph neural networks by maximizing the influence of selected nodes on the output.


And the fixed batches allow us to precompute them and cache them in a nice block of consecutive memory, substantially speeding up training as well.

Note again the log x-axis.
6/8

This results in a fixed set of batches. You might think "WTF, SGD with fixed batches?!" And you're almost right. Almost.

Adaptive optimization and momentum (Adam) can handle these sparse gradients quite well. For the remaining problems we propose a batch scheduling scheme.
5/8
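
A minimal sketch of what training on a fixed set of precomputed batches might look like. The simple schedule here (reshuffling the batch order each epoch) is an illustrative assumption, not the paper's exact scheduling scheme; `train_step` is a hypothetical callback.

```python
import random

def train_with_fixed_batches(batches, train_step, epochs=10, seed=0):
    """Sketch: fixed, precomputed batches reused every epoch.

    A simple batch schedule (random order each epoch) reduces the bias
    of always visiting the same batches in the same order. Illustrative
    only -- not the scheduling scheme from the IBMB paper.
    """
    rng = random.Random(seed)
    for _ in range(epochs):
        order = list(range(len(batches)))
        rng.shuffle(order)  # reshuffle the fixed batches each epoch
        for i in order:
            train_step(batches[i])
```

Because the batches never change, they can be built once before training and cached, which is where the speedup from consecutive-memory caching comes from.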

This results in very efficient batches and an up to 130x speedup over the baseline.

These plots show the accuracy vs. speed trade-off for 3 datasets, 3 GNNs, and multiple mini-batching methods when varying their hyperparameters. Note the log x-axis.
4/8

IBMB uses influence scores to select the most important neighbors, instead of a random set.

It works in 2 steps:
1. Partition the output nodes (for which we want predictions) into batches.
2. For each mini-batch, select the auxiliary nodes that help most with predictions.
2/8
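
The two steps above can be sketched roughly as follows. This is a hypothetical illustration, not the paper's code: the influence matrix is assumed to be given, Step 1 uses naive chunking where the paper uses smarter partitioning, and Step 2 picks the top-scoring auxiliary nodes per batch.

```python
import numpy as np

def make_ibmb_batches(output_nodes, influence, batch_size, num_aux):
    """Hypothetical sketch of IBMB's two steps (not the paper's code).

    influence[i, j] = influence score of node j on output node i.
    """
    # Step 1: partition the output nodes into fixed batches.
    # (The paper uses heuristics like graph partitioning; here: chunking.)
    batches = [output_nodes[i:i + batch_size]
               for i in range(0, len(output_nodes), batch_size)]

    result = []
    for batch in batches:
        # Step 2: select the auxiliary nodes with the highest total
        # influence on this batch's output nodes.
        scores = influence[batch].sum(axis=0)
        aux = np.argsort(scores)[::-1][:num_aux]
        result.append((batch, aux))
    return result
```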

Luckily, the influence scores simplify to personalized PageRank (PPR) if we make some assumptions.

Step 2 then becomes an application of PPR or topic-sensitive PageRank.

Step 1 is trickier and requires falling back on heuristics like PPR or graph partitioning.
3/8
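
For intuition, here is a textbook power-iteration sketch of personalized PageRank with the walk restarting at a batch's output nodes. It illustrates the quantity used as an influence score; it is not the (much more scalable) approximation used in the paper, and the dense adjacency matrix is purely for illustration.

```python
import numpy as np

def personalized_pagerank(adj, seed_nodes, alpha=0.25, iters=50):
    """Power-iteration sketch of personalized PageRank (illustrative only).

    adj: dense adjacency matrix; seed_nodes: the batch's output nodes.
    Returns a score per node, usable as an influence score for
    selecting auxiliary nodes.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True)
    trans = adj / np.maximum(deg, 1)  # row-stochastic transition matrix
    # The teleport vector restarts the walk at the seed (output) nodes.
    teleport = np.zeros(n)
    teleport[seed_nodes] = 1.0 / len(seed_nodes)
    ppr = teleport.copy()
    for _ in range(iters):
        ppr = alpha * teleport + (1 - alpha) * ppr @ trans
    return ppr
```

Nodes close to the seeds get high scores, so thresholding or top-k selection on `ppr` yields the auxiliary nodes for that batch.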

You can sample nodes for scalable #GNN #training. But how do you do #scalable #inference?

In our latest paper (Oral at #LogConference) we introduce influence-based mini-batching (#IBMB) for both fast inference and training, achieving up to 130x and 17x speedups, respectively!

1/8 in 🧵

I’m thrilled to share that I’m starting a new position as Research Scientist in Bryan Perozzi's team at #Google Research this week!

Looking forward to all the exciting research and opportunities for positive impact!