Jonas Köhler

329 Followers
287 Following
162 Posts

Senior Researcher @ #Microsoft Research #Berlin.

Machine Learning PhD student at Freie Universität #Berlin.

Formerly at #DeepMind, MPI Tübingen, UvA, Bauhaus

I toot on #MachineLearning #Science #Math #Technology #Research #AI4Science #ML4Science #DeepLearning #AI #Python #Coding #Programming #Sampling #Statistics #GenerativeModels #NormalizingFlows

Pronouns: he/him

In my private life I enjoy #Techno #Workout, #Outdoor activities, #Guitar, #Cooking and #FineDining.

Finding me: #fedi22

linktree: https://linktr.ee/jonkhler
publications (Google Scholar): https://scholar.google.com/citations?user=WNlTdm0AAAAJ
verification (ORCID): https://orcid.org/0000-0002-7256-2892
RT @DaniloJRezende
Tired of reading about AI doom?
Read about ML for quantum field theory in arbitrary space-time dimensions :)
https://arxiv.org/abs/2305.02402
Normalizing flows for lattice gauge theory in arbitrary space-time dimension

Applications of normalizing flows to the sampling of field configurations in lattice gauge theory have so far been explored almost exclusively in two space-time dimensions. We report new algorithmic developments of gauge-equivariant flow architectures facilitating the generalization to higher-dimensional lattice geometries. Specifically, we discuss masked autoregressive transformations with tractable and unbiased Jacobian determinants, a key ingredient for scalable and asymptotically exact flow-based sampling algorithms. For concreteness, results from a proof-of-principle application to SU(3) lattice gauge theory in four space-time dimensions are reported.
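The "masked autoregressive transformations with tractable and unbiased Jacobian determinants" mentioned in the abstract can be illustrated in miniature. This is a generic 1-D affine autoregressive flow in plain numpy, not the paper's gauge-equivariant architecture; the conditioner functions are hypothetical stand-ins for learned networks:

```python
import numpy as np

def affine_autoregressive_forward(x, shift_fn, log_scale_fn):
    """Toy masked autoregressive affine flow. Each output y_i depends only
    on x_{<i}, so the Jacobian is triangular and its log-determinant is
    simply the sum of the log-scales -- the tractability property the
    abstract refers to."""
    y = np.empty_like(x)
    log_det = 0.0
    for i in range(len(x)):
        s = log_scale_fn(x[:i])  # conditioner sees only earlier coordinates
        t = shift_fn(x[:i])
        y[i] = x[i] * np.exp(s) + t
        log_det += s             # triangular Jacobian => sum of diagonal logs
    return y, log_det
```

Because the transformation is triangular, the log-determinant comes for free during the forward pass, with no determinant computation or stochastic estimator needed.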

arXiv.org
RT @SebastienBubeck
At @MSFTResearch we had early access to the marvelous #GPT4 from @OpenAI for our work on @bing. We took this opportunity to document our experience. We're so excited to share our findings. In short: time to face it, the sparks of #AGI have been ignited.
https://arxiv.org/abs/2303.12712
Sparks of Artificial General Intelligence: Early experiments with GPT-4

Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.


RT @duolingo
AI and education make a good duo.

Introducing Duolingo Max. A subscription tier above Super that gives you access to your own personal, AI-powered language tutor through Explain My Answer and Roleplay, two features developed with the latest @OpenAI technology.

details in 🧵

RT @_akhaliq
Consistency Models

achieve the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64×64 for one-step generation

abs: https://arxiv.org/abs/2303.01469

Consistency Models

Diffusion models have significantly advanced the fields of image, audio, and video generation, but they depend on an iterative sampling process that causes slow generation. To overcome this limitation, we propose consistency models, a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training on these tasks. Consistency models can be trained either by distilling pre-trained diffusion models, or as standalone generative models altogether. Through extensive experiments, we demonstrate that they outperform existing distillation techniques for diffusion models in one- and few-step sampling, achieving the new state-of-the-art FID of 3.55 on CIFAR-10 and 6.20 on ImageNet 64x64 for one-step generation. When trained in isolation, consistency models become a new family of generative models that can outperform existing one-step, non-adversarial generative models on standard benchmarks such as CIFAR-10, ImageNet 64x64 and LSUN 256x256.
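The one-step generation described in the abstract hinges on a parameterization that forces the model to be the identity at the smallest noise level. A minimal numpy sketch of that boundary-condition trick and the resulting single-step sampler, with `F` as a stand-in for the trained network and the coefficient forms taken from the consistency-models setup (treat the exact constants as assumptions):

```python
import numpy as np

SIGMA_DATA, EPS = 0.5, 0.002  # assumed hyperparameters

def c_skip(t):
    return SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)

def c_out(t):
    return SIGMA_DATA * (t - EPS) / np.sqrt(t**2 + SIGMA_DATA**2)

def consistency_fn(F, x, t):
    """Skip-connection parameterization: at t = EPS, c_skip = 1 and
    c_out = 0, so f(x, EPS) = x holds by construction."""
    return c_skip(t) * x + c_out(t) * F(x, t)

def one_step_sample(F, shape, T=80.0, rng=None):
    """Single-step generation: map pure noise x_T directly to a sample."""
    rng = np.random.default_rng() if rng is None else rng
    x_T = rng.normal(size=shape) * T
    return consistency_fn(F, x_T, T)
```

Multistep sampling then just alternates re-noising and re-applying `consistency_fn` to trade compute for quality.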

RT @david_van_dijk
We're proud to introduce Neural Integral Equations (NIE), the first deep learning model that can learn integral equations from data! Check out our paper at https://arxiv.org/abs/2209.15190. Work with @E_Zappala @aho_fonseca @josueortc
Neural Integral Equations

Nonlinear operators with long distance spatiotemporal dependencies are fundamental in modeling complex systems across sciences, yet learning these nonlocal operators remains challenging in machine learning. Integral equations (IEs), which model such nonlocal systems, have wide ranging applications in physics, chemistry, biology, and engineering. We introduce Neural Integral Equations (NIE), a method for learning unknown integral operators from data using an IE solver. To improve scalability and model capacity, we also present Attentional Neural Integral Equations (ANIE), which replaces the integral with self-attention. Both models are grounded in the theory of second kind integral equations, where the indeterminate appears both inside and outside the integral operator. We provide theoretical analysis showing how self-attention can approximate integral operators under mild regularity assumptions, further deepening previously reported connections between transformers and integration, and deriving corresponding approximation results for integral operators. Through numerical benchmarks on synthetic and real world data, including Lotka-Volterra, Navier-Stokes, and Burgers' equations, as well as brain dynamics and integral equations, we showcase the models' capabilities and their ability to derive interpretable dynamics embeddings. Our experiments demonstrate that ANIE outperforms existing methods, especially for longer time intervals and higher dimensional problems. Our work addresses a critical gap in machine learning for nonlocal operators and offers a powerful tool for studying unknown complex systems with long range dependencies.
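The ANIE idea of replacing the integral with self-attention has a compact interpretation: softmax attention acts as a learned quadrature rule for the kernel of an integral operator (K f)(s) = ∫ K(s, t) f(t) dt. A toy numpy illustration, with all function names as illustrative stand-ins rather than the paper's implementation:

```python
import numpy as np

def attention_integral(f_vals, query_fn, key_fn, value_fn):
    """Self-attention viewed as a discretized integral operator: the
    normalized softmax kernel plays the role of K(s, t), and the row-sum
    over values approximates integration over t."""
    Q = query_fn(f_vals)                               # (n, d)
    K = key_fn(f_vals)                                 # (n, d)
    V = value_fn(f_vals)                               # (n, d)
    logits = Q @ K.T / np.sqrt(Q.shape[1])
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)      # rows sum to 1
    return weights @ V                                 # discrete integral over t
```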

Equiformer: Equivariant Graph Attention Transformer for 3D...

We propose an equivariant graph neural network based on Transformer networks and propose a novel attention mechanism, which improves upon self-attention in typical Transformers.

OpenReview

RT @RickyTQChen
Excited to share our new work on Riemannian Flow Matching.

Unlike diffusion-based approaches, it’s
- completely simulation-free on simple manifolds,
- trivially applies to higher dimensions,
- tractably generalizes to general geometries!

https://arxiv.org/abs/2302.03660

w/ @lipmanya

Flow Matching on General Geometries

We propose Riemannian Flow Matching (RFM), a simple yet powerful framework for training continuous normalizing flows on manifolds. Existing methods for generative modeling on manifolds either require expensive simulation, are inherently unable to scale to high dimensions, or use approximations for limiting quantities that result in biased training objectives. Riemannian Flow Matching bypasses these limitations and offers several advantages over previous approaches: it is simulation-free on simple geometries, does not require divergence computation, and computes its target vector field in closed-form. The key ingredient behind RFM is the construction of a relatively simple premetric for defining target vector fields, which encompasses the existing Euclidean case. To extend to general geometries, we rely on the use of spectral decompositions to efficiently compute premetrics on the fly. Our method achieves state-of-the-art performance on many real-world non-Euclidean datasets, and we demonstrate tractable training on general geometries, including triangular meshes with highly non-trivial curvature and boundaries.
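The "computes its target vector field in closed-form" claim is easiest to see in the Euclidean special case that RFM generalizes: with a straight-line conditional path between noise and data, the regression target is just the displacement. A minimal numpy sketch (the straight-line path is the standard Euclidean construction, not the manifold-specific premetric from the paper; `v_theta` stands in for the learned field):

```python
import numpy as np

def flow_matching_loss(v_theta, x0, x1, t):
    """Euclidean flow matching: along the conditional path
    x_t = (1 - t) * x0 + t * x1, the target vector field is x1 - x0,
    so training is simulation-free regression."""
    x_t = (1.0 - t) * x0 + t * x1
    target = x1 - x0                  # closed-form target, no ODE solve
    diff = v_theta(x_t, t) - target
    return float(np.mean(diff**2))
```

On a curved manifold the straight line is replaced by paths defined through a premetric (e.g. geodesic distance or spectral approximations), but the regression structure stays the same.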

RT @djjruhe
New work on Geometric Clifford Algebra Networks (GCANs). We propose geometric templates for modeling dynamical systems. A 🧵on geometric / Clifford algebras, and symmetry group transformations in neural networks.
📜https://arxiv.org/abs/2302.06594
Geometric Clifford Algebra Networks

We propose Geometric Clifford Algebra Networks (GCANs) for modeling dynamical systems. GCANs are based on symmetry group transformations using geometric (Clifford) algebras. We first review the quintessence of modern (plane-based) geometric algebra, which builds on isometries encoded as elements of the $\mathrm{Pin}(p,q,r)$ group. We then propose the concept of group action layers, which linearly combine object transformations using pre-specified group actions. Together with a new activation and normalization scheme, these layers serve as adjustable $\textit{geometric templates}$ that can be refined via gradient descent. Theoretical advantages are strongly reflected in the modeling of three-dimensional rigid body transformations as well as large-scale fluid dynamics simulations, showing significantly improved performance over traditional methods.


RT @KevinKaichuang
Train a diffusion model on coarse-grained samples, and in addition to generating CG samples, you also get the CG force field!

@ArtsMarloes @vgsatorras @chinwei_h @danielzuegner @mfederici_ @CecClementi @FrankNoeBerlin @rpinsler @vdbergrianne

https://arxiv.org/abs/2302.00600

Two for One: Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics

Coarse-grained (CG) molecular dynamics enables the study of biological processes at temporal and spatial scales that would be intractable at an atomistic resolution. However, accurately learning a CG force field remains a challenge. In this work, we leverage connections between score-based generative models, force fields and molecular dynamics to learn a CG force field without requiring any force inputs during training. Specifically, we train a diffusion generative model on protein structures from molecular dynamics simulations, and we show that its score function approximates a force field that can directly be used to simulate CG molecular dynamics. While having a vastly simplified training setup compared to previous work, we demonstrate that our approach leads to improved performance across several small- to medium-sized protein simulations, reproducing the CG equilibrium distribution, and preserving dynamics of all-atom simulations such as protein folding events.
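The "two for one" mechanism is that a diffusion model's score is (up to a temperature factor) the gradient of the log-density, and for a Boltzmann distribution F = -∇U = kT ∇ log p, so the learned score doubles as a CG force field for simulation. A generic overdamped Langevin step using such a score, as a sketch of the simulation side (not the paper's actual integrator or units):

```python
import numpy as np

def langevin_step(x, score_fn, step=1e-3, kT=1.0, rng=None):
    """One overdamped Langevin update driven by a learned score:
    drift = kT * score(x) plays the role of the CG force, plus
    thermal noise with the matching fluctuation-dissipation scale."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(size=x.shape)
    return x + step * kT * score_fn(x) + np.sqrt(2.0 * step * kT) * noise
```

Iterating this step with the diffusion model's score at a small noise level samples the CG equilibrium distribution without ever training on forces.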

RT @HaggaiMaron
(1/10) New paper! A deep architecture for processing (weights of) other neural networks while preserving equivariance to their permutation symmetries. Learning in deep weight spaces has a wide potential: from NeRFs to INRs; from adaptation to pruning https://avivnavon.github.io/DWSNets/ 👇
Equivariant Architectures for Learning in Deep Weight Spaces
