Mastodawn

Popular ML resources May 22, 2023

The most popular Arxiv link during the weekend (via ymatias@twitter):

How did our medical #LLM, Med-PaLM 2, become the first to perform at “expert” level on U.S. Medical Licensing Exam-style questions? Check out our new paper: https://t.co/dyMoJVSJyE https://t.co/nM6pc3URvu

https://twitter.com/ymatias/status/1659730065515577346

Popular ML resources May 20, 2023

The most popular Arxiv link yesterday (via ShunyuYao12@twitter):

Still use ⛓️Chain-of-Thought (CoT) for all your prompting? May be underutilizing LLM capabilities🤠

Introducing 🌲Tree-of-Thought (ToT), a framework to unleash complex & general problem solving with LLMs, through a deliberate ‘System 2’ tree search.

https://t.co/V6hjbUNjbt https://t.co/IaZZdVeaTW

https://twitter.com/ShunyuYao12/status/1659357547474681857

Tree of Thoughts: Deliberate Problem Solving with Large Language Models

Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/princeton-nlp/tree-of-thought-llm.

arXiv.org

Popular ML resources May 19, 2023

The most popular Arxiv link yesterday (via tlngy@twitter):

I’m stoked to release my PhD work 🥳
https://t.co/43aCABWL8u

We found that neutrophils swell (they drink H2O💧!) when they start to move. We used a genome-wide screen to figure out how they swell and found that stopping it makes cells much slower!

🐇-💧=🐢

🐢immune cell = 🤒 https://t.co/jumx18kozI

https://twitter.com/tlngy/status/1658964520608382976

Popular ML resources May 18, 2023

The most popular Arxiv link yesterday (via AziziShekoofeh@twitter):

Excited to share our new pre-print on "Med-PaLM 2", highlighting our progress towards achieving expert-level medical question answering performance!

Link: https://t.co/12rBwQSnqP https://t.co/WhOk45mZ8Q

https://twitter.com/AziziShekoofeh/status/1658659343087304705

Popular ML resources May 17, 2023

The most popular Arxiv link yesterday (via stevenhoi@twitter):

Introducing 🔥CodeT5+🔥, a new family of open-source code LLMs for both code understanding and generation, achieved new SoTA code generation performance on HumanEval, surpassing all the open-source code LLMs.

Paper: https://t.co/apxl03WvNc
Code: https://t.co/nHTaIGIEmm

(1/n) https://t.co/YR7cp2P2Yn

https://twitter.com/stevenhoi/status/1658270266424975361

Popular ML resources May 16, 2023

The most popular Arxiv link yesterday (via _akhaliq@twitter):

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

abs: https://t.co/yqzduI4EpP
paper page: https://t.co/h6pM6L0UIv https://t.co/S2kPHVkD6m

https://twitter.com/_akhaliq/status/1657919671092539394

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books. We proposed Megabyte, a multi-scale decoder architecture that enables end-to-end differentiable modeling of sequences of over one million bytes. Megabyte segments sequences into patches and uses a local submodel within patches and a global model between patches. This enables sub-quadratic self-attention, much larger feedforward layers for the same compute, and improved parallelism during decoding -- unlocking better performance at reduced cost for both training and generation. Extensive experiments show that Megabyte allows byte-level models to perform competitively with subword models on long context language modeling, achieve state-of-the-art density estimation on ImageNet, and model audio from raw files. Together, these results establish the viability of tokenization-free autoregressive sequence modeling at scale.

arXiv.org

Popular ML resources May 15, 2023

The most popular Arxiv link during the weekend (via _akhaliq@twitter):

HACK: Learning a Parametric Head and Neck Model for High-fidelity Animation

abs: https://t.co/TeafvDiMmW
paper page: https://t.co/1TJeGslkhZ https://t.co/qfQH8GmQ79

https://twitter.com/_akhaliq/status/1657806182306676737

HACK: Learning a Parametric Head and Neck Model for High-fidelity Animation

Significant advancements have been made in developing parametric models for digital humans, with various approaches concentrating on parts such as the human body, hand, or face. Nevertheless, connectors such as the neck have been overlooked in these models, with rich anatomical priors often unutilized. In this paper, we introduce HACK (Head-And-neCK), a novel parametric model for constructing the head and cervical region of digital humans. Our model seeks to disentangle the full spectrum of neck and larynx motions, facial expressions, and appearance variations, providing personalized and anatomically consistent controls, particularly for the neck regions. To build our HACK model, we acquire a comprehensive multi-modal dataset of the head and neck under various facial expressions. We employ a 3D ultrasound imaging scheme to extract the inner biomechanical structures, namely the precise 3D rotation information of the seven vertebrae of the cervical spine. We then adopt a multi-view photometric approach to capture the geometry and physically-based textures of diverse subjects, who exhibit a diverse range of static expressions as well as sequential head-and-neck movements. Using the multi-modal dataset, we train the parametric HACK model by separating the 3D head and neck depiction into various shape, pose, expression, and larynx blendshapes from the neutral expression and the rest skeletal pose. We adopt an anatomically-consistent skeletal design for the cervical region, and the expression is linked to facial action units for artist-friendly controls. HACK addresses the head and neck as a unified entity, offering more accurate and expressive controls, with a new level of realism, particularly for the neck regions. This approach has significant benefits for numerous applications and enables inter-correlation analysis between head and neck for fine-grained motion synthesis and transfer.

arXiv.org

Popular ML resources May 13, 2023

The most popular Arxiv link yesterday (via DulwichQuantum@twitter):

A breakthrough result that finally resolves the longstanding author ordering problem.
https://t.co/jIMT9f1FBZ https://t.co/waMysbMWXa

https://twitter.com/DulwichQuantum/status/1656980286993997824

Every Author as First Author

We propose a new standard for writing author names on papers and in bibliographies, which places every author as a first author -- superimposed. This approach enables authors to write papers as true equals, without any advantage given to whoever's name happens to come first alphabetically (for example). We develop the technology for implementing this standard in LaTeX, BibTeX, and HTML; show several examples; and discuss further advantages.

arXiv.org

Popular ML resources May 12, 2023

The most popular Arxiv link yesterday (via _akhaliq@twitter):

Relightify: Relightable 3D Faces from a Single Image via Diffusion Models

abs: https://t.co/4XfU1XvlkL
paper page: https://t.co/JYSJsbQmxN https://t.co/f3vS9evtzI

https://twitter.com/_akhaliq/status/1656467161693519873

Relightify: Relightable 3D Faces from a Single Image via Diffusion Models

Following the remarkable success of diffusion models on image generation, recent works have also demonstrated their impressive ability to address a number of inverse problems in an unsupervised way, by properly constraining the sampling process based on a conditioning input. Motivated by this, in this paper, we present the first approach to use diffusion models as a prior for highly accurate 3D facial BRDF reconstruction from a single image. We start by leveraging a high-quality UV dataset of facial reflectance (diffuse and specular albedo and normals), which we render under varying illumination settings to simulate natural RGB textures and, then, train an unconditional diffusion model on concatenated pairs of rendered textures and reflectance components. At test time, we fit a 3D morphable model to the given image and unwrap the face in a partial UV texture. By sampling from the diffusion model, while retaining the observed texture part intact, the model inpaints not only the self-occluded areas but also the unknown reflectance components, in a single sequence of denoising steps. In contrast to existing methods, we directly acquire the observed texture from the input image, thus, resulting in more faithful and consistent reflectance estimation. Through a series of qualitative and quantitative comparisons, we demonstrate superior performance in both texture completion as well as reflectance reconstruction tasks.

arXiv.org

Popular ML resources May 11, 2023

The most popular Arxiv link yesterday (via srush_nlp@twitter):

Pretraining without Attention (https://t.co/Y2P32MheXc) - BiGS is alternative to BERT trained on up to 4096 tokens.

Attention can be overkill. Below shows *every* word-word interaction for every sentence over 23 layers of BiGS (no heads, no n^2). https://t.co/bZVkaXH2r7

https://twitter.com/srush_nlp/status/1656303656004624386

Pretraining Without Attention

Transformers have been essential to pretraining success in NLP. While other architectures have been used, downstream accuracy is either significantly worse, or requires attention layers to match standard benchmarks such as GLUE. This work explores pretraining without attention by using recent advances in sequence routing based on state-space models (SSMs). Our proposed model, Bidirectional Gated SSM (BiGS), combines SSM layers with a multiplicative gating architecture that has been effective in simplified sequence modeling architectures. The model learns static layers that do not consider pair-wise interactions. Even so, BiGS is able to match BERT pretraining accuracy on GLUE and can be extended to long-form pretraining of 4096 tokens without approximation. Analysis shows that while the models have similar average accuracy, the approach has different inductive biases than BERT in terms of interactions and syntactic representations. All models from this work are available at https://github.com/jxiw/BiGS.

arXiv.org