
A timeline of the latest AI models for audio generation, starting in 2023

github: https://github.com/archinetai/audio-ai-timeline


RT @[email protected]

Text-to-motion is a thing now 🤯 https://huggingface.co/spaces/vumichien/generate_human_motion
This demo is built as part of our community sprint, where we build demos for cutting-edge models; you can join us on Discord here 👉 http://hf.co/join/discord

🐦🔗: https://twitter.com/mervenoyann/status/1620387099672473600


AK on Twitter

“Looped Transformers as Programmable Computers abs: https://t.co/wZTUGiY7vk”

Twitter

RT @[email protected]

Thank you, AK!
The contrastive language-audio pretraining (CLAP) latents enable AudioLDM to learn to regenerate audio during training while performing text-to-audio generation at sampling time. AudioLDM is trained on a single GPU and has advantages in sample quality and audio manipulation. https://twitter.com/_akhaliq/status/1620239832856363009

🐦🔗: https://twitter.com/ZehuaChenICL/status/1620258287987077121
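The training/sampling split described in the post can be sketched as follows. This is a minimal mock, not AudioLDM's implementation: `denoiser`, the noise-prediction loss, and the crude sampling update all stand in for the real conditioned U-Net and diffusion schedule; the one faithful point is that training conditions on CLAP *audio* embeddings while sampling conditions on CLAP *text* embeddings in the same shared space.

```python
import numpy as np

def train_step(audio_latent, clap_audio_emb, denoiser, rng):
    """Training-time idea (sketch): add noise to the VAE audio latent and
    train the denoiser conditioned on the CLAP *audio* embedding, so the
    model learns to regenerate audio without needing text captions."""
    noise = rng.normal(size=audio_latent.shape)
    noisy = audio_latent + noise
    pred = denoiser(noisy, clap_audio_emb)
    return float(np.mean((pred - noise) ** 2))  # noise-prediction loss

def sample(clap_text_emb, denoiser, shape, rng, steps=10):
    """Sampling-time idea (sketch): because CLAP aligns text and audio in one
    embedding space, the same denoiser can be conditioned on the *text*
    embedding instead, yielding text-to-audio generation."""
    x = rng.normal(size=shape)
    for _ in range(steps):
        x = x - 0.1 * denoiser(x, clap_text_emb)  # crude denoising update
    return x
```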

AK on Twitter

“AudioLDM: Text-to-Audio Generation with Latent Diffusion Models abs: https://t.co/G6568wgwky project page: https://t.co/L1jLVcPTdz”

Twitter

RT @[email protected]

@[email protected] 16GB of memory ought to be enough for anybody.

🐦🔗: https://twitter.com/minimaxir/status/1620286240007528450

when you hit "Memory limit exceeded (16G)" 😢

Sample Efficient Deep Reinforcement Learning via Local Planning

abs: https://arxiv.org/abs/2301.12579

Sample Efficient Deep Reinforcement Learning via Local Planning

The focus of this work is sample-efficient deep reinforcement learning (RL) with a simulator. One useful property of simulators is that it is typically easy to reset the environment to a previously observed state. We propose an algorithmic framework, named uncertainty-first local planning (UFLP), that takes advantage of this property. Concretely, in each data collection iteration, with some probability, our meta-algorithm resets the environment to an observed state which has high uncertainty, instead of sampling according to the initial-state distribution. The agent-environment interaction then proceeds as in the standard online RL setting. We demonstrate that this simple procedure can dramatically improve the sample cost of several baseline RL algorithms on difficult exploration tasks. Notably, with our framework, we can achieve super-human performance on the notoriously hard Atari game, Montezuma's Revenge, with a simple (distributional) double DQN. Our work can be seen as an efficient approximate implementation of an existing algorithm with theoretical guarantees, which offers an interpretation of the positive empirical results.
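The reset rule in the abstract can be sketched with a toy chain environment; `ToyChainEnv`, `CountingAgent`, and the inverse-visit-count uncertainty proxy are all hypothetical stand-ins, not the paper's agents or uncertainty estimates.

```python
import random

class ToyChainEnv:
    """1-D chain: move left/right; reaching the right end ends the episode."""
    def __init__(self, length=10):
        self.length = length
        self.pos = 0
    def reset(self):
        self.pos = 0
        return self.pos
    def reset_to(self, state):
        self.pos = state  # simulators typically allow this reset cheaply
        return self.pos
    def step(self, action):  # action in {-1, +1}
        self.pos = min(max(self.pos + action, 0), self.length - 1)
        done = self.pos == self.length - 1
        return self.pos, float(done), done

class CountingAgent:
    """Uncertainty proxy: inverse visit count (an illustrative assumption)."""
    def __init__(self):
        self.visits = {}
    def uncertainty(self, state):
        return 1.0 / (1 + self.visits.get(state, 0))
    def act(self, state):
        return random.choice((-1, 1))
    def update(self, state, reward):
        self.visits[state] = self.visits.get(state, 0) + 1

def uflp_collect(env, agent, buffer, p_reset=0.5, steps=50):
    """One UFLP data-collection episode: with probability p_reset, restart
    from the stored state with the highest uncertainty estimate instead of
    the initial-state distribution, then interact as in standard online RL."""
    if buffer and random.random() < p_reset:
        state = env.reset_to(max(buffer, key=agent.uncertainty))
    else:
        state = env.reset()
    for _ in range(steps):
        state, reward, done = env.step(agent.act(state))
        buffer.append(state)
        agent.update(state, reward)
        if done:
            break
    return buffer
```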

arXiv.org

A theory of continuous generative flow networks

abs: https://arxiv.org/abs/2301.12594
github: https://github.com/saleml/continuous-gfn

A theory of continuous generative flow networks

Generative flow networks (GFlowNets) are amortized variational inference algorithms that are trained to sample from unnormalized target distributions over compositional objects. A key limitation of GFlowNets until this time has been that they are restricted to discrete spaces. We present a theory for generalized GFlowNets, which encompasses both existing discrete GFlowNets and ones with continuous or hybrid state spaces, and perform experiments with two goals in mind. First, we illustrate critical points of the theory and the importance of various assumptions. Second, we empirically demonstrate how observations about discrete GFlowNets transfer to the continuous case and show strong results compared to non-GFlowNet baselines on several previously studied tasks. This work greatly widens the perspectives for the application of GFlowNets in probabilistic inference and various modeling settings.
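One standard GFlowNet training objective whose continuous analogue the paper's theory covers is trajectory balance; a minimal sketch of the scalar per-trajectory form, assuming per-step forward/backward log-probabilities (log-densities, in the continuous case) are precomputed:

```python
def trajectory_balance_loss(log_Z, log_pf_steps, log_pb_steps, log_reward):
    """Trajectory-balance objective for GFlowNets:

        L(tau) = (log Z + sum_t log P_F(s_{t+1}|s_t)
                        - log R(x) - sum_t log P_B(s_t|s_{t+1}))**2

    In the continuous setting the transition probabilities become
    transition densities, but the per-trajectory scalar keeps this form.
    """
    delta = log_Z + sum(log_pf_steps) - log_reward - sum(log_pb_steps)
    return delta * delta
```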

arXiv.org

Adaptive Computation with Elastic Input Sequence

Humans have the ability to adapt the type of information they use, the procedure they employ, and the amount of time they spend when solving problems. However, most standard neural networks have a fixed function type and computation budget regardless of the sample's nature or difficulty. Adaptivity is a powerful paradigm as it not only imbues practitioners with flexibility pertaining to the downstream usage of these models but can also serve as a powerful inductive bias for solving certain challenging classes of problems. In this work, we introduce a new approach called AdaTape, which allows for dynamic computation in neural networks through adaptive tape tokens. AdaTape utilizes an elastic input sequence by equipping an architecture with a dynamic read-and-write tape. Specifically, we adaptively generate input sequences using tape tokens obtained from a tape bank which can be either trainable or derived from input data. We examine the challenges and requirements to obtain dynamic sequence content and length, and propose the Adaptive Tape Reading (ATR) algorithm to achieve both goals. Through extensive experiments on image recognition tasks, we show that AdaTape can achieve better performance while maintaining the computational cost. To facilitate further research, we have released code at https://github.com/google-research/scenic.
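The adaptive tape-reading idea above can be sketched as follows; the dot-product scoring rule and the threshold are illustrative assumptions, not the paper's exact ATR mechanism.

```python
import numpy as np

def adaptive_tape_reading(inputs, tape_bank, max_tape=4, threshold=0.0):
    """Sketch of Adaptive Tape Reading (ATR): score tape-bank tokens against
    a summary of the input and append the best-scoring ones, so the input
    sequence length varies per sample.

    inputs:    (seq_len, d) token representations
    tape_bank: (bank_size, d) trainable or input-derived tape tokens
    """
    query = inputs.mean(axis=0)                  # (d,) input summary
    scores = tape_bank @ query                   # (bank_size,) relevance
    order = np.argsort(scores)[::-1][:max_tape]  # top-k candidate tokens
    chosen = [i for i in order if scores[i] > threshold]
    if not chosen:
        return inputs                            # no tape token qualifies
    return np.concatenate([inputs, tape_bank[chosen]], axis=0)
```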

arXiv.org

SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation

abs: https://arxiv.org/abs/2301.13156
github: https://github.com/fudan-zvg/SeaFormer

SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation

Since the introduction of Vision Transformers, the landscape of many computer vision tasks (e.g., semantic segmentation), long dominated by CNNs, has been significantly transformed. However, the computational cost and memory requirements render these methods unsuitable for mobile devices, especially for the high-resolution per-pixel semantic segmentation task. In this paper, we introduce a new method, the squeeze-enhanced Axial Transformer (SeaFormer), for mobile semantic segmentation. Specifically, we design a generic attention block characterized by squeeze Axial attention and detail enhancement, which can be used to create a family of backbone architectures with superior cost-effectiveness. Coupled with a light segmentation head, we achieve the best trade-off between segmentation accuracy and latency on ARM-based mobile devices on the ADE20K and Cityscapes datasets. Critically, we beat both mobile-friendly rivals and Transformer-based counterparts with better performance and lower latency, without bells and whistles. Beyond semantic segmentation, we further apply the SeaFormer architecture to image classification, demonstrating its potential as a versatile mobile-friendly backbone.
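The core squeeze-Axial trick can be sketched as follows: squeeze the (H, W, C) feature map to an H-length sequence by mean-pooling over W (and symmetrically a W-length sequence over H), run ordinary self-attention on each squeezed sequence, and broadcast the result back. This keeps only the O(H² + W²) attention structure; projections, the detail-enhancement branch, and the rest of SeaFormer's block are omitted and the pooling choice is an illustrative assumption.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def squeeze_axial_attention(feat):
    """Sketch of squeeze-Axial attention on an (H, W, C) feature map:
    attend over each squeezed axis instead of all H*W positions."""
    H, W, C = feat.shape
    out = np.zeros_like(feat)
    for axis in (1, 0):                          # squeeze over W, then H
        seq = feat.mean(axis=axis)               # (H, C) or (W, C)
        attn = softmax(seq @ seq.T / np.sqrt(C)) # per-axis self-attention
        mixed = attn @ seq
        if axis == 1:
            out += mixed[:, None, :]             # broadcast back along W
        else:
            out += mixed[None, :, :]             # broadcast back along H
    return feat + out                            # residual connection
```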

arXiv.org