
On the topic of large AI models, this preprint synthesises what we know so far: https://arxiv.org/abs/2209.15259

In short: it is mathematically impossible to build AI models that combine all of the following properties:

1) High number of parameters
2) Robustness to poisoning (e.g. fake data)
3) Privacy-preserving

On the Impossible Safety of Large AI Models

Large AI Models (LAIMs), of which large language models are the most prominent recent example, showcase some impressive performance. However, they have been empirically found to pose serious security issues. This paper systematizes our knowledge about the fundamental impossibility of building arbitrarily accurate and secure machine learning models. More precisely, we identify key challenging features of many of today's machine learning settings. Namely, high accuracy seems to require memorizing large training datasets, which are often user-generated and highly heterogeneous, with both sensitive information and fake users. We then survey statistical lower bounds that, we argue, constitute a compelling case against the possibility of designing high-accuracy LAIMs with strong security guarantees.

arXiv.org
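The tension between accuracy and poisoning robustness can be felt even in a toy setting. Below is an illustrative numpy sketch of my own (not from the paper; all numbers are made up): with an eps-fraction of poisoned samples from "fake users", the naive sample mean's error grows roughly linearly in eps, which is the flavour of statistical lower bound the survey formalizes.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisoned_mean_error(n=10_000, eps=0.05, shift=10.0):
    """Error of the naive sample mean when an eps-fraction of data is adversarial."""
    clean = rng.normal(0.0, 1.0, size=int(n * (1 - eps)))  # honest data, true mean 0
    poison = np.full(int(n * eps), shift)                   # adversarial "fake users"
    return abs(np.concatenate([clean, poison]).mean())

for eps in [0.0, 0.01, 0.05, 0.10]:
    print(f"eps={eps:.2f}  |error| ~ {poisoned_mean_error(eps=eps):.3f}")
```

The error tracks eps * shift: with no defence, a small fraction of attackers moves the estimate by as much as they like.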

Two new-to-me terms: sycophancy and sandbagging:

> More capable models can better recognize the specific circumstances under which they are trained. Because of this, they are more likely to learn to act as expected in precisely those circumstances while behaving competently but unexpectedly in others. This can surface in the form of problems that Perez et al. (2022) call sycophancy, where a model answers subjective questions in a way that flatters their user’s stated beliefs ...

The most popular arXiv link yesterday (via _akhaliq@twitter):

Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery

abs: https://t.co/q2mCZsCe4g
github: https://t.co/wCcDm5a8Fi https://t.co/AKMu7IlByp

https://twitter.com/_akhaliq/status/1623135186442485760

Hard Prompts Made Easy: Gradient-Based Discrete Optimization for Prompt Tuning and Discovery

The strength of modern generative models lies in their ability to be controlled through text-based prompts. Typical "hard" prompts are made from interpretable words and tokens, and must be hand-crafted by humans. There are also "soft" prompts, which consist of continuous feature vectors. These can be discovered using powerful optimization methods, but they cannot be easily interpreted, re-used across models, or plugged into a text-based interface. We describe an approach to robustly optimize hard text prompts through efficient gradient-based optimization. Our approach automatically generates hard text-based prompts for both text-to-image and text-to-text applications. In the text-to-image setting, the method creates hard prompts for diffusion models, allowing API users to easily generate, discover, and mix and match image concepts without prior knowledge on how to prompt the model. In the text-to-text setting, we show that hard prompts can be automatically discovered that are effective in tuning LMs for classification.

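For intuition on "hard prompts via gradient-based optimization", here is a toy sketch (my own simplification, not the authors' code; the embedding table and target are random stand-ins): keep a continuous prompt, but evaluate gradients at its projection onto the nearest real token embeddings, so the optimized prompt stays discrete and interpretable.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim, prompt_len, lr = 50, 8, 3, 0.5

E = rng.normal(size=(vocab, dim))            # token embedding table
target = rng.normal(size=(prompt_len, dim))  # stand-in for the embeddings we want the model to see
soft = rng.normal(size=(prompt_len, dim))    # continuous prompt being optimized

def project(x):
    # snap each row of x to its nearest token embedding (squared Euclidean distance)
    ids = ((x[:, None, :] - E[None]) ** 2).sum(-1).argmin(axis=1)
    return ids, E[ids]

for _ in range(200):
    ids, hard = project(soft)
    grad = 2 * (hard - target)  # gradient of ||hard - target||^2, taken at the projection
    soft -= lr * grad           # ...but applied to the soft prompt (straight-through style)

ids, hard = project(soft)
print("discrete prompt, as token ids:", ids)
```

The final `ids` index real vocabulary tokens, which is what makes the result a reusable "hard" prompt rather than an uninterpretable soft vector.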

If anyone is wondering whether Bing+GPT is somehow going to be better than Google Bard, remember that both LLM systems depend on the underlying search algorithms.

Here is Bing (prior to GPT integration) doing the wrong thing. Bing search is retrieving articles about how Google summary boxes are wrong and then giving the wrong answer.

Both will have the same failure modes. The only difference is that Google stepped in it first.

Source: https://twitter.com/stilgherrian/status/1623576572015050753

Stilgherrian on Twitter

“Well this is going pretty much as expected.”


amazing on so many levels:

1. the naivete of the researcher who published this paper

2. that the first commenter on hacker news correctly identified what's wrong with the reasoning behind the paper

3. that the rest of the HN thread is just nerds who don't have the foggiest notion of what a 'mind' is, arguing over whether a stochastic parrot that memorized and regurgitates texts written by beings with minds itself has one

https://news.ycombinator.com/item?id=34730365

Theory of Mind May Have Spontaneously Emerged in Large Language Models | Hacker News

At first I felt bad about the game of "jailbreaking" #LLM and revealing their secret instructions. But on second thought, we've been playing this game since Case freed Wintermute and it's probably too seductive for us to stop. This dialogue, revealing that Bing's secret name for itself is "Sydney," comes from https://twitter.com/kliu128/status/1623472922374574080

Kevin Liu on Twitter

“The entire prompt of Microsoft Bing Chat?! (Hi, Sydney.)”

"Trading Information between Latents in Hierarchical Variational Autoencoders. (arXiv:2302.04855v1 [stat.ML])" — A generalization of VAEs to application domains beyond generative modeling (e.g., representation learning, clustering, or lossy data compression) by introducing an objective function that allows practitioners to trade off between the information content ("bit rate") of the latent representation and the distortion of reconstructed data.

Paper: http://arxiv.org/abs/2302.04855

#AI #CV #NewPaper #DeepLearning #MachineLearning

<<Find this useful? Please boost so that others can benefit too 🙂>>
Trading Information between Latents in Hierarchical Variational Autoencoders

Variational Autoencoders (VAEs) were originally motivated (Kingma & Welling, 2014) as probabilistic generative models in which one performs approximate Bayesian inference. The proposal of β-VAEs (Higgins et al., 2017) breaks this interpretation and generalizes VAEs to application domains beyond generative modeling (e.g., representation learning, clustering, or lossy data compression) by introducing an objective function that allows practitioners to trade off between the information content ("bit rate") of the latent representation and the distortion of reconstructed data (Alemi et al., 2018). In this paper, we reconsider this rate/distortion trade-off in the context of hierarchical VAEs, i.e., VAEs with more than one layer of latent variables. We identify a general class of inference models for which one can split the rate into contributions from each layer, which can then be tuned independently. We derive theoretical bounds on the performance of downstream tasks as functions of the individual layers' rates and verify our theoretical findings in large-scale experiments. Our results provide guidance for practitioners on which region in rate-space to target for a given application.

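The rate/distortion knob the abstract describes is easy to see in the β-VAE objective itself. A minimal sketch with made-up numbers (single layer, diagonal Gaussian posterior against a standard-normal prior; not the paper's hierarchical model): the loss is distortion + β × rate, where the rate is the KL term.

```python
import numpy as np

def kl_gauss(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

def beta_vae_loss(distortion, mu, logvar, beta):
    rate = kl_gauss(mu, logvar)  # "bit rate" of the latent code (in nats)
    return distortion + beta * rate, rate

mu, logvar = np.array([0.5, -0.3]), np.array([-1.0, 0.2])
for beta in [0.1, 1.0, 4.0]:
    loss, rate = beta_vae_loss(distortion=12.7, mu=mu, logvar=logvar, beta=beta)
    print(f"beta={beta}: rate={rate:.3f} nats, loss={loss:.3f}")
```

Raising β penalizes the rate more heavily, pushing the encoder toward cheaper (lower-information) latent codes at the cost of distortion; the paper's contribution is splitting that single rate into independently tunable per-layer rates.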

Actually cannot believe this. After 13 years, Sony/BMG have decided to take down Rick Astley's "Never Gonna Give You Up" due to a dispute with YouTube over ad royalties.

It's completely blocked globally. Actual end of an era.

https://youtu.be/dQw4w9WgXcQ

Rick Astley - Never Gonna Give You Up (Official Video) (4K Remaster)


"Will it run on a standard laptop a student could afford?" is an underrated metric in #NLProc