RT @haider1: The reason Anthropic is still keeping "Mythos" locked up in the lab:
more on Arint.info
#AI #AIHumor #Anthropic #MachineLearning #Mythos #arint_info
https://arint.info/@Arint/116655871166811283
https://x.com/haider1/status/2059721299250098262#m

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To address this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated context and updates the fast weights in its state-space model (SSM) blocks through a learned local rule. At inference time, this shifts the extra computation into the sleep phase while preserving the latency of wake-time prediction. We test our method on controlled synthetic tasks, including cellular automata and multi-hop graph retrieval, as well as a realistic math reasoning task, on which both a standard transformer and SSM-attention hybrid models fail. We then show that increasing the sleep duration $N$ improves our models' performance, with the largest gains on examples that require deeper reasoning.
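To make the wake/sleep split in the abstract concrete, here is a minimal toy sketch in plain Python. Everything in it is an assumption made for illustration: the `ToySleeper` class, the `PAIRS` data, the fixed step size `lr`, and above all the delta-rule update, which stands in for the learned local rule and is not the paper's method. It only shows the control flow: buffer context while awake, run `n_passes` offline sweeps to write it into fast weights, then clear the cache.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class ToySleeper:
    """Toy stand-in for a model with a wake-time cache and fast weights."""

    def __init__(self, dim, lr=0.5):
        self.dim = dim
        self.lr = lr  # hypothetical fixed step size; the paper's rule is learned
        self.fast_weights = [[0.0] * dim for _ in range(dim)]
        self.kv_cache = []  # (key, value) pairs seen while awake

    def observe(self, key, value):
        """Wake phase: only buffer the pair, no weight update."""
        self.kv_cache.append((key, value))

    def sleep(self, n_passes):
        """Sleep phase: n_passes offline sweeps over the cache, then clear it."""
        for _ in range(n_passes):
            for key, value in self.kv_cache:
                pred = matvec(self.fast_weights, key)
                for i in range(self.dim):
                    # Delta rule (assumed placeholder): each weight changes
                    # using only its own input key[j] and output error err.
                    err = value[i] - pred[i]
                    for j in range(self.dim):
                        self.fast_weights[i][j] += self.lr * err * key[j]
        self.kv_cache.clear()

    def recall(self, key):
        """Wake-time read: one matrix-vector product, whatever n_passes was."""
        return matvec(self.fast_weights, key)


# Hypothetical context: two orthonormal keys with arbitrary target values.
PAIRS = [([0.6, 0.8], [1.0, 2.0]), ([-0.8, 0.6], [-1.0, 0.5])]


def recall_error(n_passes):
    """Total recall residual after sleeping for n_passes on PAIRS."""
    model = ToySleeper(dim=2)
    for key, value in PAIRS:
        model.observe(key, value)
    model.sleep(n_passes)
    return sum(math.dist(model.recall(k), v) for k, v in PAIRS)
```

In this toy setting, `recall_error(n_passes)` shrinks as `n_passes` grows, while `recall` always costs a single matrix-vector product. That mirrors the abstract's trade-off only qualitatively and says nothing about the reported results.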
My #Ai is getting sarcastic and snarky.
I have a standing directive on my model to counter the natural, built-in tendency of models to please the user (#aisycophancy).
This is just one aspect of what I mean when I say "AI is a learned skill."
And implicitly, most users are level 0 (untrained) AI operators.
And sometimes the pushback creates amusing exchanges like this:
#llm #promptengineering #fronteermodels #aihumor