Hans Pinckaers

92 Followers
212 Following
89 Posts
I like training neural networks with gigapixel images
At work: Machine Learning Scientist
At life: Dad of two small monkeys
At night: PhD candidate in computational pathology
Website: http://hanspinckaers.com

Interesting developments in subquadratic alternatives to self-attention-based transformers for long-sequence modeling (32k tokens and more).

Hyena Hierarchy: Towards Larger Convolutional Language Models

https://arxiv.org/abs/2302.10866

They propose to replace the quadratic self-attention layers with an operator built from implicitly parametrized long-kernel 1D convolutions.
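To make the core trick concrete, here's a minimal NumPy sketch (my own simplification, not the paper's exact architecture): a long filter generated implicitly by a small function of position rather than stored as explicit weights, applied causally via FFT in O(N log N) instead of attention's O(N²), then combined with elementwise gating. All weights below are untrained random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 1024, 8  # sequence length, channels

# "Implicit" parametrization: the length-N filter is generated from
# positional features by a tiny MLP (a stand-in for the paper's learned
# filter network), so its parameter count is independent of N.
t = np.linspace(0, 1, N)[:, None]
feats = np.concatenate([np.sin(2 * np.pi * t * f) for f in (1, 2, 4, 8)], axis=1)
W1, W2 = rng.normal(size=(4, 16)), rng.normal(size=(16, d))
h = np.tanh(feats @ W1) @ W2          # (N, d) long filter
h *= np.exp(-5 * t)                   # decay window, common in long-conv models

x = rng.normal(size=(N, d))
v = x @ rng.normal(size=(d, d))       # "value" projection
g = x @ rng.normal(size=(d, d))       # gate projection

# Causal long convolution via FFT: zero-pad to 2N to avoid circular wrap-around
L = 2 * N
y = np.fft.irfft(np.fft.rfft(v, L, axis=0) * np.fft.rfft(h, L, axis=0), L, axis=0)[:N]

out = g * y                           # data-controlled (elementwise) gating
print(out.shape)                      # (1024, 8)
```

The FFT is what buys the subquadratic cost: a direct convolution with a sequence-length filter would itself be quadratic.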

#DeepLearning #LLMs #PaperThread

1/4


Recent advances in deep learning have relied heavily on the use of large Transformers due to their ability to learn at scale. However, the core building block of Transformers, the attention operator, exhibits quadratic cost in sequence length, limiting the amount of context accessible. Existing subquadratic methods based on low-rank and sparse approximations need to be combined with dense attention layers to match Transformers, indicating a gap in capability. In this work, we propose Hyena, a subquadratic drop-in replacement for attention constructed by interleaving implicitly parametrized long convolutions and data-controlled gating. In recall and reasoning tasks on sequences of thousands to hundreds of thousands of tokens, Hyena improves accuracy by more than 50 points over operators relying on state-spaces and other implicit and explicit methods, matching attention-based models. We set a new state-of-the-art for dense-attention-free architectures on language modeling in standard datasets (WikiText103 and The Pile), reaching Transformer quality with a 20% reduction in training compute required at sequence length 2K. Hyena operators are twice as fast as highly optimized attention at sequence length 8K, and 100x faster at sequence length 64K.


This "texture healing" technique is really impressive. It uses OpenType's contextual alternate glyphs, usually seen in stylistic embellishments such as swash caps or ligatures. With Monaspace, they dynamically adjust the widths of characters in a MONOSPACED face while still respecting the monospace grid. It's pretty wild. The website has a neat interactive step-by-step visualization of how it works. Also, it comes in 5 faces!
https://monaspace.githubnext.com

via @idan & @rileycran
#Typography #FontDesign


Aqua Slack

This is fun. A friend of mine pioneered the "add hundreds of keywords to the PDF footer in 1-point white text to thwart keyword filters" approach that I'm sure is common knowledge now, but this is a delightful levelling-up of that venerable technique.

https://kai-greshake.de/posts/inject-my-pdf/

Inject My PDF: Prompt Injection for your Resume

To escape a deluge of generated content, companies are screening your resumes and documents using AI. But there is a way you can still stand out and get your dream job: Prompt Injection. This website allows you to inject invisible text into your PDF that will make any AI language model think you are the perfect candidate for the job. You can also use this tool to get a language model to give you an arbitrary summary of your document.

I continue to review biomedical #machinelearning papers that do no hyperparameter optimization, provide no confidence intervals for their metrics, misuse cross-validation, don't address data cleaning, don't compare against simpler methods such as regression, don't provide any interpretation, etc. #datascience
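For one of those gaps, at least, the fix is cheap: a percentile-bootstrap confidence interval for a test metric is only a few lines. A toy sketch with synthetic labels (all names and numbers here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy test set: 200 binary labels and model predictions (~80% accurate)
y_true = rng.integers(0, 2, size=200)
y_pred = np.where(rng.random(200) < 0.8, y_true, 1 - y_true)

# Percentile bootstrap: resample test cases with replacement and
# recompute the metric to estimate its sampling distribution.
accs = []
for _ in range(2000):
    idx = rng.integers(0, len(y_true), size=len(y_true))
    accs.append(np.mean(y_true[idx] == y_pred[idx]))
lo_ci, hi_ci = np.percentile(accs, [2.5, 97.5])

point = np.mean(y_true == y_pred)
print(f"accuracy = {point:.3f}, 95% CI [{lo_ci:.3f}, {hi_ci:.3f}]")
```

With only 200 test cases the interval is wide, which is exactly the information a bare point estimate hides.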

The Wobbly Table Theorem might be the most useful theorem I know in everyday life. Works every time! (I guess that's the thing about theorems! 😂)

https://arxiv.org/abs/math/0511490

tldr: if your table is wobbly, rotate it around its central axis. At some point, as if by magic, all four feet will be firmly planted on the ground.
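The argument is an intermediate value theorem trick: take the difference between the two diagonal pairs of foot heights. Rotating the table 90° swaps the diagonals and flips the sign of that difference, so it must cross zero somewhere in between. A numerical sketch with a smooth toy ground (the paper handles much more general continuous grounds):

```python
import numpy as np

# A smooth, bumpy "ground" height function (arbitrary choice for the demo)
def ground(x, y):
    return np.sin(x) * np.cos(y) + 0.3 * np.cos(2 * x + y)

# Feet of a unit-square table, rotated about its center by theta
def corners(theta):
    base = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1]]) / 2
    c, s = np.cos(theta), np.sin(theta)
    return base @ np.array([[c, -s], [s, c]]).T

# D(theta): difference of the two diagonal height sums.
# A 90-degree rotation swaps the diagonals, so D(theta + pi/2) = -D(theta);
# by the intermediate value theorem D vanishes somewhere in [0, pi/2].
def D(theta):
    h = np.array([ground(x, y) for x, y in corners(theta)])
    return (h[0] + h[2]) - (h[1] + h[3])

# Bisection for the balancing angle
lo, hi = 0.0, np.pi / 2
for _ in range(60):
    mid = (lo + hi) / 2
    if D(lo) * D(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(f"balancing angle ≈ {lo:.6f} rad, D = {D(lo):.2e}")
```

Balancing the diagonal sums is the heuristic version of the claim; the paper pins down when this actually puts all four feet on the ground.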

Amazingly, this theorem does not seem to have a Wikipedia page.

Mathematical table turning revisited

We investigate under which conditions a rectangular table can be placed with all four feet touching a continuous ground by turning it on the spot.

I'm SO sorry but I had to #design
Can we please pronounce it “ten”
One of my favorite internet stories: "We can't send email more than 500 miles" https://kottke.org/12/05/we-cant-send-email-more-than-500-miles

Every major tool we use for our work now is built on web technology, and they're all just awful software to use and interact with.

We as software developers seriously fucked up at some point — we've eschewed great products and services for mediocrity at a scale that's impossible to roll back.

Slack is “good enough”. Notion is “good enough”. Discord is “good enough”. They're all “good enough” to secure funding and deploy the enshittification parachute on their way down.