Mariano Kamp


I'm thrilled to announce that I've just published my first article on Medium. Naturally, it covers some of my favorite topics: Large Language Models (LLMs), specifically Parameter-Efficient Fine-Tuning (PEFT) and LoRA.

#LLM #GenAI #PEFT #AWS

https://medium.com/@mkamp/dive-into-lora-adapters-38f4da488ede
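For anyone who wants the one-line version before clicking through: a LoRA adapter keeps the pretrained weight matrix frozen and learns only a low-rank update on top of it. A minimal PyTorch sketch of that idea (my own illustration, not code from the article; the class name LoRALinear and the rank/alpha values are just placeholders):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer plus a trainable low-rank update (LoRA)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)       # freeze pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: effective weight is W + (alpha/r) * B @ A, with B starting at zero
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(768, 768))
# Only the two small factors train: 2 * 8 * 768 parameters instead of 768 * 768
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))
```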

@lb Also, while listening to a podcast with Tri Dao I picked up that delays due to memory transfers are not reflected in FLOPs, but in wall-clock time (e.g. FlashAttention). (And yes, I'm using this abandoned Mastodon feed as a notebook now.)
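A toy illustration of that point (my own sketch, not from the podcast; arbitrary sizes, CPU PyTorch): two operations can be timed the same way, yet their achieved FLOP/s differ wildly, because the elementwise add is limited by memory bandwidth rather than arithmetic.

```python
import time
import torch

def wall_time(fn, warmup=3, iters=10):
    """Average wall-clock time per call, after a few warmup runs."""
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) / iters

n = 2048
a, b = torch.randn(n, n), torch.randn(n, n)

t_matmul = wall_time(lambda: a @ b)   # compute-bound: ~2*n^3 FLOPs
t_add    = wall_time(lambda: a + b)   # memory-bound: only n^2 FLOPs, similar data traffic

print(f"matmul: {2 * n**3 / t_matmul / 1e9:8.1f} GFLOP/s")
print(f"add   : {n**2     / t_add    / 1e9:8.1f} GFLOP/s")
# The add achieves far fewer FLOP/s: its cost is dominated by moving data,
# which a pure FLOP count does not capture; only wall-clock time shows it.
```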
Lucas Beyer on Twitter

“I beg the community to please stop using parameters as x axis. It is *especially* meaningless for ViT-style models: B/32 has *more* params than B/16, but is faster, less capacity, and performs worse. Use img/s ideally, or flops if need. (Not singling this paper, so so many!)”

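To make the B/32 vs. B/16 point concrete, a quick back-of-the-envelope calculation (my own numbers, assuming the standard ViT-Base settings of 224 px inputs and a 768-dim embedding):

```python
# Patch-embedding parameters and token counts for ViT-Base at 224x224, embed dim 768.
dim, img = 768, 224
for patch in (16, 32):
    embed_params = 3 * patch * patch * dim   # weights of the patchify projection
    tokens = (img // patch) ** 2             # sequence length fed to the transformer
    print(f"B/{patch}: patch-embed params = {embed_params:,}, tokens = {tokens}")
# B/32 spends roughly 1.8M more parameters on the patch embedding, yet produces
# only 49 tokens instead of 196, so it runs much faster with less effective capacity.
```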
@lb About the head: lower capacity / the lack of a non-linearity does not lead to worse results. I have even seen better results when trimming the BERT Pooler. What could be the intuition behind that?
Maybe making the head linear makes it harder to learn something in the head, even though it is conveniently close to the error signal. Learning in the more distant (and more powerful) transformer blocks then becomes more attractive instead?
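For context, the two heads I am comparing, sketched in PyTorch (illustrative shapes only; the standard BertPooler is a dense layer plus tanh over the [CLS] token, and the trimmed variant classifies the [CLS] hidden state directly):

```python
import torch
import torch.nn as nn

hidden, num_labels = 768, 2

# Standard BERT head: Pooler (dense + tanh over the [CLS] token) followed by a classifier.
pooler_head = nn.Sequential(
    nn.Linear(hidden, hidden),  # BertPooler dense layer
    nn.Tanh(),                  # its non-linearity
    nn.Linear(hidden, num_labels),
)

# "Trimmed" head: drop the Pooler and classify the [CLS] hidden state directly.
linear_head = nn.Linear(hidden, num_labels)

cls_state = torch.randn(8, hidden)  # [CLS] hidden states for a batch of 8
print(pooler_head(cls_state).shape, linear_head(cls_state).shape)
```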

@lb thanks. Much appreciated that you took the time.

You referred to reporting wall-clock time. As a normalizer, like FLOPs?

@vowe I still miss an automatic timeline. But by now I can also see the appeal of the chronological model. It's like the Tagesschau: you get all the news, not just the ones that interest you. That broadens your perspective.
@vowe I have a wood-burning stove. Maybe I should get an Eve Room at some point ...