133 Followers
103 Following
391 Posts
Builder of Software with head in AWS 🌩️, hardware hacker
Twitter: https://twitter.com/wolfeidau
Github: https://github.com/wolfeidau
Blog: https://www.wolfe.id.au
Baked up some banana bread 🍌🍞 to have with coffee. #baking
Finished product 😂🧁
They may look tired, but they are going to make a great banana cake. Been out bike riding today, so I need some energy 😅🚴‍♀️🧁 #baking #publicholiday

Digging into zip archiving libraries and I am astounded by the amazing work people do to make some of these libraries sing. 🚀 🤯

https://github.com/saracen/fastzip

Benchmarks need to be taken with a grain of salt but still, respect. Also props for the simple API.

TIL if you modify the configuration of the HTTP Transport in Go, you need to enable the ForceAttemptHTTP2 flag to keep using HTTP/2.0. https://github.com/golang/go/blob/master/src/net/http/transport.go#L290-L295 A trap for anyone customizing timeouts or TLS configuration. #golang 🤔
Updating C code I wrote two years ago to work on new hardware. I made the incorrect choice to also update the libraries, and now I'm reading the reference manual to understand error codes. 🤔😭 On the plus side, the latest Bosch BSEC library has way more validation... My dev environment is a shoe box. 😂
I wonder if $nvidia will lower the memory of its consumer cards now that you can get away with fine-tuning a 7B-parameter LLM on cards with 24GB of memory using GaLore 😎 🤑 #ai #nvidia https://arxiv.org/abs/2403.03507
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Training Large Language Models (LLMs) presents significant memory challenges, predominantly due to the growing size of weights and optimizer states. Common memory-reduction approaches, such as low-rank adaptation (LoRA), add a trainable low-rank matrix to the frozen pre-trained weight in each layer, reducing trainable parameters and optimizer states. However, such approaches typically underperform training with full-rank weights in both pre-training and fine-tuning stages since they limit the parameter search to a low-rank subspace and alter the training dynamics, and further, may require full-rank warm start. In this work, we propose Gradient Low-Rank Projection (GaLore), a training strategy that allows full-parameter learning but is more memory-efficient than common low-rank adaptation methods such as LoRA. Our approach reduces memory usage by up to 65.5% in optimizer states while maintaining both efficiency and performance for pre-training on LLaMA 1B and 7B architectures with C4 dataset with up to 19.7B tokens, and on fine-tuning RoBERTa on GLUE tasks. Our 8-bit GaLore further reduces optimizer memory by up to 82.5% and total training memory by 63.3%, compared to a BF16 baseline. Notably, we demonstrate, for the first time, the feasibility of pre-training a 7B model on consumer GPUs with 24GB memory (e.g., NVIDIA RTX 4090) without model parallel, checkpointing, or offloading strategies.
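The core idea from the abstract, sketched loosely in math (notation is mine, not verbatim from the paper): rather than keeping full-rank optimizer state, GaLore projects each gradient onto a rank-$r$ subspace, runs the optimizer there, and projects the update back:

```latex
P_t \in \mathbb{R}^{m \times r} \quad \text{(top-$r$ left singular vectors of the gradient } G_t \in \mathbb{R}^{m \times n}\text{)}
R_t = P_t^\top G_t \in \mathbb{R}^{r \times n} \quad \text{(low-rank gradient)}
W_{t+1} = W_t - \eta \, P_t \, \rho(R_t) \quad \text{(}\rho\text{ = optimizer update rule, e.g. Adam)}
```

So the optimizer moments live on $r \times n$ matrices instead of $m \times n$ ones, which is where the memory saving comes from — while the weights themselves stay full rank.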
Just got a new BeagleBoard to mess with; looking forward to trying out some of the new features while learning more about AI at the edge. #ai #robotics
Back to zero days since I had to jerry-rig a serial connection to a dev board... A combination of some probe cables and a JST cable got me access to a shell, all to find out it was missing the mDNS support I was so used to. #iot
Updating my BeagleBone AI-64 so I can do some updated tutorials using the onboard neural processor to recognize obstacles. Only took me an hour or so to figure out how to flash the new OS image 😂 #ai #hardware