Show HN: 1-Bit Bonsai, the First Commercially Viable 1-Bit LLMs
1 bit per weight, with an FP16 scale factor for every group of 128 weights. Fascinating that this works so well.
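As a rough illustration of that scheme - my own NumPy sketch, not PrismML's actual kernel; the sign mapping and array layout are assumptions:

```python
import numpy as np

def dequantize(bits, scales, group=128):
    """Recover FP32 weights from 1-bit signs plus one FP16 scale per group.

    bits   : array of {0, 1}, one entry per parameter (0 -> -1, 1 -> +1)
    scales : FP16 array, one scale per group of `group` parameters
    """
    signs = np.where(bits.astype(bool), 1.0, -1.0).astype(np.float32)
    # Broadcast each group's scale across its 128 sign values
    return (signs.reshape(-1, group) * scales.astype(np.float32)[:, None]).ravel()

bits = np.random.randint(0, 2, 256)               # 256 one-bit parameters
scales = np.array([0.5, 0.25], dtype=np.float16)  # one scale per 128 params
w = dequantize(bits, scales)                      # 256 FP32 weights
```

Storage works out to 1 bit per weight plus 16/128 = 0.125 bits of scale overhead, i.e. about 1.125 bits per parameter.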
I tried a few things with it. Got it driving Cursor, which was impressive in itself - it handled some tool usage. Via Cursor I had it generate a few web page tests.
On a Monte Carlo simulation of pi, it got the logic correct but failed to build an interface to start the test. Requesting changes mostly worked, but it left behind some stray symbols that caused things to fail, so a bit of manual editing was required.
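For context, the logic the model got right is small - a minimal Python sketch of the standard Monte Carlo pi estimator (my own illustration, not the model's output):

```python
import random

def estimate_pi(n=100_000, seed=0):
    """Estimate pi by sampling points in the unit square and counting
    how many land inside the quarter circle of radius 1."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of quarter circle / area of square = pi/4
    return 4 * inside / n

print(estimate_pi())
```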
Tried a Simon Willison pelican as well - very abstract, not recognizable at all as a bird or a bicycle.
Pictures of the results here: https://x.com/pwnies/status/2039122871604441213
There doesn't seem to be a demo link on their webpage, so here's a llama.cpp running on my local desktop if people want to try it out. I'll keep this running for a couple hours past this post: https://unfarmable-overaffirmatively-euclid.ngrok-free.dev

Played around with PrismML's 1-bit model. https://t.co/mLfSL22gRd It uses 1 bit per parameter, and an FP16 scale factor for each group of 128 params. Cool demo - runs crazy fast. It's able to handle basic tool usage via Cursor, but it's nowhere near usable. I rate it neat / 10
Thanks for sharing the link to your instance. Was blazing fast in responding. Tried throwing a few things at it with the following results:
1. Generate an R script to take a city and country name, find its lat/long, and map it using ggmap. Generated a pretty decent script (could be more optimal, but impressive for the model size) with warnings about using geojson if possible
2. Generate a LaTeX script to display the Gaussian integral equation - it generated a (I think) non-standard version using probability density functions instead of the general form, but I still give it points for that. It also gave explanations of the formula and parameters, as well as instructions on how to compile the script from bash, etc.
3. Generate a LaTeX script to display the Euler identity equation - this one it nailed.
Strongly agree that the knowledge density is impressive for being a 1-bit model with such a small size and a blazing fast response.
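For reference, my guess at the forms in question - the general Gaussian integral, the normal-PDF variant the model apparently produced, and the Euler identity (standard textbook statements, not the model's actual output):

```latex
% General form of the Gaussian integral
\[ \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi} \]

% Normal-PDF variant: the density integrates to 1
\[ \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,
   e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = 1 \]

% Euler identity
\[ e^{i\pi} + 1 = 0 \]
```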
> Was blazing fast in responding.
I should note this is running on an RTX 6000 pro, so it's probably at the max speed you'll get for "consumer" hardware.
Does anyone know how to run this on CPU?
Do I need to build their llama.cpp fork from source?
Looks like they only offer CUDA builds on the releases page. I think those might support a CPU mode, but they refuse to even run without CUDA installed. Seems a bit odd to me - I thought the whole point was supporting low-end devices!
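For anyone who wants to try, a generic CPU-only build of upstream llama.cpp looks roughly like this - their fork may differ, and whether it implements this 1-bit format on the CPU backend is exactly the open question; the repo URL and flags are based on upstream, and the model path is a placeholder:

```shell
# Generic CPU-only build of upstream llama.cpp (their fork may add/require more)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=OFF        # CPU backend; CUDA explicitly disabled
cmake --build build --config Release -j
# Serve a local GGUF file over HTTP (model path is a placeholder)
./build/bin/llama-server -m path/to/model.gguf --port 8080
```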