Google releases Gemma 4 open models
Thinking / reasoning + multimodal + tool calling.
We made some quants at https://huggingface.co/collections/unsloth/gemma-4 for folks to run them - they work really well!
Guide for those interested: https://unsloth.ai/docs/models/gemma-4
Also note: use temperature = 1.0, top_p = 0.95, top_k = 64. The EOS token is "<turn|>", and "<|channel>thought\n" is used for the thinking trace!
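For anyone curious what those sampler settings actually do, here's a minimal pure-Python sketch of the temperature → top-k → top-p pipeline (illustrative only; real inference stacks like llama.cpp or transformers apply these for you when you pass the parameters):

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_p=0.95, top_k=64, rng=None):
    """Temperature scaling, then top-k, then top-p (nucleus) filtering,
    then sample one token id. `logits` is a plain list of floats, one
    per vocabulary id (a toy stand-in for a real model's output)."""
    rng = rng or random.Random(0)
    # Temperature scaling (1.0 leaves the distribution unchanged).
    scaled = [l / temperature for l in logits]
    # Keep only the top-k highest-logit candidates.
    order = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (max-subtracted for stability).
    m = max(scaled[i] for i in order)
    exps = [math.exp(scaled[i] - m) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Nucleus: keep the smallest prefix whose cumulative mass >= top_p.
    keep, mass = [], 0.0
    for tok, p in zip(order, probs):
        keep.append((tok, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalise over the kept tokens and draw one.
    z = sum(p for _, p in keep)
    r, acc = rng.random() * z, 0.0
    for tok, p in keep:
        acc += p
        if acc >= r:
            return tok
    return keep[-1][0]
```

With a strongly peaked distribution the nucleus collapses to a single token, so sampling is effectively greedy; flatter logits leave more candidates in play.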
Thank you for your work.
You have an answer on your page regarding "Should I pick 26B-A4B or 31B?", but could you please clarify: assuming 24GB of VRAM, should I pick the smaller model at full precision or the larger model at 4-bit?
Thank you!
I presume 26B-A4B is somewhat faster since only 4B parameters are activated per token - 31B is quite a large dense model, so it should be more accurate!
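The fit question is mostly back-of-the-envelope arithmetic: weight memory ≈ parameter count × bits per parameter / 8, plus some slack. A rough sketch (the 10% overhead factor is my assumption; KV cache and activations come on top):

```python
def weight_gib(n_params_b: float, bits_per_param: float, overhead: float = 1.10) -> float:
    """Rough weight-memory estimate in GiB for a model with
    `n_params_b` billion parameters stored at `bits_per_param` bits,
    with ~10% slack for layers kept at higher precision. Back-of-the-
    envelope only - KV cache and activations need VRAM too."""
    bytes_total = n_params_b * 1e9 * bits_per_param / 8
    return bytes_total * overhead / 2**30

# Against a 24 GiB card:
print(f"31B dense @ 4-bit : {weight_gib(31, 4):.1f} GiB")   # fits, with room for context
print(f"26B @ 16-bit      : {weight_gib(26, 16):.1f} GiB")  # far too big
```

So at 24GB the 16-bit 26B isn't really on the table for weights alone; the practical comparison is 4-bit 31B versus a higher-bit quant of 26B-A4B.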