Google releases Gemma 4 open models

https://deepmind.google/models/gemma/gemma-4/

Gemma 4 is a family of open models, purpose-built for advanced reasoning and agentic workflows. (Google DeepMind)

If you want the fastest open source implementation on Blackwell and AMD MI355, check out Modular's MAX nightly. You can pip install it in seconds; check it out here:
https://www.modular.com/blog/day-zero-launch-fastest-perform...

-Chris Lattner (yes, affiliated with Modular :-)
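A minimal sketch of the pip install mentioned above. The package name, the `--pre` flag usage, and the index URL are assumptions based on typical nightly distribution, not verified against Modular's docs; follow the linked blog post for the authoritative command.

```shell
# Hedged sketch: package name and flags below are assumptions, not
# verified against Modular's documentation.
python -m venv max-env
. max-env/bin/activate

# Nightly builds commonly ship as pre-releases, hence --pre (assumption).
pip install --pre modular
```

In a fresh virtual environment this keeps the nightly isolated from any stable install you may already have.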

Modular: Day Zero Launch: Fastest Performance for Gemma 4 on NVIDIA and AMD

Our benchmarks show 15% higher throughput than vLLM on NVIDIA B200.

Faster than TensorRT-LLM on Blackwell? Or do you not consider TensorRT-LLM open source because some dependencies are closed source?
I reviewed the TensorRT-LLM commit history from the past few days and couldn't find any updates regarding Gemma 4 support. By contrast, here is the reference for MAX: https://github.com/modular/modular/commit/57728b23befed8f3b4...
[MODELS] Add Gemma 4 architecture support · modular/modular@57728b2

If OP meant they have the fastest implementation of Gemma 4 on Blackwell at the moment, I guess that is technically true. I doubt that will hold up once TensorRT-LLM finishes its implementation, though.