I ran Gemma 4 as a local model in Codex CLI
https://blog.danielvaughan.com/i-ran-gemma-4-as-a-local-model-in-codex-cli-7fda754dc0d4
1 token ahead or 2?
It's interesting - imo we'll soon see draft models post-trained specifically for denser, more complicated target models. I wouldn't be surprised if diffusion models made a comeback for this: they can draft many tokens at once, and match rates for autoregressive drafts seem to top out around 90+%, so it's quite interesting.
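For anyone unfamiliar with why draft models help: the draft proposes k tokens ahead, the target model verifies all of them in one forward pass, and you keep the longest agreeing prefix plus one corrected token from the target. Here's a minimal sketch of greedy-verification speculative decoding - `draft_next` and `target_logits` are hypothetical stand-ins for the two models, not any real library's API:

```python
# Minimal sketch of draft-model speculative decoding with greedy
# verification. `draft_next` and `target_logits` are hypothetical
# callables standing in for a small draft model and the large target.

def speculative_step(prompt, k, draft_next, target_logits):
    """Draft k tokens ahead, then verify them with one target pass."""
    # 1) The cheap draft model proposes k tokens autoregressively.
    drafted = []
    ctx = list(prompt)
    for _ in range(k):
        tok = draft_next(ctx)  # draft model's greedy next token
        drafted.append(tok)
        ctx.append(tok)

    # 2) One forward pass of the target model scores all drafted
    #    positions at once - this batching is where the speedup
    #    comes from. Returns k+1 next-token distributions.
    logits = target_logits(prompt, drafted)

    # 3) Keep the longest prefix where target and draft agree; at the
    #    first mismatch, take the target's own token instead.
    accepted = []
    for i, tok in enumerate(drafted):
        target_tok = max(range(len(logits[i])), key=logits[i].__getitem__)
        if target_tok != tok:
            accepted.append(target_tok)  # target's correction, then stop
            break
        accepted.append(tok)
    else:
        # All k drafts accepted: bonus token from the final position.
        final = logits[k]
        accepted.append(max(range(len(final)), key=final.__getitem__))
    return accepted
```

The higher the draft/target match rate, the more of those k tokens survive verification per target pass, which is why post-training the draft against a specific target is appealing.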
Hey - I use the same setup, w/ both gemma4 and gpt-oss-*; a few things I have to do for a good experience:
1) Pin to an earlier version of codex (sorry) - 0.55 is the best experience IME, but YMMV (see https://github.com/openai/codex/issues/11940, https://github.com/openai/codex/issues/8272).
2) Use the older chat completions endpoint (llama.cpp's Responses API support is still incomplete - https://github.com/ggml-org/llama.cpp/issues/19138); a quick way to sanity-check the endpoint is sketched below.
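Before pointing codex at the server, it's worth confirming the chat completions endpoint responds on its own. A minimal sketch, assuming llama-server is running on localhost:8080 and you have the openai Python package installed; the model name is a placeholder, since a single-model llama-server ignores it:

```python
# Smoke test for llama.cpp's OpenAI-compatible chat completions
# endpoint. Assumes llama-server is listening on localhost:8080;
# the api_key is a dummy value required by the client, and the
# model name is a placeholder ignored by single-model servers.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

resp = client.chat.completions.create(
    model="gemma",  # placeholder
    messages=[{"role": "user", "content": "Say hi in one word."}],
)
print(resp.choices[0].message.content)
```

Once that returns text, point codex's model provider at the same base_url; if I remember the config right, the endpoint choice is the provider's `wire_api` setting in config.toml ("chat" for chat completions vs "responses").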