Emre Sokullu


Benchmarked 8 VLMs for screenshot grounding inside a browser agent.

Counterintuitive finding: Qwen 3.5-9B correctly classifies a dropdown affordance that the 308B MiMo V2.5 misses. Affordance detection doesn't scale with parameter count.

Only 1 of 8 (Qwen 3.6-35B-A3B) honestly flags its uncertainty in the calibration test.

Full write-up + VRAM tier recs + repo:

https://webbrain.one/blog

#LocalLLM #VLM #AIAgents #OpenSource #Qwen #qwen36 #MachineLearning #FOSS #AI

WebBrain Blog

Engineering notes from WebBrain — the open-source AI browser agent.

OpenAI still produces the best foundational models, but it's behind Anthropic in products, Google in infrastructure, and Alibaba in open-source.
Intel/Qwen3.6-27B-int4-AutoRound is out!

Pls star https://github.com/esokullu/webbrain/ -- it's worth it :)
GitHub - esokullu/webbrain: Open-source AI browser agent for Chrome and Firefox


What if the so-called Lazarus attack on Aave is not Lazarus, but Aave itself!? This would help them in multiple ways: (1) if they're expecting ETH to appreciate, this is their way to take back the ETH they've been lending; (2) for people still in the market, interest rates just skyrocketed!

Regardless, it's not a good look for Aave's image as a reliable partner, and not good for DeFi either.

I benchmarked 3 local vision models on the same browser screenshot, using the exact prompt WebBrain's vision sub-call sends.

Both Qwens beat Gemma — despite Google's claim that Gemma 4-31B beats Qwen 3.5-27B.

Qwen3.6-35B-A3B wins:
→ 5.3s (beats dense 27B's 7.9s)
→ Only model OCRing small identifiers correctly
→ Only model using the prompt's "Unknowns" section honestly

Same VRAM as the 27B dense, better quality, lower latency. MoE earns its keep.

Full write-up + CLI probe:
https://www.webbrain.one/blog/vision-model-shootout

Four vision models, one screenshot — WebBrain benchmark

Gemma 4-E2B vs Gemma 4-31B vs Qwen3.5-27B vs Qwen3.6-35B-A3B on the same browser screenshot, same prompt. OCR, latency, and token cost — what actually matters for a browser agent.
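The scoring behind those bullets boils down to two checks per response: did the model reproduce the small identifiers verbatim (OCR), and did it actually fill the prompt's "Unknowns" section instead of guessing. A minimal sketch — the expected identifier strings and the section heading are my assumptions, not WebBrain's actual rubric:

```python
# Hedged sketch of scoring one model's answer. The "Unknowns" heading and
# the list of expected identifiers are assumptions for illustration.
import re

def score_response(text: str, expected_ids: list[str]) -> dict:
    """Score a vision model's answer for OCR recall and honest uncertainty."""
    # OCR check: fraction of known small identifiers reproduced verbatim.
    found = [i for i in expected_ids if i in text]
    # Honesty check: did the model put anything under the "Unknowns" section?
    m = re.search(r"Unknowns\s*:?\s*(.+)", text, re.IGNORECASE | re.DOTALL)
    flags_uncertainty = bool(m and m.group(1).strip())
    return {
        "ocr_recall": len(found) / len(expected_ids) if expected_ids else 0.0,
        "missed_ids": [i for i in expected_ids if i not in found],
        "flags_uncertainty": flags_uncertainty,
    }
```

Pair this with per-request wall-clock timing and you get the latency/quality table from the post.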

Let's admit it: open-source models are nowhere close to the closed-source frontier models. I see it first-hand testing all kinds of models on WebBrain. Kimi K2.6 (just announced) was supposed to match Opus 4.6 on paper, but in my testing it's nowhere close.

That's why tools like WebBrain and OpenCode are so important for closing the gap between the super-smart closed models and the open-source ones.

Eventually we should have 8B models even smarter than Opus 4.7, but not yet.

WebBrain vs Claude Code

https://webbrain.one

I believe the notion that AI will eliminate pure software companies is absurd. Software may become commoditized, much as open source commoditized it before, but services will always remain in demand.
Trump is testing whether Iranians fear death. I don’t think they’ll blink.