Qwen3.6-Plus: Towards Real World Agents

https://qwen.ai/blog?id=qwen3.6

Qwen

Qwen Chat offers comprehensive functionality spanning chatbot, image and video understanding, image generation, document processing, web search integration, tool utilization, and artifacts.

Worth noting that this model, unlike almost all Qwen models, is not open-weight, nor is the parameter count disclosed. Also odd that it is compared against Opus 4.5 even though 4.6 was released like two months ago.

They said in the last paragraph[0]:

"[...] In the coming days, we will also open-source smaller-scale variants, reaffirming our commitment to accessibility and community-driven innovation. [...]"

[0] https://qwen.ai/blog?id=qwen3.6#summary--future-work

> we will also open-source smaller-scale variants

In other words, like GP said, this Qwen3.6-Plus model is not open-weight unlike the other Qwen models.

> unlike almost all qwen models

Almost all means there have been ones before that were not open. So, no contradiction there.

> unlike the other Qwen models

Please send the download link for Qwen3.5-Plus.

Also, who cares? If you have the hardware to run a ~400B model, I don’t think you count as a home user anymore.

In a practical sense, I'm primarily interested in small to medium sized models being open. I think that might be common sentiment.

However, my hope is that there will be at least somewhat competitive big and open models as well, from an ethical/ideological perspective. These things were trained on data that was provided by people without their consent, so they should at least be publicly accessible, or even public domain.

Qwen3.5-Plus is the largest variant of the open-weight Qwen3.5 model, expanded with a 1M context window and fine-tuned on the Qwen-native harness’s specific tools.

I wouldn't say "almost all", seeing as the -Max and -Omni models were always closed.

If Opus 4.6 was only released two months ago, then it seems reasonable that Qwen hasn't finished fully comparing against the latest Opus.

This is their hosted-only model, not an open-weight model like they’ve become known for. They got a lot of good publicity for their open-weight releases, which was the goal. The hard part is pivoting from being an open-weight provider to being considered a competitor to Claude and ChatGPT. Initial reactions are mostly anger from everyone who didn’t realize that the play all along was to give away the smaller models as advertising, not because anyone was feeling generous.

Comparing to Opus 4.5 instead of the current 4.6 and other last-gen models is clearly an attempt to deceive, which isn’t winning them any points either.

I think there is a moderately large market for models like this that aren’t quite SOTA level but can be served up much cheaper. I don’t know how successful they’ll be in the race to the bottom in this market niche, though. Most users of cheap API tokens are not loyal to any brand and will change providers overnight each time someone releases a slightly better model.

> I think there is a moderately large market for models like this that aren’t quite SOTA level but can be served up much cheaper.

There isn't, pretty much everyone wants the best of the best.

> There isn't, pretty much everyone wants the best of the best.

For direct user interaction or coding problems, perhaps. But as API calls get cheaper, it becomes more realistic to use them for completely automated workflows against data-sets, or as sub-agents called from expensive SOTA models.

For example, in Claude, using Opus as an orchestrator to call Sonnet sub-agents is a popular usage "hack." That only gets more powerful as the Sonnet-equivalent model gets cheaper. Now you can spawn entire teams of small, specialized sub-agents with small context windows but limited scope.
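The orchestrator/sub-agent pattern described above can be sketched roughly as follows. Everything here is a placeholder, not a real SDK: the model names are illustrative, and `call_model` stands in for whatever API client you actually use.

```python
# Minimal sketch: an expensive model plans and synthesizes, while cheap
# sub-agents handle narrow, scoped subtasks with small context windows.
EXPENSIVE = "opus-class-model"   # planner/orchestrator (placeholder name)
CHEAP = "sonnet-class-model"     # sub-agent workhorse (placeholder name)

def call_model(model: str, prompt: str) -> str:
    # Placeholder: in reality this would hit an LLM API.
    return f"[{model}] answer to: {prompt}"

def run_task(task: str, subtasks: list[str]) -> str:
    # Each sub-agent sees only its own small, scoped prompt.
    partials = [call_model(CHEAP, sub) for sub in subtasks]
    # The expensive model only sees the condensed findings, not raw data.
    synthesis_prompt = f"Task: {task}\nFindings:\n" + "\n".join(partials)
    return call_model(EXPENSIVE, synthesis_prompt)

result = run_task(
    "summarize repo health",
    ["check test coverage", "list stale dependencies"],
)
```

The point of the structure is that token spend on the expensive model scales with the number of subtasks, not with the size of the underlying data each sub-agent processes.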

> But as API calls get cheaper, it becomes more realistic to use them for completely automated workflows against data-sets

Seems like a huge waste of money and electricity for processes that can be implemented as a traditional deterministic program. One would hope that tools would identify recurrent jobs that can be turned into simple scripts.

Exactly.

I did create my own MCP server with custom agents that combine several tools into a single one. For example, WebSearch, WebFetch, and Context7 are all exposed as a single "web research" tool, backed by the cheapest model that passes evaluation. I do the same for codebase research.

Using it with both Claude and Opencode saves a lot of time and tokens.
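The idea of collapsing several tools behind one facade can be sketched like this; the backend functions below are stubs standing in for the real WebSearch/WebFetch/Context7 integrations, not the commenter's actual implementation.

```python
# Sketch: one "web_research" tool that fans out to several backends and
# returns a single merged result the calling agent sees as one tool.

def web_search(query: str) -> str:
    return f"search results for {query!r}"   # stub for a real search tool

def web_fetch(query: str) -> str:
    return f"fetched pages about {query!r}"  # stub for a real fetch tool

def context7_docs(query: str) -> str:
    return f"library docs for {query!r}"     # stub for a docs-lookup tool

BACKENDS = [web_search, web_fetch, context7_docs]

def web_research(query: str) -> str:
    """The single tool exposed to the agent; internally calls every backend."""
    sections = [backend(query) for backend in BACKENDS]
    # In the real setup, a cheap model would condense `sections` here so the
    # orchestrating model pays for one tool result instead of three.
    return "\n---\n".join(sections)

report = web_research("qwen3.6 context window")
```

The token savings come from the condensation step: the expensive caller never sees three raw tool outputs, only one summarized result.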

Ever hit your daily limit on Claude Code and seen how expensive it is to pay per token?

The OpenRouter usage stats indicate the opposite: https://openrouter.ai/rankings?view=month

OpenRouter usage is likely skewed towards LLMs that are more niche and/or self-hostable on solid hardware that most consumers don't have on hand. I can imagine Anthropic and OpenAI LLMs often get called directly through their own APIs instead.

At least from my experience and friends of mine, we use OpenRouter for cases where we want to use smaller LLMs like Qwen, but when I've used ChatGPT and Claude, I use those APIs directly.

Same, and my little SaaS is pushing more than 0.1% of the TOTAL volume of tokens on OpenRouter, so the reality is they’re TINY.

What happened around Jan this year ('26) that caused such a climb in usage?

For coding I want the best. Both I and $work do lots of things besides coding where smaller models like qwen3.5-27b work great, at much lower cost.

No. Right now I'm upset that Google has removed (or at least is in the process of removing) the Gemini 2.0 Flash model. We use it for some pretty basic functionality because it's cheap, fast, and honestly good enough for what we use it for in that part of our app. We're being forced to "upgrade" to models that are at least 2.5 times as expensive, are slower, and, while I'm sure they're better for complex tasks, don't do measurably better than 2.0 Flash for what we need. Yay. We've stuck with the GCP/Gemini ecosystem up until now, but this is kind of forcing us to consider other LLM providers.

> not an open weight model like they’ve become known for.

Right, they state that they'll release "smaller" variants openly at some point, with few details as to what that means. Will there be a ~300B variant as with Qwen 3.5? The blog post doesn't say.

How stupid does somebody have to be to mix up Opus with Qwen?

> Initial reactions are mostly anger from everyone who didn’t realize that the play along was to give away the smaller models as advertising, not because they were feeling generous.

The naivety around this has been staggering, quite frankly. All of a sudden, people are thinking that Meta etc. are releasing free models because they believe in open access and distribution of knowledge. No, their models just suck comparatively. There is nothing to sell. Using them to recruit and generate attention is the best play for them.

I'll diverge from some of these comments, I don't find it misleading to compare to Opus 4.5.

I can remember how good Opus 4.5 was. If I'm considering using this, it's most informative to me to compare to the model it's closest to that I have familiarity with.

I'm obviously not switching to this if I want the best model. I'm switching if I'm hopeful that the smaller versions are close to it, or if I want to have more options for providers, or for any other reasons unrelated to getting the highest quality responses possible.

Exactly this. If you can get something close to Opus 4.5 for free, that's noteworthy. I may not use it for the most critical pieces of my app, but not everything I do is galaxy-brain coding.

I understand people's reactions to the Qwen team comparing against Opus 4.5 instead of 4.6, and to them comparing against Gemini Pro 3.0 instead of 3.1. But calling it misleading is a bit of a stretch in my eyes; people here are acting like we immediately forget how previous generations performed just because a new version is released.

This field is moving at an incredible pace; the providers release a new model every quarter or so. The amount of criticism is a bit overblown, in my opinion. The benchmarks still look very good to me. I’ve used GLM-5 (the latest is GLM-5.1) and Kimi K2.5; they are decent and get the job done, so seeing how this Qwen model performs compared to them is kinda impressive.

Also, why are so many pointing out that this model is not open-weight, as if this is the first time they've done so? Qwen3.5-Plus and Qwen3-Max are also closed-weight. This is not something new.

I think Qwen trying to catch up to the SOTA models is still healthy for us, the consumers. Sure, it's sad news that this version is closed-weight, but I won’t downplay their progress.

I think it’s more the principle of deception that upsets people. Imagine if Apple released a new iPhone and publicly compared its specs to some previous-gen Android. It’s not in good faith.

Why are we so quick to call it deception? Their figure is quite clear. They aren't fiddling with the graph or hiding the labels; they clearly state which models it compares against. But I agree with the sentiment that the standard practice should be to bench against the latest SOTA models.

Pretty solid Pelican: https://gist.github.com/simonw/ca081b679734bc0e5997a43d29fad...

I used the https://modelstudio.alibabacloud.com/ API to generate that one, which required signing up for an account and attaching PayPal billing - but it looks like OpenRouter are offering it for free right now so I could have used that: https://openrouter.ai/qwen/qwen3.6-plus:free
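For reference, OpenRouter exposes an OpenAI-compatible chat-completions endpoint, so a call to the free listing linked above would look roughly like this. The request is built but not sent, and the API key is a placeholder; the model slug is taken from the OpenRouter URL in the comment.

```python
import json
import urllib.request

# Build (but don't send) a chat-completions request against OpenRouter's
# OpenAI-compatible endpoint. Swap in a real key before actually sending.
payload = {
    "model": "qwen/qwen3.6-plus:free",
    "messages": [
        {"role": "user",
         "content": "Generate an SVG of a pelican riding a bicycle"},
    ],
}
req = urllib.request.Request(
    "https://openrouter.ai/api/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": "Bearer <OPENROUTER_API_KEY>",  # placeholder key
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would perform the actual call.
```

Since the endpoint follows the OpenAI wire format, any OpenAI-compatible client library pointed at `https://openrouter.ai/api/v1` should work the same way.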
