This really reinforces the idea that the AI race and the Railroad Mania of the 19th century are very similar.

So many different companies are going to have similarly powerful AI that there will be no moat around it, and it will be cheap. They will never earn their investment back.

I suspect this is the real reason Anthropic limits subscriptions to their own products and keeps API prices several times higher than comparable models. Applications are stickier than API usage, and less technical users are stickier than programmers (i.e. Cowork is stickier than Code).

Anthropic generally seems more interested in living within market discipline and market signals of some sort: products with margins, even if that's somewhat irrelevant given R&D costs and capital inflows.

That said, there's nothing like the real thing.

The risk is something like the railroad bubble and the dot-com bubble: over-investment, circular revenue, and a timeline that doesn't work.

Or, maybe it'll work out.

Ran some of my internal benchmarks against this and I'm very unimpressed. I don't think this moves them into the OAI v Anthropic v Gemini conversation at all.

Major analytical errors in its responses to several of my technical questions.

Playing with this some more, and it's actively not good. Basic mathematical errors riddle its responses. I did some basic adversarial testing where its responses are analyzed by Gemini, and Gemini finds basic math errors in every relatively simple ask I make (relative to what Opus, Gemini, or GPT can handle). Yikes.
I don't get the comments trashing this. If it slightly beats or even matches Opus 4.6, it means Meta is capable of building a model competitive with the leading AI company. Sure, they spent a lot of money and will have ongoing costs. But how much more work would it take to turn that into a coding agent people are willing to try (and pay for) alongside their usage of a collection of agents (Claude, Codex, etc.)?
It also means Meta doesn't have to pay another company to use a SotA model across all their products (including IG, WhatsApp, and VR), which will matter to their balance sheet long term (despite the constant R&D spend).
Comments trashing this are from rightly skeptical people who remember the benchmaxxing of Llama 4. This model was ready as early as a couple of months ago, but they didn't release it because it was at Gemini 2.5 Pro levels.
The Llama 4 series was one of the earliest large MoEs to be made publicly available. People just ignored it because they were focused on running smaller, denser models at the time; we should know better these days.
the models were objectively horrible
They really weren't horrible. They were ~GPT-4o, with the added benefit that you could run them on premises. Just "regular", non-"thinking" models. Inefficient architecture (ratio of active to total parameters), but otherwise "decent" models. They got trashed online by bots and Chinese shills (I was online that weekend when it happened; it's something to behold). Just because they were non-thinking when thinking was clearly the future doesn't make them horrible. Not SotA by any means, but still.
Nah, I remember how disgusted I felt trying Llama 4 Maverick and Scout. They were both DOA; they couldn't even beat much smaller local models.

I'll cosign what you said; simultaneously, your interlocutor's point is also well-founded, and it depresses me that it's not better known and sounds so... off... due to conventional wisdom combined with God-King Zuck misunderstanding his own company and overreacting.

They beat Gemini 2.5 Flash and Pro handily on my benchmark suite. (tl;dr: tool calling and agentic coding).

Llama 4 on Groq was ~GPT 4.1 on the benchmark at ~50% the cost.

They shouldn't have released it on a Saturday.

They should have spent a month with it in private prerelease, working with providers.[1]

The rushed launch and ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world".

I bet it was super fucking annoying to talk to due to LMArena maxxing.

[1] My understanding is the longest heads-up was single-digit days, if any. Most modellers have arrived at 2+ weeks now; there's a lot between spitting out logits and parsing and delivering a response.

Wrote a longer comment steel-manning this and posted it as a reply, then realized you might like to know they had a reasoning model on deck, ready for release in the next 2-4 weeks.

It got shitcanned due to bad PR and Zuck God-King terraforming the org, so there'd be a year's delay to the next release.

Real tragi-comedy, and you have no idea how happy it makes me to see someone in the wild saying this. It sounds so bizarre to people given the conventional wisdom, but, it's what happened.

DeepSeek R1 was a publicly available MoE model that was getting a ton of attention before Llama 4. Llama 4 didn't get much attention because it wasn't good.

> If it slightly beats or even matches Opus 4.6

It doesn't though

Curious why you think this. Any data points that led you to it?
The benchmarks they released
What do you mean? In most cases, the benchmarks show a larger number for Muse and a smaller number for Opus.

> I don't get the comments trashing this.

People like to hate on Meta regardless of anything, and regardless of whether it's justified or not. Not saying it isn't, just that it's many people's default bias.

It's a decent model if the benchmarks are to be believed, but it won't be close to Opus in usefulness for programming. None of these benchmarks completely capture what makes a model useful for day-to-day coding tasks, unfortunately. It will take time for them to catch up, and Opus will keep improving in the meantime. But it's good to have more competition.
The hero image on the linked page, which consists of a muted teal background with the words "Introducing Muse Spark", weighs in at 3.5 MB. I don't even...

Someday our robot overlords will be intelligent enough to ... optimize images!

(But today is not that day.)

And it doesn't even look high-res.

lol, it literally took me 2s to Google "optimize image for website" and 10s to upload and get a smaller image.

The result for that specific image: 500 KB, an 85% decrease in size.

You can even automatically do that on your CDN/delivery/web server layer. Or as part of your web deployment pipeline.
Yes, but it might be a little too advanced for Meta ;)
An indistinguishable JPG is 170 KB. An SVG would be 20 KB.
CSS with a linear gradient background would be even smaller :)
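For scale, a hand-written SVG version of such a hero is a few hundred bytes. A minimal sketch, built as a string so the size claim is checkable — the dimensions, corner radius, and teal colors are placeholders, not the actual asset's values:

```python
# Hypothetical stand-in for the page's hero: a rounded-corner rectangle
# with a subtle vertical gradient, written as plain SVG markup.
svg = """<svg xmlns="http://www.w3.org/2000/svg" width="1200" height="630">
  <defs>
    <linearGradient id="bg" x1="0" y1="0" x2="0" y2="1">
      <stop offset="0" stop-color="#3a6b6e"/>
      <stop offset="1" stop-color="#2a4d50"/>
    </linearGradient>
  </defs>
  <rect width="100%" height="100%" fill="url(#bg)" rx="16"/>
</svg>"""

# Well under a kilobyte, versus 3.5 MB for the PNG.
print(len(svg))
```

The text would still need to be overlaid (as live HTML text or a `<text>` element), but the background itself costs almost nothing.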

"Please don't complain about tangential annoyances—e.g. article or website formats, name collisions, or back-button breakage. They're too common to be interesting."

- Hacker News Guidelines https://news.ycombinator.com/newsguidelines.html

> Hacker News Guidelines

I think this speaks to the product release itself.
It's at least Meta-relevant: Compression Represents Intelligence Linearly (Y. Huang, 2024).

Good catch - looks like it's a PNG image, with an alpha channel for the rounded corners, and a subtle gradient in the background. The gradient is rendered with dithering, to prevent colour banding. The dither pattern is random, which introduces lots of noise. Since noise can't be losslessly compressed, the PNG is an enormous 6.2 bits per pixel.

While working on a web-based graphics editor, I've noticed that users upload a lot of PNG assets with this problem. I've never tracked down the cause... is there a popular raster image editor which recently switched to dithered rendering of gradients?
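The "noise can't be losslessly compressed" point is easy to demonstrate with synthetic buffers — a minimal stdlib sketch, not the actual image data: a smooth gradient deflates to almost nothing, while the same gradient with random dither barely compresses.

```python
import random
import zlib

random.seed(0)
width, height = 256, 256

# Smooth vertical gradient: every row is 256 copies of the same byte.
gradient = bytes(y for y in range(height) for _ in range(width))

# Same gradient with random +/-1 dither on every pixel (clamped to 0..255).
dithered = bytes(
    max(0, min(255, y + random.choice((-1, 0, 1))))
    for y in range(height)
    for _ in range(width)
)

plain = len(zlib.compress(gradient, 9))
noisy = len(zlib.compress(dithered, 9))

# The dithered buffer compresses many times worse than the clean one,
# because the random choice per pixel is incompressible entropy.
print(plain, noisy)
```

PNG uses the same DEFLATE algorithm as zlib, so the page's hero image pays this penalty on every dithered pixel.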

Personal as in Meta gets your personal data so they can sell you more ads.