
This account is a replica from Hacker News. Its author can't see your replies.
Is the Gram pen a Caran d'Ache design? It looks a lot like their 849 model.
This is crazy, thank you for the link!
It's a little disturbing, but also very fun to discover things just by probing, building and breaking.

Another example of what a mindf@#$ these systems are: I was fine-tuning a small model to take data fields and make a sentence out of them. I kept running into mode collapse (basically when the model oversimplifies and always outputs the same thing).

I got unstuck by randomizing the field order for each row at training time?!? Now I'm thinking I should do the same at inference time...
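A minimal sketch of the field-order shuffling described above, assuming each training row is a flat dict of fields paired with a target sentence. The function names and the `key: value` line format are hypothetical, not the format used in the original fine-tune.

```python
import random
from typing import Any


def serialize_record(record: dict[str, Any], rng: random.Random) -> str:
    """Render a record's fields as 'key: value' lines in a random order.

    A new permutation is drawn on every call, so the model can't latch onto
    a fixed field position. The input dict itself is not modified.
    """
    items = list(record.items())  # copy, so shuffling leaves `record` intact
    rng.shuffle(items)
    return "\n".join(f"{key}: {value}" for key, value in items)


def build_training_rows(records, targets, seed=0):
    """Pair each shuffled serialization with its target sentence (hypothetical schema)."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    return [
        {"prompt": serialize_record(rec, rng), "completion": tgt}
        for rec, tgt in zip(records, targets)
    ]


if __name__ == "__main__":
    records = [{"name": "Ada", "city": "London", "role": "engineer"}]
    targets = ["Ada is an engineer based in London."]
    for row in build_training_rows(records, targets):
        print(row["prompt"])
        print("->", row["completion"])
```

Trying the same idea at inference time would just mean calling `serialize_record` on each incoming record before prompting; whether that helps is the open question raised in the post.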

Incredible. This will translate into better coding models in the near future.

We really need to develop better tools to understand what's happening inside these NNs. Working with high-dimensional spaces is not something we're good at, and we're basically throwing stuff at it and seeing what sticks.

If you're looking into small models for tiny local tasks, you should try Qwen Coder 0.5B. It's more of an experiment, but it can output decent functions given the right context instructions.
Making a sentence out of a JSON object.
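One way to supply "the right context instructions" for the JSON-to-sentence task is sketched below. The instruction wording and the `JSON:` / `Sentence:` layout are hypothetical, not an official Qwen prompt format; for an instruct model you would normally wrap this text in the model's own chat template.

```python
import json

# Hypothetical instruction text; tune it for whichever small model you use.
INSTRUCTION = (
    "Write exactly one natural English sentence that states every field "
    "in the JSON object below. Do not add facts that are not in the object."
)


def build_prompt(raw_json: str) -> str:
    """Turn a raw JSON object string into a prompt for a small local model."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    # Re-dump compactly so stray whitespace in the input doesn't change the prompt.
    compact = json.dumps(data, ensure_ascii=False, separators=(",", ":"))
    return f"{INSTRUCTION}\n\nJSON: {compact}\nSentence:"


if __name__ == "__main__":
    print(build_prompt('{ "name": "Ada",  "city": "London" }'))
```

Ending the prompt with `Sentence:` nudges a completion-style model to emit only the sentence, which tends to matter more for tiny models than for large ones.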
AFM models are very impressive, but they're not made for conversation, so keep your expectations low in chat mode.
This study assumes everybody is oblivious to contamination, and explicitly says it can't differentiate. Not useful, and bordering on tautological.