my stupid llm research is absofuckinglutely not going the way i was hoping.

ive spent like a fucking week trying to set up a testing harness to get local models to do the same test 100 times, aperture science style, to test the drift of their results

but 100% of the time, the model:
- emits tool calls incorrectly, so they leak into the output where i can see them
- ignores instructions
- falls into a loop
- says its gonna do stuff, then .. just doesnt
- intentionally deviates from instructions even when explicitly told not to
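
for anyone curious, the harness idea boils down to roughly this. a minimal sketch, not my actual code: `generate` is a stand-in for whatever calls your local model, and the `<tool_call>` marker and the loop heuristic in `classify` are made-up assumptions you'd swap for whatever your model actually emits.

```python
from collections import Counter
from typing import Callable


def classify(output: str) -> str:
    """Rough failure bucketing. The markers here are hypothetical."""
    text = output.strip()
    if not text:
        return "empty"
    # Assumed marker: a raw tool call that leaked into the visible text.
    if "<tool_call>" in text:
        return "leaked_tool_call"
    # Crude loop check: at least 4 non-blank lines, mostly repeats.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) >= 4 and len(set(lines)) <= len(lines) // 2:
        return "loop"
    return "ok"


def run_trials(generate: Callable[[str], str], prompt: str, n: int = 100) -> dict:
    """Run the same prompt n times and summarize how much the outputs drift."""
    outputs = [generate(prompt) for _ in range(n)]
    buckets = Counter(classify(o) for o in outputs)
    distinct = Counter(o.strip() for o in outputs)
    # Share of runs that produced the single most common output.
    modal_share = distinct.most_common(1)[0][1] / n
    return {
        "buckets": dict(buckets),
        "distinct_outputs": len(distinct),
        "modal_share": modal_share,
    }
```

a low `modal_share` or a lot of `distinct_outputs` means the results are drifting. the `buckets` tally only catches the mechanical failures like leaked tool calls and loops, not the "says its gonna do stuff then doesnt" kind.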

this is nuts. keeping these things on track is a sisyphean task of pure crystalline futility.

how people think they're gonna lose their jobs to this fucking crap is nuts. the only way to make it work is to put it in trillion dollar datacenters, so that the 1% of output that isnt absolute trash actually makes it out of the system.

@Viss Meanwhile Jack Dorsey rolls the dice and axes half the staff of Block.
✨ YOLO ✨
@coreysnipes oh did he do that?
jack (@jack) on X

we're making @blocks smaller today. here's my note to the company.

today we're making one of the hardest decisions in the history of our company: we're reducing our organization by nearly half, from over 10,000 people to just under 6,000. that means over 4,000 of you are
