Mastodawn

Telepyleia 6h ago

RE: https://cyberplace.social/@GossiTheDog/116327564500679388

Stop using human error as a category; it is almost never the underlying cause.

Telepyleia 19h ago

jonny (good kind)22h ago

Claude Code source "leaks" in a source map file
people immediately use the code laundering machines to code launder the code laundering frontend
now there are many dubious open source-ish knockoffs in Python and Rust being derived directly from the source

What's Anthropic going to do, sue them? Insist in court that an LLM recreating copyrighted code is a violation of copyright???

Telepyleia 19h ago


jonny (good kind)20h ago

So the reason that Claude Code is capable of outputting valid JSON is that if the prompt text suggests the output should be JSON, it enters a special loop in the main query engine that validates the output against a JSON schema (it looks like the schema just checks that the output is in fact an object and that its keys are strings) and then feeds the data, along with the error message, back into itself until it is valid JSON or a retry limit is reached.

This code is so eye-wateringly spaghetti that I am still trying to confirm whether this is true, but this seems to be how it not only returns JSON to the user but also how it handles all LLM-to-JSON conversion, including internal output from its tools. There appears to be an unconditional hook where, if the JSON output tool is present in the session config at all, every tool call must be followed by the "force into JSON" loop.

If that's true, that's just mind-blowingly expensive.
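A minimal sketch of the validate-and-retry loop described above, assuming a hypothetical `model` callable that maps a prompt string to raw text. This is not Anthropic's code; `force_json`, `check_object`, and `max_attempts` are invented names. It only checks that the output parses as a JSON object, mirroring the shallow schema described in the post, and every retry is another full model call, which is where the cost piles up.

```python
import json
from typing import Callable, Optional

# Hypothetical model interface: prompt in, raw text out.
ModelFn = Callable[[str], str]


def check_object(text: str) -> Optional[str]:
    """Return an error message, or None if `text` parses as a JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return f"invalid JSON: {exc}"
    if not isinstance(data, dict):
        return f"expected a JSON object, got {type(data).__name__}"
    # json.loads always yields string keys for objects, so no separate key check.
    return None


def force_json(model: ModelFn, prompt: str, max_attempts: int = 3) -> dict:
    """Call the model until it returns a JSON object or the attempt limit is hit."""
    current_prompt = prompt
    for _ in range(max_attempts):
        text = model(current_prompt)  # every iteration is another paid model call
        error = check_object(text)
        if error is None:
            return json.loads(text)
        # Feed the bad output and the validation error back into the model.
        current_prompt = (
            f"{prompt}\n\nYour previous output was:\n{text}\n"
            f"Error: {error}\nRespond with a valid JSON object."
        )
    raise ValueError(f"no valid JSON object after {max_attempts} attempts")
```

With `max_attempts=3`, a single structured answer can cost up to three model calls, before counting any tool output that also gets routed through the same loop.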

edit: please note that unless I say otherwise, all evaluations here are just from my skimming through the code on my phone and have not been validated in any way that should cause you to be upset with me for impugning the good name of Anthropic

edit2: this is both much worse and not as bad as i thought on first read - https://neuromatch.social/@jonny/116326861737478342

jonny (good kind) (@jonny@neuromatch.social)

Attached: 3 images

OK i can't focus on work and keep looking at this repo. So after every "subagent" runs, Claude Code creates *another* "agent" to check on whether the first "agent" did the thing it was supposed to. I don't know about you but i smell a bit of a problem: if you can't trust whether one "agent" with a very big fancy model did something, how in the fuck are you supposed to trust another "agent" running on the smallest crappiest model?

That's not the funny part; that's obvious and fundamental to the entire show here. HOWEVER, RECALL [the above JSON Schema Verification thing](https://neuromatch.social/@jonny/116325123136895805) that is unconditionally added onto the end of every round of LLM calls. The mechanism for adding that hook is... JUST FUCKING ASKING THE MODEL TO CALL THAT TOOL. The second pic is registering a hook s.t. "after some stop state happens, if there isn't a message indicating that we have successfully called the JSON validation thing, prompt the model saying 'you must call the json validation thing'". This shit sucks so bad they can't even ***CALL THEIR OWN CODE FROM INSIDE THEIR OWN CODE.***

Look at the comment on pic 3, "e.g. agent finished without calling structured output tool": that's common enough that they have a whole goddamn error category for it, and the way it's handled is by just pretending the job was cancelled and nothing happened.

neurospace.live
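A rough sketch of the stop-hook pattern the quoted post describes, assuming a simple message history and a hypothetical `run_turn` callable that runs one more model turn. `Message`, `run_turn`, `STRUCTURED_OUTPUT_TOOL`, and `max_nudges` are invented for illustration and are not taken from the leaked source. When the agent stops, the hook looks for a call to the structured-output tool; if there is none, it appends a prompt asking the model to call it and runs another turn, and if the model still doesn't comply it reports the run as cancelled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical tool name; the real identifier is not given in the post.
STRUCTURED_OUTPUT_TOOL = "structured_output"


@dataclass
class Message:
    role: str                        # "user" or "assistant"
    content: str = ""
    tool_name: Optional[str] = None  # set when the message is a tool call


# Hypothetical: runs one more model turn over the history, returns new messages.
RunTurn = Callable[[List[Message]], List[Message]]


def called_tool(history: List[Message], tool_name: str) -> bool:
    """True if any message in the history is a call to `tool_name`."""
    return any(m.tool_name == tool_name for m in history)


def on_stop(history: List[Message], run_turn: RunTurn, max_nudges: int = 1) -> str:
    """Stop hook: nudge the model to call the tool, else report 'cancelled'."""
    nudges = 0
    while not called_tool(history, STRUCTURED_OUTPUT_TOOL):
        if nudges >= max_nudges:
            # The failure is surfaced as a cancellation, not as an error.
            return "cancelled"
        # The "hook" is just another prompt asking the model to call the tool.
        history.append(
            Message(role="user", content=f"You must call the {STRUCTURED_OUTPUT_TOOL} tool.")
        )
        history.extend(run_turn(history))
        nudges += 1
    return "completed"
```

Nothing in this pattern makes the model comply; it can only ask again and then give up.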

Telepyleia 6d ago

Sure. Let’s do another AI-driven proof of concept that the sales people can run wild with and overpromise on without doing any actual market-fit testing, causing the board to Hail Mary the end-of-year results on yet another unsecured, non-viable, LLM-based fever dream that will cost us more than it makes.

Can we just not do it on prod this time?

I want to get off Mr. Bones' Wild Ride.

Telepyleia 6d ago

Unsure whether the “AI will write most code” thing currently going on at work is a sign of the investment fund drinking the Kool-Aid or an indictment of most of our devs.

Kinda split on that one tbh

Telepyleia Mar 19

Zack Labe Mar 19

🚨 Ice update - #Arctic sea ice extent is currently the *lowest* on record for the date (JAXA data)...

• about 640,000 km² below the 2010s mean
• about 1,050,000 km² below the 2000s mean
• about 1,460,000 km² below the 1990s mean
• about 1,950,000 km² below the 1980s mean

More: https://zacklabe.com/arctic-sea-ice-figures/

Telepyleia Mar 4

RE: https://mstdn.ca/@AlisonCreekside/116171477122428233

Watch the Dutch government offer "political but not military" support again. Then, years later, we find out we actually did send troops, the fucker responsible gets elected NATO chief again, and we all pretend that never happened.

Telepyleia Sep 18, 2024

Internal security friends:
If the board or the investors hire a consulting firm to analyze your org's infosec program, despite you having told them what's wrong for years and them not doing shit about it, smile at the junior consultant who leads the inquiry, tell them in all honesty how much of a dumpster fire your program is (preferably with nice color-coded graphs, but without blaming anybody), then use the resulting report to get what you need.

Telepyleia Jul 31, 2024

A C-level exec asked whether we have any controls in place to prevent open source dependencies from being used in our software. Telling him that literally our entire stack is open source... was not the answer he wanted to hear. Lol.