my stupid llm research is absofuckinglutely not going the way i was hoping.
ive spent like a fucking week trying to set up a testing harness to run local models through the same test 100 times, aperture science style, and measure how much their results drift (rough sketch of the harness below)
but on 100% of runs, the model does at least one of the following:
- emits tool calls with broken formatting, so they leak into the visible output instead of being parsed
- ignores instructions
- falls into a loop
- says its gonna do stuff, then... just doesnt
- intentionally deviates from instructions even when explicitly told not to
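
for context, the harness itself is the trivial part. heres a minimal sketch of the kind of thing i mean, assuming an openai-compatible local endpoint (ollama, llama.cpp server, whatever); the url, model name, prompt, and run count here are placeholders, not my actual test:

```python
# minimal drift harness sketch: same prompt, N runs, count distinct outputs.
# assumes an openai-compatible local endpoint; everything below is a placeholder.
import collections
import requests

URL = "http://localhost:11434/v1/chat/completions"  # placeholder: ollama's default
MODEL = "some-local-model"                          # placeholder model name
PROMPT = "the test prompt goes here"                # placeholder prompt
RUNS = 100

def run_once() -> str:
    # fire the same request every time and return the raw completion text
    resp = requests.post(URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": PROMPT}],
        "temperature": 0,  # a knob to tune; 0 makes any residual drift more damning
    }, timeout=300)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

outputs = [run_once() for _ in range(RUNS)]
counts = collections.Counter(outputs)
print(f"{len(counts)} distinct outputs across {RUNS} runs")
for text, n in counts.most_common(5):
    print(f"{n:3d}x  {text[:60]!r}")
```

counting distinct outputs is the crudest possible drift metric, but its enough to show the problem: the models fall over before the measurement even matters.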