my stupid llm research is absofuckinglutely not going the way i was hoping.

i've spent like a fucking week trying to set up a testing harness to get local models to do the same test 100 times, aperture science style, to measure how much their results drift

but 100% of the time, the model does at least one of these:
- emits tool calls incorrectly, so i see them in the output instead of having them run
- ignores instructions
- falls into a loop
- says it's gonna do stuff, then... just doesn't
- intentionally deviates from instructions even when explicitly told not to
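
for reference, the "same test 100 times" idea can be sketched roughly like this. it's a minimal sketch, not the actual harness: `run_trials`, `drift`, and `make_fake_model` are all hypothetical names, and the fake model is a deterministic stand-in you'd swap for whatever function calls your local model. drift here is just the fraction of runs that disagree with the most common answer, which is one simple way to measure it, not the only one.

```python
from collections import Counter
from typing import Callable


def run_trials(model: Callable[[str], str], prompt: str, n: int = 100) -> Counter:
    """Send the same prompt n times and tally the distinct outputs."""
    counts: Counter = Counter()
    for _ in range(n):
        output = model(prompt)
        # Collapse whitespace and case so trivial formatting
        # differences are not counted as drift.
        counts[" ".join(output.split()).lower()] += 1
    return counts


def drift(counts: Counter) -> float:
    """Fraction of runs that differ from the most common output (0.0 = none)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    most_common_count = counts.most_common(1)[0][1]
    return 1.0 - most_common_count / total


def make_fake_model() -> Callable[[str], str]:
    """Hypothetical stand-in for a real model call (assumption, not a real API).

    Every 4th call returns a different answer so the harness has
    something to measure.
    """
    calls = {"n": 0}

    def fake_model(prompt: str) -> str:
        calls["n"] += 1
        return "Answer B" if calls["n"] % 4 == 0 else "answer  a"

    return fake_model


if __name__ == "__main__":
    results = run_trials(make_fake_model(), "do the test", n=100)
    print(results.most_common())
    print(f"drift: {drift(results):.2f}")
```

exact-match counting only works when the test has a short, checkable answer. for free-form output you'd need some other comparison, and the failure modes in the list above (leaked tool calls, loops) would have to be detected separately.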

@Viss Maybe stuff in this morning's Kali blog might help? https://www.kali.org/blog/kali-llm-claude-desktop/

Have not had time to burn on LLM hacking myself 🤷‍♂️

@zombie042 ahhaha so many people are gonna rm themselves