Talk like caveman
Oh boy. Someone didn't get the memo that for LLMs, tokens are units of thinking. Whatever feat of computation needs to happen to produce the result you seek has to fit within the tokens the LLM produces. Being a finite system, the LLM's internal structure can only do so much computation per token, so the more you force the model to be concise, the harder the task becomes for it. In the worst case, you're guaranteed not to get a good answer, because the task requires more computation than the allotted tokens allow.
In other words, by demanding that the model be concise, you're literally making it dumber.
(Separating out "chain of thought" into "thinking mode" and removing user control over it definitely helped with this problem.)
Indeed. But I have tried this skill and can confirm that the thinking phase is not impacted. At least in my few attempts, it applied the "caveman talk" only to the output, after the initial response had been formulated in the thinking process. I used opencode.
You are right, of course, that as such it doesn't really reduce token usage. If anything, it consumes more tokens, because the model has to apply the skill on top of the initial result. I do appreciate the conciseness of the output, though :)