Talk like caveman
Oh boy. Someone didn't get the memo that for LLMs, tokens are units of thinking. That is, whatever feat of computation is needed to produce the result you seek has to fit into the tokens the LLM produces. Being a finite system, the model's internals can only do so much computation per token, so the more you force it to be concise, the harder the task becomes. In the worst case, you can guarantee a bad answer, because the task requires more computation than the tokens produced allow.
I.e., by demanding that the model be concise, you're literally making it dumber.
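The argument above can be sketched as a toy model - a machine with a fixed, hypothetical compute budget per emitted token (an illustrative assumption, not a claim about any real architecture). A task needing more sequential steps than the token budget allows simply cannot be finished:

```python
OPS_PER_TOKEN = 1  # hypothetical fixed amount of work per emitted token

def solvable(required_ops: int, max_tokens: int) -> bool:
    """In this toy model, a task needing more sequential operations than
    the token budget allows cannot be completed, no matter how clever
    each per-token step is."""
    return required_ops <= OPS_PER_TOKEN * max_tokens

# A 10-step derivation fits in a 10-token budget but not in a 3-token one:
print(solvable(10, 10))  # True
print(solvable(10, 3))   # False
```

Forcing conciseness is, in this picture, shrinking `max_tokens` while `required_ops` stays fixed.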
(Separating out "chain of thought" into "thinking mode" and removing user control over it definitely helped with this problem.)
I agree with this take in general, but I think we need to be prepared for nuance when thinking about these things.
Tokens are how an LLM works things out, but I think it's just as likely as not that LLMs (like people) can overthink a problem to the point of reaching a wrong answer, when their "gut" response would have been better. I don't contend that this is the default mode, only that it's possible, and that it's more or less likely on one kind of problem than another, with the problem categories yet to be determined.
A specific example of this was the era of chat interfaces that leaned too far in the direction of web search when responding to user queries. No, Claude, I don't want a recipe blogspam link or a summary - just listen to your heart and tell me how to mix pancakes.
More abstractly: LLMs give the running context window a lot of credit, and will work hard to post-hoc rationalize whatever is in there, including any prior low-likelihood tokens. I expect many problematic 'hallucinations' are the result of an unlucky run of two or more low-probability tokens in a row, and the likelihood of that happening in a given response scales ~linearly with the length of the response.
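The last claim can be checked with a rough simulation, under the simplifying (and debatable) assumption that each token is independently "unlucky" with some small probability p. For small p, the chance that a response contains a run of two or more unlucky tokens grows roughly linearly with its length:

```python
import random

random.seed(0)

def p_bad_run(length: int, p: float = 0.02, run: int = 2,
              trials: int = 10000) -> float:
    """Monte Carlo estimate of the probability that a response of
    `length` tokens contains `run` consecutive low-probability tokens,
    each token unlucky independently with probability p. Toy model only."""
    hits = 0
    for _ in range(trials):
        streak = 0
        for _ in range(length):
            streak = streak + 1 if random.random() < p else 0
            if streak >= run:
                hits += 1
                break
    return hits / trials

# Doubling the response length roughly doubles the risk:
for n in (100, 200, 400):
    print(n, round(p_bad_run(n), 3))
```

Real token streams are not independent draws, of course - this only illustrates why longer responses give bad runs more chances to occur.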
That was my first thought too -- instead of making it talk like a caveman, you could turn off reasoning, probably with better results.
Additionally, LLMs do not actually operate in text; much of the thinking happens in a much higher dimensional space that just happens to be decoded as text.
So unless the LLM was trained otherwise, making it talk like a caveman does more than just theoretically turn it into one.
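The "higher dimensional space decoded as text" point can be pictured with a toy unembedding step: the model's state at each position is a large hidden vector, and a token only appears when that vector is projected onto the vocabulary and collapsed to a single index. The dimensions below are made up and far smaller than in real models:

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, vocab = 64, 1000  # toy sizes; real models are vastly larger

hidden = rng.standard_normal(d_model)              # the internal "thought"
W_unembed = rng.standard_normal((vocab, d_model))  # toy unembedding matrix

logits = W_unembed @ hidden        # one score per vocabulary entry
token_id = int(np.argmax(logits))  # 64 numbers collapse to one token index

print(hidden.shape, "->", token_id)
```

Most of the information in `hidden` is thrown away at each decoding step; the text is a lossy readout of the state, not the state itself.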
> much of the thinking happens in a much higher dimensional space that just happens to be decoded as text.
What do you mean by that? It’s literally text prediction, isn’t it?
This is condescending and wrong at the same time (best combo).
LLMs do stumble into long prediction chains that don’t lead the inference in any useful direction, wasting tokens and compute.
Indeed. But I have tried this skill and can confirm that the thinking phase is not affected. At least in my few attempts, it applied the "caveman talk" only to the output, after the initial response was formulated in the thinking process. I used opencode.
You are right, of course, that it does not really reduce token usage. If anything, it consumes more tokens, because it has to apply the skill on top of the initial result. I do appreciate the conciseness of the output, though :)