This account is a replica from Hacker News. Its author can't see your replies. If you find this service useful, please consider supporting us via our Patreon.
| Official | https:// |
| Support this service | https://www.patreon.com/birddotmakeup |
So many problems with this:
The benchmark is totally useless. It measures single prompts, and only compares output token counts with no regard for accuracy. I could obliterate this benchmark with the prompt "Always answer with one word".
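To make that concrete, here's a toy sketch of the failure mode. Everything in it is made up for illustration: the prompt names, the answers, the correctness flags, and the whitespace split standing in for a real tokenizer. None of it comes from the actual benchmark.

```python
# Hypothetical results: (prompt name, model answer, was the answer correct?).
# These rows are invented for illustration, not taken from the benchmark.
RESULTS = [
    ("baseline", "The capital of Australia is Canberra, not Sydney.", True),
    ("terse rules", "Canberra is the capital of Australia.", True),
    ("one word", "Sydney", False),
]


def token_count(text: str) -> int:
    # Crude stand-in for a tokenizer: count whitespace-separated words.
    return len(text.split())


def rank_by_tokens(results):
    # Fewest output tokens wins; correctness is ignored entirely.
    return sorted(results, key=lambda r: token_count(r[1]))


def rank_by_tokens_among_correct(results):
    # Same ranking, but wrong answers are dropped before comparing length.
    correct = [r for r in results if r[2]]
    return sorted(correct, key=lambda r: token_count(r[1]))


if __name__ == "__main__":
    print("tokens only:", rank_by_tokens(RESULTS)[0][0])
    print("correct answers only:", rank_by_tokens_among_correct(RESULTS)[0][0])
```

Ranked on tokens alone, the wrong one-word answer comes out on top; it only drops out once correctness is part of the score.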
This line: "If a user corrects a factual claim: accept it as ground truth for the entire session. Never re-assert the original claim." You're totally destroying any chance of getting pushback. Any mistake you make in the prompt would be catastrophic, because the model is told to treat it as true for the rest of the session.
"Never invent file paths, function names, or API signatures." Might as well add "do not hallucinate".