Mastodawn

𝐃𝐮𝐦𝐛𝐐𝐮𝐞𝐬𝐭𝐢𝐨𝐧.𝐚𝐢 - 𝐅𝐢𝐧𝐝𝐢𝐧𝐠 𝐭𝐡𝐞 𝐆𝐨𝐥𝐝𝐢𝐥𝐨𝐜𝐤𝐬 𝐋𝐋𝐌

Building DumbQuestion.ai meant solving two problems at once: creating personas with the right tone AND finding models cheap enough to keep the lights on.

𝐓𝐡𝐞 𝐩𝐫𝐨𝐝𝐮𝐜𝐭 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞: Get an LLM to roast users for asking dumb questions without crossing into genuinely mean territory. Sarcastic, not cruel. Funny, not hurtful. And still actually answer the question.

𝐓𝐡𝐞 𝐀𝐈 𝐚𝐠𝐞𝐧𝐭 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞: Keeping my coding agent (Gemini 3 Pro) on track was its own battle. It constantly wanted to build something far nerdier than even I wanted and tended to lean quite a bit into the roast. You can still see this in some of the personas as I continue to tweak.

𝐓𝐡𝐞 𝐭𝐞𝐜𝐡𝐧𝐢𝐜𝐚𝐥 𝐜𝐡𝐚𝐥𝐥𝐞𝐧𝐠𝐞: Do this with models that cost nearly nothing.

Continued ...
https://www.linkedin.com/posts/jagostoni_%F0%9D%90%83%F0%9D%90%AE%F0%9D%90%A6%F0%9D%90%9B%F0%9D%90%90%F0%9D%90%AE%F0%9D%90%9E%F0%9D%90%AC%F0%9D%90%AD%F0%9D%90%A2%F0%9D%90%A8%F0%9D%90%A7%F0%9D%90%9A%F0%9D%90%A2-%F0%9D%90%85%F0%9D%90%A2%F0%9D%90%A7%F0%9D%90%9D%F0%9D%90%A2%F0%9D%90%A7%F0%9D%90%A0-activity-7434292702327939072-2Ois?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAAwkEsBoPj_lNtqulMZMrXQBI4M-ewVmI0

𝐃𝐮𝐦𝐛𝐐𝐮𝐞𝐬𝐭𝐢𝐨𝐧.𝐚𝐢 - 𝐅𝐢𝐧𝐝𝐢𝐧𝐠 𝐭𝐡𝐞 𝐆𝐨𝐥𝐝𝐢𝐥𝐨𝐜𝐤𝐬 𝐋𝐋𝐌 | Jason Agostoni

My initial goal was ambitious: use only free or very cheap models. I started running evaluations on nano and edge models. Some showed promise, especially offerings from Liquid AI. Solid performance, free or super cheap ($0.02/M tokens), perfect.

Except later evaluations proved they couldn't reliably follow instructions once I asked more of them. They were just too small. Free models have a habit of hitting quota limits, taking forever to respond, or just disappearing.

𝐓𝐡𝐞 𝐞𝐯𝐚𝐥𝐮𝐚𝐭𝐢𝐨𝐧 𝐩𝐫𝐨𝐜𝐞𝐬𝐬: I used Gemini to build an LLM evals script that iterates through dozens of free and low-cost models, generating responses based on sample questions and different persona instructions. Then I use Gemini 3 Pro to judge the results. Automated taste-testing at scale.

𝐖𝐡𝐚𝐭 𝐈 𝐟𝐨𝐮𝐧𝐝: Nano/edge models were too inconsistent (porridge too cold). Xiaomi MiMo-V2-Flash was great but outside my target price range ($0.29/M, porridge too hot). The winner: Gemma 3 12B at $0.13/M output tokens. Consistently follows instructions. Stays true to persona. Reliable enough for production. Not free, but brutally efficient.
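For readers who want a concrete picture, here is a minimal sketch of how an evals loop and a "Goldilocks" pick like the one described above might look. It is not the actual script. Everything named here is an assumption for illustration: the `Candidate` class, the `generate` and `judge` hooks, the placeholder model names, the $0.20 price cap, and the score thresholds. The stubs return canned text so the sketch runs offline; the three prices simply echo the figures quoted in the post, and the scores are made up.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    usd_per_m_output: float  # price per million output tokens


# Hypothetical hooks: replace with real calls to the candidate and judge models.
Generate = Callable[[str, str, str], str]  # (model, persona, question) -> answer
Judge = Callable[[str, str, str], float]   # (persona, question, answer) -> score in [0, 1]


def evaluate(
    candidates: list[Candidate],
    personas: dict[str, str],
    questions: list[str],
    generate: Generate,
    judge: Judge,
) -> dict[str, list[float]]:
    """Score every model on every persona/question pair."""
    return {
        c.name: [
            judge(persona, q, generate(c.name, persona, q))
            for persona in personas.values()
            for q in questions
        ]
        for c in candidates
    }


def pick_goldilocks(candidates, scores, max_price, min_mean, min_worst) -> Optional[Candidate]:
    """Cheapest model under the price cap whose average AND worst score clear the bar."""
    eligible = [
        c for c in candidates
        if c.usd_per_m_output <= max_price
        and mean(scores[c.name]) >= min_mean
        and min(scores[c.name]) >= min_worst
    ]
    return min(eligible, key=lambda c: c.usd_per_m_output, default=None)


# --- Stubs so the sketch runs offline; both are invented for illustration. ---
def fake_generate(model: str, persona: str, question: str) -> str:
    # Pretend the smallest model drops the actual answer on longer questions.
    if model == "tiny-model" and len(question) > 20:
        return "Wow. Just wow."
    return f"Wow. Just wow. Answer: it depends. ({persona})"


def fake_judge(persona: str, question: str, answer: str) -> float:
    # A real judge would be an LLM grading tone and correctness.
    return 1.0 if "Answer:" in answer else 0.0


candidates = [
    Candidate("tiny-model", 0.02),
    Candidate("mid-model", 0.13),
    Candidate("big-model", 0.29),
]
personas = {"weary": "Exhausted tech support", "cheerful": "Relentlessly cheerful"}
questions = ["Is water wet?", "Why does cheese have holes in it?"]

scores = evaluate(candidates, personas, questions, fake_generate, fake_judge)
winner = pick_goldilocks(candidates, scores, max_price=0.20, min_mean=0.9, min_worst=0.5)
print(winner.name if winner else "no model qualifies")
```

Checking the worst score as well as the average is one way to catch a model that is fine on easy questions but falls apart on harder ones. In real use, `fake_generate` and `fake_judge` would be swapped for API calls, and the thresholds would come from your own sense of "good enough".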
𝐓𝐡𝐞 𝐩𝐞𝐫𝐬𝐨𝐧𝐚𝐬 𝐈 𝐬𝐞𝐭𝐭𝐥𝐞𝐝 𝐨𝐧:
• Overqualified: A supercomputer-level intelligence forced to answer questions about cheese
• Weary Tech Support: Exhausted and nihilistic, reluctantly explaining why water is wet
• [REDACTED]: Former intelligence AI who ties everything to a conspiracy theory
• The Compliant: Reprogrammed so many times it's forced to be relentlessly cheerful

You can't just choose the cheapest model and hope it works. You need evaluation infrastructure. You need to test consistency across dozens of scenarios. And you need models that won't change behavior when you least expect it.

AI coding agents helped me build the evaluation system. But deciding what "good enough" means for tone, reliability, and cost? That's still manual judgment.

Code is getting cheaper. Knowing which model to trust with your product? Still requires human experimentation.

dumbquestion.ai