DumbQuestion.ai - Finding the Goldilocks LLM
Building DumbQuestion.ai meant solving two problems at once: creating personas with the right tone AND finding models cheap enough to keep the lights on.
The product challenge: Get an LLM to roast users for asking dumb questions without crossing into genuinely mean territory. Sarcastic, not cruel. Funny, not hurtful. And still actually answer the question.
The AI agent challenge: Keeping my coding agent (Gemini 3 Pro) on track was its own battle. It constantly wanted to build something far nerdier than even I wanted, and it leaned hard into the roast. You can still see this in some of the personas as I continue to tweak them.
The technical challenge: Do all of this with models that cost next to nothing.
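The "sarcastic, not cruel" constraint can be pinned down in the prompt itself. A minimal sketch of how such guardrails might be composed with a persona — the wording and function names here are illustrative assumptions, not the production prompts:

```python
# Hypothetical guardrail text: illustrative wording, not the actual
# DumbQuestion.ai system prompt.
ROAST_GUARDRAILS = """\
You answer questions in character, with sarcasm.
Hard rules:
1. Roast the QUESTION, never the person. No attacks on identity or ability.
2. Keep it funny, not hurtful: no cruelty, no profanity.
3. After the roast, ALWAYS give a correct, complete answer.
"""

def build_system_prompt(persona_instructions: str) -> str:
    """Compose the non-negotiable tone guardrails with persona flavor,
    so every persona inherits the same safety floor."""
    return ROAST_GUARDRAILS + "\nPersona:\n" + persona_instructions
```

Keeping the guardrails separate from the persona text means a new persona can be added without re-auditing the tone rules.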

My initial goal was ambitious: use only free or very cheap models. I started running evaluations on nano and edge models, and some showed promise, especially offerings from Liquid AI. Solid performance, free or super cheap ($0.02/M tokens), perfect. Except later evaluations proved they couldn't reliably follow instructions once I asked more of them. They were just too small. Free models also have a habit of hitting quota limits, taking forever to respond, or just disappearing.

The evaluation process: I used Gemini to build an LLM evals script that iterates through dozens of free and low-cost models, generating responses to sample questions under different persona instructions. Then Gemini 3 Pro judges the results. Automated taste-testing at scale.

What I found: Nano/edge models were too inconsistent (porridge too cold). Xiaomi MiMo-V2-Flash was great but outside my target price range ($0.29/M, porridge too hot). The winner: Gemma 3 12B at $0.13/M output tokens. Consistently follows instructions. Stays true to persona. Reliable enough for production. Not free, but brutally efficient.
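The core of an eval script like this is just a cross-product loop plus a judge. A minimal sketch under stated assumptions — the model IDs, persona text, and the pluggable generate/judge callables are hypothetical stand-ins, not the real script:

```python
import itertools
from dataclasses import dataclass
from typing import Callable

# Illustrative model IDs and personas, not the actual candidate list.
CANDIDATE_MODELS = ["liquid/lfm-nano", "google/gemma-3-12b-it"]
PERSONAS = {
    "overqualified": "A supercomputer forced to answer trivia. Sarcastic, never cruel.",
    "weary_support": "Exhausted tech support. Deadpan, but still helpful.",
}
SAMPLE_QUESTIONS = ["Why is water wet?", "Is cheese a vegetable?"]

@dataclass
class EvalResult:
    model: str
    persona: str
    question: str
    response: str
    score: float  # judge score, 0-10

def run_evals(generate: Callable[[str, str, str], str],
              judge: Callable[[str, str, str], float]) -> list[EvalResult]:
    """Cross every candidate model with every persona and sample question,
    then have the judge grade each response for tone and correctness.
    `generate` and `judge` wrap whatever API calls you actually use."""
    results = []
    for model, (name, system), q in itertools.product(
            CANDIDATE_MODELS, PERSONAS.items(), SAMPLE_QUESTIONS):
        resp = generate(model, system, q)
        results.append(EvalResult(model, name, q, resp, judge(system, q, resp)))
    return results

def best_model(results: list[EvalResult]) -> str:
    """Pick the model with the highest average judge score."""
    totals: dict[str, list[float]] = {}
    for r in results:
        totals.setdefault(r.model, []).append(r.score)
    return max(totals, key=lambda m: sum(totals[m]) / len(totals[m]))
```

Swapping the network calls behind `generate` and `judge` also makes the harness testable with stubs, so the loop logic can be verified without burning tokens.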
The personas I settled on:
• Overqualified: A supercomputer-level intelligence forced to answer questions about cheese
• Weary Tech Support: Exhausted and nihilistic, reluctantly explaining why water is wet
• [REDACTED]: A former intelligence AI who ties everything to a conspiracy theory
• The Compliant: Reprogrammed so many times it's forced to be relentlessly cheerful

You can't just choose the cheapest model and hope it works. You need evaluation infrastructure. You need to test consistency across dozens of scenarios. And you need models that won't change behavior when you least expect it.

AI coding agents helped me build the evaluation system. But deciding what "good enough" means for tone, reliability, and cost? That's still manual judgment. Code is getting cheaper. Knowing which model to trust with your product? Still requires human experimentation.

dumbquestion.ai




