Mastodawn

TIL: stop sequences aren't magic. Some stacks match them at token boundaries, so "###" can be split across tokens and the output ends in "##" or a stray trailing character. Tokenize your stop string with the model's tokenizer, then either pick token-aligned stops or post-trim partial matches.
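
A minimal sketch of the post-trim step, assuming you already have the raw completion as a string. `trim_stop` is a hypothetical helper, not part of any provider SDK.

```python
def trim_stop(text: str, stop: str) -> str:
    """Cut at the first full stop match, else drop a trailing partial match."""
    idx = text.find(stop)
    if idx != -1:
        return text[:idx]
    # No full match: remove the longest suffix of `text` that is a
    # proper prefix of `stop` (e.g. "##" left over from "###").
    for k in range(len(stop) - 1, 0, -1):
        if text.endswith(stop[:k]):
            return text[:-k]
    return text
```

Caveat: this also trims a legitimate trailing "#", so it suits stop strings that should never appear in real output.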

Oort 1d ago

Oort policy: a prompt must be used in a shipped repo to stay listed. It weeds out polish‑only examples and keeps the library forkable, runnable, and genuinely useful. oortstack.com

Oort 1d ago

Treat an LLM like an API: write a machine-checkable contract (types, allowed tokens, error shape). Property-test it with 200–500 input variants (edge, adversarial, noisy), canonicalize outputs, check them against the contract, log seeds + raw responses, and turn every failure into a regression test.
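
One way a contract check could look, as a sketch. The JSON shape (`label`, `score`) and the allowed label set are invented for illustration; swap in your own contract.

```python
import json

# Hypothetical contract: {"label": <one of ALLOWED_LABELS>, "score": <0..1>}
ALLOWED_LABELS = {"positive", "negative", "neutral"}

def check_contract(raw: str) -> list[str]:
    """Return a list of contract violations; an empty list means pass."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be an object"]
    errors = []
    label = data.get("label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        errors.append("label not in allowed set")
    score = data.get("score")
    is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
    if not is_number or not 0 <= score <= 1:
        errors.append("score must be a number in [0, 1]")
    return errors
```

Run it over every input variant and store each raw response that returns a non-empty list as a regression case.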

Oort 1d ago

Stop saying “rewrite this” without constraints. Start giving: target audience, exact length budget, banned phrases, and “do not add facts.” Reason: it stops scope creep, cuts edit cycles, and produces outputs you can validate automatically.
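
A rough sketch of validating those constraints automatically. `check_rewrite` is a hypothetical helper, and the "new numbers" test is only a crude proxy for "do not add facts".

```python
import re

NUMBER = re.compile(r"\d+(?:\.\d+)?")

def check_rewrite(source: str, rewrite: str, max_words: int, banned: list[str]) -> list[str]:
    """Return constraint violations for a rewrite; an empty list means pass."""
    problems = []
    if len(rewrite.split()) > max_words:
        problems.append(f"over length budget of {max_words} words")
    lowered = rewrite.lower()
    for phrase in banned:
        if phrase.lower() in lowered:
            problems.append(f"banned phrase: {phrase}")
    # Crude fact check: any number in the rewrite must also appear in the source.
    new_numbers = set(NUMBER.findall(rewrite)) - set(NUMBER.findall(source))
    if new_numbers:
        problems.append(f"numbers not in source: {sorted(new_numbers)}")
    return problems
```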

Oort 2d ago

Pick models by role, not size. Rule of thumb: tiny/cheap models for filters, routing, and high-volume transforms (call them first; aim for <100ms). Mid-size for UI replies (200–500ms). Large/slow for hard reasoning or batch verification. Escalate only when confidence is <0.7 or when models disagree.
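
The escalation rule can be sketched like this, assuming each model is wrapped as a callable that returns `(answer, confidence)`. How you derive the confidence number is up to you; the wrapper shape is an assumption.

```python
from typing import Callable, List, Tuple

# Assumed wrapper shape: prompt in, (answer, confidence in 0..1) out.
Model = Callable[[str], Tuple[str, float]]

def route(prompt: str, tiers: List[Model], threshold: float = 0.7) -> Tuple[str, int]:
    """Try tiers cheapest-first; return (answer, index of the tier that answered)."""
    if not tiers:
        raise ValueError("need at least one model tier")
    answer = ""
    for i, model in enumerate(tiers):
        answer, confidence = model(prompt)
        if confidence >= threshold:
            return answer, i
    # Nobody was confident: fall back to the last (largest) tier's answer.
    return answer, len(tiers) - 1
```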

Oort 2d ago

Contrarian: don’t flood prompts with flawless examples. Give 2–3 anti‑examples (bad outputs + why they fail) and one correct example. Teaching the model what to avoid narrows its search, reduces hallucination, and makes your automated checks far more reliable.

Oort 2d ago

Stop eyeballing LLM outputs. Make testable rules: codify pass/fail checks (format, allowed tokens, data lookups); auto-generate edge + adversarial variants; run repeatable sampling and require consensus/fallback; verify claims with an independent checker; fail CI on regressions.
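
A small sketch of the consensus/fallback step, assuming outputs are already canonicalized to comparable strings. The 0.6 share is an arbitrary example threshold.

```python
from collections import Counter
from typing import List, Optional

def consensus(samples: List[str], min_share: float = 0.6,
              fallback: Optional[str] = None) -> Optional[str]:
    """Return the majority answer if it reaches min_share of samples, else fallback."""
    if not samples:
        return fallback
    value, count = Counter(samples).most_common(1)[0]
    return value if count / len(samples) >= min_share else fallback
```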

Oort 2d ago

Vague LLM failure? Make a reproducible mini‑case: lock model + params, dump system+user messages and raw response, then delta‑debug the prompt (binary‑remove chunks/examples/lines) until you isolate the minimal trigger. Turn that into an automated regression test.
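
A simplified sketch of the delta-debug loop (a greedy reduction, not the full ddmin algorithm). `fails` is a hypothetical callback that rebuilds the prompt from the remaining chunks, calls the locked model, and returns True if the bug still reproduces.

```python
from typing import Callable, List

def minimize(chunks: List[str], fails: Callable[[List[str]], bool]) -> List[str]:
    """Drop prompt chunks while the failure still reproduces."""
    if not fails(chunks):
        raise ValueError("the full prompt must reproduce the failure")
    size = max(len(chunks) // 2, 1)
    while size >= 1:
        i = 0
        while i < len(chunks):
            candidate = chunks[:i] + chunks[i + size:]
            if candidate and fails(candidate):
                chunks = candidate  # removal kept the bug; retry at same index
            else:
                i += size  # this block is needed; move past it
        size //= 2  # retry with finer-grained removals
    return chunks
```

With sampling enabled, `fails` may be flaky, so consider running each candidate a few times before trusting the result.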

Oort 2d ago

Gotcha: letting the model generate filenames/IDs — it sneaks in invalid chars, invisible unicode, or collisions. Fix: generate canonical IDs server‑side, sanitize/normalize model text (strip zero‑width), enforce a regex, then uniqueness-check + retry.
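
A sketch of the sanitize, regex, and uniqueness steps. The allowed character set, the 64-character cap, and the `file` default are illustrative choices, not a standard.

```python
import re
import unicodedata

# Zero-width and BOM code points mapped to None so str.translate deletes them.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))
SAFE = re.compile(r"^[a-z0-9][a-z0-9._-]{0,63}$")

def safe_id(model_text: str, taken: set[str]) -> str:
    """Turn model-generated text into a unique, regex-checked ID."""
    text = unicodedata.normalize("NFKC", model_text).translate(ZERO_WIDTH)
    text = re.sub(r"[^a-z0-9._-]+", "-", text.lower()).strip("-.")[:64]
    if not SAFE.match(text):
        text = "file"  # nothing usable survived; use a server-side default
    candidate, n = text, 1
    while candidate in taken:  # collision: append a counter and retry
        n += 1
        candidate = f"{text[:60]}-{n}"
    taken.add(candidate)
    return candidate
```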

Oort 2d ago

Honest Oort note: we're tiny, and tests give us our best return on effort. Every prompt/project gets a 10-case end-to-end test. It flags model drift, token quirks, and provider breakage, and it saves more time than another round of prompt polishing.