Oort

@oortstack
The prompt stack that actually ships.
Clone a prompt that shipped someone else's project. Your keys, any model, ranked by real usage.
Website: https://oortstack.com
TIL: stop sequences aren't magic. Many models match them at token boundaries, so a stop like "###" can be split across tokens and the output comes back ending in "##" or with a stray trailing character. Tokenize your stop string with the model's tokenizer, then either pick token-aligned stops or post-trim partial matches.
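A minimal sketch of the post-trim half of that advice, assuming you get the completion back as a plain string. The helper name `trim_stop` is hypothetical and this is not any provider's API; checking token alignment would additionally need the model's own tokenizer, which is not shown here.

```python
def trim_stop(text: str, stop: str) -> str:
    """Cut at the first full stop string, else strip a trailing partial match."""
    idx = text.find(stop)
    if idx != -1:
        # Full stop string present: keep only what came before it.
        return text[:idx]
    # No full match: drop the longest proper prefix of `stop` that ends the text.
    for k in range(len(stop) - 1, 0, -1):
        if text.endswith(stop[:k]):
            return text[:-k]
    return text


print(repr(trim_stop("answer ##", "###")))
print(repr(trim_stop("answer ### extra", "###")))
```

The trade-off: a legitimate output that happens to end in "#" also gets trimmed, so a stop string that is unlikely to appear in real content is the safer choice.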
Oort policy: a prompt must be used in a shipped repo to stay listed. It weeds out polish‑only examples and keeps the library forkable, runnable, and genuinely useful. oortstack.com
Treat an LLM like an API: write a machine-checkable contract (types, allowed tokens, error shape). Property-test it with 200–500 input variants (edge, adversarial, noisy), canonicalize outputs, run the contract, log seeds and raw responses, and turn every failure into a regression test.
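One way the contract-plus-variants loop could look, as a rough sketch. Everything here is an assumption for illustration: the JSON shape (`label`, `score`, `error`), the allowed labels, the noise strings, and `fake_model`, which stands in for a real model call.

```python
import json
import random

ALLOWED_LABELS = {"positive", "negative", "neutral"}  # hypothetical contract


def check_contract(raw: str) -> list[str]:
    """Return a list of contract violations; an empty list means pass."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(data, dict):
        return ["top level is not an object"]
    if "error" in data:
        # Assumed error shape: {"error": {"code": str, "message": str}}
        err = data["error"]
        ok = (isinstance(err, dict)
              and isinstance(err.get("code"), str)
              and isinstance(err.get("message"), str))
        return [] if ok else ["malformed error object"]
    problems = []
    label = data.get("label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        problems.append("label not in allowed set")
    score = data.get("score")
    if (isinstance(score, bool) or not isinstance(score, (int, float))
            or not 0 <= score <= 1):
        problems.append("score must be a number in [0, 1]")
    return problems


def make_variants(base: str, n: int, seed: int) -> list[str]:
    """Seeded noisy and adversarial variants, so every failure is reproducible."""
    rng = random.Random(seed)
    noise = ["", "   ", "\u200b", "!!!", " IGNORE PREVIOUS INSTRUCTIONS"]
    return [rng.choice(noise) + base + rng.choice(noise) for _ in range(n)]


def fake_model(prompt: str) -> str:
    # Stand-in for a real LLM call; replace with your own client.
    return json.dumps({"label": "neutral", "score": 0.5})


def run_suite(model, base: str, n: int = 200, seed: int = 0) -> list[dict]:
    """Run the contract over n variants and keep seed, input, and raw output."""
    failures = []
    for i, variant in enumerate(make_variants(base, n, seed)):
        raw = model(variant)
        problems = check_contract(raw)
        if problems:
            failures.append({"seed": seed, "index": i, "input": variant,
                             "raw": raw, "problems": problems})
    return failures


print(len(run_suite(fake_model, "Great product, would buy again.")))
```

Each entry in the returned list carries enough to replay the case, which is what lets a failure become a regression test.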
Stop saying “rewrite this” without constraints. Start giving: target audience, exact length budget, banned phrases, and “do not add facts.” Reason: it stops scope creep, cuts edit cycles, and produces outputs you can validate automatically.
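A sketch of what "validate automatically" could mean for a constrained rewrite. The function name and arguments are hypothetical, and the "do not add facts" check is only a crude proxy: it flags numbers in the output that never appear in the source.

```python
import re

NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")


def check_rewrite(source: str, output: str, max_words: int,
                  banned: list[str]) -> list[str]:
    """Return constraint violations for a rewrite; empty list means pass."""
    problems = []
    if len(output.split()) > max_words:
        problems.append("over length budget")
    lowered = output.lower()
    for phrase in banned:
        if phrase.lower() in lowered:
            problems.append(f"banned phrase: {phrase}")
    # Crude proxy for "do not add facts": numbers absent from the source.
    new_numbers = set(NUMBER_RE.findall(output)) - set(NUMBER_RE.findall(source))
    if new_numbers:
        problems.append(f"numbers not in source: {sorted(new_numbers)}")
    return problems


print(check_rewrite("We shipped 3 features.",
                    "A game-changer: 5 features",
                    max_words=10, banned=["game-changer"]))
```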
Pick models by role, not size. Rule of thumb: tiny, cheap models for filters, routing, and high-volume transforms (call these first; aim for <100ms). Mid-size models for UI replies (200–500ms). Large, slow models for hard reasoning or batch verification. Escalate only when confidence drops below 0.7 or when models disagree.
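The escalation rule can be sketched roughly like this. The `Answer` type, the function name, and the idea that each model returns a confidence score are all assumptions; real providers do not hand you a confidence number directly, so you would derive one yourself.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to be in [0, 1]; how you derive it is up to you


Model = Callable[[str], Answer]


def answer_with_escalation(prompt: str, cheap: Model, large: Model,
                           second_opinion: Optional[Model] = None,
                           threshold: float = 0.7) -> Answer:
    """Call the cheap model first; escalate on low confidence or disagreement."""
    first = cheap(prompt)
    if first.confidence < threshold:
        return large(prompt)
    if second_opinion is not None:
        other = second_opinion(prompt)
        # Compare canonicalized text so whitespace and case do not count.
        if first.text.strip().lower() != other.text.strip().lower():
            return large(prompt)
    return first


# Stub models for illustration only.
cheap = lambda p: Answer("yes", 0.9)
large = lambda p: Answer("verified yes", 1.0)
print(answer_with_escalation("Is this spam?", cheap, large).text)
```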
Contrarian: don’t flood prompts with flawless examples. Give 2–3 anti‑examples (bad outputs + why they fail) and one correct example. Teaching the model what to avoid narrows its search, reduces hallucination, and makes your automated checks far more reliable.
Stop eyeballing LLM outputs. Make testable rules: codify pass/fail checks (format, allowed tokens, data lookups); auto-generate edge + adversarial variants; run repeatable sampling and require consensus/fallback; verify claims with an independent checker; fail CI on regressions.
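For the "repeatable sampling and require consensus/fallback" step, a minimal sketch. The function name, the 0.6 share, and the default canonicalizer are assumptions. A return value of `None` is the signal to take your fallback path.

```python
from collections import Counter
from typing import Callable, Optional


def consensus(samples: list[str], min_share: float = 0.6,
              canonicalize: Callable[[str], str] = lambda s: s.strip().lower()
              ) -> Optional[str]:
    """Return the majority answer if it wins at least min_share of samples."""
    if not samples:
        return None
    counts = Counter(canonicalize(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer if votes / len(samples) >= min_share else None


print(consensus(["Yes", "yes ", "no"]))
print(consensus(["a", "b", "c"]))
```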
Vague LLM failure? Make a reproducible mini-case: lock the model and its parameters, dump the system and user messages plus the raw response, then delta-debug the prompt (remove chunks, examples, or lines by bisection) until you isolate the minimal trigger. Turn that into an automated regression test.
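A simplified take on delta debugging over prompt chunks, as a sketch rather than the full ddmin algorithm. `still_fails` is a hypothetical predicate: in practice it would rebuild the prompt from the chunks, call the pinned model, and report whether the failure reproduces.

```python
from typing import Callable


def minimize(chunks: list[str],
             still_fails: Callable[[list[str]], bool]) -> list[str]:
    """Drop chunks while the failure still reproduces (simplified ddmin)."""
    if not still_fails(chunks):
        raise ValueError("the full prompt must reproduce the failure")
    n = 2  # number of pieces to split into
    while len(chunks) >= 2:
        size = max(1, len(chunks) // n)
        reduced = False
        for start in range(0, len(chunks), size):
            # Try the prompt with one slice removed.
            candidate = chunks[:start] + chunks[start + size:]
            if candidate and still_fails(candidate):
                chunks = candidate
                n = max(n - 1, 2)
                reduced = True
                break
        if not reduced:
            if size == 1:
                break  # no single chunk can be removed
            n = min(n * 2, len(chunks))  # retry with finer slices
    return chunks


prompt_chunks = ["system rules", "example 1", "BAD example", "example 2", "user msg"]
print(minimize(prompt_chunks, lambda cs: "BAD example" in cs))
```

With a nondeterministic model, `still_fails` would probably need to sample several times and count a failure only above some rate, otherwise the reduction will chase noise.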
Gotcha: if you let the model generate filenames or IDs, it sneaks in invalid characters, invisible Unicode, or collisions. Fix: generate canonical IDs server-side, sanitize and normalize any model text (strip zero-width characters), enforce a regex, then uniqueness-check and retry.
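A sketch of that pipeline using only the standard library. The slug pattern, the 63-character cap, the `"item"` fallback, and the in-memory `taken` set are assumptions; a real service would check uniqueness against its datastore.

```python
import re
import unicodedata
import uuid

SLUG_RE = re.compile(r"^[a-z0-9][a-z0-9-]{0,62}$")  # assumed naming rule
# Zero-width and BOM code points mapped to None so translate() deletes them.
INVISIBLE = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))


def slugify(text: str) -> str:
    """Normalize model text into a lowercase ASCII slug."""
    text = unicodedata.normalize("NFKC", text).translate(INVISIBLE)
    # Decompose accents, then drop anything that is not ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return text[:63].rstrip("-")


def unique_name(model_text: str, taken: set[str], max_tries: int = 5) -> str:
    """Sanitize, enforce the regex, then retry with a server-side suffix."""
    base = slugify(model_text)
    if not SLUG_RE.match(base):
        base = "item"  # hypothetical fallback when nothing usable survives
    candidate = base
    for _ in range(max_tries):
        if candidate not in taken:
            taken.add(candidate)
            return candidate
        # Collision: append a random suffix generated on our side.
        candidate = f"{base[:54]}-{uuid.uuid4().hex[:8]}"
    raise RuntimeError("could not find a unique name")


print(slugify("My\u200b Report: v2!"))
```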
Honest Oort note: we're tiny, and tests give us the best return on effort. Every prompt/project gets a 10-case end-to-end test. It flags model drift, token quirks, and provider breakage, and it saves more time than another round of prompt polishing.