i love it when my program's execution conditions are "still active if unsure"
what in the fuck kind of world have we arrived at where, under optimal conditions in which the "program" works fully as intended, the program counts as "RUNNING" as long as the execution environment is "NOT SURE" (????????) whether the program is "RUNNING"
this is what counts as benchmarking, with code links, because this shit makes literally no sense and boils your brain if you try to read it:
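Spelled out as code, the "still active if unsure" rule looks roughly like this. It's a made-up sketch, not the project's code: `Status`, its three values and `is_active` are hypothetical names, and the only thing taken from the post is that "not sure" gets treated as "running".

```python
from enum import Enum


# Hypothetical three-state status; the names are illustrative only.
class Status(Enum):
    RUNNING = "running"
    STOPPED = "stopped"
    UNKNOWN = "unknown"


def is_active(status: Status) -> bool:
    # "still active if unsure": UNKNOWN is lumped in with RUNNING,
    # so the only way to count as inactive is a definite STOPPED.
    return status in (Status.RUNNING, Status.UNKNOWN)


print(is_active(Status.UNKNOWN))  # True
```

Which means "we have no idea what's going on" and "everything is working" are indistinguishable by design.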
validate_email, is_valid_email, etc.: if any of those names is defined, get the function by fucking evaling the name. look in the globals() DICT AND SEE IF THAT IS AN EMAIL VALIDATION FUNCTION.
there is an as-yet unmerged PR to "fix the correctness benchmarks" and a "robustness audit" that is wonderful:
https://github.com/DietrichGebert/ponytail/pull/83

Two-part response to #65 ("Impact on model performance?"). Part 1 — fix the correctness gate Two bugs in the correct gate were under-reporting correctness for terse models — the likely so...
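Back to the validate_email/is_valid_email thing: for anyone who doesn't want to boil their brain on the linked code, the lookup as described boils down to roughly this. It's a rough reconstruction, not the benchmark's actual code: `find_validator`, `CANDIDATE_NAMES` and the sample "generated" code are made up, and where the post describes evaling the name against globals(), this sketch just does a dict lookup on a namespace passed to `exec`, which finds the same function without `eval`.

```python
# Hypothetical reconstruction of the lookup described in the post, NOT the
# benchmark's real code. The candidate names are the two mentioned above.
CANDIDATE_NAMES = ("validate_email", "is_valid_email")


def find_validator(namespace):
    """Return the first callable bound to a candidate name, or None."""
    for name in CANDIDATE_NAMES:
        func = namespace.get(name)  # plain dict lookup, no eval() needed
        if callable(func):
            return func
    return None


# Made-up stand-in for model-generated code.
generated_code = '''
def is_valid_email(address):
    return "@" in address
'''

namespace = {}
exec(generated_code, namespace)  # defines is_valid_email inside `namespace`
validator = find_validator(namespace)
print(validator("someone@example.com"))  # True
```

So whether your "correct" answer even gets graded depends on the model guessing one of a hardcoded list of function names.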
if-switches to give a name to the prompt, rather than, idk, labeling the prompts themselves
You don't understand. This original prompt was brought into being by Tibetan throat singers during a 48-hour "vibing" session with the late Sir Ferdinand von Codeschreiber.
You can't simply change it, because that would cause things to fail in a way that can't be properly tested, because it's all snake oil anyway...