Mastodawn

Curious which words (>3 letters) in your system dictionary have all the letters in alphabetical order? Sate your curiosity with a little #awk:

$ awk 'length>3 && /^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$/' /usr/share/dict/words

Optionally sort them by length:

$ awk 'length>3 && /^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$/{print length, $0}' /usr/share/dict/words | sort -n

This gives me "billowy" and "beefily" as words of interest. If you don't like repeated letters, use "?" instead of "*":

$ awk 'length>3 && /^a?b?c?d?e?f?g?h?i?j?k?l?m?n?o?p?q?r?s?t?u?v?w?x?y?z?$/{print length, $0}' /usr/share/dict/words | sort -n

which gives "almost", "biopsy", and "chintz" as nice long runs.
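If typing out all 26 letters feels error-prone, here is a rough sketch that builds the same pattern inside awk. The function name `ordered_words` and its optional quantifier argument are made up for this example, and the inline word list is only there so it runs on systems without /usr/share/dict/words:

```shell
# Hypothetical helper: build ^a*b*...z*$ (or the "?" variant) in the BEGIN
# block, then filter words of more than three letters read from stdin.
# Usage: ordered_words [QUANTIFIER] < wordlist   (QUANTIFIER defaults to "*")
ordered_words() {
  awk -v q="${1:-*}" '
    BEGIN {
      letters = "abcdefghijklmnopqrstuvwxyz"
      re = "^"
      for (i = 1; i <= length(letters); i++)
        re = re substr(letters, i, 1) q
      re = re "$"
    }
    length($0) > 3 && $0 ~ re { print length($0), $0 }
  ' | sort -n
}

# Inline sample words instead of the system dictionary.
printf '%s\n' billowy zebra almost abc beefy | ordered_words
# prints: 5 beefy / 6 almost / 7 billowy (one per line)
printf '%s\n' billowy zebra almost abc beefy | ordered_words '?'
# prints: 6 almost
```

Point it at the real dictionary with `ordered_words < /usr/share/dict/words`.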

Ben Zanin

@gumnos looking at those regexes first makes me shudder at the amount of backtracking they would kick off, then makes me remember just how unbelievably fast modern CPUs actually are.

Tim Chase Mar 10

@gnomon

Given the ratcheting nature of them and the initial anchoring at the front/back, there's minimal backtracking. The first letter of a word zips right through zero-of-everything-before, and the instant an out-of-sequence letter is found, it rejects. It may be *ugly*, but it's fast 😆
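To make the ratchet idea concrete, here is a small sketch that does the same check by hand, without any regex: one pass over each word, with a position in the alphabet that is only allowed to move forward. This is an illustration of the idea, not what awk's regex engine does internally, and `ratchet_check` is a name invented for the example:

```shell
# Hypothetical helper: one left-to-right pass per word, no regex at all.
# "pos" only ever moves forward through the alphabet (the ratchet); the first
# letter that would move it backward rejects the word on the spot.
ratchet_check() {
  awk '
    BEGIN { letters = "abcdefghijklmnopqrstuvwxyz" }
    {
      pos = 0
      verdict = "ok"
      for (i = 1; i <= length($0); i++) {
        p = index(letters, substr($0, i, 1))   # 0 for anything outside a-z
        if (p == 0 || p < pos) {
          verdict = "rejected at character " i
          break
        }
        pos = p
      }
      print $0 ": " verdict
    }
  '
}

printf '%s\n' billowy billowya | ratchet_check
# prints:
#   billowy: ok
#   billowya: rejected at character 8
```

Each character is looked at once, so the work per word is linear in its length.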

Ben Zanin Mar 10

@gumnos awwww *pats regex on the head* me too l'il guy

Tim Chase Mar 10

@gnomon

If you don't mind uglier and are using PCRE2 instead of awk's EREs, you can make the * operators possessive with *+, like

^a*+b*+c*+d*+e*+f*+g*+h*+i*+j*+k*+l*+m*+n*+o*+p*+q*+r*+s*+t*+u*+v*+w*+x*+y*+z*+$

which backtracks at roughly O(N) rather than O(N log N) 😉

https://regex101.com/r/hjuQRY/1

(Changing "billowya" to "billowy" goes from not matching in 29 steps to matching in 30 steps, compared to the plain greedy "*" operator, which rejects "billowya" in 120 steps while accepting "billowy" in 30.)

(and yes, I used regex to edit my regex… 😆)
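For trying the possessive version from a shell, a quick sketch follows. It assumes perl (5.10 or later) as a stand-in for PCRE2, since perl accepts the same `*+` syntax; the helper name `possessive_match` is invented here, and this does not reproduce regex101's step counts:

```shell
# Assumption: perl stands in for PCRE2; both accept *+ as a possessive star.
# Hypothetical helper: print only the stdin lines whose letters are in order.
possessive_match() {
  perl -ne 'print if /^a*+b*+c*+d*+e*+f*+g*+h*+i*+j*+k*+l*+m*+n*+o*+p*+q*+r*+s*+t*+u*+v*+w*+x*+y*+z*+$/'
}

printf '%s\n' billowy billowya | possessive_match
# prints: billowy
```

The set of matching words should be the same as with the plain `*` pattern; only the amount of backtracking on a failed match differs.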

Omar Polo Mar 10

@gnomon @gumnos
(sorry but I like blabbering about regexp implementation, so...)
there's no need for backtracking *at all* in regexps; you'd need it only for some extensions (which are no longer regular expressions in the mathematical sense).

Russ Cox gives a very nice explanation here: https://swtch.com/~rsc/regexp/regexp2.html

Regular Expression Matching: the Virtual Machine Approach