File this under #shell #functions I should have written years ago:
function grepc {
#Do a grep -c, but skipping files with no results
grep -c "$@" |grep -v ':0$'
}
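A quick usage sketch for the function above, assuming two throwaway files (the file names and contents are made up for illustration). The function is restated in POSIX `name() { ... }` form only so that the snippet also runs under a plain sh.

```shell
# Same body as the grepc function above, in POSIX function syntax.
grepc() {
    # Do a grep -c, but skip files with no results.
    grep -c "$@" | grep -v ':0$'
}

# Hypothetical sample files, only for illustration.
dir=$(mktemp -d)
printf 'TODO one\nTODO two\n' > "$dir/a.txt"
printf 'nothing here\n' > "$dir/b.txt"

# Plain grep -c would list both files, b.txt with a count of 0.
# grepc drops the ":0" line and keeps only a.txt.
result=$(cd "$dir" && grepc TODO a.txt b.txt)
echo "$result"

rm -rf "$dir"
```

One caveat: with a single file argument, grep -c prints a bare count without a file name, so a lone "0" has no ":" in front of it and is not filtered out.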
Oh, didn't know about -c. I usually just pipe to wc -l I guess.
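For comparison, a small sketch of the two approaches mentioned here, using a made-up temporary file. Both count matching lines; the `tr -d ' '` is only there because some wc implementations pad their output with spaces.

```shell
# Hypothetical sample data, only for illustration.
tmp=$(mktemp)
printf 'apple\nbanana\napple pie\n' > "$tmp"

# Count matching lines directly with grep -c ...
count_c=$(grep -c 'apple' "$tmp")

# ... or pipe the matching lines to wc -l.
count_wc=$(grep 'apple' "$tmp" | wc -l | tr -d ' ')

echo "$count_c $count_wc"
rm -f "$tmp"
```

The difference shows up with several files: grep -c reports one count per file, while the wc -l pipeline gives a single total.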
-c, -l, -h, -H, and -q are my favorite #grep flags. :D
Huh, that almost became a [Marcel Duchamp] reference. 😅
I just use -v and -E
...and bash instead of zsh
...and grep/awk/sed instead of jq
...and firefox instead of chrome
...and the fediverse instead of facebook
Face it... I'm an unpopular-opinion neckbeard level boss. XD
cc: @mirabilos
@rl_dane Those are so not comparable!
@sotolf @thedoctor @rl_dane @mirabilos
Mm, not really though? ripgrep is meant for bulk grepping of files
@sotolf @thedoctor @rl_dane @mirabilos
I mostly just use it to run rg TODO and see all the spots in a codebase I marked as still needing work.
@sotolf @thedoctor @rl_dane @mirabilos
Well more it's for "this function works for now but doesn't handle an edge case; I'll mark it so I remember to come back later". Something of a to-do list for edge case stuff.
@amin @sotolf @thedoctor @mirabilos
Why is ripgrep better than just grep -R?
@rl_dane @sotolf @thedoctor @mirabilos
Probably more useful output, I dunno. 🤷
@amin @sotolf @thedoctor @mirabilos
Meh. ;)
@paul @amin @sotolf @thedoctor @mirabilos
uuuuuugh 🤣
@amin @sotolf @paul @thedoctor @rl_dane yeah…
It can also be faster, but if you know your grep options (e.g. -F) the difference gets much much smaller than someone else said here. So, no need.
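A minimal sketch of what -F changes, with made-up input: the pattern is taken as a fixed string instead of a regular expression, so grep has no regex metacharacters to interpret. This only illustrates the semantics and makes no claim about timings.

```shell
# Hypothetical sample data, only for illustration.
tmp=$(mktemp)
printf 'a.b\naxb\n' > "$tmp"

# As a regular expression, "." matches any character: both lines match.
re_count=$(grep -c 'a.b' "$tmp")

# With -F the pattern is a literal string: only the line "a.b" matches.
fixed_count=$(grep -c -F 'a.b' "$tmp")

echo "$re_count $fixed_count"
rm -f "$tmp"
```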

@kabel42 @amin @sotolf @thedoctor @mirabilos
Interesting! I wonder what kind of algorithmic optimizations (as opposed to compiler optimizations) they're using to do that, and if regular (GNU/BSD) grep could do the same.
Because I'll wear clown shoes and a tutu before changing to a "rewrite the world in rust!" utility 😂
@kabel42 @rl_dane @amin @sotolf @thedoctor eww, it’s not even a drop-in then…
(For not-a-drop-in, I found pcregrep interesting. Sadly, Debian recently dropped it, but in the versions which don’t have pcregrep any more, you can use grep -P for many use cases. pcre2grep is not a drop-in for pcregrep either…)
@kabel42 @mirabilos @amin @sotolf @thedoctor
"transpiling" from PCRE to normal regex sounds deliciously cursèd. XD
@rl_dane @sotolf @kabel42 @thedoctor @amin TINSTA "normal" regex. Do you mean POSIX ERE?
Not possible because PCRE is not a regular expression language. (Chomsky hierarchy.) Unless you skip some features anyway.
Also, POSIX semantics for longest-match suck.
@mirabilos @sotolf @kabel42 @thedoctor @amin
Yes, I did mean POSIX RE and ERE.
Ok, now we're getting into linguistics and my hair is standing on end 😄
@mirabilos @sotolf @kabel42 @thedoctor @amin
Yes. I think the last pedant bus has left for the day, though. ;)
@mirabilos @sotolf @kabel42 @thedoctor @amin
yay for me 😂
@mirabilos @kabel42 @amin @sotolf @thedoctor
I was a total PCRE stan in the olden days, but I've steered more towards regular extended regexp for compatibility. I do miss \d, \w and \s, though. [[:space:]] feels so clumsy to type and use several times in a regex, I'll sometimes put a sp="[[:space:]]" line at the start of a script, and you'll see several invocations of "${sp}" in my regex strings.
But again... compatibility. ;)
Is there a big difference between (GNU) grep -P and pcregrep? I hadn't heard of that utility before.
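The sp="[[:space:]]" trick described above might look roughly like this in a script. The key=value pattern and the sample input are assumptions made up for the example, not something from the thread.

```shell
# Shorthand for the POSIX whitespace class, as described above.
sp="[[:space:]]"

# Hypothetical pattern: optional indentation, a key made of word
# characters, "=", and a non-empty value, with optional whitespace
# around the "=".
pattern="^${sp}*[[:alnum:]_]+${sp}*=${sp}*[^[:space:]]+"

# Two of the three sample lines are key=value assignments.
matches=$(printf 'name = value\n\tport=8080\n# comment\n' | grep -E -c "$pattern")
echo "$matches"
```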
@amin @kabel42 @rl_dane @sotolf @thedoctor I never used \d and the like, always felt them much too complicated. I almost never use POSIX character classes (besides the BSD [[:<:]] and [[:>:]]), rather I just hit [ tab space ] quickly.
GNU grep -P does a PCRE grep; it doesn’t support all of the extra flags of pcregrep though, and before the version in (IIRC) trixie it was very broken.
@mirabilos @amin @kabel42 @sotolf @thedoctor
is [[:<:]] and [[:>:]] the same as \< and \>?
@rl_dane @amin @kabel42 @sotolf @thedoctor obviously not, because it’s written differently ;)
re_format(7) knows:
There are two special cases** of bracket expressions: the bracket expres-
sions '[[:<:]]' and '[[:>:]]' match the null string at the beginning and
end of a word, respectively. A word is defined as a sequence of charac-
ters starting and ending with a word character which is neither preceded
nor followed by word characters. A word character is an alnum character
(as defined by ctype(3)) or an underscore. This is an extension, compati-
ble with but not specified by POSIX, and should be used with caution in
software intended to be portable to other systems.
(as for the mark:)
POSIX leaves some aspects of RE syntax and semantics open; '**' marks de-
cisions on these aspects that may not be fully portable to other POSIX
implementations.
The definition for \< / \> differs between less, perlre, pcre, … I believe, but they are all somewhat similar.
@rl_dane @amin @kabel42 @sotolf @thedoctor perlre(1) actually has…
A word boundary ("\b") is a spot between two characters that
has a "\w" on one side of it and a "\W" on the other side of
it (in either order), counting the imaginary characters off
the beginning and end of the string as matching a "\W".
… so the \< probably comes from less(1)?
… hm, no. But, where then?
@mirabilos @amin @kabel42 @sotolf @thedoctor
I used to use \b a lot, but \< and \> are just as easy to use, and POSIX. ;)
\w is nice, though. I think the closest POSIX one is [[:graph:]]? (Not super close, though)
@rl_dane @amin @kabel42 @sotolf @thedoctor \< and \> are not POSIX.
perlre(1) \w is identical to POSIX [a-zA-Z0-9_] in the C locale, so [[:alnum:]_] if you have support for POSIX character classes.
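As a small sketch of that equivalence, here is [[:alnum:]_] standing in for \w in a POSIX ERE. The sample input is made up, and LC_ALL=C is set on grep so the class means exactly [a-zA-Z0-9_] as stated above.

```shell
# POSIX stand-in for perlre's \w (in the C locale).
w='[[:alnum:]_]'

# Count lines that consist only of word characters.
# "foo_bar" and "qux9" qualify; "baz-42" does not because of the "-".
count=$(printf 'foo_bar\nbaz-42\nqux9\n' | LC_ALL=C grep -E -c "^${w}+\$")
echo "$count"
```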
@mirabilos @amin @kabel42 @sotolf @thedoctor
Ah, yes. [[:alnum:]] was the one I was thinking of.
@mirabilos @amin @kabel42 @sotolf @thedoctor
Waiiiiit, what does the underscore before the second bracket do? I've never seen that before.
No mention of it in RE_FORMAT(7) on FreeBSD.
[a-zA-Z0-9_], and I’d be surprised if the FreeBSD manpage would not document it
@kabel42 @mirabilos @amin @sotolf @thedoctor
Doctor Strangepattern or: How I Learned to Stop Worrying and Love the Write-Once-Read-Never Nature of Regexp
@mirabilos @kabel42 @amin @sotolf @thedoctor
I think it's like almost any terse "programming" language where it takes some time to find the same neural pattern in your own head that produced it, so you can "remember" what you were doing. ^___^
In the past, I have literally used shell loops to construct regexp variables on the fly, rather than having completely incomprehensible "line noise" regexps. 😄
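One way such a loop could look, as a rough sketch: the keyword list and sample input are hypothetical, and the word-boundary wrapper is spelled out with bracket expressions because \< and \> are not POSIX.

```shell
# Build an ERE alternation from a hypothetical keyword list.
alternation=""
for word in TODO FIXME XXX; do
    # Append "|word", except before the first word.
    alternation="${alternation:+$alternation|}$word"
done

# Require a non-word character (or line start/end) on both sides.
nw='[^[:alnum:]_]'
pattern="(^|${nw})(${alternation})(${nw}|\$)"

# "my_TODOs" is skipped because TODO sits inside a longer word.
hits=$(printf 'x = 1 # TODO fix\nmy_TODOs\nFIXME\n' | grep -E -c "$pattern")
echo "$hits"
```

Building the pattern from named pieces keeps each part readable on its own, at the cost of having to echo "$pattern" when debugging the final expression.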
@sotolf @kabel42 @mirabilos @amin @thedoctor
I feel that. XD
@rl_dane @amin @kabel42 @sotolf @thedoctor let me blow your mind if that was news to you:
[[:alpha:][:digit:]_]
@kabel42 @mirabilos @amin @sotolf @thedoctor
in [[:alpha:]] the outer brackets denote the fact that you're defining a character class (terminology???), and the inner [:alpha:] is a character class/shortcut for [a-zA-Z].
Someone please correct my terminology.
[ switches from RE context to RE-Bracket context in the bracket-begin state, in which you can have an optional ^ (except in shellglobs where it is spelt !), then an optional ] not taken as the end of the RE-Bracket, then an optional -, then any amount of expressions of the type a-z, [:charclass:], [=equivalenceclass=], x, then an optional -, then a closing ] which terminates the RE-Bracket context.
] or the - at the beginning, not both)
@mirabilos @kabel42 @amin @sotolf @thedoctor
Ok re_format(7) is very terse when defining equivalence classes (#TIL!!!)
Are they just for visually/linguistically-similar characters, like "e" and "é"?
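The bracket-expression rules spelled out a few posts up can be tried with a few throwaway patterns. This is only a sketch with made-up input; it covers the literal ], the literal - and the leading ^, and leaves equivalence classes out because their behaviour depends on the locale.

```shell
# Hypothetical sample lines: "x" followed by one test character each.
sample() { printf 'x]\nx-\nxa\nx1\n'; }

# A "]" placed first is a literal member: matches "x]" and "xa".
n_bracket=$(sample | grep -c 'x[]a]')

# A "-" placed last is a literal member: matches "x-" and "x1".
n_dash=$(sample | grep -c 'x[[:digit:]-]')

# A leading "^" negates; the "]" right after it is still literal,
# so this matches "x-" and "x1" but not "x]" or "xa".
n_negated=$(sample | grep -c 'x[^]a]')

echo "$n_bracket $n_dash $n_negated"
```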
@mirabilos @amin @kabel42 @sotolf @thedoctor
Oh DUH. Ok. XD
@mirabilos @amin @kabel42 @sotolf @thedoctor
Fair point. I had seen people reproduce the header style before, so I wasn't sure if that was canonical.
@rl_dane @amin @kabel42 @sotolf @thedoctor the less(1) manpage is full of lies.
The older one in MirBSD:
/pattern
Search forward in the file for the N-th line containing the pat-
tern. N defaults to 1. The pattern is a regular expression, as
recognized by ed(1). The search starts at the second line displayed
No, less(1) uses different REs than ed(1), which uses POSIX BRE.
The newer one in Debian:
/pattern
Search forward in the file for the N-th line containing the pat‐
tern. N defaults to 1. The pattern is a regular expression, as
recognized by the regular expression library supplied by your
system. The search starts at the first line displayed (but see
Just as big a lie, glibc’s regexp (as documented by Linux man-pages) also does not support \< or \>.
@mirabilos @amin @kabel42 @sotolf @thedoctor
Really! I don't recall \< and \> ever not working for me.
grep(1).)
@mirabilos @amin @kabel42 @sotolf @thedoctor
Hmm, I wonder if it would be different on Alpine Linux, as that's a relatively non-GNU distro.
@mirabilos @amin @kabel42 @sotolf @thedoctor
What's "MiNT?"