File this under #shell #functions I should have written years ago:

function grepc {
	#Do a grep -c, but skipping files with no results
	grep -c "$@" | grep -v ':0$'
}

#unix #UnixShell #ShellScripting #bash #ksh

@rl_dane

Oh, didn't know about -c. I usually just pipe to wc -l I guess.
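For comparison, a minimal sketch of how grep -c, piping to wc -l, and the grepc wrapper behave. The file names and contents are made up for illustration, and the function is written in the portable name() form so it also runs in plain sh:

```shell
# Hypothetical sample files, created only for this illustration.
tmpdir=$(mktemp -d)
printf 'foo\nbar\nfoo\n' > "$tmpdir/a.txt"
printf 'bar\nbaz\n' > "$tmpdir/b.txt"

# Same wrapper as in the post above: per-file counts, minus zero-count files.
grepc() {
	grep -c "$@" | grep -v ':0$'
}

# One file: grep -c and "grep | wc -l" both count matching lines.
# tr strips the padding some wc implementations add.
count_c=$(grep -c foo "$tmpdir/a.txt")
count_wc=$(grep foo "$tmpdir/a.txt" | wc -l | tr -d ' ')

# Several files: grep -c prints one "file:count" line per file, including
# files with zero matches; grepc drops those.
all_counts=$(cd "$tmpdir" && grep -c foo a.txt b.txt)
nonzero_counts=$(cd "$tmpdir" && grepc foo a.txt b.txt)

printf '%s\n' "$all_counts"
printf '%s\n' "$nonzero_counts"
rm -rf "$tmpdir"
```

With several files, wc -l would only give one grand total, which is where -c earns its keep.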

@amin

-c, -l, -h, -H, and -q are my favorite #grep flags. :D

Huh, that almost became a [Marcel Duchamp] reference. 😅


@rl_dane

I just use -v and -E

@amin @rl_dane you guys use flags?... :p
@amin @rl_dane @sotolf You guys still use grep instead of ripgrep. Tsk

@thedoctor @amin @sotolf

...and bash instead of zsh
...and grep/awk/sed instead of jq
...and firefox instead of chrome
...and the fediverse instead of facebook

Face it... I'm an unpopular-opinion neckbeard level boss. XD

cc: @mirabilos

@rl_dane Those are so not comparable!

@amin @sotolf @mirabilos

@thedoctor @rl_dane @amin @mirabilos At least bash and zsh are comparable to grep and ripgrep, as zsh is just a strictly better bash ;)

@sotolf @thedoctor @rl_dane @mirabilos

Mm, not really though? ripgrep is meant for bulk grepping of files

@amin @thedoctor @rl_dane @mirabilos I think I had it installed, I just never remembered to use it :p

@sotolf @thedoctor @rl_dane @mirabilos

I mostly just use it to run rg TODO and see all the spots in a codebase I marked as still needing work.

@amin @thedoctor @rl_dane @mirabilos I don't think I have anything big enough to warrant that, the biggest thing I have made is probably less than 5k lines of code, and nicely compartmentalised :p also stupid simple.

@sotolf @thedoctor @rl_dane @mirabilos

Well more it's for "this function works for now but doesn't handle an edge case; I'll mark it so I remember to come back later". Something of a to-do list for edge case stuff.

@amin @thedoctor @rl_dane @mirabilos Ah yeah, I'm not the main person using these things, so mostly when something comes in they have just gotten an error, or they remember what they did, so I redo the thing, and read logs, they mostly point me to what is broken :p

@amin @sotolf @thedoctor @mirabilos

Why is ripgrep better than just grep -R?

@amin @sotolf @thedoctor @mirabilos @rl_dane because it's written in rust, silly

@amin @sotolf @paul @thedoctor @rl_dane yeah…

It can also be faster, but if you know your grep options (e.g. -F) the difference gets much much smaller than someone else said here. So, no need.
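For anyone who hasn't met -F: it makes grep treat the pattern as a fixed string instead of a regex. A minimal sketch of the difference in what gets matched (the sample input is made up, and this says nothing about timing):

```shell
# Hypothetical sample input, for illustration only.
sample=$(printf 'a.c\nabc\n')

# As a regex, the dot matches any single character, so both lines match.
regex_hits=$(printf '%s\n' "$sample" | grep -c 'a.c')

# With -F the pattern is a literal string, so only the line with a real dot matches.
fixed_hits=$(printf '%s\n' "$sample" | grep -c -F 'a.c')

printf 'regex: %s, fixed: %s\n' "$regex_hits" "$fixed_hits"
```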

@rl_dane @mirabilos @amin @sotolf @paul @thedoctor Auto detecting fixed strings could be a thing, yes. :)
But that would be backwards compatible 
@rl_dane @amin @sotolf @thedoctor @mirabilos it's somehow a lot faster if you want to grep a few GiB of code, like 15 minutes to 30 seconds

@kabel42 @amin @sotolf @thedoctor @mirabilos

Interesting! I wonder what kind of algorithmic optimizations (as opposed to compiler optimizations) they're using to do that, and if regular (GNU/BSD) grep could do the same.

Because I'll wear clown shoes and a tutu before changing to a "rewrite the world in rust!" utility 😂

@rl_dane @amin @sotolf @thedoctor @mirabilos From what little i have read, some assumptions about what you are grepping and different defaults. Doing the same in existing grep would probably break compatibility.

@kabel42 @rl_dane @amin @sotolf @thedoctor eww, it’s not even a drop-in then…

(For not-a-drop-in, I found pcregrep interesting. Sadly, Debian recently dropped it, but in the versions which don’t have pcregrep any more, you can use grep -P for many use cases. pcre2grep is not a drop-in for pcregrep either…)

@mirabilos @rl_dane @amin @sotolf @thedoctor no, it's explicitly not

@kabel42 @mirabilos @amin @sotolf @thedoctor

"transpiling" from PCRE to normal regex sounds deliciously cursèd. XD

@rl_dane @sotolf @kabel42 @thedoctor @amin TINSTA "normal" regex. Do you mean POSIX ERE?

Not possible because PCRE is not a regular expression language. (Chomsky hierarchy.) Unless you skip some features anyway.

Also, POSIX semantics for longest-match suck.

@mirabilos @sotolf @kabel42 @thedoctor @amin

Yes, I did mean POSIX RE and ERE.

Ok, now we're getting into linguistics and my hair is standing on end 😄

@rl_dane @sotolf @kabel42 @thedoctor @amin you mean POSIX BRE and POSIX ERE?

@mirabilos @sotolf @kabel42 @thedoctor @amin

Yes. I think the last pedant bus has left for the day, though. ;)

@mirabilos @kabel42 @amin @sotolf @thedoctor

I was a total PCRE stan in the olden days, but I've steered more towards regular extended regexp for compatibility. I do miss \d, \w and \s, though. [[:space:]] feels so clumsy to type and use several times in a regex, I'll sometimes put a sp="[[:space:]]" line at the start of a script, and you'll see several invocations of "${sp}" in my regex strings.
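A minimal sketch of that sp= trick, assuming grep -E and made-up sample lines:

```shell
# Shorthand for the POSIX character class, as described above.
sp="[[:space:]]"

# Hypothetical sample lines, for illustration only (the third uses tabs).
sample=$(printf 'key = value\nkey=value\nkey\t=\tvalue\n')

# ERE: "key", one or more whitespace characters, "=", one or more
# whitespace characters, "value".
matches=$(printf '%s\n' "$sample" | grep -c -E "key${sp}+=${sp}+value")

printf '%s\n' "$matches"
```

The pattern has to sit in double quotes so that ${sp} gets expanded by the shell.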

But again... compatibility. ;)

Is there a big difference between (GNU) grep -P and pcregrep? I hadn't heard of that utility before.

@amin @kabel42 @rl_dane @sotolf @thedoctor I never used \d and the likes, always felt them much too complicated. I almost never use POSIX character classes (besides the BSD [[:<:]] and [[:>:]]), rather I just hit [ tab space ] quickly.

GNU grep -P does a PCRE grep; it doesn’t support all of the extra flags of pcregrep though, and before the version in (IIRC) trixie it was very broken.

@mirabilos @amin @kabel42 @sotolf @thedoctor

is [[:<:]] and [[:>:]] the same as \< and \>?

@rl_dane @amin @kabel42 @sotolf @thedoctor obviously not, because it’s written differently ;)

re_format(7) knows:

There are two special cases** of bracket expressions: the bracket expressions '[[:<:]]' and '[[:>:]]' match the null string at the beginning and end of a word, respectively. A word is defined as a sequence of characters starting and ending with a word character which is neither preceded nor followed by word characters. A word character is an alnum character (as defined by ctype(3)) or an underscore. This is an extension, compatible with but not specified by POSIX, and should be used with caution in software intended to be portable to other systems.

(as for the mark:)

POSIX leaves some aspects of RE syntax and semantics open; '**' marks decisions on these aspects that may not be fully portable to other POSIX implementations.

The definition for \< / \> differs between less, perlre, pcre, … I believe, but they all are somewhat similar.

@rl_dane @amin @kabel42 @sotolf @thedoctor perlre(1) actually has…

A word boundary ("\b") is a spot between two characters that has a "\w" on one side of it and a "\W" on the other side of it (in either order), counting the imaginary characters off the beginning and end of the string as matching a "\W".

… so the \< probably comes from less(1)?

… hm, no. But, where then?

@mirabilos @amin @kabel42 @sotolf @thedoctor

I used to use \b a lot, but \< and \> are just as easy to use, and POSIX. ;)

\w is nice, though. I think the closest POSIX one is [[:graph:]]? (Not super close, though)

@rl_dane @amin @kabel42 @sotolf @thedoctor \< and \> are not POSIX.

perlre(1) \w is identical to POSIX [a-zA-Z0-9_] in the C locale, so [[:alnum:]_] if you have support for POSIX character classes.

@mirabilos @amin @kabel42 @sotolf @thedoctor

Ah, yes. [[:alnum:]] was the one I was thinking of.

@mirabilos @amin @kabel42 @sotolf @thedoctor

Waiiiiit, what does the underscore before the second bracket do? I've never seen that before.

No mention of it in RE_FORMAT(7) on FreeBSD.

@rl_dane @amin @kabel42 @sotolf @thedoctor the exact same thing as the underscore in [a-zA-Z0-9_], and I’d be surprised if the FreeBSD manpage would not document it
@mirabilos @rl_dane @amin @sotolf @thedoctor yay clear and unmistakable syntax 

@kabel42 @mirabilos @amin @sotolf @thedoctor

Doctor Strangepattern or: How I Learned to Stop Worrying and Love the Write-Once-Read-Never Nature of Regexp

@rl_dane @amin @kabel42 @sotolf @thedoctor let me blow your mind if that was news to you:

[[:alpha:][:digit:]_]
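A quick sketch to check that the three spellings from this exchange pick out the same lines, assuming the C locale and made-up sample input:

```shell
# Hypothetical identifiers, one per line, for illustration only.
sample=$(printf 'foo_bar1\nfoo-bar\n_x\nhello world\n')

# -x requires the whole line to match; LC_ALL=C pins the C locale.
explicit=$(printf '%s\n' "$sample" | LC_ALL=C grep -x -E '[a-zA-Z0-9_]+')
alnum=$(printf '%s\n' "$sample" | LC_ALL=C grep -x -E '[[:alnum:]_]+')
split=$(printf '%s\n' "$sample" | LC_ALL=C grep -x -E '[[:alpha:][:digit:]_]+')

printf '%s\n' "$alnum"
```

The outer brackets are the bracket expression; [:alnum:] and friends are just items inside it, same as a-z or the underscore.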

@mirabilos @rl_dane @amin @sotolf @thedoctor yay context sensitive [], there is no way that can go wrong \s
@rl_dane @amin @kabel42 @sotolf @thedoctor (also, though capitalised in the header, manpage names are case-sensitive)

@mirabilos @amin @kabel42 @sotolf @thedoctor

Fair point. I had seen people reproduce the header style before, so I wasn't sure if that was canonical.

@rl_dane @amin @kabel42 @sotolf @thedoctor the less(1) manpage is full of lies.

The older one in MirBSD:

/pattern Search forward in the file for the N-th line containing the pattern. N defaults to 1. The pattern is a regular expression, as recognized by ed(1). The search starts at the second line displayed

No, less(1) uses different REs than ed(1), which uses POSIX BRE.

The newer one in Debian:

/pattern Search forward in the file for the N-th line containing the pattern. N defaults to 1. The pattern is a regular expression, as recognized by the regular expression library supplied by your system. The search starts at the first line displayed (but see

Just as big a lie, glibc’s regexp (as documented by Linux man-pages) also does not support \< or \>.

@mirabilos @amin @kabel42 @sotolf @thedoctor

Really! I don't recall \< and \> ever not working for me.

@rl_dane @amin @kabel42 @sotolf @thedoctor then, my dear, you’re suffering from GNU extensions. (Probably. I still haven’t figured out where it came from. No manpage on my Debian I tried documents it, other than grep(1).)
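If relying on \< and \> is a worry, a whole-word match can be spelled out with nothing but POSIX ERE constructs. A minimal sketch, assuming "word character" means [[:alnum:]_] as in the re_format(7) excerpt above, with made-up sample lines:

```shell
# Hypothetical sample lines, for illustration only.
sample=$(printf 'cat\nconcatenate\na cat sat\ncat_food\n')

# "cat" preceded by start-of-line or a non-word character, and followed by
# a non-word character or end-of-line.
portable_hits=$(printf '%s\n' "$sample" |
	LC_ALL=C grep -E '(^|[^[:alnum:]_])cat([^[:alnum:]_]|$)')

printf '%s\n' "$portable_hits"
```

Unlike a true zero-width boundary, this pattern consumes the neighbouring characters, which matters for things like sed substitutions but not for a plain line match.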

@mirabilos @amin @kabel42 @sotolf @thedoctor

Hmm, I wonder if it would be different on Alpine Linux, as that's a relatively non-GNU distro.

@rl_dane @amin @kabel42 @sotolf @thedoctor nah, busybox is full-on GNU compatible
@rl_dane @amin @kabel42 @sotolf @thedoctor also, please write Alpine Linux, so I don’t confuse it with the MUA. Thanks.

@kabel42 @amin @sotolf @thedoctor @mirabilos

Ah, so what you are describing is:

it-could-have-just-been-a-shell-alias-but-we-are-cultic-rust-stans,-so-we-will-scream-and-harp-about-how-amazing-our-sh\t*-smells

Got it.

@rl_dane @amin @sotolf @thedoctor @mirabilos no, i think it targets a different use case
@kabel42 @rl_dane @amin @thedoctor @mirabilos Ah, so it's basically cheating, I mean, it does work, and I do it often when I create small tools, with the excuse that "It wasn't meant for that"
@sotolf @rl_dane @amin @thedoctor @mirabilos Is it cheating, if it is the second sentence in your README.md?
"ripgrep will respect gitignore rules and automatically skip hidden files/directories and binary files. (To disable all automatic filtering by default, use rg -uuu.)"
or, you didn't want to grep in .git anyway you are just too lazy to look up the flag to skip that
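For the record, a sketch of skipping .git with plain grep. This assumes GNU grep (both -R and --exclude-dir are extensions, not POSIX), and the directory layout is made up for illustration:

```shell
# Hypothetical project layout, created only for this illustration.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/src" "$tmpdir/.git"
printf '/* TODO: handle empty input */\n' > "$tmpdir/src/main.c"
printf 'TODO\n' > "$tmpdir/.git/COMMIT_EDITMSG"

# Recursive search everywhere, .git included.
with_git=$(cd "$tmpdir" && grep -R -l TODO . | sort)

# Same search, skipping any directory named .git (GNU grep option).
without_git=$(cd "$tmpdir" && grep -R -l --exclude-dir=.git TODO . | sort)

printf '%s\n' "$without_git"
rm -rf "$tmpdir"
```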
@kabel42 @rl_dane @amin @thedoctor @mirabilos All optimisations are just different ways of cheating ;)