Amin, minor deity of the legume realm Feb 4

@rl_dane

Oh, didn't know about -c. I usually just pipe to wc -l I guess.

-c, -l, -h, -H, and -q are my favorite #grep flags. :D
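For anyone following along, here is a small sketch of two of those flags. It shows that -c gives the same number as piping to wc -l, and that -q only sets the exit status. The sample file contents are made up for illustration.

```shell
# Sample input; the contents are invented for this sketch.
tmp=$(mktemp)
printf '%s\n' 'alpha TODO' 'beta' 'gamma TODO' > "$tmp"

# -c prints the number of matching lines directly...
count_c=$(grep -c 'TODO' "$tmp")
# ...which is the same number you get by piping the matches to wc -l.
# (tr strips the padding some wc implementations add.)
count_wc=$(grep 'TODO' "$tmp" | wc -l | tr -d ' ')

# -q prints nothing and only sets the exit status.
if grep -q 'beta' "$tmp"; then found=yes; else found=no; fi

echo "$count_c $count_wc $found"
rm -f "$tmp"
```

Note that -c counts matching lines, not individual matches, so a line with two hits still counts once.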

Huh, that almost became a [Marcel Duchamp] reference. 😅

Amin, minor deity of the legume realm Feb 4

@rl_dane

I just use -v and -E

sotolf Feb 4

@amin @rl_dane you guys use flags?... :p

thedoctor Feb 4

@amin @rl_dane @sotolf You guys still use grep instead of ripgrep. Tst

@thedoctor @amin @sotolf

...and bash instead of zsh
...and grep/awk/sed instead of jq
...and firefox instead of chrome
...and the fediverse instead of facebook

Face it... I'm an unpopular-opinion neckbeard level boss. XD

cc: @mirabilos

thedoctor Feb 5

@rl_dane Those are so not comparable!

@amin @sotolf @mirabilos

sotolf Feb 5

@thedoctor @rl_dane @amin @mirabilos At least bash vs. zsh is comparable to grep vs. ripgrep, as zsh is just a strictly better bash ;)

Amin, minor deity of the legume realm Feb 5

@sotolf @thedoctor @rl_dane @mirabilos

Mm, not really though? ripgrep is meant for bulk grepping of files

sotolf Feb 5

@amin @thedoctor @rl_dane @mirabilos I think I had it installed, I just never remembered to use it :p

Amin, minor deity of the legume realm Feb 5

@sotolf @thedoctor @rl_dane @mirabilos

I mostly just use it to run rg TODO and see all the spots in a codebase I marked as still needing work.
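A rough plain-grep equivalent of that TODO hunt, as a sketch only. The directory layout and file contents below are invented, and unlike ripgrep's defaults, this does not skip hidden or ignored files.

```shell
# Throwaway tree with invented file names and contents.
dir=$(mktemp -d)
mkdir -p "$dir/src"
printf '%s\n' 'int main(void) {' '    /* TODO: parse args */' '}' > "$dir/src/main.c"
printf '%s\n' 'all done here' > "$dir/README"

# -r recurses into directories, -n prefixes each hit with its line number.
hits=$(cd "$dir" && grep -rn 'TODO' .)

echo "$hits"
rm -rf "$dir"
```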

@amin @sotolf @thedoctor @mirabilos

Why is ripgrep better than just grep -R?

@kabel42 @amin @sotolf @thedoctor @mirabilos

@rl_dane @amin @sotolf @thedoctor @mirabilos it's somehow a lot faster if you want to grep a few GiB of code, like 15 minutes to 30 seconds

Interesting! I wonder what kind of algorithmic optimizations (as opposed to compiler optimizations) they're using to do that, and if regular (GNU/BSD) grep could do the same.

Because I'll wear clown shoes and a tutu before changing to a "rewrite the world in rust!" utility 😂

@rl_dane @amin @sotolf @thedoctor @mirabilos From what little I have read: some assumptions about what you are grepping, and different defaults. Doing the same in existing grep would probably break compatibility.

@kabel42 @rl_dane @amin @sotolf @thedoctor eww, it’s not even a drop-in then…

(For not-a-drop-in, I found pcregrep interesting. Sadly, Debian recently dropped it, but in the versions which don’t have pcregrep any more, you can use grep -P for many use cases. pcre2grep is not a drop-in for pcregrep either…)

I was a total PCRE stan in the olden days, but I've steered more towards regular extended regexp for compatibility. I do miss \d, \w and \s, though. [[:space:]] feels so clumsy to type and use several times in a regex, I'll sometimes put a sp="[[:space:]]" line at the start of a script, and you'll see several invocations of "${sp}" in my regex strings.
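A minimal sketch of that sp shorthand in use. The variable name comes from the post above. The key/value lines and the pattern are invented for illustration.

```shell
# Shorthand for the POSIX character class, as described above.
sp='[[:space:]]'

# Match "name", optional whitespace, "=", optional whitespace, "value".
# Double quotes let ${sp} expand; \$ keeps the end-of-line anchor literal.
matches=$(printf '%s\n' 'name = value' 'name=value' 'name: value' |
    grep -E "^name${sp}*=${sp}*value\$")

echo "$matches"
```

The first two sample lines match and the third does not, since it uses a colon instead of an equals sign.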

But again... compatibility. ;)

Is there a big difference between (GNU) grep -P and pcregrep? I hadn't heard of that utility before.

@amin @kabel42 @rl_dane @sotolf @thedoctor I never used \d and the likes, always felt them much too complicated. I almost never use POSIX character classes (besides the BSD [[:<:]] and [[:>:]]), rather I just hit [ tab space ] quickly.

GNU grep -P does a PCRE grep; it doesn’t support all of the extra flags of pcregrep though, and before the version in (IIRC) trixie it was very broken.

Are [[:<:]] and [[:>:]] the same as \< and \>?

@rl_dane @amin @kabel42 @sotolf @thedoctor obviously not, because it’s written differently ;)

re_format(7) knows:

     There are two special cases** of bracket expressions: the bracket expres-
     sions '[[:<:]]' and '[[:>:]]' match the null string at the beginning and
     end of a word, respectively. A word is defined as a sequence of charac-
     ters starting and ending with a word character which is neither preceded
     nor followed by word characters. A word character is an alnum character
     (as defined by ctype(3)) or an underscore. This is an extension, compati-
     ble with but not specified by POSIX, and should be used with caution in
     software intended to be portable to other systems.


(as for the mark:)
     POSIX leaves some aspects of RE syntax and semantics open; '**' marks de-
     cisions on these aspects that may not be fully portable to other POSIX
     implementations.

The definition for \< / \> differs between less, perlre, pcre, … I believe, but they all are somewhat similar.

@rl_dane @amin @kabel42 @sotolf @thedoctor perlre(1) actually has…

     A word boundary ("\b") is a spot between two characters that
     has a "\w" on one side of it and a "\W" on the other side of
     it (in either order), counting the imaginary characters off
     the beginning and end of the string as matching a "\W".

… so the \< probably comes from less(1)?

… hm, no. But, where then?

I used to use \b a lot, but \< and \> are just as easy to use, and POSIX. ;)

\w is nice, though. I think the closest POSIX one is [[:graph:]]? (Not super close, though)

@rl_dane @amin @kabel42 @sotolf @thedoctor \< and \> are not POSIX.

perlre(1) \w is identical to POSIX [a-zA-Z0-9_] in the C locale, so [[:alnum:]_] if you have support for POSIX character classes.

Ah, yes. [[:alnum:]] was the one I was thinking of.

@rl_dane @amin @kabel42 @sotolf @thedoctor but [[:alnum:]_]

Waiiiiit, what does the underscore before the second bracket do? I've never seen that before.

No mention of it in RE_FORMAT(7) on FreeBSD.

@rl_dane @amin @kabel42 @sotolf @thedoctor the exact same thing as the underscore in [a-zA-Z0-9_], and I’d be surprised if the FreeBSD manpage would not document it

@rl_dane @amin @kabel42 @sotolf @thedoctor let me blow your mind if that was news to you:

[[:alpha:][:digit:]_]
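To see that these spellings pick out the same characters, here is a quick sketch. The sample lines are invented, and LC_ALL=C is set on the assumption that you want the ASCII-only meaning discussed above.

```shell
# Invented sample lines: two made only of "word" characters, two not.
words='foo_bar
baz-42
x9
has space'

# Three spellings of the same class; LC_ALL=C pins them to ASCII.
a=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^[[:alnum:]_]+$')
b=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^[[:alpha:][:digit:]_]+$')
c=$(printf '%s\n' "$words" | LC_ALL=C grep -E '^[a-zA-Z0-9_]+$')

echo "$a"
```

All three variables end up holding the same two lines, foo_bar and x9.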

@mirabilos @rl_dane @amin @sotolf @thedoctor yay context sensitive [], there is no way that can go wrong \s

@kabel42 @rl_dane @amin @sotolf @thedoctor it’s actually not, the first unescaped [ switches from RE context to RE-Bracket context in the bracket-begin state, in which you can have an optional ^ (except in shellglobs where it is spelt !), then an optional ] not taken as the end of the RE-Bracket, then an optional -, then any amount of expressions of the type a-z, [:charclass:], [=equivalenceclass=], x, then an optional -, then a closing ] which terminates the RE-Bracket context.

@kabel42 @rl_dane @amin @sotolf @thedoctor (I erred: you can have either the ] or the - at the beginning, not both)

mirabilos

@kabel42 @rl_dane @amin @sotolf @thedoctor (and I forgot collating elements, which is totally fucked up, [a[.ch.]] in e.g. es_ES.UTF-8 matches either a or ch, so a bracket expression in POSIX has a variable matching length…)

@mirabilos @rl_dane @amin @sotolf @thedoctor yeah, i hate it

@kabel42 @rl_dane @amin @sotolf @thedoctor these are rare-to-never-used features, thankfully

@kabel42 @rl_dane @amin @sotolf @thedoctor tbh the only time I use something other than simple chars and ranges in bracket expressions is the BSD [[:<:]] and [[:>:]] extension (which matches a zero-length string)

@mirabilos @rl_dane @amin @sotolf @thedoctor as in '^$'?

@kabel42 @rl_dane @amin @sotolf @thedoctor no, the zero-length string between a nōn-word‑ and a word character

@kabel42 @rl_dane @amin @sotolf @thedoctor see above in the thread, where I posted the relevant excerpt…

@mirabilos @rl_dane @amin @sotolf @thedoctor nōn-word?

https://toot.mirbsd.org/@mirabilos/statuses/01KGZBC68X0E7AQEK692CGSK65

@kabel42 @rl_dane @amin @sotolf @thedoctor (does your font lack the ‑ ?)

@mirabilos @rl_dane @amin @sotolf @thedoctor 404

@kabel42 @rl_dane @amin @sotolf @thedoctor no, in my post

@kabel42 @rl_dane @amin @sotolf @thedoctor and, duh, it’s a Fediverse link, you copy/paste it into the Search form of your client to read it, not the browser…

… I wish GtS would go on and support web+ap://…

@kabel42 @mirabilos @amin @sotolf @thedoctor

@mirabilos @rl_dane @amin @sotolf @thedoctor i still have no idea what an "ō" is

I think he's using the German keyboard on his phone.

@rl_dane @kabel42 @amin @sotolf @thedoctor no, he’s using the MirKeyboardLayout on his laptop

@rl_dane @mirabilos @amin @sotolf @thedoctor no, that's not a normal character in german, only äöüÄÖÜß

@kabel42 @rl_dane @amin @sotolf @thedoctor and ẞ since a few years ago

sotolf Feb 9

@rl_dane @kabel42 @mirabilos @amin @thedoctor german keyboards don't have that letter :p

@rl_dane @kabel42 @mirabilos @amin

thedoctor Feb 9

@sotolf Definitely not. I can readily do ø and Ø but not that one.

sotolf Feb 9

@thedoctor @rl_dane @kabel42 @mirabilos @amin ō, on mine that's a dead key on alt-gr ´ and then o, on qwertz I wouldn't even know :p

@kabel42 @rl_dane @amin @sotolf @thedoctor a long o

@kabel42 @rl_dane @amin @sotolf @thedoctor like in ad-hōc network

@mirabilos @rl_dane @amin @sotolf @thedoctor never seen that in normal text. So pronounced like noon?

@kabel42 @rl_dane @amin @sotolf @thedoctor noon is pronounced as nun

@mirabilos @rl_dane @amin @sotolf @thedoctor how?

sotolf Feb 9

@kabel42 @mirabilos @rl_dane @amin @thedoctor I've seen macrons used quite a lot in transliterating japanese, never outside of that though.

@kabel42 @rl_dane @amin @sotolf @thedoctor (here I miss the ability to write XHTML on Fedi, this should have been pronounced as <span xml:lang="de">nun</span>)

@kabel42 @rl_dane @amin @sotolf @thedoctor ō like the O in Ofen (oven)

@mirabilos @rl_dane @amin @sotolf @thedoctor neither short nor long?

@kabel42 @rl_dane @amin @sotolf @thedoctor long, as in Ofen, not short as in offen (open)

@mirabilos @rl_dane @amin @sotolf @thedoctor long would be oofen or ohfen or something

@kabel42 @rl_dane @amin @sotolf @thedoctor then just make it a bit longer than your Ofen O, if you shorten it that much…

@kabel42 @mirabilos @amin @sotolf @thedoctor

@kabel42 @rl_dane @amin @sotolf @thedoctor in any case it is two morae long, but not lengthened

Basically spaces and punctuation.

@rl_dane @kabel42 @amin @sotolf @thedoctor no, literally [^a-zA-Z0-9_]

@rl_dane @kabel42 @amin @sotolf @thedoctor so everything else, including control characters

@mirabilos @rl_dane @amin @sotolf @thedoctor and ^ here is negation?

@kabel42 @rl_dane @amin @sotolf @thedoctor no, [^char-class] matches “any single character, other than newline, not in char-class”

@mirabilos @rl_dane @amin @sotolf @thedoctor yeah, basically what i meant except for the newline maybe

@kabel42 @rl_dane @amin @sotolf @thedoctor yea, I’m just pedantic.

In the RE ^foo[^bar^]baz$ there technically are exactly two carets.
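A small sketch of how that RE behaves in practice, with invented sample lines. The first ^ anchors the line, [^ opens a non-matching bracket expression, and the last ^ is just a literal member of it.

```shell
# Invented sample lines.
lines='fooxbaz
foobbaz
foo^baz
xfooxbaz'

# Keeps only lines of the form foo + one char not in {b,a,r,^} + baz.
kept=$(printf '%s\n' "$lines" | grep -E '^foo[^bar^]baz$')

echo "$kept"
```

Only fooxbaz survives: foobbaz and foo^baz have an excluded character in the middle, and xfooxbaz fails the leading anchor.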

@kabel42 @rl_dane @amin @sotolf @thedoctor this is important when you want to include a ] or - in a bracket expression, and for the newline ofc.

Don't you have to backslash escape a right bracket, like [a-z\]]?

@mirabilos @sotolf @thedoctor @amin @kabel42

@sotolf @thedoctor @amin @rl_dane @kabel42 not if it’s the first character of a bracket expression, like []a-z]
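A sketch of those placement rules with grep -E, using invented one-character test lines. It assumes POSIX bracket-expression rules, under which a backslash inside the brackets is an ordinary character.

```shell
# One candidate per line: a, ], -, backslash, Z.
chars='a
]
-
\
Z'

# A ] placed first is a literal member: matches "a" and "]".
first_bracket=$(printf '%s\n' "$chars" | LC_ALL=C grep -E '^[]a-z]$')

# A - placed last is a literal member: matches "a" and "-".
last_hyphen=$(printf '%s\n' "$chars" | LC_ALL=C grep -E '^[a-z-]$')

# [a-z\]] is the bracket expression [a-z\] followed by a literal ],
# so it matches two-character lines, not a lone "]".
backslash_case=$(printf '%s\n' 'a]' '\]' ']' | LC_ALL=C grep -E '^[a-z\]]$')

echo "$first_bracket"
echo "$last_hyphen"
echo "$backslash_case"
```

So the backslash form does not fail with an error. It quietly means something else, which is arguably worse.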

Ahhhh, good to know. Mentally filed. ;)

@mirabilos @kabel42 @amin @thedoctor @sotolf

@kabel42 @amin @thedoctor @sotolf @rl_dane I often go through logs by first cutting off timestamp and host using rectangle mode in jupp, then replacing ^([^ ]*)\[[^]]*\]: with \1: and sort -uing.

I’ve also used [][0-9a-fA-F:] to match IP addresses…
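The log cleanup described above could look roughly like this outside an editor. This sketch swaps in sed -E for jupp's search and replace. The log lines are invented and assume timestamp and host have already been cut off.

```shell
# Invented log lines, timestamp and host already removed.
log='sshd[1234]: session opened
sshd[5678]: session opened
cron[42]: job started'

# Drop the "[pid]" part so identical messages collapse under sort -u.
# [^]]* uses the "] first" rule: any run of characters other than ].
cleaned=$(printf '%s\n' "$log" |
    sed -E 's/^([^ ]*)\[[^]]*\]: /\1: /' | sort -u)

echo "$cleaned"
```

The two sshd lines differ only in their PID, so they collapse into one after the substitution.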

I love editors with rectangle selection and editing modes. vim has it, and my first exposure to it was actually in Microsoft Word 4.0 for mac. Obviously not something I use today. XD

kabel42 Feb 9

@rl_dane @mirabilos @amin @thedoctor @sotolf kate had that for a time and now i can't find it anymore... :(

@kabel42 @mirabilos @amin @thedoctor @sotolf

@rl_dane @kabel42 @sotolf @thedoctor @amin jupp :p

Looking online... is it ctrl+shift+B?

@kabel42 @mirabilos @amin @sotolf @thedoctor

@thedoctor @rl_dane @kabel42 @sotolf @amin why not?

@bentsukun made the first editions of the MirBSD flyers in Quark Xpress on MacOS.

Aye. You can use it in bracket expressions, even with character classes, like:

[^0-9]          Everything but digits
[^[:space:]]    Everything but spaces

@rl_dane @kabel42 @amin @sotolf @thedoctor nope, it’s part of the brackets and doesn’t stand on its own

The caret? Outside of brackets, the caret matches the beginning of the line.

@rl_dane @kabel42 @amin @sotolf @thedoctor yes, and inside of brackets, the caret matches a single caret.

@rl_dane @kabel42 @amin @sotolf @thedoctor there are two different kinds of bracket expressions, one goes [^foo] and the other goes [foo]

(as long as it's not the first character after the [, kind of like - is a hyphen character only if it's the last character before the ]) 😅

Everything is fine.

@sotolf @thedoctor @amin @kabel42 @rl_dane if the first character after a [ is a ^ it’s not technically inside the bracket expression (this IS important for ] and -)

@rl_dane @mirabilos @amin @sotolf @thedoctor turns out it's actually the same in python, and i have no idea why i was expecting negation to be ! 🤷

@kabel42 @rl_dane @amin @sotolf @thedoctor Python and py3k use Perl-style REs (their own engine rather than the PCRE library, but the same [^…] syntax).

Shell globs have [!…] for negated bracket expressions.
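A minimal sketch of the glob spelling in a case statement. The classify function and its messages are hypothetical, made up for this example.

```shell
# Classify a string by its first character using shell patterns.
# [!0-9] is the glob counterpart of the RE bracket expression [^0-9].
classify() {
    case $1 in
        [!0-9]*) echo "starts with a non-digit" ;;
        [0-9]*)  echo "starts with a digit" ;;
        *)       echo "empty" ;;
    esac
}

classify abc
classify 42x
```

Some shells also accept [^…] in globs as an extension, but [!…] is the portable form.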