A tangent to my other thread: https://fosstodon.org/@ketmorco/109633724401350197

I've been thinking ( 🎵 A dangerous pastime, I know 🎵 ) more, and I finally added entries to all the topics in my personal WaynesWiki.

And I came up with this Vim macro:

:4^M:r!comm <(sed -E '/- ([A-Z][a-z]+){2,}/,/^$/\!d;s/^- //;/^$/q' journal.txt | sort | uniq) <(egrep '\b([A-Z][a-z]+){2,}\b' -o journal.txt | sort | uniq) -1 -3 | shuf -n1^MI- ^[w*Nvip:sort^Mnyy+w*P2xyypVr=

Wayne Werner (@[email protected]) on Fosstodon:

"I think about #networks a lot. Not (necessarily) TCP/IP based networks, but networks in terms of edges and nodes, and especially the networks that we #HumanBean-s form. I was one of the lucky(?) ones who grew up with and without the Internet. That is, my first computer networking experience was dialing up to #BBS-es around central #Arkansas. As a smol child, I lacked funds to buy minutes so we were limited to whatever free time they provided. Mostly we just downloaded #shareware."

As it turns out, that's not quite perfect, but that's a slightly different story...

The story that I want to tell is that for many of you, this is an indecipherable mess. To me right now it's pretty straightforward. To me in a week it will be odd. To me next year it's probably a mostly indecipherable mess!

But also it's ridiculously straightforward to explain what it does.

Go through my file and tell me all of the entries that I've referenced but that don't have a topic entry.

That's it.

I'm gonna meander onto a sidetrack here, off of my existing sidetrack.

While I was writing entries for all of the topics that were missing entries, I realized that my knowledge is fairly fractal.

For most of the entries, I ended up creating new references that didn't have topics! So I ended up just doing my best to leave myself indications that I might want to make some future links or something. So Rather Than Full References I just left the words capitalized like that.

Realizing just how fractal my knowledgebase is was pretty interesting. And now back to our regularly scheduled program.

:4^M:r!comm <(sed -E '/- ([A-Z][a-z]+){2,}/,/^$/\!d;s/^- //;/^$/q' journal.txt | sort | uniq) <(egrep '\b([A-Z][a-z]+){2,}\b' -o journal.txt | sort | uniq) -1 -3 | shuf -n1^MI- ^[w*Nvip:sort^Mnyy+w*P2xyypVr=

is incredibly basic and simple and straightforward, but to grok it requires... not necessarily a fractal of knowledge, but knowledge of a lot of things!

Let's start with something that isn't exactly in this script, but is pretty essential to know. And that's the Unix philosophy that we should have programs that do one thing, and do it well, and (hopefully) are well documented.

There's a bit of a related idea that you should be liberal in what you accept, conservative in what you emit, but in our case that's not too terribly relevant.

If you're aware of those two things, then the next thing to know is #regex or Regular Expressions.

I mean, you don't even need to be particularly familiar with them, just know that they exist. And our regular expression is actually very simple!

egrep -o '\b([A-Z][a-z]+){2,}\b'

If you `man egrep` you'll see that it's the grep that gives you extended regular expressions, which honestly are only slightly helpful here. And the `-o` only emits the match. I'm guessing that most of you could even puzzle out what this does, knowing my original description (and if you've read up on my approach to Wiki).

[A-Z] matches any uppercase ASCII letter, and [a-z] matches any lowercase ASCII letter. The + means you need one or more. The () groups, and the {2,} means match that group two or more times.

So:

FooBar matches
Foo2Bar does not
FOOBar does not
FBar does not
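You can check those four cases yourself (a minimal sketch; `grep -E` is just the modern spelling of `egrep`):

```shell
# Run the WikiWord regex against the four test cases above.
printf 'FooBar\nFoo2Bar\nFOOBar\nFBar\n' |
  grep -E -o '\b([A-Z][a-z]+){2,}\b'
# prints: FooBar
```

Only FooBar survives; the others fail either the casing or the word-boundary checks.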

Feel free to vote in this poll:

I already knew regex, this was easy
75.7%
I didn't know regex, but this was easy
5.4%
I didn't know regex, but this was confusing.
18.9%

That's actually all it takes to get all of the WikiWords/PascalCase/CamelCase words out of my journal file.

The next part actually took me quite a while, which honestly was pretty shocking, since this entire thing took me like 26 lines (including blank lines) to do in Python.

I wanted to find my TableOfContents at the top of the file:

---

This doesn't matter

- ButThisIsTheStart
- AndGetThis
- ThisToo

But not this
OrThis
- OrThis
- OrAnyOfThese

---

Food time, and then another sidetrack!

#ed is the standard editor. I'm "https://www.gnu.org/fun/jokes/ed-msg.en.html was the actual man page for ed" years old.

From what I understand, grep is from the ed `g/re/p` pattern - g for global, re being your regex, and p being print. Fun times!

Well, ed only operated on actual files, but there was a need to do the same thing but for streams. And wouldn't you know it, the #UnixPhilosophy pops up again: let's call the program `sed` for Stream ed!
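You can see the g/re/p heritage directly: with -n suppressing sed's automatic printing, /re/p behaves just like grep (a tiny sketch):

```shell
# sed -n turns off automatic printing; /an/p prints only lines matching 'an',
# which is exactly what grep does.
printf 'apple\nbanana\ncherry\n' | sed -n '/an/p'
# prints: banana
```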

In my case, I only want to display the first block of matches.

This is a bit of a challenge, since #sed only operates on *lines*. So I can't just say "give me line A to line B". But I *can* say `sed -n '/match A/,/match B/p'` and sed will give me everything from match A to match B. Sort of.

It will actually give me everything from *every* match A to match B. Which I didn't quite understand, so trying `sed -En '/- ([A-Z][a-z]+){2,}/,/^$/p'` ended up giving me everything from *each* line that started with `- WikiText` to the next blank line. Oops!

The Internet finally helped out here, because reading `man sed` on its own, well... wouldn't have helped much:

"d Delete pattern space. Start next cycle"

So what I'm actually doing with #sed is deleting every line *not* in that range. That leaves me with only the table of contents. But cool enough, since I'm here I can also go ahead and strip off the `- `. Finally, I just wait until `match B` (the blank line) and then q(uit).

Breaking down my #sed command:

sed -E # give me extended regex
/- ([A-Z][a-z]+){2,}/,/^$/!d; # find everything starting at a '- WikiCase' ending at a blank line. Delete everything else
s/^- //; # remove the '- ' from the start of line
/^$/q # quit at the first blank line
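To see it do its thing, here's a sketch against a small stand-in file shaped like the earlier example (`/tmp/toc.txt` is a made-up name, not my real journal):

```shell
# Build a sample file shaped like the TableOfContents example above.
cat > /tmp/toc.txt <<'EOF'
This doesn't matter

- ButThisIsTheStart
- AndGetThis
- ThisToo

But not this
- OrAnyOfThese
EOF

# Delete everything outside the first '- WikiCase'..blank-line range,
# strip the leading '- ', and quit at the blank line that ends the block.
sed -E '/- ([A-Z][a-z]+){2,}/,/^$/!d;s/^- //;/^$/q' /tmp/toc.txt
# prints ButThisIsTheStart, AndGetThis, ThisToo (plus the trailing blank line)
```

The later `- OrAnyOfThese` never gets a chance, because q has already ended the show.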

Oh! I just remembered another sidetrack. sed uses the same "keybindings" as ed. So `s/^- //` would remove the starting hyphen from the current line in #ed

Now remember - so far all of this is just to get two lists - my TableOfContents and all of the WikiWords that exist in my file.

If you use #Linux or #Unix for any period of time (or #GnuLinux - where my pedants at?), you will probably have come across `sort`.

Now we get to talk about streams and files!

Everything in Linux is a file. Which is shockingly powerful. Input from the keyboard? A file you can read from. Output to the screen? A file you can write to. #stdin and #stdout are their names

You can even check `man stdin` and `man stdout` on most systems.

Anyway, since these are files, you can just chain them together. In Linux you're probably familiar with #redirection and #piping. > and | are the characters that you might have seen associated with them. But that's really just because they're ASCII chars that were common enough on keyboards and kind of looked like the things we wanted to do.

Relevant side note: Did you know that ctrl+h is backspace? Try it in your terminal!

It's not the only ctrl code. ctrl+[ is escape. ctrl+m is return. ctrl+c sends the ASCII "end of text" symbol. You can type these literal characters in Vim:

ctrl+v, ctrl+c
ctrl+v, ctrl+[
ctrl+v, ctrl+m

You'll see ^C^[^M

Save your file and then send them through the hex dump tool #xxd - :%!xxd

(to get back :%!xxd -r)

Freaking sweet, huh? Who knew that #ASCII held such secrets?!
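If you'd rather not round-trip through a saved file, you can pipe those control codes straight into xxd (a minimal sketch):

```shell
# 0x08 is ctrl+h (backspace), 0x1b is ctrl+[ (escape), 0x0d is ctrl+m (return).
printf '\010\033\015' | xxd
# the hex column reads: 081b 0d
```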

Okay, so back to redirection: you may have seen something like `cat something.txt >/dev/null` #devnull ...

#devnull is cool. It's just an empty file forever. You can throw as much as you want into there, or read from it as much as you want, you're always going to get nothing back.

/dev/null is as empty as my soul.

Anyway.

So you can redirect your output, and you can pipe your output. Piping is useful when you want to send output from one command to any other command that operates on streams like say, stdin. You can try it like this:

echo "hey dude" | python -c 'print("cool", input())'

`input` in #python reads from #stdin. It's pretty straightforward.

But this is how *most* tools work in Linux. Oftentimes you can give them data on stdin instead of in a file. In Python you might do it like this:

import sys

if len(sys.argv) > 1:
    # argv[0] is the script name; the first real argument is argv[1]
    with open(sys.argv[1]) as f:
        data = f.read()
else:
    data = sys.stdin.read()

You've gotta .read() from stdin, since `input` just reads up to the first newline. Anyway. Piping is a very fundamental part of the Linux command line.

Now comes the fun part. I knew already of `sort` and `uniq`. If you check the man pages for things there's often a "SEE ALSO" section where you can check for other tools. Eventually I came across `shuf` and `comm`.

Now, comm was a bit weird because I know how to pipe output from a single command into something that expects a single stream, easy peasy. But `comm` wanted *two* files. And there's no stdin_1 and stdin_2, so what do we do *there*? I had kind of an inkling...

Rather than download vim (for the billionth time) on a docker container, I learned to combine here-documents and shell redirection to "edit" files:

cat <<something >somefile.txt
this is the contents
of my file
here!
something

It doesn't really matter what `something` is, but EOF is the common, somewhat helpful convention. You could use #fnord or booger or ShellsAreSuperpowerful, whatever!

Anyway, I knew that it was possible to do some sort of thing like that but I wasn't sure *what*.

A quick search for `shell redirection multiple commands` ended up surfacing the syntax (it's called process substitution):

cmd <(cmd_one) <(cmd_two).

This creates some file descriptors, as I just learned by doing `python -c 'import sys; print(sys.argv)' <(echo 'hi') <(echo "bye")`

(Took me a couple of tries to get that right)

So comm, which wants two files, can be satisfied simply by wrapping my sed command and my egrep command like so:

comm <(sed ...) <(egrep ...)

Fancy!

That gets us three columns of output, though!

Fortunately, the manpage for comm comes through:

-1 suppresses the lines unique to file 1
-2 suppresses the lines unique to file 2
-3 suppresses the lines common to *both* files

Oh! but... comm needs sorted input! Fortunately we already know about that though, so

comm <(sed ... | sort | uniq) <(egrep ... | sort | uniq)

Sorts that all out.
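Here's the whole comm + process-substitution pattern in miniature (a sketch; the two lists are made-up stand-ins for the sed and egrep output):

```shell
# File 1 stands in for the TableOfContents; file 2 for every WikiWord
# found in the journal. -1 drops lines unique to file 1, -3 drops lines
# common to both, leaving only WikiWords with no topic entry yet.
comm -1 -3 <(printf 'AndGetThis\nThisToo\n' | sort | uniq) \
           <(printf 'AndGetThis\nMissingTopic\nThisToo\n' | sort | uniq)
# prints: MissingTopic
```

One caveat: <( ) is a bash/zsh feature, so this won't run under plain POSIX sh.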

#shell #sort #uniq #linux #comm #PunInTenDid

Of course, I could just use this, but I don't like going in order, so let's add `comm ... | shuf -n1` to give me a single, random missing element.

Now if we just call this all SHELL and handwave it away and just look at our vim macro, things get way less complicated!

:4^M:r!SHELL^MI- ^[w*Nvip:sort^Mnyy+w*P2xyypVr=

Assuming you don't know anything about vim, these are just keystrokes. Except ^[ and friends, which are typed with the aforementioned ctrl+v, ctrl+[.

#vim #ascii #EscapeSequence