Galatia

Do you recall back in 2023 when I mentioned failing my C programming class in college? So long ago! Going into this holiday break (a two-week vacation for me), I got bored, picked up the old source code and input file, and finished the assignment…on the 30th anniversary of my Incomplete.

Assignment:
Based on knowledge learned through the semester with file management, text processing, memory allocation, data structures, B-Trees, linked-lists, and so on, write a program that can take a text file representing a book of the bible and produce a concordance of the important words, listing each word in alphabetical order with a list of references for each word in book/chapter/verse order. Extra credit if you include a parser to stem the words (instead of “write”, “writes”, “wrote”, you get “write”).

My book was actually Galatians, not Ephesians like I recalled earlier, but whatever.

Input format (sample snippet, no newlines):

@$GAL@ 01:01 Paul, an apostle - sent not from men nor by man, but by Jesus Christ and God the Father, who raised him from the dead -
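For the curious, here is a minimal sketch of how a line in that format could be pulled apart. It is not the original code: the struct verse_ref and the function parse_verse_line are made-up names for illustration, and it assumes every line starts with the @$BOOK@ marker followed by chapter:verse.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record for one parsed input line; field names are illustrative. */
struct verse_ref {
    char book[8];       /* book code, e.g. "GAL" */
    int chapter;
    int verse;
    const char *text;   /* points into the caller's line buffer */
};

/* Parse a line such as "@$GAL@ 01:01 Paul, an apostle ...".
 * Returns 1 on success, 0 if the line does not match the assumed shape. */
int parse_verse_line(const char *line, struct verse_ref *out)
{
    int consumed = 0;

    assert(line != NULL && out != NULL);

    /* "@$", up to 7 characters that are not '@', "@", then chapter:verse.
     * %n records how many characters have been consumed at that point. */
    if (sscanf(line, "@$%7[^@]@ %d:%d%n",
               out->book, &out->chapter, &out->verse, &consumed) != 3)
        return 0;

    /* Skip the blanks between the reference and the verse text. */
    while (line[consumed] == ' ' || line[consumed] == '\t')
        consumed++;

    out->text = line + consumed;
    return 1;
}
```

Using %d rather than %i matters here: chapter and verse numbers like 08 have a leading zero, and %i would try to read them as octal.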

My old code was written for Borland C on Windows 3 to be run on the command line. That’s how old this code is. It had decent bones, and I gave it an honest try in 1994, but I just couldn’t get the bones to stick together. Something-something about my obvious misunderstanding of fundamentals like pointers, recursion, source file flow, something-something.

Once I got the gcc build chain up on my Linux box and got VSCode going, I tried building what I had. There were so many dependency issues and syntax errors, I moved everything aside and rebuilt the code from the ground up, using old pieces to build new files by the lessons I learned a decade ago when I built my JX3P Tape Dump Decoder tool (also written in C).

20 years of professional and hobby code development has taught me so much more than I ever could’ve grokked in 4 months at that thumb-headed age of 22.

The new code did it proper:

  • makefile with real and .PHONY targets (all, clean)
  • *.c source files under ./src/
  • *.h headers under ./include/
  • *.o object files under ./obj/
  • source management with git

Invocation, with explicit source and destination files (can also read from stdin and print to stdout for piping):

user@host:~/concordance$ ./concordance books/galatians.txt final-gal.txt

Sample output from final-gal.txt:

gained GAL 2:21
Galatia GAL 1:2
Galatians GAL 3:1
gave GAL 1:4, 2:9, 2:20, 3:18
Gentile GAL 2:14, 2:15
Gentiles GAL 1:16, 2:2, 2:7, 2:8, 2:9, 2:12, 2:12, 2:14, 3:8, 3:14
gentleness GAL 5:23
gently GAL 6:1
get GAL 1:18, 4:30
give GAL 2:5, 3:5, 6:9
given GAL 2:9, 3:14, 3:21, 3:22, 3:22, 4:15
glad GAL 4:27
glory GAL 1:5
go GAL 1:17, 2:9, 5:12
goal GAL 3:3
God GAL 1:1, 1:3, 1:4, 1:10, 1:13, 1:15, 1:20, 1:24, 2:6, 2:8, 2:19, 2:20, 2:21, 3:5, 3:6, 3:8, 3:11, 3:17, 3:18, 3:20, 3:21, 3:26, 4:4, 4:6, 4:7, 4:8, 4:9, 4:9, 4:14, 5:21, 6:7, 6:16
gods GAL 4:8
good GAL 4:17, 4:18, 5:7, 6:6, 6:9, 6:10, 6:12
goodness GAL 5:22
gospel GAL 1:6, 1:7, 1:7, 1:8, 1:9, 1:11, 2:2, 2:5, 2:7, 2:14, 3:8, 4:13
grace GAL 1:3, 1:6, 1:15, 2:9, 2:21, 3:18, 5:4, 6:18
gratify GAL 5:16
Greek GAL 2:3, 3:28
group GAL 2:12
guardians GAL 4:2

As an aside, I grabbed the entire bible from Gutenberg.org and modified the formatting to fit the concordance parser, and — hoo-boy — it took 5 minutes for a single thread to chew through that 4MB text file to produce a 3MB concordance. Mighty. Just look at that sample output:

account 1CH 27:24; 2CH 26:11; JOB 33:13; PSA 144:3; ECC 7:27; MAT 12:36, 18:23; LUK 16:2; ACT 19:40; ROM 14:12; 1CO 4:1; PHI 1:18, 4:17; HEB 13:17; 1PE 4:5; 2PE 3:15; 2KA 12:4
accounted DEU 2:11, 2:20; 1KI 10:21; 2CH 9:20; PSA 22:30; ISA 2:22; MAR 10:42; LUK 20:35, 21:36, 22:24; ROM 8:36; GAL 3:6
Accounting HEB 11:19
accounts DAN 6:2
accursed DEU 21:23; JOS 6:17, 6:18, 6:18, 6:18, 7:1, 7:1, 7:11, 7:12, 7:12, 7:13, 7:13, 7:15, 22:20; 1CH 2:7; ISA 65:20; ROM 9:3; 1CO 12:3; GAL 1:8, 1:9
accusation JUD 1:9; EZR 4:6; MAT 27:37; MAR 15:26; LUK 6:7, 19:8; JOH 18:29; ACT 25:18; 1TI 5:19; 2PE 2:11
accuse PRO 30:10; MAT 12:10; MAR 3:2; LUK 3:14, 11:54, 23:2, 23:14; JOH 5:45, 8:6; ACT 24:2, 24:8, 24:13, 25:5, 25:11, 28:19; 1PE 3:16

I included code to preserve capitalization if words appear to be known names, and to bring capitalized words to lowercase if they’re seen in lowercase elsewhere (useful for words first seen at the start of sentences). I didn’t include any stemming code, so no extra credit; the parser is naive. And I did cheat a little and use the string search/case/copy functions available in the C standard library, and I don’t feel guilty about it. But I did write the recursive B-Tree and linked list code from scratch, so there’s that.
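To give a feel for the shape of that structure without the real code, here is a rough sketch. It assumes a plain binary search tree keyed by word (a true B-Tree keeps several keys per node, so this is a simplification), with each node holding a linked list of chapter:verse references in input order. The names struct ref, struct node, tree_add and tree_print are hypothetical, and the case-insensitive comparison is also an assumption; it means "God" and "god" would share one entry.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct ref {                    /* one chapter:verse occurrence */
    int chapter, verse;
    struct ref *next;
};

struct node {                   /* one word in the tree */
    char *word;
    struct ref *first, *last;   /* references, kept in input order */
    struct node *left, *right;
};

/* malloc that exits on failure, to keep the sketch short. */
static void *xmalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL) { perror("malloc"); exit(EXIT_FAILURE); }
    return p;
}

/* Case-insensitive comparison so "Galatia" sorts between "gained" and "gave". */
static int word_cmp(const char *a, const char *b)
{
    while (*a && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

static struct ref *new_ref(int chapter, int verse)
{
    struct ref *r = xmalloc(sizeof *r);
    r->chapter = chapter;
    r->verse = verse;
    r->next = NULL;
    return r;
}

/* Recursively insert one occurrence of word; returns the subtree root. */
struct node *tree_add(struct node *root, const char *word, int chapter, int verse)
{
    assert(word != NULL);

    if (root == NULL) {                     /* new word: create a leaf */
        struct node *n = xmalloc(sizeof *n);
        size_t len = strlen(word) + 1;
        n->word = xmalloc(len);
        memcpy(n->word, word, len);
        n->first = n->last = new_ref(chapter, verse);
        n->left = n->right = NULL;
        return n;
    }

    int cmp = word_cmp(word, root->word);
    if (cmp < 0) {
        root->left = tree_add(root->left, word, chapter, verse);
    } else if (cmp > 0) {
        root->right = tree_add(root->right, word, chapter, verse);
    } else {                                /* known word: append a reference */
        root->last->next = new_ref(chapter, verse);
        root->last = root->last->next;
    }
    return root;
}

/* In-order walk: prints "word BOOK c:v, c:v" lines in alphabetical order. */
void tree_print(const struct node *root, const char *book, FILE *out)
{
    if (root == NULL)
        return;
    tree_print(root->left, book, out);
    fprintf(out, "%s %s", root->word, book);
    for (const struct ref *r = root->first; r != NULL; r = r->next)
        fprintf(out, "%s%d:%d", r == root->first ? " " : ", ", r->chapter, r->verse);
    fputc('\n', out);
    tree_print(root->right, book, out);
}
```

Because verses are fed in reading order, appending to the tail of each list keeps the references sorted by chapter and verse for free; only the words need the tree.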

I won’t be posting the code. I’m proud of it and happy it works, and it’s clean and neat, but I’m not a fan of public git repo sites (especAIlly now). And tarballs seem excessive for how silly this project is.

I still chafe that Dr. H made us do this with a book of the bible but, honestly, it’s an interesting project with other applications. I tested myself and believe I would’ve passed had I enough experience and patience.

So take that, Doctor H! I hope you’re doing well, wherever you are.

#bible #C #college #concordance #gcc #git #Gutenberg #KandR #makefile #programming #success #vscode

Evolution of how I think of #loops while #coding:

1. When I first learned "loops":

while (condition is true) {do these things, adjust things so a slightly new condition is checked}

// That's where I first saw an infinite loop, and learned that some infinite loops are intentional.

2. A small step to move condition update out of the loop body:

for (i=0; i< N; i++) {do these things}

// After the couple of days it took to get used to them, I found them neater and closer to how I think of things.

3. Most of the time, the i from before is indexing into something, so let's directly deal with the item being indexed:

for item in collection:
    do stuff

# After the few days to rewire syntax muscle memory, going back would decidedly feel like a step back.
# I don't want to give up automatic (and transparent) out-of-bound checks.

4. There are actually only about 3 or 4 things one does inside a loop:

map/fold/scan/filter function-to-call collection-to-traverse-through

;; Getting rid of explicit indexing was just step one.
-- After a few days/months/years, I now realize that it is more important and less buggy if I think only of the function to call (and whether I want to end up with a new (maybe pruned) collection, a single thing, or "both" (that's how I think of scans))
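For anyone still living at step 1 or 2, the four operations can be sketched in plain C with function pointers. This is only an illustration over int arrays; the names map_ints, filter_ints, fold_ints, scan_ints and the three callbacks are invented here, and the caller is assumed to supply an output buffer of at least n elements.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*unary_fn)(int);
typedef int (*binary_fn)(int, int);

/* map: out[i] = f(in[i]); the result has the same length as the input. */
void map_ints(const int *in, size_t n, unary_fn f, int *out)
{
    assert(n == 0 || (in != NULL && out != NULL));
    for (size_t i = 0; i < n; i++)
        out[i] = f(in[i]);
}

/* filter: keep the elements for which keep(x) is nonzero; returns how many. */
size_t filter_ints(const int *in, size_t n, unary_fn keep, int *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++)
        if (keep(in[i]))
            out[kept++] = in[i];
    return kept;
}

/* fold: collapse the whole collection into a single value. */
int fold_ints(const int *in, size_t n, binary_fn f, int init)
{
    int acc = init;
    for (size_t i = 0; i < n; i++)
        acc = f(acc, in[i]);
    return acc;
}

/* scan: like fold, but every intermediate accumulator is kept in out. */
void scan_ints(const int *in, size_t n, binary_fn f, int init, int *out)
{
    int acc = init;
    for (size_t i = 0; i < n; i++) {
        acc = f(acc, in[i]);
        out[i] = acc;
    }
}

/* Example callbacks. */
int square(int x) { return x * x; }
int is_even(int x) { return x % 2 == 0; }
int add(int a, int b) { return a + b; }
```

With the input {1, 2, 3, 4}: map with square gives {1, 4, 9, 16}, filter with is_even gives {2, 4}, fold with add from 0 gives 10, and scan with add from 0 gives {1, 3, 6, 10}. The indexing still exists, but it is written once per operation instead of once per loop.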

----------
Alternatively, my evolution as I learned new #programming language idioms:
#KandR -->
#cpp or #java -->
#python -->
#lisp or #haskell --> ???

You can write insecure code in any programming language.
I'm sick of the C-language bashers.
#KandR