🎯 Leadership role between municipalities & state politics! 🎯

We are looking for a deputy (Beigeordnete/r, m/f/d) for the areas of children, youth, education & culture (up to pay grade B 2).
You will represent municipal interests before the state parliament (Landtag) & the state government, from youth welfare and daycare to schools & culture.
All info & conditions 👇

🔗 https://link.nlt.de/jobs

#Stellenangebot #Niedersachsen #ÖffentlicherDienst #Hannover #NLT #fedijobs

Finished a huge pet project: I took a copy of the #NLT (New Living Translation) #Bible that I scraped from the web some time ago and massaged it into a standard format with one verse per line, each starting with "%s:%03d:%03d " (book_name, chapter, verse).
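For anyone curious what that line format looks like in practice, here is a minimal Python sketch. The function names and the regex are my own illustration, not the scripts I actually used, and the verse text is a placeholder.

```python
import re

# Assumed layout, based on the "%s:%03d:%03d " prefix described above:
# book name, zero-padded chapter, zero-padded verse, a space, then the text.
# Book names may contain spaces and digits (e.g. "1 John") but no colon.
LINE_RE = re.compile(
    r"^(?P<book>[^:]+):(?P<chapter>\d{3}):(?P<verse>\d{3}) (?P<text>.*)$"
)


def format_verse(book, chapter, verse, text):
    """Build one output line in the one-verse-per-line format."""
    return "%s:%03d:%03d %s" % (book, chapter, verse, text)


def parse_verse(line):
    """Split a formatted line back into (book, chapter, verse, text)."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError("not a verse line: %r" % line)
    return m["book"], int(m["chapter"]), int(m["verse"]), m["text"]


if __name__ == "__main__":
    line = format_verse("Genesis", 1, 1, "placeholder text")
    print(line)
    print(parse_verse(line))
```

The fixed-width, zero-padded numbers mean a plain text sort or grep on the prefix behaves predictably within a book.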

It was really challenging, because the scraped text wasn't in that format, and any time there was a number inside the text (like "500 cubits" or whatever), my attempts at massaging the data created spurious extra verses. Also, some verses in the first two chapters of Numbers were joined, like "verses 1-2," which I had to split. I also found the "missing" verses from the New Testament and added them back in from the footnotes, looking like "Aaaa:000:000 [Some manuscripts add verse 000, And then they...]"
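One way to catch those spurious verses is to check that verse numbers count up by one within each chapter. This is a rough sketch of that idea, not what I actually ran; the function name and the tuple layout are assumptions for illustration.

```python
def find_sequence_breaks(refs):
    """Report places where verse numbers do not count up by one.

    refs: iterable of (book, chapter, verse) tuples in file order.
    Returns a list of (book, chapter, expected_verse, found_verse).
    """
    breaks = []
    prev = None
    for book, chapter, verse in refs:
        if prev is not None and prev[:2] == (book, chapter):
            expected = prev[2] + 1  # same chapter: next verse number
        else:
            expected = 1  # new book or chapter: should restart at 1
        if verse != expected:
            breaks.append((book, chapter, expected, verse))
        prev = (book, chapter, verse)
    return breaks


if __name__ == "__main__":
    # A stray "500" picked up from the text shows up as two breaks:
    # one jumping to 500, one jumping back to the real sequence.
    sample = [("Gen", 1, 1), ("Gen", 1, 2), ("Gen", 1, 500), ("Gen", 1, 3)]
    for item in find_sequence_breaks(sample):
        print(item)
```

A check like this would flag joined verses ("1-2") and the "missing" New Testament verses as well, since both leave a gap in the numbering, so each hit still needs a human look.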

I created a new copy of the texts in a new folder every time I completed a step of massaging the text into this format, and I ended up with 20 folders at 101 MiB total, which xz scrunched down to 1.9 MiB.
The resultant file is 4.4 MiB, which is surprisingly 54 KiB smaller than the KJV (which I took from Project Gutenberg and massaged slightly into the same format around 20 years ago).

As far as I know, the only remaining issue with the text is that some section headings like "Jesus appears to Thomas" appear as snippets at the end of the preceding verse, but I'm ok with that for now.

If I had it to do all over again, I'd just scrape the HTML directly and use the various tags and subparagraphs as clues to parse it more accurately, but what's done is done.
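Roughly what I have in mind, as a sketch only: I'm assuming verse numbers sit in something like <span class="verse-num">, which is a hypothetical class name and not the real markup of any site. It uses only Python's standard html.parser.

```python
from html.parser import HTMLParser


class VerseExtractor(HTMLParser):
    """Collect verse text, trusting markup instead of guessing from digits.

    Assumes (hypothetically) that each verse number is wrapped in
    <span class="verse-num">...</span> and the verse text follows it.
    """

    def __init__(self):
        super().__init__()
        self.verses = []  # list of [verse_number, text]
        self._in_number = False

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "verse-num") in attrs:
            self._in_number = True

    def handle_endtag(self, tag):
        if tag == "span" and self._in_number:
            self._in_number = False

    def handle_data(self, data):
        if self._in_number:
            # Only text inside the marker span starts a new verse.
            self.verses.append([int(data.strip()), ""])
        elif self.verses:
            # Everything else, including numbers like "500", is verse text.
            self.verses[-1][1] += data


def extract_verses(html):
    parser = VerseExtractor()
    parser.feed(html)
    parser.close()
    return [(num, text.strip()) for num, text in parser.verses]


if __name__ == "__main__":
    page = ('<p><span class="verse-num">1</span>It was 500 cubits long. '
            '<span class="verse-num">2</span>Next verse.</p>')
    print(extract_verses(page))
```

Because only the marker span can start a verse, a number inside the text no longer creates a spurious one. Section headings would presumably have their own tag too and could be skipped the same way, but that depends on the actual page.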

The NLT is copyrighted (copyrighting bibles is a fun ethical issue), so I can't distribute it, but there's nothing wrong with sharing with friends, if anyone's interested.

cc: @amin @joel