Recurring #TechWriting issue that I still haven't found a good solution for:

Is anyone aware of a decently reliable automation for reformatting #Markdown text that previously used line length limits of 80 characters and forced line wraps, to one sentence per line?

Must preserve all Markdown formatting including tables and fenced code blocks.

(If you think this is trivial and can be solved with a sprinkling of regex — nope.)

Boosts appreciated!

@xahteiwi Best guess: Convert to HTML and back.

Identifying sentences remains a hard problem, but the rest should be mostly mechanical. I'd start with Pandoc, hoping that it can be configured to create Markdown in the required format.

@OmegaPolice No need for the HTML conversion if using pandoc — you can use `--wrap=none` to remove line breaks, even if you're staying within Markdown. However, sadly that doesn't solve the problem at all, because now you have lengthy paragraphs of multiple sentences.

@xahteiwi Ah, nice! 👍

That's probably the best you can get without throwing some serious NLP at it. Curious to see if I'm missing something!

@OmegaPolice I'm slowly coming to the conclusion that this style decision is a one-way function: if you write your original documents as one sentence per line, it is trivial to subsequently impose a line length limit. But once you have that limit, unless you also mandate *sentences* of, say, under 80 characters (not sure that's ever useful; I doubt it), it's quite painful to go to one sentence per line.
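The easy direction of that one-way function is short enough to show. Below is a minimal Python sketch that rewraps one-sentence-per-line prose to a fixed width with the standard library's `textwrap`. It assumes plain prose paragraphs separated by blank lines (no tables, lists, or fenced code), and `wrap_sentences` is a hypothetical helper name.

```python
import textwrap


def wrap_sentences(text: str, width: int = 80) -> str:
    """Rewrap one-sentence-per-line paragraphs to a fixed line width.

    Assumes plain prose only: paragraphs separated by blank lines,
    no tables, lists, or fenced code blocks.
    """
    paragraphs = text.split("\n\n")
    # Join each paragraph's sentence lines with spaces, then refill to width.
    return "\n\n".join(
        textwrap.fill(" ".join(p.splitlines()), width=width) for p in paragraphs
    )


source = (
    "Each sentence starts on its own line.\n"
    "Imposing a line length limit afterwards needs no sentence detection at all, "
    "because every line break is already known to be a sentence boundary."
)
print(wrap_sentences(source, width=40))
```

The reverse has no equivalent one-liner: once text is wrapped, a line break no longer tells you anything about where a sentence ends.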

@xahteiwi @OmegaPolice

Boosted because I don't have an answer. TeX has some pretty good heuristics for working out when a . is a full stop that ends a sentence, but they're not 100% reliable. I don't think this is something I'd want to do without carefully reading the output. It's probably better to just define that style for changes and tell people to reformat an entire paragraph when they make a change anywhere.
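A rough Python sketch of a TeX-like heuristic is shown below. It is loosely modeled on TeX's default spacing rule, under which punctuation directly after an uppercase letter is not treated as ending a sentence. This is not TeX's actual algorithm, just an illustration of why such heuristics fall short of 100%. `split_sentences` is a hypothetical name.

```python
import re

# Candidate sentence ends: . ! or ? followed by whitespace or the end of the text.
CANDIDATE = re.compile(r"[.!?](?=\s|$)")


def split_sentences(text: str) -> list[str]:
    """Split prose into sentences using a TeX-inspired capital-letter rule."""
    sentences, start = [], 0
    for match in CANDIDATE.finditer(text):
        before = text[match.start() - 1] if match.start() > 0 else ""
        if before.isupper():
            continue  # initials and acronyms such as "J." or "U.S." don't end a sentence
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    if text[start:].strip():
        sentences.append(text[start:].strip())
    return sentences


print(split_sentences("We met J. Smith at the U.S. border. Then we left. Dr. Who was there."))
```

On that sample, "J." and "U.S." correctly stay inside their sentence, but "Dr." is split off as a sentence of its own. That is exactly the kind of output you would still have to read carefully.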

@david_chisnall That's exactly what I'm doing now, but it's causing bad blood with infrequent contributors. They make a big change and because the rest of the specific Markdown doc they're editing uses 80-char lines, that's how they format their patch, with the best of intentions. Then I ask them to reformat to one sentence per line, which is manual and tedious and they're rightfully annoyed. I want to remove that tedium and annoyance.

@OmegaPolice

@xahteiwi @david_chisnall Hm. 🤔 So if you automatically reformat to one paragraph per line and ask contributors to boyscout it into one sentence per line, would that help?

@OmegaPolice @xahteiwi

When I've done this manually, I've done:

  • Bulk reformat one paragraph per line.
  • Search for a dot followed by a space.
  • Replace almost all of those with dot followed by newline.

It's the *almost* that makes this an annoying manual process.
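Here is the manual procedure above as a Python sketch. It is deliberately naive. It assumes the document contains only prose paragraphs separated by blank lines, because collapsing whitespace this way would mangle tables, lists, and code blocks. It also shows where the "almost" comes from: an abbreviation such as "e.g." gets a line break too. Both helper names are hypothetical.

```python
def unwrap_paragraphs(text: str) -> str:
    """Step 1: one paragraph per line (paragraphs are separated by blank lines)."""
    return "\n\n".join(" ".join(p.split()) for p in text.split("\n\n"))


def naive_split(text: str) -> str:
    """Steps 2 and 3: replace every full stop plus space with a full stop plus newline."""
    return unwrap_paragraphs(text).replace(". ", ".\n")


wrapped = "Abbreviations break this approach,\ne.g. the one in this sentence. A\nsecond sentence follows.\n"
print(naive_split(wrapped))
```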

@david_chisnall Right. And now you also want line breaks after exclamation and question marks. All unless they're enclosed in backticks. And of course not within fenced code blocks. Or tables.

@OmegaPolice
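Those extra rules can be sketched in Python. The code below splits after `.`, `!` and `?`, but it skips fenced code blocks, leaves table rows alone, and ignores punctuation inside inline code spans. It makes several simplifying assumptions, and these are not the CommonMark rules: the input has already been unwrapped to one paragraph per line, any line starting with three backticks or three tildes toggles a fence, and any line starting with `|` is a table row. It also drops list-item indentation on the new lines and still has no idea what an abbreviation is. All names are hypothetical.

```python
import re

# Assumption: a line starting with three backticks or tildes opens or closes a fence.
FENCE = re.compile(r"^\s*(```|~~~)")
INLINE_CODE = re.compile(r"`[^`]*`")
# One or more spaces directly following sentence-ending punctuation.
SENTENCE_END = re.compile(r"(?<=[.!?]) +")


def split_line(line: str) -> str:
    """Break a prose line after . ! or ?, ignoring punctuation inside inline code."""
    spans = INLINE_CODE.findall(line)
    # Mask each inline code span with a NUL placeholder so its dots are invisible.
    masked = SENTENCE_END.sub("\n", INLINE_CODE.sub("\0", line))
    for span in spans:  # restore the spans in their original order
        masked = masked.replace("\0", span, 1)
    return masked


def one_sentence_per_line(markdown: str) -> str:
    """Apply split_line to prose lines, leaving fenced code and table rows untouched."""
    out, in_fence = [], False
    for line in markdown.splitlines():
        if FENCE.match(line):
            in_fence = not in_fence
            out.append(line)
        elif in_fence or line.lstrip().startswith("|"):
            out.append(line)  # code block body or table row: keep as-is
        else:
            out.append(split_line(line))
    return "\n".join(out)


sample = "Run it. Does `a. b` work? Yes!\n\n```\nx = 1. y = 2\n```\n\n| a. b | c |\n|---|---|"
print(one_sentence_per_line(sample))
```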

@xahteiwi @david_chisnall @OmegaPolice What about quotes?

> When the terminal says "Error! Run again" you do as it says.

Probably shouldn't be wrapped at all, should it?

Then again, that's just the same as the backticks.

Not convinced this cannot be done with regex...

@jpl Did you read the thread from the start?

@xahteiwi I did. The more precise problem statement in the post I directly replied to, however, sounded compatible with regex.

But I won't bother you anymore, sorry.

@jpl You can't find matching quotes, parens, etc. with regex in nested structures. You need to do that for this task, though.
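To make the nesting point concrete, here is a Python sketch that walks the text one character at a time with an explicit stack and splits only when it is not inside brackets or quotes. It rests on a few assumptions. Straight double quotes can't be told apart as opening or closing, so they simply toggle. Curly double quotes and `(`, `[`, `{` nest properly. Single quotes are ignored entirely because they double as apostrophes. `split_outside_quotes` is a hypothetical name.

```python
# Paired delimiters that can be told apart; \u201c and \u201d are curly double quotes.
PAIRS = {"(": ")", "[": "]", "{": "}", "\u201c": "\u201d"}
OPENER_FOR = {close: open_ for open_, close in PAIRS.items()}


def split_outside_quotes(text: str) -> list[str]:
    """Split after . ! or ? plus a space, but only when not inside brackets or quotes."""
    stack: list[str] = []
    sentences, start = [], 0
    for i, ch in enumerate(text):
        if ch == '"':
            # A straight quote doesn't say whether it opens or closes, so toggle.
            if stack and stack[-1] == '"':
                stack.pop()
            else:
                stack.append('"')
        elif ch in PAIRS:
            stack.append(ch)
        elif ch in OPENER_FOR:
            if stack and stack[-1] == OPENER_FOR[ch]:
                stack.pop()  # closes the innermost open delimiter
        elif ch in ".!?" and not stack and text[i + 1 : i + 2] == " ":
            sentences.append(text[start : i + 1])
            start = i + 2
    sentences.append(text[start:])
    return sentences
```

For example, it keeps the quoted "wrong." intact in `When the output says "Something went wrong. Check the logs!" you should do as it says.` It also refuses to split `It says: "Something went wrong. Check the logs!"`, where a split might be wanted. A stack handles the nesting, but it doesn't decide what the rules should be.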

@OmegaPolice Nested like "a 'b "c" d' e"? Or just different styles like "a 'b <c> d' e"?

Also, what rules actually do apply for quoted sentences?

> It says: "Something went wrong. Check the logs!"

Would that be wrapped after "wrong."? Probably, since that quote could get really long.

Then again:

> When the output says "Something went wrong. Check the logs!" you should do as it says.

Probably shouldn't be wrapped.

The rules seem underspecified, so doing it automatically seems impossible.