@julian So, there are a few ways to handle this.
First, the b2b8 recommendation on the summary is just a guideline, not a strict requirement (thus the 'about').
A common practice in news-style text is that the first paragraph is a lede that summarizes the article's main points.
For example, in the article "Lawrence Summers Will Resign From Harvard After Epstein Revelations", the first paragraph is:
Lawrence H. Summers, a Harvard University economist and the school’s former president, will resign from teaching at the end of the academic year, according to a Harvard spokesman.
Some bloggers follow a similar practice, especially since blogging software often uses the first paragraph as the summary in RSS and Atom feeds.
I think if I were generating a summary for long-form text, I'd use these techniques in roughly descending order:
- Manual override; an optional way for the user to define a summary manually - either with a marker in the text, or with a separate input element (I think WordPress does this).
- Use the whole text if it meets the rough guidelines (~1 paragraph, a few sentences, about 500 chars) in b2b8.
- Use the first paragraph if it meets the rough guidelines (a few sentences, about 500 chars) in b2b8.
- Truncate the first paragraph and include an ellipsis ([...]).
This is also something that LLMs are pretty good at. So, rather than truncating (the last option), consider using an LLM to generate a summary that fits the length guidelines.
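Here's a rough sketch of the fallback order from the list, in Python. The names (`summarize`, `truncate`, `MAX_CHARS`) are made up for illustration, the 500-character limit is just my reading of the "about 500 chars" guideline, and I'm assuming paragraphs are separated by blank lines. It doesn't check the "a few sentences" part at all.

```python
import re
from typing import Optional

MAX_CHARS = 500      # assumed stand-in for "about 500 chars"
ELLIPSIS = " [...]"  # marker appended when truncating


def truncate(text: str, limit: int = MAX_CHARS) -> str:
    """Shorten text to at most `limit` chars, ending with the ellipsis marker."""
    if len(text) <= limit:
        return text
    budget = limit - len(ELLIPSIS)
    cut = text[:budget]
    # If the cut landed in the middle of a word, drop the partial word.
    if not (cut[-1:].isspace() or text[budget].isspace()):
        parts = cut.rsplit(None, 1)
        if len(parts) == 2:
            cut = parts[0]
    return cut.rstrip() + ELLIPSIS


def summarize(text: str, manual: Optional[str] = None,
              limit: int = MAX_CHARS) -> str:
    """Pick a summary using the techniques in descending order."""
    # 1. Manual override supplied by the user.
    if manual and manual.strip():
        return manual.strip()

    body = text.strip()
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    if not paragraphs:
        return ""

    # 2. Whole text, if it is a single short paragraph.
    #    (Kept as its own step to mirror the list, even though step 3
    #    would return the same thing here.)
    if len(paragraphs) == 1 and len(body) <= limit:
        return body

    # 3. First paragraph, if it fits.
    first = paragraphs[0]
    if len(first) <= limit:
        return first

    # 4. Otherwise truncate the first paragraph and add the ellipsis.
    return truncate(first, limit)
```

If you go the LLM route instead, step 4 is the place to swap it in: call the model there and keep `truncate` as a fallback in case the generated summary comes back too long.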