@gugurumbe I personally use the #Rust scraper crate to do this: you basically tell it which sections to extract with CSS selectors, and it always does the right thing. I think lax standards are sometimes better than rigid ones, and that current HTML/CSS serves the #indieweb very well. XHTML and XML are far too finicky for casual web producers like me.
[P.S. There is also a Rust microformats crate, but it has been rubbish in my experience.]
@Unn0wn No one would be surprised if I declared that I don't have much respect for the browser vendors ([1] is less well known than the other complaints). But that's beside the point. Technology is political, so we have to think about our choices at every step. For the specific task of parsing microformats, using HTML gives more power to a monopoly, while XHTML does not.