PDF -> Markdown at 100 pages per second.

I hate that it's for #AI, but I also think it will have implications for #digipres one way or another.

#wtfpdf

https://social.lansky.name/@hn50/115255089991079589

Hacker News 50 (@hn50@social.lansky.name)

OpenDataLoader-PDF: An open source tool for structured PDF parsing Link: https://github.com/opendataloader-project/opendataloader-pdf Discussion: https://news.ycombinator.com/item?id=45347147

@beet_keeper Call me unimpressed when their illustrative example is an arXiv preprint, for which people had to submit valid TeX, and when all the world’s linguistic diversity is reduced to a single metric for “non-English”.

#TechBros #EpistemicViolence