Mastodawn

Alan Arnold Apr 23, 2023

Jeffrey P. Bigham 🔥🔥

we have chosen to put most of our research into documents in PDF format.

PDFs are a huge pain to make accessible.

most scientists write their papers in LaTeX, Overleaf, etc., which cannot produce accessible PDFs.

to make such PDFs accessible, one uses Adobe Acrobat, which is expensive and proprietary.

increasingly, we post our PDFs to arXiv, which ~forbids accessible PDFs b/c they can't be compiled from source.

~none of our science is accessible.

artifacts (and file formats) have politics.

Julian Fietkau Mar 16, 2023

@jbigham It seems LaTeX is in the process of implementing this: https://www.pdfa.org/presentation/tagged-and-accessible-pdf-with-latex/

But put me in the "PDFs are not worth the hassle" camp anyway. I know some people in the LaTeX project have been working on a new document container format that still offers precise printing, but that can also reflow text for different screen sizes. The name escapes my memory right now. Maybe it'll be really cool. But for current research publication, I'd settle for HTML.

Tagged and Accessible PDF with LaTeX

In Summer 2020 the LaTeX Project Team announced the start of a multi-year project [1, 2] to produce tagged and accessible PDF from existing LaTeX sources with

Jeffrey P. Bigham 🔥🔥 Mar 16, 2023

@julian this has been in progress for over a decade, so i'm skeptical. it really is hard: buried in that presentation is compatibility with other packages, which turns out to be a super hard problem.

Counting Is Hard Mar 17, 2023

@julian @jbigham Using LaTeXML to convert to HTML with MathJax is what I currently suggest, not least because the LaTeX tagging options tell you not to use them anymore 😭

quelu 🦔🦉👺 Mar 17, 2023

@julian @jbigham HiNT ?

Julian Fietkau Mar 17, 2023

@quelu @jbigham I think that must have been it, yeah! Thank you! Really tough to find if you can't remember the name. 😀 https://www.tug.org/TUGboat/tb40-2/tb125ruckert-hint.pdf

geolaw Mar 17, 2023

@julian @jbigham a replacement for pdf would be good

Jan D Mar 16, 2023

@jbigham I always wondered why science on the web is all pdf when the web is said to have been invented to ease sharing of information for and by scientists.

Jeffrey P. Bigham 🔥🔥 Mar 16, 2023

@simulo lots of reasons, but primarily editing and reading tools, also people like their papers to have a consistent look

Scott Jackson Mar 16, 2023

@jbigham @simulo That's kind of chicken-and-egg, though, isn't it? If someone _wanted_ to output compliant, accessible HTML there are already standards for this: https://w3c.github.io/scholarly-html/

You can be prescriptive without requiring a proprietary file format...

Scholarly HTML

dingodog Mar 17, 2023

@scottmmjackson @jbigham @simulo I remember being super excited about PDF as a file format because it broke the exclusive choice of Apple or Microsoft as an environment for your doc

Jan D Mar 17, 2023

@scottmmjackson @jbigham I'm already happy about research in standard-compliant "normal" HTML (though the semantic information in Scholarly HTML is nice)

Scott Jackson Mar 17, 2023

@simulo @jbigham Sure, but to the extent that journals have an interest in things generally looking the same and being printable in a particular fashion, HTML offers a really good basis for that

Sabrina/Brie Mar 17, 2023

@scottmmjackson @jbigham @simulo PDF is not a proprietary standard - it is an open standard, maintained by the ISO.

Scott Jackson Mar 18, 2023

@conrad @jbigham @simulo Adobe still owns a bunch of PDF technologies, and it was built as a proprietary format by Adobe. Ultimately it's proprietary by design, even if the format is "open" now.

Prof Gaelle Vallee-Tourangeau Mar 17, 2023

@jbigham @simulo I think the “look” is more important than we might think at first. I have often wondered whether/to what extent the perceived scientific value of an academic paper would change as a function of its formatting. The technology for saving a document as an ebook is here. Some journals also offer epub. The question is not so much whether technological solutions exist. Rather it is why academic users haven’t adopted them, whether they are sharing a preprint or downloading an article…

Världens bästa Kille™ Mar 18, 2023

@simulo @profgaelle @jbigham I’m no scientist, but I am a (copy)writer, so I have worked alongside designers for twenty-odd years. So believe me when I say this: Design makes a huge difference. The impact on things like perception and readability cannot be overstated. And more often than not it affects content, too. (“Ok, so this info is in the diagram, we can skip the text” or “let’s highlight that and skip the diagram”, etc.)

Design and typography matter.

Världens bästa Kille™ Mar 18, 2023

@simulo @profgaelle @jbigham Personally I’d go farther and say that design IS content, and that good writing is design.

Världens bästa Kille™ Mar 18, 2023

@profgaelle @simulo @jbigham All of this is to say: Accessibility isn’t one thing, it hinges on the definition of “accessible”.

Jan D Mar 20, 2023

@thelovebing @profgaelle @jbigham I agree – and HTML/CSS offer a lot of ways to design documents well; what they do not do well is replicating what a printed document looks like (since they are focussed on content for different screen sizes)

Virginia Murr Apr 23, 2023

@thelovebing @simulo @profgaelle @jbigham

Tech editor here who has also, separately, helped a few nonprofit orgs through the epub process.

I very much agree with your take on this.

Världens bästa Kille™ Apr 23, 2023

@VirginiaMurr @profgaelle @simulo @jbigham This is the kind of interaction I am here for!

jonny Mar 18, 2023

@jbigham
@simulo
you forgot the massive and entrenched industries that fought web publishing tooth and nail!

Knut Morå Mar 17, 2023

@simulo @jbigham
I think it is in great part a desire to make it exactly correspond to what is printed in the paper journal, for better or worse.

My read of the invention of HTTP in particular was that it was meant to help with the information management needed to run large physics experiments (many thousands of pages of internal documentation), and it would probably not have been invented that early just to keep track of the very distilled final product.
(https://www.w3.org/History/1989/proposal.html speaks of CERN user groups, etc.)

The original proposal of the WWW, HTMLized

Adam Shostack Mar 16, 2023

@jbigham @blakereid this is so important. For my last article I used htlatex and tex4ebook to make web and epubs, but it was a struggle to find the tools to do it (https://shostack.org/blog/fast-cheap-good-redux/)

Shostack + Friends Blog > Fast, Cheap and Good, Redux

A new paper on how fast, cheap and good can combine into something we usually discount.

Ben Bolker Mar 16, 2023

@jbigham For what it's worth the LaTeX team is supposedly working on a long-term solution ... this is the most recent https://www.pdfa.org/presentation/tagged-and-accessible-pdf-with-latex/ (still seems very slow-moving, but what do I know??) (see also https://www.latex-project.org//publications/indexbytopic/pdf/ )

Colin Bischoff Mar 16, 2023

@bbolker @jbigham I'm still always surprised that it has taken so long for LaTeX to do this. Seems like it would be a natural thing to do in documents that have already been created with section tags, figure environments with captions, etc. I've seen references to why this is a hard project and the only one I understand is that there are a million packages, homebrew document styles, and so on. But we are long overdue for at least having accessibility in simple document styles.

Tom Ritchford Mar 17, 2023

@cbischoff @bbolker @jbigham It is a small team, and I believe they are all volunteers.

The majority of the commits are from one person: https://github.com/latex3/latex3

GitHub - latex3/latex3: The expl3 (LaTeX3) Development Repository

Ben Bolker Mar 17, 2023

@TomSwirly @cbischoff @jbigham I would contribute $$$ to help this happen! From a late-2020 article https://www.latex-project.org/publications/2020-FMi-TUB-tb129mitt-tagpdf.pdf : "A realistic scenario would be that each phase [out of 6] takes between one and two release cycles of LaTeX, of which there are two per year. This implies that the project will stretch across four years as a minimum, but it most probably will be somewhat longer. Additional funding will help to ensure timely delivery of each phase ..."

Ben Bolker Mar 24, 2023

@TomSwirly @cbischoff @jbigham If anyone is still on this thread *and* is on TeX Stack Exchange *and* feels like picking up some bounty points ... https://tex.stackexchange.com/q/663825/11435

Status on the Tagged PDF project

Does anyone know what the status of the Tagged PDF project is? How far has the LaTeX3 Team progressed on the time line from section 3 of "LaTeX Tagged PDF -- Feasibility Evaluation"? Also...

Sukrit Venkatagiri Mar 16, 2023

@jbigham I wish SIGCHI/ACM would move away from PDFs as the default file format, for this and so many other reasons. Let's just do HTML and give people the option to convert to PDF >.<

cryo2go Mar 16, 2023

@jbigham agreed on the thesis, but calling bullshit on "most scientists write their papers in Latex"

I would be hard pressed to believe that most CS researchers or physicists do, much less scientists more broadly.

ahistorical immaterialist Mar 16, 2023

@BenjaminHimes @jbigham yep, LaTeX is certainly common in some fields, but I have been an author on well over a hundred papers, now, and LaTeX was used in just two of them.

Kristin Branson Mar 16, 2023

@BenjaminHimes @jbigham I've never seen someone in CV/ML write a paper in anything other than LaTeX... For my first Nature submission I had to reformat it from LaTeX to Word, and it was appalling what that did to my equations. I don't think it occurred to me that a PDF would not be okay.

Martin Hamilton Mar 16, 2023

@jbigham #Markdown to the rescue? https://brainbaking.com/post/2021/02/writing-academic-papers-in-markdown/

How to write academic papers in Markdown

Tired of that silly LaTeX syntax?

Niklas Hauser Mar 16, 2023

@jbigham Oh, it's not just research, almost all engineering documents hide their most useful data in PDFs. It's so horrible, I was forced to write my thesis on it!1!! PDF -> HTML -> Table Processing -> Knowledge Graphs. Math Formulas is still really hard tho. https://salkinium.com/master.pdf
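
For a sense of what the "Table Processing" step is up against, here is a minimal Python sketch of one common heuristic: cluster positioned text fragments into rows by baseline, then order each row left to right. It assumes a PDF text extractor has already produced (x, y, text) fragments. The `Fragment` type, `group_into_rows` and the 2-point tolerance are hypothetical choices, not taken from the linked thesis. Merged cells, wrapped cells and rotated headers all defeat it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A positioned run of text, as a PDF text extractor might report it."""
    x: float     # left edge, in PDF points
    y: float     # baseline, in PDF points (PDF's origin is bottom-left)
    text: str


def group_into_rows(fragments, y_tolerance=2.0):
    """Cluster fragments whose baselines lie within y_tolerance of a row's first
    fragment; return rows top-to-bottom, cells left-to-right."""
    rows = []  # each entry: (reference_y, fragments_in_row)
    for frag in sorted(fragments, key=lambda f: -f.y):  # top of page first
        if rows and abs(rows[-1][0] - frag.y) <= y_tolerance:
            rows[-1][1].append(frag)
        else:
            rows.append((frag.y, [frag]))
    return [[f.text for f in sorted(cells, key=lambda f: f.x)] for _, cells in rows]


# Hypothetical fragments from a datasheet table, deliberately out of order
# and with slightly jittered baselines.
fragments = [
    Fragment(300, 700.4, "Max"), Fragment(72, 700.0, "Parameter"),
    Fragment(200, 700.2, "Min"), Fragment(72, 685.0, "VDD"),
    Fragment(200, 685.5, "1.7"), Fragment(300, 684.8, "3.6"),
]
print(group_into_rows(fragments))
# [['Parameter', 'Min', 'Max'], ['VDD', '1.7', '3.6']]
```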

Maya Jul 12, 2023

@salkinium Oh interesting, though got a 404 when I tried the link.

A former coworker wrote an inhouse tool to pull out table data from TRMs, and I remember him complaining about all the edge cases he encountered, even though the tool only had to support documents from our main SoC supplier.

Niklas Hauser Jul 12, 2023

@Mayabotics Had to pull my thesis temporarily while our paper about it gets peer reviewed anonymously. It's still available in the git history ;-P https://github.com/salkinium/intertubes/tree/4b7af2a4bf2735162cb244f50738609cc04dcd37

GitHub - salkinium/intertubes at 4b7af2a4bf2735162cb244f50738609cc04dcd37

mmu_man Mar 16, 2023

@jbigham hmm, in which way is LaTeX PDF output non-accessible? At least it doesn't produce rasters like, well, Photoshop & other things, so that must be some other problem…

Do you mean text blocks ordering or some other stuff?

Jeffrey P. Bigham 🔥🔥 Mar 16, 2023

@mmu_man https://lmgtfy.app/?q=https%3A%2F%2Fwebaim.org%2Ftechniques%2Facrobat%2F ;)

mmu_man Mar 16, 2023

@jbigham I know how to use Google, thanks.

mmu_man Mar 16, 2023

@jbigham (and it's not meant to search for URLs by URL, that's what the URL bar is.)

mmu_man Mar 16, 2023

@jbigham https://lmgtfy.app/?q=latex+pdf+tags

Don't know how extensive they are, but it seems there are ways to do it. Sure, they need more exposure, and we should push arXiv to consider making them mandatory.
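
One concrete meaning of "tagged": the PDF's document catalog carries `/MarkInfo << /Marked true >>` and a `/StructTreeRoot`, the logical structure tree that assistive technology uses for headings, reading order and alt text. The Python sketch below is a rough, stdlib-only heuristic for spotting those two markers. The function names are hypothetical, it misses catalogs that reference `/MarkInfo` indirectly, and a tagged file is not automatically an accessible one, because the tags can still be wrong or incomplete.

```python
import re
import zlib

# Markers a tagged PDF's document catalog normally carries: /MarkInfo with
# /Marked true, and /StructTreeRoot pointing at the logical structure tree.
MARKED = re.compile(rb"/MarkInfo\s*<<[^>]*/Marked\s+true")
STRUCT_ROOT = re.compile(rb"/StructTreeRoot\b")
STREAM = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def searchable_chunks(pdf_bytes):
    """Yield the raw file, then every stream body that inflates as Flate data.

    From PDF 1.5 on, the catalog may sit inside a compressed object stream,
    so searching the raw bytes alone can miss it.
    """
    yield pdf_bytes
    for match in STREAM.finditer(pdf_bytes):
        try:
            # decompressobj tolerates a slightly mis-sliced stream end,
            # which matters because this regex slicing is approximate.
            inflated = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not Flate data (e.g. an image stream): skip it
        yield inflated


def looks_tagged(pdf_bytes):
    """Rough heuristic: True if both tagging markers appear somewhere."""
    marked = has_root = False
    for chunk in searchable_chunks(pdf_bytes):
        marked = marked or MARKED.search(chunk) is not None
        has_root = has_root or STRUCT_ROOT.search(chunk) is not None
        if marked and has_root:
            return True
    return False
```

Usage would be along the lines of `looks_tagged(open("paper.pdf", "rb").read())`, where the file name is a placeholder.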

mmu_man Mar 16, 2023

@jbigham just like surely not everyone using Acrobat knows about, or will properly use, the accessibility tags anyway.

People still want to fake radio buttons in HTML, so there's a long road ahead…

mmu_man Mar 17, 2023

@jbigham makes me wonder how well LibreOffice handles that in the PDFs it generates, as I've been using it for years thinking it was just fine.

Same for pandoc, which I use to generate a user manual from Markdown through LaTeX… 🤔

何事西风不待人 Mar 16, 2023

@jbigham are there alternatives to it?

Jeffrey P. Bigham 🔥🔥 Mar 16, 2023

@SWwind lots of them… e.g., Word, epub, HTML, etc, all with tradeoffs.

Tom Renner Mar 17, 2023

@jbigham @grimalkina if only we had invented some kind of Markup Language that allowed Text to be annotated, include Hyperlinks, and evolve over time to capture advances and updates in scientific thinking. Maybe we could use that as a shareable format, with each update being marked with different resource identifiers.

Dee Toher Mar 17, 2023

@jbigham I now use markdown (specifically quarto) to render work into multiple formats from the same source file.

Since you can keep the .tex file while rendering to PDF, submitting to venues where .tex files are still de rigueur is not a barrier.
But I can also, with the same source document, produce a fully accessible document (even with equations), with structure and metadata.
Making it easier to host files in a more accessible format on arXiv and similar is key.
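
Here is a minimal Python sketch of the single-source idea. One small document model is rendered both to LaTeX, for venues that want a .tex file, and to semantic HTML with a `lang` attribute and a real heading hierarchy. The `DOC` structure and the `render_*` functions are hypothetical stand-ins. Quarto itself builds on Pandoc's much richer document model, and the LaTeX escaping here is deliberately partial.

```python
from html import escape

# A hypothetical, deliberately tiny document model: a title plus
# (kind, text) blocks. Quarto/Pandoc use a far richer AST.
DOC = {
    "title": "Accessible Papers",
    "lang": "en",
    "blocks": [
        ("heading", "Introduction"),
        ("para", "PDFs & tagging: 90% of the work."),
    ],
}

# Partial LaTeX escaping: backslash, ~ and ^ are not handled here.
LATEX_SPECIALS = {c: "\\" + c for c in "&%$#_{}"}


def latex_escape(text):
    return "".join(LATEX_SPECIALS.get(c, c) for c in text)


def render_latex(doc):
    """LaTeX output, for venues that still want a .tex file."""
    lines = [r"\documentclass{article}",
             rf"\title{{{latex_escape(doc['title'])}}}",
             r"\begin{document}", r"\maketitle"]
    for kind, text in doc["blocks"]:
        if kind == "heading":
            lines.append(rf"\section{{{latex_escape(text)}}}")
        else:
            lines.append(latex_escape(text) + "\n")  # blank line ends the paragraph
    lines.append(r"\end{document}")
    return "\n".join(lines)


def render_html(doc):
    """Semantic HTML: a lang attribute, <h1>/<h2> hierarchy, <p> paragraphs."""
    title = escape(doc["title"])
    body = [f"<h1>{title}</h1>"]
    for kind, text in doc["blocks"]:
        tag = "h2" if kind == "heading" else "p"
        body.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join([
        "<!DOCTYPE html>",
        f'<html lang="{doc["lang"]}">',
        f'<head><meta charset="utf-8"><title>{title}</title></head>',
        "<body>", *body, "</body>", "</html>",
    ])


print(render_latex(DOC))
print(render_html(DOC))
```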

Andrea Bianchi Mar 17, 2023

@DToher @jbigham Indeed, for some reason, Physical Review takes the tex files but only offers PDF afterwards. Maybe we need to yell at our professional societies more.

Nontechie Talk Mar 17, 2023

@jbigham What makes accessing a pdf document challenging?

Swarmpicker Mar 17, 2023

@jbigham The solution is to use the PubMed approach: encode documents in XML and then compile to PDF or an accessible format at the user's request. (Personally I think it's a crying shame that EPUB can't handle all of this natively.)
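
Here is a toy version of that pipeline in Python. It parses an XML source that uses a few JATS-style element names (`article`, `front`, `article-title`, `body`, `sec`, `title`, `p`) and renders nested sections as an `<h1>`/`<h2>`/`<h3>` hierarchy. The sample article and `jats_to_html` are hypothetical. Real PubMed Central XML is far richer, and this sketch ignores inline markup inside paragraphs.

```python
import xml.etree.ElementTree as ET
from html import escape

# A toy article using a few JATS-style element names. Real PubMed Central
# XML is far richer; this only shows "semantic source first, render later".
SOURCE = """<article>
  <front><article-title>PDFs and Accessibility</article-title></front>
  <body>
    <sec><title>Background</title><p>Most papers ship as PDF.</p>
      <sec><title>Tagging</title><p>Tags expose structure.</p></sec>
    </sec>
  </body>
</article>"""


def render_sec(sec, level, out):
    """Map nested <sec> elements onto an <h2>, <h3>, ... heading hierarchy."""
    for child in sec:
        if child.tag == "title":
            out.append(f"<h{level}>{escape(child.text or '')}</h{level}>")
        elif child.tag == "p":  # inline markup inside <p> is ignored here
            out.append(f"<p>{escape(child.text or '')}</p>")
        elif child.tag == "sec":
            render_sec(child, min(level + 1, 6), out)


def jats_to_html(xml_text):
    root = ET.fromstring(xml_text)
    out = [f"<h1>{escape(root.findtext('front/article-title', default=''))}</h1>"]
    body = root.find("body")
    if body is not None:
        for sec in body.findall("sec"):
            render_sec(sec, 2, out)
    return "\n".join(out)


print(jats_to_html(SOURCE))
```

The same parsed tree could just as well feed a PDF or EPUB renderer. That is the appeal of keeping the semantic source as the master copy.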

Rep. Eric Gallager (no "h"!) Mar 17, 2023

@jbigham This is something I've been trying to legislate about, but it's hard... my state government uses PDFs a lot, too; how should we balance between the interests of supporting accessibility and opposing proprietary software?

Kat the Leopardess Mar 17, 2023

@jbigham I understand this frustration. Sometimes I use Linux for some workaround (including using something that will let me create, markup, and edit PDFs)

I hate the hold that Adobe has on this kind of stuff.

varx/tech Mar 17, 2023

@jbigham I'm curious about how hard it would be to teach screen readers to read TeX, or perhaps make a tool that takes in TeX and spits out some semantic tagged stream. (Or maybe this is just the TeX -> HTML that already exists.)

And I feel like it should be possible to include TeX in PDFs as an embed of some sort... I know it's a hack, but I just wonder if it's a faster road to accessibility.
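
Here is a toy Python converter that makes the "semantic tagged stream" idea concrete. It handles only `\section`, `\subsection` and inline `$...$` math, and it emits heading and paragraph events, with math kept separate so a reader could voice it differently. `tex_to_events` and the event shapes are hypothetical. Any user-defined macro or package would defeat it, which is the package-compatibility problem that makes real converters hard.

```python
import re

# Toy converter for a tiny subset of LaTeX: \section, \subsection and $...$.
# Any user-defined macro or package would defeat it.
HEADING = re.compile(r"\\(section|subsection)\*?\{([^}]*)\}")
INLINE_MATH = re.compile(r"\$([^$]+)\$")


def tex_to_events(source):
    """Return ("heading", level, text) and ("paragraph", parts) events, where
    parts are ("text", str) or ("math", tex) so a reader could voice math differently."""
    events = []
    for block in re.split(r"\n\s*\n", source.strip()):
        block = block.strip()
        heading = HEADING.fullmatch(block)
        if heading:
            level = 1 if heading.group(1) == "section" else 2
            events.append(("heading", level, heading.group(2)))
            continue
        parts, pos = [], 0
        for m in INLINE_MATH.finditer(block):
            if m.start() > pos:
                parts.append(("text", block[pos:m.start()]))
            parts.append(("math", m.group(1)))
            pos = m.end()
        if pos < len(block):
            parts.append(("text", block[pos:]))
        events.append(("paragraph", parts))
    return events


doc = r"""\section{Results}

The error grows as $O(n^2)$ in the worst case."""
for event in tex_to_events(doc):
    print(event)
```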

Iain Anderson Mar 17, 2023

@jbigham EPUB should be the solution for this; it's based on HTML and offers reflowable and fixed-layout options. Unfortunately, there's no default reader for Windows, while there's the Books app on all Apple products.
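
EPUB 3 content is XHTML packaged in a ZIP, so a minimal container can be sketched with Python's standard library. The container holds a `mimetype` entry stored first and uncompressed, `META-INF/container.xml` pointing at the package document, an XHTML navigation document, and the content itself. The title, identifier and file names are placeholders, and `build_epub` is a hypothetical helper. A real file should be validated with EPUBCheck before anyone relies on it.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# Placeholder metadata; a real book needs its own identifier and title.
PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
    <dc:title>Example Paper</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2023-03-17T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="paper" href="paper.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="paper"/>
  </spine>
</package>"""

NAV_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en" xml:lang="en">
<head><title>Contents</title></head>
<body>
  <nav epub:type="toc"><ol><li><a href="paper.xhtml">Example Paper</a></li></ol></nav>
</body>
</html>"""

PAPER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><title>Example Paper</title></head>
<body>
  <h1>Example Paper</h1>
  <p>Reflowable text that keeps its heading structure.</p>
</body>
</html>"""


def build_epub() -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # OCF requires "mimetype" to be the first entry, stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        for name, text in [("META-INF/container.xml", CONTAINER_XML),
                           ("OEBPS/content.opf", PACKAGE_OPF),
                           ("OEBPS/nav.xhtml", NAV_XHTML),
                           ("OEBPS/paper.xhtml", PAPER_XHTML)]:
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Writing `build_epub()` to a file such as `paper.epub` should give something an EPUB reader can open.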

Andy Scholand Mar 17, 2023

@jbigham Do the suggestions at https://libguides.lib.msu.edu/c.php?g=995742&p=8207771 mostly not work, then? (Haven't tried this myself but was planning on trying soon.)

LibGuides: LaTeX: Creating Accessible LaTeX Documents

huesm Mar 17, 2023

@jbigham #PDF #LaTEX #Adobe #acrobat #overleaf

pizzapal Mar 17, 2023

@jbigham pdf is the devil. like it emulates a piece of paper, but somehow manages to be worse than an actual hard paper copy. i like to do hacking on extracting data from pdf to make the bad go away

Andrew Harvey Mar 17, 2023

@jbigham @freakboy3742 I work in this domain and PDF display and management is a non-trivial exercise.

PDFs held hostage by predatory publishers is a whole other extra barrel of fun.

Jonathan Dowland Mar 17, 2023

@jbigham meanwhile, the original design intent of HTML is sitting in the corner, dagger-eyes