For accessibility reasons, arXiv is starting to publish HTML versions of papers. https://info.arxiv.org/about/accessibility_html_papers.html 🧵

#math #papers #openscience #academia #arXiv
@brembs
@lambo

I think this is interesting and welcome, especially on mobile devices. Quickly checking some fact on your phone is a hassle: download the PDF, go to landscape mode, find the right location in the paper, zoom in, etc.
1/4

The promise would be that in HTML, everything is reformatted to look great on any line length.

But there is a kind of integration attack going on: Once we have the HTML version, why not enhance the paper with things PDF can't do? After all, HTML5 offers limitless possibilities to make the 3D figures interactive, run simulations, turn the examples into toys you can play with, etc.

Once these seemingly harmless new features are there, who would want to go back to the PDF version, which is
2/4

static, cumbersome and arcane? Once we are there, we will definitely need versioning of publications because if papers become software, they will have bugs and need bug fixes.

I feel very unsure about this future. I am unhappy with how static and old-fashioned it is of us to stick to this A4 PDF paper format for documents that are consumed on laptop and iPhone screens, where it never really fits and needs scrolling and zooming and whatnot.
3/4

On the other hand, I want to be done with a paper at some point. If publications turn into software, they will just need a lot of maintenance, forever. Maybe it can be solved with open-source culture, maybe future generations will take care of the bug fixing for my papers, but why would they, and how would they get credit for this work in an academic system that only rewards new things and not the maintenance?

So what do you think about HTML arXiv papers?

4/4

@tomkalei The ideal solution, to me, is to offer both static/PDF and dynamic/HTML versions. There are some open source tools that facilitate this, like PreTeXt. Having the PDF available mitigates possible maintainability issues involving interactive documents.
@KlingonHipster @tomkalei There's also been https://ar5iv.labs.arxiv.org/ for a while now, which allows replacing the "X" in any arXiv link with a "5" for HTML5. I like the transition, but I think retaining the 'static to interactive and back' bridge is essential. No gifs, no videos, no embedded graphs. That's what project websites and OSF repositories are for. The whole purpose of an archive is that it's *archival*, which means it shouldn't require maintenance
@tomkalei
Yes, I saw the HTML option the other day, but still need to check the features out.
Personally, I think we should've upgraded from PDF at least 20 years ago.
For readers, the last version will always be discernible and for authors, they decide when their paper has reached a state where additional changes are no longer warranted. That'd be my perspective at least.
@tomkalei it's a must. bioRxiv has had HTML versions since forever (?) And it really annoys me that chemRxiv also only offers PDF. arXiv is not really my place to go, but it's definitely a good decision.
@tomkalei I don't really follow your point about super dynamic HTML pages. HTML versions of journal articles are the norm, and they are (unfortunately) all static and not interactive at all.

@tomkalei @buerviper personally I never author PDFs. Too much like dead paper. Presentations on my website are pure HTML and ready for full interactivity using my own open source library:

https://analyticphysics.com
https://paulmasson.github.io/mathcell/docs/

Of course I'm not part of the academic system, so I can implement whatever I feel appropriate. The MathCell library replicates Mathematica's Manipulate command in pure JavaScript, so it runs just by loading the page. Wouldn't have it any other way.

When I make major changes to a page I also note that at the bottom. More detailed tracking of changes is easily implemented with something like GitHub.

@paulmasson @tomkalei yeah I wish academic journals would be similar, but publications work a lot differently.

@buerviper @paulmasson If research results are versioned like this, would there not be incentives to publish 0.x versions way too early, fix things as you go, and import things that are bad in software development into science and engineering?

Also how would citing work if publications are not static but are updated?

Also some parts of math are super conservative. State of the art results are 50+ years old. Would that kind of quality still be possible in a much accelerated publishing culture?

@tomkalei @paulmasson Much of that criticism is about scientific publishing in general. Currently, you mostly publish once the full story (all experiments, analysis, etc.) is finished, because that will secure you a spot in a better journal. But for scientific progress in general, it'd probably be better to publish incremental results on which everybody can build.

@tomkalei @paulmasson There are ideas out there about it (like Octopus.ac), but it's difficult to accept widely. For early career researchers, you need these high profile publications, otherwise you won't get a permanent position. So you can't experiment too much with how to publish.

I'm glad preprints are now widely accepted in Chemistry, and you can also update your preprints on chemRxiv, and each version gets a version number added to the DOI, so you see exactly which work was cited.

@buerviper @paulmasson In math we have been doing preprints as long as I can remember (which is not too long…). But putting the preprint on arXiv means the paper is Done (with capital D). It’s good style to only update arXiv once with the final version after review. Authors of papers that have 7 revisions on arXiv look foolish. It’s a cultural thing…

@buerviper @paulmasson I think a cultural change comes with the technological change. Earlier and smaller publishing will lead to more hurrying, more errors, more jumping to conclusions. A “move fast break things” culture.

This has benefits and risks, but how do they compare in academia specifically? Construction of spacecraft has different requirements than building a social networking site.

arXiv HTML versions are another step in this direction. Good and bad.

@buerviper @tomkalei thought I'd share one particular page on my website. When I was first reading Barrett O’Neill’s book "The Geometry of Kerr Black Holes" I wanted all of his illustrations to be live and fully interactive. Eventually I put this together:

https://analyticphysics.com/General%20Relativity/Visualizing%20Aspects%20of%20the%20Kerr%20Metric.htm

Now whenever I see a PDF with two or three special cases of a general solution, I kind of cringe. Could never go back to that myself when I can have so much more in an HTML document.

#physics #math #visualization #GeneralRelativity

@tomkalei I don't see much difference. LaTeX is also a kind of programming or markup language, similar to HTML. Scripting plots or diagrams is not unusual. Putting the LaTeX source files under version control is common, especially if you work collaboratively on a paper. Publishing the final version is similar too.

@zuphilip my point is not the mode of presentation or production, but rather whether there exists such a thing as a “final version”. How would #academia change if all research was published like software, release 0.11a_p2 etc.?

And then it’s a new discussion for super conservative fields like #math. We have some state-of-the-art papers that are 50+ years old.

@tomkalei When the publishing workflow stays the same, then I don't see why there should be more versions published. Instead of clicking a button to generate PDF, you (or the publisher) click a button to generate HTML as the output format.

@zuphilip @tomkalei I'm in a completely different field (tech law), but there are definitely (small and big) points in previously published papers that I would like to update. Maybe a law has changed, a talk with a colleague changed my perspective, there's a nuance that I didn't see at the time.

Not sure how this would work, e.g. peer review for each small thing could be cumbersome.

@Gerard @zuphilip and it breaks citations that are not to specific versions.

And if this becomes the norm, the culture changes. If you can make updates later it’s less of a problem to jump to potentially wrong conclusions now and update later.

@tomkalei @Gerard 1) I think that such living articles/books are an exception. 2) Just changing the format to HTML will not create more living publications. 3) Citing printed books w/o mentioning the specific edition is similarly bad. 4) I doubt that anyone creates a wrong hypothesis on purpose to change it later. 5) All published versions will stay visible, and therefore also any mistakes in early versions. 6) I hope that fixing errors is already part of our culture, also in research.
@zuphilip @tomkalei In response to 4: Peer review should be an extra check in that regard. 6: Agreed. But updates are not, and this is something that bothers me.
@Gerard @tomkalei This sounds very much like "living handbooks" and there are examples in life sciences, see e.g. https://www.publisso.de/en/publishing/books/books-policy .

@tomkalei @zuphilip At the same time, hypertext has been known to be superior for information presentation essentially since it was created.

The move to PDF was a regression in most aspects save portability/consistency (math display in most hypertext types is still not properly standardized & implemented in renderers).

However, I think it should be *exclusively* hypertext. Not Javascript/arbitrary-code nonsense (we should not be facilitating information as attack tools).

@tomkalei @zuphilip Having a supplementary computational notebook attached/accompanying the hypertext paper would be an adequate compromise if interaction & interactive recomputation are desirable.
@tomkalei Submissions would then probably not be arbitrary HTML websites, but ones following a predefined template or, more commonly, some Markdown file from which the HTML can be generated. arXiv generates the HTML files from the LaTeX files.
@tomkalei TBH I've wondered if half of the problems with reading pdfs on mobile could be solved just by setting the page size to A5 and sticking to single column. If you want to print it does 2 to a side easily and it's way more legible on a screen.
@tomkalei this is great news! They should always degrade gracefully to html

@tomkalei @brembs @lambo

Interesting! Something like this already exists with ar5iv: take the link for any arXiv paper (older than a few months), change "arxiv.org" to "ar5iv.org" and get an HTML5 version. It's not perfect, but I've found it useful enough for reading stuff on my phone. E.g.

https://ar5iv.org/abs/2109.12132
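
The link rewrite described above can be sketched in a few lines of Python. This is only an illustration, assuming that swapping the host is all that is needed; the function name `to_ar5iv` is made up here and is not part of any arXiv or ar5iv tooling.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ar5iv(url: str) -> str:
    """Swap the arxiv.org host for ar5iv.org, keeping the rest of the link.

    Hypothetical helper: it only rewrites the host and does not check
    whether ar5iv actually has a converted version of the paper.
    """
    parts = urlsplit(url)
    if parts.netloc.lower() not in ("arxiv.org", "www.arxiv.org"):
        raise ValueError(f"not an arXiv link: {url}")
    # SplitResult is a named tuple, so _replace returns a modified copy.
    return urlunsplit(parts._replace(netloc="ar5iv.org"))


print(to_ar5iv("https://arxiv.org/abs/2109.12132"))
# prints https://ar5iv.org/abs/2109.12132
```

Rejecting non-arXiv hosts is a design choice of this sketch, so that an unrelated link is not silently turned into a broken one.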

My background is astronomy, where the publishers will probably limit PDF content to traditional things for some time yet.

@tomkalei @brembs @lambo GENIUS and IT'S ABOUT TIME. All things everywhere need to do this.
Exciting news for this blind scientist!

@tomkalei @brembs @lambo

HTML is of course preferable to PDF for many reasons, as long as the papers print well. Interactive reading is best done on paper with pen in hand.

@tomkalei @brembs @lambo And HTML, unlike PDF, would also allow folks to print papers any way they want.