My friends drew my attention to this paper, which was written by computer scientists so pure, so theoretical, so far above the sins of the empirical plane, that they ran a benchmark comparing common serialization formats in their *mind palace* and came to a conclusion about which is faster: https://arxiv.org/abs/2505.13478
An Extensive Study on Text Serialization Formats and Methods

Text serialization is a fundamental concept in modern computing, enabling the conversion of complex data structures into a format that can be easily stored, transmitted, and reconstructed. This paper provides an extensive overview of text serialization, exploring its importance, prevalent formats, underlying methods, and comparative performance characteristics. We dive into the advantages and disadvantages of various text-based serialization formats, including JSON, XML, YAML, and CSV, examining their structure, readability, verbosity, and suitability for different applications. The paper also discusses the common methods involved in the serialization and deserialization processes, such as parsing techniques and the role of schemas. To illustrate the practical implications of choosing a serialization format, we present hypothetical performance results in the form of tables, comparing formats based on metrics like serialization/deserialization speed and resulting data size. The discussion analyzes these results, highlighting the trade-offs involved in selecting a text serialization format for specific use cases. This work aims to provide a comprehensive resource for understanding and applying text serialization in various computational domains.


@0xabad1dea looks at the paper

Wow. No one has ever, uhmmm, measured the speed difference between four of the most common text formats before. We definitely need to use a hypothetical computer for that!

@Chip_Unicorn @0xabad1dea But that’s a comparison of implementations which, as they point out, brings confounding factors. Running experiments involves a lot of different sources of error. In contrast, if you simply make up your results, there is only one source of error. The latter is obviously a superior approach arising from Lemma 1.4: 1 < many.

@0xabad1dea "To illustrate the practical implications of choosing a serialization format, we present hypothetical performance results in the form of tables …"

because, of course, *hypothetical* performance measurements are more reliable when you put them in tables 😂

@0xabad1dea Wow. 21 authors too. Some resume padding maybe…

@0xabad1dea

I didn't know we'd still see, "I have only proved it correct, not tried it" in 2025.

@0xabad1dea

.....This is at the level of what I'd expect from someone's project paper for a course in a CS-adjacent associate's degree program. It is several tiers below a research paper

@0xabad1dea Now I’m wondering how many of the tech rants I’ve written over the years I can reformat in LaTeX and call “An Extensive Study”. Academia, here I come!
@0xabad1dea wait you can just do that?

@th @0xabad1dea of course and it is a viable scientific method¹.

_______
Sources:

@0xabad1dea in their defense, computer science is less frustrating when you throw out the computers
@dotjayne you can still make this an opportunity for science! Mostly physics. Mass, acceleration under gravity, parabolic trajectories, even wind resistance if you measure carefully!
@dotjayne @0xabad1dea I'm old enough that I went to college when they were just separating the CS department from the math department, so... yeah, they tried that once.

@danlyke
Were the remaining mathematicians happier afterwards?

@dotjayne @0xabad1dea

@Landa yes. I think everybody benefitted.

@dotjayne @0xabad1dea

@Landa @danlyke @dotjayne @0xabad1dea Steady on. Imagining mathematicians happy sounds like a bit of a stretch to me.
@dotjayne @0xabad1dea Something something astronomy and telescopes.
@0xabad1dea
This seems like the first step in creating a hypothesis, tho... Like saying 'this is x number of steps and that is y number of steps so we can compare them before dealing with the size of each step in each implementation...'
@Crissa sure, but… then where’s the rest of the paper…
@0xabad1dea
Well, they drank the bottle instead of launching a ship with it.

@0xabad1dea absolutely wild LOL

The hypothetical results presented in the previous section, while not based on a real benchmark, align with the generally understood characteristics of the evaluated text serialization formats. Analyzing these hypothetical results allows us to discuss the typical trade-offs involved in choosing a format for a given application.

@0xabad1dea It almost feels like the paper and its benchmarks were LLM-generated, and the authors just did a s/benchmark/hypothetical benchmark/g.

@0xabad1dea "The simulated benchmark was run on a hypothetical standard computing environment."

I think they misunderstand the meaning of "run" here. Nothing was ever run. THEY NEVER RAN ANYTHING.

They really need the word "imagined" in there somewhere.

@0xabad1dea honestly, this looks like some people's largely LLM-generated undergrad course project
@0xabad1dea “sins of the empirical plane” feels sloppily redundant.