Paolo Maldini: "If I have to make a tackle then I have already made a mistake."
Three tackles in one pipeline. All three exist because pandas can't carry what #BigQuery produces: nested types, nullable integers, timezone-aware timestamps.
The TypeError surfaced on line 40. The mistake was made on line 3. The error couldn't point further back.
#Arrow removed all three. Twelve lines replaced a hundred and fifty, and they are more correct.
https://paolobietolini.com/development/a-nan-where-a-long-should-be/

A NaN where a Long should be | Paolo Bietolini
A PySpark TypeError that looked like a schema bug was actually three steps upstream. Pandas can't represent what BigQuery hands it (nested structs, nullable integers, timezone-aware timestamps, arbitrary-precision numerics), so every downstream line is a patch against a loss that happened the moment pandas entered the pipeline. A walk to a twelve-line Arrow replacement, and the rule it points at.