today I'm thinking about the "don't use floating point for money" advice I hear all the time. It obviously has a lot of truth to it.

But -- Excel/Google Sheets use floating point for all of their calculations, people use spreadsheets for money calculations all the time, and it generally seems to work just fine -- the results get rounded for display.

So I'm trying to figure out if there's a more nuanced guideline than "never use floating point for money".

thinking about trying to define a "safe zone" and a "danger zone" for floating point. maybe something like:

safe zone:
* all integer values (like 1.0, 234.0) behave 100% exactly the way you'd expect, UNLESS (!!!) they're more than 2^53. You can check for equality, it's fine.
* adding up ~100 numbers and rounding the result to 4 decimal places or so is going to work fine, as long as the numbers are roughly the same size
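A quick Python sketch of both bullets (the exact integer cutoff for a 64-bit float is 2^53, since that's where consecutive integers stop being representable):

```python
# Integer-valued doubles are exact up to 2**53, so equality checks are fine:
assert 1.0 + 2.0 == 3.0
assert 2.0**53 - 1.0 == 2**53 - 1    # still exact just below the cutoff
assert 2.0**53 == 2.0**53 + 1.0      # past the cutoff, the +1 is silently lost

# Adding similar-sized numbers and rounding the result also works fine:
total = sum([0.1] * 10)
assert total != 1.0                  # a tiny error accumulates (0.9999999999999999)
assert round(total, 4) == 1.0        # but rounding absorbs it
```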

I think it's interesting to talk about floating point's "safe zone" (things you can do with floating point that are Completely 100% Fine Actually) because I think sometimes folks see that floating point is weird and kind of... overreact and treat it as a Magical Thing that could unexpectedly break at any time.
someone pointed out that this "never use floats for money" advice probably comes from the time of 32-bit floats, which have WAY less precision than 64-bit floats (about 7 digits instead of 16!) and are VERY VERY bad to use for money: you start losing 1 cent of accuracy around $100,000!!
@b0rk sometimes you need to track 1.18585373853 dollars or something like that.

@b0rk yes, but...when you're dealing with money, why would you ever choose to use something that has precision errors when you can choose a solution that doesn't?

It seems like good advice regardless of how many bits of precision we have. Equality tests going wonky 20 digits out vs 10 digits out is still a failing test.

@masukomi @b0rk I work with money, and I see floats in scripts often as a fallback of sorts: like when it is necessary to multiply two decimals with different precision, a sloppy developer can cast both to float and avoid overflows...
@masukomi @b0rk that's what I ask myself, too. Floating point is slower, less precise, and longs are large enough (I mean, the global GDP is ~10^14 USD, how much more does one need?) I can think of calculating interest, maybe, and surely large, heavy matrix processing, but then these are cases where we *need* to use FP.
@masukomi @b0rk I'm curious to see the results of challenging the common sense here, but if there is a rule of thumb that seems too easy not to follow, using integers instead of FP is the one IMHO.

@masukomi @b0rk It doesn't matter what data type you use, you can't represent 100/3 exactly (modulo symbolic algebra).

Everything has precision errors on a computer so I think it's a bit of a moot point. Sure, you can do better than floats with other data types at the cost of performance, but you can also use more sophisticated algorithms such as Kahan summation, also at the cost of performance.
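For reference, Kahan (compensated) summation is only a few lines; a minimal Python sketch:

```python
def kahan_sum(values):
    """Compensated summation: carry the low-order error that each
    addition loses and feed it back into the next term."""
    total = 0.0
    compensation = 0.0
    for v in values:
        y = v - compensation            # corrected next term
        t = total + y                   # low bits of y may be lost here...
        compensation = (t - total) - y  # ...so recover them for next time
        total = t
    return total

print(sum([0.1] * 10))        # 0.9999999999999999 -- naive sum is one ulp short
print(kahan_sum([0.1] * 10))  # compensated sum lands on (or next to) 1.0
```

The stdlib's `math.fsum` goes further and returns the correctly rounded sum.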

@b0rk another thing to keep in mind is that some currencies have really low buying power. That means that even a bread price can run into the hundreds of thousands... that has... impact... on both that currency and on the precision when you translate 1 unit of it into a currency with far, far more buying power...
@b0rk and with a different currency it might be a much bigger problem. Not just cents
@b0rk I think the advice is generally geared towards accounting systems where very small errors can accumulate to large sums across a gigantic number of transactions - also an interesting read: https://skeptics.stackexchange.com/a/14933
@b0rk 🎵Guess who got an angry email from Bank of America about that in 2011 or soooooooo🎵
@agocs there's a story here

@paddy babby developer me, not quite out of college, was working for a company that built debit card rewards programs for banks. I was using SQL Server Integration Services to put together an ETL pipeline that read over a tranche of BofA's debit card transactions and summed up the amount of rebates their account holders were owed.

I remember deciding between double and float and going with float because I didn't think we'd need the range of a double. I completely forgot about precision loss....

@paddy .... until someone on the BoA side checked and was like "hey, this is a hundred dollars off"
@agocs lolol okay that makes more sense, I was imagining you were a consumer of BoA and they were sending you angry emails about floating point precision and I imagined for that to be true there must be something wild going on
@paddy Tale as old as time / song as old as rhyme / Jr. dev makes mistake
@agocs that's basically the job, in fairness

@b0rk This is a lovely idea. Re "all integer values": there are some corner cases where Float64s interact with generic data structures like maps and sets, because Float64 equality and comparison often don't obey the contract needed by data structures that hash or sort internally.

This can lead to oddities around questions like "does this collection contain this zero?"

For example, the caveat in the docs for Java's Double boxing type explains how Java makes double arithmetic meet users' expectations most of the time while interoperating with generic collections. Java collections use .equals to compare values, not primitive ==, which means that a set can separately contain positive and negative zero, and may contain one but not the other.

import java.util.*;

public class TwoZeroes {
    public static void main(String... argv) {
        // The bit pattern for -0.0 (sign bit set, everything else zero).
        double x = Double.longBitsToDouble(0x8000_0000_0000_0000L);

        Set<Double> someDoubles = new LinkedHashSet<>();
        someDoubles.add(x);

        System.err.println( // true
            "x == 0: " + (x == 0.0D)
        );
        System.err.println( // true
            "someDoubles contains x: " + someDoubles.contains(x)
        );
        System.err.println( // false
            "someDoubles contains zero: " + someDoubles.contains(0.0D)
        );
    }
}

Negative zeroes might be rare in practice, but this might affect users who parse floating point numbers from a field pre-populated with a minus sign, like discount: -_____ or, iirc, when using rounding mode towards -Inf.


@b0rk IME it can still be an issue when you need to aggregate smaller transactions. E.g., "what was the total of the payments I accepted last month?" Maybe each transaction was well within 1¢ of accuracy, but your reports don't add up with each other.
@b0rk i think also excel has more than one way of representing fractional numbers, i wonder if the "accounting" format uses floats? In any case, if your numbers are already rounded to whole cents before you put them in Excel, basic aggregations are probably close enough. In my experience it's more often a problem if your DB is storing the raw results of floating point arithmetic.
@b0rk i think the hard part for me with floating point is that it's often hard to keep track of whether I'm in a "safe zone" or not. Using ints or something like python's Decimal class just reduces the cognitive overhead, at least in situations where I need a predictable precision.
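For example, with Python's decimal module the digits you typed stay exact:

```python
from decimal import Decimal

# Binary floats approximate decimal fractions; Decimal keeps them exact.
print(0.1 + 0.2)                        # 0.30000000000000004
print(Decimal("0.1") + Decimal("0.2"))  # 0.3
```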

@b0rk I would never use any float for money - you never know when things will get weird. Just use BigInt if you are in JavaScript / Node.js. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt

Also if you do need to add more than 100 floats, just do a tree-like summation (add neighbours, then add those results to their neighbours, etc.); it minimizes loss. For extra precision sort first, so you add like to like.
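A minimal Python sketch of that tree-like (pairwise) summation; rounding error grows roughly like O(log n) instead of the O(n) of a naive left-to-right loop:

```python
def pairwise_sum(xs):
    """Add neighbouring halves recursively: sum the left half, sum the
    right half, then add the two partial results."""
    if len(xs) == 0:
        return 0.0
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

print(pairwise_sum([0.1] * 1000))  # much closer to 100 than a naive loop
```

(Python's `math.fsum` does even better: it returns the correctly rounded sum.)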


@b0rk Something I'd be curious about is how this works in practice. What are banks using? Governments?

Would you use signed integer amounts of pennies? Mils? What about interest rate calculations, or converting between currencies? Is anyone actually using the special libraries/APIs for dealing with currency? What is the largest number, realistically, anyone has to deal with? One quadrillion? Etc

@lynndotpy @b0rk GOV.UK Pay (https://www.payments.service.gov.uk/) represents monetary amounts in whole pence: https://github.com/alphagov/pay-connector/blob/master/src/main/java/uk/gov/pay/connector/charge/model/ChargeCreateRequest.java#L38 Specifically a Java Long in this case; it does also travel through the front-end, which is in Node.js
@eviljonny thank you for sharing this!! i appreciate this
@lynndotpy @b0rk one large company that I have worked with used millionths of a dollar in a long. That allows for very accurate currency conversions. No accountant would knowingly accept floating point approximations of money.

@omalley i love this, thank you for sharing!!

quick back-of-the-envelope math: assuming long == signed 64bit int, that's $9,223,372,036,854 which can be accurately represented! :0
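A sketch of what integer "millionths of a dollar" arithmetic might look like; the half-up rounding policy and the exchange rate here are my assumptions for illustration, not details from the company mentioned above:

```python
MICROS_PER_UNIT = 10**6  # millionths of a dollar ("micros") in a 64-bit int

def convert(amount_micros: int, rate_millionths: int) -> int:
    """Currency conversion in pure integer math: the rate is also stored
    in millionths, and we round half up at the very end (assumed policy,
    valid for non-negative amounts)."""
    product = amount_micros * rate_millionths
    return (product + MICROS_PER_UNIT // 2) // MICROS_PER_UNIT

# $10.00 at an assumed rate of 0.920000 -> 9_200_000 micros, i.e. 9.20
print(convert(10_000_000, 920_000))
```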

@b0rk I’m not sure if it applies to money, but some operations are not well defined in the spec, so you get tiny cumulative differences rather than determinism from one platform to the next. This is a problem in gamedev when you want to send minimal data (like input) between players and have each machine compute the full changes to the game world the same way. Differences in float implementations between player’s machines are a big gotcha for desynchronisation bugs. https://randomascii.wordpress.com/2013/07/16/floating-point-determinism/

@b0rk So the safe zone really depends on what you’re trying to do, compiler settings, runtime settings, machine, etc.

I have seen one suggestion for float determinism which keeps all values in a range and rounds to a specific precision after each operation. It's essentially like 64-bit float but only utilising 58 of the bits you know will be accurate on all platforms (the frequent rounding prevents cumulative errors cascading into more significant bits).

I expect cumulative errors cascading into more significant bits would be a problem for money with 64-bit float too. Although this may be fine if you rounded after every operation and the numbers weren't too big. At some point there are just more representable numbers and fewer edge cases to worry about with fixed-point math.
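A toy Python sketch of that round-after-every-operation idea; the choice of 9 decimal places is arbitrary, and real cross-platform determinism needs more care (compiler flags, libm differences) than this shows:

```python
SCALE = 10**9  # assumed shared precision: snap everything to 9 decimal places

def det(x: float) -> float:
    """Round to the shared grid after every operation so that
    platform-dependent low-order bits can never accumulate."""
    return round(x * SCALE) / SCALE

# Every intermediate result gets snapped before it feeds the next op:
a = det(det(1.0) / det(3.0))
print(a)  # 0.333333333
```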

@b0rk For some high-quality time wasting: Crockford's DEC64, aka "Why do we have to think about ints versus floats?" https://www.crockford.com/dec64.html and Gustafson's unums, aka "Why are IEEE floats so complicated?" https://en.wikipedia.org/wiki/Unum_(number_format)

@b0rk When your floats are slicing the salami
@b0rk It was beaten into me when I first started coding in 1986, and the fear of intermediate precision errors has never gone away!

@b0rk It also comes from before decimal64 and decimal128.

(I don't know why they even bothered with decimal32.)

@b0rk that is about half of what I was going to say; 32-bit float used to be really common, because people were afraid (sometimes justifiably, sometimes not) that 64-bit was too resource-hungry.

But the other half is guard bits: 20 or more years ago, there were still some CPUs that provided floating point without guard bits, and the lack of guard bits causes errors to grow much faster. (This created some problems for some console video games I worked on in that era, but fortunately no money involved!)

@b0rk Anecdotally, I've heard that people who do Monte-Carlo style financial projections intentionally ignore that advice. For example, to compute a projected range of returns to describe the risk associated with a financial instrument or proposed asset allocation.

I think the reasoning is that, since hardware acceleration for 754 is substantially better, being able to do a greater number of iterations in the same amount of time contributes more to the accuracy of the results than something like using 854-based decimals would.

@b0rk Have definitely seen this. Specifically: safe to use as keys if not mutable
@b0rk I would disagree with the idea of a "safe zone". The main issue is that base-2 floating point numbers cannot represent decimal fractions (like 0.1) exactly. Never. But we need and expect the entered decimal data to be the exact truth for financial calculations. So it's really a thing that can break unexpectedly. I would say: whenever you store and calculate with more than a few data points with exact decimal places, you should use the Decimal data type over float.

@b0rk No, don't do that.

64-bit IEEE754 FP gives you 15 decimal digits of precision.

That may be 15 for an integer, or 13 before and 2 after the decimal separator, or 11 before and 4 after the decimal separator.

The point is: you don't get to choose. The data type guarantees you get 15 digits and the decimal point automatically floats back and forth.
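You can watch the window slide in Python:

```python
# ~15-16 significant digits survive, wherever the decimal point falls:
assert 123456789012345.0 == 123456789012345  # 15 digits: still exact

big = float(12345678901234567)               # 17 digits: one too many
assert big == 12345678901234568              # stored one ulp away, silently
```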

@b0rk Money is a different beast.

Someone decided how money values should behave and how many digits after decimal separator are needed and also how they are rounded.

Some standards require rounding up, some down, and they differ in how ties on a 5 are handled (half up, half down, half to even, ...).

@b0rk Some goods require more digits after the decimal separator, eg. goods that are sold in bulk by weight, such as small screws.

@b0rk Sure, you can get very far with floating point, a bunch of tests and fuzzing.

But the FP datatype is a poor fit for the given domain: it will *gladly* calculate numbers that are out of bounds of the domain, i.e. numbers needing more than 15 significant digits.

It will do this *without giving you any warning* it is losing information.

That is poor engineering.
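For contrast, Python's decimal module can be configured to raise on any inexact result instead of silently rounding; a small sketch:

```python
from decimal import Decimal, getcontext, Inexact

getcontext().traps[Inexact] = True  # refuse to silently lose information

print(Decimal("1.10") + Decimal("2.20"))   # exact, so this is fine: 3.30
try:
    Decimal(1) / Decimal(3)                # has no exact representation
except Inexact:
    print("refused: result would lose information")
```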

@b0rk Another way to think of it is "number of accurately expressible values for a given level of required precision", which is a more expansive version of your first bullet point.

(I wrote this up with some examples: https://pythonspeed.com/articles/float64-float32-precision/)


@b0rk One thing about floating point that I find fascinating is issues that arise when using them for coordinates in things like video game worlds.

The "really weird" thing about floats is how they lose decimal precision as the whole number part gets larger.

So, when using floats for game coordinates, you have to be careful about "how far from the origin am I?" because the farther you go, the lower the precision, and thus the lower the fidelity you'll have on positions: even for polygon vertices!

@b0rk Also, it's important to be really precise when you say "floating point": do you mean 32-bit floats or 64-bit double precision (doubles)? There's a huge difference in the decimal precision!

For example, "float for latitudes & longitudes" is well known to cause lots of precision-related issues. 32-bit floats are generally "not precise enough" for these kinds of coordinates. You have to *always* use doubles.
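You can see float32's ~7-digit limit from Python by round-tripping a value through 32-bit storage with the struct module:

```python
import struct

def as_float32(x: float) -> float:
    """Round-trip a value through IEEE 754 single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

print(as_float32(0.1))  # 0.10000000149011612: only ~7 digits survive
```

At latitude/longitude magnitudes (tens of degrees) that granularity works out to a few millionths of a degree, i.e. position error on the order of a metre.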

@b0rk Another thing that really blows my mind about floats is issues when thinking about them as line vertices.

For example, if you have two lines, one from (0,0) to (1,1) and another from (0,1) to (0.5,0) and you compute "the intersection point between these two lines" you have to remember that the intersection point doesn't actually lie on either line! It will be some position just slightly off due to precision issues.
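A quick Python check of exactly this pair of lines, using the standard two-line intersection formula (the helper name is mine):

```python
def intersect(p1, p2, p3, p4):
    """Intersection of the line through p1, p2 with the line through p3, p4
    (assumes the lines are not parallel)."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / denom
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

# y = x intersected with y = 1 - 2x; true intersection is (1/3, 1/3).
px, py = intersect((0, 0), (1, 1), (0, 1), (0.5, 0))
print(py == px)          # True: this point happens to satisfy y = x exactly
print(py == 1 - 2 * px)  # False: it misses the second line by one ulp
```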

@b0rk This gets even trickier and non-intuitive dealing with polygons.

For example, when computing the intersection between two polygons, it's actually possible for the area of the intersection to be larger than the area of one or both of the input polygons!

This generally will only happen when the input polygons are "quite close to" the precision limit of the float (i.e. 1E-7 or thereabouts for 32-bit).

But, "really long and skinny" polygons can result in other similarly bizarre cases.

@b0rk One last bizarreness:

std::numeric_limits<float> has some ... surprisingly unintuitive values!

For example,

std::numeric_limits<int32_t>::min() is "-2^31" (i.e. a large negative number)

BUT

std::numeric_limits<float>::min() gives the unexpected value 1.17549e-38 which is actually "the smallest positive value"

If you want "the smallest value that can be represented" you should use ::lowest() instead.

@b0rk Subtracting large numbers close in value can put you into the danger zone:

>>> v = 10e9
>>> (v+1)/3 - v/3
0.33333301544189453

losing more than half your significant digits.

@b0rk this paper has some great pictures (and words) about the "badlands" of some FP functions https://jeapostrophe.github.io/home/static/tm-cise2014.pdf