New blog post! I really hope I'll get a decent amount of people mad with this one 😈😈😈

It's OK to compare floating-points for equality:

https://lisyarus.github.io/blog/posts/its-ok-to-compare-floating-points-for-equality.html

#programming #computing #floatingpoint #geometry


@lisyarus Thanks a lot! This is one of my larger pet peeves. In the last 10 years I probably needed exact comparisons 95%+ of the time (if you exclude test code), and over time I came to regret almost every epsilon comparison. The exception is tests, where "approximately equal to the expected result" is fine.

But basically "load-bearing" comparisons with eps didn't bear any load in the end...

@artificialmind Yep, that's pretty much what my own experience was as well!

@lisyarus I had one client who had operator== on their Vector3 class overloaded with a non-zero epsilon.

Can you imagine how a std::unordered_map behaves if the equality operator might pick up the wrong element? That's fun to debug.

@lisyarus i'm NaN boxing as this comes into my feed
@lisyarus this post did not make me mad!
@aras Haha, you probably already know all I have to say, then :)

@lisyarus but what’s it about to justify leaving the Fedi client for a webbrowser? For all I know it could be about, idk, reproduction of snails (Fedi is weird so…).

No description, no teaser, not even a title, and no speaking URL either (if it is speaking, don’t truncate it when you post).

This doesn’t entice me, I have to say:

@mirabilos Well, you're entirely free not to read it, then!
@lisyarus and here I was trying to give a tip on improving things…
@mirabilos Thanks for the tip (really), I'm not using a dedicated client and just using a web browser, which shows the link preview, which has a post title. And, to be honest, it's the first time I see someone complain about this.

@mirabilos

It's just that the tone of your reply looked rather...ill-intentioned to me? You could say "Tip: in some clients you don't see a link preview, so the users don't know what the post is about", but you said (with intentional emotional paraphrasing to make it clear what I'm trying to convey) "I don't like it, you made a bad thing, I'm not gonna read it, it's bad".

@lisyarus sorry, I’m not english
@mirabilos No problem, neither am I 😅
@lisyarus there’s no such thing as a "link preview" in the majority of both Fedi client software and Fedi instance software (and both have to support it for the user to get any)
@mirabilos @lisyarus Do the tags not tease enough that it's probably not about snails? Or how about the fact that the person posting is usually talking about graphics, gamedev, and programming. You could click the article and spend about 10 seconds deciding if it's interesting or not. Talking about needing to be enticed sounds like you have too much of a firehose of a feed going on.
@lisyarus Seems pretty reasonable to me!

@lisyarus
> In reality it is a pretty deterministic (modulo compiler options, CPU flags, etc)

and then det(v, v) suddenly doesn't return 0.0 anymore, because the compiler now targets a newer CPU with FMA: it uses a regular multiplication for `a.y * b.x` but fuses `a.x * b.y` into the subtraction as an FMA, and the FMA computes its product with infinite intermediate precision, so it can be slightly different from the separately rounded multiplication -_-

It annoys me that IEEE754 allows this and compilers (ab)use it

@Doomed_Daniel @lisyarus This is not an IEEE754 problem; IEEE754 defines FMA and unfused mul/add. It's up to each language to figure out how to compile compound expressions and what is permissible; C/C++ allow contraction by default, so you do get different results based on the platform or the underlying model - but you can also turn contraction off to fix this, or use languages that don't allow this implicit conversion.

@zeux @lisyarus IEEE754 explicitly allows contractions

IMHO compilers should put them behind -ffast-math or similar, not enabled by default

@Doomed_Daniel @lisyarus I agree that contraction should not be enabled unconditionally; Rust is an example of a language that doesn't do that - it follows IEEE 754 strictly. IEEE 754 does not explicitly allow contractions. In fact it has language that prohibits these by default, which C doesn't follow (clang will synthesize fma unconditionally without optimizations when supported)

@zeux @lisyarus
IEEE 754 at least seems to allow this as long as it can be switched off, and doesn't require it to be disabled by default - or at least that's what the C/C++ standards and/or compiler writers think

(GCC is extra fun because it only allows using -ffp-contract but ignores the FP_CONTRACT pragma)

@Doomed_Daniel @zeux @lisyarus again, that's not IEEE 754.

But it's not like the FP standard gets a vote on this. The language standards do whatever they want.

See also Kahan's rants on Java's choices wrt FP implementation in the 90s and early 2000s. Mind, in a lot of that he's railing against the idea of trying to aim for exact reproducibility since it was at the time impractical. By now it wouldn't be, though.

@rygorous @zeux @lisyarus
IIRC in a discussion about this years ago (and/or GCC bugtracker?) compiler-adjacent people argued that it does not violate IEEE 754, because the standard allows it as long as it can be controlled with a flag, or sth like that.

The language standard can do whatever it wants, but not claim that it conforms to IEEE 754 then?

@Doomed_Daniel @zeux @lisyarus Again, I don't understand what power you think the standard has here, though.

They can write all kinds of things into IEEE-754 that are then summarily ignored.

This is pretty much exactly how IEEE 754's traps have been handled by most langs for the past 40 years.

@Doomed_Daniel @zeux @lisyarus See also langs and environments that always treat subnormals as 0 and so forth.

It's just words on a page. What do you think the enforcement mechanism is?

The reality of the matter is that anything written in IEEE-754 is exactly as important as its users think it is. Historically speaking, very good track record on the basic stuff (correctly rounded fundamental operations, RN default, certain math lib fns being available), beyond that, very touch and go.

@Doomed_Daniel @zeux @lisyarus For most of the 80s and 90s we had a bunch of very different evaluation strategies, at least three:
1. basic RISCs with just +,-,*,/,sqrt and either only float or float+double but no extended format (and often limited support for traps etc.)
2. RISCs with FMAs (usually of the "almost everything reduces to FMA" variety, e.g. the RS/6000, PPC and descendants)
3. M68k and x87 with internal extended precision and extra exponent range.
@Doomed_Daniel @zeux @lisyarus all three of these provided reasonable error bounds in the same ballpark but disagreed on the exact results, and simulating each of those with any of the others was impractically expensive. Hence compilers doing a best-effort mapping and language standards not saying much on the topic. What IEEE thinks about it, nobody really asked in the first place.

@Doomed_Daniel @zeux @lisyarus at this point, #3 is functionally dead, and most things that used to be in category #1 by now have moved to #2, i.e. FMAs used to be a "depends on the platform" thing and are now almost everywhere.

The side effect being that a thing that used to show up on ports to a different platform (i.e. going non-FMA <-> FMA targets) is now something that is showing up within targets, and the numerical env in general is much more homogeneous.

@Doomed_Daniel @zeux @lisyarus My point being, legislating any particular evaluation strategy was wholly impractical until at least the 2010s or so.

At this point it would be a lot more reasonable. It's just that we have a path-dependency where both the compiler options and the standards come from a world where insisting that things be nailed down in a particular way was simply impractical, and what compiler support there is for turning this off is also what used to be a total niche feature.

@Doomed_Daniel @zeux @lisyarus Because in the 90s, if you're porting numerical SW to say a PPC, you wouldn't want to not use FMAs, for the most part.

That's making a lot of things 2x more expensive, because even just basic FP mul and FP add on that HW would execute as FMAs, and there was only one FP arithmetic unit, which was the FMA unit!

Standards have more recently started to try to tighten down on things but it's just a hard retro-fit due to many historical accidents.

@Doomed_Daniel @zeux @lisyarus FWIW I also agree that FP contractions being enabled by default is very unfortunate these days, but it makes perfect sense for that to be the default given how we got the option in the first place, namely, it was introduced on HW where FMA was the only way to do FP adds or muls and turning contractions off could literally halve your FP throughput.
@Doomed_Daniel @zeux @lisyarus This is not "fast math" level shenanigans where the semantics are extremely fuzzy and the perf win often minor; this one is _usually_ fine (except when it isn't, of course, but this is a lot more rare than say fast-math screw-ups), the perf difference was quite substantial, and it was a very niche thing to want to turn off.

@rygorous @zeux @lisyarus
my assumption was that if a language/compiler says it implements IEEE 754, it then conforms to that standard by default - but apparently that was optimistic/naive.

either way, this is a massive footgun that has become more relevant recently with x86-64-v3 becoming a popular target

@Doomed_Daniel @zeux @lisyarus as far as I'm aware, C++ standards do not have anything analogous to C's Annex F that would say how the language binds to IEEE 754 / IEC 60559.
@lisyarus fun fact, I agree with the sentiment but some of the examples are horrible, but before commenting further I'd prefer your permission 8-D
@oblomov Sure!

@lisyarus OK, let's get started 8-D

1. when an epsilon check *is* made, the epsilon should never be arbitrary, but at the very least be based on machine epsilon and the magnitude of the numbers involved (e.g. something like fabs(x - y) < (x+y)*FLT_EPSILON/2 rather than fabs(x-y) < 1.e-4), unless additional information is known about the process by which the numbers are obtained.

@lisyarus

2. I like the recommendation to use hypot for the vector length, but it might be worth mentioning that for maximum accuracy it's preferable to sort the components, or at the very least identify which one is largest. Computationally, this spares one (expensive) division and one multiplication at the expense of two swaps, and adding the lower terms squared to 1 is more consistently accurate than the three squared terms added in arbitrary order.

@lisyarus

2bis. also I don't think anyone ever brought that up as an example in which to use an arbitrary epsilon, but that's a different matter 8-D

@lisyarus

3. the Gauss–Jordan elimination argument is *really* flaky. GJ is just an extremely poor algorithm in general, and should not be used at all, but replacing the epsilon check with a 0 check doesn't really help in its application. "Very imprecise" can be quite catastrophic depending on the application.

@lisyarus

4. the user input case really brings out the fact that the main pain point with epsilons isn't even the epsilons themselves, but the fact that they are chosen *arbitrarily*; however, this is a very different matter compared to "don't use them when they aren't needed". (yes, this connects to point 1. above)

THE END.

@lisyarus Not a controversial statement among numerics people at all, FWIW. :)
@lisyarus guilty here. i would like to think i learned something, but i don't know if i can come up with the solutions.
@lisyarus It's a perfectly normal thing to do. But then so is approximately equal testing...it might need to test both abs and rel measures though.
@lisyarus symbolic algebra knows no floats :-)
@lisyarus for some examples, I wondered why floats at all? A grid is not in need of floats. Neither is the passing of time. 1/4 of an hour is just a representation. 60 / 4 or 3600 / 4 also. I do use floats but as little as possible. In audio land, using floats just increases deps. Like, needing an FPU all of a sudden. My conventions, where time is concerned, all use the heuristic of Hz/cps, and time becomes a counting issue. Probably misunderstood something.
@lisyarus looks interesting, I’ll read it later! I feel like it’s obligatory for me to link https://herbie.uwplse.org/demo/ here. It’s an excellent tool for finding alternative math expressions which are faster and/or have better float precision.

@lisyarus I learned "use epsilon" before IEEE-754 existed, when you didn't necessarily know how your floats behaved at boundary conditions. and I never really thought about it much after floats became universally standardized.

this feels like a case of a best-practice principle losing its context and getting misapplied in a changed environment

@lisyarus thanks for sharing this. What do you think about using double precision for intermediates when inputs are in single precision? For example, in your "vector length" example it's much cheaper to cast x, y, z to double instead of dividing by max(x, y, z) (in context of a CPU target, not GPU) — after the cast extreme exponent values do not pose an issue. This is exactly how hypotf in a modern C math library works — it's the double-precision hypot that needs to be fancier than that.
@lisyarus Thank you for this article. I found it enjoyable and useful!
@lisyarus Related talk by Rose Peck a.k.a. sixfold-origami at last year's RustWeek (not specific to Rust): https://www.youtube.com/watch?v=JY4nYnLrnt8
Probably no new insights there for you, but maybe for others in this thread, and very entertaining in any case.
Floating Point Hashing - Rose Peck