Mastodawn

we talked last week about what can go wrong with floating point numbers, so -- what can go wrong when using integers?

so far I have:

* 32 bit integers are smaller than you think (they only go up to 4 billion!)
* overflow
* sometimes you need to switch the byte order
* ?? (maybe something about shift / bitwise operations? not sure what can go wrong with that exactly)

I'd especially love real-world examples of things that have gone wrong, if you have them!

Show thread

lo-fi socialist junk Jan 16, 2023

@b0rk sign preserving vs non-preserving shifts with signed ints is always fun.

Show thread

Reed Mideke Jan 16, 2023

@b0rk For the last one, sign extension can bite you

Show thread

Julia Evans Jan 16, 2023

@reedmideke ooh do you have an example of how this has happened to you by any chance?

Show thread

Ash Jan 16, 2023

@b0rk @reedmideke using an unsigned int in a for loop can lead to an infinite loop (while x > 0). Also, the differences between how programming languages support these. Java doesn't have unsigned int.

Show thread

deadalnix Jan 16, 2023

@b0rk @reedmideke I ran into a compiler bug about them last week (was doing a signed shift when it should have done a logical one). Some hair pulling ensued.

Show thread

Reed Mideke Jan 16, 2023

@b0rk Hmm, don't remember a specific case, though I'm pretty sure I've messed it up. Sticks in my mind because assembler usually has two different right shift instructions, for sign extending or not, and if using an unfamiliar language, I have to figure out which it does.

As https://learn.microsoft.com/en-us/cpp/cpp/left-shift-and-right-shift-operators-input-and-output?view=msvc-170 notes it's actually implementation defined in C/C++ whether right shift sign extends 😱 though AFAIK normal platforms do

Left shift and right shift operators ('<<' and '>>')

Learn more about: Left shift and right shift operators ('<<' and '>>')

Show thread

George Dinwiddie Jan 16, 2023

@b0rk
* underflow
* signed and unsigned may act differently
* loss of precision if the order of operations decreases the number of bits in the the intermediate results

Show thread

Julia Evans Jan 16, 2023

@gdinwiddie can you say more about "signed and unsigned may act differently"? (how?)

Show thread

George Dinwiddie Jan 16, 2023

@b0rk
It's been a long time since I've worked at that level, but signed integers typically presume a twos-complement representation; unsigned not. That can affect shift operations, arithmetic between operands of different types, and probably other things, depending on how they're implemented.

If you divide by two by shifting right, you want to shift in a zero for unsigned, and repeat the sign bit for signed, for example.

Show thread

Julia Evans Jan 16, 2023

@gdinwiddie why would you shift an signed integer? i always assumed shift was only used on unsigned integers

Show thread

deadalnix Jan 16, 2023

@b0rk @gdinwiddie say you want a signed 31 bit integer and a bool packed into 32 bits.
int x; // the packed 32bits.
x & 0x01; // the bool.
x >> 1; // the int.

More generally, you want that to optimize, for instance, division by powers of two.

Show thread

Dan Lyke Jan 16, 2023

@b0rk @gdinwiddie shift is a great cheap divide by 2^n ('cause divides can be *expensive*), but... this in test.c:

#include <stdio.h>
int main(int argc, char **argv) {
printf("%d\n", -4 >> 1);
return 0;
}

$ gcc -o test test.c ; ./test
-2

And Node gives similar results:
$ node
Welcome to Node.js v16.15.0.
Type ".help" for more information.
> -4 >> 1
-2

But Perl....:
$ perl -le 'print (-4 >> 1);'
9223372036854775806

Show thread

Till O'Rly

Jan 17, 2023

@danlyke @b0rk @gdinwiddie

To be fair. That's just pearl doing pearl things.

Show thread

Dan Lyke Jan 17, 2023

@b0rk @gdinwiddie fair. I kind of assumed it was 64 bit losing the top bit, but above 32 bits the numbers aren't familiar to me, so it could be 53 bits or whatever the "overlay an int and a double on the same space" encoding ends up being. And I'm sure I've worked on architectures where the carry flag or zero of something got shifted into the high bit, so ya had to be careful.

Show thread

John McFarlane Jan 17, 2023

@danlyke @b0rk @gdinwiddie then compare -3/2 to -3>>1. Effectively, the rounding may vary. This is a pitfall of underflow.

Show thread

thinkbeforecoding Jan 16, 2023

@b0rk @gdinwiddie arithmetic right shift can divide sign numbers by 2 and keep sign. Left shift multiply by 2 with no prb.

Show thread

thinkbeforecoding Jan 16, 2023

@b0rk @gdinwiddie arithmetic right shift ire-nsert the sign bit. While logical right shift always insert a 0.

Show thread

Henk Langeveld Jan 16, 2023

@b0rk @gdinwiddie

I recall that the 6502 instruction set has two types of bit shift operators. A rotate, or a pure shift. One just recycles all bits, the other shifts in a zero every time.

Show thread

Henk Langeveld Jan 16, 2023

@b0rk @gdinwiddie

Nice. A Google site has an overview.

https://sites.google.com/site/6502asembly/6502-instruction-set/6502-instruction-set-bit-manipulation

6502 Bit Manipulation Operations - 6502 Assembly

Show thread

George Dinwiddie Jan 16, 2023

@hlangeveld @b0rk
I never programmed the 6502, and find it odd they don't have an ASR (arithmetic shift right) which keeps the sign bit. That's what you need for 2's complement division by powers of 2.

Compare with the 6800 instruction set: http://www.8bit-era.cz/6800.html#ASR

6800 instruction set

MC6800 decoding table, full instruction set with description. Addition to my web based MC6800 assembler.

8bit-era.cz

Show thread

Henk Langeveld Jan 16, 2023

@gdinwiddie @b0rk

Fair.

The 6502 was a more cost-effective chip compared to the 6800. I don't think integer division was a design target. Low cycle count operations were a strong objective.

Show thread

bazzargh Jan 16, 2023

@b0rk @gdinwiddie one that has been biting me since the very first time I tried to use Java is that bytes in java are signed. So 0xFF is a negative number, and when you do a right shift with >> it gets promoted to a _negative_ int, then masking back with & 0xFF to get the byte result is incorrect

byte b = (byte) 0xFF;
System.out.println(b); // -1
S.o.println(b >> 1); // -1
S.o.println(b >>> 1); // 2147483647
S.o.println((0xFF & b) >> 1); // 127

Show thread

Julia Evans Jan 16, 2023

@bazzargh @gdinwiddie one big takeaway for me from this thread is that java doesn't have unsigned integers!

Show thread

bazzargh Jan 16, 2023

@b0rk @gdinwiddie it's not something that normally bothered me but the very first thing I tried to write was a Quake bot, and I needed to pull bytes out of the network packets...boom

Show thread

Bill Ricker Jan 18, 2023

@b0rk @bazzargh @gdinwiddie

i was decidedly nervous when we were allowing a major contractor to reimplement a key accounting application in server-side Java.

In theory we had enough bits ...

(Alas IBM JVM didn't support the AIX Floating Decimal in firmware option, that was only for COBOL and RPG.)

Show thread

deadalnix Jan 16, 2023

@b0rk @gdinwiddie say you write the following bound check:
if (x < length - 1) { ... }

If you are using usigned, and if legth is zero, then the bound check is busted. This is a common error, and is very likely to pass code review.

Show thread

antipode77 Jan 16, 2023

@b0rk

How about adding signed versus unsigned.

Show thread

Rachel Greenham Jan 16, 2023

@b0rk they only go up to four billion when they're unsigned. 😉

Show thread

Drezil

Jan 16, 2023

@b0rk bigendian/littleendian when serializing/deserializing.

Show thread

Sami Samhuri Jan 16, 2023

@b0rk If you start looking at the bits of integers then you need to know about twos-complement or you might be surprised.

Show thread

bryan newbold Jan 16, 2023

@b0rk problems/mistakes when casting (eg, truncation or signed/unsigned). not handling signed ints correctly when doing bit manipulation (eg, masking bits for truncation)

Show thread

Julia Evans Jan 16, 2023

@bnewbold are there good reasons to do bit manipulation on signed integers? (what are they?)

Show thread

bryan newbold Jan 16, 2023

@b0rk I don't have a ref to a real example off the top of my head, but as a possible example, constructing a UDP datagram header length field.
start accidentally with a default 32-bit int and sum bytes. payload is huge and you roll over 2 billion; undefined but might appear negative. verify that payload is less than 2^16 (it is, because negative), then bit mask low 16 bits and shift in to header alignment

Show thread

bryan newbold Jan 16, 2023

@b0rk I think the more common issue is that one thinks they are manipulating an unsigned int, or assuming that a signed value is safely in positive range, but it isn't

Show thread

Julia Evans Jan 16, 2023

@bnewbold yeah that makes sense. someone else mentioned that some language don’t have unsigned ints

Show thread

bryan newbold Jan 16, 2023

@b0rk the most common times I end up doing bitwise "mask and shift" on integers are protocol headers, binary file formats, and embedded hardware register interaction, in C. but then sometimes I do a hack in python and forget when integers are signed/unsigned, etc, and and up using a library like 'struct' instead of figuring out how builtin types work

Show thread

doragasu Jan 16, 2023

@b0rk @bnewbold One of them is fast multiplication/division by powers of two.

Show thread

Nick Barnes Jan 16, 2023

@b0rk talk to John Regehr!

Show thread

∴ esoterik ∴Jan 16, 2023

@b0rk
- unexpected promotion of byte/short to int values (changes bitwise ops when negative)
- getting floor division when you expected true division or vice versa (esp in e.g. python)
- floor division behaves "differently" for negative versus positive integers, so -(x/y) is not (-x)/y.
- division by zero throws an error (unlike floating point)
- negating an integer doesn't always produce a non-negative integer (two's complement means there is one more negative than positive value)

Show thread

Micah Walter Jan 16, 2023

@b0rk they don’t all divide evenly / some of them are more special than others

Show thread

nelhage Jan 16, 2023

@b0rk This is mostly specific to C/C++, but in those languages signed overflow is undefined behavior, so simple addition can misbehave in all kinds of ways sometimes if the numbers get too big or too small.

Show thread

Julia Evans Jan 16, 2023

@nelhage i didn’t realize it was undefined behaviour!

Show thread

Graham Sutherland / Polynomial Jan 16, 2023

@b0rk @nelhage it really shouldn't be, given that the practical behaviour is identical on 99% of target ISAs, but here we are.

this undefined status also causes some pretty horrible nightmares with overflow checks being optimised out by the compiler, since to a lot of compiler devs "undefined" means "an excuse for an optimiser pass"

Show thread

Julia Evans Jan 16, 2023

@gsuberland @nelhage oof

Show thread

PulkoMandy Jan 16, 2023

@b0rk @gsuberland @nelhage this is better in C23, all implementations are now required to use two complement for signed integers, so there is a lot less things left undefined now

Show thread

Nathaniel J. Smith Jan 17, 2023

@gsuberland @b0rk @nelhage It's more complex than just how it behaves on target ISAs. If signed-overflow is defined, then the compiler can't tell whether

for (int i = 0; i < max_count; i++) { ... }

will terminate (because i++ might make i smaller!), which blocks certain optimizations. So defining signed-overflow would slow down lots of programs that never overflow (!)

It's a nasty mess with no good way out, like many aspects of C/C++.

Show thread

Thomas Leeper Jan 16, 2023

@b0rk operations on combinations of floats and integers can be surprising if you're expecting integers to be implicitly coerced to float, or vice versa.

Show thread

Clark Breyman (he/him)Jan 16, 2023

@b0rk - Sometimes folks will BCD inside. Sometimes that BCD is a date. https://www.neowin.net/news/y2k22-bug-microsoft-rings-in-the-new-year-by-breaking-exchange-servers-all-around-the-world/

Y2K22 bug: Microsoft rings in the new year by breaking Exchange servers all around the world

In order to resume processing of mail, sysadmins are disabling malware scanning on their exchange servers, leaving their users, and possibly the servers themselves, vulnerable to attack.

Neowin

Show thread

Julia Evans Jan 16, 2023

@clark TIL BCD

Show thread

Clark Breyman (he/him)Jan 16, 2023

@b0rk - I thought I knew it until I looked the Wikipedia article. So many encoding variants. Makes my head hurt.
https://en.wikipedia.org/wiki/Binary-coded_decimal

Binary-coded decimal - Wikipedia

Show thread

Farce Majeure Jan 16, 2023

@clark @b0rk the most amazing about BCD is the 8087 floating point unit had it as a native format!

Show thread

tef Jan 16, 2023

@b0rk this is the classic real world example of integer overlow https://ai.googleblog.com/2006/06/extra-extra-read-all-about-it-nearly.html

Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken

Show thread

tef Jan 16, 2023

@b0rk on a similar note, there's an hour long talk on "why implementing std::midpoint()" is very tricky https://www.youtube.com/watch?v=sBtAGxBh-XI

CppCon 2019: Marshall Clow “std::midpoint? How Hard Could it Be?”

YouTube

Show thread

tef Jan 16, 2023

@b0rk the only thing that comes up for me regularly is that modulus doesn't work the same way across languages, when you ask for the mod of a negative number

which leads to "what the hell is even happening" type bugs

Show thread

tef Jan 16, 2023

@b0rk or how floor division can trip people up with negative numbers, as it rounds to the smaller number

>>> -3.5 //2
-2.0
>>> 3.5 //2
1.0