The non-problem of unsigned integers in Java


Java famously has no unsigned integer types^[1]. This is sometimes treated as a problem that needs a solution. Assuming that it is a problem, let's see some solutions.

Solution 1: use a wider type

[...] Although it is possible to get around this problem using conversion code and larger data types, it makes using Java cumbersome for handling unsigned data. While a 32-bit signed integer may be used to hold a 16-bit unsigned value losslessly, and a 64-bit signed integer a 32-bit unsigned integer, there is no larger type to hold a 64-bit unsigned integer. In all cases, the memory consumed may double, and typically any logic relying on two's complement overflow must be rewritten. [...]

Wikipedia: Criticism of Java

This is a legitimate approach, and not as cumbersome as the quote makes it sound. Adding two variables of type u32-in-long a and b would look like a + b. Wow, very cumbersome. But this is only representative of the "middle" of the solution: you may (and only "may") need the occasional x & 0xffffffffL to hammer the value back into the expected range, which I assume is what "logic relying on two's complement overflow must be rewritten" refers to^[2]. You can get away with using that very sparingly however, since having some garbage in the upper 32 bits of the long is mostly harmless: addition, subtraction, multiplication and left shifts never let the upper bits leak down into the lower 32. If you're paying attention, this is a hint that we mostly didn't need a long to begin with (carrying around garbage bits that don't contribute to the result doesn't sound terribly helpful), leading to the second solution.
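To make the "mask sparingly" point concrete, here is a minimal sketch of u32-in-long arithmetic. The class name U32InLong and the helper wrap are made up for illustration, they are not from any library. It shows one mask at the end being enough for a chain of additions and multiplications, and a right shift being the kind of operation where the mask has to come first.

```java
final class U32InLong {
    static final long MASK = 0xffffffffL;

    // Hypothetical helper: hammer a value back into the u32 range [0, 2^32).
    static long wrap(long x) {
        return x & MASK;
    }

    public static void main(String[] args) {
        long a = 0xffffffffL; // largest u32
        long b = 1L;

        long sum = a + b;              // the carry spills into bit 32
        System.out.println(sum);       // 4294967296
        System.out.println(wrap(sum)); // 0, same as uint32_t wraparound in C

        // Garbage above bit 31 is harmless for +, * and <<: mask once at the end.
        System.out.println(wrap(sum * 7 + 3)); // 3

        // It is not harmless for a right shift: the garbage moves down into the result.
        System.out.println(sum >>> 4);       // 268435456
        System.out.println(wrap(sum) >>> 4); // 0
    }
}
```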

Solution 2: don't use a wider type

[...] Alternatively, it is possible to use Java's signed integers to emulate unsigned integers of the same size, but this requires detailed knowledge of bitwise operations.[9] Some support for unsigned integer types was provided in JDK 8, but not for unsigned bytes and with no support in the Java language. [...]

Wikipedia: Criticism of Java

The level of "detailed knowledge" required for this approach is... actually that part of the quote may just be bogus. There aren't going to be many bitwise operations. The link in that quote [9] points to an article which doesn't discuss or use this approach; it uses a u32-in-long at one point, several u8-in-ints, and a char.

In this case, adding two variables of type u32-in-int a and b would look like a + b. Wow, detailed knowledge of bitwise operations required.
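As a small sketch of what that looks like in practice (the class name U32InInt is just for illustration): the addition is plain int addition, and the only place the unsigned interpretation shows up is when the value is printed or widened, using the JDK 8 methods Integer.toUnsignedString and Integer.toUnsignedLong.

```java
final class U32InInt {
    // Plain int addition; it wraps modulo 2^32 exactly like uint32_t addition in C.
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        int a = 2_000_000_000;
        int b = 2_000_000_000;
        int sum = add(a, b);

        System.out.println(sum);                           // -294967296 (signed view)
        System.out.println(Integer.toUnsignedString(sum)); // 4000000000 (unsigned view)
        System.out.println(Integer.toUnsignedLong(sum));   // 4000000000
    }
}
```

The bits in sum are the same either way. Only the conversion to text or to a wider type has to pick an interpretation.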

But showing only addition is unfair

For some operations there is a bit more going on. Conveniently, Java 8 added functions such as Integer.compareUnsigned and Integer.divideUnsigned (and corresponding functions in Long) to handle that for you, so again you don't need detailed knowledge of bitwise operations. Perhaps that quote predates Java 8. In that case, if you wanted to manually implement something like "is this long unsigned-less-than this other long", then finally you may need detailed knowledge of bitwise operations to be able to write:

    long m = 1L << 63;
    if ((a ^ m) < (b ^ m)) // a unsigned-less-than b

But instead of XOR you could have added or subtracted the offset (still the same m) that "rotates" the number line such that the low half of the unsigned range is mapped to the low half of the signed range, and the high half of the unsigned range is mapped to the high half of the signed range. That's a rotation by a half turn, so it doesn't matter which direction you do it in (you can add or subtract m) and it's also the same as swapping the halves (which is how you can interpret the XOR by m). So I'm still not sure where the quote thinks the bitwise operations come into play.
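Here is a sketch of both halves of that, under a made-up class name UnsignedCompare: first the Java 8 helpers doing the work on a u32-in-int, then the XOR, add and subtract spellings of the manual comparison, checked against Long.compareUnsigned on a few edge cases. The method names lessXor, lessAdd and lessSub are mine, not library functions.

```java
final class UnsignedCompare {
    static final long M = 1L << 63; // same bit pattern as Long.MIN_VALUE

    // Three spellings of "a unsigned-less-than b". Each one flips the top bit of both operands.
    static boolean lessXor(long a, long b) { return (a ^ M) < (b ^ M); }
    static boolean lessAdd(long a, long b) { return (a + M) < (b + M); }
    static boolean lessSub(long a, long b) { return (a - M) < (b - M); }

    public static void main(String[] args) {
        // 0xfffffffe is 4294967294 as a u32 and -2 as a signed int.
        int big = 0xfffffffe;
        System.out.println(big < 3);                             // true (signed view)
        System.out.println(Integer.compareUnsigned(big, 3) < 0); // false (unsigned view)
        System.out.println(Integer.divideUnsigned(big, 3));      // 1431655764
        System.out.println(Integer.remainderUnsigned(big, 3));   // 2

        long[] samples = { 0L, 1L, Long.MAX_VALUE, Long.MIN_VALUE, -2L, -1L };
        for (long a : samples) {
            for (long b : samples) {
                boolean expected = Long.compareUnsigned(a, b) < 0;
                if (lessXor(a, b) != expected || lessAdd(a, b) != expected || lessSub(a, b) != expected) {
                    throw new AssertionError(a + " vs " + b);
                }
            }
        }
        System.out.println("XOR, add and subtract all agree with Long.compareUnsigned");
    }
}
```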

Anyway, let's see an example with different kinds of operations. Picking an example is tricky, but let's go with this:

    uint32_t murmur_32_scramble(uint32_t k) {
        k *= 0xcc9e2d51;
        k = (k << 15) | (k >> 17);
        k *= 0x1b873593;
        return k;
    }

Using u32-in-long, we could do this:

    static long murmur_32_scramble(long k) {
        k *= 0xcc9e2d51;
        k = (k << 15) | ((k & 0xffffffffL) >> 17);
        k *= 0x1b873593;
        return k & 0xffffffffL;
    }

The final & 0xffffffffL on the return value is not necessary if the caller of this function can be relied upon to ignore the upper 32 bits. Anyway, two extra AND operations: it's not that cumbersome. You do need to know where to put the extra AND operations though. That's not hard, but it relies on a human not making mistakes, which is always a risk, especially if the human in question is a programmer. For that reason I'd say that it's the u32-in-long approach that requires knowledge of bitwise operations, more so than the u32-in-int approach. Specifically, it requires knowing when you need them, which is more about bit-level knowledge of arithmetic operations. Alternatively you could spam & 0xffffffffL at every opportunity "just in case" as a safety net, but at that point u32-in-long starts to look pretty silly compared to u32-in-int, which for comparison could look like:

    static int murmur_32_scramble(int k) {
        k *= 0xcc9e2d51;
        k = (k << 15) | (k >>> 17);
        k *= 0x1b873593;
        return k;
    }

Not much to see here: I only needed to change >> to >>>. Which is obvious. If we're doing unsigned arithmetic, the right shift had better be unsigned. This was just a one-to-one transliteration of the C code to Java; the two shifts and the OR form a rotate, which could be spelled out directly:

    static int murmur_32_scramble(int k) {
        k *= 0xcc9e2d51;
        k = Integer.rotateLeft(k, 15);
        k *= 0x1b873593;
        return k;
    }
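If you don't want to take my word for it that these are all the same function, a quick sanity check could look like the sketch below. The three variants are copied from above under different names (scrambleLong, scrambleShifts, scrambleRotate, all invented here so they can live in one class), and agree is a hypothetical helper. It also feeds the u32-in-long variant an input with deliberate garbage in the upper 32 bits, to show that the two masks are enough to keep it out of the result.

```java
final class ScrambleCheck {
    // u32-in-long: mask before the right shift and on return.
    static long scrambleLong(long k) {
        k *= 0xcc9e2d51;
        k = (k << 15) | ((k & 0xffffffffL) >> 17);
        k *= 0x1b873593;
        return k & 0xffffffffL;
    }

    // u32-in-int, transliterated from the C code with >>> for the unsigned shift.
    static int scrambleShifts(int k) {
        k *= 0xcc9e2d51;
        k = (k << 15) | (k >>> 17);
        k *= 0x1b873593;
        return k;
    }

    // u32-in-int with the rotate spelled out directly.
    static int scrambleRotate(int k) {
        k *= 0xcc9e2d51;
        k = Integer.rotateLeft(k, 15);
        k *= 0x1b873593;
        return k;
    }

    // True when all variants produce the same 32-bit pattern for input k.
    static boolean agree(int k) {
        long expected = Integer.toUnsignedLong(scrambleShifts(k));
        long clean = Integer.toUnsignedLong(k);   // upper 32 bits are zero
        long dirty = clean | 0xdeadbeef00000000L; // same u32, garbage above it
        return scrambleRotate(k) == scrambleShifts(k)
            && scrambleLong(clean) == expected
            && scrambleLong(dirty) == expected;
    }

    public static void main(String[] args) {
        int[] samples = { 0, 1, -1, 0x12345678, 0x80000000, 0x7fffffff, 0xcafebabe };
        for (int k : samples) {
            if (!agree(k)) {
                throw new AssertionError(Integer.toHexString(k));
            }
        }
        System.out.println("all variants agree");
    }
}
```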

Opinions

In general I've found that u32-in-int is quite nice actually, perhaps preferable to having to no-op-cast between signed and unsigned integers of the same size as C# and C++ make me do. To the point that I don't consider it a solution to a problem: it's a nice and natural way to work with integers (signed or unsigned, which is often a non-distinction) to begin with. The occasional call of a library function such as Integer.divideUnsigned to replace an operator is the worst part, which is not that bad and also not that common. Similarly, u64-in-long is mostly a natural way to work with 64-bit integers, and it doesn't have a credible alternative. There is BigInteger, but now that is cumbersome: there is no operator overloading, and we cannot be quite as lazy about clearing the high bits, because a BigInteger can otherwise hoard a problematic amount of garbage in the bits that we're going to ignore in the end.
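For a rough side-by-side of u64-in-long and BigInteger, here is a sketch that computes a * b + c modulo 2^64 both ways. The class name U64InLong, the constant MASK64 and the two mulAdd methods are all invented for this comparison. The long version is a single expression, while the BigInteger version needs method calls plus an explicit mask to throw away the bits above bit 63.

```java
import java.math.BigInteger;

final class U64InLong {
    // 2^64 - 1, used to cut a BigInteger result back down to 64 bits.
    static final BigInteger MASK64 = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    // u64-in-long: the hardware wraps modulo 2^64 for free.
    static long mulAdd(long a, long b, long c) {
        return a * b + c;
    }

    // BigInteger: no operators, and the product would grow to 128 bits without the mask.
    static BigInteger mulAddBig(BigInteger a, BigInteger b, BigInteger c) {
        return a.multiply(b).add(c).and(MASK64);
    }

    public static void main(String[] args) {
        long max = -1L; // bit pattern of 2^64 - 1
        System.out.println(Long.toUnsignedString(max));   // 18446744073709551615
        System.out.println(Long.divideUnsigned(max, 10)); // 1844674407370955161

        System.out.println(mulAdd(max, max, 7));                          // 8
        System.out.println(mulAddBig(MASK64, MASK64, BigInteger.valueOf(7))); // 8
    }
}
```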

To be fair there are at least two real downsides to not having unsigned types:

An integer type by itself does not describe the interpretation of that integer, only its size, and the "default interpretation" that all integers are signed can be misleading. This is especially an issue at interface boundaries.
Conversions to a wider integer type sign-extend, changing the bit-pattern you're working with, even implicitly. Java code that works with bytes tends to spam & 0xff all over the place to counteract this.
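The second downside is easy to run into when assembling a wider value from bytes. The sketch below uses invented names (ByteWidening, u16Wrong, u16Right) to show the implicit sign extension and the & 0xff that counteracts it. Byte.toUnsignedInt is the JDK 8 method that does the same thing as the mask.

```java
final class ByteWidening {
    // Broken: lo is sign-extended to int before the OR, smearing ones over the high bits.
    static int u16Wrong(byte hi, byte lo) {
        return (hi << 8) | lo;
    }

    // Fixed: mask each byte back to its 8-bit pattern after the implicit widening.
    static int u16Right(byte hi, byte lo) {
        return ((hi & 0xff) << 8) | (lo & 0xff);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xf0;
        int widened = b;                           // implicit conversion sign-extends
        System.out.println(widened);               // -16
        System.out.println(b & 0xff);              // 240
        System.out.println(Byte.toUnsignedInt(b)); // 240

        byte hi = 0x12;
        byte lo = (byte) 0xf0;
        System.out.println(u16Wrong(hi, lo));      // -16
        System.out.println(u16Right(hi, lo));      // 4848, i.e. 0x12f0
    }
}
```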