In my previous article, I talked about some of my work on fixing bugs in the GCC-based Rust compiler backend.
rustc_codegen_gcc is, in layman's terms, nothing more than a Rust compiler plug-in, allowing us to use GCC instead of LLVM.
There is nothing more important for a compiler than correctness. A buggy, non-standard compiler is nothing short of a nightmare. Imagine a world in which a correct program proudly proclaims that infinity is less than 0. The horror!
Finding compiler bugs is not exactly easy - in this article, I explain some of my work on finding bugs in cg_gcc.
The main problem we face is quite simple: compilers are very big. There are a lot of moving parts, and they interact in non-obvious ways.
The code you write goes through multiple processing stages, all of which can break. Each tiny piece of syntax, each type and operation needs to be tested - and tested extensively.
Moreover, we can't test those things in isolation - bugs often arise from complex, hard-to-understand combinations of features.
One of the bugs I (spoilers!) found required some pretty absurd control flow.
```rust
pub fn fn2(mut arg: u8) -> isize {
    let mut mut_int = 0;
    let mut adt1: Adt1 = Default::default();
    loop {
        let mut tmp = !mut_int;
        match arg {
            0 => return 0,
            1 => {
                mut_int = 1;
                adt1.a.fld1 = tmp as u8;
                fn3(adt1.a, adt1);
            }
            _ => return 0,
        }
        // Needed to prevent an earlier opt pass from seeing arg does not change(?).
        core::hint::black_box(());
        arg = arg & arg;
        match arg {
            1 => return 0,
            _ => continue,
        }
    }
}

#[inline(never)]
pub fn fn3(arg1: Adt38, arg2: Adt1) {
    dump_var(!(arg1.fld1) as isize);
}
```

If you remove any part of this sample, it will no longer get miscompiled by cg_gcc.
This is as simple as I could get it, after a lot of tries. If you think about this code for a second, it is kind of pointless.
Think about the loop I have shown here.
If arg is not 1, then we return in the first match (the Rust equivalent of a switch):
```rust
match arg {
    0 => return 0,
    1 => {
        mut_int = 1;
        adt1.a.fld1 = tmp as u8;
        fn3(adt1.a, adt1);
    }
    _ => return 0,
}
```

So, after this point, we know that arg is 1.
At the very end of the loop body, we do this:
```rust
arg = arg & arg;
match arg {
    1 => return 0,
    _ => continue,
}
```

Since we know arg to be 1 at this point... this loop can never actually loop - we just always return 0 in the first iteration!
Still, that admittedly useless loop is needed to trigger the bug. Bizarre.
```rust
// LLVM:             !(arg1.fld1) as isize = 0
// GCC without opts: !(arg1.fld1) as isize = 0
// GCC with opts:    !(arg1.fld1) as isize = -256
```

No real person would write code like this. But... this is a sign of a larger problem that could affect real code. We can't have that.
Now, think about how you could test for something like this. How can we detect a compiler bug - especially one that requires so many moving parts to fall into place?
The best solution seems to be, for better or worse, throwing more compute at the problem. We need to fuzz.
The best way to test a compiler is... to compile a whole bunch of things. Who would have thunk.
Now, I could try generating random Rust code myself, but doing so in an efficient and correct way is surprisingly difficult. So, I will use an off-the-shelf tool called rustlantis.
If you want to hear it from the horse's mouth, here is a paper about it. It is an interesting read, and I strongly recommend you take a peek.
Still, what matters for our purposes is that rustlantis generates something called "MIR".
MIR
You can think of MIR as a simplified representation of Rust. The Rust compiler frontend maps all the complex features of Rust (iterators, async, borrows) down to simple MIR ops: copying data, arithmetic, function calls, and so on.
So, all the complex features of Rust get boiled away into simple, primitive operations.
```rust
use core::intrinsics::mir::*;

// The Rust compiler can parse MIR from Rust files
// Neat :)
#[custom_mir(dialect = "built")]
pub fn simple(x: i32) -> i32 {
    mir! {
        let temp2: i32;
        {
            let temp1 = x + x;
            Goto(my_second_block)
        }
        my_second_block = {
            temp2 = temp1 * temp1; // only one op per line is allowed!
            RET = temp2;
            Return()
        }
    }
}
```

Each compiler backend (cg_llvm, cg_gcc) is fed this simplified representation of Rust.
This is perfect for our purposes: we only care about this very last part of the compilation process. This is the part we (as in, the GCC-based backend) are responsible for - so the more we test it, the better.
The code generated by rustlantis is also fully deterministic, and well defined. So, if we compile code with LLVM (in debug mode) and with GCC (with optimizations), the results MUST match up. Otherwise, we have a bug.
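That compare-the-backends check is the heart of the whole fuzzer. A minimal sketch of it might look like this - note that the binary paths and the error handling here are made up for illustration, not taken from the actual fuzzer:

```rust
use std::process::Command;

/// Runs a compiled test binary and captures its stdout.
fn run_and_capture(path: &str) -> Option<String> {
    let out = Command::new(path).output().ok()?;
    Some(String::from_utf8_lossy(&out.stdout).into_owned())
}

/// Any divergence between the backends signals a bug,
/// as long as the input program is deterministic and UB-free.
fn outputs_diverge(llvm_out: &str, gcc_out: &str) -> bool {
    llvm_out != gcc_out
}

fn main() {
    // Hypothetical paths: the same rustlantis program, compiled by each backend.
    let llvm = run_and_capture("./sample_llvm_debug").unwrap_or_default();
    let gcc = run_and_capture("./sample_gcc_opt").unwrap_or_default();
    if outputs_diverge(&llvm, &gcc) {
        eprintln!("backend disagreement - possible miscompilation!");
    }
}
```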
rustlantis's output is also quite odd. It offsets pointers, performs odd casts, transmutes types, calls a whole lot of intrinsics. In simpler words, it uses a lot of parts of the compiler. Not all of them, but still more than enough for our purposes.
```rust
(*_1) = -RET;
_1 = core::ptr::addr_of_mut!(_6);
_4 = !_9;
_17.1 = core::ptr::addr_of_mut!(_7);
_15 = _12.2 << _8;
_16 = (-3256106608463376368_i64) as f64;
_8 = RET >> _3;
```

The tool is also decently fast - on my laptop, I could achieve a fuzz rate of 1.7 files (each file 10K+ lines long) per second.
So, in an hour, we can fuzz about 61 million lines of code. If we throw this many things at the wall, something is bound to stick.
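That figure is just simple arithmetic:

```rust
fn main() {
    // 1.7 files per second, each file ~10 000 lines long.
    let lines_per_second = 17_000_u64;
    let lines_per_hour = lines_per_second * 3600;
    assert_eq!(lines_per_hour, 61_200_000); // ~61 million lines per hour
}
```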
And, we got a bug. In fact, we got quite a few bugs!
So, we got a bug - now, all that remains is to fix it, right?!
Well, hold your horses.
We did not really find a bug - we found a 10K+ LOC file, with some code that triggers the bug in it.
It is like finding a needle in a haystack - knowing that the needle is in this haystack does not help all that much.
Ideally, we'd like a simple one-function file, containing a few lines of code, needed to trigger the bug.
Getting such a smaller bug reproducer requires a process called minimization.
Naive approach
The simplest, most brain-dead solution to this problem is to... try removing lines of code, and check if the bug is still present.
```rust
(*_1) = -RET;
_1 = core::ptr::addr_of_mut!(_6);
_4 = !_9;
// Is this needed to trigger the bug?
// _17.1 = core::ptr::addr_of_mut!(_7);
_15 = _12.2 << _8;
_16 = (-3256106608463376368_i64) as f64;
_8 = RET >> _3;
```

This would not really work, for a couple of reasons.
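Still, for reference, the naive reducer fits in a few lines. In this sketch, the hypothetical `bug_still_present` oracle stands in for the whole "compile with both backends, run, compare outputs" step:

```rust
/// Tries deleting one line at a time, keeping a deletion whenever the
/// oracle still reports the bug. `bug_still_present` stands in for
/// "compile with both backends, run, and compare the outputs".
fn naive_reduce(lines: &[&str], bug_still_present: impl Fn(&[&str]) -> bool) -> Vec<String> {
    let mut kept: Vec<&str> = lines.to_vec();
    let mut i = 0;
    while i < kept.len() {
        let mut candidate = kept.clone();
        candidate.remove(i);
        if bug_still_present(&candidate) {
            kept = candidate; // the line was not needed - drop it for good
        } else {
            i += 1; // the line is needed - keep it, move on
        }
    }
    kept.into_iter().map(String::from).collect()
}
```

With a toy oracle that claims the bug needs lines "a" and "c", this turns `["a", "b", "c", "d"]` into `["a", "c"]` - at the cost of one oracle call per line.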
UB
Remember when I said this:
> The code generated by rustlantis is also fully deterministic, and well defined.
That is true. But... removing random lines of code can lead to non-determinism, or worse, UB.
Suppose we have code like this:
```c
int index = rand();
index = index % sizeof(arr); // Ensure index is within bounds!
return arr[index];
```

This is perfectly fine, albeit odd, code.
What happens when we comment out the second line?
```c
int index = rand();
return arr[index];
```

Yeah... we are going to be reading out of bounds - which is UB in C. This program will compile, and LLVM and GCC will disagree about the result... but that does not mean the compilers are buggy - our code is.
Now, some of you may say: hang on a minute, you can't trigger UB in safe Rust! The compiler won't let you do that!
MIR is not safe Rust - in fact, it is not Rust at all. Remember, MIR is what the compiler produces after borrow checking, and a bunch of other checks.
All MIR is, in a sense, implicitly unsafe. The compiler does not check it, because it is not supposed to be written by humans.
So yeah, you can trigger UB in MIR, and you can do so with ease :(.
This means that we need to somehow ensure our code contains no UB.
MIRI
`miri` is a handy little tool that can catch the vast majority of UB in any unsafe Rust code. As the name implies, it works with MIR - so we can use it to check if our changes introduce any UB.
This should work out fine in most if not all cases... with some caveats.
miri is slow.
Speed
Running our 10K+ line sample program under miri takes 3-4 seconds.
So, checking every line in our program will take at least 4 s × 10 000 = 40 000 s ≈ eleven hours. Per *one* pass of a file. And we *do* need more than one.
Consider this:
```rust
let _3: f32;
_3 = 0.0;
```

We can't remove the declaration of the variable _3 until we remove the assignment to it. So, to remove this, we'd need at least 2 passes.
With code like this:
```rust
_1 = 0.0;
_2 = _1 * 4.533;
_3 = _2 + 1.05;
_5 = _3 / _1;
```

We'd need to remove _5, then _3, then _2, and then _1. Ouch.
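Because of such dependency chains, any line-removal pass has to be rerun until it stops making progress - a fixed point. A sketch, where `pass` is any hypothetical transformation that returns a smaller program on success:

```rust
/// Reruns a single reduction pass until it stops making progress.
/// `pass` returns a smaller program when it managed to remove
/// something, or `None` once it is stuck.
fn reduce_to_fixpoint(
    mut program: Vec<String>,
    pass: impl Fn(&[String]) -> Option<Vec<String>>,
) -> Vec<String> {
    while let Some(smaller) = pass(&program) {
        program = smaller;
    }
    program
}
```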
Now, this process can be sped up by parallelizing some of those checks. But, even with 20 threads all running this naive reducer, it would still take a considerable amount of time.
Not to mention the heat - I don't want a noisy space heater in the middle of summer.
There are better ways to do this.
Your first thought might be to use a tool like creduce - something designed to more smartly reduce compiler bug samples.
creduce does a pretty good job at this task - it is smart enough to remove multiple things at the same time. However, it struggles with some MIR syntax.
I have seen it leave basic blocks like this:
```rust
// Unreachable block
bb11 = {
    Goto(bb12)
}
```

While we can see that this block of code can be safely removed, creduce does not understand MIR syntax, so it can't really see that this is a basic block.
In my experience, creduce reduces Rust code really well, but struggles with more convoluted MIR.
Could we exploit our knowledge of MIR syntax to speed this process up?
The first question we ought to ask ourselves is: what is the most expensive part of our MIR samples?
Usually, it is the call to the function dump_var. This function is responsible for checking for differences between LLVM and GCC executions.
This function either hashes its arguments, or prints them as strings.
```rust
#[inline(never)]
fn dump_var(
    f: usize,
    var0: usize, val0: impl Debug,
    var1: usize, val1: impl Debug,
    var2: usize, val2: impl Debug,
    var3: usize, val3: impl Debug,
) {
    println!("fn{f}:_{var0} = {val0:?}\n_{var1} = {val1:?}\n_{var2} = {val2:?}\n_{var3} = {val3:?}");
}
```

So, it tends to be expensive.
Most calls to dump_var are not needed: we are interested only in the call that prints the problematic value.
```rust
dump_var(fn_id, var_id, buggy_var, ...); // Keep this
dump_var(fn_id, var_id, ok_var, ...);    // Discard this
```

So, we can first try removing all the calls to dump_var, and only keep the one call we need. This alone shaved a second or two off my execution time.
But... we can do better.
Removing function calls
Suppose we have some code like this:
```rust
// Triggers a compiler bug, and behaves incorrectly under GCC.
let bug_var = buggy_fn();
let ok_var = ok_fn();
dump_var(fn_id, var_id, buggy_var, ...);
```

The call to ok_fn is not needed to trigger this bug. Ideally, we'd like to remove this call: it only clutters the code, and wastes CPU cycles.
Since functions can execute a lot of code, and call other functions, removing function calls ought to be our priority.
A function call in MIR looks something like this:
```rust
Call(_88 = fn19(_31), ReturnTo(bb32), UnwindUnreachable())
```

Here, we call fn19 with _31 as its only argument, and then we jump to the basic block (part of the function) bb32.
What is interesting about this is that we can remove function calls in a way that is very unlikely to cause UB.
You see, when rustlantis generates function calls, it ensures that the function arguments are all initialized.
So, we know rustlantis won't generate a function like this:
```rust
fn init_arg(arg: *mut u32) {
    // This is needed - otherwise, we have UB.
    *arg = 0;
}
```

This means that replacing a function call with an assignment and a jump will not introduce UB into the program we are minimizing.
```rust
_88 = 0;
Goto(bb32)
```

After this transformation, we have a slightly different, but still well-defined program.
If this program still has the bug we are looking for, we can start minimizing that simpler piece of code instead.
This allows us to remove a lot of code quickly, speeding testing with miri noticeably.
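As a rough sketch, this rewrite can even be done textually, assuming the exact `Call(...)` line shape shown above (the real minimizer would have to be more careful, e.g. about picking a dummy value of the right type for the destination):

```rust
/// Rewrites `Call(_88 = fn19(_31), ReturnTo(bb32), UnwindUnreachable())`
/// into `_88 = 0; Goto(bb32)`. Purely textual - assumes the exact
/// line shape shown in the article, and an integer destination.
fn call_to_assignment(line: &str) -> Option<String> {
    let inner = line.trim().strip_prefix("Call(")?;
    // The destination place, e.g. `_88`.
    let dest = inner.split('=').next()?.trim();
    // The block the call returns to, e.g. `bb32`.
    let target_start = inner.find("ReturnTo(")? + "ReturnTo(".len();
    let target_end = target_start + inner[target_start..].find(')')?;
    let target = &inner[target_start..target_end];
    Some(format!("{dest} = 0; Goto({target})"))
}
```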
Removing functions
After this step, we know that we are very likely to have some dead code. So, as a next pass, I automatically try to remove each function present in the program.
This tends to very quickly simplify the program by a lot.
```rust
// We have removed the only call to `fn19` - it is no longer needed :(.
fn fn19(arg: u32) -> u32 { /**/ }
```

Abusing matches
If you dig into the rustlantis paper, and the source code of that tool, you might discover some interesting things.
Despite inserting things like matches and Gotos... the code generated by rustlantis does not actually branch all that much. In reality, most of those are "decoys" - things rustlantis knows will never be executed, but the compiler may not realize that. They are there mostly to confuse the hell out of the compiler, hoping to trip it up and trigger a bug.
We can abuse some facts about how rustlantis generates code. In a match, the generated code will almost always jump to the target of the second-to-last arm... in the absence of compiler bugs, that is.
```rust
// This is **NOT** a Rust match - MIR matches jump to the block selected by the arm!
match _56.3 {
    0 => bb20,                   // Will never jump to bb20
    1 => bb33,                   // Will never jump to bb33
    2 => bb34,                   // Will never jump to bb34
    3035215634051155254 => bb36, // Always jumps to bb36
    _ => bb35,                   // Never jumps to bb35
}
```

With this, we know that we could replace this whole match with just Goto(bb36) - and the behavior of our program should not change.
Of course, sometimes those "decoy" arms are needed to confuse the compiler, so we still need to check whether our simplified reproducer still triggers the bug.
Even then, this transformation makes the code much simpler, and easier for humans to read. It also opens the way to some more simplifications.
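The transformation itself is trivial once we trust the "second-to-last arm" observation. A sketch over a hypothetical list of per-arm jump targets:

```rust
/// Collapses a decoy MIR match into an unconditional jump to the target
/// of its second-to-last arm - the arm rustlantis almost always takes.
/// `arm_targets` lists the jump target of each arm, in source order.
fn collapse_match(arm_targets: &[&str]) -> Option<String> {
    if arm_targets.len() < 2 {
        return None; // no second-to-last arm to pick
    }
    let taken = arm_targets[arm_targets.len() - 2];
    Some(format!("Goto({taken})"))
}
```

For the match shown above, the arm targets are `[bb20, bb33, bb34, bb36, bb35]`, so this yields `Goto(bb36)`.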
Killing blocks
Now, we know that a lot of blocks in our program are unreachable. They may still be needed(eg. they form a decoy branch needed to confuse the compiler), but they are effectively dead code.
We can replace those blocks with calls to abort. Since the program can only jump to the start of a basic block, we can replace an entire dead block with a single call to abort.
```rust
bb19 = {
    _24 = -(-7024724093558139643_i64);
    (*_12) = core::ptr::addr_of!((*_18));
    (*_15) = 78913118762850115972924188634687843923_i128 as u128;
    _34 = (*_15) == (*_18);
    _32 = _23 as isize;
    /* Hundreds of lines of code */
    (*_18) = _10 as u128;
    (*_18) = 21367986460140776094219671356767834567_u128;
    _38 = _5 as isize;
    Goto(bb13)
}
```

We can turn the example above into just this:
```rust
bb19 = {
    Call(core::intrinsics::abort(), NeverReturn, UnwindUnreachable())
}
```

Now, besides simplifying the code in question, this also makes it very clear to a human that this is dead code.
So, when I look over the bug reproducer, I can very easily tell what is real code, and what is just there to confuse the compiler.
Removing blocks
After I replaced most of the basic blocks with aborts, I then try to remove as many of them as possible.
Why does this step take place after replacing the block with abort?
Consider loops:
```rust
bb1 = {
    Goto(bb2)
}
bb2 = {
    Goto(bb1)
}
```

I can't remove either of those blocks now, because that would make the program invalid (we would remove the target of a jump).
However, if we first replace the blocks with abort, the problem disappears, and we can easily try removing them from the program.
```rust
bb1 = {
    Call(core::intrinsics::abort(), NeverReturn, UnwindUnreachable())
}
bb2 = {
    Call(core::intrinsics::abort(), NeverReturn, UnwindUnreachable())
}
```

Removing most, if not all, basic blocks from a program allows us to perform even more simplifications.
Linearizing the control flow
Suppose we have a snippet of MIR like this:
```rust
bb1 = {
    (*_18) = 21367986460140776094219671356767834567_u128;
    _38 = _5 as isize;
    /* Hundreds of lines of code */
    _34 = (*_15) == (*_18);
    Goto(bb2)
}
bb2 = {
    /* Hundreds of lines of code */
    _32 = _23 as isize;
    Goto(bb3)
}
```

You can quite clearly see that, in this case, we could replace those 2 blocks with 1 big block:
```rust
bb1 = {
    (*_18) = 21367986460140776094219671356767834567_u128;
    _38 = _5 as isize;
    /* Hundreds of lines of code */
    _34 = (*_15) == (*_18);
    /* Hundreds of lines of code */
    _32 = _23 as isize;
    Goto(bb3)
}
```

So, this is also something my minimizer attempts to do. We try to "linearize" the control flow (remove any jumps), and check that the new program is UB-free, and still triggers the compiler bug.
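The merging step can be sketched over a toy block representation (the types here are hypothetical - the real minimizer works on actual MIR, and a real pass must also check that the second block has no other predecessors):

```rust
/// A toy basic block: a label, some statements, and a terminator.
#[derive(Clone, PartialEq, Debug)]
struct Block {
    label: String,
    statements: Vec<String>,
    terminator: String, // e.g. "Goto(bb2)" or "Return()"
}

/// If `a` ends with an unconditional jump to `b`, fuse them into one block.
fn merge_blocks(a: &Block, b: &Block) -> Option<Block> {
    if a.terminator != format!("Goto({})", b.label) {
        return None; // `a` does not fall straight into `b`
    }
    let mut statements = a.statements.clone();
    statements.extend(b.statements.iter().cloned());
    Some(Block {
        label: a.label.clone(),
        statements,
        terminator: b.terminator.clone(),
    })
}
```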
If all goes well, and the bug is relatively simple, we are left at the very end with a bunch of simple, one-block functions.
```rust
bb1 = {
    (*_18) = 21367986460140776094219671356767834567_u128;
    _38 = _5 as isize;
    /* Hundreds of lines of code */
    _34 = (*_15) == (*_18);
    /* Hundreds of lines of code */
    _32 = _23 as isize;
    /* Hundreds of lines of code */
    Return()
}
```

By this point, the sample is usually 10-30% of its original size. That is enough for humans to start their work.
While fuzzing directly on MIR is great, MIR is not exactly the ideal form for bug reproducers. The macros I used to build this COMPILER INTERNAL IR are really meant for compiler developers, and are not pleasant to use.
Ideally, we'd like to trigger this compiler bug from safe Rust: then, we would not need to run miri to check for UB at all.
Still, you may wonder: why not run creduce at this point? It is a tool designed to reduce bugs, and, at this point, the miri execution is fast enough for this to not be a huge problem.
Well, the problem with dealing with compiler-internal macros is that nothing is designed to handle them. Including rustfmt (the Rust formatter).
creduce, for some reason, really likes to remove a lot of new lines from the sample, "reducing" it to a single line of code... that I can't format.
That is not a very pleasant experience.
You know what rustfmt is designed to format? Rust code :). So, the next logical step is to take one of those simpler files, and start turning MIR back into Rust.
Enums
The very first manual step I take is removing all the enums from the file I am operating on.
Why? Since MIR is not Rust, it can do some cursed things that are impossible to reproduce in even unsafe Rust.
What does this bit of code do?
```rust
Field::<u16>(Variant(_37, 1), 2) = 8;
```

Well, of course, it assigns the value 8 of type u16 to the 2nd field of variant 1 of an enum.
To people familiar with Rust, this immediately rings a bell as something impossible.
You can't do something like this:
```rust
result.Err.0 = "Hello :)"; // WTF?
```

Enums have variants, and you have to match on those variants to access their contents.
In MIR, this is, however, kind of allowed, and relatively common.
The cursed ability to access the fields of any enum variant (even a non-active one!) is needed for coroutines, which share some of their handling with enums.
Since rustlantis tries very hard to find all manner of compiler bugs, it also tests this code.
However, since this is something not present in surface Rust, we have to find workarounds. In most cases, I managed to either:
- Remove all but one enum variant, leaving me with what is effectively a struct.
- Replace the enum with a union.
Thankfully, after this, things start to get a bit better.
Replicating cursed control flow
Let us go back to one of the first examples I showed you:
```rust
loop {
    match arg {
        0 => return 0,
        1 => fn3(adt1.a, adt1),
        _ => return 0,
    }
    match arg {
        1 => return 0,
        _ => continue,
    }
}
```

Just for a second, focus on the control flow in this sample. Think about how it would look, expressed in just gotos and matches (or C switches).
```rust
loop_head = {
    match arg {
        0 => ret0,
        1 => call3,
        _ => other_ret0,
    }
}
ret0 = {
    RET = 0;
    Return()
}
other_ret0 = {
    RET = 0;
    Return()
}
call3 = {
    Call(fn3(adt1.a, adt1), ReturnTo(other_match), UnwindUnreachable())
}
other_match = {
    match arg {
        1 => another_ret0,
        _ => loop_head,
    }
}
another_ret0 = {
    RET = 0;
    Return()
}
```

Can you see what is going on? It is probably hard, is it not? Even with helpful block names, this is still a slog. Now, imagine there are no names (just numbers), the order is scrambled, and each block has a bunch of BS in it.
Thankfully, there are ways around this.
Poor man's goto
Gotos that jump backwards can be replaced with... labeled loops.
```rust
'bb1: loop {
    // Goto in Rust :)
    if num == arg {
        continue 'bb1;
    }
    do_sth();
    'bb2: loop {
        // Goto bb1
        if num2 == arg {
            continue 'bb1;
        }
        // Goto bb2
        if num3 == arg {
            continue 'bb2;
        }
        do_sth_else();
    }
}
```

This is kind of cursed, but it is more or less how I converted the original MIR into unsafe Rust. Now, you can probably see why I spent so much time removing this kind of branching automatically :).
Going from unsafe Rust to safe Rust is still annoying, but it is much more bearable.
The main problem here is that, at the MIR level, lifetimes and borrows don't exist (kind of). So, there are perfectly sound things you can't do in safe Rust, that I needed to replicate.
Now, most of those things were useless, but they nonetheless tested parts of the actual language.
```rust
// This:
_16 = core::ptr::addr_of!((*_16));
// Is just the long way of doing this:
_16 = _16;
// Which is... a NOP.
```

At this point, I was usually left with a couple hundred lines of safe Rust. And here, I could let creduce shine.
Remember, since we are in safe Rust, we don't need to worry about UB. If a transformation is incorrect, it will not compile. So, I could ditch miri, and get some simpler bug samples.
After a lot of finagling, I ended up with this curious thing:
```rust
fn main() {
    let mut _4 = 2623078656.0;
    _4 = _4 * _4 * _4;
    let val = -(_4 * _4) * _4 * (_4);
    let inf: f32 = val * val;
    let int = inf as u64;
    println!("{inf:?} {:x} {int:?}", inf.to_bits());
}
```

According to GCC in debug mode (and LLVM), the results of this operation are as follows:
```
inf 7f800000 18446744073709551615
```

So... the number is infinity, the bit pattern is indeed infinity's, and the result of casting this to an int is 2^64-1.
That is correct - in Rust, float-to-int casts are saturating. That means that they return the closest representable integer value if the float is out of range.
So, 55.0 as u8 is 55, but -1.0 as u8 is 0, and 300.0 as u8 is 255 - the closest number we could get.
The biggest integer is the closest we will ever get to infinity, so this is correct.
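All of this is easy to check directly in safe Rust:

```rust
fn main() {
    // In-range casts behave as expected.
    assert_eq!(55.0_f32 as u8, 55);
    // Out-of-range casts saturate to the nearest representable value.
    assert_eq!((-1.0_f32) as u8, 0);
    assert_eq!(300.0_f32 as u8, 255);
    // Infinity saturates to the type's maximum; NaN casts to 0.
    assert_eq!(f32::INFINITY as u64, u64::MAX);
    assert_eq!(f32::NAN as u64, 0);
}
```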
However, as soon as we turn the optimizations on... the result changes:
```
inf 7f800000 0
```

Huh? Since when is 0 the closest integer to infinity?
Now, at this point, I went on an entirely wrong path, trying to change the way we do float-to-int casts on the Rust side.
Since those casts are UB in C, I assumed we just did something wrong, and that is the source of our problem. The semantics of GIMPLE(GCC's IR) are a bit nebulous, but, since GCC compiles C well, if we follow the rules of C, we should be golden, right?
I will not bore you with the details, but I spent some time changing the order of some ops to be 100% C-compliant. And that did nothing.
Somehow, someway, our range checks would get optimized away.
Infinity is less than 0.
I converted our code to C, only to see... it still misbehaves? Even though it was clearly standard-compliant?
So, I decided to just return the value of our check and... What?
```c
#include <stdbool.h>
#include <stdio.h>

bool womky(void) {
    float val = 3.4028234663852885981170418348451692544e+38;
    float a = val * -(val * val);
    float b = val * a;
    return b * b > 0.0;
}

int main(void) { printf("%d", womky()); }
// -O0: 1
// -O2: 0
```

Now, at this point, I was very, very confused. I know GCC has some pitfall-y float optimizations, but those are enabled with -Ofast. I am on -O2. So... what gives?
If I obtain infinity any other way, this program will print 1. If I print the bit-pattern of the float, I can see it is positive infinity. And yet, somehow... it is less than 0?
This led to a lot of consternation, and a lot of trying to figure out what was going on - until somebody pointed out that... GCC thinks the number can only be NaN.
```
Global Exported: a_4 = [frange] float [+Inf, +Inf] +-NAN
Global Exported: b_5 = [frange] float [] +-NAN
Global Exported: _3 = [frange] float [] +-NAN
```

Something seems to be wrong there... although I am not 100% sure. Right now, I am going over the C standard, trying to figure out if this is some kind of UB, or a rule I have never heard about.
Still... seems like a potential GCC bug. I guess that is a neat find :).
I have a few more bugs like this, but I have not been able to reproduce them in C so far.
Reproducing the bug in C is an important step - it shows that the problem is, indeed, on the GCC side.
Integrating the fuzzer
Anyway - as you can see, fuzzing the compiler is very valuable. Besides discovering bugs, it also should raise the confidence of users of our backend.
This is why I worked on integrating this entire process with the build / test system of cg_gcc.
Fuzzing the project is now as easy as typing:
```bash
./y.sh fuzz --start=start_seed --count=iter_count
```

This will spawn multiple fuzzing threads. As soon as a thread finds a bug, it will start reducing the bug sample as much as possible.
With this, I am able to just leave the fuzzer running overnight, and find some fresh bugs to fix.
But... fuzzing is not the be all and end all of compiler testing.
Imagine this: you have 2 compilers: Alice and Bob.
Code compiled with Alice works fine in isolation - and so does code compiled with Bob.
However, as soon as you mix those two together, bad things begin to happen. Functions return nonsense values, and you program crashes in the weirdest of places.
```rust
// Compiled by `Alice`
fn print_four(val: u64) {
    println!("2 * 2 is: {val}")
}

// Compiled by `Bob`
let four = 2 + 2;
assert_eq!(four, 4);
print_four(four);
```

```
2 * 2 is: 494845249282442424
```

This sort of nightmarish issue can be caused by ABI bugs.
What is an ABI?
An ABI is, in the simplest of terms, just the way data is passed between functions.
We agree that, on x86_64 Linux, the first function argument is passed in rdi.
So, the caller of our function may do something like this:
```asm
mov rdi, 4
call print_four
```

And the function can then read that variable from the register rdi:
```asm
do_sth rdi ; Use `val` as the parameter to some instruction.
```

But... what would happen if the compiler Alice decided to put the first argument in rax, instead?
```asm
mov rax, 4
call print_four
```

Now, instead of reading 4, we read garbage.
```asm
do_sth rdi ; Oh no! We read garbage :(
```

When Alice and Bob disagree on the ABI, chaos ensues.
Now, you may think this is not a very common scenario - but it is surprisingly important.
THE RUST ABI IS REAL!
Your very first thought might be: well, Rust does not have a stable ABI, does it?
So, who the hell cares if 2 compilers disagree on how something unspecified is implemented?
For better or for worse, the Rust ABI is real and can hurt you.
While you can't mix Rust code compiled with different compiler versions, you can safely mix code compiled with the same compiler version.
If that did not work, you could not compile any code at all.
You can also mix different compiler backends. People do this all the time: compile heavy dependencies with LLVM, and drop down to cranelift on things they often change.
This allows most of your code to run fast, while keeping iteration speed up (cranelift very quickly produces slightly slower code).
This is also something we should allow with GCC.
Mixing LLVM and GCC
Now, you may ask: why would I ever mix LLVM and GCC-compiled code?
In some cases, GCC can produce faster code than LLVM. Even without much optimization work on our side, we can already see GCC win in a few Rust benchmarks.
So, if we actually bothered to try to be fast, cg_gcc could produce better code in more scenarios.
If you could freely pick-and-choose between GCC and LLVM for each dependency you have, you could squeeze a tiny bit more performance out of your code.
There are some other uses for mixing LLVM and GCC compiled Rust code. By default, the Rust compiler ships with an LLVM-compiled standard library.
If GCC and LLVM could not mix, you'd have to download separate standard libraries for GCC and for LLVM. That is... not pleasant.
The Rust team would also have to build & distribute those 2 different versions. This is not a cheap process.
In order to save on bandwidth costs, the Rust binaries go through a very lengthy compression process, which usually takes hours to finish.
We don't want to have to duplicate that.
The C ABI
Ignoring all of the points above, there is one more reason to test if our ABI is compatible: C.
```rust
pub fn open(path: *const c_char, oflag: c_int, ...) -> c_int;
```

We can't live forever in our little Rust bubble. Sometimes... you just have to talk to C.
Now, on Linux, we could just do raw syscalls (talk directly to the OS) - that interface is rock-solid.
On Windows, however, you HAVE to use C libraries to talk to the OS. Period.
Syscall numbers are not stable at all. They can (and do!) change at any point - even with a version bump.
In the first release of Win 11, NtListenPort had the syscall index 0x0109. However, in 11 21H2, it had the id 0x010b.
So, if Rust never used the system C library (and did raw syscalls instead), our attempt to listen on a port could call NtIsUILanguageComitted instead. Not nice.
This means that we need to talk to C, and implement the C ABI correctly. Otherwise... we might see some fun bug reports :).
Checking for ABI correctness
So... how can we check that our implementation of any ABI is correct?
I mean... we could read the ABI spec, read the Rust ABI handling code, and the assembly we generate - checking that all of that is correct.
That is, however, boring. And not very idiot-proof - which is important, since I am an idiot.
It is very easy to miss some odd corner case this way. Humans are squishy and good at missing things.
Ideally, we'd automate this process: throw a whole bunch of function signatures at the problem, and see if something goes wrong.
Good news - there is a tool for that!
The ABI cafe.
ABI-cafe can automatically generate ABI tests in Rust and in C. Given a set of compilers, it will create a set of tests in which each of those compilers attempts to call the others.
The same test can also be applied to check for compatibility between Rust and C:
```c
// Compiled by C compiler Alice
struct MyStruct return_struct(uint32_t field_a, float field_b /* ... */) { /* ... */ }
```

```rust
// Compiled by Rust compiler Bob
fn test_return_struct() {
    assert_eq!(
        return_struct(1, 3.0 /* and so on */),
        MyStruct { field_a: 1, field_b: 3.0 /* ... */ }
    );
}
```

It even found some C compiler bugs. The tool is a bit more advanced than what I have shown here, so I highly recommend you read up on how it works under the hood.
What is important is that:
- This tool can test a lot of compiler combos
- The test suite is pretty extensive, checking all sort of odd corner cases.
In our case, that is all that matters.
Since integrating abi-cafe with our test system (you can run it via y.sh abi-test), I have discovered a lot of bugs. Small ones, big ones. Runtime errors, compiler errors - even compiler crashes.
Most of them are corner-cases you'd never think about, so catching them is great.
Some of them are a bit more worrying, and harder to fix.
Proc-macro crash
This particular issue is actually the reason I bothered with integrating abi-cafe into our tests.
When trying to build the Rust compiler on ARM(using GCC), we'd get this odd error:
```
thread '<unnamed>' panicked at sysroot_src/library/core/src/panicking.rs:226:5:
unsafe precondition(s) violated: slice::from_raw_parts requires the pointer to be aligned and non-null, and the total size of the slice not to exceed `isize::MAX`
This indicates a bug in the program. This Undefined Behavior check is optional, and cannot be relied on for safety.
stack backtrace:
   0: __rustc::rust_begin_unwind
   ...
  13: proc_macro::bridge::client::run_client
   ...
  23: rustc_interface::interface::run_compiler::<(), rustc_driver_impl::run_compiler::{closure#0}>::{closure#1}
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
```

We seem to trip UB checks... in the Rust compiler. That is not good. Not good at all.
But... hang on a minute - at that point, I had not yet built a Rust compiler using GCC. That is what I was trying to do. So... this crash is in LLVM-compiled code?
GDB seems to think so (saying that those functions come from librustc-driver.so) - but that does not make much sense. LLVM is known to work on this platform. It is extensively tested here.
I am running this code on the Rust remote desktops - where a whole lot of Rust compiler devs work on rustc each day. How would nobody notice this?
Seems suspicious.
We are crashing when interacting with a proc-macro - maybe that is related?
Proc-macros, IPC, and reality
Now, at first I thought that crashing the compiler with a proc-macro would be impossible. I vaguely remembered the proc-macro RFC saying something about using IPC, and spawning an isolated, sandboxed process.
In such a case, causing a compiler crash ought to be impossible - even if the shared library (.so or .dll) was broken. At worst, we should fail some parse check, and just... not continue on.
We certainly should not crash the compiler.
Well, I was mistaken. The RFC does indeed mention using IPC for proc-macros, and talks about preventing compiler crashes.
Currently, procedural macros are dynamically linked with the compiler. An alternative architecture would have procedural macros compiled as independent programs and have them communicate with the compiler via IPC. This would have the advantage of allowing static linking for the compiler and would prevent procedural macros from crashing the main compiler process.
So yeah... the RFC did say some of the things I mentioned... but it also said doing things this way is not the current plan.
Proc-macros are far more cursed than you'd think. They can do pretty much whatever they want to the compiler - including forking it.
whichever-compiles is a Rust proc-macro that forks (spawns copies of) the compiler until it finds a bit of syntax that successfully compiles.
```rust
whichever_compiles! {
    try { thisfunctiondoesntexist(); }
    try { invalid syntax 1 2 3 }
    try { println!("missing arg: {}"); }
    try { println!("hello {}", world()); }
    try { 1 + 2 }
}
```

```
$ cargo run
hello world
```

So yeah... thanks to the Rust folks for showing me this. Rust proc-macros are something else.
ABI bug
After a lot of digging, I finally found the real cause of this problem.
You see, while the Rust compiler I used was indeed compiled with LLVM, the proc-macro in question was compiled with GCC.
The proc-macro boundary looks something like this:
```rust
extern "C" fn proc_macro(bridge: BufferConfig) -> Buffer;
```

Now - both of those types have a stable, known layout. Still, somehow, GCC-compiled code would receive a garbage BufferConfig, detect that, and crash. Odd.
And the issue only occurs on Arm64 CPUs. Not on x86_64.
This may surprise you, but the reason we receive invalid arguments on ARM is related to... the way we return structs from functions.
x86_64
Calling conventions are a hell of a mess. The rules about returning structs are a bit complex, but, in this case, the struct is big enough to be returned indirectly.
What does that mean? Well, we don't have enough registers to cram that structure into them. So, instead, we pass it via a hidden pointer.
On x86_64 (and most other architectures), this signature:

```rust
extern "C" fn proc_macro(bridge: BufferConfig) -> Buffer;
```

gets turned into something like this:

```rust
extern "C" fn proc_macro(out: *mut Buffer, bridge: BufferConfig);
```

So, we receive a pointer to a stack-allocated buffer, into which we ought to write our return value.
This pointer is passed to us the exact same way our first argument is: in rdi. So far, so good.
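You can mimic this transformation by hand in Rust. Here is a sketch (all names here are made up for illustration; the real lowering happens invisibly, inside the compiler): the "natural" function and its hand-lowered twin produce the same value.

```rust
use std::mem::MaybeUninit;

#[repr(C)]
#[derive(Clone, Copy, PartialEq, Debug)]
struct Buffer {
    data: [u64; 4], // too big to fit in the return registers
}

// The "natural" form: the compiler returns this struct indirectly.
extern "C" fn make_buffer(seed: u64) -> Buffer {
    Buffer { data: [seed; 4] }
}

// A hand-written sketch of the lowered form: the caller allocates
// space, and passes a hidden pointer for the callee to write into.
unsafe extern "C" fn make_buffer_lowered(out: *mut Buffer, seed: u64) {
    unsafe { out.write(Buffer { data: [seed; 4] }) };
}

fn main() {
    let direct = make_buffer(7);
    // The caller-side view of the lowered call: allocate, pass a pointer.
    let mut slot = MaybeUninit::<Buffer>::uninit();
    unsafe { make_buffer_lowered(slot.as_mut_ptr(), 7) };
    assert_eq!(direct, unsafe { slot.assume_init() });
    println!("ok");
}
```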
ARM, however, decided to be a bit different.
Indirect Result Location Register
On ARM, structs are returned in a similar way, with one key difference. As per the ARM website:
XR - This is an indirect result register. If foo() returned a struct, then the memory for struct would be allocated by the caller, main() in the earlier example. XR is a pointer to the memory allocated by the caller for returning the struct.
So... our pointer is not passed as a hidden, implicit first argument, but somewhere else: in the register XR, a.k.a. x8.
Due to a bug on our side(cg_gcc), we are always expecting the indirect return pointer to be in the first argument(x0).
LLVM places the data for our function as follows:
- x0 - bridge
- x8 - return value pointer

Our code expects the data laid out as follows:
- x0 - return value pointer
- x1 - bridge
So, instead of reading the BufferConfig, we read an uninitialized buffer we are supposed to write data into.
Not a GCC bug
Here, I want to clearly state this: this is not a GCC bug, this is a bug in our GCC-based Rust compiler backend. We screwed this up.
The fundamental problem here is as follows:
- LLVM gives you low-level ABI building blocks: sret parameters, things like that. So, Rust exploits this low-level access heavily.
- GCC does not give you low-level ABI access: the ABI is not really something you ought to worry about. You provide it with a signature and a few attributes - and voilà! You've got yourself an ABI. This has huge advantages for portability: frontends need not worry about any low-level details, which allows them to support a wide range of platforms with ease.
These two approaches clash, and clash badly. You see... on the Rust side, the ABI is also out of our control.
cg_ssa is the part of the compiler responsible for this, and it is shared between cg_gcc, cg_llvm, and a few others.
cg_ssa inserts this hidden first parameter, and tells us:
Please place this in the right register for this platform. Thanks <3.
Except... we really, really can't. GCC needs more high-level detail, and we can't really tell it: please place this argument here. That is just not how it works.
GCC does not want any of our "inserting a first arg" BS. It just wants to be told: this function returns a struct of this type. That is all the info it needs.
But... cg_ssa expects this function to return the way it wants. So, it will do things like this:
```rust
// Write to arg0 (our return pointer)
(*arg0) = return_val;
// Void return - this function returns indirectly :)
return;
```

Really, the proper solution to this problem is to change cg_ssa - but that is not easy, and not for the reason you'd think.
Really, most if not all of the patches I mentioned can be done in a couple of hours, tops. Sure, the Rust compiler is poorly documented, but the changes we require are simple enough.
With a couple of hints from the Rust devs, I got the required patch ready in less than a day. I tested it with LLVM, and - everything works, hooray!
Now, it is time to test it with GCC, and...
```
thread 'rustc' panicked at /home/gh-FractalFir/rust/compiler/rustc_codegen_ssa/src/mir/rvalue.rs:1039:5:
assertion `left == right` failed
  left: __uint8_t
 right: __int8_t
stack backtrace:
```

Great. I broke something completely unrelated, and I don't know how.
After a couple of hours of wrangling with this, I decided to run the test on the previous commit, and...
```
thread 'rustc' panicked at /home/gh-FractalFir/rust/compiler/rustc_codegen_ssa/src/mir/rvalue.rs:1039:5:
assertion `left == right` failed
  left: __uint8_t
 right: __int8_t
stack backtrace:
```

Yeah. I did not break things.
Who needs tests?
For reasons a little outside this article, the cg_gcc tests are simply not run on any Rust compiler patches.
You can imagine why this is a less than ideal setup.
We fix all the issues that crop up every couple of weeks, but... things get broken shortly thereafter.
*Our* repo is well-tested, though.
You can imagine why this is a nightmare for any patches. You are either:
- Flying blind, not running the tests either (they are broken), and praying things work out
- Basing your patch on a commit that is up to 2 weeks old
- Rushing after each sync to make your changes before something else gets broken.
Thankfully, things are changing in this regard. Some of the blockers for running tests got resolved, and an MCP was accepted, meaning those tests will start being run soon.
Still, my patch is now in limbo. What can you do - such is life.
As a bit of a consolation, I at least finished the GCC-side of our ABI changes.
Why did we need to change anything on the C side? Glad you asked :).
Sometimes, the C and Rust ABI disagree on how a struct should be passed (via registers vs indirectly).

```rust
// Sample shown to me by Bjorn3
#[repr(C)]
struct Foo {
    a: u128,
}

#[unsafe(no_mangle)]
extern "C" fn c_func() -> Foo {
    Foo { a: 1 }
}

#[unsafe(no_mangle)]
extern "Rust" fn rust_func() -> Foo {
    Foo { a: 1 }
}
```

```
; ARM assembly
c_func:
    mov w0, #1
    mov x1, xzr
    ret
rust_func:
    mov w9, #1
    stp x9, xzr, [x8]
    ret
```

In such cases, we need to ask GCC to pass the struct indirectly, even when it thinks that is not necessary.
Thankfully, that is kind of easy. We just need to set a special flag (`TREE_ADDRESSABLE`) on the type we pass.
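For what it's worth, the two functions from that sample are indistinguishable at the Rust level - the divergence only exists in the generated assembly. A quick sanity check (same shape as the sample; the `no_mangle` attributes are dropped, since they don't matter here):

```rust
// A struct containing a single u128, as in Bjorn3's sample.
#[repr(C)]
#[derive(PartialEq, Debug)]
struct Foo {
    a: u128,
}

// Per the assembly above: the C ABI returns this in registers (x0/x1)...
extern "C" fn c_func() -> Foo {
    Foo { a: 1 }
}

// ...while the Rust ABI returns it indirectly, through x8.
fn rust_func() -> Foo {
    Foo { a: 1 }
}

fn main() {
    // Identical values, despite the two different return conventions.
    assert_eq!(c_func(), rust_func());
    println!("ok");
}
```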
Besides the ABI bugs on the platforms we currently support, there is also the problem of ABI bugs on platforms we will support.
Right now, I am working on improving support for GCC-only(no LLVM support) platforms, like DEC-Alpha, or SuperH(SH).
Part of that process involves adding support to the ABIs of those platforms in the Rust compiler.
Now, even in cases where the ABI is well documented, we have had cases of ABI bugs.
With ABI tests, we could double-check our work, reducing the risk of such bugs.
You might be a bit surprised to learn that I have also been working on improving our support for Motorola 68000 (m68k) processors.
They are now supported by LLVM, so you might think using GCC to compile for this platform is kind of pointless.
Well, I am working on building the Rust compiler for this platform. The LLVM support is still a bit spotty, from what I have heard.
GCC is much more complete in that regard.
And I can now proudly announce that I am the first person to run the Rust compiler on m68k processors!
I had to work around a few issues along the way: quite a few necessary crates don't support m68k yet. This is something you can help with, by... just adding the needed support to those crates.
Here is the issue with more detail.
Now, I want to clarify that the Rust compiler built for m68k is still not all that functional.
I can get it to run some basic things (e.g. printing the compiler version, some other checks), but proper compilation requires us to build tarballs with the compiler + all the tools in them.
```
➜  m68k sudo chroot ~/m68k/vm2 qemu-m68k-static ~/release/rustc-main ~/hello-world.rs
error: failed to find a `codegen-backends` folder in the sysroot candidates:
* /usr
fatal runtime error: failed to initiate panic, error 3386902200, aborting
```

Just copying files about is not really the right way to do things.
Sadly, it seems like I am hitting some edge case in the Rust compiler build system.
When trying to build some additional tools (things like cargo), the bootstrap script (the Rust compiler build tool) seems to "forget" where some libraries are.
```
error[E0463]: can't find crate for `std`
  |
  = note: the `m68k-unknown-linux-gnu` target may not be installed
  = help: consider adding the standard library to the sysroot with `x build library --target m68k-unknown-linux-gnu`
  = help: consider building the standard library from source with `cargo build -Zbuild-std`
```

The std crate (the Rust standard library) is also obviously a dependency of the compiler, so I know it built just fine. So, it is probably just a case of needing some patches to that build system.
Still, this aborts the build process, and stops us from obtaining installable tarballs. Ugh.
I hope you enjoyed this article - even though it is on the longer side.
Tests don't tend to be the most glamorous of things to talk about, but, as I have hopefully shown, they are even more important when it comes to compilers.
In the next article, I will talk about memory leaks in rustc_codegen_gcc, and how I fixed them.
I will also talk about the terrible way &str matches work, and why they crashed GCC(and slowed LLVM to a crawl).
If you want to check out rustc_codegen_gcc, here is its repo.
Oh, and since this project is part of GSoC, you can see my daily updates here.
I would also like to thank all the Rust developers who helped me with my work. Really, a few good answers here and there saved me weeks of going down the wrong path.
So, thanks for that :D!