Blowing the stack is one of the few ways to kill a Rust program that you can't really recover from. On Unix systems, the area just past the end of the stack is set up as a guard page (either by the host OS or by Rust's own setup code) so that trying to access it will trigger a SIGSEGV or SIGBUS signal, and as part of the general setup code Rust adds to every program, it installs a signal handler that checks if the address that caused the signal is a guard page, and if so, aborts your entire program. The code for setting it up is straightforward (and has some fun SAFETY comments too).
The reason that this is an abort (which you can't catch and always causes program exit) as opposed to a panic is because it's effectively impossible to guarantee anything about the state of the world. In particular, you might be in the middle of executing some unsafe code, an FFI handler, or some other thing where allowing execution to continue would violate safety guarantees. Plus, when the signal handler returns, you're still out of stack space, so you'd have to unwind without allocating any more stack! Even backtrace-on-stack-overflow, which does what the name implies, aborts immediately after printing the backtrace and comes with a note saying that it's "unsuited for being enabled in production".
Languages with a runtime (Java, Python, Go, Javascript) can just have the runtime manually check the stack, but we don't have that. The simplest answer is to have a depth counter and decrement it on every recursive call:
#[test] fn recursive() { fn sum(depth: usize, xs: &[u64]) -> Option<u64> { if xs.is_empty() { return Some(0) } if depth == 0 { return None } Some(xs[0] + sum(depth - 1, &xs[1..])?) } assert_eq!(sum(2, &[3, 4]), Some(7)); assert_eq!(sum(0, &[3, 4]), None); }Easy, simple, and basically what I wound up doing when I ran into this problem. But you don't need a blog post to tell you how to pass a parameter, and I wanted to show off. So let's have some fun with it.
If we're going to make this into something reusable, let's define some helpers. For reasons that will become apparent shortly, we define the natural numbers using the standard Peano encoding:
enum Nat { Z, S(Box<Nat>) }and implement a helper function that automates the recursion:
impl Nat { fn recurse<In, Out>(self, mut f: impl FnMut(Nat, In) -> Option<Out>, val: In) -> Option<Out> { match self { Nat::Z => None, Nat::S(nat) => f(*nat, val) } } }We can see that this produces the behavior we want by writing a test sum function:
#[test] fn nat_to_usize() { fn sum(depth: Nat, xs: &[u64]) -> Option<u64> { if xs.is_empty() { Some(0) } else { Some(xs[0] + depth.recurse(sum, &xs[1..])?) } } let nat1 = Nat::S(Box::new(Nat::Z)); let nat2 = Nat::S(Box::new(Nat::S(Box::new(Nat::Z)))); assert_eq!(sum(nat2, &[3, 4]), Some(7)); assert_eq!(sum(nat1, &[3, 4]), None); }But this still has the problem that we have to actually pass around an entire parameter. We should be able to do better than that. And here's where the bullshit begins.
First, we repeat our definition of the naturals, but at the type level:
use std::phantom::PhantomData; struct Z; struct S<N: Nat> { _phantom: std::marker::PhantomData<N> } trait Nat { }Implementing the type-level version of Nat::recurse is trickier, because there's no way to express a type that requires a function to have a generic parameter, let alone require that generic parameter to be constrained by a trait. So we emulate it using another trait:
trait TakesNat { type In; type Out; fn call<N: Nat>(&mut self, arg: Self::In) -> Option<Self::Out>; }Note that we don't actually need to pass in a value of type N here.
With that definition in hand, we can fill in our Nat trait:
trait Nat { fn recurse<F: TakesNat>(f: &mut F, arg: F::In) -> Option<F::Out>; } impl Nat for Z { fn recurse<F: TakesNat>(f: &mut F, _arg: F::In) -> Option<F::Out> { None } } impl<N: Nat> Nat for S<N> { fn recurse<F: TakesNat>(f: &mut F, arg: F::In) -> Option<F::Out> { f.call::<N>(arg) } }Now we can write our type-level recursion:
#[test] fn nat_to_usize() { fn sum(depth: Nat, xs: &[u64]) -> Option<u64> { if xs.is_empty() { Some(0) } else { Some(xs[0] + depth.recurse(sum, &xs[1..])?) } } let nat1 = Nat::S(Box::new(Nat::Z)); let nat2 = Nat::S(Box::new(Nat::S(Box::new(Nat::Z)))); assert_eq!(sum(nat2, &[3, 4]), Some(7)); assert_eq!(sum(nat1, &[3, 4]), None); }But this requires a bunch of overhead to write, because we need to define the application struct. Everyone knows that Rust declarative macros are easy to read, so let's use those to clean it up! We'd like to be able to write
make_def! { fn sum(arg: &[u64]) -> Option<u64> { if arg.is_empty() { Some(0) } else { Some(arg[0] + ) } } }to define our function. But even if we were okay with explicitly writing N::recurse(self, &arg[1..]) in our recursive branch, we can't; if we try, we get this error:
error[E0424]: expected value, found module `self` --> src/lib.rs:68:38 | 52 | / fn call<N: Nat>(&mut self, $arg: Self::In) -> Option<Self::Out> { 53 | | $body 54 | | } | |______________- this function has a `self` parameter, but a macro invocation can only access identifiers it receives from parameters ... 68 | Some(arg[0] + N::recurse(self, &arg[1..])?) | ^^^^ `self` value is a keyword only available in methods with a `self` parameterThe error message is slightly confusing, but the gist of it is that Rust's macro hygiene rules mean that self's name resolution ignores the identifiers defined inside the macro itself.
Fortunately, all hope is not lost. And the solution: more macros.
macro_rules! fancy_make_def { (fn $name:ident($arg:ident : $in:ty) -> Option<$out:ty> $body:block) => { fn $name<N: Nat>($arg: $in) -> Option<$out> { struct Impl; impl TakesNat for Impl { type In = $in; type Out = $out; fn call<N: Nat>(&mut self, $arg: Self::In) -> Option<Self::Out> { macro_rules! recurse { ($subcall:expr) => { N::recurse(self, $subcall) } } $body } } Impl.call::<N>($arg) } } }This effectively lets us 'delay' the lookup of the N and self identifiers until after make_def! has expanded, meaning that they'll properly be in scope. And with that, we can write the final form of our guarded-recursion sum function:
#[test] fn fancy_macro_sum() { fancy_make_def! { fn sum(arg: &'static [u64]) -> Option<u64> { if arg.is_empty() { Some(0) } else { Some(arg[0] + recurse!(&arg[1..])?) } } } assert_eq!(sum::<S<S<Z>>>(&[3, 4]), Some(7)); assert_eq!(sum::<Z>(&[3, 4]), None); }Of course, we may have sacrificed a few small things:
- Our macro implementation means we can't take generic parameters (including lifetimes); solving this would probably require a proc macro.
- The compiler has to generate the sum function once for each type-level depth parameter, so if you supply a maximum depth of 100 then it'll generate 100 different sum functions.
- In addition increasing codegen time, this will also bloat your binary size.
But we avoided having to explicitly pass a single usize parameter, so it's all worth it.