How much code does that proc macro generate?


Rust’s proc macros are powerful but tricky, and have some compile-time costs. This post is about a new tool to help quantify and reduce some of those costs.

 To macros! The cause of, and solution to, all of Rust's problems.

Proc macro costs

There are several ways proc macros affect compile times.

  1. The time to compile the proc macro crate itself.
  2. The time to compile all crates that the proc macro crate depends on, often including proc-macro2, syn and quote.
  3. The time to run the proc macro invocations.
  4. The time to compile the code generated by the proc macro invocations.

For costs 1 and 2, “compile” means full compilation including code generation, even when you are running cargo check. This is one reason why cargo check output includes a mix of “Checking” and “Compiling” lines. (The other reason for “Compiling” lines is crates with build scripts, and crates that build scripts depend on.)

watt is an approach to avoid costs 1 and 2. This is reasonable, but those costs usually occur only on clean builds. Costs 3 and 4 will occur on clean builds and also on many rebuilds. I am particularly interested in cost 4, because I think it might be underappreciated. That is what this post is about.
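As a refresher, here is what a minimal derive-style proc macro looks like. This is a hedged sketch rather than code from any particular crate: it assumes a crate with proc-macro = true in its Cargo.toml plus dependencies on syn and quote, and the HelloWorld derive and the method it generates are invented purely for illustration.

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};

// Hypothetical derive: `#[derive(HelloWorld)]` on a type generates an
// inherent `hello_world` method for that type.
#[proc_macro_derive(HelloWorld)]
pub fn derive_hello_world(input: TokenStream) -> TokenStream {
    // Parse the item the derive is attached to.
    let input = parse_macro_input!(input as DeriveInput);
    let name = input.ident;

    // Everything inside `quote!` is handed back to the compiler and must be
    // compiled by every crate that uses the derive: that is cost 4.
    let expanded = quote! {
        impl #name {
            pub fn hello_world() {
                println!("Hello, world! My name is {}", stringify!(#name));
            }
        }
    };
    expanded.into()
}

Compiling this crate is cost 1, compiling syn/quote/proc-macro2 beneath it is cost 2, running derive_hello_world at expansion time is cost 3, and compiling the returned impl block in every crate that uses the derive is cost 4.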

Proc macros are trivial to invoke, but it’s harder to know how much code they generate. You can use cargo-expand to inspect the code after macro expansion has finished. This can be useful, but lots of people don’t know about it, and it doesn’t scale to large codebases.
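If you haven’t used it, cargo-expand is installed as a cargo subcommand and run on a crate like this (it relies on a nightly toolchain being available to do the expansion):

cargo install cargo-expand
cargo expand

and it prints the crate’s source with all macros expanded.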

For example, I recently looked at a large project that had around 100,000 lines of code and used proc macros heavily. After macro expansion, the code size more than tripled! Once the project authors knew about this they were able to eliminate some proc macros that weren’t important, and replace some heavyweight ones with lightweight ones. After these small changes their compile times dropped by around 20%.

I recently implemented a new unstable flag for the Rust compiler, -Zmacro-stats, that can help shine a light on proc macro code size. It is available in Nightly, and there are two basic ways to use it.

If you just want to know about a leaf crate of your project, you can run a command like this:

cargo +nightly rustc -- -Zmacro-stats

and it will print information about the amount of code generated by both proc macros and declarative macros, though the former are usually more notable.

Or, if you want to know about multiple crates of your project, you can run a command like this:

RUSTFLAGS="-Zmacro-stats" cargo +nightly build

What does the output look like? Consider this very simple Rust program called measure_proc_macros:

fn main() {
    println!("yellow bird");
}

#[test]
fn my_test() {
    assert_eq!(1 + 1, 2);
}

Here’s the -Zmacro-stats output, which is printed on stderr:

===============================================================================
MACRO EXPANSION STATS: measure_proc_macros
Macro Name               Uses  Lines  Avg Lines   Bytes  Avg Bytes
-------------------------------------------------------------------------------
println!                    1      1        1.0      63       63.0
::std::format_args_nl!      1      1        1.0      29       29.0
#[test]                     1      1        1.0       0        0.0
===============================================================================

First, we can see that the println! had one use, and it produced one line of code, which was 63 bytes of text. The byte count is the ultimate measure, but line counts are also included because lines of code are a size measurement that programmers are more used to. The line count is computed on the code as formatted by the compiler’s internal pretty-printer, so it isn’t affected by the formatting of any of the hand-written code.

Next, we can also see an entry for ::std::format_args_nl!. Where did that come from? Well, the println! call expands to code that includes a call to ::std::format_args_nl!, which then gets expanded.
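For reference, the expansion of the println! call above looks roughly like the following. (::std::io::_print is an internal, unstable function, so this is compiler-generated code rather than something you would write by hand, and the exact shape may vary between compiler versions.)

{
    ::std::io::_print(::std::format_args_nl!("yellow bird"));
};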

Finally, we see that the #[test] produced zero code. This makes sense, because tests are omitted for normal builds. (I also tried -Zmacro-stats with cargo test and saw that the #[test] invocation generated 24 lines and 802 bytes of code.)
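(If you want to see the test-profile numbers yourself, one plausible invocation, which is my assumption and not necessarily the one used above, is to build the tests without running them:

RUSTFLAGS="-Zmacro-stats" cargo +nightly test --no-run
)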

What about proc macros? Let’s try out some derive proc macros by adding this code:

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, PartialOrd, Ord, Hash)]
#[derive(derive_more::From, derive_more::Add, derive_more::Constructor)]
#[derive(serde::Serialize, serde::Deserialize)]
#[derive(arbitrary::Arbitrary)]
struct Point { x: i32, y: i32 }

This is a very simple struct with many derives:

  • all the builtin derives;
  • several builtin-like derives from derive_more;
  • serialization and deserialization derives from serde;
  • Arbitrary from arbitrary, which is used for fuzzing.

-Zmacro-stats produces this additional output:

===============================================================================
MACRO EXPANSION STATS: measure_proc_macros
Macro Name                                       Uses  Lines  Avg Lines   Bytes  Avg Bytes
-------------------------------------------------------------------------------
#[derive(serde::Deserialize)]                       1    159      159.0   8_622    8_622.0
#[derive(arbitrary::Arbitrary)]                     1     81       81.0   3_679    3_679.0
::std::thread::local_impl::thread_local_inner!      2     34       17.0   1_499      749.5
#[derive(serde::Serialize)]                         1     23       23.0   1_002    1_002.0
#[derive(PartialOrd)]                               1     12       12.0     440      440.0
#[derive(Ord)]                                      1     11       11.0     333      333.0
#[derive(Debug)]                                    1      8        8.0     269      269.0
#[derive(PartialEq)]                                1      9        9.0     255      255.0
#[derive(Hash)]                                     1      8        8.0     252      252.0
#[derive(Default)]                                  1     10       10.0     246      246.0
#[derive(derive_more::Add)]                         1      9        9.0     237      237.0
#[derive(Eq)]                                       1      9        9.0     219      219.0
#[derive(derive_more::From)]                        1      6        6.0     205      205.0
::std::thread_local!                                2      5        2.5     198       99.0
#[derive(Clone)]                                    1      8        8.0     184      184.0
#[derive(derive_more::Constructor)]                 1      7        7.0     174      174.0
#[derive(Copy)]                                     1      2        2.0      64       64.0
===============================================================================

What have we learned?

  • The builtin derives generate small amounts of code, between 2 and 12 lines each.
  • The derive_more macros are similar. (I tried all of the derives available in that crate, and they all gave similar results.)
  • serde::Serialize is a bit bigger at 23 lines, but serde::Deserialize is much bigger at 159 lines.
  • arbitrary::Arbitrary is also large, at 81 lines. You can’t tell from this table, but further inspection revealed that this derive was responsible for the thread_local! and thread_local_inner! invocations, adding another 39 lines, for a total of 120 lines.

All this for a tiny struct with just two fields? Interesting. Let’s try one more example, adding a method that uses an attribute macro and a declarative macro from the tracing crate:

impl Point {
    #[tracing::instrument]
    fn invert(p: Point) -> Point {
        tracing::debug!("invert the point");
        Point { x: -p.x, y: -p.y }
    }
}

Here’s the output for just that function:

===============================================================================
MACRO EXPANSION STATS: measure_proc_macros
Macro Name                                       Uses  Lines  Avg Lines   Bytes  Avg Bytes
-------------------------------------------------------------------------------
::tracing::valueset!                                6     46        7.7   2_053      342.2
::tracing::event!                                   2     35       17.5   1_551      775.5
::tracing::span!                                    1     31       31.0   1_260    1_260.0
#[tracing::instrument]                              1     34       34.0   1_238    1_238.0
::tracing::metadata!                                2     17        8.5   1_138      569.0
::tracing::callsite2!                               2     27       13.5   1_062      531.0
::tracing::level_enabled!                           3      6        2.0     434      144.7
::tracing::fieldset!                                6      6        1.0     255       42.5
::tracing::if_log_enabled!                          3     10        3.3     231       77.0
tracing::debug!                                     1      2        2.0      86       86.0
::tracing_core::identify_callsite!                  2      2        1.0      82       41.0
::tracing_core::__macro_support::module_path!       2      2        1.0      42       21.0
module_path!                                        2      2        1.0      42       21.0
::tracing::__macro_support::format_args!            1      1        1.0      32       32.0
::tracing_core::__macro_support::file!              2      2        1.0      26       13.0
::tracing::__macro_support::concat!                 1      1        1.0      22       22.0
::tracing::__macro_support::file!                   1      1        1.0      13       13.0
::tracing::__tracing_stringify!                     1      1        1.0      13       13.0
::tracing_core::__macro_support::line!              2      2        1.0      10        5.0
::tracing::__macro_support::line!                   1      1        1.0       5        5.0
stringify!                                          1      1        1.0       3        3.0
::tracing::__tracing_log!                           2      2        1.0       0        0.0
===============================================================================

Wow! I did not expect that. That’s 232 lines of code and 20 auxiliary macros. On this larger example the usefulness of the “average” columns becomes clear.

Using it in practice

I know that -Zmacro-stats can be useful, because I used it on the aforementioned project that saw a roughly 20% reduction in compile times. If your codebase is littered with types that have a dozen or more derives, give it a shot. If it tells you that a single proc macro is responsible for 25% of your post-expansion code, then great! Chase that down. But you could also waste time with it. Shaving off a few lines here and there is unlikely to have a noticeable effect. This tool is aimed at large codebases; if your codebase is small it is unlikely to be useful. As always with performance, measure to see if changes are worthwhile. And remember that when talking about compilation, lines of code is a potentially useful proxy measurement, but compile time is the true measurement of interest.
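Because the stats are printed on stderr, one convenient way to gather them for a whole project is to redirect stderr to a file and search through the per-crate tables afterwards (the file will also contain cargo’s usual progress output):

RUSTFLAGS="-Zmacro-stats" cargo +nightly build 2> macro-stats.txt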

If you do fall into the 25% case, what should you do? It depends greatly on your codebase and how it uses the proc macro(s) in question. But in general, two obvious solutions are (a) avoid the proc macro altogether, or (b) find a cheaper alternative. For example, if serde::Deserialize is a big chunk of your code and you are not relying on any of serde’s advanced features, an alternative like serde-lite or miniserde might be worthwhile.
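For instance, here is a sketch of what that swap might look like for the Point struct from earlier, assuming it only needs plain struct serialization and deserialization and none of serde’s attributes or more advanced features (check the crates’ documentation for what they actually support):

// was: #[derive(serde::Serialize, serde::Deserialize)]
#[derive(miniserde::Serialize, miniserde::Deserialize)]
struct Point { x: i32, y: i32 }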

Finally, this is a very new tool so I’d love to hear from people who try it out, successfully or not, and any ideas for improvement. Thanks!
