Taming the Bias: Unbiased* Safepoint-Based Stack Walking in JFR


Two years ago, I still planned to implement a new version of AsyncGetCallTrace in Java. This plan didn’t materialize, but during the discussions, Erik Österlund had the idea to fully walk the stack at safepoints. Walking stacks only at safepoints would normally incur a safepoint bias (see The Inner Workings of Safepoints), but you can prevent this by recording some program state in signal handlers. I wrote about this idea and its basic implementation in Taming the Bias: Unbiased Safepoint-Based Stack Walking. I’ll revisit this topic in this week’s short blog post because Markus Grönlund took Erik’s idea and started implementing it for the standard JFR method sampler.

The new JFR Sampler

The implementation is now targeted for JDK 25 and described in JEP 518 (JFR Cooperative Sampling):

We redesign JFR’s sampling mechanism to avoid relying on risky stack-parsing heuristics. Instead, we parse thread stacks only at safepoints.

To avoid the safepoint bias problem, we take samples cooperatively. When it is time to take a sample, JFR’s sampler thread still suspends the target thread. Rather than attempting to parse the stack, however, it just records the target’s program counter and stack pointer in a sample request, which it appends to an internal thread-local queue. It then arranges for the target thread to stop at its next safepoint, and resumes the thread.

The target runs normally until its next safepoint. At that time, the safepoint handling code inspects the queue. If it finds any sample requests, then, for each one, it reconstructs a stack trace, adjusting for safepoint bias, and emits a JFR execution-time sampling event.

JEP 518: JFR Cooperative Sampling

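In short: the sampler thread suspends the target, records its program counter and stack pointer as a sample request, and resumes it. The target thread then resolves the request into a stack trace at its next safepoint.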
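The flow described in the JEP can be modeled with a small, single-threaded Java toy. This is only a sketch: `TargetThread`, `SampleRequest`, `enqueueRequest`, and `safepointPoll` are hypothetical names, not HotSpot code, and the bias adjustment that the real sampler derives from the program counter and stack pointer is deliberately left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Toy model of cooperative sampling. All names are hypothetical, not HotSpot code. */
final class CooperativeSamplingSketch {

    /** What the sampler captures while the target is suspended: two register values. */
    record SampleRequest(long pc, long sp) {}

    /** A resolved sample: the request plus the frames seen when it was resolved. */
    record Sample(SampleRequest request, List<String> frames) {}

    /** Stand-in for a Java thread and its thread-local request queue. */
    static final class TargetThread {
        private final Queue<SampleRequest> pending = new ArrayDeque<>();
        private final List<Sample> emitted = new ArrayList<>();
        private boolean safepointRequested;

        // Simulated machine state of the thread.
        long pc;
        long sp;
        List<String> frames = List.of();

        /** Sampler side: cheap, no stack parsing, just record and arm the poll. */
        void enqueueRequest() {
            pending.add(new SampleRequest(pc, sp));
            safepointRequested = true;
        }

        /** Target side: called at a safepoint poll; drains the queue. */
        void safepointPoll() {
            if (!safepointRequested) {
                return;
            }
            SampleRequest request;
            while ((request = pending.poll()) != null) {
                // The real sampler uses pc and sp to adjust for safepoint bias;
                // this toy only pairs the request with the current frames.
                emitted.add(new Sample(request, List.copyOf(frames)));
            }
            safepointRequested = false;
        }

        List<Sample> emitted() {
            return emitted;
        }
    }

    public static void main(String[] args) {
        TargetThread target = new TargetThread();
        target.frames = List.of("compute", "main");

        target.pc = 0x1000;
        target.sp = 0x7f00;
        target.enqueueRequest(); // first sampling tick
        target.pc = 0x1040;
        target.enqueueRequest(); // second tick, before any safepoint

        System.out.println("before safepoint: " + target.emitted().size());
        target.safepointPoll(); // the target resolves both requests itself
        System.out.println("after safepoint: " + target.emitted().size());
    }
}
```

The point of the split is visible in the two methods: the sampler side does almost nothing, and all the work that may allocate or parse happens on the target thread.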

The stack walking at safepoints works because we know that we have a safepoint at the end of every non-inlined method. At the safepoint, we combine the program counter and stack pointer recorded in the sample request with the current frame to recover all the information that is typically missing when we only sample at safepoints. To quote Taming the Bias: Unbiased Safepoint-Based Stack Walking:

When we are at the return of a non-inlined method, we have enough information to obtain all relevant information of the top inlined and the first non-inlined frame using only the program counter, stack pointer, frame pointer, and bytecode pointer obtained in the signal handler. We focus on the first non-inlined method/frame, as inlined methods don’t have physical frames, and walking them would result in walking using Java internal information, which we explicitly want to avoid.

Advantages

The advantage compared to the current JFR sampler is substantial. Of course, it’s safer, as we’re not walking the Java stack out-of-thread in a signal handler, but the JEP also lists further advantages:

Creating a sample request requires hardly any work, and could be done in response to a hardware event or inside a signal handler.
The code to create stack traces and emit events is simpler. For example, it can dynamically allocate memory when it runs on the target thread, which it could not do when running on the sampler thread.
The sampler thread has less work to do, since it need not run heuristics, improving scalability.

JEP 518: JFR Cooperative Sampling

One thing that the current JEP approach lacks, compared to my old approach, is incremental stack walking. With JEP 376 (ZGC: Concurrent Thread-Stack Processing), it is effectively possible to mark frames, so we can skip re-walking the parts of the stack that consecutive samples have in common. This could be an interesting avenue for further improvements.
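To make the idea of incremental stack walking more concrete, here is a minimal Java toy. It is an illustration under simplifying assumptions, not the JEP’s design and not the earlier prototype: `Frame`, `marked`, and `walk` are hypothetical names, frames are plain objects, and a popped frame simply disappears, whereas a real VM would have to clear marks when frames are popped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of incremental stack walking. All names are hypothetical. */
final class IncrementalWalkSketch {

    /** A simulated physical frame; popped frames simply become unreachable. */
    static final class Frame {
        final String method;
        final Frame caller; // null for the bottom-most frame
        final int depth;    // 0 for the bottom-most frame
        boolean marked;     // set once a walk has seen this frame

        Frame(String method, Frame caller) {
            this.method = method;
            this.caller = caller;
            this.depth = caller == null ? 0 : caller.depth + 1;
        }
    }

    private List<String> lastTrace = List.of(); // bottom-to-top, from the previous walk
    int framesVisited;                          // counts frames actually walked

    /** Walks from the top frame down, stopping at the first already-marked frame. */
    List<String> walk(Frame top) {
        Deque<String> fresh = new ArrayDeque<>();
        Frame frame = top;
        while (frame != null && !frame.marked) {
            framesVisited++;
            frame.marked = true;
            fresh.push(frame.method); // the frame closest to the bottom ends up first
            frame = frame.caller;
        }
        List<String> trace = new ArrayList<>();
        if (frame != null) {
            // Everything up to the marked frame is unchanged since the last walk.
            trace.addAll(lastTrace.subList(0, frame.depth + 1));
        }
        trace.addAll(fresh);
        lastTrace = List.copyOf(trace);
        return lastTrace;
    }

    public static void main(String[] args) {
        Frame main = new Frame("main", null);
        Frame run = new Frame("run", main);
        Frame work = new Frame("work", run);

        IncrementalWalkSketch walker = new IncrementalWalkSketch();
        System.out.println(walker.walk(new Frame("leafA", work))); // walks 4 frames
        // leafA returned and work called leafB: only the new top frame is walked.
        System.out.println(walker.walk(new Frame("leafB", work)));
        System.out.println("frames visited: " + walker.framesVisited);
    }
}
```

In this toy, the second walk touches a single frame instead of four, which is the saving that marking frames would buy for deep stacks with a stable lower part.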

I could end this blog post here, because everything is good, we walk at safepoints, bye.

Walking in Native

But there is a problem: consider a Java program that calls a long-running native method. From Java’s perspective, the thread state is well defined while the thread runs in native code, as the Java part of its stack doesn’t change. The JFR sampler uses thread-local growable arrays, so with purely cooperative sampling, it would just keep pushing new sample requests to the array for a long time, bloating the thread-local state without pushing anything into the JFR buffers. However, because the thread state is well defined, the JFR sampler can directly record the stack trace while the thread is paused.

So the sampling is cooperative, except when it is not: for threads running native code, the sampler thread still walks the stack itself.
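The two paths can be sketched as a single dispatch on the thread state. This is a toy model under stated assumptions: `ThreadState`, `TargetThread`, and `sample` are hypothetical names, and the "stack walk" is just a copy of a list of frame names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of the native-code special case. All names are hypothetical. */
final class NativeAwareSamplerSketch {

    enum ThreadState { IN_JAVA, IN_NATIVE }

    /** Stand-in for a sampled thread. */
    static final class TargetThread {
        ThreadState state = ThreadState.IN_JAVA;
        List<String> javaFrames = List.of();
        final Deque<Long> pendingRequests = new ArrayDeque<>(); // recorded pcs only
        final List<List<String>> emitted = new ArrayList<>();
    }

    /** One sampler tick for a (conceptually suspended) target thread. */
    static void sample(TargetThread target, long pc) {
        if (target.state == ThreadState.IN_NATIVE) {
            // The Java frames cannot change while the thread is in native code,
            // so the sampler can walk them right away and emit the sample.
            target.emitted.add(List.copyOf(target.javaFrames));
        } else {
            // In Java code: defer the walk to the target's next safepoint.
            target.pendingRequests.add(pc);
        }
    }

    public static void main(String[] args) {
        TargetThread target = new TargetThread();
        target.javaFrames = List.of("readFile", "main");
        target.state = ThreadState.IN_NATIVE;

        for (int tick = 0; tick < 1000; tick++) {
            sample(target, 0x2000L);
        }
        // No request piles up during the long native call.
        System.out.println("pending: " + target.pendingRequests.size());
        System.out.println("emitted: " + target.emitted.size());
    }
}
```

Without the `IN_NATIVE` branch, all 1000 ticks would end up in `pendingRequests` until the native call returns, which is exactly the bloat described above.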

Slight Safepoint Bias

Another issue is that the new JFR sampler, for now, sometimes has sample requests that it cannot adequately resolve at safepoints with the information available. In these cases, the sampler falls back on the safepoint frame so that it can still record a trace. This reintroduces a slight safepoint bias. But is this a problem? The makers of JProfiler would tell you it isn’t, because safepoint bias might not be relevant anymore anyway:

With Java 16, JEP 376 was delivered, and it was possible for profiling agents to use the lazy stack processing based on thread-local handshakes and avoid global safepoints for sampling. […] given the frequency of local safepoints, the remaining local safepoint bias is irrelevant for the vast majority of applications.

Why JVMTI sampling is better than async sampling on modern JVMs

I might disagree, because I like my data as accurate as possible.
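The fallback described in this section can be sketched in a few lines of Java. Everything here is an assumption made for illustration: the `Resolved` record, its `biased` flag, and the `lookup` function that stands in for "reconstruct the sampled frame from the recorded program counter" are hypothetical and do not mirror the actual JFR code.

```java
import java.util.Optional;
import java.util.function.LongFunction;

/** Toy model of the safepoint-frame fallback. All names are hypothetical. */
final class FallbackResolutionSketch {

    /** The top frame of a resolved sample and whether it carries safepoint bias. */
    record Resolved(String topFrame, boolean biased) {}

    /**
     * Tries to resolve the recorded pc to the frame that was sampled.
     * If that fails, falls back to the frame the thread is in at the safepoint.
     */
    static Resolved resolve(long pc,
                            LongFunction<Optional<String>> lookup,
                            String safepointFrame) {
        return lookup.apply(pc)
                .map(frame -> new Resolved(frame, false))
                .orElseGet(() -> new Resolved(safepointFrame, true));
    }

    public static void main(String[] args) {
        // Hypothetical lookup: only pc 0x1000 can be mapped back to a frame.
        LongFunction<Optional<String>> lookup =
                pc -> pc == 0x1000L ? Optional.of("hotLoop") : Optional.empty();

        System.out.println(resolve(0x1000L, lookup, "caller")); // unbiased
        System.out.println(resolve(0x2000L, lookup, "caller")); // falls back, biased
    }
}
```

Tracking a flag like `biased` per sample would make it possible to measure how often the fallback is taken, which is one way to check whether the remaining bias matters for a given application.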

Conclusion

In this blog post, I showed you the new developments in the standard JFR method sampler, which Markus Grönlund is currently rewriting for JDK 25. In my opinion, these changes should improve the sampler’s safety and allow more features in the future, including a CPU-time sampler, which I’ll cover in my next blog post.

Thanks for coming along. After many non-JVM blog posts, I’m trying to get into the habit of writing more directly Java-related posts again. See you next week.

P.S.: Writing this blog post was prompted by someone at JCON telling me that they missed my JVM posts. I’m currently finishing the CPU-time sampler, so these blog posts should also help others understand my code.

This project is part of my work in the SapMachine team at SAP, making profiling easier for everyone.

Johannes Bechberger is a JVM developer working on profilers and their underlying technology in the SapMachine team at SAP. This includes improvements to async-profiler and its ecosystem, a website to view the different JFR event types, and improvements to the Firefox Profiler, making it usable in the Java world. He started at SAP in 2022 after two years of research studies at the KIT in the field of Java security analyses. His work today comprises many open-source contributions, his blog, where he writes regularly on in-depth profiling and debugging topics, and his JEP Candidate 435 to add a new profiling API to the OpenJDK.


