Debugging memory leaks in Postgres, heaptrack edition


In this post we'll introduce two memory leaks into Postgres and debug them with heaptrack. Like almost every memory leak tool available to us (including memleak which I wrote about last time), heaptrack requires you to be on Linux. But a Linux VM on a Mac is fine too (that is where I'm running this code from).

Although we use Postgres as the codebase with which to explore tools like memleak (last time) and heaptrack (this time), these techniques are relevant for memory leaks in C, C++, and Rust projects in general.

Thank you to my coworker Jacob Champion for reviewing a version of this post.

Grab and build Postgres.

$ git clone https://github.com/postgres/postgres
$ cd postgres
$ git checkout REL_17_STABLE
$ ./configure --enable-debug \
    --prefix=$(pwd)/build \
    --libdir=$(pwd)/build/lib
$ make -j16 && make install

A leak in postmaster

Every Postgres backend process is forked off by the postmaster process. Let's introduce a memory leak into postmaster.

$ git diff src/backend/postmaster
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d032091495b..e0bf8943763 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3547,6 +3547,13 @@ BackendStartup(ClientSocket *client_sock)
 	Backend    *bn;				/* for backend cleanup */
 	pid_t		pid;
 	BackendStartupData startup_data;
+	MemoryContext old;
+	int		   *s;
+
+	old = MemoryContextSwitchTo(TopMemoryContext);
+	s = palloc(80);
+	*s = 12;
+	MemoryContextSwitchTo(old);
 
 	/*
 	 * Create backend data structure.  Better before the fork() so we can

Remember that Postgres allocates memory in nested arenas called MemoryContexts. The top-level arena is called TopMemoryContext, and it is only freed when the process exits. Excessive allocations (leaks) in TopMemoryContext would not be caught by Valgrind memcheck or LeakSanitizer, because by the time those tools check for leaks at exit, TopMemoryContext has been freed along with everything in it. But while the process is alive, the above leak is real.

(If we switch from palloc to malloc above, LeakSanitizer does catch this leak. I didn't try Valgrind memcheck but it probably catches this too.)

An easy way to trigger this leak is to run a large number of separate psql clients, each of which causes the postmaster to fork a new client backend process.

for run in {1..100000}; do psql postgres -c 'select 1'; done

With the diff above in place, rebuild and reinstall Postgres.

$ make -j16 && make install

Create a database and start a Postgres server.

$ ./build/bin/initdb testdb
$ ./build/bin/postgres -D $(pwd)/testdb
2025-05-22 12:05:15.995 EDT [260576] LOG:  starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
2025-05-22 12:05:15.996 EDT [260576] LOG:  listening on IPv6 address "::1", port 5432
2025-05-22 12:05:15.996 EDT [260576] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-05-22 12:05:15.997 EDT [260576] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-05-22 12:05:16.001 EDT [260579] LOG:  database system was shut down at 2025-05-22 11:37:53 EDT
2025-05-22 12:05:16.004 EDT [260576] LOG:  database system is ready to accept connections

The integer in brackets is the PID of the postmaster process. In another terminal attach to the postmaster process with heaptrack.

$ sudo heaptrack -p 260576

The heaptrack process should run until the Postgres server ends.

In another terminal we'll exercise the leaking workload.

$ for run in {1..100000}; do ./build/bin/psql postgres -c 'select 1'; done

If you want to watch the memory usage climb while this workload is running, open top in another terminal.

When that is done we should have leaked a good deal of memory. Hit Control-C in the postgres terminal, and heaptrack will report what it saw.

$ sudo heaptrack -p 260576
heaptrack output will be written to "/home/phil/postgres/heaptrack.postgres.260645.zst"
injecting heaptrack into application via GDB, this might take some time...
warning: 44	./nptl/cancellation.c: No such file or directory
injection finished
heaptrack stats:
	allocations:          	19
	leaked allocations:   	19
	temporary allocations:	0
Heaptrack finished! Now run the following to investigate the data:

  heaptrack --analyze "/home/phil/postgres/heaptrack.postgres.260645.zst"

heaptrack uses a different PID in the output file name than the one we gave it to track, but that's ok. Now we run heaptrack_print to format the results.

$ heaptrack_print --print-leaks 1 ./heaptrack.postgres.260645.zst > leak.txt

And if we open that file, leak.txt, and search for MEMORY LEAKS we get this:

MEMORY LEAKS
16.77M leaked over 11 calls from
AllocSetAllocFromNewBlock
  at /home/phil/postgres/src/backend/utils/mmgr/aset.c:908
  in /home/phil/postgres/build/bin/postgres
16.77M leaked over 11 calls from:
    ServerLoop::BackendStartup
      at /home/phil/postgres/src/backend/postmaster/postmaster.c:3554
      in /home/phil/postgres/build/bin/postgres
    ServerLoop
      at /home/phil/postgres/src/backend/postmaster/postmaster.c:1676
    PostmasterMain
      at /home/phil/postgres/src/backend/postmaster/postmaster.c:1374
      in /home/phil/postgres/build/bin/postgres
    main
      at /home/phil/postgres/src/backend/main/main.c:199
      in /home/phil/postgres/build/bin/postgres

Which is exactly the leak we introduced.

To make sure we're not hallucinating, comment out the allocation diff, rebuild, and rerun heaptrack with the same workload: the MEMORY LEAKS section will be empty. In that scenario, leak.txt looks like this:

$ cat leak.txt
reading file "./heaptrack.postgres.504295.zst" - please wait, this might take some time...
Debuggee command was: ./build/bin/postgres -D /home/phil/postgres/testdb
finished reading file, now analyzing data:

MOST CALLS TO ALLOCATION FUNCTIONS

PEAK MEMORY CONSUMERS

MEMORY LEAKS

MOST TEMPORARY ALLOCATIONS

total runtime: 146.664000s.
calls to allocation functions: 0 (0/s)
temporary memory allocations: 0 (0/s)
peak heap memory consumption: 0B
peak RSS (including heaptrack overhead): 21.93M
total memory leaked: 0

Let's introduce a leak in another Postgres process and see if we can catch that too.

A leak in a client backend

A memory leak in a client backend can be harder to catch since it is ephemeral. The client backend starts up for a single Postgres session and exits when the session exits. But so long as we can grab the PID of the client backend process, we can attach to it with heaptrack.

Let's leak memory in TopMemoryContext in the implementation of random().

$ git diff src/backend/utils/
diff --git a/src/backend/utils/adt/pseudorandomfuncs.c b/src/backend/utils/adt/pseudorandomfuncs.c
index 8e82c7078c5..886efbfaf78 100644
--- a/src/backend/utils/adt/pseudorandomfuncs.c
+++ b/src/backend/utils/adt/pseudorandomfuncs.c
@@ -20,6 +20,7 @@
 #include "utils/fmgrprotos.h"
 #include "utils/numeric.h"
 #include "utils/timestamp.h"
+#include "utils/memutils.h"
 
 /* Shared PRNG state used by all the random functions */
 static pg_prng_state prng_state;
@@ -84,6 +85,13 @@ Datum
 drandom(PG_FUNCTION_ARGS)
 {
 	float8		result;
+	int		   *s;
+	MemoryContext old;
+
+	old = MemoryContextSwitchTo(TopMemoryContext);
+	s = palloc(120);
+	MemoryContextSwitchTo(old);
+	*s = 90;
 
 	initialize_prng();

We can trigger this leak by executing random() a bunch of times. For example, with SELECT sum(random()) FROM generate_series(1, 1_000_000);.

Build and install Postgres with this diff.

$ make -j16 && make install

And start up Postgres again against the testdb we created before.

$ ./build/bin/postgres -D $(pwd)/testdb
2025-05-22 14:37:11.322 EDT [704381] LOG:  starting PostgreSQL 17.5 on aarch64-unknown-linux-gnu, compiled by gcc (Debian 14.2.0-19) 14.2.0, 64-bit
2025-05-22 14:37:11.323 EDT [704381] LOG:  listening on IPv6 address "::1", port 5432
2025-05-22 14:37:11.323 EDT [704381] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2025-05-22 14:37:11.324 EDT [704381] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2025-05-22 14:37:11.327 EDT [704384] LOG:  database system was shut down at 2025-05-22 14:31:00 EDT
2025-05-22 14:37:11.329 EDT [704381] LOG:  database system is ready to accept connections

In a new terminal, start a psql session and find the corresponding client backend PID with pg_backend_pid().

$ ./build/bin/psql postgres
psql (17.5)
Type "help" for help.

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
         704389
(1 row)

postgres=#

Keep this session alive and in a new terminal attach heaptrack to it.

$ sudo heaptrack -p 704389
heaptrack output will be written to "/home/phil/heaptrack.postgres.704409.zst"
injecting heaptrack into application via GDB, this might take some time...
warning: 44	./nptl/cancellation.c: No such file or directory
injection finished

Back in that 704389 psql session, run the leaking workload.

postgres=# SELECT sum(random()) FROM generate_series(1, 10_000_000);
        sum
-------------------
 499960.8137393289
(1 row)

Now hit Control-D to exit psql gracefully.

The heaptrack terminal will tell us where to look.

$ sudo heaptrack -p 704389
heaptrack output will be written to "/home/phil/heaptrack.postgres.704409.zst"
injecting heaptrack into application via GDB, this might take some time...
warning: 44	./nptl/cancellation.c: No such file or directory
injection finished
heaptrack stats:
	allocations:          	206
	leaked allocations:   	177
	temporary allocations:	8
removing heaptrack injection via GDB, this might take some time...
ptrace: No such process.
No symbol table is loaded.  Use the "file" command.
The program is not being run.
Heaptrack finished! Now run the following to investigate the data:

  heaptrack --analyze "/home/phil/heaptrack.postgres.704409.zst"

Run heaptrack_print like before.

$ heaptrack_print --print-leaks 1 ./heaptrack.postgres.704409.zst > leak.txt

And look in leak.txt for the MEMORY LEAKS section again.

MEMORY LEAKS
1.37G leaked over 180 calls from
AllocSetAllocFromNewBlock
  at /home/phil/postgres/src/backend/utils/mmgr/aset.c:908
  in /home/phil/postgres/build/bin/postgres
1.37G leaked over 170 calls from:
    drandom
      at /home/phil/postgres/src/backend/utils/adt/pseudorandomfuncs.c:92
      in /home/phil/postgres/build/bin/postgres
    ExecInterpExpr
      at /home/phil/postgres/src/backend/executor/execExprInterp.c:740
      in /home/phil/postgres/build/bin/postgres
    ExecAgg::agg_retrieve_direct::advance_aggregates::ExecEvalExprSwitchContext
      at ../../../src/include/executor/executor.h:356
      in /home/phil/postgres/build/bin/postgres
    ExecAgg::agg_retrieve_direct::advance_aggregates
      at /home/phil/postgres/src/backend/executor/nodeAgg.c:820
    ExecAgg::agg_retrieve_direct
      at /home/phil/postgres/src/backend/executor/nodeAgg.c:2454
    ExecAgg
      at /home/phil/postgres/src/backend/executor/nodeAgg.c:2179
    standard_ExecutorRun::ExecutePlan::ExecProcNode

And we found our leak.

Considerations

Unlike memleak, heaptrack requires the traced process to exit before it will print a leak report. This is unfortunate, since sometimes you can't let, or don't want to force, a process to end.

On the other hand, heaptrack has been reporting stack traces and line numbers in different environments for me more reliably than memleak has.

Again, both seem to be better options than Valgrind memcheck or LeakSanitizer if your leaked memory will get cleaned up before the program exits.

Both heaptrack and memleak seem to be good tools to have in the chest.
