Understanding Weak References in Python

Cover: Strong reference vs Weak Reference

When working with Python (and many other languages), you often rely on the runtime to manage memory for you. Most of the time this works invisibly, but certain patterns, such as objects that reference each other in cycles, long-lived caches, or subscriber lists, can create memory leaks if not handled carefully.

This happens because references in Python are strong by default: an object is kept alive as long as at least one strong reference to it exists anywhere in the program. In cyclic data structures or in caches, these strong references can unnecessarily delay the deallocation of objects that are no longer needed.

Weak references provide a way to refer to objects without preventing them from being garbage collected. They let you build caches that automatically empty, subscriber lists that clean themselves up, and other data structures that will not accidentally extend object lifetimes.

In this article we will explore what weak references are, why they matter, and how to use them in Python. We will start with a review of reference counting, look at its limitations, and then dive into weak references and their practical uses.


Many languages either use reference counting to manage memory at runtime or provide first-class primitives for it.

In this scheme, every object has an associated reference count that tracks the number of places where it is being used. For example, when you create an object and assign it to a variable, it has a reference count of 1. When you assign it to another variable or pass it to a function, its reference count goes up by 1.

Similarly, when a variable goes out of scope or a function call returns, the reference count of each object it referred to is decremented. When the reference count of an object reaches 0, the object gets deallocated.
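These counting rules can be observed directly with sys.getrefcount(). The snippet below is a minimal sketch that assumes CPython; absolute counts vary between interpreter versions, so it only compares counts against a baseline. The Payload class and the variable names are illustrative.

```python
import sys

class Payload:
    """Illustrative class; any ordinary object would do."""

obj = Payload()
baseline = sys.getrefcount(obj)        # includes a temporary reference for the call

alias = obj                            # a second name bound to the same object
after_alias = sys.getrefcount(obj)     # one higher than the baseline

container = [obj]                      # a list slot is another reference
after_list = sys.getrefcount(obj)      # two higher than the baseline

del alias                              # unbinding a name drops one reference
container.clear()                      # emptying the list drops another
after_cleanup = sys.getrefcount(obj)   # back to the baseline

print(after_alias - baseline, after_list - baseline, after_cleanup - baseline)
```

Comparing against a baseline rather than hard-coding numbers keeps the demonstration independent of how a particular interpreter version counts temporary references.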

CPython uses reference counting to manage the memory of its runtime, and other languages offer it as well. For example, reference-counted smart pointers such as std::shared_ptr in C++ or Rc and Arc in Rust increment and decrement an object's reference count under the hood as the pointers are copied and dropped.

If you want to understand how CPython implements reference counting internally, you can check out my article on that topic:

How CPython Implements Reference Counting: Dissecting CPython Internals


Reference counting works well for most cases, but it is not a complete solution. Its simplicity comes with trade‑offs, and understanding these limitations helps motivate why Python also offers weak references.

One of those limitations is cyclic references. Cyclic references exist when objects hold references to each other in a cycle, e.g. in a graph data structure, but you can also end up creating them accidentally in complex systems. In such cases, reference counting alone never frees the objects that are part of the cycle, because their counts cannot drop to 0 until the cycle is broken. This is why CPython also implements a cycle-breaking garbage collector (GC) that runs periodically, scans objects for cycles, and breaks any cycle that is no longer referenced from anywhere else so that its objects can be freed.

Cyclic references can be problematic for performance because memory usage remains high until the GC runs, and the GC scan itself can be expensive (depending on the number of objects it needs to scan).

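We can understand this with the help of an example. Consider the following code:

    #!/usr/bin/env python

    import gc
    import sys

    class MyNode:
        def __init__(self, name: str):
            self.name: str = name
            self.next = None

        def __del__(self):
            print(f"{self.name} is being deleted")

    def print_node_objects():
        obj_count = 0
        for o in gc.get_objects():
            if type(o) is MyNode:
                obj_count += 1
                print(f"{o.name} exists with referrers: {[n.name for n in gc.get_referrers(o) if type(n) is MyNode]}")
        if obj_count == 0:
            print("No MyNode objects found")

    def test1():
        n1 = MyNode("n1")
        n2 = MyNode("n2")
        print(f"n1 refcount: {sys.getrefcount(n1)}")
        print(f"n2 refcount: {sys.getrefcount(n2)}")

    if __name__ == '__main__':
        test1()
        print_node_objects()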
Demonstration of how reference counting works in Python

Let’s break it down:

  • The MyNode class implements a linked list node with a next field.

  • print_node_objects is a utility function. It finds all the MyNode objects that are currently alive and then prints their referrers, i.e., who is holding a reference to them.

    • It uses gc.get_objects() to get the list of all the currently alive objects in the Python interpreter and filters it down by checking for their type and selecting only MyNode type objects.

    • It finds the referrers of an object by using the gc.get_referrers() function, which returns a list of referrer objects. We filter this list by type because other objects also hold references during the call (for example, the list returned by gc.get_objects()), and we only care about MyNode referrers.

  • In the main block we call test1(), which creates two MyNode objects, prints their reference counts, and returns. After test1() returns, we call print_node_objects() to see if any MyNode objects are still alive.

If you run this program, you should see an output like the following:

    ➜ uv run --python 3.13 -- cycles.py
    n1 refcount: 2
    n2 refcount: 2
    n1 is being deleted
    n2 is being deleted
    No MyNode objects found

This is pretty much the expected output, but let’s spend a moment to ensure we don’t miss anything.

  • We see that the reference count for both n1 and n2 is 2. You might expect it to be 1, but it is 2 because passing the object as an argument to sys.getrefcount() adds a temporary reference for the duration of the call.

  • We see that the __del__ method of both objects gets called and prints a message. This happens because n1 and n2 are local variables inside test1(), and when it returns, its stack frame is destroyed, which decrements the reference counts of all of its local objects (parameters and locally created variables). In this case, because n1 and n2 reached a reference count of 0, they were deallocated and their __del__ methods were called.

  • Finally, in the main block, when print_node_objects() is called, we see that it does not find any MyNode objects on the heap that are still alive.

Next, we can run another test that creates a cycle between n1 and n2 and observe that the objects stay alive after the test function returns. The following figure shows the updated code, where I’ve added a new function test2() and call it from the main block.

    #!/usr/bin/env python

    import gc
    import sys

    class MyNode:
        def __init__(self, name: str):
            self.name: str = name
            self.next = None

        def __del__(self):
            print(f"{self.name} is being deleted")

    def print_node_objects():
        obj_count = 0
        for o in gc.get_objects():
            if type(o) is MyNode:
                obj_count += 1
                print(f"{o.name} exists with referrers: {[n.name for n in gc.get_referrers(o) if type(n) is MyNode]}")
        if obj_count == 0:
            print("No MyNode objects found")

    def test2():
        n1 = MyNode("n1")
        n2 = MyNode("n2")

        n1.next = n2
        n2.next = n1
        print(f"n1 refcount: {sys.getrefcount(n1)}")
        print(f"n2 refcount: {sys.getrefcount(n2)}")

    def test1():
        n1 = MyNode("n1")
        n2 = MyNode("n2")
        print(f"n1 refcount: {sys.getrefcount(n1)}")
        print(f"n2 refcount: {sys.getrefcount(n2)}")

    if __name__ == '__main__':
        test1()
        print_node_objects()
        print("---------------------")
        test2()
        print_node_objects()
Demonstration of cyclic references. In the test2() function we create a cycle between n1 and n2 and see that they are left alive even after test2() returns.

If we run this program, we should see the following output:

    ➜ uv run --python 3.13 -- cycles.py
    n1 refcount: 2
    n2 refcount: 2
    n1 is being deleted
    n2 is being deleted
    No MyNode objects found
    ---------------------
    n1 refcount: 3
    n2 refcount: 3
    n1 exists with referrers: ['n2']
    n2 exists with referrers: ['n1']
    n1 is being deleted
    n2 is being deleted

Let’s focus on the output after the call to test2().

  • We see that in test2(), the reference count for n1 and n2 is 3, one higher than what it was in test1(). This is due to n1.next creating a reference to n2 and n2.next creating a reference to n1.

  • We also see that when test2() returns, the __del__ methods of n1 and n2 are not called, which means that those objects are not deallocated and are still alive. This happens because, on return, the interpreter decrements their reference counts, but this time the counts do not reach 0.

  • After test2() returns, print_node_objects() tells us that the MyNode objects we created for n1 and n2 are still alive. We can also see why: they hold references to each other, forming a cycle.

  • n1 and n2 finally get destroyed as the program ends because the CPython interpreter runs the GC before shutting down.

To prevent such cyclic references from leaking memory, CPython includes a garbage collector that runs periodically, detects cycles that are no longer referenced from anywhere else, and breaks them so that the objects in the cycle can be deallocated. You can verify this yourself by inserting a gc.collect() call after the call to test2() in the above program.
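The effect of an explicit gc.collect() call can also be seen in a smaller, self-contained form. The following is a sketch under the assumption of CPython; the Node class, the deleted list, and make_cycle() are hypothetical names used only for this illustration, and the automatic collector is disabled temporarily so that it cannot interfere with the observation.

```python
import gc

deleted = []  # names of nodes whose __del__ has run

class Node:
    def __init__(self, name):
        self.name = name
        self.next = None

    def __del__(self):
        deleted.append(self.name)

def make_cycle():
    n1, n2 = Node("n1"), Node("n2")
    n1.next = n2
    n2.next = n1  # n1 -> n2 -> n1

gc.disable()                             # keep the automatic collector out of the demo
make_cycle()
deleted_before_collect = list(deleted)   # still empty: the cycle keeps both nodes alive
unreachable = gc.collect()               # run the cycle detector explicitly
deleted_after_collect = sorted(deleted)  # both nodes have now been finalized
gc.enable()

print(deleted_before_collect, deleted_after_collect, unreachable)
```

gc.collect() returns the number of unreachable objects it found; the exact value depends on the interpreter version, so the sketch does not rely on a specific number.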


However, there are other ways to avoid such pitfalls of reference counting, and weak references are one of them. Let’s understand what they are and how they work.

Weak references sit at the opposite end of the spectrum from strong references. A weak reference does not increase the reference count of the underlying object, so it lets you use an object without prolonging its lifetime.

When the object’s reference count goes to 0, it can get deallocated even if there are weak references to it that are still being used. Naturally, this requires that when using a weak reference to an object, we always need to check if the underlying object is still alive.

In Python, to create weak references, we need to use the weakref.ref() function from the weakref module and pass the object for which we want to create a weak reference. For example:

n1_weakref = weakref.ref(n1)

weakref.ref() creates a weak reference to the given object and returns a callable. To access the underlying object, we need to invoke this callable every time. If the object is still alive, the call returns the object; otherwise it returns None. For example:

    if n1_weakref():
        print(f"name: {n1_weakref().name}")
    else:
        print("n1 no longer exists")
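Calling the weak reference once to test it and again to use it leaves a small window in which the object could disappear in between, for example in multithreaded code. A common way to sidestep this, sketched here with a hypothetical describe() helper and a stand-in Node class, is to call the reference once and hold the result in a local variable, which acts as a temporary strong reference.

```python
import gc
import weakref

class Node:
    def __init__(self, name):
        self.name = name

def describe(ref):
    """Dereference once and keep the result in a local (a temporary strong reference)."""
    node = ref()
    if node is None:
        return "node no longer exists"
    return f"name: {node.name}"

n1 = Node("n1")
n1_weakref = weakref.ref(n1)
while_alive = describe(n1_weakref)

del n1          # drop the only strong reference
gc.collect()    # not needed on CPython; helps on runtimes without reference counting
after_delete = describe(n1_weakref)

print(while_alive)
print(after_delete)
```

The local variable node is released when describe() returns, so the helper itself does not extend the object's lifetime beyond the call.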

The following figure shows a full example of creating a weak reference and accessing it in our running linked list example.

    #!/usr/bin/env python

    import gc
    import sys
    import weakref

    class MyNode:
        def __init__(self, name: str):
            self.name: str = name
            self.next = None

        def __del__(self):
            print(f"{self.name} is being deleted")

    def print_node_objects():
        obj_count = 0
        for o in gc.get_objects():
            if type(o) is MyNode:
                obj_count += 1
                print(f"{o.name} exists with referrers: {[n.name for n in gc.get_referrers(o) if type(n) is MyNode]}")
        if obj_count == 0:
            print("No MyNode objects found")

    def weakref_demo():
        n1 = MyNode("n1")
        print(f"n1 refcount: {sys.getrefcount(n1)}")
        # use the ref function to create a weak reference to n1
        # it gives us a callable that when called will try
        # to access the underlying object
        n1_weakref = weakref.ref(n1)

        # notice how n1's reference count remains unchanged
        print(f"n1 refcount: {sys.getrefcount(n1)}")

        # to access n1 using weakref we need to call it
        if n1_weakref():
            print(f"n1's name: {n1_weakref().name}")

        # let's delete n1 and see if weakref still works
        del n1
        if n1_weakref():
            print(f"n1's name: {n1_weakref().name}")
        else:
            print("n1 no longer exists")

    if __name__ == '__main__':
        weakref_demo()
        print("---------------------")
        print_node_objects()
Demonstration of creating a weak reference and using it

Output:

    ➜ uv run --python 3.13 -- weakref_cycles.py
    n1 refcount: 2
    n1 refcount: 2
    n1's name: n1
    n1 is being deleted
    n1 no longer exists
    ---------------------
    No MyNode objects found

From the output we can confirm a few things:

  • Creating a weak reference does not increase the object’s reference count

  • A weak reference does not prevent the object from being deallocated when its reference count goes to 0 (in the example, we deleted n1 and after that could no longer access it through the weak reference).

I leave the problem of fixing the cyclic reference that we created in test2() as an exercise for you.
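If you want something to compare your solution against, here is one possible approach, offered as a sketch rather than the intended answer: keep the forward link strong and store the back link as a weak reference. The Node class, its prev property, and the deleted list are hypothetical names introduced for this illustration, and the immediate cleanup it shows assumes CPython's reference counting.

```python
import gc
import weakref

deleted = []  # records which nodes have been finalized

class Node:
    def __init__(self, name):
        self.name = name
        self.next = None      # strong forward link
        self._prev = None     # weak back link (a weakref.ref or None)

    @property
    def prev(self):
        # Dereference the weak link; None if unset or already collected
        return self._prev() if self._prev is not None else None

    @prev.setter
    def prev(self, node):
        self._prev = weakref.ref(node) if node is not None else None

    def __del__(self):
        deleted.append(self.name)

def build_pair():
    n1, n2 = Node("n1"), Node("n2")
    n1.next = n2
    n2.prev = n1              # does not raise n1's reference count
    return n2.prev is n1

gc.disable()                  # show that the cycle collector is not needed
back_link_works = build_pair()
freed_without_gc = sorted(deleted)
gc.enable()

print(back_link_works, freed_without_gc)
```

Because only one direction of the link is strong, the reference counts of both nodes reach 0 as soon as build_pair() returns, and no cycle is left behind for the garbage collector.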

So far we’ve seen weak references as a tool for avoiding cycles, but their utility goes well beyond that. The weakref module also provides ready-made containers built on top of weak references. These containers, WeakValueDictionary and WeakSet, help you manage auxiliary data structures that should not extend the lifetimes of their contents. They solve practical problems such as caching, registries, and subscriber lists, where automatic cleanup is not just convenient but essential for avoiding leaks.

The weakref module provides WeakValueDictionary, which looks and behaves like a normal dictionary but with an important twist: its values are held only through weak references. If a value is no longer strongly referenced anywhere else, the dictionary entry disappears automatically.

This makes WeakValueDictionary a natural fit for caching and memoization. Imagine you compute expensive results or load large data structures and want to reuse them if they are still in memory. At the same time, you don’t want the cache itself to keep them alive forever. A WeakValueDictionary strikes that balance: it holds onto results only as long as the rest of the program does.

Another classic application is object interning or registries. For example, you may want to ensure there is only one canonical object representing a resource (like a symbol table entry, database connection, or parsed schema). By using a WeakValueDictionary, you avoid artificially extending the lifetimes of those resources.

Here’s a simple illustration:

    import weakref

    class Data:
        def __init__(self, name):
            self.name = name

        def __repr__(self):
            return f"Data({self.name})"

    cache = weakref.WeakValueDictionary()

    obj = Data("expensive_result")
    cache["key"] = obj
    print("Before deletion:", dict(cache))

    # Drop the strong reference
    obj = None
    print("After deletion:", dict(cache))

Output:

    Before deletion: {'key': Data(expensive_result)}
    After deletion: {}

Notice how the cache entry vanishes automatically once the last strong reference goes away. There is no need for manual cleanup. Under the hood, this is implemented with weakref callbacks—the same mechanism we’ll see in the callback section.
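To show how this might look in a caching or registry scenario, here is a minimal sketch of a lookup function backed by a WeakValueDictionary. The Schema class, get_schema(), and load_count are hypothetical names, and the Schema constructor stands in for whatever expensive work the real program would do. The immediate disappearance of the entry assumes CPython.

```python
import gc
import weakref

class Schema:
    def __init__(self, name):
        self.name = name

_cache = weakref.WeakValueDictionary()
load_count = 0  # how many times a Schema had to be built

def get_schema(name):
    """Return the cached Schema for name, building it only if none is alive."""
    global load_count
    schema = _cache.get(name)   # a strong reference, or None if the entry is gone
    if schema is None:
        load_count += 1
        schema = Schema(name)   # stand-in for an expensive load
        _cache[name] = schema
    return schema

a = get_schema("users")
b = get_schema("users")
shared = a is b                 # the second call reuses the live object

del a, b                        # drop every strong reference
gc.collect()
entries_after_drop = len(_cache)

c = get_schema("users")         # rebuilt, because the old entry vanished
print(shared, entries_after_drop, load_count)
```

Using _cache.get() and binding the result to a local before testing it avoids a lookup that succeeds and then fails a moment later because the value was collected in between.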

Another container provided by the weakref module is WeakSet. This is similar to a regular set, except that it holds weak references to its elements. If an object is garbage collected, it will automatically vanish from the set.

One scenario where this is very handy is when you want to keep track of subscribers, observers, or listeners. These are objects that register interest in events produced by another object (often called the publisher). For instance:

  • GUI frameworks: widgets listen to events such as theme changes or window resizes.

  • Event buses: services subscribe to log events, metrics, or domain events.

  • Plugin systems: plugins register callbacks at load time to respond to hooks.

  • Background services: transient sessions (e.g., WebSocket connections) listen for updates from a long‑lived manager.

In all these cases, subscribers are often short‑lived, while the publisher lives much longer. Using a regular set to hold them risks memory leaks, because a strong reference in the set will keep the subscriber alive even when the rest of the program has forgotten it. With a WeakSet, the garbage collector automatically removes subscribers that are no longer strongly referenced anywhere else, so you don’t need explicit unsubscribe logic in every shutdown path.

Here’s a simple example:

    import gc
    import weakref

    class Listener:
        def __init__(self, name):
            self.name = name

        def __repr__(self):
            return f"Listener({self.name})"

    listeners = weakref.WeakSet()

    l1 = Listener("A")
    l2 = Listener("B")
    listeners.add(l1)
    listeners.add(l2)
    print("Before deletion:", list(listeners))

    # Remove one listener
    l1 = None
    gc.collect()
    print("After deletion:", list(listeners))

Output:

    Before deletion: [Listener(A), Listener(B)]
    After deletion: [Listener(B)]
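To make the leak risk with a regular set concrete, the following sketch registers one object in a plain set and another in a WeakSet, then drops both names. The Subscriber class and the probe variables are illustrative; the probes are weak references used only to check liveness without keeping anything alive.

```python
import gc
import weakref

class Subscriber:
    pass

strong_subs = set()
weak_subs = weakref.WeakSet()

s1, s2 = Subscriber(), Subscriber()
strong_subs.add(s1)
weak_subs.add(s2)

# Weak "probes" let us check liveness without keeping the objects alive
probe1, probe2 = weakref.ref(s1), weakref.ref(s2)

del s1, s2
gc.collect()

kept_by_set = probe1() is not None      # the regular set still holds s1
dropped_by_weakset = probe2() is None   # the WeakSet let s2 go
print(kept_by_set, len(strong_subs), dropped_by_weakset, len(weak_subs))
```

The object in the regular set survives even though no other part of the program can reach it by name, which is exactly the kind of leak a long-lived publisher would accumulate.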

This pattern is often extended into a publisher–subscriber model:

    import gc
    import weakref

    class Publisher:
        def __init__(self):
            self._subs = weakref.WeakSet()

        def subscribe(self, sub):
            self._subs.add(sub)

        def notify(self, payload):
            # iterate over a snapshot so the set can shrink safely meanwhile
            for s in list(self._subs):
                s.handle(payload)

    class Subscriber:
        def __init__(self, name):
            self.name = name

        def handle(self, payload):
            print(self.name, "got:", payload)

    pub = Publisher()
    sub = Subscriber("one")
    pub.subscribe(sub)
    pub.notify({"event": 1})  # delivered

    sub = None  # drop last strong ref
    gc.collect()
    pub.notify({"event": 2})  # nothing printed; WeakSet cleaned itself

Using WeakSet here avoids leaks and simplifies lifecycle management. A caveat is that only weak-referenceable objects (e.g., instances of user-defined classes) can be added; instances of built-in types like int or tuple won’t work. If your class uses __slots__, include '__weakref__' among the slots to allow weak references.
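This caveat can be checked directly. The sketch below uses a hypothetical can_weakref() helper and two illustrative classes to show which objects accept a weak reference; weakref.ref() raises TypeError for objects that do not support it.

```python
import weakref

class SlotsOnly:
    __slots__ = ("name",)                 # no __weakref__ slot

    def __init__(self, name):
        self.name = name

class SlotsWithWeakref:
    __slots__ = ("name", "__weakref__")   # opts back in to weak references

    def __init__(self, name):
        self.name = name

def can_weakref(obj):
    """Return True if a weak reference to obj can be created."""
    try:
        weakref.ref(obj)
    except TypeError:
        return False
    return True

results = {
    "int": can_weakref(42),
    "tuple": can_weakref((1, 2)),
    "SlotsOnly": can_weakref(SlotsOnly("a")),
    "SlotsWithWeakref": can_weakref(SlotsWithWeakref("b")),
}
print(results)
```

Only the class that lists '__weakref__' in its slots passes the check; adding any of the other three objects to a WeakSet would fail with the same TypeError.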

Another useful feature of weakref.ref is the ability to attach a callback. A callback is a function that gets invoked automatically when the referent object is about to be finalized. This can be handy if you want to clean up auxiliary data structures or release resources when an object goes away.

    import gc
    import weakref

    class Resource:
        def __init__(self, name):
            self.name = name

        def __repr__(self):
            return f"Resource({self.name})"

    def on_finalize(wr):
        print("Resource has been garbage collected:", wr)

    obj = Resource("temp")
    wr = weakref.ref(obj, on_finalize)
    print("Created weak reference:", wr)

    # Drop strong reference
    obj = None

    # Force GC for demo purposes
    gc.collect()

Output:

    Created weak reference: <weakref at 0x75f6773870b0; to 'Resource' at 0x75f677c4ee40>
    Resource has been garbage collected: <weakref at 0x75f6773870b0; dead>

Here, the on_finalize callback is called once the Resource instance is about to be collected. The weak reference itself becomes dead afterwards. This pattern is useful when you want to implement custom cleanup logic tied to an object’s lifecycle.

It’s also worth noting that containers like WeakValueDictionary and WeakSet use this same mechanism internally: they attach callbacks to their weak references so that entries are automatically removed when the referent objects are finalized.

Weak references are not a tool you’ll reach for every day, but when you need them they solve very real problems. At the lowest level, weakref.ref lets you point to an object without affecting its lifetime, and you can even attach a callback to run cleanup code at the moment it is collected. Building on that primitive, Python’s WeakValueDictionary and WeakSet give you higher level containers for caches, registries, and subscriber lists that automatically clean themselves up when their contents go away.

To summarize the differences:

A summary of the key APIs from the weakref module in Python and in which situation to use them

Together, these features make it possible to build memory‑friendly systems that avoid leaks, reduce bookkeeping, and respect the natural lifetimes of your objects. Understanding weak references and knowing when to apply them will help you write code that is both safer and more efficient.
