Bits and Side Tables: How Reference Counting Works Internally in Swift


Every iOS dev has a soft spot, or perhaps a weak spot, for Swift reference counts. This stems from the fact that we've all survived countless interview questions about it:

  • For your first job, you’ll no doubt be asked the classic “what's the difference between a value type and a reference type?”

  • As you grow to mid-level, maybe you’ll be asked “how is a weak reference different from an unowned reference?”

  • Perhaps if you're ever interviewed by me (a sadist, and not the saucy 50 shades of grey kind), I’ll throw out “why was the side table introduced?”

Reference counting is fundamental to the Swift you write every day. By understanding how something really works, down to the bit level, you have the theoretical foundation to eviscerate any interview curveball I can lob at you.

So today, we’re spelunking again through the low-level guts of the Swift Runtime, to find out the answers to questions like:

  • Where are the bits that make up strong, weak, and unowned reference counts stored?

  • Why are weak references less performant than strong and unowned references?

  • How does an unowned reference crash if it's accessed after deallocation?

  • Are unowned references always more performant than weak references?

Once you’ve digested this article, you’ll be able to flip the interview table, and make me your bitch.


If you’re new to Jacob’s Tech Tavern, let’s quickly check we all understand what the runtime actually is. It's a C++ library that gets linked to your app at launch, and contains a bunch of basic language functionality like “allocating an object on the heap” and “knowing what type things are”.

And, most relevantly, automatic reference counting.

ARC was built into Swift from the start, but was only introduced to Objective-C in 2011. Before this, developers manually managed their memory with explicit calls to retain, release, and autorelease (plus NSAutoreleasePool, the ancestor of today’s @autoreleasepool).

You can see the Swift runtime in the source code. It’s not even that much code! All the code involved in reference counting lives in the runtime. Actually, even just a few files inside the runtime: HeapObject.cpp, RefCount.h, and RefCount.cpp.

If you’re keen to learn more about memory allocation and the Swift runtime internals, check out this absolute banger of an article.

Now we have the gist, we’re going to start exploring each kind of Swift reference.

To fundamentally understand reference types, let’s begin with HeapObject.

Every reference type is stored in a block of memory on the heap, accessed via a reference (or pointer) to the memory address. That’s why printing a reference type looks like this:

0x0000d14dba7f

It’s the 64-bit memory address that contains the actual data.

All these objects stored on the heap are laid out like this:

+------------------------+
|       HeapObject       |  <- header struct
|  - metadata (isa) ptr  |  <- type information
|  - inlineRefCounts     |  <- reference count bits
+------------------------+
|      Instance Data     |  <- The actual data for your object
|  - field1              |
|  - field2              |
|  - ...                 |
+------------------------+

The HeapObject is the header block for the object, stored contiguously in front of the memory itself. It stores type metadata and the reference counts. HeapObject itself is a simple C struct. Here's the actual definition:
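Stripped of macros and pointer-authentication attributes, it looks roughly like this (a simplified sketch of the struct, not the verbatim source):

struct HeapObject {
  // Pointer to the type's metadata: the "isa" pointer.
  HeapMetadata const *metadata;

  // The inline reference counts: strong, unowned, and flag bits.
  InlineRefCounts refCounts;
};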

These two “words” of 64-bit* data are packed contiguously in front of every object on the heap. HeapObject is therefore a slightly confusing name, since it behaves more like a block of metadata for the actual object on the heap.

*To make life easy, I’m going to assume everything is on 64-bit devices.

The metadata pointer, or isa pointer, tells the runtime what type this object is, powering dynamic dispatch, reflection, and the type(of:) function.

But we’re looking for the bits I promised. These live in the other property, InlineRefCounts. This is inline because, rather than pointing off to a separate structure, this property stores the reference count bits directly inside the HeapObject struct.

When you strongly reference a heap object in Swift, the compiler emits calls to the runtime ABI function swift_retain(), which lands in the static implementation _swift_retain_(). This is what increments the reference count. Here’s the code:
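A lightly trimmed sketch of what lives in HeapObject.cpp (tracing macros removed):

static HeapObject *_swift_retain_(HeapObject *object) {
  if (isValidPointerForNativeRetain(object))
    object->refCounts.increment(1);
  return object;
}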

Pretty… simple… almost brain-dead in how straightforward it is, right? Just accesses the HeapObject’s refCounts property and calls the increment(1) method, to increment it by 1.

Before we go deeper, let’s understand the actual, physical, bit layout of these ref counts:


The bit layout of inlined ref counts takes a moment to get your head around, because it’s mixed in with a bunch of flags related to the heap object’s lifecycle state. We can see it defined here, as part of SwiftShims, in RefCounts.h. Unless you are fresh off a computer science course, you may not intuitively know what these bits look like. I’ll go ahead and illustrate with bitmasks.

Bit 0: “Pure Swift dealloc” flag. If set (1), the object can be deallocated purely via the Swift runtime. If clear (0), it needs to call into the Objective-C runtime for deallocation.

0b0000000000000000000000000000000000000000000000000000000000000001

Bits 1-31: Unowned reference count. I wonder where unowned(unsafe) is stored?*

*Turns out unowned(unsafe) refs aren’t counted at all! They just point to the data. Damn, that does sound pretty unsafe tbh.

0b0000000000000000000000000000000001111111111111111111111111111110

Bit 32: “Is deiniting” flag. This is set when deinit is called, and helps block deallocated objects from getting into a weird state at the end of their lifecycle.

0b0000000000000000000000000000000010000000000000000000000000000000

Bits 33-62: “Strong extra” reference count. This is “extra” because when an object inits, the reference count is (logically) treated as 1 even when this whole (physical) field is zeros. So, “extra”. That’s why this count gets one less bit than unowned.

This optimisation saves a clock cycle or two for every initialisation, saving the system from having to perform a single-bit incrementation operation every time you allocate something. The strong reference count already begins at 1.

As an application developer, it beggars belief that you’d put so much design energy into saving a single clock cycle. But when we aggregate these hertz up across every single Swift object that gets instantiated, throughout the whole of time and space, the optimisation starts to make sense. Compiler engineers operate on another level.

0b0111111111111111111111111111111000000000000000000000000000000000

Bit 63: “Use slow RC” flag. This bit is flipped when one of the inline reference counts, either unowned or strong extra, overflows and fills up past capacity. This means a count of about 2³⁰, or about a billion, references.

More commonly, the bit will flip when a weak reference is created.

The presence of this flag tells the runtime that the side table (keep reading) is being used for storing reference counts, meaning we are using “slow RC” (ref counting) due to the pointer indirection whenever the count is accessed.

The activation of this flag re-contextualises the rest of the bits in the InlineRefCount: the runtime reads the non-flag bits as a pointer to this side table instead of treating them as inlined reference counts. Now that’s what I call efficient bit-packing!

0b1000000000000000000000000000000000000000000000000000000000000000
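Putting it all together, this is roughly how RefCount.h lays those fields out in RefCountBitOffsets (a sketch with the computed shift values written out; the real header derives them with helper macros):

template <>
struct RefCountBitOffsets<8> {
  static const size_t PureSwiftDeallocShift       = 0;   // bit 0
  static const size_t PureSwiftDeallocBitCount    = 1;

  static const size_t UnownedRefCountShift        = 1;   // bits 1-31
  static const size_t UnownedRefCountBitCount     = 31;

  static const size_t IsDeinitingShift            = 32;  // bit 32
  static const size_t IsDeinitingBitCount         = 1;

  static const size_t StrongExtraRefCountShift    = 33;  // bits 33-62
  static const size_t StrongExtraRefCountBitCount = 30;

  static const size_t UseSlowRCShift              = 63;  // bit 63
  static const size_t UseSlowRCBitCount           = 1;
};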

If bits aren’t your thing, you could have said something before I got really into the crunchy details, pal. But here’s a handy, less meandering, diagram documented in RefCount.h:

Storage layout:

HeapObject {
  isa
  InlineRefCounts {
    atomic<InlineRefCountBits> {
      strong RC + unowned RC + flags
      OR
      HeapObjectSideTableEntry*
    }
  }
}

Now we can come back to truly understand what happens with this incrementation operation we encountered in the _swift_retain_() method:

object->refCounts.increment(1);

This leads to the increment() method in RefCount.h:
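Here’s its shape, abridged (immortal-object handling trimmed):

void increment(uint32_t inc = 1) {
  auto oldbits = refCounts.load(SWIFT_MEMORY_ORDER_CONSUME);
  RefCountBits newbits;
  do {
    newbits = oldbits;
    bool fast = newbits.incrementStrongExtraRefCount(inc);
    if (SWIFT_UNLIKELY(!fast)) {
      // Overflow, or a side table already exists: take the slow path.
      return incrementSlow(oldbits, inc);
    }
  } while (!refCounts.compare_exchange_weak(oldbits, newbits,
                                            std::memory_order_relaxed));
}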

Most of this logic is handling the case where the strong reference count overflows and a side table is needed. But for now let’s follow the fast path, incrementStrongExtraRefCount().

I looked into what SWIFT_UNLIKELY was and had to write an aside because it was so cool. It’s a branch prediction hint to the compiler. It helps tell the compiler that the “fast” path is way more likely, so it can optimise this code path with the expectation that it’s almost always the one that gets used. This means the hardware can speculatively execute the likely branch, basically borrowing free clock cycles from the future.
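Under the hood it’s just a thin macro over Clang/GCC’s __builtin_expect, roughly:

#define SWIFT_LIKELY(expression)   (__builtin_expect(!!(expression), 1))
#define SWIFT_UNLIKELY(expression) (__builtin_expect(!!(expression), 0))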

David Smith on Mastodon kindly explained the double negative via his profiling experience: if (SWIFT_UNLIKELY(!fast)) is used because skipping the branch entirely simplifies the resulting assembly code.

This finally leads to the actual bitwise incrementing operation, which should hopefully feel a little clearer by now.
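From RefCount.h, roughly:

// Returns true if the increment stayed on the fast path.
// Returns false if it needs to fall back to the slow path
// (a side table exists, or the count overflowed).
bool incrementStrongExtraRefCount(uint32_t inc) {
  // This deliberately overflows into the UseSlowRC field.
  bits += BitsType(inc) << Offsets::StrongExtraRefCountShift;
  return (SignedBitsType(bits) >= 0);
}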

The function is simply adding the increment (inc) to the inlined bits.

As long as the bits don’t overflow, it returns true, telling the increment() method that it’s using fast reference counting.

Offsets::StrongExtraRefCountShift is the bitwise shift that moseys the counting operation up towards the 1s and 0s at slots 33-62, where the strong extra ref count is stored.

“Deliberately overflows into the UseSlowRC field” betrays an elegant design choice: when the 30 bits of the strong ref count all fill, they overflow. This automatically flips the next bit, the “high bit” at the end of the 64-bit word, where the “use slow RC” flag lives. This triggers the slow path and side table allocation.

Here's where things get interesting.

Well, actually, I guess I hope it was already interesting. But, like, extra interesting.

To learn side tables, we must begin by understanding how weak references were implemented in the original version of Swift, and the problem this caused.

From Swift 1.0, up until Swift 4.0, objects stored two reference counts separately: a strong count and a weak count (which also counted unowned refs).

When the strong refCount went to zero but there were weak references remaining, the object was deinitialised but not deallocated: it stuck around in a “zombie” state.

The weak references still worked. The runtime nilled them out on access (or crashed, if an unowned reference was used), and decremented the weak refCount. Only once all the weak refs had been accessed and nilled out could the zombie memory finally be deallocated.

This was a simple and performant implementation for weak references, but meant that a lot of memory could be hogged by these zombie objects.

With Swift 4, weak references were built different.

Now, when any weak reference is created to an object, the side table is created:
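In RefCount.h it looks roughly like this (simplified):

class HeapObjectSideTableEntry {
  // Pointer back to the original object.
  std::atomic<HeapObject*> object;

  // The full set of counts: strong, unowned, weak, plus flags.
  SideTableRefCounts refCounts;

public:
  HeapObjectSideTableEntry(HeapObject *newObject)
    : object(newObject), refCounts() { }
  // ...
};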

This is a separate heap allocation that contains:

  • A pointer back to the original object (HeapObject*)

  • Its own set of strong, unowned, & weak reference counts, plus flags (SideTableRefCounts)

Like the original InlineRefCounts, this is just a small object containing metadata about the object: reference counts & flags. Unlike HeapObject, which acts as the header portion of the object’s memory, the side table exists separately in memory.

When a weak reference is created, or any refCounts overflow, the InlineRefCounts struct is replaced by a pointer to this side table, with all the relevant reference counts copied over. Therefore the object and table both point at each other.

Critically, weak references point to this side table, not the object itself.

To create the side table, the InlineRefCountBits are loaded and used to initialise it, in RefCount.cpp:
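Abridged, it looks something like this (preflight checks and race handling trimmed):

HeapObjectSideTableEntry *
RefCounts<InlineRefCountBits>::allocateSideTable(bool failIfDeiniting) {
  auto oldbits = refCounts.load(SWIFT_MEMORY_ORDER_CONSUME);

  // If a side table already exists, just return it.
  if (oldbits.hasSideTable())
    return oldbits.getSideTable();
  if (failIfDeiniting && oldbits.getIsDeiniting())
    return nullptr;

  // Allocate the side table, pointing back at the object.
  auto *side = new HeapObjectSideTableEntry(getHeapObject());
  auto newbits = InlineRefCountBits(side);

  do {
    // Copy the existing inline counts over to the side table,
    // then repurpose the inline bits as a pointer to it.
    side->initRefCounts(oldbits);
  } while (!refCounts.compare_exchange_weak(oldbits, newbits,
                                            std::memory_order_release,
                                            std::memory_order_relaxed));
  return side;
}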

The side table handily solves the problem of zombie memory.

Now, when only weak references to an object remain, it is deinited and immediately deallocated. The side table may live on, and weak references can continue to access it, returning nil now that the object is gone.

Let’s refer again to the handy doc-comments in RefCount.h to get a mental picture:

Storage layout:

HeapObject {
  isa
  InlineRefCounts {
    atomic<InlineRefCountBits> {
      strong RC + unowned RC + flags
      OR
      HeapObjectSideTableEntry*
    }
  }
}

HeapObjectSideTableEntry {
  SideTableRefCounts {
    object pointer
    atomic<SideTableRefCountBits> {
      strong RC + unowned RC + weak RC + flags
    }
  }
}

Now let’s understand why weak references are considered less performant than strong (or unowned) references.

You’ve likely heard about “indirection” of weak references leading to performance hits, and we can see it directly in WeakReference.h, where weak references themselves are defined in the runtime.

The WeakReferenceBits are, quite literally, the bits making up the pointer to the side table. This is the weak reference.

To access the side table, it calls getNativeOrNull() on the pointer’s bits. Then, calling tryRetain() on the side table’s pointer to the object, it attempts to create and return a strong reference to the HeapObject (or returns nil).
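The load path, simplified from WeakReference.h and RefCount.h:

// WeakReference (WeakReference.h)
HeapObject *nativeLoadStrongFromBits(WeakReferenceBits bits) {
  auto *side = bits.getNativeOrNull();
  return side ? side->tryRetain() : nullptr;
}

// HeapObjectSideTableEntry (RefCount.h)
HeapObject *tryRetain() {
  if (refCounts.tryIncrement())
    return object.load(std::memory_order_relaxed);
  return nullptr;
}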

This adds many more steps to fetching memory compared to strong references, which already exist as pointers to HeapObject.

The performance penalty isn't just from the extra function call overhead here. The extra pointer dereferencing puts us at risk of a CPU cache miss every time we access a weak reference.

Don’t forget the side table allocation itself!

HeapObjectSideTableEntry* RefCounts<InlineRefCountBits>::allocateSideTable(bool failIfDeiniting)

This allocation, and the consequent migration of the reference counts, creates far more overhead for the initial weak reference.

For both of these, there is additional overhead from ensuring the operations are all atomic and threads all see a consistent state.

One little-considered performance cost to weak references is the fact that the side table is now storing the strong (extra) and unowned reference counts.

Remember, the flag denoting the presence of the side table was called UseSlowRC. Slow. RC.

Now, anytime a strong or unowned reference is created, we require an extra layer of indirection to increment the count on the side table, instead of inline on the HeapObject.

See RefCount.h:
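tryIncrement() follows the same pattern as increment(), but punts to the slow path whenever the side table bit is set (abridged):

bool tryIncrement() {
  auto oldbits = refCounts.load(SWIFT_MEMORY_ORDER_CONSUME);
  RefCountBits newbits;
  do {
    // A deiniting object can no longer be strongly retained.
    if (!oldbits.hasSideTable() && oldbits.getIsDeiniting())
      return false;

    newbits = oldbits;
    bool fast = newbits.incrementStrongExtraRefCount(1);
    if (SWIFT_UNLIKELY(!fast))
      return tryIncrementSlow(oldbits);  // side table (or overflow)
  } while (!refCounts.compare_exchange_weak(oldbits, newbits,
                                            std::memory_order_relaxed));
  return true;
}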

And tryIncrementSlow() is defined in RefCount.cpp:
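Which simply forwards the work to the side table (simplified; overflow handling trimmed):

template <typename RefCountBits>
bool RefCounts<RefCountBits>::tryIncrementSlow(RefCountBits oldbits) {
  if (oldbits.hasSideTable())
    return oldbits.getSideTable()->tryIncrement();  // the extra indirection
  else
    swift::swift_abortRetainOverflow();
}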

Therefore, the very presence of the side table means all your subsequent reference counting operations are subject to potential cache misses.

This performance cost is real, but pretty small. The trade-off against the pre-Swift-4 situation where every weak reference would cause some memory to glom onto your shockingly overpriced iPhone RAM chips near-indefinitely is clearly worth it.

The situations in which this performance cost matters are rare, but as I like to say: profile your code before you listen to some guy on the internet!

Now, let’s look at my least favourite kind of reference: unowned.

Unowned references are the spicy middle ground between strong and weak.

Use them when you're absolutely certain the referenced object will outlive the reference. If you're wrong, the runtime will let you know. Violently.

Unowned references, like weak references, prevent retain cycles. Having an unowned reference count above zero won’t stop an object being deinitialised, but unowned references eschew the indirection of a side table: they point directly to a HeapObject, reducing cache miss potential. The unowned count usually lives in the InlineRefCountBits.

This gives better runtime performance when you can guarantee that the unowned-referenced object will outlive the reference itself, for example, the famous unownedExecutor that runs under the hood of actors. This is tied directly to the actor’s lifecycle, so the Swift library developers can be certain of its safety.

The downside of unowned references is that your program will crash if you mess up your assumptions around these lifetimes. Let’s look at how this crash actually takes place.

When accessing an object via an unowned reference, a runtime call is made to swift_unownedRetainStrong(). We can see this defined in HeapObject.cpp:
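Roughly, with tracing and assertions trimmed:

HeapObject *swift::swift_unownedRetainStrong(HeapObject *object) {
  if (!isValidPointerForNativeRetain(object))
    return object;

  // Try to promote the unowned reference to a strong one.
  if (!object->refCounts.tryIncrement())
    swift::swift_abortRetainUnowned(object);
  return object;
}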

To (safely) access and use the unowned reference, the reference is temporarily promoted to a strong reference. More on this soon. tryIncrement() performs this promotion; we already saw it play out in RefCount.h above.

getIsDeiniting() is the key here: if the object has been deinitialised, then the strong count increment operation fails and false is returned.

The swift_unownedRetainStrong function lands on the fallback branch: swift_abortRetainUnowned. This is a call directly into runtime/Errors.cpp:
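Which is about as blunt as runtime code gets (simplified; the real function also handles a null object pointer):

// Crash due to an unowned retain of a dead object.
void swift::swift_abortRetainUnowned(const void *object) {
  swift::fatalError(FatalErrorFlags::ReportBacktrace,
                    "Fatal error: Attempted to read an unowned reference but "
                    "object %p was already deallocated\n", object);
}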

When accessing an unowned reference, the runtime itself performs memory integrity checks on your assumptions about object lifetimes. If you get it wrong, much like the famous Stan of the Eminem song, it drives over a bridge and fatalErrors.

Runtime computational performance is naturally better with unowned references, as a benefit of the lack of side table indirection.

But it’s still a little worse than directly using strong references.

As we saw above, swift_unownedRetainStrong is called whenever you access an unowned reference, either promoting the reference to a strong reference while you use it, or aborting. This promotion is a memory integrity mechanism: while using the HeapObject via the unowned reference, the object’s strong refCount is incremented to keep it alive. This stops the memory being de-allocated and freed while a pointer can still access it*.

*This is known as use-after-free, and is the most common exploit for jailbreaks.

From Apple’s John_McCall:

“Unowned references are pretty much inherently slower than strong references: we have to do work to turn the unowned reference into a strong reference in order to actually use the object, and then of course we have to release that strong reference when we're done with it.”

The runtime cost of pointer promotion isn’t the only inefficiency we introduce with unowned references. Surprise, surprise, unowned references are basically implemented the same way as pre-Swift-4 weak references.

That’s right: unowned references keep memory hanging around, like an accidental fart in the elevator, even after the object is deinited. We see this in action by bouncing between the two files which, by now, are our two best friends:

The object is first deallocated in HeapObject.cpp: swift_deallocObjectImpl():
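Simplified, the interesting part looks like this (leak tracking and assertions trimmed; treat it as a sketch rather than the verbatim source):

static void swift_deallocObjectImpl(HeapObject *object, size_t allocatedSize,
                                    size_t allocatedAlignMask) {
  if (object->refCounts.canBeFreedNow()) {
    // No outstanding unowned or weak references: free the memory immediately.
    swift_slowDealloc(object, allocatedSize, allocatedAlignMask);
  } else {
    // References remain: drop the unowned reference the object holds on
    // itself, and let the final unowned release free the memory later.
    swift_unownedRelease(object);
  }
}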

We check if the memory can be freed in RefCount.h: canBeFreedNow():
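Which checks that nothing but the object’s own unowned reference is left (lightly trimmed):

// DEINITING -> DEAD fast path: no side table, no strong refs,
// and only the object's own unowned reference remaining.
bool canBeFreedNow() const {
  auto bits = refCounts.load(SWIFT_MEMORY_ORDER_CONSUME);
  return (!bits.hasSideTable() &&
          bits.getIsDeiniting() &&
          bits.getStrongExtraRefCount() == 0 &&
          bits.getUnownedRefCount() == 1);
}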

If the memory cannot be freed, the object state goes from deiniting to deinited: the zombie object. This is performed back in HeapObject.cpp, with swift_unownedRelease():
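Roughly (class-metadata assertions trimmed):

void swift::swift_unownedRelease(HeapObject *object) {
  if (!isValidPointerForNativeRetain(object))
    return;

  if (object->refCounts.decrementUnownedShouldFree(1)) {
    auto *classMetadata = static_cast<const ClassMetadata *>(object->metadata);
    swift_slowDealloc(object, classMetadata->getInstanceSize(),
                      classMetadata->getInstanceAlignMask());
  }
}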

The unowned release decrements the unowned reference count and then, if the count hits zero, frees the memory via swift_slowDealloc. The decision is made, finally, back in RefCount.h with decrementUnownedShouldFree():
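Simplified:

// Decrement the unowned count; return true if the caller should
// free the object's memory (DEINITED -> DEAD).
bool decrementUnownedShouldFree(uint32_t dec) {
  auto oldbits = refCounts.load(SWIFT_MEMORY_ORDER_CONSUME);
  RefCountBits newbits;
  bool performFree;
  do {
    if (oldbits.hasSideTable())
      return oldbits.getSideTable()->decrementUnownedShouldFree(dec);

    newbits = oldbits;
    newbits.decrementUnownedRefCount(dec);
    performFree = (newbits.getUnownedRefCount() == 0);
  } while (!refCounts.compare_exchange_weak(oldbits, newbits,
                                            std::memory_order_relaxed));
  return performFree;
}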

This means that the memory is freed when the last unowned reference is released. This gives unowned references the same memory performance hit that the old implementation for weak references had.

This information allows us to be smart about the trade-offs we make when designing our systems:

  • In compute-constrained environments, where we have very strong mental models of lifetimes, we might choose to prevent retain cycles with unowned references.

  • In memory-constrained environments, we may choose to use weak references when working with heavy heap-allocated objects.

In my opinion, unowned is generally great for library developers who know exactly what lifetimes and constraints they are working with. For feature work, I would frankly recommend that you avoid them, since you never know if the next person to make changes to the unowned object is going to carelessly mess with the lifetimes.

Finally, let me be clear: unowned references are not unsafe. Crashes are controlled detonations that actually prevent unsafe behaviour and protect system memory integrity. This (criminally underrated) piece goes into detail on fatalError and its implementation in the XNU kernel.

To cement our understanding of how the runtime orchestrates this memory allocation and freeing around unowned references, we can call again on our old friend, the ridiculously detailed inline documentation in RefCount.h.

It documents the entire lifecycle state machine of a heap object:

  • LIVE without side table

    • The object is alive.

    • Object's refcounts are initialized as 1 strong, 1 unowned, 1 weak.

    • No side table. No weak RC storage.

    • Strong variable operations work normally.

    • Unowned variable operations work normally.

    • Weak variable load can't happen.

    • Weak variable store adds the side table, becoming LIVE with side table.

    • When the strong RC reaches zero deinit() is called and the object becomes DEINITING.

  • LIVE with side table

    • Weak variable operations work normally.

    • Everything else is the same as LIVE.

  • DEINITING without side table

    • deinit() is in progress on the object.

    • Strong variable operations have no effect.

    • Unowned variable load halts in swift_abortRetainUnowned().

    • Unowned variable store works normally.

    • Weak variable load can't happen.

    • Weak variable store stores nil.

    • When deinit() completes, the generated code calls swift_deallocObject.

    • swift_deallocObject calls canBeFreedNow() checking for the fast path of no weak or unowned references.

    • If canBeFreedNow() the object is freed and it becomes DEAD.

    • Otherwise, it decrements the unowned RC and the object becomes DEINITED.

  • DEINITING with side table

    • Weak variable load returns nil.

    • Weak variable store stores nil.

    • canBeFreedNow() is always false, so it never transitions directly to DEAD.

    • Everything else is the same as DEINITING.

  • DEINITED without side table

    • deinit() has completed but there are unowned references outstanding.

    • Strong variable operations can't happen.

    • Unowned variable store can't happen.

    • Unowned variable load halts in swift_abortRetainUnowned().

    • Weak variable operations can't happen.

    • When the unowned RC reaches zero, the object is freed and it becomes DEAD.

  • DEINITED with side table

    • Weak variable load returns nil.

    • Weak variable store can't happen.

    • When the unowned RC reaches zero, the object is freed, the weak RC is decremented, and the object becomes FREED.

    • Everything else is the same as DEINITED.

  • FREED without side table

    • This state never happens.

  • FREED with side table

    • The object is freed but there are weak refs to the side table outstanding.

    • Strong variable operations can't happen.

    • Unowned variable operations can't happen.

    • Weak variable load returns nil.

    • Weak variable store can't happen.

    • When the weak RC reaches zero, the side table entry is freed and the object becomes DEAD.

  • DEAD

    • The object and its side table are gone.

Here’s a quick tour of some ref-counting special cases that I didn’t care enough about to go that deep on. Hell, you have the tools now, so you can follow your curiosity.

If unowned isn’t performant enough for you, unowned(unsafe) skips runtime memory integrity checks entirely. It’s effectively a strong reference directly to the heap object memory, except it doesn’t increment any kind of reference count. When you access an unowned(unsafe) reference to a freed object, it will SEGFAULT if you’re lucky, and perform dangerous use-after-free memory access if you’re unlucky.


Static & global objects can be marked immortal, which allows them to basically ignore all reference counting operations.

When it can predict an object’s exact size and lifetime at compile time (at the SIL stage, to be precise), Swift can promote certain heap allocations into faster stack allocations. These objects start with an unowned count of 2 instead of 1, which ensures the runtime never frees the memory, since the count won’t reach zero. The object can still be deinited, but the memory stays valid for the duration of the stack frame.

We began this piece with the sort of ruthless interview questions you might get if I’d missed lunch. Now, they should seem trivial to explain:

Where are the bits that make up strong, weak, and unowned reference counts stored?

The bits are inlined in the HeapObject header for each object on the heap. If a weak reference exists at all, the counts are all moved to a side table and the inlined bits are repurposed into a pointer to that side table.

Why are weak references less performant than strong and unowned references?

Weak references are less performant because they point to the side table. When accessing a weak reference, the reference is temporarily promoted to a strong reference, which requires indirection from the side table to the object itself. This has a high risk of a cache miss.

Furthermore, the presence of a weak reference means all future reference counting operations are performed on the side table, even when using a strong or unowned reference.

How does an unowned reference crash if it's accessed after deallocation?

Accessing an unowned reference also promotes the reference temporarily to a strong reference in order to prevent the memory from being freed during the access. The runtime checks the state of the object’s lifecycle and ensures the object is not deiniting before performing this retain operation. If the object is not active, the runtime calls fatalError to safely crash the app.

Are unowned references always more performant than weak references?

No! Weak references require extra pointer indirection, and so are computationally less efficient than strong and unowned references. But unowned references behave similarly to the pre-Swift-4 implementation of weak references.

To ensure the runtime memory integrity mechanisms behave correctly, deinitialised objects are not freed until all the unowned references go out of scope. This zombie memory means unowned references can be far less memory-efficient than weak references, which only keep around a relatively tiny side table.

Swift's reference counting mechanism is an elegant introduction to systems programming and compiler optimisation that balances performance, safety, and memory efficiency.

The next time someone asks you about reference counting in an interview, you can casually mention that weak references cause performance issues due to side table indirection and cache misses, that the strong reference count is stored with an off-by-one optimisation to save a couple of clock cycles in allocation, or that unowned references deliberately keep zombie memory alive to facilitate traps if you get your object lifetimes wrong.

Nothing feels better than watching your interviewer’s face light up when they realise they’re hiring someone who reads runtime source code in their spare time. Or making me your bitch.
