What Mother never told you about SVN Branching and Merging (2009)


Developers are fond of recounting their disastrous experiences branching and merging in Subversion and CVS.

Yet, thousands upon thousands of teams use SVN. Are they just not branching? What’s going on?

The Usage Model That Will Kill You
Commonly, you create a branch from trunk (at revision 100, for example) and then make a bunch of changes. Meanwhile, the rest of the team keeps committing changes to trunk. Then you realize that, although you’re not done with your branch yet, you need those trunk changes in your branch.

So you merge from trunk into your branch. You do this by taking all the changes in trunk that happened after revision 100 and applying them to your branch. Often, it seems to work. In reality, you are already pretty screwed at this point; you just don’t realize it.

For a couple of weeks, maybe, you keep merging from trunk into your branch, a couple of times.

Then comes your trip over the falls in a barrel. You are happy with your branch; you want to commit it to trunk and let the rest of your team in on your brilliance. You do this by taking the changes from your branch and merging them into trunk. This almost never works. Your entire tree is marked as changed, or files disappear, or changes are scrambled. Conflicts are marked where none should exist, and other changes are just gone.
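For concreteness, here is roughly what that workflow looks like as Subversion commands. This is a minimal sketch, assuming a hypothetical repository at `https://svn.example.com/repo` with a standard trunk/branches layout and a hypothetical branch name `my-feature`; it only builds and prints the command lines rather than running them.

```python
import shlex

REPO = "https://svn.example.com/repo"  # hypothetical repository URL
TRUNK = f"{REPO}/trunk"
BRANCH = f"{REPO}/branches/my-feature"  # hypothetical branch name


def risky_workflow(branch_point: int) -> list[list[str]]:
    """Commands for the branch / sync / merge-back pattern described above."""
    return [
        # Cut the branch from trunk as it was at revision branch_point.
        ["svn", "copy", f"{TRUNK}@{branch_point}", BRANCH,
         "-m", f"Branch from trunk r{branch_point}"],
        # In a working copy of the branch: apply trunk changes made after branch_point.
        ["svn", "merge", "-r", f"{branch_point}:HEAD", TRUNK],
        # Weeks later, in a working copy of trunk: apply everything the branch
        # changed since branch_point, including the trunk changes merged into it.
        ["svn", "merge", "-r", f"{branch_point}:HEAD", BRANCH],
    ]


for cmd in risky_workflow(100):
    print(shlex.join(cmd))
```

The last command is where the trouble described above shows up: the range it replays contains trunk’s own changes a second time, mixed in with the branch’s work.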

You cry, you weep, you descend into alcohol and drug use, you watch old Star Trek reruns. You curse SVN forever.

Invariably, people around you say “use Git”, “use Mercurial”, “use Arch”, whatever. Go for it.

But why, oh why, does SVN (and CVS) fail to work? “This should work!” — I’ve heard that from so many developers so many times!

Why doesn’t it work?
Now, I am not going to go through all the details, but there is a simple reason it doesn’t work: time. More specifically, the order of changes.

The first time you pull from trunk to your branch, you copy a set of changes. The second time you do it, depending on how attentive you are to the starting revision, you may copy some of those changes again, which is slightly problematic.
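Being attentive to starting revisions is just bookkeeping: each sync merge should start where the previous one stopped. The sketch below models revisions as plain integers; `sync_range` and `already_merged` are hypothetical helpers. (Subversion 1.5 and later can record merged revisions in the `svn:mergeinfo` property and do this bookkeeping for you.)

```python
def sync_range(last_synced: int, head: int) -> range:
    """Revisions applied by `svn merge -r last_synced:head`: last_synced+1 .. head."""
    return range(last_synced + 1, head + 1)


def already_merged(proposed: range, merged: set[int]) -> set[int]:
    """Revisions in `proposed` that an earlier sync merge already applied."""
    return set(proposed) & merged


merged: set[int] = set()

# First sync into a branch cut at r100, with trunk now at r120.
merged.update(sync_range(100, 120))

# Second sync at r135, carelessly starting from r100 again.
print(len(already_merged(sync_range(100, 135), merged)))  # 20 revisions merged a second time

# Second sync starting where the first one stopped.
print(already_merged(sync_range(120, 135), merged))  # set()
```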

The killer is when you then merge all the changes from your branch to trunk. Not only do these include a bunch of changes already made on trunk, but your timeline is now screwy. Changes that occurred in the past on trunk show up again in the merge, and SVN (and CVS) get very confused. Why? Look at the picture:

[Figure: SVN Branch Merge Example 1. Red for trunk, cyan for branch.]

For this example, assume the picture shows the sequence of changes to a single file in your repository. T0, T1, T3, and T4 indicate points in time at which your file is edited, and that is also the order in which the modifications took place. This is important to SVN (and CVS). Why? Let’s look at what happens when you get to T5 and want to merge your changes into trunk. The sequence looks like: T1 (already was in trunk, and was reapplied to the branch), T0 (from the merge at T2), T4 (because it was already in trunk), then T3. So the order the merge tries to resolve is T1, T0, T4, T3. But it is more awkward than that: because T0 and T4 are already committed to trunk, it really looks like you are applying T0, T4, T1, T3. Or maybe just T1, T3. But since each change is expressed relative to the previous state of the file, T3’s modifications knew nothing about T4’s modifications, so you are applying modifications out of order, and things go boom.
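To make the ordering argument concrete, here is a toy model. It is not Subversion’s actual merge algorithm, which works on file diffs. Each change remembers which changes were already present where it was authored, and a merge flags any change that lands on top of something its author never saw. The timeline is our reading of the picture: T0 on trunk, T1 on the branch, a sync merge at T2, T3 on the branch, T4 on trunk, then the merge into trunk at T5. `Repo`, `edit`, `merge`, and `naive_workflow` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    # change id -> ids of the changes already present where that change was authored
    bases: dict[str, frozenset[str]] = field(default_factory=dict)

    def edit(self, line: list[str], change: str) -> None:
        """Author `change` on `line` (trunk or a branch) against its current contents."""
        self.bases[change] = frozenset(line)
        line.append(change)

    def merge(self, source: list[str], target: list[str]) -> list[str]:
        """Replay source's changes onto target in source order.

        Returns the changes that landed on top of something their author never saw.
        """
        out_of_order = []
        for change in source:
            if change in target:
                continue  # already there, e.g. a trunk change that was synced earlier
            if not set(target) <= self.bases[change]:
                out_of_order.append(change)
            target.append(change)
        return out_of_order


def naive_workflow() -> tuple[list[str], list[str]]:
    repo = Repo()
    trunk: list[str] = []
    branch: list[str] = list(trunk)   # cut the branch from trunk
    repo.edit(trunk, "T0")
    repo.edit(branch, "T1")
    repo.merge(trunk, branch)         # T2: sync trunk into the branch
    repo.edit(branch, "T3")
    repo.edit(trunk, "T4")
    out_of_order = repo.merge(branch, trunk)  # T5: merge the branch into trunk
    return out_of_order, trunk


print(naive_workflow())  # (['T1', 'T3'], ['T0', 'T4', 'T1', 'T3'])
```

In this model the sync at T2 is flagged too (T0 lands on top of T1), which fits the point that you are already in trouble at that stage even though nothing visibly breaks yet.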

Multiply this by a hundred files with hundreds of changes each both in trunk and your branch, and you have a recipe for disaster.

(Oh, and file and directory deletions and additions are particularly damaging in this scenario.)

Now, this is an empirical explanation. There was a time when I walked through CVS diff logs to see what was going on, but that was ten years ago. Through testing, though, I know this is a reasonable model of SVN’s behavior when it comes to branching and merging. Lacking the time to grok all of SVN’s source code, I find an empirical model good enough to know that working this way won’t work.

Bunny Hopping
So what is the solution? A deceptively simple rule shows the way:

All changes that are going to be committed to Trunk must be applied on top of all changes already committed to Trunk before those changes are merged into Trunk.

Hmmm. What does that mean? In the example above, it means that the changes at T1 and T3, even though they took place before the trunk change T4, need to be applied after T4.
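In toy terms, where a branch is just the ordered list of changes it contains, the rule says trunk’s history must be a prefix of the branch’s history at the moment you merge. A minimal sketch; `follows_the_rule` is a hypothetical helper, and the first branch sequence assumes T1 was made on the branch before T0 was merged in from trunk.

```python
def follows_the_rule(trunk: list[str], branch: list[str]) -> bool:
    """True if the branch holds every trunk change, in trunk order, before any
    change of its own, i.e. trunk's history is a prefix of the branch's."""
    return branch[: len(trunk)] == trunk


trunk_now = ["T0", "T4"]
print(follows_the_rule(trunk_now, ["T1", "T0", "T3"]))        # False: the branch from the example
print(follows_the_rule(trunk_now, ["T0", "T4", "T1", "T3"]))  # True: T1 and T3 applied after T4
```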

How to do that? The secret is Bunny Hopping.

Consider this example:

[Figure: SVN Branching with Bunny Hops]

What’s going on here? First of all, instead of one branch, we have three. You may have twenty. Branching in SVN is fast enough not to care (in CVS though, ouch). Let’s look at the temporal sequence again:

You branch.
Someone else makes change T0 on trunk. Sequence in trunk: T0
You make the change T1 on your first branch. Sequence in Branch: T1
You create a new branch from trunk at T2; it contains change T0. Then you merge your changes from your prior branch (the yellow arc — a Bunny Hop) into your new branch. This brings change T1 into your second branch after change T0. Sequence in your branch: T0, T1
You make change T3 in your branch. Sequence in your second branch: T0, T1, T3
Someone else makes change T4 in trunk. They committed to trunk first; they win. Sequence in trunk: T0, T4
You make your third branch, at time T5. It includes changes T0, T4 in that order. You merge all your changes from the previous branch, which means you have to do the merging necessary to make changes T1 and T3 blend in correctly after T0, T4. But you get to do this in a workspace, off of a branch, nicely isolated from trunk. It is usually pretty easy. Sequence in the branch is now: T0, T4, T1, T3
With no other overlapping changes between when you cut this branch and now (between T5 and T6), you merge your branch into trunk. Because all of the changes in this branch take place after the already committed changes in trunk, the merge goes smoothly.
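The steps above can be replayed with a list-of-changes toy model. `new_branch_with_hop` is a hypothetical helper: it cuts a fresh branch from trunk and appends the previous branch’s own changes after everything trunk already has, which is where a real bunny hop would resolve conflicts. Real merges operate on file contents, so this only tracks ordering.

```python
def new_branch_with_hop(trunk: list[str], previous: list[str]) -> list[str]:
    """Cut a fresh branch from trunk, then merge the previous branch into it.

    Changes already on trunk are skipped; the rest are re-applied after
    everything trunk has.
    """
    return list(trunk) + [c for c in previous if c not in trunk]


trunk: list[str] = []
branch1: list[str] = []                         # you branch
trunk.append("T0")                              # someone else changes trunk
branch1.append("T1")                            # your first change
branch2 = new_branch_with_hop(trunk, branch1)   # T2: first hop
branch2.append("T3")                            # your next change
trunk.append("T4")                              # someone else commits first
branch3 = new_branch_with_hop(trunk, branch2)   # T5: second hop
print(branch2, branch3)  # ['T0', 'T1', 'T3'] ['T0', 'T4', 'T1', 'T3']

# Merge into trunk: only the branch's own changes are new, and they come last.
trunk += [c for c in branch3 if c not in trunk]
print(trunk)  # ['T0', 'T4', 'T1', 'T3']
```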

When I am working, I number these sub-branches; if I am working on caching, I’ll end up with caching-1, caching-2, caching-3, etc. until I finish working on caching. It works well, mainly because branches are low-overhead, low-cost objects in SVN.
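For illustration, here is a sketch of the commands behind one hop using that naming scheme. It is not the Python tool the post mentions, which is not shown; the repository URL, the `hop_commands` and `finish_commands` helpers, and the revision numbers are all hypothetical. The commands are only built and printed.

```python
import shlex

REPO = "https://svn.example.com/repo"  # hypothetical repository URL


def branch_url(feature: str, n: int) -> str:
    """URL of the n-th sub-branch of a feature, e.g. branches/caching-3."""
    return f"{REPO}/branches/{feature}-{n}"


def hop_commands(feature: str, n: int, prev_created_at: int) -> list[list[str]]:
    """Commands for hop number n (n >= 2), run from your working copy.

    prev_created_at is the revision at which sub-branch n-1 was created, so the
    merge replays everything committed to it since, including its own hop.
    """
    new, prev = branch_url(feature, n), branch_url(feature, n - 1)
    return [
        # Cut a fresh sub-branch from the current trunk.
        ["svn", "copy", f"{REPO}/trunk", new, "-m", f"Create {feature}-{n}"],
        # Point the working copy at the new sub-branch.
        ["svn", "switch", new],
        # The bunny hop: replay the previous sub-branch's work on top of trunk's
        # latest changes, resolving conflicts here rather than on trunk.
        ["svn", "merge", "-r", f"{prev_created_at}:HEAD", prev],
        ["svn", "commit", "-m", f"Hop {feature}-{n - 1} onto {feature}-{n}"],
    ]


def finish_commands(feature: str, n: int, created_at: int) -> list[list[str]]:
    """Commands to merge the last sub-branch (created at created_at) into trunk."""
    return [
        ["svn", "switch", f"{REPO}/trunk"],
        ["svn", "merge", "-r", f"{created_at}:HEAD", branch_url(feature, n)],
        ["svn", "commit", "-m", f"Merge {feature}-{n} into trunk"],
    ]


# Hypothetical revisions: caching-2 was created at r212, caching-3 at r230.
for cmd in hop_commands("caching", 3, prev_created_at=212):
    print(shlex.join(cmd))
for cmd in finish_commands("caching", 3, created_at=230):
    print(shlex.join(cmd))
```

Merging the last sub-branch from its creation revision leaves out T0 and T4, since those arrived with the copy from trunk rather than as commits on the branch; only the hopped work (and anything newer) goes back to trunk.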

This methodology was discovered by a colleague of mine (Go Steve!) who grew so frustrated with our team’s inability to work productively in branches that he (and several of us) spent time in play repositories just trying different things. That was pre-IDE for me; we used Emacs and men were men.

We eventually evolved a neat little Python tool to manage the bunny hopping automatically for us — that was really cool. I’ve often wished I had the time to write a Subversive extension to do the same thing in Eclipse. But even manually, it is so ingrained in me as a work habit that I barely notice I do it. And I branch and merge, successfully, all the time.

Bunny hopping. What Mother never told you about Branching and Merging.
