A Simpler Phabricator Stacked Diff Workflow

TLDR: Squash commits. Branch from branches. Diff against branches. Rebase changes from bottom up, land from bottom up.

In this article I'd like to address one of the things that I see people struggling with quite often: how to effectively work on multiple dependent git branches when using Phabricator.

Uber switched to using Phabricator at some point in mid to late 2013 (which was before I joined). For me, moving from GitHub to Phab was a little confusing at first because they operate in fundamentally different ways. A lot of the workflows that people have established with GitHub just don't translate directly, which causes a lot of git-related face palming. If you're in that state right now, don't worry. I've got you covered. Here we go.

Prerequisites

  1. Update your tools. Run arc upgrade to get a new arc and libphutil.
  2. Ensure you're working in a compatible repository (see below).
  3. Understand the differences between how GitHub and Phab work (also below).

Compatible repos

Before we get started: this article defines a workflow for mutable repos - i.e. ones where git history can be rewritten. This is the default for Git with arc. The part that really matters here is that landing a change squashes its commits into one, and that's where most people's branching strategies break. We also need to understand that:

A diff is not a branch

This is the fundamental thing that you need to get your head around when moving from GitHub to Phab. When working with GitHub - if we want to make a change - we branch, then push that branch to a remote. GitHub then does a diff between those branches to calculate what has changed. When we hit 'merge', we're doing just that: a literal git merge of those two branches.

With Phab, a diff is not a branch. In fact, most of the time, we never push our branches to the remote. Branches live on your local machine and arc computes the diff between your branch and the target (most often the local master branch). It then pushes that diff up to Phabricator. When you go to land a branch, you're essentially just doing the merge/rebase locally, pushing the new master up to the remote, then deleting your branch.

I won't go into the pros and cons of these differences in this article, but it's a prerequisite to understanding how you can adapt your git workflow to remove headaches.

The Problem

So let's say that I want to start work on a new large feature. Let's say that this feature is likely to span a few thousand lines and many different files. Because I'm a good team player who strongly believes in proper code review, I don't want to give my teammates a diff to review that's anything larger than about 300 lines. Each diff should be concerned with a single logical change.

So I want to split my work up into 5 smaller diffs, with each one building on the last. But my teammates are busy and I can't expect them to drop what they're doing to review my first diff before I can move on.

As a side note, fellow Uberetto Kurtis Nusbaum has come up with a different solution. Whilst his strategy has a few benefits, it's a pretty radical departure from people's usual workflows. My attempt is much more similar to the GitHub model.

Part 1 - Simple example

Let's boil the problem down to the base case and address that first. We have a branch A with a change, and a branch B with a change which depends on A. I'm going to make these changes in a toy repo to keep things simple.

We start by configuring .arcconfig if it's not already done:

✔ ~/Projects/phab/willyd [master|✔] $ git log --oneline
65c437c Add .arcconfig

Our file specifies mutable history (history.immutable is false):

{
  "repository.callsign": "WILLYD",
  "phabricator.uri" : "https://phab.company.com",
  "history.immutable": false
}

Now let's create a branch and make a simple change, then commit that change:

✔ ~/Projects/phab/willyd [master|✔] $ arc branch willA
Branch willA set up to track local branch master by rebasing.  
Switched to a new branch 'willA'  
✔ ~/Projects/phab/willyd [willA|✔] $ mkdir -p stacked/simple
✔ ~/Projects/phab/willyd [willA|✔] $ vim stacked/simple/a
✔ ~/Projects/phab/willyd [willA|…1] $ git add .
✔ ~/Projects/phab/willyd [willA|●1] $ git commit -m "Create simple directory. Add initial work in a"

Nothing radical there. We can now run arc diff like normal to create a revision:

✔ ~/Projects/phab/willyd [willA ↑·1|✔] $ arc diff
Created a new Differential revision:  
        Revision URI: https://phab.company.com/D670

Included changes:  
  A       stacked/simple/a

Good so far. We now want to move on to the next part of our feature whilst other people are reviewing that first diff. So now we want to make a new branch from willA:

✔ ~/Projects/phab/willyd [willA ↑·1|✔] $ arc branch willB
Branch willB set up to track local branch willA by rebasing.  
Switched to a new branch 'willB'  

Note that git reports that willB is set up to track willA, not master. That is very important. Let's do some work in willB:

✔ ~/Projects/phab/willyd [willB|✔] $ vim stacked/simple/b
✔ ~/Projects/phab/willyd [willB|…1] $ git add .
✔ ~/Projects/phab/willyd [willB|●1] $ git commit -m "Add work in b"
[willB cebed10] Add work in b
 1 file changed, 1 insertion(+)
 create mode 100644 stacked/simple/b
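
If you want to double check what a branch is tracking, plain git can tell you. Here's a minimal sketch in a throwaway repo. I'm assuming that arc branch sets up roughly the same relationship as git checkout --track -b; the repo, identity and commit below are made up for illustration.

```shell
# Sketch: what "willB tracks willA" means, using plain git in a scratch repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name used in this article
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

# Assumption: this is roughly the tracking setup that arc branch created above.
git checkout -q -b willA --track master
git checkout -q -b willB --track willA    # branch from the branch, not from master

# Ask git which branch each one treats as its upstream.
upstream_a=$(git rev-parse --abbrev-ref 'willA@{upstream}')
upstream_b=$(git rev-parse --abbrev-ref 'willB@{upstream}')
echo "willA -> $upstream_a"
echo "willB -> $upstream_b"
```

The upstream is what a bare git pull on that branch will later pull from, which is why it matters that willB points at willA.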

So we're ready to have our new changes reviewed. Make sure you're paying attention, because this part is the key to the whole thing. We want to diff against willA. From what I can tell, most people only ever use arc diff - if we did that, we'd create a diff between willB and master, which would also send the changes that we made in willA.

✔ ~/Projects/phab/willyd [willB ↑·1|✔] $ arc diff willA
Created a new Differential revision:  
        Revision URI: https://phab.company.com/D671

Included changes:  
  A       stacked/simple/b
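
To see why the base matters, here's a rough sketch with plain git in a throwaway repo. It only illustrates the idea of diffing against a different base; it is not what arc runs internally, and the identity and file contents are made up.

```shell
# Sketch: why the base of the diff matters, using plain git in a scratch repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b willA
mkdir -p stacked/simple
echo "work in a" > stacked/simple/a
git add . && git commit -q -m "Add initial work in a"

git checkout -q -b willB
echo "work in b" > stacked/simple/b
git add . && git commit -q -m "Add work in b"

# Files changed relative to master: includes willA's work as well.
against_master=$(git diff --name-only master willB)
# Files changed relative to willA: only willB's own work.
against_willA=$(git diff --name-only willA willB)

printf 'vs master:\n%s\n' "$against_master"
printf 'vs willA:\n%s\n' "$against_willA"
```

Diffing against willA keeps the review for B down to B's own changes, which is the whole point of splitting the work up.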

Now let's examine the state of our branches and revisions with the handy arc branch:

✔ ~/Projects/phab/willyd [master|✔] $ arc branch
* master No Revision  Add .arcconfig
  willA  Needs Review D670: Create simple directory. Add initial work in a
  willB  Needs Review D671: Add work in b

Looks good! We've got two separate diffs up, and arc knows which branches they relate to. Because this isn't the real world, let's imagine that both of our diffs are accepted without us needing to make any changes.

Time to land our work - this part is important. We need to follow two rules here:

  1. We have to land from the bottom up.
  2. We have to land with --keep-branch.

This flow is often referred to as 'stacked', but maybe 'queued' is a better description because our diffs are FIFO. That means that we have to land willA before we land willB. Landing willB first will work, but it'll cause you some problems that I don't want to get into here. Thankfully, I think that it's pretty intuitive to land willA before willB.

By default, arc deletes a branch once it has landed. Using --keep-branch tells arc to keep willA around after we've landed it.

✔ ~/Projects/phab/willyd [master|✔] $ git checkout willA 
Switched to branch 'willA'  
Your branch is ahead of 'master' by 1 commit.  
✔ ~/Projects/phab/willyd [willA ↑·1|✔] $ arc land --keep-branch
Landing current branch 'willA'.  
This commit will be landed:

      - 5acd280 Create simple directory. Add initial work in a

Landing revision 'D670: Create simple directory. Add initial work in a'...  
BUILDS PASSED  Harbormaster builds for the active diff completed successfully.  
PUSHING  Pushing changes to "origin/master".  
...
DONE  Landed changes.  

Let's take a look to make sure that everything went OK:

✔ ~/Projects/phab/willyd [master|✔] $ arc branch
  willB  Needs Review D671: Add work in b
* master Closed       D670: Create simple directory. Add initial work in a

Even though our willA branch is still around, arc no longer cares about it because its HEAD doesn't point to any revision. However, the HEAD of master points to D670 - the revision we just closed by landing - which is why it shows up as associated with the master branch. Let's go ahead and land willB:

✔ ~/Projects/phab/willyd [master|✔] $ git checkout willB 
Switched to branch 'willB'  
Your branch is ahead of 'willA' by 1 commit.  
  (use "git push" to publish your local commits)

If we check out willB, you'll notice that it still knows it's ahead of willA, because we kept that branch around. If we try to arc land, arc doesn't know what to do, because the commit history of willB references commits pertaining to both revisions:

✔ ~/Projects/phab/willyd [willB ↑·1|✔] $ arc land
Landing current branch 'willB'.  
These commits will be landed:

      - fb789bf Add work in b
      - 5acd280 Create simple directory. Add initial work in a

Usage Exception: There are multiple revisions on feature branch 'willB' which are not present on 'master':

     - D670: Create simple directory. Add initial work in a
     - D671: Add work in b

Separate these revisions onto different branches, or use --revision <id> to use the commit message from <id> and land them all.  

This is expected, because the diff from willB to master contains both commits. Now, this is the part where we're going to cheat a little bit. Because we're squashing commits, we can just land this diff onto master and retain the correct history. So we simply follow the advice and use --revision:

✘-1 ~/Projects/phab/willyd [willB ↑·1|✔] $ arc land --revision D671
Landing current branch 'willB'.  
These commits will be landed:

      - fb789bf Add work in b
      - 5acd280 Create simple directory. Add initial work in a

Landing revision 'D671: Add work in b'...  
 UPDATE  Local "willA" is ahead of remote "origin/master". Checking out "willA" but not pulling changes.
 CHECKOUT  Checking out "willA".
 DONE  Landed changes.
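
If the 'cheat' feels like magic, here's a rough simulation with plain git. I'm assuming that a squash land behaves like git merge --squash followed by a commit; that's my approximation for illustration, not a description of arc's internals. The repo and file names are made up.

```shell
# Sketch: squash-landing a stack from the bottom up, using plain git.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b willA
echo "work in a" > a
git add a && git commit -q -m "Add initial work in a"
git checkout -q -b willB
echo "work in b" > b
git add b && git commit -q -m "Add work in b"

# "Land" willA: squash it onto master as one new commit.
git checkout -q master
git merge -q --squash willA
git commit -q -m "Add initial work in a"

# "Land" willB: its history still holds willA's original commit, but master
# already has identical content for a, so the squash only adds b.
git merge -q --squash willB
git commit -q -m "Add work in b"

# Files touched by the commit that landed willB.
landed_files=$(git diff-tree --no-commit-id --name-only -r HEAD)
echo "$landed_files"
git log --oneline
```

Because the squash only records the difference between master and the branch's final tree, the work from A isn't applied twice, and master ends up with one commit per revision.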

One final caveat: we're left on the willA branch, but we can just switch to master and pull to ensure we have the latest changes:

✔ ~/Projects/phab/willyd [willA ↓·1↑·1|✔] $ git checkout master
Switched to branch 'master'

✔ ~/Projects/phab/willyd [master ↓·1|✔] $ git pull
First, rewinding head to replay your work on top of it...  
Fast-forwarded master to aef75bb4fe05d82df87914c077470b809a6471f1.  

Notice that we got 1 change (the work we landed from B). We can now verify that our git history reads correctly:

✔ ~/Projects/phab/willyd [master|✔] $ git log --oneline
aef75bb Add work in b  
6fd4f22 Create simple directory. Add initial work in a  
65c437c Add .arcconfig  

Finito! However, in the real world, we don't often put up diffs which don't require any changes, so let's take a look at that use case too.

Part 2 - When things change

The above represents the base and ideal workflow, but in the real world, most diffs will need some changes. What happens to B if you need to change A?

Following the same steps, let's create our first diff in A:

✔ ~/Projects/phab/willyd [master|✔] $ arc branch willChangeA
✔ ~/Projects/phab/willyd [willChangeA|✔] $ mkdir -p stacked/changes/
✔ ~/Projects/phab/willyd [willChangeA|✔] $ echo 'Something in a' > stacked/changes/a
✔ ~/Projects/phab/willyd [willChangeA|…1] $ git add .
✔ ~/Projects/phab/willyd [willChangeA|●1] $ git commit -m "Add work in a"
✔ ~/Projects/phab/willyd [willChangeA ↑·1|✔] $ arc diff

Then do the same with B:

✔ ~/Projects/phab/willyd [willChangeA ↑·1|✔] $ arc branch willChangeB
Branch willChangeB set up to track local branch willChangeA by rebasing.  
✔ ~/Projects/phab/willyd [willChangeB|✔] $ echo 'Something in b' > stacked/changes/b
✔ ~/Projects/phab/willyd [willChangeB|…1] $ git add .
✔ ~/Projects/phab/willyd [willChangeB|●1] $ git commit -m "Add something in b"
✔ ~/Projects/phab/willyd [willChangeB ↑·1|✔] $ arc diff willChangeA
Included changes:  
  A       stacked/changes/b

Now inspect our branches, nothing new here:

✔ ~/Projects/phab/willyd [willChangeB ↑·1|✔] $ arc branch
  willChangeA Needs Review D672: Add work in a
* willChangeB Needs Review D673: Add something in b

But now let's say that during code review, we get a comment on willChangeA and need to make a change to that branch. We can go to willChangeA and make the change as normal:

✔ ~/Projects/phab/willyd [willChangeA ↑·1|✔] $ echo 'Additional work in a' >> stacked/changes/a
✔ ~/Projects/phab/willyd [willChangeA ↑·1|✚ 1] $ git add .
✔ ~/Projects/phab/willyd [willChangeA ↑·1|●1] $ git commit -m "Fixing things in a"
✔ ~/Projects/phab/willyd [willChangeA ↑·2|✔] $ arc diff
Updated an existing Differential revision:  
        Revision URI: https://phab.company.com/D672

Included changes:  
  A       stacked/changes/a

Phab correctly understands this change and updates the diff, but willChangeB has no idea that we just made this change. If we switched to B and tried to diff, we'd get problems, so we need to rebase willChangeB onto the new willChangeA:

✔ ~/Projects/phab/willyd [willChangeA ↑·2|✔] $ git checkout willChangeB
Switched to branch 'willChangeB'  
Your branch and 'willChangeA' have diverged,  
and have 1 and 1 different commit each, respectively.  
  (use "git pull" to merge the remote branch into yours)
✔ ~/Projects/phab/willyd [willChangeB ↓·1↑·1|✔] $ git pull
From .  
 * branch            willChangeA -> FETCH_HEAD
First, rewinding head to replay your work on top of it...  
Applying: Add something in b  

We can do this rebase by simply running git pull, because willChangeB is set up to track willChangeA by rebasing. If there are conflicts, you need to resolve them as you would in any rebase (which I won't go into in this article).
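
Here's the same sequence as a self-contained sketch with plain git, in case you want to try it without touching a real repo. I spell out --rebase explicitly, because a branch created with plain git isn't necessarily configured to rebase on pull the way the branches in this article are. Names and contents are made up.

```shell
# Sketch: with a local upstream, `git pull --rebase` replays the child branch
# on top of the updated parent branch. Plain git, scratch repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b willChangeA --track master
echo "Something in a" > a
git add a && git commit -q -m "Add work in a"

git checkout -q -b willChangeB --track willChangeA
echo "Something in b" > b
git add b && git commit -q -m "Add something in b"

# Review feedback arrives: add a follow-up commit to the parent branch.
git checkout -q willChangeA
echo "Additional work in a" >> a
git add a && git commit -q -m "Fixing things in a"

# Bring the child up to date with its (local) upstream.
git checkout -q willChangeB
git pull -q --rebase

# Commit subjects on willChangeB, newest first.
history=$(git log --format=%s)
echo "$history"
```

The commit from B ends up on top of the fix in A, which is the same ordering shown in the log below.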

We can now check our log and verify B knows about the new commit and has them in the right order:

✔ ~/Projects/phab/willyd [willChangeB ↑·1|✔] $ git log --oneline
9a27faf Add something in b  
e6880b1 Fixing things in a  
8d42ab9 Add work in a  

We can now carry on working on B, and we can also send the rebased diff to Phab (to build in CI) by running the same diff command:

✔ ~/Projects/phab/willyd [willChangeB ↑·1|✔] $ arc diff willChangeA
Updated an existing Differential revision:  
        Revision URI: https://phab.company.com/D673

Included changes:  
  A       stacked/changes/b

Addendum - Branch ad nauseam

This post covers the case of two simple branches, but there are two other real world cases that I commonly encounter which I think I should mention:

  1. When we have a larger dependency chain, e.g. A -> B -> C -> D.
  2. When we have a dependency tree and not a chain, e.g. A -> B, A -> C.

For the large chain, the exact same process applies, but you need to propagate changes all the way along the chain as you go. For example, if you make a change in A, you need to check out B and rebase, then check out C and rebase, etc. It's important that you never check out C and rebase onto master or A directly. This is why I only ever rebase (in this context) by using git pull.
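
If the chain gets long, the propagation step is easy to script. This is only a sketch: restack is a hypothetical helper name of my own (not an arc or git command), it assumes each branch tracks the one below it, and it will simply stop at the first conflict.

```shell
# Sketch: propagate a change in A along a chain A -> B -> C, one level at a time.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example" && git config user.email "example@example.com"
git commit -q --allow-empty -m "Initial commit"

# Build the chain; each branch tracks the one below it.
parent=master
for name in A B C; do
  git checkout -q -b "$name" --track "$parent"
  echo "work in $name" > "file_$name"
  git add . && git commit -q -m "Add work in $name"
  parent=$name
done

# A changes during review.
git checkout -q A
echo "fix" >> file_A
git add . && git commit -q -m "Fix things in A"

# Hypothetical helper: rebase each branch onto its own upstream, bottom up.
restack() {
  for branch in "$@"; do
    git checkout -q "$branch"
    git pull -q --rebase
  done
}
restack B C

# Commit subjects on C, newest first.
history=$(git log --format=%s C --)
echo "$history"
```

The order of the arguments matters: B has to be rebased before C, so that C picks up the already-updated B rather than skipping a level.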

Dependency trees are also pretty simple to deal with. If you make a change to A, you need to check out B and pull, then check out C and pull too. Land A with --keep-branch, then rebase B and C. You can land either B or C first, but you'll have to rebase one on the other (and fix any conflicts) like usual.

Conclusion

This method isn't drastically different to the GitHub workflow, and isn't very hard to figure out. In fact, I'm sure many people are already doing this as standard. All it takes is a little bit of digging with arc help to uncover some of these useful commands.

There are probably plenty of nuances to how this approach works with your particular setup, but it works well for me, so hopefully it'll help a few other people out there too.