Where golang got packages wrong

Preface

I recently watched a fantastic talk by Rich Hickey. On the face of it, it's about Clojure's spec package, but the underlying principles immediately struck me as applicable to both language ecosystems that I've been working with in the last few years (mainly JavaScript, and now Go).

Working with Go's packages and breaking changes has been a source of frustration for my team over the last few months. In this post I give a recap of parts of Rich's talk, see how they apply to Go with concrete examples, then discuss where I believe that Go got it wrong.

Change

It's not possible to paraphrase an hour long (self-professed) rant, but there are some key points that I can attempt to extract. The first set of points relates to how software changes:

  1. Publishing software is about making a commitment about the way something works. Specifically, what I'm giving you now, I'm not going to take away.
  2. We want to make our software better over time (e.g. faster, better designed, etc)
  3. The truth about dependencies can only be derived from code. Listing which packages you depend on doesn't give a true picture of the actual code dependencies.

The next thing we need to consider is that software, as it relates to change, has two important properties: what it requires and what it provides. Software requires and provides things at different levels. Rich talks about Clojure, which has three levels (artifacts, namespaces and functions), whereas we can consider Go, which has just two (packages and functions).

  1. functions - This one should be quite intuitive. Functions require arguments and provide a return.
  2. packages - You can think of a package as a simple lookup function from the required name to the provided export (functions, but also structs/consts).

Finally, we can think about the different kinds of changes that you can make at each of these levels. Rich categorises them into three distinct types of change:

  1. Accretion - Providing more
  2. Relaxation - Requiring less
  3. Fixation - change without either of the above, e.g. fixing bugs, performance improvements

Putting aside some specifics of Golang's type system which we'll get into later, these three things are all ways in which you can change software without breaking it. It stands to reason that the inverse of these things must break software. Providing less or requiring more is, by definition, a breaking change.

Change in Golang

So how do we frame these changes in terms of writing Go? Let's take a look at the different types of change for packages and functions:

Functions

Go is statically typed, which means a 'function' is not just the name, but the whole signature. For example:

  • Foo() and FooBar() are different functions because they have a different name
  • Foo() and Foo(n int) are different functions because they have different arguments (requirements)
  • Foo() (*Bar, error) and Foo() Bar are different functions because they return (provide) different things.

Whilst Go doesn't allow us to have two functions with the same name and different signatures like some languages do, it's still important to establish the fact that these are not the same function.
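As a rough illustration of the point above, here is a minimal sketch that treats the signature as part of a function's identity. FooPointer, FooValue and sameSignature are hypothetical names invented for this example; they stand in for the two versions of Foo in the last bullet, since Go won't let both share one name.

```go
package main

import (
	"fmt"
	"reflect"
)

// Bar is a placeholder type for the example.
type Bar struct{}

// FooPointer and FooValue are hypothetical stand-ins for the two
// versions of Foo, which provide different things.
func FooPointer() (*Bar, error) { return &Bar{}, nil }
func FooValue() Bar             { return Bar{} }

// sameSignature reports whether two functions have identical types.
func sameSignature(a, b interface{}) bool {
	return reflect.TypeOf(a) == reflect.TypeOf(b)
}

func main() {
	// A variable of a function type only accepts functions
	// with exactly that signature.
	var find func() (*Bar, error) = FooPointer
	// find = FooValue // compile error: the signatures differ
	_, _ = find()

	fmt.Println(sameSignature(FooPointer, FooValue)) // false
	fmt.Println(sameSignature(FooPointer, find))     // true
}
```

Any caller written against one signature simply won't compile against the other, which is why swapping them is a breaking change.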

With that in mind, how can we define changes in terms of the same function? Go lets us define structs, which let us change what is required and what is provided without changing the signature explicitly. Suppose we had a struct called Rum which described a bottle of rum:

type Rum struct {  
  Name string
  Country string
}

And we wanted a way to search for a specific bottle:

type SearchOptions struct{  
  Name string
  Country string
}

func FindRum(options SearchOptions) (*Rum, error) {
  return database.Search(options.Name, options.Country)
}

If we change our code, we can keep our function the same by simply providing more:

type Rum struct {  
  Name string
  Country string
  Year int
}

By adding Year to the struct, we've provided more information without introducing any breaking changes. Programs written before Year was added will continue to work as expected. Likewise, if we improve our search algorithm so that it only needs to use Name instead of Name and Country to perform the search, we can require less by simply ignoring that part of the search:

func FindRum(options SearchOptions) (*Rum, error) {
  return database.Search(options.Name)
}

Now callers can leave options.Country as "" (its zero value) and still get a result, while callers that already set it keep working unchanged, so we're requiring less.

It is important to note that whilst we can provide more by adding to a struct, we can't require less by removing a field from a struct. You can think of removing or adding fields from a struct as essentially changing the signature of the struct (which we know from functions makes it a different struct). So why can we add fields but not remove them? This is because Go includes a convenience of setting undeclared struct fields to their zero value.
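To make the zero-value point concrete, here is a small sketch using the Rum struct from above. The describe helper and the sample bottle are made up for illustration. One caveat worth stating: the convenience only applies to literals that name their fields.

```go
package main

import "fmt"

// Rum is the struct from the post after Year has been added.
type Rum struct {
	Name    string
	Country string
	Year    int
}

// describe is a hypothetical helper that treats the zero value
// of Year as "unknown".
func describe(r Rum) string {
	if r.Year == 0 {
		return fmt.Sprintf("%s (%s), year unknown", r.Name, r.Country)
	}
	return fmt.Sprintf("%s (%s), %d", r.Name, r.Country, r.Year)
}

func main() {
	// Written before Year existed: a keyed literal still compiles,
	// and Year takes its zero value.
	old := Rum{Name: "Appleton", Country: "Jamaica"}
	fmt.Println(describe(old))

	// An unkeyed literal such as Rum{"Appleton", "Jamaica"} stops
	// compiling once Year is added, so adding a field is only safe
	// for callers that use field names.
}
```

So adding a field is accretion for callers that use keyed literals, and a break for those that don't.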

Packages

Now that we know when changes to functions are breaking, we can apply the same logic to packages. In fact, packages are simply a function of (name, [dependencies]) -> exports. Or, if you're thinking in Go:

func NewPackage(name string, dependencies []Package) Package  

Providing more means adding an extra export, such as a new function. Requiring less may mean having fewer dependencies for the same functionality. Just like with functions, these are backwards compatible changes.

On the flip side, requiring more (adding a dependency), providing less (removing an export) or changing the signature (changing the name) are all breaking changes. What's more, because packages export functions, changes to packages are the superset of changes to all functions in that package. That is, a breaking change to any exported function is also a breaking change to that package.
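One way to stay on the right side of this is to add a new export instead of changing an existing one. The sketch below assumes an in-memory cellar rather than the database used earlier, and FindRumFromYear, ErrNotFound and the sample data are hypothetical names invented for the example. The standard library does something similar: database/sql gained QueryContext alongside Query rather than changing Query's signature.

```go
package main

import (
	"errors"
	"fmt"
)

type Rum struct {
	Name    string
	Country string
	Year    int
}

type SearchOptions struct {
	Name    string
	Country string
}

// ErrNotFound is returned when no bottle matches (hypothetical).
var ErrNotFound = errors.New("rum not found")

// cellar is made-up sample data standing in for a real database.
var cellar = []Rum{
	{Name: "Appleton", Country: "Jamaica", Year: 2004},
	{Name: "Diplomatico", Country: "Venezuela", Year: 2012},
}

// FindRum keeps its original signature, so existing callers are untouched.
func FindRum(options SearchOptions) (*Rum, error) {
	return FindRumFromYear(options, 0)
}

// FindRumFromYear is the new export: the package provides more
// without taking anything away.
func FindRumFromYear(options SearchOptions, minYear int) (*Rum, error) {
	for i := range cellar {
		r := &cellar[i]
		if r.Name == options.Name && r.Year >= minYear {
			return r, nil
		}
	}
	return nil, ErrNotFound
}

func main() {
	r, err := FindRum(SearchOptions{Name: "Appleton"})
	fmt.Println(r, err)

	_, err = FindRumFromYear(SearchOptions{Name: "Appleton"}, 2010)
	fmt.Println(err)
}
```

Had we instead added minYear as a second argument to FindRum, every existing call site would have stopped compiling.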

Managing change

Now that we've established which kinds of changes in Go are breaking and which aren't, how can we develop a strategy to manage those changes? This is where the Golang authors took a very hard-line and, I believe, smart approach. They essentially decided that for the lifetime of Go 1, they would never introduce any breaking changes:

It is intended that programs written to the Go 1 specification will continue to compile and run correctly, unchanged, over the lifetime of that specification. ... Go programs that work today should continue to work even as future "point" releases of Go 1 arise (Go 1.1, Go 1.2, etc.).

The APIs may grow, acquiring new packages and features, but not in a way that breaks existing Go 1 code.

This is a pretty bold commitment, and it's one of the things which makes Go a very attractive choice as a new language. Having seen first hand at Uber the extreme difficulty of upgrading a large and critical deployment of a language (such as Node), I find the promise of being able to update from Go 1.1 to Go 1.8 without worrying that my code will break fantastic.

The downside to this philosophy is that you'd better get it right the first time, because there's no going back. To work around this in a sensible manner, the Go authors chose the convention of putting experimental packages under the path x/ to signify that they may yet introduce breaking changes. When a package becomes stable, they move it from x/ to the standard library in the next release.

Where did it all go wrong?

Everywhere outside of the standard library. Let me explain.

The Go authors are notorious for being stubborn. Much of the scorn that they incur is misplaced, as people complain about not having feature X, even though that feature is orthogonal to Go's design goals. Still, my interpretation of the authors' combined philosophy is:

This is the way we designed it. If you don't like it, don't use it. Wontfix. Closed.

This philosophy effectively means that everybody is expected to follow the standard library's approach to change. This is evident in the fact that the Go authors didn't include any tools for versioning or dependency management in the language.

By including gofmt as a standard tool in the language, Go effectively cut out all the format bike-shedding (an amazing idea, well executed). By building packages the way they did (canonical paths, not including any tools for dependency management), they effectively made the standard library's approach to change management the de facto approach.

In theory, this is fine. If everybody were to follow the same approach of never introducing breaking changes, it would be great. Just pull the latest master of each package and you're good to go! No need for versions, no need for lock files! Doesn't matter if you have a different version of a library in dev or prod, it just works!

Go went wrong by inherently relying on people to understand and to do the right thing. They trusted the community to follow their examples. In reality, people don't always do the right thing, and now we're paying the price.

I've been writing a lot of Go recently and consuming third party packages has been a constant source of frustration for me:

  1. Packages disappear from GitHub or other sources
  2. People introduce breaking changes (intentionally or accidentally)
  3. Packages are published without the promise of stability and they're never promoted to a stable version.

On that last note, even simple packages from Google have sat at "we can't promise we won't change it" version 0.x for over 5 years!

By providing a philosophy and not tools, the Go authors have forced the community to try and develop systems to protect themselves from these bad actors. We're now in the worst of both worlds: A do-the-right-thing philosophy, but not everybody is doing it.

Revert, revert, revert

In trying to protect itself, the community has reverted to what they already knew that kind-of worked. As Rich points out in his talk, Semver is inherently broken, but because we have no other option which is as consistently understood or used, it's the best we can do right now.

We now have a hugely fragmented system of multiple dependency managers (of which only some inter-operate nicely) and other people who ignore them completely and subscribe to the authors' model. Some packages publish releases with Semver, many don't.

This has resulted in forcing us to take action such as vendoring all our dependencies and pinning packages to specific hashes. How can we not pin packages to specific versions if they appear stable but have a provision saying that they could break your shit at any time without warning?

We're now in a worse place than we would be if Go had said "semver isn't great, but you have to use it to publish a package". And that is a travesty. That's where they got it wrong.

What could have happened differently?

One of Go's most alluring features for those coming from other popular modern languages is its static typing. Given the explicit rules about breaking changes defined above, it would have been possible for the Go authors to include a set of standard tools to enforce that these changes didn't happen:

  1. Write a program (call it go breaks) that, given a diff or commit range, determines whether any breaking changes have been introduced.
  2. Host a managed third party package registry which enforces no breaking changes for packages at the same path.
  3. Disallow removing packages from this registry.

If those things had happened from the start, we'd have a solid place for packages to live which we know that we could use without worrying about breaking changes, without the need to vendor all packages (though you might choose to anyway for other reasons) and without the need to pin specific versions or hashes.

We'd be able to develop against it as if it was the standard library, and reap all of the benefits that the Go philosophy gives us.

Of course, this is not a complete solution. There are still changes which don't formally change signatures but which break behaviour, and those can't be detected with a program (but that even happens in the standard library). People would still be free to go off-reservation and put their own packages in other places, but using them would be purely at your own risk.

Hopefully Go will standardise around something like this in the future, because anything would be better than what we have now.