As any good software developer should do, I use a version control system (VCS) to maintain my source code and other documents. I’ve used a variety of VCSs, both centralized and decentralized, and closed- and open-source tools. In the last two years I’ve switched from using CVS as my VCS to using Bazaar, and thought I’d post about why I made that choice.
What’s a DVCS?
Bazaar is an example of a new type of VCS called a distributed version control system (DVCS). What’s a DVCS? Short answer: it’s a system where there is no assumed central or canonical source repository. Rather, every checkout is a first-class repository in its own right, and anybody can commit to their own repository. With a DVCS, the choice of a canonical repository is a social or policy issue rather than a technical issue.
Why I like to use a DVCS comes down to two things:
- DVCSs separate versioning from releasing:
Changes are only copied between repositories when explicitly pushed or pulled. Since I have control over when I push changes from my repositories, I can commit when I want, even if the code is in a broken state, without fear that it will interfere with others. This separation of versioning from releasing is fantastic for recording touchpoints (for work-in-progress) or before undertaking a scary refactoring, and is one of the things I loved and missed most from OTI’s ENVY/Manager.
- DVCSs provide for easy branching and merging:
Making a branch is a trivial, lightweight operation; many people make a branch for each bug. You no longer have to remember what revision a branch came from before merging. You can sync to HEAD at will.
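A rough sketch of the branch-per-bug workflow, using only standard bzr commands. The function names and the directory names (`trunk`, `bug-1234`) are made up for illustration, and the sketch assumes both branches are sibling directories under the current directory.

```shell
# Hypothetical helpers for a branch-per-bug workflow (POSIX sh).
# Assumption: TRUNK and BUGDIR are sibling directories in the current directory.

# start_bug_branch TRUNK BUGDIR: make a new branch to work on one bug.
start_bug_branch() {
    bzr branch "$1" "$2"
}

# land_bug_branch TRUNK BUGDIR MESSAGE: merge the bug branch back into trunk.
land_bug_branch() {
    (
        cd "$1" || exit 1
        # No revision range is given: bzr works out the common ancestor itself.
        bzr merge "../$2" && bzr commit -m "$3"
    )
}

# Example usage:
#   start_bug_branch trunk bug-1234
#   ... hack and commit inside bug-1234 ...
#   land_bug_branch trunk bug-1234 "Fix bug 1234"
```

The merge runs in a subshell so that the caller's working directory is left alone.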
What is Bazaar?
Bazaar is the third of the Big Three open-source DVCSs, at least in terms of mindshare. In first spot is Git, whose command-line syntax I find to be tortuous. In second spot is Mercurial (aka Hg), largely due to being selected by Sun and Mozilla for managing OpenSolaris, OpenJDK, and the Mozilla codebases. But Bazaar should command a much bigger share, IMHO. I’ll explain why I like Bazaar.
Nice Features of Bazaar
The Command Line Feels Right
The CLI felt good: it’s CVS/Subversion-like. This similarity made it easy to migrate to, as well as easy to predict.
Supports Centralized and Decentralized Workflows
Bazaar can be set up to support centralized workflows through the use of checkouts. This is actually useful if you (a) are paranoid [my trunk is a checkout of a branch on a different hard drive] or (b) want to put your branch on a shared server.
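A minimal sketch of setting up such a checkout. The function name and the example path are hypothetical; `bzr checkout`, `bzr update`, `bzr bind`, and `bzr unbind` are the standard commands.

```shell
# setup_checkout MASTER DIR: create DIR as a checkout bound to the branch at MASTER.
# MASTER may be a local path (e.g. on another drive) or a URL on a shared server.
setup_checkout() {
    bzr checkout "$1" "$2"
}

# Example usage (the path is an assumption, not from the original post):
#   setup_checkout /Volumes/Backup/bzr/project trunk
#
# Afterwards, inside trunk:
#   bzr commit -m "..."   # recorded in the master branch as well as locally
#   bzr update            # pick up commits others made to the master branch
#   bzr unbind            # turn the checkout into an ordinary standalone branch
#   bzr bind              # re-attach it to the master branch later
```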
Branches As First-Class Representations
Bazaar natively represents branches as directories in the file system, as opposed to the co-located branches used by Git and Mercurial. I like its branches-as-first-class representations in the file system as I can use my many shell scripts and programs for processing the trees.
I also find Git/Mercurial’s co-located branches to be hard to wrap my brain around. There is a nifty plugin, bzr-colo, that brings co-located branches to Bazaar for those who like them, though. I’ve begun to see some situations where co-located branches can be useful; with Bazaar, I can have both.
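Because each branch is just a directory, ordinary shell tools can walk them. A small sketch, assuming all branches sit side by side under one parent directory; the function name is made up, and checking for a `.bzr` control directory is only a crude test (a shared repository root has one too).

```shell
# for_each_branch ROOT CMD...: run CMD inside every immediate subdirectory of
# ROOT that contains a .bzr control directory.
for_each_branch() {
    root=$1
    shift
    for dir in "$root"/*/; do
        # Skip anything that does not look like a Bazaar tree.
        [ -d "${dir}.bzr" ] || continue
        ( cd "$dir" && "$@" )
    done
}

# Example usage:
#   for_each_branch ~/src/project bzr status --short
```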
Can Push to Dumb Servers
Bazaar can push a branch across FTP and SFTP with no configuration of the remote side. This is amazingly helpful. Sure, Bazaar can send patches via email, but I never use it (and I wrote the integration for Mac OS X’s Mail.app!). It’s just so much easier to push a branch to a temp dir on my website where my collaborators can pull from it. No need to deal with mis-shaped patches.
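A sketch of publishing a branch this way. The function name, host, and paths are placeholders; the only thing assumed of the server is SFTP write access and a web server exposing the same directory.

```shell
# publish_branch BRANCHDIR URL: push the branch in BRANCHDIR to a plain
# FTP/SFTP location. Nothing Bazaar-specific needs to run on the remote side.
publish_branch() {
    ( cd "$1" && bzr push "$2" )
}

# Example usage (hypothetical host and paths):
#   publish_branch bug-1234 'sftp://me@example.com/~/public_html/tmp/bug-1234'
# If the parent directories do not exist yet, bzr push accepts --create-prefix.
#
# A collaborator can then fetch it over plain HTTP:
#   bzr branch http://example.com/tmp/bug-1234
```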
Extensible Through Plugins
One of the less-appreciated features of Bazaar is its rich plugin ecosystem: bzr-colo is just one example of the many plugins available for Bazaar. You can hook into various events to do additional processing, and it’s relatively easy if you can program. For example, VMware Fusion/Samba turns on +x on any files touched, so I hacked up a plugin to run “find . -name '*.cs' -print0 | xargs -0 chmod a-x” within the branch prior to commit. Some of the plugins are extraordinarily powerful.
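The plugin itself is not shown here, but the command it runs can be tried on its own. A minimal sketch wrapping that pipeline in a shell function; the function name is made up, and the directory is passed as an argument rather than assumed to be the current one.

```shell
# strip_exec_bits DIR: clear the execute bit on every *.cs file under DIR.
strip_exec_bits() {
    # -print0 and -0 pass file names NUL-separated, so names containing
    # spaces or newlines survive the pipe intact.
    find "$1" -name '*.cs' -print0 | xargs -0 chmod a-x
}

# Example usage, from the top of a branch:
#   strip_exec_bits .
```

One caveat: when no files match, GNU xargs still runs chmod once with no file arguments and reports an error, so a real hook may want to guard against that case.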
Native Interaction With Non-Bazaar VCS Repositories
Bazaar has one developer in particular focussed on interacting with foreign repositories. Bazaar can now pull from Git, Subversion, and Mercurial repositories directly; it’s not as fast as using the native tools, but it means I don’t have to use Git (I’ll do anything to avoid Git’s CLI). There are some people in the community who are working on Perforce integration.
The beauty of this ability is that I don’t have to care whether other people put projects up on github.com, as I can access them using Bazaar rather than my nemesis, Git. It also means that projects can simply use Subversion to maintain their canonical branches and their developers can use their own tools without fear of interaction problems.
Unique Revision Identifiers and Human-Readable Revision Numbers
Like Git and Mercurial, Bazaar also uses globally unique revision identifiers (called revids), but by default it exposes a per-branch revno. These revnos are monotonically increasing within a branch. Two branches with identical history will be at the same revno. There’s a slight pitfall in that revno N may represent two completely different revisions for two different branches if their branch histories are different, but it’s easy to work around that by using the branch name and revno together, which gives a human-meaningful identifier. You can reference another branch by location and a revno within that branch, and you can also merge (‘cherrypick’) particular revisions too.
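A short sketch of referring to a revision by branch location plus revno, and of cherrypicking one. The function names and the branch path are hypothetical; `bzr log -r` and `bzr merge -c` are standard, and the revno is interpreted relative to the branch named on the command line.

```shell
# show_branch_rev BRANCH REVNO: show the log entry for REVNO as numbered in BRANCH.
show_branch_rev() {
    bzr log -r "$2" "$1"
}

# cherrypick BRANCH REVNO: merge only the change introduced by REVNO of BRANCH
# into the branch in the current directory. The result still needs a bzr commit.
cherrypick() {
    bzr merge -c "$2" "$1"
}

# Example usage, from inside trunk:
#   show_branch_rev ../bug-1234 42
#   cherrypick ../bug-1234 42
```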
Bazaar also has a nice feature called stacked branches, where the local branch is created with a reference to the remote branch, and only the most recent revisions are downloaded. The remote branch is used for any operations that require looking at the entire history. As long as your operations are local (e.g., commits, diffs against the previous version), you won’t have any remote fetches.
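Creating a stacked branch is a one-line variation on an ordinary `bzr branch`. A minimal sketch; the function name and URL are placeholders, and it assumes the remote branch uses a format that supports stacking.

```shell
# stacked_branch REMOTE DIR: create DIR as a branch stacked on REMOTE, so that
# older history stays on the remote side and is consulted only when needed.
stacked_branch() {
    bzr branch --stacked "$1" "$2"
}

# Example usage (hypothetical URL):
#   stacked_branch http://example.com/bzr/project/trunk project-stacked
```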
Misconceptions about Bazaar
There are a few misconceptions that have built up around Bazaar.
- Bazaar is Canonical: It is true that many of the core developers are Canonical employees. And many of the outside contributors seem to end up being hired on as employees. So what? Bazaar is GPLd, and Canonical has said repeatedly that it values Bazaar, and continues to dedicate resources to Bazaar’s development and support. Seems like a win-win situation to me.
- Bazaar is slow: It may have been slow once, but Bazaar made huge leaps in performance in 2009, and with the 2.0 release Bazaar is now competitive with Git (with one caveat: start-times; see below), and matches or outperforms Mercurial. It’s likely that if Sun, Mozilla, and Python were to re-evaluate their switch they would choose Bazaar (IMHO), especially since its cross-platform support is very good.
- Bazaar doesn’t scale: I actually thought this myself: I use Bazaar to maintain shadow repositories of some other source bases that I periodically access (e.g., pkgsrc, and some Eclipse projects), and doing updates could be agonizing. Then I tried using Git to manage the trees and discovered that Git and Mercurial suffered in the exact same way: pkgsrc is a huge beast, with 92979 files, and 25554 directories, and the scaling problems I saw were to do with the filesystem, not the tool. pkgsrc may actually be a good example where a model other than a versioned-tree is likely best.
So is Bazaar perfect? No, but it’s pretty damn good. You should give it a try.
For further information, see:
- Bazaar Migration Docs
- Bazaar in 5 Minutes and the Bazaar User Guide
- Download page
- the main Bazaar site
Appendix: Bazaar Tips
Aliases Prevent RSI
Bazaar allows specifying aliases to shorten frequently-used commands. Here are some of my frequently-used aliases; simply copy these into your ~/.bazaar/bazaar.conf:
    [ALIASES]
    # --show-diff brings a diff into the commit message
    c = commit --show-diff
    # show the last ten commits
    l = log --short --forward -r-10..-1
    # show the last commit
    last = log -c -1
    # use a coloured wdiff (http://www.gnu.org/software/wdiff/)
    cwdiff = diff --using cwdiff
    # diff ignoring whitespace changes
    diffspace = diff --diff-options=-wb
    # show changes that would come from pulling/merging a branch
    incoming = missing --theirs-only
    # show changes that would go when pushing/merging to a branch
    outgoing = missing --mine-only
Avoiding Sluggish Startup
In my opinion, Bazaar’s one downside is its apparent sluggishness on start-up. I say apparent as this sluggishness comes from the start-up costs of loading Python and the Python source. It’s not too bad when the Bazaar source files are in the cache, but “bzr rocks” from a cold start on my MacBook Pro takes 3.5s [aside: Mercurial is just as slow].
Fortunately this is easily worked around by using bzrtools’ ‘shell’ extension (bzrtools is one of the popular plugins for Bazaar). This provides a command-line shell, complete with file and command completion, and scrollable history. The Bazaar commands are then first-order commands (you just type “status”, “commit”, “log”, etc.): when it identifies a Bazaar command line, it executes the command locally, within the already-running process. Otherwise it invokes the command using the user’s shell. This works well with my workflows, where I usually use a single window for commits anyway.
There is also Bazaar Explorer, a powerful GUI that’s apparently pretty user-friendly. (I haven’t used it; I’m pretty happy with the command line.) You can see a walk-through at: http://doc.bazaar-vcs.org/explorer/en/visual-tour-windows.html
Launchpad is the Bazaar equivalent to GitHub. Unlike GitHub, Launchpad is oriented around projects rather than people; I’ve recently had to use GitHub to work on a CVCS-style shared project and we found it awkward; we all ended up using a single person’s repository.