Switch to DVCS Shows Branching Behavior Changes

I came across an academic paper published by Microsoft Research that analyzes distributed version control systems (DVCS), particularly Git and Mercurial, and their effects on project health. It compares the issues projects experienced under central version control with those they experienced under distributed version control. Much of the paper is hard to digest because it tries to analyze a repository’s branching and merging statistics and to quantify the related conflicts, but I still found some interesting facts and nuggets of wisdom backed by the authors’ empirical research and interviews.

Introduction page for the paper can be found here.
And a direct link to the paper in PDF form is here.


Here are some points that I found interesting:


- The term “semantic conflict” - All VC systems are good at detecting syntactic conflicts, but not semantic conflicts.
- Awareness of ‘Distract commits’, which are commits that are required to resolve merge conflicts.

These terms are useful when comparing the pros, cons, and efficiency of different styles of VC. Small as these issues are, having names for them is powerful. I haven’t read any research that investigates these ideas in a central version control (CVC) system to compare against these results.
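
To make the distinction concrete, here is a minimal Python sketch of a semantic conflict. The scenario and the function names are my own invention, not taken from the paper: one branch changes the unit a function returns, while another branch adds a caller that still assumes the old unit. The two edits touch different lines, so a textual merge would go through without complaint, yet the merged program is wrong.

```python
# Hypothetical scenario (not from the paper): the result of a clean textual merge.

# Branch A changed this function to return seconds instead of milliseconds.
def request_timeout():
    return 30  # seconds (the base version returned 30000 milliseconds)


# Branch B, written against the base version, added this caller.
# It still assumes request_timeout() returns milliseconds.
def timeout_in_seconds():
    return request_timeout() / 1000


# No line was edited by both branches, so there is no syntactic conflict,
# but the merged behavior is wrong: the caller reports 0.03 s instead of 30 s.
merged_result = timeout_in_seconds()
print(merged_result)
```

Nothing in the version control system flags this. Only a test, a review, or a confused user would catch it.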


- Studies show that branch usage greatly increases among new adopters of DVC.
  - Pre-DVC: 1.54 branches/month. With DVC: 3.67 branches/month (though I worry about the methods used to obtain these numbers).
  - Prior to DVC, branches were created only for releases, not for new features.
  - To effectively use DVC branches, create one for each new feature, localized bug fix, or maintenance effort.
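
For a sense of scale, the two branch rates quoted above can be compared directly. This is just arithmetic on the numbers as I read them from the paper; the variable names are mine.

```python
# Branch creation rates quoted above (branches per month).
pre_dvc_rate = 1.54
with_dvc_rate = 3.67

# Ratio and percentage increase between the two rates.
ratio = with_dvc_rate / pre_dvc_rate
percent_increase = (ratio - 1) * 100

print(f"{ratio:.2f}x as many branches per month")
print(f"{percent_increase:.0f}% increase")
```

That works out to roughly 2.4 times as many branches per month after the switch.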

As explained in the paper, branches are important for two primary reasons: a) branches make it easy to roll back a problematic feature, and b) branches isolate the state of a changing code base while a new feature is being developed. Point ‘b’ is important because it reduces interruptions caused by integrating changes from other team members who are developing concurrently. I’d like to find more research about the differences between merging feature branches and using a branchless, always-merging-into-central-source system for managing concurrent development. The paper seems to neglect CVC systems on this point, which leaves its numbers without a meaningful baseline.


- Studies show that even with DVC, a central repo is still used.

This is an important fact about DVC systems. Some developers who consider adopting distributed version control may not understand how to manage a team when each team member has their own repository; it looks like chaos. Contrary to this, the research in the paper shows that many teams adopt DVC systems simply for the flexible branching they enable, and keep the same workflow as before. A DVCS does not force its users to be extremely distributed. It is flexible, and it lets the team choose a single repository to be the central repository simply by convention.


  - Academics advise us to checkpoint code at frequent intervals in a place separate from the ‘team repo’. Only tested and stable code should be integrated into the ‘team repo’. DVC systems enable and encourage this practice.

This is an example of good behavior that is enabled by a DVCS’s flexible branches. Because a developer has their own copy of the repository locally, they can commit many checkpoints to that local repository. Then, when the feature is close to complete and before they push its development history to the team repository, they can ‘squash’ these minor commits into a more manageable number, which makes the commit history easier to review and makes rolling back features much easier.
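
As a rough illustration of what squashing does, here is a toy Python model. It is only a sketch: the `Commit` class, the `squash` function, and the file names are hypothetical, and this is not how Git or Mercurial store history. In Git itself, squashing is typically done with `git rebase -i` or `git merge --squash`.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    message: str
    changes: dict  # file path -> file content after this commit


def squash(commits, message):
    """Fold a run of commits into one, keeping the final content of each file."""
    combined = {}
    for commit in commits:  # oldest first; later commits overwrite earlier ones
        combined.update(commit.changes)
    return Commit(message=message, changes=combined)


# Local checkpoints made while developing a hypothetical feature.
checkpoints = [
    Commit("wip: parser skeleton", {"parser.py": "v1"}),
    Commit("wip: fix typo", {"parser.py": "v2"}),
    Commit("wip: add tests", {"test_parser.py": "v1"}),
]

feature = squash(checkpoints, "Add parser with tests")
print(feature.message)
print(feature.changes)
```

The team repository then sees one commit with one meaningful message, while the intermediate checkpoints stay a private detail of the developer’s local history.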


  - An accessible DVC repo enables anyone to contribute to the project. Without it, developers who lack commit privileges are reduced to working as if version control did not exist, and accepting changes from such unofficial contributors has high barriers.

Most central version control systems require user accounts to be created for contributors, and contributions must be ‘pushed’ into the central repository. This means that permission to use the repository must be requested and granted, which is an unnecessary barrier. Suppose a person without permissions obtains a copy of the source code and develops a bug fix or new feature. How can they contribute it to the project? If the project were under a DVCS, an easily accessed copy of the repo would be available for potential contributors to clone. The contributor can then craft a proper commit in their copy of the repository and send a pull request to the project maintainer. The request contains the URL of the contributor’s repository and the commit identifier, which makes it simple to pull the change into the official project repository without the overhead of adding a one-time contributor as a user in the VC system.
