Response to "Subversion’s Future?"
Ben Collins-Sussman posted an interesting blog entry, “Subversion’s Future?”, in which one of the main points was that while distributed source control systems are OK for smallish/open-source projects, Subversion’s sweet spot is with huge projects. I couldn’t disagree more, and here’s my response.
I’ve been using distributed source control systems for more than a decade and have watched other big projects use them, and it seems to me that the bigger the project, the more benefits a DVCS provides. What are the characteristics of a huge project? In most cases, it means that a big team is working on it. A big team means a global team, spread all over the globe. This is not an “open-source thing”; it is a reality of corporate software development too. Most companies (well, at least those that are actually producing huge projects) are global companies, with offices in the U.S., Europe, India, etc. Working globally against a central repository *is* painful and slow. Have you tried to bisect a regression introduced between releases, switching between many revisions, or to follow the history of some code in Subversion? Doing this when the main repository is overseas is not fun.
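To put a rough number on the bisecting pain, here is a minimal sketch in Python. It assumes a linear history with a known-good first revision and a known-bad last one; `first_bad_revision` and `is_bad` are hypothetical names for illustration, not part of any tool. Every revision it visits is a checkout: against a central repository overseas that is a network round trip, while with a local clone it is a local operation.

```python
def first_bad_revision(revisions, is_bad):
    """Binary-search a linear history for the first bad revision.

    Assumes revisions[0] is known good and revisions[-1] is known bad.
    Returns (first_bad_revision, number_of_checkouts).
    """
    lo, hi = 0, len(revisions) - 1  # invariant: revisions[lo] good, revisions[hi] bad
    checkouts = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        checkouts += 1  # each test means switching the working copy to revisions[mid]
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi], checkouts


# Illustrative data only: 1001 revisions, regression introduced in revision 1337.
revisions = list(range(1000, 2001))
first_bad, checkouts = first_bad_revision(revisions, lambda rev: rev >= 1337)
print(first_bad, checkouts)
```

The number of checkouts grows only with the logarithm of the range, but each one still costs a full switch of the working copy, which is where the distance to the repository starts to hurt.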
A big team of engineers is typically organized in a hierarchy of smaller sub-teams, each focusing on a particular area of the product. Again, it’s much more natural to organize a hierarchy of workspaces matching the structure of the organization. There are many benefits to that approach: the members of such a sub-team mostly care about their own area and don’t touch or change anything in other places. And they know their code better, so they can find and fix new bugs and regressions faster. Typically, there is a dedicated QA force for each sub-team, trained and specialized in testing that particular area. Once they have tested and OK’ed a particular state of the team’s code, it can be pushed upwards to the integration workspace. Thus, members of other teams won’t even be disrupted by local problems/regressions, since they get more stable and better tested code. Distributed source control lets you do that beautifully and naturally. Doing this in Subversion is seriously painful.
Also, what are those mythical huge projects that nobody knows about? How about OpenSolaris or Linux? Are they “huge enough”? How about Mozilla or Ubuntu or NetBeans or JDK or MySQL? All of these projects use distributed source control tools. Solaris and the Java SDK were open-sourced only recently. Before that they were, by any measure, huge commercial software projects, each with many, many years of development by hundreds of people. They were developed with a distributed source control system. There *is* a reason why Linux never used CVS/Subversion, and why even a commercial, non-open-source distributed system was used to develop Linux (since there were no good open DVCSs at the time). And the reason, of course, is that distributed source control helps manage the overwhelming complexity of huge projects much better than a centralized one.
One other point in Ben’s blog entry was about the usability and ease of use of Subversion. Yeah, it is easier to use in simple scenarios, but once the size of a project grows, it gets harder and harder. Besides, if engineers are smart enough to develop and maintain a huge project, adjusting to distributed source control systems would be a piece of cake!
And for those folks who are stuck with Subversion, there is the great git-svn tool, which lets you leverage the power of distributed source control while working with a centralized Subversion repository.
I’m not really saying that Subversion is “bad”. It was actually great for its time, but now there are better and smarter tools out there.
May 2nd, 2008 at 3:11 pm
What are your recommendations for the better and smarter tools?
May 2nd, 2008 at 4:05 pm
Mark,
For source control tools, I’d definitely recommend one of the latest DVCSs, like git or mercurial. Git is somewhat harder to use, but very powerful, with a huge number of additional features. Mercurial is easier to use, has a cleaner command-line interface and works much better on Windows.
May 2nd, 2008 at 4:10 pm
Here’s an idea: Let’s posit for a moment that there’s a place for centralized source control systems.
Why not implement a new centralized source control system that runs git or hg on the server? Probably git, it seems a lot more suited to being used as a low-level source control system. As far as I can tell, even if you use git as a purely centralized system, other than some interface polish, it’s just better than SVN even then.
Thus the discussion about SVN’s future seems to be focussing far too much on the centralized vs. decentralized debate if you ask me.
May 2nd, 2008 at 4:22 pm
Bazaar can manage both centralized and decentralized workflows.
May 2nd, 2008 at 4:31 pm
[...] Secondly, I don’t really see why Collins-Sussman thinks that Subversion’s advantage is with large projects. I think it’s the opposite. [...]
May 2nd, 2008 at 5:48 pm
The problem I see with DVCS in a corporate setting is “How do you centralize a decentralized system”? There needs to be one master tree where product releases are rolled from - it needs to be the final, authoritative source that anyone can pull from and get a particular branch point.
Development in these groups is intentionally bottle-necked at a single person so that there’s accountability for quality and institutional knowledge, which is often considered more important in these scenarios than individual productivity. Is there a DVCS answer to that concern? My team has tried using both git and mercurial in “centralized” fashion and in both instances ended up with corrupt “central” repositories after just a few days…perhaps, as Linus suggested in his Google tech talk, I’m simply too stupid and ugly to use DVCS.
May 2nd, 2008 at 6:02 pm
@AoD You have a lead or a manager of some sort on these teams, no? You can usually find accountability there.
May 2nd, 2008 at 6:23 pm
AoD,
I don’t think that git or mercurial are “decentralized” systems. They’re distributed, and can be used in a centralized manner. In the projects I’ve seen, the team members can commit only to their team repository, while each team has a special person/role: the “integrator”. Only that person can push the code up the hierarchy. Typically, that is done once QA is OK with the quality of the code.
So individual team members in each sub-team happily work off their team repository, while integrators on all levels push the code up to the main repository - the root. That’s the repository from which the Release Engineering folks create the official bundles.
Typically, that main repository should be of good quality, since only very few integrators can push there, and they push tested code.
That way, it is much harder to break/damage the main repository than in the Subversion case, where EVERYBODY has access to the very same repo.
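The gating described above can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the `Repo` class, its methods, and the names used in the example are all hypothetical and do not correspond to any git or mercurial API; real setups enforce this with push permissions on the parent repository.

```python
class Repo:
    """Toy model of one repository in a hierarchy of team workspaces."""

    def __init__(self, name, parent=None, integrators=()):
        self.name = name
        self.parent = parent                  # next repository up the hierarchy, or None for the root
        self.integrators = set(integrators)   # the only people allowed to push upwards
        self.changesets = []

    def commit(self, changeset):
        """Any team member can commit to the team repository."""
        self.changesets.append(changeset)

    def push_up(self, who, qa_approved):
        """Push changesets the parent does not have yet; only a QA-approved push by an integrator succeeds."""
        if self.parent is None:
            raise ValueError(f"{self.name} is the root repository")
        if who not in self.integrators:
            raise PermissionError(f"{who} is not an integrator for {self.name}")
        if not qa_approved:
            raise RuntimeError("QA has not approved this state of the code")
        new = [c for c in self.changesets if c not in self.parent.changesets]
        self.parent.changesets.extend(new)
        return new


# Illustrative names only.
main = Repo("main")
team = Repo("team-ui", parent=main, integrators={"alice"})
team.commit("fix-1")
team.push_up("alice", qa_approved=True)
print(main.changesets)
```

A regular team member calling `push_up` gets a `PermissionError`, so an untested change never reaches the root by accident.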
May 2nd, 2008 at 6:25 pm
Syl,
I agree, bazaar is yet another great distributed source control system. So, there are at least THREE great DVCS out there, and I don’t really see where Subversion fits in.
May 2nd, 2008 at 8:19 pm
You missed a very important feature, that being limited access. Some companies don’t want every developer to have access to all the code.
Likewise many corporate shops require tight integration with their bug tracking software. You don’t even get to check out a file without a documented need.
It is all about control.
May 2nd, 2008 at 8:53 pm
Commits imply merging into some repository, whether it is centralized or distributed. The difference is when the merge occurs. With a tool like Subversion, it is at commit time. With hg and git, it is later on, some time after you commit to your own repository.
If the changes are clear enough to review before being merged into the core product, this can work. If the changesets get too entwined with each other, reviewing a commit a week or two after making it becomes harder.
May 2nd, 2008 at 9:12 pm
I’m personally a big fan of distributed version control. However, I work in video games, where we have codebases of multiple hundred thousand lines and, more problematically, hundreds of gigabytes of content that is constantly being updated and needs to be in version control. I have just not personally been able to find out how to make distributed version control work in a situation where we easily have multi-terabyte repositories and a head revision that runs to tens or hundreds of gigabytes.
May 2nd, 2008 at 11:48 pm
I’m not aware of git being available for Microsoft Windows.
For those of us “stuck” using Microsoft Windows, which distributed version control system do you recommend?
May 3rd, 2008 at 12:58 am
I enjoy Monotone, but I understand that most would prefer Mercurial. A problem that probably (just a guess) exists with all of them, though, is that the repositories store path names case-sensitively, and thus on Windows it is easy to get a folder into the repository with a case differing from the one on disk.
May 3rd, 2008 at 1:12 am
@Rich:
Mercurial: http://www.selenic.com/mercurial
Bazaar: http://www.bazaar-vcs.org
May 3rd, 2008 at 1:24 am
@Nicholas:
Mercurial and Bazaar both handle case insensitive path names.
May 3rd, 2008 at 1:43 am
So, granted that branch management on Subversion could be much better, and granted that I personally prefer to use DVCSes at this point, let me play devil’s advocate and say that your hierarchical development strategy can in fact be done in Subversion (or Perforce or any decent centralized VCS). All you have to do is have each group work in their own branch, and integrate up the repository hierarchy by integrating into different branches.
DVCSes make this easier, not because they are distributed but because they handle branches and merges properly. I still prefer to use them. But I don’t see any clear reason why you can’t do the same on a CVCS that doesn’t make merges more difficult than they need to be. Even SVN can do it, though until/unless they start actually tracking branch/merge points it will be much more hassle than it needs to be.
May 3rd, 2008 at 2:36 am
@Rich: http://code.google.com/p/msysgit/
A preview but it works.
May 3rd, 2008 at 11:35 am
Jonathan Allen,
About limiting access. Yes, this one might be in favor of Subversion. But I believe git/hg also have means to solve this, via modules/forest type of thing.
But in most projects I’ve personally seen, the problem that often arises is not about very strict control but about things like mutable tags (in Subversion), when people accidentally change the content of a tagged branch and nothing stops them (in the default setting). Yes, there *are* tools to adjust the access rights, etc., but they are cumbersome and not very reliable (one error in the config file and the entire system might be totally open again).
May 3rd, 2008 at 11:37 am
Rich,
I use git on Windows with no problems. There are at least two efforts for git on Windows (git in Cygwin and MsysGit). Granted, they don’t work as well as on Linux/Unix and they are a bit slower, but other than that git does work on Windows just fine.
May 3rd, 2008 at 11:39 am
Rich,
Oh, one more thing. Mercurial has first-class Windows integration out of the box, and that was probably one of the biggest reasons why the Solaris/JDK/Mozilla folks selected it over git at the time.
May 3rd, 2008 at 11:43 am
Nihil Est,
Easy branching is just one of the benefits of a DVCS, and even if SVN got excellent branching (I doubt it, though, from what I see in the upcoming SVN release), it would still be painful to work with a repository overseas. It is painful to have a single point of contact, it is painful to put all the branches of all people and groups into a single repository, and it is slower. And if the main repo is down, the entire organization is stuck, unable to do any work. The main repository would be HUGE and hard to maintain/back up, folks would be stepping on each other’s toes, each commit would be blocked until the previous one is finished, etc.