After reading through bazaar-vcs.org/, here are my comments about this VCS:

Bazaar

Bazaar is very interesting. However, it lacks a cryptographically secured repository like the one GIT has. Therefore, in my situation, it is no improvement over CVS.

Bazaar has a way to access patches through their SHA1 sum. However, this does not cryptographically secure the repository: the sum can only be used to access a patchset, it does not sign all patches that make up a revision. (Note that the claim that you cannot change the archive format when you use crypt-hashes is wrong. Nobody says that the files must be hashed in their binary archive form. It is enough if there is a way to compute a crypt-hash over the contents which includes the dependent crypt-hashes. This also solves the problem of authenticating archives from a third party: you do not authenticate the archive at all. All you depend on is the crypt-hash of the sources, which never changes, and you can verify this by applying reverse patches.)
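
A minimal sketch of such a format-independent crypt-hash, in Python. The function name content_hash, the in-memory dict of files and the "parent"/"file" record layout are made up for this illustration; they are not something Bazaar or GIT provide. SHA1 is used only because it is the hash mentioned above.

```python
import hashlib


def content_hash(files, parent_hashes):
    """Hash the contents of a revision plus the hashes it depends on.

    files: dict mapping path -> text content (hypothetical in-memory tree)
    parent_hashes: hex digests of the revisions this one builds on
    """
    h = hashlib.sha1()
    # The dependent crypt-hashes go in first, in a stable order.
    for parent in sorted(parent_hashes):
        h.update(("parent %s\n" % parent).encode("ascii"))
    # Then the contents themselves, independent of any archive format.
    for path in sorted(files):
        data = files[path].encode("utf-8")
        h.update(("file %s %d\n" % (path, len(data))).encode("utf-8"))
        h.update(data)
    return h.hexdigest()


root = content_hash({"a.txt": "hello\n"}, [])
child = content_hash({"a.txt": "hello world\n"}, [root])
```

Because only paths and contents go into the hash, the archive can be repacked or converted to another format without changing the result.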

Cryptographically signed patches != Cryptographically secured repository

It does not help to cryptographically sign patches such that they, in turn, include a path to check other patches. Here is why:

  • The cryptographic backbone must not include components which are not always available. If I have Python installed, this does not mean that GPG is available on the platform, too. However, even in the absence of GPG you must be able to check in and publish, that is, tell the revision number and hash to somebody.
  • The crypto must not depend on a 3rd party. A signature can only be checked if you have access to the public key of the signer. Keys can be broken as well, so if you test a signature 20 years after the key was made, you can consider it worthless. So the cryptographic method must not be based on electronic signatures. Crypt-hashes like MD5 are better in this respect: if you always restrict the input to text files, it is nearly impossible to create a second valid input for the hash, and it is even less possible if that input must also carry the tampering you want.
  • The crypto must be defined independently from the archive format.
All of this is violated by Bazaar, and perhaps partly by GIT as well. But GIT is able to cryptographically secure a repository, while Bazaar is not.

Bazaar not better than CVS for a single individual

CVS is widely adopted and deployed. It has good support in everything I need. Its downsides are well understood and discussed. So there is no point in changing from CVS to another version control system.

There would be, perhaps, if Bazaar had support as good as TortoiseSVN for Windows. But it has not. Under Windows I started to use SVN because I quickly needed something able to check in and check out, and TortoiseSVN came in handy. However, I think I will go back to CVS again, as TortoiseCVS has become available in the meantime.

Likewise GIT and SVN are not usable

SVN

I already lost data due to TortoiseSVN, because it deletes files if you mark them deleted after adding them but before they are checked in to the repository. This renders SVN unusable. CVS does not have this bug, as CVS is a lot older: it never deletes files itself, you have to delete a file before you can mark it deleted.

GIT

GIT is unusable as it lacks basic needs. One need is to be a drop-in replacement for CVS, that is, to offer nearly the same commands. However, GIT does not allow for a simple update approach. So GIT changes habits. That is exactly why Windows is so widely deployed: people do not change habits, not even bad ones.

However

If Bazaar added the cryptographic background of GIT (which means you no longer need GPG for signing) and allowed SHA or MD5 hashes to authenticate everything, from the last patch back to the first checkin, then Bazaar would be the VCS I could vote for.

Bazaar really has some major strengths in distributed development. The only thing lacking is cryptographic support like in GIT. OTOH GIT is too limited for me to use.

Perhaps this can be handled by plugins? Two things come to mind:

  • Pull plugin: This is a plugin which creates keyword tags as in CVS ($Version$).
  • Push plugin: This adds cryptographic tracking.

How to add cryptographic tracking (just an idea)

  • At commit time, start writing a trackfile.
  • First give each trackfile a random seed string which includes the current date. The seed is used to initialize the hashes taken from the files.
  • Add the hashes of all changed files to the trackfile.
  • Add the hashes of the trackfiles of all previous revisions (for merges this can be more than one) to the trackfile.
  • Add the hashes of all relevant files from the previous revisions to the trackfile.
  • Add an ASCII representation of the patches to the trackfile.
  • Add the revision which was committed to the trackfile.
  • From here on the seed is no longer used, so the plain hash algorithm applies.
  • Hash the trackfile. Publish this hash along with the revision.
  • If you are able to regenerate the trackfile later on (that is, even 1000 years from now), you can throw away the trackfile now. But you must still note the hash and the random seed in something like an "ewiges Logfile" (eternal logfile).
  • If you are not able to regenerate it (the usual case), you add the trackfile to a special branch for the trackfiles.
Note that trackfiles can be committed to the track-branch in any order. Also note that the trackfiles are renamed to their hash.
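
The steps above can be sketched roughly as follows in Python. This is just one reading of the idea, not a finished design: the line layout of the trackfile ("seed", "file", "parent", "prev", "patch", "revision") and all function names are invented for this example, and "seeded hash" is interpreted as feeding the seed into SHA1 before the data.

```python
import hashlib
import os
import time


def seeded_hash(seed, data):
    """SHA1 over the seed followed by the data (one reading of a 'seeded' hash)."""
    return hashlib.sha1(seed.encode("ascii") + b"\n" + data).hexdigest()


def make_seed(now=None):
    """Random seed string which includes the current date."""
    stamp = time.strftime("%Y-%m-%d", time.gmtime(now))
    return stamp + "-" + os.urandom(16).hex()


def build_trackfile(revision, changed, previous_files, parent_track_hashes,
                    patch, seed=None):
    """Return (trackfile_text, trackfile_hash).

    changed / previous_files: dict mapping path -> bytes
    parent_track_hashes: hashes of the trackfiles of all previous revisions
    patch: ASCII representation of the patch
    """
    seed = seed or make_seed()
    lines = ["seed " + seed]
    for path in sorted(changed):
        lines.append("file %s %s" % (seeded_hash(seed, changed[path]), path))
    for parent in sorted(parent_track_hashes):
        lines.append("parent " + parent)
    for path in sorted(previous_files):
        lines.append("prev %s %s" % (seeded_hash(seed, previous_files[path]), path))
    lines.append("patch")
    lines.append(patch)
    lines.append("revision " + revision)
    text = "\n".join(lines) + "\n"
    # The trackfile itself is hashed with the plain, unseeded algorithm.
    return text, hashlib.sha1(text.encode("utf-8")).hexdigest()


text, track_hash = build_trackfile(
    "r2", {"a.txt": b"new\n"}, {"a.txt": b"old\n"}, ["0" * 40],
    "--- a.txt\n+++ a.txt\n", seed="2008-01-25-abc")
```

The returned hash is what gets published along with the revision, and it is also the name under which the trackfile would be stored in the track-branch.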

In the worst case this doubles the size of the repository. However it has some benefits:

  • A single trackfile authenticates all files of a certain revision, as it includes the hashes of the files.
  • From the hash of the trackfile you can easily find the revision again (by looking into the track-branch; to get from a revision to its hash you need some lookup method like an "ewiges Logfile" or comments on the revisions).
  • A trackfile also authenticates all previous trackfiles.
  • To make it more tamper resistant there is an inherent dependency (the hashes of the files of the previous revision) which must match from old to new. This way it should be hard or impossible to fake older trackfiles.
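
How a single hash authenticates all previous trackfiles can be sketched like this. It assumes a made-up trackfile layout in which each dependency is a line "parent <hash>" and everything after a line "patch" is patch text, and it models the track-branch as a plain dict from hash to trackfile text. All names are hypothetical.

```python
import hashlib


def track_hash(text):
    """Plain (unseeded) hash under which a trackfile is stored."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def verify_chain(head_hash, store):
    """Walk from the newest trackfile back to the first one.

    store: dict mapping hash -> trackfile text (stand-in for the track-branch)
    Returns the list of verified hashes, raises ValueError on any problem.
    """
    verified, seen, pending = [], set(), [head_hash]
    while pending:
        h = pending.pop()
        if h in seen:
            continue
        seen.add(h)
        text = store.get(h)
        if text is None:
            raise ValueError("missing trackfile " + h)
        if track_hash(text) != h:
            raise ValueError("hash mismatch for trackfile " + h)
        verified.append(h)
        for line in text.splitlines():
            if line == "patch":
                break  # the rest is patch text, not header lines
            if line.startswith("parent "):
                pending.append(line[len("parent "):])
    return verified


first = "seed s1\nrevision r1\n"
h1 = track_hash(first)
second = "seed s2\nparent " + h1 + "\nrevision r2\n"
h2 = track_hash(second)
store = {h1: first, h2: second}
chain = verify_chain(h2, store)
```

Knowing only h2 is enough: any change to the first trackfile changes its hash, so it can no longer be found under the name recorded in the second one.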

Attacks

Imagine the following scenario:

You got the hash from a friend who says: "This code is OK". Now you download and ask yourself: "Is everything really untampered?"

Certain attacks can be thought of:

The trackfile is unharmed, but an older file has been replaced such that you cannot notice the change

  • The patches applied to the old file must all match the hashes in the other (unharmed) trackfiles. This is nearly impossible.
  • Therefore the only way is to find a file which was never changed afterwards, and change that. However, hashes are randomly seeded. So this file must not only match the original hash, it must also match all other hashes.
  • On future commits it is very likely that some developer checks in an unharmed copy of the file. At that moment verification of the harmed file will fail.
Therefore it is very likely that the change will be detected, even if the attacker was impossibly powerful and managed to create a file which matches several differently seeded hashes. If evil attackers were able to anticipate the future they could beat lotteries as well, which is more lucrative, I think.

Changing old trackfiles is nearly impossible, and the same is true for files which are part of many revisions.
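
To illustrate why a file which is part of many revisions is hard to replace: every trackfile records the file under its own seed, so a forged file has to match all of these hashes at once. A small Python sketch, where seeded_hash and check_file are invented names and the seed is simply fed into SHA1 before the data:

```python
import hashlib


def seeded_hash(seed, data):
    """SHA1 over the seed followed by the data."""
    return hashlib.sha1(seed.encode("ascii") + b"\n" + data).hexdigest()


def check_file(data, records):
    """records: (seed, expected_hash) pairs taken from several trackfiles.

    Returns the seeds for which the file does NOT match.
    """
    return [seed for seed, expected in records
            if seeded_hash(seed, data) != expected]


original = b"int main(void) { return 0; }\n"
seeds = ("2008-01-01-aa", "2008-01-10-bb", "2008-01-25-cc")
records = [(s, seeded_hash(s, original)) for s in seeds]

ok = check_file(original, records)
bad = check_file(original + b"/* backdoor */\n", records)
```

An empty result means the file matches every record. A tampered file fails under every seed unless the attacker finds a collision for all of them simultaneously.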

The attack is to introduce something into the patch tree unnoticed. This assumes that all you have is the hash of the most recent trackfile, but no code.

Along with an older file, an older trackfile was changed

Trackfiles themselves are not as well protected as old files. However, the same as noted above holds, as changing a file means finding a match for all hashes of that file. This is nearly impossible. So you can change the trackfile, but not the files. This, however, can be detected, as the trackfile then no longer matches the files (and the same is true for the patch included in the trackfile).

Because of this you do not need to store the trackfiles if you can regenerate them from scratch without change. This step will fail if the files which are needed to recreate the trackfile are harmed, so somewhere you will see a hash mismatch.

The latest trackfile was altered

The only remaining attack therefore is to replace the latest trackfile with some other trackfile which matches the hash, too.

To counter that you can use a timeout approach, that is, you always commit at certain time intervals: intervals which are very likely too short to create another file which is able to replace the old one and which describes reasonable data (this can be made harder by introducing some additional complexity, like checksums, in the trackfile).

Also you can protect the trackfile by a second method like an "ewiges Logfile", where you re-publish, at regular intervals, a randomly seeded hash of the recent trackfile. The trick is that a replacement trackfile crafted to match the original hash will not also match the newly seeded hash.

So people can run "dark verifiers" which pull the trackfiles as soon as they are published, and then verify their hashes against the head of the "ewiges Logfile".

So even if you do not have access to the code repository itself, you can check the validity, simply by tracking the trackfile.
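
The re-publishing and the "dark verifier" could look roughly like this in Python. The names republish and dark_verify and the (seed, digest) form of a log entry are assumptions for the sake of the example.

```python
import hashlib
import os


def _seeded(seed, trackfile_text):
    """SHA1 over a seed followed by the trackfile text."""
    return hashlib.sha1((seed + "\n" + trackfile_text).encode("utf-8")).hexdigest()


def republish(trackfile_text):
    """Create a new log entry: a fresh random seed and the seeded hash."""
    seed = os.urandom(16).hex()
    return seed, _seeded(seed, trackfile_text)


def dark_verify(my_copy, log_entry):
    """Check a previously pulled trackfile against the newest log entry."""
    seed, digest = log_entry
    return _seeded(seed, my_copy) == digest


published = "seed s\nrevision r1\n"
entry = republish(published)
still_good = dark_verify(published, entry)
```

A verifier only needs its own copy of the trackfile and the head of the log. Since the seed is new each time, a replacement prepared against an earlier hash has to be forged again for every entry.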

Blind trust

To create blind trust (as needed for closed source) you can free the trackfile from the patchset by encrypting an all-zero block using the patchset as the key.

A software company then publishes the binary, the hash and the trackfiles, and all people out there are able to do a "dark verify" to see whether the codebase was altered. If it ever comes to a lawsuit like SCO vs. IBM, you can verify openly that everything you did was disclosed, as you can seamlessly track all code to its origin without any doubt, starting with the last published binary distribution.
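
A rough Python sketch of the patchset-free trackfile entry. Note the substitution: the Python standard library has no block cipher, so instead of literally encrypting zeros this uses HMAC-SHA1 with the patchset as the key and an all-zero block as the message. This is a stand-in with the same intent, not the scheme itself, and patch_token is an invented name.

```python
import hashlib
import hmac

ZERO_BLOCK = b"\x00" * 20  # the "all-zero" input; the length is an arbitrary choice


def patch_token(patch_bytes):
    """Token which depends on the patchset without disclosing it."""
    return hmac.new(patch_bytes, ZERO_BLOCK, hashlib.sha1).hexdigest()


secret_patch = b"--- a.c\n+++ a.c\n@@ -1 +1 @@\n-old\n+new\n"
token = patch_token(secret_patch)
```

The token goes into the published trackfile in place of the patch. Whoever later discloses the patch can prove that it is the one which was committed, because it reproduces the token.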

Final note

An electronic signature cannot be used as a replacement for a cryptographically secured repository: all of the above attacks still apply to electronic signatures, and there is a further form of attack, namely attacking the signature itself. So the security must not be based on such a 3rd party approach, as this must fail.

You need a cryptographic backbone built into the VCS itself, as nothing else can give you any trust that what you see is what you get.

And for me it is the minimum requirement before I think about a switch from CVS. Everything else might be nice, but in the long term it will not be better than CVS.

And as I am doing well with CVS (it does not hinder me, it works well, is quite fast and I am fully satisfied with the version control I have), there is no need to change.

Except for one thing: I want to share the repository.

This means that uploading the repository to a web server (or FTP server) with read-only access should be enough for others to do checkouts.

Perhaps Bazaar is able to support me in this case (that is: I have a repository, somebody else has, too, and we can merge just by adding files and not changing old ones).

However, this feature is not so important for me today that I would switch to Bazaar. But it makes me look into it.

VCSes which did not work for me today thus are:

  • SVN has a major bug: SVN kills files. Use TortoiseSVN, add a file to the repository, DO NOT CHECK IN, and instead take back the add by marking the file deleted. The file vanishes immediately, no traces remain.
  • GIT is plainly unusable: sorry, this is a basic design flaw. GIT is good for the Linux kernel. It has a cryptographic backbone. But I cannot get accustomed to it.
  • HG is raising entropy: HG supports distributed development. But I tend to have one central local repository. Having zillions of them creates chaos.
I am still in search of a good VCS. However, what I read about Bazaar sounds good. The problem is that changing from CVS to Bazaar is too cumbersome for me, so I stay with CVS.

-Tino, 2008-01-25