See also
permalink.de/tino/vcs
Opinion
VCS
Design of a post modern VCS.
Rationale
I still use the RCS-based CVS as my VCS. Why? Because it fulfills all my needs. But this does not explain it fully. The truth is that I don't use anything else because nothing else fulfills my needs any better.
About CVS
CVS is really old:
- It lacks meta data.
- It lacks proper handling of binary files.
- It lacks atomic commits.
- It lacks a cryptographic backbone such that a revision signs all the files in it.
However it tracks the history of files very accurately and is easy to use.
The major point in using CVS is that it is so old that it still allows me to work with it in case I do not have CVS installed. And this lets it win in a shootout over any other VCS out there.
So here they are, my demands for a post modern VCS:
Requirements for a post modern VCS
Here are the mandatory requirements:
- Shell and HTTP level: The VCS must be consistently designed to work at the shell and HTTP level. This means: if you do not have any VCS installed, you must be able to check out the latest revision of the files with an ordinary shell, possibly with the help of some standard tools (like grep or curl) only (note: the need for switches like --mirror for wget is a no-go!). No other tools or complex options may be needed to do so. It would be ideal if this is feasible without keeping a "last checkout repository" at the VCS level, such that the VCS files are self-contained. It is acceptable to have some "renamer" task which at the end renames the files to their proper names, like "some/path/dirname--filename.c-revision-VCSid" into "dirname/filename.c", as this is easily done with a shell one-liner. It is also acceptable to have some "list files" which list all URLs to retrieve a complete revision. However, such a file must be statically generated at checkin (ci) time!
- No VCS server for basic tasks needed: A file repository (aka fileserver like a webserver) must be enough to serve the basic VCS needs. So a single developer shall never need a server installation, even in a heterogeneous environment. Only team servers or distribution sites like SF may need a server variant.
- No mandatory Database: If a database is needed, it shall not be mandatory. That is, the database shall only speed up non-basic things like integrity checks, searching, n-phase-merges or complex revision actions (like deleting contents etc.). Everything else shall work without the database, and it must be possible to re-create the database at any time.
- Repository merges on file level: It must be possible to merge two repositories with different states just at the file level. So "cp -r /src/repos1/ merge/; cp -r /src/repos2/ merge/; vcs rebuild merge/" or similar must be enough to do this merge. This means that if some files are present in both repositories, they must have the same contents.
- Revision numbers in Source: I need automatic revision tags like in CVS. However, there must be an improvement in that files are always checked in as binary. That is, the VCS expands the revision tag on checkout (CO), but on checkin (CI) it replaces the expanded tag with the original text again. So if you ever forget to CO in binary mode (without tag translation), you can still CI and then CO without tag replacement to get the old contents back.
- Easy checkout/checkin/ignore/revisions: Files are always "checked out" unless the VCS is instructed to keep them read-only (via MetaData or command line). Also, an explicit "checkin" of diffs or "add" of files must not be needed to keep them in the repository. Both shall be completely implicit operations, such that you can recover losses even if you forgot to check in. Ignore/Add/Remove/Checkin therefore only make changes official by updating the metadata or killing such intermediate (not yet official) diffs.
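The "list file" plus "renamer" idea from the first requirement could look roughly like the following sketch. Everything concrete here is an assumption for illustration only: the flat naming scheme <dir>--<file>-<revision>-<VCSid> (with no "-" inside the revision or the id), the one-URL-per-line list format, and the function names flat_to_path and checkout_from_list.

```shell
# Hypothetical flat naming scheme: <dir>--<file>-<revision>-<VCSid>
# flat_to_path prints the working-tree path for one flat repository name.
flat_to_path() {
    name=${1##*/}      # drop the repository path or URL prefix
    name=${name%-*}    # drop the trailing -<VCSid>
    name=${name%-*}    # drop the trailing -<revision>
    printf '%s\n' "$name" | sed 's|--|/|g'
}

# checkout_from_list reads a static list file (assumed: one URL per line),
# fetches every entry with plain curl and stores it under its proper name.
checkout_from_list() {
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        dest=$(flat_to_path "$url")
        mkdir -p "$(dirname "$dest")"
        curl -fsS -o "$dest" "$url"
    done < "$1"
}
```

With a list file fetched beforehand, "checkout_from_list files.lst" would then be the whole checkout. No VCS is involved, only the shell, sed and curl without any special switches.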
And there are the typical "modern" requirements, too:
- Atomic commits: Either a commit is completely done, or it is rejected. This does not necessarily mean that the VCS is cleaned up afterwards (for example on a power loss while committing). Cleanups are complex tasks and may involve the server component.
- Cryptographic Backbone: A revision number must "sign" the complete trunk/branch/revision like in Git. This is mandatory, as otherwise repository merges would not work as described above. The VCS must make sure that files cannot change after they have been added to the repository without this being detected easily. And these files need unique names based only on their content, such that merges at filesystem level will not overwrite any files.
- Binary commits: It shall not handle binary files differently than text files. It's easy to fix the line ending at the client machine side, so there is no need to think about this at the VCS side. Text-flags and line endings are typical meta data.
- MetaData: In the basic design it shall keep metadata for files at ci level. However, there is no need to support this if you co without going through the VCS. But the ability to keep metadata is absolutely necessary.
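Why content-based names make the "cp -r" merge safe can be shown with a small sketch. It assumes SHA-256 as the hash (via sha256sum from GNU coreutils) and an objects/ subdirectory as the layout; both choices, as well as the names store_add and store_verify, are made up for this example and are not a fixed design.

```shell
# store_add REPO FILE copies FILE into REPO under a name derived only
# from its content (assumption: SHA-256) and prints that name.
store_add() {
    repo=$1 file=$2
    id=$(sha256sum < "$file" | cut -d' ' -f1)
    mkdir -p "$repo/objects"
    # Same content always maps to the same name, so adding is idempotent.
    [ -e "$repo/objects/$id" ] || cp "$file" "$repo/objects/$id"
    printf '%s\n' "$id"
}

# store_verify REPO fails if any object no longer matches its own name,
# which detects files that changed after they were added.
store_verify() {
    repo=$1 status=0
    for obj in "$repo"/objects/*; do
        [ -e "$obj" ] || continue
        want=${obj##*/}
        have=$(sha256sum < "$obj" | cut -d' ' -f1)
        if [ "$want" != "$have" ]; then
            printf 'corrupt: %s\n' "$obj" >&2
            status=1
        fi
    done
    return $status
}
```

Two repositories built this way can be copied into one directory: identical files collide on the same name with the same bytes, different files never share a name, and store_verify on the merged directory checks the result.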
Nice to haves
There are some "nice to haves", too. These are:
- Single file repository: All revisions of a file should be kept in a single file. This looks like it contradicts the Cryptographic Backbone, but it does not. It only means that the file with all revisions must not change the data it already contains when new data is added to it. If the new revision is appended to the end of the file, the old data is neither touched nor harmed. Note that this makes the file similar to an "Ewiges Logfile" (eternal logfile) as proposed by Lutz Donnerhacke.
- Multiple file repository: One single repository shall be able to keep several different files together. This is a consequence of the previous entry if you allow file renames and file links. So you can start with a single file and easily "spawn" all other files from it. Suddenly you have a multi-file repository. Properly done, this does not contradict the HTTP and shell level. On the contrary, it makes HTTP access easier, as you only need to support one single ever-growing URL as a repository. Note that this is a "nice to have", as it shall not be mandatory to keep only one single file. Doing so would render the VCS difficult to use.
- Distributed file store: Allow the VCS to utilize DHTs and files stored elsewhere. This way new patches are loaded from other repository parts automagically, and these loads in turn distribute the repository again. How this sync is done is nothing which has to be handled at the VCS level.
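The append-only property of such a single file can be checked with ordinary tools. The following is only a sketch: the function names is_prefix_of and update_repo are hypothetical, and update_repo assumes that the web server answers range requests, which curl needs for resuming.

```shell
# is_prefix_of OLD NEW succeeds if NEW starts with the exact bytes of OLD,
# that is, NEW only appended data and never rewrote existing revisions.
is_prefix_of() {
    old=$1 new=$2
    size=$(( $(wc -c < "$old") ))    # arithmetic strips padding from wc
    head -c "$size" "$new" | cmp -s - "$old"
}

# update_repo URL FILE fetches only the bytes missing locally; curl's
# "-C -" works out the resume offset from the current size of FILE.
update_repo() {
    curl -fsS -C - -o "$2" "$1"
}
```

A cautious update would keep the old copy, download the full file once in a while, and run is_prefix_of on the pair before trusting the new one.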
Not interested
- Integration not needed: I am not interested in features like in Bazaar, where you can use the VCS as a storage backbone of your programs. That is, for me it's enough to have an interface to the VCS through a cleverly designed shell level.
- Plugins not needed: I do not need plugins. Even shell scripts which trigger on certain actions are only "nice to haves", as you can wrap this in the calling script (the SVN shall not need a server component at all). I even dislike having shell scripts which are triggered automatically, as this could create unwanted side effects.
- Standalone server: I would like to see an SVN which can use any file store as a repository, where "file store" can be FTP, HTTP DAV, whatever. The file store shall be the generic part of it; everything else shall be straightforward. It must be usable offline; online is only for data sharing. The latter then might be done via email. So perhaps it is single-file based (like CVS) but uses a ZIP archive (or others) for collaboration (where the archives are like meta-patchsets uploaded to a server).
Example Ideas
This is only an idea of how to do it. For example, the following could check out a revision:
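# Pull the lines tagged "A" out of the VCS file; sorted and stripped of their first field they form the awk script.
grep "^A" VCS-file | sort | cut -d' ' -f2- > extract.awk
# Run that script over the same file to extract the wanted revision.
awk -v REV="revision" -f extract.awk VCS-file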
Another idea to check out the last revision of everything in a file might be:
grep -v "^#" VCS-file | patch -p0
sed 's/^ren/mv/' \#\#\#VCS\#\#\#rename-*.bat | sh
Thanks to first lines like
ren() { mv "$@"; }
copy() { cp "$@"; }
you would even be able to run the batch from Unix shells unmodified, as long as no paths are involved. I like it this spooky ;)
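To illustrate, here is a small sketch of such a batch file being run by a Unix shell. The file name rename.bat, the flat name dirname--filename.c-1.5-abc123 and the helper make_rename_batch are made up for this example; nothing here is a fixed format.

```shell
# make_rename_batch FILE writes a hypothetical rename batch whose first
# lines define ren and copy as shell functions, followed by the DOS-style
# commands themselves (no paths involved, only plain file names).
make_rename_batch() {
    cat > "$1" <<'EOF'
ren() { mv "$@"; }
copy() { cp "$@"; }
ren dirname--filename.c-1.5-abc123 filename.c
copy filename.c filename.bak
EOF
}
```

Running "sh rename.bat" in the directory that holds the flat file then renames it to filename.c and copies it to filename.bak, without any sed translation step.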
-Tino 2007-01-30 (this text is not ready yet), little extended 2008-01-25