An awful lot of files in SVN

Derick was always bitching at me about the huge amount of time needed to process the Webdav component's sub-directory when doing releases. We always assumed that this Subversion performance issue was caused by the Webdav test suite, which contains an awful lot of small test files and quite a few sub-directories. I finally found the time to refactor the tests, and the performance improvement is astonishing.

Long story short: In the eZ Webdav component we use semi-automated acceptance tests alongside the normal unit tests. These test cases ensure compatibility with different clients. Each of them holds request and response data captured from a manual, successful test run with a certain client. Every test case consists of about 20-200 data sets, and each set, until yesterday, consisted of 5 files.

I have now merged the data of each test run into a single file, so there is only 1 file per client acceptance test. Obviously, this drastically reduced the number of files in the Webdav component's sub-directory:

$ find Webdav.old/ | grep -v svn | wc -l
22629
$ find | grep -v svn | wc -l
548
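A side note on the counting itself: `grep -v svn` also drops every path that merely contains "svn" somewhere in its name. A slightly more careful count could look like the sketch below. The function name `count_entries` is made up for this example, and the numbers above were not produced with it.

```shell
# Count files and directories below a tree, skipping Subversion metadata.
# -prune stops find from descending into .svn directories at all, so other
# paths that just happen to contain "svn" are still counted.
# Like the plain `find` above, the count includes the start directory itself.
count_entries() {
    find "${1:-.}" -name .svn -prune -o -print | wc -l
}
```

Calling `count_entries Webdav.old/` would then be used in place of the `find | grep | wc` pipeline. File names containing newlines would still be counted more than once.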

To give you an idea of how many files these were: A checkout of the PHP source tree contains about 51357 files.

The speed improvement for a checkout of the current Webdav sub-directory, compared to a revision before this refactoring, is amazing:

$ time svn co -r10366 Webdav.old
...
real 8m32.405s
user 1m19.961s
sys 1m48.815s
$ time svn co
...
real 0m16.707s
user 0m1.028s
sys 0m0.640s

From more than 8 minutes to less than 17 seconds. Furthermore, the sizes of the checkouts differ drastically:

$ du -ch Webdav.old/ | tail -n1
261M total
$ du -ch | tail -n1
44M total

From what I can see, the size difference results mostly from the reduced amount of meta information Subversion needs to store:

$ find Webdav.old/ -name '.svn' | xargs du -ch | tail -n1
175M total
$ find -name '.svn' | xargs du -ch | tail -n1
24M total

Additionally, the average file size in the Webdav directory rose quite a bit:

$ find Webdav.old/ -type f | grep -v svn | xargs du -s | \
  awk 'BEGIN { sum = 0 } { sum += $1 } END { print sum " / " NR " = " sum/NR }'
86464 / 22562 = 3.83228
$ find -type f | grep -v svn | xargs du -s | \
  awk 'BEGIN { sum = 0 } { sum += $1 } END { print sum " / " NR " = " sum/NR }'
20324 / 497 = 40.8934
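The pipeline above trips over file names containing whitespace, because `xargs` splits on it, and the unit of the result depends on the default block size of `du`. A more defensive variant might look like the following sketch. The function name `average_file_size` is made up for this example, and `du -k` is assumed to report disk usage in 1024-byte units, so the result is allocated space per file and not the byte length of the files.

```shell
# Average disk usage per file in KiB, skipping Subversion metadata.
# find hands the file names to du directly (-exec ... {} +), so names
# containing spaces are passed through intact.
average_file_size() {
    find "${1:-.}" -name .svn -prune -o -type f -exec du -k {} + |
        awk '{ sum += $1 }
             END {
                 if (NR) { printf "%d / %d = %.2f\n", sum, NR, sum / NR }
                 else    { print "no files" }
             }'
}
```

The output keeps the `sum / count = average` shape used above, so the numbers stay comparable as long as both trees are measured the same way.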

So, the lesson learned here is: do not store too many small files in an SVN repository if you don't have to.

Disclaimer: No, this does not mean that you should store all your source code in a single file from now on! That is not the point of version control. ;)

A little side note to close: if you store a lot of data like this, such as generated test data, in SVN, be sure to mark these files as binary. This spares your commit mailing list a lot of junk, since Subversion does not print textual diffs for binary files. You can do this using:

$ svn propset 'svn:mime-type' 'application/octet-stream' the/binary/file
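With many such files, setting the property one by one gets tedious. The following is a rough sketch of doing it in bulk. The function name `mark_binary`, the `SVN` override variable and the `*.dat` pattern are all made up for this example; only `svn propset` itself comes from the command above. The files need to be under version control already, otherwise `svn propset` will complain.

```shell
# Set svn:mime-type on every file below a directory that matches a
# name pattern. $1 is the directory, $2 the pattern.
# The SVN variable is an assumption of this sketch: it allows a dry run
# with SVN=echo, which only prints the command instead of running svn.
mark_binary() {
    find "$1" -name .svn -prune -o -type f -name "$2" \
        -exec "${SVN:-svn}" propset svn:mime-type application/octet-stream {} +
}
```

A dry run over a hypothetical `tests/` directory would be `SVN=echo mark_binary tests/ '*.dat'`. Unsetting `SVN` afterwards makes the same call actually change the properties, which then still need to be committed.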
