
Re: [E-devel] cvs, servers and stuff.

On Fri, 18 Aug 2006 06:04:37 +0300 Eugen Minciu <minciue@gmail.com> babbled:

> Hi everyone.
> I've been doing some thinking today. And I've been doing some testing as
> well. And there are a few things I realized.
> The first thing I realized is a reason why the pseudo-benchmark I created was
> giving out evil data. In git's case this is because git does a lot of extra
> operations on the client's disk (unpacking and such) which take up a lot of
> time. During that time the server wouldn't be under any load, which shows why
> the load on the server wasn't anywhere near constant.
> And then I realized why git wasn't really doing so well. And at the same time
> this is the reason why cvs isn't doing so well and, frankly no scm possibly
> could.
> The problem is that you have a truckload of binary data in the repository.
> There are many reasons why this shouldn't be so.
> 1) Binary data is way better off distributed in the form of archives, that
> can be mirrored by anyone (I'm thinking at least SF). That way people can get
> that data a lot faster and your server is happy too.
> 2) You don't change the binary data that much. And even when you do so, you
> could package your data into archives like imlib2-data.tar.bz2 so that you
> repackage less.
> 3) Changes in binary data don't generally affect dependencies. They're not
> like API changes or whatever. Most of the time people will just need to grab
> one updated archive and that's it.
> 4) You could then use pkg-config to ensure the right version of the data is
> actually installed from your configure scripts.
> 5) Let's do some simple math. 
> You have 100MB worth of files. These amount to 60MB binary and 40MB text.
> When you try to compress this, as git does, you get around 50MB binary and
> 8MB text. So that accounts for almost 60MB. 
> That means for every 60 people that would simultaneously download through CVS
> you can have 100 download through git (let's just ignore the other factors
> and focus on bandwidth a little).
> Now suppose you have 40MB of text. With git you can then get down to about
> 20% of the original size (maybe less, who knows). That means you could (in
> theory) actually have 5 times more downloads with git than with CVS.
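
The arithmetic in point 5 can be checked with a short sketch. The sizes and compression results are the rough figures quoted above, not measurements; the function name is made up for illustration, and the sketch assumes bandwidth is the only limit on simultaneous downloads.

```python
def download_ratio(original_mb, transferred_mb):
    """How many smaller transfers fit in the bandwidth of one full-size transfer."""
    return original_mb / transferred_mb

# Mixed tree: 60MB binary + 40MB text, compressing to roughly 50MB + 8MB.
mixed = download_ratio(60 + 40, 50 + 8)

# Text only: 40MB compressing to about 20% of its original size.
text = download_ratio(40, 40 * 0.20)

print(f"with data:    {mixed:.2f}x  (about {round(60 * mixed)} git downloads per 60 CVS downloads)")
print(f"without data: {text:.2f}x")
```

The mixed-tree ratio comes out a little above the "60 versus 100" quoted above because that figure rounds 58MB up to 60MB.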
> Now I'm not saying to not keep that data in a repo. You obviously have to.
> I'm just saying there's no need for people to have anonymous access to that
> repo, it could be for developers only.
> So, my suggestions are:
> 1) Move the data into its own repository

not going to happen. the data is an internal part of the projects - it gets
modified (new icons, images etc.) and is part of the build process. so not going
to happen. the code is useless without the data - there is no point splitting
it and doing so is a tonne of work that makes building more painful for
developers and users.

> 2) Convert the two repositories to git
> 3) Make that data repository devel-only.
> 4) Split the data into small packages (one for each data/ dir in the tree, I
> guess)
> 5) Make the source require the data through pkg-config
> 6) Have the data released as tarballs once it's changed (you can have that
> happen automatically with git, I'm assuming you can with the others as well)
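
Step 5 in the list above could look roughly like the following from a build helper. This is only a sketch: the module name "imlib2-data" and the version "1.2.0" are hypothetical, and it assumes the data tarball would install a matching .pc file. The only pkg-config option used is --atleast-version.

```python
import subprocess

def data_is_new_enough(module, min_version, run=subprocess.run):
    """Return True if pkg-config reports `module` at `min_version` or newer."""
    try:
        # pkg-config exits with status 0 when the version requirement is met.
        result = run(["pkg-config", f"--atleast-version={min_version}", module],
                     check=False)
    except FileNotFoundError:
        # pkg-config itself is not installed, so the data cannot be verified.
        return False
    return result.returncode == 0

if __name__ == "__main__":
    # Hypothetical module name and version, for illustration only.
    ok = data_is_new_enough("imlib2-data", "1.2.0")
    print("data package OK" if ok else "data package missing or too old")
```

The `run` parameter exists only so the check can be exercised without pkg-config installed; a real configure script would more likely use the PKG_CHECK_MODULES autoconf macro.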

at this point - why bother with git at all. just make tarball snaps. much less

> And that's it. But for all this babbling, is this really worth it? 
> Like I said, I found client-side disk I/O to make the benchmarks mostly
> useless. But they still provide me with a good overview on server-side CPU &
> Memory usage
> So I opted for a new approach. I would have two terminals on my client. In
> one I'd do something like 'sleep 5 ; svn checkout ...'. In the second I'd do
> 'time read'. I would press enter once when network traffic actually began and
> once again when it stopped and that showed me how much everything took.
> So here's the timings. The repos have no history attached.
> Repo with data:
> CVS: 			0:46
> SVN(svnserve): 		1:16
> SVN(HTTP): 		1:58
> GIT(git): 		1:23
> GIT(HTTP): 		1:53
> Same repo without data:
> CVS: 			0:12
> SVN(svnserve):		0:28
> SVN(http):		0:37
> GIT(http):		0:13
> And what about Git with its built in protocol? Just six seconds. How's that
> for taking some load off :) Of course you have to add/subtract 1s for my
> timings on the keyboard but you get the overall idea.
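
To put the two sets of timings side by side, here is a small sketch that converts them to seconds and shows how much faster each checkout gets once the data is removed. The numbers are the hand-timed figures quoted above (with the native git protocol taken as the six seconds mentioned), so each carries roughly a second of error; the helper name is made up.

```python
def seconds(mmss):
    """Convert an 'M:SS' string to a number of seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)

with_data = {
    "CVS": "0:46", "SVN (svnserve)": "1:16", "SVN (HTTP)": "1:58",
    "GIT (git)": "1:23", "GIT (HTTP)": "1:53",
}
without_data = {
    "CVS": "0:12", "SVN (svnserve)": "0:28", "SVN (HTTP)": "0:37",
    "GIT (git)": "0:06", "GIT (HTTP)": "0:13",
}

for name in with_data:
    full = seconds(with_data[name])
    code_only = seconds(without_data[name])
    print(f"{name:<15} {full:>4}s -> {code_only:>3}s  "
          f"({full / code_only:.1f}x faster without data)")
```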
> This is a very complicated way of doing things. But data should probably be
> separated from code. And it should probably be distributed in small archives.
> And people shouldn't have to use an SCM to get it.
> So ... Wadda ya say. Is this too complicated/ not worth it / stupid /
> braindamaged / interesting ?
> My brain farts more things like that on a regular basis. If the above makes
> sense, let me know and I'll give you a couple of other ideas as well :d
> Eugen.
> P.S: I knew Linus wouldn't lie ;)

though git seems nice - i am beginning to think it's not going to solve a lot. we
need to really just provide alternate mechanisms to get the code and more
anoncvs mirrors i think.


------------- Codito, ergo sum - "I code, therefore I am" --------------
The Rasterman (Carsten Haitzler)    raster@rasterman.com
Tokyo, Japan (東京 日本)