Varnish – Squid's heir apparent?
We’re in the process of rebuilding our server infrastructure, shifting from Debian sarge with Postgres 7.4 to Ubuntu 7 and Postgres 8. We’re currently investigating the stability and scalability of various Rails stacks. We’re also looking at different ways of handling high-volume static content delivery, i.e. delivery of our map tiles. (This could apply to any static content, such as thumbnails.) During our research into reverse proxy alternatives, Paul Gold put me onto Varnish.
Varnish is written from the ground up to be a high-performance caching reverse proxy. The author built it out of frustration with Squid, and he provides a detailed analysis of why Squid sucks. In his own words…
Varnish is written from the ground up to be a high performance caching reverse proxy. Squid is a forward proxy that can be configured as a reverse proxy. Besides – Squid is rather old and designed like computer programs were supposed to be designed in 1980.
– Poul-Henning Kamp, Varnish architect and coder.
I’ve done a little bit of testing of Apache 2.2, lighttpd 1.4 and Varnish, with some surprising results.
My test involved using apache bench (ab) in a brute-force test of fetching a 1k, 5k, 10k and 20k image file, at 50, 100 and 200 concurrent users (e.g. `ab -n 20000 -c 100 http://test/5kimage.jpg`). Apache 2.2 and lighttpd 1.4.12 were default installations.
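The full matrix of runs can be sketched as a small shell loop. The hostname `test` and the image filenames other than `5kimage.jpg` are assumptions based on the one example command above; the loop below just prints the twelve invocations so you can review them before piping the output to `sh`.

```shell
# Enumerate every ab run: 4 file sizes x 3 concurrency levels.
# Hostname and filenames are assumptions; only the 5k example is
# shown explicitly in the post.
matrix=""
for size in 1k 5k 10k 20k; do
  for c in 50 100 200; do
    matrix="${matrix}ab -n 20000 -c $c http://test/${size}image.jpg
"
  done
done
printf '%s' "$matrix"   # pipe this output to sh to actually run the benchmarks
```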
Here are the results:
| File Size | Concurrent Users | Apache 2.2 (reqs/sec) | Lighttpd 1.4.12 (reqs/sec) | Varnish 1.1.1 (reqs/sec) |
|---|---|---|---|---|
| 1k | 50 | 3792 | 2050 | 5386 |
| 1k | 100 | 3949 | 2135 | 5471 |
| 1k | 200 | 3973 | 1946 | 5228 |
| 5k | 50 | 2087* | 1655 | 2075* |
| 5k | 100 | 2051* | 1764 | 2076* |
| 5k | 200 | 2006* | 1764 | 2062 |
| 10k | 50 | 1063* | 1065* | 1065* |
| 10k | 100 | 1059* | 1064* | 1060* |
| 10k | 200 | 1056* | 1056* | 1055* |
| 20k | 50 | 571* | 560* | 570* |
| 20k | 100 | 569* | 560* | 564* |
| 20k | 200 | 566* | 562* | 566* |
\* = Denotes network throughput was approaching 10.93 MB/sec. The size of the network connection was effectively putting a cap on the throughput.
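That cap is easy to sanity-check: when the wire is the bottleneck, the maximum request rate is roughly bandwidth divided by file size (ignoring HTTP header overhead). A quick awk sketch, assuming the ~10.93 MB/sec payload ceiling above, lands close to the starred rows:

```shell
# Theoretical max req/s = bandwidth (bytes/sec) / file size (bytes),
# header overhead ignored. 10.93 MB/sec is the observed ceiling above.
caps="$(awk 'BEGIN {
  bw = 10.93 * 1024 * 1024
  for (kb = 5; kb <= 20; kb *= 2)
    printf "%dk file: ~%d req/s\n", kb, int(bw / (kb * 1024))
}')"
echo "$caps"
```

The computed ceilings of roughly 2238, 1119 and 559 req/s sit just above the measured ~2060, ~1060 and ~566, which is what you’d expect once header bytes are counted.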
A couple of things to note from the testing. First, Apache forks a lot of processes, while lighttpd and Varnish use threads. Also, the CPU seemed to be under less load with lighttpd and Varnish than with Apache.
From the results, Varnish excels at caching small files, and is as fast as the other servers at larger file sizes. I want to do a bit more testing against Varnish. For my next set of testing, I’ll test the webservers against 100 random images and post the results.
November 7th, 2007 at 8:49 pm
In other news, “even” apache has no problem saturating a 100Mbit network connection. Unless I’m sorely mistaken, the webserver part of the stack is now done. Load balancing, on the other hand, appears to still be a hot topic…
November 8th, 2007 at 6:48 am
Hmmm, I’m not so sure. Our requirement for a fast webserver to serve map tiles / thumbnails shows that some webservers are much better than others. The YouTube guys advanced lighttpd a lot to make it run faster.
I agree in terms of loadbalancers. I’ve been lucky enough to have used Foundry ServerIrons in the past (Trademe use them too). They kick some serious ass!
November 10th, 2007 at 2:00 pm
I’m not convinced that simply using ab is a very accurate way of testing this. For example, the -c concurrency option isn’t really representative because ab is a single-threaded app. Also, what client/server setup did you use? I’m not suggesting the trend in your results is incorrect, just that ab doesn’t show much, especially without knowing how you got those numbers.
November 10th, 2007 at 4:21 pm
ab is only one part of our testing; it’s really only a good measure of a theoretical maximum, since a number of elements of the stack will cache up against a repeated ab request. For the next test we’ll be using WAST to test random thumbnail accesses over 1000 images. That way we’ll be testing the equivalent of real-world usage across the different web / caching systems.
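A random-access request list like that can be sketched in a few lines. The URL scheme and filenames below are assumptions for illustration; the idea is simply to spread requests over a pool of 1000 images in random order so the cache sees misses as well as hits, then feed the list to the load tester.

```shell
# Generate 1000 requests drawn at random from a pool of 1000 thumbnails.
# Hostname, path and filenames are assumptions, not the real site layout.
urls="$(awk 'BEGIN {
  srand()
  for (i = 0; i < 1000; i++)
    printf "http://test/thumbs/img%04d.jpg\n", int(rand() * 1000)
}')"
echo "$urls" | head -3
```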
November 19th, 2007 at 6:32 pm
For testing the theoretical maximum you may want to give httperf a try instead of ab.