10 Apr, 2009
Today I found myself reconsidering the benefits of a static approach for content-driven websites and the scalability implications of a file-based approach. As part of my thinking I wondered how practical it currently is to statically produce different page fragments but have them dynamically assembled and served. My use case for the purposes of the exercise was a blog application.
I began looking around for a caching proxy application which could help with this and quickly came across Varnish. Version 2 has basic support for Edge Side Includes (or ESI) which allow page fragments to be dynamically generated, cached by the server and dynamically assembled as pages are requested.
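For example, a page cached by Varnish can pull in a separately generated fragment with an include tag along these lines (the /header fragment URL is just an example):

<esi:include src="/header" />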
This technique could easily work for the blog application but there is one seemingly trivial problem: if a user is signed in to the blog application we would need the sign in link to change to a link to the dashboard (probably with the user's name) and a sign out link. If the pages are being statically generated there is no way to do this with Varnish. The ESI page on the Varnish wiki dismisses it like this: "Content substitution is mostly a cosmetic feature ... For now we have deemed this feature uninteresting, but adding it is just a matter of programming." Isn't everything?
The only solution is to have that part of the page dynamically generated on every request, even though most of the time it will just be a static sign in link. One option would be to have it generated by the same backend that generates the cached content, but handling a complete request to the backend server on every page view would not be significantly faster than dynamically generating the whole page, and since for most requests the content will just be a static sign in link this isn't a particularly appealing solution. Another approach would be to put Nginx in front of Varnish and use SSI with Memcached to handle the semi-static parts of the page, while Varnish and ESI handle the fairly static parts. Let's look at how to set up the SSI, Memcached and Nginx part.
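To make the division of labour concrete, here is a sketch of how a page template might mark up the two kinds of fragment; the /account_links URL is hypothetical:

<html>
  <body>
    <!-- fairly static content: generated once and cached -->
    <h1>My Blog</h1>

    <!-- semi-static content: assembled on every request by Nginx
         using SSI, with the fragment fetched from memcached -->
    <!--# include virtual="/account_links" -->
  </body>
</html>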
Note
I also looked at using ncache but couldn't get it to compile without warnings, and since there is so little documentation in English I didn't proceed any further.
Nginx receives a request which it passes to Varnish. If Varnish can't serve the request from its cache it goes to the Python server, which dynamically generates the page; Varnish caches it and returns it. The page contains a Server Side Include to be processed by Nginx. The response is passed back to Nginx and the Server Side Include is discovered. Nginx sends a request to Memcached for the data but it isn't there yet, so Nginx makes the request to the backend Python server instead, which returns the data Nginx needs, but before it does so it sets it in memcached. Nginx then assembles and returns the response.
On the next request the page is requested from Varnish and served from the cache. Nginx parses the response and once again finds the Server Side Include, so requests the data from memcached. This time the data is present and is served directly from memcached. Nginx returns the response without any use of the backend Python server at all.
This diagram sets out what happens on the first request:
request   +-------+                                +---------+                          +--------+
--------> |       |  request proxied to Varnish    |         |  Varnish requests data   |        |
          |       | -----------------------------> |         |  from Python             |        |
          |       |                                | Varnish | -----------------------> |        |
          |       |  response returned to Nginx    |         |  Python generates page   |        |
          |       | <----------------------------- |         |  which Varnish caches    |        |
          |       |                                +---------+ <----------------------- |        |
          | Nginx |                                                                     | Python |
          |       |  Nginx requests SSI include    +-----------+  Python caches data    |        |
          |       |  direct from memcached         | Memcached | <---------------------- |        |
          |       | -----------------------------> +-----------+                        |        |
          |       |                                                                     |        |
response  |       |  memcached doesn't have the data so Nginx requests it from Python   |        |
<-------- |       | ------------------------------------------------------------------> |        |
          |       |                                                                     |        |
          |       |  Python returns the data to Nginx and sets it in memcached for later |        |
          |       | <------------------------------------------------------------------ |        |
          +-------+                                                                     +--------+
This diagram sets out what happens on subsequent requests:
request   +-------+                                +---------+                          +--------+
--------> |       |  request proxied to Varnish    |         |                          |        |
          |       | -----------------------------> |         |                          |        |
          |       |                                | Varnish |                          |        |
          |       |  response returned to Nginx    |         |                          |        |
          |       | <----------------------------- |         |                          |        |
          |       |                                +---------+                          |        |
          | Nginx |                                                                     | Python |
          |       |  Nginx requests SSI include    +-----------+                        |        |
          |       |  direct from memcached         |           |                        |        |
          |       | -----------------------------> | Memcached |                        |        |
response  |       |  Memcached returns data        |           |                        |        |
<-------- |       | <----------------------------- +-----------+                        |        |
          +-------+                                                                     +--------+
This time no requests are made to the backend server and as a result the request is served much faster.
Note
I'm not actually going to test Varnish in this post, just the Nginx, SSI and Memcached part of the setup.
Memcached is well-established software, so it should be in your distribution's repositories:
$ sudo apt-get install memcached
The documentation is at http://www.danga.com/memcached/ but for this test you can use this command:
$ ./memcached -d -m 512 -l 127.0.0.1 -p 11211
This sets memcached running as a daemon on the local machine on port 11211 (the default) with 512MB of RAM. You can use less RAM for the test if you like.
You can see it running with:
$ ps aux | grep memcached
I've covered Nginx installation before so here is the summary. The use of the --prefix option installs Nginx to a folder on my desktop for testing.
$ wget http://sysoev.ru/nginx/nginx-0.6.36.tar.gz
$ tar zxfv nginx-0.6.36.tar.gz
$ cd nginx-0.6.36
$ ./configure --prefix=/home/james/Desktop/nginx
$ make
$ make install
You can change the configuration in /home/james/Desktop/nginx/conf/nginx.conf. You can start the server by executing /home/james/Desktop/nginx/sbin/nginx. If you want to run it on any of the standard ports (which is the case with the default setup) you'll need to use sudo:
$ sudo /home/james/Desktop/nginx/sbin/nginx
Once again the server runs as a daemon in the background. If you want to gracefully restart it after changing config options you can do so like this:
$ sudo kill -HUP `cat /home/james/Desktop/nginx/logs/nginx.pid`
For the test I need the Python memcache bindings. As always I set things up in a virtual Python environment:
$ wget http://pylonsbook.com/virtualenv.py
$ python virtualenv.py env
New python executable in env/bin/python
Installing setuptools...........................done.
$ env/bin/easy_install python-memcached
The python-memcached module is a pure Python implementation. There is a C-based one too, which is faster but less well known and less well supported. I couldn't find any good python-memcached documentation either, but the MySQL website has some:
http://dev.mysql.com/doc/refman/5.1/en/ha-memcached-interfaces-python.html
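As a quick sanity check that the bindings can talk to the memcached daemon started earlier, you can run something like this (the greeting key is just an example):

import memcache

# Connect to the local memcached daemon on the default port
mc = memcache.Client(['127.0.0.1:11211'])

# Store a value with a 10 second expiry, then read it back
mc.set('greeting', 'Hello', 10)
print mc.get('greeting')   # prints 'Hello'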
The backend server, without any caching, looks like this:
from wsgiref.simple_server import make_server

# A simple WSGI application which serves a page at /hello and a
# page fragment at /one
def simple_app(environ, start_response):
    status = '200 OK'
    headers = [('Content-type', 'text/plain')]
    if environ['PATH_INFO'] == '/hello':
        start_response(status, headers)
        return ["Hello World! 1."]
    elif environ['PATH_INFO'] == '/one':
        start_response(status, headers)
        return ["1"]
    else:
        start_response('404 Not Found', headers)
        return ["Not found"]

httpd = make_server('', 8000, simple_app)
print "Serving on port 8000..."
httpd.serve_forever()
Save this as test1.py so it forms a basis for the comparison.
The version with caching looks like this:
from wsgiref.simple_server import make_server
import memcache

memc = memcache.Client(['127.0.0.1:11211'])

# The same WSGI application, but /hello now embeds an SSI include for
# Nginx to process, and /one stores its value in memcached for 10 seconds
def simple_app(environ, start_response):
    status = '200 OK'
    headers = [('Content-type', 'text/plain')]
    if environ['PATH_INFO'] == '/hello':
        start_response(status, headers)
        return ['Hello World! <!--# include virtual="/one" -->']
    elif environ['PATH_INFO'] == '/one':
        memc.set('/one', '1', 10)
        start_response(status, headers)
        return ["1"]
    else:
        start_response('404 Not Found', headers)
        return ["Not found"]

httpd = make_server('', 8000, simple_app)
print "Serving on port 8000..."
httpd.serve_forever()
Notice that when we request /hello an SSI include is returned, which triggers Nginx to request /one. Also notice that the line memc.set('/one', '1', 10) sets a value in memcached for the URL /one. This is the value Nginx will look up. Save this as test2.py.
Now let's configure Nginx.
Add the following location blocks to the server configuration:
location /one {
    set $memcached_key $uri;
    memcached_pass 127.0.0.1:11211;
    default_type text/html;
    error_page 404 = @fallback;
}

location @fallback {
    proxy_pass http://127.0.0.1:8000;
}

location /hello {
    ssi on;
    proxy_pass http://127.0.0.1:8000;
}
Notice the line set $memcached_key $uri;. This tells Nginx to use the request URI as the key to look up values in memcached. When we request /one and it is present in memcached, Nginx will therefore return whatever value is in there. If the key doesn't exist Nginx will use @fallback and visit the backend server, which will set the key so that it is there on future requests.
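If you want to see exactly what Nginx will find, you can inspect the key from Python; this assumes test2.py has served /one within the last 10 seconds:

import memcache

mc = memcache.Client(['127.0.0.1:11211'])
# Nginx looks up the request URI, so this is the same key it uses for /one
print mc.get('/one')   # prints '1' while the cached value is live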
You'll need to restart Nginx as described earlier.
Start the test2.py server:
$ env/bin/python test2.py
Serving on port 8000...
Visiting http://127.0.0.1:8000/hello directly displays Hello World! followed by the raw, unprocessed SSI comment (the backend doesn't process SSI itself), and visiting http://127.0.0.1:8000/one displays 1 as expected. Wait 10 seconds for the memcached value to expire (that was the 10 in the call to memc.set()) then try the same through Nginx.
The Nginx config file I'm using sets Nginx up to serve on port 85. Visiting http://127.0.0.1:85/one displays 1 as expected and visiting http://127.0.0.1:85/hello displays Hello World! 1 so Nginx is correctly processing the Server Side Include.
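If you prefer the command line, you can check the assembled response with curl (assuming the setup above):

$ curl http://127.0.0.1:85/hello
Hello World! 1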
You can tell that the 1 is served from memcached rather than from Python because only one line of output appears in the server log per /hello request, not two. We can confirm this by running a benchmark.
Install Apache Bench for the benchmarks:
$ sudo apt-get install apache2-utils
Stop the test2.py server and try test1.py:
$ env/bin/python test1.py
Serving on port 8000...
Now benchmark it. You'll need to wait about 10 seconds for the memcached cache to expire. Run it a few times to get an idea:
$ ab -n 1000 -c 10 http://localhost:85/one
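The figure quoted below comes from ab's Time per request line, which looks something like this (the value shown here is illustrative):

Time per request:       5.123 [ms] (mean)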
On my machine the average is about 5ms/request.
Now stop the test1.py server and try the memcached-enhanced test2.py:
$ env/bin/python test2.py
Serving on port 8000...
Again run it a few times to get an idea:
$ ab -n 1000 -c 10 http://localhost:85/one
On my machine the average is about 0.6ms/request and only one request is made to the backend server:
localhost - - [17/Apr/2009 21:49:40] "GET /one HTTP/1.0" 200 3
The differences are less dramatic if you run the benchmarks against the /hello path because both configurations involve at least one trip to the backend Python server.
As a comparison, I created a file called 1.html containing just 1. Getting Nginx to serve that as a static file results in an average of 0.4ms/request, even quicker than memcached.
The reason I didn't include Varnish in the end is that I realised that if your backend server has to worry about caching and purging content, it might as well just write and maintain static files and have Nginx serve them directly. There is no real need to have Varnish getting in the way unless you require dynamic page generation, and in that case why are you caching pages anyway?
I hope that was a useful post anyway.