SSI, Memcached and Nginx (plus Varnish, ESI and static generation)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
:Posted: 2009-04-10 22:14
:Tags: Hosting, Web
.. contents ::
Today I began reconsidering the benefits of a static approach for
content-driven websites and the scalability implications of a file-based
approach. As part of my thinking I wondered how practical it currently is to
statically produce different page fragments but have them dynamically assembled
and served. My use case for the purposes of the exercise was a blog
application.
I began looking around for a caching application proxy which could help with
this and quickly came across `Varnish `_.
Version 2 has basic support for *Edge Side Includes* (or ESI) which allow page
fragments to be dynamically generated, cached by the server and dynamically
assembled as pages are requested.
This technique could easily work for the blog application but there is one
seemingly trivial problem: if a user is signed in to the blog application we
would need the sign in link to change to a link to the dashboard (probably with
the user's name) and a sign out link. If the pages are being statically
generated there is no way to do this with Varnish. The `ESI page on the wiki
`_ dismisses it like this:
Content substitution is mostly a cosmetic feature ... For now we have
deemed this feature uninteresting, but adding it is just a matter of
programming.. Isn't everything.
The only solution is to have that part of the page *always* dynamically
generated even though most of the time it will be a static sign in link. One
option would be to have it generated by the same tools that generate the
cached content, but handling a complete request to the back end server on
every page would not be significantly faster than dynamically generating the
whole page, and since for most requests the content will just be a static sign
in link this isn't a particularly appealing solution. Another approach would be
to use Nginx in front of Varnish and use SSI and Memcached to handle the
semi-static parts of the page, in addition to using Varnish and ESI for the
fairly static parts. Let's look at how to set up the SSI, Memcached and Nginx
part.
.. note ::

    I also looked at using `ncache `_ but couldn't get it
    to compile without warnings and since there is so little documentation in
    English I didn't proceed any further.

The Setup
=========
Nginx receives a request which it passes to Varnish. If Varnish can't serve the
request from the cache it goes to the Python server, which dynamically
generates the page; Varnish caches the page and returns it. The page contains a
Server Side Include to be processed by Nginx. The response is passed back to
Nginx and the Server Side Include is discovered. Nginx sends a request to
Memcached for the data but it isn't there yet, so Nginx makes the request to
the backend Python server instead. The Python server returns the data Nginx
needs, but before it does, it sets it in Memcached. Nginx then assembles and
returns the response.
On the next request the page is requested from Varnish and served from the
cache. Nginx parses the response and once again finds the Server Side Include
so requests the data from memcached. This time the data is present and is
served directly from memcached. Nginx returns the response without any use of
the back end Python server at all.
This diagram sets out what happens on the first request:
::

    request   +-------+                                +----------------+ Varnish requests data   +--------+
    --------> |       |  request proxied to Varnish    |                | from Python             |        |
              |       | -----------------------------> |  ----------->  | ----------------------> |        |
              |       |                                |    Varnish     |                         |        |
              |       |  response returned to Nginx    |                | Python generates page   |        |
              |       | <----------------------------- |  <-----------  | <---------------------- |        |
              |       |                                +----------------+ which Varnish caches    |        |
              | Nginx |                                                                           | Python |
              |       |  Nginx requests SSI include                                               |        |
              |       |  direct from memcached         +----------------+ Python caches data      |        |
              |       | -----------------------------> |   Memcached    | <-------------------+   |        |
              |       |                                +----------------+                     |   |        |
              |       |                                                                       |   |        |
    response  |       |  Memcached doesn't have data so Nginx requests it from Python        |   |        |
    <-------- |       | ------------------------------------------------------------------------->|        |
              |       |                                                                       |   |        |
              |       |  Python returns the data to Nginx and sets it in Memcached for later  |   |        |
              |       | <---------------------------------------------------------------------+---|        |
              +-------+                                                                           +--------+

This diagram sets out what happens on subsequent requests:
::

    request   +-------+                                +----------------+                         +--------+
    --------> |       |  request proxied to Varnish    |                |                         |        |
              |       | -----------------------------> |                |                         |        |
              |       |                                |    Varnish     |                         |        |
              |       |  response returned to Nginx    |                |                         |        |
              |       | <----------------------------- |                |                         |        |
              |       |                                +----------------+                         |        |
              | Nginx |                                                                           | Python |
              |       |  Nginx requests SSI include                                               |        |
              |       |  direct from memcached         +----------------+                         |        |
              |       | -----------------------------> |                |                         |        |
              |       |                                |                |                         |        |
              |       |                                |   Memcached    |                         |        |
    response  |       |  Memcached returns data        |                |                         |        |
    <-------- |       | <----------------------------- |                |                         |        |
              |       |                                +----------------+                         |        |
              +-------+                                                                           +--------+

This time no requests are made to the backend server and as a result the request
is served much faster.
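The two flows above can be sketched in plain Python. This is only a toy model under stated assumptions: a dictionary stands in for memcached, a counter stands in for the Python backend, and the names ``FakeBackend`` and ``nginx_fetch_fragment`` are made up for illustration rather than taken from Nginx or memcached.

```python
class FakeBackend(object):
    """Hypothetical stand-in for the Python server; counts its requests."""

    def __init__(self, cache):
        self.cache = cache
        self.hits = 0

    def handle(self, path):
        self.hits += 1
        body = "fragment for %s" % path
        # The backend stores the fragment so later requests can skip it.
        self.cache[path] = body
        return body


def nginx_fetch_fragment(path, cache, backend):
    """Look in the cache first; fall back to the backend on a miss."""
    body = cache.get(path)
    if body is None:
        body = backend.handle(path)
    return body


cache = {}  # hypothetical stand-in for memcached
backend = FakeBackend(cache)

first = nginx_fetch_fragment("/one", cache, backend)   # miss: backend is hit
second = nginx_fetch_fragment("/one", cache, backend)  # hit: backend skipped
print(backend.hits)
```

Both calls return the same fragment, but the backend counter only moves on the first one, which is the behaviour the rest of this post tries to reproduce with real software.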
.. note ::

    I'm not actually going to test Varnish in this post, just see how well
    the Nginx, SSI and Memcached part works.

Install Memcached
=================
Memcached is now established software so it will be in your distribution's repositories:
::

    $ sudo apt-get install memcached

The documentation is at http://www.danga.com/memcached/ but for this test you
can use this command:
::

    $ memcached -d -m 512 -l 127.0.0.1 -p 11211

This sets memcached running as a daemon on the local machine on port 11211 (the
default) with up to 512 MB of RAM for cached items. You can use less RAM for
the test if you like.
You can see it running with:
::

    $ ps aux | grep memcached

Nginx
=====
I've covered Nginx installation before so here is the summary. The use of the
``--prefix`` option installs Nginx to a folder on my desktop for testing.
::

    wget http://sysoev.ru/nginx/nginx-0.6.36.tar.gz
    tar zxfv nginx-0.6.36.tar.gz
    cd nginx-0.6.36
    ./configure --prefix=/home/james/Desktop/nginx
    make
    make install

You can change the configuration in
``/home/james/Desktop/nginx/conf/nginx.conf``. You can start the server by
executing ``/home/james/Desktop/nginx/sbin/nginx``. If you want to run it on
any of the standard ports (which is the case with the default setup) you'll
need to use ``sudo``:
::

    $ sudo /home/james/Desktop/nginx/sbin/nginx

Once again the server runs as a daemon in the background. If you want to
gracefully restart it after changing config options you can do so like this:
::

    $ sudo kill -HUP `cat /home/james/Desktop/nginx/logs/nginx.pid`

Python
======
For the test I need the Python memcache bindings. As always I set things up in
a virtual Python environment:
::

    $ wget http://pylonsbook.com/virtualenv.py
    $ python virtualenv.py env
    New python executable in env/bin/python
    Installing setuptools...........................done.
    $ env/bin/easy_install python-memcached

The ``python-memcached`` module is a pure Python implementation. There is a C
based one too which is faster but less well known and less well supported. I
couldn't find any good ``python-memcached`` documentation either but the MySQL
website has some:
http://dev.mysql.com/doc/refman/5.1/en/ha-memcached-interfaces-python.html
The Experiment
==============
The backend server without any caching looks like this.
::

    from wsgiref.simple_server import make_server

    # A relatively simple WSGI application. It returns a fixed response for
    # each known path and a 404 for everything else.
    def simple_app(environ, start_response):
        status = '200 OK'
        headers = [('Content-type', 'text/plain')]
        if environ['PATH_INFO'] == '/hello':
            start_response(status, headers)
            return [b"Hello World! 1."]
        elif environ['PATH_INFO'] == '/one':
            start_response(status, headers)
            return [b"1"]
        else:
            start_response('404 Not Found', headers)
            return [b"Not found"]

    httpd = make_server('', 8000, simple_app)
    print("Serving on port 8000...")
    httpd.serve_forever()

Save this as ``test1.py`` so it forms a basis for the comparison.
The version with caching looks like this:
::

    from wsgiref.simple_server import make_server

    import memcache

    memc = memcache.Client(['127.0.0.1:11211'])

    # The same WSGI application, except that /hello now returns an SSI
    # include for /one, and /one stores its value in memcached for Nginx.
    def simple_app(environ, start_response):
        status = '200 OK'
        # Nginx only applies SSI to text/html responses by default
        # (see the ssi_types directive).
        headers = [('Content-type', 'text/html')]
        if environ['PATH_INFO'] == '/hello':
            start_response(status, headers)
            return [b'Hello World! <!--# include virtual="/one" -->.']
        elif environ['PATH_INFO'] == '/one':
            # Cache the value under the URL path for 10 seconds.
            memc.set('/one', '1', 10)
            start_response(status, headers)
            return [b"one"]
        else:
            start_response('404 Not Found', headers)
            return [b"Not found"]

    httpd = make_server('', 8000, simple_app)
    print("Serving on port 8000...")
    httpd.serve_forever()

Notice that when we request ``/hello`` an SSI include is returned which
triggers Nginx to request ``/one``. Also notice that the line
``memc.set('/one', '1', 10)`` sets a value in memcached for the URL ``/one``.
This is the value Nginx will look up. Save this as ``test2.py``.
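To make the include step less magical, here is a rough sketch of what an SSI processor does with a directive of the form ``<!--# include virtual="/one" -->``. It is not how Nginx is implemented; ``expand_includes`` and the ``fragments`` dictionary are hypothetical names, and the dictionary simply stands in for whatever serves ``/one``.

```python
import re

# Matches Nginx-style SSI include directives such as
# <!--# include virtual="/one" -->
INCLUDE_RE = re.compile(r'<!--#\s*include\s+virtual="([^"]+)"\s*-->')


def expand_includes(body, fetch):
    """Replace each include directive with whatever fetch(path) returns."""
    return INCLUDE_RE.sub(lambda match: fetch(match.group(1)), body)


# Hypothetical fragment store standing in for memcached or the backend.
fragments = {"/one": "1"}

page = 'Hello World! <!--# include virtual="/one" -->.'
result = expand_includes(page, fragments.__getitem__)
print(result)  # Hello World! 1.
```

The page itself can be cached for a long time while the fragment behind ``/one`` changes independently, which is the whole point of splitting the two.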
Now let's configure Nginx.
Add the following locations to the ``server`` block of the configuration:
::

    location /one {
        set $memcached_key $uri;
        memcached_pass 127.0.0.1:11211;
        default_type text/html;
        error_page 404 = @fallback;
    }

    location @fallback {
        proxy_pass http://127.0.0.1:8000;
    }

    location /hello {
        ssi on;
        proxy_pass http://127.0.0.1:8000;
    }

Notice the line ``set $memcached_key $uri;``. This tells Nginx to use the
request URL as the key to look up values in memcached. When we request ``/one``
and it is present in memcached, Nginx will therefore return whatever value is
in there. If the key doesn't exist it will use ``@fallback`` and visit the
server which will set the key so that it is there on future requests.
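One detail worth keeping in mind is that ``$uri`` is the request path without the query string, so the backend has to store its value under exactly that path. The helper below is a hypothetical approximation of the key Nginx would look up; it ignores the normalisation and decoding Nginx also applies to ``$uri``.

```python
from urllib.parse import urlsplit


def memcached_key_for(url):
    """Approximate the key used with `set $memcached_key $uri;`.

    Only the path is kept, so URLs that differ only in their query
    string map to the same key.
    """
    return urlsplit(url).path


key_plain = memcached_key_for("http://127.0.0.1:85/one")
key_with_args = memcached_key_for("http://127.0.0.1:85/one?user=james")
print(key_plain, key_with_args)
```

If a fragment needs to vary by query string or by user, the key would have to include that information on both the Nginx side and the Python side.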
You'll need to restart Nginx as described earlier.
Testing the Setup
=================
Start the ``test2.py`` server as a basis:
::

    $ env/bin/python test2.py
    Serving on port 8000...

Visiting http://127.0.0.1:8000/hello displays ``Hello World! 1.`` as expected,
and visiting http://127.0.0.1:8000/one displays ``1`` as expected. Wait 10
seconds for the memcached entry to expire (that was the ``10`` in the call to
the ``memc.set()`` function) then try the same from Nginx.
My Nginx config file sets Nginx up to serve from port 85 (the stock config listens on port 80). Visiting
http://127.0.0.1:85/one displays ``1`` as expected and visiting
http://127.0.0.1:85/hello displays ``Hello World! 1.`` so Nginx is correctly
processing the server side include.
You can see that the ``1`` is served from memcached because there is only one
line of output in the server output per request, not two. We can confirm this
by running a benchmark.
Running a Benchmark
===================
Install Apache Bench for the benchmarks:
::

    $ sudo apt-get install apache2-utils

Stop the ``test2.py`` server and try ``test1.py``:
::

    $ env/bin/python test1.py
    Serving on port 8000...

Now benchmark it. You'll need to wait about 10 seconds for the memcached cache
to expire. Run it a few times to get an idea:
::

    $ ab -n 1000 -c 10 http://localhost:85/one

On my machine the average is about 5ms/request.
Now stop the ``test1.py`` server and try the memcached-enhanced version 2.
::

    $ env/bin/python test2.py
    Serving on port 8000...

Again run it a few times to get an idea:
::

    $ ab -n 1000 -c 10 http://localhost:85/one

On my machine the average is about 0.6ms/request and only one request is made
to the backend server:
::

    localhost - - [17/Apr/2009 21:49:40] "GET /one HTTP/1.0" 200 3

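Counting backend hits by eye gets tedious once there are more than a few lines, so a small script can do it. This is a sketch that assumes the log format shown above; ``count_backend_hits`` is a made-up helper, and the second sample line is invented purely to give the example two paths to count.

```python
import re
from collections import Counter

# Matches the request part of a line such as:
# localhost - - [17/Apr/2009 21:49:40] "GET /one HTTP/1.0" 200 3
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def count_backend_hits(log_lines):
    """Count how many requests reached the backend for each path."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            hits[match.group("path")] += 1
    return hits


sample_log = [
    # Line taken from the benchmark run above.
    'localhost - - [17/Apr/2009 21:49:40] "GET /one HTTP/1.0" 200 3',
    # Hypothetical line, invented for illustration only.
    'localhost - - [17/Apr/2009 21:49:52] "GET /hello HTTP/1.0" 200 46',
]
hits = count_backend_hits(sample_log)
print(hits["/one"], hits["/hello"])
```

A count of one for ``/one`` after a 1000-request benchmark is what tells you the other 999 responses came from memcached.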
The differences are less dramatic if you run the benchmarks against the
``/hello`` path because both configurations involve at least one trip to the
backend Python server.
Serving Static Files
====================
As a comparison, I've created a file called ``1.html`` with the contents ``1``.
Getting Nginx to serve that as a static file results in an average of
0.4ms/request, even quicker than memcached.
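Putting the three averages from this post side by side makes the differences easier to judge. The calculation below assumes each figure is a mean time per request, so that 1000 divided by the latency in milliseconds approximates requests per second; ``summarise`` is a hypothetical helper, not part of Apache Bench.

```python
# Average latencies reported in this post, in milliseconds per request.
latencies_ms = {
    "python backend": 5.0,
    "memcached via nginx": 0.6,
    "static file via nginx": 0.4,
}


def summarise(latencies, baseline_name):
    """Derive requests/second and speedup relative to a baseline entry."""
    baseline = latencies[baseline_name]
    summary = {}
    for name, ms in latencies.items():
        summary[name] = {
            "requests_per_second": 1000.0 / ms,
            "speedup": baseline / ms,
        }
    return summary


summary = summarise(latencies_ms, "python backend")
for name, row in summary.items():
    print("%-22s %7.0f req/s  %4.1fx" % (
        name, row["requests_per_second"], row["speedup"]))
```

On these numbers memcached is roughly 8 times faster than hitting the Python server, and a static file is about 12 times faster.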
The Future
===========
The reason I didn't include Varnish in the end was because I realised that if
your backend server had to worry about caching and purging files it really
might as well just write and maintain static files anyway and have Nginx serve
the static files directly. There is no real need to have Varnish getting in the
way unless you have to have dynamic page generation, and in that case why are
you caching pages anyway?
Summary
=======
I hope that was a useful post anyway. Here are some related links:
* http://blog.kovyrin.net/2007/08/05/using-nginx-ssi-and-memcache-to-make-your-web-applications-faster/
* http://www.reshetseret.com/app/blog/?p=3
* http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-boost/