Using yield Statements in WSGI Middleware can be Very Harmful
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

:Posted: 2009-04-05 21:04
:Tags: Pylons, Python

WSGI middleware should **never** be implemented as a Python generator. That is, middleware should never contain ``yield`` statements unless those statements are inside another function whose result the middleware returns when it is called. Although WSGI middleware you have implemented with ``yield`` statements may appear to work perfectly well on its own, it can break other middleware components which wrap it. To see why, let's look at some examples.

Consider the following WSGI middleware. It uses a ``try...`` ``finally`` block around the WSGI application so we can add ``print`` statements before and after the WSGI application is returned.

::

    >>> class Middleware1(object):
    ...     def __init__(self, app):
    ...         self.app = app
    ...     def __call__(self, environ, start_response):
    ...         try:
    ...             print 1
    ...             return self.app(environ, start_response)
    ...         finally:
    ...             print 2
    ...
    >>>

Here's a WSGI application which we can test with a dummy environment and dummy ``start_response()`` function:

::

    >>> def returning_app(environ, start_response):
    ...     start_response('200 OK', [])
    ...     return ['3', '4', '5']
    ...
    >>> def dummy_start_response(status, headers, exc_info=None):
    ...     pass
    ...
    >>> dummy_environ = {}

Let's see what happens when we test this middleware:

::

    >>> for data in Middleware1(returning_app)(dummy_environ, dummy_start_response):
    ...     print data
    ...
    1
    2
    3
    4
    5

As you can see, the numbers are all printed in order. First ``1`` is printed, then the application is called and its result is returned. Next the ``finally`` block is executed and ``2`` is printed. Finally the result of the application is iterated over, printing the numbers ``3``, ``4`` and ``5``. This is exactly what you'd expect.

You may have seen middleware which looks more like this though:

::

    >>> class Middleware2(object):
    ...     def __init__(self, app):
    ...         self.app = app
    ...     def __call__(self, environ, start_response):
    ...         try:
    ...             print 1
    ...             app_iter = self.app(environ, start_response)
    ...             for data in app_iter:
    ...                 yield data
    ...             if hasattr(app_iter, 'close'):
    ...                 app_iter.close()
    ...         finally:
    ...             print 2
    ...
    >>>

You might be tempted to take this approach if you want to alter the response in some way, perhaps making all the letters uppercase for example. The code which calls ``close()`` is required by the PEP 333 spec in case the application returns an iterable which has a ``close()`` method.

If you run ``Middleware2`` in the same way, this is what you get:

::

    >>> for data in Middleware2(returning_app)(dummy_environ, dummy_start_response):
    ...     print data
    ...
    1
    3
    4
    5
    2

This certainly isn't what I expected the first time I ran it. The values from the application are printed before the code in the ``finally`` block is executed, and as a result the numbers are out of order. Let's work through what causes this unexpected result.

To understand what's going on you need to understand what the ``yield`` statement does. Any function which contains ``yield`` is a generator function: calling it runs none of its body and simply returns a generator object. The body only runs while the caller iterates over that object, pausing at each ``yield`` to hand a value to the caller and resuming when the next value is requested. That means that in the case of ``Middleware2``, each number yielded by the application is yielded on and printed before the next one is requested, and long before the ``try`` block finishes. The line that prints the number ``2`` isn't reached until the application's output is exhausted, by which time the numbers ``3``, ``4`` and ``5`` have been printed.

If you were to use this middleware in a middleware stack, parts of other middleware components might be executed out of order, as I'll show you later.

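
The effect of ``yield`` on execution order can be reproduced without any WSGI machinery. The following is a minimal sketch, assuming Python 3 (so ``print`` is a function, unlike in the Python 2 sessions in this post); the names ``events``, ``plain`` and ``generator`` are made up for illustration. It records when each function body actually runs relative to the moment the call returns.

```python
events = []


def plain():
    # An ordinary function: the whole body, including the finally
    # block, runs during the call itself.
    try:
        events.append('plain: start')
        return ['a', 'b']
    finally:
        events.append('plain: finally')


def generator():
    # A generator function: calling it only creates a generator object.
    # None of this body runs until the caller starts iterating.
    try:
        events.append('generator: start')
        yield 'a'
        yield 'b'
    finally:
        events.append('generator: finally')


result = plain()
events.append('plain: returned')
plain_items = list(result)

gen = generator()
events.append('generator: returned')
gen_items = list(gen)

for event in events:
    print(event)
```

For the plain function, ``start`` and ``finally`` are both recorded before ``returned``. For the generator, ``returned`` is recorded first and the body only runs afterwards, which is the same reordering seen with ``Middleware2``.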
Fixing the Problem
------------------

To still be able to iterate over results without buffering the output, while avoiding the execution order problems created by using a generator directly, you can put the code containing the ``yield`` statement inside another function and return the result of calling that function, as shown below:

::

    >>> class Middleware3(object):
    ...     def __init__(self, app):
    ...         self.app = app
    ...     def __call__(self, environ, start_response):
    ...         try:
    ...             print 1
    ...             def output():
    ...                 for data in self.app(environ, start_response):
    ...                     yield data
    ...             return output()
    ...         finally:
    ...             print 2
    ...
    >>> for data in Middleware3(returning_app)(dummy_environ, dummy_start_response):
    ...     print data
    ...
    1
    2
    3
    4
    5

As you can see, this new code results in execution happening in the correct order.

Using Generators in WSGI Applications
-------------------------------------

Although using ``yield`` in WSGI *middleware* can be dangerous if not done properly, using it in WSGI applications is a perfectly good idea and can be used to stream data back to the browser. Here's another application which yields a new number every second.

::

    >>> import time
    >>>
    >>> def yielding_app(environ, start_response):
    ...     start_response('200 OK', [])
    ...     yield 3
    ...     time.sleep(1)
    ...     yield 4
    ...     time.sleep(1)
    ...     yield 5
    ...

If you run the application you'll see there is always a pause between the numbers ``3`` and ``4`` being printed, and between the numbers ``4`` and ``5`` being printed, but the order in which the different parts of the middleware are executed remains the same as it was when using ``returning_app``. ``Middleware2`` still produces an unexpected order as it did before, and ``Middleware1`` and ``Middleware3`` still work correctly.

::

    >>> for data in Middleware1(yielding_app)(dummy_environ, dummy_start_response):
    ...     print data
    ...
    1
    2
    3
    4
    5
    >>> for data in Middleware2(yielding_app)(dummy_environ, dummy_start_response):
    ...     print data
    ...
    1
    3
    4
    5
    2
    >>> for data in Middleware3(yielding_app)(dummy_environ, dummy_start_response):
    ...     print data
    ...
    1
    2
    3
    4
    5
    >>>

More Complex Middleware Chains
------------------------------

If you have written middleware similar to ``Middleware2`` before and not noticed a problem, it may be because the other middleware which wraps it did not have any ``try...`` ``finally`` blocks, or because it behaved in such a way that the order of execution didn't cause a problem.

Let's define another middleware class called ``OuterMiddleware`` to demonstrate some of the side-effects of middleware using ``yield`` statements.

::

    >>> class OuterMiddleware(object):
    ...     def __init__(self, app):
    ...         self.app = app
    ...     def __call__(self, environ, start_response):
    ...         try:
    ...             print 1
    ...             return self.app(environ, start_response)
    ...         finally:
    ...             print 3
    ...
    >>>

Let's also redefine the existing middleware without ``try...`` ``finally`` blocks so that you can see it is the fact that they are generators that is causing a problem, not the fact that they include a ``try...`` ``finally`` block. Obviously without a ``try...`` ``finally`` block there is no way to print a value after the middleware returns, so these examples will only print 5 numbers. I've also updated the application and the other middleware so that the correctly written ones still output consecutive numbers:

::

    >>> def yielding_app(environ, start_response):
    ...     start_response('200 OK', [])
    ...     time.sleep(1)
    ...     yield 4
    ...     time.sleep(1)
    ...     yield 5
    ...     time.sleep(1)
    ...
    >>> class Middleware1(object):
    ...     def __init__(self, app):
    ...         self.app = app
    ...     def __call__(self, environ, start_response):
    ...         print 2
    ...         return self.app(environ, start_response)
    ...
    >>> class Middleware2(object):
    ...     def __init__(self, app):
    ...         self.app = app
    ...     def __call__(self, environ, start_response):
    ...         print 2
    ...         app_iter = self.app(environ, start_response)
    ...         for data in app_iter:
    ...             yield data
    ...         if hasattr(app_iter, 'close'):
    ...             app_iter.close()
    ...
    >>> class Middleware3(object):
    ...     def __init__(self, app):
    ...         self.app = app
    ...     def __call__(self, environ, start_response):
    ...         print 2
    ...         def output():
    ...             for data in self.app(environ, start_response):
    ...                 yield data
    ...         return output()
    ...
    >>>

Now let's see how the new middleware chains behave:

::

    >>> for data in OuterMiddleware(Middleware1(yielding_app))(dummy_environ, dummy_start_response):
    ...     print data
    ...
    1
    2
    3
    4
    5
    >>> for data in OuterMiddleware(Middleware2(yielding_app))(dummy_environ, dummy_start_response):
    ...     print data
    ...
    1
    3
    2
    4
    5
    >>> for data in OuterMiddleware(Middleware3(yielding_app))(dummy_environ, dummy_start_response):
    ...     print data
    ...
    1
    2
    3
    4
    5
    >>>

As you can see, ``Middleware1`` and ``Middleware3`` still behave correctly, but look what is happening in the example containing ``Middleware2``. The ``finally`` block of ``OuterMiddleware`` is executed before either the body of ``Middleware2`` or the application itself has started running.

I hope this post has demonstrated that you have to be extremely careful when using ``yield`` statements in WSGI middleware, even when the middleware you produce appears to function correctly under certain circumstances. If you have weird execution order problems in your WSGI stack, it is possible that one of the middleware components is using ``yield`` somewhere it shouldn't. If you are in any doubt, it is best to avoid ``yield`` statements altogether.
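
If you would rather keep ``yield`` out of middleware entirely, one option is to wrap the application's result in an iterator class. The following is a minimal sketch, assuming Python 3 (so response chunks are bytes) rather than the Python 2 used in the sessions above; ``UppercaseIter``, ``UppercaseMiddleware``, ``demo_app`` and ``demo_start_response`` are hypothetical names invented for this example. It uppercases each chunk, the kind of response alteration mentioned earlier, and forwards ``close()`` to the wrapped iterable as PEP 333 expects.

```python
class UppercaseIter:
    """Hypothetical iterator wrapper that uppercases each response chunk."""

    def __init__(self, app_iter):
        self._app_iter = app_iter
        self._iterator = iter(app_iter)

    def __iter__(self):
        return self

    def __next__(self):
        # next() raises StopIteration once the wrapped iterable is
        # exhausted, which also ends iteration over this wrapper.
        return next(self._iterator).upper()

    def close(self):
        # Forward close() to the wrapped iterable if it provides one.
        close = getattr(self._app_iter, 'close', None)
        if close is not None:
            close()


class UppercaseMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # There is no yield in this method, so the application is called
        # immediately rather than when iteration begins.
        return UppercaseIter(self.app(environ, start_response))


def demo_app(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello ', b'world']


def demo_start_response(status, headers, exc_info=None):
    pass


body = b''.join(UppercaseMiddleware(demo_app)({}, demo_start_response))
print(body)
```

Because ``__call__`` contains no ``yield``, its body runs as soon as the middleware is called, so any middleware wrapping it sees the same ordering as it would with ``Middleware1``.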