Python Setuptools Egg Plugins
+++++++++++++++++++++++++++++

:Posted: 2010-01-15 13:21
:Tags: Python, Pylons
:Headline: Learn how to use the egg plugin feature of setuptools.

One very under-used feature of ``setuptools`` is its ability to allow you to
build eggs which serve as *plugins* for other *host* eggs.

The way plugins work is actually very simple. Imagine we are building a
database abstraction layer. The host package, Database, provides the
high-level interface, and the plugins, Psycopg2Database and
Pysqlite2Database, provide the lower-level implementations. Installing a new
plugin adds support for a new database to the Database package.

To make this more concrete let's think about an ``insert_record()`` function
which allows users to insert a record and which returns
the ID of the newly-created row. It turns out that different SQL engines
expose slightly different ways of obtaining the ID of the last inserted row, so
the implementation for each database will live in its plugin.

For this to work the Database package needs to be able to find all the
plugins, know which database they are for, and load their implementation of
the ``insert_record()`` function.

With ``setuptools``, each Python object in a plugin is accessed via an *entry
point*. An entry point is just a name that "points" to an object. We have two
types of object, a string representing the engine name, and the function for
inserting a record. Let's choose the entry point names ``engine_name`` and
``insert_record`` respectively for these objects.

Entry points have to exist in *entry point groups* so you'll need to create one of
those too. You can have multiple entry points in a single entry point group
and you can have multiple entry point groups in a single egg. Let's call our
entry point group ``database.engine``.
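
To make the relationship between groups and names concrete, here is a small
sketch that models the entry points chosen above as plain data. It uses the
``EntryPoint`` class from the standard library's ``importlib.metadata`` module
(Python 3.8 and later) rather than ``setuptools`` itself, and the target
strings are simply the illustrative module paths used later in this article,
so nothing is actually imported:

::

    from importlib.metadata import EntryPoint

    # Each entry point has a name, a target ("value") and a group.
    # The targets are illustrative module paths; nothing is imported here.
    entry_points = [
        EntryPoint(
            name='engine_name',
            value='psycopg2database:engine_name',
            group='database.engine',
        ),
        EntryPoint(
            name='insert_record',
            value='psycopg2database.helper:insert_record',
            group='database.engine',
        ),
    ]

    # Index the entry points as {group: {name: target}}
    by_group = {}
    for ep in entry_points:
        by_group.setdefault(ep.group, {})[ep.name] = ep.value

    print(sorted(by_group['database.engine']))

One group holds both names, which is the layout the rest of this article
relies on.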

The host egg doesn't need any changes to support entry points but the plugins
need the entry points specified. 

Let's imagine the Psycopg2Database package has a
``psycopg2database.helper.insert_record()`` function and a string at
``psycopg2database.engine_name``, and that the Pysqlite2Database package
has a ``pysqlite2database.helper.insert_record()`` function and a
``pysqlite2database.engine_name`` string for the engine name.

.. tip ::

    Confusingly, ``pysqlite2`` is the name of the Python module for accessing SQLite3
    databases and ``psycopg2`` is the name of the Python module for accessing
    PostgreSQL 8 databases, hence the naming above.

Here are the implementations of the objects the entry points point to:

``psycopg2database.helper.insert_record()``

    ::

        def insert_record(
            connection, 
            table_name, 
            data_dict,
            primary_key_column_name=None, 
            engine=None,
        ):
            print('psycopg2 plugin not implemented yet')

``psycopg2database.engine_name``

    ::

        engine_name = "psycopg2"

``pysqlite2database.helper.insert_record()``

    ::

        def insert_record(
            connection, 
            table_name, 
            data_dict,
            primary_key_column_name=None, 
            engine=None,
        ):
            print('pysqlite2 plugin not implemented yet')

``pysqlite2database.engine_name``

    ::

        engine_name = "pysqlite2"

To make ``setuptools`` aware that the entry points point to these objects
change the Psycopg2Database ``setup.py`` to add the ``entry_points`` argument
to ``setup()`` like this:

::

    setup(
        ...
        entry_points="""
            [database.engine]
            insert_record=psycopg2database.helper:insert_record
            engine_name=psycopg2database:engine_name
        """,
        ...
    )

and change the Pysqlite2Database ``setup.py`` to add the ``entry_points`` argument
to ``setup()`` like this:

::

    setup(
        ...
        entry_points="""
            [database.engine]
            insert_record=pysqlite2database.helper:insert_record
            engine_name=pysqlite2database:engine_name
        """,
        ...
    )

In each case the entry point must be listed under the entry point group name
(``[database.engine]``) and must start with the entry point name
followed by an ``=`` sign. The part after the ``=`` is the module path,
followed by a ``:``, followed by the name of the Python
object being pointed to.
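
As a rough illustration of what the ``module.path:object_name`` notation
means, the sketch below resolves such a string by hand using only the standard
library. The ``resolve()`` helper is hypothetical and is not how ``setuptools``
is implemented; it just imports the module and looks the object up on it:

::

    import importlib

    def resolve(target):
        """Return the object a 'module.path:object_name' string points to."""
        module_path, _, object_name = target.partition(':')
        obj = importlib.import_module(module_path)
        # The object name may itself be dotted, e.g. 'SomeClass.method'
        for part in filter(None, object_name.split('.')):
            obj = getattr(obj, part)
        return obj

    # Standard library targets are used so the example runs anywhere
    join = resolve('os.path:join')
    print(join('plugins', 'psycopg2'))

With the plugins from this article installed, the same helper could be given
``'psycopg2database.helper:insert_record'`` instead.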

At this point I usually re-install the plugin eggs to ensure ``setuptools``
finds the updated entry points.

::

    python setup.py develop

Now we need to be able to use the plugins. The code snippet below shows you
how to get all the engine names and ``insert_record()`` functions from every
installed plugin. Notice that as you iterate over the entry points you
need to call ``.load()`` on each one to get the actual Python object the entry
point points to:

::

    from pkg_resources import iter_entry_points
    
    dist_plugins = {}
    for ep in iter_entry_points(
        group='database.engine',
        # Use None to get all entry point names
        name=None,
    ):
        if ep.dist not in dist_plugins:
            dist_plugins[ep.dist] = {}
        dist_plugins[ep.dist][ep.name] = ep.load()

    print(dist_plugins)

If you run it you'll get this output:

::

    {Psycopg2Database 0.1.0 (/home/james/Desktop/Cur/Psycopg2Database/trunk): {'insert_record': <function insert_record at 0x7fff7d3269b0>, 'engine_name': 'psycopg2'}, Pysqlite2Database 0.1.0 (/home/james/Desktop/Cur/Pysqlite2Database/trunk): {'insert_record': <function insert_record at 0x7fff7d30fc80>, 'engine_name': 'pysqlite2'}}
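
On Python 3.8 and later the standard library's ``importlib.metadata`` module
offers a comparable lookup. The sketch below is an alternative to the
``pkg_resources`` code above, not part of the original Database package, and
the ``iter_group()`` helper name is made up for this example. It copes with
the two return types that ``entry_points()`` has had across Python versions:

::

    from importlib import metadata

    def iter_group(group):
        """Return the installed entry points in the given group as a list."""
        eps = metadata.entry_points()
        if hasattr(eps, 'select'):
            # Python 3.10 and later
            return list(eps.select(group=group))
        # Python 3.8 and 3.9 return a plain dict keyed by group name
        return list(eps.get(group, []))

    for ep in iter_group('database.engine'):
        print(ep.name, ep.value)

If no plugin declaring the ``database.engine`` group is installed, the loop
simply prints nothing.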

It would be useful to present this as a single dictionary with the engine name
as the key and the function as its value. This code does this:

::

    plugins = {}
    for k, v in dist_plugins.items():
        plugins[v['engine_name']] = v['insert_record']
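
The loop above assumes every plugin defines both entry points. As a sketch of
a more defensive variant, the hypothetical ``index_by_engine()`` helper below
skips any distribution that is missing one of them. The sample dictionary uses
plain strings as keys in place of real distribution objects:

::

    def index_by_engine(dist_plugins):
        """Map each engine name to its insert_record object."""
        plugins = {}
        for dist, objects in dist_plugins.items():
            if 'engine_name' not in objects or 'insert_record' not in objects:
                # Incomplete plugin: ignore it rather than raise KeyError
                continue
            plugins[objects['engine_name']] = objects['insert_record']
        return plugins

    def fake_insert_record(*args, **kwargs):
        # Placeholder standing in for a plugin's real function
        return None

    sample = {
        'Psycopg2Database 0.1.0': {
            'engine_name': 'psycopg2',
            'insert_record': fake_insert_record,
        },
        'BrokenPlugin 0.0.1': {'engine_name': 'broken'},
    }
    print(sorted(index_by_engine(sample)))

Whether to skip an incomplete plugin silently or report it is a design choice;
skipping keeps one broken plugin from disabling all the others.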

Now let's turn this into useful functionality. The ``insert_record()`` function
in the Database package has this signature:

::

    insert_record(connection, table_name, data_dict, primary_key_column_name=None, engine=None)

Let's update it so that it loads the correct plugin based on the name:

::

    from pkg_resources import iter_entry_points
    
    plugins_loaded = False
    plugins = {}

    def load_plugins():
        # Without this declaration the assignment at the end would only
        # create a local variable and the plugins would be re-scanned on
        # every call
        global plugins_loaded
        dist_plugins = {}
        for ep in iter_entry_points(
            group='database.engine',
            # Use None to get all entry point names
            name=None,
        ):
            if ep.dist not in dist_plugins:
                dist_plugins[ep.dist] = {}
            dist_plugins[ep.dist][ep.name] = ep.load()
        for k, v in dist_plugins.items():
            plugins[v['engine_name']] = v['insert_record']
        plugins_loaded = True
    
    def insert_record(
        connection, 
        table_name, 
        data_dict,
        primary_key_column_name=None, 
        engine=None,
    ):
        if not plugins_loaded:
            load_plugins()
        if engine not in plugins:
            raise Exception('No driver for the %r engine' % engine)
        # Use the plugin's insert method
        return plugins[engine](
             connection, 
             table_name, 
             data_dict,
             primary_key_column_name,
             engine,
        )

If you saved the above as ``database_helper.py`` you could test it as follows:

::

    >>> from database_helper import insert_record
    >>> insert_record(1,2,3,4, 'mysqldb')
    Exception: No driver for the 'mysqldb' engine
    >>> insert_record(1,2,3,4, 'psycopg2')
    psycopg2 plugin not implemented yet
    >>> 

As you can see, an exception is raised when ``mysqldb`` is specified because
the plugin doesn't exist but when ``psycopg2`` is specified, the correct
function in the plugin gets called.

This code can still be improved though. It's reasonable to assume that the
``helper`` module for each plugin would need to import the Python database
module for the database it is abstracting. Since the current code loads every
entry point whether it is needed or not, all the Python database modules would
need to be present for every plugin that existed. This isn't a huge problem
because presumably you wouldn't install plugins for databases where you hadn't
also installed the underlying driver, but we can still do better.

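Let's create a new dictionary called ``loaded`` and update the code so that
only the ``engine_name`` entry point is loaded up front. The ``plugins``
dictionary now maps each engine name to its still-unloaded ``insert_record``
entry point: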

::

    from pkg_resources import iter_entry_points
    
    plugins_loaded = False
    plugins = {}
    loaded = {}

    def load_plugins():
        # Needed so the assignment at the end updates the module-level flag
        global plugins_loaded
        dist_plugins = {}
        for ep in iter_entry_points(
            group='database.engine',
            # Use None to get all entry point names
            name=None,
        ):
            if ep.dist not in dist_plugins:
                dist_plugins[ep.dist] = {}
            # Store the entry point itself; don't load it yet
            dist_plugins[ep.dist][ep.name] = ep
        for k, v in dist_plugins.items():
            plugins[v['engine_name'].load()] = v['insert_record']
        plugins_loaded = True

Now in the ``insert_record()`` function we can load the actual entry point:

::
    
    def insert_record(
        connection, 
        table_name, 
        data_dict,
        primary_key_column_name=None, 
        engine=None,
    ):
        if not plugins_loaded:
            load_plugins()
        if engine not in plugins:
            raise Exception('No driver for the %r engine' % engine)
        if engine not in loaded:
            loaded[engine] = plugins[engine].load()
        # Use the loaded plugin's insert method
        return loaded[engine](
             connection, 
             table_name, 
             data_dict,
             primary_key_column_name,
             engine,
        )
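
To see the lazy loading in action without installing anything, here is a
self-contained sketch. ``FakeEntryPoint`` and ``pg_insert()`` are stand-ins
invented for this example; the fake class only mimics the ``.load()`` method of
a real entry point and counts how often it is called:

::

    class FakeEntryPoint(object):
        """Stand-in for a setuptools entry point that counts load() calls."""

        def __init__(self, obj):
            self.obj = obj
            self.load_count = 0

        def load(self):
            self.load_count += 1
            return self.obj

    def pg_insert(connection, table_name, data_dict,
                  primary_key_column_name=None, engine=None):
        return 'inserted via %s' % engine

    # Engine name -> unloaded entry point, as built by load_plugins()
    plugins = {
        'psycopg2': FakeEntryPoint(pg_insert),
        'pysqlite2': FakeEntryPoint(None),
    }
    loaded = {}

    def insert_record(connection, table_name, data_dict,
                      primary_key_column_name=None, engine=None):
        if engine not in plugins:
            raise Exception('No driver for the %r engine' % engine)
        if engine not in loaded:
            # First use of this engine: load and cache the function
            loaded[engine] = plugins[engine].load()
        return loaded[engine](
            connection, table_name, data_dict, primary_key_column_name, engine,
        )

    insert_record(1, 2, 3, 4, 'psycopg2')
    insert_record(1, 2, 3, 4, 'psycopg2')
    print(plugins['psycopg2'].load_count, plugins['pysqlite2'].load_count)

After two calls the ``psycopg2`` entry point has been loaded once and the
``pysqlite2`` one not at all, so the SQLite driver would never need to be
imported.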

That's all there is to it. You should now be able to go away and write your
own plugins. For some information about how entry points are used in Pylons,
read the Pylons Book chapter 17.

By the way, if you hadn't noticed yet, the Database, Psycopg2Database and
Pysqlite2Database packages are real and use roughly the mechanism described
here. If you fancy writing a plugin for your favorite database and releasing
it as an egg, feel free. The beauty of egg plugins is that I don't even need
to be involved because the Database module will respond to your plugin
automatically.

Good luck!