Python Setuptools Egg Plugins

Posted:	2010-01-15 13:21
Tags:	Python, Pylons
Headline:	Learn how to use the egg plugin feature of setuptools.

One very under-used feature of setuptools is its ability to allow you to build eggs which serve as plugins for other host eggs.

The way plugins work is actually very simple. Imagine we are building a database abstraction layer. The host package, called Database, provides the high-level interface, while the plugins, Psycopg2Database and Pysqlite2Database, provide the lower-level implementations. Installing a new plugin then adds support for a new database to the Database package.

To make this more concrete, let's think about an insert_record() function which lets users insert a record and returns the ID of the newly created row. Different SQL engines expose slightly different ways of obtaining the ID of the last inserted row, so the database-specific implementations will live in the plugins.
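To make the difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 module (which is derived from pysqlite) rather than the plugins themselves. SQLite reports the new row's ID through the cursor's lastrowid attribute, whereas PostgreSQL 8.2 and later can hand the ID back from the INSERT statement itself via a RETURNING clause. The people table is invented for the example, and the PostgreSQL statement is only built, not executed, since that would need a running server.

```python
import sqlite3

# SQLite: run the INSERT, then ask the cursor for the new row's ID
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
cursor = connection.execute("INSERT INTO people (name) VALUES (?)", ("James",))
sqlite_new_id = cursor.lastrowid
print("SQLite new id:", sqlite_new_id)

# PostgreSQL: request the ID as part of the INSERT itself.
# With psycopg2 you would execute this and then read cursor.fetchone()[0].
postgres_sql = "INSERT INTO people (name) VALUES (%s) RETURNING id"
print("PostgreSQL statement:", postgres_sql)
```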

For this to work the Database package needs to be able to find all the plugins, know which database they are for, and load their implementation of the insert_record() function.

With setuptools, each Python object in a plugin is accessed via an entry point. An entry point is just a name that "points" to an object. We have two types of object: a string representing the engine name, and the function for inserting a record. Let's choose the entry point names engine_name and insert_record respectively for these objects.

Entry points have to exist in entry point groups, so you'll need to create one of those too. You can have multiple entry points in a single entry point group and you can have multiple entry point groups in a single egg. Let's call our entry point group database.engine.

The host egg doesn't need any changes to support entry points, but each plugin needs to declare its entry points.

Let's imagine that Psycopg2Database has a psycopg2database.helper.insert_record() function and a string at psycopg2database.engine_name, and that Pysqlite2Database has a pysqlite2database.helper.insert_record() function and a pysqlite2database.engine_name string for the engine name.

Tip

Confusingly, pysqlite2 is the name of the Python module for accessing SQLite3 databases and psycopg2 is the name of the Python module for accessing PostgreSQL 8 databases, hence the naming above.

Here are the implementations of the objects the entry points point to:

psycopg2database.helper.insert_record()

def insert_record(
    connection,
    table_name,
    data_dict,
    primary_key_column_name=None,
    engine=None,
):
    # Placeholder: a real plugin would insert the row and return its new ID
    print('psycopg2 plugin not implemented yet')

psycopg2database.engine_name

engine_name = "psycopg2"

pysqlite2database.helper.insert_record()

def insert_record(
    connection,
    table_name,
    data_dict,
    primary_key_column_name=None,
    engine=None,
):
    # Placeholder: a real plugin would insert the row and return its new ID
    print('pysqlite2 plugin not implemented yet')

pysqlite2database.engine_name

engine_name = "pysqlite2"
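The stubs above only print a message. For a rough idea of what a finished Pysqlite2Database helper might contain, here is a hypothetical implementation written against the standard library's sqlite3 module instead of pysqlite2 (both follow the Python DB-API and share essentially the same interface). It builds a parameterised INSERT from data_dict and returns cursor.lastrowid. This sketch ignores primary_key_column_name and engine, which it accepts only to match the shared signature, and it assumes table and column names come from trusted code because they are pasted into the SQL string.

```python
import sqlite3

def insert_record(
    connection,
    table_name,
    data_dict,
    primary_key_column_name=None,
    engine=None,
):
    """Insert one row and return its ID (hypothetical SQLite plugin sketch).

    table_name and the keys of data_dict are interpolated directly into the
    SQL, so they must come from trusted code, never from user input.
    """
    columns = list(data_dict)
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        table_name,
        ", ".join(columns),
        ", ".join("?" for _ in columns),  # sqlite3 uses the qmark paramstyle
    )
    cursor = connection.cursor()
    cursor.execute(sql, [data_dict[column] for column in columns])
    # SQLite exposes the rowid of the row this cursor just inserted
    return cursor.lastrowid

# Usage against an in-memory database
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
first_id = insert_record(connection, "people", {"name": "James"})
second_id = insert_record(connection, "people", {"name": "Ben"})
print(first_id, second_id)
```

Because id is declared INTEGER PRIMARY KEY, SQLite uses it as the rowid, so lastrowid is also the primary key value here.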

To make setuptools aware that the entry points point to these objects, change the Psycopg2Database setup.py to add the entry_points argument to setup() like this:

setup(
    ...
    entry_points="""
        [database.engine]
        insert_record=psycopg2database.helper:insert_record
        engine_name=psycopg2database:engine_name
    """,
    ...
)

and change the Pysqlite2Database setup.py to add the entry_points argument to setup() like this:

setup(
    ...
    entry_points="""
        [database.engine]
        insert_record=pysqlite2database.helper:insert_record
        engine_name=pysqlite2database:engine_name
    """,
    ...
)

In each case the entry points must be listed under the entry point group name ([database.engine]), and each line must start with the entry point name followed by an = sign. The part after the = is the module path, followed by a :, followed by the name of the Python object being pointed to.
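setuptools records these declarations in an entry_points.txt file inside the egg's metadata directory, using the same INI-style layout. The sketch below, which assumes Python 3, uses the standard library's configparser to show how such a spec breaks down into groups, names and module:object references, and importlib to resolve a reference in roughly the way an entry point's load() does. It is a simplified illustration, not setuptools' own parser: it ignores the optional [extras] suffix, and parse_entry_points() and resolve() are hypothetical helper names. The resolve() check uses json only because that module is always available.

```python
import configparser
import importlib

ENTRY_POINTS = """
[database.engine]
insert_record=pysqlite2database.helper:insert_record
engine_name=pysqlite2database:engine_name
"""

def parse_entry_points(text):
    """Return {group: {name: (module_path, object_path)}} for an INI-style spec."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep entry point names case-sensitive
    parser.read_string(text)
    result = {}
    for group in parser.sections():
        result[group] = {}
        for name, reference in parser.items(group):
            module_path, _, object_path = reference.strip().partition(":")
            result[group][name] = (module_path, object_path)
    return result

def resolve(module_path, object_path):
    """Import module_path, then walk the dotted object_path inside it."""
    obj = importlib.import_module(module_path)
    for attribute in object_path.split("."):
        obj = getattr(obj, attribute)
    return obj

spec = parse_entry_points(ENTRY_POINTS)
print(spec["database.engine"]["engine_name"])
# -> ('pysqlite2database', 'engine_name')
```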

At this point I usually re-install the plugin eggs so that setuptools finds the updated entry points, by running this in each plugin's directory:

python setup.py develop

Now we need to be able to use the plugins. The code snippet below shows you how to get the engine names and insert_record() functions from every installed plugin. Notice that as you iterate over the entry points you need to call .load() on each one to get the actual Python object it points to:

from pkg_resources import iter_entry_points

dist_plugins = {}
for ep in iter_entry_points(
    group='database.engine',
    # Use None to get all entry point names
    name=None,
):
    # Group the loaded objects by the distribution (egg) that provides them
    dist_plugins.setdefault(ep.dist, {})[ep.name] = ep.load()

print(dist_plugins)

If you run it with both plugins installed, you'll get output like this (paths and memory addresses will differ on your machine):

{Psycopg2Database 0.1.0 (/home/james/Desktop/Cur/Psycopg2Database/trunk): {'insert_record': <function insert_record at 0x7fff7d3269b0>, 'engine_name': 'psycopg2'}, Pysqlite2Database 0.1.0 (/home/james/Desktop/Cur/Pysqlite2Database/trunk): {'insert_record': <function insert_record at 0x7fff7d30fc80>, 'engine_name': 'pysqlite2'}}

It would be useful to present this as a single dictionary with the engine name as the key and the function as its value. This code does this:

plugins = {}
for k, v in dist_plugins.items():
    plugins[v['engine_name']] = v['insert_record']
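The loop above assumes every distribution in the group declares both entry points, so a plugin that forgot engine_name would stop discovery with a KeyError, and two plugins claiming the same engine name would silently overwrite each other. A slightly more defensive variant, sketched here as a hypothetical build_engine_map() helper, skips incomplete plugins and rejects duplicates. The dictionary in the usage example stands in for dist_plugins, with plain strings in place of real distributions.

```python
def build_engine_map(dist_plugins):
    """Map engine name -> insert_record for each complete plugin.

    dist_plugins has the shape built above: {dist: {entry point name: object}}.
    Plugins missing either entry point are skipped; duplicate engine names
    raise ValueError instead of one plugin silently replacing another.
    """
    plugins = {}
    for dist, entries in dist_plugins.items():
        if "engine_name" not in entries or "insert_record" not in entries:
            continue  # incomplete plugin: ignore it rather than crash
        name = entries["engine_name"]
        if name in plugins:
            raise ValueError(
                "Engine %r is provided by more than one plugin (%s)" % (name, dist)
            )
        plugins[name] = entries["insert_record"]
    return plugins

def fake_insert(*args):
    return "inserted"

example = {
    "Psycopg2Database 0.1.0": {"engine_name": "psycopg2", "insert_record": fake_insert},
    "BrokenPlugin 0.1": {"insert_record": fake_insert},  # no engine_name
}
print(sorted(build_engine_map(example)))  # ['psycopg2']
```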

Now let's turn this into useful functionality. The insert_record() function in the Database package has this signature:

insert_record(connection, table_name, data_dict, primary_key_column_name=None, engine=None)

Let's update it so that it loads the correct plugin based on the engine name:

from pkg_resources import iter_entry_points

plugins_loaded = False
plugins = {}

def load_plugins():
    # Without this, the assignment below would only create a local variable
    global plugins_loaded
    dist_plugins = {}
    for ep in iter_entry_points(
        group='database.engine',
        # Use None to get all entry point names
        name=None,
    ):
        dist_plugins.setdefault(ep.dist, {})[ep.name] = ep.load()
    # Map each engine name to that plugin's insert_record() function
    for entries in dist_plugins.values():
        plugins[entries['engine_name']] = entries['insert_record']
    plugins_loaded = True

def insert_record(
    connection,
    table_name,
    data_dict,
    primary_key_column_name=None,
    engine=None,
):
    if not plugins_loaded:
        load_plugins()
    if engine not in plugins:
        raise Exception('No driver for the %r engine' % engine)
    # Use the plugin's insert method
    return plugins[engine](
        connection,
        table_name,
        data_dict,
        primary_key_column_name,
        engine,
    )

If you saved the above as database_helper.py, you could test it as follows:

>>> from database_helper import insert_record
>>> insert_record(1,2,3,4, 'mysqldb')
Exception: No driver for the 'mysqldb' engine
>>> insert_record(1,2,3,4, 'psycopg2')
psycopg2 plugin not implemented yet
>>>

As you can see, an exception is raised when mysqldb is specified because the plugin doesn't exist but when psycopg2 is specified, the correct function in the plugin gets called.
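Keeping plugins_loaded as a module-level flag means load_plugins() has to declare it with a global statement; without one, the assignment just creates a local variable and discovery runs again on every call. One alternative worth considering is to keep that state on an object. The EngineRegistry class below is a hypothetical sketch of this idea: the entry point source is passed in as a callable, so in the real package you might pass lambda: iter_entry_points('database.engine'), while here a FakeEntryPoint stand-in, which mimics only the name, dist and load() members the code touches, lets it run without any eggs installed.

```python
class EngineRegistry(object):
    """Hypothetical object-based alternative to the module-level globals."""

    def __init__(self, entry_point_source):
        # entry_point_source: a zero-argument callable returning entry points
        self._entry_point_source = entry_point_source
        self._plugins = None  # None means "not discovered yet"

    def _load_plugins(self):
        dist_plugins = {}
        for ep in self._entry_point_source():
            dist_plugins.setdefault(ep.dist, {})[ep.name] = ep.load()
        self._plugins = {}
        for entries in dist_plugins.values():
            self._plugins[entries["engine_name"]] = entries["insert_record"]

    def insert_record(self, connection, table_name, data_dict,
                      primary_key_column_name=None, engine=None):
        if self._plugins is None:
            self._load_plugins()
        if engine not in self._plugins:
            raise Exception("No driver for the %r engine" % engine)
        return self._plugins[engine](
            connection, table_name, data_dict, primary_key_column_name, engine)


class FakeEntryPoint(object):
    """Stand-in with just the attributes the registry uses."""

    def __init__(self, dist, name, obj):
        self.dist, self.name, self._obj = dist, name, obj

    def load(self):
        return self._obj


def fake_psycopg2_insert(connection, table_name, data_dict,
                         primary_key_column_name=None, engine=None):
    return "psycopg2 would insert into %s" % table_name


registry = EngineRegistry(lambda: [
    FakeEntryPoint("Psycopg2Database", "engine_name", "psycopg2"),
    FakeEntryPoint("Psycopg2Database", "insert_record", fake_psycopg2_insert),
])
print(registry.insert_record(None, "people", {}, engine="psycopg2"))
# -> psycopg2 would insert into people
```

Passing the entry point source in also makes the host logic easy to test without installing any plugin eggs.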

This code can still be improved, though. It's reasonable to assume that each plugin's helper module would need to import the Python driver for the database it abstracts. Since the current code loads every entry point whether it is needed or not, the driver for every installed plugin would have to be importable, even for databases you never use. This isn't a huge problem, because presumably you wouldn't install a plugin without also installing its underlying driver, but we can still do better.

Let's create a new dictionary called loaded and update load_plugins() so that only the engine_name entry points are loaded up front; the insert_record entry points are stored without being loaded:

from pkg_resources import iter_entry_points

plugins_loaded = False
plugins = {}
loaded = {}

def load_plugins():
    global plugins_loaded
    dist_plugins = {}
    for ep in iter_entry_points(
        group='database.engine',
        # Use None to get all entry point names
        name=None,
    ):
        # Store the entry point itself rather than loading it
        dist_plugins.setdefault(ep.dist, {})[ep.name] = ep
    for entries in dist_plugins.values():
        # Only the engine name is loaded now; insert_record stays unloaded
        plugins[entries['engine_name'].load()] = entries['insert_record']
    plugins_loaded = True

Now the insert_record() function can load an engine's insert_record entry point the first time that engine is used:

def insert_record(
    connection,
    table_name,
    data_dict,
    primary_key_column_name=None,
    engine=None,
):
    if not plugins_loaded:
        load_plugins()
    if engine not in plugins:
        raise Exception('No driver for the %r engine' % engine)
    if engine not in loaded:
        # Load (and so import) the plugin's helper module only on first use
        loaded[engine] = plugins[engine].load()
    # Use the loaded plugin's insert method
    return loaded[engine](
        connection,
        table_name,
        data_dict,
        primary_key_column_name,
        engine,
    )
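To see what the deferred loading buys you, the sketch below replays the same logic with stand-in entry points. LazyFakeEntryPoint is a hypothetical class that counts how often load() is called, and a hypothetical MySQLdbDatabase plugin simulates a missing database driver by raising ImportError when its insert_record is loaded. Because only engine_name is loaded up front, the broken plugin does no harm until someone actually asks for its engine, and each insert_record entry point is loaded at most once.

```python
class LazyFakeEntryPoint(object):
    """Stand-in entry point that counts load() calls (hypothetical)."""

    def __init__(self, dist, name, loader):
        self.dist, self.name, self._loader = dist, name, loader
        self.load_count = 0

    def load(self):
        self.load_count += 1
        return self._loader()


def missing_driver():
    # Simulates a helper module whose database driver is not installed
    raise ImportError("No module named MySQLdb")


def psycopg2_insert(connection, table_name, data_dict,
                    primary_key_column_name=None, engine=None):
    return "psycopg2 insert into %s" % table_name


fake_entry_points = [
    LazyFakeEntryPoint("Psycopg2Database", "engine_name", lambda: "psycopg2"),
    LazyFakeEntryPoint("Psycopg2Database", "insert_record", lambda: psycopg2_insert),
    LazyFakeEntryPoint("MySQLdbDatabase", "engine_name", lambda: "mysqldb"),
    LazyFakeEntryPoint("MySQLdbDatabase", "insert_record", missing_driver),
]

# Same shape as load_plugins(): load engine_name now, keep insert_record unloaded
dist_plugins = {}
for ep in fake_entry_points:
    dist_plugins.setdefault(ep.dist, {})[ep.name] = ep
plugins = {}
for entries in dist_plugins.values():
    plugins[entries["engine_name"].load()] = entries["insert_record"]

loaded = {}

def insert_record(connection, table_name, data_dict,
                  primary_key_column_name=None, engine=None):
    if engine not in plugins:
        raise Exception("No driver for the %r engine" % engine)
    if engine not in loaded:
        loaded[engine] = plugins[engine].load()
    return loaded[engine](connection, table_name, data_dict,
                          primary_key_column_name, engine)

print(insert_record(None, "people", {}, engine="psycopg2"))
print(insert_record(None, "people", {}, engine="psycopg2"))
print(fake_entry_points[1].load_count)  # 1: loaded once, then cached
```

Asking for engine="mysqldb" would raise the ImportError at that point, and only for that call.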

That's all there is to it. You should now be able to go away and write your own plugins. For more on how entry points are used in Pylons, read chapter 17 of the Pylons Book.

By the way, if you hadn't noticed yet, the Database, Psycopg2Database and Pysqlite2Database packages are real and use roughly the mechanism described here. If you fancy writing a plugin for your favorite database and releasing it as an egg, feel free. The beauty of egg plugins is that I don't even need to be involved, because the Database package will pick up your plugin automatically.

Good luck!
