Database v2.0.1 documentation

Manual¶

Warning

This package currently only supports PostgreSQL using psycopg2.

Warning

This documentation is not up-to-date or correct.

Chapter 1. Introducing the Database Package¶

The Database package is two things:

Database Abstraction Layer: The simplest possible wrapper over the DB-API to allow you to write code which is reasonably portable across SQLite, PostgreSQL and MySQL
Service: A service object for the Flows framework to provide a connection pool and per-request connections

Let’s look at each in turn.

The Database package has its roots in Lemon, and then the Python Web Modules and is designed for people who like writing SQL directly to work with databases rather than using object-relational mappers or other higher-level abstractions.

Database Abstraction Layer¶

How does the Database package improve on the DB-API?¶

The Python DB-API 2.0 allows you to write SQL statements and retrieve results but there are a couple of areas where it could be extended for real-world use:

the majority of tables commonly have autoincrementing integer primary keys so it would be nice to be able to deal with auto-incrementing fields in a consistent way across databases, namely have the database API return the id of a newly-inserted row
ability to change the response format to lists of dictionaries or lists of objects rather than just lists of lists (or rather tuples of tuples)
ability to have the DB-API automatically manage the param style so that carefully written SQL will work on all databases, regardless of whether they are expecting %s, ? or some other marker for substitutions.
Ability to embed a converter to correct with driver-specific issues such as the psycopg2 module’s use of Unicode

In addition, it would be nice to be able to:

have an API which made simple inserts, queries and updates against one record trivially simple and not require any SQL
ability to pool database connections to improve latency

The Database package provides all these features.

Why not use an even higher-level abstraction like SQLAlchemy?¶

Here are some reasons:

If you already know SQL and aren’t likely to need cross-database support then there is little point in investing the time to learn the higher-level abstraction’s APIs when they won’t give you the same control as direct SQL anyway
The Database package is only a few hundred lines of code so any problems will be much easier to track down

If these two reasons don’t apply you might be better off with the higher level abstraction.

The Helpers¶

The Database package’s functionality is implemented as a series of simple helpers:

database.helper.query(connection, sql, values=(), format='dict', fetch=True, substitute=True, param_style='format')¶

Execute sql using a cursor obtained from connection. Any %s characters in the SQL string are substituted with the values in values in a safe way helping you to avoid SQL injection attacks. If you need a literal %s in the SQL, you write it as %%s. The SQL is translated to use the correct param_style for the underlying database which you specify as the param_style argument. This allows you to use %s for all SQL statements. To avoid this substitution you can set substitute to False.

If fetch is True (the default), the results from executing the query are returned. This is usually what you want for SELECT statements but probably not what you want for INSERT, UPDATE or DELETE statements. By default the results are returned as a list of dictionaries where the keys in the dictionaries are the column names and the values are the values from that row. If you have the BareNecessities package installed the results are returned as a list of bn.AttributeDict objects which are like dictionaries but which also allow attribute access to any keys which are also valid Python names. You can specify format='list' to have the results returned in the format used by DB-API 2.0 cursor (usually a tuple of tuples).

Here’s a simple example:

>>> rows = query(
...     connection,
...     """
...     SELECT 
...         name
...       , age
...     FROM 
...         person
...     WHERE
...         name=%s     
...     """,
...     (u'James',)
... )
>>> print rows[0]['name']
u'James' 

Note

When specifying values they should always be specified as a tuple. When using a tuple with a single value in Python you must always have a trailing comma so that the brackets are treated as a tuple.

database.helper.update_record(connection, table_name, primary_key_column_name, primary_key_value, data_dict, substitute=True, param_style='format')¶

Update a single row in a table identified by a primary key. For more complex update operations use query() and write the required SQL by hand.

connection: The name of the connection to use to perform the update
table_name: The name of the table to update
primary_key_column_name: The name of the primary key column
primary_key_value: The ID or primary key value of the record to update
data_dict: A dictionary where the keys represents the columns to update and the values represent the new values to use
substitute: If True any %s characters will be substituted for the correct param_style characters in the format specified by param_style
param_style: The param_style format the underlying database driver expects. Can be 'format' for %s style or query for ? style

An example:

>>> update_record(
...     connection,
...     'person',
...     'person_id',
...     34,
...     {'name': u'James'}
... )

database.helper.insert_record(connection, table_name, data_dict, primary_key_column_name=None, plugin_name=None)¶

Insert a new record into a table and return its integer primary key

There is a different version of this function for every supported database and the correct format is specified by engine or inferred from the connection.

connection: The name of the connection to use to perform the update
table_name: The name of the table to update
data_dict: A dictionary where the keys represents the columns to update and the values represent the new values to use
primary_key_column_name: The name of the primary key column. If primary_key_column_name is None, it is assumed you are following a naming convention where the primary key is the table name followed by _id.
plugin_name: The type and structure of the underlying database. This affects how the new ID is generated and returned since different databases handle it differently. Only the value postgresql is currently allowed.

An example:

>>> print insert_record(
...     connection, 
...      'person',
...     {'name': u'James'}
...     'person_id',
...     engine=None,
... )
2

database.helper.page_data(connection, sql, values, page=None, number=20, format='list', substitute=True, param_style='format')¶: Obtain a specific portion of a result set.

Services¶

Although the helpers do their jobs perfectly well it can quickly become tedious to keep specifying the connection, paramstyle and database_type arguments when they are likely to be the same for a particular web request.

To solve this problem you can use a service to keep track of these values and apply them for you.

An Example¶

Here’s an example where we use the start_flow() function to create a DatabaseConnectionService to connect to an SQLite database storing its data in memory. The start_flow() function will call the run() function with a single argument which has a database attribute. This object has a number of helpers attached and will automatically connect to the database the first time one of its helpers is called. It will then re-use that same connection for all subsequent helper calls from that flow object. At the end of the run() function, the start_flow() function will close any open connections, committing any outstanding changes unless an error occurred, in which case it will rollback all the changes that were made with that flow object so that the database is not left in an inconsistent state.

>>> from flows.provider import ServiceProvider
>>> from flows.provider import start_flow
>>> from database.service.connection import DatabaseConnectionService
>>>
>>> def run(flow):
...     print flow.database.query('SELECT * FROM name')
...
>>> start_flow(
...     run,
...     provider = ServiceProvider(
...         existing_services=dict(config=config),
...         available_services=dict(database=DatabaseConnectionService),
...     ),
...     required_services=['database'],
... )
[]

Database Drivers¶

Every database engine (eg PostgreSQL, MySQL, MS SQL Server etc) is different and each different engine often has different drivers (eg for PostgreSQL you could use psycopg or psycopg2).

The Database package supports the concept of plugins so that different engines and drivers can be supported without changing the API. The plugins each have a name which usually reflects the underlying database driver (eg psycopg2 etc) but can actually be named whatever you like.

To use a database you must specify the plugin you wish to use. If you are using a connection service, the Database package will load the plugin to determine the engine and driver you are using.

Connection Pool¶

When running a web application it can sometimes be useful to keep a pool of database connections open and give them out as they are needed by different requests. This saves the web application having to connect and disconnect from the database on every request.

Once you are using the structure shown in the previous example, this is just a case of importing, configuring and using a DatabaseConnectionPoolService instead of a DatabaseConnectionService:

# Database Connection
database.database = XXX
database.user = XXX
database.password = XXX
database.host = localhost
database.port = 5432

database.mincached = 5

database.creator = psycopg2.connect
database.fetch_converter = database.driver.postgresql:psycopg2_utf8_fetch

Configuration stages:

Core DBUtils options first (eg pool, plugin, maxshared)
Plugin options handled next (eg creator, fetch_converter, execute_converter, database, port, username etc)

>>> from database.service.connection import DatabaseConnectionPoolService
>>>
>>> def run(flow):
...     print flow.database.query('SELECT * FROM name')
...
>>> start_flow(
...     run,
...     provider = ServiceProvider(
...         existing_services=dict(config=config),
...         available_services=dict(database=DatabaseConnectionPoolService),
...     ),
...     required_services=['database'],
... )
[]

Service Extensions¶

Converters
Extra helpers

Chapter 2. Integrating Dia¶

If you design your Schema using Dia and tedia2sql you can have Flows automatically use it.

http://tedia2sql.tigris.org/usingtedia2sql.html

First install the software:

sudo apt-get install tedia2sql dia

Now replace the framework/instance.py file with this:

from database.framework.instance import create_tables, drop_tables, \
   dia_to_schema

def create_instance(flow):
    create_tables(flow)

def destroy_instance(flow):
    drop_tables(flow)

def on_load_mode_handlers(flow):
    return {
        'create': create_instance,
        'destroy': destroy_instance,
        'schema': dia_to_schema,
    }

You’ll also need to install the Database, psycopg2 and DBUtils modules and add them to the dependencies in setup.py.

Config file settings look like this:

# The dia_file and sql_file below are relative to the app dir
schema.dia_file = dynamic/Schema.dia
schema.sql_file = dynamic/Schema.sql
schema.tedia = /home/james/Desktop/Sites/GRP/code/trunk/tools/tedia2sql/tedia2sql.working
schema.type = postgres

# Database options
database.creator = psycopg2.connect
database.mincached = 5
database.database = example-dev
database.user = example-dev
database.password = example-dev
database.host = localhost
database.port = 5432

Note

You may need a patched version of tedia2sql if you get errors like this even though you have set up the unique constraints:

In association authkit_userMember username doesn't refer to a primary key or unique index
In association authkit_userPerson username doesn't refer to a primary key or unique index

This is the one line patch you need on Ubuntu Intrepid for example:

--- /usr/bin/tedia2sql  2008-11-07 10:52:02.000000000 +0000
+++ tedia2sql.working   2009-10-25 17:07:54.000000000 +0000
@@ -923,6 +923,7 @@
                                ];
                        }

+ $paramString = join(',', @$paramDescs);
                        if ($verbose) { print " * Got operation: $operName / $operType / ($paramString) / ($operTemplate)\n"; }
                        push @$operationDescs, [ $operName, $operType,
                                                 $paramDescs, $operTemplate,

Install postgres:

sudo apt-get install postgresql
sudo -u postgres createdb example-dev
sudo -u postgres createuser -SRDP example-dev
env/bin/easy_install psycopg2
env/bin/easy_install Database
env/bin/easy_install DBUtils

You can now connect to your new database like this:

$ psql example-dev -U example-dev -h localhost
Password for user example-dev:

Finally you’ll need to create your schema in Dia. There is an example in the example directory of the source distribution. Save this as dynamic/Schema.dia in your project.

Caution

It is best not to give any other file in the dynamic directory a name containing the name you choose for the Schema as a number of temporary objects are created and deleted when using the schema Dia file and these could interfere with your own code.

The existing Schema contains example tables for AuthKit and Wizard as well as an example one to many mapping between people who have multiple addresses. You will want to customise this schema for your own purposes or remove all the existing tables completely.

Now set up the tables by creating the dynamic/Schema.sql file, overwriting any existing one or creating a new one if necessary:

env/bin/python -m examplesite.cmd.run instance schema app.conf

Now create the schema in the database:

env/bin/python -m examplesite.cmd.run instance create app.conf

If you ever want to remove the schema, destroying all data you can run this command:

env/bin/python -m examplesite.cmd.run instance destroy app.conf

You can destroy and then create the schema in one step like this:

env/bin/python -m examplesite.cmd.run instance recreate app.conf