18 Aug, 2007
If you haven't come across DOIs before, they are simply unique identifiers of the form 10.XXXX/some_label, where XXXX is a four-digit code assigned to your organisation and some_label is any label you want to create a DOI for, usually the unique ID of the object you want to reference. You can then associate a URL with the DOI so that it resolves to a real object on the web.
The other day I was trying to write some code to discover programmatically which URL a particular DOI resolves to. What I wanted to do was use urllib2 to post my DOI to the same URL that the DOI resolver form at the bottom of http://doi.org posts to, and then read the HTTP response to find out where the DOI redirects.
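To make the 10.XXXX/some_label form concrete, here is a minimal sketch that splits a DOI into its organisation prefix and label. The function name split_doi is hypothetical, and the checks are deliberately loose: it only assumes that the prefix starts with "10." and that everything after the first slash is the label.

```python
def split_doi(doi):
    """Split a DOI of the form 10.XXXX/some_label into (prefix, label).

    Hypothetical helper for illustration only; it performs a loose
    shape check, not full DOI validation.
    """
    # Split on the first slash only, so a label may itself contain slashes.
    prefix, sep, label = doi.partition('/')
    if not sep or not prefix.startswith('10.') or not label:
        raise ValueError('not in the form 10.XXXX/some_label: %r' % (doi,))
    return prefix, label


org_id, label = split_doi('10.3333/test')
print(org_id)  # 10.3333
print(label)   # test
```

Joining the two parts back together with a slash gives the original DOI again, which is what the code in this post does with org_id and label.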
Here was my first attempt:
    import urllib2
    import urllib

    org_id = '10.3333'
    label = 'test'
    data = {'hdl': org_id + '/' + label, 'x': '13', 'y': '8'}
    fp = urllib2.urlopen('http://dx.doi.org', urllib.urlencode(data))
    print fp.headers
    fp.close()
Unfortunately this doesn't work. By default urllib2 follows the HTTP redirect, so the headers you get back belong to the page you were redirected to, not to the original response that issued the redirect, which is what we wanted:
    Date: Sat, 18 Aug 2007 13:04:45 GMT
    Server: Apache/2.0.55 (Ubuntu) DAV/2 SVN/1.3.1 mod_python/3.1.4 \
        Python/2.4.3 PHP/5.1.2 proxy_html/2.4 mod_ssl/2.0.55 OpenSSL/0.9.8a
    X-Powered-By: PHP/5.1.2
    X-Pingback: http://jimmyg.org/xmlrpc.php
    Connection: close
    Transfer-Encoding: chunked
    Content-Type: text/html; charset=UTF-8
To fix this you need to create your own handler:
    import urllib2
    import urllib

    class CustomRedirectHandler(urllib2.HTTPRedirectHandler):
        def http_error_301(self, req, fp, code, msg, headers):
            result = urllib2.HTTPRedirectHandler.http_error_301(
                self, req, fp, code, msg, headers)
            result.status = code
            return result

    org_id = '10.3333'
    label = 'test'
    data = {'hdl': org_id + '/' + label, 'x': '13', 'y': '8'}
    opener = urllib2.build_opener(CustomRedirectHandler())
    req = urllib2.Request('http://dx.doi.org', urllib.urlencode(data))
    fp = opener.open(req)
    print fp.url
    fp.close()
Now everything works as expected, and the URL that the DOI resolves to is printed.
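The code above is Python 2. For readers on Python 3, where urllib2 became urllib.request, the following is a rough sketch of the same idea rather than a direct port. It rests on a few assumptions that are not in the original: the names RecordingRedirectHandler and resolve_doi are hypothetical, the handler overrides redirect_request (which is called for every redirect status) instead of http_error_301, and the DOI is resolved with a plain GET to https://doi.org/ followed by the DOI rather than by posting the form.

```python
import urllib.parse
import urllib.request


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follow redirects as usual, but remember each hop on the way."""

    def __init__(self):
        super().__init__()
        self.hops = []  # list of (status_code, target_url) tuples

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Record the redirect, then let the standard handler decide
        # whether and how to follow it.
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def resolve_doi(doi, resolver='https://doi.org/'):
    """Return (final_url, hops) for a DOI.

    Assumption: the resolver accepts a GET on resolver + DOI. This
    replaces the form POST used earlier in the post.
    """
    handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(handler)
    url = resolver + urllib.parse.quote(doi)  # '/' is left unescaped
    with opener.open(url) as response:
        return response.geturl(), handler.hops
```

Calling resolve_doi('10.3333/test') needs network access and raises urllib.error.HTTPError if the resolver does not know the DOI. The first entry in hops holds the status code and target of the original redirect, which is the information the first attempt was trying to read from the headers.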
There is some more information about urllib2 and redirects at Dive Into Python. Learn more about DOIs.
Copyright James Gardner 1996-2020 All Rights Reserved.