11.9. Putting it all together

You've seen all the pieces for building an intelligent HTTP web services client. Now let's see how they all fit together.

Example 11.17. The openAnything function

This function is defined in openanything.py.


def openAnything(source, etag=None, lastmodified=None, agent=USER_AGENT):
    # non-HTTP code omitted for brevity
    if urlparse.urlparse(source)[0] == 'http':                                       1
        # open URL with urllib2                                                     
        request = urllib2.Request(source)                                           
        request.add_header('User-Agent', agent)                                      2
        if etag:                                                                    
            request.add_header('If-None-Match', etag)                                3
        if lastmodified:                                                            
            request.add_header('If-Modified-Since', lastmodified)                    4
        request.add_header('Accept-encoding', 'gzip')                                5
        opener = urllib2.build_opener(SmartRedirectHandler(), DefaultErrorHandler()) 6
        return opener.open(request)                                                  7
1 urlparse is a handy utility module for, you guessed it, parsing URLs. Its primary function, also called urlparse, takes a URL and splits it into a tuple of (scheme, domain, path, params, query string parameters, fragment identifier). Of these, the only thing you care about is the scheme, to make sure that you're dealing with an HTTP URL (which urllib2 can handle).
2 You identify yourself to the HTTP server with the User-Agent passed in by the calling function. If no User-Agent was specified, you use a default one defined earlier in the openanything.py module. You never use the default one defined by urllib2.
3 If an ETag hash was given, send it in the If-None-Match header.
4 If a last-modified date was given, send it in the If-Modified-Since header.
5 Tell the server you would like compressed data if possible.
6 Build a URL opener that uses both of the custom URL handlers: SmartRedirectHandler for handling 301 and 302 redirects, and DefaultErrorHandler for handling 304, 404, and other error conditions gracefully.
7 That's it! Open the URL and return a file-like object to the caller.
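
The header logic in the notes above can be tried without touching the network. What follows is a minimal sketch for Python 3, where urlparse lives in urllib.parse and Request lives in urllib.request instead of urllib2. The name build_request and the DEFAULT_AGENT string are hypothetical and are not part of openanything.py. The sketch only builds the request object; it never opens it.

```python
from urllib.parse import urlparse
from urllib.request import Request

# Hypothetical default; openanything.py defines its own USER_AGENT.
DEFAULT_AGENT = 'ExampleClient/0.1'

def build_request(source, etag=None, lastmodified=None, agent=DEFAULT_AGENT):
    """Return a Request with conditional-GET headers, or None for non-HTTP sources."""
    # Index 0 of the parsed tuple is the scheme.
    if urlparse(source)[0] != 'http':
        return None
    request = Request(source)
    request.add_header('User-Agent', agent)
    if etag:
        request.add_header('If-None-Match', etag)
    if lastmodified:
        request.add_header('If-Modified-Since', lastmodified)
    request.add_header('Accept-encoding', 'gzip')
    return request

request = build_request('http://example.com/feed.xml', etag='"abc123"')
# header_items() lists every header that would be sent with the request.
for name, value in sorted(request.header_items()):
    print(name, value)
```

One thing to watch for if you inspect the result: urllib.request stores header names in capitalized form, so the ETag header is looked up as 'If-none-match' rather than 'If-None-Match'.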

Example 11.18. The fetch function

This function is defined in openanything.py.


def fetch(source, etag=None, last_modified=None, agent=USER_AGENT):  
    '''Fetch data and metadata from a URL, file, stream, or string'''
    result = {}                                                      
    f = openAnything(source, etag, last_modified, agent)              1
    result['data'] = f.read()                                         2
    if hasattr(f, 'headers'):                                        
        # save ETag, if the server sent one                          
        result['etag'] = f.headers.get('ETag')                        3
        # save Last-Modified header, if the server sent one          
        result['lastmodified'] = f.headers.get('Last-Modified')       4
        if f.headers.get('content-encoding', '') == 'gzip':           5
            # data came back gzip-compressed, decompress it          
            result['data'] = gzip.GzipFile(fileobj=StringIO(result['data'])).read()
    if hasattr(f, 'url'):                                             6
        result['url'] = f.url                                        
        result['status'] = 200                                       
    if hasattr(f, 'status'):                                          7
        result['status'] = f.status                                  
    f.close()                                                        
    return result                                                    
1 First, you call the openAnything function with a URL, ETag hash, Last-Modified date, and User-Agent.
2 Read the actual data returned from the server. This may be compressed; if so, you'll decompress it later.
3 Save the ETag hash returned from the server, so the calling application can pass it back to you next time. You then pass it on to openAnything, which sends it to the remote server in the If-None-Match header.
4 Save the Last-Modified date too.
5 If the server says that it sent compressed data, decompress it.
6 If you got a URL back from the server, save it, and assume that the status code is 200 until you find out otherwise.
7 If one of the custom URL handlers captured a status code, then save that too.
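
The decompression step can be checked in isolation. This is a minimal sketch for Python 3, where response bodies are bytes and io.BytesIO takes the place of StringIO. The helper name decode_body is hypothetical, and the compressed payload is produced locally with gzip.compress rather than fetched from a server.

```python
import gzip
import io

def decode_body(data, content_encoding=''):
    """Return data, decompressed if the server labeled it as gzip."""
    if content_encoding == 'gzip':
        # Wrap the bytes in a file-like object so GzipFile can read them.
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data

# Simulate a compressed response body instead of contacting a server.
original = b'<?xml version="1.0"?><feed/>'
compressed = gzip.compress(original)

print(decode_body(compressed, 'gzip') == original)
print(decode_body(original) == original)
```

As in fetch, the decision is driven by the Content-Encoding value the server sent, not by inspecting the data itself.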

Example 11.19. Using openanything.py

>>> import openanything
>>> useragent = 'MyHTTPWebServicesApp/1.0'
>>> url = 'https://book.diveintopython.org/redir/example301.xml'
>>> params = openanything.fetch(url, agent=useragent)              1
>>> params                                                         2
{'url': 'http://diveintomark.org/xml/atom.xml', 
'lastmodified': 'Thu, 15 Apr 2004 19:45:21 GMT', 
'etag': '"e842a-3e53-55d97640"', 
'status': 301,
'data': '<?xml version="1.0" encoding="iso-8859-1"?>
<feed version="0.3"
<-- rest of data omitted for brevity -->'}
>>> if params['status'] == 301:                                    3
...     url = params['url']
>>> newparams = openanything.fetch(
...     url, params['etag'], params['lastmodified'], useragent)    4
>>> newparams
{'url': 'http://diveintomark.org/xml/atom.xml', 
'lastmodified': None, 
'etag': '"e842a-3e53-55d97640"', 
'status': 304,
'data': ''}                                                        5
1 The very first time you fetch a resource, you don't have an ETag hash or Last-Modified date, so you'll leave those out. (They're optional parameters.)
2 What you get back is a dictionary of several useful headers, the HTTP status code, and the actual data returned from the server. openanything handles the gzip compression internally; you don't care about that at this level.
3 If you ever get a 301 status code, that's a permanent redirect, and you need to update your URL to the new address.
4 The second time you fetch the same resource, you have all sorts of information to pass back: a (possibly updated) URL, the ETag from the last time, the Last-Modified date from the last time, and of course your User-Agent.
5 What you get back is again a dictionary, but the data hasn't changed, so all you got was a 304 status code and no data.
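
The session above suggests a pattern a calling application might follow: keep the URL, ETag, Last-Modified date, and data between runs, and update them after each fetch. The sketch below is one way to do that, not part of openanything.py. The names refresh, state, and fake_fetch are hypothetical, and fake_fetch is a stand-in that imitates the two responses shown above so the example runs without a network connection.

```python
def refresh(state, fetch):
    """Fetch state['url'] conditionally and update state in place.

    `state` is a dict with 'url', 'etag', 'lastmodified', and 'data' keys.
    `fetch` is any callable that accepts (url, etag, last_modified) and
    returns a dict shaped like the one openanything.fetch returns.
    Returns True if new data arrived, False on a 304.
    """
    result = fetch(state['url'], state.get('etag'), state.get('lastmodified'))
    if result.get('status') == 301:
        # Permanent redirect: remember the new address for next time.
        state['url'] = result['url']
    if result.get('status') == 304:
        # Not modified: keep the cached data and validators.
        return False
    state['etag'] = result.get('etag')
    state['lastmodified'] = result.get('lastmodified')
    state['data'] = result['data']
    return True

def fake_fetch(url, etag=None, last_modified=None):
    """Stand-in for openanything.fetch that imitates the session above."""
    if etag == '"e842a-3e53-55d97640"':
        return {'url': url, 'etag': etag, 'lastmodified': None,
                'status': 304, 'data': ''}
    return {'url': 'http://diveintomark.org/xml/atom.xml',
            'etag': '"e842a-3e53-55d97640"',
            'lastmodified': 'Thu, 15 Apr 2004 19:45:21 GMT',
            'status': 301,
            'data': '<feed/>'}

state = {'url': 'http://example.org/old.xml', 'etag': None,
         'lastmodified': None, 'data': None}
first = refresh(state, fake_fetch)    # redirected, new data stored
second = refresh(state, fake_fetch)   # validators match, nothing new
```

Keeping the old validators on a 304 matters here because, as the second response above shows, a 304 may come back without a Last-Modified header.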