paste.httpheaders – Manipulate HTTP Headers

HTTP Message Header Fields (see RFC 4229)

This module contains general support for HTTP/1.1 message headers [1] in a manner that supports the WSGI environ [2] and response_headers [3]. Specifically, this module defines an HTTPHeader class whose instances correspond to field-name items. The actual field-content for the message-header is stored in the appropriate WSGI collection (either the environ for requests, or response_headers for responses).

Each HTTPHeader instance is a callable (defining __call__) that takes one of the following:

  • an environ dictionary, returning the corresponding header value according to WSGI's HTTP_ prefix mechanism, e.g., USER_AGENT(environ) returns environ.get('HTTP_USER_AGENT')

  • a response_headers list, returning a comma-delimited string of the field-values from every matching (field-name, field-value) tuple (see below).

  • a sequence of string *args that are comma-delimited into a single string value: CONTENT_TYPE("text/html","text/plain") returns "text/html, text/plain"

  • a set of **kwargs keyword arguments that are used to create a header value, in a manner dependent upon the particular header in question (to make value construction easier and error-free): CACHE_CONTROL(max_age=CACHE_CONTROL.ONEWEEK) returns "public, max-age=604800"
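The dispatch on argument type described in the list above can be modeled in a few lines. This is a minimal sketch, not paste's implementation: the class name SketchHeader is hypothetical, keyword-argument composition is left out, and header matching in response_headers is assumed to be case-insensitive.

```python
# Hypothetical model of the HTTPHeader.__call__ dispatch; not paste's code.
# Keyword-argument composition is omitted for brevity.


class SketchHeader:
    """A field-name object that looks up or builds field-values."""

    def __init__(self, name):
        self.name = name  # common form, e.g. "User-Agent"
        # WSGI key: "HTTP_" + upper-cased name with dashes turned into underscores
        self.environ_key = "HTTP_" + name.upper().replace("-", "_")

    def __call__(self, *args):
        if len(args) == 1 and isinstance(args[0], dict):
            # environ dictionary: use the HTTP_ prefix mechanism
            return args[0].get(self.environ_key)
        if len(args) == 1 and isinstance(args[0], list):
            # response_headers list: comma-join every matching field-value
            values = [v for (k, v) in args[0] if k.lower() == self.name.lower()]
            return ", ".join(values) if values else None
        # plain string arguments: comma-delimit them into one value
        return ", ".join(str(a) for a in args)


USER_AGENT = SketchHeader("User-Agent")
CONTENT_TYPE = SketchHeader("Content-Type")

print(USER_AGENT({"HTTP_USER_AGENT": "curl/8.0"}))
print(CONTENT_TYPE("text/html", "text/plain"))
print(CONTENT_TYPE([("content-type", "text/html"), ("Vary", "*")]))
```

The same object answers all three forms, which is what lets calling code stay independent of whether it holds an environ or a response_headers list.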

Each HTTPHeader instance also provides several methods to act on a WSGI collection, for removing and setting header values.

delete(collection)

This method removes all entries of the corresponding header from the given collection (environ or response_headers), e.g., USER_AGENT.delete(environ) deletes the ‘HTTP_USER_AGENT’ entry from the environ.
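A minimal sketch of what delete() has to do for each kind of collection, assuming case-insensitive matching in response_headers. The function delete_header is hypothetical and only illustrates the behavior described above.

```python
# Hypothetical helper illustrating delete(); not paste's code.


def delete_header(collection, name):
    """Remove every entry for `name` from an environ dict or response_headers list."""
    if isinstance(collection, dict):
        # environ: the header lives under a single HTTP_ key
        collection.pop("HTTP_" + name.upper().replace("-", "_"), None)
    else:
        # response_headers: drop all matching tuples in place (case-insensitive)
        collection[:] = [(k, v) for (k, v) in collection if k.lower() != name.lower()]


environ = {"HTTP_USER_AGENT": "curl/8.0", "PATH_INFO": "/"}
delete_header(environ, "User-Agent")

headers = [("Set-Cookie", "a=1"), ("Content-Type", "text/html"), ("set-cookie", "b=2")]
delete_header(headers, "Set-Cookie")
```

Assigning to collection[:] mutates the caller's list, so a response_headers list already handed around keeps its identity.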

update(collection, *args, **kwargs)

This method does an in-place replacement of the given header entry, for example: CONTENT_LENGTH.update(response_headers, len(body))

The first argument is a valid environ dictionary or response_headers list; remaining arguments are passed on to __call__(*args, **kwargs) for value construction.
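An in-place replacement on a response_headers list might look like the sketch below. It is an assumption-laden illustration rather than paste's code: update_header is a hypothetical name, only the list case is handled, the first matching entry keeps its position, later duplicates are dropped, and the header is appended if it was absent.

```python
# Hypothetical sketch of update() for a response_headers list; not paste's code.


def update_header(response_headers, name, *values):
    """Replace all entries for `name` with one comma-joined entry, in place."""
    new_value = ", ".join(str(v) for v in values)
    result, placed = [], False
    for (k, v) in response_headers:
        if k.lower() == name.lower():
            if not placed:
                # keep the position of the first occurrence
                result.append((name, new_value))
                placed = True
            # later occurrences are dropped
        else:
            result.append((k, v))
    if not placed:
        # header was missing: add it at the end
        result.append((name, new_value))
    response_headers[:] = result
    return new_value


body = b"hello"
headers = [("Content-Length", "0"), ("Content-Type", "text/plain")]
update_header(headers, "Content-Length", len(body))
```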

apply(collection, **kwargs)

This method is similar to update, except that it may also affect other headers. For example, according to recommendations in RFC 2616, certain Cache-Control configurations should also set the Expires header for HTTP/1.0 clients. By default, apply() is simply update() but limited to keyword arguments.
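The Cache-Control/Expires side effect can be sketched as follows. The function apply_cache_control and its now parameter are hypothetical, the "public" directive mirrors the example output shown earlier, and the date is formatted with the standard library's email.utils.formatdate.

```python
# Hypothetical sketch of an apply()-style side effect; not paste's code.
import time
from email.utils import formatdate


def apply_cache_control(response_headers, max_age, now=None):
    """Set Cache-Control and, as a side effect, a matching Expires header."""
    now = time.time() if now is None else now
    # Remove existing entries for both headers (case-insensitive)
    response_headers[:] = [
        (k, v) for (k, v) in response_headers
        if k.lower() not in ("cache-control", "expires")
    ]
    response_headers.append(("Cache-Control", "public, max-age=%d" % max_age))
    # HTTP/1.0 clients ignore max-age, so also give them an absolute date
    response_headers.append(("Expires", formatdate(now + max_age, usegmt=True)))
    return response_headers


headers = [("Expires", "old"), ("Content-Type", "text/html")]
apply_cache_control(headers, max_age=60, now=0)
```

Passing now explicitly keeps the result deterministic; in normal use it would default to the current time.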

This particular approach to managing headers within a WSGI collection has several advantages:

  1. Typos in the header name are easily detected since they become a NameError when executed. Using header strings directly can be problematic; for example, environ.get("HTTP_ACCEPT_LANGUAGES") silently returns None because of the misspelling (the correct key is HTTP_ACCEPT_LANGUAGE).

  2. For specific headers with validation, using __call__ will result in an automatic header value check. For example, the _CacheControl header will reject a value having maxage or max_age (the appropriate parameter is max-age).

  3. When appending/replacing headers, the field-name has the suggested RFC capitalization (e.g. Content-Type or ETag) for user-agents that incorrectly use case-sensitive matches.

  4. Some headers (such as Content-Type) are singular, that is, only one entry of this type may occur in a given set of response_headers. This module knows about those cases and enforces this cardinality constraint.

  5. The exact details of WSGI header management are abstracted so the programmer need not worry about operational differences between environ dictionary or response_headers list.

  6. Sorting of HTTPHeaders is done following the RFC suggestion that general-headers come first, followed by request and response headers, and finishing with entity-headers.

  7. Special care is given to exceptional cases such as Set-Cookie which violates the RFC’s recommendation about combining header content into a single entry using comma separation.

A particular difficulty with HTTP message headers is a categorization of sorts as described in section 4.2 of RFC 2616:

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)]. It MUST be possible to combine the multiple header fields into one “field-name: field-value” pair, without changing the semantics of the message, by appending each subsequent field-value to the first, each separated by a comma.
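The combining rule quoted above amounts to joining the field-values in order with commas. A minimal sketch, using Vary (a comma-separated list header in RFC 2616) as the example; combine_field_values is a hypothetical helper.

```python
# Hypothetical illustration of the RFC 2616 section 4.2 combining rule.


def combine_field_values(response_headers, name):
    """Merge repeated entries of a comma-list header into one (field-name, field-value)."""
    values = [v for (k, v) in response_headers if k.lower() == name.lower()]
    # append each subsequent field-value to the first, separated by a comma
    return (name, ", ".join(values))


headers = [("Vary", "Accept"), ("Content-Type", "text/html"), ("Vary", "Accept-Encoding")]
combined = combine_field_values(headers, "Vary")
```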

This creates three fundamentally different kinds of headers:

  • Those that do not have a #(values) production, and hence are singular and may only occur once in a set of response fields; this case is handled by the _SingleValueHeader subclass.

  • Those which have the #(values) production and follow the combining rule outlined above; our _MultiValueHeader case.

  • Those which are multi-valued but cannot be combined (such as the Set-Cookie header, due to its Expires parameter), or for which combining them into a single header entry would cause common user-agents to fail (WWW-Authenticate, Warning) because those user-agents mishandle dates even when properly quoted. This case is handled by _MultiEntryHeader.
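The three kinds of headers listed above lead to three different behaviors when a value is added. The sketch below is hypothetical: the SINGLE and MULTI_ENTRY tables cover only a few headers, and raising ValueError for a duplicate singular header is an assumed way of enforcing the cardinality constraint, not necessarily what paste does.

```python
# Hypothetical classification of a few headers; not paste's tables or code.
SINGLE = {"content-type", "content-length", "location"}
MULTI_ENTRY = {"set-cookie", "www-authenticate", "warning"}


def add_header(response_headers, name, value):
    """Add a value according to the header's kind (singular, combinable, multi-entry)."""
    key = name.lower()
    existing = [i for i, (k, _) in enumerate(response_headers) if k.lower() == key]
    if key in MULTI_ENTRY or not existing:
        # multi-entry headers (and first occurrences) get their own entry
        response_headers.append((name, value))
    elif key in SINGLE:
        # singular headers may only occur once
        raise ValueError("%s may only occur once" % name)
    else:
        # multi-value headers: fold into the existing entry with a comma
        i = existing[0]
        old_name, old_value = response_headers[i]
        response_headers[i] = (old_name, old_value + ", " + value)


h = []
add_header(h, "Set-Cookie", "a=1")
add_header(h, "Set-Cookie", "b=2")
add_header(h, "Vary", "Accept")
add_header(h, "Vary", "Cookie")
add_header(h, "Content-Type", "text/html")
```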

Since this project does not have time to provide rigorous support and validation for all headers, it does a basic construction of headers listed in RFC 2616 (plus a few others) so that they can be obtained by simply doing from paste.httpheaders import *; the name of each header instance is its “common name” in upper case with dashes replaced by underscores, as in CONTENT_TYPE or USER_AGENT.

[1] http://www.w3.org/Protocols/rfc2616/rfc2616-sec4.html#sec4.2

[2] http://www.python.org/peps/pep-0333.html#environ-variables

[3] http://www.python.org/peps/pep-0333.html#the-start-response-callable

class paste.httpheaders.EnvironVariable

a CGI environ variable as described by WSGI

This is a helper object so that standard WSGI environ variables can be extracted without the risk of an undetected misspelling.
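The idea behind such a helper can be sketched with a str subclass whose instances read their own key from an environ. EnvVar is a hypothetical stand-in for EnvironVariable, and returning None for a missing variable is an assumed default.

```python
# Hypothetical stand-in for EnvironVariable; not paste's implementation.


class EnvVar(str):
    """A WSGI environ key that reads its own value when called."""

    def __call__(self, environ):
        # returns None when the variable is absent (an assumed default)
        return environ.get(self)


REQUEST_METHOD = EnvVar("REQUEST_METHOD")
PATH_INFO = EnvVar("PATH_INFO")

environ = {"REQUEST_METHOD": "GET", "PATH_INFO": "/index"}
method = REQUEST_METHOD(environ)
# A misspelled name such as REQUEST_METHD raises NameError when executed,
# whereas environ.get("REQUEST_METHD") would quietly return None.
```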

class paste.httpheaders.HTTPHeader(name, category=None, reference=None, version=None)

an HTTP header

HTTPHeader instances represent a particular field-name of an HTTP message header. They do not hold a field-value, but instead provide operations that work on their corresponding values. Storage of the actual field values is done with the WSGI environ or response_headers as appropriate. Typically, the sub-classes that represent a specific HTTP header, such as _ContentDisposition, are singletons. Once constructed, HTTPHeader instances are immutable and stateless.

For purposes of documentation a “container” refers to either a WSGI environ dictionary, or a response_headers list.

Member variables (and correspondingly constructor arguments).

name

the field-name of the header, in “common form” as presented in RFC 2616; e.g. ‘Content-Type’

category

one of ‘general’, ‘request’, ‘response’, or ‘entity’

version

version of HTTP (informational) with which the header should be recognized

sort_order

sorting order to be applied before sorting on field-name when ordering headers in a response

Special Methods:

__call__

The primary method of the HTTPHeader instance is to make it callable: it takes either a collection, a string value, or keyword arguments and attempts to find or construct a valid field-value.

__lt__

This method is used so that HTTPHeader objects can be sorted in a manner suggested by RFC 2616.

__str__

The string-value for instances of this class is the field-name.

Primary Methods:

delete()

removes all occurrences (if any) of the given header in the collection provided

update()

replaces (if they exist) all field-value items in the given collection with the value provided

tuples()

returns (field-name, field-value) tuples for extending response_headers

Custom Methods (these may not be implemented):

apply()

similar to update, but with two differences; first, only keyword arguments can be used, and second, specific sub-classes may introduce side-effects

parse()

converts a string value of the header into a more usable form, such as time in seconds for a date header, etc.

Header instances are registered as soon as they are initialized and are accessible through the get_header function. Do not inherit from this class directly; use one of _SingleValueHeader, _MultiValueHeader, or _MultiEntryHeader as appropriate.

apply(collection, **kwargs)

update the collection with the header value (may have side effects)

This method is similar to update, except that using it may result in other headers being changed as recommended by the corresponding specification. The return value is defined by the particular sub-class. For example, _CacheControl.apply() sets the Expires header in addition to its normal behavior.

compose(**kwargs)

build header value from keyword arguments

This method is used to build the corresponding header value when keyword arguments (or no arguments) were provided. The result should be a sequence of values. For example, the Expires header takes a keyword argument time (e.g. time.time()) from which it returns the corresponding date.
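A compose() for the Expires header, as described above, could be sketched like this. compose_expires is a hypothetical name; the RFC 1123 date comes from the standard library's email.utils.formatdate, and raising TypeError when time is missing is an assumption standing in for "no default is an error".

```python
# Hypothetical sketch of compose() for Expires; not paste's code.
from email.utils import formatdate


def compose_expires(time=None):
    """Build the Expires field-value(s) from a POSIX timestamp."""
    if time is None:
        # assumed behavior: no default value, so missing input is an error
        raise TypeError("Expires has no default value; pass time=...")
    # compose() returns a sequence of values, here a single GMT date string
    return [formatdate(time, usegmt=True)]


value = compose_expires(time=0)
```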

delete(collection)

removes all occurrences of the header from the collection provided

parse(*args, **kwargs)

convert raw header value into more usable form

This method invokes values() with the arguments provided, parses the header results, and then returns a header-specific data structure corresponding to the header. For example, parsing the Expires header returns seconds (as returned by time.time()).
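The reverse direction, turning an Expires field-value back into seconds since the epoch, can be sketched with the standard library's email.utils.parsedate_to_datetime. parse_expires is a hypothetical name, and this only covers the date-parsing step, not the values() lookup that parse() performs first.

```python
# Hypothetical sketch of the parsing step for an Expires value; not paste's code.
from email.utils import parsedate_to_datetime


def parse_expires(value):
    """Convert an Expires field-value into seconds since the epoch."""
    # "GMT" yields a timezone-aware datetime, so timestamp() is unambiguous
    return parsedate_to_datetime(value).timestamp()


seconds = parse_expires("Thu, 01 Jan 1970 00:01:00 GMT")
```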

update(collection, *args, **kwargs)

updates the collection with the provided header value

This method replaces (in-place when possible) all occurrences of the given header with the provided value. If no value is provided, this is the same as delete() (note that this case can only occur if the target is a collection without a corresponding header value). The return value is the new header value (which could be a list for _MultiEntryHeader instances).

values(*args, **kwargs)

find/construct field-value(s) for the given header

Resolution is done according to the following arguments:

  • If only keyword arguments are given, then this is equivalent to compose(**kwargs).

  • If the first (and only) argument is a dict, it is assumed to be a WSGI environ and the result of the corresponding HTTP_ entry is returned.

  • If the first (and only) argument is a list, it is assumed to be a WSGI response_headers and the field-value(s) for this header are collected and returned.

  • In all other cases, the arguments are collected, checked that they are string values, possibly verified by the header’s logic, and returned.

At this time it is an error to provide keyword arguments if args is present (this might change). It is an error to provide both a WSGI object and also string arguments. If no arguments are provided, then compose() is called to provide a default value for the header; if there is no default, it is an error.

paste.httpheaders.get_header(name, raiseError=True)

find the given HTTPHeader instance

This function finds the corresponding HTTPHeader for the name provided. So that python-style names can be used, underscores are converted to dashes before the lookup.
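A registry with underscore-to-dash lookup, as described above, can be sketched as follows. RegisteredHeader and lookup_header are hypothetical names; the case-insensitive key and the LookupError raised for unknown names are assumptions, not documented paste behavior.

```python
# Hypothetical registry illustrating get_header-style lookup; not paste's code.

_REGISTRY = {}


class RegisteredHeader:
    def __init__(self, name):
        self.name = name
        # register under a lower-case key as soon as the instance is created
        _REGISTRY[name.lower()] = self

    def __str__(self):
        return self.name


def lookup_header(name, raiseError=True):
    """Find a registered header; python-style underscores become dashes."""
    header = _REGISTRY.get(str(name).replace("_", "-").lower())
    if header is None and raiseError:
        raise LookupError("unknown header: %r" % (name,))
    return header


RegisteredHeader("Content-Type")
RegisteredHeader("User-Agent")
```

With this sketch, lookup_header("content_type") and lookup_header("USER_AGENT") both resolve, while lookup_header("X_Missing", raiseError=False) returns None.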

paste.httpheaders.list_headers(general=None, request=None, response=None, entity=None)

list all headers for a given category

paste.httpheaders.normalize_headers(response_headers, strict=True)

sort headers as suggested by RFC 2616

This alters the underlying response_headers to use the common name for each header; as well as sorting them with general headers first, followed by request/response headers, then entity headers, and unknown headers last.
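The renaming and ordering described above can be sketched with a small lookup table and a stable sort. normalize_sketch and the KNOWN table are hypothetical and cover only a few headers (rank 0 general, 1 request/response, 2 entity, 3 unknown); the strict parameter is not modeled.

```python
# Hypothetical sketch of normalize_headers-style ordering; not paste's code.
# rank 0 = general, 1 = request/response, 2 = entity, 3 = unknown
KNOWN = {
    "cache-control": (0, "Cache-Control"),
    "date": (0, "Date"),
    "etag": (1, "ETag"),
    "location": (1, "Location"),
    "set-cookie": (1, "Set-Cookie"),
    "content-length": (2, "Content-Length"),
    "content-type": (2, "Content-Type"),
    "expires": (2, "Expires"),
}
UNKNOWN_RANK = 3


def _info(name):
    """Return (rank, common form) for a header name of any capitalization."""
    return KNOWN.get(name.lower(), (UNKNOWN_RANK, name))


def normalize_sketch(response_headers):
    """Rename headers to their common form and sort them in place."""
    renamed = [(_info(k)[1], v) for (k, v) in response_headers]
    # list.sort() is stable, so repeated headers (e.g. Set-Cookie) keep their order
    renamed.sort(key=lambda kv: (_info(kv[0])[0], kv[0].lower()))
    response_headers[:] = renamed


headers = [
    ("content-type", "text/html"),
    ("X-Custom", "1"),
    ("set-cookie", "b=2"),
    ("DATE", "Thu, 01 Jan 1970 00:00:00 GMT"),
    ("Set-Cookie", "a=1"),
]
normalize_sketch(headers)
```

Sorting only on (rank, name) rather than on whole tuples matters here: it leaves multi-entry headers such as Set-Cookie in their original relative order.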