The Unicode Chapter

In normal Mako operation, all parsed template constructs and output streams are handled internally as Python 3 str (Unicode) objects. Only at the point of Template.render() may this stream of Unicode objects be encoded into the desired output encoding. The implication is that the template developer must ensure three things: that the encoding of all non-ASCII templates is explicit (still required in Python 3, although Mako defaults to utf-8); that all non-ASCII-encoded expressions are in one way or another converted to str (not much of a burden in Python 3); and that the output stream of the template is handled as a Unicode stream being encoded to some encoding (still required in Python 3).

Specifying the Encoding of a Template File

Changed in version 1.1.3: As of Mako 1.1.3, the default template encoding is “utf-8”. Previously, a Python “magic encoding comment” was required for templates that were not using ASCII.

Mako templates support Python’s “magic encoding comment” syntax described in PEP 263:

## -*- coding: utf-8 -*-

Alors vous imaginez ma surprise, au lever du jour, quand
une drôle de petite voix m’a réveillé. Elle disait:
 « S’il vous plaît… dessine-moi un mouton! »

As an alternative, the template encoding can be specified programmatically to either Template or TemplateLookup via the input_encoding parameter:

t = TemplateLookup(directories=['./'], input_encoding='utf-8')

The above will assume all located templates specify utf-8 encoding, unless the template itself contains its own magic encoding comment, which takes precedence.

Handling Expressions

The next area where encoding comes into play is in expression constructs. By default, Mako’s treatment of an expression like this:

${"hello world"}

looks something like this:

context.write(str("hello world"))

That is, the output of all expressions is run through the str built-in. This is the default setting, and it can be modified to expect various encodings. The str step serves two purposes: it renders non-string expressions into strings (such as integers, or objects that define a __str__() method), and it ensures that the final output stream is constructed as a Unicode object. The main implication is that any raw byte strings containing non-ASCII-encoded data must first be decoded to a Python str object.

Similarly, if you are reading data from a file that is streaming bytes, or returning data from some object that is returning a Python byte-string containing a non-ASCII encoding, you have to explicitly decode to Unicode first, such as:

${call_my_object().decode('utf-8')}
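The effect of skipping that decode step can be seen in plain Python, without Mako. The following is a minimal sketch with illustrative values and variable names only. It shows that str() converts a number as expected, but when applied to a bytes object it returns the printable representation of the bytes rather than the decoded text:

```python
# Illustrative values only; nothing here is part of Mako's API.
text = "plaît"
raw = text.encode("utf-8")         # bytes, e.g. as read from a binary stream

as_number = str(42)                # non-string objects become strings
via_str = str(raw)                 # representation of the bytes object, not the text
via_decode = raw.decode("utf-8")   # the original text, recovered explicitly

print(as_number)
print(via_str)
print(via_decode)
```

Only the explicitly decoded value matches the original text, which is why byte strings should be decoded before they reach an expression.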

Note that filehandles acquired by open() in Python 3 default to returning “text”: that is, the decoding is done for you. See Python 3’s documentation for the open() built-in for details on this.
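As a rough illustration of the difference between text mode and binary mode, the sketch below writes a small UTF-8 file to a temporary location and reads it back both ways. The file contents and variable names are made up for the example:

```python
import os
import tempfile

# Create a small UTF-8 encoded file (sample data for illustration only).
with tempfile.NamedTemporaryFile(
    "w", encoding="utf-8", suffix=".txt", delete=False
) as f:
    f.write("dessine-moi un mouton, s’il vous plaît")
    path = f.name

try:
    # Text mode: open() decodes the bytes and read() returns str.
    with open(path, encoding="utf-8") as fh:
        as_text = fh.read()

    # Binary mode: read() returns bytes, which must be decoded by hand.
    with open(path, "rb") as fh:
        as_bytes = fh.read()
finally:
    os.remove(path)

print(type(as_text).__name__, type(as_bytes).__name__)
```

Passing encoding explicitly in text mode avoids depending on the platform's default encoding.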

If you want a certain encoding applied to all expressions, replace the default str filter with the decode filter at the Template or TemplateLookup level:

t = Template(templatetext, default_filters=['decode.utf8'])

Note that Mako’s decode filter is slower than the str function: unlike str, it is not a Python built-in, and it also checks the type of the incoming data to determine whether string conversion is needed first.
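The type check mentioned above can be pictured with a small stand-in function. The sketch below is an assumption about the general shape of such a filter, not Mako's actual implementation, and decode_utf8 is a hypothetical name:

```python
def decode_utf8(value):
    """Hypothetical stand-in for a decoding filter (not Mako's own code)."""
    if isinstance(value, str):
        return value                   # already text: pass through unchanged
    if isinstance(value, bytes):
        return value.decode("utf-8")   # raw bytes: decode explicitly
    return str(value)                  # anything else: fall back to str()


print(decode_utf8("plaît"))
print(decode_utf8("plaît".encode("utf-8")))
print(decode_utf8(42))
```

The extra isinstance() checks on every expression are the kind of per-call overhead that a bare str() call avoids.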

The default_filters argument can be used to entirely customize the filtering process of expressions. This argument is described in The default_filters Argument.

Defining Output Encoding

Now that we have a template which produces a pure Unicode output stream, all the hard work is done. We can take the output and do anything with it.

As stated in the “Usage” chapter, both Template and TemplateLookup accept output_encoding and encoding_errors parameters, which can be used to encode the output in any Python-supported codec:

from mako.template import Template
from mako.lookup import TemplateLookup

mylookup = TemplateLookup(directories=['/docs'], output_encoding='utf-8', encoding_errors='replace')

mytemplate = mylookup.get_template("foo.txt")
print(mytemplate.render())

Template.render() will return a bytes object in Python 3 if an output encoding is specified. By default it performs no encoding and returns a native string.

Template.render_unicode() will return the template output as a Python str object:

print(mytemplate.render_unicode())

The above method disregards the output encoding keyword argument; you can encode the result yourself:

print(mytemplate.render_unicode().encode('utf-8', 'replace'))
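The second argument to encode() is a standard Python codec error handler. With utf-8 it rarely comes into play, since that codec can represent any ordinary text. The plain-Python sketch below uses the ascii codec instead to show what 'replace' does compared with the default 'strict' handler; the sample string is illustrative:

```python
text = "S’il vous plaît"

# UTF-8 can represent every character here, so nothing is replaced.
utf8_bytes = text.encode("utf-8", "replace")

# ASCII cannot: each unencodable character becomes "?".
ascii_bytes = text.encode("ascii", "replace")

# The default handler, "strict", raises instead of substituting.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(utf8_bytes)
print(ascii_bytes)   # b'S?il vous pla?t'
print(strict_failed)
```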