.. _py2-versus-py3:

Python 2 versus Python 3
========================

.. currentmodule:: patsy

The biggest difference between Python 2 and Python 3 is in their
string handling, and this is particularly relevant to Patsy since
it parses user input. We follow a simple rule: input to Patsy
should always be of type ``str``. That means that on Python 2, you
should pass byte-strings (not unicode), and on Python 3, you should
pass unicode strings (not byte-strings). Similarly, when Patsy
passes text back (e.g. :attr:`DesignInfo.column_names`), it's always
in the form of a ``str``.

In addition to this being the most convenient for users (you never
need to use any b"weird" u"prefixes" when writing a formula string),
it's actually a necessary consequence of a deeper change in the Python
language: in Python 2, Python code itself is represented as
byte-strings, and that's the only form of input accepted by the
:mod:`tokenize` module. On the other hand, Python 3's tokenizer and
parser use unicode, and since Patsy processes Python code, it has
to follow suit.

There is one exception to this rule: on Python 2, as a convenience for
those using ``from __future__ import unicode_literals``, the
high-level API functions :func:`dmatrix`, :func:`dmatrices`,
:func:`incr_dbuilders`, and :func:`incr_dbuilder` do accept
``unicode`` strings -- BUT these unicode string objects are still
required to contain only ASCII characters; if they contain any
non-ASCII characters then an error will be raised. If you really need
non-ASCII in your formulas, then you should consider upgrading to
Python 3. Low-level APIs like :meth:`ModelDesc.from_formula` continue
to insist on ``str`` objects only.