Pandas support

Warning: pandas support is currently experimental; don’t expect everything to work.

It is convenient to use the Pandas package when dealing with numerical data, so Pint provides PintArray. A PintArray is a Pandas Extension Array, which allows Pandas to recognise the Quantity and store it in Pandas DataFrames and Series.

Installation

Pandas support is provided by the pint-pandas package. To install it use either:

python -m pip install pint-pandas

Or:

conda install -c conda-forge pint-pandas

Basic example

This example shows the simplest way to use pandas with pint, along with the underlying objects. It’s slightly fiddly because the data is not read from a file. A more typical use case is given in Reading from csv.

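First some imports. In the Pint version documented here you don’t need to import pint_pandas for this to work; if the pint[...] dtype is not recognised in your installation, import pint_pandas as well.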

[1]:
import pandas as pd
import pint

Next, we create a DataFrame with PintArrays as columns.

[2]:
df = pd.DataFrame({
    "torque": pd.Series([1, 2, 2, 3], dtype="pint[lbf ft]"),
    "angular_velocity": pd.Series([1, 2, 2, 3], dtype="pint[rpm]"),
})
df

Operations on columns are unit-aware, so they behave as we would intuitively expect.

[3]:
df['power'] = df['torque'] * df['angular_velocity']
df

We can see the columns’ units in the dtypes attribute

[4]:
df.dtypes

Each column can be accessed as a Pandas Series

[5]:
df.power

Which contains a PintArray

[6]:
df.power.values

The PintArray contains a Quantity

[7]:
df.power.values.quantity

Pandas Series accessors are provided for most Quantity properties and methods, which will convert the result to a Series where possible.

[8]:
df.power.pint.units
[9]:
df.power.pint.to("kW").values
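
The conversion to kW can be checked by hand. The sketch below uses only the standard library and the exact definitions 1 lbf = 4.4482216152605 N and 1 ft = 0.3048 m. It assumes, as pint does, that one revolution is 2π dimensionless radians. The helper name lbf_ft_rpm_to_kw is illustrative.

```python
import math

LBF_IN_NEWTON = 4.4482216152605      # exact by definition of the pound-force
FOOT_IN_METRE = 0.3048               # exact by definition of the foot
RPM_IN_RAD_PER_S = 2 * math.pi / 60  # one revolution per minute, in rad/s


def lbf_ft_rpm_to_kw(torque_lbf_ft, speed_rpm):
    """Convert torque [lbf ft] times angular velocity [rpm] to power in kW."""
    torque_nm = torque_lbf_ft * LBF_IN_NEWTON * FOOT_IN_METRE
    omega = speed_rpm * RPM_IN_RAD_PER_S
    return torque_nm * omega / 1000.0


torque = [1, 2, 2, 3]
angular_velocity = [1, 2, 2, 3]
power_kw = [lbf_ft_rpm_to_kw(t, w) for t, w in zip(torque, angular_velocity)]
print(power_kw)
```

The first entry is roughly 1.42e-4 kW, that is, about 0.142 W for 1 lbf ft at 1 rpm.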

Reading from csv

Reading from files is by far the more common way to use pandas. To facilitate this, DataFrame accessors are provided to make it easy to get to PintArrays.

[10]:
import pandas as pd
import pint
import pint_pandas
import io

Here are the contents of the csv file.

[11]:
test_data = '''speed,mech power,torque,rail pressure,fuel flow rate,fluid power
rpm,kW,N m,bar,l/min,kW
1000.0,,10.0,1000.0,10.0,
1100.0,,10.0,100000000.0,10.0,
1200.0,,10.0,1000.0,10.0,
1200.0,,10.0,1000.0,10.0,'''

Let’s read that into a DataFrame. Here io.StringIO stands in for a file on disk; normally you would pass a csv file path, as shown in the commented-out line.

[12]:
df = pd.read_csv(io.StringIO(test_data), header=[0, 1])
# df = pd.read_csv("/path/to/test_data.csv", header=[0, 1])
df
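
Passing header=[0, 1] tells pandas to treat the first two rows as column labels. This produces MultiIndex columns with the names on the top level and the units on the bottom level. A minimal sketch with a shortened, two-column version of the data (the csv_text name and the reduced data are illustrative):

```python
import io

import pandas as pd

csv_text = """speed,torque
rpm,N m
1000.0,10.0
1200.0,10.0"""

raw = pd.read_csv(io.StringIO(csv_text), header=[0, 1])

# The columns are a two-level MultiIndex of (name, unit) pairs.
names = list(raw.columns.get_level_values(0))
units = list(raw.columns.get_level_values(-1))
print(names)
print(units)
```

The bottom level, addressed here as level -1, is the one that holds the unit strings.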

Then use the quantify method of the DataFrame’s pint accessor to convert the columns from np.ndarrays to PintArrays, with units taken from the bottom column level.

[13]:
df.dtypes
[14]:
df_ = df.pint.quantify(level=-1)
df_

As before, operations between DataFrame columns are unit-aware.

[15]:
df_.speed * df_.torque
[16]:
df_
[17]:
df_['mech power'] = df_.speed * df_.torque
df_['fluid power'] = df_['fuel flow rate'] * df_['rail pressure']
df_

The DataFrame’s pint.dequantify method then allows us to retrieve the units information as a header row once again.

[18]:
df_.pint.dequantify()

This makes some powerful operations easy. For example, to change the units of individual columns:

[19]:
df_['fluid power'] = df_['fluid power'].pint.to("kW")
df_['mech power'] = df_['mech power'].pint.to("kW")
df_.pint.dequantify()

The units are harder to read than they need to be, so let’s change Pint’s default format for displaying units.

[20]:
pint_pandas.PintType.ureg.default_format = "~P"
df_.pint.dequantify()

We can also convert the entire table’s units, here to base units:

[21]:
df_.pint.to_base_units().pint.dequantify()

Advanced example

This example shows alternative ways to use pint with pandas and other features.

Start with the same imports.

[22]:
import pandas as pd
import pint
import pint_pandas

We’ll use a shorthand for PintArray.

[23]:
PA_ = pint_pandas.PintArray

And set up a unit registry and quantity shorthand.

[24]:
ureg = pint.UnitRegistry()
Q_ = ureg.Quantity

Operations between PintArrays that use different unit registries will not work. To prevent this issue, we can change the unit registry that is used when creating new PintArrays.

[25]:
pint_pandas.PintType.ureg = ureg

These are the possible ways to create a PintArray.

Note that pint[unit] must be used for the Series constructor, whereas the PintArray constructor accepts either a unit string or a unit object.

[26]:
df = pd.DataFrame({
        "length" : pd.Series([1,2], dtype="pint[m]"),
        "width" : PA_([2,3], dtype="pint[m]"),
        "distance" : PA_([2,3], dtype="m"),
        "height" : PA_([2,3], dtype=ureg.m),
        "depth" : PA_.from_1darray_quantity(Q_([2,3],ureg.m)),
    })
df
[27]:
df.length.values.units