Interfacing with the Pandas Package

The pandas package provides high-performance data analysis of table-like structures and is complementary to the Table class in astropy.

To exchange data between the Table class and the pandas.DataFrame class (the main data structure in pandas), the Table class provides two methods, to_pandas() and from_pandas().

Example

To demonstrate, we can create a minimal table:

>>> from astropy.table import Table
>>> t = Table()
>>> t['a'] = [1, 2, 3, 4]
>>> t['b'] = ['a', 'b', 'c', 'd']

We can then convert this table to a DataFrame:

>>> df = t.to_pandas()
>>> df
   a  b
0  1  a
1  2  b
2  3  c
3  4  d
>>> type(df)
<class 'pandas.core.frame.DataFrame'>

It is also possible to create a table from a DataFrame:

>>> t2 = Table.from_pandas(df)
>>> t2
<Table length=4>
  a      b
int64 string8
----- -------
    1       a
    2       b
    3       c
    4       d

The conversions to and from pandas are subject to the following caveats:

  • The DataFrame structure does not support multidimensional columns, so Table objects with multidimensional columns cannot be converted to DataFrame.

  • Masked tables can be converted, but in columns of float or string values the resulting DataFrame uses numpy.nan to indicate missing values. For float columns, the conversion therefore does not necessarily round-trip when converting back to an astropy table, because the distinction between numpy.nan and masked values is lost. This is not a problem for integer columns.

  • Tables with Mixin Columns cannot be converted.