Interfacing with the Pandas Package¶
The astropy.timeseries
package is not the only package to provide
functionality related to time series. Another notable package is pandas, which provides a pandas.DataFrame
class. The main benefits of astropy.timeseries
in the context of astronomical
research are the following:
The time column is a
Time
object that supports very high precision representation of times, and makes it easy to convert between different time scales and formats (e.g., ISO 8601 timestamps, Julian Dates, and so on).The data columns can include
Quantity
objects with units.The
BinnedTimeSeries
class includes variable-width time bins.There are built-in readers for common time series file formats, as well as the ability to define custom readers/writers.
Nevertheless, there are cases where using pandas DataFrame
objects might make sense, so we provide methods to convert to/from
DataFrame
objects.
Example¶
Consider a concise example starting from a DataFrame
:
>>> import pandas
>>> import numpy as np
>>> from astropy.utils.introspection import minversion
>>> df = pandas.DataFrame()
>>> df['a'] = [1, 2, 3]
>>> times = np.array(['2015-07-04', '2015-07-05', '2015-07-06'], dtype=np.datetime64)
>>> df.set_index(pandas.DatetimeIndex(times), inplace=True)
>>> df
a
2015-07-04 1
2015-07-05 2
2015-07-06 3
We can convert this to an astropy
TimeSeries
using
from_pandas()
:
>>> from astropy.timeseries import TimeSeries
>>> ts = TimeSeries.from_pandas(df)
>>> ts
<TimeSeries length=3>
time a
Time int64
----------------------------- -----
2015-07-04T00:00:00.000000000 1
2015-07-05T00:00:00.000000000 2
2015-07-06T00:00:00.000000000 3
Converting to DataFrame
can also be done with
to_pandas()
:
>>> ts['b'] = [1.2, 3.4, 5.4]
>>> df_new = ts.to_pandas()
>>> df_new
a b
time
2015-07-04 1 1.2
2015-07-05 2 3.4
2015-07-06 3 5.4
Missing values in the time column are supported and correctly converted to a pandas’ NaT object:
>>> ts.time[2] = np.nan
>>> ts
<TimeSeries length=3>
time a b
Time int64 float64
----------------------------- ----- -------
2015-07-04T00:00:00.000000000 1 1.2
2015-07-05T00:00:00.000000000 2 3.4
-- 3 5.4
>>> df_missing = ts.to_pandas()
>>> df_missing
a b
time
2015-07-04 1 1.2
2015-07-05 2 3.4
NaT 3 5.4