Version 0.17.1 (November 21, 2015)
Note
We are proud to announce that pandas has become a sponsored project of the NumFOCUS organization. This will help ensure the success of the development of pandas as a world-class open-source project.
This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along with several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.
Highlights include:
Support for conditional HTML formatting, see the "Conditional HTML formatting" section
Releasing the GIL on the csv reader and other ops, see the "Performance improvements" section
Fixed regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
New features
Conditional HTML formatting
Warning
This is a new feature and is under active development. We’ll be adding features and possibly making breaking changes in future releases. Feedback is welcome in GH11610.
We’ve added experimental support for conditional HTML formatting: the visual styling of a DataFrame based on the data. The styling is accomplished with HTML and CSS. Access the styler class with the pandas.DataFrame.style attribute, which returns an instance of Styler with your data attached.
Here’s a quick example:
    In [1]: np.random.seed(123)
    In [2]: df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))
    In [3]: html = df.style.background_gradient(cmap="viridis", low=0.5)
We can render the HTML to get the following table.
 | a | b | c | d | e |
---|---|---|---|---|---|
0 | -1.085631 | 0.997345 | 0.282978 | -1.506295 | -0.5786 |
1 | 1.651437 | -2.426679 | -0.428913 | 1.265936 | -0.86674 |
2 | -0.678886 | -0.094709 | 1.49139 | -0.638902 | -0.443982 |
3 | -0.434351 | 2.20593 | 2.186786 | 1.004054 | 0.386186 |
4 | 0.737369 | 1.490732 | -0.935834 | 1.175829 | -1.253881 |
5 | -0.637752 | 0.907105 | -1.428681 | -0.140069 | -0.861755 |
6 | -0.255619 | -2.798589 | -1.771533 | -0.699877 | 0.927462 |
7 | -0.173636 | 0.002846 | 0.688223 | -0.879536 | 0.283627 |
8 | -0.805367 | -1.727669 | -0.3909 | 0.573806 | 0.338589 |
9 | -0.01183 | 2.392365 | 0.412912 | 0.978736 | 2.238143 |
Styler interacts nicely with the Jupyter Notebook. See the documentation for more.
Enhancements
DatetimeIndex now supports conversion to strings with astype(str) (GH10442)
Support for compression (gzip/bz2) in pandas.DataFrame.to_csv() (GH7615)
pd.read_* functions can now also accept pathlib.Path or py._path.local.LocalPath objects for the filepath_or_buffer argument (GH11033)
The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with tildes (e.g. ~/Documents/) (GH11438)
DataFrame now uses the fields of a namedtuple as columns, if columns are not supplied (GH11181)
DataFrame.itertuples() now returns namedtuple objects, when possible (GH11269, GH11625)
Added axvlines_kwds to parallel coordinates plot (GH10709)
Option to .info() and .memory_usage() to provide for deep introspection of memory consumption. Note that this can be expensive to compute and therefore is an optional parameter. (GH11595)
    In [4]: df = pd.DataFrame({"A": ["foo"] * 1000})
    In [5]: df["B"] = df["A"].astype("category")
    # shows the '+' as we have object dtypes
    In [6]: df.info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1000 entries, 0 to 999
    Data columns (total 2 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   A       1000 non-null   object
     1   B       1000 non-null   category
    dtypes: category(1), object(1)
    memory usage: 9.0+ KB
    # we have an accurate memory assessment (but can be expensive to compute this)
    In [7]: df.info(memory_usage="deep")
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 1000 entries, 0 to 999
    Data columns (total 2 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   A       1000 non-null   object
     1   B       1000 non-null   category
    dtypes: category(1), object(1)
    memory usage: 59.9 KB
Index now has a fillna method (GH10089)
    In [8]: pd.Index([1, np.nan, 3]).fillna(2)
    Out[8]: Float64Index([1.0, 2.0, 3.0], dtype='float64')
Series of type category now make .str.<...> and .dt.<...> accessor methods / properties available, if the categories are of that type. (GH10661)
    In [9]: s = pd.Series(list("aabb")).astype("category")
    In [10]: s
    Out[10]:
    0    a
    1    a
    2    b
    3    b
    Length: 4, dtype: category
    Categories (2, object): ['a', 'b']
    In [11]: s.str.contains("a")
    Out[11]:
    0     True
    1     True
    2    False
    3    False
    Length: 4, dtype: bool
    In [12]: date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category")
    In [13]: date
    Out[13]:
    0   2015-01-01
    1   2015-01-02
    2   2015-01-03
    3   2015-01-04
    4   2015-01-05
    Length: 5, dtype: category
    Categories (5, datetime64[ns]): [2015-01-01, 2015-01-02, 2015-01-03, 2015-01-04, 2015-01-05]
    In [14]: date.dt.day
    Out[14]:
    0    1
    1    2
    2    3
    3    4
    4    5
    Length: 5, dtype: int64
pivot_table now has a margins_name argument so you can use something other than the default of ‘All’ (GH3335)
Implement export of datetime64[ns, tz] dtypes with a fixed HDF5 store (GH11411)
Pretty printing sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of Legacy Python syntax (set([x, y])) (GH11215)
Improve the error message in pandas.io.gbq.to_gbq() when a streaming insert fails (GH11285) and when the DataFrame does not match the schema of the destination table (GH11359)
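Several of the enhancements listed above can be tried together in a short script. The following is a minimal sketch rather than an excerpt from the pandas documentation: the Point type, the sales data, and the file name sales.csv.gz are hypothetical names chosen for illustration, and the exact printed output may differ between pandas versions.

```python
import tempfile
from collections import namedtuple
from pathlib import Path

import pandas as pd

# Hypothetical record type, used only for illustration.
Point = namedtuple("Point", ["x", "y"])

# Column names are taken from the namedtuple fields when `columns` is omitted.
df = pd.DataFrame([Point(1, 2), Point(3, 4)])
print(list(df.columns))

# itertuples() yields namedtuple rows, so fields are available by name.
first_row = next(df.itertuples())
print(first_row.x, first_row.y)

# DatetimeIndex can be converted to strings with astype(str).
dates_as_str = pd.date_range("2015-01-01", periods=2).astype(str)
print(list(dates_as_str))

# pivot_table with a custom label for the margins row and column.
sales = pd.DataFrame(
    {"region": ["N", "N", "S"], "product": ["a", "b", "a"], "units": [1, 2, 3]}
)
table = sales.pivot_table(
    values="units",
    index="region",
    columns="product",
    aggfunc="sum",
    margins=True,
    margins_name="Total",
)
print(table)

# Write a gzip-compressed CSV, then read it back through a pathlib.Path.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "sales.csv.gz"
    sales.to_csv(str(csv_path), index=False, compression="gzip")
    round_trip = pd.read_csv(csv_path, compression="gzip")
print(round_trip.shape)

# deep=True asks pandas to inspect the column contents, not just the pointers.
shallow = sales.memory_usage().sum()
deep = sales.memory_usage(deep=True).sum()
print(shallow, deep)
```

The deep memory figure is at least as large as the shallow one; how much larger depends on the string storage used by the installed pandas version.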
API changes
Raise NotImplementedError in Index.shift for non-supported index types (GH8038)
min and max reductions on datetime64 and timedelta64 dtyped series now result in NaT and not nan (GH11245)
Indexing with a null key will raise a TypeError, instead of a ValueError (GH11356)
Series.ptp will now ignore missing values by default (GH11163)
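Two of the API changes above are easy to observe directly. The snippet below is a small sketch, assuming a reasonably recent pandas; the all-missing Series and the plain integer Index are illustrative inputs chosen here, not cases taken from the linked issues.

```python
import pandas as pd

# An all-missing datetime64 Series: min/max give NaT rather than a float nan.
stamps = pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]")
print(stamps.min() is pd.NaT, stamps.max() is pd.NaT)

# The same holds for timedelta64 data.
deltas = pd.Series([pd.NaT, pd.NaT], dtype="timedelta64[ns]")
print(deltas.min() is pd.NaT)

# Index.shift is only meaningful for datetime-like indexes; a plain
# integer Index is used here as an example of a non-supported type.
try:
    pd.Index([1, 2, 3]).shift(1)
    shift_error = None
except NotImplementedError as exc:
    shift_error = exc
print(type(shift_error).__name__)
```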
Deprecations
Performance improvements
Checking monotonic-ness before sorting on an index (GH11080)
Series.dropna performance improvement when its dtype can’t contain NaN (GH11159)
Release the GIL on most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), normalization, and conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (GH11263)
Release the GIL on some rolling algos: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (GH11450)
Release the GIL when reading and parsing text files in read_csv, read_table (GH11272)
Improved performance of rolling_median (GH11450)
Improved performance of to_excel (GH11352)
Performance bug in repr of Categorical categories, which was rendering the strings before chopping them for display (GH11305)
Performance improvement in Categorical.remove_unused_categories (GH11643)
Improved performance of Series constructor with no data and DatetimeIndex (GH11433)
Improved performance of shift, cumprod, and cumsum with groupby (GH4095)
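Because read_csv releases the GIL while parsing, several files can in principle be parsed concurrently from ordinary Python threads. The sketch below shows the pattern with the standard library's ThreadPoolExecutor. The in-memory payloads and the parse helper are hypothetical stand-ins for real files, and any actual speedup depends on the parser engine, the data, and the number of available cores, so no timings are claimed here.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Hypothetical in-memory CSV payloads standing in for files on disk.
payloads = [
    "a,b\n" + "\n".join(f"{i},{i * k}" for i in range(1000))
    for k in range(1, 5)
]


def parse(text):
    # Each call runs pd.read_csv inside a worker thread of the pool.
    return pd.read_csv(io.StringIO(text))


# Parse all payloads using a small thread pool; map() preserves input order.
with ThreadPoolExecutor(max_workers=4) as pool:
    frames = list(pool.map(parse, payloads))

combined = pd.concat(frames, ignore_index=True)
print(combined.shape)
```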
Bug fixes
SparseArray.__iter__() now does not cause PendingDeprecationWarning in Python 3.5 (GH11622)
Regression from 0.16.2 for output formatting of long floats/nan, restored in (GH11302)
Series.sort_index() now correctly handles the inplace option (GH11402)
Incorrectly distributed .c file in the build on PyPi when reading a csv of floats and passing na_values=<a scalar> would show an exception (GH11374)
Bug in .to_latex() output broken when the index has a name (GH10660)
Bug in HDFStore.append with strings whose encoded length exceeded the max unencoded length (GH11234)
Bug in merging datetime64[ns, tz] dtypes (GH11405)
Bug in HDFStore.select when comparing with a numpy scalar in a where clause (GH11283)
Bug in using DataFrame.ix with a MultiIndex indexer (GH11372)
Bug in date_range with ambiguous endpoints (GH11626)
Prevent adding new attributes to the accessors .str, .dt and .cat. Retrieving such a value was not possible, so error out on setting it. (GH10673)
Bug in tz-conversions with an ambiguous time and .dt accessors (GH11295)
Bug in output formatting when using an index of ambiguous times (GH11619)
Bug in comparisons of Series vs list-likes (GH11339)
Bug in DataFrame.replace with a datetime64[ns, tz] and a non-compat to_replace (GH11326, GH11153)
Bug in isnull where numpy.datetime64('NaT') in a numpy.array was not determined to be null (GH11206)
Bug in list-like indexing with a mixed-integer Index (GH11320)
Bug in pivot_table with margins=True when indexes are of Categorical dtype (GH10993)
Bug in DataFrame.plot cannot use hex strings colors (GH10299)
Regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (GH11376)
Bug in pd.eval where unary ops in a list error (GH11235)
Bug in describe() dropping column names for hierarchical indexes (GH11517)
Bug in DataFrame.pct_change() not propagating axis keyword on .fillna method (GH11150)
Bug in .to_csv() when a mix of integer and string column names are passed as the columns parameter (GH11637)
Bug in indexing with a range (GH11652)
Bug in inference of numpy scalars and preserving dtype when setting columns (GH11638)
Bug in to_sql using unicode column names giving UnicodeEncodeError with (GH11431)
Fix regression in setting of xticks in plot (GH11529)
Bug in holiday.dates where observance rules could not be applied to holiday and doc enhancement (GH11477, GH11533)
Fix plotting issues when having plain Axes instances instead of SubplotAxes (GH11520, GH11556)
Bug in DataFrame.to_latex() produces an extra rule when header=False (GH7124)
Bug in df.groupby(...).apply(func) when a func returns a Series containing a new datetimelike column (GH11324)
Bug in pandas.json when file to load is big (GH11344)
Bugs in to_excel with duplicate columns (GH11007, GH10982, GH10970)
Fixed a bug that prevented the construction of an empty series of dtype datetime64[ns, tz] (GH11245)
Bug in read_excel with MultiIndex containing integers (GH11317)
Bug in to_excel with openpyxl 2.2+ and merging (GH11408)
Bug in DataFrame.to_dict() produces a np.datetime64 object instead of Timestamp when only datetime is present in data (GH11327)
Bug in DataFrame.corr() raises exception when computes Kendall correlation for DataFrames with boolean and not boolean columns (GH11560)
Bug in the link-time error caused by C inline functions on FreeBSD 10+ (with clang) (GH10510)
Bug in DataFrame.to_csv in passing through arguments for formatting MultiIndexes, including date_format (GH7791)
Bug in DataFrame.join() with how='right' producing a TypeError (GH11519)
Bug in Series.quantile with empty list results has Index with object dtype (GH11588)
Bug in pd.merge results in empty Int64Index rather than Index(dtype=object) when the merge result is empty (GH11588)
Bug in Categorical.remove_unused_categories when having NaN values (GH11599)
Bug in DataFrame.to_sparse() loses column names for MultiIndexes (GH11600)
Bug in DataFrame.round() with non-unique column index producing a Fatal Python error (GH11611)
Bug in DataFrame.round() with decimals being a non-unique indexed Series producing extra columns (GH11618)
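After upgrading, a quick sanity check of the behaviours touched by the drop_duplicates and round fixes might look like the sketch below. It does not reproduce the original bug reports; the data frames are hypothetical and only confirm the expected results on small inputs.

```python
import pandas as pd

# Hypothetical integer data with exact duplicate rows, including
# negative and large values.
df = pd.DataFrame(
    {"x": [1, -1, 1, 2**40, 2**40], "y": [10, 10, 10, -7, -7]}
)
deduped = df.drop_duplicates()
print(deduped)

# Rounding with per-column decimals supplied as a uniquely indexed Series
# should leave the set of columns unchanged.
values = pd.DataFrame({"p": [1.234, 5.678], "q": [0.111, 0.999]})
rounded = values.round(pd.Series({"p": 1, "q": 2}))
print(rounded)
```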