Version 0.14.1 (July 11, 2014)
This is a minor release from 0.14.0 and includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.
Highlights include:
New methods select_dtypes() to select columns based on the dtype and sem() to calculate the standard error of the mean.
Support for dateutil timezones (see docs).
Support for ignoring full line comments in the read_csv() text parser.
New documentation section on Options and Settings.
Lots of bug fixes.
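As a rough illustration of the two new methods named in the highlights, the sketch below selects numeric columns with select_dtypes() and compares sem() against a manual calculation. The frame and its column names are made up for the example, and the manual formula assumes the default of one delta degree of freedom.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column and two numeric columns.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c", "d"],
        "count": [1, 2, 3, 4],
        "score": [1.0, 2.0, 3.0, 4.0],
    }
)

# Keep only the columns whose dtype is numeric.
numeric = df.select_dtypes(include=[np.number])
numeric_columns = list(numeric.columns)

# Standard error of the mean: sample standard deviation divided by sqrt(n).
score_sem = df["score"].sem()
manual_sem = df["score"].std(ddof=1) / np.sqrt(len(df))

print(numeric_columns)
print(score_sem, manual_sem)
```

The two printed standard errors should agree up to floating point rounding.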
API changes
Openpyxl now raises a ValueError on construction of the openpyxl writer instead of warning on pandas import (GH7284).
For StringMethods.extract, when no match is found, the result - only containing NaN values - now also has dtype=object instead of float (GH7242)
Period objects no longer raise a TypeError when compared using == with another object that isn’t a Period. Instead, such a comparison now returns False. (GH7376)
Previously, the behaviour on resetting the time or not in offsets.apply, rollforward and rollback operations differed between offsets. With the support of the normalize keyword for all offsets (see below) with a default value of False (preserve time), the behaviour changed for certain offsets (BusinessMonthBegin, MonthEnd, BusinessMonthEnd, CustomBusinessMonthEnd, BusinessYearBegin, LastWeekOfMonth, FY5253Quarter, Easter):

    In [6]: from pandas.tseries import offsets

    In [7]: d = pd.Timestamp('2014-01-01 09:00')

    # old behaviour < 0.14.1
    In [8]: d + offsets.MonthEnd()
    Out[8]: pd.Timestamp('2014-01-31 00:00:00')

Starting from 0.14.1 all offsets preserve time by default. The old behaviour can be obtained with normalize=True

    # new behaviour
    In [1]: d + offsets.MonthEnd()
    Out[1]: Timestamp('2014-01-31 09:00:00')

    In [2]: d + offsets.MonthEnd(normalize=True)
    Out[2]: Timestamp('2014-01-31 00:00:00')

Note that for the other offsets the default behaviour did not change.
Add back #N/A N/A as a default NA value in text parsing (regression from 0.12) (GH5521)
Raise a TypeError on inplace-setting with a .where and a non np.nan value as this is inconsistent with a set-item expression like df[mask] = None (GH7656)
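Two of the API changes above are easy to check interactively. The sketch below compares a Period with a non-Period object using == and parses a small CSV string containing the restored #N/A N/A marker. The sample values and variable names are illustrative only.

```python
from io import StringIO

import pandas as pd

p = pd.Period("2014-07", freq="M")

# Comparing against another Period with the same value and frequency.
same = p == pd.Period("2014-07", freq="M")

# Comparing against a non-Period object returns False instead of raising.
other_type = p == "2014-07"

# "#N/A N/A" is treated as a missing value by the text parser by default.
csv_text = "a,b\n1,#N/A N/A\n2,3\n"
parsed = pd.read_csv(StringIO(csv_text))

print(same, other_type)
print(parsed["b"].isna().tolist())
```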
Enhancements
Add dropna argument to value_counts and nunique (GH5569).
Add select_dtypes() method to allow selection of columns based on dtype (GH7316). See the docs.
All offsets support the normalize keyword to specify whether offsets.apply, rollforward and rollback reset the time (hour, minute, etc) or not (default False, preserves time) (GH7156):

    import pandas as pd
    import pandas.tseries.offsets as offsets

    # default: the time of day is preserved
    day = offsets.Day()
    day.apply(pd.Timestamp("2014-01-01 09:00"))

    # normalize=True: the time of day is reset to midnight
    day = offsets.Day(normalize=True)
    day.apply(pd.Timestamp("2014-01-01 09:00"))

PeriodIndex is represented in the same format as DatetimeIndex (GH7601)
StringMethods now work on empty Series (GH7242)
The file parsers read_csv and read_table now ignore line comments provided by the parameter comment, which accepts only a single character for the C reader. In particular, they allow for comments before file data begins (GH2685)
Add NotImplementedError for simultaneous use of chunksize and nrows for read_csv() (GH6774).
Tests for basic reading of public S3 buckets now exist (GH7281).
read_html now sports an encoding argument that is passed to the underlying parser library. You can use this to read non-ascii encoded web pages (GH7323).
read_excel now supports reading from URLs in the same way that read_csv does. (GH6809)
Support for dateutil timezones, which can now be used in the same way as pytz timezones across pandas. (GH4688)

    In [3]: rng = pd.date_range(
       ...:     "3/6/2012 00:00", periods=10, freq="D", tz="dateutil/Europe/London"
       ...: )
       ...:

    In [4]: rng.tz
    Out[4]: tzfile('/usr/share/zoneinfo/Europe/London')

See the docs.
Implemented sem (standard error of the mean) operation for Series, DataFrame, Panel, and Groupby (GH6897)
Add nlargest and nsmallest to the Series groupby allowlist, which means you can now use these methods on a SeriesGroupBy object (GH7053).
All offsets apply, rollforward and rollback can now handle np.datetime64, which previously resulted in ApplyTypeError (GH7452)
Period and PeriodIndex can contain NaT in their values (GH7485)
Support pickling Series, DataFrame and Panel objects with non-unique labels along item axis (index, columns and items respectively) (GH7370).
Improved inference of datetime/timedelta with mixed null objects. Regression from 0.13.1 in interpretation of an object Index with all null elements (GH7431)
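The dropna argument and the comment handling in read_csv described above can be sketched as follows. The series values and the CSV text are invented for the example, and "#" is simply one possible single-character comment marker.

```python
from io import StringIO

import numpy as np
import pandas as pd

s = pd.Series([1.0, 1.0, 2.0, np.nan])

# By default missing values are left out of both results.
n_unique_default = s.nunique()

# dropna=False counts NaN as its own value.
n_unique_with_nan = s.nunique(dropna=False)
counts_with_nan = s.value_counts(dropna=False)

# Full-line comments are skipped, including one before the header row.
csv_text = "# exported by a hypothetical tool\na,b\n# a comment between rows\n1,2\n3,4\n"
frame = pd.read_csv(StringIO(csv_text), comment="#")

print(n_unique_default, n_unique_with_nan)
print(counts_with_nan)
print(frame)
```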
Performance
Improvements in dtype inference for numeric operations, yielding performance gains for dtypes: int64, timedelta64, datetime64 (GH7223)
Improvements in Series.transform for significant performance gains (GH6496)
Improvements in DataFrame.transform with ufuncs and built-in grouper functions for significant performance gains (GH7383)
Regression in groupby aggregation of datetime64 dtypes (GH7555)
Improvements in MultiIndex.from_product for large iterables (GH7627)
Experimental
pandas.io.data.Options has a new method, get_all_data, and now consistently returns a MultiIndexed DataFrame (GH5602)
io.gbq.read_gbq and io.gbq.to_gbq were refactored to remove the dependency on the Google bq.py command line client. This submodule now uses httplib2 and the Google apiclient and oauth2client API client libraries, which should be more stable and, therefore, more reliable than bq.py. See the docs. (GH6937).
Bug fixes
Bug in DataFrame.where with a symmetric shaped frame and a passed other of a DataFrame (GH7506)
Bug in Panel indexing with a MultiIndex axis (GH7516)
Regression in datetimelike slice indexing with a duplicated index and non-exact end-points (GH7523)
Bug in setitem with list-of-lists and single vs mixed types (GH7551)
Bug in time ops with non-aligned Series (GH7500)
Bug in timedelta inference when assigning an incomplete Series (GH7592)
Bug in groupby .nth with a Series and integer-like column name (GH7559)
Bug in Series.get with a boolean accessor (GH7407)
Bug in value_counts where NaT did not qualify as missing (NaN) (GH7423)
Bug in to_timedelta that accepted invalid units and misinterpreted ‘m/h’ (GH7611, GH6423)
Bug in line plot doesn’t set correct xlim if secondary_y=True (GH7459)
Bug in grouped hist and scatter plots use old figsize default (GH7394)
Bug in plotting subplots with DataFrame.plot, hist clears passed ax even if the number of subplots is one (GH7391).
Bug in plotting subplots with DataFrame.boxplot with by kw raises ValueError if the number of subplots exceeds 1 (GH7391).
Bug in subplots displays ticklabels and labels in different rule (GH5897)
Bug in Panel.apply with a MultiIndex as an axis (GH7469)
Bug in DatetimeIndex.insert doesn’t preserve name and tz (GH7299)
Bug in DatetimeIndex.asobject doesn’t preserve name (GH7299)
Bug in MultiIndex slicing with datetimelike ranges (strings and Timestamps) (GH7429)
Bug in Index.min and max doesn’t handle nan and NaT properly (GH7261)
Bug in PeriodIndex.min/max results in int (GH7609)
Bug in resample where fill_method was ignored if you passed how (GH2073)
Bug in TimeGrouper doesn’t exclude column specified by key (GH7227)
Bug in DataFrame and Series bar and barh plot raises TypeError when bottom and left keyword is specified (GH7226)
Bug in DataFrame.hist raises TypeError when it contains non numeric column (GH7277)
Bug in Index.delete does not preserve name and freq attributes (GH7302)
Bug in DataFrame.query()/eval where local string variables with the @ sign were being treated as temporaries attempting to be deleted (GH7300).
Bug in Float64Index which didn’t allow duplicates (GH7149).
Bug in DataFrame.replace() where truthy values were being replaced (GH7140).
Bug in StringMethods.extract() where a single match group Series would use the matcher’s name instead of the group name (GH7313).
Bug in isnull() when mode.use_inf_as_null == True where isnull wouldn’t test True when it encountered an inf/-inf (GH7315).
Bug in inferred_freq results in None for eastern hemisphere timezones (GH7310)
Bug in Easter returns incorrect date when offset is negative (GH7195)
Bug in broadcasting with .div, integer dtypes and divide-by-zero (GH7325)
Bug in CustomBusinessDay.apply raises NameError when np.datetime64 object is passed (GH7196)
Bug in MultiIndex.append, concat and pivot_table don’t preserve timezone (GH6606)
Bug in .loc with a list of indexers on a single-multi index level (that is not nested) (GH7349)
Bug in Series.map when mapping a dict with tuple keys of different lengths (GH7333)
Bug all StringMethods now work on empty Series (GH7242)
Fix delegation of read_sql to read_sql_query when query does not contain ‘select’ (GH7324).
Bug where a string column name assignment to a DataFrame with a Float64Index raised a TypeError during a call to np.isnan (GH7366).
Bug where NDFrame.replace() didn’t correctly replace objects with Period values (GH7379).
Bug in .ix getitem should always return a Series (GH7150)
Bug in MultiIndex slicing with incomplete indexers (GH7399)
Bug in MultiIndex slicing with a step in a sliced level (GH7400)
Bug where negative indexers in DatetimeIndex were not correctly sliced (GH7408)
Bug where NaT wasn’t repr’d correctly in a MultiIndex (GH7406, GH7409).
Bug where bool objects were converted to nan in convert_objects (GH7416).
Bug in quantile ignoring the axis keyword argument (GH7306)
Bug where nanops._maybe_null_out doesn’t work with complex numbers (GH7353)
Bug in several nanops functions when axis==0 for 1-dimensional nan arrays (GH7354)
Bug where nanops.nanmedian doesn’t work when axis==None (GH7352)
Bug where nanops._has_infs doesn’t work with many dtypes (GH7357)
Bug in StataReader.data where reading a 0-observation dta failed (GH7369)
Bug in StataReader when reading Stata 13 (117) files containing fixed width strings (GH7360)
Bug in StataWriter where encoding was ignored (GH7286)
Bug in DatetimeIndex comparison doesn’t handle NaT properly (GH7529)
Bug in passing input with tzinfo to some offsets apply, rollforward or rollback resets tzinfo or raises ValueError (GH7465)
Bug in DatetimeIndex.to_period, PeriodIndex.asobject, PeriodIndex.to_timestamp doesn’t preserve name (GH7485)
Bug in DatetimeIndex.to_period and PeriodIndex.to_timestamp handle NaT incorrectly (GH7228)
Bug in offsets.apply, rollforward and rollback may return normal datetime (GH7502)
Bug in resample raises ValueError when target contains NaT (GH7227)
Bug in Timestamp.tz_localize resets nanosecond info (GH7534)
Bug in DatetimeIndex.asobject raises ValueError when it contains NaT (GH7539)
Bug in Timestamp.__new__ doesn’t preserve nanosecond properly (GH7610)
Bug in Index.astype(float) where it would return an object dtype Index (GH7464).
Bug in DataFrame.reset_index loses tz (GH3950)
Bug in DatetimeIndex.freqstr raises AttributeError when freq is None (GH7606)
Bug in GroupBy.size created by TimeGrouper raises AttributeError (GH7453)
Bug in single column bar plot is misaligned (GH7498).
Bug in area plot with tz-aware time series raises ValueError (GH7471)
Bug in non-monotonic Index.union may preserve name incorrectly (GH7458)
Bug in DatetimeIndex.intersection doesn’t preserve timezone (GH4690)
Bug in rolling_var where a window larger than the array would raise an error (GH7297)
Bug with last plotted timeseries dictating xlim (GH2960)
Bug with secondary_y axis not being considered for timeseries xlim (GH3490)
Bug in Float64Index assignment with a non scalar indexer (GH7586)
Bug in pandas.core.strings.str_contains does not properly match in a case insensitive fashion when regex=False and case=False (GH7505)
Bug in expanding_cov, expanding_corr, rolling_cov, and rolling_corr for two arguments with mismatched index (GH7512)
Bug in to_sql taking the boolean column as text column (GH7678)
Bug in grouped hist doesn’t handle rot kw and sharex kw properly (GH7234)
Bug in .loc performing fallback integer indexing with object dtype indices (GH7496)
Bug (regression) in PeriodIndex constructor when passed Series objects (GH7701).
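As one way to spot-check the str_contains fix listed above (GH7505), the sketch below runs a case-insensitive literal match through the public Series.str.contains accessor. The sample strings are invented, and the pattern deliberately contains a "." so that a literal match and a regular-expression match would differ.

```python
import pandas as pd

# Hypothetical values: only the first and last contain the literal text "a.b"
# once case is ignored; "axb" would match only if "." were treated as a regex.
s = pd.Series(["A.B", "axb", "a.b c"])

# regex=False requests a plain substring test, case=False ignores case.
literal_ci = s.str.contains("a.b", case=False, regex=False)

print(literal_ci.tolist())
```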