What’s new in 1.5.1 (October 19, 2022)

These are the changes in pandas 1.5.1. See Release notes for a full changelog including other versions of pandas.

Behavior of groupby with categorical groupers (GH48645)

In versions of pandas prior to 1.5, groupby with dropna=False would still drop NA values when the grouper had a categorical dtype. A fix for this was attempted in 1.5; however, it introduced a regression where passing observed=False and dropna=False to groupby would return only the observed categories. The patch fixing the dropna=False bug turned out to be incompatible with observed=False, so the chosen resolution is to restore the correct observed=False behavior at the cost of reintroducing the dropna=False bug.

In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "x": pd.Categorical([1, None], categories=[1, 2, 3]),
   ...:         "y": [3, 4],
   ...:     }
   ...: )
   ...: 

In [2]: df
Out[2]: 
     x  y
0    1  3
1  NaN  4

1.5.0 behavior:

# Correct behavior, NA values are not dropped
In [3]: df.groupby("x", observed=True, dropna=False).sum()
Out[3]:
     y
x
1    3
NaN  4


# Incorrect behavior, only observed categories are present
In [4]: df.groupby("x", observed=False, dropna=False).sum()
Out[4]:
     y
x
1    3
NaN  4

1.5.1 behavior:

# Incorrect behavior, NA values are dropped
In [3]: df.groupby("x", observed=True, dropna=False).sum()
Out[3]: 
   y
x   
1  3

# Correct behavior, unobserved categories present (NA values still dropped)
In [4]: df.groupby("x", observed=False, dropna=False).sum()
Out[4]: 
   y
x   
1  3
2  0
3  0
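
If the NA group is needed while the dropna=False bug remains for categorical groupers, one possible workaround is to group by an object-dtype copy of the key. This is only a sketch and not part of the release: it assumes that a non-categorical grouper honors dropna=False, and it gives up the unobserved categories (2 and 3 here), which would have to be added back separately.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": pd.Categorical([1, None], categories=[1, 2, 3]),
        "y": [3, 4],
    }
)

# Assumption: an object-dtype key does not go through the categorical
# grouping path, so dropna=False keeps the NA group.
keys = df["x"].astype(object)
result = df["y"].groupby(keys, dropna=False).sum()

# Two groups are expected: the observed category 1 and the NA group.
print(result)
```

The trade-off is the same one described above, seen from the other side: this keeps NA values but reports only the groups that actually occur in the data.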

Fixed regressions

Bug fixes

Other

  • Avoid showing deprecated signatures when introspecting functions with warnings about arguments becoming keyword-only (GH48692)
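
The general idea behind this entry can be illustrated with a small standalone sketch. The decorator warn_positional and the function load below are hypothetical and are not the pandas implementation; they only show how a wrapper that warns about positional use can advertise the future keyword-only signature to introspection tools by setting __signature__.

```python
import functools
import inspect
import warnings


def warn_positional(allowed):
    """Hypothetical decorator: warn when arguments outside `allowed` are positional."""

    def decorate(func):
        old_sig = inspect.signature(func)
        # Advertise every positional-or-keyword parameter that is not in
        # `allowed` as keyword-only, which is what the signature will become.
        new_params = [
            p.replace(kind=inspect.Parameter.KEYWORD_ONLY)
            if p.kind is inspect.Parameter.POSITIONAL_OR_KEYWORD
            and p.name not in allowed
            else p
            for p in old_sig.parameters.values()
        ]

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if len(args) > len(allowed):
                warnings.warn(
                    f"Arguments of {func.__name__} other than {allowed} "
                    "will become keyword-only.",
                    FutureWarning,
                    stacklevel=2,
                )
            return func(*args, **kwargs)

        # inspect.signature() prefers __signature__ over the wrapped function,
        # so help() and IDEs show the keyword-only form.
        wrapper.__signature__ = old_sig.replace(parameters=new_params)
        return wrapper

    return decorate


@warn_positional(allowed=["path"])
def load(path, sep=",", header=0):
    return (path, sep, header)


print(inspect.signature(load))
```

Positional calls such as load("data.csv", ";") still work in this sketch and emit a FutureWarning; only the displayed signature changes.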

Contributors

For contributors, please see /usr/share/doc/contributors_list.txt or https://github.com/pandas-dev/pandas/graphs/contributors