@@ -194,195 +194,6 @@ levels <merging.merge_on_columns_and_levels>` documentation section.

.. _whatsnew_0230.enhancements.sort_by_columns_and_levels:

- Sorting by a combination of columns and index levels
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- Strings passed to :meth:`DataFrame.sort_values` as the ``by`` parameter may
- now refer to either column names or index level names. This enables sorting
- ``DataFrame`` instances by a combination of index levels and columns without
- resetting indexes. See the :ref:`Sorting by Indexes and Values
- <basics.sort_indexes_and_values>` documentation section.
- (:issue:`14353`)
-
- .. ipython:: python
-
-    # Build MultiIndex
-    idx = pd.MultiIndex.from_tuples([('a', 1), ('a', 2), ('a', 2),
-                                     ('b', 2), ('b', 1), ('b', 1)])
-    idx.names = ['first', 'second']
-
-    # Build DataFrame
-    df_multi = pd.DataFrame({'A': np.arange(6, 0, -1)},
-                            index=idx)
-    df_multi
-
-    # Sort by 'second' (index) and 'A' (column)
-    df_multi.sort_values(by=['second', 'A'])
-
-
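For comparison, the sketch below (an illustration, not part of the release note) shows the workaround this enhancement makes unnecessary: moving the index levels into columns with ``reset_index``, sorting, and restoring the index with ``set_index``. It assumes pandas 0.23.0 or later and rebuilds ``df_multi`` from the example above so that it runs on its own; ``by_level_and_column`` and ``via_reset`` are illustrative names.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Same data as the example above, with the level names passed directly.
   idx = pd.MultiIndex.from_tuples(
       [("a", 1), ("a", 2), ("a", 2), ("b", 2), ("b", 1), ("b", 1)],
       names=["first", "second"],
   )
   df_multi = pd.DataFrame({"A": np.arange(6, 0, -1)}, index=idx)

   # Direct form: "second" is an index level name, "A" is a column name.
   by_level_and_column = df_multi.sort_values(by=["second", "A"])

   # Workaround without the feature: move the levels into columns,
   # sort on them, then rebuild the same MultiIndex.
   via_reset = (
       df_multi.reset_index()
       .sort_values(by=["second", "A"])
       .set_index(["first", "second"])
   )

   print(by_level_and_column["A"].tolist())  # [1, 2, 6, 3, 4, 5]
   print(by_level_and_column.equals(via_reset))  # True

Both approaches give the same ordering here because every (``second``, ``A``) pair is unique; the direct call simply avoids the round trip through ``reset_index``.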
- .. _whatsnew_023.enhancements.extension:
-
- Extending pandas with custom types (experimental)
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- pandas now supports storing array-like objects that aren't necessarily 1-D NumPy
- arrays as columns in a DataFrame or values in a Series. This allows third-party
- libraries to implement extensions to NumPy's types, similar to how pandas
- implemented categoricals, datetimes with timezones, periods, and intervals.
-
- As a demonstration, we'll use cyberpandas_, which provides an ``IPArray`` type
- for storing IP addresses.
-
- .. code-block:: ipython
-
-    In [1]: from cyberpandas import IPArray
-
-    In [2]: values = IPArray([
-       ...:     0,
-       ...:     3232235777,
-       ...:     42540766452641154071740215577757643572
-       ...: ])
-       ...:
-       ...:
-
- ``IPArray`` isn't a normal 1-D NumPy array, but because it's a pandas
- :class:`~pandas.api.extensions.ExtensionArray`, it can be stored properly inside pandas' containers.
-
- .. code-block:: ipython
-
-    In [3]: ser = pd.Series(values)
-
-    In [4]: ser
-    Out[4]:
-    0                         0.0.0.0
-    1                     192.168.1.1
-    2    2001:db8:85a3::8a2e:370:7334
-    dtype: ip
-
- Notice that the dtype is ``ip``. The missing value semantics of the underlying
- array are respected:
-
- .. code-block:: ipython
-
-    In [5]: ser.isna()
-    Out[5]:
-    0     True
-    1    False
-    2    False
-    dtype: bool
-
- For more, see the :ref:`extension types <extending.extension-types>`
- documentation. If you build an extension array, publicize it on `the ecosystem page <https://pandas.pydata.org/community/ecosystem.html>`_.
-
- .. _cyberpandas: https://cyberpandas.readthedocs.io/en/latest/
-
-
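Because cyberpandas is a separate package, the session above needs it installed. As a self-contained stand-in, the sketch below uses pandas' own nullable integer array (``dtype="Int64"``), which is also an :class:`~pandas.api.extensions.ExtensionArray`. That type was added in pandas 0.24.0, so this assumes a newer pandas than the release described here. It checks the same two properties: the Series keeps the extension dtype, and ``isna`` follows the array's own missing-value handling.

.. code-block:: python

   import pandas as pd
   from pandas.api.extensions import ExtensionArray, ExtensionDtype

   # Nullable integers: an ExtensionArray bundled with pandas (0.24.0+),
   # used here as a stand-in for a third-party type such as IPArray.
   values = pd.array([1, None, 3], dtype="Int64")
   ser = pd.Series(values)

   # The Series stores the extension array as-is, not a NumPy array.
   print(isinstance(ser.array, ExtensionArray))  # True
   print(isinstance(ser.dtype, ExtensionDtype))  # True
   print(ser.dtype)  # Int64

   # Missing values are tracked by the array itself.
   print(ser.isna().tolist())  # [False, True, False]

   # For contrast, a NumPy-backed Series upcasts to float64 to hold NaN.
   print(pd.Series([1, None, 3]).dtype)  # float64

The last line shows what happens without an extension type: the NumPy-backed Series falls back to ``float64`` so that it can represent the missing value as ``NaN``.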
- .. _whatsnew_0230.enhancements.categorical_grouping:
-
- New ``observed`` keyword for excluding unobserved categories in ``GroupBy``
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- Grouping by a categorical includes the unobserved categories in the output.
- When grouping by multiple categorical columns, this means you get the Cartesian product of all the
- categories, including combinations where there are no observations, which can result in a large
- number of groups. We have added a keyword ``observed`` to control this behavior; it defaults to
- ``observed=False`` for backward compatibility. (:issue:`14942`, :issue:`8138`, :issue:`15217`, :issue:`17594`, :issue:`8669`, :issue:`20583`, :issue:`20902`)
-
- .. ipython:: python
-
-    cat1 = pd.Categorical(["a", "a", "b", "b"],
-                          categories=["a", "b", "z"], ordered=True)
-    cat2 = pd.Categorical(["c", "d", "c", "d"],
-                          categories=["c", "d", "y"], ordered=True)
-    df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
-    df['C'] = ['foo', 'bar'] * 2
-    df
-
- To show all values (the previous behavior):
-
- .. ipython:: python
-
-    df.groupby(['A', 'B', 'C'], observed=False).count()
-
-
- To show only observed values:
-
- .. ipython:: python
-
-    df.groupby(['A', 'B', 'C'], observed=True).count()
-
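As a rough illustration of how quickly unobserved combinations add up, the sketch below counts the groups produced by each setting, using only the ``A`` and ``B`` columns from the example above. It assumes pandas 0.23.0 or later, where the ``observed`` keyword exists; summing ``values`` is just a convenient way to get one row per group.

.. code-block:: python

   import pandas as pd

   # Same categoricals as above: "z" and "y" are categories with no rows.
   cat1 = pd.Categorical(["a", "a", "b", "b"],
                         categories=["a", "b", "z"], ordered=True)
   cat2 = pd.Categorical(["c", "d", "c", "d"],
                         categories=["c", "d", "y"], ordered=True)
   df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})

   # observed=False: one group per combination of categories (3 x 3 = 9).
   all_groups = df.groupby(["A", "B"], observed=False)["values"].sum()

   # observed=True: only the 4 combinations that actually occur in the data.
   seen_groups = df.groupby(["A", "B"], observed=True)["values"].sum()

   print(len(all_groups), len(seen_groups))  # 9 4
   print(seen_groups.tolist())  # [1, 2, 3, 4]

With more categorical keys, or more categories per key, the ``observed=False`` count grows multiplicatively, which is the situation the new keyword addresses.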
- For pivoting operations, this behavior is *already* controlled by the ``dropna`` keyword:
-
- .. ipython:: python
-
-    cat1 = pd.Categorical(["a", "a", "b", "b"],
-                          categories=["a", "b", "z"], ordered=True)
-    cat2 = pd.Categorical(["c", "d", "c", "d"],
-                          categories=["c", "d", "y"], ordered=True)
-    df = pd.DataFrame({"A": cat1, "B": cat2, "values": [1, 2, 3, 4]})
-    df
-
-
- .. code-block:: ipython
-
-    In [1]: pd.pivot_table(df, values='values', index=['A', 'B'], dropna=True)
-
-    Out[1]:
-         values
-    A B
-    a c     1.0
-      d     2.0
-    b c     3.0
-      d     4.0
-
-    In [2]: pd.pivot_table(df, values='values', index=['A', 'B'], dropna=False)
-
-    Out[2]:
-         values
-    A B
-    a c     1.0
-      d     2.0
-      y     NaN
-    b c     3.0
-      d     4.0
-      y     NaN
-    z c     NaN
-      d     NaN
-      y     NaN
-
-
- .. _whatsnew_0230.enhancements.window_raw:
-
- Rolling/Expanding.apply() accepts ``raw=False`` to pass a ``Series`` to the function
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
- :func:`Series.rolling().apply() <.Rolling.apply>`, :func:`DataFrame.rolling().apply() <.Rolling.apply>`,
- :func:`Series.expanding().apply() <.Expanding.apply>`, and :func:`DataFrame.expanding().apply() <.Expanding.apply>` have gained a ``raw=None`` parameter.
- This is similar to :func:`DataFrame.apply`. If ``True``, this parameter sends a ``np.ndarray`` to the applied function; if ``False``, a ``Series`` is passed. The
- default is ``None``, which preserves backward compatibility: it behaves like ``True`` and sends an ``np.ndarray``.
- In a future version the default will be changed to ``False``, sending a ``Series``. (:issue:`5071`, :issue:`20584`)
-
- .. ipython:: python
-
-    s = pd.Series(np.arange(5), np.arange(5) + 1)
-    s
-
- Pass a ``Series``:
-
- .. ipython:: python
-
-    s.rolling(2, min_periods=1).apply(lambda x: x.iloc[-1], raw=False)
-
- Mimic the original behavior of passing an ndarray:
-
- .. ipython:: python
-
-    s.rolling(2, min_periods=1).apply(lambda x: x[-1], raw=True)
-
-
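To make the difference between the two settings visible, the sketch below records the type each call receives and, with ``raw=False``, reads the window's index labels, which a plain ``ndarray`` does not carry. It rebuilds the Series ``s`` from the example above; ``last_value`` and ``last_label`` are hypothetical helper names used only for illustration.

.. code-block:: python

   import numpy as np
   import pandas as pd

   # Values 0..4 with index labels 1..5, as in the example above.
   s = pd.Series(np.arange(5), np.arange(5) + 1)

   received_raw_true = set()
   received_raw_false = set()

   def last_value(window):
       # raw=True: the window is a NumPy array, so positional [-1] works.
       received_raw_true.add(type(window))
       return window[-1]

   def last_label(window):
       # raw=False: the window is a Series, so its index labels are available.
       received_raw_false.add(type(window))
       return window.index[-1]

   values = s.rolling(2, min_periods=1).apply(last_value, raw=True)
   labels = s.rolling(2, min_periods=1).apply(last_label, raw=False)

   print(received_raw_true == {np.ndarray})  # True
   print(received_raw_false == {pd.Series})  # True
   print(values.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0]
   print(labels.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]

Passing an ``ndarray`` avoids constructing a ``Series`` for every window, while ``raw=False`` is the choice when the function needs labels or other ``Series`` methods.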
- .. _whatsnew_0210.enhancements.limit_area:
-
-
.. _whatsnew_0.23.0.contributors:

Contributors