DOC: warn about apply with raw=True, if function returns Optional[int] #61632

@wrschneider

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

Documentation problem

When you use df.apply with raw=True, you can get an error if the applied function returns None for some elements but not for the first one, because of the way the underlying NumPy code infers the result array's type from the first returned value.

Example:

import pandas as pd
from typing import Optional

def func(a: int) -> Optional[int]:
    if a % 3 == 0:
        return 1
    if a % 3 == 1:
        return 0
    return None

df = pd.DataFrame([[1], [2], [3], [4], [5], [6]])

print(df.apply(lambda row: func(row[0]), axis=1, raw=True))

This raises an error:

TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NoneType'

On the other hand, if the first returned value is None, NumPy creates an array of dtype object, which can hold either int or None:

df = pd.DataFrame([[2], [3], [4], [5], [6]])
print(df.apply(lambda row: func(row[0]), axis=1, raw=True))

This returns:

0    None
1       1
2       0
3    None
4       1
dtype: object
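
The two outcomes above can be reproduced with NumPy alone. The sketch below assumes that the raw=True path ends up calling np.apply_along_axis, which sizes and types its output buffer from the first result; that delegation is my reading of the behaviour, not something stated in the docs. The helper name first_column is made up for the illustration.

```python
import numpy as np


def func(a):
    # Same logic as the example above: 1, 0, or None depending on a % 3.
    if a % 3 == 0:
        return 1
    if a % 3 == 1:
        return 0
    return None


def first_column(row):
    # Hypothetical wrapper: each row arrives as a 1-D ndarray.
    return func(row[0])


# First result is an int (1 % 3 == 1 gives 0), so the output buffer gets an
# integer dtype and the None produced for the next row cannot be stored.
int_first = np.array([[1], [2], [3]])
error = None
try:
    np.apply_along_axis(first_column, 1, int_first)
except TypeError as exc:
    error = exc
print(type(error).__name__)

# First result is None (2 % 3 == 2), so the output buffer has dtype object
# and can hold both None and ints.
none_first = np.array([[2], [3], [4]])
result = np.apply_along_axis(first_column, 1, none_first)
print(result.dtype, result.tolist())
```

If that assumption holds, the order of the rows alone decides whether the call fails, which is what makes the behaviour surprising.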

Suggested fix for documentation

Explain that the function must not return None when raw=True,

or treat this as a bug and fix it (e.g. by allowing the dtype of the result ndarray to be specified explicitly).
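
To illustrate what specifying the result type explicitly could look like, here is a minimal sketch that preallocates an object-dtype array instead of inferring the dtype from the first result. The helper apply_rows_as_object is hypothetical; it is not a pandas or NumPy API, only a user-side workaround under that assumption.

```python
import numpy as np


def func(a):
    # Same logic as the example in this issue.
    if a % 3 == 0:
        return 1
    if a % 3 == 1:
        return 0
    return None


def apply_rows_as_object(values, func1d):
    # Hypothetical helper: the result dtype is fixed to object up front,
    # so it does not depend on what the first call returns.
    out = np.empty(len(values), dtype=object)
    for i, row in enumerate(values):
        out[i] = func1d(row)
    return out


values = np.array([[1], [2], [3], [4], [5], [6]])
out = apply_rows_as_object(values, lambda row: func(row[0]))
print(out.tolist())
```

With a DataFrame, values could come from df.to_numpy(), and the result could be wrapped as pd.Series(out, index=df.index). This gives up the speed of the raw=True path, so it is only a stopgap.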

Labels: Apply, Docs