
QST: pandas get_loc with 'ffill' gives unexpected results within group #34967

@RobertRol

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.


I asked this question on SO but did not get any answers there. Below is a simplified version of my SO question.
I am not sure whether this is a bug or whether I misunderstand how indexing within groups works.
https://stackoverflow.com/questions/62149056/pandas-get-loc-with-ffill-gives-unexpected-results-within-group

Assume you have the following dataframe:

import pandas as pd
# pandas==1.0.4

df = pd.DataFrame({'idxDigits': [1, 1, 2, 2]},
                  index=pd.Index([0, 1, 10, 11], name='myIdx'))
print(df)
#        idxDigits
# myIdx
# 0              1
# 1              1
# 10             2
# 11             2

and you want to find, for each idxDigits value, the dataframe row whose index is at or before a user-specified value idxSearchValue.
My approach was to define the following function:

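def mySelect(x, idxSearchValue):
    print('idxSearchValue: {}'.format(idxSearchValue))
    # position of the last index entry at or before idxSearchValue
    idx = x.index.get_loc(idxSearchValue, method='ffill')
    # return that row as a one-row dataframe
    return x.iloc[[idx]].reset_index()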

and apply it via groupby:

res = df.groupby(['idxDigits'], as_index=False).apply(mySelect, idxSearchValue=10)
# idxSearchValue: 10
# idxSearchValue: 10

Although group idxDigits == 2 contains an exact match for idxSearchValue = 10, we get the result:

print(res.reset_index(drop=True))
#    myIdx  idxDigits
# 0      1          1
# 1     11          2

So, my question is:
Why does get_loc for group idxDigits == 2 select the row with myIdx = 11, even though the group contains an exact match for idxSearchValue = 10?

BTW: Splitting the dataframe manually and applying get_loc to each part gives the expected result:

df1 = df[df.idxDigits == 1]
print(df1.iloc[[df1.index.get_loc(10, 'ffill')]].reset_index())
#    myIdx  idxDigits
# 0      1          1

df2 = df[df.idxDigits == 2]
print(df2.iloc[[df2.index.get_loc(10, 'ffill')]].reset_index())
#    myIdx  idxDigits
# 0     10          2
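
As a further cross-check that does not go through get_loc at all, the same "at or before" lookup can be sketched with Index.searchsorted while iterating over the groups explicitly. This is only a sketch: the helper name select_at_or_before is hypothetical, and it assumes each group's index is sorted in ascending order.

```python
import pandas as pd

df = pd.DataFrame({'idxDigits': [1, 1, 2, 2]},
                  index=pd.Index([0, 1, 10, 11], name='myIdx'))

def select_at_or_before(x, idxSearchValue):
    # hypothetical helper; assumes x.index is sorted ascending
    # side='right' counts the entries <= idxSearchValue,
    # so subtracting 1 gives the position of the last such entry
    pos = x.index.searchsorted(idxSearchValue, side='right') - 1
    if pos < 0:
        # no index entry at or before idxSearchValue in this group
        return x.iloc[[]].reset_index()
    return x.iloc[[pos]].reset_index()

# iterate over the groups explicitly instead of using groupby.apply
parts = [select_at_or_before(g, 10) for _, g in df.groupby('idxDigits')]
res = pd.concat(parts, ignore_index=True)
print(res)
#    myIdx  idxDigits
# 0      1          1
# 1     10          2
```

This agrees with the manual split above, which is the result I expected from the groupby version as well.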
