- <p>This rule raises an issue when 5 or more commands are applied on a data frame.</p>
+ <p>This rule raises an issue when 7 or more commands are applied on a data frame.</p>
<h2>Why is this an issue?</h2>
<p>The pandas library provides many ways to filter, select, reshape and modify a data frame. Pandas also supports method chaining, which means that
many <code>DataFrame</code> methods return a modified <code>DataFrame</code>. This allows the user to chain multiple operations together, making it
effortless to perform several of them in one line of code:</p>
<pre>
import pandas as pd

- joe = pd.read_csv("data.csv", dtype={'user_id':'str', 'name':'str'}).set_index("name").filter(like='jo', axis=0).head()
+ schema = {'name':str, 'domain': str, 'revenue': 'Int64'}
+ joe = pd.read_csv("data.csv", dtype=schema).set_index('name').filter(like='joe', axis=0).groupby('domain').mean().round().sample()
</pre>
<p>While this code is correct and concise, it can be challenging to follow its logic and flow, making it harder to debug or modify in the future.</p>
<p>To improve code readability, debugging, and maintainability, it is recommended to break down long chains of pandas instructions into smaller, more
@@ -21,20 +22,20 @@ <h4>Noncompliant code example</h4>
import pandas as pd

def foo(df: pd.DataFrame):
-     return df.set_index("name").filter(like='joe', axis=0).groupby("team")["salary"].mean().head() # Noncompliant: too many operations happen on this data frame.
+     return df.set_index('name').filter(like='joe', axis=0).groupby('team').mean().round().sort_values('salary').take([0]) # Noncompliant: too many operations happen on this data frame.
</pre>
<h4>Compliant solution</h4>
<pre data-diff-id="1" data-diff-type="compliant">
import pandas as pd

def select_joes(df):
-     return df.set_index("name").filter(like='joe', axis=0)
+     return df.set_index('name').filter(like='joe', axis=0)

def compute_mean_salary_per_team(df):
-     return df.groupby("team")["salary"].mean()
+     return df.groupby('team').mean().round()

def foo(df: pd.DataFrame):
-     return df.pipe(select_joes).pipe(compute_mean_salary_per_team).head() # Compliant
+     return df.pipe(select_joes).pipe(compute_mean_salary_per_team).sort_values('salary').take([0]) # Compliant
</pre>
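<p>As a rough illustration of why the split is safe, the sketch below runs the chained and the <code>pipe</code>-based versions on a small in-memory data frame and compares the results. The sample rows and the variable names <code>chained</code> and <code>piped</code> are assumptions made up for this example; only the helper functions come from the compliant solution.</p>

```python
import pandas as pd

# Hypothetical sample data; the column names mirror the examples in this rule.
df = pd.DataFrame({
    "name": ["joe", "joey", "ann", "joe_b"],
    "team": ["a", "b", "a", "a"],
    "salary": [100.0, 200.0, 300.0, 150.0],
})

def select_joes(df):
    # Keep only the rows whose index label contains "joe".
    return df.set_index("name").filter(like="joe", axis=0)

def compute_mean_salary_per_team(df):
    # Average the numeric columns per team and round the result.
    return df.groupby("team").mean().round()

# One long chain: seven operations applied in a single expression.
chained = (
    df.set_index("name").filter(like="joe", axis=0)
    .groupby("team").mean().round()
    .sort_values("salary").take([0])
)

# Same logic, split into named steps and combined with pipe.
piped = (
    df.pipe(select_joes)
    .pipe(compute_mean_salary_per_team)
    .sort_values("salary").take([0])
)

print(piped)
```

<p>Each helper can also be called on its own, for example <code>select_joes(df)</code>, which makes it easier to inspect an intermediate result while debugging.</p>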
<h2>Resources</h2>
<h3>Documentation</h3>