Skip to content

Conversation

aemerson
Copy link
Contributor

The alpha level can also be adjusted with --alpha

…t-test.

The alpha level can also be adjusted with --alpha

# Compute statistics on raw data before merging (if requested)
if config.statistics:
# Get metrics early for statistics computation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part duplicates logic from after the conditional, and lacks some checks that exist there. Need to unify this logic

if len(temp_metrics) == 0:
defaults = ["Exec_Time", "exec_time", "Value", "Runtime"]
for defkey in defaults:
if defkey in lhs_d.columns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check if defkey in lhs_d.columns or defkey in rhs_d.columns to be compatible with the existing flow, or unify elseway

stats_dict = {}

for metric in metrics:
if metric not in lhs_d.columns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check if metric not in lhs_d.columns or metric not in rhs_d.columns: to be compatible with the existing flow, or unify elseway

for metric in data.columns.levels[0]:
data = add_diff_column(metric, data, absolute_diff=config.absolute_diff)

if config.statistics and stats_dict is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better check if config.statistics and stats_dict is not None and stats_dict: for when compute returns an empty dict (it cannot return None)


# Group by program
for program in lhs_d.index.get_level_values(1).unique():
lhs_values = lhs_d.loc[(slice(None), program), metric].dropna()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cant we do something more efficient like this pseudo code:

for program, group in lhs_d.groupby(level=1):
    lhs_values = group[metric].dropna()


stats_dict[metric][program] = {
'std_lhs': lhs_values.std(ddof=1) if len(lhs_values) >= 2 else float('nan'),
'std_rhs': rhs_values.std(ddof=1) if len(rhs_values) >= 2 else float('nan'),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should take the name from --rhs-name like f'std_{rhs_name}'

lhs_values = lhs_d.loc[(slice(None), program), metric].dropna()
rhs_values = rhs_d.loc[(slice(None), program), metric].dropna()

stats_dict[metric][program] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sink this conditional calculation into if len(lhs_values) >= 2 and len(rhs_values) >= 2:

t_stat, p_val = stats.ttest_ind(lhs_values, rhs_values)
stats_dict[metric][program]['t-value'] = t_stat
stats_dict[metric][program]['p-value'] = p_val
stats_dict[metric][program]['significant'] = "✅" if p_val < alpha else "❌"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change to something more parsable but still human readable like Y/N

def add_precomputed_statistics(data, stats_dict):
"""Add precomputed statistics to the unstacked dataframe."""
for metric in data.columns.levels[0]:
if metric not in stats_dict:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As part of metrics logic unification, should this effectively become an assert (inverted)?

values = []
for program in data.index:
if program in stats_dict[metric]:
values.append(stats_dict[metric][program].get(stat_name, float('nan') if stat_name != 'significant' else ""))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eliminate duplicated expression float('nan') if stat_name != 'significant' else "" and hoist it to the top level loop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants