Data checks on model container are insufficient, leading to cryptic error on fairness evaluation #3

@andreapiso

Description

The model container class requires quite a few parameters to be provided. Unfortunately, the documentation does not go into detail about the specific format expected for each of these parameters.

This makes it practically impossible to debug errors such as the one below, raised by cre_sco_obj.evaluate():

  File "diagnosis-tool/veritastool/fairness/fairness.py", line 172, in evaluate
    self._compute_fairness(n_threads=n_threads, seed = seed, eval_pbar=eval_pbar)
  File "diagnosis-tool/veritastool/fairness/fairness.py", line 339, in _compute_fairness
    self.fair_metric_obj.execute_all_fair(n_threads=n_threads, seed = seed, eval_pbar=eval_pbar)
  File "diagnosis-tool/veritastool/metrics/fairness_metrics.py", line 224, in execute_all_fair
    mp_result = thread.result()
  File "miniconda3/envs/veritas-dev/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "miniconda3/envs/veritas-dev/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "miniconda3/envs/veritas-dev/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "diagnosis-tool/veritastool/metrics/fairness_metrics.py", line 268, in _execute_all_fair_map
    metric_obj.feature_mask = {k: v[idx] for k, v in metric_obj.use_case_object.feature_mask.items()}
  File "diagnosis-tool/veritastool/metrics/fairness_metrics.py", line 268, in <dictcomp>
    metric_obj.feature_mask = {k: v[idx] for k, v in metric_obj.use_case_object.feature_mask.items()}
  File "miniconda3/envs/veritas-dev/lib/python3.8/site-packages/pandas/core/series.py", line 984, in __getitem__
    return self._get_with(key)
  File "miniconda3/envs/veritas-dev/lib/python3.8/site-packages/pandas/core/series.py", line 1019, in _get_with
    return self.loc[key]
  File "miniconda3/envs/veritas-dev/lib/python3.8/site-packages/pandas/core/indexing.py", line 967, in __getitem__

[... traceback truncated ...]

  File "miniconda3/envs/veritas-dev/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5782, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "miniconda3/envs/veritas-dev/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5845, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")

The KeyError makes it clear that something is wrong with the data that was passed in, but there is no indication of which dataset or column is causing the error, and nothing actionable to go on.
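For context on why the message is so cryptic, here is a minimal pandas-only repro of the failing pattern. This is my reading of the traceback, not confirmed against the veritastool source: the values in feature_mask appear to be pandas Series, and `v[idx]` falls back to label-based lookup (`.loc`) when the Series carries a non-default integer index (e.g. data that was split without `reset_index`). Positional indices then get interpreted as labels and raise the same "not in index" KeyError:

```python
import pandas as pd

# A Series whose index is NOT the default RangeIndex(0, 1, 2, ...),
# e.g. the result of a train/test split without reset_index(drop=True).
s = pd.Series([1, 0, 1], index=[10, 20, 30])

try:
    # Intended as positional selection, but because the index is integer,
    # pandas treats [0, 1] as labels and routes the lookup through .loc.
    s[[0, 1]]
except KeyError as e:
    print("KeyError:", e)
```

If this is indeed the failure mode, the fix on the caller side is `reset_index(drop=True)` on every DataFrame/Series before building the container, but the library gives no hint of that.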

The data-check functions that veritastool provides all pass successfully:

container.check_data_consistency()
    data consistency check completed without issue

container.check_protected_columns()
    protected column check completed without issue

container.check_label_consistency()
    data consistency check completed without issue

container.check_label_length()
    label length check completed without issue

cre_sco_obj._check_input()
    [pass without error]

cre_sco_obj._check_special_params()
   [pass without error]

It would be good to have more thorough checks and meaningful error messages; otherwise our data scientists will struggle to debug this kind of issue, as very little information is provided.
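As a sketch of what such a check could look like (hypothetical helper, not part of the veritastool API), validating that inputs carry a default RangeIndex would turn the cryptic KeyError above into an actionable message at container-construction time:

```python
import pandas as pd

def check_default_index(name, obj):
    """Hypothetical pre-check: fail fast with an actionable message when a
    DataFrame or Series carries a non-default index, which would later break
    positional masking deep inside the fairness metrics."""
    if isinstance(obj, (pd.DataFrame, pd.Series)):
        if not obj.index.equals(pd.RangeIndex(len(obj))):
            raise ValueError(
                f"'{name}' has a non-default index starting with "
                f"{obj.index[:3].tolist()}; call .reset_index(drop=True) "
                f"before passing it to the model container"
            )

# Usage sketch: a label Series that kept its pre-split index.
y_true = pd.Series([1, 0, 1], index=[10, 20, 30])
try:
    check_default_index("y_true", y_true)
except ValueError as e:
    print(e)
```

A check like this (run once per input in the container constructor) costs almost nothing and names the exact offending parameter, which is what the current checks fail to do.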
