Data checks on model container are insufficient, leading to cryptic error on fairness evaluation #3

@andreapiso

Description

The model container class requires quite a few parameters to be provided. Unfortunately, the documentation does not go into detail about the specific format expected for each of these parameters.

This makes it practically impossible to debug errors such as the one below, raised by cre_sco_obj.evaluate():

  File "diagnosis-tool/veritastool/fairness/fairness.py", line 172, in evaluate
    self._compute_fairness(n_threads=n_threads, seed = seed, eval_pbar=eval_pbar)
  File "diagnosis-tool/veritastool/fairness/fairness.py", line 339, in _compute_fairness
    self.fair_metric_obj.execute_all_fair(n_threads=n_threads, seed = seed, eval_pbar=eval_pbar)
  File "diagnosis-tool/veritastool/metrics/fairness_metrics.py", line 224, in execute_all_fair
    mp_result = thread.result()
  File "miniconda3/envs/veritas-dev/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "miniconda3/envs/veritas-dev/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "miniconda3/envs/veritas-dev/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "diagnosis-tool/veritastool/metrics/fairness_metrics.py", line 268, in _execute_all_fair_map
    metric_obj.feature_mask = {k: v[idx] for k, v in metric_obj.use_case_object.feature_mask.items()}
  File "diagnosis-tool/veritastool/metrics/fairness_metrics.py", line 268, in <dictcomp>
    metric_obj.feature_mask = {k: v[idx] for k, v in metric_obj.use_case_object.feature_mask.items()}
  File "miniconda3/envs/veritas-dev/lib/python3.8/site-packages/pandas/core/series.py", line 984, in __getitem__
    return self._get_with(key)
  File "miniconda3/envs/veritas-dev/lib/python3.8/site-packages/pandas/core/series.py", line 1019, in _get_with
    return self.loc[key]
  File "miniconda3/envs/veritas-dev/lib/python3.8/site-packages/pandas/core/indexing.py", line 967, in __getitem__

[... traceback truncated ...]

  File "miniconda3/envs/veritas-dev/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5782, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "miniconda3/envs/veritas-dev/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 5845, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")

The KeyError makes it clear that something is wrong with the data that was passed in, but there is no indication of which dataset or column is causing the error, and nothing actionable to go on.
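For context on why the message is so cryptic, here is a minimal pandas-only repro of the failing pattern. This is my reading of the traceback, not confirmed against the veritastool source: the values in feature_mask appear to be pandas Series, and `v[idx]` falls back to label-based lookup (`.loc`) when the Series carries a non-default integer index (e.g. data that was split without `reset_index`). Positional indices then get interpreted as labels and raise the same "not in index" KeyError:

```python
import pandas as pd

# A Series whose index is NOT the default RangeIndex(0, 1, 2, ...),
# e.g. the result of a train/test split without reset_index(drop=True).
s = pd.Series([1, 0, 1], index=[10, 20, 30])

try:
    # Intended as positional selection, but because the index is integer,
    # pandas treats [0, 1] as labels and routes the lookup through .loc.
    s[[0, 1]]
except KeyError as e:
    print("KeyError:", e)
```

If this is indeed the failure mode, the fix on the caller side is `reset_index(drop=True)` on every DataFrame/Series before building the container, but the library gives no hint of that.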

The data-check functions that veritastool provides all pass successfully:

container.check_data_consistency()
    data consistency check completed without issue

container.check_protected_columns()
    protected column check completed without issue

container.check_label_consistency()
    data consistency check completed without issue

container.check_label_length()
    label length check completed without issue

cre_sco_obj._check_input()
    [pass without error]

cre_sco_obj._check_special_params()
   [pass without error]

It would be good to have more thorough checks and meaningful error messages; otherwise our data scientists will struggle to debug this kind of issue, as very little information is provided.
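As a sketch of what such a check could look like (hypothetical helper, not part of the veritastool API), validating that inputs carry a default RangeIndex would turn the cryptic KeyError above into an actionable message at container-construction time:

```python
import pandas as pd

def check_default_index(name, obj):
    """Hypothetical pre-check: fail fast with an actionable message when a
    DataFrame or Series carries a non-default index, which would later break
    positional masking deep inside the fairness metrics."""
    if isinstance(obj, (pd.DataFrame, pd.Series)):
        if not obj.index.equals(pd.RangeIndex(len(obj))):
            raise ValueError(
                f"'{name}' has a non-default index starting with "
                f"{obj.index[:3].tolist()}; call .reset_index(drop=True) "
                f"before passing it to the model container"
            )

# Usage sketch: a label Series that kept its pre-split index.
y_true = pd.Series([1, 0, 1], index=[10, 20, 30])
try:
    check_default_index("y_true", y_true)
except ValueError as e:
    print(e)
```

A check like this (run once per input in the container constructor) costs almost nothing and names the exact offending parameter, which is what the current checks fail to do.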
