[BUG] UnicodeDecodeError with xgboost v2.1.3 and shap v0.43.0 #135

@xuxu-wei

Description

Running the demo raises a UnicodeDecodeError.

The code is the demo, unmodified. The installed versions are shap 0.43.0 and xgboost 2.1.3 (see Screenshots below).
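For anyone trying to reproduce this, the versions can also be reported as text. This is a minimal sketch using only the standard library; `installed_versions` is a hypothetical helper name, and the distribution names passed to it (in particular `BorutaShap`) are assumed to match what pip installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("shap", "xgboost", "BorutaShap")):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Distribution not found under this name in the current environment.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it in the same environment as the notebook kernel, otherwise the reported versions may differ from the ones the traceback came from.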

The error information

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
/tmp/ipykernel_748913/3137371197.py in <module>
      9                               classification=True)
     10 
---> 11 Feature_Selector.fit(X=X, y=y, n_trials=100, sample=False,
     12                      train_or_test = 'test', normalize=True,
     13                      verbose=True)

~/anaconda3/lib/python3.9/site-packages/BorutaShap.py in fit(self, X, y, sample_weight, n_trials, random_state, sample, train_or_test, normalize, verbose, stratify)
    466                 self.Check_if_chose_train_or_test_and_train_model()
    467 
--> 468                 self.X_feature_import, self.Shadow_feature_import = self.feature_importance(normalize=normalize)
    469                 self.update_importance_history()
    470                 hits = self.calculate_hits()

~/anaconda3/lib/python3.9/site-packages/BorutaShap.py in feature_importance(self, normalize)
    713         if self.importance_measure == 'shap':
    714 
--> 715             self.explain()
    716             vals = self.shap_values
    717 

~/anaconda3/lib/python3.9/site-packages/BorutaShap.py in explain(self)
    819 
    820 
--> 821         explainer = shap.TreeExplainer(self.model, 
    822                                        feature_perturbation = "tree_path_dependent",
    823                                        approximate = True)

~/anaconda3/lib/python3.9/site-packages/shap/explainers/_tree.py in __init__(self, model, data, model_output, feature_perturbation, feature_names, approximate, **deprecated_options)
    164         self.feature_perturbation = feature_perturbation
    165         self.expected_value = None
--> 166         self.model = TreeEnsemble(model, self.data, self.data_missing, model_output)
    167         self.model_output = model_output
    168         #self.model_output = self.model.model_output # this allows the TreeEnsemble to translate model outputs types by how it loads the model

~/anaconda3/lib/python3.9/site-packages/shap/explainers/_tree.py in __init__(self, model, data, data_missing, model_output)
    973             self.model_type = "xgboost"
    974             self.original_model = model.get_booster()
--> 975             xgb_loader = XGBTreeModelLoader(self.original_model)
    976             self.trees = xgb_loader.get_trees(data=data, data_missing=data_missing)
    977             self.base_offset = xgb_loader.base_score

~/anaconda3/lib/python3.9/site-packages/shap/explainers/_tree.py in __init__(self, xgb_model)
   1685         self.read_arr('i', 29) # reserved
   1686         self.name_obj_len = self.read('Q')
-> 1687         self.name_obj = self.read_str(self.name_obj_len)
   1688         self.name_gbm_len = self.read('Q')
   1689         self.name_gbm = self.read_str(self.name_gbm_len)

~/anaconda3/lib/python3.9/site-packages/shap/explainers/_tree.py in read_str(self, size)
   1806 
   1807     def read_str(self, size):
-> 1808         val = self.buf[self.pos:self.pos+size].decode('utf-8')
   1809         self.pos += size
   1810         return val

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf9 in position 3248: invalid start byte
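The last frame shows `read_str` slicing a byte buffer and decoding the slice as UTF-8. One plausible reading, not confirmed in this report, is that the bytes handed to the loader are not in the layout it expects, so the slice lands on non-text data. The sketch below only illustrates that failure mode: `TinyReader` and `try_read` are hypothetical names, the little-endian length prefix is an assumption made for the example, and none of this is shap's actual loader.

```python
import struct


class TinyReader:
    """Hypothetical length-prefixed reader, for illustration only."""

    def __init__(self, buf):
        self.buf = buf
        self.pos = 0

    def read_u64(self):
        # Read an unsigned 64-bit length field (little-endian is assumed here).
        (val,) = struct.unpack_from("<Q", self.buf, self.pos)
        self.pos += 8
        return val

    def read_str(self, size):
        # Same shape as the failing line in the traceback: slice, then decode.
        val = self.buf[self.pos:self.pos + size].decode("utf-8")
        self.pos += size
        return val


def try_read(buf):
    """Read one length-prefixed string, or describe the decode failure."""
    reader = TinyReader(buf)
    try:
        return reader.read_str(reader.read_u64())
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


# Layout the reader expects: an 8-byte length, then that many UTF-8 bytes.
payload = b"binary:logistic"
good = struct.pack("<Q", len(payload)) + payload

# Same length prefix convention, but the following bytes are not text.
# 0xF9 can never start a UTF-8 sequence.
garbage = bytes([0x00, 0xF9, 0x10, 0x20])
bad = struct.pack("<Q", len(garbage)) + garbage

print(try_read(good))
print(try_read(bad))
```

If this reading is right, the decode call is where the problem surfaces rather than where it originates, and the thing to check is which serialization format the booster's raw bytes are in.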

Screenshots

image
