SHAP-guided explanations
========================

Adding SHAP to the explainer makes LUX try to build an explanation tree that is consistent with SHAP values.
In some cases this is an important feature, especially when the model is analysed not only with LUX but also with SHAP.
It also helps to reduce the so-called Rashomon effect: the LUX model uses the same features as the black-box model, so the explanations are supposed to be compliant with what really happens in the black-box model.
This is achieved by the SHAP-guided sampling method, which generates samples in the direction of the decision boundary of the black-box model, and by oblique splits, which handle linear boundaries better.

The figure below visualizes such a situation: you can observe how the decision boundaries of other explainers differ from the black-box decision boundary.
LUX behaves similarly to the black-box model, which limits the Rashomon effect.

.. image:: https://raw.githubusercontent.com/sbobek/lux/main/pix/decbound-degenerated.png
    :alt: Decision boundaries for the black-box model and other explainers

Tutorials demonstrating the Rashomon effect:

* The full example (shown below) with multiple datasets can be found here: `Notebook `_
* Another example of this phenomenon was given here: `XOR problem `_ where we demonstrate how the greedy nature of the decision tree algorithm can be overcome with SHAP-sampling, also limiting the Rashomon effect.

Loading the dataset and building explanations without SHAP support
------------------------------------------------------------------

For the sake of simplicity we use the Wine dataset.
The code below loads the dataset and fits a ``RandomForestClassifier`` to it.

.. code-block:: python

    from sklearn.ensemble import RandomForestClassifier
    from lux.lux import LUX
    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    import numpy as np
    import pandas as pd

    wine = datasets.load_wine()
    features = wine['feature_names']
    target = 'class'
    rs = 42
    fraction = 0.01

    # create a DataFrame with string column names
    # (LUX accepts only DataFrames with string column names)
    df_wine = pd.DataFrame(wine.data, columns=features)
    df_wine[target] = wine.target

    # train the classifier
    train, test = train_test_split(df_wine, random_state=rs)
    clf = RandomForestClassifier(random_state=rs)  # alternative: svm.SVC(probability=True, random_state=rs)
    clf.fit(train[features], train[target])
    clf.score(test[features], test[target])

Explanation without SHAP-guided explanations
--------------------------------------------

Once we have a model, we can explain it.
First, we are going to explain it without SHAP support.
The code below fits LUX and shows the visualization of the explanation.

.. code-block:: python

    import graphviz
    from graphviz import Source
    from IPython.display import SVG, Image

    # instance to explain: one row sampled from the training set
    i2e = train[features].sample(1, random_state=42).values

    # fit LUX on a neighbourhood whose size is `fraction` of the training set
    lux = LUX(predict_proba=clf.predict_proba,
              neighborhood_size=int(len(train) * fraction),
              max_depth=2,
              node_size_limit=1,
              grow_confidence_threshold=0)
    lux.fit(train[features], train[target], instance_to_explain=i2e, class_names=[0, 1, 2])

    # attach the black-box prediction to the explained instance
    i2edf = pd.DataFrame(i2e, columns=features)
    i2edf[target] = clf.predict(i2edf.values.reshape(1, -1))[0]

    # export the explanation tree and render it
    lux.uid3.tree.save_dot('tree-wine.dot', fmt='.2f', visual=True,
                           background_data=train, instance2explain=i2edf)
    gvz = graphviz.Source.from_file('tree-wine.dot')
    # notebook shell command; requires the Graphviz `dot` executable
    !dot -Tpng tree-wine.dot > tree-wine.png
    Image('tree-wine.png')

.. image:: https://raw.githubusercontent.com/sbobek/lux/main/pix/tree-wine.png
    :alt: Explanation-Tree

When you look at the scatterplot with the decision boundaries marked, you can see that the explanation is reasonable, but can we do better?

.. image:: https://raw.githubusercontent.com/sbobek/lux/main/pix/noshap-wine-scatter.png
    :alt: Scatterplot

Explanation with SHAP-guided explanations
-----------------------------------------

Note that to enable SHAP-guided explanations, you only need to pass the classifier as a parameter to LUX.

.. code-block:: python

    # passing `classifier=clf` enables SHAP-guided explanations
    lux = LUX(predict_proba=clf.predict_proba,
              classifier=clf,
              neighborhood_size=int(len(train) * fraction),
              max_depth=2,
              node_size_limit=3,
              grow_confidence_threshold=0)
    # explain the same instance `i2e` as in the previous section
    lux.fit(train[features], train[target], instance_to_explain=i2e,
            class_names=[0, 1, 2], discount_importance=False)

    i2edf = pd.DataFrame(i2e, columns=features)
    i2edf[target] = clf.predict(i2edf.values.reshape(1, -1))[0]

    lux.uid3.tree.save_dot('tree-wine-shap.dot', fmt='.2f', visual=True,
                           background_data=train, instance2explain=i2edf)
    gvz = graphviz.Source.from_file('tree-wine-shap.dot')
    # notebook shell command; requires the Graphviz `dot` executable
    !dot -Tpng tree-wine-shap.dot > tree-wine-shap.png
    Image('tree-wine-shap.png')

First, look at the SHAP values generated separately for the model, shown in the figure below.
One can observe that two features contribute most to the model's decisions.
These are not the features selected in the previous step.
However, when we run the visualization again for LUX with SHAP-guided explanations, we get a decision tree that is clearly in compliance with the black-box classifier with respect to the features used for the explanation.

.. image:: https://raw.githubusercontent.com/sbobek/lux/main/pix/shap-wine.png
    :alt: SHAP values

The explanation tree is shown below.
You can see that it is better aligned with the SHAP values than the plain decision tree generated without SHAP guidance.

.. image:: https://raw.githubusercontent.com/sbobek/lux/main/pix/tree-wine-shap.png
    :alt: Explanation-Tree

When you compare this scatterplot with decision boundaries to the previous one, you will also observe that the SHAP-guided version is clearer:

.. image:: https://raw.githubusercontent.com/sbobek/lux/main/pix/shap-wine-scatter.png
    :alt: Scatterplot
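
The SHAP values mentioned above are generated separately from LUX. The tutorial does not show how they were obtained, so the following is only a minimal sketch of one way to rank the Wine features by their mean absolute SHAP value. It assumes the ``shap`` package is installed and uses ``shap.TreeExplainer``; the names ``mean_abs_shap`` and ``shap_ranking`` are introduced here for illustration and are not part of LUX.

.. code-block:: python

    import numpy as np
    import pandas as pd
    import shap  # assumption: installed separately, not shown in the tutorial
    from sklearn import datasets
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # same data preparation and model as in the tutorial
    wine = datasets.load_wine()
    features = wine['feature_names']
    target = 'class'
    df_wine = pd.DataFrame(wine.data, columns=features)
    df_wine[target] = wine.target
    train, test = train_test_split(df_wine, random_state=42)

    clf = RandomForestClassifier(random_state=42)
    clf.fit(train[features], train[target])

    # SHAP values of the black-box model on the training data
    explainer = shap.TreeExplainer(clf)
    shap_values = explainer.shap_values(train[features].values, check_additivity=False)

    # older shap releases return a list with one (n_samples, n_features) array per class,
    # newer ones a single (n_samples, n_features, n_classes) array; normalise to the latter
    if isinstance(shap_values, list):
        shap_values = np.stack(shap_values, axis=-1)

    # average the absolute contribution over samples and classes, then rank the features
    mean_abs_shap = np.abs(shap_values).mean(axis=(0, 2))
    shap_ranking = pd.Series(mean_abs_shap, index=features).sort_values(ascending=False)
    print(shap_ranking.head())

Comparing the top entries of ``shap_ranking`` with the features that appear in each explanation tree is one way to check, outside the figures, whether an explanation relies on the same features as the black-box model.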