In this notebook, we will demonstrate how to run SHAP (SHapley Additive exPlanations), a popular local feature importance tool for interpreting machine learning models.
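Specifically, we will load the Adult census income dataset, train an XGBoost classifier, compute SHAP values for the test set, and look at how features impact a single prediction.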
Let’s dive in! 🚀
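Before working with real data, it may help to see what a Shapley attribution is on a problem small enough to compute by hand. The sketch below is only an illustration: the toy `model`, the instance `x`, the all-zeros `baseline`, and the helper names are all made up for this example, and the brute-force enumeration is not how the `shap` library computes values (it relies on much faster, model-specific algorithms and a background dataset rather than a single baseline).

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every coalition (cost grows as 2^n)."""
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n_features - size - 1) / factorial(n_features)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i when added to this coalition.
                phi[i] += weight * (value_fn(coalition | {i}) - value_fn(coalition))
    return phi


def model(row):
    # Hypothetical toy model: one additive term and one interaction term.
    return 2 * row[0] + row[1] * row[2]


x = (1.0, 2.0, 3.0)         # instance to explain (made up)
baseline = (0.0, 0.0, 0.0)  # reference point standing in for background data


def value_fn(coalition):
    # Features in the coalition keep their real value; the rest use the baseline.
    mixed = [x[j] if j in coalition else baseline[j] for j in range(len(x))]
    return model(mixed)


phi = shapley_values(value_fn, len(x))
print(phi)  # approximately [2.0, 3.0, 3.0]
# Additivity: the attributions sum to model(x) - model(baseline) = 8.
print(sum(phi), model(x) - model(baseline))
```

The interaction term `row[1] * row[2]` contributes 6 in total, and the two features involved split it evenly. The additive term goes entirely to the first feature. This "attributions sum to the prediction minus a baseline" property is the one that `check_additivity` refers to later in the notebook.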
import shap
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import xgboost

X, y = shap.datasets.adult()
# create a train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
d_train = xgboost.DMatrix(X_train, label=y_train)
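d_test = xgboost.DMatrix(X_test, label=y_test)

params = {
    "eta": 0.01,  # small learning rate, hence the large round budget below
    "objective": "binary:logistic",
    "subsample": 0.5,
    "base_score": np.mean(y_train),  # start from the training-set positive rate
    "eval_metric": "logloss",
}
model = xgboost.train(
    params,
    d_train,
    5000,  # maximum number of boosting rounds
    evals=[(d_test, "test")],
    verbose_eval=100,
    early_stopping_rounds=20,  # stop if test logloss has not improved for 20 rounds
)

[0] test-logloss:0.54663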
[100] test-logloss:0.36398
[200] test-logloss:0.31758
[300] test-logloss:0.30065
[400] test-logloss:0.29170
[500] test-logloss:0.28655
[600] test-logloss:0.28358
[700] test-logloss:0.28174
[800] test-logloss:0.28061
[900] test-logloss:0.27988
[1000] test-logloss:0.27939
[1100] test-logloss:0.27906
[1178] test-logloss:0.27887
explainer = shap.Explainer(model, X_train)
shap_values = explainer(X_test, check_additivity=False)

Visualizing how features impact one instance
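As a rough sketch of what a single-instance view contains, the helper below ranks the features of one row by the size of their SHAP value. The function name `top_contributions` and the example numbers are hypothetical and are not output from the model trained above. The commented lines at the end show how we would expect to apply it to this notebook's `shap_values`, alongside `shap.plots.waterfall`, which draws the same information as a chart. For a `binary:logistic` model, these values are typically expressed in log-odds units rather than probabilities.

```python
def top_contributions(values, feature_names, k=5):
    """Return the k features with the largest absolute SHAP value for one row."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)[:k]
    return [(feature_names[i], float(values[i])) for i in order]


# Made-up numbers for illustration only; not output from the model above.
example_values = [0.8, -1.5, 0.1, 0.4]
example_names = ["Age", "Capital Gain", "Sex", "Education-Num"]

ranked = top_contributions(example_values, example_names, k=3)
for name, value in ranked:
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {value:+.2f} ({direction} the prediction)")

# With the objects from this notebook, one would likely call instead:
#   top_contributions(shap_values[0].values, list(X_test.columns), k=5)
#   shap.plots.waterfall(shap_values[0])
```

Positive values push the prediction for that instance above the explainer's base value, and negative values push it below.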
For more information on SHAP and additional examples, see the official SHAP documentation.