Author

Prof. Tiffany Tang

Published

May 8, 2026

1 README

This notebook demonstrates how to run SHAP (SHapley Additive exPlanations), a popular local feature importance tool for interpreting machine learning models.

We will:

  • Load a dataset (here, the Adult census income data bundled with SHAP) and split it into training and test sets.
  • Train a machine learning model (here, XGBoost).
  • Use SHAP to explain the model’s predictions.
  • Visualize feature importance and individual prediction explanations.

Let’s dive in! 🚀

import shap
import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
import xgboost

2 Load in Data

# Adult census income data: X holds the features, y indicates whether income exceeds $50K
X, y = shap.datasets.adult()

# create a train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)
d_train = xgboost.DMatrix(X_train, label=y_train)
d_test = xgboost.DMatrix(X_test, label=y_test)

3 Fit ML Model

params = {
    "eta": 0.01,
    "objective": "binary:logistic",
    "subsample": 0.5,
    "base_score": np.mean(y_train),
    "eval_metric": "logloss",
}
model = xgboost.train(
    params,
    d_train,
    num_boost_round=5000,  # upper bound on boosting rounds; early stopping may end training sooner
    evals=[(d_test, "test")],
    verbose_eval=100,
    early_stopping_rounds=20,
)
[0] test-logloss:0.54663
[100]   test-logloss:0.36398
[200]   test-logloss:0.31758
[300]   test-logloss:0.30065
[400]   test-logloss:0.29170
[500]   test-logloss:0.28655
[600]   test-logloss:0.28358
[700]   test-logloss:0.28174
[800]   test-logloss:0.28061
[900]   test-logloss:0.27988
[1000]  test-logloss:0.27939
[1100]  test-logloss:0.27906
[1178]  test-logloss:0.27887
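
The log above ends well before the 5000-round limit because of `early_stopping_rounds=20`: training stops once the test log loss has gone 20 rounds without improving. The sketch below illustrates that patience rule in plain Python. The function `early_stopping_round` and the loss values are hypothetical and only for illustration; XGBoost's own implementation may differ in its details.

```python
def early_stopping_round(losses, patience):
    """Return (stop_round, best_round) for a lower-is-better metric.

    Training stops at the first round that is `patience` rounds past
    the best round seen so far.
    """
    best_loss = float("inf")
    best_round = 0
    for round_idx, loss in enumerate(losses):
        if loss < best_loss:
            # New best score: remember it and reset the patience window
            best_loss, best_round = loss, round_idx
        elif round_idx - best_round >= patience:
            # No improvement for `patience` rounds: stop here
            return round_idx, best_round
    # Ran out of rounds without triggering early stopping
    return len(losses) - 1, best_round


# Made-up validation losses, one per boosting round
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.34, 0.35, 0.36, 0.37]
stop_round, best_round = early_stopping_round(losses, patience=3)
print(f"stopped at round {stop_round}, best round was {best_round}")
```

With a patience of 3, the run stops three rounds after its best score, which mirrors how the training log stops shortly after the loss plateaus.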

4 Run SHAP

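# Build an explainer for the trained model, using the training data as the background distribution
explainer = shap.Explainer(model, X_train)
# Compute SHAP values for every test sample (skipping the additivity sanity check)
shap_values = explainer(X_test, check_additivity=False)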
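
To build some intuition for what these values mean, here is a minimal sketch that computes exact Shapley values by brute force for a tiny toy model. It is not how SHAP handles tree models (which relies on much faster, tree-specific algorithms), and it makes two simplifying assumptions: "missing" features are replaced by a single baseline point rather than averaged over a background dataset, and `toy_model`, `value`, and `exact_shapley` are hypothetical names made up for this illustration.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical model: a linear part plus one interaction term."""
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


def value(model, x, baseline, subset):
    """Model output when only the features in `subset` take their values from x."""
    mixed = [x[j] if j in subset else baseline[j] for j in range(len(x))]
    return model(mixed)


def exact_shapley(model, x, baseline):
    """Brute-force Shapley values: weighted average of marginal contributions."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = set(subset)
                with_i = without_i | {i}
                gain = value(model, x, baseline, with_i) - value(model, x, baseline, without_i)
                phi[i] += weight * gain
    return phi


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
base_value = toy_model(baseline)

print("Shapley values:", phi)
print("base value + sum of Shapley values:", base_value + sum(phi))
print("model prediction:", toy_model(x))
```

The last two printed numbers agree: the baseline output plus the per-feature contributions recovers the prediction. This is the "additive" property in the SHAP name. Note also that the interaction term is split evenly between the two features involved.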

4.1 Summary plot of SHAP values

shap.summary_plot(shap_values, X_test)

4.2 Force plot for a single prediction

A force plot shows how each feature pushes a single instance's prediction above or below the baseline (expected) value.

shap.initjs()  # Loads the JavaScript library; only needed for interactive (non-matplotlib) force plots
i = 5  # Choose any sample from the test set
# matplotlib=True is a boolean flag that renders a static matplotlib figure
shap.force_plot(explainer.expected_value, shap_values[i].values, X_test.iloc[i, :], matplotlib=True)
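
When reading the force plot, keep the units in mind. For a `binary:logistic` model, we assume here that the explainer works on the raw margin (log-odds) scale, which is our understanding of the default behavior. Under that assumption, the base value plus the SHAP values gives the model's log-odds output, and a sigmoid turns it into a probability. The numbers and feature names below are made up for illustration and are not taken from the fitted model.

```python
import math


def sigmoid(z):
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Illustrative values only (not output from the model above)
base_value = -1.2  # expected model output, in log-odds
contributions = {"Age": 0.8, "Education-Num": 0.5, "Capital Gain": -0.3}

# Additivity: the prediction in log-odds is the base value plus all contributions
margin = base_value + sum(contributions.values())
probability = sigmoid(margin)

print(f"log-odds: {margin:.2f}, probability: {probability:.3f}")
```

Because the sigmoid is nonlinear, individual SHAP values on the log-odds scale cannot be read directly as changes in probability; only their sum can be converted.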

4.3 Global feature importance plot using SHAP

shap.plots.bar(shap_values)
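
By default, the bar plot ranks features by their mean absolute SHAP value across samples. The sketch below reproduces that summary with NumPy on a small, made-up matrix of SHAP values (4 samples by 3 features); the array and the feature names are hypothetical stand-ins. On the real results, the same computation would use `shap_values.values`.

```python
import numpy as np

# Hypothetical SHAP values: rows are samples, columns are features
values = np.array([
    [0.5, -0.1, 0.0],
    [-0.3, 0.2, 0.1],
    [0.4, -0.2, -0.1],
    [-0.6, 0.1, 0.0],
])
feature_names = ["f0", "f1", "f2"]

# Global importance: average magnitude of each feature's contribution
importance = np.abs(values).mean(axis=0)

# Sort features from most to least important
order = np.argsort(importance)[::-1]
ranking = [feature_names[j] for j in order]

for j in order:
    print(f"{feature_names[j]}: {importance[j]:.2f}")
```

Taking absolute values first matters: a feature whose contributions are large but alternate in sign (like `f0` here) would otherwise average out to nearly zero.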

5 Additional Resources

For more information on SHAP as well as additional examples, you can check out the official documentation: SHAP Documentation.