Introduction
In the realm of explainable artificial intelligence (XAI), Shapley values have emerged as a prominent method for attributing contributions of individual features to the predictions made by complex models. However, traditional methods for calculating Shapley values are susceptible to adversarial attacks, particularly when extrapolated data points are involved. This vulnerability arises from the way feature attributions are derived, often leading to misleading interpretations of model behavior. In this blog post, we delve into a novel approach known as knockoff imputation, which enhances the robustness of Shapley values against such adversarial attacks.
Background on Shapley Values
Shapley values originate from cooperative game theory and provide a method for fairly distributing payouts among players based on their contributions to a coalition. In the context of machine learning, we can view the features of a model as players, and the Shapley value quantifies each feature’s contribution to the model’s output. The calculation of Shapley values involves evaluating the marginal contributions of each feature across all possible coalitions, which can be computationally intensive.
The formula for calculating the Shapley value \( \phi_j \) for feature \( j \) is given by:
\[
\phi_j = \sum_{S \subseteq D \setminus \{j\}} \frac{|S|! \, (|D| - |S| - 1)!}{|D|!} \left( v(S \cup \{j\}) - v(S) \right)
\]
where \( v(S) \) is the value function that assigns a real-valued payout to each coalition \( S \subseteq D \). The challenge lies in evaluating \( v(S) \) for all \( 2^{|D|} \) subsets of features, which quickly becomes intractable as the number of features grows.
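To make the formula concrete, here is a small, self-contained Python sketch (illustrative, not from the original post) that computes exact Shapley values for a toy value function by enumerating every coalition:

```python
import itertools
import math

def shapley(v, d):
    """Exact Shapley values for a value function v over subsets of {0, ..., d-1}."""
    phi = [0.0] * d
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for r in range(d):
            for S in itertools.combinations(others, r):
                # Shapley kernel weight |S|! (|D| - |S| - 1)! / |D|!
                w = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
                phi[j] += w * (v(set(S) | {j}) - v(set(S)))
    return phi

# Toy "dictator" game: the coalition earns 1 only if player 0 joins.
v = lambda S: 1.0 if 0 in S else 0.0
print([round(p, 6) for p in shapley(v, 3)])  # [1.0, 0.0, 0.0]
```

The triple loop makes the \( 2^{|D|} \) cost visible: every subset of the remaining players is enumerated for each feature, which is exactly why approximation (and efficient imputation) matters in practice.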
The Vulnerability of Traditional Methods
Standard methods for calculating Shapley values often rely on sampling techniques that can lead to extrapolation. When estimating the value function \( v(S) \), we typically sample out-of-coalition features from marginal or conditional distributions. This can result in generating data points that lie outside the distribution of the training data, making them susceptible to adversarial manipulation.
Adversarial attacks exploit this vulnerability by crafting models whose feature attributions are misleading. For example, an adversary could train a model to produce fair outputs on such synthetic, off-manifold data while it behaves discriminatorily on real-world data. During a fairness audit, the adversary can exploit this discrepancy, because the audit evaluates the model through Shapley values computed largely on extrapolated points.
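The extrapolation problem is easy to see in a tiny simulation (illustrative, not from the post): two near-duplicate features stay almost identical on the data manifold, but marginal imputation breaks that dependence and produces points that resemble no real observation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.standard_normal(n)
x2 = x1 + 0.01 * rng.standard_normal(n)   # near-duplicate of x1
X = np.column_stack([x1, x2])

# Marginal imputation: replace x2 with an independent draw from its marginal.
X_marg = X.copy()
X_marg[:, 1] = rng.permutation(X[:, 1])

# On the data manifold the two features never differ by much...
print(np.abs(X[:, 0] - X[:, 1]).max())             # ~0.04
# ...but the marginally imputed points violate that constraint wildly.
print(np.abs(X_marg[:, 0] - X_marg[:, 1]).mean())  # ~1.1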
Knockoff Imputation: A Robust Alternative
To address the vulnerabilities associated with traditional Shapley value calculations, we propose the use of knockoff imputation. The model-X knockoff framework generates synthetic features (knockoffs) that preserve the statistical properties of the original features. It ensures that these knockoffs are indistinguishable from the original features when not considering the target variable.
Knockoffs are characterized by two key properties:
- Pairwise Exchangeability: Swapping any subset of the original features with their knockoff counterparts leaves the joint distribution of originals and knockoffs unchanged.
- Conditional Independence: The response variable is conditionally independent of the knockoffs given the original features.
These properties make knockoffs particularly well-suited for imputing out-of-coalition features in Shapley value calculations. By using knockoffs, we can avoid the pitfalls of extrapolation and maintain the integrity of the data manifold.
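As one concrete instance, model-X knockoffs for Gaussian features admit a closed-form sampler. The sketch below uses the equi-correlated choice of \( s \) (one standard option) and assumes \( \Sigma \) is a correlation matrix; it is an illustrative implementation, not the authors' code:

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, seed=None):
    """Sample equi-correlated model-X knockoffs for rows of X ~ N(mu, Sigma).

    Assumes Sigma is scaled to unit diagonal (a correlation matrix).
    """
    rng = np.random.default_rng(seed)
    d = Sigma.shape[0]
    # Equi-correlated choice: s_j = min(1, 2 * lambda_min(Sigma)) for every j
    s = min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min()) * np.ones(d)
    Sinv_s = np.linalg.solve(Sigma, np.diag(s))         # Sigma^{-1} diag(s)
    cond_mean = X - (X - mu) @ Sinv_s
    cond_cov = 2.0 * np.diag(s) - np.diag(s) @ Sinv_s   # 2 diag(s) - diag(s) Sigma^{-1} diag(s)
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(d))  # jitter for numerical stability
    return cond_mean + rng.standard_normal(X.shape) @ L.T
```

For independent features (\( \Sigma = I \)) this reduces to drawing fresh independent copies; for correlated features, the conditional mean pulls each knockoff toward the values its correlations imply, keeping the imputed points on the data manifold.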
Algorithm for Knockoff Imputed Shapley Values
The proposed algorithm for calculating knockoff imputed Shapley values is as follows:
- Train a Knockoff Sampler: Fit a model to generate knockoffs based on the original data matrix.
- Iterate Over Features: For each feature \( j \):
  - Initialize the Shapley value \( \phi_j \) to zero.
  - For each of \( N_{ko} \) knockoff draws:
    - Draw a sample of knockoffs.
    - For each coalition \( S \subseteq D \setminus \{j\} \):
      - Evaluate the value function \( v(S) \), keeping the original values for in-coalition features and imputing out-of-coalition features with the sampled knockoffs.
      - Add the weighted marginal contribution \( v(S \cup \{j\}) - v(S) \) to \( \phi_j \).
- Return Shapley Values: After iterating through all features, return the Shapley values averaged over the \( N_{ko} \) knockoff draws.
This algorithm allows for upfront sampling of knockoffs, significantly reducing the computational burden compared to traditional methods that require separate models for each coalition.
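The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `knockoff_sampler` is a hypothetical callable (for example, a Gaussian knockoff sampler), and the coalition enumeration is exact, so it is only practical for small \( d \):

```python
import itertools
import math
import numpy as np

def knockoff_shapley(f, x, knockoff_sampler, n_knockoffs=50, seed=None):
    """Shapley values for the prediction f(x), imputing out-of-coalition
    features with knockoff draws. Exact coalition enumeration, O(2^d)."""
    rng = np.random.default_rng(seed)
    d = len(x)
    phi = np.zeros(d)
    for _ in range(n_knockoffs):
        x_ko = knockoff_sampler(x, rng)          # one knockoff copy of x, drawn upfront

        def v(S):
            z = x_ko.copy()
            z[list(S)] = x[list(S)]              # coalition keeps its original values
            return f(z)

        for j in range(d):
            others = [k for k in range(d) if k != j]
            for r in range(d):
                for S in itertools.combinations(others, r):
                    w = math.factorial(r) * math.factorial(d - r - 1) / math.factorial(d)
                    phi[j] += w * (v(S + (j,)) - v(S))
    return phi / n_knockoffs
```

Because the knockoff copy is drawn once per outer iteration and reused across all coalitions, no per-coalition conditional model is needed, which is the computational advantage the text describes.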
Experimental Validation: Knockoff Imputation
To validate the effectiveness of knockoff imputation, we conducted experiments using both real and simulated datasets. The goal was to demonstrate that knockoff-imputed Shapley values are resilient to adversarial attacks that exploit extrapolated data.
In our experiments, we replicated scenarios where adversarial models attempted to manipulate feature attributions by generating synthetic data. We compared the performance of traditional Shapley value calculations with those using knockoff imputation.
The results indicated that knockoff-imputed Shapley values consistently revealed the true importance of sensitive features, effectively “unfooling” the adversarial attacks. For instance, in a credit scoring scenario, knockoff imputation correctly attributed importance to the feature “Gender,” which the adversarial strategy had previously masked.
Unfooling SAGE Values
In addition to Shapley values, we also explored the application of knockoff imputation to SAGE (Shapley Additive Global importancE) values. SAGE values aggregate each feature’s contribution to the model’s predictive performance across an entire dataset, providing a global view of feature importance.
Similar to the findings with Shapley values, we observed that knockoff imputation enhanced the robustness of SAGE values against adversarial attacks. By leveraging the same principles of knockoff generation, we ensured that the global explanations remained faithful to the underlying model behavior.
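A permutation-sampling sketch of knockoff-imputed SAGE-style values is shown below. It is illustrative only: `f`, `loss`, and `knockoff_sampler` are placeholder callables (a prediction function, a per-example loss, and a knockoff generator), not the authors' code:

```python
import numpy as np

def knockoff_sage(f, loss, X, y, knockoff_sampler, n_perms=500, seed=None):
    """Permutation-sampling estimate of SAGE-style global values,
    imputing features outside the coalition with knockoffs."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    phi = np.zeros(d)
    for _ in range(n_perms):
        i = rng.integers(n)
        x, target = X[i], y[i]
        z = knockoff_sampler(x, rng).copy()   # start with every feature knocked off
        prev = loss(f(z), target)
        for j in rng.permutation(d):
            z[j] = x[j]                       # reveal feature j in random order
            cur = loss(f(z), target)
            phi[j] += prev - cur              # loss reduction credited to j
            prev = cur
    return phi / n_perms
```

Each sampled permutation telescopes from the fully knocked-off loss down to the full-model loss, so the estimated values always sum to the total loss reduction, mirroring the efficiency property of Shapley values at the global level.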
Discussion and Future Directions
The introduction of knockoff imputation presents a significant advancement in the field of XAI, particularly in enhancing the robustness of Shapley and SAGE values. However, there are several avenues for future research:
- Comparative Studies: Conduct comprehensive benchmark studies to compare knockoff imputation with other conditional sampling methods and traditional approaches.
- Exploration of Trade-offs: Investigate the trade-offs between bias and variance introduced by knockoff imputation, particularly in scenarios with limited computational resources.
- Integration with Other Techniques: Explore the potential for combining knockoff imputation with other regularization techniques to further improve model interpretability.
Conclusion
In conclusion, knockoff imputation offers a promising solution to the challenges posed by adversarial attacks on Shapley and SAGE values. By generating synthetic features that maintain the statistical properties of the original data, we can enhance the robustness of feature attributions and ensure that explanations remain trustworthy. As the field of XAI continues to evolve, the integration of innovative methodologies like knockoff imputation will be crucial in building reliable and interpretable machine learning models.
Do you like to read more educational content? Read our blogs at Cloudastra Technologies, or contact us for business enquiries at Cloudastra Contact Us.