
Reading Note: Noise-tolerant fair classification


This post is a reading note for "Noise-tolerant fair classification" by Alex Lamy et al. The paper focuses on the question of whether one can still learn fair classifiers given noisy sensitive features (e.g., race or gender).

The short answer is yes. The authors show that if fairness is measured with a mean-difference score, and the sensitive features are corrupted by noise following the mutually contaminated learning model, then, owing to a simple identity, we only need to adjust the desired fairness tolerance.

To understand this paper, we first review two background topics:

  1. Mutually contaminated learning
  2. Fairness learning

Here, mutually contaminated learning is a model for learning from samples whose labels have been corrupted.

Mutually contaminated learning

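In the framework of learning from mutually contaminated distributions (MC learning), instead of observing samples from the “true” (or “clean”) joint distribution $D$, one observes samples from a corrupted distribution $D_{\mathrm{corr}}$. MC learning assumes that the class-conditional distributions of $D_{\mathrm{corr}}$ are mixtures of their true counterparts:

$$P_{\mathrm{corr}} = (1-\alpha)\,P + \alpha\,N, \qquad N_{\mathrm{corr}} = \beta\,P + (1-\beta)\,N,$$

where $P$ and $N$ represent the distributions of samples with positive and negative labels, respectively, and $\alpha, \beta \in [0,1]$ with $\alpha + \beta < 1$ are unknown noise parameters.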
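To make the mixture model concrete, here is a minimal Python sketch (NumPy assumed) that simulates MC noise: each corrupted "positive" sample comes from the clean positives with probability $1-\alpha$ and from the clean negatives otherwise, and each corrupted "negative" sample comes from the clean positives with probability $\beta$. The function name `sample_mc` and the use of empirical arrays `pos`/`neg` as stand-ins for $P$ and $N$ are illustrative assumptions, not code from the paper.

```python
import numpy as np

def sample_mc(pos, neg, alpha, beta, n, rng):
    """Draw n samples from P_corr = (1-alpha)P + alpha N and n from N_corr = beta P + (1-beta) N.

    pos, neg: 1-D arrays of clean samples standing in for P and N (an assumption).
    """
    def mixture(weight_on_pos):
        # For each draw, decide which clean distribution it comes from.
        from_pos = rng.random(n) < weight_on_pos
        return np.where(from_pos, rng.choice(pos, size=n), rng.choice(neg, size=n))

    p_corr = mixture(1.0 - alpha)  # mostly positives, contaminated by negatives
    n_corr = mixture(beta)         # mostly negatives, contaminated by positives
    return p_corr, n_corr

# With pos = all ones and neg = all zeros, the sample means
# approximate the mixture weights 1 - alpha and beta.
rng = np.random.default_rng(0)
p_corr, n_corr = sample_mc(np.ones(10), np.zeros(10), alpha=0.2, beta=0.1, n=100_000, rng=rng)
print(p_corr.mean(), n_corr.mean())  # approximately 0.8 and 0.1
```

Note that a learner only ever sees `p_corr` and `n_corr`; it never learns which clean distribution each sample actually came from.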

Fairness learning

Typically, fairness is achieved by adding constraints that depend on the sensitive feature and then adjusting the learning procedure to satisfy these constraints. There are two central objectives: designing appropriate, application-specific fairness criteria, and developing predictors that respect the chosen fairness conditions. Fairness objectives can be categorized into individual-level and group-level fairness; in this paper, the authors focus only on group-level fairness.

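For fairness-aware binary classification, the problem is typically formulated as

$$\min_{f \in \mathcal{F}} \; L_D(f) \quad \text{subject to} \quad \Lambda_D(f) \le \tau,$$

where $L_D(f)$ is the loss of classifier $f$ on distribution $D$, $\Lambda_D(f)$ is the fairness score function, and $\tau \ge 0$ is the fairness tolerance. The fairness score function is determined by the chosen definition of fairness. Two definitions are discussed in this paper, where $A$ denotes the sensitive feature and $Y$ the label. Demographic parity (DP) requires

$$\mathbb{P}(f(X)=1 \mid A=0) = \mathbb{P}(f(X)=1 \mid A=1),$$

and equality of opportunity (EO) requires

$$\mathbb{P}(f(X)=1 \mid Y=1, A=0) = \mathbb{P}(f(X)=1 \mid Y=1, A=1).$$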

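The corresponding fairness score functions are then defined by

  • disparity of demographic parity (DDP)

    $$\Lambda^{\mathrm{DP}}_D(f) = \left| D_{0,\cdot}(f) - D_{1,\cdot}(f) \right|$$

  • disparity of equality of opportunity (DEO)

    $$\Lambda^{\mathrm{EO}}_D(f) = \left| D_{0,1}(f) - D_{1,1}(f) \right|$$

where $D_{a,\cdot}(f) := \mathbb{P}(f(X)=1 \mid A=a)$ and $D_{a,1}(f) := \mathbb{P}(f(X)=1 \mid A=a, Y=1)$ are the rates at which $f$ predicts 1 in group $a$, among all samples and among positive samples, respectively.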
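Given predictions, sensitive features, and labels as 0/1 NumPy arrays (an assumption about the encoding), both scores can be estimated empirically by replacing each probability with a group-wise average. A minimal sketch; the helper names are illustrative:

```python
import numpy as np

def positive_rate(pred, mask):
    """Fraction of the samples selected by `mask` that f predicts as 1."""
    return pred[mask].mean()

def ddp(pred, a):
    """Empirical disparity of demographic parity |D_{0,.}(f) - D_{1,.}(f)|."""
    return abs(positive_rate(pred, a == 0) - positive_rate(pred, a == 1))

def deo(pred, a, y):
    """Empirical disparity of equality of opportunity |D_{0,1}(f) - D_{1,1}(f)|."""
    return abs(positive_rate(pred, (a == 0) & (y == 1))
               - positive_rate(pred, (a == 1) & (y == 1)))

pred = np.array([1, 0, 1, 1, 0, 1])  # f(X) for six samples
a    = np.array([0, 0, 0, 1, 1, 1])  # sensitive feature
y    = np.array([1, 0, 1, 1, 1, 0])  # true label
print(ddp(pred, a), deo(pred, a, y))  # 0.0 0.5
```

Here both groups receive positive predictions at the same overall rate (DDP = 0), yet among truly positive samples group 0 is always predicted 1 while group 1 is predicted 1 only half the time (DEO = 0.5), which shows that the two criteria can disagree.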

Noise-tolerant fairness classification

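The authors use MC learning as the noise model for the sensitive attribute instead of the label. The feature distributions of the two groups can then be written in MC form: for unknown noise parameters $\alpha, \beta \in [0,1]$ with $\alpha + \beta < 1$,

$$D_{1,\cdot,\mathrm{corr}} = (1-\alpha)\,D_{1,\cdot} + \alpha\,D_{0,\cdot}, \qquad D_{0,\cdot,\mathrm{corr}} = \beta\,D_{1,\cdot} + (1-\beta)\,D_{0,\cdot},$$

where $D_{a,\cdot}$ is the distribution of $X$ given $A = a$.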

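Similarly, for EO fairness, the distributions of positive samples ($Y = 1$) are contaminated with their own noise parameters $\alpha', \beta'$:

$$D_{1,1,\mathrm{corr}} = (1-\alpha')\,D_{1,1} + \alpha'\,D_{0,1}, \qquad D_{0,1,\mathrm{corr}} = \beta'\,D_{1,1} + (1-\beta')\,D_{0,1}.$$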
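Because each noisy group distribution is a mixture of the clean ones, a classifier's positive-prediction rate on a noisy group is the same mixture of its clean group rates. The Python sketch below (function and argument names are illustrative, not from the paper) computes the noisy rates for the DP case; the EO case works the same way with $\alpha', \beta'$ and rates restricted to $Y = 1$.

```python
def noisy_group_rates(rate1, rate0, alpha, beta):
    """Positive-prediction rates of f on the noisy groups (observed A = 1 and A = 0).

    rate1, rate0: clean rates D_{1,.}(f) and D_{0,.}(f).
    Follows D_{1,.,corr} = (1-alpha) D_{1,.} + alpha D_{0,.}
    and     D_{0,.,corr} = beta D_{1,.} + (1-beta) D_{0,.}.
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0 and alpha + beta < 1.0):
        raise ValueError("MC noise requires alpha, beta in [0, 1] with alpha + beta < 1")
    rate1_corr = (1.0 - alpha) * rate1 + alpha * rate0
    rate0_corr = beta * rate1 + (1.0 - beta) * rate0
    return rate1_corr, rate0_corr

# A classifier with clean rates 0.7 (A=1) and 0.3 (A=0) under alpha=0.1, beta=0.2:
r1, r0 = noisy_group_rates(0.7, 0.3, alpha=0.1, beta=0.2)
print(round(r1, 2), round(r0, 2))  # 0.66 0.38
```

The gap between the groups shrinks from 0.4 on clean data to 0.28 on noisy data, i.e. by a factor of $1 - 0.1 - 0.2 = 0.7$: noise makes a classifier look fairer than it is.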

Fairness constraints under MC learning

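To learn a fair classifier with corrupted sensitive attributes, the authors prove Theorem 2, which shows that the fairness constraint on the clean distribution is equivalent to a scaled constraint on the noisy distribution:

$$\Lambda^{\mathrm{DP}}_{D_{\mathrm{corr}}}(f) = (1-\alpha-\beta)\,\Lambda^{\mathrm{DP}}_{D}(f), \qquad \Lambda^{\mathrm{EO}}_{D_{\mathrm{corr}}}(f) = (1-\alpha'-\beta')\,\Lambda^{\mathrm{EO}}_{D}(f).$$

Hence $\Lambda^{\mathrm{DP}}_D(f) \le \tau$ holds if and only if $\Lambda^{\mathrm{DP}}_{D_{\mathrm{corr}}}(f) \le (1-\alpha-\beta)\,\tau$, and likewise for EO with $\alpha', \beta'$.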
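The DP identity follows in a few lines. Write $D_{a,\cdot}(f)$ for the rate at which $f$ predicts 1 in group $a$. A rate is an expectation, so under the mixtures $D_{1,\cdot,\mathrm{corr}} = (1-\alpha)D_{1,\cdot} + \alpha D_{0,\cdot}$ and $D_{0,\cdot,\mathrm{corr}} = \beta D_{1,\cdot} + (1-\beta) D_{0,\cdot}$ it inherits the same weights, and the terms collect into a common factor (the EO case is identical with $\alpha', \beta'$):

```latex
\begin{aligned}
D_{1,\cdot,\mathrm{corr}}(f) - D_{0,\cdot,\mathrm{corr}}(f)
  &= \big[(1-\alpha)\,D_{1,\cdot}(f) + \alpha\,D_{0,\cdot}(f)\big]
   - \big[\beta\,D_{1,\cdot}(f) + (1-\beta)\,D_{0,\cdot}(f)\big] \\
  &= (1-\alpha-\beta)\,D_{1,\cdot}(f) - (1-\alpha-\beta)\,D_{0,\cdot}(f) \\
  &= (1-\alpha-\beta)\,\big(D_{1,\cdot}(f) - D_{0,\cdot}(f)\big).
\end{aligned}
```

Since $\alpha + \beta < 1$, the factor $1-\alpha-\beta$ is positive, so taking absolute values gives the DP identity of Theorem 2.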

Theorem 2 has an important algorithmic implication. Suppose we pick a fairness constraint $\Lambda_D$ and seek to solve the fairness-constrained problem (Equation 2 of the paper) for a given tolerance $\tau \ge 0$. Then, given samples from $D_{\mathrm{corr}}$, it suffices to change the tolerance to $\tau' = (1-\alpha-\beta)\,\tau$. In practice, $\alpha$ and $\beta$ will be unknown; however, several algorithms have been proposed to estimate them from noisy data alone. We can therefore use these algorithms to estimate $\alpha$ and $\beta$, and plug the estimates in to obtain an estimate of $\tau'$.

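The corresponding algorithm is:

  1. Estimate the noise parameters $\hat\alpha, \hat\beta$ from the noisy data alone.
  2. Compute the adjusted tolerance $\hat\tau' = (1-\hat\alpha-\hat\beta)\,\tau$.
  3. Train any fairness-constrained classifier on the noisy samples with tolerance $\hat\tau'$.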
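A minimal Python sketch of this procedure (rescale the tolerance by $1-\hat\alpha-\hat\beta$, then call a fair learner), assuming the noise estimates $\hat\alpha, \hat\beta$ are already available; the paper relies on existing estimators for that step, which are not reproduced here. `threshold_dp_learner` is a hypothetical toy fair learner: it picks the threshold on a 1-D score with the lowest training error whose DDP on the given groups stays within the tolerance. Any constrained fair learner could take its place.

```python
import numpy as np

def corrected_tolerance(tau, alpha_hat, beta_hat):
    """Adjusted tolerance tau' = (1 - alpha_hat - beta_hat) * tau."""
    if not (alpha_hat >= 0 and beta_hat >= 0 and alpha_hat + beta_hat < 1):
        raise ValueError("need alpha_hat, beta_hat >= 0 with alpha_hat + beta_hat < 1")
    return (1.0 - alpha_hat - beta_hat) * tau

def threshold_dp_learner(x, y, a, tol):
    """Hypothetical toy fair learner: predict 1 iff x >= t, subject to DDP <= tol."""
    # t = +inf predicts 0 for everyone: its DDP is 0, so it is always feasible.
    best_t, best_err = np.inf, np.mean(y != 0)
    for t in np.unique(x):
        pred = (x >= t).astype(int)
        gap = abs(pred[a == 0].mean() - pred[a == 1].mean())
        err = np.mean(pred != y)
        if gap <= tol and err < best_err:
            best_t, best_err = t, err
    return best_t

def noise_tolerant_fit(learner, x, y, a_noisy, tau, alpha_hat, beta_hat):
    """Train `learner` on noisy sensitive features with the rescaled tolerance."""
    return learner(x, y, a_noisy, corrected_tolerance(tau, alpha_hat, beta_hat))

x = np.array([0, 1, 2, 3, 4, 5])        # 1-D score
y = np.array([0, 0, 1, 0, 1, 1])        # labels
a_noisy = np.array([0, 0, 0, 1, 1, 1])  # observed (noisy) sensitive feature
print(threshold_dp_learner(x, y, a_noisy, tol=0.7))                              # 2
print(noise_tolerant_fit(threshold_dp_learner, x, y, a_noisy, 0.7, 0.1, 0.1))    # 1
```

Without the correction, the learner accepts threshold 2, whose DDP on the noisy groups is 2/3. With $\hat\tau' = 0.8 \times 0.7 = 0.56$ it falls back to threshold 1, with a noisy-group DDP of 1/3, trading some training error (1/6 becomes 1/3) for fairness.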

Experiments

There are two application scenarios for noise-tolerant fair learning:

  • (Privacy setting) Even if noise-free sensitive features are available, we may wish to add noise to obfuscate the sensitive attributes.
  • (PU setting) We wish to analyze data where the sensitive feature is known only for a subset of individuals, while for the others its value is unknown.

Here PU stands for positive-unlabeled: each sample's sensitive attribute is either observed as positive or left unlabeled.

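In the experiments, the sensitive feature’s value is randomly flipped with probability $\rho^+$ if its value was 1, or with probability $\rho^-$ if its value was 0. In the privacy setting, $\rho^+ = \rho^-$, while in the PU setting flipping happens in only one direction: a true 1 may go unrecorded and appear as 0, but a true 0 is never recorded as 1, so $\rho^- = 0$.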
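To connect these flip rates to the MC noise parameters, one can apply Bayes' rule. Assuming (an assumption of this sketch) that flipping is independent of $X$ given the true attribute and that the true group proportion $\pi_1 = \mathbb{P}(A=1)$ is known, the noisy group $\tilde A = 1$ consists of true group-0 members with probability $\alpha = \pi_0\rho^- / (\pi_1(1-\rho^+) + \pi_0\rho^-)$, and the noisy group $\tilde A = 0$ consists of true group-1 members with probability $\beta = \pi_1\rho^+ / (\pi_1\rho^+ + \pi_0(1-\rho^-))$, where $\pi_0 = 1 - \pi_1$. The helper name below is illustrative.

```python
def flip_rates_to_mc(rho_plus, rho_minus, pi1):
    """Map flip probabilities to MC noise parameters (alpha, beta) via Bayes' rule.

    rho_plus:  P(observed A = 0 | true A = 1)
    rho_minus: P(observed A = 1 | true A = 0)
    pi1:       P(true A = 1)
    Assumes flipping is independent of X given the true A.
    """
    pi0 = 1.0 - pi1
    alpha = pi0 * rho_minus / (pi1 * (1.0 - rho_plus) + pi0 * rho_minus)  # P(A=0 | observed A=1)
    beta = pi1 * rho_plus / (pi1 * rho_plus + pi0 * (1.0 - rho_minus))    # P(A=1 | observed A=0)
    return alpha, beta

# Symmetric flips on balanced groups: alpha = beta = 0.2 (up to float rounding).
print(flip_rates_to_mc(0.2, 0.2, pi1=0.5))
# One-directional flips (true 1s may be recorded as 0, never the reverse): alpha = 0.
print(flip_rates_to_mc(0.4, 0.0, pi1=0.5))
```

In the one-directional case the noisy group 1 stays clean ($\alpha = 0$), and all contamination lands in the noisy group 0, here $\beta = 0.2 / 0.7 = 2/7$.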

The privacy-setting experiment is run on the COMPAS dataset; the result plots can be found in the original paper.

For the PU setting, the experiment is run on the law school dataset; refer to the original paper for detailed descriptions and results.

Reference: Lamy, Alex, et al. "Noise-tolerant fair classification." Advances in Neural Information Processing Systems. 2019.
