This post is a reading note on "Noise-tolerant fair classification" by Lamy et al. The paper focuses on one question: can one still learn fair classifiers when the sensitive features (e.g., race or gender) are noisy?
A quick answer is yes. The authors claim that if one measures fairness using the mean-difference score, and sensitive features are subject to noise from the mutually contaminated learning model, then owing to a simple identity we only need to change the desired fairness-tolerance.
To understand this paper, we first review two topics:
- Mutually contaminated learning
- Fairness learning
where mutually contaminated learning is a model for learning the distribution of samples with corrupted labels.
Mutually contaminated learning
In the framework of learning from mutually contaminated distributions (MC learning), instead of observing samples from the “true” (or “clean”) joint distribution $D$, one observes samples from a corrupted distribution $D_{\mathrm{corr}}$. MC learning assumes that the observed class-conditional distributions are mixtures of their true counterparts:
$$P_{\mathrm{corr}} = (1-\alpha)\,P + \alpha\,Q, \qquad Q_{\mathrm{corr}} = \beta\,P + (1-\beta)\,Q,$$
where $P$ and $Q$ represent the distributions of samples with positive and negative labels, respectively, and $\alpha, \beta$ are noise parameters with $\alpha + \beta < 1$.
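The mixture structure can be illustrated with a small simulation. This is a minimal sketch, assuming the standard MC form in which the corrupted positive class draws from the negative class with probability `alpha`, and the corrupted negative class draws from the positive class with probability `beta`. The Gaussian class-conditionals, the helper name `sample_mc`, and the numeric values are illustrative assumptions, not taken from the paper.

```python
import random

def sample_mc(sample_p, sample_q, alpha, beta, n, rng):
    """Draw n points from each corrupted class-conditional.

    P_corr = (1 - alpha) * P + alpha * Q
    Q_corr = beta * P + (1 - beta) * Q
    """
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need alpha, beta >= 0 and alpha + beta < 1")
    # With probability alpha, a "positive" sample actually comes from Q.
    p_corr = [sample_q(rng) if rng.random() < alpha else sample_p(rng)
              for _ in range(n)]
    # With probability beta, a "negative" sample actually comes from P.
    q_corr = [sample_p(rng) if rng.random() < beta else sample_q(rng)
              for _ in range(n)]
    return p_corr, q_corr

rng = random.Random(0)

# Toy 1-D class-conditionals (assumed): P = N(+1, 1), Q = N(-1, 1).
def sample_p(r):
    return r.gauss(1.0, 1.0)

def sample_q(r):
    return r.gauss(-1.0, 1.0)

p_corr, q_corr = sample_mc(sample_p, sample_q, alpha=0.2, beta=0.3,
                           n=50_000, rng=rng)

# The corrupted means should sit close to the mixture means:
# (1 - 0.2) * 1 + 0.2 * (-1) = 0.6 and 0.3 * 1 + 0.7 * (-1) = -0.4.
mean_p_corr = sum(p_corr) / len(p_corr)
mean_q_corr = sum(q_corr) / len(q_corr)
```

The corrupted means move toward each other, which is why statistics computed on the noisy data understate the true gap between the two classes.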
Fairness learning
Typically, fairness is achieved by adding constraints that depend on the sensitive feature, and then correcting one’s learning procedure to satisfy these constraints. There are two central objectives: designing appropriate application-specific fairness criteria, and developing predictors that respect the chosen fairness conditions. Fairness objectives can be categorized into individual-level and group-level fairness. In this paper, the authors focus only on group-level fairness.
For fairness-aware binary classification, the formulation is typically
$$\min_{f} \; L_D(f) \quad \text{subject to} \quad \Lambda_D(f) \le \tau,$$
where $L_D$ is the loss function, $\Lambda_D$ is the fairness score function, and $\tau \ge 0$ is the fairness tolerance. The fairness score function is determined by the definition of fairness. There are two fairness definitions discussed in this paper.
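Demographic parity (DP) requires the rate of positive predictions to be the same in both sensitive groups, while equality of opportunity (EO) requires this only among samples whose true label is positive.
The corresponding fairness score functions are then defined by
disparity of demographic parity (DDP): $\Lambda^{\mathrm{DP}}_D(f) = \left| \bar{L}_{D_{0,\cdot}}(f) - \bar{L}_{D_{1,\cdot}}(f) \right|$
disparity of equality of opportunity (DEO): $\Lambda^{\mathrm{EO}}_D(f) = \left| \bar{L}_{D_{0,1}}(f) - \bar{L}_{D_{1,1}}(f) \right|$
where $D_{a,\cdot}$ is the distribution $D$ conditioned on the sensitive feature $A = a$, $D_{a,y}$ is $D$ conditioned on $A = a$ and label $Y = y$, and $\bar{L}_D(f)$ is the expected score of $f$ under $D$ (for the mean-difference score, the rate of positive predictions).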
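On a finite sample with 0/1 predictions, DDP is the gap in positive-prediction rate between the two sensitive groups, and DEO is the same gap restricted to samples whose true label is positive. The sketch below computes both empirically. The function names and the toy arrays are illustrative assumptions, not from the paper.

```python
def positive_rate(preds, mask):
    """Fraction of positive predictions among the samples selected by mask."""
    selected = [p for p, keep in zip(preds, mask) if keep]
    if not selected:
        raise ValueError("empty group")
    return sum(selected) / len(selected)

def ddp(preds, a):
    """Empirical disparity of demographic parity."""
    rate0 = positive_rate(preds, [ai == 0 for ai in a])
    rate1 = positive_rate(preds, [ai == 1 for ai in a])
    return abs(rate0 - rate1)

def deo(preds, a, y):
    """Empirical disparity of equality of opportunity (positives only)."""
    rate0 = positive_rate(preds, [ai == 0 and yi == 1 for ai, yi in zip(a, y)])
    rate1 = positive_rate(preds, [ai == 1 and yi == 1 for ai, yi in zip(a, y)])
    return abs(rate0 - rate1)

# Toy data: predictions, sensitive attribute, true label.
preds = [1, 0, 1, 1, 0, 0, 1, 0]
a     = [0, 0, 0, 0, 1, 1, 1, 1]
y     = [1, 1, 0, 1, 1, 0, 1, 0]

ddp_value = ddp(preds, a)     # |3/4 - 1/4| = 0.5
deo_value = deo(preds, a, y)  # |2/3 - 1/2| = 1/6
```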
Noise-tolerant fairness classification
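The authors use MC learning as the noise model for the sensitive attribute instead of the label. The distributions of samples from the two groups can then be written in MC form, writing $D_{a,\cdot}$ for the clean distribution of the group with sensitive attribute $a$ and $D_{a,1}$ for its positively labelled part. For unknown noise parameters $\alpha, \beta$ with $\alpha + \beta < 1$,
$$D_{1,\cdot,\mathrm{corr}} = (1-\alpha)\,D_{1,\cdot} + \alpha\,D_{0,\cdot}, \qquad D_{0,\cdot,\mathrm{corr}} = \beta\,D_{1,\cdot} + (1-\beta)\,D_{0,\cdot}.$$
Similarly, for EO fairness, with noise parameters $\alpha', \beta'$ satisfying $\alpha' + \beta' < 1$,
$$D_{1,1,\mathrm{corr}} = (1-\alpha')\,D_{1,1} + \alpha'\,D_{0,1}, \qquad D_{0,1,\mathrm{corr}} = \beta'\,D_{1,1} + (1-\beta')\,D_{0,1}.$$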
Fairness constraints under MC learning
To learn a fair classifier with corrupted sensitive attributes, the authors prove Theorem 2, which shows that a fairness constraint on the clean distribution is equivalent to a scaled constraint on the noisy distribution.
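The identity behind this equivalence can be sketched in a few lines for demographic parity. The sketch assumes the MC model on the sensitive attribute with noise parameters $\alpha, \beta$ satisfying $\alpha + \beta < 1$, and uses the shorthand $\bar{L}_a$ for the expected score of $f$ on the clean group $A = a$ (a notation introduced here only for illustration). Because expectations are linear in the underlying distribution,

$$
\begin{aligned}
\bar{L}_{1,\mathrm{corr}} &= (1-\alpha)\,\bar{L}_1 + \alpha\,\bar{L}_0, \\
\bar{L}_{0,\mathrm{corr}} &= \beta\,\bar{L}_1 + (1-\beta)\,\bar{L}_0, \\
\bar{L}_{0,\mathrm{corr}} - \bar{L}_{1,\mathrm{corr}} &= (1-\alpha-\beta)\,\bigl(\bar{L}_0 - \bar{L}_1\bigr).
\end{aligned}
$$

Taking absolute values, and using $1 - \alpha - \beta > 0$, the noisy disparity is the clean disparity multiplied by $(1 - \alpha - \beta)$. Bounding the clean disparity by $\tau$ is therefore the same as bounding the noisy disparity by $\tau \cdot (1 - \alpha - \beta)$.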
Theorem 2 has an important algorithmic implication. Suppose we pick a fairness constraint $\Lambda$ and seek to solve Equation 2 of the paper (the fairness-constrained problem above) for a given tolerance $\tau$. Then, given samples from $D_{\mathrm{corr}}$, it suffices to simply change the tolerance to $\tau' = \tau \cdot (1 - \alpha - \beta)$. In practice, $\alpha$ and $\beta$ will be unknown; however, several algorithms have been proposed to estimate them from noisy data alone. Thus, we may use these to construct estimates of $\alpha$ and $\beta$, and plug them in to construct an estimate of $\tau'$.
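The corresponding algorithm has three steps: estimate the noise parameters from the noisy data, scale the fairness tolerance accordingly, and run any fairness-aware learner on the noisy data with the scaled tolerance.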
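The recipe can be sketched as a thin wrapper around two user-supplied components. This is a sketch under stated assumptions rather than the authors' implementation: `noise_est` and `fair_alg` are hypothetical callables standing in for a noise-rate estimator and a fairness-aware learner, and the scaling factor `1 - alpha - beta` assumes the MC noise model on the sensitive attribute.

```python
def noise_tolerant_fair_fit(data, tau, noise_est, fair_alg):
    """Fit a fair classifier on data whose sensitive attribute is noisy.

    noise_est(data) -> (alpha_hat, beta_hat)   # hypothetical estimator
    fair_alg(data, tolerance) -> classifier    # hypothetical fair learner
    """
    # Step 1: estimate the noise parameters from the noisy data alone.
    alpha_hat, beta_hat = noise_est(data)

    # Step 2: shrink the tolerance by the factor (1 - alpha - beta).
    scale = 1.0 - alpha_hat - beta_hat
    if scale <= 0:
        raise ValueError("estimated noise too large: need alpha + beta < 1")
    tau_scaled = tau * scale

    # Step 3: run the unchanged fair learner with the scaled tolerance.
    return fair_alg(data, tau_scaled)

# Usage with stand-in components that only record what they receive.
received = {}

def dummy_noise_est(data):
    return 0.1, 0.2

def dummy_fair_alg(data, tolerance):
    received["tolerance"] = tolerance
    return "classifier"

model = noise_tolerant_fair_fit([], 0.05, dummy_noise_est, dummy_fair_alg)
```

The fair learner itself is left untouched; only its tolerance argument changes, which is what makes the approach easy to combine with existing fairness-aware methods.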
Experiments
There are two application scenarios for noise-tolerant fairness learning:
- (Privacy setting) Even if noise-free sensitive features are available, we may wish to add noise so as to obfuscate sensitive attributes.
- (PU setting) We wish to analyze data where the presence of the sensitive feature is only known for a subset of individuals, while for others the feature value is unknown.
Here PU means that each sample's sensitive attribute is either positive or unlabeled.
In the experiments, the sensitive feature’s value is randomly flipped with probability $\rho_+$ if its value was 1, or with probability $\rho_-$ if its value was 0. For the privacy setting, $\rho_+ = \rho_-$, while for the PU setting, $\rho_+ = 0$ or $\rho_- = 0$.
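The flipping procedure, and its link to the scaled fairness score, can be sketched as follows. The conversion from flip probabilities to MC noise parameters is derived here with Bayes' rule, assuming flips depend only on the true attribute value; `rho_plus` and `rho_minus` name the flip probabilities for values 1 and 0. All helper names and toy numbers are illustrative assumptions, not taken from the paper.

```python
import random

def flip_sensitive(a, rho_plus, rho_minus, rng):
    """Flip each 1 with probability rho_plus and each 0 with probability rho_minus."""
    noisy = []
    for ai in a:
        flip_prob = rho_plus if ai == 1 else rho_minus
        noisy.append(1 - ai if rng.random() < flip_prob else ai)
    return noisy

def mc_noise_from_flips(rho_plus, rho_minus, pi):
    """Bayes' rule with pi = P(A = 1):
    alpha = P(A = 0 | A_noisy = 1), beta = P(A = 1 | A_noisy = 0).
    """
    alpha = rho_minus * (1 - pi) / (rho_minus * (1 - pi) + (1 - rho_plus) * pi)
    beta = rho_plus * pi / (rho_plus * pi + (1 - rho_minus) * (1 - pi))
    return alpha, beta

def rate_gap(preds, a):
    """Absolute gap in positive-prediction rate between groups 0 and 1."""
    group0 = [p for p, ai in zip(preds, a) if ai == 0]
    group1 = [p for p, ai in zip(preds, a) if ai == 1]
    return abs(sum(group0) / len(group0) - sum(group1) / len(group1))

rng = random.Random(0)
n, pi = 100_000, 0.5
a = [1 if rng.random() < pi else 0 for _ in range(n)]
# Toy predictions that depend only on the true attribute (assumed).
preds = [1 if rng.random() < (0.7 if ai == 1 else 0.3) else 0 for ai in a]

a_noisy = flip_sensitive(a, rho_plus=0.2, rho_minus=0.1, rng=rng)
alpha, beta = mc_noise_from_flips(0.2, 0.1, pi)

clean_gap = rate_gap(preds, a)
noisy_gap = rate_gap(preds, a_noisy)
# Up to sampling error, the noisy gap is the clean gap shrunk by (1 - alpha - beta).
predicted_noisy_gap = (1 - alpha - beta) * clean_gap
```

In this toy setup the gap measured against the noisy attribute is smaller than the true gap, so enforcing the original tolerance on noisy data would be too lenient; shrinking the tolerance by the same factor compensates for that.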
The privacy-setting experiment is run on the COMPAS dataset.
The PU-setting experiment is run on the law school dataset; see the original paper for detailed descriptions.
Reference: Lamy, Alex, et al. "Noise-tolerant fair classification." Advances in Neural Information Processing Systems. 2019.