In this work, we investigate the impact of synthetic data generation on fairness-aware classification and demonstrate that conventional sampling methods amplify unfairness. We propose a data sampling method combined with boosting that accounts for fairness in a cumulative manner, called FairSMOTEBoost, to tackle the combined problem of class imbalance and unfairness.

We conducted a large number of experiments comparing FairSMOTEBoost against four competitors. Our results indicate that combining synthetic oversampling with a fairness-aware boosting algorithm is effective in terms of both the predictive performance and the fairness of the method.

The method has two variations: 1) In the first variation, we extend the vanilla SMOTEBoost.
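As a rough illustration of this first variation, the sketch below combines per-round SMOTE with an AdaBoost-style weight update that additionally boosts misclassified protected positives. This is only an illustrative sketch: the weak learner, the oversampling target, and the extra boost factor are assumptions, not the exact FairSMOTEBoost update rule.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.tree import DecisionTreeClassifier

def fair_smoteboost_sketch(X, y, protected, n_rounds=10, ratio=0.1):
    """Illustrative SMOTEBoost-style loop with a fairness-aware weight update.

    protected : boolean array marking the protected group.
    NOTE: the actual FairSMOTEBoost update differs; this only shows the
    overall structure (per-round SMOTE + cumulative, fairness-aware boosting).
    """
    n = len(y)
    w = np.full(n, 1.0 / n)                # boosting weights on the original data
    learners, alphas = [], []
    n_min = int(np.sum(y == 1))            # assuming 1 = minority class
    for _ in range(n_rounds):
        # 1) generate synthetic minority samples for this round
        sm = SMOTE(sampling_strategy={1: n_min + max(1, int(ratio * n_min))})
        X_res, y_res = sm.fit_resample(X, y)

        # 2) fit a weak learner on the augmented data (crude weighting:
        #    originals carry the boosting distribution, synthetic samples weight 1)
        sw = np.concatenate([w * n, np.ones(len(X_res) - n)])
        h = DecisionTreeClassifier(max_depth=2).fit(X_res, y_res, sample_weight=sw)
        pred = h.predict(X)
        miss = pred != y

        # 3) weighted error on the ORIGINAL data, as in AdaBoost
        err = np.sum(w * miss)
        if err <= 0 or err >= 0.5:
            break
        alpha = 0.5 * np.log((1 - err) / err)

        # 4) fairness-aware part (illustrative factor of 2): push extra
        #    weight onto misclassified protected positives
        boost = np.where(protected & (y == 1) & miss, 2.0, 1.0)
        w *= np.exp(alpha * miss) * boost
        w /= w.sum()

        learners.append(h)
        alphas.append(alpha)
    return learners, alphas
```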

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Experiments were conducted on the following datasets:

  1. Adult-gender
  2. Adult-race
  3. Bank-gender
  4. Credit-gender
  5. Credit-marriage
  6. KDD census
  7. NYPD complaints-race
  8. NYPD complaints-gender
  9. Compas-gender
  10. Compas-race
  11. Dutch census-age
  12. Dutch census-gender

-------------------------------------------------------------------

Experiments include:

  1. A set of bar charts comparing 8 metrics for each of the methods, as shown in the images.
  2. A set of convergence charts showing how the measures change over the boosting rounds.
  3. A set of weight distribution charts showing how the synthetic generation augments the weights of the minority groups of data over the boosting rounds.
  4. ABROCA charts showing the area between the two ROC curves of each method.
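For reference, the group-wise rates, Eq.Op, and ABROCA can be computed along the following lines (a sketch assuming binary labels, a binary protected attribute, and score outputs; the uniform FPR grid for the ABROCA integral is an implementation choice):

```python
import numpy as np
from sklearn.metrics import roc_curve

def group_rates(y_true, y_pred, protected):
    """TPR/TNR per group; Eq.Op is taken as the TPR gap between the groups."""
    rates = {}
    for name, mask in [("protected", protected), ("non_protected", ~protected)]:
        yt, yp = y_true[mask], y_pred[mask]
        rates[name] = {"TPR": np.mean(yp[yt == 1] == 1),
                       "TNR": np.mean(yp[yt == 0] == 0)}
    eq_op = abs(rates["protected"]["TPR"] - rates["non_protected"]["TPR"])
    return rates, eq_op

def abroca(y_true, y_score, protected, grid_size=1001):
    """Area between the two groups' ROC curves on a common FPR grid."""
    grid = np.linspace(0.0, 1.0, grid_size)
    tprs = []
    for mask in (protected, ~protected):
        fpr, tpr, _ = roc_curve(y_true[mask], y_score[mask])
        tprs.append(np.interp(grid, fpr, tpr))   # align the curves
    # uniform grid on [0, 1], so the integral is just the mean gap
    return float(np.mean(np.abs(tprs[0] - tprs[1])))
```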


Also, for the vanilla setup of the algorithm, the results are reported for four different oversampling ratios, namely 1%, 5%, 10%, and 20% of the number of minority samples in the original training set. In the following, you can see the results for the Bank and Adult-gender datasets.
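As a concrete example of what these ratios mean in terms of generated samples (the stand-in labels are illustrative):

```python
import numpy as np

y_train = np.array([1] * 500 + [0] * 4500)   # stand-in labels, 1 = minority class
n_minority = int((y_train == 1).sum())
for ratio in (0.01, 0.05, 0.10, 0.20):
    print(f"ratio {ratio:.0%}: {int(ratio * n_minority)} synthetic samples")
# -> 5, 25, 50, and 100 synthetic minority samples respectively
```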

Bank dataset results


1- Performance charts

 

2- Internal behavior                                              

[Charts: AdaBoost (Bank), RUSBoost (Bank)]

[Charts: SMOTEBoost (Bank), FairSMOTEBoost (Bank)]

From the internal behaviour charts for the different methods, we can see that RUSBoost and FairSMOTEBoost show similar trends. In both methods, the performance for the Protected_Positives and non_Protected_Positives groups increases during the learning phase, but the final values for our method are slightly better than those of RUSBoost and the others. For the other two methods, the increase is smaller. The TNRs for the protected and non-protected groups remain almost the same in all methods. Also note that the error rates increase slightly in all methods; however, for RUSBoost and the other two methods this increase includes a sharp jump, with final values larger than our method's, whereas for FairSMOTEBoost there is only a gentle increase with a lower final value. The balanced error rates, on the other hand, decrease substantially, again with the best final values for our method.
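Convergence curves of this kind can be produced by evaluating the rates after every boosting round. The sketch below uses scikit-learn's AdaBoostClassifier with staged_predict as a stand-in learner on synthetic data; FairSMOTEBoost exposes the same kind of per-round predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

# stand-in data: an imbalanced binary task plus a synthetic protected attribute
X, y = make_classification(n_samples=2000, weights=[0.9], random_state=0)
protected = np.random.default_rng(0).random(len(y)) < 0.3

clf = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)
history = []
for pred in clf.staged_predict(X):                        # predictions after each round
    tpr_p = np.mean(pred[(y == 1) & protected] == 1)      # protected positives
    tpr_np = np.mean(pred[(y == 1) & ~protected] == 1)    # non-protected positives
    tpr, tnr = np.mean(pred[y == 1] == 1), np.mean(pred[y == 0] == 0)
    history.append({
        "error": np.mean(pred != y),
        "balanced_error": 1.0 - 0.5 * (tpr + tnr),
        "eq_op": abs(tpr_np - tpr_p),
        "TPR_prot": tpr_p,
        "TPR_nonprot": tpr_np,
    })
# `history` holds one dict per boosting round -- the series plotted in the charts
```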

3- Weight Distribution charts

[Charts: cumulative number of instances per group over 10 boosting rounds; 4 group weights per boosting round; 6 group weights per boosting round]



-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Adult-gender data results

1- Performance charts






As can be inferred from the charts, as the oversampling ratio increases, the TPR for the minority class increases and the TNRs decrease. Because the negative class is the majority class, decreasing its ratio causes the accuracy and balanced accuracy to drop, although the decrease in balanced accuracy is slighter. A similar trend can be seen for the fairness metrics, where Eq.Op decreases (which in this case means approaching optimality, since Eq.Op is to be minimized). We obtain the best Eq.Op at the 10% ratio.
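For reference, the two quantities behind this trade-off (taking Eq.Op as the absolute TPR gap between the non-protected and protected groups, i.e. the standard equal-opportunity difference, which is to be minimized; the paper's exact definition may differ):

```latex
\mathrm{Balanced\ Accuracy} = \tfrac{1}{2}\left(\mathrm{TPR} + \mathrm{TNR}\right),
\qquad
\mathrm{Eq.Op} = \left|\,\mathrm{TPR}_{\text{non-prot}} - \mathrm{TPR}_{\text{prot}}\,\right|
```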

2- Internal behavior

 

3- Weight Distribution charts



Cumulative number of instances per group over 10 boosting rounds

4 group weights per boosting round

6 group weights per boosting round

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Protected groups augmentation (protected positives and protected negatives)


Adult 10%, 20 boosting rounds

Adult 10%, 20 boosting rounds

Compas-race 10%, 20 boosting rounds

Compas-gender 10%, 20 boosting rounds

For datasets with less imbalance, this approach seems to work better (compare the results above). It is clear that augmenting the protected negatives increases the TNR, and not only for the protected subgroup of instances but also for the non-protected group. Compare these results with the ones below, where we apply SMOTE based on the minority class; a sketch of both augmentation strategies follows at the end of this section.





Adult 10%, 100 boosting rounds

Adult 10%, 100 boosting rounds

Compas-race 10%, 100 boosting rounds

Compas-gender 10%, 100 boosting rounds
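The two augmentation strategies compared above can be sketched as follows. Restricting SMOTE to a subgroup (e.g. the protected negatives) is done here with an auxiliary-label trick; this is one possible implementation, not necessarily the one used in FairSMOTEBoost.

```python
import numpy as np
from imblearn.over_sampling import SMOTE

def oversample_subgroup(X, y, mask, n_new):
    """Generate n_new synthetic samples for the rows selected by `mask`
    (e.g. protected negatives). `mask` must lie within a single class."""
    aux = mask.astype(int)                    # auxiliary label: 1 = target subgroup
    sm = SMOTE(sampling_strategy={1: int(mask.sum()) + n_new})
    X_res, _ = sm.fit_resample(X, aux)        # originals first, synthetic rows last
    X_new = X_res[len(X):]
    y_new = np.full(len(X_new), y[mask][0])   # inherit the subgroup's class label
    return X_new, y_new

# strategy 1 (first set of charts): augment the protected negatives
#   X_new, y_new = oversample_subgroup(X, y, protected & (y == 0), n_new)
# strategy 2 (second set of charts): plain class-based SMOTE on the minority class
#   X_res, y_res = SMOTE(sampling_strategy={1: target_count}).fit_resample(X, y)
```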