TLDR: This research paper introduces a framework for learning and adapting to multiple, potentially conflicting, and time-varying fairness measures in AI systems. It models the relationships between actions and fairness criteria using a compatibility graph, which can be time-invariant or time-varying. The framework utilizes limited graph-structured feedback to learn optimal actions, balancing performance (e.g., revenue) with fairness. Algorithms are presented for both time-invariant and time-varying graph scenarios, demonstrated through a political advertising example where the system learns to allocate advertising space while considering evolving fairness preferences.
In the rapidly evolving landscape of artificial intelligence, ensuring fairness in automated decision-making processes has become paramount. As AI systems permeate various aspects of our lives, from business operations to societal dynamics, it’s increasingly recognized that a single measure of fairness is often insufficient. Instead, a comprehensive approach requires considering multiple fairness measures simultaneously, which can include both group and individual fairness notions.
The challenge lies in the fact that the relative importance or ‘weights’ of these fairness measures are often unknown beforehand. Moreover, these societal preferences and definitions of fairness are not static; they can change over time due to evolving data distributions or shifting societal norms. This paper, titled Learning Time-Varying Convexifications of Multiple Fairness Measures, delves into this complex problem, proposing a novel framework to learn these time-varying fairness preferences, even with limited feedback.
The Core Problem: Balancing Dynamic Fairness
Imagine the scenario of political advertising on the internet. Regulations might suggest equalizing advertising space-time for each political party. But what does ‘equal opportunity’ truly mean? Is it equal budgets, average price per ad, views, or reach? Should it align with past election results, current voting preferences, or be uniform across all registered parties? Furthermore, platforms also aim to balance ad revenue with these fairness considerations, and the ideal trade-off itself can change over time.
The paper highlights that fairness measures are not always independent; some may conflict, while others align. An action taken at one moment might enhance one fairness measure but negatively impact another. The authors model these relationships as a graph, where nodes represent fairness regularizers (the criteria for fairness) and available actions, and edges represent the relationships between them. This graph can be either time-invariant (fixed) or time-varying (changing over time), reflecting the dynamic nature of social norms.
A Framework with Graph-Structured Feedback
The proposed framework draws inspiration from ‘graph-structured bandits,’ a field in online learning that deals with partial observability of rewards. In this model, when an action is taken, not only is its immediate reward observed, but also the rewards associated with its ‘out-neighbors’ in the compatibility graph – other actions or regularizers that are affected or revealed by the chosen action.
The framework considers two types of performance metrics, or ‘regrets’:
- Dynamic Regret: Measures the difference between the cumulative reward achieved by the algorithm and the best possible sequence of actions chosen in hindsight, if all future information were known.
- Weak Regret: Compares the algorithm’s cumulative reward to the best single action that could have been chosen and applied consistently throughout all time steps.
The paper introduces two main cases for its algorithms:
Case I: Time-Invariant Graph-Structured Bandit Feedback
In this scenario, the compatibility graph (the relationships between actions and fairness measures) remains constant throughout the process. However, the reward function itself can still be time-variant and unknown. The authors adapt an ‘Exponentially-Weighted Algorithm with Linear Programming’ (Algorithm 1) to handle this setting. This algorithm learns the optimal actions by adjusting weights based on observed rewards, while also considering a ‘clique’ structure within the graph to ensure a balanced exploration of actions.
Case II: Time-Varying Graph-Structured Bandit Feedback
This is a more general and complex case where the compatibility graph itself can change over time, reflecting evolving social norms or even system malfunctions. Algorithm 2, presented in the supplementary material of the paper, extends the approach from Case I to adapt to these dynamic graph structures, requiring a linear program to be solved whenever a new graph configuration emerges.
Illustrative Example and Results
To demonstrate their approach, the authors revisit the political advertising example. They consider three political entities (two major parties and third-party candidates) and three fairness regularizers aiming for specific advertising space-time shares (e.g., 30%, 60%, 10% to match election results). The platform’s revenue is also factored in. The algorithms are tested using real-world advertising revenue data.
The numerical illustrations show how the proposed algorithms perform in terms of dynamic and weak regrets, as well as the overall objective values (rewards). The results, presented with mean and standard deviations across multiple trials, indicate that the algorithms effectively learn the trade-offs between revenue and fairness measures, even when the underlying relationships or preferences are dynamic and feedback is limited.
Also Read:
- Balancing Reciprocity: A Novel Fairness Mechanism for Collaborative AI
- Guiding Multi-Agent Planning with Graph Neural Networks
Conclusion
This research provides a robust framework for addressing the critical challenge of learning and adapting to multiple, potentially conflicting, and time-varying fairness measures in AI systems. By integrating graph-structured feedback, the approach offers a practical way to navigate the complexities of fairness in real-world applications, ensuring that AI models remain adaptable to evolving societal expectations and dynamic environments.


