Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings


Basically, word embeddings learned from internet text {and, more broadly, from almost any human-written content} end up encoding a lot of sexist associations. You can improve things somewhat by taking a word you care about {e.g. computer programmer} and moving its embedding so that it is equidistant from the categories you care about {e.g. man, woman}.


The paper "Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings" by Bolukbasi et al., published in 2016, presented a novel approach to reduce gender bias in word embeddings. Word embeddings, such as Word2Vec or GloVe, are trained on large corpora of text data and often reflect societal biases present in the training data. For example, these models may associate certain professions more with one gender than another, perpetuating stereotypes.

The authors began by identifying the gender bias present in word embeddings. They proposed a methodology to quantify the bias and demonstrated that even state-of-the-art embeddings are not immune to such biases.

Identifying and Quantifying Bias

The authors primarily used 300-dimensional word2vec embeddings trained on Google News text, and reported similar biases in GloVe embeddings trained on a web-crawl corpus, showing that such embeddings capture gender stereotypes to a large extent. They defined bias in terms of a "gender direction": a direction in the embedding space along which the vectors of gender-definitional word pairs differ.

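Given a set of word pairs that exhibit a genuine gender distinction (like 'he'-'she', 'his'-'her', 'man'-'woman', etc.), they computed the difference vectors of the pairs. The paper takes the top principal component of these pairs (via PCA) as the gender direction; a simpler approximation averages the normalized difference vectors and rescales the result to unit length:

\[ \tilde{b} = \frac{1}{|S|}\sum_{(w_a, w_b) \in S} \frac{\vec{w_a} - \vec{w_b}}{\lVert \vec{w_a} - \vec{w_b} \rVert}, \qquad \vec{b} = \frac{\tilde{b}}{\lVert \tilde{b} \rVert} \]

where \( S \) is the set of gender-specific word pairs, \( |S| \) is the number of pairs, \( \vec{w_a} \) and \( \vec{w_b} \) are the vector representations of the words in a pair, and \( \lVert \cdot \rVert \) is the Euclidean norm. Keeping \( \vec{b} \) at unit length is what lets the projections below be written simply as \( \vec{b} \, \langle \vec{w}, \vec{b} \rangle \).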
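To make the two ways of getting \( \vec{b} \) concrete, here is a minimal NumPy sketch, assuming the embeddings live in a plain dict from word to vector. The toy vectors, the `PAIRS` list and both function names are hypothetical. `gender_direction_mean` implements the averaged formula; `gender_direction_pca` is a rough rendering of the paper's PCA step for pairs (center each pair on its own mean, then take the top singular vector), not the authors' code.

```python
import numpy as np

# Hypothetical toy embeddings; real ones would come from word2vec or GloVe.
emb = {
    "he": np.array([1.0, 0.2, 0.0]),
    "she": np.array([-1.0, 0.2, 0.0]),
    "man": np.array([0.9, 0.0, 0.4]),
    "woman": np.array([-0.9, 0.0, 0.4]),
}
PAIRS = [("he", "she"), ("man", "woman")]


def gender_direction_mean(pairs, emb):
    """Average of the normalized pair differences, rescaled to unit length."""
    diffs = [emb[a] - emb[b] for a, b in pairs]
    diffs = [d / np.linalg.norm(d) for d in diffs]
    b_tilde = np.mean(diffs, axis=0)
    return b_tilde / np.linalg.norm(b_tilde)


def gender_direction_pca(pairs, emb):
    """Top principal component of the pair vectors, each centered on its pair mean."""
    rows = []
    for a, b in pairs:
        mu = (emb[a] + emb[b]) / 2
        rows.extend([emb[a] - mu, emb[b] - mu])
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(np.stack(rows), full_matrices=False)
    return vt[0]  # the sign of a principal component is arbitrary


print(gender_direction_mean(PAIRS, emb))
print(gender_direction_pca(PAIRS, emb))
```

On this toy data both functions recover the first axis (the PCA one possibly with its sign flipped). On real embeddings they will generally differ a little, since the PCA variant weights the pairs by how far apart they are rather than normalizing each one.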

The cosine similarity of a word \( \vec{w} \) with the gender direction \( \vec{b} \) was used to measure the gender bias of that word. Words with a high absolute similarity were considered gender-biased, and the sign indicates which gender the word leans towards.
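A minimal sketch of the per-word score, plus the aggregate the paper calls direct bias: the mean of \( |\cos(\vec{w}, \vec{b})|^c \) over words that should be gender-neutral, where \( c \) controls how strictly small similarities are penalized. The vectors below are hypothetical toy values, and \( \vec{b} \) is assumed to have been built with 'he' minus 'she' ordering, so negative scores lean towards 'she'.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def gender_bias(w, b):
    """Signed bias of one word. Assuming b points from 'she' towards 'he',
    positive values lean male and negative values lean female."""
    return cosine(w, b)


def direct_bias(neutral_vectors, b, c=1.0):
    """Mean of |cos(w, b)| ** c over words that should be gender-neutral."""
    return float(np.mean([abs(cosine(w, b)) ** c for w in neutral_vectors]))


# Hypothetical unit gender direction and toy occupation vectors.
b = np.array([1.0, 0.0, 0.0])
nurse = np.array([-0.6, 0.8, 0.0])
engineer = np.array([0.6, 0.0, 0.8])

print(gender_bias(nurse, b), gender_bias(engineer, b))
print(direct_bias([nurse, engineer], b))
```

Taking the absolute value before averaging matters: otherwise a male-leaning word and a female-leaning word would cancel out and the embedding would look unbiased.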

Debiasing Word Embeddings

Once the gender direction is identified, the authors' "hard debiasing" procedure has two steps:

  1. Neutralize: For each word that should be gender-neutral, they removed its component along the gender direction, i.e. projected it onto the subspace orthogonal to \( \vec{b} \). After this step the word has zero projection onto the gender direction; combined with the equalize step below, this makes it equidistant from both words of every equalized pair.

    Formally, if \( \vec{w} \) is the vector of a gender-neutral word and \( \vec{b} \) has unit length, its debiased vector \( \vec{w_{\text{debiased}}} \) is given by:

    \[ \vec{w_{\text{debiased}}} = \vec{w} - \vec{b} \, \langle \vec{w}, \vec{b} \rangle \]

    where \( \langle \cdot, \cdot \rangle \) is the dot product. The paper additionally rescales the result back to unit length.

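  2. Equalize pairs: For every pair of words that should differ only in gender (like 'grandmother'-'grandfather'), they moved the two vectors so that they share the same gender-neutral component and sit symmetrically on either side of it along the gender direction. As a result, every neutralized word ends up equidistant from both words of the pair (e.g. 'babysit' becomes equally close to 'grandmother' and 'grandfather').

    The equalization formula for a pair of words \( \vec{w_a} \) and \( \vec{w_b} \) starts from their mean and the mean's projection onto the gender direction:

    \[ \mu = \frac{\vec{w_a} + \vec{w_b}}{2} \]

    \[ \mu_B = \vec{b} \, \langle \mu, \vec{b} \rangle \]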


    The component of \( \mu \) orthogonal to the gender direction, denoted \( \mu_{\perp} \), is:

    \[ \mu_{\perp} = \mu - \mu_{B} \]

    The projections of \( \vec{w_a} \) and \( \vec{w_b} \) onto the gender direction are:

    \[ w_{aB} = \vec{b} \, \langle \vec{w_a}, \vec{b} \rangle \]

    \[ w_{bB} = \vec{b} \, \langle \vec{w_b}, \vec{b} \rangle \]

    Assuming all word vectors have unit length (as in the paper), the corrected gender components \( w_{aB_{\text{corrected}}} \) and \( w_{bB_{\text{corrected}}} \) are:

    \[ w_{aB_{\text{corrected}}} = \sqrt{1 - \lVert \mu_{\perp} \rVert^2} \cdot \frac{w_{aB} - \mu_{B}}{\lVert w_{aB} - \mu_{B} \rVert} \]

    \[ w_{bB_{\text{corrected}}} = \sqrt{1 - \lVert \mu_{\perp} \rVert^2} \cdot \frac{w_{bB} - \mu_{B}}{\lVert w_{bB} - \mu_{B} \rVert} \]

    Finally, the debiased vectors \( \vec{w_{a_{\text{debiased}}}} \) and \( \vec{w_{b_{\text{debiased}}}} \) are:

    \[ \vec{w_{a_{\text{debiased}}}} = \mu_{\perp} + w_{aB_{\text{corrected}}} \]

    \[ \vec{w_{b_{\text{debiased}}}} = \mu_{\perp} + w_{bB_{\text{corrected}}} \]

    Both results share the gender-neutral component \( \mu_{\perp} \) and differ only by equal and opposite offsets along \( \vec{b} \), so each has unit length and any neutralized word is equally far from the two.
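Both steps fit in a few lines of NumPy. This is a sketch under stated assumptions, not the authors' code: it assumes a single unit-length gender direction `b` and unit-length word vectors, and the toy vectors and helper names (`unit`, `neutralize`, `equalize`) are hypothetical. `equalize` uses the paper's rule \( \mu_{\perp} + \sqrt{1 - \lVert \mu_{\perp} \rVert^2}\,(w_B - \mu_B)/\lVert w_B - \mu_B \rVert \) for each word of the pair.

```python
import numpy as np


def unit(v):
    """Rescale v to unit length."""
    return v / np.linalg.norm(v)


def neutralize(w, b, renormalize=True):
    """Remove the component of w along the unit gender direction b.
    With renormalize=True the result is rescaled to unit length, as in the paper."""
    w_perp = w - b * np.dot(w, b)
    return unit(w_perp) if renormalize else w_perp


def equalize(wa, wb, b):
    """Equalize a pair of unit-length vectors about the unit gender direction b.
    If wa and wb have the same projection onto b, wa_B - mu_B is zero and
    unit() yields NaNs, so such pairs should be skipped."""
    mu = (wa + wb) / 2
    mu_B = b * np.dot(mu, b)   # component of the mean along b
    mu_perp = mu - mu_B        # shared gender-neutral component
    wa_B = b * np.dot(wa, b)
    wb_B = b * np.dot(wb, b)
    # Length left over for the gender component so each result has unit norm.
    scale = np.sqrt(max(0.0, 1.0 - np.dot(mu_perp, mu_perp)))
    wa_new = mu_perp + scale * unit(wa_B - mu_B)
    wb_new = mu_perp + scale * unit(wb_B - mu_B)
    return wa_new, wb_new


# Hypothetical toy data: unit gender direction and unit-length word vectors.
b = np.array([1.0, 0.0, 0.0])
grandfather = unit(np.array([0.5, 0.6, 0.2]))
grandmother = unit(np.array([-0.3, 0.7, 0.1]))
babysit = unit(np.array([-0.4, 0.3, 0.8]))

babysit_n = neutralize(babysit, b)
gf, gm = equalize(grandfather, grandmother, b)

# After both steps, 'babysit' is equally far from the two equalized words.
print(np.linalg.norm(babysit_n - gf), np.linalg.norm(babysit_n - gm))
```

The two printed distances should agree up to floating-point error: neutralizing puts 'babysit' in the subspace orthogonal to `b`, and equalizing makes the grandparent pair mirror images across that subspace.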

Implications and Critique

The paper was an important step towards acknowledging and addressing the problem of bias in AI, particularly in natural language processing. It presented a clear methodology for identifying and reducing gender bias in word embeddings, which can be extended to other forms of bias as well.

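However, it's important to note that this method does not eliminate all forms of bias. It only removes the component of each neutral word along the chosen gender direction; gender information encoded in other directions (for example, which words remain each other's nearest neighbors) can survive, and implicit, nuanced, or context-specific biases are not addressed. The method also relies on hand-curated word lists (the definitional pairs, the pairs to equalize, and the words that should stay gender-specific rather than be neutralized), so its effectiveness depends on how complete and accurate those lists are.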

Moreover, while it's important to remove harmful biases from AI systems, there's an ongoing debate about whether "debiasing" can sometimes oversimplify the complexity of social phenomena and potentially erase important aspects of identity. Therefore, while this method can be a useful tool in some contexts, it's not a one-size-fits-all solution to the problem of bias in AI.
