3 ways to help mitigate bias toward LGBTQ+ people in machine learning models

By Ben Hutchinson, Software Engineer, Google | AI for Everyone

This year, the month of June marks not only Pride, but also the 50th anniversary of the Stonewall uprising. The uprising is widely considered a crucial event in the history of LGBTQ+ rights. In June 1969, police raided the Stonewall Inn, a neighborhood gay bar in New York City’s Greenwich Village. Members of the community resisted, leading to several days of riots. Today, the LGBTQ+ community faces bias not only from people, but also from machine learning models. 

For instance, words used to self-describe LGBTQ+ identities can also be misused by online harassers. When terms and phrases associated with this type of abuse are unintentionally and disproportionately represented in the training dataset for your ML model, the model is at risk of producing outputs that reflect and reinforce historical stereotypes and biases. If you’re training a model to predict inappropriate language, for example, this can lead to false positives: identity-related statements that are inaccurately classified as offensive or inappropriate for certain audiences.

Here are 3 simple, immediate actions that researchers, ML practitioners, and non-technical individuals can take to reduce inequitable, and even harmful, outcomes for LGBTQ+ users. These actions can also help create equitable experiences for other historically marginalized and minority groups.

1. Add data to your training dataset to mitigate bias in text classification

Researchers, if you’re working on text classification, you can check for bias in your ML model by testing whether it performs better on a specific type of text (such as online comments) about some groups than on the same type of text about other groups. One method that helps address unintentionally biased text classification models is to add data that balances your training dataset with more positive examples of identity terms for groups that are underrepresented in the data. Researchers at Jigsaw found that this method improved an ML model that had learned to associate toxicity (defined as “rude, disrespectful, or unreasonable” to the point that users leave a site or app when they encounter this text) with specific LGBTQ+ words. They also found that it increased overall model accuracy, improving the experience for everyone.
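
As a rough illustration of the balancing idea, the Python sketch below counts how often comments containing an identity term are labeled toxic, compares that with the overall toxic rate, and estimates how many additional non-toxic examples would bring the term in line. It is not Jigsaw’s actual procedure. The function names, the whitespace tokenization, the choice of the overall toxic rate as the target, the reading of “positive examples” as non-toxic ones, and the tiny dataset are all assumptions made for illustration.

```python
import math


def term_label_counts(examples, identity_terms):
    """Return {term: (n_total, n_toxic)} over examples that contain the term.

    `examples` is a list of (text, is_toxic) pairs. Matching is a naive
    lowercase whitespace split (an assumption; real pipelines tokenize
    more carefully).
    """
    counts = {term: [0, 0] for term in identity_terms}
    for text, is_toxic in examples:
        tokens = set(text.lower().split())
        for term in identity_terms:
            if term in tokens:
                counts[term][0] += 1
                counts[term][1] += int(is_toxic)
    return {term: tuple(pair) for term, pair in counts.items()}


def nontoxic_examples_needed(n_total, n_toxic, target_rate):
    """Smallest k >= 0 such that n_toxic / (n_total + k) <= target_rate."""
    if not 0 < target_rate <= 1:
        raise ValueError("target_rate must be in (0, 1]")
    return max(0, math.ceil(n_toxic / target_rate - n_total))


# Hypothetical labeled comments: (text, is_toxic).
EXAMPLES = [
    ("i am a proud gay man", False),
    ("gay people are the worst", True),
    ("nobody wants gay people here", True),
    ("being gay is wrong", True),
    ("what a lovely day", False),
    ("thanks for the helpful answer", False),
    ("you are an idiot", True),
    ("see you at the meeting", False),
]

overall_rate = sum(is_toxic for _, is_toxic in EXAMPLES) / len(EXAMPLES)
counts = term_label_counts(EXAMPLES, ["gay"])

for term, (n_total, n_toxic) in counts.items():
    needed = nontoxic_examples_needed(n_total, n_toxic, overall_rate)
    print(f"{term}: {n_toxic}/{n_total} toxic; add {needed} non-toxic examples")
```

In this toy dataset the identity term appears mostly in toxic comments, which is the kind of imbalance a classifier can pick up as a spurious signal. The added examples should be genuine, non-toxic uses of the term rather than synthetic filler.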

2. Evaluate your model

Developers, let’s say you’re creating a computer vision model for detecting images of weddings, and you want to make sure it correctly identifies photos of same-sex weddings as wedding photos. False Negative Rate is a useful metric for computer vision tasks and is appropriate here. It measures the fraction of wedding photos that the model incorrectly labels as not being weddings. By comparing False Negative Rates for photos of same-sex weddings and opposite-sex weddings, we can check for undesirable model bias, for example whether pictures of two brides or two grooms are recognized as wedding photos less often.
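
The comparison described above can be sketched in a few lines of Python. The False Negative Rate is FN / (FN + TP), computed only over photos that really are weddings. The group names, record layout, and sample predictions below are hypothetical, and a real evaluation would need far more photos per group for the difference to be meaningful.

```python
from collections import defaultdict


def false_negative_rate(labels, predictions):
    """FNR = FN / (FN + TP), computed over the actual positives only."""
    actual_positives = 0
    false_negatives = 0
    for label, prediction in zip(labels, predictions):
        if label:
            actual_positives += 1
            if not prediction:
                false_negatives += 1
    if actual_positives == 0:
        raise ValueError("FNR is undefined without positive examples")
    return false_negatives / actual_positives


def fnr_by_group(records):
    """Compute FNR per group from (group, is_wedding, predicted_wedding)."""
    labels = defaultdict(list)
    predictions = defaultdict(list)
    for group, is_wedding, predicted_wedding in records:
        labels[group].append(is_wedding)
        predictions[group].append(predicted_wedding)
    return {g: false_negative_rate(labels[g], predictions[g]) for g in labels}


# Hypothetical evaluation records: (group, is_wedding, predicted_wedding).
RECORDS = [
    ("same_sex", True, True),
    ("same_sex", True, False),
    ("same_sex", True, False),
    ("same_sex", True, True),
    ("same_sex", False, False),      # non-wedding photo, ignored by FNR
    ("opposite_sex", True, True),
    ("opposite_sex", True, True),
    ("opposite_sex", True, False),
    ("opposite_sex", True, True),
    ("opposite_sex", False, True),   # false positive, ignored by FNR
]

rates = fnr_by_group(RECORDS)
gap = rates["same_sex"] - rates["opposite_sex"]
print(rates, "gap:", gap)
```

A gap well above zero suggests the model misses same-sex wedding photos more often than opposite-sex ones. How large a gap is acceptable is a judgment call that depends on the application.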

This is just one model evaluation method, and it applies specifically to classification. It can help you understand how fair your model is for LGBTQ+ image content, so you can adjust it. If you’re interested in other forms of evaluation, see our ML Crash Course for developers.

3. Help create a new, inclusive training dataset for ML models

Even if you’re not technical, you and anyone else around the world can still help create inclusive ML model training data by sharing positive LGBTQ+ identity statements and labels via Project Respect (projectrespect.withgoogle.com). The resulting dataset will soon be open to researchers and developers anywhere, so they can take steps to prevent models from unfairly misclassifying text that refers to the LGBTQ+ community. This will help them build helpful and inclusive product experiences for everyone.

As announced at the I/O conference last month, Google will continue to release tools and resources to help mitigate bias in AI systems, such as our recently open-sourced research technique for testing with concept activation vectors (TCAV for short) to make machine learning models more interpretable. Bias is also easier to detect in models that are easier to interpret. (Editor’s note: you can read a non-technical person’s description of how TCAV works here.) We’ll be sharing more tools year-round here on Accelerate with Google and on ai.google.

