
How to Make Language Technologies More Inclusive

by Fabio Todesco
Dirk Hovy suggests a fairer way for automatic translation systems to deal with modern pronouns and advocates technologies that adapt to users, rather than the other way around

Language technologies like automatic translation or voice-activated commands have made giant leaps in the last ten years, from a laughingstock for kids to working tools for everyone. "Now, they should take some additional steps towards inclusivity," said Dirk Hovy, Associate Professor at the Bocconi Department of Computing Sciences, ahead of his talk at the 27 June workshop Fairness in AI.

One area where automatic translation struggles is the use of modern pronouns. Some people, for example, choose pronouns that differ from those their registered name might suggest; non-binary people frequently use they/them pronouns in English; and in Italian, the mid-central vowel schwa (ə) is sometimes used in place of the gendered suffixes -o/-i and -a/-e. When such forms are translated into other languages, issues of intelligibility, grammatical correctness, and faithful representation of individuals arise.

Professor Hovy is studying the problem as part of his ERC-funded INTEGRATOR project with Research Fellows Anne Lauscher and Debora Nozza. "Currently, automatic translation systems choose what they deem to be the best translation, a sort of default choice, regardless of intelligibility, grammatical correctness, or representativeness," he says. "In theory, other choices would be possible: not translating at all, even though that sounds extreme, or, better, letting the user choose. What we propose is that an automatic translation system could highlight the positions where pronouns do not fit its training and experience, and ask readers to make their own choice."
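The idea Hovy describes, flagging pronoun slots the model is unsure about rather than silently picking a default, can be sketched in a few lines of Python. The function name, confidence threshold, and toy candidate scores below are all invented for illustration; a real system would derive them from the translation model's own probabilities.

```python
# Hypothetical sketch: instead of silently choosing a default pronoun,
# a translation system could mark low-confidence pronoun slots and
# let the reader decide. All names and numbers here are illustrative.

AMBIGUITY_THRESHOLD = 0.75  # assumed cutoff, not from the article


def flag_ambiguous_pronouns(tokens, candidates):
    """Return the draft translation with uncertain pronoun slots marked.

    tokens: draft translation, one string per position
    candidates: dict mapping position -> list of (pronoun, probability)
    """
    output, choices_needed = [], []
    for i, tok in enumerate(tokens):
        options = candidates.get(i)
        if options:
            best, prob = max(options, key=lambda x: x[1])
            if prob < AMBIGUITY_THRESHOLD:
                # Highlight the slot and record the alternatives
                # so the reader can make the choice.
                output.append("[" + "|".join(w for w, _ in options) + "]")
                choices_needed.append((i, [w for w, _ in options]))
                continue
            output.append(best)
        else:
            output.append(tok)
    return " ".join(output), choices_needed


# Toy Italian example: the model is unsure which pronoun fits slot 1.
draft = ["poi", "PRON", "è", "arrivata"]
cands = {1: [("lei", 0.55), ("lui", 0.40), ("loro", 0.05)]}
text, pending = flag_ambiguous_pronouns(draft, cands)
```

Here `text` comes back as `poi [lei|lui|loro] è arrivata`, with the contested slot surfaced to the reader instead of resolved by a default.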


To train a machine translation system, you feed it large amounts of text and let it learn from experience. If you feed it documents mostly written by middle-aged men, its translations will also sound as if men in that age bracket wrote them, as Hovy and co-authors showed in a paper. "These systems are now commonly used to translate a wide range of documents," said Hovy, "thus normalizing and reinforcing stereotypes. The risk is that, in the long run, only a middle-aged, straight male's way of speaking will seem legitimate."

The same is true of voice-activated commands, which tend to work well for adult males but not for women, kids, or people with strong local accents. "They are often trained on low-frequency voices and struggle to adapt to higher pitches," Professor Hovy says. "Kids and women must lower their pitch if they want to be understood. It's a clear example of a technology that requires users to adapt to the system, instead of the system adapting to the user. And this setup prevents users from expressing themselves through their own way of speaking."
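The pitch mismatch Hovy describes can be made concrete with a small, self-contained sketch of fundamental-frequency (F0) estimation by autocorrelation. Nothing here comes from Hovy's work; the sample rate, frequency bounds, and test tone are illustrative. The point is that a recognizer whose acoustic models were fit mostly to low-F0 voices effectively treats higher-pitched speakers as out of distribution:

```python
# Illustrative sketch (not from the article): estimating a voice's
# fundamental frequency (pitch) via autocorrelation. Adult male speech
# typically sits around 85-180 Hz; women and children speak higher,
# which is the range such systems reportedly handle worse.
import math


def estimate_f0(samples, sample_rate, f_min=60, f_max=400):
    """Return the dominant fundamental frequency in Hz.

    Searches lags corresponding to [f_min, f_max] and picks the lag
    where the signal best correlates with a shifted copy of itself.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_corr = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        corr = sum(samples[i] * samples[i + lag]
                   for i in range(len(samples) - lag))
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return sample_rate / best_lag


sr = 8000
# Synthetic 220 Hz tone, roughly in an adult female speaking range.
tone = [math.sin(2 * math.pi * 220 * t / sr) for t in range(2048)]
f0 = estimate_f0(tone, sr)
```

Running this recovers an F0 close to 220 Hz for the synthetic tone; a system could use such an estimate to route higher-pitched input to models trained on comparable voices, adapting to the speaker rather than forcing the speaker to adapt.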