The Deep Learning Workshop
Authors: Mirza Rahim Baig, Thomas V. Joseph, Nipun Sadvilkar, Mohan Kumar Silaparasetty, and Anthony So
Language: eng
Format: epub
Publisher: Packt Publishing Pvt. Ltd.
Published: 2020-07-30
Bias in Embeddings – A Word of Caution
When discussing regularities and analogies, we saw the following example:
king – man + woman = queen
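As a reminder of what this arithmetic means, here is a minimal sketch using hand-picked two-dimensional toy vectors (they are not learned from any corpus, and the names toy_vectors, cosine, and analogy are made up for illustration). It adds and subtracts the raw vectors and ranks the remaining words by cosine similarity; library implementations such as gensim's most_similar may normalize vectors and differ in other details.

```python
import math

# Toy 2-D vectors, hand-picked for illustration (an assumption, not real embeddings).
# Dimension 0 loosely stands for "royalty", dimension 1 for "gender".
toy_vectors = {
    "king":  [0.9,  0.8],
    "queen": [0.9, -0.8],
    "man":   [0.1,  0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.7, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(positive, negative, vectors):
    """Rank words by similarity to sum(positive) - sum(negative)."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    for word in positive:
        target = [t + x for t, x in zip(target, vectors[word])]
    for word in negative:
        target = [t - x for t, x in zip(target, vectors[word])]
    # Exclude the query words themselves, as analogy lookups usually do.
    query = set(positive) | set(negative)
    scored = [(w, cosine(target, v)) for w, v in vectors.items() if w not in query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = analogy(positive=["king", "woman"], negative=["man"], vectors=toy_vectors)
print(ranking[0][0])  # the top-ranked word for king - man + woman
```

With these toy values, the word closest to king - man + woman is 'queen', because the "gender" coordinate flips while the "royalty" coordinate is preserved.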
It's great that the embeddings are capturing these regularities by learning from the text data. Let's try something similar with a profession. Let's see which term is closest to doctor – man + woman:
model.wv.most_similar(positive=['woman', 'doctor'], \
                      negative=['man'], topn=5)
The top five results are as follows:
[('nurse', 0.6464251279830933),
('child', 0.5847542881965637),
('teacher', 0.569127082824707),
('detective', 0.5451491475105286),
('boyfriend', 0.5403486490249634)]
That's not the kind of result we want. Doctors are male, while nurses are female? Let's try another example. This time, let's see which term the model associates with "woman" in the way it associates "smart" with "man":
model.wv.most_similar(positive=['woman', 'smart'], \
                      negative=['man'], topn=5)
We get the following top five results:
[('cute', 0.6156168580055237),
('dumb', 0.6035820245742798),
('crazy', 0.5834532976150513),
('pet', 0.582811713218689),
('fancy', 0.5697714686393738)]
We can see that the top terms are 'cute', 'dumb', and 'crazy'. That's not good at all.
What's happening here? Is this seemingly great representation approach sexist? Is the word2vec algorithm sexist? There definitely is bias in the resulting word vectors, but think about where the bias is coming from. It's the underlying data that uses 'nurse' for females in contexts where 'doctor' is used for males. It is, therefore, the underlying text that contains the bias, not the algorithm.
This topic has recently gained significant attention, and there is ongoing research into ways to assess and remove biases from learned embeddings, but a good approach is to avoid biases in the data to begin with. If you train word embeddings on YouTube comments, don't be surprised if they contain all kinds of extreme biases. You're better off avoiding text data that you suspect is biased.
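One simple way to assess such bias, sketched here as an illustration rather than as the method used in this book, is to project word vectors onto a "gender direction" defined as woman minus man. The vectors below are hand-picked toy values and the function names are hypothetical; with a trained gensim model you would presumably look the vectors up with model.wv['woman'], model.wv['man'], and so on.

```python
import math


def unit(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]


def gender_projection(word_vec, man_vec, woman_vec):
    """Signed projection of a word onto the (woman - man) direction.

    Positive values lean toward 'woman', negative values toward 'man'.
    """
    direction = unit([w - m for w, m in zip(woman_vec, man_vec)])
    return sum(a * b for a, b in zip(unit(word_vec), direction))


# Toy vectors chosen by hand to mimic a biased embedding (an assumption,
# not values taken from any trained model).
toy = {
    "man":    [0.1,  0.9, 0.2],
    "woman":  [0.1, -0.9, 0.2],
    "doctor": [0.8,  0.3, 0.5],
    "nurse":  [0.7, -0.5, 0.5],
}

scores = {
    word: gender_projection(toy[word], toy["man"], toy["woman"])
    for word in ("doctor", "nurse")
}
for word, score in scores.items():
    print(f"{word}: {score:+.3f}")
```

In an unbiased embedding, profession words would score close to zero on this axis. In the toy data, 'nurse' leans toward 'woman' and 'doctor' toward 'man', mirroring the results seen above.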
