Neural Network Methods for Natural Language Processing (Synthesis Lectures on Human Language Technologies) by Yoav Goldberg

Author: Yoav Goldberg
Language: English
Format: AZW3
Publisher: Morgan & Claypool Publishers
Published: 2017-05-22


10.7 LIMITATIONS OF DISTRIBUTIONAL METHODS

The distributional hypothesis offers an appealing platform for deriving word similarities by representing words according to the contexts in which they occur. It does, however, have some inherent limitations that should be considered when using the derived representations.

Definition of similarity The definition of similarity in distributional approaches is completely operational: words are similar if used in similar contexts. But in practice, there are many facets of similarity. For example, consider the words dog, cat, and tiger. On the one hand, cat is more similar to dog than to tiger, as both are pets. On the other hand, cat can be considered more similar to tiger than to dog as they are both felines. Some facets may be preferred over others in certain use cases, and some may not be attested by the text as strongly as others. The distributional methods provide very little control over the kind of similarities they induce. This could be controlled to some extent by the choice of conditioning contexts (Section 10.5), but it is far from being a complete solution.
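For illustration, the following minimal sketch queries the single similarity score such a model assigns to each word pair (the gensim library and a pretrained word2vec-format file vectors.bin are assumptions for illustration, not anything prescribed here). The "pet" facet and the "feline" facet are collapsed into that one number, and whether dog or tiger ends up closer to cat is determined by the corpus and the chosen contexts, not by any facet we get to pick.

    # Minimal illustrative sketch: probe the single similarity score a
    # pretrained distributional model assigns to word pairs. The file name
    # "vectors.bin" and the use of gensim are assumptions for illustration.
    from gensim.models import KeyedVectors

    kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

    # One opaque score per pair: the "pet" facet and the "feline" facet
    # are collapsed into a single number.
    for a, b in [("cat", "dog"), ("cat", "tiger"), ("dog", "tiger")]:
        print(a, b, kv.similarity(a, b))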

Black Sheeps When using texts as the conditioning contexts, many of the more “trivial” properties of the word may not be reflected in the text, and thus not captured in the representation. This happens because of a well-documented bias in people’s use of language, stemming from efficiency constraints on communication: people are less likely to mention known information than they are to mention novel information. Thus, when people talk of white sheep, they will likely refer to them as sheep, while for black sheep they are much more likely to retain the color information and say black sheep. A model trained on text data only can be greatly misled by this.
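A rough way to observe this reporting bias in a corpus is simply to count the relevant modifier-noun bigrams, as in the sketch below (corpus.txt is an assumed plain-text, whitespace-tokenizable corpus): explicit color mentions of sheep are dominated by the atypical color, even though such sheep are rare in the world.

    # Minimal illustrative sketch: tally color+"sheep" bigrams to see which
    # attributes speakers bother to mention. "corpus.txt" is an assumed
    # plain-text corpus, one sentence per line.
    from collections import Counter

    counts = Counter()
    with open("corpus.txt", encoding="utf-8") as f:
        for line in f:
            toks = line.lower().split()
            for w1, w2 in zip(toks, toks[1:]):
                if w2 == "sheep" and w1 in ("white", "black"):
                    counts[w1, w2] += 1

    # If the reporting bias holds, "black sheep" is over-represented relative
    # to how rare black sheep actually are.
    print(counts)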

Antonyms Words that are the opposite of each other (good vs. bad, buy vs. sell, hot vs. cold) tend to appear in similar contexts (things that can be hot can also be cold, things that are bought are often sold). As a consequence, models based on the distributional hypothesis tend to judge antonyms as very similar to each other.
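This is easy to observe with the same assumed pretrained vectors as in the earlier sketch: the nearest neighbors of an adjective typically include its antonym, and the antonym's score is often comparable to that of a true synonym.

    # Minimal illustrative sketch: antonyms surface among the nearest
    # neighbors. "vectors.bin" is the same assumed pretrained-vectors file
    # as in the earlier sketch.
    from gensim.models import KeyedVectors

    kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

    print(kv.most_similar("hot", topn=10))   # "cold" tends to rank near the top
    print(kv.similarity("good", "bad"))      # often comparable to...
    print(kv.similarity("good", "great"))    # ...the score for a synonym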

Corpus Biases For better or worse, the distributional methods reflect the usage patterns in the corpora on which they are based, and the corpora in turn reflect human biases in the real world (cultural or otherwise). Indeed, Caliskan-Islam et al. [2016] found that distributional word vectors encode “every linguistic bias documented in psychology that we have looked for,” including racial and gender stereotypes (e.g., European American names are closer to pleasant terms while African American names are closer to unpleasant terms; female names are more associated with family terms than with career terms; it is possible to predict the percentage of women in an occupation according to the U.S. census based on the vector representation of the occupation name). As with the antonyms case, this behavior may or may not be desired, depending on the use case: if our task is to guess the gender of a character, knowing that nurses are stereotypically female while doctors are stereotypically male may be a desired property of the algorithm.
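The association tests used by Caliskan-Islam et al. compare how strongly target words associate with two attribute word sets. A much-reduced sketch of the idea is given below for occupation names and gendered terms; the word lists are illustrative stand-ins rather than their original stimuli, and vectors.bin is again an assumed pretrained-vectors file.

    # Minimal illustrative sketch of an association test in the spirit of
    # Caliskan-Islam et al. [2016] (not their implementation): compare each
    # occupation's mean cosine similarity to female- vs. male-associated terms.
    # Word lists are illustrative stand-ins; "vectors.bin" is assumed.
    from gensim.models import KeyedVectors

    kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

    female_terms = ["she", "her", "woman", "mother"]
    male_terms = ["he", "him", "man", "father"]
    occupations = ["nurse", "librarian", "doctor", "engineer"]

    def mean_sim(word, attribute_set):
        # Average cosine similarity of `word` to the attribute words.
        return sum(kv.similarity(word, a) for a in attribute_set) / len(attribute_set)

    for occ in occupations:
        bias = mean_sim(occ, female_terms) - mean_sim(occ, male_terms)
        print(occ, round(float(bias), 3))   # positive: leans toward the female terms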


