The Oxford Handbook of Computational Linguistics by Ruslan Mitkov
Author:Ruslan Mitkov
Language: eng
Format: epub
Published: 2010-12-04T07:27:00+00:00
21.6 LEXICON ACQUISITION FROM MACHINE-READABLE DICTIONARIES
Machine-readable dictionaries (MRDs) have been recognized as a valuable resource for constructing lexical knowledge bases for NLP tasks, and some pioneering work in extracting lexical knowledge from dictionaries was undertaken a couple of decades ago. The dictionary definition of a sense generally consists of a genus term followed by a set of differentiae that discriminate it from related senses. Once the genus term has been extracted, identifying the hypernym of the entry appears straightforward.
An early attempt to construct a taxonomy from an MRD is reported by Amsler (1981). He investigated the possibility of using the definition sentences in a dictionary (the Merriam-Webster Pocket Dictionary) to extract genus terms and to construct a taxonomy of nouns and verbs. Although the analysis and disambiguation of the definition sentences were done manually, he pointed out a number of problems with this line of research that were subsequently recognized by many other researchers:
1. Since the definition is written in natural language and disambiguation of the senses of genus terms is not easy, a tangled taxonomic hierarchy is usually obtained.
2. Genus terms at upper positions (such as cause, thing, and class) tend to form loops.
3. Some words that appear immediately before 'of' in a definition, as in 'a type of ...', are not genus terms. In such cases, the word appearing immediately after 'of' tends to be the genus term.
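The loops mentioned in item 2 can be detected mechanically once genus links have been extracted. The following is a minimal sketch, assuming the extracted links are stored as a dictionary from each word to a list of its genus terms; the function name `find_cycle` and the toy data are illustrative assumptions and do not come from Amsler's work.

```python
def find_cycle(hypernym_of):
    """Return one loop as a list of words (first == last), or None if there is none.

    hypernym_of maps a word to the list of genus terms found in its definitions.
    Uses a depth-first search that marks words as in-progress or finished.
    """
    IN_PROGRESS, DONE = 1, 2
    state = {}

    def visit(word, path):
        state[word] = IN_PROGRESS
        path.append(word)
        for parent in hypernym_of.get(word, ()):
            if state.get(parent) == IN_PROGRESS:
                # parent is already on the current path: we have closed a loop
                return path[path.index(parent):] + [parent]
            if parent not in state:
                found = visit(parent, path)
                if found:
                    return found
        path.pop()
        state[word] = DONE
        return None

    for word in list(hypernym_of):
        if word not in state:
            cycle = visit(word, [])
            if cycle:
                return cycle
    return None


# Toy, made-up genus links in which two upper-level terms define each other.
toy_links = {
    "dog": ["animal"],
    "animal": ["thing"],
    "thing": ["class"],
    "class": ["thing"],
}
print(find_cycle(toy_links))
```

A taxonomy builder could run such a check after each batch of extracted links and hand any reported loop to a human for sense disambiguation.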
Chodorow, Byrd, and Heidorn (1985) used a pattern-matching method on Webster's Seventh New Collegiate Dictionary to extract genus terms from the definition sentences. By introducing several heuristic rules, they identified the genus terms of verbs with almost 100 per cent accuracy. For nouns, they propose very simple rules that identify the head noun while taking into account the 'of' cases mentioned above, and report about 98 per cent accuracy in extracting the genus terms.
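To make the head-noun idea concrete, here is a minimal sketch of a genus extractor for noun definitions that also handles the 'of' case. It is not the rule set of Chodorow, Byrd, and Heidorn: the word lists `BOUNDARY` and `EMPTY_HEADS`, the function names, and the sample definitions are all assumptions made for illustration, and the sketch works on plain lower-cased tokens without part-of-speech information.

```python
import re

# Hypothetical word lists, for illustration only.
# A boundary word is taken to end the first noun phrase of the definition.
BOUNDARY = {"of", "that", "which", "who", "with", "for", "in", "on",
            "used", "having", "or", "and"}
# Heads that are skipped when they are followed by "of" ("a type of ...").
EMPTY_HEADS = {"type", "kind", "sort", "variety", "form"}


def head_of_first_phrase(tokens):
    """Return (head, rest): the token just before the first boundary word,
    and the remaining tokens starting at that boundary word."""
    for i, tok in enumerate(tokens):
        if tok in BOUNDARY:
            return (tokens[i - 1] if i > 0 else None), tokens[i:]
    return (tokens[-1] if tokens else None), []


def genus_term(definition):
    """Guess the genus term of a noun definition with the head-noun heuristic."""
    tokens = re.findall(r"[a-z]+", definition.lower())
    head, rest = head_of_first_phrase(tokens)
    # The "of" case: skip an empty head and look after "of" instead.
    while head in EMPTY_HEADS and rest and rest[0] == "of":
        head, rest = head_of_first_phrase(rest[1:])
    return head


print(genus_term("a domesticated mammal that barks"))
print(genus_term("a type of small boat with sails"))
```

A real system would need a parser or at least part-of-speech tags to find the head reliably; the token-based shortcut here only illustrates where the 'of' rule fits into the procedure.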
Disambiguation of genus terms at upper positions is important in order to obtain a consistent taxonomy. Guthrie et al. (1990) observe that some dictionaries, such as LDOCE (Longman Dictionary of Contemporary English), provide box codes (semantic codes) and subject codes (area codes), and that such information is effective in disambiguating upper genus terms. Bruce and Guthrie (1992) extend this idea.
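One simple way such codes could be used is to keep only those senses of the genus term whose semantic code agrees with the code of the entry being defined. The sketch below illustrates that idea only; it is not the procedure of Guthrie et al., the code labels are invented placeholders rather than actual LDOCE box codes, and exact matching is an assumption (real semantic codes may stand in more complex relations to one another).

```python
def pick_genus_senses(entry_code, genus_senses):
    """Return the genus senses whose semantic code equals the entry's code.

    Falls back to all senses when no code matches, so that the caller can
    tell that the codes did not resolve the ambiguity.
    """
    matching = [s for s in genus_senses if s["code"] == entry_code]
    return matching or list(genus_senses)


# Made-up senses of an ambiguous genus term, with placeholder codes.
senses_of_crane = [
    {"sense": "crane_1", "gloss": "a large bird", "code": "ANIMATE"},
    {"sense": "crane_2", "gloss": "a lifting machine", "code": "SOLID"},
]

# An entry coded as ANIMATE whose definition uses "crane" as its genus term.
chosen = pick_genus_senses("ANIMATE", senses_of_crane)
print([s["sense"] for s in chosen])
```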
Lexical knowledge acquisition from MRDs is still not successful enough to construct useful lexical knowledge bases for NLP, for several reasons. Ide and Veronis (1993) question (1) whether MRDs really contain information useful for NLP, and (2) whether this information is relatively easy to extract from MRDs. With regard to the first question, they point out that the selection of a genus term in a definition sentence can be arbitrary. Genus terms are also ambiguous and easily form loops, since those at the higher levels of the hierarchy tend to be too general. Furthermore, the information in dictionaries is often insufficient: dictionaries frequently lack senses that appear in corpus usage. As for the second question, although genus term extraction is rather successful, as reported in Chodorow, Byrd, and Heidorn (1985), the extracted terms are in many cases too general and are inappropriate as the direct hypernyms of the entry words.
