The Oxford Handbook of Computational Linguistics by Ruslan Mitkov

Author: Ruslan Mitkov
Language: eng
Format: epub
Published: 2010-12-04T07:27:00+00:00


21.6 LEXICON ACQUISITION FROM MACHINE-READABLE DICTIONARIES

Machine-readable dictionaries (MRDs) have been recognized as a valuable resource for constructing lexical knowledge bases for NLP tasks, and some pioneering work in extracting lexical knowledge from dictionaries was undertaken a couple of decades ago. The dictionary definition of a sense generally consists of a genus term followed by a set of differentiae that discriminate it from related senses. Once the genus term has been extracted, identifying the hypernym of the sense appears straightforward.

An early attempt to construct a taxonomy from an MRD is reported by Amsler (1981). He investigated the possibility of using the definition sentences in a dictionary (the Merriam-Webster Pocket Dictionary) to extract genus terms and to construct a taxonomy of nouns and verbs. Although the analysis and disambiguation of the definition sentences were done manually, he pointed out a number of problems with this line of research that were subsequently recognized by many other researchers.

1. Since the definition is written in natural language and disambiguation of the senses of genus terms is not easy, a tangled taxonomic hierarchy is usually obtained.

2. Genus terms at upper positions (such as cause, thing, and class) tend to form loops.

3. Some words in the definition that appear immediately before "of", as in "a type of ...", do not form genus terms. In such cases, the word appearing immediately after "of" tends to be the genus term.
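The second problem can be made concrete with a small sketch. Assuming, purely for illustration, that each word has already been assigned a single genus term, following the genus links upward either reaches a word with no recorded genus or comes back to a word already visited. The function name `find_loop` and the toy links are hypothetical and are not taken from Amsler's study or from any dictionary.

```python
from typing import Dict, List, Optional


def find_loop(hypernym_of: Dict[str, str], start: str) -> Optional[List[str]]:
    """Follow genus links upward from `start`.

    Returns the loop (first word repeated at the end) if the chain
    revisits a word, or None if the chain ends at a word with no genus.
    """
    path: List[str] = []
    position: Dict[str, int] = {}  # word -> index in path
    word: Optional[str] = start
    while word is not None:
        if word in position:
            # The chain has come back to an earlier word: a loop.
            return path[position[word]:] + [word]
        position[word] = len(path)
        path.append(word)
        word = hypernym_of.get(word)
    return None


# Toy genus links, invented for illustration only.
toy_links = {
    "hammer": "tool",
    "tool": "thing",
    "thing": "object",
    "object": "thing",
}

print(find_loop(toy_links, "hammer"))
```

In this toy data the general terms "thing" and "object" define each other, so any chain that reaches them never arrives at a single root.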

Chodorow, Byrd, and Heidorn (1985) used a pattern-matching method on Webster's Seventh New Collegiate Dictionary to extract genus terms from the definition sentences. By introducing several heuristic rules, they identified the genus terms of verbs with almost 100 per cent accuracy. For nouns, they proposed very simple rules to identify the head nouns, taking into account the "of" cases mentioned above, and reported about 98 per cent accuracy in extracting the genus terms.
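A rough sketch of what a head-noun heuristic for noun definitions might look like is given below. It is not the rule set of Chodorow, Byrd, and Heidorn; the word lists `EMPTY_HEADS` and `BOUNDARIES` and the function `genus_term` are assumptions made for illustration. The head is taken to be the last word before the first boundary word, and when that head is an "empty" word followed by "of", the search continues after "of".

```python
import re
from typing import Optional

# Assumed list of words that do not act as genus terms before "of".
EMPTY_HEADS = {"type", "kind", "sort", "variety", "form", "any", "one"}

# Assumed list of words that end the head noun phrase of a definition.
BOUNDARIES = {"of", "that", "which", "who", "for", "with",
              "used", "in", "on", "to", "having"}


def genus_term(definition: str) -> Optional[str]:
    """Return a candidate genus term for a noun definition, or None."""
    tokens = re.findall(r"[a-z]+", definition.lower())
    # Index of the first boundary word, or the end of the definition.
    for j, token in enumerate(tokens):
        if token in BOUNDARIES:
            break
    else:
        j = len(tokens)
    if j == 0:
        return None  # empty definition, or it starts with a boundary word
    head = tokens[j - 1]
    # "a type of X ...": skip the empty head and look after "of".
    if head in EMPTY_HEADS and j < len(tokens) and tokens[j] == "of":
        return genus_term(" ".join(tokens[j + 1:]))
    return head


print(genus_term("a tool with a heavy head used for driving nails"))
print(genus_term("a type of large bird that cannot fly"))
```

The sketch does no part-of-speech tagging or sense disambiguation, so it returns a word form rather than a sense, which is exactly where the tangled hierarchies described above come from.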

Disambiguation of genus terms at upper positions is important in order to obtain a consistent taxonomy. Guthrie et al. (1990) note that some dictionaries, such as LDOCE (Longman Dictionary of Contemporary English), provide box codes (semantic codes) and subject codes (area codes), and that using such information is effective in disambiguating upper genus terms. Bruce and Guthrie (1992) extend this idea.
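One naive way to picture code-based disambiguation is sketched below. It is not the procedure of Guthrie et al. (1990): the `Sense` record, the function `pick_genus_sense`, the matching rule, and the code labels are all placeholders invented for illustration and are not actual LDOCE codes. The sketch prefers a sense of the genus term whose semantic code equals that of the headword sense, and uses the subject code only to break ties.

```python
from typing import List, NamedTuple, Optional


class Sense(NamedTuple):
    sense_id: str
    semantic_code: str                  # placeholder label, not a real LDOCE box code
    subject_code: Optional[str] = None  # placeholder label, not a real LDOCE subject code


def pick_genus_sense(head: Sense, candidates: List[Sense]) -> Optional[Sense]:
    """Choose the sense of the genus term most compatible with `head`."""
    same_semantic = [c for c in candidates
                     if c.semantic_code == head.semantic_code]
    if not same_semantic:
        return None  # no compatible sense; leave the genus term ambiguous
    if head.subject_code is not None:
        for candidate in same_semantic:
            if candidate.subject_code == head.subject_code:
                return candidate  # subject code breaks the tie
    return same_semantic[0]


# Toy senses, invented for illustration only.
bat = Sense("bat_2", "ARTIFACT", "SPORT")
club_senses = [
    Sense("club_1", "ORGANIZATION"),
    Sense("club_2", "ARTIFACT", "SPORT"),
    Sense("club_3", "ARTIFACT", "CARDS"),
]

print(pick_genus_sense(bat, club_senses).sense_id)
```

Exact equality of codes is the crudest possible compatibility test; a fuller treatment would need some notion of which codes subsume which others.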

Lexical knowledge acquisition from MRDs is still not successful enough to construct useful lexical knowledge bases for NLP, for several reasons. Ide and Veronis (1993) question (1) whether MRDs really contain information useful for NLP, and (2) whether this information is relatively easy to extract from MRDs. With regard to the first question, they point out that the selection of a genus term in a definition sentence can be arbitrary. Genus terms are also ambiguous and easily form loops, since those at the higher levels of the hierarchy tend to be too general. Furthermore, some of the information in dictionaries is insufficient, and dictionaries frequently lack senses that appear in corpus usage. As for the second question, although genus term extraction is rather successful, as reported in Chodorow, Byrd, and Heidorn (1985), the extracted terms are in many cases too general and are inappropriate as the direct hypernyms of the entry words.


