The AI Model Handbook: A guide to the world of artificial intelligence modeling (The Artificial Intelligence Handbook Series 2) by Trinh Minh

Author:Trinh, Minh [Trinh, Minh]
Language: eng
Format: epub
Publisher: Rodeo Press
Published: 2021-12-26T00:00:00+00:00


6.3 Classical NLP Modelling

Symbolic NLP

It is possible to teach a computer vocabulary, syntax, and grammar in order to solve language tasks. This approach, known as symbolic NLP, uses parsing techniques to identify words, their grammatical roles (part-of-speech, or POS, tagging), and their meanings. Because language is complex, ambiguous, and relatively free-form, it is difficult to write by hand an inventory of all the rules required to understand and generate it.

Another approach is to learn language probabilistically, using a statistical language model trained on real-world data. Because digital text is now available in corpora of millions, billions, and even trillions of words, and computing power is widely available, the statistical approach has gained the upper hand, while the symbolic paradigm has not made meaningful progress in real-world applications. MIT Professor Noam Chomsky has been very critical of the statistical approach despite its success. He was quoted as saying:

“It's true there's been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures. There is a notion of success, which I think is novel in the history of science. It interprets success as approximating unanalyzed data.” (“Pinker/Chomsky Q&A from MIT150 Panel”)

Norvig (“On Chomsky and the Two Cultures of Statistical Learning”) addresses Chomsky's criticism in an article. In particular, he points out the empirical success of statistical models in search engines (Norvig works at Google), speech recognition, machine translation, and question answering.

Language Model

A language model is a statistical representation of language: it describes a probability distribution over sequences of words. It answers questions such as how likely a given word is to follow a sequence of words, or how likely one sentence is compared with another. This is a compelling basis for language applications because it leverages existing textual data and can indicate, for instance, whether a sentence is grammatically correct or logical, since correct and logical sentences are more likely to occur in the data.
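
In symbols, the two questions above correspond to a conditional next-word probability and a joint sentence probability. The notation is an assumption introduced here, not taken from the text: w_i stands for the i-th word of a sentence of n words.

```latex
P(w_n \mid w_1, \dots, w_{n-1})
\qquad \text{and} \qquad
P(w_1, \dots, w_n)
```

Comparing two sentences then amounts to comparing their joint probabilities.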

Bag of words

The simplest language model is the bag-of-words model, in which only the frequency of each word matters; word order and the context in which a word appears are ignored. It is a poor model for generating sentences, but it is useful for measuring sentiment or classifying text. If some words tend to appear more frequently in negative sentences, their presence indicates that a sentence is likely to be negative; Bayes' rule for conditional probabilities turns these word frequencies into a probability for each class.
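
As an illustration, the sentiment idea can be sketched as a small naive Bayes classifier over word counts. This is a minimal sketch rather than a reference implementation: the toy training sentences, the labels "pos" and "neg", the function names, and the use of add-one smoothing are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical toy training data: (sentence, label) pairs.
TRAIN = [
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring plot terrible acting", "neg"),
]


def train(examples):
    """Count word frequencies per label; word order is discarded."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training sentences
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def classify(text, word_counts, label_counts, vocab):
    """Return the label with the highest posterior under Bayes' rule."""
    total_sentences = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Log prior: how common the label is in the training data.
        score = math.log(label_counts[label] / total_sentences)
        n_words = sum(word_counts[label].values())
        for word in text.split():
            # Log likelihood of the word given the label, with add-one
            # smoothing (an assumption) so unseen words do not give log(0).
            count = word_counts[label][word]
            score += math.log((count + 1) / (n_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(classify("terrible boring movie", *model))
print(classify("great wonderful plot", *model))
```

Because the scores are sums over individual words, shuffling the words of a sentence does not change its classification, which is exactly the bag-of-words assumption.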

N-gram models

A more advanced approach than bag-of-words is the N-gram model, in which the probability of each word is conditional on the previous N-1 words. A bigram model accounts only for the previous word; a trigram (3-gram) model accounts for the previous two words, and so on. Given these conditional probabilities, the probability of a full sentence follows from the chain rule of probability: it is the product of the conditional probabilities or, equivalently, its logarithm is the sum of the log probabilities.
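
The sentence-probability calculation can be sketched for the bigram case as follows. This is a minimal sketch under stated assumptions: the three-sentence corpus, the sentence boundary markers `<s>` and `</s>`, the function names, and the add-one smoothing are illustrative choices, not part of the original text.

```python
import math
from collections import Counter

# Hypothetical toy corpus used to estimate the bigram probabilities.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

BOS, EOS = "<s>", "</s>"  # assumed sentence boundary markers


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    context_counts = Counter()  # previous word -> number of occurrences
    bigram_counts = Counter()   # (previous word, word) -> occurrences
    for sentence in sentences:
        tokens = [BOS] + sentence.split() + [EOS]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    vocab = {word for (_, word) in bigram_counts}
    return context_counts, bigram_counts, vocab


def log_prob(sentence, context_counts, bigram_counts, vocab):
    """Sum of log P(word | previous word) over the sentence."""
    tokens = [BOS] + sentence.split() + [EOS]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing (an assumption) keeps unseen bigrams above zero.
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


model = train_bigram(CORPUS)
natural = log_prob("the cat sat on the mat", *model)
scrambled = log_prob("mat the on sat cat the", *model)
print(natural > scrambled)
```

Summing logarithms instead of multiplying raw probabilities avoids numerical underflow on long sentences. Unlike the bag-of-words model, this model is sensitive to word order: the scrambled sentence contains the same words but receives a lower score.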


