Natural Language Processing Fundamentals by Sohom Ghosh

Author: Sohom Ghosh
Language: eng
Format: epub
Publisher: Packt Publishing
Published: 2020-01-15T00:00:00+00:00


Topic Discovery

The main goal of topic modeling is to find a set of topics that can be used to classify a set of documents. These topics are implicit: we do not know what they are beforehand, and the algorithm does not name them. We simply assume that some documents are similar to each other and that we can group them by topic.

The number of topics is usually small, typically from 2 to 10. However, there are some use cases in which you may want up to 100 (or even more) topics. Because the topics are discovered by an algorithm rather than defined by a person, the number is somewhat arbitrary, and the topics may not always correspond directly to topics a human would identify. In practice, the number of topics should be much smaller than the number of documents, so that each topic is supported by many example documents. The more documents we provide, the more accurately the algorithm can sort them into categories.

The number of topics you choose depends on the documents and the objectives of the project. You may want to increase the number of topics if you have a large number of documents or if the documents are fairly diverse. Conversely, if you are analyzing a narrow set of documents, you may want to decrease the number of topics. This choice generally follows from your assumptions about the documents. If you think that the document set inherently contains a large number of topics, you should configure the algorithm to look for a similar number of topics. In effect, you are guiding the algorithm to discover what is already present in the documents, and you may already have a fair idea of that from sampling a few documents and seeing what kinds of topics they contain.
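
As a rough illustration of how the number of topics is supplied to the algorithm up front, here is a minimal sketch of a toy collapsed Gibbs sampler for Latent Dirichlet Allocation (LDA), written in plain Python. It is not the implementation used in this book. The function name `fit_toy_lda`, the hyperparameter values (`alpha`, `beta`, `iterations`), and the six sample documents are all assumptions made for illustration.

```python
import random
from collections import Counter


def fit_toy_lda(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA (illustrative, not production code).

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts.
    """
    rng = random.Random(seed)
    tokenized = [doc.lower().split() for doc in docs]
    vocab_size = len({word for doc in tokenized for word in doc})

    doc_topic = [[0] * num_topics for _ in tokenized]    # words per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total words per topic
    assignments = []                                     # topic of every word occurrence

    # Start by giving every word occurrence a random topic.
    for d, doc in enumerate(tokenized):
        doc_assign = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assign.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assign)

    for _ in range(iterations):
        for d, doc in enumerate(tokenized):
            for i, word in enumerate(doc):
                # Remove the current assignment from the counts.
                old = assignments[d][i]
                doc_topic[d][old] -= 1
                topic_word[old][word] -= 1
                topic_total[old] -= 1

                # Weight each topic by how common it is in this document
                # and how strongly it is associated with this word.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][word] + beta)
                    / (topic_total[k] + beta * vocab_size)
                    for k in range(num_topics)
                ]
                new = rng.choices(range(num_topics), weights=weights)[0]

                # Record the newly sampled assignment.
                assignments[d][i] = new
                doc_topic[d][new] += 1
                topic_word[new][word] += 1
                topic_total[new] += 1

    return doc_topic, topic_word


# Hypothetical mini-corpus; a real project would use far more documents.
documents = [
    "striker scored goal football match",
    "football team won league match",
    "coach praised striker after goal",
    "bank raised interest rates loans",
    "investors sold shares market fell",
    "market interest rates worry investors",
]

# The number of topics is our assumption about the corpus, not something
# the algorithm works out for itself.
doc_topic, topic_word = fit_toy_lda(documents, num_topics=2)

for k, counts in enumerate(topic_word):
    print(f"Topic {k}:", [word for word, _ in counts.most_common(4)])
for doc, counts in zip(documents, doc_topic):
    print("dominant topic", counts.index(max(counts)), "|", doc)
```

The printed topics are only lists of words; it is up to the reader to decide whether they correspond to something nameable. Rerunning with a different `num_topics` value is a quick way to see how that choice changes the grouping.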


