Feature Engineering Bookcamp by Sinan Ozdemir
Author: Sinan Ozdemir
Language: eng
Format: epub
Publisher: Simon & Schuster
Published: 2022-08-24T22:00:00+00:00
Stemmed
Stripped of any stop words
In this case, our resulting list of tokens is:
['wait', 'plane']
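The steps above can be sketched with the standard library alone. This is a minimal illustration, not the tokenizer used in the chapter: the input sentence, the tiny stop-word set, and the suffix-stripping `toy_stem` function are all assumptions made for the example, standing in for a real stemmer and a full stop-word list.

```python
import re

# Tiny illustrative stop-word set (an assumption; real pipelines use a fuller list)
STOP_WORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}

# Suffixes stripped by the toy stemmer, checked in this order
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_stem_tokenizer(text):
    """Lowercase, split into word tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [toy_stem(t) for t in tokens if t not in STOP_WORDS]


# Hypothetical input sentence chosen for illustration
print(toy_stem_tokenizer("Waiting for the plane"))  # ['wait', 'plane']
```

Because a tokenizer like this already lowercases and filters stop words, those two choices are fixed inside the function rather than left as vectorizer options.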
We can now use this custom tokenizer by setting our TfidfVectorizer's tokenizer parameter, as seen in listing 5.16. Note that because our tokenizer will lowercase and remove stop words for us, we won't need to grid search for these parameters.
Listing 5.16 Using our custom tokenizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

# clf, stem_tokenizer, train, test, and advanced_grid_search are defined earlier in the chapter
ml_pipeline = Pipeline([
    ('vectorizer', TfidfVectorizer(tokenizer=stem_tokenizer)),  # ❶
    ('classifier', clf)
])

params = {
    # 'vectorizer__lowercase': [True, False],
    # 'vectorizer__stop_words': [],  # ❷
    'vectorizer__max_features': [100, 1000, 5000],
    'vectorizer__ngram_range': [(1, 1), (1, 3)],
    'classifier__C': [1e-1, 1e0, 1e1]
}

print("Stemming + Log Reg\n=====================")
advanced_grid_search(  # remove cleaning
    train['text'], train['sentiment'],
    test['text'], test['sentiment'],
    ml_pipeline, params
)

❶ Using a custom tokenizer
❷ Not needed anymore, as our tokenizer is removing stop words and is lowercasing
Our results (figure 5.19) show a reduction in performance, like we saw with our text cleaning.
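For a sense of how much work the search in listing 5.16 involves, the parameter grid can be enumerated directly. The sketch below copies the three active parameters from the listing and counts their combinations using only the standard library. It does not call advanced_grid_search, and the number of model fits per combination depends on that helper's cross-validation settings, which are not shown here.

```python
from itertools import product

# The three active parameters from listing 5.16
params = {
    'vectorizer__max_features': [100, 1000, 5000],
    'vectorizer__ngram_range': [(1, 1), (1, 3)],
    'classifier__C': [1e-1, 1e0, 1e1],
}

# One dict per combination: one value chosen for each parameter
combinations = [
    dict(zip(params, values)) for values in product(*params.values())
]

print(len(combinations))  # 3 * 2 * 3 = 18
print(combinations[0])
```

Dropping the two commented-out parameters keeps the grid at 18 candidates rather than multiplying it by every lowercase and stop-word option.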
