Clean Data: Data Science Strategies for Tackling Dirty Data
Author: Megan Squire
Language: eng
Format: epub
Publisher: Packt Publishing
Example project – Extracting data from e-mail and web forums
The Django IRC logs project was fairly simple. It was designed to show the differences between three solid techniques that are commonly used to extract clean data from HTML pages. The data we extracted (the line number, the username, and the IRC chat message) was easy to find and required almost no additional cleaning. In this new example project, we will consider a case that is conceptually similar but that requires extending data extraction beyond HTML to two other types of semi-structured text found on the Web: e-mail messages hosted on the Web and web-based discussion forums.
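E-mail shows what "semi-structured" means here: a fixed set of named headers sits on top of a free-text body. The following is a minimal sketch, not the book's own code, of separating the two with Python's standard-library `email` package. The sample message, the helper names `extract_fields` and `_text_body`, and the choice to keep only the first `text/plain` part are illustrative assumptions.

```python
from email import message_from_string
from email.message import Message
from email.utils import parseaddr, parsedate_to_datetime

# A made-up sample message; a real archive would supply the raw text.
RAW_MESSAGE = """From: Jane Doe <jane@example.com>
To: dev-list@example.org
Subject: Re: build failure on trunk
Date: Mon, 2 Mar 2015 10:15:00 -0500

I see the same error on my machine.
"""


def _text_body(msg: Message) -> str:
    """Return the first text/plain part of a message as str ('' if none)."""
    # walk() yields the message itself when it is not multipart.
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            # decode=True undoes base64/quoted-printable and returns bytes.
            raw = part.get_payload(decode=True) or b""
            return raw.decode(part.get_content_charset() or "utf-8", errors="replace")
    return ""


def extract_fields(raw: str) -> dict:
    """Split a raw e-mail into cleaned header fields and a plain-text body."""
    msg = message_from_string(raw)
    # parseaddr splits "Name <address>" into its two components.
    name, address = parseaddr(msg.get("From", ""))
    date_header = msg.get("Date")
    return {
        "sender_name": name,
        "sender_address": address.lower(),
        "subject": (msg.get("Subject") or "").strip(),
        # Normalize the RFC 5322 date string to a timezone-aware datetime.
        "sent_at": parsedate_to_datetime(date_header) if date_header else None,
        "body": _text_body(msg).strip(),
    }


if __name__ == "__main__":
    for key, value in extract_fields(RAW_MESSAGE).items():
        print(f"{key}: {value!r}")
```

Web-hosted list archives usually wrap each message in HTML, so in practice a step like this would likely run after an HTML-extraction technique has recovered the raw message text.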
