Cleaning Data for Effective Data Science by David Mertz

Author: David Mertz
Language: eng
Format: epub
Tags: COM037000 - COMPUTERS / Machine Theory, COM018000 - COMPUTERS / Data Processing, COM062000 - COMPUTERS / Data Modeling & Design
Publisher: Packt
Published: 2021-03-28T19:14:47+00:00


String A                    String B                     Edit distance  Similarity ratio
David                       Davin                        1              0.8
David                       Maven                        3              0.4
the quick brown fox jumped  thee quikc brown fax jumbed  5              0.814814814815
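
The distance and ratio values listed above are consistent with a Levenshtein edit distance and a normalized similarity ratio. The following is a minimal sketch under that assumption, not necessarily the method used to produce those figures. The helper names `levenshtein` and `similarity` are hypothetical. `difflib.SequenceMatcher` is part of the Python standard library, and its ratio may differ from other libraries' ratios on longer strings.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    # Keep only the previous row of the dynamic-programming table.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) ca
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1]: twice the matched characters over total length."""
    return SequenceMatcher(None, a, b).ratio()


print(levenshtein("David", "Davin"), similarity("David", "Davin"))
print(levenshtein("David", "Maven"), similarity("David", "Maven"))
```

For the two short name pairs this prints distances of 1 and 3 with ratios of 0.8 and 0.4. A distance is easier to interpret for short strings, while a ratio allows one threshold to be applied to strings of different lengths.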

For this exercise, your goal is to identify every genuine name and correct all the misspelled ones to the correct canonical spelling. Keep in mind that sometimes multiple legitimate names are actually close to each other in terms of similarity measures. However, it is probably reasonable to assume that rare spellings are typos, at least if they are also relatively similar to common spellings. You may use whatever programming language, library, and metric you feel is the most useful for the task.
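
One possible approach to this exercise, sketched under stated assumptions: treat any spelling that occurs at least `min_count` times as canonical, and map each rarer spelling to its closest canonical spelling when the similarity is at least `cutoff`. The function name `canonicalize`, both thresholds, and the sample list are hypothetical illustrations rather than values taken from the dataset. `difflib.get_close_matches` is a standard library function.

```python
from collections import Counter
from difflib import get_close_matches


def canonicalize(names, min_count=3, cutoff=0.75):
    """Map rare spellings to the most similar common spelling.

    Spellings seen at least min_count times are assumed genuine.
    Rarer spellings are replaced only when a common spelling scores
    at least cutoff; otherwise they are left unchanged.
    """
    counts = Counter(names)
    common = [name for name, n in counts.items() if n >= min_count]
    mapping = {}
    for name, n in counts.items():
        if n >= min_count:
            mapping[name] = name
        else:
            match = get_close_matches(name, common, n=1, cutoff=cutoff)
            mapping[name] = match[0] if match else name
    return [mapping[name] for name in names]


# Invented sample data, for illustration only.
sample = ["David"] * 5 + ["Maven"] * 4 + ["Davin", "Mavin", "Zoe"]
print(Counter(canonicalize(sample)))
```

In this invented sample, "Davin" and "Mavin" are folded into "David" and "Maven", while "Zoe" is left alone because nothing common is close to it. Both thresholds would need tuning against the real data, since some genuine names are rare and some genuine names are close to each other.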

Reading in the data, we see it is similar to the human measures we have seen before:

import pandas as pd

names = pd.read_csv('data/humans-names.csv')
names.head()


