Big Data Analytics Methods by Peter Ghavami

Author: Peter Ghavami
Language: eng
Format: epub, pdf
Publisher: De Gruyter
Published: 2019-11-18


Treatment phase:

Data cleansing (data scrubbing) removes invalid data points from a data set. You may delete data that does not fit the data series, pattern, or frequency distribution. De-duplicating data requires data transformations, but first you should test the data for matching records to identify which records are duplicates.
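The two steps above (first find matching records, then transform the data set to remove duplicates) can be sketched in Python. This is a minimal illustration, not the author's method: it assumes records are dictionaries, that two records count as duplicates when the chosen fields match after trimming whitespace and lowercasing, and that the first occurrence is the one to keep. The field names and sample records are hypothetical.

```python
from collections import defaultdict


def normalize_key(record, fields):
    """Build a comparison key from the chosen fields. Trimming and lowercasing
    lets trivially different spellings of the same value match."""
    return tuple(str(record.get(f, "")).strip().lower() for f in fields)


def find_duplicates(records, fields):
    """Step 1: test the data for matching records.
    Returns groups of row indices that share the same normalized key."""
    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[normalize_key(rec, fields)].append(i)
    return [indices for indices in groups.values() if len(indices) > 1]


def deduplicate(records, fields):
    """Step 2: transform the data set, keeping the first record of each group."""
    seen = set()
    result = []
    for rec in records:
        key = normalize_key(rec, fields)
        if key not in seen:
            seen.add(key)
            result.append(rec)
    return result


# Hypothetical sample data: rows 0 and 1 differ only in case and whitespace.
records = [
    {"id": 1, "name": "Ada Lovelace", "email": "ada@example.com"},
    {"id": 2, "name": "ada lovelace ", "email": "ADA@example.com"},
    {"id": 3, "name": "Alan Turing", "email": "alan@example.com"},
]

print(find_duplicates(records, ["name", "email"]))                 # [[0, 1]]
print([r["id"] for r in deduplicate(records, ["name", "email"])])  # [1, 3]
```

Exact matching on normalized keys only catches simple duplicates; records that differ by typos or abbreviations would need a fuzzy matching step, which this sketch does not attempt.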

Do not summarily remove outliers: an outlier may carry real significance, so be careful before deleting it. Cleansing (deleting) data can be done through human judgment when the data source or the data collection process is not trusted. Constraint tests can detect inaccurate data (e.g., an SSN of 999-99-9999), out-of-range values, format mismatches, and foreign key violations. You can also detect issues such as missing data, a “0” where a blank or N/A was expected, or a “999” or “9999” used to indicate that no data is available.
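A sketch of such constraint tests in Python, assuming records are dictionaries with hypothetical `ssn`, `height_cm`, and `dept_id` fields. The placeholder SSNs, the sentinel codes, and the 30 to 250 cm height range are illustrative assumptions, not rules taken from the source. The function reports problems instead of deleting records, leaving the final decision to human judgment in line with the caution about removing data.

```python
import re

# Illustrative rules (assumptions); real values depend on the data set.
SSN_FORMAT = re.compile(r"\d{3}-\d{2}-\d{4}")
PLACEHOLDER_SSNS = {"999-99-9999", "000-00-0000"}
SENTINELS = {"", "0", "999", "9999", "N/A"}
HEIGHT_RANGE_CM = (30.0, 250.0)


def check_record(rec, valid_dept_ids):
    """Return a list of constraint violations; an empty list means the record passed."""
    issues = []

    # Format test, then a test for well-formed but known-inaccurate placeholder values.
    ssn = str(rec.get("ssn", "")).strip()
    if not SSN_FORMAT.fullmatch(ssn):
        issues.append("ssn: format mismatch")
    elif ssn in PLACEHOLDER_SSNS:
        issues.append("ssn: placeholder value")

    # Missing-data and sentinel test, then a range test on the numeric value.
    raw = rec.get("height_cm")
    text = "" if raw is None else str(raw).strip()
    if text.upper() in SENTINELS:
        issues.append("height_cm: missing or sentinel value")
    else:
        try:
            height = float(text)
        except ValueError:
            issues.append("height_cm: format mismatch")
        else:
            low, high = HEIGHT_RANGE_CM
            if not low <= height <= high:
                issues.append("height_cm: out of range")

    # Foreign key test: the referenced department must exist in the parent table.
    if rec.get("dept_id") not in valid_dept_ids:
        issues.append("dept_id: foreign key violation")

    return issues


# Hypothetical sample data and parent-table keys.
records = [
    {"ssn": "123-45-6789", "height_cm": 172, "dept_id": "D1"},
    {"ssn": "999-99-9999", "height_cm": 9999, "dept_id": "D9"},
    {"ssn": "12345", "height_cm": 0, "dept_id": "D2"},
]
valid_depts = {"D1", "D2"}

for i, rec in enumerate(records):
    print(i, check_record(rec, valid_depts))
# 0 []
# 1 ['ssn: placeholder value', 'height_cm: missing or sentinel value', 'dept_id: foreign key violation']
# 2 ['ssn: format mismatch', 'height_cm: missing or sentinel value']
```

Flagged records can then be routed to a reviewer, who decides whether a value is an error or a genuine outlier worth keeping.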