Big Data Visualization by Unknown
Author: Unknown
Language: eng
Format: epub
Publisher: Packt Publishing
Successfully addressing data quality demands both an understanding of your data (which is what we just learned in Chapter 3, Understanding Your Data Using R) and the ability to identify and resolve the issues in it.
In Chapter 1, Introduction to Big Data Visualization, we identified the most general categories of data quality. How do you address them? The first step is to understand each of them:
Accuracy: There are many kinds of data inaccuracy; the most common include poor math, out-of-range values, invalid values, and duplication.
Completeness: Data sources may be missing values from particular columns, missing entire columns, or even complete transactions.
Update status: As part of your quality assurance, you need to establish how often the data is refreshed or updated, and you need to be able to determine when it was last saved or updated. This is also referred to as latency.
Relevance: This involves identifying and eliminating information that you don't need or care about, given your objectives. An example would be removing sales transactions for pickles if you intend to study personal grooming products.
Consistency: It's common to have to cross-reference or translate information across data sources. For example, recorded responses to a patient survey may require translation to a single consistent indicator to make later processing or visualizing easier.
Reliability: Reliability is chiefly concerned with making sure that the method of data gathering leads to consistent results. A common quality assurance process involves establishing baselines and ranges and then routinely verifying that the data falls within those expectations. For example, a district that typically has a mix of registered Democratic and Republican voters would warrant investigation if the data suddenly showed 100% of its voters registered with a single party.
Appropriateness: Data is considered appropriate if it is suitable for the intended purpose; this can be subjective. For example, it is well established that holiday traffic affects purchasing habits (that is, an increase in US flag sales during Memorial Day week does not indicate average or expected weekly behavior), so that week's data may be inappropriate for estimating typical weekly sales.
Accessibility: Data of interest may be buried in a sea of data you are not interested in, which reduces its quality because it is effectively inaccessible. This is particularly common in big data projects. Security may also play a role in the quality of your data. For example, particular computers might be excluded from captured log files, or certain health-related information may be hidden and not included in a shared patient dataset.
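Several of these categories translate directly into simple automated checks. The sketch below is one possible illustration, not the book's method: it is written in Python with only the standard library (the book itself works in R), and the record fields, the 0 to 120 age range, the response mapping, and the 20% to 80% baseline are all hypothetical values chosen for the example. It covers accuracy (out-of-range values and duplicates), completeness (missing values), consistency (translating survey responses to one indicator), and reliability (checking an observed share against an expected range).

```python
from collections import Counter

# Hypothetical survey records; field names and values are illustrative only.
RECORDS = [
    {"id": 1, "age": 34, "response": "Yes"},
    {"id": 2, "age": 210, "response": "y"},     # out-of-range age
    {"id": 3, "age": None, "response": "NO"},   # missing age
    {"id": 3, "age": None, "response": "NO"},   # duplicate of id 3
    {"id": 4, "age": 51, "response": "maybe"},  # response with no mapping
]

# Consistency: assumed mapping from free-form answers to a single 1/0 indicator.
RESPONSE_MAP = {"yes": 1, "y": 1, "no": 0, "n": 0}


def out_of_range(records, field, low, high):
    """Accuracy: ids whose numeric field falls outside [low, high]."""
    return [r["id"] for r in records
            if r[field] is not None and not low <= r[field] <= high]


def duplicate_ids(records):
    """Accuracy: ids that appear more than once."""
    counts = Counter(r["id"] for r in records)
    return sorted(i for i, n in counts.items() if n > 1)


def missing(records, field):
    """Completeness: ids whose field is absent or None."""
    return [r["id"] for r in records if r.get(field) is None]


def normalize_responses(records):
    """Consistency: map each response to 1 or 0, or None if it has no mapping."""
    return [RESPONSE_MAP.get(r["response"].strip().lower()) for r in records]


def within_baseline(share, expected_low, expected_high):
    """Reliability: does an observed share fall inside the expected range?"""
    return expected_low <= share <= expected_high


if __name__ == "__main__":
    print("out of range:", out_of_range(RECORDS, "age", 0, 120))
    print("duplicates:", duplicate_ids(RECORDS))
    print("missing age:", missing(RECORDS, "age"))
    print("normalized:", normalize_responses(RECORDS))
    # A district suddenly reporting 100% single-party registration fails the check.
    print("baseline ok:", within_baseline(1.0, 0.2, 0.8))
```

Each check reports the offending record ids (or flags) rather than silently dropping rows, so the decision to correct, discard, or investigate remains with the analyst. Update status, relevance, appropriateness, and accessibility depend more on context and judgment, so they are not covered here.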