Data as a Service by Sarkar Pushpak
Author:Sarkar, Pushpak
Language: eng
Format: epub
ISBN: 9781119055136
Publisher: Wiley
Published: 2015-07-30T00:00:00+00:00
Figure 8.3 Data profiling results on reference data
Data profiling tools typically apply statistical and probabilistic algorithms to analyze and explore the quality of data across multiple data sources. These automated tools can also analyze the different values that each source supplies for a given column. For example, a profiling tool can easily detect values that have landed in the wrong column, such as an address stored in a phone number field.
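As a minimal sketch of how such a check might work, the Python below reduces each value to a coarse "shape" (digits become `9`, letters become `A`) and flags values whose shape is rare within the column. The function names, the sample data, and the 25% threshold are illustrative assumptions, not the method of any particular profiling product.

```python
import re
from collections import Counter


def value_shape(value):
    """Reduce a value to a coarse pattern: letters -> 'A', digits -> '9'."""
    shape = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"\d", "9", shape)


def flag_outliers(values, min_share=0.25):
    """Return values whose shape covers less than min_share of the column.

    min_share is a hypothetical threshold; a real tool would tune it.
    """
    counts = Counter(value_shape(v) for v in values)
    total = len(values)
    return [v for v in values if counts[value_shape(v)] / total < min_share]


# Hypothetical phone column in which one address has been misplaced.
phones = [
    "555-123-4567",
    "555-987-6543",
    "555-222-3333",
    "12 Main Street",
    "555-444-5555",
]
print(flag_outliers(phones))
```

Four of the five values share the shape `999-999-9999`, so the lone address is the only value reported.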
Comparing these values with a baseline (or standardized) set of values, defined for that column by business subject matter experts (SMEs), helps determine the overall quality of the data received from a particular source. Profiling tools can also infer the underlying relationships that exist across multiple datasets. Using a variety of pattern-matching techniques, they can surface business insight from the underlying data, a task that is very difficult to perform manually.
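The baseline comparison can be illustrated with a short Python sketch. It assumes the SME-approved values are available as a simple list and that matching should ignore case and surrounding whitespace; both the `baseline_report` helper and the country-code data are hypothetical.

```python
from collections import Counter


def baseline_report(values, baseline):
    """Return (conformance rate, counts of values not in the baseline)."""
    allowed = {b.strip().upper() for b in baseline}
    unexpected = Counter()
    matched = 0
    for value in values:
        key = (value or "").strip().upper()  # treat None as an empty value
        if key in allowed:
            matched += 1
        else:
            unexpected[key] += 1
    rate = matched / len(values) if values else 0.0
    return rate, unexpected


# Hypothetical reference data: country codes approved by business SMEs.
baseline = ["US", "CA", "MX"]
received = ["US", "ca ", "MX", "USA", "", "US", "U.S.", "CA"]

rate, unexpected = baseline_report(received, baseline)
print(f"conformance: {rate:.1%}")
print(unexpected.most_common())
```

A per-source conformance rate of this kind gives a single number to compare across sources, while the list of unexpected values shows what needs standardizing.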
In the context of metadata definitions, data profiling tools can help uncover semantic differences associated with key entities and attributes across different parts of an organization. This data discovery process can be followed by activities for standardizing semantic definitions and for data cleansing. In addition, the lineage of data elements published through data services can be established and stored for future project use by IT application teams.
By profiling data, an IT organization can quickly evaluate the content, quality, and structure of its underlying databases, files, and other data stores. It can also discover quality problems in source systems with minimal effort (Figure 8.4). Data profiling can therefore help to identify actual problems with the data as they relate to business needs. For example, if the marketing group in an organization wants to increase the impact of an online ad campaign, data profiling tools can easily help to identify anomalies such as empty or missing address or phone records, incorrect reference-data entries, duplicate entries for the same contact, and problems with data uniformity.
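Two of the anomalies in the marketing example, missing contact fields and duplicate contacts, can be sketched in Python as follows. The record layout, the field names, and the duplicate rule (same lowercased name and same phone digits) are assumptions made for illustration; real tools use far richer matching.

```python
import re


def find_anomalies(records):
    """Return (missing, duplicates) for a list of contact dictionaries.

    missing:    list of (record index, field name) for empty fields
    duplicates: list of (first index, later index) for repeated contacts
    """
    missing, duplicates, seen = [], [], {}
    for i, rec in enumerate(records):
        for field in ("address", "phone"):
            if not (rec.get(field) or "").strip():
                missing.append((i, field))
        # Assumed duplicate key: normalized name plus the digits of the phone.
        name = (rec.get("name") or "").strip().lower()
        digits = re.sub(r"\D", "", rec.get("phone") or "")
        key = (name, digits)
        if key in seen:
            duplicates.append((seen[key], i))
        else:
            seen[key] = i
    return missing, duplicates


# Hypothetical contact records for a campaign mailing list.
contacts = [
    {"name": "Ann Lee", "address": "1 Elm St", "phone": "555-123-4567"},
    {"name": "ann lee", "address": "1 Elm Street", "phone": "(555) 123-4567"},
    {"name": "Bob Roy", "address": "", "phone": "555-000-1111"},
    {"name": "Cy Dunn", "address": "9 Oak Ave", "phone": None},
]

missing, duplicates = find_anomalies(contacts)
print(missing)
print(duplicates)
```

Normalizing the phone number to digits before comparing is what lets the two differently formatted entries for the same contact be matched.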