Data as a Service by Sarkar Pushpak

Author: Sarkar, Pushpak
Language: eng
Format: epub
ISBN: 9781119055136
Publisher: Wiley
Published: 2015-07-30


Figure 8.3 Data profiling results on reference data

Data profiling tools typically apply statistical and probabilistic algorithms to analyze and explore the quality of data across multiple data sources. These automated tools can also analyze the different values received from each data source for a given column. For example, a data profiling tool can easily detect values that have landed in the wrong column, such as an address stored in a phone number field.
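The wrong-column check described above can be illustrated with a minimal sketch. It assumes simple regular-expression rules; the `PATTERNS` table, the function names, and the sample values are hypothetical, and commercial profiling tools use far richer statistical rule sets than this.

```python
import re

# Hypothetical format rules for illustration only.
PATTERNS = {
    "phone": re.compile(r"^\+?[\d\s().-]{7,20}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}

def classify(value):
    """Return the name of the first pattern the value matches, or 'other'."""
    for name, pattern in PATTERNS.items():
        if pattern.match(value.strip()):
            return name
    return "other"

def misplaced_values(values, expected):
    """Return non-empty values whose inferred type differs from the expected one."""
    return [v for v in values if v.strip() and classify(v) != expected]

# Sample phone column containing an address and an email by mistake.
phone_column = ["+1 (555) 010-2000", "555-010-2001", "12 Main Street", "jane@example.com"]
print(misplaced_values(phone_column, "phone"))
```

With the sample column, the two entries that are not phone numbers (the street address and the email address) are reported as misplaced.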

Comparing these values with a baseline (or standardized) set of values, defined for that column by business SMEs, can help determine the overall quality of the data received from a particular source. Profiling tools can also infer the underlying relationships that exist across multiple datasets. By using a variety of pattern-matching techniques, they can surface business insight from the underlying data that would be very difficult to obtain manually.
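As a rough illustration of the baseline comparison, the sketch below scores one source's column against a reference set. The `conformance` function, the `BASELINE_STATES` set, and the sample values are assumptions made for this example; they do not come from any particular profiling product.

```python
from collections import Counter

def conformance(values, baseline):
    """Return (share of values found in the baseline, Counter of values that are not)."""
    # Normalize case and whitespace so "ny" and "NY " count as the same value.
    normalized = [v.strip().upper() for v in values]
    unknown = Counter(v for v in normalized if v not in baseline)
    matched = len(normalized) - sum(unknown.values())
    ratio = matched / len(normalized) if normalized else 1.0
    return ratio, unknown

BASELINE_STATES = {"CA", "NY", "TX"}  # hypothetical SME-approved reference values
source_a = ["CA", "ny", "Calif.", "TX", "N/A"]

ratio, unknown = conformance(source_a, BASELINE_STATES)
print(f"{ratio:.0%} conforming; unexpected values: {dict(unknown)}")
```

Running the same check per source gives a simple way to rank sources by how closely they follow the standardized values, and the list of unexpected values indicates what cleansing rules may be needed.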

In the context of metadata definitions, data profiling tools can help uncover semantic differences associated with key entities and attributes across different parts of an organization. This data discovery process can be followed by activities for standardizing semantic definitions and for data cleansing. In addition, the data lineage of data elements published on data services can be established and stored for future project use by IT application teams.

By profiling data, an IT organization can quickly evaluate the content, quality, and structure of underlying databases, files, and other data stores. It can also discover the underlying quality problems in source systems with minimal effort (Figure 8.4). Data profiling can therefore help identify actual problems with the data as they relate to business needs. For example, if the marketing group in an organization wants to increase the impact of an online ad campaign, data profiling tools can easily help identify anomalies such as empty or missing address or phone records, incorrect reference-data entries, duplicate entries for the same contact, and problems with data uniformity.
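The anomalies listed in the marketing example can be sketched in a few lines. This is a simplified illustration under stated assumptions: the record layout (`name`, `phone`, `address`), the duplicate-matching rule, and the `shape` signature are all hypothetical choices for this example, not the method of any specific tool.

```python
import re
from collections import Counter

def shape(value):
    """Reduce a value to a format signature: digits become 9, letters become A."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

def profile_contacts(records):
    """Count missing fields, duplicate contacts, and phone format variants."""
    missing, seen, phone_shapes = Counter(), Counter(), Counter()
    for rec in records:
        for field in ("name", "phone", "address"):
            if not (rec.get(field) or "").strip():
                missing[field] += 1
        # Assumed duplicate key: case-insensitive name plus the phone's digits.
        name = (rec.get("name") or "").strip().lower()
        digits = re.sub(r"\D", "", rec.get("phone") or "")
        seen[(name, digits)] += 1
        if rec.get("phone"):
            phone_shapes[shape(rec["phone"])] += 1
    duplicates = sum(n - 1 for n in seen.values() if n > 1)
    return {"missing": dict(missing), "duplicates": duplicates,
            "phone_shapes": dict(phone_shapes)}

# Hypothetical contact records for the ad campaign example.
records = [
    {"name": "Ann Lee", "phone": "555-010-2000", "address": "1 Elm St"},
    {"name": "ann lee", "phone": "(555) 010 2000", "address": ""},
    {"name": "Bo Chen", "phone": None, "address": "9 Oak Ave"},
]
report = profile_contacts(records)
print(report)
```

More than one entry in `phone_shapes` points to a uniformity problem: the same kind of value is being stored in several formats. The duplicate count depends entirely on the matching key, which a real project would tune with business input.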


