Microsoft SQL Server 2014 Business Intelligence Development Beginner's Guide by 2014

Author:2014
Language: eng
Format: epub, mobi
Publisher: Packt Publishing


Set the parsing method to Delimiters and choose Comma as the delimiter. Then check Use Knowledge Base Parsing.

Click on Finish and then publish the Knowledge Base.

In the Data Quality Client's main window, under Data Quality Projects, click on Create New Data Quality Project. Name the project Address Cleansing Project. In the Use Knowledge Base drop-down list, choose Address KB. Leave the activity as Cleansing and click on Next.

In the Map step, choose Excel File as the Data Source and select the KB Parsing.xlsx file. Leave the worksheet as is and check the Use first row as header checkbox. In the mapping grid, map the Address column from the source to the Full Address domain from the Knowledge Base. The right-hand pane shows the Knowledge Base details, including the composite domain, which consists of four domains: Country, State, City, and Address Line.

Click on Next, and in the Cleanse tab, start the process. You will see some profiling results that show the status, number of records, and some other information regarding this cleansing process. Click on Next after that.

In the Manage and View Results tab, you will see the list of values for each domain or composite domain. In this example, you will see Full Address in the left-hand pane with seven values, because the Excel file had only seven rows.

In the main pane, you will see the tabs Suggested, Invalid, Corrected, Correct, and New. In this example, all of the records are new to the Knowledge Base, so all of them are listed under New. The grid under the tab shows each value and the reason it needs cleansing. Here, every record is listed with the reason New Value in domain Address Line. In the earlier examples of this chapter, we ran knowledge discovery for Country, State, and City, so the incoming data contains nothing new for those domains; it only has new values for Address Line. This is one of the advantages of Knowledge Base parsing in composite domains: it finds the best domain for each piece of data based on the domain values and rules already in the Knowledge Base.
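The idea behind Knowledge Base parsing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how DQS is implemented: the `KNOWN_VALUES` table, the sample values in it, and the `parse_address` function are all hypothetical. The sketch splits a full address on the comma delimiter, places each part in the domain that already knows that value, and sends anything unrecognized to Address Line, where it is reported as a new value.

```python
# Hypothetical domain values; in DQS these would come from knowledge discovery.
KNOWN_VALUES = {
    "Country": {"usa", "new zealand"},
    "State": {"washington", "california"},
    "City": {"seattle", "san francisco"},
}
CATCH_ALL = "Address Line"  # domain that receives unrecognized parts


def parse_address(full_address, delimiter=","):
    """Split on the delimiter and assign each part to a matching domain.

    Returns (parsed, new_values): the domain-to-value mapping and the list
    of parts that no known domain recognized.
    """
    parsed = {}
    new_values = []
    for token in (part.strip() for part in full_address.split(delimiter)):
        if not token:
            continue  # ignore empty parts such as those from a trailing comma
        for domain, values in KNOWN_VALUES.items():
            if token.lower() in values and domain not in parsed:
                parsed[domain] = token
                break
        else:
            # No domain claimed the part, so it lands in the catch-all domain.
            if CATCH_ALL in parsed:
                parsed[CATCH_ALL] += ", " + token
            else:
                parsed[CATCH_ALL] = token
            new_values.append(token)
    return parsed, new_values


parsed, new_values = parse_address("1 Main St, Seattle, Washington, USA")
print(parsed)
print(new_values)
```

Because each part is matched against known values rather than assigned by position, an address written in a different order, such as `"USA, Washington, Seattle, 1 Main St"`, yields the same mapping in this sketch.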

You can approve or reject new values; if you approve them, they will be listed under the Correct tab, and if you reject them, they will be listed under the Invalid tab. For this example, approve all of them and click on Next. The following screenshot shows the Correct tab after approving all new values:


