Applied Predictive Analytics by Dean Abbott
Author: Dean Abbott
Language: eng
Format: epub, pdf
ISBN: 9781118727690
Published: 2014-03-24
Within-Cluster Descriptions
Describing the mean of a cluster tells us what that cluster looks like, but tells us nothing about why that cluster was formed. Consider Clusters 1 and 2 from Table 7.2. The mean values describe each cluster, but a closer examination shows that, for nearly all the variables, the two clusters have similar means (we will see how significant these differences are after computing ANOVAs). On visual inspection, the only three variables that differ noticeably are DOMAIN1, DOMAIN2, and DOMAIN3. These differences are key: if the purpose of the cluster model is to find distinct sub-populations in the data, it is critical not only to describe each cluster, but also to describe how the clusters differ from one another.
Examining the differences between cluster characteristics provides additional insight into why the clusters were formed. However, working out how the clusters differ from reports such as the one shown in Table 7.2 can be quite challenging. Good visualization of the clusters can help. But whether you use tables or graphs, identifying differences requires scanning every variable in every cluster. With 20 inputs and 10 clusters, that means 200 histograms or summary statistics to examine.
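One way to cut down on that scanning is to score each variable by how strongly its mean differs across clusters and read the list from the top. The sketch below is an illustration only, not the book's procedure: it computes a one-way ANOVA F statistic per variable in plain Python and sorts the variables by it. The function names, the AGE variable, and all data values are hypothetical. Only the DOMAIN1 name comes from the text.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for one variable, given one list of values per cluster."""
    k = len(groups)                      # number of clusters
    n = sum(len(g) for g in groups)      # total number of records
    grand_mean = sum(sum(g) for g in groups) / n
    # Variation of the cluster means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the records around their own cluster mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_variables(rows, labels):
    """Rank variables by F statistic, largest (most different across clusters) first.

    rows   -- list of dicts mapping variable name to numeric value
    labels -- cluster id for each row, in the same order
    """
    clusters = sorted(set(labels))
    scores = {}
    for var in rows[0]:
        groups = [
            [row[var] for row, label in zip(rows, labels) if label == cluster]
            for cluster in clusters
        ]
        scores[var] = f_statistic(groups)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (invented values): DOMAIN1 separates the two clusters, AGE does not.
rows = [
    {"DOMAIN1": 1.0, "AGE": 50.0},
    {"DOMAIN1": 1.2, "AGE": 62.0},
    {"DOMAIN1": 0.9, "AGE": 55.0},
    {"DOMAIN1": 3.0, "AGE": 51.0},
    {"DOMAIN1": 3.1, "AGE": 60.0},
    {"DOMAIN1": 2.8, "AGE": 57.0},
]
labels = [1, 1, 1, 2, 2, 2]

ranking = rank_variables(rows, labels)
for name, f_value in ranking:
    print(f"{name}: F = {f_value:.2f}")
```

On this toy data DOMAIN1 ranks well above AGE, so a reader would examine it first. A large F only says that the cluster means differ relative to the within-cluster spread. It does not say which clusters differ, so the top-ranked variables still need a look at their per-cluster summaries.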
