Classification in the Wild by unknown

Author: unknown
Publisher: MIT Press


Predicting Diabetes

First we look at a population of 768 Pima Indian women living near Phoenix, Arizona. Among Pima Indians older than thirty-five, the prevalence of diabetes is high, estimated at around 50 percent.[56] The classification task is to predict who will develop diabetes within five years. The available cues are age, number of pregnancies, plasma glucose concentration, blood pressure, insulin level, body mass index, a body fat estimate, and a genetic risk measure that accounts for diabetes diagnoses in relatives.

To train fast-and-frugal trees and tallying, we use the Best Fit method discussed in chapter 3.[57] To keep the rules transparent, we cap the number of cues in both tallying and fast-and-frugal trees at six. We compare these heuristics with the decision tree CART, the decision list RIPPER, and random forest. All competitors were trained on 70 percent of the instances and tested on the remaining 30 percent. To obtain stable performance estimates, we repeated this out-of-sample procedure one hundred times (data for out-of-population testing were not available).
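To make the two heuristics concrete, here is a minimal sketch of how a tallying rule and a fast-and-frugal tree classify a single case, assuming each cue is turned into a binary test of the form "cue value > threshold". The cue names (`glucose`, `bmi`, `age`), the thresholds, and the names `CueTest`, `Tallying`, and `FastAndFrugalTree` are hypothetical and chosen only for illustration; they are not the rules learned in the study, and the sketch does not implement the Best Fit training method. `MAX_CUES = 6` mirrors the cap on cues stated above.

```python
from dataclasses import dataclass
from typing import Dict, List

Case = Dict[str, float]  # cue name -> cue value for one person (hypothetical layout)

MAX_CUES = 6  # mirrors the cap of six cues stated in the text


@dataclass(frozen=True)
class CueTest:
    """A binary cue test: positive when the cue value exceeds the threshold."""
    cue: str
    threshold: float

    def positive(self, case: Case) -> bool:
        return case[self.cue] > self.threshold


@dataclass
class Tallying:
    """Count positive cue tests (unit weights); predict 1 if the count reaches k."""
    tests: List[CueTest]
    k: int

    def __post_init__(self) -> None:
        if len(self.tests) > MAX_CUES:
            raise ValueError("tallying rule uses more than MAX_CUES cues")

    def predict(self, case: Case) -> int:
        votes = sum(test.positive(case) for test in self.tests)
        return int(votes >= self.k)


@dataclass
class FastAndFrugalTree:
    """Check cues in order. Each non-final node has one exit: an exit of 1 is
    taken when its test is positive, an exit of 0 when its test is negative.
    The final node has both exits."""
    tests: List[CueTest]
    exits: List[int]  # one exit class per non-final node

    def __post_init__(self) -> None:
        if len(self.tests) > MAX_CUES:
            raise ValueError("tree uses more than MAX_CUES cues")
        if len(self.exits) != len(self.tests) - 1:
            raise ValueError("need one exit per non-final node")

    def predict(self, case: Case) -> int:
        for test, exit_class in zip(self.tests[:-1], self.exits):
            if test.positive(case) == (exit_class == 1):
                return exit_class
        # Final node: classify by the outcome of its test.
        return int(self.tests[-1].positive(case))


# Illustrative cues and thresholds only; not the rules learned in the study.
tests = [CueTest("glucose", 140.0), CueTest("bmi", 30.0), CueTest("age", 35.0)]
fft = FastAndFrugalTree(tests=tests, exits=[1, 0])
tally = Tallying(tests=tests, k=2)

patient_a = {"glucose": 120.0, "bmi": 33.5, "age": 41.0}
patient_b = {"glucose": 150.0, "bmi": 25.0, "age": 30.0}
print(fft.predict(patient_a), tally.predict(patient_a))  # 1 1
print(fft.predict(patient_b), tally.predict(patient_b))  # 1 0
```

The two rules can disagree on the same cues: the tree lets a single high glucose value decide the case for `patient_b`, while tallying needs two positive tests before predicting diabetes.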

Every random sample generated a slightly different classification rule. In figure 4.6, we show the rules learned from the complete data set. The random forest consists of five hundred large trees, one of which is shown in the figure.
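The repeated out-of-sample procedure described above can be sketched as a loop that reshuffles the data, refits on 70 percent, and scores on the remaining 30 percent, one hundred times. This is a sketch under stated assumptions: `fit_single_cue_rule` is a hypothetical stand-in learner (a one-cue threshold rule), not Best Fit or any of the competitors; the data are synthetic, not the Pima data set; and rounding 70 percent of the cases to the nearest integer is an assumption about how the split size is computed. Returning the fitted thresholds makes the point of the paragraph above visible: each random split yields a slightly different rule.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple

Example = Tuple[Dict[str, float], int]  # (cue values, class label)


def fit_single_cue_rule(train: List[Example], cue: str) -> float:
    """Hypothetical stand-in learner (not the Best Fit method): choose the
    threshold t on one cue that maximizes training accuracy of the rule
    'predict 1 if value > t'."""
    candidates = sorted({case[cue] for case, _ in train})
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum(int(case[cue] > t) == y for case, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def repeated_holdout(data: List[Example], cue: str, n_reps: int = 100,
                     train_frac: float = 0.7, seed: int = 0):
    """Repeat a random train/test split n_reps times; return the mean test
    accuracy and the threshold fitted on each split."""
    rng = random.Random(seed)
    accuracies, thresholds = [], []
    for _ in range(n_reps):
        shuffled = data[:]
        rng.shuffle(shuffled)
        cut = round(train_frac * len(shuffled))
        train, test = shuffled[:cut], shuffled[cut:]
        t = fit_single_cue_rule(train, cue)  # a new rule for every split
        thresholds.append(t)
        correct = sum(int(case[cue] > t) == y for case, y in test)
        accuracies.append(correct / len(test))
    return mean(accuracies), thresholds


# Synthetic toy data (NOT the Pima data): one noisy cue and a binary label.
gen = random.Random(42)
data = []
for _ in range(200):
    glucose = gen.gauss(120, 30)
    data.append(({"glucose": glucose}, int(glucose + gen.gauss(0, 15) > 140)))

acc, thresholds = repeated_holdout(data, "glucose")
print(f"mean test accuracy over {len(thresholds)} splits: {acc:.3f}")
print("distinct fitted thresholds:", len(set(thresholds)))
```

Averaging over many splits smooths out the luck of any single split; swapping in a different learner only requires replacing `fit_single_cue_rule` and the scoring line, while the resampling loop stays the same.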
