MATLAB for Machine Learning by Ciaburro Giuseppe

Publisher: Packt Publishing
Published: 2017-08-28


Figure 5.5: Graphic description of the tree
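(Figure 5.5 refers to the tree named ClassTree built earlier in the chapter. If you are reproducing these steps from scratch, it can be recreated from the fisheriris sample data; this is a sketch, assuming the tree was grown with fitctree() on the raw measurement matrix:)

>> load fisheriris % meas: 150-by-4 measurements, species: class labels
>> ClassTree = fitctree(meas,species); % grow the classification tree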

Figure 5.5 immediately gives us useful information on the classification of the three floral species. In most cases, a decision tree is built to predict class labels or responses: once the tree has been created, we can easily predict responses for new data. Suppose a new observation has been collected, with four measurements representing the sepal length, sepal width, petal length, and petal width of a specimen whose species is unknown:

>> measNew = [5.9 3.2 1.3 0.25];

To predict the classification of the new data with the previously created and trained tree, ClassTree, enter this:

>> predict(ClassTree,measNew)

ans =

cell

'setosa'

The predict() function returns a vector of predicted class labels for the predictor data in the table or matrix, based on the trained classification tree. In this case, there is a single prediction because the variable passed to the function contained only one record. Had we passed a data matrix containing multiple observations, we would have obtained one predicted label per row of the matrix.
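For instance, passing a matrix with several rows (the measurements below and the names measBatch and labels are hypothetical, chosen only for illustration) returns one label per row:

>> measBatch = [5.9 3.2 1.3 0.25; 6.4 2.9 4.3 1.3; 6.9 3.1 5.4 2.1];
>> labels = predict(ClassTree,measBatch)

Here, labels is a 3-by-1 cell array of class names, one for each row of measBatch.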

So far, we have learned to build a classification tree from our data. Now we need to test the model's performance in predicting new observations. But what tools are available to measure the tree's quality?

To begin, we can calculate the resubstitution error: the difference between the responses in the training data and the predictions the tree makes for those same training inputs. It represents an initial estimate of the performance of the model, but it works only in one direction, in the sense that a high resubstitution error indicates that the predictions of the tree will not be good. On the contrary, a low resubstitution error does not guarantee good predictions for new data, so by itself it tells us nothing about generalization.

To calculate the resubstitution error, simply type:

>> resuberror = resubLoss(ClassTree)

resuberror =

0.0200
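The same figure can be reproduced by hand, which clarifies what resubLoss() actually measures: the fraction of training observations that the tree misclassifies. This is a sketch, assuming the tree was trained on the meas and species variables of the fisheriris dataset:

>> pred = predict(ClassTree,meas); % predictions on the training data itself
>> resubByHand = sum(~strcmp(pred,species))/numel(species)

The result should match the value returned by resubLoss().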

The calculated low value suggests that the tree classifies nearly all of the data correctly. To improve the measure of the predictive accuracy of our tree, we perform cross-validation of the tree. By default, cross-validation splits the training data into 10 parts at random. It trains 10 new trees, each one on nine parts of the data. It then examines the predictive accuracy of each new tree on the data not included in training that tree. As opposed to the resubstitution error, this method provides a good estimate of the predictive accuracy of the resulting tree, since it tests new trees on new data:

>> cvrtree = crossval(ClassTree)

cvrtree =

  classreg.learning.partition.ClassificationPartitionedModel
    CrossValidatedModel: 'Tree'
         PredictorNames: {'x1'  'x2'  'x3'  'x4'}
           ResponseName: 'Y'
        NumObservations: 150
                  KFold: 10
              Partition: [1×1 cvpartition]
             ClassNames: {'setosa'  'versicolor'  'virginica'}
         ScoreTransform: 'none'

  Properties, Methods

>> cvloss = kfoldLoss(cvrtree)

cvloss =

0.0733

First, we used the crossval() function, which performs a loss estimate using cross-validation and returns a cross-validated classification model; its properties then become available in MATLAB's workspace. Next, we calculated the classification loss on the observations held out from training by using the kfoldLoss() function. The low value obtained confirms the quality of the model.
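To see where the cross-validated model makes its mistakes, we can go one step further. The kfoldPredict() function returns, for each observation, the prediction made by the fold that did not train on it, and confusionmat() tabulates those predictions against the true labels (a sketch, assuming species holds the true class names from fisheriris):

>> cvPred = kfoldPredict(cvrtree); % out-of-fold prediction for every observation
>> confusionmat(species,cvPred) % rows: true classes, columns: predicted classes

The off-diagonal entries of the resulting matrix count the cross-validated misclassifications for each pair of species.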


