Fundamentals of Clinical Data Science by Pieter Kubben & Michel Dumontier & Andre Dekker

Author: Pieter Kubben & Michel Dumontier & Andre Dekker
Language: eng
Format: epub, pdf
ISBN: 9783319997131
Publisher: Springer International Publishing


8.5 Validation of a Prediction Model

8.5.1 The Importance of Splitting Training/Test Sets

In the previous paragraphs, different metrics for evaluating model performance have been discussed. As briefly noted in the paragraph "The bias-variance tradeoff", it is important to compute performance metrics not on the training dataset but on data that was not seen during the generation of the model, i.e. a test or validation set. This ensures that you are not misled into thinking you have a well-performing model when it may in fact be heavily overfitted on the training data. Overfitting means that the model is trained too closely on the training set and starts to follow the noise in the data. This generally happens when we allow too many parameters in the final model. The performance on the training set is good, but on new data the model will fail. Underfitting corresponds to models that are too simplistic and do not capture the underlying patterns in the data, again resulting in poor performance on unseen data.
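A minimal sketch of this effect, using a hypothetical toy dataset: a 1-nearest-neighbour "model" memorises its training set, so its training accuracy is perfect, yet a single noisy training label is faithfully reproduced on nearby unseen points, lowering the test accuracy. The data, rule, and noise here are invented purely for illustration.

```python
# Overfitting sketch: 1-NN memorises the training data, including noise.

def nn_predict(train_x, train_y, x):
    """Predict the label of the closest training point (1-nearest-neighbour)."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

def accuracy(xs, ys, train_x, train_y):
    """Fraction of points whose 1-NN prediction matches the given label."""
    hits = sum(nn_predict(train_x, train_y, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)

# Hypothetical true rule: y = 1 if x >= 5. The training label at x = 4
# is flipped from 0 to 1 to simulate noise in the data.
train_x = list(range(10))
train_y = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]      # label at x = 4 is noise

test_x = [i + 0.4 for i in range(9)]           # unseen points
test_y = [1 if x >= 5 else 0 for x in test_x]  # noise-free test labels

train_acc = accuracy(train_x, train_y, train_x, train_y)
test_acc = accuracy(test_x, test_y, train_x, train_y)
print(f"train accuracy: {train_acc:.2f}")  # 1.00 -- the noise is memorised
print(f"test accuracy:  {test_acc:.2f}")   # 0.89 -- the memorised noise hurts
```

Evaluated only on the training set, this model looks flawless; the held-out points reveal the overfitting.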

Properly evaluating your model on new/unseen data gives an honest estimate of its generalizability. We differentiate between internal validation, where the dataset is split into a training set for model generation and a test set for model validation, and external validation, where the complete dataset is used for model generation and separate/other datasets are available for model validation.
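The internal-validation split can be sketched as follows, using only the Python standard library. The dataset, the 70/30 ratio, and the seed are hypothetical choices for illustration; any library routine with the same behaviour would do.

```python
# Internal validation sketch: shuffle once, hold out a test fraction,
# and evaluate the model only on the held-out rows.
import random

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Randomly partition rows into a training set and a test set."""
    rows = rows[:]                        # copy, so the caller's list is untouched
    random.Random(seed).shuffle(rows)     # fixed seed -> reproducible split
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]   # (training set, test set)

# Hypothetical dataset of (feature, label) pairs.
data = [(x, 1 if x >= 50 else 0) for x in range(100)]

train, test = train_test_split(data)
print(len(train), len(test))  # 70 30
```

The test rows must play no role in fitting the model; performance metrics computed on them estimate how the model will behave on genuinely new data.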

