Practical DataOps by Harvinder Atwal

Author: Harvinder Atwal
Language: eng
Format: epub
ISBN: 9781484251041
Publisher: Apress


Concept Drift

Machine learning and deep learning models learn rules from historical input and output data and apply those rules to make predictions from new input data. This works only if the relationship between inputs and outputs in new data remains the same as in the historical data. The relationship may hold in some cases, for instance, where it is effectively fixed by nature, as with image recognition models that identify cats. However, other relationships, such as customer purchasing behavior, spam email detection, or product quality as machinery wears, will eventually change in a process known as concept drift.

A passive strategy for dealing with concept drift is to periodically retrain models on a window of recent data. However, this is not an option in some circumstances because of negative feedback loops, in which the model's own predictions bias the data it is later retrained on. Imagine a recommender system where specific customers see recommendations for product X based on previous purchasing relationships. The data for these customers is now biased because the recommendations themselves make them even more likely to buy product X. A model trained on this data will boost the recommendation for product X further, even if in the outside world the original relationship has changed and customers now prefer product Y to product X.
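
The passive strategy can be sketched in a few lines. This is a minimal illustration, not a production pattern: the class name `WindowedMeanModel`, its parameters `window` and `retrain_every`, and the toy "model" (the mean of recent targets) are all assumptions made for the example. Note that it does nothing to protect against the feedback-loop bias described above.

```python
from collections import deque
from statistics import mean


class WindowedMeanModel:
    """Toy regressor (hypothetical): predicts the mean of recent targets."""

    def __init__(self, window, retrain_every):
        # deque with maxlen drops the oldest observation automatically
        self.buffer = deque(maxlen=window)
        self.retrain_every = retrain_every
        self.seen = 0
        self.prediction = None  # no model until the first retrain

    def observe(self, y):
        """Store a new target value and retrain on a fixed schedule."""
        self.buffer.append(y)
        self.seen += 1
        if self.seen % self.retrain_every == 0:
            self.retrain()

    def retrain(self):
        # "Training" here is just refitting the mean on the recent window
        self.prediction = mean(self.buffer)


model = WindowedMeanModel(window=50, retrain_every=10)
stream = [10.0] * 100 + [20.0] * 100  # the relationship shifts at step 100
for y in stream:
    model.observe(y)
print(model.prediction)
```

Because only the last `window` observations are kept, the model eventually forgets the old relationship. The trade-off is that it retrains on schedule whether or not anything has changed.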

One solution to the negative feedback problem is to randomly hold out some instances from model predictions. The holdout creates a baseline for measuring model performance and generates an unbiased dataset for model training. Another strategy is to use a trigger to initiate a model update. Concept drift can be challenging to discover, and there are multiple algorithms for detection. Commonly used algorithms include the drift detection method (DDM), the early drift detection method (EDDM), the geometric moving average detection method (GMADM), and the exponentially weighted moving average (EWMA) chart detection method. These algorithms can be built into continuous diagnostic monitoring of performance and raise an alert when a model requires retraining. Predictions can then be turned off for some or all instances to generate unbiased data for retraining and avoid negative feedback loops. Because retraining happens only when the model is performing poorly, the cost of training and the opportunity cost of lost prediction benefit are lower than with the other two methods, periodic retraining and random holdout.
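
To make the trigger idea concrete, here is a minimal sketch of a DDM-style detector. It follows the commonly described form of the method: track the running error rate p and its standard deviation s = sqrt(p(1 - p) / n), remember the lowest p + s seen so far, and signal a warning or drift when the current p + s exceeds that minimum by two or three standard deviations. The class name `DDM`, the 30-sample warm-up, and the synthetic error stream are assumptions for illustration, not code from the book.

```python
import math


class DDM:
    """Sketch of a drift detection method (DDM) style monitor (hypothetical)."""

    def __init__(self, min_samples=30, warn_level=2.0, drift_level=3.0):
        self.min_samples = min_samples  # warm-up before any signal (assumed)
        self.warn_level = warn_level    # assumed 2-sigma warning threshold
        self.drift_level = drift_level  # assumed 3-sigma drift threshold
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 1.0  # running error rate; overwritten by the first update
        self.s = 0.0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """error is 1 for a wrong prediction, 0 for a correct one."""
        self.n += 1
        self.p += (error - self.p) / self.n  # incremental mean
        self.s = math.sqrt(max(0.0, self.p * (1.0 - self.p)) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the best (lowest) error level observed so far
        if self.p + self.s <= self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, self.s
        level = self.p + self.s
        if level > self.p_min + self.drift_level * self.s_min:
            self.reset()  # start afresh once the model is retrained
            return "drift"
        if level > self.p_min + self.warn_level * self.s_min:
            return "warning"
        return "stable"


# Synthetic stream: 10% errors for 200 steps, then 50% errors
stream = [1 if i % 10 == 9 else 0 for i in range(200)]
stream += [i % 2 for i in range(100)]

detector = DDM()
first_drift = None
for step, err in enumerate(stream):
    if detector.update(err) == "drift":
        first_drift = step
        break
print("first drift signalled at step", first_drift)
```

In a monitoring pipeline, the `"drift"` signal would be the trigger to switch off predictions for some or all instances, collect unbiased data, and retrain.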


