How to identify and treat data leakage

August 22nd, 2022 / By: / Published in: Blog

This post was originally published on this site

An important pitfall to avoid when designing production-grade machine learning systems is data leakage. Data leakage happens when a model is trained using information about the target variable that will not be available when the model is released into production. As a consequence, the performance measured on the training and validation sets will likely be very high, but it will not match the performance observed in production. In some cases, the model may not even be deployable at all, because a feature it relies on simply does not exist at prediction time. Data leakage can arise in many different contexts, each with its own considerations and strategies for identifying it. This article goes over a few of them.
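To make the gap between validation and production performance concrete, here is a minimal, hypothetical sketch on synthetic data (not taken from the original post). It assumes two made-up features: an "honest" one that is only weakly related to the label, and a "leaked" one computed from the label itself, standing in for a column that is only filled in after the outcome is known. The "model" is a deliberately simple one-feature threshold rule, used only for illustration.

```python
import random


def make_rows(n, rng, leak_available=True):
    """Build synthetic (honest_feature, leaked_feature, label) rows.

    Hypothetical setup: honest_feature carries a weak signal, while
    leaked_feature is derived from the label, so it is only "known"
    after the outcome has happened.
    """
    rows = []
    for _ in range(n):
        label = 1 if rng.random() < 0.5 else 0
        honest = label * 0.5 + rng.gauss(0, 1.0)  # weak, legitimate signal
        # In historical training/validation data the leaked column mirrors
        # the label; in production the outcome is not known yet, so it is 0.
        leaked = label if leak_available else 0
        rows.append((honest, leaked, label))
    return rows


def accuracy(rows, col, threshold):
    """Fraction of rows where (feature >= threshold) matches the label."""
    hits = sum((1 if r[col] >= threshold else 0) == r[2] for r in rows)
    return hits / len(rows)


def fit_threshold(rows, col):
    """Choose the threshold on one column that maximizes training accuracy."""
    best_t, best_acc = None, -1.0
    for t in sorted({r[col] for r in rows}):
        acc = accuracy(rows, col, t)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


rng = random.Random(0)
train = make_rows(1000, rng)
valid = make_rows(1000, rng)
prod = make_rows(1000, rng, leak_available=False)  # leaked column unavailable

# results[name] = (validation accuracy, production accuracy)
results = {}
for name, col in (("honest", 0), ("leaky", 1)):
    t = fit_threshold(train, col)
    results[name] = (accuracy(valid, col, t), accuracy(prod, col, t))

for name, (val_acc, prod_acc) in results.items():
    print(f"{name:>6}: validation={val_acc:.2f}  production={prod_acc:.2f}")
```

By construction, the leaky model scores a perfect 1.0 on the validation set but falls to roughly the base rate (about 0.5) once the leaked column is no longer populated, while the honest model's modest score stays about the same in both settings. This is exactly the kind of mismatch between reported and observed performance described above.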
