An important pitfall to avoid when designing production-grade machine learning systems is data leakage. Data leakage happens when a model is trained using information about the target variable that will not be available when the model runs in production. As a consequence, the model's reported performance on the training and validation sets will likely be very high, but it will not match the performance observed in production. In some cases, the model may not even be deployable at all. Data leakage can arise in many different contexts, each with its own set of considerations and strategies for identifying it. This article goes over a few of them.
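One common form of leakage is fitting a preprocessing step on the full dataset before splitting it into training and test sets, so statistics from the test data influence the features the model is trained on. The sketch below (a minimal illustration, assuming scikit-learn and synthetic data; all variable names are hypothetical) contrasts the leaky pattern with the correct one, where the scaler is fit on the training split only:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# LEAKY: the scaler is fit on ALL rows, so test-set statistics
# leak into the features seen during training.
X_scaled = StandardScaler().fit_transform(X)
X_tr, X_te, y_tr, y_te = train_test_split(X_scaled, y, random_state=0)
leaky_score = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)

# CORRECT: split first, then fit the scaler on the training split only
# and apply the fitted transform to both splits.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)
clean_score = (
    LogisticRegression()
    .fit(scaler.transform(X_tr), y_tr)
    .score(scaler.transform(X_te), y_te)
)

print(f"leaky: {leaky_score:.3f}  clean: {clean_score:.3f}")
```

For simple standardization on well-behaved data the score gap may be small, but the same fit-before-split mistake with target-dependent transforms (target encoding, feature selection by correlation with the label) can inflate offline metrics dramatically. Using a `sklearn.pipeline.Pipeline` makes the correct ordering automatic inside cross-validation.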