One of the biggest challenges in #IRL #datascience and #machinelearning, compared with what you learn in school, is labeling errors. When you build your own dataset from source data, you may not be measuring the outcomes correctly, or you may not be connecting them to the prediction data correctly! If you have labeling errors, you either can't predict anything or, worse, you get good model performance metrics but terrible real-world performance. The cause may be #dataleakage, but it may also be other forms of incoherence. 🤯
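
One quick sanity check for the "connecting outcomes to prediction data" problem is to confirm that every feature was recorded before the outcome it is supposed to predict. Here is a minimal sketch, assuming each row carries a date for when the feature was observed and a date for when the outcome happened. The field names (`id`, `feature_date`, `outcome_date`) and the helper `find_leaky_rows` are hypothetical, made up for illustration, and this only catches one kind of leakage.

```python
from datetime import date


def find_leaky_rows(rows):
    """Return ids of rows whose feature was recorded on or after the outcome date."""
    return [r["id"] for r in rows if r["feature_date"] >= r["outcome_date"]]


# Hypothetical toy data: row 2's feature was recorded after its outcome.
rows = [
    {"id": 1, "feature_date": date(2023, 1, 5), "outcome_date": date(2023, 2, 1)},
    {"id": 2, "feature_date": date(2023, 3, 9), "outcome_date": date(2023, 3, 1)},
]

leaky = find_leaky_rows(rows)
print(leaky)
```

A row flagged here means the model could be "predicting" an outcome from information that only existed after the outcome, which is one way to end up with great metrics and terrible real-world performance.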