Title: The 5 Most Useful Techniques to Handle Imbalanced Datasets
Author: Rahul Agarwal, Senior Statistical Analyst at Walmart Labs
Source: https://www.kdnuggets.com/2020/01/5-most-useful-techniques-handle-imbalanced-datasets.html
How: Resampling; imbalanced-learn (imblearn), including Tomek Links and SMOTE (Synthetic Minority Oversampling Technique); sklearn.
When to use this: When a dataset is imbalanced, that is, when "you have such a small sample for the positive class in your dataset that the model is unable to learn".
Why it's helpful: It addresses the problem of an imbalanced dataset with several techniques: random undersampling and oversampling, undersampling and oversampling using imbalanced-learn, class weights in the models, and changing the evaluation metric.
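Random undersampling and oversampling, the first technique in the list above, can be sketched with the standard library alone. The helper names (`random_undersample`, `random_oversample`, `_group_by_class`) and the toy data are hypothetical and only illustrate the idea; they are not the article's code.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(group) for group in by_class.values())
    out = [(row, label)
           for label, group in by_class.items()
           for row in rng.sample(group, target)]
    rng.shuffle(out)
    return [r for r, _ in out], [l for _, l in out]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows until every class is as large as the biggest class."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(group) for group in by_class.values())
    out = []
    for label, group in by_class.items():
        # Sample the missing rows with replacement from the same class.
        extra = rng.choices(group, k=target - len(group))
        out.extend((row, label) for row in group + extra)
    rng.shuffle(out)
    return [r for r, _ in out], [l for _, l in out]


# Toy data (an assumption): 95 negatives and 5 positives.
rows = list(range(100))
labels = [0] * 95 + [1] * 5

_, y_under = random_undersample(rows, labels)
_, y_over = random_oversample(rows, labels)
print(Counter(y_under), Counter(y_over))
```

Undersampling discards information from the majority class, while oversampling repeats minority rows and can encourage overfitting, so the choice depends on how much data is available.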
Suggested application: Finance, marketing/ad serving, transportation/airlines, medical, content moderation, etc.
Business impact or insights to be gained: Models trained on imbalanced datasets "fail to capture the minority class, which is most often the point of creating the model in the first place." Thus, an analysis might miss fraudulent bank transactions, a patient's rare disease, faulty structural integrity in an aircraft, etc.
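To illustrate why a model can look good while missing the minority class, the sketch below compares accuracy with precision, recall, and F1 on made-up predictions. The helper `scores`, the class counts, and the choice of F1 as the alternative metric are assumptions for illustration.

```python
def scores(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    # Guard against division by zero when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels (an assumption): 990 negatives followed by 10 positives.
y_true = [0] * 990 + [1] * 10

# Model A never predicts the positive class.
always_negative = [0] * 1000

# Model B flags 20 negatives by mistake but catches 8 of the 10 positives.
catches_some = [1] * 20 + [0] * 970 + [1] * 8 + [0] * 2

print("always negative:", scores(y_true, always_negative))
print("catches some:   ", scores(y_true, catches_some))
```

Model A has the higher accuracy yet finds no positives at all, whereas Model B trades a little accuracy for most of the minority class, which is the behaviour a metric such as F1 rewards.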