Reducing Overfitting

Practical Levers

Once you have detected overfitting (e.g. a large gap between train and validation metrics; see Overfitting analysis), you can pull the following levers.


1. Simplify the Model

  • LightGBM: Increase regularization (reg_alpha, reg_lambda), increase min_data_in_leaf, or reduce num_leaves / max_depth.
  • Use hyperparameter optimization (Optuna) to search these parameters. The objective is the validation metric (or a cross-validation score with folds split by case), so the chosen model tends to generalize better.
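The search over these parameters could be sketched as below. This is a minimal illustration, not the platform's actual search space: the ranges are assumptions, the function name suggest_lgbm_params is hypothetical, and MidpointTrial is a stand-in so the block runs without Optuna installed. The trial argument only needs Optuna's suggest_int / suggest_float methods.

```python
from typing import Any, Dict


def suggest_lgbm_params(trial: Any) -> Dict[str, Any]:
    """Sample a regularization-focused LightGBM config from an Optuna-style trial.

    All ranges below are illustrative assumptions, not tuned defaults.
    """
    max_depth = trial.suggest_int("max_depth", 3, 8)
    num_leaves = trial.suggest_int("num_leaves", 4, 64)
    return {
        # L1 / L2 penalties, searched on a log scale.
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-3, 10.0, log=True),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-3, 10.0, log=True),
        # Larger values force each leaf to cover more samples.
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 20, 200),
        # A tree of depth d has at most 2**d leaves, so clamp to stay consistent.
        "num_leaves": min(num_leaves, 2 ** max_depth),
        "max_depth": max_depth,
    }


class MidpointTrial:
    """Hypothetical stand-in for an Optuna trial: returns the middle of each range."""

    def suggest_float(self, name, low, high, log=False):
        return (low * high) ** 0.5 if log else (low + high) / 2.0

    def suggest_int(self, name, low, high):
        return (low + high) // 2


params = suggest_lgbm_params(MidpointTrial())
```

In a real run, an Optuna objective would call suggest_lgbm_params(trial), train on the training cases, and return the validation metric; the study would then be driven with optuna.create_study(...) and study.optimize(...).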

2. Reduce Features

  • Run feature selection (see the feature selection pipeline) and keep only the top-K features by importance.
  • Retrain with the reduced schema; fewer features reduce the model’s capacity to memorize.
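The top-K step can be illustrated with a small helper. This is a sketch under assumptions: the function name top_k_features and the feature names are hypothetical, and the importance scores are made-up numbers standing in for whatever the feature selection pipeline produces (for LightGBM, for instance, values from Booster.feature_importance).

```python
from typing import Dict, List


def top_k_features(importances: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the highest importance.

    Ties are broken alphabetically so the result is deterministic.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical importance scores, for illustration only.
importances = {
    "pressure_mean": 120.0,
    "flow_std": 85.0,
    "valve_state": 40.0,
    "temp_mean": 3.0,
}
selected = top_k_features(importances, k=2)
print(selected)  # ['pressure_mean', 'flow_std']
```

The selected names then define the reduced schema used for retraining.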

3. More, and More Diverse, Data

  • Add more cases (more runs, more leak sizes/locations) so the model sees a broader distribution.
  • Split train/validation by case so that “more data” means more cases, not more windows from the same few cases.
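A by-case split can be sketched in a few lines. The helper name split_by_case and the sample case ids are assumptions for illustration; the point is that whole cases, not individual windows, are assigned to one side of the split.

```python
import random
from typing import List, Sequence, Tuple


def split_by_case(
    case_ids: Sequence[str], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, val_indices) such that no case appears in both."""
    unique_cases = sorted(set(case_ids))
    if len(unique_cases) < 2:
        raise ValueError("need at least two distinct cases to split by case")
    rng = random.Random(seed)
    rng.shuffle(unique_cases)
    # Hold out at least one case, but always leave at least one for training.
    n_val = max(1, round(len(unique_cases) * val_fraction))
    n_val = min(n_val, len(unique_cases) - 1)
    val_cases = set(unique_cases[:n_val])
    train_idx = [i for i, c in enumerate(case_ids) if c not in val_cases]
    val_idx = [i for i, c in enumerate(case_ids) if c in val_cases]
    return train_idx, val_idx


# One entry per window; each window is labeled with the case it came from.
window_case_ids = ["a", "a", "a", "b", "b", "b", "c", "c", "d", "d", "e", "e"]
train_idx, val_idx = split_by_case(window_case_ids, val_fraction=0.2, seed=0)
```

If scikit-learn is available, GroupShuffleSplit or GroupKFold with groups set to the case ids serves the same purpose.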

4. Early Stopping and Validation

  • Use early stopping on the validation metric so training stops when validation stops improving.
  • Report and monitor both train and validation metrics in the training scripts and in model validation.
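The stopping rule itself can be shown independently of any training library. The sketch below assumes a patience-based rule (stop after a fixed number of rounds without improvement); the function name best_iteration and the score values are illustrative, not taken from the training scripts.

```python
from typing import Sequence


def best_iteration(
    val_scores: Sequence[float], patience: int, higher_is_better: bool = True
) -> int:
    """Return the index of the best validation score seen before patience ran out."""
    if not val_scores:
        raise ValueError("val_scores must not be empty")
    best_idx = 0
    for i in range(1, len(val_scores)):
        if higher_is_better:
            improved = val_scores[i] > val_scores[best_idx]
        else:
            improved = val_scores[i] < val_scores[best_idx]
        if improved:
            best_idx = i
        elif i - best_idx >= patience:
            # No improvement for `patience` consecutive rounds: stop here.
            break
    return best_idx


# Made-up validation scores per boosting round, for illustration only.
val_auc = [0.70, 0.75, 0.78, 0.77, 0.78, 0.76, 0.80]
stop_at = best_iteration(val_auc, patience=3)
```

With patience=3 the loop stops before reaching the later 0.80, which shows the trade-off: a small patience saves training time but can stop too early on a noisy metric. In LightGBM, the equivalent behavior is typically obtained by passing a validation set and the lightgbm.early_stopping callback to training.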

5. Robustness (Optional)

  • If the training config supports robustness options (e.g. adding small noise to inputs or quantizing them), enable them so the model is trained to be less sensitive to tiny input changes.
  • This can improve generalization without changing the architecture.
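The two input perturbations mentioned above might look like the following. This is a sketch only: the function names, the noise level, and the quantization step are assumptions, and the actual training config may implement these options differently.

```python
import random
from typing import List, Sequence


def add_gaussian_noise(
    rows: Sequence[Sequence[float]], sigma: float, seed: int = 0
) -> List[List[float]]:
    """Return a copy of rows with zero-mean Gaussian noise added to every value."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in rows]


def quantize(rows: Sequence[Sequence[float]], step: float) -> List[List[float]]:
    """Round every value to the nearest multiple of step."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [[round(x / step) * step for x in row] for row in rows]


# Apply perturbations to training features only; leave validation data untouched
# so the validation metric still reflects the real input distribution.
train_rows = [[0.12, 0.26], [1.04, 2.51]]
noisy_rows = add_gaussian_noise(train_rows, sigma=0.01, seed=1)
coarse_rows = quantize(train_rows, step=0.1)
```

Both sigma and step would normally be chosen relative to the sensor noise or resolution of each feature, and checked against the validation metric like any other hyperparameter.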

Summary

| Lever          | Action                                                                 |
|----------------|------------------------------------------------------------------------|
| Regularization | Tune LightGBM reg_*, min_data_in_leaf, num_leaves via Optuna.          |
| Features       | Use feature selection; train with fewer, stronger features.            |
| Data           | More cases; split by case_id.                                          |
| Stopping       | Early stopping on validation metric.                                   |
| Robustness     | Enable if available in training config.                                |

Consistent use of by-case split and validation metrics in the platform makes it easier to measure and then reduce overfitting in a principled way.