Reducing Overfitting

Practical Levers

Once you have detected overfitting (e.g. a large gap between train and validation metrics; see Overfitting analysis), you can pull the following levers.

1. Simplify the Model

LightGBM: Increase regularization (reg_alpha, reg_lambda), increase min_data_in_leaf, or reduce num_leaves / max_depth.
Use hyperparameter optimization (Optuna) to search these parameters; the objective is the validation metric (or a cross-validated metric with folds split by case), so the chosen model tends to generalize better.
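
As a rough sketch of what such a search space could look like, the function below samples the capacity-limiting parameters named above from a trial object. It only assumes that `trial` offers `suggest_int` and `suggest_float` the way an Optuna trial does; the function name and every range are illustrative assumptions, not values taken from the platform.

```python
def suggest_regularized_params(trial):
    """Sample LightGBM parameters that limit model capacity.

    `trial` is expected to behave like an optuna.trial.Trial
    (suggest_int / suggest_float). All ranges are illustrative
    assumptions, not tuned recommendations.
    """
    max_depth = trial.suggest_int("max_depth", 3, 10)
    num_leaves = trial.suggest_int("num_leaves", 8, 256)
    params = {
        "max_depth": max_depth,
        # A tree of depth d has at most 2**d leaves, so clamp num_leaves.
        "num_leaves": min(num_leaves, 2 ** max_depth),
        # Larger values force each leaf to cover more samples.
        "min_data_in_leaf": trial.suggest_int("min_data_in_leaf", 20, 500),
        # L1 / L2 penalties on leaf weights, searched on a log scale.
        "reg_alpha": trial.suggest_float("reg_alpha", 1e-3, 10.0, log=True),
        "reg_lambda": trial.suggest_float("reg_lambda", 1e-3, 10.0, log=True),
    }
    return params
```

Inside an Optuna objective, the returned dict would typically be merged into the LightGBM parameters, and the objective would return the validation metric computed on held-out cases. `optuna.create_study(direction=...)` followed by `study.optimize(objective, n_trials=...)` then runs the search.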

2. Reduce Features

Run feature selection (feature selection pipeline) and keep only the top-K features by importance.
Retrain with the reduced schema; fewer features reduce the model’s capacity to memorize.
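
The top-K step itself is small; the sketch below shows one way it could look, assuming importances are already available as a name-to-score mapping. The helper name `select_top_k` and the feature names are made up for illustration and are not the feature selection pipeline's actual interface.

```python
def select_top_k(importances, k):
    """Return the k feature names with the highest importance score.

    importances: dict mapping feature name -> importance (higher is better).
    Ties are broken alphabetically so the result is deterministic.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(importances.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:k]]


# Hypothetical importance scores (feature names are illustrative only).
importances = {
    "pressure_mean": 120.0,
    "flow_std": 85.5,
    "temp_max": 3.2,
    "valve_state": 0.0,
    "pressure_slope": 97.1,
}
selected = select_top_k(importances, k=3)
print(selected)
```

With a trained LightGBM booster, the mapping could be built from `booster.feature_name()` and `booster.feature_importance(importance_type="gain")`. Importances should come from a model fitted on training cases only, so that the selection does not leak validation information.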

3. More and More Diverse Data

Add more cases (more runs, more leak sizes and locations) so the model sees a broader distribution.
Split train/validation by case, so that "more data" means more cases rather than more windows from the same few cases, and no case contributes windows to both sides of the split.
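
A by-case split can be sketched in a few lines: shuffle the unique case ids, hold out a fraction of them, and assign every window to the side its case landed on. The function name, the default fraction, and the toy case ids below are assumptions for illustration.

```python
import random


def split_by_case(case_ids, val_fraction=0.2, seed=0):
    """Split row indices so that no case appears in both train and validation.

    case_ids: one case id per row (window), in row order.
    Returns (train_indices, val_indices).
    """
    unique_cases = sorted(set(case_ids))
    if len(unique_cases) < 2:
        raise ValueError("need at least two cases to split by case")
    rng = random.Random(seed)
    rng.shuffle(unique_cases)
    # Hold out at least one case, but always leave at least one for training.
    n_val = max(1, round(len(unique_cases) * val_fraction))
    n_val = min(n_val, len(unique_cases) - 1)
    val_cases = set(unique_cases[:n_val])
    train_idx = [i for i, c in enumerate(case_ids) if c not in val_cases]
    val_idx = [i for i, c in enumerate(case_ids) if c in val_cases]
    return train_idx, val_idx


# Toy data: five cases with three windows each.
case_ids = [c for c in ["c1", "c2", "c3", "c4", "c5"] for _ in range(3)]
train_idx, val_idx = split_by_case(case_ids, val_fraction=0.2, seed=42)
train_cases = {case_ids[i] for i in train_idx}
val_cases = {case_ids[i] for i in val_idx}
print(sorted(train_cases), sorted(val_cases))
```

If scikit-learn is available, `GroupShuffleSplit` or `GroupKFold` with `groups=case_ids` gives the same guarantee without hand-written code.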

4. Early Stopping and Validation

Use early stopping on the validation metric so training stops when validation stops improving.
Report and monitor both train and validation metrics in the training scripts and in model validation.
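
To make the stopping rule concrete, here is a minimal sketch of patience-based early stopping applied to a recorded validation curve. It assumes a loss-like metric where lower is better; the function name and the example curve are hypothetical.

```python
def early_stopping_round(val_losses, patience):
    """Apply a patience rule to a sequence of per-round validation losses.

    Returns (best_round, stopped_at): the 0-based round with the lowest
    validation loss seen, and the round at which training would stop.
    Training stops once `patience` rounds pass without a new best value.
    """
    best_loss = float("inf")
    best_round = -1
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss = loss
            best_round = i
        elif i - best_round >= patience:
            return best_round, i
    # The curve ended before patience ran out.
    return best_round, len(val_losses) - 1


# Hypothetical validation curve: improves, then plateaus and drifts upward.
val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
best_round, stopped_at = early_stopping_round(val_losses, patience=3)
print(best_round, stopped_at)  # 2 5
```

The late improvement at round 6 is never seen, which is the usual trade-off when choosing the patience value. In LightGBM the same rule is available as the `lightgbm.early_stopping(stopping_rounds=...)` callback, and `lightgbm.record_evaluation(...)` can capture both train and validation curves for reporting.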

5. Robustness (Optional)

If the training config supports robustness (e.g. adding small noise or quantizing inputs), enable it so the model is trained to be less sensitive to tiny input changes.
This can improve generalization without changing the architecture.
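
The two perturbations mentioned above can be sketched as follows. This is only an illustration of the idea, not the training config's actual implementation; the function names, the noise level, and the quantization step are assumptions.

```python
import random


def add_gaussian_noise(rows, sigma, seed=0):
    """Return a copy of `rows` with zero-mean Gaussian noise added to each value."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in rows]


def quantize(rows, step):
    """Snap every value to the nearest multiple of `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [[round(x / step) * step for x in row] for row in rows]


# Toy feature rows (values are illustrative only).
rows = [[0.123, 0.987], [1.501, 2.249]]
noisy = add_gaussian_noise(rows, sigma=0.01, seed=1)
coarse = quantize(rows, step=0.05)
print(noisy)
print(coarse)
```

Perturbations like these would normally be applied to training inputs only, with validation data left untouched so that the reported metrics stay comparable across runs.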

Summary

Lever	Action
Regularization	Tune LightGBM reg_alpha, reg_lambda, min_data_in_leaf, num_leaves, max_depth via Optuna.
Features	Use feature selection; train with fewer, stronger features.
Data	More cases; split by case_id.
Stopping	Early stopping on validation metric.
Robustness	Enable if available in training config.

Consistent use of by-case splits and validation metrics across the platform makes it easier to measure overfitting and then reduce it in a principled way.