Training Pipeline (Engineering Deep Dive)
Role
Training pipelines train LightGBM models for detection (binary classification), size/location (multiclass classification), or leak flow (regression). All of them share the same data-loading and by-case split logic, and each writes a model, its metrics, and an optional deployment bundle.
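As a rough illustration of how the three task types differ at the LightGBM level, the sketch below maps each task to an objective and metric. The task names, the `TASK_PARAMS` table, and the `lightgbm_params` helper are hypothetical and not taken from the pipeline's code; only the `objective`, `metric`, and `num_class` values are standard LightGBM parameters.

```python
from typing import Optional

# Hypothetical task names mapped to standard LightGBM objective/metric values.
TASK_PARAMS = {
    "detection": {"objective": "binary", "metric": "binary_logloss"},
    "size": {"objective": "multiclass", "metric": "multi_logloss"},
    "location": {"objective": "multiclass", "metric": "multi_logloss"},
    "leak_flow": {"objective": "regression", "metric": "rmse"},
}


def lightgbm_params(task: str, num_class: Optional[int] = None) -> dict:
    """Return a LightGBM parameter dict for the given (hypothetical) task name."""
    if task not in TASK_PARAMS:
        raise ValueError(f"unknown task: {task!r}")
    params = dict(TASK_PARAMS[task])  # copy so the shared table is never mutated
    if params["objective"] == "multiclass":
        # LightGBM requires num_class for multiclass objectives.
        if num_class is None or num_class < 2:
            raise ValueError("multiclass tasks need num_class >= 2")
        params["num_class"] = num_class
    return params
```

The returned dict could be passed as the `params` argument of a LightGBM training call; the real pipelines may structure their configuration differently.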
Engineering Notes
- By-case split: Train/validation split by case_id (see Train/validation split by case) to avoid leakage and get realistic metrics.
- Idempotency: Config hash in training_metadata.json; same config + same input ⇒ skip.
- Detection: Threshold and strategy (e.g. max F1, min recall) are configurable; see Early detection and Model validation.
- Overfitting: Use feature selection and Optuna (see Overfitting analysis, Reducing overfitting).
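The by-case split note can be sketched as follows: every row belonging to one case_id lands entirely in train or entirely in validation, so rows from the same case never leak across the split. This is a minimal standard-library sketch, not the pipeline's actual implementation; the function name `split_by_case`, the default fraction, and the seed are assumptions.

```python
import random
from typing import Hashable, List, Sequence, Tuple


def split_by_case(
    case_ids: Sequence[Hashable],
    val_fraction: float = 0.2,  # assumed default, not from the pipeline
    seed: int = 42,
) -> Tuple[List[int], List[int]]:
    """Return (train_idx, val_idx) such that no case_id appears on both sides."""
    unique_cases = sorted(set(case_ids), key=str)  # stable order before shuffling
    if len(unique_cases) < 2:
        raise ValueError("need at least two distinct cases to split")
    rng = random.Random(seed)
    rng.shuffle(unique_cases)
    # Keep at least one case on each side of the split.
    n_val = min(max(1, round(len(unique_cases) * val_fraction)), len(unique_cases) - 1)
    val_cases = set(unique_cases[:n_val])
    train_idx = [i for i, c in enumerate(case_ids) if c not in val_cases]
    val_idx = [i for i, c in enumerate(case_ids) if c in val_cases]
    return train_idx, val_idx
```

Splitting on unique case IDs rather than on rows is what keeps the validation metrics realistic. scikit-learn's `GroupShuffleSplit` gives the same grouping guarantee if that library is already a dependency.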
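The idempotency note (same config + same input ⇒ skip) might look roughly like the sketch below. Only the file name training_metadata.json comes from the notes above; the JSON field names `config_hash` and `input_fingerprint`, the choice of SHA-256, and all function names are assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path

METADATA_FILE = "training_metadata.json"


def config_hash(config: dict) -> str:
    """Hash a config dict independently of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_skip(output_dir: Path, config: dict, input_fingerprint: str) -> bool:
    """True if a previous run recorded the same config hash and input fingerprint."""
    meta_path = output_dir / METADATA_FILE
    if not meta_path.exists():
        return False
    try:
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return False  # unreadable metadata: retrain rather than trust it
    if not isinstance(meta, dict):
        return False
    return (
        meta.get("config_hash") == config_hash(config)  # hypothetical field name
        and meta.get("input_fingerprint") == input_fingerprint  # hypothetical field name
    )


def write_metadata(output_dir: Path, config: dict, input_fingerprint: str) -> None:
    """Record the hashes after a successful training run."""
    output_dir.mkdir(parents=True, exist_ok=True)
    meta = {"config_hash": config_hash(config), "input_fingerprint": input_fingerprint}
    (output_dir / METADATA_FILE).write_text(json.dumps(meta, indent=2), encoding="utf-8")
```

Serializing with sorted keys means two configs that differ only in key order produce the same hash, so a reordered but otherwise identical config still counts as "same config".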
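For the detection note, the two example strategies can be sketched in plain Python. This assumes "max F1" means picking the threshold with the highest F1 on validation scores, and "min recall" means picking the highest threshold that still reaches a recall target. The strategy labels `max_f1` and `min_recall`, the function names, and the default target are hypothetical.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], scores: Sequence[float], threshold: float
) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        predicted = s >= threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif not predicted and y:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def pick_threshold(
    y_true: Sequence[int],
    scores: Sequence[float],
    strategy: str = "max_f1",  # hypothetical strategy label
    min_recall: float = 0.9,   # assumed default target
) -> float:
    """Choose a decision threshold from the observed scores."""
    candidates = sorted(set(scores))
    if not candidates:
        raise ValueError("no scores given")
    if strategy == "max_f1":
        return max(candidates, key=lambda t: precision_recall_f1(y_true, scores, t)[2])
    if strategy == "min_recall":
        ok = [t for t in candidates if precision_recall_f1(y_true, scores, t)[1] >= min_recall]
        if not ok:
            raise ValueError("no threshold reaches the recall target")
        return max(ok)  # highest threshold that still meets the target
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The threshold should be chosen on the validation side of the by-case split, since tuning it on training rows would overstate detection performance.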
Full Reference
For all training scripts (PFM and OBSERVER), configuration keys, and outputs, see Training (PFM & OBSERVER) in the main Pipelines section.