Training Pipeline (Engineering Deep Dive)

Role

Each training pipeline fits a LightGBM model for one of three tasks: detection (binary classification), size/location (multiclass classification), or leak flow (regression). All pipelines share the same data-loading and by-case split logic, and each writes a model, its metrics, and optionally a deployment bundle.
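As a rough illustration of how the three tasks map onto LightGBM, the sketch below returns core parameters per task. The objective and metric strings are standard LightGBM values; the function name `lgbm_params` and the task labels (`detection`, `size`, `location`, `leak_flow`) are assumptions made for this example and may not match the configuration keys the pipelines actually use.

```python
def lgbm_params(task, num_class=None):
    """Map a pipeline task to LightGBM core parameters.

    Task labels here are hypothetical; only the objective/metric
    strings are LightGBM's own.
    """
    if task == "detection":
        # Binary classification: leak vs. no leak.
        return {"objective": "binary", "metric": "binary_logloss"}
    if task in ("size", "location"):
        # Multiclass classification needs the number of classes up front.
        if num_class is None or num_class < 2:
            raise ValueError("multiclass tasks require num_class >= 2")
        return {
            "objective": "multiclass",
            "num_class": num_class,
            "metric": "multi_logloss",
        }
    if task == "leak_flow":
        # Regression on the leak flow value.
        return {"objective": "regression", "metric": "l2"}
    raise ValueError(f"unknown task: {task!r}")
```

A dictionary like this would typically be merged with tuned hyperparameters before being passed to the LightGBM training call.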

Engineering Notes

By-case split: The train/validation split is made by case_id (see Train/validation split by case), so rows from one case never land on both sides. This avoids leakage and gives realistic metrics.
Idempotency: A config hash is stored in training_metadata.json; a run with the same config and the same input is skipped.
Detection: The decision threshold and the strategy used to pick it (e.g. max F1, min recall) are configurable; see Early detection and Model validation.
Overfitting: Use feature selection and Optuna hyperparameter tuning to reduce overfitting (see Overfitting analysis, Reducing overfitting).
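The notes above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipelines' actual code: the function names, the `val_fraction` and `seed` parameters, the `config_hash` key inside training_metadata.json, and the idea of passing an input fingerprint string are all hypothetical. Only the max-F1 strategy is shown for threshold selection.

```python
import hashlib
import json
import random
from pathlib import Path


def split_by_case(case_ids, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) such that no case_id appears in both."""
    unique_cases = sorted(set(case_ids))
    random.Random(seed).shuffle(unique_cases)
    n_val = max(1, round(len(unique_cases) * val_fraction))
    val_cases = set(unique_cases[:n_val])
    train_idx = [i for i, c in enumerate(case_ids) if c not in val_cases]
    val_idx = [i for i, c in enumerate(case_ids) if c in val_cases]
    return train_idx, val_idx


def config_hash(config, input_fingerprint):
    """SHA-256 over the config and an input fingerprint, stable under key order."""
    payload = json.dumps(
        {"config": config, "input": input_fingerprint}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def should_skip(output_dir, current_hash):
    """True when training_metadata.json already records the same hash."""
    meta_path = Path(output_dir) / "training_metadata.json"
    if not meta_path.exists():
        return False
    stored = json.loads(meta_path.read_text())
    # The "config_hash" key name is an assumption for this sketch.
    return stored.get("config_hash") == current_hash


def best_f1_threshold(y_true, scores):
    """Return (threshold, f1) maximising F1; predict positive when score >= threshold."""
    best_t, best_f1 = 0.5, -1.0
    for t in sorted(set(scores)):
        pairs = list(zip(y_true, scores))
        tp = sum(1 for y, s in pairs if s >= t and y == 1)
        fp = sum(1 for y, s in pairs if s >= t and y == 0)
        fn = sum(1 for y, s in pairs if s < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```

Splitting on the set of unique case identifiers, rather than on rows, is what keeps correlated samples from one case out of the validation set. The threshold should be chosen on validation scores only, so that the reported metrics are not tuned on training data.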

Full Reference

For all training scripts (PFM and OBSERVER), configuration keys, and outputs, see Training (PFM & OBSERVER) in the main Pipelines section.