World-Class Leak Detection Analysis

Purpose

This topic describes the end-to-end analysis we aim for: from raw data and labels, through trained models and threshold selection, to evaluation reports. The reports include clear metrics, confusion matrices, and diagnostic plots so that both engineers and stakeholders can judge detection quality.

What “World-Class” Means Here

Reproducibility: The same config and data produce the same results; pipelines are idempotent and train/test splits are made by case.
Transparency: Features and validation are documented (see Leak detection features, Model validation); metrics and thresholds are explicit.
Operational relevance: We distinguish leaks from operational changes (see Operational context) and validate on unseen cases, not just unseen windows.
Actionable outputs: Excel and JSON reports, confusion matrices, and optional plots (e.g. size/location diagnostics, leak series) so you can tune thresholds and interpret performance.
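
As an illustration of what a reproducible by-case split can look like, here is a minimal sketch. It is not the platform's implementation: the function name `split_by_case`, its parameters, and the case identifiers are hypothetical, and it assumes each window carries the identifier of the case it was cut from. The point it shows is that whole cases, not individual windows, are assigned to train or test, and that a fixed seed gives the same split every run.

```python
import random


def split_by_case(case_ids, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so that no case appears on both sides.

    case_ids: one case identifier per window (hypothetical input layout).
    """
    cases = sorted(set(case_ids))   # sort first so the starting order is stable
    rng = random.Random(seed)       # seeded RNG: same seed and data, same split
    rng.shuffle(cases)
    n_test = max(1, round(len(cases) * test_fraction))
    test_cases = set(cases[:n_test])
    train_idx = [i for i, c in enumerate(case_ids) if c not in test_cases]
    test_idx = [i for i, c in enumerate(case_ids) if c in test_cases]
    return train_idx, test_idx


# Hypothetical windows: each entry names the case a window belongs to.
window_cases = [
    "case_a", "case_a", "case_b", "case_b",
    "case_c", "case_c", "case_d", "case_d",
]
train_idx, test_idx = split_by_case(window_cases, test_fraction=0.25, seed=42)
train_cases = {window_cases[i] for i in train_idx}
test_cases = {window_cases[i] for i in test_idx}
```

Because assignment happens at the case level, every window in the test set comes from a case the model never saw during training, which is the "unseen cases, not just unseen windows" property described above.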

Where This Shows Up in the Platform

Training pipelines: Produce metrics, detection metrics (e.g. by threshold), and optional deployment bundles.
Test offline pipelines: Run models on held-out data and generate the reports and plots above.
Feature selection and Optuna: Help choose features and hyperparameters that generalize (see Overfitting analysis).
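
To make "detection metrics by threshold" concrete, the following sketch sweeps a few thresholds over model scores and records a confusion matrix plus precision and recall for each, then serializes the rows as JSON. Everything here is an assumption made for illustration: the function names, the example scores and labels, and the report layout are hypothetical and do not reflect the platform's actual report schema.

```python
import json


def confusion_at_threshold(scores, labels, threshold):
    """Count TP/FP/TN/FN when a score >= threshold is flagged as a leak."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for score, label in zip(scores, labels):
        predicted_leak = score >= threshold
        if predicted_leak and label:
            counts["tp"] += 1
        elif predicted_leak and not label:
            counts["fp"] += 1
        elif not predicted_leak and label:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def detection_metrics_by_threshold(scores, labels, thresholds):
    """Return one row per threshold with confusion counts, precision, recall."""
    rows = []
    for threshold in thresholds:
        cm = confusion_at_threshold(scores, labels, threshold)
        flagged = cm["tp"] + cm["fp"]
        actual = cm["tp"] + cm["fn"]
        rows.append({
            "threshold": threshold,
            **cm,
            # Fall back to 0.0 when the denominator is empty.
            "precision": cm["tp"] / flagged if flagged else 0.0,
            "recall": cm["tp"] / actual if actual else 0.0,
        })
    return rows


# Hypothetical held-out scores and ground-truth labels (1 = leak).
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
labels = [0, 0, 1, 1, 1, 0]
rows = detection_metrics_by_threshold(scores, labels, thresholds=[0.3, 0.5])
report_json = json.dumps(rows, indent=2)
```

In this toy data, raising the threshold from 0.3 to 0.5 removes the one false positive but misses one leak, which is the kind of trade-off a by-threshold table is meant to expose when tuning.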

Using the Engineering docs (context, features, validation, overfitting) together with the Pipelines and Prefect & Production sections gives a complete picture of how we achieve and maintain high-quality leak detection.