Features for Leak Detection: Detailed Rationale

Executive Summary

This document explains how the features are designed to separate leaks from operational changes in pipeline systems, and how sensor topology (the position of the sensors relative to the leak) affects the physics they exploit.

The Challenge: Leak vs Operational Change

Similarities (why it is hard):

Both cause changes in pressure and flow.
Both are transients.
Both can show the same “direction” of change (e.g. pressure down, flow up or down).

Differences we exploit:

Aspect	Leak	Operational change
Nature	Uncontrolled physical event	Controlled (pump, valve)
Oscillations	Yes: pressure oscillates before settling	No: smooth, often monotonic
Turbulence	Yes: more energy at high frequencies	No: clean transient
PT–GT (pressure–flow) coherence	Low / erratic	High: coordinated
Transient duration	Can be long (minutes)	Usually short (settles within seconds)

So we design features that capture oscillation, turbulence/entropy, and coherence between pressure and flow.

Dependence on System Topology

Case 1: Leak downstream of sensors

Pressure: decreases (with oscillations).
Flow: increases (the leak adds a draw downstream, so more fluid passes the sensors).
PT–GT derivative correlation: negative (e.g. ≈ −0.6).

Case 2: Leak upstream of sensors

Pressure: decreases (with oscillations).
Flow: decreases (less fluid reaching the sensors).
PT–GT derivative correlation: positive (e.g. ≈ +0.3 to +0.5).

Conclusion: The sign of PT–GT correlation depends on topology. What is common to leaks in both cases is oscillation and higher variability. So we do not rely on the sign of one feature alone; we use combinations (oscillation + correlation + entropy + baseline deviation) so the model works across topologies.

Feature Groups and Their Role

1. Oscillation features (critical)

First-derivative zero crossings — Direction changes in the signal (e.g. pressure going down then up). Leaks → moderate to high; operational change → very low.
Second-derivative zero crossings — Changes in “acceleration”; further distinguish chaotic vs smooth transients.

These are largely topology-independent: leaks oscillate; operational changes usually do not.
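
As a rough illustration of the zero-crossing idea above, the sketch below counts sign changes in the first or second discrete derivative of a window. It is not the platform's extraction code: the function name `derivative_zero_crossings` and the `deadband` noise guard are assumptions made for this example, and the signals are synthetic.

```python
import numpy as np

def derivative_zero_crossings(signal, order=1, deadband=0.0):
    """Count sign changes in the order-th discrete derivative of `signal`.

    `deadband` is a hypothetical noise guard: derivative samples whose
    magnitude is at or below it are dropped before counting.
    """
    d = np.diff(np.asarray(signal, dtype=float), n=order)
    d = d[np.abs(d) > deadband]          # ignore flat or near-flat steps
    if d.size < 2:
        return 0
    signs = np.sign(d)
    return int(np.count_nonzero(signs[1:] != signs[:-1]))

# Synthetic pressure windows (illustrative only, not real pipeline data).
t = np.linspace(0.0, 10.0, 501)
smooth_drop = 4.0 + np.exp(-t)           # monotonic settle, like a controlled change
ringing_drop = smooth_drop + 0.3 * np.exp(-0.3 * t) * np.sin(2 * np.pi * t)

zc_smooth = derivative_zero_crossings(smooth_drop, order=1)
zc_ringing = derivative_zero_crossings(ringing_drop, order=1)
print(zc_smooth, zc_ringing)
```

On real sensor data a non-zero `deadband` (or some smoothing beforehand) is likely needed, since measurement noise alone produces many derivative sign changes.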

2. Coherence / correlation

PT–GT derivative correlation — How aligned are pressure and flow changes? High → likely operational; low/erratic → more likely leak.
Used together with oscillation and entropy so that different topologies (upstream vs downstream leak) are still separable.
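
A minimal sketch of a derivative correlation, assuming it is computed as the Pearson correlation between first differences of the pressure and flow windows (the exact formula used by the platform lives in the feature extraction code). The function name and the flat-signal fallback are assumptions. The example uses noise-free synthetic signals, so the magnitudes come out close to 1; the point is only the sign flip between the two topology cases.

```python
import numpy as np

def derivative_correlation(pressure, flow):
    """Pearson correlation between first differences of two equal-length windows."""
    dp = np.diff(np.asarray(pressure, dtype=float))
    dq = np.diff(np.asarray(flow, dtype=float))
    if dp.std() == 0.0 or dq.std() == 0.0:
        return 0.0   # assumption: report 0 when either derivative is constant
    return float(np.corrcoef(dp, dq)[0, 1])

# Synthetic, noise-free windows (illustrative only).
t = np.linspace(0.0, 10.0, 201)
pressure = 4.0 + np.exp(-t)                              # pressure falls in both cases
flow_leak_downstream = 10.0 + 0.5 * (1.0 - np.exp(-t))   # flow at the sensors rises
flow_leak_upstream = 10.0 - 0.5 * (1.0 - np.exp(-t))     # flow at the sensors falls

r_down = derivative_correlation(pressure, flow_leak_downstream)  # negative
r_up = derivative_correlation(pressure, flow_leak_upstream)      # positive
print(r_down, r_up)
```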

3. Spectral / entropy

Spectral entropy — How “spread” the energy is over frequencies. Leaks → more high-frequency content and higher entropy; clean operational change → lower.
Energy ratios (e.g. lower-frequency half vs upper-frequency half of the spectrum) can help capture the shape of the transient.
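
The two spectral measures above can be sketched as follows. This is one common definition (Shannon entropy of the normalised FFT power spectrum, and the share of energy in the upper half of the frequency bins), not necessarily the one used in the platform; the function names are hypothetical, and the test signals are synthetic.

```python
import numpy as np

def _power_spectrum(window):
    """FFT power per frequency bin, with the DC bin dropped."""
    x = np.asarray(window, dtype=float)
    x = x - x.mean()                       # remove the mean level
    return (np.abs(np.fft.rfft(x)) ** 2)[1:]

def spectral_entropy(window):
    """Shannon entropy of the normalised power spectrum, scaled to [0, 1]."""
    psd = _power_spectrum(window)
    total = psd.sum()
    if psd.size < 2 or total <= 0.0:
        return 0.0
    p = psd / total
    p = p[p > 0]                           # avoid log(0)
    return float(-(p * np.log(p)).sum() / np.log(psd.size))

def high_band_energy_ratio(window):
    """Fraction of spectral energy in the upper half of the frequency bins."""
    psd = _power_spectrum(window)
    total = psd.sum()
    if total <= 0.0:
        return 0.0
    return float(psd[psd.size // 2:].sum() / total)

# Synthetic examples: a single slow tone vs broadband noise.
n = 256
k = np.arange(n)
tone = np.sin(2 * np.pi * 8 * k / n)
noise = np.random.default_rng(0).standard_normal(n)

ent_tone, ent_noise = spectral_entropy(tone), spectral_entropy(noise)
ratio_tone, ratio_noise = high_band_energy_ratio(tone), high_band_energy_ratio(noise)
print(ent_tone, ent_noise, ratio_tone, ratio_noise)
```

The tone concentrates its energy in one bin, so both measures stay near zero; the broadband signal spreads energy across bins and scores much higher on both.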

4. Baseline deviation (optional but strong)

Compare the current window to “normal” operating profiles (per flow/pressure level).
Large deviation + oscillation + low coherence → strong leak candidate.
See Operational context for the philosophy.
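
One simple way to express baseline deviation is a z-score against a per-level profile, sketched below. Everything here is an assumption made for illustration: the binning by flow level, the function names `fit_baseline` and `baseline_deviation`, the bin edges, and the sample values. The actual operating profiles may be built quite differently.

```python
import numpy as np

def fit_baseline(levels, values, bin_edges):
    """Per-bin (mean, std) of `values`, grouped by operating `levels`."""
    levels = np.asarray(levels, dtype=float)
    values = np.asarray(values, dtype=float)
    bins = np.digitize(levels, bin_edges)
    profile = {}
    for b in np.unique(bins):
        v = values[bins == b]
        profile[int(b)] = (float(v.mean()), float(v.std()))
    return profile

def baseline_deviation(profile, level, value, bin_edges, min_std=1e-9):
    """Z-score of `value` against the baseline for the bin containing `level`."""
    b = int(np.digitize(level, bin_edges))
    if b not in profile:
        return float("nan")    # assumption: no baseline for this operating level
    mean, std = profile[b]
    return (value - mean) / max(std, min_std)

# Made-up history: flow level -> typical pressure under normal operation.
flows = [100, 102, 98, 200, 205, 195]
pressures = [5.0, 5.1, 4.9, 8.0, 8.2, 7.8]
edges = [150.0, 250.0]

profile = fit_baseline(flows, pressures, edges)
z_normal = baseline_deviation(profile, 101, 5.0, edges)   # close to 0
z_low = baseline_deviation(profile, 101, 4.5, edges)      # strongly negative
z_unknown = baseline_deviation(profile, 300, 9.0, edges)  # nan: level never seen
print(z_normal, z_low, z_unknown)
```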

How This Is Used in the Platform

The features pipeline (wavelet + stats) produces many of these ingredients (derivatives, zero crossings, entropy-like measures).
The training pipelines use a feature schema (e.g. from feature selection) that includes the subset of columns we want the model to see.
Validation (e.g. train/val split by case, offline test) ensures we measure performance on unseen cases, not just unseen windows, so we get a realistic estimate of how well “leak vs operational” separation works in production.
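
Splitting by case rather than by window can be sketched as below. This is a minimal standard-library version, assuming each window carries a case identifier; the function name `split_by_case` and the sample IDs are hypothetical, and the platform's own split logic may differ.

```python
import random

def split_by_case(case_ids, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) such that no case appears on both sides."""
    cases = sorted(set(case_ids))
    rng = random.Random(seed)
    rng.shuffle(cases)
    n_val = max(1, round(len(cases) * val_fraction))
    val_cases = set(cases[:n_val])
    train_idx = [i for i, c in enumerate(case_ids) if c not in val_cases]
    val_idx = [i for i, c in enumerate(case_ids) if c in val_cases]
    return train_idx, val_idx

# One entry per window; windows from the same case share an ID.
window_case_ids = ["a", "a", "b", "b", "b", "c", "d", "d", "e", "e"]
train_idx, val_idx = split_by_case(window_case_ids, val_fraction=0.2, seed=0)

train_cases = {window_case_ids[i] for i in train_idx}
val_cases = {window_case_ids[i] for i in val_idx}
print(sorted(train_cases), sorted(val_cases))
```

Because whole cases are held out, windows from the same leak or operational event never leak across the split. If scikit-learn is available, `GroupShuffleSplit` provides the same guarantee.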

This document gives the rationale; the exact names and formulas are in the feature extraction code and config (e.g. feature_columns, wavelet settings, and any custom metrics).