Temporal Context for LightGBM
Recommendation: Add Temporal Features (Do Not Change Model Architecture)
Using LightGBM (tabular, window-based) is a valid and efficient choice. You do not need to switch to LSTM or Transformers if you enrich each window with temporal context via extra features.
Strategy: Sliding-Window Context Features
Idea: Instead of feeding the model a raw sequence, we give it summary features that describe “what happened in this window and how it relates to the previous ones.”
- Before (isolated window): Each row = features of one window only → the model does not know what happened before.
- After (window + context): Each row = features of the current window plus features from the previous window(s) and deltas/trends between windows.
So the model still sees a single row per window (tabular), but that row carries temporal information: level in the previous window, change from previous to current, and simple trend (e.g. “increasing”, “stable”, “decreasing”).
What to Add
- Previous-window features: For each key quantity (e.g. PT mean, GT mean, entropy), add the same quantity computed on the previous window, e.g. PT_mean_prev, GT_mean_prev.
- Delta features:
  - PT_mean_delta = PT_mean_current - PT_mean_prev
  - PT_mean_delta_pct = (current - prev) / prev
  - PT_std_ratio = current_std / prev_std (captures increase in variability, typical in leaks).
- Trend features (optional): If you have the last 3 windows:
  - Simple trend: “increasing”, “stable”, “decreasing” (e.g. from linear slope or comparison of means).
  - stability_duration: number of consecutive windows where the signal stayed “stable” (e.g. within a band).
This way the model can learn rules like: “if pressure dropped and variability increased compared to the previous window, treat as leak.”
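The previous-window, delta, and ratio features listed above can be sketched with pandas as follows. This is a minimal illustration, not the platform's feature code: the function name `add_context_features` and the columns `case_id` and `window_idx` are assumptions made for the example, as is the choice to treat a zero previous value as missing so the percentage and ratio columns never become infinite.

```python
import pandas as pd


def add_context_features(df, stats=("PT_mean", "PT_std"),
                         group_col="case_id", order_col="window_idx"):
    """Append previous-window, delta, delta_pct and ratio columns per stat.

    `group_col` and `order_col` are hypothetical names for the source/case
    identifier and the window order; adapt them to the real schema.
    """
    out = df.sort_values([group_col, order_col]).copy()
    for stat in stats:
        # shift(1) inside each case: the first window of a case gets NaN,
        # so context never leaks across cases.
        prev = out.groupby(group_col)[stat].shift(1)
        # Assumption: a zero previous value is treated as missing.
        safe_prev = prev.where(prev != 0)
        out[f"{stat}_prev"] = prev
        out[f"{stat}_delta"] = out[stat] - prev
        out[f"{stat}_delta_pct"] = (out[stat] - prev) / safe_prev
        out[f"{stat}_ratio"] = out[stat] / safe_prev
    return out


# Toy data: case "a" shows a pressure drop with rising variability.
windows = pd.DataFrame({
    "case_id": ["a", "a", "a", "b", "b"],
    "window_idx": [0, 1, 2, 0, 1],
    "PT_mean": [10.0, 10.0, 8.0, 5.0, 5.5],
    "PT_std": [0.5, 0.5, 1.0, 0.2, 0.2],
})
enriched = add_context_features(windows)
```

In the toy data, the third window of case "a" ends up with a negative PT_mean_delta and a PT_std_ratio above 1, which is the pattern the rule above describes. The first window of each case has no predecessor, so its context columns are NaN; LightGBM handles missing values natively, so these rows do not need to be dropped.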
Implementation Outline
- Extractor: When building batches, for each window load the current window and the previous window(s) (same source/case, ordered by time or index).
- Transformer: For each batch, compute current-window features as today, then add:
  - Features from the previous window(s) (e.g. the same stats, marked with prev in the name, as in PT_mean_prev).
  - Deltas and ratios (current vs prev).
  - Optional trend/stability from the last 2–3 windows.
- Output: One row per window, but with more columns (current + prev + deltas + trend). LightGBM then trains on this enriched table.
No change to the training loop or to the model type; only the feature set changes.
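For the optional trend and stability step in the outline, one possible sketch is shown below. It assumes the trend is taken from the least-squares slope of the last few window means and that "stable" means a window-to-window change within a fixed band. The function names, the `tol` and `band` defaults, and the integer coding in `TREND_CODES` are illustrative choices, not values taken from the pipeline.

```python
def trend_label(values, tol=0.05):
    """Label the recent window means using the least-squares slope per window."""
    n = len(values)
    if n < 2:
        return "stable"  # not enough history to estimate a slope
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    slope = num / den
    if slope > tol:
        return "increasing"
    if slope < -tol:
        return "decreasing"
    return "stable"


def stability_duration(values, band=0.1):
    """Count consecutive steps, walking back from the latest window,
    whose window-to-window change stayed within +/- band."""
    count = 0
    for i in range(len(values) - 1, 0, -1):
        if abs(values[i] - values[i - 1]) > band:
            break
        count += 1
    return count


# Hypothetical integer coding so the label fits in a numeric feature table.
TREND_CODES = {"decreasing": -1, "stable": 0, "increasing": 1}

recent_means = [10.0, 9.0, 8.0]  # oldest to newest
label = trend_label(recent_means)
code = TREND_CODES[label]
```

The thresholds depend on the signal's units and noise level, so they would need tuning per quantity. Passing the raw slope as a numeric feature instead of the label is an equally reasonable option, since tree models can learn their own split point.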
Why This Fits the Platform
- Consistent with operational context: Temporal context (previous level, deltas, trend) helps separate “slow operational drift” from “sudden leak onset.”
- Efficient: No sequence model; same fast training and deployment as today.
- Explainable: Features have clear meaning (previous mean, change, ratio, trend).
- Configurable: Number of previous windows and which stats to carry can be controlled in the pipeline config and feature code.
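As a rough illustration of the configurability point, the sketch below drives the lagged columns from a small config object. Everything named here is hypothetical: `TemporalContextConfig`, its fields `n_prev_windows` and `stats`, the helper `add_lagged_stats`, and the `_prev2`, `_prev3` naming for older windows are assumptions for the example, not the pipeline's actual parameters.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TemporalContextConfig:
    # Hypothetical settings: how many previous windows to carry, and for which stats.
    n_prev_windows: int = 1
    stats: Tuple[str, ...] = ("PT_mean", "GT_mean")


def add_lagged_stats(rows: List[Dict[str, float]],
                     cfg: TemporalContextConfig) -> List[Dict[str, Optional[float]]]:
    """Add lagged copies of each configured stat.

    `rows` must be the time-ordered feature dicts of ONE case, so that
    lags never cross case boundaries.
    """
    enriched = []
    for i, row in enumerate(rows):
        new_row: Dict[str, Optional[float]] = dict(row)
        for stat in cfg.stats:
            for k in range(1, cfg.n_prev_windows + 1):
                name = f"{stat}_prev" if k == 1 else f"{stat}_prev{k}"
                # None marks a lag that does not exist yet for early windows.
                new_row[name] = rows[i - k][stat] if i >= k else None
        enriched.append(new_row)
    return enriched


cfg = TemporalContextConfig(n_prev_windows=2, stats=("PT_mean",))
case_rows = [{"PT_mean": 10.0}, {"PT_mean": 9.5}, {"PT_mean": 8.0}]
lagged = add_lagged_stats(case_rows, cfg)
```

Changing `n_prev_windows` or `stats` changes only the set of output columns, which keeps the model code untouched. Missing lags are left as None here and would become NaN once the rows are loaded into a float column of a feature table.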
This document captures the temporal-context philosophy; the exact parameter names and formulas are in the feature extraction and pipeline configuration.