Features Pipeline (Engineering Deep Dive)

Role

Extracts wavelet and optional raw features from window Parquet files and writes a single aggregated Parquet plus schema and checkpoints. Supports parallel file processing and incremental checkpointing for long runs.


Engineering Notes

  • Leak vs operational: Feature design (wavelet, derivative, oscillation, and coherence features) is central to telling leak events apart from normal operational changes.
  • Temporal context: Optional “previous window” and delta features give each window temporal context by relating it to the window before it.
  • Idempotency: If features_metadata.json exists with the same config hash, the pipeline exits without reprocessing.
  • Checkpoints: The buffer and checkpoint frequency settings let an interrupted run resume from the last checkpoint instead of starting over.
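
The idempotency and checkpoint behavior above can be sketched roughly as follows. This is a minimal illustration, not the pipeline's actual code: only the file name features_metadata.json comes from this page. The checkpoint.json file, the "config_hash" and "done" keys, the SHA-256 hashing scheme, and the function names are all assumptions made for the example. The sketch tracks only which files are finished; a real run would also flush its buffered feature rows at each checkpoint.

```python
import hashlib
import json
import os
from pathlib import Path


def config_hash(config: dict) -> str:
    """Hash a config dict so that key order does not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps(payload))
    os.replace(tmp, path)


def run(files, out_dir: Path, config: dict, checkpoint_every: int = 2, extract=None):
    """Process files with skip-if-done and resume-from-checkpoint behavior.

    Returns the list of files processed during this call.
    `extract` is a hypothetical per-file feature extractor (optional here).
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    digest = config_hash(config)
    meta_path = out_dir / "features_metadata.json"
    ckpt_path = out_dir / "checkpoint.json"  # assumed name

    # Idempotency: same config hash already recorded, so nothing to do.
    if meta_path.exists():
        if json.loads(meta_path.read_text()).get("config_hash") == digest:
            return []

    # Resume: reuse the checkpoint only if it was written for this config.
    done = []
    if ckpt_path.exists():
        ckpt = json.loads(ckpt_path.read_text())
        if ckpt.get("config_hash") == digest:
            done = list(ckpt["done"])
    done_set = set(done)

    processed_now = []
    for name in files:
        if name in done_set:
            continue
        if extract is not None:
            extract(name)  # may raise; finished files stay in the checkpoint
        done.append(name)
        done_set.add(name)
        processed_now.append(name)
        if len(processed_now) % checkpoint_every == 0:
            write_json_atomic(ckpt_path, {"config_hash": digest, "done": done})

    # Completed: record the config hash and drop the checkpoint.
    write_json_atomic(meta_path, {"config_hash": digest, "n_files": len(done)})
    if ckpt_path.exists():
        ckpt_path.unlink()
    return processed_now
```

In this sketch, a lower checkpoint_every value means less repeated work after an interruption at the cost of more frequent writes. Changing any config value changes the hash, so both the metadata short-circuit and any stale checkpoint are ignored and the run starts from scratch.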

Full Reference

For configuration keys, the script name, and S3/local behavior, see Features in the main Pipelines section.