Features Pipeline (Engineering Deep Dive)
Role
Extracts wavelet features, and optionally raw features, from window Parquet files and writes a single aggregated Parquet file along with a schema and checkpoints. Supports parallel file processing and incremental checkpointing for long runs.
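As a rough illustration of how parallel file processing and incremental checkpointing can fit together, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration, not the pipeline's actual code: the names `extract_features`, `load_done`, `run`, the `checkpoint_every` parameter, and the JSON checkpoint format are hypothetical, and the placeholder extractor stands in for real Parquet reading and wavelet computation.

```python
import json
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def extract_features(path):
    # Hypothetical placeholder for the real per-file feature extraction.
    return {"file": path, "n_chars": len(path)}


def load_done(checkpoint_path):
    # Names of files already processed by an earlier, interrupted run.
    p = Path(checkpoint_path)
    return set(json.loads(p.read_text())) if p.exists() else set()


def run(files, checkpoint_path, checkpoint_every=2, max_workers=4):
    done = load_done(checkpoint_path)
    pending = [f for f in files if f not in done]  # skip finished files
    rows = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extract_features, f): f for f in pending}
        for i, fut in enumerate(as_completed(futures), start=1):
            rows.append(fut.result())
            done.add(futures[fut])
            # Persist progress every `checkpoint_every` completed files.
            if i % checkpoint_every == 0:
                Path(checkpoint_path).write_text(json.dumps(sorted(done)))
    # Final checkpoint so the last partial batch is also recorded.
    Path(checkpoint_path).write_text(json.dumps(sorted(done)))
    return rows
```

Calling `run` a second time with the same checkpoint path only processes files that are not yet recorded. This sketch checkpoints file names only; a real pipeline would also need to persist the buffered feature rows so that resumed runs can rebuild the aggregated output.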
Engineering Notes
- Leak vs. operational: The feature design (wavelet, derivative, oscillation, and coherence features) is central to separating leak signatures from normal operational behavior.
- Temporal context: Optional “previous window” and delta features give each window temporal context.
- Idempotency: If features_metadata.json exists with the same config hash, the pipeline exits without reprocessing.
- Checkpoints: Buffer and checkpoint frequency allow resuming after interruption.
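The idempotency check described above can be sketched as follows. This is an assumption-laden illustration, not the pipeline's implementation: only the file name features_metadata.json comes from this page, while the `config_hash` JSON key, the choice of SHA-256, and the helper names `config_hash`, `should_skip`, and `write_metadata` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def config_hash(config):
    # Serialize with sorted keys so equal configs always hash identically.
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def should_skip(metadata_path, config):
    # Skip only when metadata exists and records the same config hash.
    path = Path(metadata_path)
    if not path.exists():
        return False
    stored = json.loads(path.read_text())
    return stored.get("config_hash") == config_hash(config)


def write_metadata(metadata_path, config):
    # Record the hash of the config that produced the current output.
    Path(metadata_path).write_text(
        json.dumps({"config_hash": config_hash(config)})
    )
```

With this shape, a run would call `should_skip` first and exit early on `True`, and call `write_metadata` only after the aggregated output has been written successfully, so that a failed run is never mistaken for a completed one.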
Full Reference
For configuration keys, the script name, and S3/local behavior, see Features in the main Pipelines section.