
TPL/GENKEY Pipeline (Engineering Deep Dive)

Role

Converts OLGA simulator outputs (paired .tpl and .genkey files) into Parquet files containing selected columns, optional instrument noise, and metadata. It is the first stage of the data pipeline, and its output feeds the Windows pipeline.
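As a rough illustration of how inputs might be paired before conversion, the sketch below matches each .tpl file with a .genkey file of the same stem. The function name find_pairs and the same-stem, same-directory pairing rule are assumptions made for this example; the pipeline's actual discovery logic may differ.

```python
from pathlib import Path


def find_pairs(input_dir: Path) -> list[tuple[Path, Path]]:
    """Return (tpl, genkey) pairs that share a file stem.

    Hypothetical helper: assumes a pair lives in one directory and
    differs only by extension. Unpaired .tpl files are skipped.
    """
    pairs = []
    for tpl in sorted(input_dir.glob("*.tpl")):
        genkey = tpl.with_suffix(".genkey")
        if genkey.exists():
            pairs.append((tpl, genkey))
    return pairs
```

Sorting the glob results keeps the processing order deterministic across runs, which makes logs and run metadata easier to compare.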


Engineering Notes

  • Idempotency: Files whose output already exists are skipped, so the pipeline is safe to re-run.
  • Parallelism: max_workers, io_workers, and parse_workers are configurable to tune throughput.
  • Quality: Optional quality checks, together with per-file or per-run metadata, support auditing and reproducibility.
  • Scheduling: This pipeline is the one most often run by the ETL scheduler and Prefect.
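
The idempotency and parallelism notes above can be sketched together as follows. This is a minimal illustration, not the pipeline's code: output_path_for, pending_inputs, run, and the convert callback are hypothetical names, the "<stem>.parquet" output naming is an assumption, and only max_workers is modeled (io_workers and parse_workers are left out).

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable


def output_path_for(tpl: Path, output_dir: Path) -> Path:
    # Assumed naming rule: one Parquet file per input, named after its stem.
    return output_dir / (tpl.stem + ".parquet")


def pending_inputs(input_dir: Path, output_dir: Path) -> list[Path]:
    """List .tpl files whose output does not exist yet (presence check only)."""
    return [
        tpl
        for tpl in sorted(input_dir.glob("*.tpl"))
        if not output_path_for(tpl, output_dir).exists()
    ]


def run(
    input_dir: Path,
    output_dir: Path,
    convert: Callable[[Path, Path], Path],
    max_workers: int = 4,
) -> list[Path]:
    """Convert only the pending files, using a bounded thread pool."""
    output_dir.mkdir(parents=True, exist_ok=True)
    todo = pending_inputs(input_dir, output_dir)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises worker exceptions here.
        return list(
            pool.map(lambda tpl: convert(tpl, output_path_for(tpl, output_dir)), todo)
        )
```

A presence check like this treats any existing output as complete, so a conversion that is interrupted mid-write could leave a partial file that later runs skip. Writing to a temporary name and renaming on success is one common way to guard against that.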

Full Reference

For configuration keys, the script name, and S3/local behavior, see TPL/GENKEY (OLGA) in the main Pipelines section.