TPL/GENKEY Pipeline (Engineering Deep Dive)
Role
Converts OLGA simulator outputs (paired .tpl and .genkey files) into Parquet files containing a selected set of columns, with optional instrument noise and metadata. It is the first stage of the data pipeline, and its output feeds the Windows pipeline.
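A minimal sketch of two pieces of this stage, assuming inputs are paired by a shared file stem (e.g. `case1.tpl` with `case1.genkey`) and that the `.tpl` trends have already been parsed into rows of floats. The names `find_pairs` and `select_and_perturb`, the pairing rule, the `noise_std` config shape, and the Gaussian noise model are illustrative assumptions, not the pipeline's actual code; OLGA file parsing and the Parquet write are omitted.

```python
from __future__ import annotations

import random
from pathlib import Path


def find_pairs(input_dir: Path) -> list[tuple[Path, Path]]:
    """Match each .tpl file to the .genkey file with the same stem (assumed rule)."""
    tpl = {p.stem: p for p in input_dir.glob("*.tpl")}
    genkey = {p.stem: p for p in input_dir.glob("*.genkey")}
    # Only stems present in both sets form a pair; orphan files are ignored.
    return [(tpl[stem], genkey[stem]) for stem in sorted(tpl.keys() & genkey.keys())]


def select_and_perturb(
    rows: list[dict[str, float]],
    columns: list[str],
    noise_std: dict[str, float] | None = None,
    seed: int = 0,
) -> list[dict[str, float]]:
    """Keep only `columns`; optionally add zero-mean Gaussian 'instrument' noise.

    `noise_std` maps a column name to its noise standard deviation (hypothetical
    config shape). A fixed seed makes the noisy output reproducible across runs.
    """
    rng = random.Random(seed)
    noise_std = noise_std or {}
    out: list[dict[str, float]] = []
    for row in rows:
        selected: dict[str, float] = {}
        for col in columns:
            std = noise_std.get(col, 0.0)
            noise = rng.gauss(0.0, std) if std > 0 else 0.0
            selected[col] = row[col] + noise
        out.append(selected)
    return out
```

Seeding a private `random.Random` instance, rather than the global generator, keeps the injected noise identical on re-runs, which matters if outputs are later compared or regenerated.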
Engineering Notes
- Idempotency: Input files whose output already exists are skipped, so the pipeline is safe to re-run.
- Parallelism: `max_workers`, `io_workers`, and `parse_workers` are configurable to tune throughput.
- Quality: Optional quality checks, together with per-file or per-run metadata, support auditing and reproducibility.
- Scheduling: This is the pipeline most often run by the ETL scheduler and Prefect.
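The idempotency rule can be sketched as a filter over candidate inputs. The one-Parquet-file-per-stem naming in `output_path_for` and all helper names are hypothetical, and the sketch covers local output only. Because a presence check alone would also skip a half-written file left by a crashed run, the sketch assumes outputs are written under a temporary name and then renamed with `os.replace`, which is atomic when both paths are on the same POSIX filesystem; treat this as an assumed safeguard.

```python
from __future__ import annotations

import os
from pathlib import Path


def output_path_for(tpl_path: Path, output_dir: Path) -> Path:
    # Hypothetical naming rule: one Parquet file per input stem.
    return output_dir / f"{tpl_path.stem}.parquet"


def pending_inputs(tpl_paths: list[Path], output_dir: Path) -> list[Path]:
    """Return only inputs whose output does not exist yet (safe to re-run)."""
    return [p for p in tpl_paths if not output_path_for(p, output_dir).exists()]


def atomic_write_bytes(target: Path, data: bytes) -> None:
    """Write to a temp name, then rename, so a crash never leaves a partial target."""
    tmp = target.with_name(target.name + ".tmp")
    tmp.write_bytes(data)
    os.replace(tmp, target)  # atomic on POSIX when both paths share a filesystem
```

Since the temporary file ends in `.tmp`, an interrupted write never satisfies the `.parquet` presence check, and the input is simply picked up again on the next run.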
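The exact meaning of the three worker settings is not spelled out here, so the sketch below assumes one plausible split: `io_workers` threads read files, `parse_workers` threads parse them, and `max_workers` caps how many files are in flight per batch (bounding memory). `WorkerConfig` and `run_stages` are hypothetical names. Threads are used for both stages to keep the example self-contained; if parsing is CPU-bound, a `ProcessPoolExecutor` is the usual substitute for the parse pool, since threads do not execute Python bytecode in parallel under CPython's GIL.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, TypeVar

R = TypeVar("R")


@dataclass(frozen=True)
class WorkerConfig:
    max_workers: int    # assumed: max files in flight per batch
    io_workers: int     # assumed: threads that read raw file bytes
    parse_workers: int  # assumed: threads that parse raw bytes into records

    def __post_init__(self) -> None:
        for name in ("max_workers", "io_workers", "parse_workers"):
            if getattr(self, name) < 1:
                raise ValueError(f"{name} must be >= 1")


def run_stages(
    paths: List[str],
    read_fn: Callable[[str], bytes],
    parse_fn: Callable[[bytes], R],
    cfg: WorkerConfig,
) -> Dict[str, R]:
    """Read with one pool and parse with another, max_workers files per batch."""
    results: Dict[str, R] = {}
    with ThreadPoolExecutor(max_workers=cfg.io_workers) as io_pool:
        with ThreadPoolExecutor(max_workers=cfg.parse_workers) as parse_pool:
            for start in range(0, len(paths), cfg.max_workers):
                batch = paths[start:start + cfg.max_workers]
                raw = list(io_pool.map(read_fn, batch))       # I/O-bound stage
                parsed = list(parse_pool.map(parse_fn, raw))  # parse stage
                results.update(zip(batch, parsed))
    return results
```

`Executor.map` preserves input order, so zipping each batch with its parsed results keeps the path-to-result mapping correct.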
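Per-file metadata makes a run auditable by tying each output to the exact inputs and settings that produced it. A minimal sketch, assuming metadata is stored as a JSON file next to each output and that the quality check flags missing or non-finite values in the selected columns; every field name, the check itself, and the helpers `build_file_metadata` and `write_metadata` are hypothetical, not the pipeline's actual schema.

```python
from __future__ import annotations

import hashlib
import json
import math
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large inputs are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def quality_issues(rows: list[dict[str, float]], required: list[str]) -> list[str]:
    """Hypothetical check: every required column present and finite in every row."""
    issues: list[str] = []
    for i, row in enumerate(rows):
        missing = [c for c in required if c not in row]
        bad = [c for c in required if c in row and not math.isfinite(row[c])]
        if bad:
            issues.append(f"row {i}: non-finite {bad}")
        if missing:
            issues.append(f"row {i}: missing {missing}")
    return issues


def build_file_metadata(
    tpl_path: Path,
    genkey_path: Path,
    rows: list[dict[str, float]],
    columns: list[str],
    noise_seed: int | None,
) -> dict:
    # Input hashes plus settings let a later reader reproduce or verify the output.
    return {
        "source_tpl": tpl_path.name,
        "source_tpl_sha256": file_sha256(tpl_path),
        "source_genkey": genkey_path.name,
        "source_genkey_sha256": file_sha256(genkey_path),
        "columns": list(columns),
        "row_count": len(rows),
        "noise_seed": noise_seed,
        "quality_issues": quality_issues(rows, columns),
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }


def write_metadata(meta: dict, path: Path) -> None:
    path.write_text(json.dumps(meta, indent=2, sort_keys=True))
```

Recording the noise seed alongside the input hashes is what allows a noisy output to be regenerated bit for bit.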
Full Reference
For configuration keys, the script name, and S3/local behavior, see TPL/GENKEY (OLGA) in the main Pipelines section.