ETL Configuration Reference
Purpose
ETL behavior is driven by YAML (or JSON) configuration: a central pipelines_config.yml that covers all pipelines, and optional scheduler configs (e.g. etl_scheduler_config.yaml) that contain a pipeline section plus a scheduler section.
Common Ideas
- Path resolution: source_folder, output_folder, input_path, etc. are resolved via resolve_config_paths so that local and S3 paths work the same way in code.
- Storage: Use explicit s3:// paths or a storage block (type: s3, bucket, optional prefix) to run on S3 where supported.
- Idempotency: Pipelines use config hashes and “already processed” logic; see the main Pipelines and Development Conventions.
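The path, storage, and idempotency ideas above can be illustrated with a minimal Python sketch. This is not the project's code: resolve_paths_sketch is a hypothetical stand-in for resolve_config_paths, storage_uri and config_hash are invented helper names, and the rules shown (join bucket, prefix, and relative path; hash the canonical JSON form of the config) are assumptions about how such logic could work.

```python
import hashlib
import json

# Key names taken from the list above; the real set is defined by the project.
PATH_KEYS = ("source_folder", "output_folder", "input_path")


def storage_uri(storage: dict, relative: str) -> str:
    """Build an s3:// URI from a storage block (bucket, optional prefix)."""
    parts = [
        storage["bucket"],
        storage.get("prefix", "").strip("/"),
        relative.strip("/"),
    ]
    # Drop empty segments so a missing prefix does not produce "//".
    return "s3://" + "/".join(p for p in parts if p)


def resolve_paths_sketch(config: dict) -> dict:
    """Hypothetical stand-in for resolve_config_paths (assumed behavior)."""
    resolved = dict(config)
    storage = config.get("storage") or {}
    for key in PATH_KEYS:
        value = config.get(key)
        if value is None or value.startswith("s3://"):
            continue  # absent keys and explicit S3 paths are left untouched
        if storage.get("type") == "s3":
            resolved[key] = storage_uri(storage, value)
        # Without an S3 storage block, the local path is kept as written.
    return resolved


def config_hash(config: dict) -> str:
    """Stable digest: the same settings give the same hash regardless of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, a pipeline could record config_hash(config) next to each output and skip a run when the same hash is already recorded; the project's actual "already processed" logic is described in the Pipelines and Development Conventions pages.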
Where to Look
- Pipeline config keys: Per-pipeline pages under Pipelines (e.g. TPL/GENKEY, Windows, Features).
- Scheduler config: Scheduler configs and ETL scheduler.
- Config file location: configs/pipelines_config.yml and configs/etl_scheduler_*.yaml.
This Engineering page only anchors the ETL config concept; the full reference is the config files themselves and the Pipelines docs.