ETL Configuration Reference

Purpose

ETL behavior is driven by YAML (or JSON) config files: a central pipelines_config.yml that covers all pipelines, plus optional scheduler configs (e.g. etl_scheduler_config.yaml) that each combine a pipeline section with a scheduler section.
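As a rough illustration of the two-section layout of a scheduler config, the sketch below parses a JSON variant with the standard library and separates the sections. The section keys "pipeline" and "scheduler", every key inside them, and the helper split_scheduler_config are assumptions made for this example; the real key names are defined by the config files themselves.

```python
import json

# Hypothetical scheduler config in JSON form; every key name is an assumption.
EXAMPLE = """
{
  "pipeline": {"name": "windows", "source_folder": "data/raw", "output_folder": "data/out"},
  "scheduler": {"interval_minutes": 60}
}
"""


def split_scheduler_config(text: str) -> tuple[dict, dict]:
    """Parse a scheduler config and return (pipeline section, scheduler section)."""
    config = json.loads(text)
    # Fail early with a clear message if either assumed section is absent.
    missing = [key for key in ("pipeline", "scheduler") if key not in config]
    if missing:
        raise KeyError(f"scheduler config is missing section(s): {', '.join(missing)}")
    return config["pipeline"], config["scheduler"]


pipeline_cfg, scheduler_cfg = split_scheduler_config(EXAMPLE)
```

Keeping the two sections separate after loading lets the pipeline part be handled like an entry from the central config, while the scheduler part only controls when the run happens.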


Common Ideas

  • Path resolution: source_folder, output_folder, input_path, etc. are resolved via resolve_config_paths so that local and S3 paths work the same way in code.
  • Storage: Use explicit s3:// paths or a storage block (type: s3, bucket, optional prefix) to run on S3 where supported.
  • Idempotency: Pipelines use config hashes and “already processed” logic to skip work that has already been done; see the main Pipelines page and Development Conventions.
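The ideas in this list can be sketched in a few lines of Python. This is not the project's resolve_config_paths: resolve_paths_sketch and config_hash are hypothetical helpers, and both the way a storage block is combined with relative paths and the use of SHA-256 over canonical JSON are assumptions chosen for illustration.

```python
import hashlib
import json

# Path keys named in the text above; the real list may be longer.
PATH_KEYS = ("source_folder", "output_folder", "input_path")


def resolve_paths_sketch(config: dict) -> dict:
    """Hypothetical stand-in for resolve_config_paths (assumed behavior)."""
    resolved = dict(config)
    storage = config.get("storage") or {}
    if storage.get("type") != "s3":
        return resolved  # no S3 storage block: leave paths untouched
    bucket = storage["bucket"]
    prefix = storage.get("prefix", "").strip("/")
    for key in PATH_KEYS:
        value = config.get(key)
        if not value or value.startswith("s3://"):
            continue  # absent keys are skipped; explicit s3:// paths are kept
        parts = [p for p in (prefix, value.strip("/")) if p]
        resolved[key] = f"s3://{bucket}/" + "/".join(parts)
    return resolved


def config_hash(config: dict) -> str:
    """Hash a config so that key order does not change the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A pipeline could compare config_hash(config) with a value stored next to earlier output and skip the run when they match, which is one simple form of “already processed” logic.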

Where to Look

  • Pipeline config keys: Per-pipeline pages under Pipelines (e.g. TPL/GENKEY, Windows, Features).
  • Scheduler config: The Scheduler configs and ETL scheduler pages.
  • Config file location: configs/pipelines_config.yml and configs/etl_scheduler_*.yaml.
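To check which config files are present in a checkout, a small sketch like the following can help. It uses only pathlib and the two locations listed above; the helper name find_config_files and its return shape are assumptions for this example.

```python
from pathlib import Path


def find_config_files(config_dir: str = "configs"):
    """Return (central config path or None, sorted list of scheduler config paths)."""
    root = Path(config_dir)
    central = root / "pipelines_config.yml"
    # Matches etl_scheduler_config.yaml and any other etl_scheduler_*.yaml file.
    schedulers = sorted(root.glob("etl_scheduler_*.yaml"))
    return (central if central.is_file() else None), schedulers
```

Calling find_config_files() from the repository root returns the central config path (or None if it is absent) together with every scheduler config that matches the pattern.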

This Engineering page only introduces the ETL config concept; for the full reference, consult the config files themselves and the Pipelines docs.