ETL Scheduler (APScheduler)
Role
The ETL scheduler runs a pipeline (typically TPL/GENKEY) on a schedule: by interval (e.g. every N seconds), by cron expression, or daily at fixed times. It is implemented with APScheduler (async) and is suitable for single-machine, in-process scheduling.
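As an illustration of how the scheduling modes could map onto APScheduler, the sketch below turns a scheduler config into trigger arguments. It is not the project's actual code: the function name trigger_specs, the "HH:MM" time format, and the UTC default are assumptions. The returned pairs are shaped for APScheduler 3.x, where add_job accepts a trigger alias ("interval" or "cron") plus keyword arguments.

```python
from typing import Any


def trigger_specs(cfg: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Translate a scheduler config into (trigger_alias, kwargs) pairs.

    Hypothetical helper: each pair is meant to be passed to APScheduler 3.x
    as scheduler.add_job(func, alias, **kwargs).
    """
    mode = cfg["mode"]
    tz = cfg.get("timezone", "UTC")  # assumed default, not from the docs

    def at(hhmm: str) -> dict[str, Any]:
        # Assumes times are written as "HH:MM".
        hour, minute = (int(part) for part in hhmm.split(":"))
        return {"hour": hour, "minute": minute, "timezone": tz}

    if mode == "interval":
        return [("interval", {"seconds": int(cfg["interval_seconds"])})]
    if mode == "cron":
        # Standard five-field crontab: minute hour day month day_of_week.
        minute, hour, day, month, dow = cfg["cron_expression"].split()
        return [("cron", {"minute": minute, "hour": hour, "day": day,
                          "month": month, "day_of_week": dow, "timezone": tz})]
    if mode == "daily":
        return [("cron", at(cfg["daily_time"]))]
    if mode == "multiple_daily":
        # One cron trigger per configured time of day.
        return [("cron", at(t)) for t in cfg["daily_times"]]
    raise ValueError(f"unknown scheduler mode: {mode!r}")
```

With this shape, multiple_daily simply yields several cron triggers, so each fixed time could be registered as its own job.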
Features
- Idempotency: The underlying pipeline skips already-processed files; re-runs are safe.
- Single job at a time: Prevents overlapping runs and resource contention.
- Modes: interval, cron, daily, multiple_daily (see Scheduler configs).
- Logging: Rotating file and console; configurable level and format.
- Graceful shutdown: Handles SIGINT/SIGTERM and stops the scheduler cleanly.
- Stats: Success/failure counts and optional alerting on consecutive failures.
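The single-job and stats features can be sketched with plain asyncio. This is a minimal illustration, not the scheduler's implementation: GuardedRunner, RunStats, alert_after, and on_alert are hypothetical names, and skipping an overlapping run (rather than queueing it) is an assumption.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class RunStats:
    successes: int = 0
    failures: int = 0
    consecutive_failures: int = 0
    skipped: int = 0


@dataclass
class GuardedRunner:
    """Runs one job at a time and tracks success/failure counts."""
    job: Callable[[], Awaitable[None]]
    alert_after: int = 3  # assumed threshold for consecutive failures
    on_alert: Optional[Callable[[RunStats], None]] = None
    stats: RunStats = field(default_factory=RunStats)
    _running: bool = False

    async def run(self) -> bool:
        # No await between the check and the set, so this is safe on a
        # single event loop: an overlapping call is skipped, not queued.
        if self._running:
            self.stats.skipped += 1
            return False
        self._running = True
        try:
            await self.job()
        except Exception:
            self.stats.failures += 1
            self.stats.consecutive_failures += 1
            # Alerts on every failure once the streak reaches the threshold.
            if self.on_alert and self.stats.consecutive_failures >= self.alert_after:
                self.on_alert(self.stats)
        else:
            self.stats.successes += 1
            self.stats.consecutive_failures = 0
        finally:
            self._running = False
        return True
```

APScheduler jobs also accept a max_instances option that limits concurrent runs of the same job; whether this scheduler relies on that option or on its own guard is not stated here.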
Config
The scheduler reads a YAML file with two main sections:
- pipeline: Same keys as the pipeline section in pipelines_config.yml (e.g. source_folder, output_folder, selected_columns, max_workers).
- scheduler: mode, interval_seconds (for interval), cron_expression (for cron), daily_time or daily_times, timezone, and logging (file, level, format, rotation).
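To make the two sections concrete, the sketch below shows the config as a Python dict (roughly what loading the YAML would produce) and one way the logging sub-section could be applied with the standard library. All values are placeholders, and the nested rotation keys max_bytes and backup_count, the logger name, and the defaults are assumptions; see Scheduler configs for the real options.

```python
import logging
import sys
from logging.handlers import RotatingFileHandler

# Placeholder values; top-level key names follow the two sections above.
EXAMPLE_CONFIG = {
    "pipeline": {
        "source_folder": "data/in",
        "output_folder": "data/out",
        "selected_columns": ["id", "name"],
        "max_workers": 4,
    },
    "scheduler": {
        "mode": "interval",
        "interval_seconds": 300,
        "timezone": "UTC",
        "logging": {
            "file": "scheduler.log",
            "level": "INFO",
            "format": "%(asctime)s %(levelname)s %(name)s: %(message)s",
            # Assumed shape of the rotation settings.
            "rotation": {"max_bytes": 10_000_000, "backup_count": 5},
        },
    },
}


def configure_logging(log_cfg: dict) -> logging.Logger:
    """Attach a rotating file handler and a console handler to one logger."""
    logger = logging.getLogger("etl_scheduler")  # hypothetical logger name
    logger.setLevel(log_cfg.get("level", "INFO"))
    logger.handlers.clear()  # avoid duplicate handlers on reconfiguration

    formatter = logging.Formatter(
        log_cfg.get("format", "%(asctime)s %(levelname)s %(message)s")
    )
    rotation = log_cfg.get("rotation", {})
    file_handler = RotatingFileHandler(
        log_cfg["file"],
        maxBytes=rotation.get("max_bytes", 10_000_000),
        backupCount=rotation.get("backup_count", 5),
    )
    console_handler = logging.StreamHandler(sys.stderr)

    for handler in (file_handler, console_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```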
Running
- One-shot (no schedule): --run-once
- Status: --status to print the next run and basic stats
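The two flags above could be wired up with argparse along these lines. This is only a sketch: the program name etl_scheduler, the choose_action helper, and treating the flags as mutually exclusive are assumptions, not documented behavior.

```python
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="etl_scheduler",  # hypothetical program name
        description="Run the ETL pipeline on a schedule.",
    )
    # Assumption: the two flags cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--run-once", action="store_true",
                       help="run the pipeline once and exit (no schedule)")
    group.add_argument("--status", action="store_true",
                       help="print the next run and basic stats")
    return parser


def choose_action(argv: Optional[Sequence[str]] = None) -> str:
    """Return which action the command line selects."""
    args = build_parser().parse_args(argv)
    if args.run_once:
        return "run_once"
    if args.status:
        return "status"
    return "schedule"  # default: start the scheduler loop
```

With no flags, this sketch falls through to starting the scheduler, which matches the one-shot and status flags being described as alternatives to the normal scheduled run.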
For production with observability and retries, use Prefect. For full config options, see Scheduler configs.