Quick Start
This guide gets you from zero to running your first pipeline in a few steps.
Prerequisites
- Python 3.10+
- pip (or uv/poetry; adjust commands as needed)
1. Clone and install
Install the package with ETL and ML dependencies (and optionally S3 and docs):
2. Config file
Pipelines read from a central config file. The repo includes configs/pipelines_config.yml with sections for every pipeline.
- Source/output paths: Edit source_folder, output_folder, or input_path to point to your data and desired outputs.
- S3 (optional): To use S3, install .[s3] and set storage.type: s3, storage.bucket, and optional storage.prefix in the desired section, or use explicit s3:// paths where supported.
Example (local):
tpl_genkey_pipeline:
  source_folder: "data/raw/my_run"
  output_folder: "data/processed/my_run"
  # ... rest of section
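The storage options described above (local folders, storage.type: s3 with a bucket and optional prefix, or explicit s3:// paths) can be pictured as a small resolution step. The following is a minimal sketch only: the function name resolve_location and the way bucket, prefix, and folder are joined are assumptions made for illustration, not the project's actual implementation.

```python
from pathlib import Path


def resolve_location(section: dict, key: str, project_root: Path) -> str:
    """Return an s3:// URI or an absolute local path for section[key].

    Hypothetical helper: the real project may combine these settings differently.
    """
    value = section[key]

    # An explicit s3:// path is used unchanged.
    if value.startswith("s3://"):
        return value

    # With storage.type: s3, build the URI from bucket, optional prefix, and folder.
    storage = section.get("storage", {})
    if storage.get("type") == "s3":
        prefix = storage.get("prefix", "").strip("/")
        parts = [storage["bucket"], prefix, value.strip("/")]
        return "s3://" + "/".join(p for p in parts if p)

    # Otherwise treat the value as a local path relative to the project root.
    return str((project_root / value).resolve())


if __name__ == "__main__":
    local_section = {
        "source_folder": "data/raw/my_run",
        "output_folder": "data/processed/my_run",
    }
    print(resolve_location(local_section, "output_folder", Path.cwd()))
```

Keeping the resolution in one function means a pipeline body can stay unaware of whether it is reading from disk or from a bucket.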
3. Run a pipeline
From the project root (directory containing configs/ and scripts/):
TPL/GENKEY (OLGA to Parquet):
Windows (time-series to fixed windows):
Features (wavelet features from windows):
Training (e.g. PFM detection):
Each script will:
- Load its section from pipelines_config.yml (e.g. tpl_genkey_pipeline, features_pipeline).
- Resolve paths (local or S3).
- Check idempotency; exit early if already run with same config.
- Execute the pipeline and write outputs.
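One common way to implement the idempotency step in the list above is to fingerprint the config section and store that fingerprint next to the outputs. The sketch below assumes this approach. The marker file name, the function names, and the use of SHA-256 over canonical JSON are all illustrative choices, and the project's own check may work differently.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable

MARKER_NAME = "_run_fingerprint.txt"  # hypothetical marker file name


def config_fingerprint(section: dict) -> str:
    """Hash the config section in a key-order-independent way."""
    canonical = json.dumps(section, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def already_ran(section: dict, output_folder: Path) -> bool:
    """True if a marker with the same fingerprint exists in the output folder."""
    marker = output_folder / MARKER_NAME
    return marker.is_file() and marker.read_text().strip() == config_fingerprint(section)


def run_once(section: dict, output_folder: Path, pipeline: Callable[[dict], None]) -> bool:
    """Run the pipeline unless it already ran with this exact config.

    Returns True if the pipeline was executed, False if it was skipped.
    """
    if already_ran(section, output_folder):
        return False
    pipeline(section)
    # Write the marker only after the pipeline finished without raising.
    output_folder.mkdir(parents=True, exist_ok=True)
    (output_folder / MARKER_NAME).write_text(config_fingerprint(section))
    return True
```

Writing the marker last means a crashed run is retried on the next invocation, and any change to the section produces a new fingerprint and therefore a fresh run.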
4. Run the ETL scheduler (optional)
To run the TPL/GENKEY pipeline on a schedule (e.g. every hour):
- Copy or edit configs/etl_scheduler_config.yaml so the pipeline section points to your source_folder and output_folder.
- Set scheduler.mode (e.g. interval) and scheduler.interval_seconds (e.g. 3600).
- Start the scheduler:
Or use the helper script:
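As background on what an interval mode with interval_seconds typically amounts to, here is a minimal, self-contained sketch of an interval loop. It is not the project's scheduler: the function run_on_interval and its max_runs and sleep parameters are hypothetical and exist only to make the idea concrete and testable.

```python
import time
from typing import Callable, Optional


def run_on_interval(
    job: Callable[[], None],
    interval_seconds: float,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call job repeatedly, starting each run about interval_seconds apart.

    max_runs=None loops forever; a failing run is logged and the loop continues.
    Returns the number of runs attempted.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            job()
        except Exception as exc:  # keep the scheduler alive after a failed run
            print(f"run failed: {exc!r}")
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        # Subtract the job's duration so runs do not drift later over time.
        elapsed = time.monotonic() - started
        sleep(max(0.0, interval_seconds - elapsed))
    return runs


if __name__ == "__main__":
    run_on_interval(lambda: print("pipeline run"), interval_seconds=1, max_runs=3)
```

Pairing a loop like this with an idempotent pipeline keeps repeated ticks cheap: a tick that finds nothing new to do can exit early.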
5. Prefect (optional, for production)
For production-style scheduling with Prefect:
- Install Prefect deps (already in .[etl]).
- Use Docker Compose to run Prefect Server + worker (see docker-compose.prefect.yml and Prefect & Production).
- Deploy a flow with a schedule:
See Prefect & Production for full details.
Next steps
- Pipelines: See Pipelines overview and the per-pipeline pages for configuration and behavior.
- Architecture: See Architecture for layers and components.
- Conventions: See Development Conventions for config and code style.