Quick Start

This guide gets you from zero to running your first pipeline in a few steps.

Prerequisites

cd /path/to/repo
git clone <your-mlops-repo-url> mlops-platform
cd mlops-platform

Install the package with ETL and ML dependencies (and optionally S3 and docs):

pip install -e ".[etl,ml]"
# Optional: S3 and docs
# pip install -e ".[etl,ml,s3,docs]"

Pipelines read from a central config file. The repo includes configs/pipelines_config.yml with sections for every pipeline.

Source/output paths: Edit source_folder, output_folder, or input_path to point to your data and desired outputs.
S3 (optional): To use S3, install .[s3] and set storage.type: s3, storage.bucket, and optional storage.prefix in the desired section, or use explicit s3:// paths where supported.

Example (local):

tpl_genkey_pipeline:
  source_folder: "data/raw/my_run"
  output_folder: "data/processed/my_run"
  # ... rest of section
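As a rough illustration of how the storage settings above might combine into a final location, here is a minimal sketch. The function name resolve_output_location is hypothetical, and the assumption that storage.prefix is simply prepended to output_folder is not confirmed by the repo; only the config key names come from this guide.

```python
def resolve_output_location(section: dict) -> str:
    """Return a local path or an s3:// URI for one pipeline config section.

    Hypothetical helper: the key names mirror the config keys named in this
    guide, but the platform's real resolution logic may differ.
    """
    output_folder = section["output_folder"]

    # Explicit s3:// paths are passed through unchanged.
    if output_folder.startswith("s3://"):
        return output_folder

    storage = section.get("storage", {})
    if storage.get("type") == "s3":
        bucket = storage["bucket"]
        prefix = storage.get("prefix", "")
        # Assumption: the optional prefix is prepended to the output folder.
        key = "/".join(p.strip("/") for p in (prefix, output_folder) if p)
        return f"s3://{bucket}/{key}"

    # No S3 storage configured: treat the value as a local path.
    return output_folder


local_section = {"output_folder": "data/processed/my_run"}
s3_section = {
    "output_folder": "data/processed/my_run",
    "storage": {"type": "s3", "bucket": "my-bucket", "prefix": "mlops"},
}

print(resolve_output_location(local_section))
print(resolve_output_location(s3_section))
```

The bucket and prefix values here are placeholders; substitute the ones from your own section.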

Run each pipeline from the project root (the directory containing configs/ and scripts/):

TPL/GENKEY (OLGA to Parquet):

python scripts/run_tpl_genkey_pipeline.py --config configs/pipelines_config.yml

Windows (time-series to fixed windows):

python scripts/run_windows_pipeline.py --config configs/pipelines_config.yml

Features (wavelet features from windows):

python scripts/run_features_pipeline.py --config configs/pipelines_config.yml

Training (e.g. PFM detection):

python scripts/run_training_pfm_detection_pipeline.py --config configs/pipelines_config.yml
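If you want to run the four commands above back to back, a small wrapper along these lines may help. It is a sketch, not part of the repo: the helper names build_commands and run_all are hypothetical, and the ordering (TPL/GENKEY, then windows, then features, then training) is an assumption based on each stage consuming the previous stage's output.

```python
import subprocess
import sys

CONFIG = "configs/pipelines_config.yml"

# Script paths are copied from the commands in this guide.
# The execution order is an assumption, not something the repo guarantees.
PIPELINE_SCRIPTS = [
    "scripts/run_tpl_genkey_pipeline.py",
    "scripts/run_windows_pipeline.py",
    "scripts/run_features_pipeline.py",
    "scripts/run_training_pfm_detection_pipeline.py",
]


def build_commands(config: str = CONFIG) -> list:
    """Build one argv list per pipeline script, using the current interpreter."""
    return [[sys.executable, script, "--config", config] for script in PIPELINE_SCRIPTS]


def run_all(config: str = CONFIG) -> None:
    """Run every pipeline in order, stopping at the first failure."""
    for cmd in build_commands(config):
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(cmd, check=True)


# Print the commands without executing them.
for cmd in build_commands():
    print(" ".join(cmd[1:]))
```

Calling run_all() from the project root would then execute the stages in sequence.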

Each script will:

1. Load its section from pipelines_config.yml (e.g. tpl_genkey_pipeline, features_pipeline).
2. Resolve paths (local or S3).
3. Check idempotency and exit early if the pipeline has already run with the same config.
4. Execute the pipeline and write outputs.
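One common way to implement an idempotency check like the one described above is to fingerprint the config section and compare it with a marker left by the previous run. The sketch below illustrates that idea only; the function names and the .run_fingerprint marker file are hypothetical, and the platform's actual mechanism may work differently.

```python
import hashlib
import json
from pathlib import Path

MARKER_NAME = ".run_fingerprint"  # hypothetical marker file name


def config_fingerprint(section: dict) -> str:
    """Hash a config section in a key-order-independent way."""
    canonical = json.dumps(section, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def already_ran(section: dict, output_folder: Path) -> bool:
    """True if a previous run left a marker matching this exact config."""
    marker = output_folder / MARKER_NAME
    return marker.is_file() and marker.read_text().strip() == config_fingerprint(section)


def record_run(section: dict, output_folder: Path) -> None:
    """Write the marker after the pipeline has finished successfully."""
    output_folder.mkdir(parents=True, exist_ok=True)
    (output_folder / MARKER_NAME).write_text(config_fingerprint(section))
```

A script using this pattern would call already_ran() before doing any work and record_run() only after outputs are written, so a failed run is retried on the next invocation.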

To run the TPL/GENKEY pipeline on a schedule (e.g. every hour):

1. Copy or edit configs/etl_scheduler_config.yaml so the pipeline section points to your source_folder and output_folder.
2. Set scheduler.mode (e.g. interval) and scheduler.interval_seconds (e.g. 3600).
3. Start the scheduler:

python scripts/etl_scheduler.py --config configs/etl_scheduler_config.yaml

Or use the helper script:

./scripts/start_etl_scheduler.sh
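To give a feel for what an interval mode with interval_seconds typically means, here is a minimal loop that runs a job and then sleeps for the remainder of the interval. This is an illustrative sketch, not the code in scripts/etl_scheduler.py; run_on_interval and its max_runs, sleep, and clock parameters are hypothetical and exist mainly to keep the example testable.

```python
import time
from typing import Callable, Optional


def run_on_interval(
    job: Callable[[], None],
    interval_seconds: float,
    max_runs: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run job repeatedly, starting each run interval_seconds after the last start.

    max_runs=None loops forever; a number stops after that many runs.
    Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        started = clock()
        job()
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break  # no need to sleep after the final run
        elapsed = clock() - started
        # If the job took longer than the interval, start the next run immediately.
        sleep(max(0.0, interval_seconds - elapsed))
    return runs


# Demo with a tiny interval; a real schedule would use e.g. 3600 seconds.
run_on_interval(lambda: print("pipeline run"), interval_seconds=0.01, max_runs=2)
```

Because each script already exits early when nothing has changed, repeated scheduled runs with an unchanged config should be cheap.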

For production-style scheduling with Prefect:

1. Install the Prefect dependencies (already included in the .[etl] extra).
2. Use Docker Compose to run the Prefect server and a worker (see docker-compose.prefect.yml and Prefect & Production).
3. Deploy a flow with a schedule:

./prefect_manage.sh deploy daily_at_4pm

See Prefect & Production for full details.

Pipelines: See Pipelines overview and the per-pipeline pages for configuration and behavior.
Architecture: See Architecture for layers and components.
Conventions: See Development Conventions for config and code style.