
Prefect Integration for Production

Role

Prefect 3 runs pipeline flows (e.g. TPL/GENKEY) on a schedule or on demand, and adds observability (UI, run history, logs) and optional retries. Use it alongside the in-process ETL scheduler when you need managed deployments, multiple workers, or a central dashboard.


Components

  • Prefect Server — API and UI (e.g. port 4200).
  • PostgreSQL — Backing store for Prefect (e.g. via Docker).
  • Prefect Worker — Pulls and runs flow runs (e.g. process worker).
  • Deployments — Flow + schedule + parameters (e.g. config path); defined in prefect.yaml and deployed via scripts.
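Before deploying or starting a worker, it can help to confirm that the server is reachable. The sketch below polls the Prefect API health endpoint using only the standard library. The base URL is an assumption built from the example port above; adjust it to match your setup.

```python
import urllib.request


def server_is_healthy(api_url: str = "http://127.0.0.1:4200/api",
                      timeout: float = 2.0) -> bool:
    """Return True if the Prefect API health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{api_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection refused, timeouts, and HTTP error responses.
        return False


if __name__ == "__main__":
    print("Prefect server reachable:", server_is_healthy())
```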

Project layout

  • mlops/prefect/flows/ — e.g. genkey_flow.py (wraps TPL/GENKEY pipeline), scheduler_wrapper.py (run-once scheduler).
  • prefect.yaml — Declarative deployment (entrypoint, schedule, steps).
  • deploy_flows.py / prefect_manage.sh — Deploy with a named schedule (e.g. daily_at_4pm) and manage server/worker.
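As an illustration of what a flow module such as genkey_flow.py might contain, here is a minimal sketch, not the project's actual code. The function run_genkey_pipeline, the flow name, the default config path, and the retry settings are all hypothetical. The pass-through fallback exists only so the sketch can be run without Prefect installed.

```python
from pathlib import Path

try:
    from prefect import flow, task
except ImportError:
    # Pass-through stand-ins so the sketch still runs without Prefect installed.
    def _passthrough(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda fn: fn

    flow = task = _passthrough


def run_genkey_pipeline(config_path: str) -> dict:
    """Hypothetical stand-in for the real TPL/GENKEY pipeline entry point."""
    return {"config": Path(config_path).name, "status": "ok"}


@task(retries=2, retry_delay_seconds=30)
def genkey_task(config_path: str) -> dict:
    # Retries are meant for transient failures; values here are illustrative.
    return run_genkey_pipeline(config_path)


@flow(name="genkey-flow", log_prints=True)
def genkey_flow(config_path: str = "config/genkey.yaml") -> dict:
    result = genkey_task(config_path)
    print(f"GENKEY finished: {result}")  # captured as a Prefect log line
    return result
```

Keeping the pipeline call in a plain function means it can be tested without Prefect, while the flow and task wrappers add scheduling, retries, and logging.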

Typical workflow

  1. Start the server and worker (e.g. docker compose -f docker-compose.prefect.yml up -d).
  2. Deploy: ./prefect_manage.sh deploy daily_at_4pm (or another schedule name).
  3. The worker executes runs on schedule; you can also trigger a run manually from the UI or the CLI.
  4. View runs and logs in the Prefect UI.
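The deploy and trigger steps above can be sketched in Python. The schedule table and the config_path parameter name are assumptions for illustration; the real deploy_flows.py may define its named schedules differently. The run_deployment call requires Prefect and a reachable server, so it is imported lazily.

```python
# Hypothetical schedule names mapped to cron expressions.
NAMED_SCHEDULES = {
    "daily_at_4pm": "0 16 * * *",
    "hourly": "0 * * * *",
}


def resolve_schedule(name: str) -> str:
    """Translate a schedule name into a cron expression."""
    try:
        return NAMED_SCHEDULES[name]
    except KeyError:
        known = ", ".join(sorted(NAMED_SCHEDULES))
        raise ValueError(f"Unknown schedule {name!r}; known: {known}") from None


def trigger_run(deployment: str, config_path: str):
    """Request an on-demand run of a deployment named 'flow-name/deployment-name'."""
    from prefect.deployments import run_deployment  # needs Prefect installed

    # timeout=0 returns as soon as the run is created instead of waiting for it.
    return run_deployment(
        name=deployment,
        parameters={"config_path": config_path},  # assumed flow parameter
        timeout=0,
    )
```

A call such as trigger_run("genkey-flow/daily_at_4pm", "config/genkey.yaml") would queue a run for the worker to pick up, assuming a deployment with that name exists.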

Progress and logging

Flows can use PrefectProgressMonitor so that pipeline progress (e.g. files processed) is written to the Prefect run logs and visible in the UI. Enable it in the pipeline config (use_prefect_progress, prefect_logger).
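To show the idea behind progress logging, here is a minimal sketch of a monitor that reports at fixed intervals. The class ProgressMonitorSketch is a hypothetical stand-in; the real PrefectProgressMonitor may have a different interface. It accepts any standard logger, so inside a flow you could pass the logger returned by prefect.get_run_logger().

```python
import logging


class ProgressMonitorSketch:
    """Hypothetical stand-in; the real PrefectProgressMonitor API may differ."""

    def __init__(self, total: int, logger: logging.Logger, every: int = 10):
        self.total = total
        self.logger = logger
        self.every = max(1, every)  # report once per `every` processed files
        self.done = 0

    def update(self, n: int = 1) -> None:
        before = self.done
        self.done = min(self.total, self.done + n)
        # Log when a reporting boundary is crossed or the work just completed.
        crossed = self.done // self.every > before // self.every
        finished = self.done == self.total and before != self.total
        if crossed or finished:
            pct = 100 * self.done / self.total if self.total else 100.0
            self.logger.info(
                "Processed %d/%d files (%.0f%%)", self.done, self.total, pct
            )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    monitor = ProgressMonitorSketch(total=25, logger=logging.getLogger("pipeline"))
    for _ in range(25):
        monitor.update()
```

Logging at intervals rather than per file keeps the run log readable for large batches.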

For a concise operational guide (commands, comparison with APScheduler), see the main docs: Prefect & Production.