Technology Stack

This page describes the core technologies used by the MLOps Platform.


Languages & Runtimes

| Technology | Version | Role |
|---|---|---|
| Python | 3.10+ | Primary language for all pipelines, ETL, and ML code. |
| Async I/O | asyncio | Used in ETL pipelines, the scheduler, and Prefect flows for non-blocking execution. |
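
As an illustration of the non-blocking execution mentioned above, the following minimal sketch runs three I/O-bound steps concurrently with asyncio. The step names and the `run_step` helper are hypothetical placeholders, not the platform's actual pipeline code.

```python
import asyncio
from typing import List


async def run_step(name: str, delay: float) -> str:
    """Stand-in for one I/O-bound ETL step (hypothetical)."""
    # asyncio.sleep yields control to the event loop, as a real
    # network or disk wait would.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_all() -> List[str]:
    # gather() runs the steps concurrently and returns their results
    # in the order the awaitables were passed, not completion order.
    return await asyncio.gather(
        run_step("extract", 0.02),
        run_step("validate", 0.01),
        run_step("load", 0.03),
    )


results = asyncio.run(run_all())
print(results)
```

Because the steps overlap, total wall time is close to the longest single delay rather than the sum of all three.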

Data & Storage

| Technology | Role |
|---|---|
| Parquet | Primary columnar format for intermediate and training data (via PyArrow). |
| CSV | Export format for reports and compatibility; configurable encoding and separator. |
| Local filesystem | Default storage for source_folder, output_folder, and artifacts. |
| Amazon S3 | Optional storage via s3:// URIs or storage.type: s3 with bucket and prefix; requires s3fs. |
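
The distinction between local paths and s3:// URIs can be sketched as follows. This is only an illustration using the standard library; `StorageLocation` and `resolve_location` are hypothetical names, and the platform's own resolution logic may differ.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class StorageLocation:
    kind: str    # "s3" or "local"
    bucket: str  # empty for local paths
    path: str    # object key for S3, filesystem path for local


def resolve_location(uri: str) -> StorageLocation:
    """Classify a configured path as S3 or local (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        # s3://bucket/prefix/file -> bucket "bucket", key "prefix/file"
        return StorageLocation("s3", parsed.netloc, parsed.path.lstrip("/"))
    # Anything without the s3 scheme is treated as a local path.
    return StorageLocation("local", "", uri)


print(resolve_location("s3://my-bucket/training/data.parquet"))
print(resolve_location("data/output"))
```

Here `my-bucket` and the paths are made-up example values. Actually reading or writing an s3:// location would still require s3fs, as noted in the table.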

ML & Numerical

| Technology | Role |
|---|---|
| LightGBM | Training for binary (detection), multiclass (size/location), and regression (leak flow) models. |
| scikit-learn | Splits, metrics, preprocessing (e.g. scaling), and utilities. |
| NumPy | Array operations and numerical foundations. |
| Pandas | DataFrames for ETL, features, and training data. |
| PyArrow | Parquet read/write and efficient in-memory representation. |
| Optuna | Hyperparameter optimization with configurable objectives and pruning. |
| PyWavelets | Wavelet transforms for the features pipeline. |
| SciPy | Scientific utilities used in feature extraction and signal processing. |
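
The three LightGBM task types listed above correspond to LightGBM's built-in `binary`, `multiclass`, and `regression` objectives. A minimal sketch of selecting the objective per task is shown below; the task keys and the `lightgbm_params` helper are hypothetical, and the platform's real parameter handling is not documented on this page.

```python
from typing import Optional

# Task keys are illustrative; the objective names are LightGBM's
# documented built-in objectives.
OBJECTIVES = {
    "detection": "binary",
    "size": "multiclass",
    "location": "multiclass",
    "flow": "regression",
}


def lightgbm_params(task: str, num_class: Optional[int] = None) -> dict:
    """Build a minimal LightGBM parameter dict for a task (hypothetical)."""
    objective = OBJECTIVES[task]
    params = {"objective": objective}
    if objective == "multiclass":
        # LightGBM requires num_class for the multiclass objective.
        if num_class is None or num_class < 2:
            raise ValueError("multiclass tasks need num_class >= 2")
        params["num_class"] = num_class
    return params


print(lightgbm_params("detection"))
print(lightgbm_params("size", num_class=4))
```

The resulting dict could be passed as the `params` argument of `lightgbm.train`, alongside whatever other hyperparameters the pipeline configures.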

Orchestration & Scheduling

| Technology | Role |
|---|---|
| APScheduler | In-process scheduler for interval, cron, and daily runs (etl_scheduler.py). |
| Prefect 3 | Flow orchestration, deployments, and observability for production. |
| Docker | Optional: run Prefect server, worker, and PostgreSQL via docker-compose.prefect.yml. |

Configuration & Tooling

| Technology | Role |
|---|---|
| YAML | Main format for pipelines_config.yml, ETL scheduler config, and Prefect. |
| JSON | Schemas, metadata, and some config overrides. |
| PyYAML | Loading and parsing of YAML configuration files. |
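
One plausible way to apply JSON overrides on top of a base configuration is a recursive merge, sketched below with the standard library only. The `deep_merge` helper and the example values are assumptions for illustration; the page does not specify how the platform actually combines overrides with the YAML config.

```python
import json


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict with override values layered over base (hypothetical)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Base config as it might look after loading YAML (values are made up).
base_config = {
    "storage": {"type": "local", "output_folder": "data/out"},
    "csv": {"separator": ","},
}
overrides = json.loads('{"storage": {"type": "s3"}, "csv": {"separator": ";"}}')

config = deep_merge(base_config, overrides)
print(config)
```

The merge leaves `base_config` untouched, so the same base can be reused with different override files.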

Development & Quality

| Technology | Role |
|---|---|
| pytest | Unit and integration tests. |
| Black | Code formatting. |
| isort | Import sorting. |
| flake8 / mypy | Linting and optional type checking. |
| pre-commit | Git hooks for format and lint. |

Optional Extras

  • s3fs — S3 support; install with pip install .[s3].
  • Jupyter / JupyterLab — Optional for notebooks; install with pip install .[jupyter].
  • MkDocs + Material — Documentation build; install with pip install .[docs].