
Model Validation

Purpose

The platform provides configurable validation of trained models. It checks that the expected artifacts exist, that inference runs and produces well-formed output, and that validation metrics meet their thresholds (e.g. minimum precision, minimum recall, maximum number of false positives). This supports release gates and regression checks after retraining.


Scripts

  • validate_best_model.py: Full validation of a fixed set of artifacts (model file, schema, metrics, inference test).
  • validate_model_configurable.py: The same checks, driven by a YAML config with CLI overrides. It can write a JSON report, which makes it easier to integrate into CI.

Example:

python scripts/validate_model_configurable.py --config configs/model_validation_config.yml

What Gets Validated

  • Artifacts: Model file, feature schema, metrics file (and optionally deployment bundle).
  • Inference: Run predictions on a small set of samples (e.g. from the features dataset or from raw windows) and check that output shape and range are as expected.
  • Metrics vs thresholds: Each metric is compared against its configured limit, e.g. precision ≥ 0.9985, recall ≥ 0.9995, false positives ≤ 5.
  • Report: Optional JSON report written to a path specified in config.
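
The inference and threshold checks listed above can be sketched roughly as follows. This is a minimal illustration, not the scripts' actual code: the function names, the metric and threshold key names, and the assumption that the model outputs one probability in [0, 1] per sample are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def check_predictions(preds: Sequence[float], expected_count: int) -> List[str]:
    """Return the problems found in the inference output (empty list if none)."""
    problems = []
    # Shape check: one prediction per input sample.
    if len(preds) != expected_count:
        problems.append(f"expected {expected_count} predictions, got {len(preds)}")
    # Range check: assumes scores are probabilities, so finite and within [0, 1].
    for i, p in enumerate(preds):
        if not math.isfinite(p) or not 0.0 <= p <= 1.0:
            problems.append(f"prediction {i} out of range: {p}")
    return problems


def check_metrics(metrics: Dict[str, float], thresholds: Dict[str, float]) -> List[str]:
    """Compare validation metrics against limits; key names are assumptions."""
    problems = []
    if metrics["precision"] < thresholds["min_precision"]:
        problems.append(f"precision {metrics['precision']} is below the minimum")
    if metrics["recall"] < thresholds["min_recall"]:
        problems.append(f"recall {metrics['recall']} is below the minimum")
    if metrics["false_positives"] > thresholds["max_false_positives"]:
        problems.append(f"false positives {metrics['false_positives']} exceed the maximum")
    return problems
```

Returning a list of problems rather than raising on the first failure lets a single run report every failed check, which is convenient when the result is written to a report.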

Config (Conceptual)

In configs/model_validation_config.yml (or equivalent) you typically have:

  • model: Path to model file, type (e.g. LightGBM).
  • data: Paths to the feature schema, the feature dataset, and the raw windows; number of samples to use for each inference method.
  • feature_vectorizer: Same config as in the pipeline (feature columns, wavelet, etc.) so that validation uses the same preprocessing as training.
  • validation_thresholds: Min precision, min recall, max false positives, prediction threshold.
  • execution: Which checks to run (artifacts, schema, inference method 1, inference method 2, save report) and report path.
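
To make the layout above concrete, here is a sketch of how the parsed config might look as a Python dictionary, together with a helper that applies CLI-style overrides. Only the five top-level section names and the example threshold values come from this page; every inner key, path, and value (including the wavelet name and the prediction threshold) is a placeholder assumption, and `missing_sections` and `apply_overrides` are hypothetical helpers, not functions of the actual script.

```python
import copy
from typing import Any, Dict, List

REQUIRED_SECTIONS = ("model", "data", "feature_vectorizer",
                     "validation_thresholds", "execution")

# Hypothetical parsed form of configs/model_validation_config.yml.
EXAMPLE_CONFIG: Dict[str, Any] = {
    "model": {"path": "path/to/model_file", "type": "LightGBM"},
    "data": {
        "feature_schema": "path/to/feature_schema",
        "feature_dataset": "path/to/feature_dataset",
        "raw_windows": "path/to/raw_windows",
        "n_samples_features": 100,   # placeholder sample counts
        "n_samples_raw": 10,
    },
    "feature_vectorizer": {"feature_columns": ["col_a", "col_b"], "wavelet": "db4"},
    "validation_thresholds": {
        "min_precision": 0.9985,
        "min_recall": 0.9995,
        "max_false_positives": 5,
        "prediction_threshold": 0.5,  # placeholder value
    },
    "execution": {"check_artifacts": True, "save_report": True,
                  "report_path": "path/to/report.json"},
}


def missing_sections(config: Dict[str, Any]) -> List[str]:
    """List the required top-level sections that are absent from the config."""
    return [name for name in REQUIRED_SECTIONS if name not in config]


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of config with dotted-key overrides applied."""
    merged = copy.deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = merged
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return merged
```

For example, `apply_overrides(EXAMPLE_CONFIG, {"validation_thresholds.min_recall": 0.999})` returns a new config with a relaxed recall limit and leaves the original untouched.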

How This Fits the Engineering Philosophy

  • Reproducibility: Same config and same data → same validation result.
  • By-case split: Training and Optuna tuning already split the data by case; validation typically runs on the same cases or on a held-out set of cases, so metrics remain comparable.
  • CI/CD: The configurable script and JSON report make it easy to add a “validate model” step after training and fail the build if thresholds are not met.
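
As an illustration of the CI/CD point, a build step could turn the JSON report into an exit code along these lines. The report layout assumed here (a boolean `passed` field and a `failures` list) is hypothetical, as is the function name; the real report written by validate_model_configurable.py may be structured differently.

```python
import json
import sys
from pathlib import Path


def gate_exit_code(report_path: Path) -> int:
    """Map a validation report to a process exit code (0 = pass)."""
    if not report_path.exists():
        print(f"report not found: {report_path}", file=sys.stderr)
        return 2
    report = json.loads(report_path.read_text(encoding="utf-8"))
    failures = report.get("failures", [])
    # Pass only when the report says so explicitly and lists no failures.
    if report.get("passed") is True and not failures:
        return 0
    for failure in failures:
        print(f"FAILED: {failure}", file=sys.stderr)
    return 1
```

A CI job would call `sys.exit(gate_exit_code(Path(...)))` with the report path from the config; any non-zero exit code then fails the build. Treating a missing report as a failure keeps a crashed validation run from passing silently.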

For train/validation split design, see Train/validation split by case. For overfitting, see Overfitting analysis.