Model Validation
Purpose
The platform provides configurable validation of trained models. It checks that the expected artifacts exist, that inference runs and produces well-formed output, and that validation metrics meet configured thresholds (e.g. minimum precision, minimum recall, maximum number of false positives). This supports release gates and regression checks after retraining.
Scripts
- validate_best_model.py: Full validation of a fixed set of artifacts (model file, schema, metrics, inference test).
- validate_model_configurable.py: Same idea, but driven by a YAML config and CLI overrides; can generate a JSON report and is easier to integrate in CI.
Example:
What Gets Validated
- Artifacts: Model file, feature schema, metrics file (and optionally deployment bundle).
- Inference: Run predictions on a small set of samples (e.g. from the features dataset or from raw windows) and check that output shape and range are as expected.
- Metrics vs thresholds: Compare the recorded validation metrics against the configured limits, e.g. precision ≥ 0.9985, recall ≥ 0.9995, false positives ≤ 5.
- Report: Optional JSON report written to a path specified in config.
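The checks listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in validate_model_configurable.py: the function names, the metric keys (`precision`, `recall`, `false_positives`), and the assumption that predictions are scores in [0, 1] are all hypothetical. The threshold values are the examples quoted in the list.

```python
from pathlib import Path

# Example limits quoted in the list above; the key names are hypothetical.
THRESHOLDS = {
    "min_precision": 0.9985,
    "min_recall": 0.9995,
    "max_false_positives": 5,
}


def check_artifacts(paths):
    """Return the artifact paths that do not exist as files."""
    return [str(p) for p in paths if not Path(p).is_file()]


def check_predictions(preds, n_samples, low=0.0, high=1.0):
    """Check output shape (one score per sample) and range [low, high].

    Assumes a 1-D sequence of scores; the real output format may differ.
    """
    if len(preds) != n_samples:
        return False
    return all(low <= float(p) <= high for p in preds)


def check_thresholds(metrics, thresholds):
    """Return a list of threshold violations (empty if everything passes)."""
    failures = []
    if metrics["precision"] < thresholds["min_precision"]:
        failures.append(
            f"precision {metrics['precision']} < {thresholds['min_precision']}"
        )
    if metrics["recall"] < thresholds["min_recall"]:
        failures.append(
            f"recall {metrics['recall']} < {thresholds['min_recall']}"
        )
    if metrics["false_positives"] > thresholds["max_false_positives"]:
        failures.append(
            f"false_positives {metrics['false_positives']} "
            f"> {thresholds['max_false_positives']}"
        )
    return failures
```

Returning a list of violations rather than a single boolean makes it straightforward to include the reason for a failure in the report.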
Config (Conceptual)
In configs/model_validation_config.yml (or equivalent) you typically have:
- model: Path to model file, type (e.g. LightGBM).
- data: Paths to feature schema, feature dataset, raw windows; number of samples for each inference method.
- feature_vectorizer: Same config as in the pipeline (feature columns, wavelet, etc.) so that validation uses the same preprocessing as training.
- validation_thresholds: Min precision, min recall, max false positives, prediction threshold.
- execution: Which checks to run (artifacts, schema, inference method 1, inference method 2, save report) and report path.
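To make the layout concrete, the sketch below mirrors the five sections as a Python dictionary and applies CLI overrides on top of it. Only the top-level section names come from the list above; every nested key, every path, the `prediction_threshold` value, and the override flags (`--min-precision` and so on) are hypothetical placeholders rather than the project's actual schema or CLI. In practice the dictionary would come from the YAML file, for example via PyYAML's `yaml.safe_load`.

```python
import argparse
import copy

# Top-level sections follow the list above. All nested keys and values
# are hypothetical placeholders, not the real config schema.
DEFAULT_CONFIG = {
    "model": {"path": "path/to/model_file", "type": "LightGBM"},
    "data": {
        "feature_schema": "path/to/feature_schema",
        "feature_dataset": "path/to/feature_dataset",
        "raw_windows": "path/to/raw_windows",
        "n_samples": 10,
    },
    # Should match the training pipeline so preprocessing is identical.
    "feature_vectorizer": {},
    "validation_thresholds": {
        "min_precision": 0.9985,
        "min_recall": 0.9995,
        "max_false_positives": 5,
        "prediction_threshold": 0.5,
    },
    "execution": {
        "check_artifacts": True,
        "check_schema": True,
        "save_report": True,
        "report_path": "validation_report.json",
    },
}


def build_config(argv=None):
    """Return a copy of DEFAULT_CONFIG with CLI overrides applied."""
    parser = argparse.ArgumentParser(description="Hypothetical override flags")
    parser.add_argument("--min-precision", type=float)
    parser.add_argument("--min-recall", type=float)
    parser.add_argument("--max-false-positives", type=int)
    parser.add_argument("--report-path")
    args = parser.parse_args(argv)

    # Deep copy so the defaults are never mutated by an override.
    config = copy.deepcopy(DEFAULT_CONFIG)
    thresholds = config["validation_thresholds"]
    for key in ("min_precision", "min_recall", "max_false_positives"):
        value = getattr(args, key)
        if value is not None:
            thresholds[key] = value
    if args.report_path is not None:
        config["execution"]["report_path"] = args.report_path
    return config
```

For example, `build_config(["--min-precision", "0.99"])` relaxes one threshold for a single run while leaving the stored defaults untouched.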
How This Fits the Engineering Philosophy
- Reproducibility: Same config and same data → same validation result.
- By-case split: Training and Optuna already use a by-case split; validation typically runs on the same or a held-out set of cases so metrics are comparable.
- CI/CD: The configurable script and JSON report make it easy to add a “validate model” step after training and fail the build if thresholds are not met.
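One way the CI step could work is sketched below: the validation results are written to a JSON report and reduced to a process exit code. The report layout (`passed`, `checks`) and the function names are assumptions made for illustration; the actual report format produced by the script may differ.

```python
import json
from pathlib import Path


def write_report(report_path, checks):
    """Write a JSON report and return True if every check passed.

    `checks` maps a check name to a boolean result.
    """
    passed = all(checks.values())
    report = {"passed": passed, "checks": checks}
    path = Path(report_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2, sort_keys=True), encoding="utf-8")
    return passed


def exit_code(report_path, checks):
    """Return 0 if all checks passed, 1 otherwise."""
    return 0 if write_report(report_path, checks) else 1
```

A script entry point would then end with `sys.exit(exit_code(report_path, checks))`. Most CI systems treat a non-zero exit code as a failed step, so no extra logic is needed to fail the build. Sorting the keys keeps the report byte-for-byte stable across runs with the same results, which makes diffs between builds easier to read.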
For train/validation split design, see Train/validation split by case. For overfitting, see Overfitting analysis.
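As a quick reminder of what "by-case" means in practice, the sketch below splits records so that all rows belonging to one case end up on the same side, and it uses a fixed seed so that the same data yields the same split. It is a generic illustration, not the project's split implementation; the `case_id` key and the function name are hypothetical.

```python
import random


def split_by_case(records, val_fraction=0.2, seed=0):
    """Split records into (train, val) with no case shared between them."""
    # Sort first so the shuffle depends only on the seed, not on input order.
    case_ids = sorted({r["case_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(case_ids)

    # Hold out at least one case for validation.
    n_val = max(1, round(len(case_ids) * val_fraction))
    val_cases = set(case_ids[:n_val])

    train = [r for r in records if r["case_id"] not in val_cases]
    val = [r for r in records if r["case_id"] in val_cases]
    return train, val
```

Because whole cases are held out, validation metrics are not inflated by near-duplicate rows from a case the model has already seen during training.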