Aspect Coverage Matrix
This matrix maps the requested ML workflow aspects to concrete showcase projects, commands, and artifacts.
| Aspect | Where It Is Implemented | How to Run | Evidence Artifact(s) |
|---|---|---|---|
| Data profiling (ydata-profiling) | projects/eda-leakage-profiling-showcase, projects/feature-engineering-dimred-showcase | make sync-profiling && make run | artifacts/eda/profile_status.txt |
| Univariate analysis | EDA + feature engineering showcases via shared EDA utilities | make run | artifacts/eda/univariate_summary.csv |
| Bivariate analysis vs target | EDA + feature engineering showcases | make run | artifacts/eda/bivariate_vs_target.csv |
| Missing information visualization | EDA + feature engineering showcases | make sync-profiling && make run | artifacts/eda/missingness_summary.csv, artifacts/eda/missing_plot_status.txt |
| Train/val/test split enforcement | Shared split contract for supervised projects | make check-contracts | artifacts/splits/split_manifest.json with train_rows, val_rows, test_rows |
| Split strategy coverage (stratified/group/time/CV) | EDA leakage showcase + shared split helpers | make run in EDA showcase | artifacts/splits/group_split_manifest.json, artifacts/splits/timeseries_split_manifest.json, artifacts/splits/cv_split_manifest.json |
| Data type analysis (numeric/categorical) | Feature engineering preprocessing pipelines | make run | artifacts/features/feature_matrix_summary.csv |
| Categorical encodings (One-Hot + Label/Ordinal) | projects/feature-engineering-dimred-showcase | make run | Encoded matrix from preprocessing pipeline |
| Entity embeddings | Advanced FE runner (embedding proxy output) | make run-advanced | artifacts/advanced/entity_embeddings.csv |
| Advanced feature engineering (featuretools, tsfresh, autofeat) | Advanced FE runner with optional dependencies | make sync-advanced && make run-advanced | artifacts/advanced/featuretools_status.txt, artifacts/advanced/tsfresh_status.txt, artifacts/advanced/autofeat_status.txt |
| Distribution shift / drift | projects/mlops-drift-production-showcase | make run && make run-drift | Drift monitor outputs in artifacts/drift/ |
| Time-aware demand forecasting | projects/nyc-demand-forecasting-foundations-showcase | make run | artifacts/eval/metrics_summary.csv, artifacts/splits/time_split_manifest.json |
| Imbalanced dataset handling | projects/sota-supervised-learning-showcase, projects/credit-risk-classification-capstone-showcase | make run | Strategy comparison outputs and threshold-aware metrics |
| Correlations/distributions/densities | EDA + feature engineering showcases | make run | artifacts/eda/correlation_matrix.csv |
| Information leakage analysis | Shared leakage utilities in supervised pipelines | make run | artifacts/leakage/leakage_report.csv |
| Imputation techniques | Feature engineering preprocessing | make run | Pipeline uses median + most-frequent imputers |
| Over/Under sampling + SMOTE hybrids | Supervised showcase data utilities | make sync-boosting (for optional libs), make run | Strategy-level metrics and logs |
| Dimensionality reduction / feature subset selection | Feature engineering + dimred showcase | make run && make run-dimred | artifacts/selection/selection_scores.csv, artifacts/dimred/embedding_quality_metrics.csv |
| SoTA modeling (XGBoost/LightGBM/CatBoost/Deep/Stacking) | Supervised showcase classification benchmark | make sync-boosting && make run | artifacts/classification_benchmark.csv |
| Learning-to-rank modeling (LambdaRank + NDCG) | projects/learning-to-rank-foundations-showcase | make run | artifacts/eval/ranking_metrics.json, artifacts/splits/group_split_manifest.json |
| Overfitting/bias-aware evaluation (ROC/PR/Learning/Threshold, RMSE/MAE/R²) | Supervised, EDA, and related evaluation pipelines | make run | artifacts/eval/metrics_summary.csv, artifacts/eval/threshold_analysis.csv, learning/validation curve artifacts |
| Explainability (SHAP/LIME) | projects/xai-fairness-audit-showcase | make sync-explainability && make run-explainability | artifacts/explainability/shap_status.txt, artifacts/explainability/lime_status.txt |
| Hyperparameter optimization (HyperOpt/Optuna) | projects/automl-hpo-showcase | make run-advanced | artifacts/hpo/trials.csv, artifacts/hpo/strategy_comparison.csv |
| Experiment tracking (MLflow) | AutoML and MLOps showcases | make run-advanced (AutoML), make run-tracking (MLOps) | artifacts/hpo/mlflow_status.txt, artifacts/tracking/mlflow_status.txt |
| Productionization examples | MLOps serving + ranking API productization + demand API observability + rollout/systems showcases | make serve (MLOps), make dev + make export-openapi (ranking API / demand API) | openapi.json, artifacts/registry/model_versions.json, http_requests_total metrics endpoint output, rollout decision logs, serving and monitoring artifacts |
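
The evidence column can be spot-checked after a run by testing whether the listed files exist. The following is a minimal sketch, not part of the repository: the helper name `missing_artifacts` and the `EXPECTED_ARTIFACTS` list are hypothetical, the paths are copied from a few rows of the matrix, and it assumes each path is relative to the showcase project directory that produced it.

```python
from pathlib import Path

# Evidence paths copied from a few rows of the matrix.
# Assumption: each path is relative to the showcase project that produced it.
EXPECTED_ARTIFACTS = [
    "artifacts/eda/univariate_summary.csv",
    "artifacts/eda/bivariate_vs_target.csv",
    "artifacts/eda/correlation_matrix.csv",
    "artifacts/leakage/leakage_report.csv",
    "artifacts/splits/split_manifest.json",
]


def missing_artifacts(project_root, relative_paths=EXPECTED_ARTIFACTS):
    """Return the relative paths that are not regular files under project_root."""
    root = Path(project_root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]
```

Calling `missing_artifacts("projects/eda-leakage-profiling-showcase")` after `make run` would return an empty list if every listed file was written, and the names of the absent files otherwise.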
Contract Enforcement
- make check-contracts now bootstraps missing supervised artifacts in quick mode and validates:
  - split manifests,
  - EDA summaries,
  - leakage reports,
  - evaluation outputs,
  - experiment logs.
- CI uses the same contract verifier path (shared/scripts/verify_supervised_contract.py --bootstrap-missing) to avoid clean-checkout failures.
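
To illustrate what a split-manifest check might involve, here is a minimal sketch. It is not the logic of shared/scripts/verify_supervised_contract.py, whose internals are not shown here. The function names are hypothetical; the only grounded detail is that artifacts/splits/split_manifest.json carries train_rows, val_rows, and test_rows. Requiring each count to be a positive integer is an assumption made for the example.

```python
import json
from pathlib import Path

# Keys the matrix lists for artifacts/splits/split_manifest.json.
REQUIRED_KEYS = ("train_rows", "val_rows", "test_rows")


def check_split_manifest(manifest):
    """Return a list of problems; an empty list means the manifest passes."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in manifest:
            problems.append(f"missing key: {key}")
            continue
        value = manifest[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            problems.append(f"{key} must be an integer, got {type(value).__name__}")
        elif value <= 0:
            # Assumption: an empty split counts as a contract violation.
            problems.append(f"{key} must be positive, got {value}")
    return problems


def check_split_manifest_file(path):
    """Load a JSON manifest from disk and run the same checks on it."""
    with Path(path).open(encoding="utf-8") as handle:
        return check_split_manifest(json.load(handle))
```

Returning a list of problems rather than raising on the first one lets a caller report every violation in a single pass, which tends to be more useful in CI output.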