CGP — aprender-cgp Compute-GPU-Profile
Recipes for aprender-cgp v0.31.2, a cross-backend kernel profiler. Run the same kernel through the scalar, SIMD, wgpu, and CUDA paths and get a unified report with throughput, latency, an energy estimate, and roofline-model placement.
Satisfies the requirement of at least 3 recipes per sister crate from expand-cookbooks/subcrate-coverage.md.
Recipes
| # | Recipe | What |
|---|---|---|
| CGP.1 | cgp_regression_detector_baseline_vs_current | Bootstrap CI regression detector (Hoefler & Belli SC'15); 10% slowdown → Verdict::Regression |
| CGP.2 | cgp_roofline_classify_kernel | Synthetic RTX 4090 roofline; classify low-AI/high-AI kernels as memory-bound vs compute-bound |
| CGP.3 | cgp_roofline_ridge_point_per_precision | Ridge points across FP32/TF32/BF16/FP16/INT8; INT8 ridge = 2× FP16 ridge |
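
The roofline recipes (CGP.2 and CGP.3) rest on two formulas: the ridge point is peak compute divided by memory bandwidth, and a kernel is memory-bound when its arithmetic intensity (operations per byte moved) falls below that ridge. The sketch below illustrates the idea in plain Rust. It does not use the cgp crate: `Roofline`, `Precision`, `Bound`, and `synthetic()` are hypothetical stand-ins that only echo the names in the table, and the peak and bandwidth figures are round illustrative numbers, not RTX 4090 specifications.

```rust
/// Arithmetic precision of a kernel (hypothetical stand-in, not the cgp type).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Precision {
    Fp32,
    Fp16,
    Int8,
}

/// Which roof limits a kernel (hypothetical stand-in, not the cgp type).
#[derive(Debug, PartialEq)]
enum Bound {
    Memory,
    Compute,
}

/// Minimal roofline: one memory bandwidth and one compute peak per precision.
struct Roofline {
    mem_bw_gbs: f64,     // memory bandwidth in GB/s
    peak_fp32_gops: f64, // peak throughput in Gop/s
    peak_fp16_gops: f64,
    peak_int8_gops: f64,
}

impl Roofline {
    /// Illustrative round numbers only; NOT real hardware specifications.
    fn synthetic() -> Self {
        Roofline {
            mem_bw_gbs: 1_000.0,
            peak_fp32_gops: 80_000.0,
            peak_fp16_gops: 320_000.0,
            peak_int8_gops: 640_000.0,
        }
    }

    fn peak_gops(&self, p: Precision) -> f64 {
        match p {
            Precision::Fp32 => self.peak_fp32_gops,
            Precision::Fp16 => self.peak_fp16_gops,
            Precision::Int8 => self.peak_int8_gops,
        }
    }

    /// Ridge point in ops/byte: the intensity at which both roofs meet.
    fn ridge_point(&self, p: Precision) -> f64 {
        self.peak_gops(p) / self.mem_bw_gbs
    }

    /// Attainable throughput in Gop/s: min(peak, intensity * bandwidth).
    fn attainable_gops(&self, p: Precision, intensity: f64) -> f64 {
        (intensity * self.mem_bw_gbs).min(self.peak_gops(p))
    }

    /// Below the ridge the memory roof is the lower one, so the kernel is memory-bound.
    fn classify(&self, p: Precision, intensity: f64) -> Bound {
        if intensity < self.ridge_point(p) {
            Bound::Memory
        } else {
            Bound::Compute
        }
    }
}
```

With these synthetic numbers, a streaming kernel at 0.25 ops/byte is classified as `Bound::Memory` for FP32, while a kernel at 200 ops/byte is `Bound::Compute`. Because the INT8 peak is set to twice the FP16 peak over the same bandwidth, `ridge_point(Precision::Int8)` comes out at exactly twice `ridge_point(Precision::Fp16)`, which is the relationship CGP.3 checks.
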
API surface exercised
- cgp::analysis::regression::{RegressionDetector, Verdict}: bootstrap CIs
- cgp::analysis::roofline::{RooflineModel, Precision, MemoryLevel, Bound}
GPU backends (wgpu, cuda) are gated behind cargo features and skipped on the cookbook's CI runner; the scalar baseline is always exercised.
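
The bootstrap-CI check behind the regression API listed above can be sketched without any GPU or external crate. The code below is a minimal illustration under stated assumptions, not the crate's implementation: the local `Verdict` enum, `detect`, `bootstrap_ratio_ci`, the 2000-resample count, the fixed seed, and the choice of the mean as the statistic are all hypothetical. It resamples both timing sets with replacement, builds a 95% percentile interval for the ratio current/baseline, and reports a regression only when the whole interval lies above `1.0 + tolerance`.

```rust
/// Outcome of a comparison (hypothetical stand-in, not the cgp type).
#[derive(Debug, PartialEq)]
enum Verdict {
    Regression,
    Improvement,
    NoChange,
}

/// Tiny xorshift64 PRNG so the sketch needs no external crates.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Pseudo-random index in 0..n (slight modulo bias is fine for a sketch).
    fn index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Mean of one bootstrap resample (same size, drawn with replacement).
fn resampled_mean(samples: &[f64], rng: &mut XorShift64) -> f64 {
    let n = samples.len();
    let sum: f64 = (0..n).map(|_| samples[rng.index(n)]).sum();
    sum / n as f64
}

/// 95% percentile-bootstrap interval for mean(current) / mean(baseline).
/// Assumes both slices are non-empty and `resamples` is at least 1.
fn bootstrap_ratio_ci(
    baseline: &[f64],
    current: &[f64],
    resamples: usize,
    seed: u64,
) -> (f64, f64) {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let mut ratios: Vec<f64> = (0..resamples)
        .map(|_| {
            let b = resampled_mean(baseline, &mut rng);
            let c = resampled_mean(current, &mut rng);
            c / b
        })
        .collect();
    ratios.sort_by(|a, b| a.total_cmp(b));
    let lo = ratios[(0.025 * resamples as f64) as usize];
    let hi = ratios[((0.975 * resamples as f64) as usize).min(resamples - 1)];
    (lo, hi)
}

/// Classify the change; `tolerance` is the smallest relative change that matters.
fn detect(baseline: &[f64], current: &[f64], tolerance: f64) -> Verdict {
    let (lo, hi) = bootstrap_ratio_ci(baseline, current, 2000, 42);
    if lo > 1.0 + tolerance {
        Verdict::Regression
    } else if hi < 1.0 - tolerance {
        Verdict::Improvement
    } else {
        Verdict::NoChange
    }
}
```

For example, with baseline timings between 99 and 101 and current timings 10% higher, `detect(&baseline, &current, 0.05)` returns `Verdict::Regression`, and comparing the baseline against itself returns `Verdict::NoChange`. Requiring the entire interval to clear the tolerance, rather than just the point estimate, is what keeps run-to-run noise from being flagged as a slowdown.
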
Provenance
Added during PMAT-083 (expand-cookbooks initiative, v6.1.0).