PTX Kernel Analysis

Maps a 7B model inference to its 12-step CUDA PTX kernel execution sequence, computes roofline analysis per kernel, and detects performance issues (low occupancy, excessive shared memory, uncoalesced access patterns).

CLI Equivalent

apr ptx_map model.apr && apr ptx_explain model.apr

Key Concepts

CUDA PTX kernel mapping for transformer inference
Roofline analysis per kernel (compute vs memory bound)
Performance issue detection: occupancy, shared memory, coalescing

Run

cargo run --example gpu_ptx_analysis

Source

examples/gpu/gpu_ptx_analysis/main.rs

APR Cookbook - Idiomatic Rust Patterns for ML Model Deployment

PTX Kernel Analysis

CLI Equivalent

Key Concepts

Run

Source