Fake Quantization-Aware Training
CLI Equivalent: apr quantize --method qat --bits 4 --fake model.apr
What This Demonstrates
Fake quantization-aware training (QAT) simulates quantization error during training so the model learns to be robust to it. Fake-quantize/dequantize operations are inserted into the forward pass, while full-precision weights are kept for gradient computation.
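The core operation can be sketched in plain Rust. This is a standalone illustration of the quantize-then-dequantize round trip under assumed symmetric per-tensor semantics, not the library's actual implementation:

```rust
// Fake-quantize: snap each value to a signed `bits`-bit integer grid,
// then map it straight back to f32. The output stays floating-point but
// carries the rounding/clamping error that real quantization would add.
fn fake_quantize(input: &[f32], bits: u32) -> Vec<f32> {
    // Symmetric integer range: for 4 bits, levels span [-8, 7].
    let qmax = (1i32 << (bits - 1)) - 1;
    let qmin = -(1i32 << (bits - 1));
    let max_abs = input.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    if max_abs == 0.0 {
        return input.to_vec(); // all-zero tensor: nothing to quantize
    }
    let scale = max_abs / qmax as f32;
    input
        .iter()
        .map(|&x| {
            // Quantize: round to the nearest level, clamp to the grid.
            let q = (x / scale).round().clamp(qmin as f32, qmax as f32);
            // Dequantize: back to f32, now with quantization error baked in.
            q * scale
        })
        .collect()
}

fn main() {
    let weights = [0.9f32, -0.45, 0.12, 0.0];
    println!("{:?}", fake_quantize(&weights, 4));
}
```

Because the round trip happens inside the forward pass, the loss already reflects 4-bit error, and training adjusts the full-precision weights to compensate.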
Run
cargo run --example quantize_fake_qat
Key APIs
FakeQuantize::new(bits, per_channel) -- create fake quantization module
.forward(tensor) -- quantize then immediately dequantize (simulates error)
.observer() -- track min/max ranges for calibration
convert_fake_to_real(model) -- replace fake-quant ops with actual Int4 quantization
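To illustrate what the observer step does, here is a hypothetical min/max observer in plain Rust (the name `MinMaxObserver` and its methods are assumptions, not the library's API): it tracks the running range of values so calibration can derive a scale and zero-point before converting to real Int4.

```rust
// Hypothetical min/max observer sketch: accumulates the value range
// across batches, then derives asymmetric quantization parameters.
#[derive(Debug)]
struct MinMaxObserver {
    min: f32,
    max: f32,
}

impl MinMaxObserver {
    fn new() -> Self {
        Self { min: f32::INFINITY, max: f32::NEG_INFINITY }
    }

    // Update the running range from a batch of activations.
    fn observe(&mut self, batch: &[f32]) {
        for &x in batch {
            self.min = self.min.min(x);
            self.max = self.max.max(x);
        }
    }

    // Derive scale and zero-point for `bits`-bit asymmetric quantization.
    fn scale_zero_point(&self, bits: u32) -> (f32, i32) {
        let levels = (1u32 << bits) - 1; // 15 levels of spacing for 4 bits
        let scale = (self.max - self.min) / levels as f32;
        let zero_point = (-self.min / scale).round() as i32;
        (scale, zero_point)
    }
}

fn main() {
    let mut obs = MinMaxObserver::new();
    obs.observe(&[-1.0, 0.5, 2.0]);
    obs.observe(&[0.25, 1.5]);
    let (scale, zp) = obs.scale_zero_point(4);
    println!("scale = {scale}, zero_point = {zp}");
}
```

A conversion step like `convert_fake_to_real` would then use these calibrated parameters to replace each fake-quant op with genuine Int4 storage and arithmetic.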