Standard KL Distillation
CLI Equivalent: apr distill --method kl --temperature 4.0 --alpha 0.7 teacher.apr student.apr
What This Demonstrates
Standard knowledge distillation using KL divergence to transfer knowledge from a large teacher model to a smaller student. The objective balances a soft-label loss (KL divergence against the teacher's temperature-softened logits) with a hard-label loss (cross-entropy against the ground-truth labels), weighted by alpha.
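A minimal sketch of that combined objective in plain Rust is below. The distillation_loss helper name and the T^2 scaling on the soft term are assumptions following the standard Hinton-style formulation, not the crate's verified code; only softmax_with_temperature appears in the Key APIs list further down.

```rust
/// Temperature-scaled softmax: softmax(logits / temp), with max-subtraction
/// for numerical stability.
fn softmax_with_temperature(logits: &[f64], temp: f64) -> Vec<f64> {
    let max = logits.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<f64> = logits.iter().map(|z| ((z - max) / temp).exp()).collect();
    let sum: f64 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// Per-example distillation loss (illustrative helper, not the crate API):
/// alpha * T^2 * KL(teacher_T || student_T) + (1 - alpha) * CE(label, student).
fn distillation_loss(
    student_logits: &[f64],
    teacher_logits: &[f64],
    label: usize,
    temp: f64,
    alpha: f64,
) -> f64 {
    let p_t = softmax_with_temperature(teacher_logits, temp);
    let p_s = softmax_with_temperature(student_logits, temp);
    // Soft-label term: KL divergence between the temperature-softened
    // distributions, scaled by T^2 so its gradient magnitude stays on the
    // same order as the hard-label term.
    let kl: f64 = p_t
        .iter()
        .zip(&p_s)
        .filter(|(t, _)| **t > 0.0)
        .map(|(t, s)| t * (t / s).ln())
        .sum();
    // Hard-label term: cross-entropy against the ground-truth class at T = 1.
    let ce = -softmax_with_temperature(student_logits, 1.0)[label].ln();
    alpha * temp * temp * kl + (1.0 - alpha) * ce
}

fn main() {
    // Matches the CLI equivalent: --temperature 4.0 --alpha 0.7.
    let teacher = [2.0_f64, 1.0, 0.1];
    let student = [1.5_f64, 0.8, 0.3];
    println!("loss = {:.4}", distillation_loss(&student, &teacher, 0, 4.0, 0.7));
}
```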
Run
cargo run --example distill_standard_kl
Key APIs
- DistillationLoss::new(temperature, alpha) -- configure temperature scaling and loss weighting.
- forward(&student_logits, &teacher_logits, &labels) -- compute combined distillation loss.
- softmax_with_temperature(logits, temp) -- temperature-scaled softmax.
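A hedged sketch of how these calls might fit together, building on the softmax_with_temperature and distillation_loss helpers from the earlier sketch. The struct fields, slice-based signatures, and batch averaging are assumptions; only the type and method names come from the list above.

```rust
// Hypothetical API shape; not the crate's verified implementation.
struct DistillationLoss {
    temperature: f64,
    alpha: f64,
}

impl DistillationLoss {
    fn new(temperature: f64, alpha: f64) -> Self {
        Self { temperature, alpha }
    }

    /// Mean combined loss over a batch, delegating to the per-example
    /// distillation_loss helper from the earlier sketch.
    fn forward(
        &self,
        student_logits: &[Vec<f64>],
        teacher_logits: &[Vec<f64>],
        labels: &[usize],
    ) -> f64 {
        let total: f64 = student_logits
            .iter()
            .zip(teacher_logits)
            .zip(labels)
            .map(|((s, t), &y)| distillation_loss(s, t, y, self.temperature, self.alpha))
            .sum();
        total / labels.len() as f64
    }
}

// Usage, mirroring the CLI flags --temperature 4.0 --alpha 0.7:
// let loss_fn = DistillationLoss::new(4.0, 0.7);
// let loss = loss_fn.forward(&student_logits, &teacher_logits, &labels);
```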