Progressive Distillation
CLI Equivalent: apr distill --method progressive --layers 12 --temperature 4.0 teacher.apr student.apr
What This Demonstrates
Progressive layer-wise distillation transfers knowledge from teacher to student one layer at a time. It combines a layer-wise MSE loss on intermediate hidden states with a final KL-divergence loss on the output logits, giving more faithful knowledge transfer than matching outputs alone.
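The numerical core is small enough to sketch directly. Below is a minimal, self-contained Rust illustration of the two loss terms and their combination, using plain `Vec<f32>` in place of the library's tensor type. The helper names, the T² scaling on the KL term, and the `alpha` weighting are illustrative assumptions, not the library's implementation.

```rust
// Minimal sketch of the combined distillation loss on plain Vec<f32>.
// Helper names and the alpha weighting are illustrative assumptions.

/// Mean-squared error between student and teacher hidden states at one layer.
fn layer_mse(student: &[f32], teacher: &[f32]) -> f32 {
    let n = student.len() as f32;
    student
        .iter()
        .zip(teacher)
        .map(|(s, t)| (s - t).powi(2))
        .sum::<f32>()
        / n
}

/// Temperature-softened softmax over logits (max-subtracted for stability).
fn softmax(logits: &[f32], temperature: f32) -> Vec<f32> {
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits
        .iter()
        .map(|z| ((z - max) / temperature).exp())
        .collect();
    let sum: f32 = exps.iter().sum();
    exps.into_iter().map(|e| e / sum).collect()
}

/// KL(teacher || student) over softened distributions, scaled by T^2
/// as in standard knowledge distillation.
fn kl_divergence(student_logits: &[f32], teacher_logits: &[f32], t: f32) -> f32 {
    let p = softmax(teacher_logits, t);
    let q = softmax(student_logits, t);
    t * t
        * p.iter()
            .zip(&q)
            .map(|(pi, qi)| pi * (pi / qi).ln())
            .sum::<f32>()
}

fn main() {
    // Toy hidden states for one aligned layer and toy output logits.
    let student_hidden = vec![0.1, 0.4, -0.2];
    let teacher_hidden = vec![0.2, 0.5, -0.1];
    let student_logits = vec![1.0, 0.5, -0.5];
    let teacher_logits = vec![1.2, 0.3, -0.6];

    let mse = layer_mse(&student_hidden, &teacher_hidden);
    let kl = kl_divergence(&student_logits, &teacher_logits, 4.0);

    // Hypothetical convex combination; the library's combined_loss
    // presumably weights the two terms similarly (alpha is an assumption).
    let alpha = 0.5;
    let combined = alpha * mse + (1.0 - alpha) * kl;
    println!("mse={mse:.4} kl={kl:.4} combined={combined:.4}");
}
```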
Run
cargo run --example distill_progressive
Key APIs
- `ProgressiveDistiller::uniform(num_layers, temperature)` -- create layer-aligned distiller
- `.layer_wise_mse_loss(&student_hidden, &teacher_hidden)` -- intermediate representation matching
- `.combined_loss(mse_loss, kl_loss)` -- weighted combination of layer and output losses
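For orientation, here is a hedged sketch of how these three calls might fit together in one training step. Only the three method names come from the list above; the crate path, the `Tensor` alias, the per-layer summation, and the loop structure are assumptions, not verified against the crate.

```rust
// Usage sketch only -- the import path and types below are assumptions;
// the three method calls mirror the Key APIs listed above.
use aprender::distill::ProgressiveDistiller; // assumed module path

type Tensor = Vec<f32>; // stand-in for the library's tensor type

fn distill_step(
    distiller: &ProgressiveDistiller,
    student_hidden: &[Tensor], // per-layer student hidden states
    teacher_hidden: &[Tensor], // aligned per-layer teacher hidden states
    kl_loss: f32,              // KL divergence on the final logits
) -> f32 {
    // Sum the layer-wise MSE over aligned layers (assumed reduction).
    let mse_loss: f32 = student_hidden
        .iter()
        .zip(teacher_hidden)
        .map(|(s, t)| distiller.layer_wise_mse_loss(s, t))
        .sum();

    // Weighted combination of intermediate and output losses.
    distiller.combined_loss(mse_loss, kl_loss)
}

fn main() {
    // 12 aligned layers at temperature 4.0, matching the CLI flags above.
    let distiller = ProgressiveDistiller::uniform(12, 4.0);
    // ...run teacher and student forward passes, collect hidden states
    // and logits, then call distill_step on each batch...
    let _ = &distiller;
}
```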