Structured Pruning
Remove entire neurons, attention heads, or layers while maintaining model quality through distillation.
cargo run --example prune_structured
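To make the idea of removing whole neurons concrete, here is a minimal sketch in plain Rust. It is not the code behind the `prune_structured` example. The `Dense` type and the `neuron_scores`, `top_neurons`, and `prune_hidden` functions are hypothetical names introduced only for illustration, and ranking neurons by the L2 norm of their weights is one common importance heuristic assumed here, not necessarily the criterion this project uses. The distillation step that recovers quality afterwards is not shown.

```rust
/// Hypothetical dense layer: `weights[i]` is the weight row of output unit `i`.
#[derive(Debug, Clone, PartialEq)]
pub struct Dense {
    pub weights: Vec<Vec<f32>>, // shape: [out_features][in_features]
    pub bias: Vec<f32>,         // shape: [out_features]
}

/// Importance score per output unit: the L2 norm of its weight row (assumed heuristic).
pub fn neuron_scores(layer: &Dense) -> Vec<f32> {
    layer
        .weights
        .iter()
        .map(|row| row.iter().map(|w| w * w).sum::<f32>().sqrt())
        .collect()
}

/// Indices of the `keep` highest-scoring neurons, returned in ascending index order.
pub fn top_neurons(scores: &[f32], keep: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..scores.len()).collect();
    // Sort by descending score; the sort is stable, so ties keep the lower index first.
    idx.sort_by(|&a, &b| scores[b].total_cmp(&scores[a]));
    idx.truncate(keep);
    idx.sort_unstable();
    idx
}

/// Prune the hidden units between two consecutive layers.
/// Removing hidden unit `i` drops row `i` (and bias `i`) of `first`
/// and column `i` of `second`, so the two shapes stay compatible.
pub fn prune_hidden(first: &Dense, second: &Dense, keep: usize) -> (Dense, Dense) {
    let kept = top_neurons(&neuron_scores(first), keep);
    let pruned_first = Dense {
        weights: kept.iter().map(|&i| first.weights[i].clone()).collect(),
        bias: kept.iter().map(|&i| first.bias[i]).collect(),
    };
    let pruned_second = Dense {
        weights: second
            .weights
            .iter()
            .map(|row| kept.iter().map(|&i| row[i]).collect())
            .collect(),
        bias: second.bias.clone(),
    };
    (pruned_first, pruned_second)
}
```

Calling `prune_hidden(&first, &second, keep)` returns two smaller layers whose shapes still match each other. This is what distinguishes structured pruning from zeroing individual weights: the resulting matrices are genuinely smaller and dense, so no sparse kernels are needed to benefit from the reduction.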