Technique Interaction Matrix
Techniques are not independent. Order matters.
┌────────────────────────────────────────────────┐
│ TECHNIQUE INTERACTION MATRIX                   │
│                                                │
│ Column   │ distill   merge    prune   finetune │
│ THEN     │                                     │
│ Row ↓    │                                     │
│──────────┼─────────────────────────────────────│
│ distill  │    —      ✗bad     ✓ok     ✗bad     │
│ merge    │   ✓ok      —       ✓ok    ✓✓best    │
│ prune    │   ✓ok     ✓ok       —      ✗bad     │
│ finetune │  ✓✓best   ✓ok     ✗bad      —       │
│ quantize │   ✓ok     ✓ok      ✓ok     ✓ok      │
└────────────────────────────────────────────────┘
Legend: Read as "column THEN row" (column happens first)
✓✓best = Optimal ordering
✓ok = Works but not optimal
✗bad = Harmful (degrades quality or wastes compute)
Key asymmetries:
distill→finetune = ✓✓best (adapt distilled knowledge to task)
finetune→distill = ✗bad (distillation overwrites fine-tuned specialization)
finetune→merge = ✓✓best (merge specialized variants)
merge→finetune = ✓ok (works, but fine-tuning after the merge sacrifices some of the diversity the merge captured)
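The matrix and its asymmetries can be encoded directly as a lookup table. A minimal sketch (the `INTERACTION` dict and `interaction` function are hypothetical names; ratings are transcribed from the matrix above):

```python
# Pairwise interactions keyed as (first, then): the first technique is
# applied, then the second. "best"/"ok"/"bad" mirror ✓✓best/✓ok/✗bad.
# "quantize" never appears first: the matrix has no quantize column
# because quantizing before anything else is harmful.
INTERACTION = {
    ("distill", "merge"): "ok",     ("distill", "prune"): "ok",
    ("distill", "finetune"): "best", ("distill", "quantize"): "ok",
    ("merge", "distill"): "bad",    ("merge", "prune"): "ok",
    ("merge", "finetune"): "ok",    ("merge", "quantize"): "ok",
    ("prune", "distill"): "ok",     ("prune", "merge"): "ok",
    ("prune", "finetune"): "bad",   ("prune", "quantize"): "ok",
    ("finetune", "distill"): "bad", ("finetune", "merge"): "best",
    ("finetune", "prune"): "bad",   ("finetune", "quantize"): "ok",
}

def interaction(first, then):
    """Rating ("best"/"ok"/"bad") for applying `first`, then `then`."""
    if first == then:
        raise ValueError("a technique has no interaction with itself")
    if first == "quantize":
        return "bad"  # quantize-first compounds loss through later steps
    return INTERACTION[(first, then)]
```

For example, `interaction("distill", "finetune")` returns `"best"` while the reversed order returns `"bad"`, capturing the key asymmetry listed above.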
Golden ordering: distill → finetune → merge → prune → quantize
Rationale:
- Distill first — Knowledge transfer works best on an unmodified student architecture
- Finetune second — LoRA adapts the distilled weights to target benchmarks
- Merge third — Combine fine-tuned variants while representations are still rich
- Prune fourth — Remove redundancy AFTER merging (merged models have more redundancy)
- Quantize last — Always final step; quantization is lossy and non-reversible
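Applied to any subset of the five techniques, the golden ordering amounts to a sort by fixed rank. A minimal sketch (`GOLDEN_ORDER` and `plan` are hypothetical names):

```python
# The golden ordering from the rationale above, as a fixed rank list.
GOLDEN_ORDER = ["distill", "finetune", "merge", "prune", "quantize"]

def plan(stages):
    """Order the requested pipeline stages per the golden ordering."""
    unknown = set(stages) - set(GOLDEN_ORDER)
    if unknown:
        raise ValueError(f"unknown stages: {sorted(unknown)}")
    return sorted(stages, key=GOLDEN_ORDER.index)
```

So a run that only needs pruning, quantization, and fine-tuning would be scheduled as `["finetune", "prune", "quantize"]`.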
Note on QLoRA as implicit QAT: When the final deployment target is INT4, using QLoRA (§7.5) during the finetune step provides quantization-aware adaptation. The adapter trains against quantized base weights, making the final INT4 quantization less lossy than post-training quantization after full-precision LoRA.
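The mechanism behind the QLoRA note can be illustrated with a toy per-tensor symmetric INT4 scheme (not the blockwise NF4 codebook QLoRA actually uses): the adapter trains against base weights that already sit on the 16-level INT4 grid, so exporting to INT4 afterwards reproduces exactly the values the adapter was tuned against.

```python
def fake_quant_int4(weights):
    """Round weights to a symmetric 16-level INT4 grid (toy scheme)."""
    scale = max(abs(w) for w in weights) / 7.0  # largest magnitude maps to ±7
    return [max(-8, min(7, round(w / scale))) * scale for w in weights]

# During QLoRA-style training, forward passes see the quantized base:
base = [0.52, -1.0, 0.31, 0.08]
seen_in_training = fake_quant_int4(base)

# Final INT4 export changes nothing the adapter was not already trained
# against — quantizing an already-quantized tensor is a no-op:
exported = fake_quant_int4(seen_in_training)
```

With full-precision LoRA instead, the adapter is tuned against `base`, and the post-training jump from `base` to `exported` is error the adapter never saw.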
Anti-patterns:
- Prune → Finetune: LoRA can't recover pruned knowledge effectively
- Finetune → Distill: Overwrites the fine-tuned specialization
- Quantize → anything: Quality loss compounds with every subsequent operation
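These anti-patterns can be checked mechanically before launching a run. A hypothetical validator (names are mine) that flags any listed anti-pattern occurring anywhere in a proposed pipeline, including quantize followed by any later step:

```python
# The explicitly listed anti-pattern orderings, as (earlier, later) pairs.
ANTI_PATTERNS = {("prune", "finetune"), ("finetune", "distill")}

def pipeline_violations(stages):
    """Return every (earlier, later) stage pair matching an anti-pattern."""
    found = []
    for i, earlier in enumerate(stages):
        for later in stages[i + 1:]:
            if (earlier, later) in ANTI_PATTERNS or earlier == "quantize":
                found.append((earlier, later))
    return found
```

The golden ordering passes cleanly, while any pipeline that quantizes early or fine-tunes after pruning is flagged.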
Prompt strategy (§7.6) is orthogonal: it applies at eval time, after all model modifications, and has no interaction with the training pipeline. Dogfooding shows prompt strategy yields +1.83pp (HumanEval) and +25.4pp (MBPP) at zero compute cost, so always optimize prompts before starting the training pipeline.