LoRA Fine-Tuning

Low-Rank Adaptation (LoRA) is a parameter-efficient fine-tuning technique that freezes the base model weights and injects trainable low-rank matrices into attention projections. Instead of updating all parameters (billions for modern LLMs), LoRA trains only a small number of additional parameters (typically < 1% of the base model).

How LoRA Works

For a pretrained weight matrix \( W_0 \in \mathbb{R}^{d \times k} \), LoRA constrains the update to a low-rank decomposition:

\[ W = W_0 + \Delta W = W_0 + BA \]

where \( B \in \mathbb{R}^{d \times r} \), \( A \in \mathbb{R}^{r \times k} \), and the rank \( r \ll \min(d, k) \).

A is initialized with random Gaussian values
B is initialized to zero (so \( \Delta W = 0 \) at the start of training)
Only A and B are updated during training
The base model weights \( W_0 \) remain frozen

Supported Base Models

Model	CLI Flag	Hidden Size	Head Dim	Vocab	Bias
Qwen2.5-Coder-0.5B	`--model-size 0.5B`	896	64	151,936	Yes
Qwen3.5-9B-Instruct	`--model-size 9B`	4,096	256	248,320	No

CLI Quickstart

Classification Fine-Tuning

# Plan mode (estimate VRAM, show config)
apr finetune --task classify --model-size 9B --plan

# Train with data
apr finetune --task classify \
    --model-size 9B \
    --data train.jsonl \
    --epochs 10 \
    --rank 16 \
    -o checkpoints/

# Qwen2.5 (smaller, for testing)
apr finetune --task classify \
    --model-size 0.5B \
    --data train.jsonl \
    -o checkpoints/

General LoRA Fine-Tuning

# Plan mode
apr finetune model.apr --method lora --model-size 7B --plan

# Train
apr finetune model.apr --method lora --data train.jsonl -o adapter.apr

# Merge adapter back into base model
apr finetune merge model.apr --adapter adapter.apr -o merged.apr

Qwen3.5-9B Specifics

Qwen3.5 introduces several architectural differences from Qwen2 that affect fine-tuning:

No Attention Bias

Qwen3.5 does not use bias in Q/K/V/O projections (use_bias=false). This means:

LoRA adapters target only the weight matrices, not bias vectors
The LoRA parameter count is slightly lower than equivalent Qwen2 models

Explicit Head Dimension

Qwen3.5-9B uses head_dim=256 (vs Qwen2's typical 128). The attention projection shapes are:

Projection	Shape	Notes
Q proj	[4096, 4096]	16 heads x 256 head_dim
K proj	[1024, 4096]	4 KV heads x 256 head_dim
V proj	[1024, 4096]	4 KV heads x 256 head_dim
O proj	[4096, 4096]	hidden_dim x (num_heads x head_dim)

Hybrid Attention

Qwen3.5 uses a mix of standard softmax attention and linear attention layers. LoRA targets Q/V projections in both layer types, ensuring the adapter captures both attention mechanisms.

248K Vocabulary

The larger vocabulary (248,320 tokens vs Qwen2's ~152K) affects embedding layer dimensions but does not change the LoRA targeting strategy, which focuses on attention projections.

Classification Pipeline

The classification fine-tuning pipeline consists of:

Base model config -- loaded via TransformerConfig::qwen3_5_9b() or qwen2_0_5b()
LoRA injection -- rank-16 adapters on Q and V projections
Classification head -- mean pooling + linear layer (hidden_size -> num_classes)
Training loop -- epoch management, validation split, early stopping, LR scheduling
Checkpointing -- periodic saves to APR format

Input text
    |
    v
[Tokenize] -> [Embedding] -> [Transformer Layers (frozen + LoRA)] -> [Mean Pool]
    |
    v
[Classification Head (trainable)] -> [Softmax] -> [Cross-Entropy Loss]

Contract Validation

Fine-tuning configs are validated against the model family contract:

contracts/model-families/qwen3_5.yaml -- source of truth for dimensions
contracts/classification-finetune-v1.yaml -- classification invariants
src/format/model_family_contract_falsify.rs -- Popperian falsification tests

The falsification tests (FALSIFY-FT-QWEN35-001 through 007) verify that the TransformerConfig::qwen3_5_9b() factory matches the YAML contract exactly. If they diverge, the test suite catches it before any training runs.

References

Hu et al. (2021). LoRA: Low-Rank Adaptation of Large Language Models
Dettmers et al. (2023). QLoRA: Efficient Finetuning of Quantized LLMs
docs/specifications/qwen3.5-fine-tune.md -- full specification

EXTREME TDD - The Aprender Guide to Zero-Defect Machine Learning