Case Study: Model Merge Strategies (GH-245)

Model merging combines multiple fine-tuned models into a single model without additional training. Many top-ranked open models on HuggingFace are produced this way: they are merges of existing fine-tunes rather than models trained from scratch.

The 5 Strategies

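| Strategy | Models | Requires Base | Key Parameter                 |
|----------|--------|---------------|-------------------------------|
| Average  | 2+     | No            | (none)                        |
| Weighted | 2+     | No            | --weights                     |
| SLERP    | 2 only | No            | --weights (interpolation t)   |
| TIES     | 2+     | Yes           | --density                     |
| DARE     | 2+     | Yes           | --drop-rate, --seed           |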

Average

Simple element-wise mean over N models: result = (model_1 + model_2 + ... + model_N) / N. For two models this is (model_a + model_b) / 2. Good baseline for ensemble-style merges.
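The mean can be sketched in a few lines of plain Rust. This is an illustration only: `average_merge` is a hypothetical helper operating on flattened `f32` slices, not the function `apr_merge` uses internally.

```rust
/// Element-wise mean of N equally shaped tensors (flattened to slices).
/// Hypothetical helper for illustration; not part of the aprender API.
fn average_merge(models: &[&[f32]]) -> Vec<f32> {
    assert!(!models.is_empty(), "need at least one model");
    let len = models[0].len();
    assert!(models.iter().all(|m| m.len() == len), "shape mismatch");

    let n = models.len() as f32;
    (0..len)
        .map(|i| models.iter().map(|m| m[i]).sum::<f32>() / n)
        .collect()
}

fn main() {
    let model_a = [0.5_f32; 4];
    let model_b = [1.0_f32; 4];
    let merged = average_merge(&[&model_a[..], &model_b[..]]);
    println!("{merged:?}"); // prints [0.75, 0.75, 0.75, 0.75]
}
```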

Weighted

Weighted element-wise sum with one weight per model: result = w1*model_1 + w2*model_2 + ... + wN*model_N. For two models this is w1*model_a + w2*model_b. Weights must sum to 1.0.
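A minimal sketch of the weighted sum, including the sum-to-1.0 check. The helper name `weighted_merge`, the error type, and the tolerance of 1e-4 are assumptions made for this example rather than details of the library.

```rust
/// Weighted element-wise sum of equally shaped tensors.
/// Hypothetical helper; the 1e-4 tolerance on the weight sum is an assumption.
fn weighted_merge(models: &[&[f32]], weights: &[f32]) -> Result<Vec<f32>, String> {
    if models.is_empty() || models.len() != weights.len() {
        return Err("need exactly one weight per model".to_string());
    }
    let total: f32 = weights.iter().sum();
    if (total - 1.0).abs() > 1e-4 {
        return Err(format!("weights sum to {total}, expected 1.0"));
    }
    let len = models[0].len();
    if models.iter().any(|m| m.len() != len) {
        return Err("shape mismatch".to_string());
    }

    // Accumulate w_i * model_i into the output buffer.
    let mut out = vec![0.0_f32; len];
    for (model, &w) in models.iter().zip(weights) {
        for (o, &x) in out.iter_mut().zip(model.iter()) {
            *o += w * x;
        }
    }
    Ok(out)
}

fn main() {
    let model_a = [0.5_f32; 4];
    let model_b = [1.0_f32; 4];
    // 0.7 * 0.5 + 0.3 * 1.0 is approximately 0.65 per element.
    let merged = weighted_merge(&[&model_a[..], &model_b[..]], &[0.7, 0.3]).unwrap();
    println!("{merged:?}");
}
```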

SLERP (Spherical Linear Interpolation)

Interpolates along the great circle between two weight vectors on a hypersphere. Preserves the magnitude of weights better than linear interpolation. Only works with exactly 2 models. Falls back to linear interpolation when vectors are nearly parallel.
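The interpolation and its fallback can be sketched as follows. The function names and the 0.9995 cosine threshold are assumptions for this example; the threshold the library actually uses is not stated here. The sketch also falls back for nearly anti-parallel vectors, where the SLERP formula is numerically unstable as well.

```rust
/// Plain linear interpolation, used as the fallback.
fn lerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    a.iter().zip(b).map(|(x, y)| (1.0 - t) * x + t * y).collect()
}

/// Spherical linear interpolation between two flattened tensors.
/// Hypothetical helper; the 0.9995 threshold is an assumption.
fn slerp(a: &[f32], b: &[f32], t: f32) -> Vec<f32> {
    assert_eq!(a.len(), b.len(), "shape mismatch");

    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return lerp(a, b, t); // the angle is undefined for a zero vector
    }

    // Cosine of the angle between the two vectors.
    let cos_omega = (dot / (norm_a * norm_b)).clamp(-1.0, 1.0);
    if cos_omega.abs() > 0.9995 {
        return lerp(a, b, t); // nearly (anti-)parallel: sin(omega) is close to 0
    }

    let omega = cos_omega.acos();
    let sin_omega = omega.sin();
    let wa = ((1.0 - t) * omega).sin() / sin_omega;
    let wb = (t * omega).sin() / sin_omega;
    a.iter().zip(b).map(|(x, y)| wa * x + wb * y).collect()
}

fn main() {
    // Orthogonal unit vectors: the midpoint stays on the unit circle,
    // whereas linear interpolation would give [0.5, 0.5].
    let mid = slerp(&[1.0, 0.0], &[0.0, 1.0], 0.5);
    println!("{mid:?}");

    // Parallel vectors trigger the linear fallback.
    let flat = slerp(&[0.5; 4], &[1.0; 4], 0.3);
    println!("{flat:?}");
}
```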

TIES (Trim, Elect Sign, Merge)

  1. Compute task vectors: delta_i = model_i - base
  2. Trim: zero out elements whose magnitude is below density * max(|delta_i|), computed per tensor
  3. Elect sign: pick a sign per element via majority vote across models
  4. Merge: average the values whose sign agrees with the elected sign
  5. Result: base + merged_delta
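The five steps above can be sketched for a single tensor as follows. This follows the description in this section, not necessarily the library's code: `ties_merge` is a hypothetical helper, a tied vote is resolved as positive, and the average divides by the number of agreeing values; both of those choices are assumptions.

```rust
/// TIES-style merge of one flattened tensor, following the steps listed above.
/// Hypothetical helper; tie-breaking and the averaging denominator are assumptions.
fn ties_merge(base: &[f32], models: &[&[f32]], density: f32) -> Vec<f32> {
    // 1. Task vectors: delta_i = model_i - base.
    let mut deltas: Vec<Vec<f32>> = models
        .iter()
        .map(|m| m.iter().zip(base).map(|(x, b)| x - b).collect())
        .collect();

    // 2. Trim: zero out elements below density * max(|delta|) per tensor.
    for delta in &mut deltas {
        let max_abs = delta.iter().fold(0.0_f32, |acc, d| acc.max(d.abs()));
        let threshold = density * max_abs;
        for d in delta.iter_mut() {
            if d.abs() < threshold {
                *d = 0.0;
            }
        }
    }

    (0..base.len())
        .map(|i| {
            // 3. Elect sign: majority vote over the non-zero entries.
            let vote: i32 = deltas
                .iter()
                .map(|d| if d[i] > 0.0 { 1 } else if d[i] < 0.0 { -1 } else { 0 })
                .sum();
            let sign = if vote >= 0 { 1.0_f32 } else { -1.0 };

            // 4. Merge: average only the values that agree with the elected sign.
            let agreeing: Vec<f32> = deltas
                .iter()
                .map(|d| d[i])
                .filter(|v| v * sign > 0.0)
                .collect();
            let merged = if agreeing.is_empty() {
                0.0
            } else {
                agreeing.iter().sum::<f32>() / agreeing.len() as f32
            };

            // 5. Result: base + merged_delta.
            base[i] + merged
        })
        .collect()
}

fn main() {
    let base = [0.0_f32; 3];
    let a = [1.0_f32, 0.1, 2.0]; // 0.1 is trimmed (threshold 0.4)
    let b = [0.5_f32, 1.0, -2.0]; // -2.0 loses the sign vote at index 2
    let c = [1.0_f32, 1.0, 1.0];
    let merged = ties_merge(&base, &[&a[..], &b[..], &c[..]], 0.2);
    println!("{merged:?}"); // approximately [0.833, 1.0, 1.5]
}
```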

DARE (Drop And Rescale)

  1. Compute task vectors: delta_i = model_i - base
  2. Randomly drop elements with probability drop_rate
  3. Rescale the remaining elements by 1 / (1 - drop_rate), which keeps the expected value of each delta unchanged
  4. Average rescaled deltas
  5. Result: base + avg(rescaled_deltas)
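The steps above can be sketched for a single tensor as follows. To stay free of external crates, the sketch uses a small SplitMix64 generator as a stand-in random source; the library's actual generator is not specified here, so a given seed will not reproduce its output. `dare_merge` and `SplitMix64` are hypothetical names for this example.

```rust
/// Minimal SplitMix64 generator, a stand-in random source for this sketch.
struct SplitMix64(u64);

impl SplitMix64 {
    /// Returns a uniform value in [0, 1) built from the top 24 bits.
    fn next_f32(&mut self) -> f32 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^= z >> 31;
        (z >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// DARE-style merge of one flattened tensor, following the steps listed above.
/// Hypothetical helper; dropped elements count as zero in the average over N models.
fn dare_merge(base: &[f32], models: &[&[f32]], drop_rate: f32, seed: u64) -> Vec<f32> {
    assert!(!models.is_empty(), "need at least one model");
    assert!((0.0..1.0).contains(&drop_rate), "drop_rate must be in [0, 1)");

    let mut rng = SplitMix64(seed);
    let scale = 1.0 / (1.0 - drop_rate); // step 3: rescale factor
    let n = models.len() as f32;
    let mut merged = vec![0.0_f32; base.len()];

    for model in models {
        for (i, (x, b)) in model.iter().zip(base).enumerate() {
            let delta = x - b; // step 1: task vector element
            // Step 2: keep the element with probability 1 - drop_rate.
            if rng.next_f32() >= drop_rate {
                merged[i] += delta * scale / n; // steps 3 and 4
            }
        }
    }

    // Step 5: add the averaged delta back onto the base.
    base.iter().zip(&merged).map(|(b, m)| b + m).collect()
}

fn main() {
    let base = [0.0_f32; 4];
    let a = [0.5_f32; 4];
    let b = [1.0_f32; 4];
    // Same seed, same result; a different seed generally drops different elements.
    let merged = dare_merge(&base, &[&a[..], &b[..]], 0.5, 42);
    println!("{merged:?}");
}
```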

Running the Example

cargo run --example model_merge_strategies

Rust API

use aprender::format::{apr_merge, MergeOptions, MergeStrategy};

// Average (default)
apr_merge(&[&model_a, &model_b], &output, MergeOptions::default())?;

// Weighted
apr_merge(&[&model_a, &model_b], &output, MergeOptions {
    strategy: MergeStrategy::Weighted,
    weights: Some(vec![0.7, 0.3]),
    ..Default::default()
})?;

// SLERP
apr_merge(&[&model_a, &model_b], &output, MergeOptions {
    strategy: MergeStrategy::Slerp,
    weights: Some(vec![0.3]),  // interpolation parameter t
    ..Default::default()
})?;

// TIES
apr_merge(&[&task_a, &task_b, &task_c], &output, MergeOptions {
    strategy: MergeStrategy::Ties,
    base_model: Some(base_path),
    density: 0.2,
    ..Default::default()
})?;

// DARE
apr_merge(&[&task_a, &task_b, &task_c], &output, MergeOptions {
    strategy: MergeStrategy::Dare,
    base_model: Some(base_path),
    drop_rate: 0.5,
    seed: 42,
    ..Default::default()
})?;

CLI Usage

# Average
apr merge model_a.st model_b.st --strategy average -o merged.st

# Weighted
apr merge model_a.st model_b.st --strategy weighted --weights 0.7,0.3 -o merged.st

# SLERP
apr merge model_a.st model_b.st --strategy slerp --weights 0.3 -o merged.st

# TIES
apr merge task_a.st task_b.st task_c.st --strategy ties \
    --base-model base.st --density 0.2 -o merged.st

# DARE
apr merge task_a.st task_b.st task_c.st --strategy dare \
    --base-model base.st --drop-rate 0.5 --seed 42 -o merged.st

Example Output

Input models:
  base (zeros):
    layer.bias [4] = [0.000, 0.000, 0.000, 0.000]
  model_a (diag 1,2,3,4):
    layer.bias [4] = [0.500, 0.500, 0.500, 0.500]
  model_b (diag 4,3,2,1):
    layer.bias [4] = [1.000, 1.000, 1.000, 1.000]

1. Average: [0.750, 0.750, 0.750, 0.750]
2. Weighted (0.7A + 0.3B): [0.650, 0.650, 0.650, 0.650]
3. SLERP (t=0.3): [0.650, 0.650, 0.650, 0.650]
4. TIES (density=0.2): [0.583, 0.583, 0.583, 0.583]
5. DARE (drop=0.5): [0.833, 1.000, 0.833, 0.333]  (stochastic)

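Note that SLERP matches the weighted result in this example: the two bias vectors point in the same direction, so SLERP falls back to linear interpolation, as described in the SLERP section. For vectors that are not parallel, SLERP follows a curved path and its result differs from linear interpolation. DARE produces stochastic results (some elements dropped, others rescaled), so its output depends on the seed.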

See Also