# Case Study: Mixture of Experts Construction

Ticket: GH-445
Module: `aprender::online::moe_construction`
## Overview

Constructs Mixture-of-Experts (MoE) architectures from multiple dense source models. Each source model contributes expert FFN weights via round-robin assignment, and a learned router selects the top-k experts for each token.
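The round-robin assignment can be sketched as follows. This is a minimal illustration of the scheme described above, not the actual `plan_moe_construction` API; the function name and signature here are assumptions.

```rust
// Hypothetical sketch: expert slot `i` takes its FFN weights from source
// model `i % num_sources`, cycling through the dense models in order.
fn plan_round_robin(num_experts: usize, num_sources: usize) -> Vec<usize> {
    (0..num_experts).map(|e| e % num_sources).collect()
}

fn main() {
    // 8 expert slots filled from 3 dense source models.
    let plan = plan_round_robin(8, 3);
    assert_eq!(plan, vec![0, 1, 2, 0, 1, 2, 0, 1]);
    println!("{:?}", plan);
}
```

Round-robin guarantees that expert counts per source model differ by at most one, which is why the validity property (FALSIFY-MOE-001) is straightforward to check.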
## Key Components

- `MoeConfig` — expert count, per-token activation, routing method
- `RoutingMethod` — `TopK`, `SwitchTransformer`, `ExpertChoice`
- `RouterInit` — `Random`, `Uniform`, `Balanced` initialization
- `plan_moe_construction` — round-robin expert assignment planner
- `compute_gate_weights` — router weight initialization (LAYOUT-002 row-major)
- `compute_expert_load_balance` — coefficient-of-variation metric
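The load-balance metric is the coefficient of variation (standard deviation divided by mean) of per-expert token counts. A minimal sketch, assuming a simple slice-based signature (the real `compute_expert_load_balance` signature may differ):

```rust
// Hypothetical sketch: coefficient of variation of per-expert token counts.
// 0.0 means perfectly balanced routing; larger values mean more skew.
fn compute_expert_load_balance(token_counts: &[f64]) -> f64 {
    let n = token_counts.len() as f64;
    let mean = token_counts.iter().sum::<f64>() / n;
    if mean == 0.0 {
        return 0.0; // no tokens routed yet; treat as balanced
    }
    let var = token_counts.iter().map(|c| (c - mean).powi(2)).sum::<f64>() / n;
    var.sqrt() / mean
}

fn main() {
    // Perfectly balanced load across 4 experts.
    assert_eq!(compute_expert_load_balance(&[25.0, 25.0, 25.0, 25.0]), 0.0);
    // A skewed load yields a strictly positive coefficient of variation.
    assert!(compute_expert_load_balance(&[90.0, 5.0, 3.0, 2.0]) > 0.0);
    println!("ok");
}
```

Because it is a standard deviation divided by a non-negative mean, the metric is non-negative by construction, which is the property exercised by FALSIFY-MOE-003.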
## Run

```sh
cargo run --example moe_construction
```
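The top-k routing step mentioned in the overview can be sketched as below. This is an illustrative stand-in, not the `RoutingMethod::TopK` implementation; names and signatures are assumptions.

```rust
// Hypothetical sketch of top-k routing: for one token, pick the k experts
// with the highest gate logits.
fn top_k_experts(logits: &[f32], k: usize) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    // Sort expert indices by descending logit.
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    idx.truncate(k);
    idx
}

fn main() {
    let logits = [0.1, 2.5, -0.3, 1.7];
    // The two highest logits belong to experts 1 and 3.
    assert_eq!(top_k_experts(&logits, 2), vec![1, 3]);
    println!("{:?}", top_k_experts(&logits, 2));
}
```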
## Falsification Tests
| ID | Property | Status |
|---|---|---|
| FALSIFY-MOE-001 | All assignments are valid | Not falsified (property holds) |
| FALSIFY-MOE-002 | Gate weights have correct dimensions | Not falsified (property holds) |
| FALSIFY-MOE-003 | Load balance is non-negative | Not falsified (property holds) |
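The dimension property (FALSIFY-MOE-002) amounts to checking that the router's weight buffer matches its declared row-major shape. A minimal sketch, assuming a flat `Vec<f32>` buffer and zero initialization purely for illustration:

```rust
// Hypothetical sketch: a row-major [num_experts, hidden_dim] gate weight
// buffer, zero-initialized for simplicity. The real compute_gate_weights
// applies a RouterInit scheme; this only models the shape invariant.
fn init_gate_weights(num_experts: usize, hidden_dim: usize) -> Vec<f32> {
    vec![0.0; num_experts * hidden_dim]
}

fn main() {
    let (num_experts, hidden_dim) = (8, 64);
    let w = init_gate_weights(num_experts, hidden_dim);
    // FALSIFY-MOE-002 style check: buffer length matches declared shape.
    assert_eq!(w.len(), num_experts * hidden_dim);
    println!("dims ok");
}
```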