Appendix C: Qwen3-Coder-Next Architecture Details

| Layer Pattern | Count | Description |
|---|---|---|
| Gated DeltaNet → MoE | 36 (3 per block × 12 blocks) | Linear attention with gating, routed to 10/512 experts |
| Gated Attention → MoE | 12 (1 per block × 12 blocks) | Standard GQA with gating, routed to 10/512 experts |
| Total layers | 48 | |
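
The layer counts in the table can be checked with a small sketch that builds the per-layer schedule. The names `Mixer` and `layer_schedule` are hypothetical and not part of realizar. The ordering inside a block (three Gated DeltaNet layers followed by one Gated Attention layer) is an assumption; the table only gives the per-block counts.

```rust
/// Token mixer used by a layer. Per the table, every layer's mixer feeds an MoE.
/// Hypothetical type for illustration; not a realizar API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mixer {
    GatedDeltaNet,
    GatedAttention,
}

const NUM_BLOCKS: usize = 12;
const DELTANET_PER_BLOCK: usize = 3;
const ATTENTION_PER_BLOCK: usize = 1;

/// Build the mixer type for every layer, block by block.
/// Assumption: within each block the DeltaNet layers come first.
fn layer_schedule() -> Vec<Mixer> {
    let per_block = DELTANET_PER_BLOCK + ATTENTION_PER_BLOCK;
    let mut layers = Vec::with_capacity(NUM_BLOCKS * per_block);
    for _ in 0..NUM_BLOCKS {
        layers.extend(std::iter::repeat(Mixer::GatedDeltaNet).take(DELTANET_PER_BLOCK));
        layers.extend(std::iter::repeat(Mixer::GatedAttention).take(ATTENTION_PER_BLOCK));
    }
    layers
}

fn main() {
    let layers = layer_schedule();
    let deltanet = layers
        .iter()
        .filter(|m| **m == Mixer::GatedDeltaNet)
        .count();
    let attention = layers.len() - deltanet;
    println!(
        "total={} deltanet={} attention={}",
        layers.len(),
        deltanet,
        attention
    );
}
```

Under these assumptions the schedule has 36 DeltaNet layers and 12 attention layers, matching the 48 total in the table.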

This hybrid architecture means realizar needs to support:

  • DeltaNet (linear attention variant) — likely a new gap
  • MoE routing (top-k expert selection) — may partially exist
  • Gated variants of both attention types
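
As a rough illustration of the MoE routing requirement, here is a minimal sketch of top-k expert selection for a single token, using the 10-of-512 figure from the table. The function `route_top_k` is hypothetical and not a realizar API. Normalizing the weights with a softmax over only the selected logits is an assumption; the model's actual gating may differ.

```rust
/// Pick the `k` highest-scoring experts for one token and return
/// (expert index, weight) pairs, highest score first.
/// Assumption: weights are a softmax over the selected logits only.
fn route_top_k(logits: &[f32], k: usize) -> Vec<(usize, f32)> {
    let k = k.min(logits.len());
    let mut order: Vec<usize> = (0..logits.len()).collect();
    // Sort expert indices by descending logit; total_cmp gives a total order on f32.
    order.sort_by(|&a, &b| logits[b].total_cmp(&logits[a]));
    order.truncate(k);
    if order.is_empty() {
        return Vec::new();
    }
    // Subtract the largest selected logit before exponentiating for stability.
    let max = logits[order[0]];
    let exps: Vec<f32> = order.iter().map(|&i| (logits[i] - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    order
        .into_iter()
        .zip(exps)
        .map(|(i, e)| (i, e / sum))
        .collect()
}

fn main() {
    // 512 made-up router logits, purely for demonstration.
    let logits: Vec<f32> = (0..512).map(|i| ((i * 37) % 512) as f32 / 64.0).collect();
    let routed = route_top_k(&logits, 10);
    for (expert, weight) in &routed {
        println!("expert {expert}: weight {weight:.4}");
    }
}
```

A full sort is the simplest way to show the idea. For 512 experts per token, a partial selection such as `select_nth_unstable_by` would avoid sorting the entries that are discarded.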