# Memory Layout
The Sovereign AI Stack enforces a row-major tensor layout across all components. This critical architectural decision, documented as LAYOUT-002, affects aprender, realizar, and all model conversion pipelines.
## LAYOUT-002: Row-Major Mandate
All tensors in the stack use row-major (C-style) memory layout. External formats that use column-major layout are transposed at import time.
```
External Formats                Stack Internal (Row-Major)
----------------                --------------------------
SafeTensors (row-major) -----> APR v2 --> realizar --> output
  (native)                      ^
GGUF (column-major) -----------/
  (transposed by aprender)
```
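A minimal sketch of the transpose performed at import time, assuming plain `f32` buffers; aprender's real converter operates on quantized GGUF tensors, so the function name and signature here are illustrative, not the actual API:

```rust
/// Illustrative sketch, not aprender's API: convert a column-major buffer
/// into the stack's row-major layout. Element (r, c) sits at
/// src[c * rows + r] in column-major and at dst[r * cols + c] in row-major.
fn transpose_to_row_major(src: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(src.len(), rows * cols);
    let mut dst = vec![0.0f32; rows * cols];
    for r in 0..rows {
        for c in 0..cols {
            dst[r * cols + c] = src[c * rows + r];
        }
    }
    dst
}
```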
## Why Row-Major
Three factors drive this decision:
- **PyTorch/SafeTensors compatibility** – HuggingFace models are natively row-major, so no conversion is needed for the most common import path.
- **Cache efficiency** – Row-major matches C memory layout. When iterating over rows (the common case in matrix-vector products), data is contiguous in memory, maximizing L1/L2 cache utilization; see the addressing sketch after this list.
- **Kernel simplicity** – Realizar’s fused quantization kernels (`fused_q4k_parallel_matvec`, `fused_q6k_parallel_matvec`) assume row-major layout. A single layout eliminates runtime branching.
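To make the cache-efficiency point concrete, here is the row-major addressing rule as a hypothetical helper (not stack API): element (r, c) lives at offset `r * ncols + c`, so a whole row is one contiguous slice.

```rust
/// Hypothetical helper, not stack API: in row-major layout, row `r` of an
/// `nrows x ncols` matrix is the contiguous slice starting at r * ncols.
fn row(data: &[f32], ncols: usize, r: usize) -> &[f32] {
    &data[r * ncols..(r + 1) * ncols]
}
```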
## Component Responsibilities
| Component | Role |
|---|---|
| aprender | Transposes GGUF column-major data to row-major during `apr import` |
| realizar | Assumes row-major layout in all inference kernels |
| trueno | Provides both column-major and row-major kernels; APR code uses row-major |
## Diagnosing Layout Bugs
If a model produces garbage text like "olumbia+lsi nunca/localENTS" instead of coherent language, the root cause is almost always a layout mismatch: column-major data fed to a row-major kernel.
Fix: Ensure the model was converted through aprender’s GGUF converter, which transposes weight matrices to row-major.
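A toy demonstration (not stack code) of why the symptom is garbage rather than an error: a row-major kernel reading a column-major buffer fetches the right number of elements, just the wrong ones, so every computation silently succeeds with bad values.

```rust
// Toy demonstration, not stack code: the same 2x2 matrix W = [[1, 2], [3, 4]]
// stored in both layouts. Reading "row 0" from the column-major buffer
// silently yields the wrong elements.
fn main() {
    let row_major = [1.0f32, 2.0, 3.0, 4.0]; // rows contiguous
    let col_major = [1.0f32, 3.0, 2.0, 4.0]; // columns contiguous
    let x = [10.0f32, 1.0];

    let dot = |row: &[f32]| -> f32 { row.iter().zip(&x).map(|(a, b)| a * b).sum() };

    println!("{}", dot(&row_major[0..2])); // 12: dot([1, 2], x), correct
    println!("{}", dot(&col_major[0..2])); // 13: dot([1, 3], x), silently wrong
}
```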
## Cache-Friendly Access Patterns
Row-major layout means elements in the same row are contiguous:
```
Row-major [3x4]:
[a b c d | e f g h | i j k l]
  row 0     row 1     row 2

Column-major [3x4]:
[a e i | b f j | c g k | d h l]
 col 0   col 1   col 2   col 3
```
For a matrix-vector product y = Wx, each output element computes dot(row_i, x). In row-major layout, row_i is a contiguous memory span, which the CPU prefetcher handles efficiently.
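A sketch of that computation, assuming plain `f32` weights (realizar's actual kernels operate on quantized blocks): the inner loop for each output element streams through one contiguous row.

```rust
/// Sketch only, not realizar's kernel: row-major matrix-vector product.
/// Each output element is a dot product over one contiguous row slice,
/// so the inner loop walks memory sequentially.
fn matvec(w: &[f32], x: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(w.len(), rows * cols);
    assert_eq!(x.len(), cols);
    (0..rows)
        .map(|r| {
            let row = &w[r * cols..(r + 1) * cols]; // contiguous span
            row.iter().zip(x).map(|(a, b)| a * b).sum()
        })
        .collect()
}
```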
## Quantized Tensor Layout
Quantized formats (Q4K, Q6K) store data in 256-element blocks; each block packs scales, minimums, and quantized values together. The blocks themselves are arranged in row-major order:
| Format | Block Size | Bytes per Block | Blocks per Row |
|---|---|---|---|
| Q4K | 256 elements | 144 bytes | ceil(dim / 256) |
| Q6K | 256 elements | 210 bytes | ceil(dim / 256) |
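Putting the table's numbers to work: a small sketch (hypothetical function; the per-block byte counts are taken from the table above) of the storage cost per row.

```rust
/// Hypothetical sketch: bytes needed to store one row of `dim` elements
/// in a 256-element block format, given that format's bytes per block
/// (144 for Q4K, 210 for Q6K per the table above).
fn row_bytes(dim: usize, bytes_per_block: usize) -> usize {
    let blocks = dim.div_ceil(256); // ceil(dim / 256) blocks per row
    blocks * bytes_per_block
}

fn main() {
    // A 4096-wide row needs 16 blocks:
    println!("Q4K: {} bytes/row", row_bytes(4096, 144)); // 2304
    println!("Q6K: {} bytes/row", row_bytes(4096, 210)); // 3360
}
```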
## APR v2 Format
The APR v2 binary format stores tensors with 64-byte alignment for zero-copy memory mapping. Metadata (including layout information) is padded to 64-byte boundaries:
```
[header] [metadata (64-byte aligned)] [tensor data (64-byte aligned)]
```
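A common way to compute such padding, shown as a sketch rather than the actual APR v2 writer:

```rust
/// Sketch, not the APR v2 writer: round `offset` up to the next
/// multiple of `align` (which must be a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    debug_assert!(align.is_power_of_two());
    (offset + align - 1) & !(align - 1)
}

fn main() {
    assert_eq!(align_up(1, 64), 64);   // header ends at byte 1 -> metadata at 64
    assert_eq!(align_up(64, 64), 64);  // already aligned, no padding
    assert_eq!(align_up(100, 64), 128);
}
```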