Case Study: Hex Forensics — Format-Aware Binary Inspection

Why Binary Forensics?

When model inference produces garbage, you need to see the actual bytes. Not a high-level summary — the raw data. Traditional tools like xxd show bytes but don't understand model formats. apr hex bridges this gap: format-aware binary inspection that annotates GGUF headers, dequantizes Q4K/Q6K blocks, computes value distributions, and flags anomalies — all in a single command.

Toyota Way: Genchi Genbutsu — go and see the actual data at the source of the problem.

Quick Reference

# Auto-detect format, show summary + hex dump
apr hex model.gguf

# Annotated file header (magic, version, tensor_count, metadata)
apr hex model.gguf --header

# Raw bytes with ASCII column (like xxd, but format-aware)
apr hex model.gguf --raw --width 32 --limit 512

# Quantization super-block structure (Q4K/Q6K/Q8_0)
apr hex model.gguf --blocks --tensor "attn_q"

# Value distribution histogram + entropy + kurtosis
apr hex model.gguf --distribution --tensor "output.weight"

# Per-region byte entropy (corruption detection)
apr hex model.gguf --entropy

# GGUF → APR layout contract overlay
apr hex model.gguf --contract

# List all tensors with dtype and shape
apr hex model.gguf --list

# JSON output for scripting
apr hex model.gguf --json --tensor "attn_q"

Supported Formats

Format	Modes	Notes
GGUF	All 8 modes	Full support including blocks, contract
APR	header, raw, list, stats, distribution, entropy	Native format
SafeTensors	header, raw, list, entropy	JSON header + tensor data

Format is auto-detected from magic bytes:

47 47 55 46 = GGUF
41 50 52 00 = APR\0
First 8 bytes as u64 LE < 100MB = SafeTensors header length

Mode Deep Dives

`--header`: Annotated File Header

Shows the file header with byte offsets, raw hex, and decoded values:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  GGUF File Header
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  00000000: 47 47 55 46                 magic: "GGUF"
  00000004: 03 00 00 00                 version: 3
  00000008: 23 01 00 00 00 00 00 00     tensor_count: 291
  00000010: 1A 00 00 00 00 00 00 00     metadata_kv_count: 26

Color coding: dimmed offsets, yellow hex bytes, bold white labels, cyan values.

`--blocks`: Quantization Super-Block View

Annotates the internal structure of quantized blocks:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Block View: blk.0.ffn_down.weight (Q6_K, [4864, 896])
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Q6_K Super-Block #0 (256 elements, 210 bytes):
  00000000: 52 F2 40 26 24 D2 1B 22 ..  ql[0-127]: low 4 bits
  00000080: A6 9E A4 95 66 9A 8B AA ..  qh[0-63]: high 2 bits
  000000C0: D4 CC BD DC 67 CD 80 99 ..  scales[0-15]: 16 sub-block scales
  000000D0: 5F 01                       d (scale): 0.00002 (f16)

Supported dtypes: Q4_K (144B/256elem), Q6_K (210B/256elem), Q8_0 (34B/32elem).

`--distribution`: Value Histogram

Dequantizes tensor values and shows the distribution:

Distribution: blk.0.attn_norm.weight
  [  -0.532,   -0.425)                                            0.2%
  [  -0.104,    0.003)  ██████████████████████████               34.5%
  [   0.003,    0.110)  ████████████████████████████████████████ 52.1%
  [   0.110,    0.216)  ██████                                    8.8%

  Entropy: 1.62 bits
  Kurtosis: 6.06
  Min: -0.531738
  Max: 0.537109
  Mean: 0.031080
  Std: 0.095854

`--entropy`: Byte Entropy Analysis

Computes Shannon entropy with sliding window anomaly detection:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Byte Entropy Analysis (GGUF)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Total entropy: 7.9429 bits (0.0 = uniform, 8.0 = random)
  File size: 468.64 MiB
  Expected range: Q4K/Q6K: 7.5-8.0, F32: 5.0-7.5, F16: 6.0-7.5
  ─── Sliding Window (4KB)
  Min entropy: 3.0821 at 0x0
  Max entropy: 7.9293 at 0xEF010F0

Anomalous regions (entropy < 1.0) indicate corruption or all-zeros.

`--contract`: Layout Contract Overlay

Shows the GGUF→APR tensor name mapping with transpose requirements:

╭───────────────────────────┬────────────────────────────────────────┬───────────┬──────────╮
│ GGUF Name                 │ APR Name                               │ Transpose │ Critical │
├───────────────────────────┼────────────────────────────────────────┼───────────┼──────────┤
│ output.weight             │ lm_head.weight                         │ Yes       │ CRITICAL │
│ token_embd.weight         │ model.embed_tokens.weight              │ Yes       │ -        │
│ blk.0.attn_norm.weight    │ model.layers.{n}.input_layernorm.weight│ No        │ -        │
╰───────────────────────────┴────────────────────────────────────────┴───────────┴──────────╯

Algorithms

Shannon Entropy

H = -Σ p(x) * log2(p(x))

Where p(x) is the frequency of byte value x in the data. Range: 0.0 (all bytes identical) to 8.0 (perfectly uniform random). Quantized weights typically show 7.5-8.0; values below 5.0 suggest corruption or padding.

f16 → f32 Conversion

IEEE 754 half-precision uses 1 sign bit, 5 exponent bits, 10 mantissa bits. The conversion handles three cases: zero/subnormal (denormalize), normal (bias adjustment exp + 112), and special (Inf/NaN propagation). The bias trick exp + 112 (where 112 = 127 - 15) avoids unsigned integer underflow.

Q4_K / Q6_K Dequantization

Each super-block stores 256 elements with a shared scale factor d (f16) and per-element quantized values. Dequantization: value = d * (quant - zero_point). The block view shows the raw structure so you can verify the dequantization pipeline is reading the correct offsets.

Example

cargo run --example hex_forensics

See examples/hex_forensics.rs for standalone implementations of all algorithms.

Debugging Workflow

Start with --header — verify format, version, tensor count
Use --list — find tensor names and shapes
Use --blocks — verify quantization structure reads correct offsets
Use --distribution — check for NaN, zero clusters, unexpected ranges
Use --entropy — detect corruption or zero-padding regions
Use --contract — verify GGUF→APR name mapping and transpose flags

Aprender — Pure Rust ML Framework