SIMD Acceleration

trueno-zram uses runtime CPU feature detection to select the optimal SIMD implementation.

Supported Backends

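| Backend | Instruction Set | Register Width | Platforms |
|---------|-----------------|----------------|-----------|
| AVX-512 | AVX-512F/BW/VL | 512-bit | Skylake-X, Ice Lake, Zen 4 |
| AVX2 | AVX2 + FMA | 256-bit | Haswell+, Zen 1+ |
| NEON | ARM NEON | 128-bit | ARMv8-A (AArch64) |
| Scalar | None | 64-bit | All platforms |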

Runtime Detection

use trueno_zram_core::simd::{detect, SimdFeatures};

let features = detect();

println!("AVX-512: {}", features.has_avx512());
println!("AVX2: {}", features.has_avx2());
println!("SSE4.2: {}", features.has_sse42());
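To illustrate what runtime detection involves, here is a minimal sketch built only on the standard library's `is_x86_feature_detected!` and `is_aarch64_feature_detected!` macros. The `CpuFeatures` struct and `detect_features` function are hypothetical names for this example; they are not trueno-zram's `SimdFeatures` or `detect`, whose internals may differ. The feature groupings follow the Supported Backends table (AVX-512F/BW/VL, AVX2 + FMA).

```rust
/// Hypothetical feature summary for illustration only.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct CpuFeatures {
    pub avx512: bool,
    pub avx2: bool,
    pub sse42: bool,
    pub neon: bool,
}

/// Query the running CPU once and record which instruction sets it offers.
pub fn detect_features() -> CpuFeatures {
    #[allow(unused_mut)]
    let mut f = CpuFeatures::default();

    #[cfg(target_arch = "x86_64")]
    {
        // Require the same AVX-512 subsets the table lists: F, BW and VL.
        f.avx512 = is_x86_feature_detected!("avx512f")
            && is_x86_feature_detected!("avx512bw")
            && is_x86_feature_detected!("avx512vl");
        // The AVX2 backend is listed as AVX2 + FMA.
        f.avx2 = is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma");
        f.sse42 = is_x86_feature_detected!("sse4.2");
    }

    #[cfg(target_arch = "aarch64")]
    {
        f.neon = std::arch::is_aarch64_feature_detected!("neon");
    }

    f
}
```

Calling `detect_features()` on a machine without any of these extensions returns a struct with every flag `false`, which corresponds to the Scalar row of the table.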

Automatic Dispatch

The compressor automatically uses the best available backend:

use trueno_zram_core::CompressorBuilder;

// `?` requires the enclosing function to return a compatible `Result`.
let compressor = CompressorBuilder::new().build()?;

// Check which backend was selected
println!("Backend: {:?}", compressor.backend());
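One common way to implement this kind of automatic dispatch is to pick a function pointer once and reuse it on every call. The sketch below shows that pattern under stated assumptions: `byte_sum` is a placeholder kernel, and all names (`Kernel`, `select`, `selected_backend`) are invented for this example. It does not describe how trueno-zram dispatches internally.

```rust
use std::sync::OnceLock;

/// Signature shared by every implementation of the placeholder kernel.
type Kernel = fn(&[u8]) -> u64;

fn byte_sum_scalar(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}

/// Same loop, compiled with AVX2 enabled so the compiler may vectorize it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn byte_sum_avx2_impl(data: &[u8]) -> u64 {
    data.iter().map(|&b| b as u64).sum()
}

/// Safe wrapper so the AVX2 variant fits the `Kernel` pointer type.
#[cfg(target_arch = "x86_64")]
fn byte_sum_avx2(data: &[u8]) -> u64 {
    // SAFETY: `select` installs this wrapper only after detecting AVX2.
    unsafe { byte_sum_avx2_impl(data) }
}

/// Choose the best implementation the running CPU supports.
fn select() -> (&'static str, Kernel) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return ("avx2", byte_sum_avx2 as Kernel);
        }
    }
    ("scalar", byte_sum_scalar as Kernel)
}

/// Detection result, filled in on first use and reused afterwards.
static KERNEL: OnceLock<(&'static str, Kernel)> = OnceLock::new();

/// Public entry point: callers never see which backend runs.
pub fn byte_sum(data: &[u8]) -> u64 {
    let (_, kernel) = *KERNEL.get_or_init(select);
    kernel(data)
}

/// Name of the backend chosen for this process.
pub fn selected_backend() -> &'static str {
    KERNEL.get_or_init(select).0
}
```

Caching the choice in a `OnceLock` keeps feature detection off the hot path; every later call costs one indirect function call.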

Performance by Backend

LZ4 Compression

| Backend | Throughput | Relative to Scalar |
|---------|------------|--------------------|
| AVX-512 | 4.4 GB/s | 1.45x |
| AVX2 | 3.2 GB/s | 1.05x |
| Scalar | 3.0 GB/s | 1.0x |

ZSTD Compression

| Backend | Throughput | Relative to Scalar |
|---------|------------|--------------------|
| AVX-512 | 11.2 GB/s | 1.40x |
| AVX2 | 8.5 GB/s | 1.06x |
| Scalar | 8.0 GB/s | 1.0x |

SIMD Optimizations

Hash Table Lookups (LZ4)

AVX-512 enables parallel hash probing for match finding:

// Scalar: Sequential probe
for offset in 0..16 {
    if hash_table[hash + offset] == pattern { ... }
}

// AVX-512: Parallel probe (16 comparisons at once).
// The compare returns a 16-bit mask, one bit per 32-bit lane.
let matches: __mmask16 = _mm512_cmpeq_epi32_mask(hash_values, pattern_broadcast);
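The same idea can be shown in compilable form at half the width. The sketch below probes eight 32-bit slots with AVX2 instead of sixteen with AVX-512, since the AVX2 intrinsics are available on older stable Rust toolchains. The function names and the fixed eight-slot bucket are assumptions made for this example, not trueno-zram's hash table layout.

```rust
/// Scalar reference: index of the first slot equal to `pattern`.
fn probe_scalar(bucket: &[u32; 8], pattern: u32) -> Option<usize> {
    bucket.iter().position(|&v| v == pattern)
}

/// AVX2: compare all eight slots against `pattern` in one instruction.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn probe_avx2(bucket: &[u32; 8], pattern: u32) -> Option<usize> {
    use std::arch::x86_64::*;
    // SAFETY: `bucket` is exactly 32 bytes, so the unaligned load is in bounds.
    let mask = unsafe {
        let slots = _mm256_loadu_si256(bucket.as_ptr() as *const __m256i);
        let needle = _mm256_set1_epi32(pattern as i32);
        let eq = _mm256_cmpeq_epi32(slots, needle);
        // Collapse each 32-bit lane to one bit: bit i is set if slot i matched.
        _mm256_movemask_ps(_mm256_castsi256_ps(eq)) as u32
    };
    if mask == 0 {
        None
    } else {
        Some(mask.trailing_zeros() as usize)
    }
}

/// Safe entry point with runtime fallback to the scalar loop.
pub fn probe(bucket: &[u32; 8], pattern: u32) -> Option<usize> {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            return unsafe { probe_avx2(bucket, pattern) };
        }
    }
    probe_scalar(bucket, pattern)
}
```

Both paths return the lowest matching index, so the vector version can replace the scalar loop without changing results.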

Literal Copying

Wide vector moves for copying uncompressed literals:

// AVX-512: 64 bytes per iteration
_mm512_storeu_si512(dst, _mm512_loadu_si512(src));

// AVX2: 32 bytes per iteration
_mm256_storeu_si256(dst, _mm256_loadu_si256(src));
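As a fuller illustration of the 32-byte variant, the sketch below wraps the load/store pair in a bounds-checked loop with a scalar tail. The function names are hypothetical, and this is not trueno-zram's copy routine. Production LZ4 implementations typically differ, for example by deliberately over-copying into padded buffers.

```rust
/// AVX2 copy: 32 bytes per iteration, then a scalar tail for the remainder.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn copy_literals_avx2(src: &[u8], dst: &mut [u8]) {
    use std::arch::x86_64::*;
    let n = src.len();
    assert!(dst.len() >= n, "destination too small");
    let mut i = 0;
    while i + 32 <= n {
        // SAFETY: i + 32 <= n <= dst.len(), so both accesses stay in bounds.
        unsafe {
            let v = _mm256_loadu_si256(src.as_ptr().add(i) as *const __m256i);
            _mm256_storeu_si256(dst.as_mut_ptr().add(i) as *mut __m256i, v);
        }
        i += 32;
    }
    // Fewer than 32 bytes remain; copy them without touching bytes past `n`.
    dst[i..n].copy_from_slice(&src[i..n]);
}

/// Safe entry point: copies `src` into the front of `dst`.
pub fn copy_literals(src: &[u8], dst: &mut [u8]) {
    assert!(dst.len() >= src.len(), "destination too small");
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            unsafe { copy_literals_avx2(src, dst) };
            return;
        }
    }
    dst[..src.len()].copy_from_slice(src);
}
```

Note that `copy_from_slice` usually lowers to a `memcpy` that already uses wide moves, so a hand-written loop like this mainly pays off when it is fused with surrounding work.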

Match Extension

SIMD comparison for extending matches:

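// Compare 64 bytes at once with AVX-512 (requires AVX-512BW).
// The compare yields a 64-bit mask directly: bit i is set when byte i matches.
let mask: __mmask64 = _mm512_cmpeq_epi8_mask(src_chunk, dst_chunk);
// The match length is the run of set bits starting at bit 0.
let match_len = mask.trailing_ones();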
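For a complete loop around this idea, here is a sketch at AVX2 width (32 bytes per step), chosen because its intrinsics are available on older stable Rust toolchains. The names `match_len`, `match_len_avx2`, and `match_len_scalar` are hypothetical, and the code is not taken from trueno-zram.

```rust
/// Scalar reference: length of the common prefix of `a` and `b`.
fn match_len_scalar(a: &[u8], b: &[u8]) -> usize {
    a.iter().zip(b).take_while(|(x, y)| x == y).count()
}

/// AVX2: compare 32 bytes per iteration and stop at the first mismatch.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn match_len_avx2(a: &[u8], b: &[u8]) -> usize {
    use std::arch::x86_64::*;
    let n = a.len().min(b.len());
    let mut i = 0;
    while i + 32 <= n {
        // SAFETY: i + 32 <= n keeps both 32-byte loads in bounds.
        let mask = unsafe {
            let va = _mm256_loadu_si256(a.as_ptr().add(i) as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr().add(i) as *const __m256i);
            // Bit k of the mask is set when byte k of the two chunks is equal.
            _mm256_movemask_epi8(_mm256_cmpeq_epi8(va, vb)) as u32
        };
        if mask != u32::MAX {
            // Equal bytes before the first mismatch form a run of low set bits.
            return i + mask.trailing_ones() as usize;
        }
        i += 32;
    }
    // Fewer than 32 bytes remain; finish byte by byte.
    i + match_len_scalar(&a[i..n], &b[i..n])
}

/// Safe entry point with runtime fallback to the scalar loop.
pub fn match_len(a: &[u8], b: &[u8]) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            return unsafe { match_len_avx2(a, b) };
        }
    }
    match_len_scalar(a, b)
}
```

The scalar tail also covers inputs shorter than one vector, so callers do not need to pad their buffers.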

Forcing a Backend

For testing or benchmarking, you can force a specific backend:

use trueno_zram_core::{CompressorBuilder, SimdBackend};

// Force scalar (no SIMD)
let scalar = CompressorBuilder::new()
    .prefer_backend(SimdBackend::Scalar)
    .build()?;

// Force AVX2 (will fail if not available)
let avx2 = CompressorBuilder::new()
    .prefer_backend(SimdBackend::Avx2)
    .build()?;
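The behavior described in the comments above (a forced backend either is honored or causes `build()` to fail) can be modeled with a small standalone sketch. `SketchBuilder`, `Backend`, `is_available`, and `best_available` are hypothetical stand-ins written for this example; they only imitate the shape of `CompressorBuilder` and `SimdBackend`, and the error type here is an assumption.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

/// Whether the running CPU can execute the given backend.
pub fn is_available(backend: Backend) -> bool {
    match backend {
        Backend::Scalar => true,
        #[cfg(target_arch = "x86_64")]
        Backend::Avx2 => is_x86_feature_detected!("avx2"),
        #[cfg(target_arch = "x86_64")]
        Backend::Avx512 => {
            is_x86_feature_detected!("avx512f") && is_x86_feature_detected!("avx512bw")
        }
        #[cfg(target_arch = "aarch64")]
        Backend::Neon => std::arch::is_aarch64_feature_detected!("neon"),
        // Backends that belong to a different architecture.
        _ => false,
    }
}

/// Widest backend the CPU supports, checked from widest to narrowest.
fn best_available() -> Backend {
    [Backend::Avx512, Backend::Avx2, Backend::Neon, Backend::Scalar]
        .iter()
        .copied()
        .find(|&b| is_available(b))
        .unwrap_or(Backend::Scalar)
}

#[derive(Default)]
pub struct SketchBuilder {
    forced: Option<Backend>,
}

impl SketchBuilder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Request a specific backend instead of automatic selection.
    pub fn prefer_backend(mut self, backend: Backend) -> Self {
        self.forced = Some(backend);
        self
    }

    /// Returns the backend to use, or an error if the forced one is missing.
    pub fn build(self) -> Result<Backend, String> {
        match self.forced {
            Some(b) if is_available(b) => Ok(b),
            Some(b) => Err(format!("{:?} backend is not available on this CPU", b)),
            None => Ok(best_available()),
        }
    }
}
```

Failing loudly instead of silently falling back keeps benchmark results honest: a run labeled AVX2 either used AVX2 or did not run at all.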