Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

trueno-zram: SIMD Memory Compression

trueno-zram provides SIMD-accelerated compression for Linux zram and general-purpose memory compression. It achieves 3+ GB/s with LZ4 and up to 13 GB/s with ZSTD on AVX-512.

Overview

trueno-zram delivers:

  • SIMD Acceleration: AVX2/AVX-512/NEON optimized
  • Multiple Algorithms: LZ4 (speed) and ZSTD (ratio)
  • Adaptive Selection: Entropy-based algorithm choice
  • Page Compression: 4KB aligned for zram integration
  • Optional CUDA: GPU acceleration for batch compression
┌─────────────────────────────────────────────────────────────┐
│                    trueno-zram                              │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │  LZ4 SIMD   │  │ ZSTD SIMD   │  │  Adaptive Selector  │  │
│  │  (3+ GB/s)  │  │ (13 GB/s)   │  │  (entropy-based)    │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│  AVX-512     │     AVX2      │     NEON      │   Scalar    │
└─────────────────────────────────────────────────────────────┘

Installation

[dependencies]
trueno-zram-core = "0.1"

# With adaptive compression
trueno-zram-adaptive = "0.1"

# With CUDA support
trueno-zram-cuda = { version = "0.1", optional = true }

Quick Start

#![allow(unused)]
fn main() {
use trueno_zram_core::{Compressor, Algorithm};

// Create compressor with LZ4 (fastest)
let compressor = Compressor::new(Algorithm::Lz4);

// Compress data
let compressed = compressor.compress(&data)?;
println!("Ratio: {:.2}x", data.len() as f64 / compressed.len() as f64);

// Decompress
let decompressed = compressor.decompress(&compressed)?;
assert_eq!(data, decompressed);
}

Algorithm Comparison

AlgorithmCompressDecompressRatioUse Case
LZ43+ GB/s4+ GB/s2.1xSpeed-critical
ZSTD-1500 MB/s1.5 GB/s2.8xBalanced
ZSTD-3300 MB/s1.5 GB/s3.2xBetter ratio
ZSTD-AVX51213 GB/s15 GB/s3.2xAVX-512 systems
Same-FillN/AN/A2048:1Zero/repeated pages

SIMD Backend Selection

#![allow(unused)]
fn main() {
use trueno_zram_core::{SimdBackend, detect_backend};

// Auto-detect best available backend
let backend = detect_backend();
println!("Using: {:?}", backend);

// Force specific backend
let compressor = Compressor::builder()
    .algorithm(Algorithm::Lz4)
    .backend(SimdBackend::Avx512)
    .build()?;
}

Backend Priority

PriorityBackendCondition
1AVX-512x86_64 with avx512f
2AVX2x86_64 with avx2
3NEONaarch64
4ScalarFallback

Page Compression

Optimized for 4KB page-aligned compression:

#![allow(unused)]
fn main() {
use trueno_zram_core::{PageCompressor, PAGE_SIZE};

let compressor = PageCompressor::new();

// Compress a 4KB page
let page: [u8; PAGE_SIZE] = get_page();
let compressed = compressor.compress_page(&page)?;

// Check if page is compressible
if compressed.len() < PAGE_SIZE / 2 {
    store_compressed(compressed);
} else {
    store_uncompressed(page);  // Not worth compressing
}
}

Adaptive Compression

Entropy-based algorithm selection:

#![allow(unused)]
fn main() {
use trueno_zram_adaptive::AdaptiveCompressor;

let compressor = AdaptiveCompressor::new();

// Automatically selects best algorithm per-page
let result = compressor.compress_adaptive(&data)?;

match result.algorithm_used {
    Algorithm::SameFill => println!("Zero/repeated page"),
    Algorithm::Lz4 => println!("High entropy, used LZ4"),
    Algorithm::Zstd { .. } => println!("Compressible, used ZSTD"),
}
}

Decision Tree

Is page all zeros/same byte?
  YES → Same-Fill (2048:1 ratio)
  NO  → Check entropy
        High entropy → LZ4 (fast, low ratio)
        Low entropy  → ZSTD (slower, high ratio)

Performance Benchmarks

Measured on AMD EPYC 7763 (AVX-512):

AlgorithmScalarAVX2AVX-512
LZ4 compress800 MB/s2.1 GB/s3.2 GB/s
LZ4 decompress1.2 GB/s3.5 GB/s4.5 GB/s
ZSTD-1150 MB/s350 MB/s500 MB/s
ZSTD-fast400 MB/s8 GB/s13 GB/s

Running the Example

cargo run --example trueno_zram_demo
  • trueno-ublk: GPU-accelerated block device using trueno-zram
  • trueno: SIMD/GPU compute primitives

References


Navigate: Table of Contents | Previous: whisper.apr | Next: trueno-ublk