Design Overview
Architecture
┌───────────────────────────────────────────────────────────────┐
│                          Public API                           │
│         CompressorBuilder, Algorithm, CompressedPage          │
├───────────────────────────────────────────────────────────────┤
│                      Algorithm Selection                      │
│  ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐ │
│  │   LZ4    │    │   ZSTD   │    │ Adaptive │    │ Samefill │ │
│  └────┬─────┘    └────┬─────┘    └────┬─────┘    └────┬─────┘ │
├───────┼───────────────┼───────────────┼───────────────┼───────┤
│       │ SIMD Dispatch │               │               │       │
│  ┌────▼─────┐    ┌────▼─────┐    ┌────▼─────┐    ┌────▼─────┐ │
│  │ AVX-512  │    │   AVX2   │    │   NEON   │    │  Scalar  │ │
│  └──────────┘    └──────────┘    └──────────┘    └──────────┘ │
├───────────────────────────────────────────────────────────────┤
│                    GPU Backend (Optional)                     │
│  ┌─────────────────────────────────────────────────────────┐  │
│  │ CUDA Batch Compressor (trueno-gpu PTX)                  │  │
│  │   ├── H2D Transfer                                      │  │
│  │   ├── Warp-Cooperative LZ4 Kernel                       │  │
│  │   └── D2H Transfer                                      │  │
│  └─────────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────────┘
Crate Structure
trueno-zram/
├── crates/
│   ├── trueno-zram-core/       # Core compression library
│   │   ├── src/
│   │   │   ├── lib.rs          # Public API
│   │   │   ├── error.rs        # Error types
│   │   │   ├── page.rs         # CompressedPage
│   │   │   ├── lz4/            # LZ4 implementation
│   │   │   ├── zstd/           # ZSTD implementation
│   │   │   ├── gpu/            # GPU batch compression
│   │   │   ├── simd/           # SIMD detection/dispatch
│   │   │   ├── samefill.rs     # Same-fill detection
│   │   │   ├── compat.rs       # Kernel compatibility
│   │   │   └── benchmark.rs    # Benchmarking utilities
│   │   └── examples/
│   ├── trueno-zram-adaptive/   # ML-driven selection
│   ├── trueno-zram-generator/  # systemd integration
│   └── trueno-zram-cli/        # CLI tool
└── bins/
    └── trueno-ublk/            # ublk daemon
Key Design Decisions
1. Runtime SIMD Dispatch
CPU features are detected at runtime rather than at compile time, so one binary can use the fastest SIMD path available on whichever CPU it runs on:
// Detection happens once at startup
let features = simd::detect();

// Dispatch to the widest available SIMD implementation,
// falling back to portable scalar code
if features.has_avx512() {
    lz4::avx512::compress(input, output)
} else if features.has_avx2() {
    lz4::avx2::compress(input, output)
} else {
    lz4::scalar::compress(input, output)
}
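The following std-only sketch shows the detect-once pattern. It uses the standard library's `is_x86_feature_detected!` and `is_aarch64_feature_detected!` macros and caches the result in a `OnceLock`, so the CPU is probed only on the first call. The `Backend` enum and the `detect`/`backend` functions are hypothetical stand-ins and are not trueno-zram's `simd::detect()` API.

```rust
use std::sync::OnceLock;

/// Hypothetical backend enum for this sketch (not trueno-zram's SimdBackend).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

/// Probe the CPU for the widest supported backend, widest first.
fn detect() -> Backend {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx512f") {
            return Backend::Avx512;
        }
        if is_x86_feature_detected!("avx2") {
            return Backend::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if is_aarch64_feature_detected!("neon") {
            return Backend::Neon;
        }
    }
    // Portable fallback for every other CPU or target
    Backend::Scalar
}

/// Run detection on the first call only; later calls read the cached value.
fn backend() -> Backend {
    static BACKEND: OnceLock<Backend> = OnceLock::new();
    *BACKEND.get_or_init(detect)
}

fn main() {
    let chosen = backend();
    // The second call hits the cache instead of re-probing
    assert_eq!(chosen, backend());
    println!("selected backend: {:?}", chosen);
}
```

Caching the decision in a single static keeps the per-page hot path down to one enum match. It avoids a CPUID query for every page.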
2. Page-Based Compression
All compression operates on fixed 4 KiB (4096-byte) pages:
pub const PAGE_SIZE: usize = 4096;

// Enforced at the type level: `&[u8; PAGE_SIZE]` accepts exactly
// 4096 bytes, so a wrong-sized buffer does not type-check.
pub fn compress(page: &[u8; PAGE_SIZE]) -> Result<CompressedPage>; // signature only
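The fixed-size array type moves the size check to compile time. A caller that holds an arbitrary byte slice still has to convert it once, at the boundary. This sketch does that with the standard `TryFrom<&[T]> for &[T; N]` impl. The `as_page` helper and the placeholder body of `compress` are hypothetical and only illustrate the signature.

```rust
use std::convert::TryFrom;

const PAGE_SIZE: usize = 4096;

/// Placeholder codec (hypothetical): counts zero bytes so the example has an
/// observable result. The signature, not the body, is the point.
fn compress(page: &[u8; PAGE_SIZE]) -> usize {
    page.iter().filter(|&&b| b == 0).count()
}

/// Borrow a slice as a page, or return None if it is not exactly 4096 bytes.
fn as_page(buf: &[u8]) -> Option<&[u8; PAGE_SIZE]> {
    <&[u8; PAGE_SIZE]>::try_from(buf).ok()
}

fn main() {
    let buf = vec![0u8; PAGE_SIZE];
    match as_page(&buf) {
        Some(page) => println!("zero bytes: {}", compress(page)),
        None => println!("not a page"),
    }
    // A 100-byte slice cannot become a page, so compress() is never reached
    assert!(as_page(&buf[..100]).is_none());
}
```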
3. Builder Pattern
Compressors are configured through a builder:
let compressor = CompressorBuilder::new()
    .algorithm(Algorithm::Lz4)
    .prefer_backend(SimdBackend::Avx512)
    .build()?; // build() returns a Result
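This section does not show the builder's internals. The following minimal sketch shows how such a builder is commonly written: chained setters that take and return `self`, with all validation deferred to `build()`, which reports problems through `Result`. Every type here is a hypothetical stand-in. Two rules are this sketch's own assumptions and not documented trueno-zram behavior: `build()` fails when no algorithm was set, and it defaults to the scalar backend.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Algorithm {
    Lz4,
    Zstd,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimdBackend {
    Avx512,
    Avx2,
    Scalar,
}

#[derive(Debug)]
struct Compressor {
    algorithm: Algorithm,
    backend: SimdBackend,
}

#[derive(Debug, PartialEq, Eq)]
enum BuildError {
    MissingAlgorithm,
}

#[derive(Default)]
struct CompressorBuilder {
    algorithm: Option<Algorithm>,
    backend: Option<SimdBackend>,
}

impl CompressorBuilder {
    fn new() -> Self {
        Self::default()
    }

    // Setters take `self` by value and return it, so calls chain
    fn algorithm(mut self, algorithm: Algorithm) -> Self {
        self.algorithm = Some(algorithm);
        self
    }

    fn prefer_backend(mut self, backend: SimdBackend) -> Self {
        self.backend = Some(backend);
        self
    }

    // All validation happens here, once, and errors surface as Result
    fn build(self) -> Result<Compressor, BuildError> {
        let algorithm = self.algorithm.ok_or(BuildError::MissingAlgorithm)?;
        // Assumption for this sketch: no preference means the scalar path
        let backend = self.backend.unwrap_or(SimdBackend::Scalar);
        Ok(Compressor { algorithm, backend })
    }
}

fn main() -> Result<(), BuildError> {
    let compressor = CompressorBuilder::new()
        .algorithm(Algorithm::Lz4)
        .prefer_backend(SimdBackend::Avx512)
        .build()?;
    println!("{:?} on {:?}", compressor.algorithm, compressor.backend);
    Ok(())
}
```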
4. Trait-Based Abstraction
The PageCompressor trait gives every algorithm the same interface, so callers can swap implementations without changing their code:
pub trait PageCompressor {
    fn compress(&self, page: &[u8; PAGE_SIZE]) -> Result<CompressedPage>;
    fn decompress(&self, page: &CompressedPage) -> Result<[u8; PAGE_SIZE]>;
}
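To show what the trait buys, here is a self-contained sketch with two toy implementations used through `&dyn PageCompressor`. The first is a passthrough that stores the page as-is. The second is a byte-level same-fill detector. `CompressedPage`, `Error`, `decode`, `Passthrough` and `SameFill` are hypothetical simplifications and are not the crate's types. Note that Linux zram's own same-fill check compares machine words, not single bytes.

```rust
use std::convert::TryInto;

const PAGE_SIZE: usize = 4096;

/// Hypothetical, simplified page representation for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum CompressedPage {
    /// Every byte of the page equals this value.
    SameFill(u8),
    /// Page bytes stored verbatim.
    Raw(Vec<u8>),
}

#[derive(Debug, PartialEq)]
enum Error {
    Corrupt,
}

type Result<T> = std::result::Result<T, Error>;

trait PageCompressor {
    fn compress(&self, page: &[u8; PAGE_SIZE]) -> Result<CompressedPage>;
    fn decompress(&self, page: &CompressedPage) -> Result<[u8; PAGE_SIZE]>;
}

/// Decoding shared by both toy implementations.
fn decode(page: &CompressedPage) -> Result<[u8; PAGE_SIZE]> {
    match page {
        CompressedPage::SameFill(byte) => Ok([*byte; PAGE_SIZE]),
        // A Raw payload of the wrong length is reported, not panicked on
        CompressedPage::Raw(bytes) => bytes.as_slice().try_into().map_err(|_| Error::Corrupt),
    }
}

/// Stores every page as-is; a placeholder where a real codec would sit.
struct Passthrough;

impl PageCompressor for Passthrough {
    fn compress(&self, page: &[u8; PAGE_SIZE]) -> Result<CompressedPage> {
        Ok(CompressedPage::Raw(page.to_vec()))
    }
    fn decompress(&self, page: &CompressedPage) -> Result<[u8; PAGE_SIZE]> {
        decode(page)
    }
}

/// Collapses a page made of one repeated byte down to that byte.
struct SameFill;

impl PageCompressor for SameFill {
    fn compress(&self, page: &[u8; PAGE_SIZE]) -> Result<CompressedPage> {
        let first = page[0];
        if page.iter().all(|&b| b == first) {
            Ok(CompressedPage::SameFill(first))
        } else {
            Ok(CompressedPage::Raw(page.to_vec()))
        }
    }
    fn decompress(&self, page: &CompressedPage) -> Result<[u8; PAGE_SIZE]> {
        decode(page)
    }
}

fn main() -> Result<()> {
    let page = [7u8; PAGE_SIZE];
    // Trait objects: the loop never names a concrete type
    let compressors: [&dyn PageCompressor; 2] = [&Passthrough, &SameFill];
    for compressor in compressors.iter() {
        let packed = compressor.compress(&page)?;
        assert_eq!(compressor.decompress(&packed)?, page);
    }
    Ok(())
}
```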
5. Zero-Copy Where Possible
Hot paths avoid allocation; the caller supplies the output buffer:
// Output buffer is passed in by the caller, not allocated here
fn compress_into(input: &[u8], output: &mut [u8]) -> Result<usize>;
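The sketch below shows the caller-owned-buffer contract. A deliberately simple byte run-length encoder stands in for LZ4. It is not LZ4 and not trueno-zram's code, and the `Error` type is hypothetical. The encoder writes `(count, value)` pairs into `output` and returns the number of bytes written. When `output` is too small, it fails with an error instead of reallocating. The loop in `main` reuses one scratch buffer for every page.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Error {
    OutputTooSmall,
}

/// Byte-oriented run-length encoding used only as a stand-in codec.
/// Each run is written as (count, value) with count in 1..=255.
/// Returns the number of bytes written into `output`.
fn compress_into(input: &[u8], output: &mut [u8]) -> Result<usize, Error> {
    let mut written = 0;
    let mut i = 0;
    while i < input.len() {
        let value = input[i];
        let mut run = 1;
        while i + run < input.len() && input[i + run] == value && run < 255 {
            run += 1;
        }
        // Fail instead of growing the buffer: the caller owns the memory
        if written + 2 > output.len() {
            return Err(Error::OutputTooSmall);
        }
        output[written] = run as u8;
        output[written + 1] = value;
        written += 2;
        i += run;
    }
    Ok(written)
}

fn main() -> Result<(), Error> {
    // One scratch buffer, allocated once and reused for every page
    let mut scratch = vec![0u8; 8192];
    let pages = [vec![0u8; 4096], vec![1u8; 4096]];
    for page in &pages {
        let n = compress_into(page, &mut scratch)?;
        println!("compressed {} bytes into {}", page.len(), n);
    }
    Ok(())
}
```

Each run costs two bytes, so this encoder's worst case is twice the input length. That is why the scratch buffer here is 8192 bytes. Real codecs publish a similar bound; for LZ4 it is `LZ4_compressBound`.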
6. No Panics in Library Code
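Library code never panics. Every failure is returned as a Result, and crate-level Clippy lints enforce this:
// Crate-level lints, declared at the crate root (lib.rs)
#![deny(clippy::panic)]
#![deny(clippy::unwrap_used)]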
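With `clippy::unwrap_used` denied, fallible steps go through `?` and checked accessors such as `slice::get` instead of `unwrap` or unchecked indexing. The following sketch parses a hypothetical little-endian `u16` length prefix in that style. The `Error` enum and `read_len` are illustrative assumptions. The crate's dependency table lists `thiserror` for error derives, but the `Display` impl is written by hand here so the block stays std-only.

```rust
use std::convert::TryFrom;
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq, Eq)]
enum Error {
    Truncated { needed: usize, got: usize },
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::Truncated { needed, got } => {
                write!(f, "buffer truncated: needed {} bytes, got {}", needed, got)
            }
        }
    }
}

impl std::error::Error for Error {}

/// Read a hypothetical little-endian u16 length prefix.
/// `buf[0]` or `.unwrap()` would panic on a short buffer; `get` plus `?`
/// turns that case into an Err the caller can handle.
fn read_len(buf: &[u8]) -> Result<u16, Error> {
    let bytes = buf
        .get(..2)
        .and_then(|s| <[u8; 2]>::try_from(s).ok())
        .ok_or_else(|| Error::Truncated { needed: 2, got: buf.len() })?;
    Ok(u16::from_le_bytes(bytes))
}

fn main() {
    match read_len(&[0x01]) {
        Ok(n) => println!("length {}", n),
        Err(e) => println!("error: {}", e),
    }
}
```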
Dependencies
| Crate | Purpose |
|---|---|
| thiserror | Error derive macros |
| cudarc | CUDA driver bindings |
| rayon | Parallel iteration |
| trueno-gpu | Pure Rust PTX generation |
Feature Flags
| Flag | Default | Description |
|---|---|---|
| std | Yes | Standard library |
| nightly | No | Nightly SIMD features |
| cuda | No | CUDA GPU support |
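Optional features such as `cuda` are usually consumed with `#[cfg(feature = "...")]`, so a default build never compiles the GPU path at all. The sketch below shows that pattern with a hypothetical `batch_backend()` function. It does not show which modules the real `cuda` feature gates or how it pulls in `cudarc` and `trueno-gpu`; those details are assumptions left out.

```rust
/// Hypothetical: with the `cuda` feature enabled, batches go to the GPU.
#[cfg(feature = "cuda")]
fn batch_backend() -> &'static str {
    "cuda"
}

/// Default build: the CUDA branch above is not compiled at all.
#[cfg(not(feature = "cuda"))]
fn batch_backend() -> &'static str {
    "cpu"
}

fn main() {
    println!("batch backend: {}", batch_backend());
    // cfg! evaluates to a compile-time bool for inline checks
    println!("std feature enabled: {}", cfg!(feature = "std"));
}
```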