trueno-ublk: GPU Block Device

trueno-ublk provides a GPU-accelerated ZRAM replacement built on Linux’s userspace block device (ublk) interface. It achieves 10-50 GB/s throughput by offloading compression to the GPU.

Overview

trueno-ublk delivers:

  • ublk Driver: Userspace block device via libublk
  • GPU Compression: CUDA/wgpu accelerated
  • ZRAM Replacement: Drop-in swap device
  • Adaptive Backend: Automatic GPU/SIMD/CPU selection
  • High Throughput: 10-50 GB/s with GPU
┌─────────────────────────────────────────────────────────────┐
│                      Linux Kernel                           │
│                    /dev/ublkb0                              │
└───────────────────────┬─────────────────────────────────────┘
                        │ io_uring
┌───────────────────────▼─────────────────────────────────────┐
│                    trueno-ublk                              │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ GPU Backend │  │ SIMD Backend│  │   CPU Backend       │  │
│  │ (CUDA/wgpu) │  │ (AVX/NEON)  │  │   (fallback)        │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
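
The tiering in the diagram (GPU first, then SIMD, then a scalar CPU fallback) can be sketched with the standard library's runtime CPU feature detection. This is a minimal illustration, not the crate's implementation: the `Tier` enum, `detect_cpu_tier`, and `select_tier` are hypothetical names, and GPU availability is passed in as a plain flag because probing CUDA or wgpu is outside the scope of this sketch.

```rust
/// Hypothetical mirror of the backend tiers in the diagram (not the crate's own type).
#[allow(dead_code)] // some variants are never constructed on a given architecture
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Tier {
    Gpu,
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

/// Pick the widest SIMD tier the current CPU reports, else the scalar fallback.
fn detect_cpu_tier() -> Tier {
    #[cfg(target_arch = "x86_64")]
    {
        if std::arch::is_x86_feature_detected!("avx512f") {
            return Tier::Avx512;
        }
        if std::arch::is_x86_feature_detected!("avx2") {
            return Tier::Avx2;
        }
    }
    #[cfg(target_arch = "aarch64")]
    {
        if std::arch::is_aarch64_feature_detected!("neon") {
            return Tier::Neon;
        }
    }
    Tier::Scalar
}

/// Prefer the GPU when the caller says one is usable; otherwise fall back to the CPU tiers.
fn select_tier(gpu_available: bool) -> Tier {
    if gpu_available {
        Tier::Gpu
    } else {
        detect_cpu_tier()
    }
}

fn main() {
    println!("with GPU:    {:?}", select_tier(true));
    println!("without GPU: {:?}", select_tier(false));
}
```

The CPU tier printed by the second line depends on the machine the sketch runs on.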

Installation

[dependencies]
# Default build
trueno-ublk = "0.1"

# Or, with CUDA support (NVIDIA GPUs); use one entry, not both
trueno-ublk = { version = "0.1", features = ["cuda"] }

System requirements:

  • Linux kernel 6.0+ (ublk support)
  • libublk userspace library
  • Root privileges for device creation
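
The kernel requirement above can be checked up front. The sketch below parses a kernel release string and compares it against 6.0; the function names are hypothetical and not part of trueno-ublk. It reads `/proc/sys/kernel/osrelease`, which is the standard location on Linux. A sufficiently new kernel is necessary but may not be enough on its own, since the ublk driver also has to be enabled in the kernel build.

```rust
use std::fs;

/// Parse the leading "major.minor" from a release string such as "6.5.0-14-generic".
fn parse_kernel_release(release: &str) -> Option<(u32, u32)> {
    // Split on anything that is not a digit; the first two fields are major and minor.
    let mut parts = release.trim().split(|c: char| !c.is_ascii_digit());
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True when the release string names a kernel at or above 6.0.
fn kernel_is_new_enough(release: &str) -> bool {
    matches!(parse_kernel_release(release), Some(version) if version >= (6, 0))
}

fn main() {
    match fs::read_to_string("/proc/sys/kernel/osrelease") {
        Ok(release) => println!(
            "kernel {}: meets 6.0+ requirement = {}",
            release.trim(),
            kernel_is_new_enough(&release)
        ),
        Err(err) => eprintln!("could not read kernel release: {err}"),
    }
}
```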

Quick Start

use trueno_ublk::{Backend, DeviceConfig, UblkDevice};

// Note: this snippet uses `.await?`, so it must run inside an async
// function that returns a `Result`.

// Create a device with 8 GB capacity
let config = DeviceConfig {
    capacity_bytes: 8 * 1024 * 1024 * 1024,  // 8 GB
    queue_depth: 128,
    num_queues: 4,
    backend: Backend::Auto,  // Auto-select GPU/SIMD/CPU
};

let device = UblkDevice::create(config).await?;
println!("Created: /dev/{}", device.name());

// Run the device (blocks until shutdown)
device.run().await?;

Backend Selection

Backend    Throughput   Latency   Condition
CUDA       50+ GB/s     100 µs    NVIDIA GPU
wgpu       20+ GB/s     200 µs    Any GPU
AVX-512    13 GB/s      10 µs     x86_64
AVX2       3 GB/s       5 µs      x86_64
NEON       2 GB/s       5 µs      ARM64
Scalar     800 MB/s     2 µs      Fallback

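
One way to read the figures above is as a fixed per-batch latency plus a transfer time proportional to batch size. The sketch below applies that rough model to the CUDA and Scalar rows. It rests on assumptions the table does not state: that latency is a fixed cost paid once per batch, that throughput is a sustained rate, and that pages are 4 KiB. `batch_time_us` is a hypothetical helper, not a trueno-ublk function.

```rust
/// Assumed page size in bytes (4 KiB); the table does not specify one.
const PAGE_BYTES: f64 = 4096.0;

/// Estimated microseconds to process `pages` pages under the model
/// time = fixed latency + bytes / throughput.
fn batch_time_us(pages: u32, latency_us: f64, throughput_gb_per_s: f64) -> f64 {
    let bytes = f64::from(pages) * PAGE_BYTES;
    // 1 GB/s = 1e9 bytes/s = 1e3 bytes per microsecond.
    latency_us + bytes / (throughput_gb_per_s * 1e3)
}

fn main() {
    for pages in [1u32, 64, 1024] {
        let cuda = batch_time_us(pages, 100.0, 50.0); // CUDA row: 100 us, 50 GB/s
        let scalar = batch_time_us(pages, 2.0, 0.8); // Scalar row: 2 us, 800 MB/s
        println!("{pages:>5} pages: CUDA {cuda:8.1} us, scalar {scalar:8.1} us");
    }
}
```

Under this model a single page finishes sooner on the scalar path, because the GPU's fixed latency dominates, while a 64-page batch finishes sooner on CUDA. Real crossover points depend on the hardware and workload.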
use trueno_ublk::{Backend, DeviceConfig};

// Force a specific backend
let config = DeviceConfig {
    backend: Backend::Cuda,  // NVIDIA GPU only
    ..Default::default()
};

// Or use the adaptive backend (switches based on load)
let config = DeviceConfig {
    backend: Backend::Adaptive {
        gpu_batch_threshold: 64,  // Use GPU for batches of 64+ pages
    },
    ..Default::default()
};
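
The comment on `gpu_batch_threshold` suggests a simple routing rule: batches at or above the threshold go to the GPU, smaller ones stay on the CPU. A minimal sketch of that rule follows. `Route` and `route_batch` are hypothetical names, and the crate's adaptive logic may weigh other signals such as current load.

```rust
/// Hypothetical routing decision for one batch of pages.
#[derive(Debug, PartialEq, Eq)]
enum Route {
    Gpu,
    Cpu,
}

/// Send a batch to the GPU only when one is usable and the batch is large enough.
fn route_batch(pages: usize, gpu_batch_threshold: usize, gpu_available: bool) -> Route {
    if gpu_available && pages >= gpu_batch_threshold {
        Route::Gpu
    } else {
        Route::Cpu
    }
}

fn main() {
    let threshold = 64; // same value as in the configuration above
    for pages in [1, 63, 64, 512] {
        println!("{pages:>3} pages -> {:?}", route_batch(pages, threshold, true));
    }
}
```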

CLI Usage

# Create 8GB GPU-accelerated swap
sudo trueno-ublk --capacity 8G --backend auto

# Force CUDA backend with stats
sudo trueno-ublk --capacity 16G --backend cuda --stats

# Use as block device (not swap)
sudo trueno-ublk --capacity 4G --no-swap
sudo mkfs.ext4 /dev/ublkb0
sudo mount /dev/ublkb0 /mnt/fast-storage
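
The `--capacity` values above use a size suffix. How the CLI parses them is not documented here, so the following is only a sketch of one plausible interpretation: binary units, matching the Quick Start where 8 GB is written as `8 * 1024 * 1024 * 1024`. `parse_capacity` is a hypothetical helper.

```rust
/// Parse strings such as "8G" or "512M" into bytes, assuming binary (1024-based) units.
/// Returns None for empty input, non-numeric input, or overflow.
fn parse_capacity(text: &str) -> Option<u64> {
    let text = text.trim();
    // The suffix arms match single-byte ASCII characters, so slicing off one byte is safe.
    let (digits, shift) = match text.chars().last()? {
        'K' | 'k' => (&text[..text.len() - 1], 10u32),
        'M' | 'm' => (&text[..text.len() - 1], 20),
        'G' | 'g' => (&text[..text.len() - 1], 30),
        'T' | 't' => (&text[..text.len() - 1], 40),
        _ => (text, 0),
    };
    let value: u64 = digits.parse().ok()?;
    value.checked_mul(1u64 << shift)
}

fn main() {
    for arg in ["4G", "8G", "16G"] {
        println!("{arg} -> {:?} bytes", parse_capacity(arg));
    }
}
```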

systemd Integration

/etc/systemd/system/trueno-ublk.service:

[Unit]
Description=trueno-ublk GPU-accelerated swap
Before=swap.target

[Service]
Type=simple
ExecStart=/usr/local/bin/trueno-ublk \
    --capacity 16G \
    --backend auto
ExecStartPost=/sbin/mkswap /dev/ublkb0
ExecStartPost=/sbin/swapon -p 100 /dev/ublkb0

[Install]
WantedBy=swap.target

Enable:

sudo systemctl enable trueno-ublk
sudo systemctl start trueno-ublk

Performance Monitoring

use trueno_ublk::Stats;

// `device` is a running UblkDevice, created as in Quick Start
let stats = device.stats();

println!("Compression ratio: {:.2}x", stats.compression_ratio);
println!("Read throughput:   {:.1} GB/s", stats.read_gbps);
println!("Write throughput:  {:.1} GB/s", stats.write_gbps);
println!("Backend:           {:?}", stats.active_backend);
println!("GPU utilization:   {:.0}%", stats.gpu_utilization * 100.0);

Example output:

┌─────────────────────────────────────────────────────┐
│ trueno-ublk stats                                   │
├─────────────────────────────────────────────────────┤
│ Device:          /dev/ublkb0                        │
│ Capacity:        16 GB                              │
│ Used:            8.2 GB (51%)                       │
│ Compressed:      2.1 GB (3.9x ratio)                │
│ Backend:         CUDA (RTX 4090)                    │
│ Read:            42.3 GB/s                          │
│ Write:           38.7 GB/s                          │
│ GPU util:        23%                                │
└─────────────────────────────────────────────────────┘
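
The derived figures in the example output are consistent with two simple quotients: used over capacity for the percentage, and used over compressed for the ratio. The sketch below reproduces that arithmetic. It assumes these are the definitions in play, which the output itself does not state, and both function names are hypothetical.

```rust
/// Uncompressed bytes divided by compressed bytes; None when nothing is stored.
fn compression_ratio(used_gb: f64, compressed_gb: f64) -> Option<f64> {
    if compressed_gb > 0.0 {
        Some(used_gb / compressed_gb)
    } else {
        None
    }
}

/// Share of the device capacity currently in use, as a percentage.
fn percent_used(used_gb: f64, capacity_gb: f64) -> f64 {
    used_gb / capacity_gb * 100.0
}

fn main() {
    // Figures taken from the example output above.
    let (capacity, used, compressed) = (16.0, 8.2, 2.1);
    if let Some(ratio) = compression_ratio(used, compressed) {
        println!("Compressed: {compressed} GB ({ratio:.1}x ratio)");
    }
    println!("Used:       {used} GB ({:.0}%)", percent_used(used, capacity));
}
```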

Comparison with zram

Feature               zram        trueno-ublk
Compression           CPU only    GPU/SIMD/CPU
Throughput            ~1 GB/s     10-50 GB/s
Algorithms            LZ4/ZSTD    LZ4/ZSTD + custom
Batch processing      No          Yes (GPU)
Adaptive              No          Yes
Kernel requirement    Any         6.0+ (ublk)

Running the Example

cargo run --example trueno_ublk_demo

Note: Running the actual ublk driver requires root privileges and Linux 6.0+.

Related Crates

  • trueno-zram-core: SIMD compression algorithms used by trueno-ublk
  • trueno-zram-adaptive: Entropy-based algorithm selection
  • trueno: SIMD/GPU compute primitives
