trueno-ublk: GPU Block Device

trueno-ublk provides a GPU-accelerated ZRAM replacement built on Linux’s userspace block device (ublk) interface. It achieves 10-50 GB/s throughput by offloading compression to the GPU.

Overview

trueno-ublk delivers:

ublk Driver: Userspace block device via libublk
GPU Compression: CUDA/wgpu accelerated
ZRAM Replacement: Drop-in swap device
Adaptive Backend: Automatic GPU/SIMD/CPU selection
High Throughput: 10-50 GB/s with GPU

┌─────────────────────────────────────────────────────────────┐
│                      Linux Kernel                           │
│                    /dev/ublkb0                              │
└───────────────────────┬─────────────────────────────────────┘
                        │ io_uring
┌───────────────────────▼─────────────────────────────────────┐
│                    trueno-ublk                              │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ GPU Backend │  │ SIMD Backend│  │   CPU Backend       │  │
│  │ (CUDA/wgpu) │  │ (AVX/NEON)  │  │   (fallback)        │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Installation

[dependencies]
trueno-ublk = "0.1"

# Or, with CUDA support (NVIDIA GPUs); use this line instead of the one above
trueno-ublk = { version = "0.1", features = ["cuda"] }

System requirements:

Linux kernel 6.0+ (ublk support)
libublk userspace library
Root privileges for device creation

Quick Start

use trueno_ublk::{Backend, DeviceConfig, UblkDevice};

// Create a device with 8 GB capacity
let config = DeviceConfig {
    capacity_bytes: 8 * 1024 * 1024 * 1024,  // 8 GB
    queue_depth: 128,
    num_queues: 4,
    backend: Backend::Auto,  // Auto-select GPU/SIMD/CPU
};

// Note: `.await` and `?` must run inside an async function that returns a Result
let device = UblkDevice::create(config).await?;
println!("Created: /dev/{}", device.name());

// Run the device (blocks until shutdown)
device.run().await?;

Backend Selection

Backend	Throughput	Latency	Condition
CUDA	50+ GB/s	100 us	NVIDIA GPU
wgpu	20+ GB/s	200 us	Any GPU
AVX-512	13 GB/s	10 us	x86_64
AVX2	3 GB/s	5 us	x86_64
NEON	2 GB/s	5 us	ARM64
Scalar	800 MB/s	2 us	Fallback
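The table suggests a preference order from fastest to slowest backend. The following is a minimal sketch of how such an automatic choice could work. It is not the trueno-ublk implementation: the `Caps` struct, the `Choice` enum, and both functions are hypothetical names introduced for illustration. CPU features are detected with the standard library; the GPU flags are left unset because probing a GPU needs a GPU runtime.

```rust
/// Hypothetical capability flags (not part of the trueno-ublk API).
#[derive(Debug, Clone, Copy, Default)]
struct Caps {
    cuda: bool,
    wgpu: bool,
    avx512: bool,
    avx2: bool,
    neon: bool,
}

/// Hypothetical result type, one variant per row of the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Choice {
    Cuda,
    Wgpu,
    Avx512,
    Avx2,
    Neon,
    Scalar,
}

/// Pick the first available backend, walking the table from top to bottom.
fn pick_backend(caps: Caps) -> Choice {
    if caps.cuda {
        Choice::Cuda
    } else if caps.wgpu {
        Choice::Wgpu
    } else if caps.avx512 {
        Choice::Avx512
    } else if caps.avx2 {
        Choice::Avx2
    } else if caps.neon {
        Choice::Neon
    } else {
        Choice::Scalar
    }
}

/// Detect CPU SIMD support at runtime. GPU flags stay false in this sketch.
#[allow(unused_mut)]
fn detect_cpu_caps() -> Caps {
    let mut caps = Caps::default();
    #[cfg(target_arch = "x86_64")]
    {
        caps.avx512 = std::arch::is_x86_feature_detected!("avx512f");
        caps.avx2 = std::arch::is_x86_feature_detected!("avx2");
    }
    #[cfg(target_arch = "aarch64")]
    {
        caps.neon = std::arch::is_aarch64_feature_detected!("neon");
    }
    caps
}

fn main() {
    let caps = detect_cpu_caps();
    println!("CPU-only choice on this machine: {:?}", pick_backend(caps));
}
```

Keeping the decision in a pure function of the capability flags makes the ordering easy to test without any particular hardware.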

use trueno_ublk::{Backend, DeviceConfig};

// Force a specific backend
let config = DeviceConfig {
    backend: Backend::Cuda,  // NVIDIA GPU only
    ..Default::default()
};

// Or use the adaptive backend (switches based on load)
let config = DeviceConfig {
    backend: Backend::Adaptive {
        gpu_batch_threshold: 64,  // Use GPU for batches of 64+ pages
    },
    ..Default::default()
};
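The idea behind a batch threshold is that the GPU rows in the table have much higher per-request latency than the SIMD rows, so small batches are better served on the CPU. The sketch below shows one plausible reading of `gpu_batch_threshold`; the `Route` enum and `route_batch` function are hypothetical and only illustrate the comparison, not how trueno-ublk schedules work internally.

```rust
/// Hypothetical routing decision (illustrative only).
#[derive(Debug, PartialEq, Eq)]
enum Route {
    Gpu,
    Cpu,
}

/// Send a batch to the GPU only when a GPU is present and the batch holds
/// at least `gpu_batch_threshold` pages; otherwise stay on the CPU path.
fn route_batch(pending_pages: usize, gpu_batch_threshold: usize, gpu_available: bool) -> Route {
    if gpu_available && pending_pages >= gpu_batch_threshold {
        Route::Gpu
    } else {
        Route::Cpu
    }
}

fn main() {
    let threshold = 64; // same value as gpu_batch_threshold in the example above
    for &pages in &[1usize, 63, 64, 512] {
        println!("{:>4} pages -> {:?}", pages, route_batch(pages, threshold, true));
    }
}
```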

CLI Usage

# Create 8GB GPU-accelerated swap
sudo trueno-ublk --capacity 8G --backend auto

# Force CUDA backend with stats
sudo trueno-ublk --capacity 16G --backend cuda --stats

# Use as block device (not swap)
sudo trueno-ublk --capacity 4G --no-swap
sudo mkfs.ext4 /dev/ublkb0
sudo mount /dev/ublkb0 /mnt/fast-storage
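The `--capacity` values above use size suffixes, while `DeviceConfig` in the Quick Start takes `capacity_bytes`. The sketch below shows one way such a string could map to bytes. It assumes binary multiples (as in the Quick Start's `8 * 1024 * 1024 * 1024`) and a 4 KiB page size; `parse_capacity` and `page_count` are hypothetical helpers, not the CLI's actual parser.

```rust
/// Parse a size such as "8G" or "512M" into bytes using binary multiples.
/// Returns None for unknown suffixes, missing digits, or overflow.
fn parse_capacity(s: &str) -> Option<u64> {
    let s = s.trim();
    let (digits, shift) = match s.chars().last()? {
        'K' | 'k' => (&s[..s.len() - 1], 10),
        'M' | 'm' => (&s[..s.len() - 1], 20),
        'G' | 'g' => (&s[..s.len() - 1], 30),
        'T' | 't' => (&s[..s.len() - 1], 40),
        c if c.is_ascii_digit() => (s, 0), // plain byte count
        _ => return None,
    };
    let n: u64 = digits.parse().ok()?;
    n.checked_mul(1u64 << shift)
}

/// Number of 4 KiB pages that fit in the given capacity (assumed page size).
fn page_count(capacity_bytes: u64) -> u64 {
    capacity_bytes / 4096
}

fn main() {
    for arg in ["8G", "16G", "4G", "oops"].iter() {
        match parse_capacity(arg) {
            Some(bytes) => println!("{} -> {} bytes, {} pages", arg, bytes, page_count(bytes)),
            None => println!("{} -> invalid capacity", arg),
        }
    }
}
```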

systemd Integration

/etc/systemd/system/trueno-ublk.service:

[Unit]
Description=trueno-ublk GPU-accelerated swap
Before=swap.target

[Service]
Type=simple
ExecStart=/usr/local/bin/trueno-ublk \
    --capacity 16G \
    --backend auto
ExecStartPost=/sbin/mkswap /dev/ublkb0
ExecStartPost=/sbin/swapon -p 100 /dev/ublkb0

[Install]
WantedBy=swap.target

Enable:

sudo systemctl enable trueno-ublk
sudo systemctl start trueno-ublk

Performance Monitoring

use trueno_ublk::Stats;

// `device` is the UblkDevice created in the Quick Start
let stats: Stats = device.stats();

println!("Compression ratio: {:.2}x", stats.compression_ratio);
println!("Read throughput:   {:.1} GB/s", stats.read_gbps);
println!("Write throughput:  {:.1} GB/s", stats.write_gbps);
println!("Backend:           {:?}", stats.active_backend);
println!("GPU utilization:   {:.0}%", stats.gpu_utilization * 100.0);

Example output:

┌─────────────────────────────────────────────────────┐
│ trueno-ublk stats                                   │
├─────────────────────────────────────────────────────┤
│ Device:          /dev/ublkb0                        │
│ Capacity:        16 GB                              │
│ Used:            8.2 GB (51%)                       │
│ Compressed:      2.1 GB (3.9x ratio)                │
│ Backend:         CUDA (RTX 4090)                    │
│ Read:            42.3 GB/s                          │
│ Write:           38.7 GB/s                          │
│ GPU util:        23%                                │
└─────────────────────────────────────────────────────┘
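The derived figures in the example output can be checked by hand. The sketch below assumes that "Used" is the uncompressed data stored on the device and "Compressed" is its in-memory footprint; that reading is an interpretation of the output, and the two helper functions are hypothetical rather than part of the `Stats` API.

```rust
/// Ratio of uncompressed to compressed size; None if nothing is stored yet.
fn compression_ratio(used_gb: f64, compressed_gb: f64) -> Option<f64> {
    if compressed_gb > 0.0 {
        Some(used_gb / compressed_gb)
    } else {
        None
    }
}

/// Share of the device capacity currently in use, in percent.
fn used_percent(used_gb: f64, capacity_gb: f64) -> f64 {
    used_gb / capacity_gb * 100.0
}

fn main() {
    // Values taken from the example output above.
    let (capacity, used, compressed) = (16.0, 8.2, 2.1);

    if let Some(ratio) = compression_ratio(used, compressed) {
        println!("Ratio: {:.1}x", ratio);
    }
    println!("Used:  {:.0}%", used_percent(used, capacity));
    println!("Saved: {:.1} GB", used - compressed);
}
```

With the sample values, 8.2 / 2.1 rounds to 3.9 and 8.2 / 16 rounds to 51%, which agrees with the figures shown in the box.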

Comparison with zram

Feature	zram	trueno-ublk
Compression	CPU only	GPU/SIMD/CPU
Throughput	~1 GB/s	10-50 GB/s
Algorithms	LZ4/ZSTD	LZ4/ZSTD + custom
Batch processing	No	Yes (GPU)
Adaptive	No	Yes
Kernel requirement	3.14+ (mainline)	6.0+ (ublk)

Running the Example

cargo run --example trueno_ublk_demo

Note: Running the actual ublk driver requires root privileges and Linux 6.0+.

Related crates:

trueno-zram-core: SIMD compression algorithms used by trueno-ublk
trueno-zram-adaptive: Entropy-based algorithm selection
trueno: SIMD/GPU compute primitives
