Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

PCIe 5x Rule

The PCIe 5x rule determines when GPU offload is beneficial for compression.

The Rule

GPU beneficial when: T_compute > 5 × T_transfer

Where:

  • T_compute = CPU computation time
  • T_transfer = PCIe transfer time (H2D + D2H)

Why 5x?

GPU offload has overhead:

  1. H2D transfer: Copy data to GPU
  2. Kernel launch: ~5-10us overhead
  3. D2H transfer: Copy results back
  4. Synchronization: Wait for completion

The 5x factor accounts for these overheads and ensures GPU provides net benefit.

Calculation

#![allow(unused)]
fn main() {
use trueno_zram_core::gpu::meets_pcie_rule;
use trueno_zram_core::PAGE_SIZE;

fn check_gpu_benefit(
    pages: usize,
    pcie_bandwidth_gbps: f64,
    gpu_throughput_gbps: f64,
) -> bool {
    let data_bytes = pages * PAGE_SIZE;

    // Transfer time (round trip)
    let transfer_time = 2.0 * data_bytes as f64 / (pcie_bandwidth_gbps * 1e9);

    // GPU compute time
    let gpu_time = data_bytes as f64 / (gpu_throughput_gbps * 1e9);

    // CPU compute time (assume 4 GB/s baseline)
    let cpu_time = data_bytes as f64 / (4e9);

    // GPU beneficial if saves time
    cpu_time > (transfer_time + gpu_time) * 1.2  // 20% margin
}
}

Examples

PCIe 4.0 x16 (25 GB/s)

BatchData SizeTransferGPU TimeBeneficial?
100400 KB32 us4 usNo
1,0004 MB320 us40 usMarginal
10,00040 MB3.2 ms400 usYes
100,000400 MB32 ms4 msYes

PCIe 5.0 x16 (64 GB/s)

BatchData SizeTransferGPU TimeBeneficial?
1,0004 MB125 us40 usNo
10,00040 MB1.25 ms400 usYes
100,000400 MB12.5 ms4 msYes

Checking in Code

#![allow(unused)]
fn main() {
use trueno_zram_core::gpu::{GpuBatchCompressor, GpuBatchConfig};

let mut compressor = GpuBatchCompressor::new(config)?;
let result = compressor.compress_batch(&pages)?;

if result.pcie_rule_satisfied() {
    println!("GPU offload was beneficial");
    println!("Kernel time: {} ns", result.kernel_time_ns);
    println!("Transfer time: {} ns",
        result.h2d_time_ns + result.d2h_time_ns);
} else {
    println!("Consider using CPU for this batch size");
}
}

Backend Selection

#![allow(unused)]
fn main() {
use trueno_zram_core::gpu::{select_backend, BackendSelection, gpu_available};

let batch_size = 5000;
let has_gpu = gpu_available();

match select_backend(batch_size, has_gpu) {
    BackendSelection::Gpu => {
        // Use GPU batch compression
    }
    BackendSelection::Simd => {
        // Use CPU SIMD compression
    }
    BackendSelection::Scalar => {
        // Use scalar compression
    }
}
}

Optimization Tips

  1. Batch larger: Combine small batches into larger ones
  2. Use async DMA: Overlap transfers with computation
  3. Profile first: Measure actual transfer times
  4. Consider hybrid: Use CPU for small batches, GPU for large

Hardware-Specific Thresholds

GPUPCIeMin Beneficial Batch
H1005.0 x165,000 pages
A1004.0 x168,000 pages
RTX 40904.0 x1610,000 pages
RTX 30904.0 x1612,000 pages