Case Study: CUDA and GPU Backends

This chapter demonstrates how to configure aprender for different compute backends, including CPU SIMD, GPU (wgpu/WebGPU), and NVIDIA CUDA acceleration.

Overview

Aprender v0.18.0 introduces flexible backend configuration through the loading module, supporting:

BackendDescriptionUse Case
CpuSimdCPU with SIMD (AVX2/AVX-512/NEON)Default, works everywhere
GpuGPU via wgpu/WebGPU compute shadersCross-platform GPU acceleration
CudaNVIDIA CUDA via trueno-gpuMaximum performance on NVIDIA hardware
WasmWebAssemblyBrowser deployment
EmbeddedBare metal (no_std)IoT and embedded systems

Cargo.toml Configuration

Enable GPU or CUDA support in your Cargo.toml:

[dependencies]
# Default CPU SIMD backend
aprender = "0.18"

# With GPU acceleration (wgpu/WebGPU)
aprender = { version = "0.18", features = ["gpu"] }

# With NVIDIA CUDA support
aprender = { version = "0.18", features = ["cuda"] }

# Both GPU and CUDA
aprender = { version = "0.18", features = ["gpu", "cuda"] }

Backend Presets

Aprender provides preset configurations for common deployment scenarios:

Server Deployment (CPU SIMD)

use aprender::loading::LoadConfig;

let config = LoadConfig::server();
// - Backend: CpuSimd
// - Mode: MappedDemand (memory-mapped for large models)
// - Verification: Standard

GPU Deployment (wgpu/WebGPU)

use aprender::loading::LoadConfig;

let config = LoadConfig::gpu();
// - Backend: Gpu
// - Mode: MappedDemand
// - Cross-platform: Vulkan, Metal, DX12, WebGPU

NVIDIA CUDA Deployment

use aprender::loading::LoadConfig;

let config = LoadConfig::cuda();
// - Backend: Cuda
// - Mode: MappedDemand
// - Requires: NVIDIA driver + `cuda` feature

WASM Deployment (Browser)

use aprender::loading::LoadConfig;

let config = LoadConfig::wasm();
// - Backend: Wasm
// - Mode: Streaming (64MB memory limit)
// - For browser-based ML inference

Embedded Deployment (IoT)

use aprender::loading::LoadConfig;

let config = LoadConfig::embedded(64 * 1024); // 64KB memory budget
// - Backend: Embedded
// - Mode: Eager (deterministic)
// - Verification: Paranoid (NASA Level A)

Backend Properties

Each backend exposes properties to help you make runtime decisions:

use aprender::loading::Backend;

let backend = Backend::Cuda;

// Check if SIMD is available
assert!(!backend.supports_simd()); // CUDA uses GPU, not CPU SIMD

// Check if GPU accelerated
assert!(backend.is_gpu_accelerated()); // Yes!

// Check if NVIDIA driver required
assert!(backend.requires_nvidia_driver()); // Yes, CUDA needs NVIDIA

// Check if std library required
assert!(backend.requires_std()); // Yes (Embedded is no_std)

Custom Configurations

Build custom configurations using the builder pattern:

use aprender::loading::{Backend, LoadConfig, LoadingMode, VerificationLevel};
use std::time::Duration;

// High-performance CUDA configuration
let cuda_config = LoadConfig::new()
    .with_backend(Backend::Cuda)
    .with_mode(LoadingMode::Eager)           // Full load for low latency
    .with_verification(VerificationLevel::Paranoid)  // NASA Level A
    .with_max_memory(4 * 1024 * 1024 * 1024) // 4GB budget
    .with_time_budget(Duration::from_millis(500));

// GPU streaming for large models
let gpu_streaming = LoadConfig::new()
    .with_backend(Backend::Gpu)
    .with_mode(LoadingMode::Streaming)
    .with_streaming(2 * 1024 * 1024);  // 2MB ring buffer

Backend Comparison Matrix

PropertyCpuSimdGpuCudaWasmEmbedded
SIMD SupportYesNoNoNoNo
GPU AcceleratedNoYesYesNoNo
NVIDIA RequiredNoNoYesNoNo
Requires stdYesYesYesYesNo
Best ForGeneralCross-platform GPUMax NVIDIA perfBrowserIoT

Running the Example

# Default CPU SIMD backend
cargo run --example cuda_backend

# With GPU feature
cargo run --example cuda_backend --features gpu

# With CUDA feature (requires NVIDIA driver)
cargo run --example cuda_backend --features cuda

Toyota Way: Heijunka (Level Loading)

The backend system follows Toyota Way Heijunka principles:

  • Level resource demands: Each backend preset optimizes for its target environment
  • Jidoka (built-in quality): Verification levels ensure model integrity
  • Poka-yoke (error-proofing): Type-safe APIs prevent misconfiguration

Integration with trueno

Aprender's GPU support is powered by trueno, our SIMD-accelerated tensor library:

  • trueno: Core SIMD operations (CPU backend)
  • trueno/gpu: wgpu-based GPU compute shaders
  • trueno/cuda-monitor: NVIDIA CUDA integration via trueno-gpu

The trueno-gpu crate provides:

  • Pure Rust PTX generation (no LLVM, no nvcc)
  • Runtime CUDA driver loading
  • Device monitoring and memory metrics

Example Output

=== Aprender Backend Configuration Demo ===

1. CPU SIMD Backend (Default)
   -------------------------
   Backend: CpuSimd
   Supports SIMD: true
   GPU Accelerated: false
   Requires NVIDIA: false
   Requires std: true

2. GPU Backend (wgpu/WebGPU)
   -------------------------
   Backend: Gpu
   Supports SIMD: false
   GPU Accelerated: true
   Requires NVIDIA: false

3. NVIDIA CUDA Backend
   --------------------
   Backend: Cuda
   Supports SIMD: false
   GPU Accelerated: true
   Requires NVIDIA: true

4. Backend Comparison
   ------------------
   | Backend   | SIMD | GPU Accel | NVIDIA Req | std Req |
   |-----------|------|-----------|------------|---------|
   | CpuSimd   | Yes  | No        | No         | Yes     |
   | Gpu       | No   | Yes       | No         | Yes     |
   | Cuda      | No   | Yes       | Yes        | Yes     |
   | Wasm      | No   | No        | No         | Yes     |
   | Embedded  | No   | No        | No         | No      |

See Also