ComputeBrick Architecture

The Oracle of Compute

trueno, and specifically trueno/src/brick.rs, is the "Oracle" of the ComputeBrick ecosystem. It defines the ComputeBrick trait, TokenBudget, and the BrickProfiler logic. realizar (inference), aprender (algorithms), and cbtop (visualization) all import it as their central dependency to verify whether performance and correctness assertions are met.

Core Concepts

A ComputeBrick is a self-verifying, token-centric compute unit that bundles:

  1. Operation: The compute operation (matmul, dot, softmax, etc.)
  2. Assertions: Falsifiable claims about the output (equivalence, bounds)
  3. Budget: Performance target in µs/token or tokens/sec
  4. Backend: Execution target (Scalar, AVX2, CUDA, etc.)
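
To make the four-part bundle concrete, here is a minimal sketch of how an operation, its assertions, a budget, and a backend might sit behind one trait. All names in it (Brick, Backend, Budget, Assertion, SoftmaxBrick, run_verified) are hypothetical and chosen for illustration; the actual ComputeBrick trait in trueno/src/brick.rs may look quite different.

```rust
// Hypothetical sketch: illustrative names only, not trueno's actual API.

/// Execution target for a brick.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Backend {
    Scalar,
    Avx2,
    Cuda,
}

/// Performance target in microseconds per token.
#[derive(Debug, Clone, Copy)]
pub struct Budget {
    pub us_per_token: f64,
}

/// A falsifiable claim about a brick's output.
pub enum Assertion {
    /// Every output element lies in [min, max].
    Bounds { min: f32, max: f32 },
    /// Output matches a reference within an absolute tolerance.
    Equivalence { reference: Vec<f32>, tol: f32 },
}

impl Assertion {
    /// Returns true when the claim holds for `output`.
    pub fn holds(&self, output: &[f32]) -> bool {
        match self {
            Assertion::Bounds { min, max } => {
                output.iter().all(|v| *v >= *min && *v <= *max)
            }
            Assertion::Equivalence { reference, tol } => {
                reference.len() == output.len()
                    && reference.iter().zip(output).all(|(r, o)| (r - o).abs() <= *tol)
            }
        }
    }
}

/// Bundles operation, assertions, budget, and backend.
pub trait Brick {
    fn name(&self) -> &'static str;
    fn backend(&self) -> Backend;
    fn budget(&self) -> Budget;
    fn assertions(&self) -> Vec<Assertion>;
    fn run(&self, input: &[f32]) -> Vec<f32>;

    /// Runs the operation, then checks every assertion against the output.
    fn run_verified(&self, input: &[f32]) -> Result<Vec<f32>, String> {
        let out = self.run(input);
        for (i, a) in self.assertions().iter().enumerate() {
            if !a.holds(&out) {
                return Err(format!("{}: assertion {} failed", self.name(), i));
            }
        }
        Ok(out)
    }
}

/// Scalar softmax whose outputs must stay within [0, 1].
pub struct SoftmaxBrick;

impl Brick for SoftmaxBrick {
    fn name(&self) -> &'static str { "SoftmaxBrick" }
    fn backend(&self) -> Backend { Backend::Scalar }
    fn budget(&self) -> Budget { Budget { us_per_token: 50.0 } } // arbitrary example value
    fn assertions(&self) -> Vec<Assertion> {
        vec![Assertion::Bounds { min: 0.0, max: 1.0 }]
    }
    fn run(&self, input: &[f32]) -> Vec<f32> {
        // Subtract the maximum before exponentiating for numerical stability.
        let max = input.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = input.iter().map(|x| (x - max).exp()).collect();
        let sum: f32 = exps.iter().sum();
        exps.iter().map(|e| e / sum).collect()
    }
}
```

Calling SoftmaxBrick.run_verified(&[1.0, 2.0, 3.0]) in this sketch returns the output only if every assertion holds, which is one way to read "self-verifying".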

The "Pure Rust" Invariant

The ComputeBrick architecture enforces a "Pure Rust" stack.

  • No FFI to C++ libraries (like llama.cpp or ggml) for core compute.
  • Direct GPU Control: Use trueno-gpu for PTX generation and wgpu for cross-platform support.
  • Safety: unsafe is encapsulated strictly within Brick boundaries.

TokenBudget

Performance is not measured in abstract FLOPS, but in Tokens per Second (tok/s) or Microseconds per Token (µs/token).

pub struct TokenBudget {
    /// Latency budget per token (microseconds)
    pub us_per_token: f64,
    /// Throughput target (tokens/second)
    pub tokens_per_sec: f64,
}

This aligns low-level compute optimization directly with high-level LLM inference goals.
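
The two fields are reciprocals of each other: tokens_per_sec = 1,000,000 / us_per_token. The sketch below repeats the struct and adds constructors that keep both fields consistent. The helpers from_latency_us, from_throughput, and is_met are assumptions made for illustration and are not confirmed parts of trueno.

```rust
/// Same two fields as the struct above; the methods are hypothetical helpers.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TokenBudget {
    /// Latency budget per token (microseconds)
    pub us_per_token: f64,
    /// Throughput target (tokens/second)
    pub tokens_per_sec: f64,
}

impl TokenBudget {
    /// Builds a budget from a latency target, deriving the throughput.
    pub fn from_latency_us(us_per_token: f64) -> Self {
        Self { us_per_token, tokens_per_sec: 1_000_000.0 / us_per_token }
    }

    /// Builds a budget from a throughput target, deriving the latency.
    pub fn from_throughput(tokens_per_sec: f64) -> Self {
        Self { us_per_token: 1_000_000.0 / tokens_per_sec, tokens_per_sec }
    }

    /// True when a measured per-token latency is within the budget.
    pub fn is_met(&self, measured_us_per_token: f64) -> bool {
        measured_us_per_token <= self.us_per_token
    }
}
```

For example, a 50 µs/token budget corresponds to 20,000 tokens/sec, so a measured 40 µs/token meets it and 60 µs/token does not.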

BrickProfiler

The BrickProfiler is the mechanism for "Real Profiling".

  • Real Measurements: It measures actual execution time using std::time::Instant.
  • Synchronization: For GPU operations, it mandates cudaDeviceSynchronize() (or equivalent) before the timer starts and again before it stops, so that the measured interval covers the kernel's full execution and no earlier pending work.
  • Falsification: Derived or simulated metrics are explicitly FORBIDDEN.

// Example of Real Profiling
cuda_stream.synchronize(); // Drain earlier work so it is not included in the timing
profiler.start("QkvBrick");
// ... execute kernel ...
cuda_stream.synchronize(); // Wait for the kernel to finish before stopping the timer
profiler.stop("QkvBrick", num_tokens);
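
For CPU-side bricks, the start/stop pattern above can be sketched with std::time::Instant alone. The types below (SketchProfiler, BrickStats) are hypothetical stand-ins, not the real BrickProfiler; they only show how wall-clock measurements and token counts could be accumulated into a µs/token figure. A GPU variant would additionally need the synchronization calls described above.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Accumulated measurements for one brick (hypothetical type).
#[derive(Debug, Default, Clone, Copy)]
pub struct BrickStats {
    pub total: Duration,
    pub tokens: u64,
    pub calls: u64,
}

impl BrickStats {
    /// Measured microseconds per token, or None if no tokens were recorded.
    pub fn us_per_token(&self) -> Option<f64> {
        if self.tokens == 0 {
            None
        } else {
            Some(self.total.as_secs_f64() * 1e6 / self.tokens as f64)
        }
    }
}

/// Minimal wall-clock profiler keyed by brick name (hypothetical type).
#[derive(Default)]
pub struct SketchProfiler {
    running: HashMap<String, Instant>,
    stats: HashMap<String, BrickStats>,
}

impl SketchProfiler {
    pub fn new() -> Self {
        Self::default()
    }

    /// Records the start time for `name`, replacing any earlier unmatched start.
    pub fn start(&mut self, name: &str) {
        self.running.insert(name.to_string(), Instant::now());
    }

    /// Stops the timer for `name` and adds the elapsed time and token count.
    /// Returns None if `start` was never called for this name.
    pub fn stop(&mut self, name: &str, num_tokens: u64) -> Option<Duration> {
        let elapsed = self.running.remove(name)?.elapsed();
        let s = self.stats.entry(name.to_string()).or_default();
        s.total += elapsed;
        s.tokens += num_tokens;
        s.calls += 1;
        Some(elapsed)
    }

    /// Returns a copy of the accumulated stats for `name`, if any.
    pub fn stats(&self, name: &str) -> Option<BrickStats> {
        self.stats.get(name).copied()
    }
}
```

Because every number comes from Instant, nothing in this sketch is derived or simulated; comparing us_per_token() against a budget is then a plain numeric check.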

Sovereign Stack Profiling Mandate

Every component in the Sovereign Stack MUST implement REAL BrickProfiler timing:

Component     Repository   Metric              Implementation
trueno        trueno       SIMD Ops/sec        Instant::now()
trueno-gpu    trueno       Kernel Latency      cudaEventRecord
trueno-zram   trueno       Compression GB/s    Instant + Batch
aprender      aprender     Algorithm Latency   BrickProfiler
realizar      aprender     Inference Latency   cudaDeviceSynchronize
presentar     aprender     Frame Time          requestAnimationFrame

Integration

trueno provides the types. realizar implements the Bricks (e.g., QkvBrick, AttentionBrick). aprender uses Bricks for ML algorithms. cbtop visualizes the BrickProfiler output.