ComputeBrick Architecture
The Oracle of Compute
`trueno` (specifically `trueno/src/brick.rs`) is the "Oracle" of the ComputeBrick ecosystem. It defines the `ComputeBrick` trait, `TokenBudget`, and the `BrickProfiler` logic. It is the central dependency that `realizar` (inference), `aprender` (algorithms), and `cbtop` (visualization) all import to mathematically verify whether performance and correctness assertions are met.
Core Concepts
A ComputeBrick is a self-verifying, token-centric compute unit that bundles:
- Operation: The compute operation (matmul, dot, softmax, etc.)
- Assertions: Falsifiable claims about the output (equivalence, bounds)
- Budget: Performance target in µs/token or tokens/sec
- Backend: Execution target (Scalar, AVX2, CUDA, etc.)
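The four parts listed above can be sketched as a Rust trait. All names here (`Backend`, `Assertion`, the trait methods, `DotBrick`) are hypothetical and do not reproduce the real trait in `trueno/src/brick.rs`. The sketch only shows one way operation, assertions, budget, and backend might be bundled so that a brick can falsify its own output. For brevity the operation is fixed to two `f32` slices in and one `f32` out.

```rust
// Hypothetical sketch: not trueno's actual ComputeBrick API.

/// Execution target for a brick.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum Backend {
    Scalar,
    Avx2,
    Cuda,
}

/// A falsifiable claim about a brick's output.
#[derive(Debug, Clone, Copy)]
enum Assertion {
    /// Output must match a reference value within an absolute tolerance.
    Equivalent { expected: f32, tol: f32 },
    /// Output must lie in the closed interval [lo, hi].
    Bounds { lo: f32, hi: f32 },
}

/// A self-verifying compute unit: operation + assertions + budget + backend.
trait ComputeBrick {
    fn name(&self) -> &str;
    fn backend(&self) -> Backend;
    /// Performance target in microseconds per token (not checked here).
    fn budget_us_per_token(&self) -> f64;
    fn assertions(&self) -> &[Assertion];
    /// The compute operation itself.
    fn run(&self, a: &[f32], b: &[f32]) -> f32;

    /// Runs the operation, then checks every assertion against the output.
    fn run_verified(&self, a: &[f32], b: &[f32]) -> Result<f32, String> {
        let out = self.run(a, b);
        for assertion in self.assertions() {
            let ok = match *assertion {
                Assertion::Equivalent { expected, tol } => (out - expected).abs() <= tol,
                Assertion::Bounds { lo, hi } => out >= lo && out <= hi,
            };
            if !ok {
                return Err(format!("{}: {:?} failed (output = {})", self.name(), assertion, out));
            }
        }
        Ok(out)
    }
}

/// A scalar dot-product brick.
struct DotBrick {
    assertions: Vec<Assertion>,
}

impl ComputeBrick for DotBrick {
    fn name(&self) -> &str { "DotBrick" }
    fn backend(&self) -> Backend { Backend::Scalar }
    fn budget_us_per_token(&self) -> f64 { 10.0 }
    fn assertions(&self) -> &[Assertion] { &self.assertions }
    fn run(&self, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| x * y).sum()
    }
}

fn main() {
    let brick = DotBrick {
        assertions: vec![
            Assertion::Equivalent { expected: 32.0, tol: 1e-5 },
            Assertion::Bounds { lo: 0.0, hi: 100.0 },
        ],
    };
    let out = brick.run_verified(&[1.0, 2.0, 3.0], &[4.0, 5.0, 6.0]);
    println!("{:?} on {:?} (budget {} µs/token)", out, brick.backend(), brick.budget_us_per_token());
}
```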
The "Pure Rust" Invariant
The ComputeBrick architecture enforces a "Pure Rust" stack.
- No FFI to C++ libraries (like llama.cpp or ggml) for core compute.
- Direct GPU Control: Use `trueno-gpu` for PTX generation and `wgpu` for cross-platform support.
- Safety: `unsafe` is encapsulated strictly within Brick boundaries.
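One way to read "`unsafe` is encapsulated within Brick boundaries" is the usual Rust pattern: the brick exposes a safe function, checks the invariants its `unsafe` block relies on at the boundary, and records them in a `SAFETY` comment. The sketch below assumes that reading. `dot_brick` is a hypothetical name, and for a plain slice loop like this the safe iterator form would usually compile to equally fast code, so the unchecked reads are there only to illustrate the pattern.

```rust
// Hypothetical sketch: `dot_brick` is illustrative, not part of trueno.
// The only `unsafe` lives inside the brick; callers only see a safe function.

/// Safe entry point: validates inputs at the brick boundary.
pub fn dot_brick(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None; // reject invalid input before entering unsafe code
    }
    let mut acc = 0.0f32;
    for i in 0..a.len() {
        // SAFETY: `i < a.len()` by the loop bound, and `a.len() == b.len()`
        // was checked above, so both unchecked reads are in bounds.
        unsafe {
            acc += *a.get_unchecked(i) * *b.get_unchecked(i);
        }
    }
    Some(acc)
}

fn main() {
    println!("{:?}", dot_brick(&[1.0, 2.0], &[3.0, 4.0]));
    println!("{:?}", dot_brick(&[1.0], &[3.0, 4.0]));
}
```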
TokenBudget
Performance is not measured in abstract FLOPS, but in Tokens per Second (tok/s) or Microseconds per Token (µs/token).
```rust
pub struct TokenBudget {
    /// Latency budget per token (microseconds)
    pub us_per_token: f64,
    /// Throughput target (tokens/second)
    pub tokens_per_sec: f64,
}
```
This aligns low-level compute optimization directly with high-level LLM inference goals.
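Because 1 s = 10^6 µs, the two fields are reciprocals up to that factor: tokens_per_sec = 10^6 / us_per_token. A 500 tok/s target therefore corresponds to 2000 µs/token. The sketch below keeps the two fields consistent and checks a measured run against the latency budget. The field names mirror the struct above, but `from_latency`, `from_throughput`, and `is_met` are hypothetical helpers, not confirmed `trueno` methods.

```rust
// Sketch only: constructors and `is_met` are hypothetical helpers.

#[derive(Debug, Clone, Copy, PartialEq)]
pub struct TokenBudget {
    /// Latency budget per token (microseconds)
    pub us_per_token: f64,
    /// Throughput target (tokens/second)
    pub tokens_per_sec: f64,
}

impl TokenBudget {
    /// Build a budget from a latency target; throughput = 1e6 / latency.
    pub fn from_latency(us_per_token: f64) -> Self {
        Self { us_per_token, tokens_per_sec: 1_000_000.0 / us_per_token }
    }

    /// Build a budget from a throughput target; latency = 1e6 / throughput.
    pub fn from_throughput(tokens_per_sec: f64) -> Self {
        Self { us_per_token: 1_000_000.0 / tokens_per_sec, tokens_per_sec }
    }

    /// True if a run of `num_tokens` tokens that took `elapsed_us`
    /// microseconds stays within the per-token latency budget.
    pub fn is_met(&self, elapsed_us: f64, num_tokens: u64) -> bool {
        num_tokens > 0 && elapsed_us / num_tokens as f64 <= self.us_per_token
    }
}

fn main() {
    let budget = TokenBudget::from_throughput(500.0); // 500 tok/s target
    println!("{} µs/token", budget.us_per_token); // prints "2000 µs/token"
    // 150 ms for 100 tokens is 1500 µs/token, which is within budget.
    println!("{}", budget.is_met(150_000.0, 100)); // prints "true"
}
```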
BrickProfiler
The BrickProfiler is the mechanism for "Real Profiling".
- Real Measurements: It measures actual execution time using `std::time::Instant`.
- Synchronization: For GPU operations, it mandates `cudaDeviceSynchronize()` (or equivalent) before the timer starts and again before it stops, so that previously queued work is excluded and the timed kernel has actually finished.
- Falsification: Derived or simulated metrics are explicitly FORBIDDEN.
```rust
// Example of Real Profiling
cuda_stream.synchronize();              // Drain earlier work so it is not timed
profiler.start("QkvBrick");
// ... execute kernel ...
cuda_stream.synchronize();              // Wait for the kernel to finish
profiler.stop("QkvBrick", num_tokens);  // Attribute elapsed time to num_tokens
```
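A CPU-only profiler with the same start/stop shape as the example above can be built on `std::time::Instant`, which is monotonic. This is a hypothetical sketch (`SketchProfiler` and its methods are illustrative, not the real `BrickProfiler`): it accumulates wall-clock time and token counts per brick name and reports measured µs/token. It performs no GPU synchronization, so for GPU bricks the synchronize calls shown above would still be required around `start` and `stop`.

```rust
// Hypothetical sketch; not the real BrickProfiler API.
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Default)]
struct SketchProfiler {
    /// Start instants for bricks currently being timed.
    open: HashMap<String, Instant>,
    /// Accumulated (total elapsed time, total tokens) per brick name.
    totals: HashMap<String, (Duration, u64)>,
}

impl SketchProfiler {
    fn start(&mut self, name: &str) {
        self.open.insert(name.to_string(), Instant::now());
    }

    /// Stops the timer for `name` and attributes the elapsed time to `num_tokens`.
    /// A `stop` without a matching `start` records nothing.
    fn stop(&mut self, name: &str, num_tokens: u64) {
        if let Some(t0) = self.open.remove(name) {
            let entry = self.totals.entry(name.to_string()).or_insert((Duration::ZERO, 0));
            entry.0 += t0.elapsed();
            entry.1 += num_tokens;
        }
    }

    /// Measured microseconds per token, or None if nothing was recorded.
    fn us_per_token(&self, name: &str) -> Option<f64> {
        let (elapsed, tokens) = self.totals.get(name)?;
        if *tokens == 0 {
            return None;
        }
        Some(elapsed.as_secs_f64() * 1e6 / *tokens as f64)
    }
}

fn main() {
    let mut profiler = SketchProfiler::default();
    profiler.start("DotBrick");
    // Real work: a dot product over 1M elements, attributed to 4 tokens.
    let a = vec![1.0f32; 1_000_000];
    let dot: f32 = a.iter().map(|x| x * x).sum();
    profiler.stop("DotBrick", 4);
    println!("dot = {dot}, {:.2} µs/token", profiler.us_per_token("DotBrick").unwrap());
}
```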
Sovereign Stack Profiling Mandate
Every component in the Sovereign Stack MUST implement REAL BrickProfiler timing:
| Component | Repository | Metric | Implementation |
|---|---|---|---|
| trueno | trueno | SIMD Ops/sec | Instant::now() |
| trueno-gpu | trueno | Kernel Latency | cudaEventRecord |
| trueno-zram | trueno | Compression GB/s | Instant + Batch |
| aprender | aprender | Algorithm Latency | BrickProfiler |
| realizar | aprender | Inference Latency | cudaDeviceSynchronize |
| presentar | aprender | Frame Time | requestAnimationFrame |
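The "Instant + Batch" entry suggests timing a whole batch rather than each page, which amortizes timer overhead. The following is a hypothetical sketch of that measurement: `measure_gbps` is an illustrative helper, not `trueno-zram`'s API, GB/s is assumed to mean 10^9 bytes per second of input, and the closure in `main` is a stand-in workload (counting non-zero bytes), not a compressor. When the elapsed time is below timer resolution, it refuses to report a number instead of deriving one.

```rust
// Hypothetical sketch: `measure_gbps` is not trueno-zram's API.
use std::time::Instant;

/// Runs `work` over every page in `batch` under a single timer and returns
/// (input throughput in GB/s, sum of `work` results), or None if there is
/// no data or the elapsed time is below timer resolution.
fn measure_gbps<F: FnMut(&[u8]) -> usize>(batch: &[Vec<u8>], mut work: F) -> Option<(f64, usize)> {
    let total_bytes: usize = batch.iter().map(Vec::len).sum();
    let t0 = Instant::now();
    // Summing the results keeps the work observable to the optimizer.
    let checksum: usize = batch.iter().map(|page| work(page.as_slice())).sum();
    let secs = t0.elapsed().as_secs_f64();
    if total_bytes == 0 || secs == 0.0 {
        return None; // no real measurement: report nothing
    }
    Some((total_bytes as f64 / secs / 1e9, checksum))
}

fn main() {
    // 256 pages of 4 KiB, alternating all-zero and all-one pages.
    let batch: Vec<Vec<u8>> = (0..256u32).map(|i| vec![(i % 2) as u8; 4096]).collect();
    match measure_gbps(&batch, |page| page.iter().filter(|&&b| b != 0).count()) {
        Some((gbps, nonzero)) => println!("{gbps:.3} GB/s ({nonzero} non-zero bytes)"),
        None => println!("elapsed time below timer resolution; no metric reported"),
    }
}
```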
Integration
trueno provides the types.
realizar implements the Bricks (e.g., QkvBrick, AttentionBrick).
aprender uses Bricks for ML algorithms.
cbtop visualizes the BrickProfiler output.