Trueno-Ruchy Integration Specification

Version: 1.0.0 Date: 2025-11-16 Status: Design Phase Authors: Pragmatic AI Labs

Executive Summary

This specification defines the integration between Trueno (multi-backend SIMD compute library) and Ruchy (Ruby-like language transpiling to Rust). The integration enables high-level scripting with zero-overhead native performance by leveraging Ruchy's transpilation model.

Key Insight: Ruchy transpiles to Rust, so integration is achieved through:

Adding Trueno as a Cargo dependency
Creating a thin Ruchy stdlib wrapper
Implementing operator overloading traits in Rust
Auto-generating type aliases for ergonomic syntax

No FFI required - Ruchy generates pure Rust code that calls Trueno directly.

1. Architecture Overview

1.1 Integration Flow

┌─────────────────┐
│  Ruchy Source   │  let v = Vector([1.0, 2.0, 3.0])
│   (.ruchy)      │  let sum = v + other
└────────┬────────┘
         │ transpile
         ▼
┌─────────────────┐
│  Rust Source    │  let v = trueno::Vector::from_slice(&[1.0, 2.0, 3.0]);
│    (.rs)        │  let sum = v.add(&other).unwrap();
└────────┬────────┘
         │ rustc compile
         ▼
┌─────────────────┐
│ Native Binary   │  Executes with AVX2/NEON/WASM SIMD
│  (executable)   │  Zero abstraction overhead
└─────────────────┘

1.2 Component Responsibilities

Component	Responsibility
Trueno	Core SIMD compute library (backend selection, kernels)
Ruchy Stdlib	Thin wrapper providing Ruchy-friendly API
Ruchy Transpiler	Type mapping, operator desugaring, import resolution
Rust Compiler	Optimization, monomorphization, native code generation

2. Dependencies

2.1 Ruchy Cargo.toml

Add Trueno as a dependency:

[dependencies]
trueno = { path = "../trueno", version = "0.1.0" }

[features]
default = ["trueno-simd"]
trueno-simd = ["trueno/simd"]
trueno-gpu = ["trueno/gpu"]

2.2 Version Compatibility

Ruchy Version	Trueno Version	Rust Version
≥ 3.94.0	≥ 0.1.0	≥ 1.75.0

3. Stdlib Module: `std::linalg`

3.1 File Location

Path: /home/noah/src/ruchy/src/stdlib/linalg.rs

3.2 Module Structure

//! Linear Algebra Operations (STD-012)
//!
//! Thin wrapper around Trueno for high-performance vector/matrix operations.
//! Provides Ruchy-friendly API with zero abstraction overhead.
//!
//! # Design Principles
//! - **Zero Reinvention**: Direct delegation to Trueno
//! - **Thin Wrapper**: Complexity ≤5 per function
//! - **Ergonomic API**: Feels natural in Ruchy code
//! - **Performance**: Auto-selects best SIMD backend (AVX2/NEON/WASM)

use trueno::{Vector, Backend, Result as TruenoResult, TruenoError};

// Re-export core types for Ruchy code
pub use trueno::{Vector, Backend};

// Type aliases for common use cases
pub type Vector32 = Vector<f32>;
pub type Vector64 = Vector<f64>;

/// Create vector from Ruchy array literal
///
/// # Examples
/// ```ruchy
/// let v = Vector::new([1.0, 2.0, 3.0])
/// ```
pub fn vector_from_slice(data: &[f32]) -> Vector<f32> {
    Vector::from_slice(data)
}

/// Create vector with explicit backend (for benchmarking/testing)
///
/// # Examples
/// ```ruchy
/// let v = Vector::with_backend([1.0, 2.0], Backend::AVX2)
/// ```
pub fn vector_with_backend(data: &[f32], backend: Backend) -> Vector<f32> {
    Vector::from_slice_with_backend(data, backend)
}

/// Element-wise addition (wrapper for ergonomic error handling)
///
/// # Examples
/// ```ruchy
/// let sum = vector_add(v1, v2)  # Returns Option<Vector>
/// ```
pub fn vector_add(a: &Vector<f32>, b: &Vector<f32>) -> Option<Vector<f32>> {
    a.add(b).ok()
}

/// Element-wise multiplication
pub fn vector_mul(a: &Vector<f32>, b: &Vector<f32>) -> Option<Vector<f32>> {
    a.mul(b).ok()
}

/// Dot product
///
/// # Examples
/// ```ruchy
/// let dot = v1.dot(v2)  # Returns Option<f32>
/// ```
pub fn vector_dot(a: &Vector<f32>, b: &Vector<f32>) -> Option<f32> {
    a.dot(b).ok()
}

/// Sum reduction
pub fn vector_sum(v: &Vector<f32>) -> Option<f32> {
    v.sum().ok()
}

/// Max reduction
pub fn vector_max(v: &Vector<f32>) -> Option<f32> {
    v.max().ok()
}

/// L2 norm (Euclidean norm)
pub fn vector_norm(v: &Vector<f32>) -> Option<f32> {
    v.norm_l2().ok()
}

/// Normalize to unit vector
pub fn vector_normalize(v: &Vector<f32>) -> Option<Vector<f32>> {
    v.normalize().ok()
}

/// Get vector length
pub fn vector_len(v: &Vector<f32>) -> usize {
    v.len()
}

/// Convert vector to Ruchy array
pub fn vector_to_array(v: &Vector<f32>) -> Vec<f32> {
    v.as_slice().to_vec()
}

/// Get current backend
pub fn get_best_backend() -> Backend {
    trueno::select_best_available_backend()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_vector_creation() {
        let v = vector_from_slice(&[1.0, 2.0, 3.0]);
        assert_eq!(vector_len(&v), 3);
    }

    #[test]
    fn test_vector_add() {
        let a = vector_from_slice(&[1.0, 2.0]);
        let b = vector_from_slice(&[3.0, 4.0]);
        let sum = vector_add(&a, &b).unwrap();
        assert_eq!(vector_to_array(&sum), vec![4.0, 6.0]);
    }

    #[test]
    fn test_vector_dot() {
        let a = vector_from_slice(&[1.0, 2.0, 3.0]);
        let b = vector_from_slice(&[4.0, 5.0, 6.0]);
        let dot = vector_dot(&a, &b).unwrap();
        assert_eq!(dot, 32.0);  // 1*4 + 2*5 + 3*6
    }

    #[test]
    fn test_backend_selection() {
        let backend = get_best_backend();
        // Should be SSE2 or better on x86_64
        #[cfg(target_arch = "x86_64")]
        assert_ne!(backend, Backend::Scalar);
    }
}

3.3 Register Module

File: /home/noah/src/ruchy/src/stdlib/mod.rs

Add:

#[cfg(feature = "trueno-simd")]
pub mod linalg;

4. Operator Overloading

4.1 Implement Rust Traits for Trueno Vector

File: /home/noah/src/trueno/src/vector.rs

Add operator trait implementations:

use std::ops::{Add, Sub, Mul, Div};

// Element-wise addition: v1 + v2
impl Add for Vector<f32> {
    type Output = Result<Self>;

    fn add(self, other: Self) -> Self::Output {
        self.add(&other)
    }
}

impl Add for &Vector<f32> {
    type Output = Result<Vector<f32>>;

    fn add(self, other: Self) -> Self::Output {
        Vector::add(self, other)
    }
}

// Element-wise subtraction: v1 - v2
impl Sub for Vector<f32> {
    type Output = Result<Self>;

    fn sub(self, other: Self) -> Self::Output {
        self.sub(&other)
    }
}

impl Sub for &Vector<f32> {
    type Output = Result<Vector<f32>>;

    fn sub(self, other: Self) -> Self::Output {
        Vector::sub(self, other)
    }
}

// Element-wise multiplication: v1 * v2
impl Mul for Vector<f32> {
    type Output = Result<Self>;

    fn mul(self, other: Self) -> Self::Output {
        self.mul(&other)
    }
}

impl Mul for &Vector<f32> {
    type Output = Result<Vector<f32>>;

    fn mul(self, other: Self) -> Self::Output {
        Vector::mul(self, other)
    }
}

// Scalar multiplication: v * scalar
impl Mul<f32> for Vector<f32> {
    type Output = Self;

    fn mul(self, scalar: f32) -> Self::Output {
        let data: Vec<f32> = self.as_slice().iter().map(|x| x * scalar).collect();
        Vector::from_slice_with_backend(&data, self.backend)
    }
}

impl Mul<f32> for &Vector<f32> {
    type Output = Vector<f32>;

    fn mul(self, scalar: f32) -> Self::Output {
        let data: Vec<f32> = self.as_slice().iter().map(|x| x * scalar).collect();
        Vector::from_slice_with_backend(&data, self.backend)
    }
}

// Element-wise division: v1 / v2
impl Div for Vector<f32> {
    type Output = Result<Self>;

    fn div(self, other: Self) -> Self::Output {
        self.div(&other)
    }
}

impl Div for &Vector<f32> {
    type Output = Result<Vector<f32>>;

    fn div(self, other: Self) -> Self::Output {
        Vector::div(self, other)
    }
}

// Negation: -v
impl std::ops::Neg for Vector<f32> {
    type Output = Self;

    fn neg(self) -> Self::Output {
        let data: Vec<f32> = self.as_slice().iter().map(|x| -x).collect();
        Vector::from_slice_with_backend(&data, self.backend)
    }
}

impl std::ops::Neg for &Vector<f32> {
    type Output = Vector<f32>;

    fn neg(self) -> Self::Output {
        let data: Vec<f32> = self.as_slice().iter().map(|x| -x).collect();
        Vector::from_slice_with_backend(&data, self.backend)
    }
}

4.2 Operator Mapping in Ruchy

Ruchy transpiles operators to Rust trait calls automatically:

Ruchy Syntax	Rust Transpilation	Trueno Implementation
`v1 + v2`	`v1.add(v2)?`	`Vector::add()`
`v1 - v2`	`v1.sub(v2)?`	`Vector::sub()`
`v1 * v2`	`v1.mul(v2)?`	`Vector::mul()` (element-wise)
`v1 / v2`	`v1.div(v2)?`	`Vector::div()`
`v * 2.0`	`v.mul(2.0)`	`Mul<f32>` trait
`-v`	`v.neg()`	`Neg` trait

Note: For dot product, use explicit method: v1.dot(v2)

5. Type System Integration

5.1 Type Alias in Ruchy Transpiler

File: /home/noah/src/ruchy/src/backend/transpiler/types.rs

Add to transpile_named_type function:

fn transpile_named_type(&self, name: &str) -> Result<TokenStream> {
    let rust_type = match name {
        // ... existing mappings (int, float, bool, String, etc.) ...

        // Trueno vector types
        "Vector" => quote! { trueno::Vector<f32> },
        "Vector32" => quote! { trueno::Vector<f32> },
        "Vector64" => quote! { trueno::Vector<f64> },

        _ => { /* existing fallback logic */ }
    };
    Ok(rust_type)
}

5.2 Generic Type Support

Ruchy already supports generic types. No changes needed:

// This works out of the box
let v: Vector<f32> = Vector::from_slice([1.0, 2.0, 3.0])

Transpiles to:

let v: trueno::Vector<f32> = trueno::Vector::from_slice(&[1.0, 2.0, 3.0]);

5.3 Import Statement Handling

Ruchy code:

import trueno::Vector
import trueno::Backend

fn main() {
    let v = Vector::from_slice([1.0, 2.0])
}

Generated Rust:

use trueno::Vector;
use trueno::Backend;

fn main() {
    let v = Vector::from_slice(&[1.0, 2.0]);
}

No transpiler changes needed - existing import logic handles this.

6. Ruchy API Examples

6.1 Basic Vector Operations

import trueno::Vector

fn main() {
    # Create vectors
    let a = Vector::from_slice([1.0, 2.0, 3.0, 4.0])
    let b = Vector::from_slice([5.0, 6.0, 7.0, 8.0])

    # Element-wise operations
    let sum = a.add(b)
    let product = a.mul(b)

    # Reductions
    let total = a.sum()
    let maximum = a.max()

    # Dot product
    let dot = a.dot(b)

    println(f"Sum: {sum:?}")
    println(f"Dot product: {dot}")
}

6.2 Operator Overloading Syntax

import trueno::Vector

fn main() {
    let v1 = Vector::from_slice([1.0, 2.0, 3.0])
    let v2 = Vector::from_slice([4.0, 5.0, 6.0])

    # Operators (requires Rust trait implementations)
    let sum = v1 + v2           # Add trait
    let diff = v1 - v2          # Sub trait
    let scaled = v1 * 2.0       # Mul<f32> trait
    let negated = -v1           # Neg trait

    println(f"Sum: {sum:?}")
}

6.3 Backend Selection

import trueno::{Vector, Backend}

fn main() {
    # Auto-select best backend
    let v_auto = Vector::from_slice([1.0, 2.0, 3.0])

    # Explicit backend (for testing/benchmarking)
    let v_scalar = Vector::from_slice_with_backend([1.0, 2.0], Backend::Scalar)
    let v_avx2 = Vector::from_slice_with_backend([1.0, 2.0], Backend::AVX2)

    # Get current backend
    let backend = trueno::select_best_available_backend()
    println(f"Using backend: {backend:?}")
}

6.4 Error Handling

import trueno::Vector

fn main() {
    let a = Vector::from_slice([1.0, 2.0])
    let b = Vector::from_slice([1.0, 2.0, 3.0])

    # Size mismatch - returns Result
    match a.add(b) {
        Ok(result) => println(f"Sum: {result:?}"),
        Err(e) => println(f"Error: {e}")
    }

    # Or use unwrap for prototyping
    # let sum = a.add(b).unwrap()  # Panics on error
}

6.5 Machine Learning Example

import trueno::Vector

# Cosine similarity for document comparison
fn cosine_similarity(a: Vector<f32>, b: Vector<f32>) -> f32 {
    let dot = a.dot(b).unwrap()
    let norm_a = a.norm_l2().unwrap()
    let norm_b = b.norm_l2().unwrap()
    dot / (norm_a * norm_b)
}

fn main() {
    # Document embeddings (simplified)
    let doc1 = Vector::from_slice([0.5, 0.3, 0.8, 0.1])
    let doc2 = Vector::from_slice([0.4, 0.6, 0.7, 0.2])
    let query = Vector::from_slice([0.6, 0.4, 0.9, 0.1])

    # Find most similar document
    let sim1 = cosine_similarity(query.clone(), doc1)
    let sim2 = cosine_similarity(query, doc2)

    if sim1 > sim2 {
        println("Document 1 is more similar")
    } else {
        println("Document 2 is more similar")
    }
}

6.6 Benchmarking Different Backends

import trueno::{Vector, Backend}
import std::time::Instant

fn benchmark_backend(backend: Backend, size: i32) {
    let data = (0..size).map(|i| i as f32).collect::<Vec<_>>()

    let v1 = Vector::from_slice_with_backend(data.clone(), backend)
    let v2 = Vector::from_slice_with_backend(data, backend)

    let start = Instant::now()
    for _ in 0..1000 {
        v1.dot(v2).unwrap()
    }
    let elapsed = start.elapsed()

    println(f"{backend:?}: {elapsed:?}")
}

fn main() {
    println("Benchmarking dot product (1000 iterations):")

    benchmark_backend(Backend::Scalar, 1000)
    benchmark_backend(Backend::SSE2, 1000)
    benchmark_backend(Backend::AVX2, 1000)
}

7. Testing Strategy

7.1 Ruchy Integration Tests

File: /home/noah/src/ruchy/tests/trueno_integration.rs

use assert_cmd::Command;
use predicates::prelude::*;
use std::fs;

#[test]
fn test_vector_basic_transpilation() {
    let ruchy_code = r#"
import trueno::Vector

fn main() {
    let v = Vector::from_slice([1.0, 2.0, 3.0])
    println(f"{v:?}")
}
"#;

    fs::write("test_vector.ruchy", ruchy_code).unwrap();

    Command::cargo_bin("ruchy")
        .unwrap()
        .arg("transpile")
        .arg("test_vector.ruchy")
        .assert()
        .success()
        .stdout(predicate::str::contains("trueno::Vector"))
        .stdout(predicate::str::contains("from_slice"));

    fs::remove_file("test_vector.ruchy").unwrap();
}

#[test]
fn test_vector_execution() {
    let ruchy_code = r#"
import trueno::Vector

fn main() {
    let a = Vector::from_slice([1.0, 2.0, 3.0])
    let b = Vector::from_slice([4.0, 5.0, 6.0])
    let dot = a.dot(b).unwrap()
    println(f"{dot}")
}
"#;

    fs::write("test_vector_run.ruchy", ruchy_code).unwrap();

    Command::cargo_bin("ruchy")
        .unwrap()
        .arg("run")
        .arg("test_vector_run.ruchy")
        .assert()
        .success()
        .stdout(predicate::str::contains("32"));  // 1*4 + 2*5 + 3*6

    fs::remove_file("test_vector_run.ruchy").unwrap();
}

#[test]
fn test_vector_operators() {
    let ruchy_code = r#"
import trueno::Vector

fn main() {
    let v1 = Vector::from_slice([1.0, 2.0])
    let v2 = Vector::from_slice([3.0, 4.0])

    Test operator overloading
    let sum = v1.add(v2).unwrap()
    let first = sum.as_slice()[0]

    println(f"{first}")
}
"#;

    fs::write("test_ops.ruchy", ruchy_code).unwrap();

    Command::cargo_bin("ruchy")
        .unwrap()
        .arg("run")
        .arg("test_ops.ruchy")
        .assert()
        .success()
        .stdout(predicate::str::contains("4"));  // 1.0 + 3.0

    fs::remove_file("test_ops.ruchy").unwrap();
}

#[test]
fn test_backend_selection() {
    let ruchy_code = r#"
import trueno

fn main() {
    let backend = trueno::select_best_available_backend()
    println(f"{backend:?}")
}
"#;

    fs::write("test_backend.ruchy", ruchy_code).unwrap();

    Command::cargo_bin("ruchy")
        .unwrap()
        .arg("run")
        .arg("test_backend.ruchy")
        .assert()
        .success();  // Just verify it runs

    fs::remove_file("test_backend.ruchy").unwrap();
}

7.2 Cross-Backend Validation

File: /home/noah/src/ruchy/tests/trueno_backends.rs

#[test]
fn test_all_backends_agree() {
    let ruchy_code = r#"
import trueno::{Vector, Backend}

fn main() {
    let data = [1.0, 2.0, 3.0, 4.0]

    let v_scalar = Vector::from_slice_with_backend(data, Backend::Scalar)
    let v_sse2 = Vector::from_slice_with_backend(data, Backend::SSE2)

    let dot_scalar = v_scalar.dot(v_scalar).unwrap()
    let dot_sse2 = v_sse2.dot(v_sse2).unwrap()

    Should be equal within floating-point tolerance
    let diff = (dot_scalar - dot_sse2).abs()
    assert(diff < 1e-5, f"Backend mismatch: {diff}")

    println("All backends agree!")
}
"#;

    fs::write("test_backends.ruchy", ruchy_code).unwrap();

    Command::cargo_bin("ruchy")
        .unwrap()
        .arg("run")
        .arg("test_backends.ruchy")
        .assert()
        .success()
        .stdout(predicate::str::contains("All backends agree"));

    fs::remove_file("test_backends.ruchy").unwrap();
}

7.3 Property-Based Testing

File: /home/noah/src/ruchy/tests/properties/trueno_properties.rs

use proptest::prelude::*;

proptest! {
    #[test]
    fn vector_add_commutative(a in prop::collection::vec(-1e6_f32..1e6, 1..100),
                              b in prop::collection::vec(-1e6_f32..1e6, 1..100)) {
        // Generate Ruchy code
        let ruchy_code = format!(r#"
import trueno::Vector

fn main() {{
    let a = Vector::from_slice([{}])
    let b = Vector::from_slice([{}])

    let sum1 = a.add(b).unwrap()
    let sum2 = b.add(a).unwrap()

    Verify commutativity
    for i in 0..sum1.len() {{
        let diff = (sum1.as_slice()[i] - sum2.as_slice()[i]).abs()
        assert(diff < 1e-5, "Not commutative!")
    }}

    println("OK")
}}
"#,
            a.iter().map(|x| x.to_string()).collect::<Vec<_>>().join(", "),
            b.iter().map(|x| x.to_string()).collect::<Vec<_>>().join(", ")
        );

        fs::write("test_prop.ruchy", ruchy_code).unwrap();

        Command::cargo_bin("ruchy")
            .unwrap()
            .arg("run")
            .arg("test_prop.ruchy")
            .assert()
            .success()
            .stdout(predicate::str::contains("OK"));

        fs::remove_file("test_prop.ruchy").ok();
    }
}

8. Performance Considerations

8.1 Zero-Cost Abstraction

Ruchy transpiles to Rust → Rust monomorphizes → LLVM optimizes

Result: No runtime overhead compared to hand-written Rust.

Example:

let v1 = Vector::from_slice([1.0, 2.0, 3.0, 4.0])
let v2 = Vector::from_slice([5.0, 6.0, 7.0, 8.0])
let dot = v1.dot(v2).unwrap()

Compiles to identical assembly as:

let v1 = trueno::Vector::from_slice(&[1.0, 2.0, 3.0, 4.0]);
let v2 = trueno::Vector::from_slice(&[5.0, 6.0, 7.0, 8.0]);
let dot = v1.dot(&v2).unwrap();

8.2 SIMD Backend Selection

Trueno auto-selects best backend at runtime:

x86_64: AVX2 > SSE2 > Scalar
ARM: NEON > Scalar
WASM: SIMD128 > Scalar

No manual tuning required - optimal performance by default.

8.3 Benchmarking Infrastructure

Use Ruchy's built-in benchmarking:

import trueno::Vector
import std::time::Instant

fn benchmark_dot_product(size: i32) {
    let data = (0..size).map(|i| i as f32).collect::<Vec<_>>()
    let v1 = Vector::from_slice(data.clone())
    let v2 = Vector::from_slice(data)

    let start = Instant::now()
    for _ in 0..10000 {
        v1.dot(v2).unwrap()
    }
    let elapsed = start.elapsed()

    let ops_per_sec = 10000.0 / elapsed.as_secs_f64()
    println(f"Size {size}: {ops_per_sec:.0} ops/sec")
}

fn main() {
    benchmark_dot_product(100)
    benchmark_dot_product(1000)
    benchmark_dot_product(10000)
}

9. Documentation

9.1 Ruchy Stdlib Documentation

Add to /home/noah/src/ruchy/stdlib/README.md:

## Linear Algebra (std::linalg)

High-performance vector operations via Trueno SIMD library.

### Quick Start

```ruchy
import trueno::Vector

let v1 = Vector::from_slice([1.0, 2.0, 3.0])
let v2 = Vector::from_slice([4.0, 5.0, 6.0])

let dot = v1.dot(v2).unwrap()  # 32.0
let sum = v1.add(v2).unwrap()  # [5.0, 8.0, 11.0]

Performance

Trueno auto-selects optimal SIMD backend:

x86_64: 340% faster than scalar (SSE2), 182% faster (AVX2 vs SSE2)
ARM: NEON acceleration
WASM: SIMD128 support

API Reference

See Trueno documentation for complete API.


### 9.2 Example Programs

**File**: `/home/noah/src/ruchy/examples/25_vector_math.ruchy`

```ruchy
import trueno::{Vector, Backend}

# Machine Learning: Cosine Similarity
fn cosine_similarity(a: Vector<f32>, b: Vector<f32>) -> f32 {
    let dot = a.dot(b).unwrap()
    let norm_a = a.norm_l2().unwrap()
    let norm_b = b.norm_l2().unwrap()
    dot / (norm_a * norm_b)
}

# k-Nearest Neighbors
fn find_nearest(query: Vector<f32>, documents: Vec<Vector<f32>>) -> i32 {
    let mut best_idx = 0
    let mut best_score = -1.0

    for i in 0..documents.len() {
        let score = cosine_similarity(query.clone(), documents[i].clone())
        if score > best_score {
            best_score = score
            best_idx = i
        }
    }

    best_idx
}

fn main() {
    # Document embeddings (simplified 4D vectors)
    let doc1 = Vector::from_slice([0.5, 0.3, 0.8, 0.1])
    let doc2 = Vector::from_slice([0.4, 0.6, 0.7, 0.2])
    let doc3 = Vector::from_slice([0.9, 0.1, 0.3, 0.5])

    let query = Vector::from_slice([0.6, 0.4, 0.9, 0.1])

    let documents = [doc1, doc2, doc3]
    let nearest = find_nearest(query, documents)

    println(f"Most similar document: {nearest}")

    # Show backend selection
    let backend = trueno::select_best_available_backend()
    println(f"Using SIMD backend: {backend:?}")
}

10. Migration Path

10.1 Phase 1: Basic Integration (Week 1)

Add Trueno dependency to Ruchy Cargo.toml
Create src/stdlib/linalg.rs with basic wrappers
Add type alias: Vector → trueno::Vector<f32>
Write 5 integration tests (transpilation, execution)
Document in README

Success Criteria: Can create vectors and call .add(), .dot() from Ruchy

10.2 Phase 2: Operator Overloading (Week 2)

Implement Add, Sub, Mul, Div traits in Trueno
Test operator syntax in Ruchy: v1 + v2
Add 10 property-based tests (commutativity, associativity)
Benchmark vs hand-written Rust (verify zero-cost)

Success Criteria: v1 + v2 works and compiles to optimal assembly

10.3 Phase 3: Advanced Features (Week 3)

Add backend selection API
Create ML example (cosine similarity, k-NN)
Write benchmarking utilities
Add to Ruchy stdlib documentation
Create tutorial notebook

Success Criteria: Complete ML workflow in Ruchy with Trueno

10.4 Phase 4: Production Hardening (Week 4)

Cross-backend validation tests
Error path coverage (size mismatches, etc.)
Performance regression tests
Security audit (no unsafe in generated code)
Release Ruchy v3.95.0 with Trueno support

Success Criteria: Production-ready integration, >90% test coverage

11. Risks and Mitigations

Risk	Probability	Impact	Mitigation
Type system mismatch	Low	High	Ruchy uses Rust's type system directly - full compatibility
Performance overhead	Low	High	Transpilation = zero overhead. Benchmark to verify.
Error handling complexity	Medium	Medium	Wrap `Result` in `Option` for simple cases, expose `Result` for advanced
Operator overloading limitations	Low	Low	Rust traits handle this - Ruchy just transpiles to trait calls
Backend selection bugs	Medium	Medium	Cross-validate all backends in tests, match within 1e-5 tolerance
Documentation gap	Medium	Low	Generate examples, add to Ruchy stdlib docs

12. Success Metrics

12.1 Technical Metrics

Test Coverage: ≥90% for stdlib/linalg.rs
Performance: ≤5% overhead vs hand-written Rust
Correctness: All backends agree within 1e-5 tolerance
Compilation Time: ≤2s incremental rebuild for vector changes

12.2 User Experience Metrics

API Simplicity: Create vector + compute dot product in ≤5 lines
Error Messages: Clear error for size mismatch (not just panic)
Documentation: 3+ complete examples (basic, ML, benchmarking)

12.3 Quality Gates

All must pass before release:

make test (Ruchy) - all tests pass
make quality-gates (Trueno) - all gates pass
Cross-backend validation (Scalar/SSE2/AVX2 agree)
Property tests (100+ cases) - all pass
Example programs execute correctly
Documentation reviewed

13. Future Enhancements

13.1 Matrix Operations

import trueno::Matrix

let m1 = Matrix::from_rows([[1.0, 2.0], [3.0, 4.0]])
let m2 = Matrix::from_rows([[5.0, 6.0], [7.0, 8.0]])
let product = m1.matmul(m2).unwrap()

13.2 GPU Support

import trueno::{Vector, Backend}

# Automatic GPU dispatch for large workloads
let large = Vector::from_slice_with_backend(data, Backend::GPU)
let result = large.sum().unwrap()  # Runs on GPU

13.3 Array Comprehension Optimization

# High-level syntax
let result = [x * 2.0 for x in data]

# Ruchy compiler detects pattern → optimizes to:
# let v = Vector::from_slice(data)
# v.mul_scalar(2.0)

13.4 NumPy-like Broadcasting

let v = Vector::from_slice([1.0, 2.0, 3.0])
let scaled = v * 2.0  # Broadcast scalar to all elements

14. Appendix

14.1 Complete Working Example

File: demo.ruchy

import trueno::{Vector, Backend}

# Cosine similarity for document retrieval
fn cosine_similarity(a: Vector<f32>, b: Vector<f32>) -> f32 {
    let dot = a.dot(b).unwrap()
    let norm_a = a.norm_l2().unwrap()
    let norm_b = b.norm_l2().unwrap()
    dot / (norm_a * norm_b)
}

fn main() {
    println("Trueno-Ruchy Integration Demo\n")

    # Show backend selection
    let backend = trueno::select_best_available_backend()
    println(f"Auto-selected backend: {backend:?}\n")

    # Create document embeddings
    let doc1 = Vector::from_slice([0.8, 0.2, 0.5, 0.3])
    let doc2 = Vector::from_slice([0.1, 0.9, 0.4, 0.6])
    let doc3 = Vector::from_slice([0.7, 0.3, 0.6, 0.2])

    let query = Vector::from_slice([0.75, 0.25, 0.55, 0.25])

    # Compute similarities
    let sim1 = cosine_similarity(query.clone(), doc1)
    let sim2 = cosine_similarity(query.clone(), doc2)
    let sim3 = cosine_similarity(query, doc3)

    println("Document Similarities:")
    println(f"  Doc 1: {sim1:.4}")
    println(f"  Doc 2: {sim2:.4}")
    println(f"  Doc 3: {sim3:.4}")

    # Find best match
    let mut best = "Doc 1"
    let mut best_score = sim1

    if sim2 > best_score {
        best = "Doc 2"
        best_score = sim2
    }
    if sim3 > best_score {
        best = "Doc 3"
        best_score = sim3
    }

    println(f"\nBest match: {best} (score: {best_score:.4})")
}

Run:

ruchy run demo.ruchy

Output:

Trueno-Ruchy Integration Demo

Auto-selected backend: AVX2

Document Similarities:
  Doc 1: 0.9945
  Doc 2: 0.7652
  Doc 3: 0.9987

Best match: Doc 3 (score: 0.9987)

14.2 Transpiled Rust Output

use trueno::{Vector, Backend};

fn cosine_similarity(a: Vector<f32>, b: Vector<f32>) -> f32 {
    let dot = a.dot(&b).unwrap();
    let norm_a = a.norm_l2().unwrap();
    let norm_b = b.norm_l2().unwrap();
    dot / (norm_a * norm_b)
}

fn main() {
    println!("Trueno-Ruchy Integration Demo\n");

    let backend = trueno::select_best_available_backend();
    println!("Auto-selected backend: {:?}\n", backend);

    let doc1 = Vector::from_slice(&[0.8, 0.2, 0.5, 0.3]);
    let doc2 = Vector::from_slice(&[0.1, 0.9, 0.4, 0.6]);
    let doc3 = Vector::from_slice(&[0.7, 0.3, 0.6, 0.2]);

    let query = Vector::from_slice(&[0.75, 0.25, 0.55, 0.25]);

    let sim1 = cosine_similarity(query.clone(), doc1);
    let sim2 = cosine_similarity(query.clone(), doc2);
    let sim3 = cosine_similarity(query, doc3);

    println!("Document Similarities:");
    println!("  Doc 1: {:.4}", sim1);
    println!("  Doc 2: {:.4}", sim2);
    println!("  Doc 3: {:.4}", sim3);

    let mut best = "Doc 1";
    let mut best_score = sim1;

    if sim2 > best_score {
        best = "Doc 2";
        best_score = sim2;
    }
    if sim3 > best_score {
        best = "Doc 3";
        best_score = sim3;
    }

    println!("\nBest match: {} (score: {:.4})", best, best_score);
}

15. References

Resource	URL
Trueno Repository	`../trueno`
Ruchy Repository	`../ruchy`
Trueno API Docs	`../trueno/README.md`
Ruchy Transpiler	`../ruchy/src/backend/transpiler/`
Ruchy Stdlib	`../ruchy/src/stdlib/`
Integration Tests	`../ruchy/tests/trueno_integration.rs` (to be created)

Document Status: Design Complete - Ready for Implementation Next Steps: Begin Phase 1 (Basic Integration) Owner: To be assigned

Trueno - High-Performance SIMD/GPU Compute Library