Whisper.apr: Pure Rust Speech Recognition

whisper.apr is a pure Rust implementation of OpenAI’s Whisper automatic speech recognition model. It is designed for the Sovereign AI Stack, with WASM-first deployment and the APR v2 model format.

Overview

whisper.apr delivers:

  • Pure Rust: No Python, no C++ dependencies
  • WASM-First: Browser deployment with full functionality
  • APR v2 Format: LZ4/ZSTD compressed models
  • Quantization: Int4/Int8 for reduced memory footprint
  • Streaming: Real-time transcription support
  • Multilingual: 99+ languages
┌─────────────────────────────────────────────────────────────┐
│                    whisper.apr                              │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────────┐  │
│  │ APR v2 Model│  │  Streaming  │  │   Quantization      │  │
│  │ LZ4/ZSTD    │  │ Transcriber │  │   Int4/Int8         │  │
│  └─────────────┘  └─────────────┘  └─────────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│  trueno (SIMD)  │  aprender (ML)  │  realizar (inference)  │
└─────────────────────────────────────────────────────────────┘
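The Int8 option listed above can be illustrated with symmetric per-tensor quantization, where every weight in a tensor shares one scale factor. This is a generic sketch of the technique, not whisper.apr's actual scheme; the function names `quantize_int8` and `dequantize_int8` are hypothetical and are not part of the crate's API.

```rust
/// Quantizes f32 weights to i8 using one shared scale (symmetric, per-tensor).
/// Hypothetical helper for illustration; not a whisper.apr function.
fn quantize_int8(weights: &[f32]) -> (Vec<i8>, f32) {
    // The largest magnitude in the tensor maps to +/-127.
    let max_abs = weights.iter().fold(0.0_f32, |m, w| m.max(w.abs()));
    // Avoid dividing by zero for an all-zero tensor.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quantized: Vec<i8> = weights
        .iter()
        .map(|w| (w / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (quantized, scale)
}

/// Recovers approximate f32 weights from the quantized values.
fn dequantize_int8(quantized: &[i8], scale: f32) -> Vec<f32> {
    quantized.iter().map(|&v| v as f32 * scale).collect()
}
```

Each weight then occupies one byte instead of four, at the cost of a rounding error of at most about half the scale per weight.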

Installation

[dependencies]
# Use exactly one of the following whisper-apr entries.

# Default build
whisper-apr = "0.1"

# With GPU acceleration
whisper-apr = { version = "0.1", features = ["gpu"] }

# WASM-only (smaller bundle)
whisper-apr = { version = "0.1", default-features = false, features = ["wasm"] }

Quick Start

use whisper_apr::{WhisperModel, Transcriber, TranscribeOptions};

// Returning a boxed error assumes whisper_apr's error types implement
// std::error::Error.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load model (APR v2 format with compression)
    let model = WhisperModel::load_apr("whisper-small-int8.apr")?;
    let transcriber = Transcriber::new(model);

    // Transcribe audio file
    let result = transcriber.transcribe_file(
        "audio.wav",
        TranscribeOptions::default(),
    )?;

    println!("Text: {}", result.text);
    println!("Language: {}", result.language);

    // With timestamps
    for segment in &result.segments {
        println!("[{:.2}s - {:.2}s] {}",
            segment.start, segment.end, segment.text);
    }

    Ok(())
}

Model Sizes

Model    FP32     Int8     Int4     Languages
Tiny     150 MB   40 MB    22 MB    99+
Base     290 MB   75 MB    40 MB    99+
Small    970 MB   250 MB   130 MB   99+
Medium   3.0 GB   780 MB   400 MB   99+
Large    6.2 GB   1.6 GB   820 MB   99+
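To make the savings in the table above concrete, the following sketch computes how many times smaller each quantized file is than its FP32 counterpart. The sizes are copied from the table; converting GB to MB with a factor of 1000 is an assumption, and the struct and function names are illustrative only.

```rust
/// Model sizes in megabytes, copied from the table above.
/// GB values are converted assuming 1 GB = 1000 MB.
struct ModelSize {
    name: &'static str,
    fp32_mb: f64,
    int8_mb: f64,
    int4_mb: f64,
}

const MODEL_SIZES: [ModelSize; 5] = [
    ModelSize { name: "Tiny", fp32_mb: 150.0, int8_mb: 40.0, int4_mb: 22.0 },
    ModelSize { name: "Base", fp32_mb: 290.0, int8_mb: 75.0, int4_mb: 40.0 },
    ModelSize { name: "Small", fp32_mb: 970.0, int8_mb: 250.0, int4_mb: 130.0 },
    ModelSize { name: "Medium", fp32_mb: 3000.0, int8_mb: 780.0, int4_mb: 400.0 },
    ModelSize { name: "Large", fp32_mb: 6200.0, int8_mb: 1600.0, int4_mb: 820.0 },
];

/// How many times smaller the quantized file is than the FP32 file.
fn compression_ratio(fp32_mb: f64, quantized_mb: f64) -> f64 {
    fp32_mb / quantized_mb
}

/// Builds one summary line per model, e.g. for printing with println!.
fn report() -> Vec<String> {
    MODEL_SIZES
        .iter()
        .map(|m| {
            format!(
                "{}: Int8 {:.2}x, Int4 {:.2}x smaller than FP32",
                m.name,
                compression_ratio(m.fp32_mb, m.int8_mb),
                compression_ratio(m.fp32_mb, m.int4_mb),
            )
        })
        .collect()
}
```

With these figures the Int8 files come out a little under 4x smaller than FP32 and the Int4 files roughly 7x smaller, somewhat below the ideal 4x and 8x that the bit widths alone would suggest.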

Streaming Transcription

Real-time transcription from an audio stream:

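use std::io::Write;
use whisper_apr::{StreamingTranscriber, AudioChunk};

// `model` is a loaded WhisperModel and `audio_source` is an async source of
// AudioChunk values. Both are assumed to be in scope, inside an async
// function that returns a Result.
let mut streamer = StreamingTranscriber::new(model);

// Process audio chunks as they arrive
while let Some(chunk) = audio_source.next_chunk().await {
    if let Some(partial) = streamer.process_chunk(&chunk)? {
        print!("\r{}", partial.text);  // Live update
        // print! does not flush without a newline, so flush explicitly.
        std::io::stdout().flush().ok();
    }
}

// Finalize and get complete transcription
let final_result = streamer.finalize()?;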
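The streaming loop above pulls chunks from an audio source. As a minimal sketch of what such a source might look like, the following splits a buffer of samples into fixed-duration chunks. `BufferedSource` is a hypothetical stand-in and not a whisper.apr type, it is synchronous rather than async for brevity, and representing audio as 16 kHz mono `f32` samples is an assumption based on the input format of the original Whisper models.

```rust
/// Sample rate used by the original Whisper models (16 kHz mono).
const SAMPLE_RATE_HZ: usize = 16_000;

/// Hypothetical in-memory audio source that hands out fixed-size chunks.
struct BufferedSource {
    samples: Vec<f32>,
    pos: usize,
    chunk_len: usize,
}

impl BufferedSource {
    /// `chunk_ms` is the chunk duration in milliseconds.
    fn new(samples: Vec<f32>, chunk_ms: usize) -> Self {
        // At least one sample per chunk, so the source always makes progress.
        let chunk_len = (SAMPLE_RATE_HZ * chunk_ms / 1000).max(1);
        Self { samples, pos: 0, chunk_len }
    }

    /// Returns the next chunk, or None once the buffer is exhausted.
    /// The final chunk may be shorter than `chunk_len`.
    fn next_chunk(&mut self) -> Option<&[f32]> {
        if self.pos >= self.samples.len() {
            return None;
        }
        let start = self.pos;
        let end = (start + self.chunk_len).min(self.samples.len());
        self.pos = end;
        Some(&self.samples[start..end])
    }
}
```

For example, 2.5 seconds of audio (40,000 samples) requested in 1000 ms chunks yields two full chunks of 16,000 samples followed by one of 8,000. Shorter chunks reduce latency before the first partial result but give the model less context per call.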

WASM Deployment

Browser-compatible transcription:

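use wasm_bindgen::prelude::*;
use whisper_apr::wasm::{WasmWhisper, init_wasm};

// MODEL_BYTES is assumed to be a static byte slice holding the .apr model,
// defined elsewhere in the crate.
#[wasm_bindgen]
pub async fn transcribe_audio(audio_data: &[u8]) -> Result<String, JsError> {
    init_wasm().await;

    // `?` needs a Result return type. JsError accepts any error that
    // implements std::error::Error, which is assumed for whisper_apr's errors.
    let whisper = WasmWhisper::load_from_bytes(MODEL_BYTES).await?;
    let result = whisper.transcribe(audio_data)?;
    Ok(result.text)
}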

Bundle sizes (gzipped):

Model        WASM Runtime   Total
Tiny Int4    200 KB         22 MB
Base Int4    200 KB         40 MB
Small Int4   200 KB         130 MB

Language Detection

use whisper_apr::LanguageDetector;

// `model` is a loaded WhisperModel and `audio` is the input audio. Both are
// assumed to be in scope, inside a function that returns a Result.
let detector = LanguageDetector::new(&model);
let detection = detector.detect(&audio)?;

println!("Detected: {} ({:.1}% confidence)",
    detection.language, detection.confidence * 100.0);

// Top 5 candidates
for (lang, prob) in detection.top_languages(5) {
    println!("  {}: {:.1}%", lang, prob * 100.0);
}
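A ranked list of language candidates like the one above is typically obtained by applying a softmax to per-language scores and keeping the highest probabilities. The sketch below shows that idea in isolation. It is not whisper.apr's implementation: the free functions `softmax` and `top_languages`, and the assumption that the detector produces one raw score (logit) per language, are illustrative only.

```rust
/// Converts raw scores into probabilities that sum to 1.
fn softmax(logits: &[f32]) -> Vec<f32> {
    // Subtract the maximum before exponentiating for numerical stability.
    let max = logits.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|&e| e / sum).collect()
}

/// Returns the `k` most probable languages, highest probability first.
/// Hypothetical helper; `langs` and `logits` must have the same length.
fn top_languages<'a>(langs: &[&'a str], logits: &[f32], k: usize) -> Vec<(&'a str, f32)> {
    let probs = softmax(logits);
    let mut ranked: Vec<(&'a str, f32)> = langs.iter().copied().zip(probs).collect();
    // Sort descending by probability; total_cmp gives a total order on f32.
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    ranked.truncate(k);
    ranked
}
```

Calling `top_languages(&["en", "de", "fr"], &[2.0, 0.5, 1.0], 2)` returns the entries for "en" and "fr", in that order, each paired with its probability.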

Stack Integration

whisper.apr integrates with the Sovereign AI Stack:

Dependency   Version   Purpose
trueno       0.10+     SIMD tensor operations
aprender     0.20+     ML primitives, APR v2 format
realizar     0.4+      Inference runtime (optional)

Running the Example

cargo run --example whisper_apr_demo
