WebAssembly for Machine Learning
WebAssembly (WASM) enables running ML models in browsers and edge devices with near-native performance.
Why WASM for ML?
| Deployment | Traditional | WASM |
|---|---|---|
| Browser | JavaScript (slow) | Near-native |
| Edge | Native binary per platform | Single binary |
| Security | Full system access | Sandboxed |
| Distribution | App store review | Instant deploy |
WASM Architecture
┌─────────────────────────────────────────┐
│ Host Environment │
│ ┌─────────────────────────────────┐ │
│ │ WASM Runtime │ │
│ │ ┌───────────────────────────┐ │ │
│ │ │ WASM Module │ │ │
│ │ │ ┌─────┐ ┌─────────┐ │ │ │
│ │ │ │Stack│ │ Linear │ │ │ │
│ │ │ │ │ │ Memory │ │ │ │
│ │ │ └─────┘ └─────────┘ │ │ │
│ │ └───────────────────────────┘ │ │
│ └─────────────────────────────────┘ │
└─────────────────────────────────────────┘
Compiling Rust to WASM
Setup
# Add WASM target
rustup target add wasm32-unknown-unknown
# Install wasm-pack for JS bindings
cargo install wasm-pack
Build
# Pure WASM
cargo build --target wasm32-unknown-unknown --release
# With JS bindings
wasm-pack build --target web
Cargo.toml
[lib]
crate-type = ["cdylib", "rlib"]
[dependencies]
wasm-bindgen = "0.2"
getrandom = { version = "0.2", features = ["js"] }
[target.'cfg(target_arch = "wasm32")'.dependencies]
getrandom = { version = "0.2", features = ["js"] }
Memory Considerations
Linear Memory
WASM has one contiguous memory buffer:
// Pass large arrays efficiently
#[wasm_bindgen]
pub fn predict(data: &[f32]) -> Vec<f32> {
// data points directly into WASM memory
model.forward(data)
}
Memory Limits
| Browser | Default | Max |
|---|---|---|
| Chrome | 2GB | 4GB |
| Firefox | 2GB | 4GB |
| Safari | 2GB | 4GB |
Plan for models < 2GB in-browser.
SIMD in WASM
WASM SIMD provides 128-bit vectors:
#[cfg(target_arch = "wasm32")]
use std::arch::wasm32::*;
// 4x f32 operations
let a = f32x4(1.0, 2.0, 3.0, 4.0);
let b = f32x4(5.0, 6.0, 7.0, 8.0);
let c = f32x4_add(a, b);
Speedup: 2-4x for vectorizable operations.
Browser Support
| Feature | Chrome | Firefox | Safari |
|---|---|---|---|
| WASM | ✅ | ✅ | ✅ |
| SIMD | ✅ (91+) | ✅ (89+) | ✅ (16.4+) |
| Threads | ✅ | ✅ | ✅ (15+) |
Threading
WASM threads require SharedArrayBuffer:
// Check support
if (crossOriginIsolated) {
// Can use threads
}
Security headers required:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
Model Loading
From URL
const modelUrl = 'model.wasm';
const response = await fetch(modelUrl);
const wasmModule = await WebAssembly.instantiateStreaming(response);
From Bytes
const bytes = new Uint8Array(modelData);
const module = await WebAssembly.instantiate(bytes);
Lazy Loading
// Load model on demand
let model = null;
async function getModel() {
if (!model) {
model = await loadModel();
}
return model;
}
Performance Optimization
Minimize JS/WASM Boundary
// ❌ Many small calls
for i in 0..1000 {
js_call(data[i]);
}
// ✅ Batch operations
process_batch(&data[0..1000]);
Use Typed Arrays
// ❌ Regular array (copy required)
const input = [1.0, 2.0, 3.0];
// ✅ Float32Array (zero-copy)
const input = new Float32Array([1.0, 2.0, 3.0]);
Pre-allocate Memory
#[wasm_bindgen]
pub struct Model {
// Pre-allocated buffers
input_buffer: Vec<f32>,
output_buffer: Vec<f32>,
}
WebGPU Integration
Future: WASM + WebGPU for GPU inference:
const adapter = await navigator.gpu.requestAdapter();
const device = await adapter.requestDevice();
// Use GPU for matrix operations
const buffer = device.createBuffer({...});
Deployment Patterns
Static Hosting
/index.html
/app.js
/model.wasm
/model_bg.wasm (if using wasm-pack)
CDN Distribution
<script type="module">
import init, { Model } from 'https://cdn.example.com/model/model.js';
await init();
const model = new Model();
</script>
Service Worker Cache
self.addEventListener('install', (event) => {
event.waitUntil(
caches.open('model-v1').then((cache) => {
return cache.addAll(['/model.wasm']);
})
);
});
Limitations
| Feature | Status |
|---|---|
| File system | ❌ (use IndexedDB) |
| Network | Via fetch API |
| GPU | WebGPU (emerging) |
| Threading | Requires special headers |
| Memory | 4GB max |
References
- WebAssembly Specification: https://webassembly.org
- wasm-bindgen: https://rustwasm.github.io/wasm-bindgen/
- WebAssembly SIMD: https://v8.dev/features/simd