Introduction
Welcome to Trueno-DB, a GPU-accelerated database engine built with EXTREME TDD methodology and Toyota Way principles.
What is Trueno-DB?
Trueno-DB is a research-grade database engine that demonstrates how to build production-quality database systems using:
- GPU Acceleration: targets 50-100x faster aggregations on datasets of 100M+ rows
- Cost-Based Backend Selection: Physics-based model automatically selects optimal execution backend (GPU/SIMD/Scalar)
- Out-of-Core Execution: Morsel-driven parallelism prevents VRAM exhaustion
- EXTREME TDD: RED-GREEN-REFACTOR with property-based testing, mutation testing, and >90% coverage
- Toyota Way: Poka-Yoke, Genchi Genbutsu, Muda elimination, Jidoka, Heijunka, Kaizen
Why Another Database?
Trueno-DB is not a production database engine (use PostgreSQL, DuckDB, or ClickHouse for that). Instead, it's:
- Educational: Learn how to build database systems with modern best practices
- Research Platform: Explore heterogeneous computing for analytical workloads
- Methodology Showcase: Demonstrate EXTREME TDD applied to systems programming
- Quality Exemplar: A+ (98.2/100) TDG score with zero technical debt
Key Features
Performance
- GPU Kernels: Parallel reduction, aggregations, hash join
- JIT Compilation: WGSL shader generation from query AST
- Kernel Fusion: Combine operators to minimize memory traffic
- SIMD Fallback: AVX2/AVX-512 via trueno library integration
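Kernel fusion is easiest to see on the CPU. The sketch below is a hypothetical illustration, not Trueno-DB's generated WGSL. `filter_then_sum_unfused` materializes the filter output as an intermediate `Vec`, which costs an extra write and read of memory. `filter_then_sum_fused` applies the predicate and the aggregation in a single pass. On a GPU, the same idea avoids writing an intermediate buffer to VRAM between kernel launches.

```rust
/// Unfused: two operators, with the filter result materialized in between.
fn filter_then_sum_unfused(values: &[i64], threshold: i64) -> i64 {
    // Operator 1: the filter writes an intermediate buffer.
    let filtered: Vec<i64> = values.iter().copied().filter(|&v| v > threshold).collect();
    // Operator 2: the sum reads the intermediate buffer back.
    filtered.iter().sum()
}

/// Fused: predicate and aggregation run in one pass, with no intermediate buffer.
fn filter_then_sum_fused(values: &[i64], threshold: i64) -> i64 {
    values
        .iter()
        .fold(0, |acc, &v| if v > threshold { acc + v } else { acc })
}
```

Both functions compute the same result. The fused version never allocates `filtered`, and that saved memory traffic is the benefit fusion targets.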
Quality
- Test Coverage: >90% line coverage target, 100% on implemented core modules
- Mutation Testing: ≥80% mutation score target
- Property-Based Testing: Correctness invariants verified with proptest
- Backend Equivalence: GPU == SIMD == Scalar (property-based verification)
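The parallel reduction and backend-equivalence ideas can be illustrated together. The following is a minimal, hypothetical sketch and not Trueno-DB's kernel or test code:

- `tree_sum` mimics the pairwise reduction a GPU workgroup performs in log2(n) steps.
- `scalar_sum` is the naive loop.
- `check_equivalence` compares the two over pseudo-random inputs. The inputs come from a small linear congruential generator that stands in for proptest, so the block needs no external crates.

Integer sums are used because integer addition is associative. Floating-point backends can legitimately differ in rounding and would need a tolerance.

```rust
/// Naive sequential sum: the "Scalar" backend.
fn scalar_sum(values: &[i64]) -> i64 {
    let mut total = 0;
    for &v in values {
        total += v;
    }
    total
}

/// Pairwise tree reduction, shaped like a GPU workgroup reduction:
/// each step adds element i + half into element i, halving the active width.
fn tree_sum(values: &[i64]) -> i64 {
    if values.is_empty() {
        return 0;
    }
    let mut buf = values.to_vec();
    let mut len = buf.len();
    while len > 1 {
        let half = (len + 1) / 2; // round up so an odd leftover element is kept
        for i in 0..len / 2 {
            buf[i] += buf[i + half];
        }
        len = half;
    }
    buf[0]
}

/// Tiny deterministic PRNG (LCG), used here in place of proptest.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Property: for every generated input, both "backends" agree exactly.
fn check_equivalence(cases: usize) -> bool {
    let mut rng = Lcg(42);
    (0..cases).all(|_| {
        let len = (rng.next_u64() % 1000) as usize;
        let values: Vec<i64> = (0..len)
            .map(|_| (rng.next_u64() % 2001) as i64 - 1000)
            .collect();
        scalar_sum(&values) == tree_sum(&values)
    })
}
```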
Toyota Way
- Poka-Yoke: Morsel-based paging prevents VRAM OOM
- Genchi Genbutsu: Physics-based cost model (PCIe Gen4 x16 = 32 GB/s)
- Muda: Use the GPU only when estimated compute time exceeds 5x the transfer time
- Jidoka: Built-in quality (EXTREME TDD, zero defects)
- Heijunka: Bounded transfer queue (max 2 in-flight)
- Kaizen: Continuous improvement (pmat workflow)
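The Heijunka bounded transfer queue can be approximated with the standard library. This sketch assumes synchronous threads rather than the async GpuTransferQueue the project describes. `std::sync::mpsc::sync_channel(2)` gives a queue that buffers at most two pending morsels, so the producer blocks instead of piling up transfers. The morsel contents, `run_pipeline`, and the byte-counting consumer are hypothetical placeholders.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Maximum number of morsels buffered between producer and consumer.
const MAX_IN_FLIGHT: usize = 2;

/// Hypothetical pipeline: a producer pushes morsels through a bounded queue;
/// `send` blocks while MAX_IN_FLIGHT morsels are already waiting.
fn run_pipeline(morsels: Vec<Vec<u8>>) -> usize {
    let (tx, rx) = sync_channel::<Vec<u8>>(MAX_IN_FLIGHT);

    let producer = thread::spawn(move || {
        for morsel in morsels {
            // Backpressure: blocks when the queue is full.
            tx.send(morsel).expect("consumer hung up");
        }
        // `tx` is dropped here, which ends the consumer loop below.
    });

    // Stand-in for the GPU side: "transfer" each morsel and count its bytes.
    let mut bytes_transferred = 0;
    for morsel in rx {
        bytes_transferred += morsel.len();
    }

    producer.join().expect("producer panicked");
    bytes_transferred
}
```

A bounded channel trades some throughput for a hard cap on buffered data. That cap is the point of Heijunka here: memory use stays level no matter how fast the producer runs.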
Architecture Overview
┌─────────────────────────────────────────────────────────┐
│                     Query Interface                     │
└─────────────────────────────────────────────────────────┘
                             │
                             ▼
┌─────────────────────────────────────────────────────────┐
│  Cost-Based Backend Dispatcher                          │
│  • Minimum data size: 10 MB                             │
│  • 5x rule: GPU if compute > 5x transfer                │
│  • PCIe Gen4 x16: 32 GB/s                               │
└─────────────────────────────────────────────────────────┘
           │                 │                 │
     ┌─────┴─────┐     ┌─────┴─────┐     ┌─────┴─────┐
     │    GPU    │     │   SIMD    │     │  Scalar   │
     │  (wgpu)   │     │ (trueno)  │     │  (naive)  │
     └───────────┘     └───────────┘     └───────────┘
                             │
                             ▼
             ┌───────────────────────────────┐
             │     Arrow Storage Backend     │
             │  • Parquet reader             │
             │  • Morsel iterator (128MB)    │
             │  • GPU transfer queue         │
             └───────────────────────────────┘
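The dispatcher rules in the diagram reduce to a few lines of arithmetic. This is a hypothetical sketch, not the CORE-002 implementation. The following are assumptions:

- the `Backend` enum and the `select_backend` signature
- a caller-supplied estimate of GPU compute time
- treating 10 MB as 10^7 bytes
- falling back to Scalar only when SIMD is unavailable

At 32 GB/s, moving 1 GB over PCIe takes about 31 ms. Under the 5x rule, the GPU is therefore chosen for that input only when estimated compute exceeds roughly 156 ms.

```rust
/// Execution backends from the architecture diagram.
#[derive(Debug, PartialEq)]
enum Backend {
    Gpu,
    Simd,
    Scalar,
}

const MIN_GPU_BYTES: u64 = 10 * 1_000_000; // 10 MB minimum, taken as 10^7 bytes (assumption)
const PCIE_BYTES_PER_SEC: f64 = 32e9; // PCIe Gen4 x16, ~32 GB/s
const TRANSFER_MULTIPLIER: f64 = 5.0; // the "5x rule"

/// Estimated host-to-device transfer time in seconds.
fn transfer_secs(bytes: u64) -> f64 {
    bytes as f64 / PCIE_BYTES_PER_SEC
}

/// Hypothetical dispatcher: GPU only for large inputs whose estimated compute
/// time exceeds 5x the PCIe transfer time; otherwise stay on the CPU.
fn select_backend(bytes: u64, est_gpu_compute_secs: f64, simd_available: bool) -> Backend {
    if bytes >= MIN_GPU_BYTES
        && est_gpu_compute_secs > TRANSFER_MULTIPLIER * transfer_secs(bytes)
    {
        Backend::Gpu
    } else if simd_available {
        Backend::Simd
    } else {
        Backend::Scalar
    }
}
```

For a 1 GB input, an estimated 0.2 s of compute clears the 0.156 s bar and goes to the GPU. An estimated 0.1 s does not, so the work stays on SIMD.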
Current Status
Phase 1 - Core Engine (2/6 complete)
✅ CORE-001: Arrow Storage Backend (100% coverage)
- Parquet reader with streaming RecordBatch
- MorselIterator (128MB chunks, Poka-Yoke)
- GpuTransferQueue (bounded async, Heijunka)
✅ CORE-002: Cost-Based Backend Dispatcher (100% coverage)
- Physics-based 5x rule
- PCIe Gen4 x16 bandwidth calculations
- Minimum 10 MB data size threshold
🚧 CORE-003: JIT WGSL Compiler
🚧 CORE-004: GPU Kernels
🚧 CORE-005: SIMD Fallback (Trueno Integration)
🚧 CORE-006: Backend Equivalence Tests
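The MorselIterator idea can be sketched with plain slices. The idea is to process a large column in fixed-size chunks so that no single transfer exceeds a memory budget. This sketch is hypothetical: Trueno-DB's iterator works over Arrow RecordBatches, whereas `morsels` below splits a slice by a byte budget. With 4-byte values and 1 MB taken as 2^20 bytes, a 128 MB morsel holds 33,554,432 values.

```rust
/// Morsel budget: 128 MB, taken here as 128 * 2^20 bytes (assumption).
const MORSEL_BYTES: usize = 128 * 1024 * 1024;

/// Hypothetical helper: how many values of type `T` fit in one morsel.
/// The `max(1)` calls guard against zero-sized types and tiny budgets.
fn rows_per_morsel<T>(budget_bytes: usize) -> usize {
    (budget_bytes / std::mem::size_of::<T>().max(1)).max(1)
}

/// Split a column into consecutive morsels of at most `budget_bytes` each
/// (always at least one value per morsel); only the last may be shorter.
fn morsels<T>(column: &[T], budget_bytes: usize) -> std::slice::Chunks<'_, T> {
    column.chunks(rows_per_morsel::<T>(budget_bytes))
}
```

Because each morsel is a borrowed slice, nothing is copied until a consumer (for example, a transfer queue) actually needs the bytes.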
Quality Metrics
| Metric | Target | Current | Status |
|---|---|---|---|
| TDG Score | ≥85 | 98.2 | ✅ A+ |
| Test Coverage | >90% | 85%+ | 🟡 |
| Mutation Score | ≥80% | TBD | ⏳ |
| Tests Passing | 100% | 19/19 | ✅ |
| Clippy Warnings | 0 | 0 | ✅ |
| Build Time | <30s | <5s | ✅ |
Note: Coverage is 100% on the implemented modules (storage, backend). Overall coverage is 85%+ because the GPU kernel and query engine modules are still stubs.
Academic Foundation
All implementations are backed by peer-reviewed research:
- Leis et al. (2014): Morsel-driven parallelism
- Funke et al. (2018): GPU paging for out-of-core workloads
- Gregg & Hazelwood (2011): PCIe bus bottleneck analysis
- Breß et al. (2014): Operator variant selection on heterogeneous hardware
- Neumann (2011): JIT compilation for query execution
- Wu et al. (2012): Kernel fusion execution model
See Academic Foundation for full references.
Getting Started
# Clone repository
git clone https://github.com/paiml/trueno-db
cd trueno-db
# Run tests
make test
# Generate coverage report
make coverage
# Check quality
pmat tdg .
# Build release
make build
See Getting Started for detailed instructions.
Who Should Read This Book?
- Database Engineers: Learn modern database architecture patterns
- Systems Programmers: See EXTREME TDD applied to Rust systems code
- GPU Developers: Understand cost-based heterogeneous computing
- Quality Engineers: Study A+ quality achieved through TDD
- Students: Comprehensive guide to building database systems
Book Structure
- Architecture: System design and principles
- Core Components: Detailed implementation guide
- EXTREME TDD: Methodology deep dive
- Toyota Way: Manufacturing principles in software
- Quality Gates: Tools and enforcement
- Academic Foundation: Research backing
- Case Studies: Real-world examples from development
- Performance: Benchmarking and optimization
Contributing
Trueno-DB is open source and welcomes contributions. See Contributing for guidelines.
License
MIT License - see License for details.
Let's build production-quality database systems together!