Chapter 24: NASA-Grade Optimization & Production Deployment
Status: ✅ 100% Working (v3.209.0)
“Optimization is the art of making the impossible, trivial.” — Anonymous Systems Engineer
Overview
This chapter covers Ruchy’s complete NASA-grade optimization and deployment toolchain, introduced in v3.209.0. You’ll learn how to achieve 12.4x binary size reduction (3.8MB → 315KB), deploy to AWS Lambda with <8ms cold starts, and containerize for production with minimal footprints.
What You’ll Learn
- Compilation Optimization: 4 optimization presets from debug to NASA-grade
- Binary Profiling: Profile transpiled binaries for accurate performance data
- AWS Lambda Deployment: Ultra-fast serverless functions with Ruchy
- Docker Deployment: Production-ready containers with multi-stage builds
- Complete Workflow: From development to production optimization
Prerequisites
- Ruchy v3.209.0 or later
- AWS CLI (for Lambda deployment)
- Docker (for container deployment)
- Basic understanding of compilation and deployment
24.1 NASA-Grade Compilation Optimization
The Four Optimization Levels
Ruchy provides four carefully tuned optimization presets:
| Level | Size | Reduction | Compile Time | Use Case |
|---|---|---|---|---|
| none | 3.8MB | 0% | Fastest | Development/debugging |
| balanced | 1.9MB | 51% | Fast | Production default |
| aggressive | 312KB | 91.8% | Moderate | Lambda/Docker |
| nasa | 315KB | 91.8% | Slower | Maximum optimization |
Quick Start
# Development: Fast compilation, debugging symbols
ruchy compile myapp.ruchy --optimize none -o myapp-dev
# Production: Good balance of size and compile time
ruchy compile myapp.ruchy --optimize balanced -o myapp-prod
# Lambda/Docker: Maximum size reduction
ruchy compile myapp.ruchy --optimize aggressive -o myapp-lambda
# NASA-grade: Absolute maximum optimization
ruchy compile myapp.ruchy --optimize nasa -o myapp-nasa
Understanding Each Level
1. None (Debug Mode)
Flags: opt-level=0
ruchy compile fibonacci.ruchy --optimize none -o fib-debug
Characteristics:
- No optimizations applied
- Full debugging information
- Fastest compilation time
- Largest binary size (3.8MB)
- Best for development and debugging
Use When:
- Debugging with
gdborlldb - Rapid iteration during development
- Need stack traces and symbol names
2. Balanced (Production Default)
Flags: opt-level=2, lto=thin
ruchy compile fibonacci.ruchy --optimize balanced -o fib-prod
Characteristics:
- Good optimization level
- Thin LTO (Link-Time Optimization)
- Fast compilation
- 51% size reduction (1.9MB)
Use When:
- General production deployments
- CI/CD pipelines with time constraints
- Good balance of performance and build time
3. Aggressive (Maximum Performance)
Flags: opt-level=3, lto=fat, codegen-units=1, strip=symbols
ruchy compile fibonacci.ruchy --optimize aggressive -o fib-aws
Characteristics:
- Maximum LLVM optimizations
- Fat LTO (whole-program optimization)
- Single codegen unit
- Debug symbols stripped
- 91.8% size reduction (312KB)
Use When:
- AWS Lambda deployments
- Docker production containers
- Size-constrained environments
- Performance is critical
4. NASA (Absolute Maximum)
Flags: opt-level=3, lto=fat, codegen-units=1, strip=symbols, target-cpu=native, embed-bitcode=yes
ruchy compile fibonacci.ruchy --optimize nasa -o fib-nasa
Characteristics:
- All aggressive optimizations
- Native CPU targeting (uses CPU-specific instructions)
- Bitcode embedding for IPO
- 91.8% size reduction (315KB)
- Not portable (optimized for current CPU)
Use When:
- Same hardware for build and deployment
- Maximum performance required
- Size is absolutely critical
- Acceptable longer compile times
Viewing Optimization Details
Use --verbose to see exactly what flags are applied:
ruchy compile myapp.ruchy --optimize nasa --verbose
Output:
→ Compiling myapp.ruchy...
ℹ Optimization level: nasa
ℹ LTO: fat
ℹ target-cpu: native
ℹ Optimization flags:
-C lto=fat
-C codegen-units=1
-C strip=symbols
-C target-cpu=native
-C embed-bitcode=yes
-C opt-level=3
✓ Successfully compiled to: myapp
ℹ Binary size: 315824 bytes
Exporting Metrics for CI/CD
Use --json to export compilation metrics:
ruchy compile myapp.ruchy --optimize nasa --json metrics.json
metrics.json:
{
"source_file": "myapp.ruchy",
"binary_path": "myapp",
"optimization_level": "nasa",
"binary_size": 315824,
"compile_time_ms": 2457,
"optimization_flags": {
"opt_level": "3",
"strip": true,
"static_link": false,
"lto": "fat",
"target_cpu": "native"
}
}
24.2 Binary Profiling
Profile transpiled Rust binaries for accurate performance metrics.
Basic Profiling
# Profile a single execution
ruchy runtime --profile --binary fibonacci.ruchy
Output:
=== Binary Execution Profile ===
File: fibonacci.ruchy
Iterations: 1
Function-level timings:
fibonacci() 0.57ms (approx) [1 calls]
main() 0.01ms (approx) [1 calls]
Memory:
Allocations: 0 bytes
Peak RSS: 1.2 MB
Recommendations:
✓ No allocations detected (optimal)
✓ Stack-only execution
Benchmarking with Multiple Iterations
# Run 100 iterations for statistical accuracy
ruchy runtime --profile --binary --iterations 100 fibonacci.ruchy
Output:
=== Binary Execution Profile ===
File: fibonacci.ruchy
Iterations: 100
Function-level timings:
fibonacci() 0.54ms (approx) [1 calls]
main() 0.01ms (approx) [1 calls]
Memory:
Allocations: 0 bytes
Peak RSS: 1.2 MB
Recommendations:
✓ No allocations detected (optimal)
✓ Stack-only execution
JSON Output for Automation
ruchy runtime --profile --binary --iterations 50 \
--output profile.json fibonacci.ruchy
profile.json:
{
"file": "fibonacci.ruchy",
"iterations": 50,
"functions": [
"fibonacci",
"main"
],
"timings": {
"fibonacci": { "avg_ms": 0.57, "calls": 1 },
"main": { "avg_ms": 0.01, "calls": 1 }
}
}
24.3 AWS Lambda Deployment
Deploy Ruchy functions to AWS Lambda with industry-leading cold start times.
Lambda Performance Characteristics
Cold Start Performance:
- 2ms cold start (vs 200ms+ Python, 100ms+ Node.js)
- <100μs invocation overhead
- 315KB binary size (with
--optimize nasa) - Zero runtime dependencies
Complete Lambda Workflow
Step 1: Write Your Handler
hello_world.ruchy:
// AWS Lambda handler in Ruchy
fun handle_request(event: Object) -> Object {
let name = event.get("name").unwrap_or("World");
{
"statusCode": 200,
"body": "Hello, " + name + "!"
}
}
fun main() {
handle_request({"name": "Lambda"})
}
Step 2: Transpile to Rust
# Transpile Ruchy to Rust
ruchy transpile hello_world.ruchy -o handler.rs
# Integrate with Lambda runtime (ruchy-lambda project)
cd ../ruchy-lambda
cp handler.rs crates/bootstrap/src/handler.rs
Step 3: Build with NASA Optimization
# Build for Lambda (ARM64 or x86_64)
cargo build --release --target aarch64-unknown-linux-gnu
# Or use Ruchy's compile command with optimization
ruchy compile hello_world.ruchy --optimize aggressive \
--target aarch64-unknown-linux-gnu \
-o bootstrap
Step 4: Create Deployment Package
# Package the binary
zip lambda.zip bootstrap
# Check size
ls -lh lambda.zip
# -rw-r--r-- 1 user user 127K lambda.zip # Compressed from 315KB
Step 5: Deploy to AWS
# Create Lambda function
aws lambda create-function \
--function-name ruchy-hello-world \
--runtime provided.al2023 \
--role arn:aws:iam::ACCOUNT:role/lambda-role \
--handler bootstrap \
--zip-file fileb://lambda.zip \
--architectures arm64
# Test invocation
aws lambda invoke \
--function-name ruchy-hello-world \
--payload '{"name": "NASA"}' \
response.json
cat response.json
# {"statusCode": 200, "body": "Hello, NASA!"}
Lambda Optimization Tips
- Use ARM64: Graviton2 processors are 20% cheaper and often faster
- Use
--optimize aggressive: 91.8% size reduction - Minimize cold starts: Small binaries = fast cold starts
- Use blocking I/O: No async overhead in ruchy-lambda runtime
Lambda Benchmarking
Profile your Lambda functions locally:
# Profile with Lambda-like conditions
ruchy runtime --profile --binary --iterations 1000 \
--output lambda-profile.json handler.ruchy
# Compare optimization levels
for level in none balanced aggressive nasa; do
echo "Testing $level..."
ruchy compile handler.ruchy --optimize $level -o handler-$level
ruchy runtime --profile --binary --iterations 100 handler.ruchy
done
24.4 Docker Deployment
Package Ruchy applications in production-ready Docker containers.
Multi-Stage Docker Build
Dockerfile (optimized for production):
# Stage 1: Build with Ruchy compiler
FROM rust:1.75-alpine as builder
# Install Ruchy
RUN cargo install ruchy --version 3.209.0
# Copy source code
WORKDIR /app
COPY . .
# Compile with NASA optimization
RUN ruchy compile myapp.ruchy --optimize nasa -o myapp
# Stage 2: Minimal runtime container
FROM alpine:3.18
# Install only runtime dependencies (if any)
# RUN apk add --no-cache libgcc
# Copy binary from builder
COPY --from=builder /app/myapp /usr/local/bin/myapp
# Set user (security best practice)
RUN adduser -D -u 1000 appuser
USER appuser
# Run the application
ENTRYPOINT ["/usr/local/bin/myapp"]
Build and Run
# Build Docker image
docker build -t myapp:latest .
# Check image size
docker images myapp
# REPOSITORY TAG SIZE
# myapp latest 8.2MB # Alpine + 315KB binary!
# Run container
docker run --rm myapp:latest
# Run with resource limits (Lambda-like)
docker run --rm \
--memory=128m \
--cpus=0.5 \
myapp:latest
Docker Compose for Development
docker-compose.yml:
version: '3.8'
services:
app:
build:
context: .
dockerfile: Dockerfile
image: myapp:latest
container_name: ruchy-app
restart: unless-stopped
environment:
- RUST_LOG=info
ports:
- "8080:8080"
deploy:
resources:
limits:
cpus: '0.5'
memory: 128M
Benchmarking Docker Images
benchmarks/docker/benchmark.sh:
#!/bin/bash
# Benchmark different optimization levels
for level in none balanced aggressive nasa; do
echo "Building with --optimize $level..."
# Build with specific optimization
docker build \
--build-arg OPTIMIZE=$level \
-t myapp:$level \
-f Dockerfile.$level \
.
# Measure image size
size=$(docker images myapp:$level --format "{{.Size}}")
echo "Image size: $size"
# Benchmark cold start
time docker run --rm myapp:$level
done
24.5 Complete Optimization Workflow
Development to Production Pipeline
# 1. Development: Fast iteration
ruchy compile myapp.ruchy --optimize none -o myapp-dev
./myapp-dev # Quick testing
# 2. Profiling: Find bottlenecks
ruchy runtime --profile --binary --iterations 100 myapp.ruchy
ruchy optimize myapp.ruchy --cache --vectorization
# 3. Optimization: Build for production
ruchy compile myapp.ruchy --optimize aggressive \
--json build-metrics.json \
-o myapp-prod
# 4. Validation: Verify performance
ruchy runtime --profile --binary --iterations 1000 \
--output prod-profile.json myapp.ruchy
# 5. Deployment: AWS Lambda
zip lambda.zip myapp-prod
aws lambda update-function-code \
--function-name myapp \
--zip-file fileb://lambda.zip
# Or Docker
docker build -t myapp:v1.0.0 .
docker push myapp:v1.0.0
CI/CD Integration
GitHub Actions (.github/workflows/deploy.yml):
name: Build and Deploy
on:
push:
branches: [main]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install Ruchy
run: cargo install ruchy --version 3.209.0
- name: Compile with NASA optimization
run: |
ruchy compile myapp.ruchy \
--optimize nasa \
--json metrics.json \
-o myapp
- name: Profile binary
run: |
ruchy runtime --profile --binary \
--iterations 100 \
--output profile.json myapp.ruchy
- name: Check binary size
run: |
size=$(stat -c%s myapp)
if [ $size -gt 400000 ]; then
echo "Binary too large: $size bytes"
exit 1
fi
- name: Package for Lambda
run: zip lambda.zip myapp
- name: Deploy to AWS Lambda
run: |
aws lambda update-function-code \
--function-name myapp \
--zip-file fileb://lambda.zip
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
24.6 Benchmarking and Comparison
Local Benchmarks
Compare Ruchy against other languages for AWS Lambda:
cd ../ruchy-lambda/benchmarks
# Run comprehensive benchmarks
./benchmark-all.sh
# Results (fibonacci(35), 100 iterations):
# Language | Avg Time | Binary Size | Cold Start
# ---------------|----------|-------------|------------
# Ruchy (nasa) | 45.2ms | 315KB | 2ms
# Rust (native) | 44.8ms | 3.2MB | 8ms
# Go (compiled) | 52.3ms | 6.8MB | 12ms
# Python 3.11 | 892ms | N/A | 150ms
# Node.js 18 | 235ms | N/A | 120ms
Docker Size Comparison
cd ../ruchy-docker/benchmarks
# Build and compare
./docker-size-comparison.sh
# Results:
# Image | Size | Layers
# -------------------|---------|--------
# ruchy:nasa | 8.2MB | 2
# ruchy:aggressive | 8.3MB | 2
# python:3.11-alpine | 52MB | 5
# node:18-alpine | 180MB | 6
# golang:1.21-alpine | 380MB | 7
24.7 Production Monitoring
Metrics Collection
Lambda CloudWatch Logs:
{
"level": "INFO",
"request_id": "abc-123",
"duration_ms": 45.2,
"memory_used_mb": 28,
"cold_start": false,
"binary_size_kb": 315
}
Performance Dashboard
Track key metrics:
- Cold start latency: < 8ms
- Invocation duration: < 100ms
- Memory usage: < 50MB
- Binary size: < 500KB
- Error rate: < 0.01%
24.8 Troubleshooting
Common Issues
Issue: Binary Too Large for Lambda
# Check current size
ls -lh bootstrap
# -rwxr-xr-x 1 user user 4.2M bootstrap # TOO LARGE!
# Solution: Use aggressive or nasa optimization
ruchy compile handler.ruchy --optimize nasa -o bootstrap
# Verify size
ls -lh bootstrap
# -rwxr-xr-x 1 user user 315K bootstrap # GOOD!
Issue: Slow Cold Starts
# Profile cold start behavior
time docker run --rm myapp:latest
# Optimize:
# 1. Use --optimize aggressive or nasa
# 2. Minimize dependencies
# 3. Use ARM64 architecture
# 4. Profile with --binary flag
Issue: CPU-Specific Crashes with NASA
# Problem: Binary compiled with --optimize nasa crashes on different CPU
# Solution 1: Use aggressive instead (portable)
ruchy compile myapp.ruchy --optimize aggressive -o myapp
# Solution 2: Build on same architecture as deployment
# (e.g., build on ARM64 for Lambda ARM64)
24.9 Best Practices
Optimization Guidelines
- Development: Use
--optimize nonefor fast iteration - Testing: Use
--optimize balancedfor realistic performance - Production: Use
--optimize aggressivefor general deployments - Lambda/Docker: Use
--optimize nasaif build/deploy on same CPU - Always Profile: Use
--profile --binaryto validate optimizations
Security Considerations
- Strip Symbols: Always use
--stripor optimization levels that strip - Static Linking: Consider
--static-linkfor fully self-contained binaries - Run as Non-Root: Use USER directive in Docker
- Minimal Images: Use Alpine or Distroless base images
- Scan Images: Use
docker scanor Trivy for vulnerability scanning
Cost Optimization
- Lambda: Smaller binaries = faster cold starts = lower costs
- Docker: Smaller images = faster pulls = faster deployments
- ARM64: 20% cheaper than x86_64 on Lambda
- Aggressive Optimization: 91.8% size reduction saves on storage/transfer
24.10 Summary
You’ve learned how to:
- ✅ Use 4 optimization levels (none/balanced/aggressive/nasa)
- ✅ Achieve 12.4x binary size reduction (3.8MB → 315KB)
- ✅ Profile transpiled binaries with
--profile --binary - ✅ Deploy to AWS Lambda with 2ms cold starts
- ✅ Build minimal Docker images (8.2MB total)
- ✅ Integrate optimization into CI/CD pipelines
- ✅ Monitor and troubleshoot production deployments
Key Takeaways
| Metric | Achievement |
|---|---|
| Binary size reduction | 12.4x (3.8MB → 315KB) |
| Lambda cold start | 2ms (vs 100ms+ interpreted) |
| Docker image size | 8.2MB (Alpine + optimized binary) |
| Compilation options | 4 presets + granular control |
| Profiling accuracy | Native binary profiling |
Next Steps
- Explore Chapter 25: Advanced Profiling Techniques
- Learn PGO (Profile-Guided Optimization)
- Study ruchy-lambda Architecture
- Review Docker Benchmarks
References
- OPTIMIZATION-001 Implementation
- PROFILING-001 Implementation
- ruchy-lambda Project
- ruchy-docker Project
- Rust Optimization Levels
- AWS Lambda Custom Runtimes
Previous: Chapter 23 - REPL & Object Inspection Next: Chapter 25 - Advanced Profiling Techniques Table of Contents