Chapter 24: NASA-Grade Optimization & Production Deployment

Status: ✅ 100% Working (v3.209.0)

“Optimization is the art of making the impossible, trivial.” — Anonymous Systems Engineer

Overview

This chapter covers Ruchy’s complete NASA-grade optimization and deployment toolchain, introduced in v3.209.0. You’ll learn how to achieve a 12.4x binary size reduction (3.8MB → 315KB), deploy to AWS Lambda with cold starts under 8ms (2ms in the Section 24.6 benchmarks), and containerize for production with a minimal footprint.

What You’ll Learn

Compilation Optimization: 4 optimization presets from debug to NASA-grade
Binary Profiling: Profile transpiled binaries for accurate performance data
AWS Lambda Deployment: Ultra-fast serverless functions with Ruchy
Docker Deployment: Production-ready containers with multi-stage builds
Complete Workflow: From development to production optimization

Prerequisites

Ruchy v3.209.0 or later
AWS CLI (for Lambda deployment)
Docker (for container deployment)
Basic understanding of compilation and deployment

24.1 NASA-Grade Compilation Optimization

The Four Optimization Levels

Ruchy provides four carefully tuned optimization presets:

Level	Size	Reduction	Compile Time	Use Case
none	3.8MB	0%	Fastest	Development/debugging
balanced	1.9MB	51%	Fast	Production default
aggressive	312KB	91.8%	Moderate	Lambda/Docker
nasa	315KB	91.8%	Slower	Maximum optimization
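The Reduction column is simple arithmetic on the binary sizes, and it can be reproduced for your own builds. The sketch below assumes two hypothetical helper functions (they are not part of the ruchy CLI) that take sizes in bytes and use awk for the floating-point math.

```shell
# Hypothetical helpers (not part of the ruchy CLI): size arithmetic with awk.

# reduction_pct BASE_BYTES OPT_BYTES -> percentage saved, one decimal place
reduction_pct() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1f\n", (1 - opt / base) * 100 }'
}

# shrink_factor BASE_BYTES OPT_BYTES -> how many times smaller, one decimal place
shrink_factor() {
  awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1f\n", base / opt }'
}

# Example with round numbers: a 4,000,000-byte binary reduced to 320,000 bytes.
reduction_pct 4000000 320000   # prints 92.0
shrink_factor 4000000 320000   # prints 12.5
```

Feeding in the byte counts of your own none and nasa builds (for example from ls -l) gives the two figures to compare against the table.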

Quick Start

# Development: Fast compilation, debugging symbols
ruchy compile myapp.ruchy --optimize none -o myapp-dev

# Production: Good balance of size and compile time
ruchy compile myapp.ruchy --optimize balanced -o myapp-prod

# Lambda/Docker: Maximum size reduction
ruchy compile myapp.ruchy --optimize aggressive -o myapp-lambda

# NASA-grade: Absolute maximum optimization
ruchy compile myapp.ruchy --optimize nasa -o myapp-nasa

Understanding Each Level

1. None (Debug Mode)

Flags: opt-level=0

ruchy compile fibonacci.ruchy --optimize none -o fib-debug

Characteristics:

No optimizations applied
Full debugging information
Fastest compilation time
Largest binary size (3.8MB)
Best for development and debugging

Use When:

Debugging with gdb or lldb
Rapid iteration during development
Need stack traces and symbol names

2. Balanced (Production Default)

Flags: opt-level=2, lto=thin

ruchy compile fibonacci.ruchy --optimize balanced -o fib-prod

Characteristics:

Good optimization level
Thin LTO (Link-Time Optimization)
Fast compilation
51% size reduction (1.9MB)

Use When:

General production deployments
CI/CD pipelines with time constraints
Good balance of performance and build time

3. Aggressive (Maximum Performance)

Flags: opt-level=3, lto=fat, codegen-units=1, strip=symbols

ruchy compile fibonacci.ruchy --optimize aggressive -o fib-aws

Characteristics:

Maximum LLVM optimizations
Fat LTO (whole-program optimization)
Single codegen unit
Debug symbols stripped
91.8% size reduction (312KB)

Use When:

AWS Lambda deployments
Docker production containers
Size-constrained environments
Performance is critical

4. NASA (Absolute Maximum)

Flags: opt-level=3, lto=fat, codegen-units=1, strip=symbols, target-cpu=native, embed-bitcode=yes

ruchy compile fibonacci.ruchy --optimize nasa -o fib-nasa

Characteristics:

All aggressive optimizations
Native CPU targeting (uses CPU-specific instructions)
Bitcode embedding for IPO
91.8% size reduction (315KB)
Not portable (optimized for current CPU)

Use When:

Build and deployment share the same hardware
Maximum performance is required
Binary size is critical (nasa is within 3KB of aggressive: 315KB vs 312KB)
Longer compile times are acceptable

Viewing Optimization Details

Use --verbose to see exactly what flags are applied:

ruchy compile myapp.ruchy --optimize nasa --verbose

Output:

→ Compiling myapp.ruchy...
ℹ Optimization level: nasa
ℹ LTO: fat
ℹ target-cpu: native
ℹ Optimization flags:
  -C lto=fat
  -C codegen-units=1
  -C strip=symbols
  -C target-cpu=native
  -C embed-bitcode=yes
  -C opt-level=3
✓ Successfully compiled to: myapp
ℹ Binary size: 315824 bytes

Exporting Metrics for CI/CD

Use --json to export compilation metrics:

ruchy compile myapp.ruchy --optimize nasa --json metrics.json

metrics.json:

{
  "source_file": "myapp.ruchy",
  "binary_path": "myapp",
  "optimization_level": "nasa",
  "binary_size": 315824,
  "compile_time_ms": 2457,
  "optimization_flags": {
    "opt_level": "3",
    "strip": true,
    "static_link": false,
    "lto": "fat",
    "target_cpu": "native"
  }
}
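One way to use the exported metrics in a pipeline is to read binary_size back out and fail the build when it exceeds a budget. The sketch below is a hedged example: json_number and check_size_budget are hypothetical helpers, and the sed pattern assumes one "key": value pair per line, as in the sample metrics.json. With jq available, a real JSON parser would be the sturdier choice.

```shell
# Hypothetical helper: read a numeric field from metrics.json without jq.
# Assumes one "key": value pair per line, as in the sample file.
json_number() {
  # $1 = file, $2 = key
  sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Fail when the recorded binary size exceeds a byte budget.
check_size_budget() {
  # $1 = metrics file, $2 = maximum size in bytes
  size=$(json_number "$1" binary_size)
  if [ -z "$size" ]; then
    echo "binary_size not found in $1" >&2
    return 2
  fi
  if [ "$size" -gt "$2" ]; then
    echo "Binary too large: $size bytes (budget $2)" >&2
    return 1
  fi
  echo "Binary size OK: $size bytes"
}
```

A typical call would be check_size_budget metrics.json 400000, placed right after the compile step.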

24.2 Binary Profiling

Profile transpiled Rust binaries for accurate performance metrics.

Basic Profiling

# Profile a single execution
ruchy runtime --profile --binary fibonacci.ruchy

Output:

=== Binary Execution Profile ===
File: fibonacci.ruchy
Iterations: 1

Function-level timings:
  fibonacci()    0.57ms  (approx)  [1 calls]
  main()         0.01ms  (approx)  [1 calls]

Memory:
  Allocations: 0 bytes
  Peak RSS: 1.2 MB

Recommendations:
  ✓ No allocations detected (optimal)
  ✓ Stack-only execution

Benchmarking with Multiple Iterations

# Run 100 iterations for statistical accuracy
ruchy runtime --profile --binary --iterations 100 fibonacci.ruchy

Output:

=== Binary Execution Profile ===
File: fibonacci.ruchy
Iterations: 100

Function-level timings:
  fibonacci()    0.54ms  (approx)  [1 calls]
  main()         0.01ms  (approx)  [1 calls]

Memory:
  Allocations: 0 bytes
  Peak RSS: 1.2 MB

Recommendations:
  ✓ No allocations detected (optimal)
  ✓ Stack-only execution

JSON Output for Automation

ruchy runtime --profile --binary --iterations 50 \
  --output profile.json fibonacci.ruchy

profile.json:

{
  "file": "fibonacci.ruchy",
  "iterations": 50,
  "functions": [
    "fibonacci",
    "main"
  ],
  "timings": {
    "fibonacci": { "avg_ms": 0.57, "calls": 1 },
    "main": { "avg_ms": 0.01, "calls": 1 }
  }
}
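The JSON profile lends itself to a simple regression gate: pull avg_ms for one function and compare it with a limit. This is a minimal sketch under stated assumptions: avg_ms and within_budget are hypothetical helpers, and the sed pattern relies on each timing entry sitting on its own line, as in the sample profile.json.

```shell
# Hypothetical helper: extract avg_ms for one function from profile.json.
# Assumes each timing entry is on its own line, as in the sample file.
avg_ms() {
  # $1 = file, $2 = function name
  sed -n 's/.*"'"$2"'"[[:space:]]*:[[:space:]]*{[[:space:]]*"avg_ms"[[:space:]]*:[[:space:]]*\([0-9.][0-9.]*\).*/\1/p' "$1" | head -n 1
}

# within_budget VALUE_MS LIMIT_MS -> exit status 0 when VALUE <= LIMIT
within_budget() {
  awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v + 0 <= limit + 0) }'
}

# Usage (the 1.0 ms limit is an illustrative value, not a recommendation):
#   t=$(avg_ms profile.json fibonacci)
#   within_budget "$t" 1.0 || echo "fibonacci regressed: ${t}ms" >&2
```

awk handles the comparison because POSIX test only compares integers.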

24.3 AWS Lambda Deployment

Deploy Ruchy functions to AWS Lambda with industry-leading cold start times.

Lambda Performance Characteristics

Cold Start Performance:

2ms cold start (vs 150ms for Python 3.11 and 120ms for Node.js 18 in the Section 24.6 benchmarks)
<100μs invocation overhead
315KB binary size (with --optimize nasa)
Zero runtime dependencies

Complete Lambda Workflow

Step 1: Write Your Handler

hello_world.ruchy:

// AWS Lambda handler in Ruchy
fun handle_request(event: Object) -> Object {
    let name = event.get("name").unwrap_or("World");

    {
        "statusCode": 200,
        "body": "Hello, " + name + "!"
    }
}

fun main() {
    handle_request({"name": "Lambda"})
}

Step 2: Transpile to Rust

# Transpile Ruchy to Rust
ruchy transpile hello_world.ruchy -o handler.rs

# Integrate with Lambda runtime (ruchy-lambda project)
cd ../ruchy-lambda
cp handler.rs crates/bootstrap/src/handler.rs

Step 3: Build an Optimized Binary

# Build for Lambda (ARM64 or x86_64)
cargo build --release --target aarch64-unknown-linux-gnu

# Or use Ruchy's compile command with optimization
ruchy compile hello_world.ruchy --optimize aggressive \
  --target aarch64-unknown-linux-gnu \
  -o bootstrap

Step 4: Create Deployment Package

# Package the binary
zip lambda.zip bootstrap

# Check size
ls -lh lambda.zip
# -rw-r--r-- 1 user user 127K lambda.zip  # Compressed from 315KB
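Lambda custom runtimes such as provided.al2023 start an executable named bootstrap from the root of the deployment package, so it is worth checking both the name and the executable bit before zipping. The function below is a hypothetical pre-packaging check, not something ruchy or the AWS CLI provides.

```shell
# Hypothetical pre-packaging check (not part of ruchy or the AWS CLI).
# Verifies that the file is named "bootstrap" and is executable.
check_bootstrap() {
  # $1 = path to the compiled binary
  if [ "$(basename "$1")" != "bootstrap" ]; then
    echo "expected a file named bootstrap, got: $1" >&2
    return 1
  fi
  if [ ! -f "$1" ] || [ ! -x "$1" ]; then
    echo "$1 is missing or not executable" >&2
    return 1
  fi
  echo "$1 looks deployable"
}
```

It can guard the packaging step directly: check_bootstrap ./bootstrap && zip lambda.zip bootstrap.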

Step 5: Deploy to AWS

# Create Lambda function
aws lambda create-function \
  --function-name ruchy-hello-world \
  --runtime provided.al2023 \
  --role arn:aws:iam::ACCOUNT:role/lambda-role \
  --handler bootstrap \
  --zip-file fileb://lambda.zip \
  --architectures arm64

# Test invocation (AWS CLI v2 needs --cli-binary-format to accept raw JSON)
aws lambda invoke \
  --function-name ruchy-hello-world \
  --cli-binary-format raw-in-base64-out \
  --payload '{"name": "NASA"}' \
  response.json

cat response.json
# {"statusCode": 200, "body": "Hello, NASA!"}

Lambda Optimization Tips

Use ARM64: Graviton2 processors are 20% cheaper and often faster
Use --optimize aggressive: 91.8% size reduction
Minimize cold starts: Small binaries = fast cold starts
Use blocking I/O: No async overhead in ruchy-lambda runtime

Lambda Benchmarking

Profile your Lambda functions locally:

# Profile with Lambda-like conditions
ruchy runtime --profile --binary --iterations 1000 \
  --output lambda-profile.json handler.ruchy

# Compare optimization levels
for level in none balanced aggressive nasa; do
  echo "Testing $level..."
  ruchy compile handler.ruchy --optimize "$level" -o "handler-$level"
  # Measure the binary built at this level; profiling handler.ruchy here
  # would measure the same source on every pass
  ls -l "handler-$level"
  time "./handler-$level"
done
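The loop above leaves one binary per optimization level in the working directory, so a side-by-side size listing is a natural follow-up. The sketch below assumes a hypothetical size_report helper built on wc -c, which behaves the same on GNU and BSD userlands.

```shell
# Hypothetical helper: print the size in bytes of each file given.
size_report() {
  for f in "$@"; do
    if [ -f "$f" ]; then
      # wc -c < file avoids GNU-only stat flags; tr strips BSD padding
      bytes=$(wc -c < "$f" | tr -d ' ')
      printf '%-24s %10s bytes\n' "$f" "$bytes"
    else
      printf '%-24s %10s\n' "$f" "missing"
    fi
  done
}

# Usage after the comparison loop:
#   size_report handler-none handler-balanced handler-aggressive handler-nasa
```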

24.4 Docker Deployment

Package Ruchy applications in production-ready Docker containers.

Multi-Stage Docker Build

Dockerfile (optimized for production):

# Stage 1: Build with Ruchy compiler
FROM rust:1.75-alpine AS builder

# Install Ruchy
RUN cargo install ruchy --version 3.209.0

# Copy source code
WORKDIR /app
COPY . .

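# Compile with NASA optimization (target-cpu=native: run the image on the
# same CPU type it was built on, or switch to --optimize aggressive)
RUN ruchy compile myapp.ruchy --optimize nasa -o myapp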

# Stage 2: Minimal runtime container
FROM alpine:3.18

# Install only runtime dependencies (if any)
# RUN apk add --no-cache libgcc

# Copy binary from builder
COPY --from=builder /app/myapp /usr/local/bin/myapp

# Set user (security best practice)
RUN adduser -D -u 1000 appuser
USER appuser

# Run the application
ENTRYPOINT ["/usr/local/bin/myapp"]

Build and Run

# Build Docker image
docker build -t myapp:latest .

# Check image size
docker images myapp
# REPOSITORY   TAG       SIZE
# myapp        latest    8.2MB   # Alpine + 315KB binary!

# Run container
docker run --rm myapp:latest

# Run with resource limits (Lambda-like)
docker run --rm \
  --memory=128m \
  --cpus=0.5 \
  myapp:latest

Docker Compose for Development

docker-compose.yml:

version: '3.8'

services:
  app:
    build:
      context: .
      dockerfile: Dockerfile
    image: myapp:latest
    container_name: ruchy-app
    restart: unless-stopped
    environment:
      - RUST_LOG=info
    ports:
      - "8080:8080"
    deploy:
      resources:
        limits:
          cpus: '0.5'
          memory: 128M

Benchmarking Docker Images

benchmarks/docker/benchmark.sh:

#!/bin/bash

# Benchmark different optimization levels
for level in none balanced aggressive nasa; do
  echo "Building with --optimize $level..."

  # Build with specific optimization. Each Dockerfile.$level must declare
  # `ARG OPTIMIZE` and pass it to `ruchy compile --optimize`; otherwise
  # the build argument has no effect.
  docker build \
    --build-arg OPTIMIZE="$level" \
    -t "myapp:$level" \
    -f "Dockerfile.$level" \
    .

  # Measure image size
  size=$(docker images "myapp:$level" --format "{{.Size}}")
  echo "Image size: $size"

  # Benchmark cold start
  time docker run --rm "myapp:$level"
done

24.5 Complete Optimization Workflow

Development to Production Pipeline

# 1. Development: Fast iteration
ruchy compile myapp.ruchy --optimize none -o myapp-dev
./myapp-dev  # Quick testing

# 2. Profiling: Find bottlenecks
ruchy runtime --profile --binary --iterations 100 myapp.ruchy
ruchy optimize myapp.ruchy --cache --vectorization

# 3. Optimization: Build for production
ruchy compile myapp.ruchy --optimize aggressive \
  --json build-metrics.json \
  -o myapp-prod

# 4. Validation: Verify performance
ruchy runtime --profile --binary --iterations 1000 \
  --output prod-profile.json myapp.ruchy

# 5. Deployment: AWS Lambda
# Custom runtimes run an executable named "bootstrap" at the zip root
cp myapp-prod bootstrap
zip lambda.zip bootstrap
aws lambda update-function-code \
  --function-name myapp \
  --zip-file fileb://lambda.zip

# Or Docker
docker build -t myapp:v1.0.0 .
docker push myapp:v1.0.0

CI/CD Integration

GitHub Actions (.github/workflows/deploy.yml):

name: Build and Deploy

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Ruchy
        run: cargo install ruchy --version 3.209.0

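      - name: Compile with aggressive optimization
        run: |
          # nasa sets target-cpu=native, which ties the binary to the CI
          # runner's CPU; aggressive stays portable to the Lambda host
          ruchy compile myapp.ruchy \
            --optimize aggressive \
            --json metrics.json \
            -o myapp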

      - name: Profile binary
        run: |
          ruchy runtime --profile --binary \
            --iterations 100 \
            --output profile.json myapp.ruchy

      - name: Check binary size
        run: |
          size=$(stat -c%s myapp)
          if [ $size -gt 400000 ]; then
            echo "Binary too large: $size bytes"
            exit 1
          fi

      - name: Package for Lambda
        run: |
          # Custom runtimes run an executable named "bootstrap"
          cp myapp bootstrap
          zip lambda.zip bootstrap

      - name: Deploy to AWS Lambda
        run: |
          aws lambda update-function-code \
            --function-name myapp \
            --zip-file fileb://lambda.zip
        env:
          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

24.6 Benchmarking and Comparison

Local Benchmarks

Compare Ruchy against other languages for AWS Lambda:

cd ../ruchy-lambda/benchmarks

# Run comprehensive benchmarks
./benchmark-all.sh

# Results (fibonacci(35), 100 iterations):
# Language       | Avg Time | Binary Size | Cold Start
# ---------------|----------|-------------|------------
# Ruchy (nasa)   | 45.2ms   | 315KB       | 2ms
# Rust (native)  | 44.8ms   | 3.2MB       | 8ms
# Go (compiled)  | 52.3ms   | 6.8MB       | 12ms
# Python 3.11    | 892ms    | N/A         | 150ms
# Node.js 18     | 235ms    | N/A         | 120ms
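The Avg Time column is easier to read as ratios against the Ruchy row. The sketch below uses a hypothetical relative_times helper that reads name|milliseconds pairs and divides every time by the first row; the input format is an assumption made for this example, and the numbers are copied from the results above.

```shell
# Hypothetical helper: express each time as a multiple of the first row.
# Input: one "name|milliseconds" pair per line on stdin.
relative_times() {
  awk -F'|' '
    NR == 1 { base = $2 }
    { printf "%-14s %6.2fx\n", $1, $2 / base }
  '
}

# Avg Time figures from the benchmark results, Ruchy first as the baseline.
relative_times <<'EOF'
Ruchy (nasa)|45.2
Rust (native)|44.8
Go (compiled)|52.3
Python 3.11|892
Node.js 18|235
EOF
```

A value above 1.00x means slower than the baseline row, below 1.00x means faster.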

Docker Size Comparison

cd ../ruchy-docker/benchmarks

# Build and compare
./docker-size-comparison.sh

# Results:
# Image              | Size    | Layers
# -------------------|---------|--------
# ruchy:nasa         | 8.2MB   | 2
# ruchy:aggressive   | 8.3MB   | 2
# python:3.11-alpine | 52MB    | 5
# node:18-alpine     | 180MB   | 6
# golang:1.21-alpine | 380MB   | 7

24.7 Production Monitoring

Metrics Collection

Lambda CloudWatch Logs:

{
  "level": "INFO",
  "request_id": "abc-123",
  "duration_ms": 45.2,
  "memory_used_mb": 28,
  "cold_start": false,
  "binary_size_kb": 315
}

Performance Dashboard

Track key metrics:

Cold start latency: < 8ms
Invocation duration: < 100ms
Memory usage: < 50MB
Binary size: < 500KB
Error rate: < 0.01%
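These targets can be checked mechanically against structured log records. The sketch below is a hypothetical check that covers only the duration and memory targets; field and check_record are illustrative helpers, the field names come from the sample CloudWatch record, and the parsing assumes flat JSON with numeric values.

```shell
# Hypothetical check: compare one log record with the dashboard targets
# (duration < 100 ms, memory < 50 MB). Field names follow the sample record.
field() {
  # $1 = key; reads JSON on stdin, prints the first numeric value for key
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*\([0-9.][0-9.]*\).*/\1/p' | head -n 1
}

check_record() {
  # $1 = JSON record as a string
  duration=$(printf '%s\n' "$1" | field duration_ms)
  memory=$(printf '%s\n' "$1" | field memory_used_mb)
  awk -v d="$duration" -v m="$memory" 'BEGIN {
    if (d == "" || m == "") { print "missing field"; exit 2 }
    ok = 1
    if (d + 0 >= 100) { print "duration over target: " d " ms"; ok = 0 }
    if (m + 0 >= 50)  { print "memory over target: " m " MB"; ok = 0 }
    exit !ok
  }'
}

# Usage:
#   check_record '{"duration_ms": 45.2, "memory_used_mb": 28}' && echo "within targets"
```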

24.8 Troubleshooting

Common Issues

Issue: Binary Too Large for Lambda

# Check current size
ls -lh bootstrap
# -rwxr-xr-x 1 user user 4.2M bootstrap  # TOO LARGE!

# Solution: Use aggressive or nasa optimization
ruchy compile handler.ruchy --optimize nasa -o bootstrap

# Verify size
ls -lh bootstrap
# -rwxr-xr-x 1 user user 315K bootstrap  # GOOD!

Issue: Slow Cold Starts

# Profile cold start behavior
time docker run --rm myapp:latest

# Optimize:
# 1. Use --optimize aggressive or nasa
# 2. Minimize dependencies
# 3. Use ARM64 architecture
# 4. Profile with --binary flag

Issue: CPU-Specific Crashes with NASA

# Problem: Binary compiled with --optimize nasa crashes on different CPU

# Solution 1: Use aggressive instead (portable)
ruchy compile myapp.ruchy --optimize aggressive -o myapp

# Solution 2: Build on same architecture as deployment
# (e.g., build on ARM64 for Lambda ARM64)
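A build script can make this choice automatically by comparing the host architecture with the target triple. The helpers below are hypothetical and only a coarse guard: matching architectures do not guarantee matching CPU features, so a binary built with target-cpu=native can still fail on an older CPU of the same architecture.

```shell
# Hypothetical helpers: compare the build host architecture with a Rust
# target triple before choosing --optimize nasa.

# normalize_arch NAME -> canonical name (macOS reports arm64 for aarch64)
normalize_arch() {
  case "$1" in
    arm64|aarch64) echo aarch64 ;;
    amd64|x86_64)  echo x86_64 ;;
    *)             echo "$1" ;;
  esac
}

# arch_of_triple TRIPLE -> first component of e.g. aarch64-unknown-linux-gnu
arch_of_triple() {
  normalize_arch "${1%%-*}"
}

# host_matches TRIPLE -> exit status 0 when the host arch equals the target's
host_matches() {
  [ "$(normalize_arch "$(uname -m)")" = "$(arch_of_triple "$1")" ]
}

# Usage:
#   if host_matches aarch64-unknown-linux-gnu; then level=nasa; else level=aggressive; fi
```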

24.9 Best Practices

Optimization Guidelines

Development: Use --optimize none for fast iteration
Testing: Use --optimize balanced for realistic performance
Production: Use --optimize aggressive for general deployments
Lambda/Docker: Use --optimize nasa only if you build and deploy on the same CPU; otherwise use aggressive
Always Profile: Use --profile --binary to validate optimizations
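The guidelines above can be encoded in a build script so each stage picks its preset consistently. This is a sketch only: recommend_level and its stage names are hypothetical, chosen to mirror the list.

```shell
# Hypothetical helper: map a pipeline stage to an --optimize preset,
# following the guidelines listed above.
recommend_level() {
  # $1 = stage: development | testing | production | lambda | docker
  # $2 = "same-cpu" when build and deploy hosts share a CPU (optional)
  case "$1" in
    development) echo none ;;
    testing)     echo balanced ;;
    production)  echo aggressive ;;
    lambda|docker)
      if [ "${2:-}" = "same-cpu" ]; then echo nasa; else echo aggressive; fi ;;
    *) echo "unknown stage: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   ruchy compile myapp.ruchy --optimize "$(recommend_level lambda)" -o bootstrap
```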

Security Considerations

Strip Symbols: Always use --strip or optimization levels that strip
Static Linking: Consider --static-link for fully self-contained binaries
Run as Non-Root: Use USER directive in Docker
Minimal Images: Use Alpine or Distroless base images
Scan Images: Use docker scout (the successor to the retired docker scan command) or Trivy for vulnerability scanning

Cost Optimization

Lambda: Smaller binaries = faster cold starts = lower costs
Docker: Smaller images = faster pulls = faster deployments
ARM64: 20% cheaper than x86_64 on Lambda
Aggressive Optimization: 91.8% size reduction saves on storage/transfer

24.10 Summary

You’ve learned how to:

✅ Use 4 optimization levels (none/balanced/aggressive/nasa)
✅ Achieve 12.4x binary size reduction (3.8MB → 315KB)
✅ Profile transpiled binaries with --profile --binary
✅ Deploy to AWS Lambda with 2ms cold starts
✅ Build minimal Docker images (8.2MB total)
✅ Integrate optimization into CI/CD pipelines
✅ Monitor and troubleshoot production deployments

Key Takeaways

Metric	Achievement
Binary size reduction	12.4x (3.8MB → 315KB)
Lambda cold start	2ms (vs 100ms+ interpreted)
Docker image size	8.2MB (Alpine + optimized binary)
Compilation options	4 presets + granular control
Profiling accuracy	Native binary profiling

Next Steps

Explore Chapter 25: Advanced Profiling Techniques
Learn PGO (Profile-Guided Optimization)
Study ruchy-lambda Architecture
Review Docker Benchmarks

