Serverless vs Containers vs Edge
When deploying MCP servers to the cloud, you have three fundamental architectural choices: serverless functions, containers, and edge computing. Each approach has distinct characteristics that affect performance, cost, and operational complexity.
This lesson provides a deep technical comparison to help you make informed deployment decisions.
The Three Paradigms
Serverless Functions (AWS Lambda)
Serverless functions execute your code in response to events, with the cloud provider managing all infrastructure.
┌─────────────────────────────────────────────────────────────────┐
│ SERVERLESS ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Request ──▶ API Gateway ──▶ Lambda Function ──▶ Response │
│ │ │
│ ▼ │
│ ┌──────────┐ │
│ │ Your Code│ │
│ │ (frozen) │ │
│ └──────────┘ │
│ │
│ Between requests: Function is frozen or terminated │
│ Scaling: Cloud spawns new instances automatically │
│ Billing: Pay only for execution time (GB-seconds) │
│ │
└─────────────────────────────────────────────────────────────────┘
How it works:
- Your code is packaged as a deployment artifact (ZIP or container)
- When a request arrives, AWS loads your code into a "microVM"
- Your handler function executes and returns a response
- The execution environment may be reused for later requests (warm start); if it has been terminated, the next request pays the initialization cost again (cold start)
Rust-specific behavior:
use lambda_runtime::{run, service_fn, Error, LambdaEvent};

// Lambda handler - runs for each request
async fn handler(event: LambdaEvent<Request>) -> Result<Response, Error> {
    // This code runs per-request
    let response = process_mcp_request(event.payload).await?;
    Ok(response)
}

#[tokio::main]
async fn main() -> Result<(), Error> {
    // This runs ONCE during cold start
    // Initialize expensive resources here
    tracing_subscriber::fmt::init();
    run(service_fn(handler)).await
}
Containers (Google Cloud Run)
Containers package your application with its dependencies into a portable image that runs on managed infrastructure.
┌─────────────────────────────────────────────────────────────────┐
│ CONTAINER ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Request ──▶ Load Balancer ──▶ Container Instance ──▶ Response │
│ │ │
│ ▼ │
│ ┌────────────────┐ │
│ │ Your Server │ │
│ │ (always on) │ │
│ │ │ │
│ │ HTTP :8080 │ │
│ └────────────────┘ │
│ │
│ Between requests: Server stays running, handles concurrency │
│ Scaling: Platform adjusts container count based on load │
│ Billing: Pay for container uptime (vCPU-seconds + memory) │
│ │
└─────────────────────────────────────────────────────────────────┘
How it works:
- Your application is packaged as a Docker image
- The platform runs your container and routes HTTP traffic to it
- Your server handles multiple concurrent requests
- The platform scales containers up/down based on traffic
Rust container example:
# Multi-stage build for minimal image
FROM rust:1.75 AS builder
WORKDIR /app
COPY . .
RUN cargo build --release
FROM gcr.io/distroless/cc-debian12
COPY --from=builder /app/target/release/mcp-server /
EXPOSE 8080
CMD ["/mcp-server"]
use std::net::SocketAddr;

// Container server - runs continuously
#[tokio::main]
async fn main() -> Result<()> {
    // Initialize once at startup
    let server = build_mcp_server().await?;

    // Run HTTP server - handles many concurrent requests
    let addr = SocketAddr::from(([0, 0, 0, 0], 8080));
    StreamableHttpServer::new(addr, server)
        .run()
        .await
}
Edge Computing (Cloudflare Workers)
Edge functions run your code at network edge locations, close to users worldwide.
┌─────────────────────────────────────────────────────────────────┐
│ EDGE ARCHITECTURE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Tokyo │ │ London │ │ NYC │ │
│ │ Edge │ │ Edge │ │ Edge │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ │
│ │ │ │ │
│ User ──┘ User ──┘ User ──┘ │
│ (5ms) (5ms) (5ms) │
│ │
│ Your code: Compiled to WebAssembly, distributed globally │
│ Execution: Runs in V8 isolates (not containers) │
│ Scaling: Automatic across 300+ locations │
│ Billing: Pay per request + CPU time │
│ │
└─────────────────────────────────────────────────────────────────┘
How it works:
- Your Rust code is compiled to WebAssembly (WASM)
- The WASM module is deployed to edge locations worldwide
- Each request runs in an isolated V8 environment
- No cold start in the traditional sense - isolates spin up in microseconds
Rust WASM example:
use worker::*;

#[event(fetch)]
async fn main(req: Request, env: Env, _ctx: Context) -> Result<Response> {
    // Each request runs in its own isolate
    let router = Router::new();
    router
        .post_async("/mcp", |mut req, _| async move {
            let body = req.text().await?;
            let response = handle_mcp_request(&body).await?;
            Response::ok(response)
        })
        .run(req, env)
        .await
}
Execution Model Comparison
Cold Start Behavior
Cold starts occur when the platform must initialize a new execution environment:
| Platform | Cold Start Cause | Typical Duration (Rust) |
|---|---|---|
| Lambda | No warm instance available | 50-150ms |
| Cloud Run | Container scaling up | 100-500ms |
| Workers | First request to edge location | 0-5ms |
Lambda cold start breakdown:
┌─────────────────────────────────────────────────────────────────┐
│ LAMBDA COLD START TIMELINE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 0ms 50ms 100ms 150ms 200ms │
│ │ │ │ │ │ │
│ ├────────────┼────────────┼────────────┼───────────┤ │
│ │ MicroVM │ Runtime │ Your │ Handler │ │
│ │ Init │ Init │ main() │ Exec │ │
│ │ (~30ms) │ (~10ms) │ (~10ms) │ (~50ms) │ │
│ │ │ │ │ │ │
│ └──────────────────────────────────────────────────────────── │
│ │
│ Rust advantage: main() initialization is minimal │
│ Python/Node: Interpreter startup adds 200-500ms │
│ │
└─────────────────────────────────────────────────────────────────┘
Strategies to minimize cold starts:
use sqlx::{postgres::PgPoolOptions, Pool, Postgres};
use tokio::sync::OnceCell;

// Lambda: initialize expensive resources once per execution environment
static DB_POOL: OnceCell<Pool<Postgres>> = OnceCell::const_new();

async fn get_pool() -> &'static Pool<Postgres> {
    DB_POOL
        .get_or_init(|| async {
            PgPoolOptions::new()
                .max_connections(5)
                .connect(&std::env::var("DATABASE_URL").unwrap())
                .await
                .unwrap()
        })
        .await
}

async fn handler(event: Request) -> Result<Response> {
    // Pool is reused across warm invocations
    let pool = get_pool().await;
    // ...
}
Concurrency Model
Each platform handles concurrent requests differently:
┌─────────────────────────────────────────────────────────────────┐
│ CONCURRENCY MODELS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ LAMBDA (1 request per instance): │
│ │
│ Request 1 ──▶ [Instance A] ──▶ Response 1 │
│ Request 2 ──▶ [Instance B] ──▶ Response 2 │
│ Request 3 ──▶ [Instance C] ──▶ Response 3 │
│ │
│ Scaling: New instance for each concurrent request │
│ Memory: Separate per instance (128MB-10GB configurable) │
│ │
├─────────────────────────────────────────────────────────────────┤
│ │
│ CLOUD RUN (many requests per container): │
│ │
│ Request 1 ──┐ │
│ Request 2 ──┼──▶ [Container A] ──┬──▶ Response 1 │
│ Request 3 ──┘ │ ├──▶ Response 2 │
│ │ └──▶ Response 3 │
│ (async runtime) │
│ │
│ Scaling: Container handles up to 80 concurrent requests │
│ Memory: Shared within container (configurable 128MB-32GB) │
│ │
├─────────────────────────────────────────────────────────────────┤
│ │
│ WORKERS (isolated per request): │
│ │
│ Request 1 ──▶ [Isolate A] ──▶ Response 1 │
│ Request 2 ──▶ [Isolate B] ──▶ Response 2 │
│ Request 3 ──▶ [Isolate C] ──▶ Response 3 │
│ │
│ Scaling: Isolates are lightweight (microseconds to create) │
│ Memory: 128MB limit per isolate │
│ │
└─────────────────────────────────────────────────────────────────┘
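This difference has a practical consequence for your Rust code. Under the Cloud Run model, one process serves many requests at once, so anything shared between requests (connection pools, caches) must be safe to use from concurrent tasks. Below is a minimal sketch of the idea using plain tokio rather than any particular MCP framework; the AppState type and the permit count of 80 (matching Cloud Run's default per-container concurrency) are illustrative.

use std::sync::Arc;
use tokio::sync::Semaphore;

// Illustrative shared state: in a real server this would hold a database
// pool or client shared by all in-flight requests.
struct AppState {
    permits: Semaphore,
}

async fn handle_request(state: Arc<AppState>, body: String) -> String {
    // Bound how many requests run the expensive path at once.
    let _permit = state.permits.acquire().await.expect("semaphore closed");
    format!("processed {} bytes", body.len())
}

#[tokio::main]
async fn main() {
    // 80 matches Cloud Run's default concurrency per container.
    let state = Arc::new(AppState { permits: Semaphore::new(80) });

    // In a real deployment the HTTP framework drives this; here we just
    // spawn a few tasks to show the state being shared across requests.
    let mut tasks = Vec::new();
    for i in 0..3 {
        let state = Arc::clone(&state);
        tasks.push(tokio::spawn(async move {
            handle_request(state, format!("request {i}")).await
        }));
    }
    for t in tasks {
        println!("{}", t.await.unwrap());
    }
}

Lambda and Workers avoid this concern by construction: each instance or isolate sees one request at a time, at the cost of duplicating any per-instance state.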
Resource Limits
| Resource | Lambda | Cloud Run | Workers |
|---|---|---|---|
| Memory | 128MB - 10GB | 128MB - 32GB | 128MB |
| CPU | Proportional to memory | 1-8 vCPU | 10-50ms CPU time |
| Timeout | 15 minutes | 60 minutes | 30 seconds |
| Payload size | 6MB (sync) / 20MB (async) | 32MB | 100MB |
| Tmp storage | 512MB - 10GB | Ephemeral disk | None |
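These limits are configured on the platform rather than in your Rust code. The snippets below are illustrative of where each knob lives; the service and image names are placeholders, and the flag and field names follow the AWS, gcloud, and wrangler documentation, so verify them against your tool versions.

# Lambda: memory (CPU scales with it) and timeout
aws lambda update-function-configuration \
  --function-name my-mcp-server \
  --memory-size 1024 \
  --timeout 900

# Cloud Run: memory, CPU, request timeout, and per-container concurrency
gcloud run deploy my-mcp-server \
  --image gcr.io/my-project/mcp-server \
  --memory 2Gi --cpu 2 --timeout 3600 --concurrency 80

# Workers: CPU time limit (paid plans), set in wrangler.toml
# [limits]
# cpu_ms = 50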
When to Choose Each Option
Choose Lambda When:
- ✅ Sporadic traffic - Pay nothing during idle periods
- ✅ AWS-native environment - VPC, RDS, DynamoDB integration
- ✅ Unpredictable scaling - 0 to thousands of concurrent users
- ✅ Simple deployment - No container management
- ✅ OAuth with Cognito - Built-in user management
# Ideal Lambda use case: Internal business tool
cargo pmcp deploy init --target aws-lambda
cargo pmcp deploy
# Result: HTTPS endpoint with automatic scaling
# Cost: ~$0.20 per million requests (128MB, 100ms avg)
Choose Cloud Run When:
- ✅ Long-running requests - Up to 60 minutes per request
- ✅ High concurrency per instance - Efficient resource usage
- ✅ Custom dependencies - Docker flexibility
- ✅ GCP-native environment - Cloud SQL, Firestore
- ✅ Minimum instances needed - Avoid cold starts entirely
# Ideal Cloud Run use case: Data processing with large queries
cargo pmcp deploy init --target google-cloud-run
cargo pmcp deploy --target google-cloud-run
# Result: Container-based deployment with persistent connections
# Cost: ~$0.00002400/vCPU-second + memory
Choose Workers When:
- ✅ Global user base - Minimize latency worldwide
- ✅ Stateless operations - No database, or using KV/D1
- ✅ High request volume - Millions of requests/day
- ✅ CPU-bound tasks - Parsing, transformation, validation
# Ideal Workers use case: Global API with caching
cargo pmcp deploy init --target cloudflare-workers
cargo pmcp deploy --target cloudflare-workers
# Result: Edge deployment to 300+ locations
# Cost: $0.50 per million requests (free tier: 100K/day)
Hybrid Architectures
For complex applications, you may combine deployment targets:
┌─────────────────────────────────────────────────────────────────┐
│ HYBRID DEPLOYMENT │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Global Users │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ Cloudflare Workers (Edge) │ │
│ │ - Request routing │ │
│ │ - Caching │ │
│ │ - Rate limiting │ │
│ │ - Authentication │ │
│ └─────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ AWS Lambda (Serverless) │ │
│ │ - Business logic │ │
│ │ - Database queries │ │
│ │ - Complex processing │ │
│ └─────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────┐ │
│ │ RDS / DynamoDB (Data Layer) │ │
│ └─────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
This architecture uses Workers for edge caching and routing, Lambda for serverless compute, and managed databases for persistence.
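A sketch of what the edge tier might look like with the worker crate: the Worker reads the incoming MCP request and forwards it to the Lambda origin, which is where caching, rate limiting, and auth checks would be layered in. The ORIGIN constant is a placeholder for an API Gateway endpoint, and handle-at-edge logic is omitted.

use worker::*;

// Placeholder for the API Gateway endpoint in front of the Lambda tier.
const ORIGIN: &str = "https://example.execute-api.us-east-1.amazonaws.com/mcp";

#[event(fetch)]
async fn main(mut req: Request, _env: Env, _ctx: Context) -> Result<Response> {
    // Edge responsibilities (cache lookups, rate limiting, auth) would run
    // here, before paying for a round trip to the origin.
    let body = req.text().await?;

    // Forward the MCP request to the Lambda origin and relay its response.
    let mut init = RequestInit::new();
    init.with_method(Method::Post).with_body(Some(body.into()));
    let origin_req = Request::new_with_init(ORIGIN, &init)?;
    Fetch::Request(origin_req).send().await
}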
Migration Considerations
Lambda → Cloud Run
When to migrate:
- Hitting 15-minute timeout limit
- Need more than 10GB memory
- Want to reduce cold start impact with min instances
# Migration path
cargo pmcp deploy init --target google-cloud-run
# Update environment variables
cargo pmcp deploy --target google-cloud-run
# Verify, then destroy Lambda
cargo pmcp deploy destroy --target aws-lambda --clean
Lambda → Workers
When to migrate:
- Need global low-latency
- Workload is stateless
- Can use KV/D1 instead of RDS
Considerations:
- WASM has different capabilities than native code
- Database access patterns may need redesign
- Some crates don't compile to WASM (one way to gate them is shown in the sketch below)
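A common pattern for the last two points is to keep a single crate and gate platform-specific code on the compilation target. The sketch below is illustrative, not a prescribed pattern: the function names and the Postgres-to-KV swap are assumptions.

// Native build (Lambda / Cloud Run): query Postgres over TCP via sqlx.
#[cfg(not(target_arch = "wasm32"))]
async fn lookup_order(pool: &sqlx::PgPool, id: &str) -> anyhow::Result<Option<String>> {
    let row: Option<(String,)> = sqlx::query_as("SELECT status FROM orders WHERE id = $1")
        .bind(id)
        .fetch_optional(pool)
        .await?;
    Ok(row.map(|r| r.0))
}

// WASM build (Workers): sqlx's TCP stack doesn't compile to wasm32,
// so read the same data from Workers KV instead.
#[cfg(target_arch = "wasm32")]
async fn lookup_order(
    kv: &worker::kv::KvStore,
    id: &str,
) -> std::result::Result<Option<String>, worker::kv::KvError> {
    kv.get(id).text().await
}

Callers are gated the same way, so the native binary never pulls in the worker crate and the WASM build never links sqlx's TCP stack.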
Summary
| Aspect | Lambda | Cloud Run | Workers |
|---|---|---|---|
| Execution model | Function per request | Container server | WASM isolate |
| Cold start (Rust) | 50-150ms | 100-500ms | 0-5ms |
| Concurrency | 1 per instance | Many per container | 1 per isolate |
| Max timeout | 15 min | 60 min | 30s |
| Best for | General serverless | Long-running, GCP | Global edge |
| Rust advantage | Fast cold start | Tiny images | Native WASM |
Choose based on your specific requirements:
- Traffic pattern (sporadic vs steady)
- Latency requirements (regional vs global)
- Execution duration (seconds vs minutes)
- Cloud ecosystem (AWS vs GCP vs Cloudflare)