Comparison: Cloud Run vs Lambda vs Workers
Choosing the right deployment platform for your MCP server is one of the most impactful architectural decisions you'll make. This lesson provides a comprehensive comparison of AWS Lambda, Cloudflare Workers, and Google Cloud Run to help you make an informed choice.
Learning Objectives
By the end of this lesson, you will:
- Understand the architectural differences between platforms
- Compare costs across different usage patterns
- Match platform capabilities to MCP server requirements
- Choose the right platform for your specific use case
Platform Architecture Comparison
Fundamental Differences
| Platform | Artifact | Isolation | Invocation model | Headline limits |
|---|---|---|---|---|
| AWS Lambda | ZIP package → Lambda runtime | Firecracker microVM | Event-driven | 15 min timeout, 10 GB memory |
| Cloudflare Workers | WASM binary | V8 isolate on the edge network (300+ locations) | Request-driven | 30 s CPU time, 128 MB memory |
| Google Cloud Run | Docker image | gVisor sandbox on Google's Knative-compatible serverless platform | Request-driven | 60 min timeout, 32 GB memory |
Capability Matrix
| Capability | Lambda | Workers | Cloud Run |
|---|---|---|---|
| Max timeout | 15 min | 30s (CPU) | 60 min |
| Max memory | 10 GB | 128 MB | 32 GB |
| Max request size | 6 MB | 100 MB | 32 MB |
| Max response size | 6 MB | 100 MB | 32 MB |
| Filesystem | /tmp (10 GB) | None | In-memory |
| Concurrency | 1 per instance | Multiple per isolate | Configurable (up to 1,000 per instance) |
| Cold start | 100-500ms (Rust) | <5ms | 500ms-3s |
| GPU support | No | No | Yes |
| WebSockets | Via API Gateway | Yes | Yes |
| Deployment | ZIP, Container | WASM | Container |
Cold Start Comparison
Measured Cold Start Times
Cold start times for a Rust MCP server:

| Platform | p50 | p95 | p99 |
|---|---|---|---|
| Workers | 2 ms | 5 ms | 10 ms |
| Lambda (SnapStart) | 50 ms | 150 ms | 300 ms |
| Lambda (standard) | 100 ms | 300 ms | 500 ms |
| Cloud Run | 400 ms | 1.2 s | 2.5 s |

Note that SnapStart applies only to managed runtimes such as Java, not to Rust custom runtimes, so that row is indicative rather than a Rust measurement.

Cold start breakdown:

Workers:
- WASM instantiation: 1-3 ms
- Total: ~5 ms

Lambda (Rust):
- Environment setup: 50-100 ms
- Binary loading: 10-30 ms
- Runtime init: 10-50 ms
- Total: 70-180 ms

Cloud Run:
- Container pull (cached): 200-500 ms
- Container start: 100-300 ms
- Application init: 50-200 ms
- Total: 350-1000 ms
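If you want to reproduce these measurements, the rough probe below is one way to do it. It is a sketch, not a benchmark suite: it assumes an already-deployed HTTP endpoint, uses the `reqwest` and `tokio` crates, and relies on a long idle gap so each timed request is likely to hit a cold instance. The 20-minute gap is a guess; scale-to-zero behavior varies by platform and configuration.

```rust
use std::time::{Duration, Instant};

// Rough cold-start probe: wait long enough between requests that the
// platform scales to zero, then time the next request end to end.
// The idle gap and endpoint URL are assumptions; tune them per platform.
async fn measure_cold_starts(url: &str, samples: usize) -> Result<(), reqwest::Error> {
    let client = reqwest::Client::new();
    let mut latencies = Vec::with_capacity(samples);

    for _ in 0..samples {
        tokio::time::sleep(Duration::from_secs(20 * 60)).await; // let instances expire
        let start = Instant::now();
        client.get(url).send().await?.error_for_status()?;
        latencies.push(start.elapsed());
    }

    latencies.sort();
    let pct = |p: f64| latencies[((latencies.len() as f64 - 1.0) * p) as usize];
    println!("p50={:?} p95={:?} p99={:?}", pct(0.50), pct(0.95), pct(0.99));
    Ok(())
}
```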
Cold Start Mitigation
| Platform | Mitigation Strategy | Cost Impact |
|---|---|---|
| Lambda | Provisioned concurrency | $$$ |
| Lambda | SnapStart (Java; not available for Rust custom runtimes) | Free |
| Workers | Always fast (by design) | Free |
| Cloud Run | Min instances | $$ |
| Cloud Run | CPU boost | $ |
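Cold-start cost can also be reduced in code, whichever platform you pick. The usual trick in Rust is to build expensive clients once per instance so that only the first request pays for them. A minimal sketch, assuming the Lambda-plus-DynamoDB setup used elsewhere in this lesson:

```rust
use aws_sdk_dynamodb::Client;
use tokio::sync::OnceCell;

// One client per warm instance: the expensive config/TLS setup runs once
// on the first invocation and is reused for every subsequent request.
static DYNAMO: OnceCell<Client> = OnceCell::const_new();

async fn dynamo() -> &'static Client {
    DYNAMO
        .get_or_init(|| async {
            let config = aws_config::load_from_env().await;
            Client::new(&config)
        })
        .await
}
```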
Cost Comparison
Pricing Models
AWS Lambda
- Requests: $0.20 per 1M requests
- Duration: $0.0000166667 per GB-second
- Free tier: 1M requests and 400,000 GB-seconds per month

Cloudflare Workers
- Requests: $0.30 per 1M requests (beyond the 10M included in the paid plan)
- Duration: $12.50 per 1M GB-seconds
- Free tier: 100,000 requests/day, 10 ms CPU per request

Google Cloud Run
- CPU: $0.00002400 per vCPU-second
- Memory: $0.00000250 per GiB-second
- Requests: $0.40 per 1M requests
- Free tier: 2M requests and 180,000 vCPU-seconds per month
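These rates are easy to misapply by hand, so it helps to encode them once. The sketch below hard-codes the list prices above and deliberately ignores free tiers and plan base fees, so it yields marginal cost only; the `Workload` struct and field names are illustrative.

```rust
// Marginal monthly cost from the list prices above (free tiers ignored).
struct Workload {
    requests: f64,   // requests per month
    duration_s: f64, // average billed duration per request, seconds
    memory_gb: f64,  // configured memory (Lambda/Cloud Run)
}

fn lambda_cost(w: &Workload) -> f64 {
    w.requests * 0.20 / 1e6 + w.requests * w.duration_s * w.memory_gb * 0.0000166667
}

fn workers_cost(w: &Workload, cpu_s: f64) -> f64 {
    // Workers bills CPU time, not wall time, and memory is fixed at 128 MB.
    w.requests * 0.30 / 1e6 + w.requests * cpu_s * 0.128 * 12.50 / 1e6
}

fn cloud_run_cost(w: &Workload, vcpus: f64) -> f64 {
    w.requests * 0.40 / 1e6
        + w.requests * w.duration_s * (vcpus * 0.000024 + w.memory_gb * 0.0000025)
}
```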
Cost Scenarios
Scenario 1: Low Volume (10,000 requests/month)
Assumptions: 10,000 requests/month, 200 ms average duration, 512 MB memory (Lambda/Cloud Run).

Lambda:
- Requests: 10K × $0.0000002 = $0.002
- Duration: 10K × 0.2 s × 0.5 GB × $0.0000166667 = $0.017
- Total: $0.02 (within free tier)

Workers:
- Requests: within free tier
- Total: $0.00

Cloud Run (min=0):
- Requests: 10K × $0.0000004 = $0.004
- CPU: 10K × 0.2 s × 1 vCPU × $0.000024 = $0.048
- Memory: 10K × 0.2 s × 0.5 GiB × $0.0000025 = $0.0025
- Total: $0.05 (within free tier)

Winner: Workers (always free at this volume).
Scenario 2: Medium Volume (1M requests/month)
Assumptions: 1M requests/month, 200 ms average duration, 512 MB memory, consistent traffic.

Lambda:
- Requests: 1M × $0.0000002 = $0.20
- Duration: 1M × 0.2 s × 0.5 GB × $0.0000166667 = $1.67
- Total: ~$1.87/month (list price; the always-free tier covers most or all of this)

Workers:
- Requests: 1M/month averages ~33K/day, within the 100K/day free tier
- Total: $0.00

Cloud Run (min=0):
- Requests: 1M is within the 2M free tier = $0
- CPU: 1M × 0.2 s × 1 vCPU × $0.000024 = $4.80
- Memory: 1M × 0.2 s × 0.5 GiB × $0.0000025 = $0.25
- Total: ~$5.05/month

Cloud Run (min=1):
- Base: 720 h × 1 vCPU × $0.0864/h = $62.21
- Total: ~$62/month (always-on instance)

Winner: Workers ($0) < Lambda (~$1.87) < Cloud Run ($5-62).
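Plugging Scenario 2's numbers into the cost sketch from the pricing section reproduces the Lambda figure:

```rust
fn main() {
    // Reproduces the Lambda line from Scenario 2 with the cost sketch above.
    let w = Workload { requests: 1_000_000.0, duration_s: 0.2, memory_gb: 0.5 };
    assert!((lambda_cost(&w) - 1.87).abs() < 0.01); // $0.20 requests + $1.67 duration
    println!("scenario 2 Lambda ≈ ${:.2}/month", lambda_cost(&w));
}
```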
Scenario 3: High Volume (100M requests/month)
Assumptions: 100M requests/month, 200 ms average duration, 1 GB memory (Lambda/Cloud Run; Workers is capped at 128 MB), peaky traffic.

Lambda:
- Requests: 100M × $0.0000002 = $20
- Duration: 100M × 0.2 s × 1 GB × $0.0000166667 = $333
- Total: ~$353/month

Workers:
- Requests: (100M − 10M included) × $0.0000003 = $27
- Duration: 100M × 0.01 s CPU × 0.128 GB × $0.0000125 ≈ $1.60
- Total: ~$29/month

Cloud Run (min=5, max=50):
- Base min instances: 720 h × 5 × $0.12/h = $432
- Burst capacity: variable
- Total: ~$500-800/month

Winner: Workers (~$29) < Lambda ($353) < Cloud Run ($500+).
Cost Summary
| Volume (requests/month) | Best choice | Monthly cost |
|---|---|---|
| <100K | Workers (free) | $0 |
| 100K-1M | Workers | $0-1 |
| 1M-10M | Workers | $1-30 |
| 10M-100M | Workers or Lambda | $30-400 |
| 100M+ | Workers | $30+ |
Note: Cloud Run becomes competitive when you need features it uniquely provides (long timeouts, large memory, GPUs).
Use Case Decision Matrix
Decision Flowchart
Work through the questions in order; the first answer that points at a platform ends the search.

1. Need GPU acceleration? Yes → Cloud Run.
2. Need a timeout longer than 15 minutes? Yes → Cloud Run.
3. Need more than 128 MB of memory?
   - Yes, and more than 10 GB → Cloud Run.
   - Yes, but 10 GB or less → Lambda.
   - No (≤128 MB) → continue.
4. Need global edge deployment?
   - Yes, and operations take <30 s of CPU time → Workers.
   - Yes, but operations exceed 30 s of CPU time → Lambda + CloudFront.
   - No (regional is fine) → continue.
5. Already in the AWS ecosystem?
   - Yes → Lambda.
   - No → Workers (the default choice).
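The same logic is compact enough to express as code, which can be handy in design docs or CI checks. A sketch, with an illustrative `Requirements` struct:

```rust
// The flowchart above, expressed as a function.
struct Requirements {
    needs_gpu: bool,
    max_runtime_min: u32,
    memory_mb: u32,
    global_edge: bool,
    cpu_under_30s: bool,
    aws_ecosystem: bool,
}

enum Platform {
    Lambda,
    Workers,
    CloudRun,
    LambdaWithCloudFront,
}

fn choose_platform(r: &Requirements) -> Platform {
    if r.needs_gpu || r.max_runtime_min > 15 || r.memory_mb > 10_240 {
        return Platform::CloudRun;
    }
    if r.memory_mb > 128 {
        return Platform::Lambda;
    }
    if r.global_edge {
        return if r.cpu_under_30s {
            Platform::Workers
        } else {
            Platform::LambdaWithCloudFront
        };
    }
    if r.aws_ecosystem { Platform::Lambda } else { Platform::Workers }
}
```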
Platform-Specific Strengths
Choose Lambda When:
- AWS ecosystem integration: RDS, DynamoDB, S3, Cognito
- Event-driven patterns: SQS, SNS, EventBridge triggers
- Moderate memory needs: 128MB to 10GB
- Existing AWS infrastructure: VPC, IAM, CloudWatch
- Step Functions orchestration: Complex workflows
```rust
// Lambda excels at AWS integrations.
use aws_sdk_dynamodb::{types::AttributeValue, Client};
use lambda_runtime::{Error, LambdaEvent};

async fn handler(event: LambdaEvent<McpRequest>) -> Result<McpResponse, Error> {
    let config = aws_config::load_from_env().await;
    let client = Client::new(&config);

    // Native DynamoDB integration
    let result = client
        .get_item()
        .table_name("mcp-data")
        .key("id", AttributeValue::S(event.payload.id))
        .send()
        .await?;

    Ok(process_result(result))
}
```
Choose Workers When:
- Global edge deployment: Sub-50ms latency worldwide
- Low memory requirements: ≤128MB is sufficient
- Simple compute: Transformations, routing, caching
- Cost sensitivity: Best pricing at most volumes
- Fast cold starts: User-facing APIs
```rust
// Workers excels at edge compute.
use worker::*;

#[event(fetch)]
async fn main(req: Request, env: Env, _ctx: Context) -> Result<Response> {
    // The request is processed at the edge location closest to the user.
    let cache = env.kv("CACHE")?;

    // Check the edge cache first.
    if let Some(cached) = cache.get("result").text().await? {
        return Response::ok(cached);
    }

    // Process the request, then cache the result at the edge.
    let result = process_request(&req).await?;
    cache.put("result", &result)?.execute().await?;
    Response::ok(result)
}
```
Choose Cloud Run When:
- Long operations: Processing takes >15 minutes
- Large memory: Need 10GB+ for ML models, large datasets
- GPU workloads: ML inference, image processing
- Complex containers: Multiple processes, specific OS needs
- Portability: Same container runs anywhere
```rust
// Cloud Run excels at long or heavy operations.
use axum::Json;

async fn ml_inference(input: Json<InferenceRequest>) -> Json<InferenceResponse> {
    // Load a large model into memory (this is where >10 GB helps).
    let model = load_model("s3://models/large-llm.bin").await;

    // Long-running inference (can take 5+ minutes).
    let result = model.infer(&input.prompt).await;

    Json(InferenceResponse { result })
}
```
Migration Considerations
Lambda to Cloud Run
What changes:
- ZIP package → Docker image
- Handler function → HTTP server
- AWS SDK → GCP SDK (or keep the AWS SDK with credentials)
- CloudWatch → Cloud Logging/Monitoring
- IAM roles → service accounts

What stays:
- Rust code (mostly)
- Business logic
- MCP protocol handling
- External API integrations

Effort: medium (1-2 weeks for a typical MCP server). A sketch of the handler-to-HTTP-server change follows.
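The "handler function → HTTP server" step is the biggest mechanical change. A minimal sketch, assuming your business logic already lives in a reusable `handle_mcp` function (the name and the `McpRequest`/`McpResponse` types are illustrative):

```rust
// Before (Lambda): the runtime calls your handler directly.
// async fn handler(event: LambdaEvent<McpRequest>) -> Result<McpResponse, Error> {
//     Ok(handle_mcp(event.payload).await)
// }

// After (Cloud Run): wrap the same logic in an HTTP server on $PORT.
use axum::{routing::post, Json, Router};

async fn mcp_endpoint(Json(req): Json<McpRequest>) -> Json<McpResponse> {
    Json(handle_mcp(req).await) // business logic unchanged
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/mcp", post(mcp_endpoint));
    // Cloud Run injects the listening port via the PORT env var.
    let port = std::env::var("PORT").unwrap_or_else(|_| "8080".into());
    let listener = tokio::net::TcpListener::bind(format!("0.0.0.0:{port}"))
        .await
        .unwrap();
    axum::serve(listener, app).await.unwrap();
}
```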
Lambda to Workers
What changes:
- ZIP package → WASM binary
- tokio → wasm-bindgen-futures
- AWS SDK → Workers bindings (KV, D1, R2)
- std::fs → Workers storage APIs
- Some crates may not compile to WASM

What stays:
- Pure Rust logic
- serde serialization
- MCP protocol handling
- HTTP request/response patterns

Effort: high (2-4 weeks, mostly WASM compatibility work). One way to contain that work is sketched below.
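One way to contain the WASM compatibility work is to keep the protocol logic in a platform-free core crate and hide platform storage behind a small trait, so only thin adapter crates touch Lambda or Workers APIs. A sketch using the `async-trait` crate (the crate layout and names are illustrative):

```rust
// Core crate: pure Rust + serde, compiles to both native and wasm32.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize)]
pub struct McpRequest {
    pub tool_name: String,
    pub payload: serde_json::Value,
}

#[derive(Serialize, Deserialize)]
pub struct McpResponse {
    pub result: serde_json::Value,
}

// Platform storage goes behind a trait: Lambda implements it with the
// AWS SDK, Workers implements it with KV bindings.
#[async_trait::async_trait(?Send)] // ?Send because Workers futures are not Send
pub trait Storage {
    async fn get(&self, key: &str) -> Option<String>;
    async fn put(&self, key: &str, value: &str);
}

pub async fn handle_mcp(req: McpRequest, store: &dyn Storage) -> McpResponse {
    // Protocol handling stays identical on every platform.
    let cached = store.get(&req.tool_name).await;
    McpResponse {
        result: serde_json::json!({ "cached": cached }),
    }
}
```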
Workers to Lambda
What changes:
- WASM → native binary (the easier direction)
- Workers bindings → AWS SDK
- KV/D1 → DynamoDB/RDS
- R2 → S3
- Edge deployment → regional deployment

What stays:
- All Rust code (the WASM subset compiles to native)
- Business logic
- MCP protocol handling
- HTTP patterns

Effort: low-medium (1-2 weeks, mostly SDK swaps). A typical swap is shown below.
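The storage swaps are usually direct. For example, a Workers KV lookup maps onto a DynamoDB `get_item` roughly like this (the table name and schema are illustrative):

```rust
// Before (Workers KV):
// let value = env.kv("CACHE")?.get("key").text().await?;

// After (Lambda + DynamoDB): the same lookup against a table whose string
// primary key is named "id" and whose payload lives in a "value" attribute.
use aws_sdk_dynamodb::{types::AttributeValue, Client};

async fn get_cached(client: &Client, key: &str) -> Option<String> {
    let output = client
        .get_item()
        .table_name("mcp-cache")
        .key("id", AttributeValue::S(key.to_string()))
        .send()
        .await
        .ok()?;
    output.item?.get("value")?.as_s().ok().cloned()
}
```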
Multi-Platform Architecture
Hybrid Deployment Pattern
For complex MCP servers, consider a hybrid approach:
Client requests land on Workers at the edge, which handles authentication checks, rate limiting, and caching. The edge tier then routes each request to one of three backends:

- Lambda for quick tools (<100 ms)
- Lambda for database queries and AWS integrations
- Cloud Run for ML inference and long operations

Benefits:
- Edge caching reduces backend calls
- Each operation type is routed to the platform that suits it best
- Each tier scales independently
- Graceful fallback between platforms
Implementation
```rust
// Workers edge router: forwards each MCP request to the best backend.
use worker::*;

#[event(fetch)]
async fn main(mut req: Request, env: Env, _ctx: Context) -> Result<Response> {
    let mcp_request: McpRequest = req.json().await?;

    // Route based on tool type.
    let backend_url = match mcp_request.tool_name.as_str() {
        // Quick operations → Lambda
        "search" | "lookup" | "validate" => env.var("LAMBDA_URL")?.to_string(),
        // Database operations → Lambda (AWS integration)
        "query" | "insert" | "update" => env.var("LAMBDA_DB_URL")?.to_string(),
        // Heavy operations → Cloud Run
        "analyze" | "generate" | "process" => env.var("CLOUD_RUN_URL")?.to_string(),
        // Default to Lambda
        _ => env.var("LAMBDA_URL")?.to_string(),
    };

    // Forward to the chosen backend.
    let mut headers = Headers::new();
    headers.set("Content-Type", "application/json")?;

    Fetch::Request(Request::new_with_init(
        &backend_url,
        RequestInit::new()
            .with_method(Method::Post)
            .with_headers(headers)
            .with_body(Some(serde_json::to_string(&mcp_request)?.into())),
    )?)
    .send()
    .await
}
```
Summary
Quick Reference
| Factor | Lambda | Workers | Cloud Run |
|---|---|---|---|
| Best for | AWS integration | Global edge | Heavy workloads |
| Cold start | 100-500ms | <5ms | 500ms-3s |
| Max memory | 10 GB | 128 MB | 32 GB |
| Max timeout | 15 min | 30s CPU | 60 min |
| Pricing model | Requests + GB-seconds | Requests + CPU time | Requests + vCPU/memory time |
| Cost at scale | Medium | Lowest | Highest |
| Deployment | ZIP or Container | WASM | Container |
| Ecosystem | AWS | Cloudflare | GCP |
Recommendations by Use Case
| MCP Server Type | Recommended Platform |
|---|---|
| Database explorer | Lambda (AWS) or Cloud Run (GCP) |
| File system tools | Cloud Run |
| API integration | Workers or Lambda |
| ML inference | Cloud Run |
| Real-time data | Workers |
| Multi-step workflows | Lambda + Step Functions |
| Global availability | Workers |
| Cost-sensitive | Workers |
Final Advice
- Start with Workers if your requirements fit within its constraints (128MB memory, 30s CPU time)
- Use Lambda for AWS ecosystem integration or when you need more memory/time
- Choose Cloud Run when you need maximum flexibility, GPUs, or very long operations
- Consider hybrid for complex MCP servers with varied operation types
The best platform is the one that matches your specific requirements while minimizing complexity and cost.
Practice Ideas
These informal exercises help reinforce the concepts.
Practice 1: Platform Comparison
Deploy the same MCP server to all three platforms and measure cold start times, response latency, and costs.
Practice 2: Cost Analysis
Calculate the monthly cost for your expected traffic pattern on each platform and identify the break-even points.
Practice 3: Migration Plan
Create a migration plan for moving an existing MCP server from one platform to another, identifying all required changes.