# Lab: Inference Server

Build a model-serving infrastructure with request batching and health checks.

## Objectives
- Implement prediction endpoint
- Add request batching
- Configure health monitoring
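Before the full server, it helps to see batching in isolation. The sketch below is a minimal size-based micro-batcher (hypothetical names; the lab's `RequestBatcher` is asynchronous and also flushes on a timeout). Requests are buffered until the batch is full, then flushed so the model can process them in one call.

```rust
/// Minimal sketch of size-based micro-batching. `u32` stands in for
/// real request payloads; the lab uses its own request types.
struct Batcher {
    max_batch: usize,
    pending: Vec<u32>,
}

impl Batcher {
    fn new(max_batch: usize) -> Self {
        Batcher { max_batch, pending: Vec::new() }
    }

    /// Buffer a request; return the flushed batch once it is full.
    fn add(&mut self, request: u32) -> Option<Vec<u32>> {
        self.pending.push(request);
        if self.pending.len() >= self.max_batch {
            // Hand the accumulated batch to the caller and reset the buffer.
            Some(std::mem::take(&mut self.pending))
        } else {
            None
        }
    }
}
```

A production batcher additionally flushes partially filled batches after a deadline so low-traffic requests are not stalled indefinitely.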
## Demo Code

See `demos/course3/week4/inference-server/`

## Lab Exercise

See `labs/course3/week4/lab_4_5_serving.py`

## Key Components
```rust
use std::time::Instant;

// Supporting types (Model, RequestBatcher, ServerMetrics,
// PredictRequest, PredictResponse, HealthResponse) are provided
// by the lab skeleton.
pub struct InferenceServer {
    model: Box<dyn Model>,
    batcher: RequestBatcher,
    metrics: ServerMetrics,
}

impl InferenceServer {
    /// Enqueue the request for batched inference and record its latency.
    pub async fn predict(&self, request: PredictRequest) -> PredictResponse {
        let start = Instant::now();
        let result = self.batcher.add(request).await;
        self.metrics.record_request(start.elapsed());
        result
    }

    /// Health probe: reports model state and total requests served.
    pub fn health(&self) -> HealthResponse {
        HealthResponse {
            status: "healthy",
            model_loaded: self.model.is_loaded(),
            requests_processed: self.metrics.total_requests(),
        }
    }
}
```
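Since `predict` can be called from many tasks through a shared reference, the metrics counters must be updated without `&mut self`. One common sketch, assuming the lab's `ServerMetrics` behaves like this (the actual struct may differ), uses atomic counters:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Duration;

/// Hypothetical sketch of a ServerMetrics-style counter: interior
/// mutability via atomics lets concurrent handlers record requests
/// through a shared `&self`.
struct ServerMetrics {
    total: AtomicU64,
    total_latency_us: AtomicU64,
}

impl ServerMetrics {
    fn new() -> Self {
        ServerMetrics {
            total: AtomicU64::new(0),
            total_latency_us: AtomicU64::new(0),
        }
    }

    /// Count one request and accumulate its latency in microseconds.
    fn record_request(&self, elapsed: Duration) {
        self.total.fetch_add(1, Ordering::Relaxed);
        self.total_latency_us
            .fetch_add(elapsed.as_micros() as u64, Ordering::Relaxed);
    }

    fn total_requests(&self) -> u64 {
        self.total.load(Ordering::Relaxed)
    }
}
```

`Ordering::Relaxed` is sufficient here because the counters are independent statistics; nothing else synchronizes on their values.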