Week 1: Foundation Models and LLM Serving

Overview

Understand LLM serving by building a tokenizer and inference server in Rust.

Topics

#     Type    Title                              Platform     Duration
1.1   Video   The GenAI Landscape                Concept      10 min
1.2   Video   Databricks Foundation Model APIs   Databricks   10 min
1.3   Lab     Query Models in Playground         Databricks   25 min
1.4   Video   GGUF Format and Quantization       Sovereign    10 min
1.5   Lab     Serve Local Model with realizar    Sovereign    35 min
1.6   Video   Tokenization Deep Dive             Concept      10 min
1.7   Lab     Build BPE Tokenizer                Sovereign    30 min
1.8   Video   External Models and AI Gateway     Databricks   8 min
1.9   Quiz    LLM Serving Fundamentals           -            15 min

Sovereign AI Stack Components

  • realizar for GGUF inference
  • tokenizers crate for BPE

Key Concepts

Tokenization

  • BPE (Byte-Pair Encoding) algorithm
  • Vocabulary and merge rules
  • Special tokens: <|endoftext|>, <|pad|>
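The merge-rule idea above can be sketched in a few lines of Rust. This is an illustrative toy, not the tokenizers crate's implementation: the merge table here is hypothetical, and a real BPE tokenizer would also handle pre-tokenization, byte-level fallback, and special tokens such as <|endoftext|>.

```rust
use std::collections::HashMap;

// Minimal BPE sketch: split a word into characters, then repeatedly apply
// the highest-priority merge rule found among adjacent symbol pairs.
fn bpe_merge(word: &str, merges: &[(String, String)]) -> Vec<String> {
    let mut symbols: Vec<String> = word.chars().map(|c| c.to_string()).collect();
    // Rank merges by position in the table: earlier rules merge first.
    let ranks: HashMap<(String, String), usize> = merges
        .iter()
        .cloned()
        .enumerate()
        .map(|(i, pair)| (pair, i))
        .collect();
    loop {
        // Find the adjacent pair with the best (lowest) rank.
        let mut best: Option<(usize, usize)> = None; // (rank, index)
        for i in 0..symbols.len().saturating_sub(1) {
            let pair = (symbols[i].clone(), symbols[i + 1].clone());
            if let Some(&r) = ranks.get(&pair) {
                if best.map_or(true, |(br, _)| r < br) {
                    best = Some((r, i));
                }
            }
        }
        match best {
            Some((_, i)) => {
                // Replace the pair with its merged symbol.
                let merged = format!("{}{}", symbols[i], symbols[i + 1]);
                symbols.splice(i..i + 2, [merged]);
            }
            None => break, // no applicable merge rule remains
        }
    }
    symbols
}

fn main() {
    // Hypothetical merge table: "l"+"o" -> "lo", then "lo"+"w" -> "low".
    let merges = vec![
        ("l".to_string(), "o".to_string()),
        ("lo".to_string(), "w".to_string()),
    ];
    println!("{:?}", bpe_merge("lower", &merges)); // ["low", "e", "r"]
}
```

The vocabulary is then just the set of symbols reachable from the merge table; the Build BPE Tokenizer lab extends this with learning the merges from a corpus.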

Model Quantization

  • FP16, INT8, INT4 representations
  • GGUF format: Q4_K_M, Q5_K_M, Q8_0
  • Memory vs accuracy trade-offs
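The memory/accuracy trade-off comes down to storing a scale factor plus low-bit integers instead of full floats. Below is a sketch of simple symmetric INT8 quantization in Rust; GGUF's Q8_0 follows the same idea per block of 32 weights, but this standalone version quantizes a whole slice at once for clarity.

```rust
// Symmetric INT8 quantization sketch: scale so the largest magnitude
// maps to 127, then round each value to the nearest representable step.
fn quantize_q8(values: &[f32]) -> (f32, Vec<i8>) {
    let max_abs = values.iter().fold(0.0f32, |m, v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let quants = values
        .iter()
        .map(|v| (v / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    (scale, quants)
}

// Dequantize: multiply each stored integer back by the scale.
fn dequantize_q8(scale: f32, quants: &[i8]) -> Vec<f32> {
    quants.iter().map(|&q| q as f32 * scale).collect()
}

fn main() {
    let weights = [0.5f32, -1.0, 0.25, 0.0];
    let (scale, q) = quantize_q8(&weights);
    let restored = dequantize_q8(scale, &q);
    // Each restored value is within one quantization step of the original.
    for (w, r) in weights.iter().zip(&restored) {
        assert!((w - r).abs() <= scale);
    }
    println!("scale = {scale}, quants = {q:?}");
}
```

The payoff is size: each weight shrinks from 4 bytes (FP32) or 2 bytes (FP16) to roughly 1 byte at INT8 or half a byte at INT4, at the cost of the rounding error bounded by the scale above.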