# Week 3: Delta Lake & Workflows
## Overview
Build reliable data pipelines with Delta Lake — ACID transactions, schema enforcement, DML operations (INSERT, UPDATE, MERGE), and time travel. Then orchestrate pipelines with Databricks Jobs, Dashboards, and Workflows.
## Topics
| # | Type | Title | Duration |
|---|---|---|---|
| 3.1.1 | Video | What Is Delta Lake | 10 min |
| 3.1.2 | Video | Delta Lake Concepts | 12 min |
| 3.1.3 | Video | Creating Delta Tables | 10 min |
| 3.2.1 | Video | Insert, Update & Merge | 12 min |
| 3.2.2 | Video | Time Travel | 8 min |
| 3.3.1 | Video | Jobs, Dashboards & Workflows | 12 min |
| — | Lab | Delta Tables | 45 min |
| — | Lab | Jobs & Workflows | 30 min |
| — | Quiz | Delta Lake & Workflows | 15 min |
## Key Concepts
### Delta Lake Architecture
```
Delta Table
├── _delta_log/                                 # Transaction log (JSON + Parquet)
│   ├── 00000000000000000000.json               # Version 0
│   ├── 00000000000000000001.json               # Version 1
│   └── 00000000000000000010.checkpoint.parquet # Checkpoint (compacted log)
└── part-00000-*.parquet                        # Data files (standard Parquet)
```
The transaction log records every change, enabling ACID guarantees.
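You can inspect the log directly from a notebook; a minimal sketch, assuming a Databricks notebook (where `spark` is predefined) and an existing Delta table named `my_table`:

```python
# DESCRIBE HISTORY surfaces the transaction log as a DataFrame:
# one row per commit, with its version, timestamp, and operation.
history = spark.sql("DESCRIBE HISTORY my_table")
history.select("version", "timestamp", "operation").show()
```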
### Delta Lake Features
| Feature | What It Does | Why It Matters |
|---|---|---|
| ACID Transactions | Atomic, consistent writes | No corrupt/partial data |
| Schema Enforcement | Validates data on write | Data quality |
| Schema Evolution | Add columns safely | Agile development |
| Time Travel | Query historical versions | Auditing, rollback |
| MERGE (Upsert) | INSERT + UPDATE + DELETE | Efficient CDC |
| Auto-Optimize | Compacts small files | Query performance |
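To make schema enforcement versus schema evolution concrete, a hedged sketch; the DataFrame name and table path are illustrative:

```python
# df_with_new_column has a column the target table does not yet have.
# Without mergeSchema, Delta's schema enforcement rejects this append;
# with it, schema evolution adds the new column to the table schema.
(df_with_new_column.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .save("/tmp/delta/events"))
```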
### DML Operations
- INSERT: `df.write.format("delta").mode("append").save(path)`
- UPDATE: `UPDATE table SET col = val WHERE condition`
- MERGE: match on a key; update the row if it exists, insert it if not (see the sketch below)
- Time Travel: `SELECT * FROM table VERSION AS OF n`
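A minimal upsert-plus-time-travel sketch, assuming a `customers` Delta table keyed on `customer_id` and a `customer_updates` source table (both hypothetical):

```python
from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "customers")
updates = spark.read.table("customer_updates")

# MERGE: one atomic transaction that updates existing keys and inserts new ones.
(target.alias("t")
    .merge(updates.alias("u"), "t.customer_id = u.customer_id")
    .whenMatchedUpdateAll()      # key found  -> update the row
    .whenNotMatchedInsertAll()   # key absent -> insert the row
    .execute())

# Time travel: query the table as it was at version 0.
spark.sql("SELECT * FROM customers VERSION AS OF 0").show()
```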
### Databricks Workflows
- Job: Scheduled execution of a notebook or script
- Task: Single unit of work within a workflow
- Workflow: Multi-task DAG with dependencies
- Dashboard: SQL-powered visualizations connected to SQL Warehouses
- Widget: Parameter input that makes a notebook reusable across pipelines (see the sketch below)
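A widget sketch for the last point; the widget name and default value are illustrative:

```python
# Declare a text widget; a Job can override its value on each scheduled run.
dbutils.widgets.text("source_table", "bronze.events", "Source table")
source_table = dbutils.widgets.get("source_table")

# The rest of the notebook stays generic, so one notebook serves many pipelines.
df = spark.table(source_table)
```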
## Certification Topics
Key certification concepts from this week:
- Delta Lake provides ACID transactions via the transaction log
- MERGE combines INSERT, UPDATE, and DELETE in one atomic operation
- Time travel enables querying any previous version of the data
- Schema enforcement prevents bad data; schema evolution adds columns safely
- Jobs use job clusters (auto-created, auto-terminated) for scheduled workloads
- Workflows orchestrate multi-step pipelines with DAG dependencies
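For the rollback point above, a minimal sketch; the table name and version number are placeholders:

```python
# RESTORE rewinds the table to an earlier version. The restore is itself
# a new commit in the transaction log, so it is auditable and reversible.
spark.sql("RESTORE TABLE my_table TO VERSION AS OF 3")
```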
## Demo Code
- `demos/course1/week3/databricks-delta/` — Delta tables, DML, time travel
- `demos/course1/week3/databricks-workflows/` — Jobs, dashboards, workflows