15. Implementation Phases
Phase 0: Pipeline Manifest, Contracts & Quality Baseline (Week 1)
- Write `configs/pipeline/albor.yaml` — full pipeline manifest (infra + data + train + eval + publish)
- `apr pipeline plan` — validate entire DAG, estimate resources
- `apr pipeline apply --target cuda-driver --target vulkan-driver --target data-dir` — provision infra
- Verify trueno wgpu on W5700X via Vulkan (not Metal — Linux)
- Verify trueno CUDA on 4090
- Download Qwen3-Coder-Next to intel box, verify it loads in realizar
- `pmat tdg baseline create` on all stack components
- `pv coverage contracts/ --binding` — establish contract coverage baseline
- `batuta falsify . --critical-only` — initial falsification assessment
Phase 1: Data Pipeline + Tokenizer Contract (Week 1-2)
- Ingest local ground truth corpora via `alimentar import local` (fix ALB-019 if needed)
  - depyler: examples/ + tdd-book/tests/ (~1,845 files, ~219K lines)
  - hf-ground-truth-corpus (~11,928 files)
  - jax-ground-truth-corpus (~2,697 files)
  - vllm-ground-truth-corpus (~1,118 files)
- Ingest local ML framework code (Tier 2, ~53K files)
- Download external datasets via `alimentar import hf` (StarCoder Python, FineWeb-Edu)
- Quality validation via `alimentar quality check` on all sources
- Build weighted training mix with 10x upsampling on Tier 1 (fix ALB-020 if needed)
- Write `bpe-tokenizer-kernel-v1.yaml` contract (ALB-014)
- `pv probar` + `pv kanion` tokenizer contract
- Train BPE tokenizer on mixed corpus (fix ALB-001 if needed)
- Verify FALSIFY roundtrip: `decode(encode(text)) = text` for all test data
- Tokenize all data into sharded Parquet
- Apply FIM transforms to code sequences (fix ALB-018 if needed)
- Create train/val/test splits via `alimentar`
- Record SHA-256 hashes + provenance manifest for all data artifacts
- `pmat comply check --strict` on alimentar changes
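The FIM step above can be sketched as a character-level transform. A minimal sketch: the sentinel strings and the 0.5 FIM rate are illustrative assumptions (not necessarily what alimentar uses), and the real transform operates on token sequences rather than raw strings.

```python
import random

# Hypothetical sentinel strings; the production tokenizer would reserve
# dedicated special-token IDs for these instead.
FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def fim_transform(code: str, rng: random.Random, fim_rate: float = 0.5) -> str:
    """Rewrite a sample into prefix-suffix-middle (PSM) order so the model
    learns infilling; with probability 1 - fim_rate, leave it left-to-right."""
    if len(code) < 2 or rng.random() >= fim_rate:
        return code
    lo, hi = sorted(rng.sample(range(len(code)), 2))  # two distinct cut points
    prefix, middle, suffix = code[:lo], code[lo:hi], code[hi:]
    # The model sees prefix and suffix first, then predicts the middle.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}{middle}"
```

The FALSIFY-style property is that stripping the sentinels and reassembling prefix + middle + suffix recovers the original sample exactly.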
Phase 2: Pipeline Validation — 50M Model (Week 2) – COMPLETE
- Write `gradient-accumulation-kernel-v1.yaml` contract (ALB-017)
- Write `configs/train/pretrain-50m.yaml` (model arch + training + monitoring)
- Train albor-50M on 4090 — 500 rows, 31 steps, 110.7s, loss 10.3→4.42
- Validate `apr monitor` — ALB-025 FIXED (presentar widget migration complete)
- Validate Andon alerts during full training run
- Fix ALB-009 — FIXED
- Verify FALSIFY-ALBOR-001 (loss decreases) — CORROBORATED
- Verify FALSIFY-ALBOR-002 (gradient bounds) — per-step logging now available (ALB-035 FIXED)
- `pv audit` — PASS: 7/7 contracts, 0 findings
- Milestone: Training loop converges ✓, contracts pass ✓
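The core invariant behind a gradient-accumulation contract can be stated in a few lines: summing per-micro-batch gradients and normalizing once must reproduce the full-batch gradient. A minimal sketch on a 1-D least-squares model; the real kernel operates on GPU tensors, and the function names here are illustrative.

```python
def grad(w, xs, ys):
    # d/dw of the mean squared error 0.5 * (w*x - y)^2 over the batch
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def accumulated_grad(w, xs, ys, micro_batch: int):
    """Gradient accumulation: process micro-batches, accumulate the *sum*
    of per-example gradients, and normalize exactly once at the end."""
    total, n = 0.0, 0
    for i in range(0, len(xs), micro_batch):
        mx, my = xs[i:i + micro_batch], ys[i:i + micro_batch]
        total += grad(w, mx, my) * len(mx)  # undo the per-micro-batch mean
        n += len(mx)
    return total / n
```

A common bug this contract would catch is averaging per-micro-batch means directly, which silently mis-weights a ragged final micro-batch.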
Phase 3: Base Model — 350M Pre-Training (Week 2-4) – IN PROGRESS
- Write `configs/train/pretrain-350m.yaml` — pre-tokenized ByteLevel BPE v2, 22K×2048 tokens
- Train albor-base-350m on 4090 — STARTED (2760 batches, ~20h est.)
- Build evaluation infrastructure — eval-code.py, eval-perplexity.py, 35 benchmark problems
- Fix ALB-038 — FIXED: RMSNorm + attention backward ops, all 20 params receive gradients
- Fix ALB-041 — FIXED: D2D buffer size mismatch in backward_attention (entrenar@a48e3d2)
- Fix ALB-043 — FIXED: backward_ffn buffer overflow + SwiGLU gradients (entrenar@f7805f1)
- Fix ALB-044 — FIXED: activation gradient clipping at GPU-CPU boundary + CPU optimizer hyperparams (entrenar@86eec38)
- Fix ALB-059 — FIXED: GEMM backward constructor args n/k swapped, buffer overflow into optimizer states + zero-init optimizer m/v (entrenar@846ae0c)
- Write `training-memory-kernel-v1.yaml` contract (ALB-039) — VRAM budget estimation
- Write `training-gpu-kernel-v1.yaml` contract (ALB-040) — GPU-resident training invariants
- Implement `CudaTransformerTrainer` (ALB-040) — 3 PCIe transfers/step vs ~16K
- Dogfood CUDA training — 50M test: 3 steps, loss 10.4→11.7, GPU forward+backward working
- ALB-037 — FIXED: realizar loads trained SafeTensors checkpoint, generates tokens (e2e verified)
- 350M CUDA test training — 50 steps, loss 10.39→5.92 (best 5.53), checkpoint valid
- realizar inference verified — 218 tensors loaded, generates from trained weights
- Checkpoint validation: PASS (weights trained, not initialization)
- Perplexity eval: 31,926 (finite, consistent with 50-step model — random baseline ~32,768)
- Fix ALB-060 — CONFIG FIXED: epochs=1 only ran 43/5000 steps. C-TRAINCFG-001 contract written. Config fixed (v1: epochs=117, v2: epochs=1 with 68K seqs)
- Expand training data: Tier 1 10x + 8 Tier 2 repos → v2 dataset (67,977 seqs, 139M tokens)
- Fix ALB-071 — FIXED: embed gradient clipping decoupled from weight grad_clip (entrenar@d07d67d)
- Fix ALB-072 — FIXED: fp16 loss scaling (65536x) removed from fused CE kernel; all backward uses f32, no underflow risk (entrenar@44d3e74)
- Full 350M v2 training — reached step 1183/5000, loss 10.40→6.85, val_ppl=1008. Crashed: ALB-073 (PTX selp) + ALB-074 (buffer overflow from stale binary). Step 1000 checkpoint saved (1520 MB).
- Fix ALB-073 — FIXED: fused_cross_entropy selp arg order, same class as ALB-069 (trueno@10bec89)
- Fix ALB-074 — FIXED: stale binary missed eval truncation fix. Rebuilt with entrenar@5c4c2d8.
- Monitor training via `apr monitor` (ALB-025 FIXED)
- Data scaling: Download codeparrot-clean (2M files, ~4.4B tokens) → pretokenize at 1024 → ~5.2M sequences
- Full 350M v3 training — PENDING: 250K steps on ~1B tokens from codeparrot-clean. Config: `pretrain-350m-v3.yaml`. ETA ~10 days.
- Validate loss curve, perplexity convergence
- HumanEval pass@1 evaluation (target >8%)
- Verify FALSIFY-ALBOR-003 (checkpoint determinism)
- `pmat tdg check-regression` on all touched components
- Milestone: HumanEval pass@1 > 8%, Perplexity < 30, TDG grade A maintained
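The perplexity figures above fall straight out of the mean cross-entropy loss: ppl = exp(loss), and a uniform (untrained) model over a V-token vocabulary scores loss ln V and perplexity V. A sketch, assuming V = 32,768 (consistent with the ~10.4 initial loss and the ~32,768 random baseline quoted above):

```python
import math

V = 32_768                   # assumed vocab size (2^15)
uniform_loss = math.log(V)   # 15 * ln 2 ≈ 10.397, matching the observed initial loss of ~10.4

def perplexity(mean_ce_loss: float) -> float:
    # Perplexity is the exponential of the mean per-token cross-entropy.
    return math.exp(mean_ce_loss)

# An untrained model should sit at the random baseline of ~V.
assert abs(perplexity(uniform_loss) - V) < 1e-6
```

By the same arithmetic, the 50-step loss of 5.92 corresponds to a perplexity of roughly 370, and the Phase 3 target of perplexity < 30 corresponds to a loss below ln 30 ≈ 3.4.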
Phase 4: Teacher Setup & Logit Pre-Computation (Week 3-5)
- Fix ALB-010: Add Qwen3-Coder-Next support to realizar (stretch — 3-4 week blocker)
- Download Qwen2.5-Coder-3B interim teacher (5.75 GiB, Apache 2.0) — unblocks distillation without ALB-010
- Validate 3B teacher: `apr distill --stage precompute` works, RosettaStone handles sharded SafeTensors
- Create distillation config: `configs/train/distill-qwen3b.yaml` (T=4.0, α=0.5, LoRA r=16)
- Validate teacher inference on intel (CPU, fp16, 300GB RAM) — for 80B stretch goal
- Write `knowledge-distillation-kernel-v1.yaml` contract (ALB-013) — DOGFOODING
- `pv kanion` KD loss contract (KL non-negativity, temperature scaling)
- Fix ALB-011 — FIXED: `apr distill --config --stage precompute|train` works
- Pre-compute 3B teacher logits on v2 dataset (background, 4-8h CPU)
- Verify FALSIFY-ALBOR-006 (teacher logit integrity)
- Store as sharded Parquet via alimentar
- `pmat comply check --strict` on realizar changes
- Milestone: Teacher logits verified, KD contract at Level 4
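The distillation objective implied by the config above (T = 4.0, α = 0.5) is presumably the standard softened-KL blend; a single-token-position sketch, not necessarily apr's exact formulation:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax, computed stably by subtracting the max.
    zs = [z / T for z in logits]
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, target_idx, T=4.0, alpha=0.5):
    """Blend of softened KL(teacher || student) and hard-label cross-entropy."""
    p = softmax(teacher_logits, T)   # softened teacher distribution
    q = softmax(student_logits, T)   # softened student distribution
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    ce = -math.log(softmax(student_logits)[target_idx])  # hard-label CE
    # The T^2 factor keeps the soft-loss gradient magnitude comparable across T.
    return alpha * (T * T) * kl + (1 - alpha) * ce
```

The T² factor is what makes the soft term's gradients roughly temperature-invariant, which is why it appears in the standard Hinton-style formulation.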
Phase 5: Knowledge Distillation (Week 5-6)
- Implement `apr distill apply` with KD loss
- Distill albor-base-350m → albor-distill-350m
- Verify FALSIFY-ALBOR-004 (KL non-negativity in production)
- Verify FALSIFY-ALBOR-005 (distillation improves benchmarks)
- Benchmark: measure improvement over base
- `pv probar --binding` on KD contract with actual training data
- Milestone: >5% avg benchmark improvement, KD contract fully wired
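FALSIFY-ALBOR-004 is a textbook property (Gibbs' inequality) and can be probed independently of the training stack. A sketch of the kind of randomized check a binding might run; the sampling scheme here is made up for illustration.

```python
import math, random

def kl(p, q):
    # KL(p || q) over two discrete distributions of equal length.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def random_dist(rng, n):
    # Sample a random point on the probability simplex (normalized weights).
    ws = [rng.random() + 1e-9 for _ in range(n)]
    s = sum(ws)
    return [w / s for w in ws]

rng = random.Random(42)
for _ in range(1_000):
    p, q = random_dist(rng, 8), random_dist(rng, 8)
    assert kl(p, q) >= -1e-12        # non-negativity, up to float error
assert abs(kl(p, p)) < 1e-12         # equality case: KL(p || p) == 0
```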
Phase 6: Post-Training Optimization (Week 6-8)
- Write `model-merging-kernel-v1.yaml` contract (ALB-015) — DOGFOODING
- Write `pruning-kernel-v1.yaml` contract (ALB-016) — DOGFOODING
- Fine-tune with LoRA: `apr finetune` → albor-instruct
- Merge variants: `apr merge --method slerp` → albor-merged
- Verify FALSIFY-ALBOR-007 (SLERP interpolation bound)
- Prune: `apr prune --method wanda` → albor-pruned
- Verify FALSIFY-ALBOR-008 (sparsity guarantee)
- Quantize: `apr quantize --method q4_k` → albor-q4
- Verify FALSIFY-ALBOR-009 (quantization fidelity)
- Benchmark every variant
- `pv coverage contracts/ --binding` — final contract coverage report
- Milestone: Full ladder complete, all post-training contracts pass
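A sketch of the SLERP merge behind `apr merge --method slerp`. The exact bound FALSIFY-ALBOR-007 checks is not stated here; this sketch assumes it includes endpoint recovery (t=0 and t=1 return the inputs) and norm preservation for unit-norm inputs.

```python
import math

def slerp(a, b, t):
    """Spherical linear interpolation between two weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    cos_omega = max(-1.0, min(1.0, dot / (na * nb)))
    omega = math.acos(cos_omega)
    if omega < 1e-8:                     # nearly parallel: fall back to lerp
        return [(1 - t) * x + t * y for x, y in zip(a, b)]
    sa = math.sin((1 - t) * omega) / math.sin(omega)
    sb = math.sin(t * omega) / math.sin(omega)
    return [sa * x + sb * y for x, y in zip(a, b)]
```

Unlike plain linear interpolation, which shrinks the norm of the midpoint between two orthogonal unit vectors to √2/2, SLERP keeps it on the unit sphere.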
Phase 7: Quality Assurance & Falsification Sweep (Week 8)
- `batuta falsify . --min-grade toyota-standard --verbose` — full 108-item assessment
- `pmat rust-project-score --full` on all touched components
- `pmat tdg check-regression --baseline` — no quality regressions
- `pv graph contracts/ --format mermaid` — publish verification DAG
- `pv status contracts/` — all contracts at Level 3+, critical at Level 4
- `cargo mutants --no-times` on all new code — mutation score ≥ 85%
- `cargo llvm-cov` — coverage ≥ 95% on all new code
- Address any falsification failures or contract violations
- Milestone: Toyota Standard grade, all quality gates green
Phase 8: Evaluation, Leaderboard Submission & Publication (Week 8-9)
- Final eval on all benchmark tasks (all 6 model variants)
- Run `bigcode-evaluation-harness` with leaderboard-standard params on best model
- Submit PR to Big Code Models Leaderboard (`community_results/` folder)
- Export all models: SafeTensors + GGUF
- `apr publish` to HuggingFace Hub as `paiml/albor-*`
- Write model card with full reproducibility details + leaderboard results
- Publish training logs, loss curves, eval trajectories
- Publish verification report (contract status, falsification results)
- `batuta falsify . --format markdown --output docs/falsification-report.md`
- Milestone: Models on HuggingFace, leaderboard submission live, quality evidence published
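The pass@1 metric used throughout is presumably the standard unbiased pass@k estimator (the one bigcode-evaluation-harness implements). A sketch, with the per-problem correct counts made up for illustration:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n is correct,
    given c of the n samples passed: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 35 problems, 10 samples each; correct counts are hypothetical.
correct = [0] * 30 + [1, 2, 3, 5, 10]
score = sum(pass_at_k(10, c, 1) for c in correct) / len(correct)
assert 0.0 <= score <= 1.0
```

For k = 1 the estimator reduces to the mean fraction of correct samples per problem, so the >8% target means that, averaged over the benchmark, more than 8% of single samples must pass their tests.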
Phase 9: Distributed Training — Stretch (Week 9+)
- entrenar native DDP infrastructure (TCP wire protocol v2, GradientServer, WorkerClient, PerBlockGradientAccumulator, RingAllReduce) — entrenar#133
- Wire DDP train_batch() into DistributedCudaTrainer — COMPLETE (train_loop_cuda_distributed, allreduce_impl, spawn_coordinator_thread)
- Multi-process launcher — COMPLETE (rank 0 auto-spawns GradientServer, all ranks connect as WorkerClient via `--distributed` CLI flags)
- wgpu backward pass in trueno (ALB-005) — for cross-vendor GPU support
- Full distributed training: 4090 + W5700X x2
- Milestone: Multi-GPU training demonstrated
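The RingAllReduce pattern above can be sketched in miniature. This simulates the chunk rotation in-process, whereas entrenar's implementation ships gradient chunks between ranks over the TCP wire protocol; one scalar chunk per rank is an illustrative simplification (each gradient vector has exactly n elements).

```python
def ring_allreduce(grads):
    """Sum identical-length gradient vectors across n ranks in 2*(n-1)
    chunked steps; every rank ends with the full element-wise sum."""
    n = len(grads)
    assert all(len(g) == n for g in grads), "one chunk per rank in this sketch"
    data = [list(g) for g in grads]   # data[rank][chunk]
    # Reduce-scatter: after n-1 steps, rank r holds the full sum of chunk (r+1) % n.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r - step) % n, data[r][(r - step) % n])
                 for r in range(n)]   # capture values before mutating
        for dst, c, val in sends:
            data[dst][c] += val
    # All-gather: rotate each completed chunk around the ring.
    for step in range(n - 1):
        sends = [((r + 1) % n, (r + 1 - step) % n, data[r][(r + 1 - step) % n])
                 for r in range(n)]
        for dst, c, val in sends:
            data[dst][c] = val
    return data
```

Each rank sends and receives 2(n-1) chunks regardless of n, which is what makes the ring pattern bandwidth-optimal compared with a naive all-to-all exchange.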