Model Evaluation
Evaluates an APR language model by computing cross-entropy and perplexity (the exponential of the mean per-token cross-entropy) on synthetic test data. Uses the log-sum-exp trick for numerical stability.
CLI Equivalent
apr eval model.apr --dataset test.jsonl
Key Concepts
- Perplexity and cross-entropy computation
- Log-sum-exp trick for numerical stability
- Pass/fail threshold gating on perplexity
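The three concepts above can be sketched in a few lines of Rust. This is a minimal illustration, not the code of the `analysis_eval` example itself. The function names (`log_sum_exp`, `token_nll`, `cross_entropy`, `perplexity`, `passes`) are hypothetical, and the sketch assumes raw logits per token, natural-log units (nats), and a "lower is better" threshold.

```rust
/// Numerically stable log(sum(exp(x_i))).
/// Subtracting the maximum first keeps every exponent <= 0, so exp() cannot overflow.
fn log_sum_exp(logits: &[f32]) -> f32 {
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    if !max.is_finite() {
        // Empty slice or degenerate input: nothing sensible to normalize.
        return max;
    }
    let sum: f32 = logits.iter().map(|&x| (x - max).exp()).sum();
    max + sum.ln()
}

/// Negative log-likelihood (in nats) of `target` under softmax(logits):
/// -log softmax(logits)[target] = log_sum_exp(logits) - logits[target].
fn token_nll(logits: &[f32], target: usize) -> f32 {
    log_sum_exp(logits) - logits[target]
}

/// Mean cross-entropy over (logits, target) pairs, in nats per token.
/// Assumption: `steps` is non-empty (an empty slice yields NaN).
fn cross_entropy(steps: &[(Vec<f32>, usize)]) -> f32 {
    let total: f32 = steps.iter().map(|(l, t)| token_nll(l, *t)).sum();
    total / steps.len() as f32
}

/// Perplexity is the exponential of the mean cross-entropy.
fn perplexity(mean_cross_entropy: f32) -> f32 {
    mean_cross_entropy.exp()
}

/// Hypothetical gate: pass when perplexity does not exceed the threshold.
fn passes(ppl: f32, threshold: f32) -> bool {
    ppl <= threshold
}
```

With uniform logits over four tokens, each token has probability 1/4, so the cross-entropy is ln 4 and the perplexity is about 4, even when the logits are as large as 1000.0, where a naive `exp` would overflow `f32`.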
Run
cargo run --example analysis_eval