apr-tokenize

BPE tokenizer training pipeline. Trains a Byte Pair Encoding vocabulary from a text corpus, shows merge history, and demonstrates tokenization with roundtrip verification.

cargo run --example cli_apr_tokenize -- --demo

CLI equivalent: apr tokenize corpus.txt --method bpe --vocab-size 100