LLaMA 2 Chat Template
Status: Verified | Idempotent: Yes | Coverage: 95%+
CLI Equivalent: apr chat --format llama2
What This Demonstrates
LLaMA 2 uses a unique chat format with [INST] / [/INST] delimiters and a <<SYS>> block for system prompts. System prompts are embedded inside the first [INST] block only, and each complete turn is wrapped with <s> (BOS) and </s> (EOS) tokens.
Run Command
cargo run --example chat_llama2
Key APIs
format_system_block(&content)-- Wrap system message in<<SYS>>delimitersformat_llama2(&messages, add_generation_prompt)-- Format a full conversation with per-turn BOS/EOS wrapping
Code
//! # Recipe: LLaMA 2 Chat Template Formatting
//!
//! **Category**: chat
//! **CLI Equivalent**: `apr chat --format llama2`
//! Contract: contracts/recipe-iiur-v1.yaml
//! **APR Spec**: APR-021 (Chat Template Support)
//!
//! ## What this demonstrates
//!
//! LLaMA 2 uses a unique chat format with `[INST]` / `[/INST]` delimiters
//! and a `<<SYS>>` block for system prompts. This example implements the
//! full LLaMA 2 chat template specification, including multi-turn handling
//! where system prompts are only included in the first turn.
//!
//! ## Format specification
//!
//! ```text
//! <s>[INST] <<SYS>>
//! system message
//! <</SYS>>
//!
//! user message [/INST] assistant response </s>
//! <s>[INST] next user message [/INST]
//! ```
//!
//! ## Sections
//! 1. Basic user message
//! 2. System prompt placement
//! 3. Multi-turn conversation
//! 4. Comparison with ChatML
//!
//! ## QA Checklist
//!
//! - [x] Compiles with `cargo build --example chat_llama2`
//! - [x] Runs with `cargo run --example chat_llama2`
//! - [x] Tests pass with `cargo test --example chat_llama2`
//! - [x] No unsafe code
//! - [x] No unwrap on user data
//! - [x] Clippy clean
//!
//!
//! ## Format Variants
//! ```bash
//! apr chat model.apr # APR native format
//! apr chat model.gguf # GGUF (llama.cpp compatible)
//! apr chat model.safetensors # SafeTensors (HuggingFace)
//! ```
//! ## References
//! - Touvron, H. et al. (2023). *LLaMA: Open and Efficient Foundation Language Models*. arXiv:2302.13971
use apr_cookbook::prelude::*;
/// A single message in a chat conversation.
#[derive(Debug, Clone)]
struct ChatMessage {
role: String,
content: String,
}
impl ChatMessage {
fn new(role: &str, content: &str) -> Self {
Self {
role: role.to_string(),
content: content.to_string(),
}
}
}
/// LLaMA 2 special tokens and delimiters.
const BOS: &str = "<s>";
const EOS: &str = "</s>";
const INST_START: &str = "[INST]";
const INST_END: &str = "[/INST]";
const SYS_START: &str = "<<SYS>>";
const SYS_END: &str = "<</SYS>>";
/// Format a system prompt in LLaMA 2 style.
///
/// Wraps the system message in `<<SYS>>` delimiters with proper newlines.
fn format_system_block(system_content: &str) -> String {
format!("{SYS_START}\n{system_content}\n{SYS_END}\n\n")
}
/// Extract the leading system message (if any) and return the remaining
/// conversation messages. LLaMA 2 embeds the system prompt inside the
/// first `[INST]` block via `<<SYS>>` delimiters.
fn extract_system_prefix(messages: &[ChatMessage]) -> (Option<&str>, Vec<&ChatMessage>) {
let mut system_prompt: Option<&str> = None;
let mut conversation: Vec<&ChatMessage> = Vec::new();
for msg in messages {
if msg.role == "system" && system_prompt.is_none() && conversation.is_empty() {
system_prompt = Some(&msg.content);
} else {
conversation.push(msg);
}
}
(system_prompt, conversation)
}
/// Format a single `<s>[INST] ... [/INST]` user-assistant turn.
///
/// When `system_block` is `Some`, the `<<SYS>>` block is injected before the
/// user content (first turn only). Returns the number of conversation
/// messages consumed (1 for a trailing user message, 2 for a user+assistant
/// pair).
fn format_llama2_turn(
output: &mut String,
conversation: &[&ChatMessage],
index: usize,
system_block: Option<&str>,
add_generation_prompt: bool,
) -> usize {
let user_msg = &conversation[index];
assert_eq!(
user_msg.role, "user",
"Expected user message at position {index}"
);
output.push_str(BOS);
output.push_str(INST_START);
output.push(' ');
if let Some(sys) = system_block {
output.push_str(&format_system_block(sys));
}
output.push_str(&user_msg.content);
output.push(' ');
output.push_str(INST_END);
// Pair with the following assistant response when present
let next_is_assistant =
index + 1 < conversation.len() && conversation[index + 1].role == "assistant";
if next_is_assistant {
output.push(' ');
output.push_str(&conversation[index + 1].content);
output.push(' ');
output.push_str(EOS);
2
} else {
if add_generation_prompt {
output.push(' ');
}
1
}
}
/// Format a sequence of chat messages in LLaMA 2 chat format.
///
/// Rules:
/// - System prompt (if present) is embedded in the first `[INST]` block.
/// - User/assistant messages alternate in `[INST]`/`[/INST]` pairs.
/// - Each complete turn is wrapped with `<s>` and `</s>`.
/// - The last user message gets no `</s>` if `add_generation_prompt` is true.
fn format_llama2(messages: &[ChatMessage], add_generation_prompt: bool) -> String {
if messages.is_empty() {
return String::new();
}
let (system_prompt, conversation) = extract_system_prefix(messages);
let mut output = String::new();
let mut i = 0;
while i < conversation.len() {
// Only inject the system block on the very first turn
let sys_block = if i == 0 { system_prompt } else { None };
i += format_llama2_turn(
&mut output,
&conversation,
i,
sys_block,
add_generation_prompt,
);
}
output
}
fn main() -> Result<()> {
let mut ctx = RecipeContext::new("chat_llama2")?;
// --- Section 1: Basic user message ---
println!("=== Basic Format ===");
let messages = vec![ChatMessage::new("user", "What is the APR format?")];
let formatted = format_llama2(&messages, true);
println!("Basic user message:\n{formatted}");
assert!(formatted.contains(INST_START), "Must contain [INST]");
assert!(formatted.contains(INST_END), "Must contain [/INST]");
assert!(formatted.starts_with(BOS), "Must start with BOS token");
ctx.record_metric("basic_msg_bytes", formatted.len() as i64);
// --- Section 2: With system prompt ---
println!("\n=== System Prompt ===");
let messages = vec![
ChatMessage::new("system", "You are an expert in ML model formats."),
ChatMessage::new("user", "Explain APR compression."),
];
let formatted = format_llama2(&messages, true);
println!("With system prompt:\n{formatted}");
assert!(formatted.contains(SYS_START), "Must contain <<SYS>>");
assert!(formatted.contains(SYS_END), "Must contain <</SYS>>");
assert!(
formatted.find(SYS_START).expect("SYS_START present")
< formatted.find("Explain APR").expect("user msg present"),
"System prompt must come before user message"
);
// --- Section 3: Multi-turn conversation ---
println!("\n=== Multi-Turn Conversation ===");
let messages = vec![
ChatMessage::new("system", "Be concise."),
ChatMessage::new("user", "What is quantization?"),
ChatMessage::new("assistant", "Reducing model precision to save memory."),
ChatMessage::new("user", "What precisions does APR support?"),
];
let formatted = format_llama2(&messages, true);
println!("Multi-turn:\n{formatted}");
let inst_count = formatted.matches(INST_START).count();
println!("Number of [INST] blocks: {inst_count}");
assert_eq!(inst_count, 2, "Two user turns = two [INST] blocks");
ctx.record_metric("multi_turn_inst_blocks", inst_count as i64);
// System prompt only in the first turn
let first_inst = formatted.find(INST_START).expect("first INST");
let second_inst_start = first_inst + INST_START.len();
let second_inst = formatted[second_inst_start..].find(INST_START);
if let Some(offset) = second_inst {
let second_block = &formatted[second_inst_start + offset..];
assert!(
!second_block.contains(SYS_START),
"System prompt must NOT appear in second turn"
);
}
// --- Section 4: Comparison with ChatML ---
println!("\n=== Format Comparison ===");
let messages = vec![
ChatMessage::new("system", "You are helpful."),
ChatMessage::new("user", "Hello!"),
];
let llama2_out = format_llama2(&messages, true);
// Approximate ChatML for comparison
let chatml_out = "<|im_start|>system\nYou are helpful.<|im_end|>\n\
<|im_start|>user\nHello!<|im_end|>\n\
<|im_start|>assistant\n";
println!("LLaMA 2 ({} bytes):\n{llama2_out}", llama2_out.len());
println!("ChatML ({} bytes):\n{chatml_out}", chatml_out.len());
println!("LLaMA 2 nests system inside [INST]; ChatML uses separate role blocks.");
ctx.record_metric("llama2_bytes", llama2_out.len() as i64);
ctx.record_metric("chatml_bytes", chatml_out.len() as i64);
ctx.report()?;
Ok(())
}
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_basic_user_message() {
let messages = vec![ChatMessage::new("user", "Hello")];
let formatted = format_llama2(&messages, false);
assert_eq!(formatted, "<s>[INST] Hello [/INST]");
}
#[test]
fn test_user_with_generation_prompt() {
let messages = vec![ChatMessage::new("user", "Hello")];
let formatted = format_llama2(&messages, true);
assert!(formatted.contains("[/INST] "));
}
#[test]
fn test_system_prompt_placement() {
let messages = vec![
ChatMessage::new("system", "Be helpful."),
ChatMessage::new("user", "Hi"),
];
let formatted = format_llama2(&messages, false);
assert!(formatted.contains("<<SYS>>\nBe helpful.\n<</SYS>>"));
}
#[test]
fn test_system_prompt_before_user() {
let messages = vec![
ChatMessage::new("system", "System msg"),
ChatMessage::new("user", "User msg"),
];
let formatted = format_llama2(&messages, false);
let sys_pos = formatted.find("System msg").expect("system present");
let usr_pos = formatted.find("User msg").expect("user present");
assert!(sys_pos < usr_pos, "System must come before user");
}
#[test]
fn test_user_assistant_pair() {
let messages = vec![
ChatMessage::new("user", "What is Rust?"),
ChatMessage::new("assistant", "A systems language."),
];
let formatted = format_llama2(&messages, false);
assert!(formatted.contains("[/INST] A systems language. </s>"));
}
#[test]
fn test_multi_turn_structure() {
let messages = vec![
ChatMessage::new("user", "Q1"),
ChatMessage::new("assistant", "A1"),
ChatMessage::new("user", "Q2"),
];
let formatted = format_llama2(&messages, false);
assert_eq!(formatted.matches("[INST]").count(), 2);
assert_eq!(formatted.matches("[/INST]").count(), 2);
assert_eq!(
formatted.matches("</s>").count(),
1,
"Only completed turn gets EOS"
);
}
#[test]
fn test_system_only_in_first_turn() {
let messages = vec![
ChatMessage::new("system", "Be brief."),
ChatMessage::new("user", "Q1"),
ChatMessage::new("assistant", "A1"),
ChatMessage::new("user", "Q2"),
];
let formatted = format_llama2(&messages, false);
// Find second [INST] block and verify no <<SYS>> in it
let first_end = formatted.find("[/INST]").expect("first end");
let rest = &formatted[first_end..];
assert!(
!rest.contains("<<SYS>>"),
"System prompt must not appear in later turns"
);
}
#[test]
fn test_bos_token_present() {
let messages = vec![ChatMessage::new("user", "Hi")];
let formatted = format_llama2(&messages, false);
assert!(formatted.starts_with("<s>"), "Must begin with BOS token");
}
#[test]
fn test_eos_after_assistant() {
let messages = vec![
ChatMessage::new("user", "Hi"),
ChatMessage::new("assistant", "Hello!"),
];
let formatted = format_llama2(&messages, false);
assert!(
formatted.ends_with("</s>"),
"Must end with EOS after assistant"
);
}
#[test]
fn test_empty_messages() {
let formatted = format_llama2(&[], false);
assert!(formatted.is_empty());
}
#[test]
fn test_format_deterministic() {
let messages = vec![
ChatMessage::new("system", "Sys"),
ChatMessage::new("user", "Usr"),
];
let a = format_llama2(&messages, true);
let b = format_llama2(&messages, true);
assert_eq!(a, b);
}
#[test]
fn test_multi_turn_with_system() {
let messages = vec![
ChatMessage::new("system", "You are an AI."),
ChatMessage::new("user", "Hello"),
ChatMessage::new("assistant", "Hi!"),
ChatMessage::new("user", "Bye"),
ChatMessage::new("assistant", "Goodbye!"),
];
let formatted = format_llama2(&messages, false);
assert_eq!(formatted.matches("<s>").count(), 2, "Two turns = two BOS");
assert_eq!(
formatted.matches("</s>").count(),
2,
"Two complete turns = two EOS"
);
}
#[test]
fn test_format_system_block() {
let block = format_system_block("Test system");
assert_eq!(block, "<<SYS>>\nTest system\n<</SYS>>\n\n");
}
}