Mistral Chat Template

Status: Verified | Idempotent: Yes | Coverage: 95%+

CLI Equivalent: apr chat --format mistral

What This Demonstrates

Mistral Instruct uses [INST] / [/INST] delimiters like LLaMA 2 but has no native system prompt role. System instructions are prepended to the first user message. A single BOS token appears at the start (not per-turn), producing a tighter format with fewer tokens.
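As a rough side-by-side (the LLaMA 2 line follows the usual `<<SYS>>` convention and is an illustrative sketch, not output from this recipe):

```text
Mistral: <s>[INST] system + user [/INST] answer</s>[INST] next user [/INST]
LLaMA 2: <s>[INST] <<SYS>>\nsystem\n<</SYS>>\n\nuser [/INST] answer </s><s>[INST] next user [/INST]
```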

Run Command

cargo run --example chat_mistral

Key APIs

  • format_mistral(&messages, add_generation_prompt) -- Format conversation with system-as-prefix handling
  • has_native_system_support() -- Returns false; documents the lack of a dedicated system role
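The core single-turn shape can be sketched in isolation before reading the full listing. The `format_single_turn` helper below is a hypothetical stand-in for illustration, not part of the recipe's API; it mirrors what `format_mistral` produces for one system-plus-user turn with a generation prompt.

```rust
// Minimal sketch of one Mistral Instruct turn: BOS, [INST] delimiters,
// and the system content prepended to the user message (Mistral has no
// dedicated system role). `format_single_turn` is illustrative only.
fn format_single_turn(system: Option<&str>, user: &str) -> String {
    let prefix = match system {
        Some(s) => format!("{s}\n\n"), // system text rides inside the [INST] block
        None => String::new(),
    };
    // Trailing space acts as the generation prompt for the assistant turn.
    format!("<s>[INST] {prefix}{user} [/INST] ")
}

fn main() {
    let out = format_single_turn(Some("You are an ML expert."), "Explain LZ4 compression.");
    assert_eq!(
        out,
        "<s>[INST] You are an ML expert.\n\nExplain LZ4 compression. [/INST] "
    );
    println!("{out}");
}
```

The full `format_mistral` in the listing generalizes this to multi-turn conversations, appending each assistant response followed by `</s>` and emitting BOS only once at the start.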

Code

//! # Recipe: Mistral Chat Template Formatting
//!
//! **Category**: chat
//! **CLI Equivalent**: `apr chat --format mistral`
//! **Contract**: contracts/recipe-iiur-v1.yaml
//! **APR Spec**: APR-021 (Chat Template Support)
//!
//! ## What this demonstrates
//!
//! Mistral uses a minimal chat format with `[INST]` / `[/INST]` delimiters
//! but notably does NOT support a dedicated system prompt role. System
//! instructions must be prepended to the first user message. This example
//! implements the Mistral Instruct template and compares it with LLaMA 2.
//!
//! ## Format specification
//!
//! ```text
//! <s>[INST] user message [/INST] assistant response</s>[INST] next user [/INST]
//! ```
//!
//! ## Sections
//! 1. Basic user message
//! 2. Multi-turn conversation
//! 3. System prompt handling (no native support)
//! 4. Comparison with LLaMA 2
//! 5. Extended multi-turn
//!
//! ## QA Checklist
//!
//! - [x] Compiles with `cargo build --example chat_mistral`
//! - [x] Runs with `cargo run --example chat_mistral`
//! - [x] Tests pass with `cargo test --example chat_mistral`
//! - [x] No unsafe code
//! - [x] No unwrap on user data
//! - [x] Clippy clean
//!
//! ## Format Variants
//! ```bash
//! apr chat model.apr          # APR native format
//! apr chat model.gguf         # GGUF (llama.cpp compatible)
//! apr chat model.safetensors  # SafeTensors (HuggingFace)
//! ```
//! ## References
//! - Jiang, A. Q. et al. (2023). *Mistral 7B*. arXiv:2310.06825
//! - Touvron, H. et al. (2023). *Llama 2: Open Foundation and Fine-Tuned Chat Models*. arXiv:2307.09288

use apr_cookbook::prelude::*;

/// A single message in a chat conversation.
#[derive(Debug, Clone)]
struct ChatMessage {
    role: String,
    content: String,
}

impl ChatMessage {
    fn new(role: &str, content: &str) -> Self {
        Self {
            role: role.to_string(),
            content: content.to_string(),
        }
    }
}

/// Mistral special tokens.
const BOS: &str = "<s>";
const EOS: &str = "</s>";
const INST_START: &str = "[INST]";
const INST_END: &str = "[/INST]";

/// Extract the leading system message (if any) and return the remaining
/// conversation messages.  Mistral has no system role at all (not even a
/// LLaMA 2 style `<<SYS>>` block), so the extracted content is later
/// prepended to the first user turn.  Only a leading system message is
/// extracted; any later system message falls through to the conversation,
/// where the formatter skips it.
fn extract_system_prefix(messages: &[ChatMessage]) -> (String, Vec<&ChatMessage>) {
    let mut system_prefix = String::new();
    let mut conversation: Vec<&ChatMessage> = Vec::new();

    for msg in messages {
        if msg.role == "system" && conversation.is_empty() && system_prefix.is_empty() {
            system_prefix = msg.content.clone();
        } else {
            conversation.push(msg);
        }
    }

    (system_prefix, conversation)
}

/// Format a single `[INST] ... [/INST]` user turn, optionally prepending a
/// system prefix.  Returns the number of messages consumed (1 if user-only,
/// 2 if followed by an assistant response).
fn format_turn(
    output: &mut String,
    conversation: &[&ChatMessage],
    index: usize,
    system_prefix: &str,
    add_generation_prompt: bool,
) -> usize {
    let msg = &conversation[index];

    output.push_str(INST_START);
    output.push(' ');

    // Prepend system instructions to the very first user message
    if index == 0 && !system_prefix.is_empty() {
        output.push_str(system_prefix);
        output.push_str("\n\n");
    }

    output.push_str(&msg.content);
    output.push(' ');
    output.push_str(INST_END);

    // Pair with the following assistant response when present
    let next_is_assistant =
        index + 1 < conversation.len() && conversation[index + 1].role == "assistant";

    if next_is_assistant {
        output.push(' ');
        output.push_str(&conversation[index + 1].content);
        output.push_str(EOS);
        2
    } else {
        if add_generation_prompt {
            output.push(' ');
        }
        1
    }
}

/// Format a sequence of chat messages in Mistral Instruct format.
///
/// Key differences from LLaMA 2:
/// - No `<<SYS>>` block: system messages are prepended to the first user message.
/// - BOS token only at the very beginning (not per-turn).
/// - EOS token after each assistant response.
/// - No space padding around assistant response (tighter format).
fn format_mistral(messages: &[ChatMessage], add_generation_prompt: bool) -> String {
    if messages.is_empty() {
        return String::new();
    }

    let (system_prefix, conversation) = extract_system_prefix(messages);

    let mut output = String::new();
    output.push_str(BOS);

    let mut i = 0;
    while i < conversation.len() {
        if conversation[i].role == "user" {
            i += format_turn(
                &mut output,
                &conversation,
                i,
                &system_prefix,
                add_generation_prompt,
            );
        } else {
            i += 1;
        }
    }

    output
}

/// Report whether the Mistral template has a native system role.
///
/// Always returns false: Mistral has no dedicated system role, so system
/// content must be prepended to the first user message instead.
fn has_native_system_support() -> bool {
    false
}

fn main() -> Result<()> {
    let mut ctx = RecipeContext::new("chat_mistral")?;

    // --- Section 1: Basic user message ---
    println!("=== Basic Format ===");

    let messages = vec![ChatMessage::new("user", "What is the APR format?")];
    let formatted = format_mistral(&messages, true);
    println!("Basic user message:\n{formatted}");

    assert!(formatted.starts_with(BOS), "Must start with BOS");
    assert!(formatted.contains(INST_START), "Must contain [INST]");
    assert!(formatted.contains(INST_END), "Must contain [/INST]");

    ctx.record_metric("basic_msg_bytes", formatted.len() as i64);

    // --- Section 2: Multi-turn conversation ---
    println!("\n=== Multi-Turn Conversation ===");

    let messages = vec![
        ChatMessage::new("user", "What is quantization?"),
        ChatMessage::new(
            "assistant",
            "Reducing numerical precision of model weights.",
        ),
        ChatMessage::new("user", "What about FP16?"),
    ];
    let formatted = format_mistral(&messages, true);
    println!("Multi-turn:\n{formatted}");

    let inst_count = formatted.matches(INST_START).count();
    println!("Number of [INST] blocks: {inst_count}");
    assert_eq!(inst_count, 2, "Two user messages = two [INST] blocks");

    // Only one BOS at the start (not per-turn like LLaMA 2)
    assert_eq!(
        formatted.matches(BOS).count(),
        1,
        "Mistral uses single BOS token"
    );

    ctx.record_metric("multi_turn_inst_blocks", inst_count as i64);

    // --- Section 3: No native system prompt ---
    println!("\n=== System Prompt Handling ===");

    println!("Native system support: {}", has_native_system_support());
    println!("Mistral prepends system instructions to the first user message.");

    let messages = vec![
        ChatMessage::new("system", "You are an ML expert."),
        ChatMessage::new("user", "Explain LZ4 compression."),
    ];
    let formatted = format_mistral(&messages, true);
    println!("System as prefix:\n{formatted}");

    assert!(
        !formatted.contains("<<SYS>>"),
        "Mistral must NOT use <<SYS>> block"
    );
    // System content should be present but inline with user message
    assert!(
        formatted.contains("You are an ML expert."),
        "System content must be included"
    );
    assert!(
        formatted.contains("Explain LZ4 compression."),
        "User content must be included"
    );

    ctx.record_string_metric("system_handling", "prepend_to_user");

    // --- Section 4: Comparison with LLaMA 2 ---
    println!("\n=== Format Comparison ===");

    let messages = vec![
        ChatMessage::new("user", "Hello!"),
        ChatMessage::new("assistant", "Hi!"),
        ChatMessage::new("user", "How are you?"),
    ];
    let mistral_out = format_mistral(&messages, true);

    println!("Mistral ({} bytes):\n{mistral_out}", mistral_out.len());
    println!("Key differences from LLaMA 2:");
    println!("  1. Single BOS token at start (not per-turn)");
    println!("  2. No <<SYS>> block");
    println!("  3. EOS directly after assistant (no trailing space)");
    println!("  4. Tighter format = fewer tokens");

    ctx.record_metric("comparison_bytes", mistral_out.len() as i64);

    // --- Section 5: Extended multi-turn ---
    println!("\n=== Extended Conversation ===");

    let messages = vec![
        ChatMessage::new("user", "Q1"),
        ChatMessage::new("assistant", "A1"),
        ChatMessage::new("user", "Q2"),
        ChatMessage::new("assistant", "A2"),
        ChatMessage::new("user", "Q3"),
    ];
    let formatted = format_mistral(&messages, true);
    println!("5-message conversation:\n{formatted}");

    assert_eq!(formatted.matches(INST_START).count(), 3, "Three user turns");
    assert_eq!(
        formatted.matches(EOS).count(),
        2,
        "Two completed assistant turns"
    );

    ctx.record_metric("extended_turn_count", 3);

    ctx.report()?;
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_basic_user_message() {
        let messages = vec![ChatMessage::new("user", "Hello")];
        let formatted = format_mistral(&messages, false);
        assert_eq!(formatted, "<s>[INST] Hello [/INST]");
    }

    #[test]
    fn test_user_assistant_pair() {
        let messages = vec![
            ChatMessage::new("user", "Hi"),
            ChatMessage::new("assistant", "Hello!"),
        ];
        let formatted = format_mistral(&messages, false);
        assert_eq!(formatted, "<s>[INST] Hi [/INST] Hello!</s>");
    }

    #[test]
    fn test_multi_turn() {
        let messages = vec![
            ChatMessage::new("user", "Q1"),
            ChatMessage::new("assistant", "A1"),
            ChatMessage::new("user", "Q2"),
        ];
        let formatted = format_mistral(&messages, false);
        assert!(formatted.contains("A1</s>"));
        assert!(formatted.contains("[INST] Q2 [/INST]"));
    }

    #[test]
    fn test_single_bos_token() {
        let messages = vec![
            ChatMessage::new("user", "Q1"),
            ChatMessage::new("assistant", "A1"),
            ChatMessage::new("user", "Q2"),
        ];
        let formatted = format_mistral(&messages, false);
        assert_eq!(formatted.matches("<s>").count(), 1, "Only one BOS token");
    }

    #[test]
    fn test_no_native_system_support() {
        assert!(!has_native_system_support());
    }

    #[test]
    fn test_system_prepended_to_user() {
        let messages = vec![
            ChatMessage::new("system", "Be helpful."),
            ChatMessage::new("user", "Hi"),
        ];
        let formatted = format_mistral(&messages, false);
        // System should appear before user content within the same [INST] block
        let inst_content_start = formatted.find("[INST] ").expect("INST present") + 7;
        let inst_content_end = formatted.find(" [/INST]").expect("INST end");
        let inst_content = &formatted[inst_content_start..inst_content_end];
        assert!(
            inst_content.starts_with("Be helpful."),
            "System prefix first"
        );
        assert!(inst_content.contains("Hi"), "User content follows");
    }

    #[test]
    fn test_no_sys_delimiters() {
        let messages = vec![
            ChatMessage::new("system", "Sys"),
            ChatMessage::new("user", "Usr"),
        ];
        let formatted = format_mistral(&messages, false);
        assert!(!formatted.contains("<<SYS>>"));
        assert!(!formatted.contains("<</SYS>>"));
    }

    #[test]
    fn test_generation_prompt() {
        let messages = vec![ChatMessage::new("user", "Test")];
        let with = format_mistral(&messages, true);
        let without = format_mistral(&messages, false);
        assert!(with.len() >= without.len());
    }

    #[test]
    fn test_empty_messages() {
        let formatted = format_mistral(&[], false);
        assert!(formatted.is_empty());
    }

    #[test]
    fn test_format_deterministic() {
        let messages = vec![
            ChatMessage::new("user", "Q"),
            ChatMessage::new("assistant", "A"),
        ];
        let a = format_mistral(&messages, true);
        let b = format_mistral(&messages, true);
        assert_eq!(a, b);
    }

    #[test]
    fn test_eos_after_each_assistant() {
        let messages = vec![
            ChatMessage::new("user", "Q1"),
            ChatMessage::new("assistant", "A1"),
            ChatMessage::new("user", "Q2"),
            ChatMessage::new("assistant", "A2"),
        ];
        let formatted = format_mistral(&messages, false);
        assert_eq!(formatted.matches("</s>").count(), 2);
    }

    #[test]
    fn test_extended_conversation_structure() {
        let messages = vec![
            ChatMessage::new("user", "Q1"),
            ChatMessage::new("assistant", "A1"),
            ChatMessage::new("user", "Q2"),
            ChatMessage::new("assistant", "A2"),
            ChatMessage::new("user", "Q3"),
        ];
        let formatted = format_mistral(&messages, false);
        assert_eq!(formatted.matches("[INST]").count(), 3);
        assert_eq!(formatted.matches("[/INST]").count(), 3);
    }
}

Source

examples/chat/chat_mistral.rs