Chapter 46: Rust Best Practices (CB-500 to CB-530)

The CB-500 series detects generic Rust defect patterns that apply to any Rust project. These checks were motivated by cross-stack fault analysis of 10 batuta projects that revealed systematic gaps: extreme unwrap density (14.7/file in trueno-rag), missing clippy/deny configurations (5/10 projects), string byte indexing panics on non-ASCII input, and universally low Rust Tooling scores (<55%).

Overview

# Run all compliance checks including CB-500 series
pmat comply check

# Example output:
# ⚠ CB-500: Rust Best Practices (CB-500 to CB-527): [Advisory] 0 errors, 189 warnings, 160 info:
# CB-506: String byte indexing (&str[n..m]) can panic on non-ASCII input (src/lib.rs:214)
# CB-501: 8 unwrap() calls in production code (threshold: 5) (src/parser.rs:0)
# ...

The CB-500 series is advisory — it reports with Warn status but does not block CI or commits. Violations are categorized into three severity tiers:

Severity	Meaning	Example
Error	Likely defect in production	>10 unwrap() per file
Warning	Code smell, should fix	String byte indexing, panic macros
Info	Suggestion, low priority	Missing clippy.toml, no deny.toml

Defect Taxonomy

Project Configuration (CB-500, CB-503, CB-504, CB-505)

ID	Check	Severity	What it detects
CB-500	Publish Hygiene	Warning	Missing `exclude` in Cargo.toml
CB-503	Clippy Configuration	Info	Missing `.clippy.toml` or no `disallowed-methods`
CB-504	Deny Configuration	Info	Missing `deny.toml` for supply chain security
CB-505	Workspace Lint Hygiene	Warning	Missing `[lints]` or `[workspace.lints]` section

Code Quality (CB-501, CB-502, CB-506, CB-507, CB-508)

ID	Check	Severity	What it detects
CB-501	Unwrap Density	Warning/Error	>5 (Warn) or >10 (Error) `unwrap()` per file
CB-502	Expect Quality	Warning	`.expect("")`, `.expect("failed")` — lazy messages
CB-506	String Byte Indexing	Warning	`&str[n..m]` can panic on non-ASCII input
CB-507	Panic Macros	Warning	`todo!()`, `unimplemented!()` in production code
CB-508	Lossy Numeric Casts	Warning	>10 `as u8`/`as i32`/etc. casts per file

Testing & Architecture (CB-509, CB-510, CB-511, CB-512)

ID	Check	Severity	What it detects
CB-509	Feature Gate Coverage	Info	Features defined but no CI matrix testing
CB-510	include!() Macro Hygiene	Info	Non-standalone files included via `include!()`
CB-511	Flaky Timing Tests	Warning	`Instant::now()` with duration assertions in tests
CB-512	Error Propagation Gap	Warning	Functions returning `Result` but using `unwrap()` internally

Error Handling & Debug Hygiene (CB-513, CB-514, CB-517)

ID	Check	Severity	What it detects
CB-513	Silent Error Swallowing	Warning	`.unwrap_or_else(|_|` and `.map_err(|_|` discarding error context
CB-514	Debug Eprintln Leaks	Warning	`eprintln!("[DEBUG`/`[DBG`/`[TRACE` in production code
CB-517	Stale Debug Artifacts	Warning	`static AtomicUsize`/`AtomicBool` counters, `#[allow(unused)]` on statics

Pattern Safety (CB-515, CB-516, CB-518)

ID	Check	Severity	What it detects
CB-515	Catch-All Match Default	Warning	`_ =>` returning concrete values instead of errors
CB-516	Hardcoded Magic Numbers	Info	Large numeric literals in `Some()` or struct field contexts
CB-518	Expensive Clone in Loop	Info	>3 `.clone()` calls inside `for`/`while`/`loop` bodies

Data Pipeline & Format Safety (CB-519, CB-520, CB-521)

ID	Check	Severity	What it detects
CB-519	Lossy Data Pipeline	Warning	quantize/dequantize or encode/decode round-trips in same function
CB-520	Expensive Init in Hot Path	Warning	`::new()`/`::open()`/`::connect()` calls inside loop bodies
CB-521	Format Without Magic Bytes	Warning	Binary format parsing without magic byte/header validation

Robustness & Compatibility (CB-522, CB-523, CB-524, CB-525, CB-526, CB-527)

ID	Check	Severity	What it detects
CB-522	Untested Path Normalization	Info	3+ URL/path manipulation ops without edge case coverage
CB-523	External Config Over Embedded	Info	Filesystem heuristics for config discovery instead of embedded metadata
CB-524	Incomplete Enum Match Coverage	Warning	3+ wildcard match arms with concrete defaults in one file
CB-525	Hardcoded Field Names	Info	5+ `.get("field")` calls without `.or_else()` alias fallbacks
CB-526	Single-Path File Resolution	Info	`path.join("file").exists()` without parent/recursive fallback
CB-527	Incomplete Pattern List	Info	3+ `.contains()` classification chain — may miss variants

Numerical Safety (CB-528, CB-530)

ID	Check	Severity	What it detects
CB-528	Division by Length	Warning	`x / collection.len()` without `is_empty()` or `.max(1)` guard
CB-530	Log Without Clamp	Warning	`.ln()`, `.log2()`, `.log10()` without `.max(epsilon)` or `.clamp()` guard

Detection Algorithms

CB-500: Publish Hygiene

Checks Cargo.toml for the exclude field that prevents publishing unnecessary files to crates.io:

# ✅ Good: Critical patterns excluded
[package]
exclude = [
    "target/",
    ".profraw",
    ".profdata",
    ".vscode/",
    ".idea/",
    ".pmat",
    "proptest-regressions",
]

# ❌ Bad: No exclude field - publishes everything
[package]
name = "my-crate"
version = "0.1.0"

Three sub-checks:

Missing exclude: If neither exclude nor include is present → Warning
Include+Exclude conflict: If both are present → Warning (Cargo ignores exclude when include is set)
Insufficient patterns: If exclude exists but covers <3 of 7 critical patterns → Info
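
The three sub-checks can be read as a small decision table. The following is a minimal sketch of that logic, not PMAT's implementation: the `Finding` enum, the `check_publish_hygiene` function, and the substring matching are assumptions made for illustration, and the manifest fields are assumed to be parsed already. The pattern list is the one from the example above.

```rust
/// Hypothetical result type for this sketch (not PMAT's real type).
#[derive(Debug, PartialEq)]
enum Finding {
    Ok,
    Info(&'static str),
    Warning(&'static str),
}

/// The seven critical patterns from the example above.
const CRITICAL: [&str; 7] = [
    "target/", ".profraw", ".profdata", ".vscode/", ".idea/", ".pmat", "proptest-regressions",
];

/// Classify a manifest's `exclude` and `include` fields.
fn check_publish_hygiene(exclude: Option<&[&str]>, include: Option<&[&str]>) -> Finding {
    match (exclude, include) {
        // Neither field present: everything gets published.
        (None, None) => Finding::Warning("missing exclude"),
        // Both present: Cargo ignores `exclude` when `include` is set.
        (Some(_), Some(_)) => Finding::Warning("include and exclude both set"),
        // An allow-list alone is fine.
        (None, Some(_)) => Finding::Ok,
        // Count how many critical patterns the exclude list covers.
        (Some(patterns), None) => {
            let covered = CRITICAL
                .iter()
                .filter(|critical| patterns.iter().any(|p| p.contains(**critical)))
                .count();
            if covered < 3 {
                Finding::Info("insufficient patterns")
            } else {
                Finding::Ok
            }
        }
    }
}
```

Substring matching is used here so that a glob such as `*.profraw` counts as covering `.profraw`; the real detector may compare patterns differently.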

CB-501: Unwrap Density

Counts .unwrap() calls per file in production code, excluding test files and #[cfg(test)] regions:

#![allow(unused)]
fn main() {
// ❌ High density (CB-501 Warning at >5, Error at >10):
fn process(data: &str) -> String {
    let parsed = serde_json::from_str(data).unwrap();
    let field = parsed.get("key").unwrap();
    let value = field.as_str().unwrap();
    let num = value.parse::<i32>().unwrap();
    let result = compute(num).unwrap();
    format_output(result).unwrap()
}

// ✅ Better: Use ? operator or contextual errors
fn process(data: &str) -> Result<String, Error> {
    let parsed: Value = serde_json::from_str(data)?;
    let field = parsed.get("key").ok_or(Error::MissingField("key"))?;
    let value = field.as_str().ok_or(Error::TypeMismatch)?;
    let num: i32 = value.parse().map_err(Error::Parse)?;
    let result = compute(num)?;
    Ok(format_output(result)?)
}
}
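
The thresholds can be sketched as a per-file count followed by a range match. This is an illustrative approximation only: the `Severity` enum and `unwrap_density` function are hypothetical names, and the sketch omits the test-file and `#[cfg(test)]` exclusion that the real check applies.

```rust
/// Hypothetical severity type for this sketch.
#[derive(Debug, PartialEq)]
enum Severity {
    Pass,
    Warning,
    Error,
}

/// Count `.unwrap()` occurrences in one file's source text and apply the
/// thresholds stated above: more than 5 is a Warning, more than 10 an Error.
fn unwrap_density(source: &str) -> (usize, Severity) {
    let count: usize = source
        .lines()
        .map(|line| line.matches(".unwrap()").count())
        .sum();
    let severity = match count {
        0..=5 => Severity::Pass,
        6..=10 => Severity::Warning,
        _ => Severity::Error,
    };
    (count, severity)
}
```

Applied to the six-unwrap `process` function above, this sketch would report a count of 6 and `Severity::Warning`.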

CB-502: Expect Quality

Detects lazy or uninformative .expect() messages. A good expect message explains why the invariant should hold, not just that it failed:

#![allow(unused)]
fn main() {
// ❌ Lazy messages detected by CB-502:
let config = load_config().expect("");
let handle = open_file().expect("failed");
let conn = connect().expect("error");
let val = parse().expect("unexpected");
let item = lookup().expect("should not happen");

// ✅ Informative messages:
let config = load_config().expect("config.toml must exist in project root");
let handle = open_file().expect("log file was verified writable in init()");
let conn = connect().expect("database URL validated at startup");
}

Flagged patterns: "", "failed", "error", "unexpected", "should not happen", "todo", "bug", "impossible".
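
A check against this list can be sketched in a few lines. The function name and the trimming and lowercasing of the message are assumptions of this sketch; the source only states which messages are flagged.

```rust
/// Messages listed above as flagged by CB-502.
const LAZY_MESSAGES: [&str; 8] = [
    "", "failed", "error", "unexpected", "should not happen", "todo", "bug", "impossible",
];

/// Returns true when an `.expect()` message is one of the flagged strings.
/// Normalizing case and surrounding whitespace is an assumption of this sketch.
fn is_lazy_expect_message(message: &str) -> bool {
    let normalized = message.trim().to_lowercase();
    LAZY_MESSAGES.contains(&normalized.as_str())
}
```

A message such as "config.toml must exist in project root" passes because it states the invariant rather than restating the failure.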

CB-506: String Byte Indexing

Detects &str[n..m] patterns that panic on multi-byte UTF-8 input:

#![allow(unused)]
fn main() {
// ❌ Panics on non-ASCII (CB-506):
let prefix = &name[..3];
let suffix = &text[start..end];

// ✅ Safe alternatives:
let prefix = name.get(..3).unwrap_or(name);           // get() is None if 3 is out of range or not a char boundary
let prefix = &name.chars().take(3).collect::<String>(); // Character-aware
let suffix = text.get(start..end).unwrap_or_default();  // Safe fallback
}

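Uses the regex &\w+\[\d*\.\.\d*\] to detect the pattern. As written, this regex matches only literal numeric bounds such as &name[..3]; a slice with variable bounds such as &text[start..end] does not match it. Skips test code and comments.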
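
To make the failure mode concrete, the sketch below contrasts a byte-based prefix with a character-based one. The helper names are hypothetical; `str::get` and `chars()` are standard library calls. With the input "h\u{e9}llo" (the second character occupies two bytes), `&s[..2]` would panic because byte index 2 falls inside that character, whereas `get(..2)` returns `None`.

```rust
/// Byte-based prefix: `None` if `n` is out of range or splits a character.
fn byte_prefix(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

/// Character-based prefix: never panics, counts Unicode scalar values
/// rather than bytes.
fn char_prefix(s: &str, n: usize) -> String {
    s.chars().take(n).collect()
}
```

The two helpers answer different questions (first `n` bytes versus first `n` characters), so the right replacement depends on what the original slice was meant to express.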

CB-507: Panic Macros

Detects todo!() and unimplemented!() in production code. These are useful during development but should be replaced before release:

#![allow(unused)]
fn main() {
// ❌ Panics at runtime (CB-507):
fn handle_edge_case(&self) -> Result<()> {
    todo!()
}

fn serialize_v2(&self) -> Vec<u8> {
    unimplemented!()
}

// ✅ Proper handling:
fn handle_edge_case(&self) -> Result<()> {
    Err(Error::NotSupported("edge case handling"))
}

fn serialize_v2(&self) -> Vec<u8> {
    self.serialize_v1()  // Fallback to v1
}
}

The detector skips macros that appear inside string literals (e.g., "todo!() is a macro").

CB-508: Lossy Numeric Casts

Detects files with >10 as casts to narrower types without bounds checking:

#![allow(unused)]
fn main() {
// ❌ Lossy casts (CB-508):
let byte = large_number as u8;      // Silently truncates
let small = big_float as f32;       // Loses precision
let signed = unsigned_val as i32;   // Wraps to a negative value above i32::MAX

// ✅ Checked alternatives:
let byte = u8::try_from(large_number)?;
let small: f32 = big_float as f32;  // With #[allow(clippy::cast_possible_truncation)]
let signed = i32::try_from(unsigned_val).unwrap_or(i32::MAX);
}

Lines with allow(clippy::cast annotations are excluded from the count.
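
The difference between a wrapping `as` cast and its checked or saturating alternatives can be seen with a few small values. The helper names below are hypothetical; `u8::try_from` and `clamp` are standard library calls.

```rust
/// Plain `as` cast: keeps only the low 8 bits, so out-of-range values wrap.
fn to_u8_wrapping(n: i64) -> u8 {
    n as u8
}

/// Checked narrowing: `None` when the value does not fit in a `u8`.
fn to_u8_checked(n: i64) -> Option<u8> {
    u8::try_from(n).ok()
}

/// Saturating narrowing: clamps to the `u8` range before casting,
/// so the final `as` can no longer truncate.
fn to_u8_saturating(n: i64) -> u8 {
    n.clamp(0, 255) as u8
}
```

For example, 300 wraps to 44 with the plain cast, becomes `None` with the checked version, and becomes 255 with the saturating version. Which behavior is correct depends on the caller, which is why the check only counts casts rather than prescribing a fix.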

CB-509: Feature Gate Coverage

Projects with >3 features in Cargo.toml should have CI matrix testing to ensure all feature combinations compile:

# ✅ Good: CI tests feature combinations
jobs:
  test:
    strategy:
      matrix:
        features: ["default", "full", "minimal", "no-std"]
    steps:
      - run: cargo test --features ${{ matrix.features }}

CB-510: include!() Macro Hygiene

Flags include!() macro usage because included files are not standalone compilable — they cannot be analyzed by tree-sitter, cause false positives in complexity gates, and break IDE tooling:

#![allow(unused)]
fn main() {
// ⚠ CB-510 Info:
include!("helpers/parse_utils.rs");  // Not standalone compilable

// ✅ Better: Use modules
mod parse_utils;  // Standard module system
}

CB-511: Flaky Timing Tests

Detects tests that use Instant::now() with duration assertions, which are inherently flaky under CI load:

#![allow(unused)]
fn main() {
// ❌ Flaky under CI load (CB-511):
#[test]
fn test_cache_performance() {
    let start = Instant::now();
    cache.lookup("key");
    assert!(start.elapsed() < Duration::from_millis(10));  // Fails on slow CI
}

// ✅ Test behavior, not timing:
#[test]
fn test_cache_hit() {
    cache.insert("key", "value");
    assert_eq!(cache.lookup("key"), Some("value"));
}
}

CB-512: Error Propagation Gap

Detects functions that return Result<T, E> but call .unwrap() three or more times internally, a sign that error handling is incomplete:

#![allow(unused)]
fn main() {
// ❌ Returns Result but unwraps internally (CB-512):
fn parse_config(path: &Path) -> Result<Config, Error> {
    let content = fs::read_to_string(path)?;
    let parsed = toml::from_str(&content).unwrap();        // Why not ?
    let name = parsed.get("name").unwrap().as_str().unwrap(); // Two more unwraps
    Ok(Config { name: name.to_string() })
}

// ✅ Consistent error propagation:
fn parse_config(path: &Path) -> Result<Config, Error> {
    let content = fs::read_to_string(path)?;
    let parsed: toml::Value = toml::from_str(&content)?;
    let name = parsed.get("name")
        .and_then(|v| v.as_str())
        .ok_or(Error::MissingField("name"))?;
    Ok(Config { name: name.to_string() })
}
}

CB-513: Silent Error Swallowing

Detects patterns where errors are intentionally discarded, hiding failure context. Motivated by GH-215 where silent error swallowing in quantization hid data corruption:

#![allow(unused)]
fn main() {
// ❌ Discards error context (CB-513):
let config = load_config().unwrap_or_else(|_| Config::default());
let data = parse(input).map_err(|_| MyError::ParseFailed)?;

// ✅ Preserve error context:
let config = load_config().unwrap_or_else(|e| {
    tracing::warn!("config load failed: {e}, using defaults");
    Config::default()
});
let data = parse(input).map_err(|e| MyError::ParseFailed { source: e })?;
}

The |_| closure parameter is the signal — it means the original error is being intentionally thrown away.

CB-514: Debug Eprintln Leaks

Detects debug print statements left in production code. These leak internal state to stderr and indicate incomplete cleanup after debugging sessions:

#![allow(unused)]
fn main() {
// ❌ Debug output in production (CB-514):
eprintln!("[DEBUG] parsing token: {:?}", token);
eprintln!("[TRACE] entering function with state={}", state);
eprintln!("[DBG] cache size: {}", cache.len());

// ✅ Use structured logging:
tracing::debug!(?token, "parsing token");
tracing::trace!(state, "entering function");
log::debug!("cache size: {}", cache.len());
}

CB-515: Catch-All Match Default

Detects _ => match arms that return a concrete value instead of an error, None, or unreachable!(). Motivated by GH-236 where _ => Architecture::Qwen2 caused all unknown model architectures to silently receive wrong configuration:

#![allow(unused)]
fn main() {
// ❌ Silent default (CB-515):
fn get_architecture(name: &str) -> Architecture {
    match name {
        "gpt" => Architecture::Gpt,
        "llama" => Architecture::Llama,
        _ => Architecture::Qwen2,  // All unknowns become Qwen2!
    }
}

// ✅ Explicit error on unknown:
fn get_architecture(name: &str) -> Result<Architecture, Error> {
    match name {
        "gpt" => Ok(Architecture::Gpt),
        "llama" => Ok(Architecture::Llama),
        _ => Err(Error::UnknownArchitecture(name.to_string())),
    }
}
}

Safe patterns that are not flagged: Err(...), None, unreachable!(), panic!(), return Err(...), Default::default(), bail!(), todo!().

CB-516: Hardcoded Magic Numbers

Detects large numeric literals (>100) in Some() or struct field contexts that likely represent configuration defaults. Motivated by GH-231 where a hardcoded rope_theta: Some(10000.0) default produced garbage output for models requiring different values:

#![allow(unused)]
fn main() {
// ❌ Hardcoded config defaults (CB-516 Info):
Config {
    rope_theta: Some(10000.0),  // Wrong for Qwen2 (uses 1000000.0)
    max_seq_len: Some(4096),
}

// ✅ Named constants with documentation:
const DEFAULT_ROPE_THETA: f64 = 10000.0;
const DEFAULT_MAX_SEQ_LEN: usize = 4096;

Config {
    rope_theta: Some(DEFAULT_ROPE_THETA),
    max_seq_len: Some(DEFAULT_MAX_SEQ_LEN),
}
}

This is Info severity — advisory only with expected false positives. Common values (128, 256, 512, 1024, etc.) are excluded.

CB-517: Stale Debug Artifacts

Detects leftover debug instrumentation in production code — static atomic counters and #[allow(unused)] annotations on static variables that were used during debugging and not cleaned up:

#![allow(unused)]
fn main() {
// ❌ Leftover debug counter (CB-517):
static DEBUG_COUNTER: AtomicUsize = AtomicUsize::new(0);
fn process() {
    DEBUG_COUNTER.fetch_add(1, Ordering::Relaxed);
}

// ❌ Suppressed unused static (CB-517):
#[allow(unused)]
static TRACE_ENABLED: bool = false;

// ✅ Remove debug artifacts before committing, or use proper instrumentation:
fn process() {
    metrics::counter!("process_calls").increment(1);
}
}

CB-518: Expensive Clone in Loop

Detects loop bodies with >3 .clone() calls, which often indicate that data should be borrowed or restructured to avoid repeated allocation:

#![allow(unused)]
fn main() {
// ❌ Excessive cloning in loop (CB-518):
for item in &items {
    let name = config.name.clone();
    let path = config.path.clone();
    let data = config.data.clone();
    let meta = config.meta.clone();
    process(item, &name, &path, &data, &meta);
}

// ✅ Clone once before the loop, or borrow:
let name = &config.name;
let path = &config.path;
for item in &items {
    process(item, name, path, &config.data, &config.meta);
}
}

This is Info severity — advisory, as some clones are necessary (e.g., sending data across threads).

CB-519: Lossy Data Pipeline

Detects functions containing both halves of a lossy transform pair (e.g., quantize + dequantize), which indicates data being round-tripped through lossy operations. Motivated by aprender GH-215/231/234/237 where GGUF export double-quantized attention weights:

#![allow(unused)]
fn main() {
// ❌ Lossy round-trip in same function (CB-519):
fn convert_tensor(data: &[f32]) -> Vec<u8> {
    let quantized = quantize_q4(data);       // f32 → q4 (lossy)
    let dequantized = dequantize_q4(&quantized); // q4 → f32 (cannot restore the lost precision)
    pack_bytes(&dequantized)
}

// ✅ Single direction only:
fn export_tensor(data: &[f32]) -> Vec<u8> {
    let quantized = quantize_q4(data);
    pack_bytes(&quantized)
}
}

Detected pairs: quantize/dequantize, encode/decode, compress/decompress, serialize/deserialize, pack/unpack, to_bytes/from_bytes, to_f16/to_f32, to_bf16/to_f32.
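
Why such a round trip matters can be shown with a toy quantizer. The 8-bit linear scheme below is a hypothetical stand-in chosen for brevity; it is not the Q4 format used by the projects mentioned above.

```rust
/// Hypothetical 8-bit linear quantizer, used only to illustrate the round trip.
fn quantize(x: f32, scale: f32) -> i8 {
    (x / scale).round().clamp(-128.0, 127.0) as i8
}

/// Inverse mapping: recovers only the nearest representable multiple of `scale`.
fn dequantize(q: i8, scale: f32) -> f32 {
    f32::from(q) * scale
}

/// Absolute error introduced by one quantize/dequantize round trip.
fn round_trip_error(x: f32, scale: f32) -> f32 {
    (dequantize(quantize(x, scale), scale) - x).abs()
}
```

With a scale of 0.1, the value 0.337 quantizes to 3 and comes back as roughly 0.3, so any code that treats the dequantized value as the original is working with altered data.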

CB-520: Expensive Init in Hot Path

Detects constructor/load/open calls inside loop bodies where the initialization could be hoisted. Motivated by aprender GH-224 where ChatSession recreated GPU models (5-6GB VRAM upload) on every generate() call:

#![allow(unused)]
fn main() {
// ❌ Expensive init in loop (CB-520):
for item in &items {
    let client = HttpClient::new(config);  // Re-created every iteration
    let conn = Database::connect("url");   // Re-connected every iteration
    client.send(item);
}

// ✅ Hoist initialization:
let client = HttpClient::new(config);
let conn = Database::connect("url");
for item in &items {
    client.send(item);
}
}

Detected patterns: ::new(), ::open(), ::connect(), ::create(), ::load(), ::init(), ::build(), ::from_file(), ::from_path(), File::open(). Threshold: 2+ init calls per loop body.

CB-521: Format Detection Without Magic Bytes

Detects binary format parsing functions that read binary data without validating magic bytes or format signatures. Motivated by aprender GH-213 where .safetensors.index.json was parsed as binary SafeTensors:

#![allow(unused)]
fn main() {
// ❌ No magic byte validation (CB-521):
fn parse_file(reader: &mut impl Read) -> Result<Header, Error> {
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf)?;  // Assumes format is correct
    let size = u64::from_le_bytes(buf);
    Ok(Header { size })
}

// ✅ Validate magic bytes first:
fn parse_file(reader: &mut impl Read) -> Result<Header, Error> {
    let mut magic = [0u8; 4];
    reader.read_exact(&mut magic)?;
    if &magic != FILE_MAGIC {
        return Err(Error::InvalidFormat);
    }
    let mut buf = [0u8; 8];
    reader.read_exact(&mut buf)?;
    Ok(Header { size: u64::from_le_bytes(buf) })
}
}

CB-522: Untested Path Normalization

Detects files with 3+ URL/path manipulation operations (.replace("//"), .strip_prefix("http"), Url::parse(), etc.) which indicates complex normalization logic that needs edge case testing. Motivated by GH-221 where hf:// URLs preserved web path components as file paths.

CB-523: External Config Over Embedded Metadata

Detects filesystem heuristic patterns like path.with_file_name("config.json") that discover configuration from sibling files when the loaded data may already contain embedded metadata. Motivated by GH-222 where apr chat used sibling config.json instead of APR v2 embedded metadata.

CB-524: Incomplete Enum Match Coverage

Detects files with 3+ _ => wildcard match arms returning concrete values, indicating an enum is dispatched on inconsistently across multiple functions. Motivated by GH-233/236 where adding new Architecture variants required updating match arms in 5+ places. This is a code smell — consider #[non_exhaustive] enums or centralizing dispatch logic.

CB-525: Hardcoded Field Names Without Aliases

Detects functions with 5+ .get("field") calls on JSON values without .or_else() fallback aliases. Motivated by GH-235 where load_model_config_from_json only handled HuggingFace field names (hidden_size) but not GPT-2 names (n_embd).
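
One way to express alias fallbacks is a helper that tries each known name in order. This is a sketch under stated assumptions: a `HashMap<String, u64>` stands in for a parsed JSON object, and `get_with_aliases` and `hidden_size` are hypothetical names. The effect is the same as chaining `.or_else()` on each lookup.

```rust
use std::collections::HashMap;

/// Look up a field under its primary name, falling back to known aliases.
/// Names are tried in order; the first one present wins.
fn get_with_aliases(config: &HashMap<String, u64>, names: &[&str]) -> Option<u64> {
    names.iter().find_map(|name| config.get(*name).copied())
}

/// Hidden size under the HuggingFace name first, then the GPT-2 name.
fn hidden_size(config: &HashMap<String, u64>) -> Option<u64> {
    get_with_aliases(config, &["hidden_size", "n_embd"])
}
```

Keeping the alias list next to the accessor makes it visible which naming conventions are supported when a new model family is added.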

CB-526: Single-Path File Resolution

Detects path.join("filename").exists() patterns without fallback search (parent directory, recursive discovery). Motivated by GH-216 where tokenizer.json wasn’t found in workspace layouts.

CB-527: Incomplete Pattern List for Classification

Detects .contains("x") || .contains("y") || ... chains with 3+ patterns, which are typically data classification logic that may miss variants. Motivated by GH-233B/234 where Rosetta density threshold only recognized "embed" but not "wte", "wpe", "position_embedding". Consider centralizing patterns in a constant array or registry.
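
Centralizing the patterns might look like the sketch below. The constant and function names are hypothetical; the pattern strings are the variants named above.

```rust
/// Name fragments that mark embedding tensors, taken from the variants
/// named above. New variants are added here, in one place.
const EMBEDDING_PATTERNS: &[&str] = &["embed", "wte", "wpe", "position_embedding"];

/// Returns true if the tensor name contains any known embedding fragment.
fn is_embedding(tensor_name: &str) -> bool {
    EMBEDDING_PATTERNS
        .iter()
        .any(|pattern| tensor_name.contains(*pattern))
}
```

Compared with an inline `||` chain, the list can be reused by every function that classifies tensors, so a missing variant is fixed once rather than in each chain.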

CB-528: Division by Length Without Empty Guard

Detects x / collection.len() without a preceding is_empty() or len() > 0 guard. In ML and numerical code, dividing by the length of an empty collection causes division-by-zero: panic for integers, Inf/NaN for floats. Motivated by cross-stack analysis where mean calculations over empty batches silently produced NaN that propagated through training losses.

#![allow(unused)]
fn main() {
// ❌ Division by zero on empty input (CB-528):
fn compute_mean(values: &[f64]) -> f64 {
    let sum: f64 = values.iter().sum();
    sum / values.len() as f64  // NaN when values is empty
}

fn average_batch(batch: &[Tensor]) -> Tensor {
    let total = batch.iter().fold(Tensor::zeros(), |a, b| a + b);
    total / batch.len() as f32  // NaN when batch is empty (zero total divided by zero)
}

// ✅ Guarded alternatives:
fn compute_mean(values: &[f64]) -> f64 {
    if values.is_empty() {
        return 0.0;
    }
    let sum: f64 = values.iter().sum();
    sum / values.len() as f64
}

fn compute_mean_alt(values: &[f64]) -> f64 {
    let sum: f64 = values.iter().sum();
    sum / values.len().max(1) as f64  // .max(1) prevents zero denominator
}
}

The detector looks back up to 8 lines for guard patterns: is_empty(), .len() > 0, .len() >= 1, .len() != 0, .len() == 0, and .max(1) on the same line.
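
The look-back rule can be approximated as follows. This is a simplified sketch, not the detector's code: `is_unguarded_len_division` is a hypothetical name, the division test is a plain substring check, and `idx` is assumed to be a valid index into `lines`.

```rust
/// Guard patterns listed above, searched for in the preceding lines.
const GUARDS: [&str; 5] = [
    "is_empty()", ".len() > 0", ".len() >= 1", ".len() != 0", ".len() == 0",
];

/// Returns true if line `idx` divides by `.len()` with no guard in the
/// previous 8 lines and no `.max(1)` on the same line.
fn is_unguarded_len_division(lines: &[&str], idx: usize) -> bool {
    let line = lines[idx];
    // Simplistic division test: `.len()` appears somewhere after a `/`.
    let divides_by_len = line
        .split_once('/')
        .map(|(_, rest)| rest.contains(".len()"))
        .unwrap_or(false);
    if !divides_by_len || line.contains(".max(1)") {
        return false;
    }
    // Look back at most 8 lines for any guard pattern.
    let start = idx.saturating_sub(8);
    !lines[start..idx]
        .iter()
        .any(|prev| GUARDS.iter().any(|guard| prev.contains(*guard)))
}
```

A line-window heuristic like this cannot tell whether the guard actually refers to the same collection, which is one reason a check of this kind is reported as advisory.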

CB-530: Log Without Clamp Guard

Detects .ln(), .log2(), .log10() calls without a preceding .max(epsilon) or .clamp() guard. Passing zero or negative values to log functions produces -Inf or NaN, which silently corrupts ML training losses, probability calculations, and information-theoretic metrics. Discovered during 3.4.0 dogfooding where trueno’s scalar backend had 3 unguarded log calls.

#![allow(unused)]
fn main() {
// ❌ Risk of -Inf/NaN (CB-530):
fn cross_entropy(predicted: &[f64], actual: &[f64]) -> f64 {
    predicted.iter().zip(actual).map(|(p, a)| {
        -a * p.ln()  // p.ln() is -Inf when p == 0.0
    }).sum()
}

fn information_content(probability: f64) -> f64 {
    -probability.log2()  // NaN when probability < 0.0
}

fn signal_magnitude(value: f64) -> f64 {
    value.log10()  // -Inf when value == 0.0
}

// ✅ Clamped alternatives:
fn cross_entropy(predicted: &[f64], actual: &[f64]) -> f64 {
    predicted.iter().zip(actual).map(|(p, a)| {
        -a * p.max(1e-10).ln()  // Clamped to epsilon
    }).sum()
}

fn information_content(probability: f64) -> f64 {
    -probability.clamp(f64::EPSILON, 1.0).log2()
}

fn signal_magnitude(value: f64) -> f64 {
    value.max(f64::EPSILON).log10()
}
}

The detector recognizes these safe patterns and does not flag them:

.max(epsilon).ln(): epsilon guard on the same expression
.clamp(low, high).ln(): range clamp before log
(1.0 + x).ln(): log of a sum with a positive constant (treated as safe, although the argument is positive only when x > -1.0)
2.0_f64.ln(): log of a known positive literal
Variable guarded within 3 lines: let x = val.max(1e-10); then x.ln()
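
The underlying floating-point behavior is easy to confirm with the standard library. In the sketch below, `safe_ln` is a hypothetical helper and the epsilon of 1e-10 is an assumption carried over from the example above; pick a value suited to the data.

```rust
/// Natural log with the input raised to a small positive epsilon first.
/// The epsilon value (1e-10) is an assumption of this sketch.
fn safe_ln(x: f64) -> f64 {
    x.max(1e-10).ln()
}

/// Unguarded log, shown for comparison.
fn raw_ln(x: f64) -> f64 {
    x.ln()
}
```

`raw_ln(0.0)` is negative infinity and `raw_ln(-1.0)` is NaN, while `safe_ln` returns a finite value for both. Note that `f64::max` returns the other operand when one is NaN, so `safe_ln` also maps a NaN input to `ln(1e-10)` rather than propagating it; `clamp` propagates NaN instead.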

Test Code Exclusion

All file-scanning checks (CB-501, CB-502, CB-506–CB-508, CB-512–CB-528, CB-530) exclude test code using two mechanisms:

Test file exclusion: Files matching *_test.rs, *_tests.rs, or under a tests/ directory
Test region exclusion: Code inside #[cfg(test)] module blocks within production files

This prevents false positives from test code where .unwrap() and todo!() are acceptable.
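
The file-level rule can be sketched with `std::path`. The function name is hypothetical, and the sketch covers only the first mechanism; detecting `#[cfg(test)]` regions requires looking at file contents and is not shown.

```rust
use std::path::Path;

/// Returns true for `*_test.rs`, `*_tests.rs`, or any file that sits
/// under a directory named `tests`.
fn is_test_file(path: &Path) -> bool {
    let name = path.file_name().and_then(|n| n.to_str()).unwrap_or("");
    if name.ends_with("_test.rs") || name.ends_with("_tests.rs") {
        return true;
    }
    // Check every directory component of the parent path.
    path.parent()
        .map(|dir| dir.components().any(|c| c.as_os_str() == "tests"))
        .unwrap_or(false)
}
```

Under this sketch, `src/parser_test.rs` and `tests/integration.rs` are skipped, while `src/lib.rs` is scanned.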

Self-Detection Avoidance

The detection code itself uses concat!() to avoid self-detection:

#![allow(unused)]
fn main() {
// The scanner uses split strings to avoid matching itself:
const DOT_UNWRAP: &str = concat!(".unwr", "ap()");
const DOT_EXPECT_QUOTE: &str = concat!(".expe", "ct(\"");
}

Remediation Priority

When pmat comply check reports CB-500 violations, fix them in this priority order:

CB-501 Errors (>10 unwrap/file) — highest crash risk
CB-530 — log without clamp produces -Inf/NaN that silently corrupts ML losses and metrics
CB-519 — lossy data pipeline round-trips corrupt model weights (GH-215/231/237)
CB-528 — division by .len() without empty guard causes panic (integers) or NaN (floats)
CB-521 — binary format parsing without magic bytes causes crashes (GH-213)
CB-515, CB-524 — catch-all match arms / incomplete enum coverage (GH-236)
CB-513 — silent error swallowing hiding data corruption (GH-215)
CB-520 — expensive initialization in hot path (GH-224)
CB-512 — functions claiming error handling but not doing it
CB-506 — string indexing panics on internationalized input
CB-507 — todo!/unimplemented! left in production
CB-514, CB-517 — debug artifacts leaked to production
CB-525 — hardcoded field names without aliases (GH-235)
CB-502 — lazy expect messages hide root cause during debugging
CB-508 — lossy casts cause silent data corruption
CB-500, CB-505 — project configuration hygiene
CB-503, CB-504, CB-509–CB-511, CB-516, CB-518, CB-522, CB-523, CB-526, CB-527 — informational, fix at leisure

CI/CD Integration

# .github/workflows/rust-best-practices.yml
name: Rust Best Practices
on: [push, pull_request]

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install PMAT
        run: cargo install pmat
      - name: Check Rust Best Practices
        run: |
          OUTPUT=$(pmat comply check 2>&1)
          echo "$OUTPUT"
          # Fail on Error-severity violations
          if echo "$OUTPUT" | grep -q "CB-500.*[1-9][0-9]* errors"; then
            echo "::error::CB-500 series has Error-severity violations"
            exit 1
          fi

Academic Foundations

The CB-500 checks are grounded in empirical research on Rust defect patterns:

Paper	Finding	Applied To
Xu et al. (2021). “Memory-Safety Challenge Considered Solved?”	30% of Rust CVEs involve unwrap/expect panics	CB-501, CB-502, CB-512
Qin et al. (2020). “Understanding Memory and Thread Safety Practices”	Unsafe patterns cluster in specific files	CB-507, CB-508
Evans et al. (2020). “Is Rust Used Safely?”	String boundary panics in 18% of crates	CB-506
Zhu et al. (2022). “Learning and Programming Challenges of Rust”	Feature flag complexity is top-5 pain point	CB-509

Specification Reference

Full detection logic: src/cli/handlers/comply_cb_detect/rust_best_practices.rs (CB-500–CB-521) and src/cli/handlers/comply_cb_detect/rust_best_practices_extended.rs (CB-522–CB-530)
Aggregate check: src/cli/handlers/comply_cb_detect/check_handlers.rs (check_rust_best_practices)
