Shell Safety Classifier Training

Trains an MLP classifier to predict shell script safety. The training set is the bashrs corpus (17,942 entries) merged with roughly 8,000 adversarial entries generated for the minority classes.

The model classifies scripts into 5 safety categories:

  • safe: passes all checks (lint, deterministic, idempotent)
  • needs-quoting: variable quoting issues
  • non-deterministic: contains $RANDOM, $$, timestamps
  • non-idempotent: missing -p/-f flags (e.g. mkdir without -p, rm without -f)
  • unsafe: security rule violations
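
The five categories above map naturally onto a small label type. The following is a minimal sketch, not the example's actual code: the `SafetyClass` name, its methods, and the index order (0 to 4, following the list above) are assumptions made for illustration, and the real dataset may encode labels differently.

```rust
// Hypothetical label type for the five safety categories.
// The index order mirrors the list above and is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SafetyClass {
    Safe,
    NeedsQuoting,
    NonDeterministic,
    NonIdempotent,
    Unsafe,
}

impl SafetyClass {
    /// All classes, in index order.
    const ALL: [SafetyClass; 5] = [
        SafetyClass::Safe,
        SafetyClass::NeedsQuoting,
        SafetyClass::NonDeterministic,
        SafetyClass::NonIdempotent,
        SafetyClass::Unsafe,
    ];

    /// Human-readable name, as written in the category list.
    fn name(self) -> &'static str {
        match self {
            SafetyClass::Safe => "safe",
            SafetyClass::NeedsQuoting => "needs-quoting",
            SafetyClass::NonDeterministic => "non-deterministic",
            SafetyClass::NonIdempotent => "non-idempotent",
            SafetyClass::Unsafe => "unsafe",
        }
    }

    /// Class index, e.g. for a one-hot target or an argmax over model outputs.
    fn index(self) -> usize {
        self as usize
    }

    /// Inverse of `index`; returns `None` for out-of-range values.
    fn from_index(i: usize) -> Option<SafetyClass> {
        Self::ALL.get(i).copied()
    }
}

fn main() {
    for class in SafetyClass::ALL {
        println!("{} -> {}", class.index(), class.name());
    }
}
```

Keeping the mapping in one place means the classifier's output index and the printed label cannot drift apart.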

Prerequisites

# Export corpus + generate adversarial data
cd /path/to/bashrs
cargo run -- corpus export-dataset --format classification -o /tmp/corpus.jsonl
cargo run -- generate-adversarial --verify -o /tmp/adversarial.jsonl
{ cat /tmp/corpus.jsonl; echo; cat /tmp/adversarial.jsonl; } > /tmp/combined.jsonl
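
The `echo` between the two `cat` calls can leave an empty line in `/tmp/combined.jsonl` when `corpus.jsonl` already ends with a newline. A loader for the merged file may therefore want to skip blank lines. The sketch below shows one way to do that with the standard library only. The `read_records` helper and the `id` field in the sample data are hypothetical, and whether the training example handles blank lines this way is not confirmed by this page.

```rust
use std::io::BufRead;

/// Collects the non-blank lines of a JSONL stream.
/// Hypothetical helper; each returned string is one unparsed JSON record.
fn read_records<R: BufRead>(reader: R) -> std::io::Result<Vec<String>> {
    let mut records = Vec::new();
    for line in reader.lines() {
        let line = line?;
        let trimmed = line.trim();
        // Skip the empty separator line that `echo` may have introduced.
        if !trimmed.is_empty() {
            records.push(trimmed.to_string());
        }
    }
    Ok(records)
}

fn main() {
    // Simulated merged file: two corpus records, a blank line, one adversarial record.
    let combined = "{\"id\":1}\n{\"id\":2}\n\n{\"id\":3}\n";
    let records = read_records(combined.as_bytes()).expect("in-memory read should not fail");
    println!("{} records", records.len());
}
```

To read the real file, a `std::io::BufReader` wrapped around `std::fs::File::open("/tmp/combined.jsonl")` can be passed in place of the in-memory bytes.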

Run

cargo run --example shell_safety_training -- /tmp/combined.jsonl

Source

// Run this example:
//   cargo run --example shell_safety_training -- /tmp/combined.jsonl
//
// See the CLI reference and source code in crates/ for implementation details.