Case Study: AI Shell Completion

Train a personalized autocomplete on your shell history in 5 seconds. 100% local, private, fast.

Quick Start

# Install
cargo install --path crates/aprender-shell

# Train on your history
aprender-shell train

# Test
aprender-shell suggest "git "

How It Works

~/.zsh_history → Parser → N-gram Model → Trie Index → Suggestions
     │                         │              │
  21,729 cmds            40,848 n-grams    <1ms lookup

Algorithm: Markov chain with trigram context + prefix trie for O(1) lookup.

Training

$ aprender-shell train

🚀 aprender-shell: Training model...

📂 History file: /home/user/.zsh_history
📊 Commands loaded: 21729
🧠 Training 3-gram model... done!

✅ Model saved to: ~/.aprender-shell.model

📈 Model Statistics:
   Unique n-grams: 40848
   Vocabulary size: 16100
   Model size: 2016.4 KB

Suggestions

$ aprender-shell suggest "git "
git commit    0.505
git clone     0.065
git add       0.059
git push      0.035
git checkout  0.031

$ aprender-shell suggest "cargo "
cargo run      0.413
cargo install  0.069
cargo test     0.059
cargo clippy   0.045

Scores are frequency-based probabilities from your actual usage.

Incremental Updates

Don't retrain from scratch—append new commands:

$ aprender-shell update
📊 Found 15 new commands
✅ Model updated (21744 total commands)

$ aprender-shell update
✓ Model is up to date (no new commands)

Performance:

0ms when no new commands
~10ms per 100 new commands
Tracks position in history file

ZSH Integration

Generate the widget:

aprender-shell zsh-widget >> ~/.zshrc
source ~/.zshrc

This adds:

Ghost text suggestions as you type (gray)
Tab or Right Arrow to accept
Updates on every keystroke

Auto-Retrain

# Add to ~/.zshrc

# Option 1: Update after every command (~10ms)
precmd() { aprender-shell update -q & }

# Option 2: Update on shell exit
zshexit() { aprender-shell update -q }

Model Statistics

$ aprender-shell stats

📊 Model Statistics:
   N-gram size: 3
   Unique n-grams: 40848
   Vocabulary size: 16100
   Model size: 2016.4 KB

🔝 Top commands:
    340x  git status
    245x  cargo build
    198x  cd ..

Memory Paging for Large Histories

For very large shell histories (100K+ commands), use memory paging to limit RAM usage:

# Train with 10MB memory limit (creates .apbundle file)
$ aprender-shell train --memory-limit 10

🚀 aprender-shell: Training paged model...

📂 History file: /home/user/.zsh_history
📊 Commands loaded: 150000
🧠 Training 3-gram paged model (10MB limit)... done!

✅ Paged model saved to: ~/.aprender-shell.apbundle

📈 Model Statistics:
   Segments:        45
   Vocabulary size: 35000
   Memory limit:    10 MB

# Suggestions with paged loading
$ aprender-shell suggest "git " --memory-limit 10

# View paging statistics
$ aprender-shell stats --memory-limit 10

📊 Paged Model Statistics:
   N-gram size:     3
   Total commands:  150000
   Vocabulary size: 35000
   Total segments:  45
   Loaded segments: 3
   Memory limit:    10.0 MB

📈 Paging Statistics:
   Page hits:       127
   Page misses:     3
   Evictions:       0
   Hit rate:        97.7%

How it works:

N-grams are grouped by command prefix (e.g., "git", "cargo")
Segments are stored in .apbundle format
Only accessed segments are loaded into RAM
LRU eviction frees memory when limit is reached

See Model Bundling and Memory Paging for details.

Export your model for teammates:

# Export
aprender-shell export -m ~/.aprender-shell.model team-model.json

# Import (on another machine)
aprender-shell import team-model.json

Use case: Share team-specific command patterns (deployment scripts, project aliases).

Privacy & Security

Filtered automatically:

Commands containing password, secret, token, API_KEY
AWS credentials, GitHub tokens
History manipulation commands (history, fc)

100% local:

No network requests
No telemetry
Model stays on your machine

Architecture

crates/aprender-shell/
├── src/
│   ├── main.rs      # CLI (clap)
│   ├── history.rs   # ZSH/Bash/Fish parser
│   ├── model.rs     # Markov n-gram model
│   └── trie.rs      # Prefix index

History Parser

Handles multiple formats:

// ZSH extended: ": 1699900000:0;git status"
// Bash plain: "git status"
// Fish: "- cmd: git status"

N-gram Model

Trigram Markov chain:

Context         → Next Token (count)
""              → "git" (340), "cargo" (245), "cd" (198)
"git"           → "commit" (89), "push" (45), "status" (340)
"git commit"    → "-m" (67), "--amend" (12)

Trie Index

O(k) prefix lookup where k = prefix length:

g─i─t─ ─s─t─a─t─u─s (count: 340)
      └─c─o─m─m─i─t (count: 89)
      └─p─u─s─h     (count: 45)

Performance: Sub-10ms Verification

Shell completion must feel instantaneous. Nielsen's research shows:

< 100ms: Perceived as instant
< 10ms: No perceptible delay (ideal)
100ms: Noticeable lag, poor UX

aprender-shell achieves microsecond latency—600-22,000x faster than required.

Benchmark Results

Run the benchmarks yourself:

cargo bench --package aprender-shell --bench recommendation_latency

Suggestion Latency by Model Size

Model Size	Commands	Prefix	Latency	vs 10ms Target
Small	50	kubectl	437 ns	22,883x faster
Small	50	npm	530 ns	18,868x faster
Small	50	docker	659 ns	15,174x faster
Small	50	cargo	725 ns	13,793x faster
Small	50	git	1.54 µs	6,493x faster
Medium	500	npm	1.78 µs	5,618x faster
Medium	500	docker	3.97 µs	2,519x faster
Medium	500	cargo	6.53 µs	1,532x faster
Medium	500	git	10.6 µs	943x faster
Large	5000	npm	671 ns	14,903x faster
Large	5000	docker	7.96 µs	1,256x faster
Large	5000	kubectl	12.3 µs	813x faster
Large	5000	git	14.6 µs	685x faster

Key insight: Even with 5,000 commands in history, worst-case latency is 14.6 µs (0.0146 ms).

Industry Comparison

System	Typical Latency	aprender-shell Speedup
GitHub Copilot	100-500ms	10,000-50,000x faster
Fish shell completion	5-20ms	500-2,000x faster
Zsh compinit	10-50ms	1,000-5,000x faster
Bash completion	20-100ms	2,000-10,000x faster

Why So Fast?

O(1) Trie Lookup: Prefix search is O(k) where k = prefix length, not O(n)
In-Memory Model: No disk I/O during suggestions
Simple Data Structures: HashMap + Trie, no neural network overhead
Zero Allocations: Hot path avoids heap allocations

Benchmark Suite

The recommendation_latency benchmark includes:

Group	What It Measures
`suggestion_latency`	Core latency by model size (primary metric)
`partial_completion`	Mid-word completion ("git co" → "git commit")
`training_throughput`	Commands processed per second during training
`cold_start`	Model load + first suggestion latency
`serialization`	JSON serialize/deserialize performance
`scalability`	Latency growth with model size
`paged_model`	Memory-constrained model performance

Why N-gram Beats Neural

For shell completion:

Factor	N-gram	Neural (RNN/Transformer)
Training time	<1s	Minutes
Inference	<15µs	10-50ms
Model size	2MB	50MB+
Accuracy on shell	70%+	75%+
Cold start	Instant	GPU warmup

Shell commands are repetitive patterns. N-gram captures this perfectly.

CLI Reference

aprender-shell <COMMAND>

Commands:
  train        Full retrain from history
  update       Incremental update (fast)
  suggest      Get completions for prefix (-c/-k for count)
  stats        Show model statistics
  export       Export model for sharing
  import       Import a shared model
  zsh-widget   Generate ZSH integration code
  fish-widget  Generate Fish shell integration code
  uninstall    Remove widget from shell config
  validate     Validate model accuracy (train/test split)
  augment      Generate synthetic training data
  analyze      Analyze command patterns (CodeFeatureExtractor)
  tune         AutoML hyperparameter tuning (TPE)
  inspect      View model card metadata
  publish      Publish model to Hugging Face Hub

Options:
  -h, --help     Print help
  -V, --version  Print version

Fish Shell Integration

Generate the Fish widget:

aprender-shell fish-widget >> ~/.config/fish/config.fish
source ~/.config/fish/config.fish

Disable temporarily:

set -gx APRENDER_DISABLED 1

Model Cards & Inspection

View model metadata:

$ aprender-shell inspect -m ~/.aprender-shell.model

📋 Model Card: ~/.aprender-shell.model

═══════════════════════════════════════════
           MODEL INFORMATION
═══════════════════════════════════════════
  ID:           aprender-shell-markov-3gram-20251127
  Name:         Shell Completion Model
  Version:      1.0.0
  Framework:    aprender 0.10.0
  Architecture: MarkovModel
  Parameters:   40848

Export formats:

# JSON (for programmatic access)
aprender-shell inspect -m model.apr --format json

# Hugging Face YAML (for model sharing)
aprender-shell inspect -m model.apr --format huggingface

Publishing to Hugging Face Hub

Share your model with the community:

# Set token
export HF_TOKEN=hf_xxx

# Publish
aprender-shell publish -m ~/.aprender-shell.model -r username/my-shell-model

# With custom commit message
aprender-shell publish -m model.apr -r org/repo -c "v1.0 release"

Without a token, generates README.md and upload instructions.

Model Validation

Test accuracy with holdout validation:

$ aprender-shell validate

🔬 aprender-shell: Model Validation

📂 History file: ~/.zsh_history
📊 Total commands: 21729
⚙️  N-gram size: 3
📈 Train/test split: 80% / 20%

════════════════════════════════════════════
           VALIDATION RESULTS
════════════════════════════════════════════
  Hit@1:    45.2%  (exact match)
  Hit@3:    62.8%  (in top 3)
  Hit@5:    71.4%  (in top 5)

Uninstalling

Remove widget from shell config:

# Dry run (show what would be removed)
aprender-shell uninstall --dry-run

# Remove from ZSH
aprender-shell uninstall --zsh

# Remove from Fish
aprender-shell uninstall --fish

# Keep model file
aprender-shell uninstall --zsh --keep-model

Troubleshooting

Issue	Solution
"Could not find history file"	Specify path: `-f ~/.bash_history`
Suggestions too generic	Increase n-gram: `-n 4`
Model too large	Decrease n-gram: `-n 2`
Slow suggestions	Check model size with `stats`

Aprender — Pure Rust ML Framework