Case Study: Publishing Shell Models to Hugging Face Hub

Share your trained shell completion models with the community via Hugging Face Hub.

Official Base Model

A pre-trained base model is available for immediate use:

# Download and use
huggingface-cli download paiml/aprender-shell-base model.apr --local-dir ~/.aprender
aprender-shell suggest "git " -m ~/.aprender/model.apr

The base model is trained on 401 synthetic developer commands (git, cargo, docker, kubectl, npm, python, aws, terraform) and contains no personal data.

Overview

The publish command uploads your model to Hugging Face Hub, automatically generating:

- Model card (README.md) with metadata
- Training statistics
- Usage instructions
- License information

Quick Start

# 1. Train a model
aprender-shell train -f ~/.zsh_history -o my-shell.model

# 2. Set your HF token
export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxx

# 3. Publish
aprender-shell publish -m my-shell.model -r username/my-shell-completions

Getting a Hugging Face Token

1. Create an account at huggingface.co
2. Go to Settings → Access Tokens
3. Create a token with "Write" permission
4. Export it: export HF_TOKEN=hf_xxx
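
Before publishing, it can help to confirm the token is actually exported. The following is a minimal sketch, assuming tokens carry the hf_ prefix shown above; check_hf_token is a hypothetical helper and not part of aprender-shell:

```bash
# Hypothetical helper: fail fast when the token is missing or malformed.
# Checks the first argument, or $HF_TOKEN when no argument is given.
check_hf_token() {
  local token="${1-${HF_TOKEN-}}"
  case "$token" in
    "")    echo "HF_TOKEN is not set" >&2; return 1 ;;
    hf_?*) return 0 ;;  # assumption: valid tokens start with "hf_"
    *)     echo "HF_TOKEN does not start with hf_" >&2; return 2 ;;
  esac
}
```

A possible use is `check_hf_token && aprender-shell publish -m my-shell.model -r username/my-shell-completions`. This only checks the shape of the value; it does not verify the token with the Hub.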

Publish Command

aprender-shell publish [OPTIONS] -m <MODEL> -r <REPO>

Options:
  -m, --model <MODEL>    Model file to publish
  -r, --repo <REPO>      Repository ID (username/repo-name)
  -c, --commit <MSG>     Commit message (default: "Upload model")
      --create           Create repository if it doesn't exist
      --private          Make repository private
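
The -r value must have the username/repo-name shape. A minimal pre-flight check could look like the sketch below; valid_repo_id is a hypothetical helper, and the allowed character set is a simplifying assumption that may be looser than the Hub's actual naming rules:

```bash
# Hypothetical helper: accept exactly one "/" with a non-empty name on each side.
# Assumed character set: letters, digits, ".", "_" and "-".
valid_repo_id() {
  printf '%s\n' "$1" |
    grep -Eq '^[A-Za-z0-9][A-Za-z0-9._-]*/[A-Za-z0-9][A-Za-z0-9._-]*$'
}
```

For example, `valid_repo_id "alice/k8s-model"` succeeds, while `valid_repo_id "k8s-model"` fails because the owner part is missing.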

Examples

# Basic publish
aprender-shell publish -m model.apr -r paiml/devops-completions

# Create new repo with custom message
aprender-shell publish -m model.apr -r alice/k8s-model --create -c "Initial release"

# Private repository
aprender-shell publish -m model.apr -r company/internal-model --create --private

Generated Model Card

The publish command generates a README.md such as:

---
license: mit
pipeline_tag: text-generation
tags:
  - aprender
  - shell-completion
  - markov-model
  - rust
---

# Shell Completion Model

AI-powered shell command completion trained on real history.

## Model Details

| Property | Value |
|----------|-------|
| Architecture | MarkovModel |
| N-gram Size | 3 |
| Vocabulary | 16,100 |
| Training Commands | 21,729 |

## Usage

```bash
# Download
huggingface-cli download username/model model.apr

# Use with aprender-shell
aprender-shell suggest "git " -m model.apr
```
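
If the model card is edited by hand, it is worth checking that the YAML front matter between the two `---` lines is still intact. The sketch below reads a single top-level key from that block; front_matter_value is a hypothetical helper and only handles simple `key: value` lines, not lists such as tags:

```bash
# Hypothetical helper: print the value of a top-level key from the
# front matter of a model card. $1 = card path, $2 = key name.
front_matter_value() {
  awk -v key="$2" '
    /^---[ \t]*$/ { fences++; next }      # count the "---" delimiters
    fences == 1 && index($0, key ":") == 1 {
      value = substr($0, length(key) + 2) # text after "key:"
      sub(/^[ \t]+/, "", value)           # drop leading whitespace
      print value
      exit
    }
    fences >= 2 { exit }                  # stop at the end of the front matter
  ' "$1"
}
```

With the card shown above, `front_matter_value README.md license` would print `mit`.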

Without Token (Offline Mode)

If HF_TOKEN is not set, publish skips the upload and saves the generated model card locally:

$ aprender-shell publish -m model.apr -r paiml/test

⚠️  HF_TOKEN not set. Cannot upload to Hugging Face Hub.

📝 Model card saved to: README.md

To upload manually:
  1. Set HF_TOKEN: export HF_TOKEN=hf_xxx
  2. Run: huggingface-cli upload paiml/test model.apr README.md

Model Inspection

Before publishing, inspect your model:

# Text format
aprender-shell inspect -m model.apr

# JSON format (programmatic)
aprender-shell inspect -m model.apr --format json

# Hugging Face YAML (model card preview)
aprender-shell inspect -m model.apr --format huggingface
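
File size is another thing to check before publishing, since the Troubleshooting section notes that models over 10MB need Git LFS. A minimal sketch follows; exceeds_size_limit is a hypothetical helper, and treating 10MB as 10 * 1024 * 1024 bytes is an assumption:

```bash
# Hypothetical helper: succeed when a file is larger than a byte limit.
# $1 = file, $2 = limit in bytes (default 10485760, assumed to mean 10MB).
exceeds_size_limit() {
  local limit="${2:-10485760}"
  local size
  [ -f "$1" ] || return 2
  size=$(wc -c < "$1" | tr -d ' ')  # strip padding some wc builds add
  [ "$size" -gt "$limit" ]
}
```

For example, `exceeds_size_limit model.apr && echo "model.apr needs Git LFS"` prints the warning only for large files.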

JSON Output

{
  "model_id": "aprender-shell-markov-3gram-20251127",
  "name": "Shell Completion Model",
  "version": "1.0.0",
  "created_at": "2025-11-27T12:00:00Z",
  "framework_version": "aprender 0.10.0",
  "architecture": "MarkovModel",
  "hyperparameters": {
    "ngram_size": 3
  },
  "metrics": {
    "vocab_size": 16100,
    "ngram_count": 40848
  }
}
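
The JSON output can feed simple scripts. The sketch below pulls one numeric field out of JSON shaped like the sample above using sed; json_number is a hypothetical helper, and a dedicated JSON parser would be more robust for anything beyond flat numeric fields:

```bash
# Hypothetical helper: print the first numeric value stored under a key.
# $1 = key name; the JSON document is read from stdin.
json_number() {
  local key="$1"
  sed -n "s/.*\"${key}\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p" |
    head -n 1
}
```

A possible use is `aprender-shell inspect -m model.apr --format json | json_number vocab_size`, assuming the output is formatted like the sample.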

Use Cases

Team-Specific Models

Share DevOps patterns with your team:

# Train on team history
cat ~/.zsh_history ~/.bash_history team/*.history > combined.history
aprender-shell train -f combined.history -o devops.model

# Publish to org
aprender-shell publish -m devops.model -r myorg/devops-completions --create

Domain-Specific Models

Curate models for specific domains:

| Domain | Example Commands |
|--------|------------------|
| Kubernetes | `kubectl`, `helm`, `k9s` |
| AWS | `aws`, `sam`, `cdk` |
| Docker | `docker`, `docker-compose` |
| Git | `git`, `gh`, `glab` |

Community Models

Browse community models:

# Official base model (recommended starting point)
huggingface-cli download paiml/aprender-shell-base model.apr

# Search HF Hub for more (via the Hub API)
curl -s "https://huggingface.co/api/models?search=aprender"

# Use any model
aprender-shell suggest "kubectl " -m model.apr

Best Practices

Privacy

Before publishing, verify that the model contains no secrets:

# Check for sensitive patterns
strings model.apr | grep -iE 'password|secret|token|key'

# The model stores n-grams, not raw commands
# But verify training data was filtered
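
Filtering is easiest to do on the history file before training. The sketch below drops every line that matches the same patterns as the check above; filter_history is a hypothetical helper, and the pattern list is deliberately broad, so it will also drop harmless commands such as ssh-keygen:

```bash
# Hypothetical helper: read history lines on stdin and drop any line
# that mentions a sensitive-looking word (case-insensitive).
filter_history() {
  # grep exits with status 1 when every line is filtered out; treat that as success.
  grep -viE 'password|secret|token|key' || true
}
```

A possible use is `filter_history < combined.history > filtered.history`, followed by `aprender-shell train -f filtered.history -o devops.model`.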

Versioning

Use semantic versioning in commit messages:

aprender-shell publish -m model.apr -r user/model -c "v1.0.0: Initial release"
aprender-shell publish -m model.apr -r user/model -c "v1.1.0: Add kubectl patterns"
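
To keep commit messages consistent, a small check can run before publishing. This is a sketch of one possible convention, matching the "vMAJOR.MINOR.PATCH: summary" form used above; valid_release_message is a hypothetical helper:

```bash
# Hypothetical helper: succeed when a message looks like "v1.2.3: summary".
valid_release_message() {
  printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+: .+'
}
```

For example, `valid_release_message "v1.1.0: Add kubectl patterns"` succeeds, while the default message "Upload model" is rejected.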

Documentation

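Add context to your model card:

# Edit generated README.md before upload
vim README.md

# Then upload each file with huggingface-cli
# (a second path argument would be read as the destination path in the repo)
huggingface-cli upload user/model model.apr
huggingface-cli upload user/model README.md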

Architecture

┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  aprender-shell │────▶│   hf-hub crate   │────▶│  Hugging Face   │
│    publish      │     │  (official API)  │     │      Hub        │
└─────────────────┘     └──────────────────┘     └─────────────────┘
        │
        ▼
┌─────────────────┐
│   ModelCard     │
│  (README.md)    │
└─────────────────┘

The implementation uses the official hf-hub crate by Hugging Face for API compatibility.

Troubleshooting

| Issue | Solution |
|-------|----------|
| "401 Unauthorized" | Check that HF_TOKEN is valid and has write permission |
| "404 Not Found" | Use the `--create` flag for new repositories |
| "Repository exists" | The repository already exists; publish updates its files |
| "Model too large" | Use Git LFS for models >10MB |

See Also

- Shell Completion - Training and usage
- Model Cards - Metadata specification (planned)
- Model Format (.apr) - Binary format details

EXTREME TDD - The Aprender Guide to Zero-Defect Machine Learning