batuta hf

HuggingFace Hub integration commands.

Synopsis

batuta hf <COMMAND>

Commands

| Command | Description |
|---------|-------------|
| catalog | Query 50+ HuggingFace ecosystem components |
| course | Query by Coursera course alignment |
| tree | Display HuggingFace ecosystem tree |
| search | Search models, datasets, spaces |
| info | Get info about a Hub asset |
| pull | Download from HuggingFace Hub |
| push | Upload to HuggingFace Hub |

batuta hf catalog

Query the HuggingFace ecosystem catalog with 51 components across 6 categories.

Usage

batuta hf catalog [OPTIONS]

Options

| Option | Description |
|--------|-------------|
| --component <ID> | Get details for a specific component |
| --category <CAT> | Filter by category (hub, deployment, library, training, collaboration, community) |
| --tag <TAG> | Filter by tag (e.g., rlhf, lora, quantization) |
| --list | List all available components |
| --categories | List all categories with component counts |
| --tags | List all available tags |
| --format <FORMAT> | Output format: table (default), json |

Examples

# List all training components
batuta hf catalog --category training

# Output:
# 📦 HuggingFace Components
# ════════════════════════════════════════════════════════════
#   peft        PEFT           Training & Optimization
#   trl         TRL            Training & Optimization
#   bitsandbytes Bitsandbytes  Training & Optimization
#   ...

# Get component details
batuta hf catalog --component peft

# Output:
# 📦 PEFT
# ════════════════════════════════════════════════════════════
# ID:          peft
# Category:    Training & Optimization
# Description: Parameter-efficient finetuning for large language models
# Docs:        https://huggingface.co/docs/peft
# Repository:  https://github.com/huggingface/peft
# PyPI:        peft
# Tags:        finetuning, lora, qlora, efficient
# Dependencies: transformers, bitsandbytes
# Course Alignments:
#   Course 4, Week 1: 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8

# Search by tag
batuta hf catalog --tag rlhf
batuta hf catalog --tag quantization

Component Categories

| Category | Components | Description |
|----------|------------|-------------|
| Hub | 7 | Hub & client libraries (models, datasets, spaces) |
| Deployment | 7 | Inference & deployment (TGI, TEI, endpoints) |
| Library | 10 | Core ML libraries (transformers, diffusers, datasets) |
| Training | 10 | Training & optimization (PEFT, TRL, bitsandbytes) |
| Collaboration | 11 | Tools & integrations (Gradio, Argilla, agents) |
| Community | 6 | Community resources (blog, forum, leaderboards) |

batuta hf course

Query HuggingFace components aligned to Coursera specialization courses.

Usage

batuta hf course [OPTIONS]

Options

| Option | Description |
|--------|-------------|
| --list | List all 5 courses with component counts |
| --course <N> | Show components for course N (1-5) |
| --week <N> | Filter by week (requires --course) |

Examples

# List all courses
batuta hf course --list

# Output:
# 📚 Pragmatic AI Labs HuggingFace Specialization
# ════════════════════════════════════════════════════════════
# 5 Courses | 15 Weeks | 60 Hours
#
#   Course 1: Foundations of HuggingFace (9 components)
#   Course 2: Fine-Tuning and Datasets (5 components)
#   Course 3: RAG and Retrieval (3 components)
#   Course 4: Advanced Training (RLHF, DPO, PPO) (3 components)
#   Course 5: Production Deployment (8 components)

# Get Course 4 (Advanced Fine-Tuning)
batuta hf course --course 4

# Output:
# 📚 Course 4 - Advanced Training (RLHF, DPO, PPO)
# ════════════════════════════════════════════════════════════
#   peft           Week 1
#   bitsandbytes   Week 1
#   trl            Week 2, Week 3

Course Curriculum

| Course | Topic | Key Components |
|--------|-------|----------------|
| 1 | Foundations | transformers, tokenizers, safetensors, hub |
| 2 | Datasets & Fine-Tuning | datasets, trainer, evaluate |
| 3 | RAG & Retrieval | sentence-transformers, faiss, outlines |
| 4 | RLHF/DPO/PPO | peft, trl, bitsandbytes |
| 5 | Production | tgi, gradio, optimum, inference-endpoints |

batuta hf tree

Display hierarchical view of HuggingFace ecosystem or PAIML integration map.

Usage

batuta hf tree [OPTIONS]

Options

| Option | Description |
|--------|-------------|
| --integration | Show PAIML↔HuggingFace integration map |
| --format <FORMAT> | Output format: ascii (default), json |

Examples

# HuggingFace ecosystem tree
batuta hf tree

# Output:
# HuggingFace Ecosystem (6 categories)
# ├── hub
# │   ├── models         (700K+ models)
# │   ├── datasets       (100K+ datasets)
# │   └── spaces         (300K+ spaces)
# ├── libraries
# │   ├── transformers   (Model architectures)
# │   └── ...

# PAIML-HuggingFace integration map
batuta hf tree --integration

# Output shows:
# ✓ COMPATIBLE  - Interoperates with HF format/API
# ⚡ ALTERNATIVE - PAIML native replacement (pure Rust)
# 🔄 ORCHESTRATES - PAIML wraps/orchestrates HF
# 📦 USES        - PAIML uses HF library directly

batuta hf search

Search HuggingFace Hub for models, datasets, or spaces.

Usage

batuta hf search <ASSET_TYPE> <QUERY> [OPTIONS]

Arguments

| Argument | Description |
|----------|-------------|
| <ASSET_TYPE> | Type: model, dataset, space |
| <QUERY> | Search query string |

Options

| Option | Description |
|--------|-------------|
| --task <TASK> | Filter by task (for models) |
| --limit <N> | Limit results (default: 10) |

Examples

# Search for Llama models
batuta hf search model "llama 7b" --task text-generation

# Search for speech datasets
batuta hf search dataset "common voice" --limit 5

# Search for Gradio spaces
batuta hf search space "image classifier"

batuta hf info

Get detailed information about a HuggingFace asset.

Usage

batuta hf info <ASSET_TYPE> <REPO_ID>

Examples

# Get model info
batuta hf info model "meta-llama/Llama-2-7b-hf"

# Get dataset info
batuta hf info dataset "mozilla-foundation/common_voice_13_0"

# Get space info
batuta hf info space "gradio/chatbot"

batuta hf pull

Download models, datasets, or spaces from HuggingFace Hub.

Usage

batuta hf pull <ASSET_TYPE> <REPO_ID> [OPTIONS]

Options

| Option | Description |
|--------|-------------|
| -o, --output <PATH> | Output directory |
| --quantization <Q> | Model quantization (Q4_K_M, Q5_K_M, etc.) |

Examples

# Pull GGUF model with quantization
batuta hf pull model "TheBloke/Llama-2-7B-GGUF" --quantization Q4_K_M

# Pull to specific directory
batuta hf pull model "mistralai/Mistral-7B-v0.1" -o ./models/

# Pull dataset
batuta hf pull dataset "squad" -o ./data/

batuta hf push

Upload models, datasets, or spaces to HuggingFace Hub.

Usage

batuta hf push <ASSET_TYPE> <PATH> --repo <REPO_ID> [OPTIONS]

Options

| Option | Description |
|--------|-------------|
| --repo <REPO_ID> | Target repository (required) |
| --message <MSG> | Commit message |

Examples

# Push trained model
batuta hf push model ./my-model --repo "myorg/my-classifier"

# Push dataset
batuta hf push dataset ./data/processed --repo "myorg/my-dataset"

# Push Presentar app as Space
batuta hf push space ./my-app --repo "myorg/demo" --message "Initial release"

PAIML-HuggingFace Integration

The integration map shows how PAIML stack components relate to HuggingFace (28 mappings):

| Category | PAIML | HuggingFace | Type |
|----------|-------|-------------|------|
| Formats | .apr | pickle/.joblib, safetensors, gguf | ⚡ Alternative |
| Formats | realizar/gguf | gguf | ✓ Compatible |
| Formats | realizar/safetensors | safetensors | ✓ Compatible |
| Data Formats | .ald | parquet/arrow, json/csv | ⚡ Alternative |
| Hub Access | aprender/hf_hub | huggingface_hub | 📦 Uses |
| Hub Access | batuta/hf | huggingface_hub | 🔄 Orchestrates |
| Registry | pacha | HF Hub registry, MLflow/W&B | ⚡ Alternative |
| Inference | realizar | transformers, TGI | ⚡ Alternative |
| Inference | realizar/moe | optimum | ⚡ Alternative |
| Classical ML | aprender | sklearn, xgboost/lightgbm | ⚡ Alternative |
| Deep Learning | entrenar | PyTorch training | ⚡ Alternative |
| Deep Learning | alimentar | datasets | ⚡ Alternative |
| Compute | trueno | NumPy/PyTorch tensors | ⚡ Alternative |
| Compute | repartir | accelerate | ⚡ Alternative |
| Tokenization | realizar/tokenizer | tokenizers | ✓ Compatible |
| Tokenization | trueno-rag | tokenizers | ✓ Compatible |
| Apps | presentar | gradio | ⚡ Alternative |
| Apps | trueno-viz | visualization | ⚡ Alternative |
| Quality | certeza | evaluate | ⚡ Alternative |
| MCP Tooling | pforge | LangChain Tools | ⚡ Alternative |
| MCP Tooling | pmat | code analysis tools | ⚡ Alternative |
| MCP Tooling | pmcp | mcp-sdk | ⚡ Alternative |

Legend:

  • ✓ COMPATIBLE - Interoperates with HF format/API
  • ⚡ ALTERNATIVE - PAIML native replacement (pure Rust)
  • 🔄 ORCHESTRATES - PAIML wraps/orchestrates HF
  • 📦 USES - PAIML uses HF library directly

Compatible Formats

PAIML can load and save HuggingFace formats:

#![allow(unused)]
fn main() -> Result<(), Box<dyn std::error::Error>> {
// Load GGUF model (realizar)
let model = GGUFModel::from_file("model.gguf")?;

// Load SafeTensors (aprender)
let weights = SafeTensors::load("model.safetensors")?;

// Load HF tokenizer (realizar)
let tokenizer = Tokenizer::from_pretrained("meta-llama/Llama-2-7b-hf")?;
Ok(())
}

Security Features (v1.1.0)

SafeTensors Enforcement

By default, batuta hf pull blocks unsafe pickle-based formats:

# Default: blocks .bin, .pkl, .pt files
batuta hf pull model "repo/model"

# Explicit override for unsafe formats
batuta hf pull model "repo/model" --allow-unsafe

| Extension | Safety | Notes |
|-----------|--------|-------|
| .safetensors | ✓ Safe | Recommended |
| .gguf | ✓ Safe | Quantized |
| .json | ✓ Safe | Config |
| .bin | ✗ Unsafe | Pickle-based |
| .pkl | ✗ Unsafe | Pickle |
| .pt | ✗ Unsafe | PyTorch |
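The table above amounts to an extension allowlist. A minimal sketch of such a check (`is_safe_format` is illustrative, not batuta's actual implementation):

```rust
/// Returns true when a file extension is considered safe to download
/// by default (no pickle deserialization needed to load it).
fn is_safe_format(filename: &str) -> bool {
    matches!(
        filename.rsplit('.').next(),
        Some("safetensors") | Some("gguf") | Some("json")
    )
}

fn main() {
    assert!(is_safe_format("model.safetensors"));
    assert!(is_safe_format("config.json"));
    assert!(!is_safe_format("pytorch_model.bin")); // pickle-based: blocked
    assert!(!is_safe_format("checkpoint.pt"));
}
```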

Secret Scanning

Automatic scan before push blocks accidental credential exposure:

# Blocked if secrets detected
batuta hf push model ./my-model --repo "org/model"

# Detected patterns:
# - .env files
# - Private keys (.pem, id_rsa)
# - Credential files
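A filename-based check along these lines could catch the documented patterns (illustrative only; batuta's real scanner is presumably more thorough):

```rust
/// Flag paths that match the documented secret patterns:
/// .env files, private keys (.pem, id_rsa), and credential files.
fn looks_like_secret(path: &str) -> bool {
    let name = path.rsplit('/').next().unwrap_or(path);
    name == ".env"
        || name == "id_rsa"
        || name.ends_with(".pem")
        || name.contains("credential")
}

fn main() {
    assert!(looks_like_secret("deploy/.env"));
    assert!(looks_like_secret("keys/server.pem"));
    assert!(looks_like_secret(".ssh/id_rsa"));
    assert!(!looks_like_secret("model.safetensors"));
}
```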

Rate Limit Handling

Automatic exponential backoff for API rate limits (429):

  • Initial: 1s → 2s → 4s → 8s → 16s
  • Max backoff: 60s
  • Max retries: 5
  • Respects Retry-After header
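The schedule above can be sketched as a delay function (a minimal illustration; `retry_delay` is a hypothetical helper, not batuta's API):

```rust
use std::time::Duration;

/// Compute the delay before retry `attempt` (0-based), doubling from 1s
/// and capping at 60s. A server-provided Retry-After value, when present,
/// takes precedence over the computed backoff.
fn retry_delay(attempt: u32, retry_after: Option<u64>) -> Duration {
    if let Some(secs) = retry_after {
        return Duration::from_secs(secs);
    }
    let secs = 1u64.checked_shl(attempt).unwrap_or(u64::MAX).min(60);
    Duration::from_secs(secs)
}

fn main() {
    // Attempts 0..5 yield the documented 1s -> 2s -> 4s -> 8s -> 16s schedule.
    assert_eq!(retry_delay(0, None), Duration::from_secs(1));
    assert_eq!(retry_delay(4, None), Duration::from_secs(16));
    // Later attempts are capped at the 60s maximum backoff.
    assert_eq!(retry_delay(10, None), Duration::from_secs(60));
    // Retry-After overrides the schedule.
    assert_eq!(retry_delay(2, Some(30)), Duration::from_secs(30));
}
```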

Model Card Auto-Generation

# Auto-generates README.md if missing
batuta hf push model ./my-model --repo "org/model"

Generated card includes:

  • YAML frontmatter (license, tags)
  • Training metrics from certeza
  • PAIML stack attribution
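A card with those three parts might be assembled like this (field names and the `model_card` helper are illustrative, not batuta's exact output):

```rust
/// Build a minimal model card with YAML frontmatter (license, tags),
/// a metric line, and PAIML stack attribution.
fn model_card(license: &str, tags: &[&str], accuracy: f64) -> String {
    format!(
        "---\nlicense: {license}\ntags: [{}]\n---\n\n\
         # Model Card\n\n\
         - Accuracy (certeza): {accuracy:.3}\n\
         - Built with the PAIML stack\n",
        tags.join(", ")
    )
}

fn main() {
    let card = model_card("apache-2.0", &["text-classification"], 0.9412);
    // Frontmatter opens the file; attribution closes it.
    assert!(card.starts_with("---\nlicense: apache-2.0"));
    assert!(card.ends_with("PAIML stack\n"));
    println!("{card}");
}
```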

Differential Uploads

Only uploads changed files using content-addressable hashing:

# Only uploads modified files
batuta hf push model ./my-model --repo "org/model"
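The change detection can be sketched as comparing local content digests against a remote manifest (illustrative only; this uses the standard library's `DefaultHasher` where a real implementation would use a cryptographic hash such as SHA-256):

```rust
use std::collections::HashMap;
use std::hash::{DefaultHasher, Hash, Hasher};

/// Stand-in for a content-addressable digest of a file's bytes.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    bytes.hash(&mut h);
    h.finish()
}

/// Select files whose content differs from (or is absent in) the
/// remote manifest of name -> digest.
fn changed_files(
    local: &HashMap<String, Vec<u8>>,
    remote: &HashMap<String, u64>,
) -> Vec<String> {
    local
        .iter()
        .filter(|(name, bytes)| {
            remote.get(name.as_str()).copied() != Some(content_hash(bytes))
        })
        .map(|(name, _)| name.clone())
        .collect()
}

fn main() {
    let mut local = HashMap::new();
    local.insert("config.json".to_string(), b"{}".to_vec());
    local.insert("model.safetensors".to_string(), b"new weights".to_vec());

    // The remote manifest already holds the current config.json digest.
    let mut remote = HashMap::new();
    remote.insert("config.json".to_string(), content_hash(b"{}"));

    // Only the new/modified file is selected for upload.
    assert_eq!(changed_files(&local, &remote), vec!["model.safetensors"]);
}
```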

Environment Variables

| Variable | Description |
|----------|-------------|
| HF_TOKEN | HuggingFace API token |
| HF_HOME | Cache directory |
| HF_HUB_OFFLINE | Offline mode |
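Typical usage, exporting the variables before invoking batuta (values shown are placeholders):

```shell
# Authenticate pushes and gated-model pulls (placeholder token)
export HF_TOKEN=hf_xxxxxxxxxxxx

# Redirect the download cache to a larger disk
export HF_HOME=/mnt/data/hf-cache

# Work entirely from the local cache, with no network calls
export HF_HUB_OFFLINE=1
```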