batuta hf
HuggingFace Hub integration commands.
Synopsis
batuta hf <COMMAND>
Commands
| Command | Description |
|---|---|
catalog | Query 50+ HuggingFace ecosystem components |
course | Query by Coursera course alignment |
tree | Display HuggingFace ecosystem tree |
search | Search models, datasets, spaces |
info | Get info about a Hub asset |
pull | Download from HuggingFace Hub |
push | Upload to HuggingFace Hub |
batuta hf catalog
Query the HuggingFace ecosystem catalog with 51 components across 6 categories.
Usage
batuta hf catalog [OPTIONS]
Options
| Option | Description |
|---|---|
--component <ID> | Get details for a specific component |
--category <CAT> | Filter by category (hub, deployment, library, training, collaboration, community) |
--tag <TAG> | Filter by tag (e.g., rlhf, lora, quantization) |
--list | List all available components |
--categories | List all categories with component counts |
--tags | List all available tags |
--format <FORMAT> | Output format: table (default), json |
Examples
# List all training components
batuta hf catalog --category training
# Output:
# π¦ HuggingFace Components
# ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# peft PEFT Training & Optimization
# trl TRL Training & Optimization
# bitsandbytes Bitsandbytes Training & Optimization
# ...
# Get component details
batuta hf catalog --component peft
# Output:
# π¦ PEFT
# ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# ID: peft
# Category: Training & Optimization
# Description: Parameter-efficient finetuning for large language models
# Docs: https://huggingface.co/docs/peft
# Repository: https://github.com/huggingface/peft
# PyPI: peft
# Tags: finetuning, lora, qlora, efficient
# Dependencies: transformers, bitsandbytes
# Course Alignments:
# Course 4, Week 1: 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8
# Search by tag
batuta hf catalog --tag rlhf
batuta hf catalog --tag quantization
Component Categories
| Category | Components | Description |
|---|---|---|
| Hub | 7 | Hub & client libraries (models, datasets, spaces) |
| Deployment | 7 | Inference & deployment (TGI, TEI, endpoints) |
| Library | 10 | Core ML libraries (transformers, diffusers, datasets) |
| Training | 10 | Training & optimization (PEFT, TRL, bitsandbytes) |
| Collaboration | 11 | Tools & integrations (Gradio, Argilla, agents) |
| Community | 6 | Community resources (blog, forum, leaderboards) |
batuta hf course
Query HuggingFace components aligned to Coursera specialization courses.
Usage
batuta hf course [OPTIONS]
Options
| Option | Description |
|---|---|
--list | List all 5 courses with component counts |
--course <N> | Show components for course N (1-5) |
--week <N> | Filter by week (requires βcourse) |
Examples
# List all courses
batuta hf course --list
# Output:
# π Pragmatic AI Labs HuggingFace Specialization
# ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# 5 Courses | 15 Weeks | 60 Hours
#
# Course 1: Foundations of HuggingFace (9 components)
# Course 2: Fine-Tuning and Datasets (5 components)
# Course 3: RAG and Retrieval (3 components)
# Course 4: Advanced Training (RLHF, DPO, PPO) (3 components)
# Course 5: Production Deployment (8 components)
# Get Course 4 (Advanced Fine-Tuning)
batuta hf course --course 4
# Output:
# π Course 4 - Advanced Training (RLHF, DPO, PPO)
# ββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ
# peft Week 1
# bitsandbytes Week 1
# trl Week 2, Week 3
Course Curriculum
| Course | Topic | Key Components |
|---|---|---|
| 1 | Foundations | transformers, tokenizers, safetensors, hub |
| 2 | Datasets & Fine-Tuning | datasets, trainer, evaluate |
| 3 | RAG & Retrieval | sentence-transformers, faiss, outlines |
| 4 | RLHF/DPO/PPO | peft, trl, bitsandbytes |
| 5 | Production | tgi, gradio, optimum, inference-endpoints |
batuta hf tree
Display hierarchical view of HuggingFace ecosystem or PAIML integration map.
Usage
batuta hf tree [OPTIONS]
Options
| Option | Description |
|---|---|
--integration | Show PAIMLβHuggingFace integration map |
--format <FORMAT> | Output format: ascii (default), json |
Examples
# HuggingFace ecosystem tree
batuta hf tree
# Output:
# HuggingFace Ecosystem (6 categories)
# βββ hub
# β βββ models (700K+ models)
# β βββ datasets (100K+ datasets)
# β βββ spaces (300K+ spaces)
# βββ libraries
# β βββ transformers (Model architectures)
# β βββ ...
# PAIML-HuggingFace integration map
batuta hf tree --integration
# Output shows:
# β COMPATIBLE - Interoperates with HF format/API
# β‘ ALTERNATIVE - PAIML native replacement (pure Rust)
# π ORCHESTRATES - PAIML wraps/orchestrates HF
# π¦ USES - PAIML uses HF library directly
batuta hf search
Search HuggingFace Hub for models, datasets, or spaces.
Usage
batuta hf search <ASSET_TYPE> <QUERY> [OPTIONS]
Arguments
| Argument | Description |
|---|---|
<ASSET_TYPE> | Type: model, dataset, space |
<QUERY> | Search query string |
Options
| Option | Description |
|---|---|
--task <TASK> | Filter by task (for models) |
--limit <N> | Limit results (default: 10) |
Examples
# Search for Llama models
batuta hf search model "llama 7b" --task text-generation
# Search for speech datasets
batuta hf search dataset "common voice" --limit 5
# Search for Gradio spaces
batuta hf search space "image classifier"
batuta hf info
Get detailed information about a HuggingFace asset.
Usage
batuta hf info <ASSET_TYPE> <REPO_ID>
Examples
# Get model info
batuta hf info model "meta-llama/Llama-2-7b-hf"
# Get dataset info
batuta hf info dataset "mozilla-foundation/common_voice_13_0"
# Get space info
batuta hf info space "gradio/chatbot"
batuta hf pull
Download models, datasets, or spaces from HuggingFace Hub.
Usage
batuta hf pull <ASSET_TYPE> <REPO_ID> [OPTIONS]
Options
| Option | Description |
|---|---|
-o, --output <PATH> | Output directory |
--quantization <Q> | Model quantization (Q4_K_M, Q5_K_M, etc.) |
Examples
# Pull GGUF model with quantization
batuta hf pull model "TheBloke/Llama-2-7B-GGUF" --quantization Q4_K_M
# Pull to specific directory
batuta hf pull model "mistralai/Mistral-7B-v0.1" -o ./models/
# Pull dataset
batuta hf pull dataset "squad" -o ./data/
batuta hf push
Upload models, datasets, or spaces to HuggingFace Hub.
Usage
batuta hf push <ASSET_TYPE> <PATH> --repo <REPO_ID> [OPTIONS]
Options
| Option | Description |
|---|---|
--repo <REPO_ID> | Target repository (required) |
--message <MSG> | Commit message |
Examples
# Push trained model
batuta hf push model ./my-model --repo "myorg/my-classifier"
# Push dataset
batuta hf push dataset ./data/processed --repo "myorg/my-dataset"
# Push Presentar app as Space
batuta hf push space ./my-app --repo "myorg/demo" --message "Initial release"
PAIML-HuggingFace Integration
The integration map shows how PAIML stack components relate to HuggingFace (28 mappings):
| Category | PAIML | HuggingFace | Type |
|---|---|---|---|
| Formats | .apr | pickle/.joblib, safetensors, gguf | β‘ Alternative |
| realizar/gguf | gguf | β Compatible | |
| realizar/safetensors | safetensors | β Compatible | |
| Data Formats | .ald | parquet/arrow, json/csv | β‘ Alternative |
| Hub Access | aprender/hf_hub | huggingface_hub | π¦ Uses |
| batuta/hf | huggingface_hub | π Orchestrates | |
| Registry | pacha | HF Hub registry, MLflow/W&B | β‘ Alternative |
| Inference | realizar | transformers, TGI | β‘ Alternative |
| realizar/moe | optimum | β‘ Alternative | |
| Classical ML | aprender | sklearn, xgboost/lightgbm | β‘ Alternative |
| Deep Learning | entrenar | PyTorch training | β‘ Alternative |
| alimentar | datasets | β‘ Alternative | |
| Compute | trueno | NumPy/PyTorch tensors | β‘ Alternative |
| repartir | accelerate | β‘ Alternative | |
| Tokenization | realizar/tokenizer | tokenizers | β Compatible |
| trueno-rag | tokenizers | β Compatible | |
| Apps | presentar | gradio | β‘ Alternative |
| trueno-viz | visualization | β‘ Alternative | |
| Quality | certeza | evaluate | β‘ Alternative |
| MCP Tooling | pforge | LangChain Tools | β‘ Alternative |
| pmat | code analysis tools | β‘ Alternative | |
| pmcp | mcp-sdk | β‘ Alternative |
Legend:
- β COMPATIBLE - Interoperates with HF format/API
- β‘ ALTERNATIVE - PAIML native replacement (pure Rust)
- π ORCHESTRATES - PAIML wraps/orchestrates HF
- π¦ USES - PAIML uses HF library directly
Compatible Formats
PAIML can load and save HuggingFace formats:
#![allow(unused)]
fn main() {
// Load GGUF model (realizar)
let model = GGUFModel::from_file("model.gguf")?;
// Load SafeTensors (aprender)
let weights = SafeTensors::load("model.safetensors")?;
// Load HF tokenizer (realizar)
let tokenizer = Tokenizer::from_pretrained("meta-llama/Llama-2-7b-hf")?;
}
Security Features (v1.1.0)
SafeTensors Enforcement
By default, batuta hf pull blocks unsafe pickle-based formats:
# Default: blocks .bin, .pkl, .pt files
batuta hf pull model "repo/model"
# Explicit override for unsafe formats
batuta hf pull model "repo/model" --allow-unsafe
| Extension | Safety | Notes |
|---|---|---|
.safetensors | β Safe | Recommended |
.gguf | β Safe | Quantized |
.json | β Safe | Config |
.bin | β Unsafe | Pickle-based |
.pkl | β Unsafe | Pickle |
.pt | β Unsafe | PyTorch |
Secret Scanning
Automatic scan before push blocks accidental credential exposure:
# Blocked if secrets detected
batuta hf push model ./my-model --repo "org/model"
# Detected patterns:
# - .env files
# - Private keys (.pem, id_rsa)
# - Credential files
Rate Limit Handling
Automatic exponential backoff for API rate limits (429):
- Initial: 1s β 2s β 4s β 8s β 16s
- Max backoff: 60s
- Max retries: 5
- Respects
Retry-Afterheader
Model Card Auto-Generation
# Auto-generates README.md if missing
batuta hf push model ./my-model --repo "org/model"
Generated card includes:
- YAML frontmatter (license, tags)
- Training metrics from certeza
- PAIML stack attribution
Differential Uploads
Only uploads changed files using content-addressable hashing:
# Only uploads modified files
batuta hf push model ./my-model --repo "org/model"
Environment Variables
| Variable | Description |
|---|---|
HF_TOKEN | HuggingFace API token |
HF_HOME | Cache directory |
HF_HUB_OFFLINE | Offline mode |