batuta data

Data platform integration commands for visualizing and querying the enterprise data ecosystem.

Synopsis

batuta data <COMMAND> [OPTIONS]

Commands

Command   Description
tree      Display data platforms ecosystem tree

Global Options

Option          Description
-v, --verbose   Enable verbose output
-d, --debug     Enable debug output
-h, --help      Print help

batuta data tree

Display a hierarchical visualization of data platforms and their components, or show PAIML stack integration mappings.

Usage

batuta data tree [OPTIONS]

Options

Option              Description                                                     Default
--platform <NAME>   Filter by platform (databricks, snowflake, aws, huggingface)   All platforms
--integration       Show PAIML integration mappings instead of platform tree       false
--format <FORMAT>   Output format (ascii, json)                                     ascii

Examples

View All Platforms

$ batuta data tree

DATA PLATFORMS ECOSYSTEM
========================

DATABRICKS
├── Unity Catalog
│   └── Unity Catalog
│       ├── Schemas
│       ├── Tables
│       └── Views
├── Delta Lake
│   └── Delta Lake
│       ├── Parquet storage
│       ├── Transaction log
│       └── Time travel
...

Filter by Platform

$ batuta data tree --platform snowflake

SNOWFLAKE
├── Virtual Warehouse
│   └── Virtual Warehouse
│       ├── Compute clusters
│       ├── Result cache
│       └── Auto-scaling
├── Iceberg Tables
│   └── Iceberg Tables
│       ├── Open format
│       ├── Schema evolution
│       └── Partition pruning
├── Snowpark
│   └── Snowpark
│       ├── Python UDFs
│       ├── Java/Scala UDFs
│       └── ML functions
└── Data Sharing
    └── Data Sharing
        ├── Secure shares
        ├── Reader accounts
        └── Marketplace

View Integration Mappings

$ batuta data tree --integration

PAIML ↔ DATA PLATFORMS INTEGRATION
==================================

STORAGE & CATALOGS
├── [ALT] Alimentar (.ald) ←→ Delta Lake
├── [CMP] Alimentar (.ald) ←→ Iceberg Tables
├── [CMP] Alimentar (sync) ←→ S3
├── [ALT] Pacha Registry ←→ Unity Catalog
├── [ALT] Pacha Registry ←→ Glue Catalog
├── [ALT] Pacha Registry ←→ HuggingFace Hub

COMPUTE & PROCESSING
├── [ALT] Trueno ←→ Spark DataFrames
├── [ALT] Trueno ←→ Snowpark
├── [ALT] Trueno ←→ EMR
├── [TRN] Depyler → Rust ←→ Snowpark Python
├── [TRN] Depyler → Rust ←→ Lambda Python
├── [ALT] Trueno-Graph ←→ Neptune/GraphQL

ML TRAINING
├── [ALT] Aprender ←→ MLlib
├── [ALT] Aprender ←→ Snowpark ML
├── [ALT] Entrenar ←→ SageMaker Training
├── [ALT] Entrenar ←→ MLflow Tracking
├── [ALT] Entrenar ←→ SageMaker Experiments
├── [USE] Entrenar ←→ W&B

MODEL SERVING
├── [ALT] Realizar ←→ MLflow Serving
├── [ALT] Realizar ←→ SageMaker Endpoints
├── [ALT] Realizar + serve ←→ Bedrock
├── [USE] Realizar ←→ GGUF models
├── [CMP] Realizar (via GGUF) ←→ HF Transformers

ORCHESTRATION
├── [ORC] Batuta ←→ Databricks Workflows
├── [ORC] Batuta ←→ Snowflake Tasks
├── [ORC] Batuta ←→ Step Functions
├── [ORC] Batuta ←→ Airflow/Prefect

Legend: [CMP]=Compatible [ALT]=Alternative [USE]=Uses
        [TRN]=Transpiles [ORC]=Orchestrates

Summary: 3 compatible, 16 alternatives, 2 uses, 2 transpiles, 4 orchestrates
         Total: 27 integration points

JSON Output

$ batuta data tree --platform databricks --format json

{
  "platform": "Databricks",
  "categories": [
    {
      "name": "Unity Catalog",
      "components": [
        {
          "name": "Unity Catalog",
          "description": "Unified governance for data and AI",
          "sub_components": ["Schemas", "Tables", "Views"]
        }
      ]
    },
    ...
  ]
}
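
The JSON form is convenient for scripting. As a minimal sketch (assuming jq is installed and the schema matches the example above), the following lists each component under its category:

$ batuta data tree --platform databricks --format json \
    | jq -r '.categories[] | .name as $cat | .components[] | "\($cat): \(.name)"'
Unity Catalog: Unity Catalog
Delta Lake: Delta Lake
...
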
$ batuta data tree --integration --format json

[
  {
    "platform_component": "Delta Lake",
    "paiml_component": "Alimentar (.ald)",
    "integration_type": "Alternative",
    "category": "STORAGE & CATALOGS"
  },
  ...
]
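
To pull out a single integration type from this array — for example, to count the [ALT] entries — a jq filter like the following works (again assuming jq, against the schema shown above):

$ batuta data tree --integration --format json \
    | jq '[.[] | select(.integration_type == "Alternative")] | length'
16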

Integration Type Legend

Code   Type           Meaning
CMP    Compatible     Direct interoperability with PAIML component
ALT    Alternative    PAIML provides a sovereign replacement
USE    Uses           PAIML component consumes this as input
TRN    Transpiles     Depyler converts source code to Rust
ORC    Orchestrates   Batuta can coordinate external workflows

Supported Platforms

Platform      Description
databricks    Unity Catalog, Delta Lake, MLflow, Spark
snowflake     Virtual Warehouse, Iceberg, Snowpark, Data Sharing
aws           S3, Glue, SageMaker, Bedrock, EMR, Lambda
huggingface   Hub, Transformers, Datasets, Inference API
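
To snapshot the whole ecosystem, the supported platform names can be looped over in the shell (a sketch; the output file names are illustrative):

$ for p in databricks snowflake aws huggingface; do
>   batuta data tree --platform "$p" --format json > "$p.json"
> done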

See Also