Chapter 54: Function Boundary Extraction (pmat extract)

Overview

The pmat extract command parses an individual file directly with tree-sitter and outputs its function boundaries as structured JSON. Unlike pmat query, which requires an index, extract works on any single file with zero setup, which makes it well suited to editor integrations, CI scripts, automated file splitting, and quick structural analysis.

Since v3.3.0, the output also includes file-level metadata (imports, test boundaries) and per-item visibility, which is the information needed to split large files into parts that still compile.

Usage

# Extract all function/struct/enum/trait boundaries from a file
pmat extract --list src/main.rs

# Pipe to jq for analysis
pmat extract --list src/handlers.rs | jq '.items[] | select(.type == "function")'

# Count items by type
pmat extract --list src/lib.rs | jq '.items | group_by(.type) | map({type: .[0].type, count: length})'

# Find largest functions (by line count)
pmat extract --list src/parser.rs | jq '.items | sort_by(-.lines) | .[0:5]'

# List all imports
pmat extract --list src/lib.rs | jq '.imports[]'

# Find public functions only
pmat extract --list src/lib.rs | jq '.items[] | select(.visibility == "pub" and .type == "function")'

# Get test module boundary
pmat extract --list src/lib.rs | jq '.cfg_test_line'

Output Format

The output is a JSON object with file-level metadata and an items array:

Top-Level Fields

| Field | Type | Description |
| --- | --- | --- |
| file | string | File path as provided |
| language | string | Detected language (rust, typescript, python, c, cpp, go, lua) |
| imports | string[] | Top-level import/use statements (full text) |
| cfg_test_line | number? | Line where #[cfg(test)] appears (Rust only, absent if none) |
| items | object[] | Extracted code items |

Item Fields

| Field | Type | Description |
| --- | --- | --- |
| name | string | Function/struct/enum/trait name |
| type | string | One of: function, struct, enum, trait, impl, class, module, type_alias |
| start_line | number | First line (1-indexed) |
| end_line | number | Last line (inclusive) |
| lines | number | Total line count (end_line - start_line + 1) |
| visibility | string | Visibility: pub, pub(crate), pub(super), export, or "" (private) |

Visibility by Language

| Language | Public | Crate-scoped | Private |
| --- | --- | --- | --- |
| Rust | "pub" | "pub(crate)", "pub(super)" | "" |
| TypeScript | "export" | - | "" |
| Go | "pub" (uppercase name) | - | "" (lowercase name) |
| Python, C, C++, Lua | - | - | "" (always) |
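
A consumer of the JSON often wants a coarser answer than the raw string. The following is a minimal sketch of one way to fold the table into three buckets; the function name and the bucket labels ("public", "crate", "private") are hypothetical and not part of pmat's output.

```python
# Hypothetical helper: the bucket names are illustrative only.
def visibility_bucket(visibility: str) -> str:
    """Map an item's `visibility` string to a coarse bucket."""
    if visibility in ("pub", "export"):
        return "public"
    if visibility in ("pub(crate)", "pub(super)"):
        return "crate"
    # "" is the documented private marker; anything unrecognized is
    # conservatively treated the same way.
    return "private"


items = [
    {"name": "new", "visibility": "pub"},
    {"name": "helper", "visibility": "pub(crate)"},
    {"name": "evict_expired", "visibility": ""},
]
buckets = {item["name"]: visibility_bucket(item["visibility"]) for item in items}
print(buckets)
```

For Python, C, C++, and Lua the field is always "", so "private" in this sketch only means that no visibility information is available for the item.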

Example Output

$ pmat extract --list src/cache.rs
{
  "file": "src/cache.rs",
  "language": "rust",
  "imports": [
    "use std::collections::HashMap;"
  ],
  "cfg_test_line": 42,
  "items": [
    {
      "name": "Cache",
      "type": "struct",
      "start_line": 4,
      "end_line": 8,
      "lines": 5,
      "visibility": "pub"
    },
    {
      "name": "Cache",
      "type": "impl",
      "start_line": 10,
      "end_line": 30,
      "lines": 21,
      "visibility": ""
    },
    {
      "name": "new",
      "type": "function",
      "start_line": 11,
      "end_line": 17,
      "lines": 7,
      "visibility": "pub"
    },
    {
      "name": "get",
      "type": "function",
      "start_line": 19,
      "end_line": 23,
      "lines": 5,
      "visibility": "pub"
    },
    {
      "name": "evict_expired",
      "type": "function",
      "start_line": 25,
      "end_line": 29,
      "lines": 5,
      "visibility": ""
    }
  ]
}
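
Judging from this example, the items array is flat: the methods at lines 11 to 29 are listed alongside the block at lines 10 to 30 that contains them. The sketch below, with hypothetical helper names, checks the documented lines invariant and recovers the nesting from line ranges alone. It assumes that containment of ranges is a reliable signal of nesting, which the source does not state explicitly.

```python
import json

# Line ranges copied from the example output above; only the fields the
# checks need are kept.
extract = json.loads("""
{
  "items": [
    {"name": "Cache", "start_line": 4, "end_line": 8, "lines": 5},
    {"name": "Cache", "start_line": 10, "end_line": 30, "lines": 21},
    {"name": "new", "start_line": 11, "end_line": 17, "lines": 7},
    {"name": "get", "start_line": 19, "end_line": 23, "lines": 5},
    {"name": "evict_expired", "start_line": 25, "end_line": 29, "lines": 5}
  ]
}
""")
items = extract["items"]


def line_count_mismatches(items):
    """Names of items whose `lines` differs from end_line - start_line + 1."""
    return [
        i["name"]
        for i in items
        if i["lines"] != i["end_line"] - i["start_line"] + 1
    ]


def parent_of(item, items):
    """Smallest item whose line range encloses `item`, or None if top-level."""
    enclosing = [
        other
        for other in items
        if other is not item
        and other["start_line"] <= item["start_line"]
        and item["end_line"] <= other["end_line"]
        and other["lines"] > item["lines"]
    ]
    return min(enclosing, key=lambda o: o["lines"], default=None)


mismatches = line_count_mismatches(items)
# Key each item by (name, start_line) because names can repeat.
parents = {}
for item in items:
    parent = parent_of(item, items)
    parents[(item["name"], item["start_line"])] = (
        parent["start_line"] if parent else None
    )

print(mismatches)
print(parents)
```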

Supported Languages

| Language | Extensions | Imports Detected | Visibility |
| --- | --- | --- | --- |
| Rust | .rs | use, extern crate | pub, pub(crate), pub(super) |
| TypeScript/JavaScript | .ts, .tsx, .js, .jsx, .mjs | import | export |
| Python | .py, .pyi | import, from ... import | - |
| C | .c, .h | #include | - |
| C++ | .cpp, .cc, .cxx, .hpp, .hxx | #include | - |
| Go | .go | import | Uppercase = exported |
| Lua | .lua | - | - |

Use Cases

Automated File Splitting

The main motivation for the richer metadata is splitting large files without breaking compilation.

# Extract boundaries for a large file
pmat extract --list src/big_module.rs > boundaries.json

# A splitting tool can:
# 1. Read imports → prepend to each split part
# 2. Read cfg_test_line → separate test code from production code
# 3. Read visibility → determine which items belong in the public API
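
The three steps above can be sketched as a small planner. This is an illustration under stated assumptions, not pmat's own splitting logic: the Rust source and the extract JSON below are hand-written rather than captured output, plan_parts and render_part are hypothetical names, and the line budget is arbitrary. The sketch also ignores anything between items (attributes, doc comments) and does not add the mod/use wiring that separate files would need.

```python
# Hand-written illustration, not captured pmat output.
SOURCE = """\
use std::collections::HashMap;

pub struct Cache {
    map: HashMap<String, String>,
}

pub fn make() -> Cache {
    Cache { map: HashMap::new() }
}

fn helper() {}

#[cfg(test)]
mod tests {
    #[test]
    fn works() {}
}
"""

EXTRACT = {
    "imports": ["use std::collections::HashMap;"],
    "cfg_test_line": 13,
    "items": [
        {"name": "Cache", "type": "struct", "start_line": 3, "end_line": 5, "lines": 3, "visibility": "pub"},
        {"name": "make", "type": "function", "start_line": 7, "end_line": 9, "lines": 3, "visibility": "pub"},
        {"name": "helper", "type": "function", "start_line": 11, "end_line": 11, "lines": 1, "visibility": ""},
        {"name": "tests", "type": "module", "start_line": 14, "end_line": 17, "lines": 4, "visibility": ""},
        {"name": "works", "type": "function", "start_line": 16, "end_line": 16, "lines": 1, "visibility": ""},
    ],
}


def plan_parts(extract, max_lines):
    """Greedily group top-level production items into parts of <= max_lines."""
    test_line = extract.get("cfg_test_line")
    # Step 2: keep only items that start before the test boundary.
    prod = [
        i for i in extract["items"]
        if test_line is None or i["start_line"] < test_line
    ]
    # Drop items nested inside a larger item; they travel with their parent.
    top = [
        i for i in prod
        if not any(
            o is not i
            and o["start_line"] <= i["start_line"]
            and i["end_line"] <= o["end_line"]
            and o["lines"] > i["lines"]
            for o in prod
        )
    ]
    parts, current, used = [], [], 0
    for item in sorted(top, key=lambda i: i["start_line"]):
        if current and used + item["lines"] > max_lines:
            parts.append(current)
            current, used = [], 0
        current.append(item)
        used += item["lines"]
    if current:
        parts.append(current)
    return parts


def render_part(extract, source_lines, part):
    """Step 1: prepend the file's imports to the source text of each item."""
    header = "\n".join(extract["imports"])
    bodies = [
        "\n".join(source_lines[i["start_line"] - 1 : i["end_line"]])
        for i in part
    ]
    return header + "\n\n" + "\n\n".join(bodies) + "\n"


parts = plan_parts(EXTRACT, max_lines=4)
part_names = [[i["name"] for i in part] for part in parts]
rendered = [render_part(EXTRACT, SOURCE.splitlines(), part) for part in parts]
# Step 3: items that form the public API.
public_names = [i["name"] for part in parts for i in part if i["visibility"] == "pub"]

print(part_names)
print(public_names)
```

Copying every import into every part is the simplest choice; it may leave unused imports in some parts, which a real tool would likely prune.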

Editor Integration

Extract function boundaries for jump-to-definition or outline views:

# Get function list for editor sidebar
pmat extract --list "$FILE" | jq '[.items[] | {name, type, line: .start_line, visibility}]'
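
On the editor side, two lookups cover most of what an outline or jump feature needs: which item contains the cursor, and where a given name is defined. The sketch below uses the line ranges from the example output earlier in this chapter; the function names are hypothetical, and picking the smallest enclosing range as the "innermost" item is an assumption about how nesting shows up in the flat list.

```python
# Line ranges taken from the src/cache.rs example output.
items = [
    {"name": "Cache", "start_line": 4, "end_line": 8, "lines": 5},
    {"name": "Cache", "start_line": 10, "end_line": 30, "lines": 21},
    {"name": "new", "start_line": 11, "end_line": 17, "lines": 7},
    {"name": "get", "start_line": 19, "end_line": 23, "lines": 5},
    {"name": "evict_expired", "start_line": 25, "end_line": 29, "lines": 5},
]


def item_at_line(items, line):
    """Innermost item whose range contains `line` (1-indexed), or None."""
    hits = [i for i in items if i["start_line"] <= line <= i["end_line"]]
    return min(hits, key=lambda i: i["lines"], default=None)


def definition_lines(items, name):
    """All start lines for `name`; names can repeat, so this returns a list."""
    return [i["start_line"] for i in items if i["name"] == name]


cursor_item = item_at_line(items, 20)
print(cursor_item["name"])
print(definition_lines(items, "Cache"))
```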

CI Pipeline — File Complexity Gate

# Fail if any function exceeds 100 lines
MAX_LINES=100
pmat extract --list src/handler.rs | \
  jq --argjson max "$MAX_LINES" '[.items[] | select(.type == "function" and .lines > $max)]' | \
  jq -e 'length == 0' || { echo "Functions exceed $MAX_LINES lines"; exit 1; }
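
When the gate needs a readable report rather than a bare pass/fail, the same check can be written as a small script. This is a sketch only: the sample data is invented, and gate and oversized_functions are hypothetical names. In a pipeline, the extract dictionary would come from json.load(sys.stdin) with the output of pmat extract --list piped in, and the return value of gate would be passed to sys.exit.

```python
# Invented sample data in the documented output shape.
sample = {
    "file": "src/handler.rs",
    "items": [
        {"name": "handle_request", "type": "function", "lines": 140},
        {"name": "parse_headers", "type": "function", "lines": 35},
        {"name": "Router", "type": "struct", "lines": 120},
    ],
}


def oversized_functions(extract, max_lines):
    """Names of function items longer than max_lines; other types are ignored."""
    return [
        i["name"]
        for i in extract["items"]
        if i["type"] == "function" and i["lines"] > max_lines
    ]


def gate(extract, max_lines=100):
    """Print one line per offender and return a process exit code (0 or 1)."""
    offenders = oversized_functions(extract, max_lines)
    for name in offenders:
        print(f"{extract['file']}: function {name} exceeds {max_lines} lines")
    return 1 if offenders else 0


exit_code = gate(sample, max_lines=100)
print(exit_code)
```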

API Surface Analysis

# List all public functions across a crate
# (in bash, ** only recurses into subdirectories when globstar is enabled)
shopt -s globstar
for f in src/**/*.rs; do
  pmat extract --list "$f" | \
    jq --arg file "$f" '.items[] | select(.visibility == "pub" and .type == "function") | {file: $file, name}'
done
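
If the per-file results are collected first, they can be merged into one sorted listing instead of a stream of separate objects. The sketch below assumes the outputs have already been parsed into dictionaries; the data is invented and public_api is a hypothetical name. It also accepts "export" alongside "pub" so that the same function works for TypeScript files, following the visibility values documented above.

```python
# Invented outputs for two files, in the documented shape.
extracts = [
    {
        "file": "src/lib.rs",
        "items": [
            {"name": "run", "type": "function", "visibility": "pub"},
            {"name": "Config", "type": "struct", "visibility": "pub"},
        ],
    },
    {
        "file": "src/cache.rs",
        "items": [
            {"name": "new", "type": "function", "visibility": "pub"},
            {"name": "evict_expired", "type": "function", "visibility": ""},
        ],
    },
]


def public_api(extracts):
    """Sorted (file, name) pairs for public functions across several outputs."""
    return sorted(
        (e["file"], i["name"])
        for e in extracts
        for i in e["items"]
        if i["type"] == "function" and i["visibility"] in ("pub", "export")
    )


surface = public_api(extracts)
print(surface)
```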

Compare File Structure Across Versions

# Before refactoring
pmat extract --list src/old.rs > before.json

# After refactoring
pmat extract --list src/new.rs > after.json

# Diff item names
diff <(jq '.items[].name' before.json) <(jq '.items[].name' after.json)
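
A positional diff of names reports noise when items are merely reordered. As an alternative, the sketch below compares the two outputs as multisets of (type, name) pairs, so only genuine additions and removals show up. The before/after data is invented and structure_diff is a hypothetical name.

```python
from collections import Counter

# Invented before/after outputs, reduced to the fields used here.
before = {
    "items": [
        {"type": "function", "name": "parse"},
        {"type": "function", "name": "validate"},
        {"type": "struct", "name": "Config"},
    ]
}
after = {
    "items": [
        {"type": "struct", "name": "Config"},
        {"type": "function", "name": "parse"},
        {"type": "function", "name": "validate_input"},
        {"type": "function", "name": "load"},
    ]
}


def item_keys(extract):
    """Count (type, name) pairs; a Counter keeps duplicates such as overloads."""
    return Counter((i["type"], i["name"]) for i in extract["items"])


def structure_diff(before, after):
    """Items present on only one side, ignoring order within each file."""
    b, a = item_keys(before), item_keys(after)
    return {
        "removed": sorted((b - a).elements()),
        "added": sorted((a - b).elements()),
    }


changes = structure_diff(before, after)
print(changes)
```

A rename appears as one removal plus one addition; matching those up would need a similarity heuristic that this sketch leaves out.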

Feed into pmat context --format json

The extract command complements context --format json (see Chapter 2). While context provides project-wide structure with quality metrics, extract provides per-file granularity with exact line boundaries:

# Project-level overview
pmat context --format json -p . | jq '.files | length'

# File-level detail
pmat extract --list src/main.rs | jq '.items | length'

Running the Example

cargo run --example extract_demo

This demonstrates extraction across Rust, Python, and TypeScript files with imports, visibility, and test boundary detection.