OIP: Defect Intelligence

“OIP (Organizational Intelligence Plugin) provides ML-powered defect pattern analysis and spectrum-based fault localization.”

Overview

OIP analyzes git history and test coverage to identify defect patterns and locate bugs:

  • SBFL Fault Localization: Tarantula, Ochiai, DStar algorithms
  • Defect Classification: ML-based commit labeling
  • Training Data Extraction: Convert git history to ML training data
  • RAG Enhancement: Knowledge retrieval with trueno-rag
  • Ensemble Models: Weighted multi-model predictions

Installation

# Install from crates.io
cargo install oip

# Verify installation
oip --version
# Output: oip 0.3.1

Basic Usage

Training Data Extraction

Extract defect patterns from git history:

oip extract-training-data --repo /path/to/project --max-commits 500

# Output:
# Training Data Statistics:
#   Total examples: 146
#   Avg confidence: 0.84
#
# Class Distribution:
#   ASTTransform: 53 (36.3%)
#   OwnershipBorrow: 43 (29.5%)
#   ComprehensionBugs: 12 (8.2%)
#   ...

Fault Localization

Find suspicious lines using SBFL:

oip localize \
    --passed-coverage passed.lcov \
    --failed-coverage failed.lcov \
    --formula tarantula \
    --top-n 10

# Output:
# 🎯 Tarantula Hotspot Report
#    Line  | Suspiciousness | Status
#    ------|----------------|--------
#    142   | 0.950          | 🔴 HIGH
#    287   | 0.823          | 🔴 HIGH
#    56    | 0.612          | 🟡 MEDIUM
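
The two coverage inputs are LCOV tracefiles. As a rough illustration of what a localizer reads from them, the following Python sketch collects the executed lines per source file from the standard `SF:`, `DA:` and `end_of_record` records. The function name `covered_lines` and the sample data are hypothetical, and this is not OIP's own parser.

```python
from collections import defaultdict


def covered_lines(lcov_text):
    """Return {source_file: set of line numbers with a nonzero hit count}."""
    covered = defaultdict(set)
    current = None  # source file of the record being read
    for raw in lcov_text.splitlines():
        line = raw.strip()
        if line.startswith("SF:"):
            current = line[3:]
        elif line.startswith("DA:") and current is not None:
            # DA:<line number>,<hit count>[,<checksum>]
            fields = line[3:].split(",")
            line_no, hits = int(fields[0]), int(fields[1])
            if hits > 0:
                covered[current].add(line_no)
        elif line == "end_of_record":
            current = None
    return dict(covered)


# Hypothetical tracefile fragment, for illustration only.
sample = """SF:src/lib.rs
DA:10,3
DA:11,0
DA:12,1
end_of_record
"""
print(covered_lines(sample))
```

Comparing the sets from the passing and failing tracefiles shows which lines only failing tests touch. Those lines are the ones SBFL formulas rank highest.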

SBFL Formulas

OIP supports multiple fault localization formulas:

Formula   | Description       | Best For
----------|-------------------|---------------
Tarantula | Classic SBFL      | General use
Ochiai    | Cosine similarity | High precision
DStar2    | D* with power 2   | Balanced
DStar3    | D* with power 3   | Aggressive

Suspiciousness Calculation

Tarantula formula:

suspiciousness = (failed(line) / total_failed) /
                 ((failed(line) / total_failed) + (passed(line) / total_passed))
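
To make the formulas concrete, here is a minimal Python sketch of Tarantula alongside the textbook definitions of Ochiai and D*. Here `ef` and `ep` are the numbers of failing and passing tests that execute a line. The function names and the zero-denominator handling are assumptions for illustration; OIP's implementation may differ in details such as tie-breaking or normalization.

```python
import math


def tarantula(ef, ep, total_failed, total_passed):
    """Tarantula: share of failing tests relative to failing plus passing share."""
    fail_ratio = ef / total_failed if total_failed else 0.0
    pass_ratio = ep / total_passed if total_passed else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def ochiai(ef, ep, total_failed):
    """Ochiai: ef / sqrt(total_failed * (ef + ep))."""
    denom = math.sqrt(total_failed * (ef + ep))
    return ef / denom if denom else 0.0


def dstar(ef, ep, total_failed, star=2):
    """D*: ef**star / (ep + failing tests that do NOT execute the line)."""
    denom = ep + (total_failed - ef)
    if denom == 0:
        # Line is executed by every failing test and by no passing test.
        return math.inf if ef else 0.0
    return ef ** star / denom


# Hypothetical counts: 2 of 2 failing tests and 1 of 8 passing tests hit the line.
ef, ep, total_failed, total_passed = 2, 1, 2, 8
print(tarantula(ef, ep, total_failed, total_passed))
print(ochiai(ef, ep, total_failed))
print(dstar(ef, ep, total_failed, star=2))
print(dstar(ef, ep, total_failed, star=3))
```

Tarantula and Ochiai stay within 0 to 1, while D* is unbounded, so scores from different formulas are only comparable as rankings.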

Defect Pattern Categories

OIP classifies defects into these categories:

Category                | Description                      | Example
------------------------|----------------------------------|---------------------
TraitBounds             | Missing or incorrect trait bounds | T: Clone + Send
ASTTransform            | Syntax/structure issues          | Macro expansion bugs
OwnershipBorrow         | Ownership/lifetime errors        | Use after move
ConfigurationErrors     | Config/environment issues        | Missing feature flag
ConcurrencyBugs         | Race conditions                  | Data races
SecurityVulnerabilities | Security issues                  | Buffer overflow
TypeErrors              | Type mismatches                  | Wrong generic
MemorySafety            | Memory bugs                      | Dangling pointer

Advanced Features

RAG Enhancement

Use knowledge retrieval for better localization:

oip localize \
    --passed-coverage passed.lcov \
    --failed-coverage failed.lcov \
    --rag \
    --knowledge-base bugs.yaml \
    --fusion rrf
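
Assuming `rrf` stands for reciprocal rank fusion, the merge of the SBFL ranking with a retrieval ranking can be sketched as below. The constant `k=60` is the value commonly used in the literature, and the retrieval ranking is invented for illustration; neither is confirmed as OIP's behavior.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)) over all lists."""
    scores = {}
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=lambda item: scores[item], reverse=True)


sbfl_ranking = [142, 287, 56]       # line numbers ordered by suspiciousness
retrieval_ranking = [287, 56, 142]  # hypothetical ordering from a knowledge base
fused = reciprocal_rank_fusion([sbfl_ranking, retrieval_ranking])
print(fused)
```

Because RRF uses only ranks, it can combine sources whose raw scores are on different scales.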

Ensemble Models

Combine multiple models for higher accuracy:

oip localize \
    --passed-coverage passed.lcov \
    --failed-coverage failed.lcov \
    --ensemble \
    --ensemble-model trained-model.bin \
    --include-churn
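
One simple reading of "weighted multi-model predictions" is a weighted average of per-line scores from each signal. The sketch below assumes that reading. The model names, weights, and churn scores are hypothetical, and the format of `trained-model.bin` is not described here.

```python
def weighted_ensemble(model_scores, weights):
    """Combine per-line scores from several models by weighted average.

    model_scores: {model_name: {line_number: score}}
    weights:      {model_name: weight}
    A line missing from a model counts as score 0.0 for that model.
    """
    total_weight = sum(weights[name] for name in model_scores)
    lines = set().union(*(scores.keys() for scores in model_scores.values()))
    return {
        line: sum(
            weights[name] * scores.get(line, 0.0)
            for name, scores in model_scores.items()
        ) / total_weight
        for line in lines
    }


# Hypothetical inputs: SBFL suspiciousness plus a churn-based signal.
model_scores = {
    "sbfl": {142: 0.95, 287: 0.823},
    "churn": {142: 0.2, 287: 0.9},
}
weights = {"sbfl": 0.75, "churn": 0.25}
combined = weighted_ensemble(model_scores, weights)
print(sorted(combined.items(), key=lambda kv: kv[1], reverse=True))
```

With these made-up numbers, line 287 overtakes line 142 because the churn signal favors it, which is the kind of reordering an ensemble is meant to allow.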

Calibrated Predictions

Get confidence-calibrated outputs:

oip localize \
    --passed-coverage passed.lcov \
    --failed-coverage failed.lcov \
    --calibrated \
    --calibration-model calibration.bin \
    --confidence-threshold 0.7
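
How OIP calibrates scores is not specified here. As one common approach, the sketch below applies a Platt-style logistic mapping to raw scores and then keeps only lines at or above the confidence threshold. The `slope` and `intercept` values are invented; in practice they would be fitted on labeled data.

```python
import math


def platt_calibrate(score, slope, intercept):
    """Map a raw score to a probability with a logistic (Platt-style) function."""
    return 1.0 / (1.0 + math.exp(-(slope * score + intercept)))


def filter_confident(raw_scores, slope, intercept, threshold=0.7):
    """Calibrate every score and keep lines whose probability meets the threshold."""
    calibrated = {
        line: platt_calibrate(score, slope, intercept)
        for line, score in raw_scores.items()
    }
    return {line: p for line, p in calibrated.items() if p >= threshold}


raw_scores = {142: 0.95, 287: 0.823, 56: 0.612}  # scores from the report above
kept = filter_confident(raw_scores, slope=6.0, intercept=-4.0, threshold=0.7)
print(sorted(kept))
```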

Integration with Batuta

OIP integrates with Batuta’s validation phase:

# Batuta can invoke OIP for fault analysis
batuta validate --fault-localize

Comparison with pmat

Capability          | pmat | oip
--------------------|------|-----
SATD Detection      | ✅   | ❌
TDG Scoring         | ✅   | ❌
Complexity Analysis | ✅   | ❌
Fault Localization  | ❌   | ✅
Defect ML           | ❌   | ✅
RAG Enhancement     | ❌   | ✅

Key insight: pmat is for static analysis BEFORE tests run. OIP is for fault analysis AFTER tests fail.

Command Reference

oip [COMMAND] [OPTIONS]

COMMANDS:
    analyze                Analyze GitHub organization
    summarize              Summarize analysis report
    review-pr              Review PR with context
    extract-training-data  Extract training data from git
    train-classifier       Train ML classifier
    export                 Export features
    localize               SBFL fault localization

LOCALIZE OPTIONS:
    --passed-coverage <PATH>   LCOV from passing tests
    --failed-coverage <PATH>   LCOV from failing tests
    --formula <FORMULA>        tarantula, ochiai, dstar2, dstar3
    --top-n <N>                Top suspicious lines
    --rag                      Enable RAG enhancement
    --ensemble                 Use ensemble model
    --calibrated               Calibrated predictions

Version

Current version: 0.3.1
