Case Study: CITL Automated Program Repair

Using the Compiler-in-the-Loop Learning module for automated Rust code repair.

Overview

The aprender::citl module provides a complete system for:

Parsing compiler diagnostics
Encoding errors into embeddings for pattern matching
Suggesting and applying fixes
Tracking metrics for continuous improvement
SIMD-accelerated similarity search via trueno
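As a rough illustration of the first step, parsing compiler diagnostics, the sketch below extracts the level, error code, and message from a rustc header line such as `error[E0308]: mismatched types`. `Diagnostic` and `parse_header` are hypothetical names used only here. The real module parses rustc's JSON diagnostics (the Parser (JSON) box in the architecture diagram), which carry the same fields in structured form.

```rust
/// A minimal parsed diagnostic (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct Diagnostic<'a> {
    pub level: &'a str,        // "error" or "warning"
    pub code: Option<&'a str>, // e.g. Some("E0308")
    pub message: &'a str,      // text after the first ": "
}

/// Parses a header line of the form `level[CODE]: message` or `level: message`.
/// Returns None for lines that are not error or warning headers.
pub fn parse_header(line: &str) -> Option<Diagnostic<'_>> {
    let (head, message) = line.split_once(": ")?;
    let (level, code) = match head.split_once('[') {
        // "error[E0308]" -> ("error", "E0308]") -> strip the closing bracket.
        Some((level, rest)) => (level, Some(rest.strip_suffix(']')?)),
        None => (head, None),
    };
    if level != "error" && level != "warning" {
        return None;
    }
    Some(Diagnostic { level, code, message: message.trim() })
}
```

Splitting on the first `": "` keeps any later colons (as in `unused variable: ...`) inside the message.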

Basic Usage

use aprender::citl::{CITL, CITLBuilder, CompilerMode};

// Create CITL instance with Rust compiler
let citl = CITLBuilder::new()
    .with_compiler(CompilerMode::Rustc)
    .max_iterations(5)
    .confidence_threshold(0.7)
    .build()
    .expect("Failed to create CITL instance");

// Source code with a type error
let source = r#"
fn main() {
    let x: i32 = "hello";
}
"#;

// Get fix suggestions
if let Some(suggestion) = citl.suggest_fix(source, source) {
    println!("Suggested fix: {}", suggestion.description);
    println!("Confidence: {:.1}%", suggestion.confidence * 100.0);
}
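The `confidence_threshold(0.7)` setting suggests that suggestions below 70% confidence are discarded. A minimal sketch of that gating, assuming a hypothetical `Suggestion` type whose `description` and `confidence` fields mirror the ones printed above, keeps only candidates at or above the threshold and returns the most confident one:

```rust
/// Hypothetical suggestion type mirroring the fields used in the example above.
#[derive(Debug, Clone, PartialEq)]
pub struct Suggestion {
    pub description: String,
    pub confidence: f32, // assumed to lie in [0.0, 1.0]
}

/// Returns the most confident suggestion at or above `threshold`, if any.
pub fn best_suggestion(candidates: &[Suggestion], threshold: f32) -> Option<&Suggestion> {
    candidates
        .iter()
        .filter(|s| s.confidence >= threshold)
        // total_cmp gives a total order on floats, so max_by needs no unwrap.
        .max_by(|a, b| a.confidence.total_cmp(&b.confidence))
}
```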

Iterative Fix Loop

The fix_all method fixes errors iteratively, recompiling after each applied fix until the code compiles or max_iterations is reached:

use aprender::citl::{CITL, CITLBuilder, CompilerMode, FixResult};

let citl = CITLBuilder::new()
    .with_compiler(CompilerMode::Rustc)
    .max_iterations(10)
    .build()
    .expect("CITL build failed");

let buggy_code = r#"
fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    let result: String = add(1, 2);
    println!("{}", result);
}
"#;

match citl.fix_all(buggy_code) {
    FixResult::Success { fixed_code, iterations, fixes_applied } => {
        println!("Fixed in {} iterations!", iterations);
        println!("Applied {} fixes", fixes_applied.len());
        println!("Fixed code:\n{}", fixed_code);
    }
    FixResult::Failure { last_code, remaining_errors, .. } => {
        println!("Could not fully fix. {} errors remain.", remaining_errors);
    }
}
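The control flow of such a loop can be sketched generically: compile, stop if there are no errors, otherwise apply one fix and recompile, giving up when the iteration budget is spent or no fix applies. In this sketch the compiler and fixer are passed in as closures, and `Outcome` is a hypothetical stand-in for `FixResult`; it illustrates the idea and is not the aprender implementation.

```rust
/// Hypothetical stand-in for `FixResult`.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Success { fixed_code: String, iterations: usize },
    Failure { last_code: String, remaining_errors: usize },
}

/// Compile -> fix -> recompile, applying at most `max_iterations` fixes.
/// `compile` returns the error messages for a source (empty means success);
/// `fix` returns patched source for one error, or None if nothing applies.
pub fn fix_loop<C, F>(source: &str, max_iterations: usize, compile: C, fix: F) -> Outcome
where
    C: Fn(&str) -> Vec<String>,
    F: Fn(&str, &str) -> Option<String>,
{
    let mut code = source.to_string();
    let mut iterations = 0;
    loop {
        let errors = compile(&code);
        if errors.is_empty() {
            return Outcome::Success { fixed_code: code, iterations };
        }
        // Stop when the budget is spent or no fix applies to the first error.
        let patched = if iterations < max_iterations { fix(&code, &errors[0]) } else { None };
        match patched {
            Some(p) => {
                code = p;
                iterations += 1;
            }
            None => {
                return Outcome::Failure { last_code: code, remaining_errors: errors.len() }
            }
        }
    }
}
```

Here `iterations` counts applied fixes; the real `FixResult` may count compile rounds differently.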

Cargo Mode for Dependencies

When code requires external crates, use Cargo mode:

use aprender::citl::{CITL, CITLBuilder, CompilerMode};

let citl = CITLBuilder::new()
    .with_compiler(CompilerMode::Cargo)  // Uses cargo check
    .build()
    .expect("CITL build failed");

let code_with_deps = r#"
use serde::{Serialize, Deserialize};

#[derive(Serialize, Deserialize)]
struct Config {
    name: String,
    value: i32,
}

fn main() {
    let config = Config { name: "test".into(), value: 42 };
    println!("{}", serde_json::to_string(&config).unwrap());
}
"#;

// Cargo mode resolves dependencies automatically
if let Some(fix) = citl.suggest_fix(code_with_deps, code_with_deps) {
    println!("Fix: {}", fix.description);
}
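Whichever compiler mode is used, the tool needs machine-readable diagnostics. `rustc --error-format=json` and `cargo check --message-format=json` are real, documented flags for this. The `Mode` enum and `diagnostic_command` function below are a hypothetical sketch of how a `CompilerMode`-style switch might map onto them. The command is only built here, not spawned.

```rust
use std::process::Command;

/// Hypothetical mirror of `CompilerMode`.
pub enum Mode {
    Rustc,
    Cargo,
}

/// Builds (but does not run) a command that emits JSON diagnostics.
/// `target` is a source file for `Rustc` and a manifest path for `Cargo`.
pub fn diagnostic_command(mode: Mode, target: &str) -> Command {
    match mode {
        Mode::Rustc => {
            let mut cmd = Command::new("rustc");
            // --emit=metadata type-checks without producing a final binary.
            cmd.args(["--error-format=json", "--emit=metadata", target]);
            cmd
        }
        Mode::Cargo => {
            let mut cmd = Command::new("cargo");
            // cargo check resolves the dependencies declared in Cargo.toml.
            cmd.args(["check", "--message-format=json", "--manifest-path", target]);
            cmd
        }
    }
}
```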

Pattern Library

The pattern library stores learned error-fix mappings:

use aprender::citl::{PatternLibrary, ErrorFixPattern, FixTemplate};

let mut library = PatternLibrary::new();

// Add a custom pattern
let pattern = ErrorFixPattern {
    error_code: "E0308".to_string(),
    error_message_pattern: "expected `i32`, found `String`".to_string(),
    context_pattern: "let.*:.*i32.*=".to_string(),
    fix_template: FixTemplate::type_conversion("i32", ".parse().unwrap()"),
    success_count: 0,
    failure_count: 0,
};

library.add_pattern(pattern);

// Save patterns for persistence
library.save("patterns.citl").expect("Save failed");

// Load patterns later
let loaded = PatternLibrary::load("patterns.citl").expect("Load failed");
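The `success_count` and `failure_count` fields suggest that patterns are ranked by track record. One plausible scheme is sketched below. It assumes a simplified `FixPattern` with the same field names as `ErrorFixPattern`, uses plain substring matching in place of regular expressions, and ranks candidates by a Laplace-smoothed success rate (successes + 1) / (attempts + 2), so that untested patterns start at 0.5. The ranking rule is an assumption, not documented aprender behavior.

```rust
/// Simplified, hypothetical version of `ErrorFixPattern`.
#[derive(Debug, Clone)]
pub struct FixPattern {
    pub error_code: String,
    pub error_message_pattern: String,
    pub success_count: u32,
    pub failure_count: u32,
}

impl FixPattern {
    /// Laplace-smoothed success rate; an untested pattern scores 0.5.
    pub fn score(&self) -> f64 {
        let attempts = self.success_count + self.failure_count;
        (f64::from(self.success_count) + 1.0) / (f64::from(attempts) + 2.0)
    }
}

/// Best-scoring pattern whose code matches and whose message pattern
/// occurs as a substring of the diagnostic message.
pub fn best_pattern<'a>(library: &'a [FixPattern], code: &str, message: &str) -> Option<&'a FixPattern> {
    library
        .iter()
        .filter(|p| p.error_code == code && message.contains(p.error_message_pattern.as_str()))
        .max_by(|a, b| a.score().total_cmp(&b.score()))
}
```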

Built-in Fix Templates

The module includes 21 fix templates. Those for the most common error codes are:

E0308 - Type Mismatch

type_annotation - Add explicit type annotation
type_conversion - Add conversion method (.into(), .to_string())
reference_conversion - Convert between & and owned types

E0382 - Use of Moved Value

borrow_instead_of_move - Change to borrow
rc_wrap - Wrap in Rc for shared ownership
arc_wrap - Wrap in Arc for thread-safe sharing

E0277 - Trait Bound Not Satisfied

derive_debug - Add #[derive(Debug)]
derive_clone_trait - Add #[derive(Clone)]
impl_display - Implement Display trait
impl_from - Implement From trait

E0515 - Cannot Return Reference

return_owned - Return owned value instead
return_cloned - Clone and return
use_cow - Use Cow<'a, T> for flexibility
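To make a few of these template names concrete, the plain Rust below shows what code looks like after `borrow_instead_of_move`, `rc_wrap`, `return_owned`, and `use_cow` have been applied. The function names are hypothetical examples, and each comment notes the error the change avoids. These are ordinary hand-written illustrations, not output generated by CITL.

```rust
use std::borrow::Cow;
use std::rc::Rc;

// borrow_instead_of_move (E0382): taking `&str` instead of `String`
// means the caller's value is not moved and stays usable afterwards.
pub fn shout(s: &str) -> String {
    s.to_uppercase()
}

// rc_wrap (E0382): two owners share one allocation via reference counting.
pub fn share(config: String) -> (Rc<String>, Rc<String>) {
    let first = Rc::new(config);
    let second = Rc::clone(&first);
    (first, second)
}

// return_owned (E0515): returning `&str` borrowed from a local String would
// dangle, so the function returns the owned String instead.
pub fn greeting(name: &str) -> String {
    format!("hello, {name}")
}

// use_cow (E0515): borrow when no change is needed, allocate only when it is.
pub fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', "    "))
    } else {
        Cow::Borrowed(input)
    }
}
```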

Metrics Tracking

Track performance with the built-in metrics system:

use aprender::citl::{MetricsTracker, MetricsSummary};
use std::time::Duration;

let mut metrics = MetricsTracker::new();

// Record fix attempts
metrics.record_fix_attempt(true, "E0308");
metrics.record_fix_attempt(true, "E0308");
metrics.record_fix_attempt(false, "E0382");

// Record pattern usage
metrics.record_pattern_use(0, true);  // Pattern 0 succeeded
metrics.record_pattern_use(1, false); // Pattern 1 failed

// Record compilation times
metrics.record_compilation_time(Duration::from_millis(150));
metrics.record_compilation_time(Duration::from_millis(200));

// Record convergence (iterations to fix)
metrics.record_convergence(2, true);  // Fixed in 2 iterations
metrics.record_convergence(5, false); // Failed after 5 iterations

// Get summary
let summary = metrics.summary();
println!("{}", summary.to_report());

Output:

=== CITL Metrics Summary ===

Fix Attempts: 3 (success rate: 66.7%)
Compilations: 2 (avg time: 175.0ms)
Convergence: 50.0% (avg 3.5 iterations)

Most Common Errors:
  E0308: 2
  E0382: 1

Session Duration: 1.2s
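The summary figures follow directly from the recorded events. Two of 3 fix attempts succeeded (66.7%). The two compilations average (150 + 200) / 2 = 175.0 ms. One of 2 convergence records succeeded, with an average of (2 + 5) / 2 = 3.5 iterations. The small aggregator below reproduces those numbers. `Tally` is a hypothetical type written for this illustration, not the `MetricsTracker` source.

```rust
use std::time::Duration;

/// Hypothetical tally mirroring the report fields above.
#[derive(Default)]
pub struct Tally {
    attempts: Vec<bool>,
    compile_times: Vec<Duration>,
    convergence: Vec<(u32, bool)>, // (iterations, converged)
}

impl Tally {
    pub fn record_fix_attempt(&mut self, success: bool) {
        self.attempts.push(success);
    }
    pub fn record_compilation_time(&mut self, d: Duration) {
        self.compile_times.push(d);
    }
    pub fn record_convergence(&mut self, iterations: u32, converged: bool) {
        self.convergence.push((iterations, converged));
    }
    /// Percentage of successful fix attempts (2 of 3 -> 66.7%).
    pub fn success_rate_percent(&self) -> f64 {
        percent(self.attempts.iter().copied())
    }
    /// Mean compilation time in milliseconds ((150 + 200) / 2 -> 175.0).
    pub fn avg_compile_ms(&self) -> f64 {
        if self.compile_times.is_empty() {
            return 0.0;
        }
        let total: Duration = self.compile_times.iter().sum();
        total.as_micros() as f64 / 1000.0 / self.compile_times.len() as f64
    }
    /// Percentage of runs that converged (1 of 2 -> 50.0%).
    pub fn convergence_percent(&self) -> f64 {
        percent(self.convergence.iter().map(|&(_, ok)| ok))
    }
    /// Mean iterations across all runs, converged or not ((2 + 5) / 2 -> 3.5).
    pub fn avg_iterations(&self) -> f64 {
        if self.convergence.is_empty() {
            return 0.0;
        }
        let total: u32 = self.convergence.iter().map(|&(n, _)| n).sum();
        f64::from(total) / self.convergence.len() as f64
    }
}

/// Share of `true` values as a percentage; 0.0 when there are none.
fn percent(flags: impl Iterator<Item = bool>) -> f64 {
    let (mut hits, mut total) = (0u32, 0u32);
    for flag in flags {
        total += 1;
        hits += u32::from(flag);
    }
    if total == 0 { 0.0 } else { 100.0 * f64::from(hits) / f64::from(total) }
}
```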

Error Embedding

The encoder converts errors into embeddings for similarity matching:

use aprender::citl::ErrorEncoder;

let encoder = ErrorEncoder::new();

// Encode a diagnostic
let diagnostic = "error[E0308]: mismatched types, expected i32 found String";
let embedding = encoder.encode(diagnostic, "let x: i32 = get_string();");

// Embeddings can be compared for similarity
// Similar errors produce similar embeddings
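`ErrorEncoder` is listed under Key Types as producing hand-crafted feature embeddings. One simple technique in that family is the hashing trick. Each token of the diagnostic is hashed into one of `DIM` buckets, and the counts are L2-normalized so that cosine similarity reduces to a dot product. The sketch below uses FNV-1a hashing and 256 dimensions; both are assumptions for illustration. It also ignores the source-context argument that `encode` takes above.

```rust
const DIM: usize = 256;

/// 64-bit FNV-1a hash of a byte string.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hashing-trick embedding: lowercase alphanumeric tokens -> bucket counts,
/// then L2-normalized to unit length.
pub fn embed(text: &str) -> Vec<f32> {
    let mut v = vec![0.0f32; DIM];
    for token in text.split(|c: char| !c.is_alphanumeric()).filter(|t| !t.is_empty()) {
        let bucket = (fnv1a(token.to_lowercase().as_bytes()) % DIM as u64) as usize;
        v[bucket] += 1.0;
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        v.iter_mut().for_each(|x| *x /= norm);
    }
    v
}

/// For unit-length vectors, cosine similarity is just the dot product.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}
```

Two E0308 messages share most tokens and so land close together, while an E0382 message shares little more than `error`.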

Integration Test Example

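use aprender::citl::{CITLBuilder, CompilerMode, FixResult};

// Invokes rustc, so the compiler must be on PATH when the test runs.
#[test]
fn test_citl_fixes_type_mismatch() {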
    let citl = CITLBuilder::new()
        .with_compiler(CompilerMode::Rustc)
        .max_iterations(3)
        .build()
        .unwrap();

    let source = r#"
fn main() {
    let x: i32 = "42";
}
"#;

    let result = citl.fix_all(source);
    assert!(matches!(result, FixResult::Success { .. }));
}

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         CITL Module                             │
│                                                                 │
│   ┌───────────┐    ┌───────────┐    ┌───────────────────┐      │
│   │ Compiler  │───►│  Parser   │───►│  Error Encoder    │      │
│   │ Interface │    │ (JSON)    │    │  (Embeddings)     │      │
│   └───────────┘    └───────────┘    └─────────┬─────────┘      │
│                                               │                 │
│                                               ▼                 │
│   ┌───────────┐    ┌───────────┐    ┌───────────────────┐      │
│   │  Apply    │◄───│  Pattern  │◄───│  Pattern Library  │      │
│   │   Fix     │    │  Matcher  │    │  (21 Templates)   │      │
│   └───────────┘    └─────┬─────┘    └───────────────────┘      │
│                          │                                      │
│                          ▼                                      │
│   ┌─────────────────────────────────────────────────────┐      │
│   │                    trueno                            │      │
│   │         SIMD Vector Operations (CPU/GPU)             │      │
│   │    dot() • norm_l2() • sub() • normalize()           │      │
│   └─────────────────────────────────────────────────────┘      │
│                                                                 │
│   ┌─────────────────────────────────────────────────────┐      │
│   │              Metrics Tracker                         │      │
│   │  (Success Rate, Compilation Time, Convergence)       │      │
│   └─────────────────────────────────────────────────────┘      │
└─────────────────────────────────────────────────────────────────┘
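The Pattern Matcher stage amounts to a nearest-neighbor search over the library's embeddings, which is why pattern matching scales linearly with the number of patterns. A brute-force top-k search in plain Rust is sketched below, with scalar loops standing in for trueno's `dot()` and `norm_l2()`. The function names and signatures are hypothetical.

```rust
/// Cosine similarity computed from scratch (inputs need not be normalized).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Brute-force top-k: scores every stored embedding (O(n * d) for n patterns
/// of dimension d) and returns (index, similarity) pairs, best first.
pub fn top_k(query: &[f32], library: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = library
        .iter()
        .enumerate()
        .map(|(i, emb)| (i, cosine(query, emb)))
        .collect();
    scored.sort_by(|a, b| b.1.total_cmp(&a.1)); // descending similarity
    scored.truncate(k);
    scored
}
```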

Neural Encoder (Multi-Language)

For cross-language transpilation (Python→Rust, Julia→Rust, etc.), use the neural encoder:

use aprender::citl::{NeuralErrorEncoder, NeuralEncoderConfig, ContrastiveLoss};

// Create encoder with configuration
let config = NeuralEncoderConfig::small();  // 128-dim embeddings
let encoder = NeuralErrorEncoder::with_config(config);

// Encode errors from different languages
let rust_emb = encoder.encode(
    "E0308: mismatched types, expected i32 found &str",
    "let x: i32 = \"hello\";",
    "rust",
);

let python_emb = encoder.encode(
    "TypeError: expected int, got str",
    "x: int = \"hello\"",
    "python",
);

// Similar type errors cluster together in embedding space

Training with Contrastive Loss

let mut encoder = NeuralErrorEncoder::with_config(NeuralEncoderConfig::default());
encoder.train();  // Enable training mode

// Encode batch of anchors and positives
let anchors = &[
    ("E0308: type mismatch", "let x: i32 = s;", "rust"),
    ("E0382: moved value", "let y = x; let z = x;", "rust"),
];
let positives = &[
    ("E0308: expected i32", "let a: i32 = b;", "rust"),
    ("E0382: borrow after move", "let p = q; let r = q;", "rust"),
];

let anchor_emb = encoder.encode_batch(anchors);
let positive_emb = encoder.encode_batch(positives);

// InfoNCE contrastive loss
let loss_fn = ContrastiveLoss::with_temperature(0.07);
let loss = loss_fn.forward(&anchor_emb, &positive_emb, None);
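For reference, InfoNCE with in-batch negatives treats each anchor's own positive as the correct class among all positives in the batch. With L2-normalized embeddings and temperature τ, the loss for anchor i is -log( exp(a_i·p_i/τ) / Σ_j exp(a_i·p_j/τ) ), averaged over the batch. Passing `None` above presumably means no extra negatives are supplied, so the other positives act as negatives; that reading is an assumption. The plain-Rust sketch below computes the same quantity with a numerically stable log-sum-exp. It is not aprender's implementation.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Mean InfoNCE loss with in-batch negatives. Row i of `positives` is the
/// positive for row i of `anchors`; all other rows act as negatives.
/// Embeddings are assumed to be L2-normalized already.
pub fn info_nce(anchors: &[Vec<f32>], positives: &[Vec<f32>], temperature: f32) -> f32 {
    assert_eq!(anchors.len(), positives.len(), "batch sizes must match");
    if anchors.is_empty() {
        return 0.0;
    }
    let mut total = 0.0f32;
    for (i, a) in anchors.iter().enumerate() {
        // Logits: similarity of anchor i to every positive, scaled by 1/temperature.
        let logits: Vec<f32> = positives.iter().map(|p| dot(a, p) / temperature).collect();
        // Stable log-sum-exp: subtract the max before exponentiating.
        let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
        let log_sum_exp = max + logits.iter().map(|l| (l - max).exp()).sum::<f32>().ln();
        // Cross-entropy with the matching positive as the target class.
        total += log_sum_exp - logits[i];
    }
    total / anchors.len() as f32
}
```

A small temperature such as 0.07 sharpens the softmax, which is why the max-subtraction step matters: logits of 1/0.07 ≈ 14.3 are already large before exponentiation.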

Configuration Options

| Config | Embedding dim | Layers | Encode time |
|--------|---------------|--------|-------------|
| `minimal()` | 64 | 1 | 132 µs |
| `small()` | 128 | 2 | 919 µs |
| `default()` | 256 | 2 | ~2 ms |

Neural Encoder Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│ Tokenizer   │────►│  Embedding  │────►│ Transformer │────►│ L2 Norm     │
│ (8K vocab)  │     │ + Position  │     │ (N layers)  │     │ (SIMD)      │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘

Supported languages: rust, python, julia, typescript, go, java, cpp

Key Types

| Type | Purpose |
|------|---------|
| `CITL` | Main orchestrator for fix operations |
| `CITLBuilder` | Builder pattern for configuration |
| `CompilerMode` | Rustc, Cargo, or CargoCheck |
| `PatternLibrary` | Stores error-fix patterns |
| `FixTemplate` | Describes how to apply a fix |
| `ErrorEncoder` | Hand-crafted feature embeddings |
| `NeuralErrorEncoder` | Transformer-based embeddings (GPU) |
| `ContrastiveLoss` | InfoNCE loss for training |
| `MetricsTracker` | Performance tracking |
| `FixResult` | Success/Failure with details |

Performance Characteristics

CITL uses trueno for SIMD-accelerated vector operations:

| Operation | Time | Throughput |
|-----------|------|------------|
| Cosine similarity (256-dim) | 122 ns | 2.1 Gelem/s |
| Cosine similarity (1024-dim) | 375 ns | 2.7 Gelem/s |
| L2 distance (256-dim) | 147 ns | 1.7 Gelem/s |
| Pattern search (100 patterns) | 9.3 µs | 10.7 Melem/s |
| Batch similarity (500 comparisons) | 40 µs | 12.4 Melem/s |

Complexity:

Pattern matching: O(n·d) for n patterns with embedding dimension d (linear scan)
Embedding generation: O(m) where m = diagnostic length
Fix application: a single string replacement, linear in source length
Persistence: Binary format with CITL magic header

GPU Acceleration:

Enable GPU via trueno's wgpu backend:

cargo build --features gpu

Running Benchmarks

cargo bench --bench citl

Benchmark groups:

citl_cosine_similarity - Core SIMD similarity
citl_l2_distance - Euclidean distance
citl_pattern_search - Library search scaling
citl_error_encoding - Full encoding pipeline
citl_batch_similarity - Batch comparison throughput
citl_neural_encoder - Transformer encoding
citl_neural_config - Config comparison

Build-Time Performance Assertions

Beyond correctness, CITL systems enforce performance contracts at build time using the renacer.toml DSL.

renacer.toml Configuration

[package]
name = "my-transpiled-cli"
version = "0.1.0"

[performance]
# Fail build if startup exceeds 50ms
startup_time_ms = 50

# Fail if binary exceeds 5MB
binary_size_mb = 5

# Memory usage assertions
[performance.memory]
peak_rss_mb = 100
heap_allocations_max = 10000

# Syscall budget per operation
[performance.syscalls]
file_read = 50
file_write = 25
network_connect = 5

# Regression detection
[performance.regression]
baseline = "baseline.json"
max_regression_percent = 5.0
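The `[performance.regression]` table pairs a baseline file with a percentage tolerance. Under the natural reading, a metric fails when it is more than `max_regression_percent` worse than its baseline, with lower values better for times, sizes, and counts. The check then reduces to the arithmetic below. These functions are a hypothetical illustration of that assumed rule, not renacer's code.

```rust
/// Percentage change from `baseline` to `current`; positive means worse for
/// lower-is-better metrics. `baseline` must be nonzero.
pub fn regression_percent(baseline: f64, current: f64) -> f64 {
    (current - baseline) / baseline * 100.0
}

/// True if `current` is at most `max_regression_percent` worse than `baseline`.
pub fn within_budget(baseline: f64, current: f64, max_regression_percent: f64) -> bool {
    regression_percent(baseline, current) <= max_regression_percent
}
```

With a 23 ms baseline and a 5.0% tolerance, 24 ms (+4.3%) passes, while 25 ms (+8.7%) fails.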

Build-Time Validation

# Run performance assertions during build
cargo build --release

# renacer validates assertions automatically
[PASS] startup_time: 23ms (limit: 50ms)
[PASS] binary_size: 2.1MB (limit: 5MB)
[PASS] peak_rss: 24MB (limit: 100MB)
[PASS] syscalls/file_read: 12 (limit: 50)
[FAIL] syscalls/network_connect: 8 (limit: 5)

error: Performance assertion failed
  --> renacer.toml:21:1
   |
21 | network_connect = 5
   | ^^^^^^^^^^^^^^^^^^^ actual: 8, limit: 5
   |
   = help: Consider batching network operations or using connection pooling

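Each line of that output compares one measurement with one configured upper limit. A minimal sketch of the comparison and report formatting is shown below. `Assertion` and `validate` are hypothetical names, and the sample values in the test are taken from the example output above.

```rust
/// One performance assertion: a named measurement against an upper limit.
pub struct Assertion<'a> {
    pub name: &'a str,
    pub actual: f64,
    pub limit: f64,
    pub unit: &'a str,
}

impl Assertion<'_> {
    pub fn passed(&self) -> bool {
        self.actual <= self.limit
    }

    /// Formats a line in the style of the output above, e.g.
    /// "[PASS] startup_time: 23ms (limit: 50ms)".
    pub fn report(&self) -> String {
        let status = if self.passed() { "PASS" } else { "FAIL" };
        format!(
            "[{status}] {}: {}{} (limit: {}{})",
            self.name, self.actual, self.unit, self.limit, self.unit
        )
    }
}

/// Returns the report lines and whether every assertion passed.
pub fn validate(assertions: &[Assertion<'_>]) -> (Vec<String>, bool) {
    let lines = assertions.iter().map(|a| a.report()).collect();
    let ok = assertions.iter().all(|a| a.passed());
    (lines, ok)
}
```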
Real-World Performance Improvements

The reprorusted-python-cli project reports the following measurements for a Python CLI transpiled to Rust through CITL and held to performance assertions:

┌─────────────────────────────────────────────────────────────────┐
│           REPRORUSTED-PYTHON-CLI BENCHMARK RESULTS              │
│                                                                 │
│   Operation          Python      Rust        Improvement        │
│   ────────────────   ──────      ────        ───────────        │
│   CSV parse (10MB)   2.3s        0.08s       28.7× faster       │
│   JSON serialize     890ms       31ms        28.7× faster       │
│   Regex matching     1.2s        0.11s       10.9× faster       │
│   HTTP requests      4.5s        0.42s       10.7× faster       │
│                                                                 │
│   Resource Usage:                                               │
│   Total syscalls     185,432     10,073      18.4× fewer        │
│   Memory allocs      45,231      2,891       15.6× fewer        │
│   Peak memory        127.4MB     23.8MB      5.4× smaller       │
│                                                                 │
│   Binary Size:       N/A         2.1MB       (static linked)    │
│   Startup Time:      ~500ms      23ms        21.7× faster       │
└─────────────────────────────────────────────────────────────────┘

Syscall Budget Enforcement

The DSL supports fine-grained syscall budgets:

[performance.syscalls]
# I/O operations
read = 100
write = 50
open = 20
close = 20

# Memory operations
mmap = 10
munmap = 10
brk = 5

# Process operations
clone = 2
execve = 1
fork = 0  # Forbidden

# Network operations
socket = 5
connect = 5
sendto = 100
recvfrom = 100
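Given a syscall trace, enforcing a budget like this is a count-and-compare step, with `0` acting as an outright ban. The sketch below assumes the trace has already been reduced to a list of syscall names and the budget to a name-to-limit map. Both data shapes and the `over_budget` helper are hypothetical. Syscalls with no budget entry are treated as unlimited, which is one possible policy among several.

```rust
use std::collections::{BTreeMap, HashMap};

/// Counts syscalls in `trace` and returns (name, count, limit) for every
/// syscall whose count exceeds its budget, sorted by name.
/// A limit of 0 forbids the call; syscalls absent from `budget` are unlimited.
pub fn over_budget<'a>(trace: &[&'a str], budget: &HashMap<&str, u32>) -> Vec<(&'a str, u32, u32)> {
    let mut counts: BTreeMap<&'a str, u32> = BTreeMap::new();
    for &name in trace {
        *counts.entry(name).or_insert(0) += 1;
    }
    counts
        .into_iter()
        .filter_map(|(name, count)| {
            let limit = *budget.get(name)?; // skip syscalls without a budget
            (count > limit).then_some((name, count, limit))
        })
        .collect()
}
```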

Integration with CI/CD

# .github/workflows/performance.yml
name: Performance Gates

on: [push, pull_request]

jobs:
  performance:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4

      - name: Build with assertions
        run: cargo build --release

      - name: Run renacer validation
        run: |
          renacer validate --config renacer.toml
          renacer compare --baseline baseline.json --report pr-perf.md

      - name: Upload performance report
        uses: actions/upload-artifact@v4
        with:
          name: performance-report
          path: pr-perf.md

      - name: Comment on PR
        if: github.event_name == 'pull_request'
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const report = fs.readFileSync('pr-perf.md', 'utf8');
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: report
            });

Profiling Integration

Use renacer with profiling tools for detailed analysis:

# Generate syscall trace
renacer profile --trace syscalls ./target/release/my-cli

# Analyze allocation patterns
renacer profile --trace allocations ./target/release/my-cli

# Compare against baseline
renacer diff baseline.trace current.trace --format markdown

Output:

## Syscall Comparison

| Syscall | Baseline | Current | Delta |
|---------|----------|---------|-------|
| read    | 45       | 12      | -73%  |
| write   | 23       | 8       | -65%  |
| mmap    | 156      | 4       | -97%  |
| **Total** | **1,203** | **89** | **-93%** |
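The Delta column is the relative change (current - baseline) / baseline, rounded to a whole percent. For `mmap`, (4 - 156) / 156 ≈ -97.4%, shown as -97%. The helper functions below format a row the same way; their names are hypothetical.

```rust
/// Relative change from `baseline` to `current` as a whole percent, e.g. "-73%".
/// `baseline` must be nonzero.
pub fn delta(baseline: u64, current: u64) -> String {
    let pct = (current as f64 - baseline as f64) / baseline as f64 * 100.0;
    format!("{pct:.0}%")
}

/// One Markdown table row in the layout of the comparison above.
pub fn row(name: &str, baseline: u64, current: u64) -> String {
    format!("| {name} | {baseline} | {current} | {} |", delta(baseline, current))
}
```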

EXTREME TDD - The Aprender Guide to Zero-Defect Machine Learning