Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Syscall Tracing

Syscall tracing is the deepest validation method in Phase 4. It uses the Renacer tool to capture system calls made by both the original and transpiled programs, then compares the sequences to verify behavioral equivalence at the OS level.

Why Syscall Tracing

Unit tests verify individual functions. Output comparison verifies stdout. Syscall tracing verifies everything else: file operations, network calls, memory mapping, process management, and signal handling. If two programs make the same system calls in the same order with the same arguments, they exhibit equivalent OS-level behavior.

How It Works

Original program -----> Renacer -----> Syscall trace A
                                              |
Transpiled program ---> Renacer -----> Syscall trace B
                                              |
                                        Compare A vs B
                                              |
                                        Pass / Fail

Renacer intercepts system calls using ptrace (Linux) and records each call with:

  • Syscall number and name (e.g., open, read, write)
  • Arguments (file paths, buffer sizes, flags)
  • Return value
  • Timestamp

Source-Aware Correlation

Renacer provides source-level correlation: each syscall is linked back to the source line that triggered it. This makes debugging mismatches straightforward:

Mismatch at syscall #47:
  Original:   write(1, "Hello, World!\n", 14) = 14    [main.py:12]
  Transpiled: write(1, "Hello World!\n", 13)  = 13    [main.rs:18]
                          ^ missing comma

CLI Usage

# Run syscall validation
batuta validate --trace-syscalls

# Run with verbose trace output
batuta validate --trace-syscalls --verbose

# Compare specific binaries
batuta validate --trace-syscalls \
    --original ./python_app \
    --transpiled ./rust-output/target/release/app

What Is Compared

AspectComparedNotes
Syscall namesYesMust be identical sequence
File pathsYesNormalized to absolute paths
Read/write sizesYesByte counts must match
Return valuesYesErrors must match
TimingNoOnly ordering matters
Thread IDsNoThread scheduling is non-deterministic

Filtering Noise

Some syscalls are non-deterministic by nature (e.g., brk for heap allocation, mmap for library loading). Renacer applies filters to exclude these from comparison:

  • Memory management syscalls (brk, mmap, munmap)
  • Thread scheduling (futex, sched_yield)
  • Signal handling (rt_sigaction, rt_sigprocmask)
  • Clock queries (clock_gettime)

Limitations

Syscall tracing requires:

  • Linux (uses ptrace; macOS and Windows are not supported)
  • Both original and transpiled binaries must be executable
  • Programs must be deterministic (same input produces same syscall sequence)

When the original binary is not available (e.g., the source was Python without a compiled binary), syscall tracing is skipped with a warning rather than a failure.


Navigate: Table of Contents