FlashAttention

A FlashAttention-2 implementation for memory-efficient exact attention. It tiles the computation and uses an online softmax so the full N×N score matrix is never materialized, bringing memory usage down to O(N) in sequence length.
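The core idea can be sketched with the online-softmax recurrence that FlashAttention is built on: scores are processed block by block while a running max, running denominator, and un-normalized output accumulator are updated, so earlier contributions are rescaled instead of recomputed. The snippet below is an illustrative single-row, CPU-side sketch (the function name and block layout are hypothetical, not this crate's API):

```rust
// Hypothetical sketch of the online-softmax recurrence behind FlashAttention.
// Processes keys/values in blocks for one query row; the full score vector
// is never stored, only a running max `m`, denominator `l`, and accumulator.
fn online_attention_row(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>], block: usize) -> Vec<f32> {
    let d = q.len();
    let scale = 1.0 / (d as f32).sqrt();
    let mut m = f32::NEG_INFINITY; // running max of scores seen so far
    let mut l = 0.0f32;            // running softmax denominator
    let mut acc = vec![0.0f32; values[0].len()]; // un-normalized output
    for start in (0..keys.len()).step_by(block) {
        let end = (start + block).min(keys.len());
        for j in start..end {
            // scaled dot-product score for key j
            let s: f32 = q.iter().zip(&keys[j]).map(|(a, b)| a * b).sum::<f32>() * scale;
            let m_new = m.max(s);
            let corr = (m - m_new).exp(); // rescale previously accumulated state
            let p = (s - m_new).exp();
            l = l * corr + p;
            for (a, v) in acc.iter_mut().zip(&values[j]) {
                *a = *a * corr + p * v;
            }
            m = m_new;
        }
    }
    // normalize once at the end
    acc.iter().map(|a| a / l).collect()
}

fn main() {
    let q = vec![0.1, 0.2, 0.3, 0.4];
    let keys: Vec<Vec<f32>> = (0..8)
        .map(|i| (0..4).map(|j| ((i + j) as f32) * 0.05).collect())
        .collect();
    let values: Vec<Vec<f32>> = (0..8)
        .map(|i| (0..4).map(|j| ((i * 4 + j) as f32) * 0.1).collect())
        .collect();
    println!("{:?}", online_attention_row(&q, &keys, &values, 2));
}
```

The result matches a standard softmax-then-weighted-sum computation, but memory for the scores stays constant per row regardless of sequence length.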

Run the inference example:

cargo run --example flash_attention_inference