FlashAttention
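FlashAttention-2 implementation for memory-efficient attention computation. Memory use grows as O(N) in the sequence length N, rather than the O(N^2) needed to materialize the full attention matrix.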
cargo run --example flash_attention_inference
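To illustrate where the O(N) memory figure comes from, here is a minimal CPU sketch of the tiling and online-softmax idea behind FlashAttention, for a single query vector. It is not this crate's implementation or API: the function names `dot`, `naive_attention`, and `tiled_attention`, and the `block` parameter, are hypothetical and chosen only for this example. The real algorithm also tiles over queries and is designed around GPU on-chip memory, which this sketch ignores.

```rust
/// Dot product of two equal-length vectors.
fn dot(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Reference attention for one query: holds all N scores in memory at once.
fn naive_attention(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let scale = 1.0 / (q.len() as f32).sqrt();
    let scores: Vec<f32> = keys.iter().map(|k| dot(q, k) * scale).collect();
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let weights: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = weights.iter().sum();
    let mut out = vec![0.0f32; values[0].len()];
    for (w, v) in weights.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += w / sum * x;
        }
    }
    out
}

/// Tiled attention for one query (hypothetical helper, assumed for illustration).
/// Keys and values are visited `block` rows at a time, so the score buffer
/// holds at most `block` entries regardless of the sequence length.
fn tiled_attention(q: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>], block: usize) -> Vec<f32> {
    assert!(block > 0 && !keys.is_empty() && keys.len() == values.len());
    let scale = 1.0 / (q.len() as f32).sqrt();
    let mut running_max = f32::NEG_INFINITY; // largest score seen so far
    let mut running_sum = 0.0f32; // softmax denominator relative to running_max
    let mut acc = vec![0.0f32; values[0].len()]; // unnormalized output

    for (k_blk, v_blk) in keys.chunks(block).zip(values.chunks(block)) {
        let scores: Vec<f32> = k_blk.iter().map(|k| dot(q, k) * scale).collect();
        let blk_max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let new_max = running_max.max(blk_max);

        // Rescale earlier partial results to the new maximum.
        // On the first block this is exp(-inf) = 0, which leaves the zeros untouched.
        let correction = (running_max - new_max).exp();
        running_sum *= correction;
        for a in acc.iter_mut() {
            *a *= correction;
        }

        // Fold this block's contribution into the running totals.
        for (s, v) in scores.iter().zip(v_blk) {
            let w = (s - new_max).exp();
            running_sum += w;
            for (a, x) in acc.iter_mut().zip(v) {
                *a += w * x;
            }
        }
        running_max = new_max;
    }

    // Normalize once at the end.
    acc.iter().map(|a| a / running_sum).collect()
}
```

Up to floating-point rounding, `tiled_attention` returns the same vector as `naive_attention` for any block size. The only state carried between blocks is the running maximum, the running sum, and one accumulator per output dimension, which is why the full N-by-N score matrix never has to be stored.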