Unverified Commit b8b3a43b authored by DefTruth, committed by GitHub


:fire::fire:[FFPA] FFPA: Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, ~1.5x faster than SDPA EA (@DefTruth) (#111)

parent eb6fb10d
Tags: v2.6.10
@@ -228,6 +228,7 @@ Awesome-LLM-Inference: A curated list of [📙Awesome LLM Inference Papers with
|2024.11|🔥🔥[**SageAttention-2**] SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration(@thu-ml)|[[pdf]](https://arxiv.org/pdf/2411.10958)|[[SageAttention]](https://github.com/thu-ml/SageAttention) ![](https://img.shields.io/github/stars/thu-ml/SageAttention) | ⭐️⭐️ |
|2024.11|🔥🔥[**Squeezed Attention**] SQUEEZED ATTENTION: Accelerating Long Context Length LLM Inference(@UC Berkeley) |[[pdf]](https://arxiv.org/pdf/2411.09688)|[[SqueezedAttention]](https://github.com/SqueezeAILab/SqueezedAttention) ![](https://img.shields.io/github/stars/SqueezeAILab/SqueezedAttention) | ⭐️⭐️ |
|2024.12|🔥🔥[**TurboAttention**] TURBOATTENTION: EFFICIENT ATTENTION APPROXIMATION FOR HIGH THROUGHPUTS LLMS(@Microsoft)|[[pdf]](https://arxiv.org/pdf/2412.08585)| ⚠️ |⭐️⭐️ |
+|2025.01|🔥🔥[**FFPA**] FFPA: Yet another Faster Flash Prefill Attention with O(1) SRAM complexity for headdim > 256, ~1.5x faster than SDPA EA (@DefTruth)|[[docs]](https://github.com/DefTruth/faster-prefill-attention)| [[faster-prefill-attention]](https://github.com/DefTruth/faster-prefill-attention) ![](https://img.shields.io/github/stars/DefTruth/faster-prefill-attention)|⭐️⭐️ |
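Note: the FFPA entry added above benchmarks against SDPA EA, PyTorch's memory-efficient attention backend, which (unlike the flash backend) accepts head dims above 256. The sketch below shows only that baseline in the headdim > 256 regime; it is not FFPA's own API (this diff does not include it), and the tensor shapes, warmup loop, and the PyTorch >= 2.3 `torch.nn.attention.sdpa_kernel` context manager are assumptions of this note.

```python
# Minimal sketch (not from the FFPA repo): time the SDPA EA baseline that the
# "~1.5x faster than SDPA EA" claim is measured against. Assumes PyTorch >= 2.3
# and a CUDA GPU; the shapes below are illustrative, not FFPA's benchmark config.
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

B, H, N, D = 1, 8, 4096, 512  # head_dim D > 256: the regime the FFPA entry targets
q, k, v = (torch.randn(B, H, N, D, device="cuda", dtype=torch.half)
           for _ in range(3))

# Pin SDPA to the memory-efficient (EA) backend; the flash backend
# typically rejects head dims above 256.
with sdpa_kernel(SDPBackend.EFFICIENT_ATTENTION):
    for _ in range(3):  # warmup so the timed call excludes kernel compilation/caching
        F.scaled_dot_product_attention(q, k, v)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    out = F.scaled_dot_product_attention(q, k, v)
    end.record()
    torch.cuda.synchronize()
    print(f"SDPA EA, head_dim={D}: {start.elapsed_time(end):.2f} ms")
```

A fair comparison would average many iterations rather than time a single call; the point here is only how to pin SDPA to the EA backend at headdim > 256, the baseline the new entry cites.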