Tags
Tags mark specific points in the project's history as important.
This project is mirrored from https://github.com/DefTruth/Awesome-LLM-Inference.git. Pull mirroring last updated Dec 22, 2024.
v2.6.9 · 6ad7b307 · 🔥🔥 [HADACORE] HADACORE: TENSOR CORE ACCELERATED HADAMARD TRANSFORM KERNEL (#108) · Dec 22, 2024
v2.6.8 · 32fdb843 · 🔥 [BatchLLM] BatchLLM: Optimizing Large Batched LLM Inference with Global... · Dec 08, 2024
v2.6.7 · 9f548f61 · 🔥 [KV Cache Recomputation] Efficient LLM Inference with I/O-Aware Partial KV... · Dec 01, 2024
v2.6.6 · 40292d73 · 🔥 [SparseInfer] SparseInfer: Training-free Prediction of Activation Sparsity... · Nov 25, 2024
v2.6.5 · 06c76ad3 · 🔥🔥 [TP: Comm Compression] Communication Compression for Tensor Parallel LLM Inference (#94) · Nov 18, 2024
v2.6.4 · f3f27a73 · 🔥 [VL-CACHE] VL-CACHE: SPARSITY AND MODALITY-AWARE KV CACHE COMPRESSION FOR... · Nov 12, 2024
v2.6.3 · a854d6cd · 🔥 [Tensor Product] Acceleration of Tensor-Product Operations with Tensor Cores (#90) · Oct 31, 2024
v2.6.2 · 613300d7 · 🔥 [FastAttention] FastAttention: Extend FlashAttention2 to NPUs and... · Oct 28, 2024
v2.6.1 · 7ba03a64 · 🔥 [PARALLELSPEC] PARALLELSPEC: PARALLEL DRAFTER FOR EFFICIENT SPECULATIVE DECODING (#84) · Oct 10, 2024
v2.6 · c3f14099 · Bump up to v2.6 (#79) · Oct 03, 2024
v2.5 · 3e436471 · Bump up to v2.5 (#69) · Sep 26, 2024
v2.4 · 829da5ab · Bump up to v2.4 (#64) · Sep 18, 2024
v2.3 · f0860e84 · Bump up to v2.3 (#61) · Sep 09, 2024
v2.2 · 6d7e9f8a · Bump up to v2.2 (#58) · Sep 04, 2024
v2.1 · 74f887c1 · Bump up to v2.1 (#50) · Aug 28, 2024
v2.0 · 8c0b51da · Bump up to v2.0 (#39) · Aug 19, 2024
v1.9 · e6b8cf4f · Bump up to v1.9 (#32) · Aug 12, 2024
v1.8 · 6bb88189 · Bump up to v1.8 (#27) · Aug 05, 2024
v1.7 · a6c1528e · Bump up to v1.7 · Jul 29, 2024
v1.6 · a1863349 · 🔥🔥 [flute] Fast Matrix Multiplications for Lookup Table-Quantized LLMs (@mit.edu etc.) · Jul 20, 2024