This project is mirrored from https://github.com/AmadeusChan/Awesome-LLM-System-Papers.
- Mar 07, 2025
-
-
AmadeusChan authored
Request to Add CacheCraft: A Relevant Work on Chunk-Aware KV Cache Reuse
-
- Mar 03, 2025
-
-
skejriwal44 authored
Thanks for this great list! We’d love to add CacheCraft [PDF]—a chunk-aware KV reuse approach for RAG that minimizes redundant computation while preserving generation quality. Our work is concurrent to CacheBlend, with key differences in chunk-level reuse, selective recompute planning, and optimizations designed for real-world production systems. CacheCraft is accepted at SIGMOD 2025. We’re also open-sourcing a vLLM-based extension soon. Results on real RAG traces show strong efficiency gains in production.
-
- Feb 21, 2025
-
-
AmadeusChan authored
add the LLaMA-Factory project
-
AmadeusChan authored
add deepseek technical reports
-
AmadeusChan authored
-
- Sep 05, 2024
-
-
AmadeusChan authored
Patch 1
-
JYYHH authored
-
JYYHH authored
Add paper: DeFT: Flash Tree-attention with IO-Awareness for Efficient Tree-search-based LLM Inference
-
- Aug 30, 2024
-
-
Zongpu Zhang authored
Add mlc-llm
-
Zongpu Zhang authored
Add mobile papers
-
Zongpu Zhang authored
Add PowerInfer-2 and mlln-NPU
-
- Apr 11, 2024
-
-
AmadeusChan authored
add PatrickStar
-
- Apr 10, 2024
-
-
feifeibear authored
-
- Apr 07, 2024
-
-
AmadeusChan authored
-
AmadeusChan authored
-
AmadeusChan authored
-
- Apr 05, 2024
-
-
AmadeusChan authored
-
- Apr 04, 2024
-
-
AmadeusChan authored
-
- Mar 30, 2024
-
-
AmadeusChan authored
-
- Mar 25, 2024
-
-
AmadeusChan authored
Add SGLang
-
- Mar 22, 2024
-
-
Lianmin Zheng authored
-
AmadeusChan authored
-
- Mar 15, 2024
-
-
AmadeusChan authored
-
- Feb 23, 2024
-
-
AmadeusChan authored
-
AmadeusChan authored
Merge the distributed/single-node serving paper list since nowadays almost all new systems support multi-GPU processing with tensor or pipeline parallelism. Also added a new paper from Cal.
-
- Feb 22, 2024
-
-
AmadeusChan authored
-
- Feb 20, 2024
-
-
AmadeusChan authored
-
- Jan 18, 2024
-
-
AmadeusChan authored
-
AmadeusChan authored
-
- Jan 04, 2024
-
-
AmadeusChan authored
-
- Dec 28, 2023
-
-
AmadeusChan authored
-
- Dec 27, 2023
-
-
AmadeusChan authored
-
- Dec 24, 2023
-
-
AmadeusChan authored
-
- Dec 22, 2023
-
-
AmadeusChan authored
-
- Dec 19, 2023
-
-
AmadeusChan authored
-
- Nov 29, 2023
-
-
AmadeusChan authored
-
- Nov 28, 2023
-
-
AmadeusChan authored
-
AmadeusChan authored
-
- Nov 27, 2023
-
-
AmadeusChan authored
-
- Nov 15, 2023
-
-
AmadeusChan authored
-