This project is mirrored from https://github.com/AmadeusChan/Awesome-LLM-System-Papers.
  1. Mar 07, 2025
  2. Mar 03, 2025
    • Request to Add CacheCraft: A Relevant Work on Chunk-Aware KV Cache Reuse for RAG · 3e6f37d6
      skejriwal44 authored
      Thanks for this great list! We'd love to add CacheCraft [PDF], a chunk-aware KV-cache reuse approach for RAG that minimizes redundant computation while preserving generation quality. Our work is concurrent with CacheBlend, with key differences in chunk-level reuse, selective recompute planning, and optimizations designed for real-world production systems. CacheCraft has been accepted at SIGMOD 2025. We are also open-sourcing a vLLM-based extension soon. Results on real RAG traces show strong efficiency gains in production.
  3. Feb 21, 2025
  4. Sep 05, 2024
  5. Aug 30, 2024
  6. Apr 11, 2024
  7. Apr 10, 2024
  8. Apr 07, 2024
  9. Apr 05, 2024
  10. Apr 04, 2024
  11. Mar 30, 2024
  12. Mar 25, 2024
  13. Mar 22, 2024
  14. Mar 15, 2024
  15. Feb 23, 2024
    • fixed a typo · c9328a4b
      AmadeusChan authored
    • Update README.md · 7670100e
      AmadeusChan authored
      Merge the distributed and single-node serving paper lists, since nowadays almost all new systems support multi-GPU processing with tensor or pipeline parallelism. Also added a new paper from Cal.
  16. Feb 22, 2024
  17. Feb 20, 2024
  18. Jan 18, 2024
  19. Jan 04, 2024
  20. Dec 28, 2023
  21. Dec 27, 2023
  22. Dec 24, 2023
  23. Dec 22, 2023
  24. Dec 19, 2023
  25. Nov 29, 2023
  26. Nov 28, 2023
  27. Nov 27, 2023
  28. Nov 15, 2023