This project is mirrored from https://github.com/DefTruth/Awesome-LLM-Inference.git.
Mar 04, 2025
      CacheCraft: A Relevant Work on Chunk-Aware KV Cache Reuse for RAG (#126) · 0faf3bf1
      skejriwal44 authored
      Thanks for this great list! We’d love to add CacheCraft, a chunk-aware KV cache reuse approach for RAG that minimizes redundant computation while preserving generation quality. Our work is concurrent with CacheBlend, with key differences in chunk-level reuse, selective recompute planning, and optimizations designed for real-world production systems. CacheCraft has been accepted at SIGMOD 2025.
      
      We’re also open-sourcing a vLLM-based extension soon. Results on real RAG traces show strong efficiency gains in production. Recent works like CacheFocus and EPIC further build on related ideas, highlighting the growing relevance of this research direction.
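For readers unfamiliar with the idea, below is a minimal, hypothetical sketch of chunk-aware KV cache reuse with selective recompute planning. It is not CacheCraft's implementation; the names (ChunkKVStore, plan_prefill, boundary_tokens) and the norm-free toy "KV tensors" are illustrative assumptions only, standing in for the mechanisms described in the paper.

```python
# Hypothetical sketch of chunk-aware KV cache reuse for RAG (not CacheCraft's code).
# Each retrieved chunk's KV cache is computed once and reused across prompts that
# contain the same chunk; a few tokens at each reused chunk's boundary are scheduled
# for recomputation as a simplified stand-in for selective recompute planning.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ChunkKVStore:
    """Maps a chunk's content hash to its precomputed KV entries (stubbed as floats)."""
    cache: Dict[int, List[float]] = field(default_factory=dict)

    def get_or_compute(self, chunk: str) -> Tuple[List[float], bool]:
        key = hash(chunk)
        if key in self.cache:
            return self.cache[key], True                      # cache hit: no prefill needed
        kv = [float(len(tok)) for tok in chunk.split()]       # stand-in for real KV tensors
        self.cache[key] = kv
        return kv, False                                      # computed and stored for reuse


def plan_prefill(chunks: List[str], store: ChunkKVStore, boundary_tokens: int = 4):
    """Reuse cached chunk KVs and return (reused_kv, recompute_spans), where each span
    marks the first few positions of a reused chunk to recompute so that cross-chunk
    attention is refreshed."""
    reused_kv, recompute_spans, offset = [], [], 0
    for chunk in chunks:
        kv, hit = store.get_or_compute(chunk)
        if hit:
            recompute_spans.append((offset, offset + min(boundary_tokens, len(kv))))
        reused_kv.extend(kv)
        offset += len(kv)
    return reused_kv, recompute_spans


store = ChunkKVStore()
docs = ["retrieval augmented generation reuses chunks", "kv cache reuse saves prefill compute"]
plan_prefill(docs, store)               # first request: both chunks computed and cached
print(plan_prefill(docs[::-1], store))  # second request: both chunks hit, only boundaries recomputed
```

In this toy version, the cache key is the chunk text itself, so any prompt that retrieves the same chunk (even in a different order) skips its prefill; the real system's recompute planning decides which positions to refresh based on quality impact rather than a fixed boundary width.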
Feb 27, 2025
      Add our ICLR2025 work Dynamic-LLaVA (#121) · 4cb87630
      Blank-z0 authored
      Add paper "Dynamic-LLaVA: Efficient Multimodal Large Language Models via Dynamic Vision-language Context Sparsification"
      Dynamic-LLaVA is the first MLLM acceleration framework that simultaneously sparsifies both the vision and the language context, and it unifies inference-efficiency optimization across different MLLM inference modes in a single framework. In practice, Dynamic-LLaVA improves efficiency throughout the entire generation process, with negligible degradation in understanding and generation ability, and in some cases performance gains, compared to full-context inference baselines.
      GitHub: https://github.com/Osilly/dynamic_llava
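To make the idea of sparsifying both contexts concrete, here is a minimal, hypothetical sketch; it is not Dynamic-LLaVA's implementation. The importance score (token hidden-state norm), the keep_ratio parameter, and the sparsify_context function are illustrative assumptions, not the paper's actual predictors or schedule.

```python
# Hypothetical sketch of vision-language context sparsification (not Dynamic-LLaVA's
# code): score every context token and keep only a top fraction per modality before
# the next decoding step, so both image tokens and generated text tokens are pruned.
import numpy as np


def sparsify_context(hidden: np.ndarray, modality: np.ndarray, keep_ratio: float = 0.5):
    """hidden: (seq, dim) token states; modality: (seq,) 0 = vision token, 1 = language token.
    Keep the highest-norm tokens within each modality and return the surviving indices."""
    keep = []
    for m in (0, 1):
        idx = np.where(modality == m)[0]
        if idx.size == 0:
            continue
        scores = np.linalg.norm(hidden[idx], axis=-1)   # proxy importance score (assumption)
        k = max(1, int(keep_ratio * idx.size))
        keep.extend(idx[np.argsort(scores)[-k:]])       # retain the top-k tokens of this modality
    return np.sort(np.array(keep))


rng = np.random.default_rng(0)
hidden = rng.normal(size=(12, 8))
modality = np.array([0] * 8 + [1] * 4)    # 8 vision tokens followed by 4 language tokens
print(sparsify_context(hidden, modality))  # indices of tokens kept for the next decoding step
```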