skejriwal44 authored
Thanks for this great list! We’d love to add CacheCraft, a chunk-aware KV reuse approach for RAG that minimizes redundant computation while preserving generation quality. Our work is concurrent with CacheBlend; the key differences are chunk-level reuse, selective recompute planning, and optimizations designed for real-world production systems. CacheCraft has been accepted at SIGMOD 2025.

We also plan to open-source a vLLM-based extension soon. Results on real RAG traces show strong efficiency gains in production. Recent work such as CacheFocus and EPIC builds on related ideas, which highlights the growing relevance of this research direction.
0faf3bf1