skejriwal44 authored
Thanks for this great list! We’d love to add CacheCraft [PDF], a chunk-aware KV-cache reuse approach for RAG that minimizes redundant computation while preserving generation quality. Our work is concurrent with CacheBlend; the key differences are chunk-level reuse, selective recompute planning, and optimizations designed for real-world production systems. CacheCraft has been accepted at SIGMOD 2025, and we plan to open-source a vLLM-based extension soon. Results on real RAG traces show strong efficiency gains in production.
3e6f37d6