skejriwal44 authored:
Thanks for this great list! We'd love to add CacheCraft [PDF], a chunk-aware KV-cache reuse approach for RAG that minimizes redundant computation while preserving generation quality. Our work is concurrent with CacheBlend. The key differences are chunk-level reuse, selective recompute planning, and optimizations designed for real-world production systems. CacheCraft has been accepted at SIGMOD 2025, and we plan to open-source a vLLM-based extension soon. On real RAG traces from production, it shows strong efficiency gains.