
Conclusion: vAttention for Simplified, High-Performance LLM Inference
17 Jun 2025
This conclusion highlights vAttention's success in providing dynamic KV-cache memory management for LLMs without adding software complexity

Related Work: vAttention in LLM Inference Optimization Landscape
17 Jun 2025
Explore how vAttention distinguishes itself from prior LLM memory-management systems (GMLake, PagedAttention) and scheduling systems, and what sets its approach apart

vAttention: Highly Effective in Reducing LLM KV-Cache Fragmentation
17 Jun 2025
This section compares how effectively vAttention and vLLM limit KV-cache memory fragmentation and highlights vAttention's significant portability advantage

vAttention: Efficacy of Physical Memory Allocation for LLMs
17 Jun 2025
This section demonstrates vAttention's ability to allocate physical memory efficiently for LLM serving, showing high allocation bandwidth and effective hiding of CUDA API latency

Boosting LLM Decode Throughput: vAttention vs. PagedAttention
13 Jun 2025
Discover how vAttention's use of FlashAttention's vanilla kernel on a contiguous KV-cache delivers superior decode performance over paged kernels
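
To make the contrast concrete, here is a minimal, hypothetical sketch of the extra indirection a paged KV layout pays on every key read, which a contiguous layout avoids. The kernels below are illustrative stand-ins, not FlashAttention or PagedAttention code; they only model the memory-access pattern.

```cuda
// Illustrative comparison of reading keys from a contiguous vs. a paged KV layout.
// Real attention kernels do far more work; this isolates the addressing difference.
#include <cuda_runtime.h>
#include <numeric>
#include <vector>

// Contiguous layout: token t's key vector starts at k[t * head_dim].
__global__ void read_keys_contiguous(const float* k, float* out, int seq_len, int head_dim) {
    int t = blockIdx.x, d = threadIdx.x;
    if (t < seq_len && d < head_dim)
        out[t * head_dim + d] = k[t * head_dim + d];
}

// Paged layout: token t lives in block block_table[t / block_size], at offset t % block_size.
__global__ void read_keys_paged(const float* k_blocks, const int* block_table, float* out,
                                int seq_len, int head_dim, int block_size) {
    int t = blockIdx.x, d = threadIdx.x;
    if (t < seq_len && d < head_dim) {
        int blk = block_table[t / block_size];   // extra block-table lookup per token
        out[t * head_dim + d] = k_blocks[(blk * block_size + t % block_size) * head_dim + d];
    }
}

int main() {
    const int seq_len = 1024, head_dim = 128, block_size = 16;
    const int num_blocks = seq_len / block_size;

    float *k, *out; int *block_table;
    cudaMalloc(&k,   seq_len * head_dim * sizeof(float));
    cudaMalloc(&out, seq_len * head_dim * sizeof(float));
    cudaMalloc(&block_table, num_blocks * sizeof(int));

    // Identity block table: logical block i maps to physical block i.
    std::vector<int> table(num_blocks);
    std::iota(table.begin(), table.end(), 0);
    cudaMemcpy(block_table, table.data(), num_blocks * sizeof(int), cudaMemcpyHostToDevice);

    read_keys_contiguous<<<seq_len, head_dim>>>(k, out, seq_len, head_dim);
    read_keys_paged<<<seq_len, head_dim>>>(k, block_table, out, seq_len, head_dim, block_size);
    cudaDeviceSynchronize();
    return 0;
}
```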

vAttention Performance & Portability for LLM Prefill Phase
13 Jun 2025
This section highlights vAttention's ability to add dynamic memory allocation support to unmodified FlashAttention and FlashInfer prefill kernels

vAttention System Design: Dynamic KV-Cache with Contiguous Virtual Memory
13 Jun 2025
Explore the detailed system design of vAttention, which leverages separate virtual and physical memory allocation to enable dynamic KV-cache management
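
A minimal sketch of that idea using the CUDA virtual memory management (VMM) driver API is shown below; the page count and the grow_by_one_page helper are illustrative assumptions, not vAttention's actual interface.

```cuda
// Sketch of decoupling virtual and physical KV-cache memory with the CUDA VMM driver API.
// A large virtual range is reserved once; physical pages are mapped in only as tokens arrive,
// so attention kernels always see one contiguous buffer while physical memory grows on demand.
#include <cuda.h>
#include <vector>

int main() {
    cuInit(0);
    CUdevice dev;  cuDeviceGet(&dev, 0);
    CUcontext ctx; cuCtxCreate(&ctx, 0, dev);

    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = dev;

    // Physical pages must be a multiple of the allocation granularity (typically 2 MB).
    size_t gran = 0;
    cuMemGetAllocationGranularity(&gran, &prop, CU_MEM_ALLOC_GRANULARITY_MINIMUM);

    // 1) Reserve a contiguous *virtual* range sized for the maximum context length.
    //    No physical memory is committed yet (64 pages here is an arbitrary example).
    const size_t virt_bytes = 64 * gran;
    CUdeviceptr kv_base = 0;
    cuMemAddressReserve(&kv_base, virt_bytes, gran, 0, 0);

    // 2) As the sequence grows, back the next slice of the range with a physical page.
    std::vector<CUmemGenericAllocationHandle> pages;
    auto grow_by_one_page = [&]() {
        CUmemGenericAllocationHandle h;
        cuMemCreate(&h, gran, &prop, 0);                          // allocate a physical page
        cuMemMap(kv_base + pages.size() * gran, gran, 0, h, 0);   // map it into the range
        CUmemAccessDesc acc = {};
        acc.location = prop.location;
        acc.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
        cuMemSetAccess(kv_base + pages.size() * gran, gran, &acc, 1);
        pages.push_back(h);
    };

    grow_by_one_page();  // kernels keep using the same contiguous pointer: kv_base
    grow_by_one_page();

    cuCtxDestroy(ctx);
    return 0;
}
```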

Hiding Memory Allocation Latency in LLM Serving With vAttention
13 Jun 2025
Explore how vAttention optimizes LLM serving by leveraging predictable memory demand to overlap physical memory allocation with compute
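
A minimal sketch of that overlap follows; map_next_kv_page_if_needed and run_decode_iteration are assumed placeholders, not vAttention's API.

```cuda
// Because decode generates one token per iteration, the physical memory needed for
// step i+1 is known during step i. A helper thread can therefore map the next KV-cache
// page while the GPU executes the current iteration, hiding allocation latency.
#include <thread>

// Placeholder stubs (assumptions): the first would wrap cuMemCreate/cuMemMap calls like
// those in the previous sketch; the second would launch one decode iteration's kernels.
void map_next_kv_page_if_needed() { /* cuMemCreate + cuMemMap for the next page */ }
void run_decode_iteration()       { /* launch attention/FFN kernels for one token */ }

void decode_loop(int num_steps) {
    for (int i = 0; i < num_steps; ++i) {
        // Start mapping memory for the *next* step on a background thread...
        std::thread allocator(map_next_kv_page_if_needed);
        // ...while the current step's kernels run on the GPU.
        run_decode_iteration();
        allocator.join();  // allocation latency is hidden behind compute
    }
}

int main() { decode_loop(8); return 0; }
```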

Evaluation of vAttention for LLM Inference: Prefill and Decode Performance
13 Jun 2025
This section details the evaluation methodology for vAttention, assessing its performance, portability, and memory efficiency in both prefill and decode phases