Explore supporting vLLM's chunked prefill and paged attention in SageAttention2 to accelerate commercial applications of LLMs.
vLLM provides chunked prefill and paged attention. Do you plan for SageAttention2 to support these acceleration techniques, which rely on block-table-based memory management? Supporting them would greatly accelerate the commercial adoption of LLMs.
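For context, the block-table memory management the question refers to maps a sequence's logical token positions to non-contiguous physical KV-cache blocks. The sketch below is a simplified illustration of that indexing scheme, not vLLM's actual API; `gather_kv`, `BLOCK_SIZE`, and the data layout are all hypothetical.

```python
# Hypothetical sketch of paged-attention block-table indexing
# (illustrative only; not vLLM's real implementation).

BLOCK_SIZE = 16  # tokens stored per physical KV-cache block (assumed)

def gather_kv(block_table, token_positions, kv_blocks):
    """Resolve logical token positions to KV entries via a block table.

    block_table[i] holds the physical block id for logical block i;
    kv_blocks[phys][off] holds the cached KV entry for one token.
    """
    out = []
    for pos in token_positions:
        logical_block = pos // BLOCK_SIZE   # which logical block the token is in
        offset = pos % BLOCK_SIZE           # position within that block
        phys = block_table[logical_block]   # indirection through the block table
        out.append(kv_blocks[phys][offset])
    return out

# Example: a 32-token sequence stored in two non-contiguous physical blocks.
block_table = [5, 2]  # logical block 0 -> physical 5, logical block 1 -> physical 2
kv_blocks = {5: [f"kv{t}" for t in range(16)],
             2: [f"kv{t}" for t in range(16, 32)]}
print(gather_kv(block_table, [0, 15, 16, 31], kv_blocks))
# -> ['kv0', 'kv15', 'kv16', 'kv31']
```

An attention kernel that supports this layout must perform its KV loads through such a table rather than assuming one contiguous KV buffer per sequence, which is the integration work the question asks about.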