LLM profiling guides KV cache optimization

This research paper was presented at the 12th International Conference on Learning Representations (opens in new tab) (ICLR 2024), the premier conference dedicated to the advancement of deep learning.

White ICLR logo to the left of the first page of the accepted paper, “Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs” on a purple background.

Large language models (LLMs) rely on complex internal mechanisms that require more memory than what is typically available to operate on standard devices. One such

 

 

To finish reading, please visit source site

Leave a Reply