LLM profiling guides KV cache optimization
This research paper was presented at the 12th International Conference on Learning Representations (ICLR 2024), the premier conference dedicated to the advancement of deep learning.
Large language models (LLMs) rely on complex internal mechanisms that demand more memory than standard devices typically provide. One such