Google Research Releases Compression Algorithm TurboQuant to Reduce AI Model Memory Usage
According to foreign media reports, Google Research on Tuesday (the 24th) released TurboQuant, a training-free compression algorithm that can compress the key-value (KV) cache of large language models (LLMs) to 3 bits without degrading model accuracy.

In benchmark tests on Nvidia's (NVDA.US) H100 GPUs, 4-bit TurboQuant computed attention logits up to 8x faster than with unquantized 32-bit keys, while reducing KV cache memory usage by at least 6x.
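To illustrate the general idea behind low-bit KV cache quantization, here is a minimal sketch of uniform per-row 4-bit quantization of a key tensor. This is a generic textbook scheme, not Google's TurboQuant algorithm; all function names and parameters are illustrative assumptions. Storing 4-bit codes instead of 32-bit floats cuts raw storage by 8x, with per-row scale/offset metadata accounting for why practical savings land somewhat lower (the article cites at least 6x).

```python
import numpy as np

def quantize_4bit(x, axis=-1):
    """Uniform min-max quantization to 4 bits (16 levels) per row.
    Generic sketch for illustration only -- not TurboQuant itself."""
    lo = x.min(axis=axis, keepdims=True)
    hi = x.max(axis=axis, keepdims=True)
    scale = (hi - lo) / 15.0 + 1e-12   # 4 bits -> levels 0..15; eps avoids /0
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Map 4-bit codes back to approximate float values."""
    return q.astype(np.float32) * scale + lo

# Toy stand-in for one layer's cached keys: 8 tokens x 64 head dims.
keys = np.random.randn(8, 64).astype(np.float32)
q, scale, lo = quantize_4bit(keys)
recon = dequantize(q, scale, lo)
max_err = np.abs(keys - recon).max()   # bounded by ~scale/2 per row
```

In a real serving stack the 4-bit codes would additionally be packed two-per-byte and the attention kernel would dequantize on the fly, which is where the reported speedup in computing attention logits comes from.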

Memory stocks SanDisk (SNDK.US) and Micron Technology (MU.US) fell 3.5% and 3.4% respectively overnight (the 25th).
AASTOCKS Financial News
Website: www.aastocks.com