
The global semiconductor market is reacting sharply to "TurboQuant," an artificial intelligence memory optimization technology recently released by Google Research. TurboQuant is an algorithm that dramatically compresses the "KV cache," which consumes massive amounts of memory during large language model (LLM) inference, thereby lowering serving costs. The technology has raised hopes of easing the memory bottleneck long seen as an obstacle to the broader rollout of AI services. At the same time, however, concerns that it could reduce memory chip demand and derail the semiconductor super-cycle have roiled markets.
Google unveiled TurboQuant on May 24 (local time). Developed jointly by teams from Google Research, DeepMind and New York University, TurboQuant is drawing attention as a next-generation quantization algorithm that tackles the chronic memory overload problem in AI models. Han In-soo, a professor in the School of Electrical Engineering at KAIST, Korea's top science and technology university, participated as a co-researcher while serving as a visiting researcher at Google Research.
The KV cache is the storage space where AI systems such as ChatGPT retain the context of conversations with users. As inputs grow longer, the amount of information to be stored increases, causing memory usage to surge and inference speed to drop. TurboQuant is designed to shrink this space to as little as one-sixth of its original size with virtually no loss in accuracy. In broad terms, quantization does this by storing the cached data at lower numeric precision, for example replacing 16-bit floating-point values with integers only a few bits wide, so the information the AI must remember fits in a much smaller footprint while remaining quick to retrieve. According to Google, computational speed can be increased up to eight-fold in an H100-class GPU environment.
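To make the mechanism concrete, below is a minimal Python sketch of the general technique: storing cached key/value tensors as low-bit integers with per-channel scales and reconstructing them when attention runs. It is an illustration under simple assumptions, not Google's published TurboQuant algorithm; the function names (quantize_per_channel, dequantize), the 4-bit width and the tensor shapes are all hypothetical choices made for the example.

```python
# Minimal sketch of KV-cache quantization: keep cached tensors as low-bit
# integer codes plus per-channel scales, and dequantize them for attention.
# This illustrates the general idea only; it is not Google's TurboQuant.
import numpy as np

def quantize_per_channel(x: np.ndarray, bits: int = 4):
    """Symmetric round-to-nearest quantization along the last axis."""
    qmax = 2 ** (bits - 1) - 1                       # e.g. 7 for 4-bit codes
    scale = np.abs(x).max(axis=-1, keepdims=True).astype(np.float32) / qmax
    scale = np.where(scale == 0.0, 1.0, scale)       # guard all-zero channels
    codes = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return codes, scale.astype(np.float16)

def dequantize(codes: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """Reconstruct approximate fp16 values for use in attention."""
    return codes.astype(np.float16) * scale

# Toy cache: 8 heads, 4096 cached tokens, head dimension 128, stored in fp16.
keys = np.random.randn(8, 4096, 128).astype(np.float16)

codes, scale = quantize_per_channel(keys, bits=4)
recon = dequantize(codes, scale)

# Real systems pack two 4-bit codes into one byte, so the int8 byte count
# is halved here to reflect that packing.
fp16_bytes = keys.nbytes
quant_bytes = codes.nbytes // 2 + scale.nbytes
print(f"compression ratio: {fp16_bytes / quant_bytes:.1f}x")        # ~3.9x
print(f"max reconstruction error: {np.abs(keys - recon).max():.4f}")
```

In this 4-bit example the cache shrinks to roughly a quarter of its fp16 size. Reaching the one-sixth figure cited for TurboQuant, an average of about 2.7 bits per original 16-bit value, implies more aggressive schemes than this sketch, which is why such methods are judged by how little accuracy they sacrifice rather than by compression ratio alone.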
The market impact was immediate. After TurboQuant's release, global semiconductor stocks including Samsung Electronics (005930.KS) and SK hynix (000660.KS) were shaken, with SK hynix shares falling sharply over two consecutive trading sessions. The sell-off reflected concerns that if memory usage can be cut so drastically, demand for memory chips would ultimately decline as well. The shock was amplified when Matthew Prince, CEO of Cloudflare, wrote on his X account that "this is Google's DeepSeek."
Academics, however, argue that such an interpretation is premature. Kwon Seok-jun, a professor of semiconductor convergence engineering and chemical engineering at Sungkyunkwan University, said, "It is more appropriate to view Google's TurboQuant as a technology that changes the rules of the game for the memory industry rather than one that destroys memory demand." His point is that as less memory is consumed, AI becomes cheaper and faster to run, so high-cost services such as long-prompt processing, multi-agent systems and multimodal analysis will proliferate, ultimately generating new demand.
KAIST offered a similar reading. As semiconductor demand shifts from sheer capacity toward efficiency, the university said, AI will spread more cheaply and rapidly, and the nature of memory demand could grow more sophisticated rather than simply shrink. In particular, if the scope of AI applications widens, from on-device AI in smartphones and home appliances to large-scale data centers, a bigger market could open up beyond any short-term slowdown in demand, analysts noted.
The research is also significant in that Korean researchers directly contributed to the development of a core AI algorithm at a global big tech company. "As AI model capabilities grow, the rapid increase in memory usage has been the biggest limitation. This research presents a new direction for reducing that bottleneck while maintaining accuracy," Prof. Han said. Ultimately, the market's question is shifting from "will less memory be used?" to "how much more will cheaper AI be used?" analysts said.
