
KAIST researchers have developed a new artificial intelligence model that generates first-person perspective video from a single ordinary video clip. The technology is expected to be a breakthrough for fields that rely on augmented reality (AR) and virtual reality (VR).
A research team led by Distinguished Professor Joo Jae-geol at KAIST's Kim Jaechul Graduate School of AI announced on the 23rd that it has developed 'EgoX,' an AI model that uses only observer-perspective video to precisely generate scenes as they would have appeared from the subject's own viewpoint.
Until now, obtaining high-quality first-person video required users to wear expensive action cameras or smart glasses, and technical limitations made it difficult to naturally convert already-recorded ordinary footage (third-person perspective, or exocentric, video) into a first-person view.
The technology goes beyond simply rotating the camera view. Its key feature is that it comprehensively understands the subject's position, posture, and the three-dimensional structure of the surrounding space, then reconstructs the first-person video from that analysis.
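Conceptually, that process can be thought of as three stages: estimate the subject's pose, recover the 3D structure of the scene, and render the scene through a virtual camera placed at the subject's head. The sketch below illustrates only this general idea, not EgoX's published architecture; every function in it is a hypothetical placeholder, and in a real system each stage would be a learned model.

```python
import numpy as np

# Minimal conceptual sketch of the pipeline described above: estimate the
# subject's pose, recover 3D scene structure, and render from a virtual
# head-mounted camera. This is NOT EgoX's published method; every function
# here is a hypothetical stand-in for what would be a learned model.

def estimate_head_pose(frame):
    """Placeholder pose stage: head position (3,) and orientation (3, 3)."""
    return np.array([0.0, 1.6, 0.0]), np.eye(3)  # dummy: 1.6 m up, facing +z

def reconstruct_points(frame):
    """Placeholder 3D stage: an (N, 3) point cloud of the surrounding scene."""
    return np.array([[0.0, 1.6, 2.0], [0.5, 1.2, 3.0]])  # dummy geometry

def render_egocentric(points, head_pos, head_rot, f=500.0):
    """Project scene points through a pinhole camera at the subject's head."""
    cam = (points - head_pos) @ head_rot   # world -> head-camera coordinates
    cam = cam[cam[:, 2] > 0]               # keep points in front of the camera
    return f * cam[:, :2] / cam[:, 2:3]    # perspective projection to pixels

frame = np.zeros((480, 640, 3))            # stand-in for a third-person frame
pos, rot = estimate_head_pose(frame)
print(render_egocentric(reconstruct_points(frame), pos, rot))
```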
Existing methods could often convert only still images, or required footage from four or more cameras, and scenes with complex lighting or motion tended to produce awkward results. In contrast, EgoX can generate high-quality first-person video from a single third-person clip.

"By precisely modeling the correlation between head movement and actual field of vision, we succeeded in realistically implementing how the view naturally shifts when turning one's head," the research team said.
The newly developed technology demonstrated stable performance not only in specific settings but across a range of everyday situations, including cooking, exercise, and work tasks. This opens up the possibility of obtaining high-quality first-person data from existing video archives without any wearable device.
EgoX can be applied across a range of industries. In AR, VR, and metaverse applications, it can deepen immersion by converting ordinary video into content that feels like firsthand experience. It is also expected to supply core data for imitation learning in robotics and AI, where robots learn by observing human behavior. New forms of video services are anticipated as well, such as converting sports broadcasts or vlogs into the perspective of the athlete or protagonist.
"This research is significant not merely as video conversion technology, but because artificial intelligence learned and reconstructed human 'vision' and 'spatial understanding,'" Distinguished Professor Joo Jae-geol said. "We expect an environment will emerge where anyone can create and experience immersive content using only previously recorded video."
He added, "KAIST will continue to secure world-class competitiveness in generative AI-based video technology."
The research was conducted with KAIST doctoral candidates Kang Tae-woong and Kim Ki-nam and Seoul National University undergraduate researcher Kim Do-hyun as co-first authors. The paper was pre-released on arXiv on December 9, 2025, drawing significant attention from the AI industry and academia, including major U.S. technology companies. It is scheduled for official presentation at CVPR (the IEEE/CVF Conference on Computer Vision and Pattern Recognition), an international conference to be held in Colorado, USA, on June 3.

