
The Ministry of Science and ICT (MSIT) and the National Information Society Agency announced they will conduct Korea's first government-wide "AI Training Data Census" starting January 10. The initiative aims to systematically identify and secure high-quality AI training data held by the public sector and establish a virtuous cycle in the AI ecosystem through private sector collaboration.
The survey will cover not only existing AI training data but also data that could be used for AI training after processing. The analysis will focus on factors that directly determine whether the data is usable for AI training, including data type and structure, purpose of collection, and scope of availability.
Based on the survey results, the government will comprehensively review data holdings by institution and sector to identify datasets with high potential for AI training applications. The government plans to select 100 datasets and provide them through the national "AI Training Data Integrated Provision System."
Selected datasets will undergo post-processing including quality improvement and de-identification before release. Data that cannot be disclosed online will be made available through data safe zones.
Until now, AI training data held by public institutions has been managed separately by each agency, making it difficult to comprehensively assess the overall scale and utilization potential. This census aims to address that limitation and help private companies utilize the data for AI training.
The data-sharing platform itself will also be upgraded. The government is enhancing the current "AI Hub" into the "AI Training Data Integrated Provision System" to support data distribution and transactions.
"The key to AI performance and quality lies in the abundance of usable data," said Kim Kyung-man, Director General for AI Policy. "Through this survey, we will systematically identify AI training data assets held by the public sector and continue to develop an integrated provision system that enables convenient utilization."