Researchers Turn to Specialized AI After Hallucination Problems

Technology | By Seo Ji-hye | Seoul Economic Daily

Office worker A recently sought help from generative artificial intelligence while preparing a work report. When A typed "Find papers related to ○○" into the input field, the AI service listed reference materials within seconds. The format appeared perfect, complete with paper titles, authors, and journal names. But when A clicked on the links, confusion set in. The papers would not open—only the message "Page not found (404)" appeared repeatedly. The papers could not be found anywhere on the internet. This is the phenomenon known as "hallucination," where AI generates plausible-sounding falsehoods.

17% of AI Citations Are Fake—Hallucinations Persist Despite Upgrades

Hallucination refers to AI generating facts or information that do not actually exist. A paper titled "The 17% Gap," published in January on the preprint site arXiv by a research team led by Professor Kemal Ilter of Turkey's Izmir Bakircay University, shows that hallucinations appear in a significant portion of results AI shares with users. The team conducted a forensic investigation of 50 AI-field survey papers published between September 2024 and January 2025, containing a total of 5,514 citations. They tracked AI-cited papers using Digital Object Identifier (DOI) lookups, academic database searches, and title similarity calculations.

The verification found that 939 citations, 17% of the total, were "phantom" citations that could not be verified in digital form. Of these, about 5% were completely fabricated, existing nowhere. Another 16% had titles similar to real papers but carried incorrectly recognized DOIs or other unique identifiers, so they could not be connected to the actual papers. The real papers could be found by modifying part of the title or searching by author name, but the citations alone could not lead readers to the original sources.
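The study's three buckets suggest a simple verification recipe: check whether the cited DOI resolves to a paper whose title matches, and if not, fall back to a title-similarity search. The sketch below is a minimal illustration of that idea, not the team's actual pipeline; the in-memory `KNOWN_PAPERS` table stands in for live DOI resolution and academic-database queries, and all names and thresholds are hypothetical.

```python
from difflib import SequenceMatcher

# Toy stand-in for a real academic database; the study used live DOI
# lookups and database searches (entries here are illustrative).
KNOWN_PAPERS = {
    "10.1000/real.001": "Attention Is All You Need",
    "10.1000/real.002": "Deep Residual Learning for Image Recognition",
}

def title_similarity(a: str, b: str) -> float:
    """Normalized similarity between two titles (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify_citation(doi: str, title: str, threshold: float = 0.9) -> str:
    """Classify a citation into the study's three buckets:
    'verified' (DOI resolves and title matches), 'bad-identifier'
    (title matches a real paper but the DOI is wrong), or 'phantom'
    (nothing in the database resembles it)."""
    if doi in KNOWN_PAPERS and title_similarity(title, KNOWN_PAPERS[doi]) >= threshold:
        return "verified"
    # DOI failed: fall back to a fuzzy title search across the database.
    best = max(KNOWN_PAPERS.values(), key=lambda t: title_similarity(title, t))
    if title_similarity(title, best) >= threshold:
        return "bad-identifier"
    return "phantom"

print(classify_citation("10.1000/real.001", "Attention Is All You Need"))   # verified
print(classify_citation("10.9999/fake.123", "Attention Is All You Need"))   # bad-identifier
print(classify_citation("10.9999/fake.456", "Quantum Blockchain Synergy"))  # phantom
```

The "bad-identifier" branch mirrors the 16% case above: the title points to a real paper, but the identifier the AI attached to it leads nowhere.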

The largest portion, about 78%, failed searches due to technical problems such as corrupted text or missing spaces in PDFs. The papers existed, but formatting was damaged during AI's citation generation process. Such errors can largely be recovered by reorganizing information like titles, authors, and years. However, the very need for such additional work demonstrates the limitations of AI-generated citations. These problems continued to appear even as AI models were upgraded over time, indicating that unverifiable citations are not early-stage trial-and-error issues but rather structural problems embedded in current AI-based citation methods.

Fast But Lacking Detail—Causing Confusion in Medicine, Science, and Law

According to the researchers, AI confidently provides incorrect information because "AI does not search for papers but recombines patterns from memory." In other words, the AI cuts corners. Large language models (LLMs) generate sentences by statistically predicting the words that follow a prompt. While they can produce vast amounts of plausible text at high speed, they lack any capability to verify whether a paper actually exists or whether a DOI is correct. This produces results lacking in detail. The researchers described such a model as a "lazy research assistant" that creates superficially complete citations without accurately verifying the specifics.
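The phrase "recombines patterns from memory" can be made concrete with a deliberately tiny toy: a bigram model trained on a handful of made-up paper titles will happily stitch their fragments into a title no one ever wrote, with no notion of whether the result exists. This is a caricature of an LLM, not how one is actually built, and every title below is invented.

```python
import random
from collections import defaultdict

# Invented titles standing in for training data.
corpus = [
    "Deep Learning for Medical Image Analysis",
    "Deep Reinforcement Learning for Robotics",
    "Graph Neural Networks for Medical Diagnosis",
]

# Record which word follows which across the corpus.
bigrams = defaultdict(list)
for title in corpus:
    words = title.split()
    for a, b in zip(words, words[1:]):
        bigrams[a].append(b)

def generate(start: str, length: int = 5, seed: int = 0) -> str:
    """Word-by-word statistically plausible generation: each next word
    has been seen after the previous one somewhere, but the full title
    may exist nowhere."""
    random.seed(seed)
    out = [start]
    for _ in range(length):
        nexts = bigrams.get(out[-1])
        if not nexts:
            break
        out.append(random.choice(nexts))
    return " ".join(out)

print(generate("Deep"))  # plausible-looking, possibly absent from the corpus
```

Every transition the model makes is locally justified by the training data, which is exactly why the output looks credible while the whole may be a phantom.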

Hallucinations are already causing problems across fields. Reports are increasingly common of AI recommending plausible fake papers, stitching non-existent medical journals together with real author names, when users search for medical information, and of lawyers appearing in court citing non-existent case law. In the scientific community, researchers have had papers retracted after building hypotheses on non-existent experimental data or statistics during background research.

Purpose-Built AI Emerges to Filter Out Fakes—Becoming Capable Colleagues

For these reasons, the scientific community is paying more attention to purpose-specific AI than to general-purpose AI like ChatGPT or Perplexity. "OpenScholar," developed jointly by the University of Washington and the Allen Institute for AI (Ai2), is a prime example. OpenScholar operates on a base of more than 45 million open-access papers. It constructs answers from papers it actually retrieves, not from the model's memory. Every sentence the AI writes is linked in real time to an existing paper; clicking the number at the end of a sentence leads directly to that paper's abstract or full text. Citations that cannot be linked are deleted at the generation stage, leaving no room for fake references. The scientific journal Nature recently stated, "While models like GPT-4o can show hallucination rates of up to 90% when handling recent literature, OpenScholar significantly reduced this and improved citation accuracy."
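The "unlinked citations are deleted" idea can be sketched as a post-generation filter: keep only citation markers that point to a document actually returned by retrieval, and drop the rest. This is a simplified illustration of the design principle the article describes, not OpenScholar's real pipeline; the bracket-number citation format is an assumption.

```python
import re

def drop_unlinked_citations(answer: str, retrieved_ids: set) -> str:
    """Keep citation markers like [3] only when they reference a paper
    the retriever actually returned; delete the rest. A minimal sketch
    of retrieval-grounded citation filtering."""
    def check(match):
        return match.group(0) if match.group(1) in retrieved_ids else ""
    return re.sub(r"\[(\d+)\]", check, answer)

retrieved = {"1", "2"}  # IDs of papers the retriever actually found
text = "Transformers scale well [1]. They cure cancer [7]."
print(drop_unlinked_citations(text, retrieved))
# Transformers scale well [1]. They cure cancer .
```

A production system would go further and suppress or rewrite the whole unsupported sentence, but even this crude filter guarantees that every surviving marker resolves to a real retrieved paper.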

"PaperQA," increasingly used among scientists, answers questions only within paper collections users personally select, not from worldwide knowledge. If a user uploads 20 PDF papers, the AI finds answers only within that scope. All responses include page- and paragraph-level sources to block phantom citations.
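The scoping idea is straightforward to sketch: restrict every lookup to the user's uploaded papers and tag each returned passage with its paper, page, and paragraph, so an empty result means "not in your papers" rather than a made-up answer. The data layout and function below are hypothetical illustrations, not PaperQA's actual API.

```python
def answer_from_library(question_terms, library):
    """Search only within a user-selected paper set. `library` maps a
    paper name to a list of pages, each a list of paragraph strings.
    Every hit carries page- and paragraph-level provenance."""
    hits = []
    for paper, pages in library.items():
        for page_no, paragraphs in enumerate(pages, start=1):
            for para_no, text in enumerate(paragraphs, start=1):
                if all(t.lower() in text.lower() for t in question_terms):
                    hits.append({"paper": paper, "page": page_no,
                                 "paragraph": para_no, "text": text})
    return hits  # an empty list means "not found in your papers"

library = {
    "smith2024.pdf": [["Dropout reduces overfitting in small datasets."]],
}
print(answer_from_library(["dropout", "overfitting"], library))
```

Because the search space is closed, the system can refuse to answer instead of hallucinating when the papers are silent on a question.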

AI tools that find answers alongside users, like actual research colleagues, have also emerged. "Elicit," an AI-based literature-review tool, finds relevant papers when users input questions, organizes them in table format, and links each claim to an actual paper source. It notably analyzes paper abstracts and structures research methods, sample sizes, and key results. The key design choice is that the AI organizes existing research rather than generating its own answers.

The academic search service "Consensus" does not have the AI state conclusions definitively. Instead, it analyzes dozens to hundreds of papers related to a question and shows the ratio of supporting, opposing, and inconclusive evidence, with each result linked to the actual paper. The structure lets users see directly how the field is divided rather than hearing the AI's opinion, which likewise reduces the possibility of hallucination.
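The ratio display described above boils down to tallying per-paper stance labels instead of emitting one definitive answer. The sketch below illustrates that aggregation step under assumed label names ("support" / "oppose" / "mixed"); it is not Consensus's actual implementation, and classifying each paper's stance is the hard part being skipped here.

```python
from collections import Counter

def evidence_ratios(labels):
    """Given per-paper stance labels, return the share of each category,
    i.e. 'X% of papers support, Y% oppose, Z% are inconclusive' instead
    of a single definitive conclusion."""
    counts = Counter(labels)
    total = len(labels)
    return {k: counts[k] / total for k in ("support", "oppose", "mixed")}

# 7 supporting papers, 2 opposing, 1 inconclusive.
ratios = evidence_ratios(["support"] * 7 + ["oppose"] * 2 + ["mixed"])
print(ratios)  # {'support': 0.7, 'oppose': 0.2, 'mixed': 0.1}
```

Presenting the split, with each count backed by a linked paper, lets the user judge a divided literature for themselves.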


AI-translated from Korean. Quotes from foreign sources are based on Korean-language reports and may not reflect exact original wording.