In January, Byeongjun Park, an AI researcher, received an unexpected email from colleagues in India. They had discovered an AI-generated manuscript that seemed to borrow ideas from one of Park’s papers—without any credit given.
Curious, Park looked up the manuscript. It had not been formally published, but it was posted online. The work was produced by a tool called The AI Scientist, developed by researchers at Sakana AI in Tokyo. The software aims to automate research, generating ideas, writing code and composing research papers, all labeled as AI-generated, with the ultimate goal of having AI discover new knowledge independently.
Reviewing the manuscript, Park saw that it didn't copy his work outright; it proposed a new structure for diffusion models, the technology behind many image-generating tools. Still, some of its core methods struck him as very similar to his own. "I was taken aback by how closely they mirrored my approach," he said.
The researchers in India, Tarun Gupta and Danish Pruthi, say this isn't an isolated case. In February, they reported finding multiple AI-generated manuscripts that used others' ideas without attribution, even though no text was copied directly. They argue this amounts to a form of unintentional plagiarism by AI tools, which produce ideas that appear new but often recombine existing concepts.
Their findings won an award at a prestigious conference in Vienna, but not everyone agrees with them. The team behind The AI Scientist disputes these claims, stating there was no plagiarism in their case studies. One expert remarked that, in his view, the methods didn’t overlap enough to qualify as plagiarism. Park himself hesitated to label it as such, acknowledging the methodological similarities without claiming intentional wrongdoing.
The broader problem is serious. With thousands of papers published every year, especially in fast-moving fields such as computer science, it is already hard for researchers to keep track of whose innovation is whose. As AI tools become more common, the lines around who deserves credit for an idea risk blurring further. "AI models inherently remix existing works," says Parshin Shojaee, a computer scientist at Virginia Tech.
Debora Weber-Wulff, a plagiarism researcher in Berlin, warns that the problem could worsen because this kind of reuse is hard to detect. Traditional plagiarism involves directly copying text; defining AI-driven reuse of ideas is murkier. "There's no straightforward way to prove idea plagiarism," she explains, which complicates questions of academic integrity.
Gupta and Pruthi became aware of the problem through a 2024 study by Chenglei Si at Stanford University, which compared the novelty of human- and AI-generated research ideas. They argue that, despite the study's novelty checks, some of the AI-generated ideas still lifted concepts from existing papers.
To test their suspicions, Gupta and Pruthi compared AI-generated research proposals against existing work and found significant overlaps. They reported that about 24% of the AI-generated ideas bore a strong resemblance to existing papers, and suggested the true figure could be as high as 36%.
Notably, one AI-generated manuscript, which had already passed peer review at a notable machine-learning workshop, was alleged to borrow heavily from a 2015 work it failed to cite. Authors of the referenced works expressed disappointment at the oversight, saying the AI-generated content was neither novel nor appropriately credited.
Other researchers, such as Ben Hoover of Georgia Tech, view the overlaps as problematic but stop short of calling them plagiarism. They argue that AI-generated papers often lack the depth of the original studies, suggesting the AI's capacity for genuine novelty is inherently limited.
The AI Scientist team strongly contests the plagiarism claims, suggesting that some overlaps stem from shared methods that researchers routinely fail to cite. It is standard practice in academia, they argue, not to mention every related work, though they acknowledge it would have been prudent to credit Park's paper.
This debate raises critical questions about what counts as plagiarism in the AI era. Weber-Wulff argues that intent shouldn't matter in judging academic integrity. "The machine knows no intent," she notes, underscoring the difficulty of attributing ideas generated by AI.
In the rapidly evolving landscape of AI research, ensuring fair credit and originality remains a pressing concern, and as the technology advances, the need for clear guidelines on research integrity grows only more urgent.