- 08-11-2024
- LLM
A Yale study found that peer reviewers struggle to distinguish AI-generated essays from human writing, and revealed a bias against essays they believed to be machine-generated.
A recent study led by Lee Schwamm at Yale School of Medicine examined how well peer reviewers can distinguish essays written by humans from those generated by large language models (LLMs) such as ChatGPT. The research, published in the journal Stroke, found that reviewers identified authorship correctly only about 50% of the time, no better than flipping a coin. Notably, the AI-generated essays were rated higher in quality than the human-written submissions. Yet when reviewers believed an essay was AI-written, they ranked it as the best only 4% of the time, indicating a bias against content perceived as machine-generated. Schwamm emphasizes the need for policies on AI's role in scientific writing, noting that as LLMs grow more capable, reviewers' ability to detect them is likely to diminish further. He advocates viewing AI as a tool to enhance scientific communication, one that particularly benefits non-native English speakers, rather than as a shortcut that undermines academic integrity. The findings suggest the scientific community must rethink its approach to AI in writing, embracing its potential benefits while ensuring quality and accountability.