AI | Will AI Diminish the Rigor of Peer Review?

Apr, 2024

Scholars have increasingly embraced computational aids to speed up writing and analysis. Yet as these systems grow more capable, so does their potential to subtly steer intellectual discussion. A group of researchers from Stanford University has now conducted one of the first large-scale quantitative studies of how widely language models are influencing selected areas of academia.

The team examined reviews submitted to prominent machine learning conferences and papers published in a distinguished family of science journals, both before and after the November 2022 debut of ChatGPT. Using newly developed statistical techniques to estimate the likelihood that a given text was substantially modified or generated by an AI tool, they found clear evidence of LLM use in review content, though to varying degrees across domains.
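At its core, the study's approach fits a mixture of "human" and "AI" word distributions to a corpus and estimates the AI-generated fraction by maximum likelihood, rather than classifying individual documents. A minimal sketch of that idea follows; the function name, parameters, and the toy grid search are illustrative assumptions, not the paper's actual code:

```python
import math

def estimate_ai_fraction(docs_tokens, log_p_human, log_p_ai, grid=200):
    """Grid-search maximum-likelihood estimate of alpha in the mixture
    P(token) = (1 - alpha) * P_human(token) + alpha * P_ai(token).

    docs_tokens: list of token lists (one per document)
    log_p_human / log_p_ai: dicts mapping token -> log-probability under
    each reference distribution (unseen tokens get a small floor).
    """
    best_alpha, best_ll = 0.0, -math.inf
    for i in range(grid + 1):
        alpha = i / grid
        ll = 0.0
        for tokens in docs_tokens:
            for t in tokens:
                p_h = math.exp(log_p_human.get(t, -12.0))
                p_a = math.exp(log_p_ai.get(t, -12.0))
                # Mixture likelihood of this token; tiny constant avoids log(0).
                ll += math.log((1 - alpha) * p_h + alpha * p_a + 1e-300)
        if ll > best_ll:
            best_alpha, best_ll = alpha, ll
    return best_alpha
```

Because the estimate is made over the whole corpus at once, it can detect a population-level shift (e.g. a rise in AI-favored words) even when no single review can be confidently flagged, which is the key design choice behind corpus-level estimates like the 6.5-16.9% figures reported here.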

Most strikingly, their estimates suggest that between 6.5% and 16.9% of sentences in last year's reviews for ICLR, NeurIPS, CoRL, and EMNLP may have been substantially modified by AI beyond basic proofreading. In contrast, they detected no statistically significant shift in reviews for the Nature family journals. These findings fit the expectation that machine learning researchers would be early adopters, given their familiarity with and access to capable tools.

Further analyses shed light on how and when LLM use tends to appear. Reviews submitted close to deadlines, or from referees less inclined to engage with authors in discussion, correlated with higher estimated AI involvement. Reviews lacking scholarly citations, or showing less variation in content across a paper's set of reviews, were likewise associated with stronger AI signals.

While the study cannot establish causal relationships, these patterns raise questions about whether convenience and time pressure may be shaping some referees' judgments and skewing the review process. More broadly, the homogenization of content observed at scale raises concerns about the diversity of expertise and perspective that has long underpinned peer review.

The authors stress that these are initial findings from a new methodology, and that further interdisciplinary work is needed. Nonetheless, their corpus-level estimates and linguistic clues provide a valuable starting point for systematically tracking AI's growing yet subtle influence on scholarly discussion and knowledge formation. Only through transparent measurements like these can we ensure these powerful tools augment, rather than inappropriately direct, human judgment and exchange.

Reference(s)

  1. https://doi.org/10.48550/arXiv.2403.07183


AI | PEER REVIEW | RESEARCH | SOCIETY

About the Author

  • Dilruwan Herath

Dilruwan Herath is a British infectious disease physician and pharmaceutical medical executive with over 25 years of experience. As a doctor, he specialized in infectious diseases and immunology, developing a resolute focus on public health impact. Throughout his career, Dr. Herath has held several senior medical leadership roles in large global pharmaceutical companies, leading transformative clinical changes and ensuring access to innovative medicines. Currently, he serves as an expert member of the Faculty of Pharmaceutical Medicine on its Infectious Disease Committee and continues advising life sciences companies. When not practicing medicine, Dr. Herath enjoys painting landscapes, motorsports, computer programming, and spending time with his young family. He maintains an avid interest in science and technology. He is EIC and founder of DarkDrug.
