Science Summaries Are Simpler, but Not by Much—Can AI Do Better?

26 Nov 2024

Author:

(1) David M. Markowitz, Department of Communication, Michigan State University, East Lansing, MI 48824.

Editor's note: This is part 5 of 10 of a paper evaluating the effectiveness of using generative AI to simplify science communication and enhance public trust in science. The rest of the paper can be accessed via the table of links below.

Table of Links

Study 1a: Results

Descriptive statistics for each language dimension and intercorrelations are in Table 1. As expected, lay summaries were linguistically simpler than scientific summaries of the same article, Welch’s t(65793) = 40.62, p < .001, Cohen’s d = 0.31, 95% CI [0.29, 0.32].[2] At the item level of the simplicity index, lay summaries (M = 69.77%, SD = 7.14%) contained more common words than scientific summaries (M = 67.79%, SD = 6.60%), Welch’s t(68741) = 37.79, p < .001, Cohen’s d = 0.29, 95% CI [0.27, 0.30]. Lay summaries (M = 92.34, SD = 7.95) also had a simpler linguistic style than scientific summaries (M = 94.31, SD = 5.19), Welch’s t(59561) = -38.52, p < .001, Cohen’s d = 0.29, 95% CI [0.28, 0.31]. Finally, lay summaries (M = 12.96, SD = 13.93) were also more readable than scientific summaries (M = 12.49, SD = 12.46), Welch’s t(68320) = 4.67, p < .001, Cohen’s d = 0.036, 95% CI [0.02, 0.05].
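For readers who want to see how statistics of this kind are typically computed, the following is a minimal pure-Python sketch of Welch's t-test, Cohen's d, and a bootstrapped confidence interval. It is not the author's analysis code. The function names are illustrative, and the pooled-standard-deviation form of Cohen's d and the percentile bootstrap are assumptions, since the text does not say which variants were used. The demo data are synthetic values drawn from the means and standard deviations reported for common words, not the PNAS corpus.

```python
import math
import random


def mean_var(x):
    """Return the mean and the sample variance (n - 1 denominator)."""
    n = len(x)
    m = math.fsum(x) / n
    return m, math.fsum((v - m) ** 2 for v in x) / (n - 1)


def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    (ma, va), (mb, vb) = mean_var(a), mean_var(b)
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def cohens_d(a, b):
    """Cohen's d with a pooled standard deviation (assumed variant)."""
    (ma, va), (mb, vb) = mean_var(a), mean_var(b)
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var)


def bootstrap_ci(a, b, stat=cohens_d, replicates=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (assumed method): resample each group
    with replacement and take the empirical quantiles of the statistic."""
    rng = random.Random(seed)
    estimates = sorted(
        stat(rng.choices(a, k=len(a)), rng.choices(b, k=len(b)))
        for _ in range(replicates)
    )
    lo = estimates[int((alpha / 2) * replicates)]
    hi = estimates[int((1 - alpha / 2) * replicates) - 1]
    return lo, hi


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-ins, NOT the paper's data: normal draws using the
    # reported M and SD for the percentage of common words.
    lay = [rng.gauss(69.77, 7.14) for _ in range(300)]
    sci = [rng.gauss(67.79, 6.60) for _ in range(300)]
    t, df = welch_t(lay, sci)
    d = cohens_d(lay, sci)
    lo, hi = bootstrap_ci(lay, sci)
    print(f"Welch's t({df:.0f}) = {t:.2f}, d = {d:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

With only 300 synthetic observations per group, the interval will be much wider than the ones reported above, which rest on far larger samples. As a sanity check on the formula, plugging the reported means and standard deviations for common words into the equal-weight pooled form gives (69.77 - 67.79) / sqrt((7.14^2 + 6.60^2) / 2), which is approximately 0.29 and agrees with the reported effect size.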

Together, these results show that lay summaries at PNAS were indeed linguistically simpler than scientific summaries, but the effect sizes were small (Cohen’s d between 0.036 and 0.31), so it is unclear whether general readers would recognize or appreciate the differences. Can generative AI tools write even simpler lay summaries that yield more substantial effect sizes while preserving the core content of each text? In the next study, a random selection of abstracts was submitted to a popular large language model, GPT-4, which was given the same instructions as PNAS authors on how to construct a significance statement.

This paper is available on arXiv under a CC BY 4.0 DEED license.

[2] The 95% confidence intervals were bootstrapped with 5,000 replicates.
