The prevalence of statistical reporting errors in psychology (1985-2013)

Michèle B Nuijten¹, Chris H J Hartgerink², Marcel A L M van Assen², Sacha Epskamp³, Jelte M Wicherts²

Affiliations

¹ Department of Methodology and Statistics, Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, Netherlands. m.b.nuijten@uvt.nl.
² Department of Methodology and Statistics, Tilburg School of Social and Behavioral Sciences, Tilburg University, Tilburg, Netherlands.
³ Psychological Methods, University of Amsterdam, Amsterdam, Netherlands.

PMID: 26497820
PMCID: PMC5101263
DOI: 10.3758/s13428-015-0664-2

Michèle B Nuijten et al. Behav Res Methods. 2016 Dec;48(4):1205-1226. doi: 10.3758/s13428-015-0664-2.

Abstract

This study documents reporting errors in a sample of over 250,000 p-values reported in eight major psychology journals from 1985 until 2013, using the new R package "statcheck." statcheck retrieved null-hypothesis significance testing (NHST) results from over half of the articles from this period. In line with earlier research, we found that half of all published psychology papers that use NHST contained at least one p-value that was inconsistent with its test statistic and degrees of freedom. One in eight papers contained a grossly inconsistent p-value that may have affected the statistical conclusion. In contrast to earlier findings, we found that the average prevalence of inconsistent p-values has been stable over the years or has declined. The prevalence of gross inconsistencies was higher in p-values reported as significant than in p-values reported as nonsignificant. This could indicate a systematic bias in favor of significant results. Possible solutions to the high prevalence of reporting inconsistencies include encouraging data sharing, letting co-authors check results in a so-called "co-pilot model," and using statcheck to flag possible inconsistencies in one's own manuscript or during the review process.
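The core check described above is that a reported p-value should follow from its test statistic and degrees of freedom, and that an inconsistency is "gross" when it may change the statistical conclusion. statcheck itself is an R package; the Python sketch below only illustrates the idea for two-tailed t tests and is not statcheck's implementation or API. All function names (`t_test_p`, `check_t_result`, and the helpers) are hypothetical. The sketch assumes α = .05, treats the reported t value and exact p-value as rounded to their printed decimals, and uses a simplified definition of a gross inconsistency (the reported and recomputed p-values fall on opposite sides of α); statcheck's actual rules may differ in detail. The p-value is computed from the regularized incomplete beta function, using the identity that the two-tailed p of t with df degrees of freedom equals I_{df/(df+t²)}(df/2, 1/2).

```python
import math

# Hypothetical helper names throughout; this is NOT statcheck's R implementation or API.


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),           # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:  # converged after the odd step
            break
    return h


def reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_test_p(t: float, df: float) -> float:
    """Two-tailed p-value P(|T| >= |t|) for Student's t with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def _decimals(s: str) -> int:
    """Number of reported decimals, e.g. '2.20' -> 2, '.04' -> 2."""
    return len(s.split(".")[1]) if "." in s else 0


def check_t_result(t_str: str, df: float, comparison: str, p_str: str, alpha: float = 0.05):
    """Simplified check of 't(df) = t_str, p <comparison> p_str'.

    Returns (inconsistent, gross_inconsistent)."""
    t_val, p_val = abs(float(t_str)), float(p_str)
    # Any t within half a unit of the last reported decimal rounds to t_str.
    half_t = 0.5 * 10 ** -_decimals(t_str)
    p_min = t_test_p(t_val + half_t, df)  # larger |t| gives a smaller p
    p_max = t_test_p(max(t_val - half_t, 0.0), df)
    if comparison == "=":
        half_p = 0.5 * 10 ** -_decimals(p_str)
        consistent = p_min <= p_val + half_p and p_max >= p_val - half_p
        reported_sig = p_val < alpha
    elif comparison == "<":
        consistent = p_min < p_val
        reported_sig = p_val <= alpha
    elif comparison == ">":
        consistent = p_max > p_val
        reported_sig = False  # simplification: treat "p > x" as a nonsignificant claim
    else:
        raise ValueError(f"unsupported comparison: {comparison!r}")
    computed_sig = t_test_p(t_val, df) < alpha
    # Gross: the inconsistency flips the significance decision at alpha.
    gross = (not consistent) and (reported_sig != computed_sig)
    return not consistent, gross


print(check_t_result("1.50", 28, "<", ".05"))  # (True, True): gross inconsistency
print(check_t_result("3.50", 28, "<", ".01"))  # (False, False): consistent
```

Treating the reported values as rounded intervals rather than exact numbers keeps the sketch from flagging p-values that differ from the recomputed value only because the test statistic was rounded in print.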

Keywords: False positives; NHST; Publication bias; Questionable research practices; Reporting errors; Significance; p-values.


Figures

**Fig. 1**
The percentage of articles with American Psychological Association (APA)-reported null-hypothesis significance testing (NHST) results over the years, averaged over all APA journals (*Developmental Psychology* (DP), *Journal of Consulting and Clinical Psychology* (JCCP), *Journal of Experimental Psychology: General* (JEPG), *Journal of Personality and Social Psychology* (JPSP), and *Journal of Applied Psychology* (JAP); dark gray panel), and split up per journal – light gray panels for the APA journals and white panels for the non-APA journals (*Psychological Science* (PS), *Frontiers in Psychology* (FP), and *Public Library of Science* (PLoS)). For each trend we report the unstandardized linear regression coefficient (b) and the coefficient of determination (R²) of the linear trend

**Fig. 2**
The average number of American Psychological Association (APA)-reported null-hypothesis significance testing (NHST) results per article that contains NHST results over the years, averaged over all APA journals (*Developmental Psychology* (DP), *Journal of Consulting and Clinical Psychology* (JCCP), *Journal of Experimental Psychology: General* (JEPG), *Journal of Personality and Social Psychology* (JPSP), and *Journal of Applied Psychology* (JAP); dark gray panel), and split up per journal (light gray panels for the APA journals and white panels for the non-APA journals – *Psychological Science* (PS), *Frontiers in Psychology* (FP), and *Public Library of Science* (PLoS)). For each trend we report the unstandardized linear regression coefficient (b) and the coefficient of determination (R²) of the linear trend

**Fig. 3**
The average percentage of articles within a journal with at least one (gross) inconsistency and the average percentage of (grossly) inconsistent p-values per article, split up by journal. Inconsistencies are depicted in white and gross inconsistencies in grey. For the journals *Journal of Personality and Social Psychology* (JPSP), *Journal of Experimental Psychology: General* (JEPG), *Developmental Psychology* (DP), *Frontiers in Psychology* (FP), *Public Library of Science* (PLoS), *Journal of Consulting and Clinical Psychology* (JCCP), *Psychological Science* (PS), and *Journal of Applied Psychology* (JAP), respectively, the number of articles with null-hypothesis significance testing (NHST) results is 4,346, 821, 2,607, 702, 2,487, 2,413, 1,681, and 1,638, and the average number of NHST results in an article is 23.4, 23.0, 14.4, 14.5, 12.7, 11.4, 9.3, and 9.2

**Fig. 4**
Average percentage of inconsistencies (open circles) and gross inconsistencies (solid circles) in an article over the years averaged over all American Psychological Association (APA) journals (*Developmental Psychology* (DP), *Journal of Consulting and Clinical Psychology* (JCCP), *Journal of Experimental Psychology: General* (JEPG), *Journal of Personality and Social Psychology* (JPSP), and *Journal of Applied Psychology* (JAP); dark gray panel) and split up per journal (light gray panels for the APA journals and white panels for non-APA journals – *Psychological Science* (PS), *Frontiers in Psychology* (FP), and *Public Library of Science* (PLoS)). The unstandardized regression coefficient b and the coefficient of determination R² of the linear trend are shown per journal for both inconsistencies (incons) and gross inconsistencies (gross) over the years

**Fig. 5**
Percentage of articles with at least one inconsistency (open circles) or at least one gross inconsistency (solid circles), split up by journal. The unstandardized regression coefficient b and the coefficient of determination R² of the linear trend are shown per journal for both inconsistencies (incons) and gross inconsistencies (gross) over the years. *APA* American Psychological Association, *DP* Developmental Psychology, *JCCP* Journal of Consulting and Clinical Psychology, *JEPG* Journal of Experimental Psychology: General, *JPSP* Journal of Personality and Social Psychology, *JAP* Journal of Applied Psychology, *PS* Psychological Science, *FP* Frontiers in Psychology, *PLoS* Public Library of Science

**Fig. 6**
The percentage of gross inconsistencies in p-values reported as significant (white bars) and nonsignificant (gray bars), split up by journal. For the journals *Journal of Applied Psychology* (JAP), *Journal of Consulting and Clinical Psychology* (JCCP), *Developmental Psychology* (DP), *Public Library of Science* (PLoS), *Psychological Science* (PS), *Frontiers in Psychology* (FP), *Journal of Personality and Social Psychology* (JPSP), and *Journal of Experimental Psychology: General* (JEPG), respectively, the total number of significant p-values was 11,654, 21,120, 29,962, 22,071, 12,482, 7,377, 78,889, and 14,084, and the total number of nonsignificant p-values was 3,119, 5,558, 6,698, 9,134, 2,936, 2,712, 17,868, and 4,407

**Fig. 7**
The percentage of gross inconsistencies in p-values reported as significant (solid line) and nonsignificant (dotted line), over the years, averaged over journals. The size of the open and solid circles represents the number of significant and nonsignificant p-values in that year, respectively

**Fig. 8**
The total number of downloaded articles and the number of published articles that contain NHST results over the years, averaged over all American Psychological Association (APA) journals (*Developmental Psychology* (DP), *Journal of Consulting and Clinical Psychology* (JCCP), *Journal of Experimental Psychology: General* (JEPG), *Journal of Personality and Social Psychology* (JPSP), and *Journal of Applied Psychology* (JAP); dark gray panel), and split up per journal (light gray panels for the APA journals and white panels for the non-APA journals – *Psychological Science* (PS), *Frontiers in Psychology* (FP), and *Public Library of Science* (PLoS)). Note that the y-axes in the plots for All APA Journals, FP, and PLoS differ from the others and extend to 1,000, 1,050, and 3,750, respectively. The unstandardized regression coefficient ‘b’ and the coefficient of determination ‘R²’ of the linear trend are shown per journal for both the downloaded articles (down) and the articles with null-hypothesis significance testing (NHST) results over the years

**Fig. 9**
The average number of exact and inexact null-hypothesis significance testing (NHST) results per article over the years, averaged over all journals (gray panel), and split up by journal (white panels). The unstandardized regression coefficient ‘b’ and the coefficient of determination ‘R²’ of the linear trend are shown per journal for both exact (ex) and inexact (inex) p-values over the years. *APA* American Psychological Association, *DP* Developmental Psychology, *JCCP* Journal of Consulting and Clinical Psychology, *JEPG* Journal of Experimental Psychology: General, *JPSP* Journal of Personality and Social Psychology, *JAP* Journal of Applied Psychology, *PS* Psychological Science, *FP* Frontiers in Psychology, *PLoS* Public Library of Science
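The figure captions rely on two prevalence measures (the percentage of articles with at least one (gross) inconsistency, and the average percentage of (grossly) inconsistent p-values per article) and summarize each trend over the years with an unstandardized linear regression coefficient b and a coefficient of determination R². The Python sketch below shows how these quantities can be computed; it assumes an ordinary least-squares fit of the yearly values on year, uses made-up toy data, and the function names `prevalence` and `linear_trend` are hypothetical. It is not the authors' analysis code, and their exact aggregation and weighting may differ.

```python
from statistics import mean

# Hypothetical helper names and toy data; not the authors' analysis code.


def prevalence(flags_per_article):
    """flags_per_article: one list per article, one bool per NHST result (True = inconsistent).

    Returns (% of articles with at least one flagged result,
             average % of flagged results per article)."""
    articles = [flags for flags in flags_per_article if flags]  # keep articles with NHST results
    pct_any = 100.0 * sum(any(flags) for flags in articles) / len(articles)
    avg_pct = mean(100.0 * sum(flags) / len(flags) for flags in articles)
    return pct_any, avg_pct


def linear_trend(xs, ys):
    """Ordinary least squares y = a + b*x; returns (b, R^2). Assumes xs and ys are not constant."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - my - b * (x - mx)) ** 2 for x, y in zip(xs, ys))
    return b, 1.0 - ss_res / ss_tot


articles = [[False, True, False, False], [False, False], [True, True], []]
print(prevalence(articles))  # about (66.67, 41.67)

years = [2000, 2001, 2002, 2003]
pct_per_year = [10.0, 14.0, 12.0, 16.0]
print(linear_trend(years, pct_per_year))  # about (1.6, 0.64)
```

The two prevalence measures answer different questions: the first counts an article once no matter how many of its p-values are flagged, while the second reflects how large a share of each article's results is affected.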


Comment in

Stat-checking software stirs up psychology.
Baker M. Nature. 2016 Nov 25;540(7631):151-152. doi: 10.1038/540151a. PMID: 27905454.
