21 August 2023

The Null Result Penalty

Felix Chopra

Studies with null results are perceived to be less publishable, of lower quality, less important, and less precisely estimated than studies with large and statistically significant results, even when holding constant all other study features, including the sample size and the precision of the estimates.

Publication Bias
Scientists test hypotheses with empirical evidence (Popper, 1934). This evidence accumulates with the publication of studies in scientific journals. The expansion of scientific knowledge thus requires a publication system that evaluates studies without systematic bias. Yet, there are growing concerns about publication bias in scientific studies (Brodeur et al., 2016; Simonsohn et al., 2014). Such publication bias could arise from the publication system punishing research papers with small effects that are not statistically significant. The resulting selection could lead to biased estimates and misleading confidence intervals in published studies (Andrews and Kasy, 2019).

Large-Scale Survey with Academic Economists
In a recently published paper (Chopra et al., 2023), we examine whether there is a penalty in the publication system for research studies with null results and what mechanisms lie behind the penalty. To this end, we conduct experiments with about 500 economists from the top 200 economics departments in the world.

The academic researchers in our sample have vast experience as both producers and evaluators of academic research. For example, 12.7% of our respondents are associate editors of scientific journals, and the median researcher has an H-index of 11.5 and 845 citations on Google Scholar. This allows us to study how experienced researchers in the field of economics evaluate research studies.

In the experiment itself, these researchers were given the descriptions of four hypothetical research studies. Each description was based on an actual research study by economists, but we modified some details for the purpose of our experiment. The description of the study included information about the research question, the experimental design (including the sample size and the control group mean), and the main finding of the study.

Our main intervention varies the statistical significance of the main finding of a research study, holding all other features of the study constant: We randomized whether the point estimate associated with the main finding of the study is large (and statistically significant) or close to zero (and thus not statistically significant). Importantly, in both cases, we keep the standard error of the point estimate identical, which allows us to hold the statistical precision of the estimate constant.
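To make this manipulation concrete, the following sketch (with purely hypothetical numbers, not taken from the study) shows how two point estimates with an identical standard error can differ in statistical significance under a standard two-sided z-test:

```python
from math import erf, sqrt

def p_value_two_sided(estimate, se):
    """Two-sided p-value for H0: effect = 0, using a normal approximation."""
    z = abs(estimate / se)
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))

# Hypothetical numbers for illustration only (not from the study):
se = 0.05
large_effect = 0.15  # "large" point estimate
null_effect = 0.01   # point estimate close to zero

p_large = p_value_two_sided(large_effect, se)  # z = 3.0 -> p ~ 0.003
p_null = p_value_two_sided(null_effect, se)    # z = 0.2 -> p ~ 0.84

print(f"large effect: p = {p_large:.3f}")  # statistically significant at 5%
print(f"null effect:  p = {p_null:.3f}")   # not statistically significant
```

Because the standard error is the same in both arms, any difference in how the two studies are evaluated reflects the point estimate and its significance, not the precision of the evidence.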

How does the statistical significance of a research study’s main finding affect researchers’ perceptions and evaluations of the study? To find out, we asked our respondents how likely they think it is that the research study would be published in a specific journal if it were submitted there. The journal was either a general interest journal (like the Review of Economic Studies) or an appropriate top field journal (like the Journal of Economic Growth). In addition, we measured their perception of the quality and importance of the research study.

Is there a null result penalty?
We find evidence for a substantial perceived penalty against null results. The researchers in our sample think that research studies with null results have a 14.1 percentage points lower chance of being published (Panel A of Figure 1). This effect corresponds to a 24.9% decrease relative to the scenario where the study at hand would have yielded a statistically significant finding.
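As a back-of-the-envelope check, the two figures above jointly imply the baseline publication probability in the significant-result scenario:

```python
# Numbers reported in the study:
penalty_pp = 14.1          # drop in perceived publication chance, in percentage points
relative_decrease = 0.249  # the same drop expressed relative to the significant-result scenario

# Implied baseline publication probability for a statistically significant finding:
implied_baseline_pp = penalty_pp / relative_decrease
print(f"implied baseline: {implied_baseline_pp:.1f} pp")  # roughly 56.6 pp
```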

In addition, researchers hold more negative views about studies that yielded a null result (Panel B of Figure 1). The researchers in our experiment perceive those studies to have 37.3% of a standard deviation lower quality. Studies with null results are also rated by our respondents to have 32.5% of a standard deviation lower importance.

Does experience moderate the null result penalty? We find that the null result penalty is of comparable magnitude for different groups of researchers, from PhD students to editors of scientific journals. This suggests that the null result penalty cannot be attributed to insufficient experience with the publication process itself.

Figure 1

Why do researchers perceive studies with findings that are not statistically significant to be discounted in the publication process? We examine three potential factors.

Communication of statistical uncertainty. First, could the way in which we communicate statistical uncertainty affect the magnitude of the null result penalty? In our experiment, we cross-randomized whether researchers were provided with the standard error of the main finding or with the p-value associated with a test of whether the main finding is statistically significant. This treatment variation is motivated by a longstanding concern in the academic community that the emphasis on p-values and tests of statistical significance could contribute to biases in the publication process (Camerer et al., 2016; Wasserstein and Lazar, 2016). We find that the null result penalty is 3.7 percentage points larger when the main results are reported with p-values, demonstrating that the way in which we communicate statistical uncertainty matters in practice.

Preference for surprising results. Our respondents might think that the publication process values studies with findings that are surprising relative to the prior in the literature. Indeed, Frankel and Kasy (2022) show that publishing surprising results is optimal if journals aim to maximize the policy impact of published studies. Such a mechanism could explain the null result penalty if researchers perceive a large penalty only for null results that are not surprising to experts in the field. To examine this, we randomly provide some of our respondents with an expert forecast of the treatment effect, randomizing whether the experts predict a large effect or an effect close to zero. We find that the null result penalty is unchanged when respondents are told that experts in the literature predicted a null result. Once experts predict a large effect, however, the null result penalty increases by 6.3 percentage points. These patterns suggest that the penalty against null results cannot be explained by researchers believing that the publication process favors surprising results: in that case, they should have evaluated null results that experts did not predict more positively.

Perceived statistical precision. Finally, we investigate the hypothesis that null results might be perceived as more noisily estimated, even when holding constant the objective precision of the estimate. To test this hypothesis, we conducted an experiment with a sample of PhD students and early career researchers. The design and the main outcome of this experiment are identical to our main experiment, but we replace the questions about quality and importance with a question about the perceived precision of the main finding. We also find a sizeable null result penalty in this more junior sample of researchers. In addition, we find that null results are perceived to have 126.7% of a standard deviation lower precision, despite the fact that we fixed respondents’ beliefs about the standard error of the main finding (Panel B of Figure 1). This suggests that researchers might employ simple heuristics to gauge the statistical precision of findings.

Broader implications
Our findings have important implications for the publication system. First, our study highlights the potential value of pre-results review, in which research papers are evaluated before the empirical results are known (Miguel, 2021). Second, our results suggest that referees should be provided with additional guidelines on the evaluation of research that emphasize the informativeness and importance of null results (Abadie, 2020). Our study also has implications for the communication of research findings. Specifically, our results suggest that communicating the statistical uncertainty of estimates in terms of standard errors rather than p-values might alleviate the penalty for null results. Our findings contribute to a broader debate on challenges of the current publication system (Angus et al., 2021; Andre and Falk, 2021; Card and DellaVigna, 2013; Heckman and Moktan, 2018) and potential ways to improve the publication process in economics (Charness et al., 2022).

You can read the full research paper by Felix Chopra, Ingar Haaland, Christopher Roth, and Andreas Stegmann here

Abadie, Alberto (2020), “Statistical nonsignificance in empirical economics,” American Economic Review: Insights, 2 (2), 193–208.
Andre, P. and A. Falk (2021), “What’s worth knowing in economics? A global survey among economists,” VoxEU.org, 7 September.
Andrews, I., & Kasy, M. (2019). “Identification of and correction for publication bias”, American Economic Review, 109(8), 2766-94.
Angus, S., K. Atalay, J. Newton, and D. Ubilava (2021), “Editorial boards of leading economics journals show high institutional concentration and modest geographic diversity,” VoxEU.org, 31 July.
Brodeur, A., Lé, M., Sangnier, M., & Zylberberg, Y. (2016). Star wars: The empirics strike back. American Economic Journal: Applied Economics, 8(1), 1-32.
Camerer, Colin F., Anna Dreber, Eskil Forsell, Teck-Hua Ho, Jürgen Huber, Magnus Johannesson, Michael Kirchler, Johan Almenberg, Adam Altmejd, Taizan Chan, Emma Heikensten, Felix Holzmeister, Taisuke Imai, Siri Isaksson, Gideon Nave, Thomas Pfeiffer, Michael Razen, and Hang Wu (2016), “Evaluating replicability of laboratory experiments in economics,” Science, 351 (6280), 1433–1436.
Card, D. & S. DellaVigna (2013), “Nine facts about top journals in economics,” VoxEU.org, 21 January.
Charness, G., Dreber, A., Evans, D., Gill, A., and Toussaert, S. (2022), “Economists want to see changes to their peer review system. Let’s do something about it,” VoxEU.org, 24 April.
Chopra, F., Haaland, I., Roth, C., & Stegmann, A. (2023). “The Null Result Penalty”. The Economic Journal, uead060.
Frankel, A., & Kasy, M. (2022). “Which findings should be published?”. American Economic Journal: Microeconomics, 14(1), 1-38.
Heckman, J. and S. Moktan (2018), “Publishing and promotion in economics: The tyranny of the Top Five,” VoxEU.org, 1 November.
Miguel, E. (2021). “Evidence on research transparency in economics.” Journal of Economic Perspectives, 35(3), 193-214.
Popper, K. (1934), “The logic of scientific discovery,” Routledge.
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). “p-curve and effect size: Correcting for publication bias using only significant results.” Perspectives on Psychological Science, 9(6), 666-681.
Wasserstein, Ronald L. and Nicole A. Lazar (2016), “The ASA Statement on p-Values: Context, Process, and Purpose,” The American Statistician, 70 (2), 129–133.