Researchers can handle data in a number of ways that make results misleading.
Improper data use undermines the ethos of science, and the misleading results it produces can misguide and distort the production of knowledge.
Examples of improper data use include:
Data dredging means testing a large number of possible associations in a dataset to see if any of them are statistically significant. Because every test carries its own chance of a spurious result, data dredging tends to produce false positives.
“When a large number of associations can be looked at in a dataset where only a few real associations exist, a P value of 0.05 is compatible with the large majority of findings still being false positives.” (2)
Origin: "Data Dredging" (Selvin & Stuart, 1966); "Data Fishing" (Grover & Mehra, 2008); "Data Snooping"; "P-hacking"
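The arithmetic behind the quotation on false positives can be illustrated with a short Python sketch. It is not taken from the cited sources: the function names, the 80% power figure, and the simulated two-group setup are assumptions chosen purely for illustration. The first function computes the expected share of "significant" findings that are false positives; the second dredges a dataset of pure noise and counts how many variables still pass the 0.05 threshold.

```python
import random
from statistics import NormalDist, mean


def expected_false_share(n_tests, n_real, alpha=0.05, power=0.8):
    """Expected share of 'significant' findings that are false positives.

    Assumes each null association is flagged with probability alpha and
    each real association is detected with probability power (hypothetical).
    """
    false_positives = alpha * (n_tests - n_real)
    true_positives = power * n_real
    return false_positives / (false_positives + true_positives)


def dredge_pure_noise(n_variables=200, n_per_group=50, alpha=0.05, seed=1):
    """Count 'significant' group differences when no real effect exists.

    Two groups are compared on many unrelated noise variables. Every
    variable is drawn from N(0, 1) in both groups, so any hit is a
    false positive by construction.
    """
    rng = random.Random(seed)
    # Standard error of the difference in means; variance is known to be 1.
    std_err = (2 / n_per_group) ** 0.5
    hits = 0
    for _ in range(n_variables):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (mean(group_a) - mean(group_b)) / std_err
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided z-test
        if p_value < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    print("Expected false share:", round(expected_false_share(1000, 10), 2))
    print("Significant results in pure noise:", dredge_pure_noise())
```

Under these assumptions, if 1,000 associations are tested and only 10 are real, about 86% of the significant findings would be expected to be false positives. In the simulation, roughly 5% of the noise variables should come out "significant" even though none of them reflects a real association.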
Principal investigators, Researchers, Policy makers, Supervisors, Postdocs, Journal publishers, Journal editors
By Jensen (2000) (3)
By Suter & Cormier (2015) (4)
Convenience, dichotomization, stratification, regression to the mean, impact of sample size, competing risks, immortal time and survivor bias, management of missing values (5,6).
(1) Sindermann C. J. “Winning the games scientists play” (Plenum Press, NY, 1982)
(2) Smith, George Davey, and Shah Ebrahim. "Data dredging, bias, or confounding: They can all get you into the BMJ and the Friday papers." BMJ 325.7378 (2002): 1437-1438.
(3) Jensen, David. "Data Snooping, Dredging and Fishing: The Dark Side of Data Mining, A SIGKDD99 Panel Report." SIGKDD Explorations 1.2 (2000): 52-54.
(4) Suter, Glenn W., and Susan M. Cormier. "The problem of biased data and potential solutions for health and environmental assessments." Human and Ecological Risk Assessment: An International Journal 21.7 (2015): 1736-1752.
(5) Carmona-Bayonas A, Jimenez-Fonseca P, Fernandez-Somoano A, et al. Top ten errors of statistical analysis in observational studies for cancer research. Clinical & translational oncology : official publication of the Federation of Spanish Oncology Societies and of the National Cancer Institute of Mexico. 2018;20(8):954-965.
(6) Reanalysis: Ebrahim S, Sohani Z, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA 2014;312:1024-32
Bjørn Hofmann contributed to this theme.
Latest contribution was May 29, 2019