Technical note: Effects of uncertainties and number of data points on line fitting – a case study on new particle formation
Fitting a line to two measured variables is considered one of the simplest statistical procedures researchers can carry out. However, this simplicity is deceptive as the line-fitting procedure is actually quite a complex problem. Atmospheric measurement data never come without some measurement error. Too often, these errors are neglected when researchers make inferences from their data. To demonstrate the problem, we simulated datasets with different numbers of data points and different amounts of error, mimicking the dependence of the atmospheric new particle formation rate (J1.7) on the sulfuric acid concentration (H2SO4). Both variables have substantial measurement error and, thus, are good test variables for our study. We show that ordinary least squares (OLS) regression results in strongly biased slope values compared with six error-in-variables (EIV) regression methods (Deming regression, principal component analysis, orthogonal regression, Bayesian EIV and two different bivariate regression methods) that are known to take errors in the variables into account.