Linear, bivariate model

If two columns are selected, they represent x and y values, respectively. If one column is selected, it represents y values, and x values are taken to be the sequence of positive integers (1,2,...). A straight line y=ax+b is fitted to the data. Several bivariate data sets can be regressed in the same plot, and their slopes compared, by giving an even number of columns, each pair of columns being one x-y set.

There are five different algorithms available: Ordinary Least Squares (OLS), Reduced Major Axis (RMA), Major Axis (MA), Robust, and Prais-Winsten. OLS regression assumes the x values are fixed, and finds the line which minimizes the squared errors in the y values. Use this if your x values have very little error associated with them. RMA and MA minimize errors in both x and y. RMA/MA fitting, standard error estimation and slope comparison follow Warton et al. (2006).
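
For reference, the slope formulas behind these three line-fitting methods can be sketched in a few lines of Python. This is a rough illustration of the textbook formulas summarized by Warton et al. (2006), with illustrative names; it is not Past's internal code:

    import numpy as np

    def line_slopes(x, y):
        # Slopes of y = ax + b under the three classical methods; the
        # intercept is b = mean(y) - a*mean(x) in every case.
        x, y = np.asarray(x, float), np.asarray(y, float)
        sxx = np.var(x, ddof=1)             # sample variance of x
        syy = np.var(y, ddof=1)             # sample variance of y
        sxy = np.cov(x, y, ddof=1)[0, 1]    # sample covariance
        a_ols = sxy / sxx                              # OLS: y errors only
        a_rma = np.sign(sxy) * np.sqrt(syy / sxx)      # reduced major axis
        a_ma = (syy - sxx + np.sqrt((syy - sxx)**2 + 4*sxy**2)) / (2*sxy)  # major axis
        return a_ols, a_rma, a_ma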

Prais-Winsten regression (e.g. Wooldridge 2012, ch. 12) is appropriate for data with serially correlated residuals, typically time series. The fitted model is the sum of a linear function and an AR(1) autoregressive process with autocorrelation rho. An iterative procedure is used, with a tolerance on rho of 0.001 and a maximum of 10 iterations. Bootstrapping is not carried out, as resampling would destroy the serial correlation structure.
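
The iteration can be sketched as follows. This is a minimal Python illustration of the standard textbook procedure, using the tolerance and iteration cap quoted above; it is not Past's internal code:

    import numpy as np

    def prais_winsten(x, y, tol=0.001, max_iter=10):
        x, y = np.asarray(x, float), np.asarray(y, float)
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLS start
        rho = 0.0
        for _ in range(max_iter):
            e = y - X @ beta                             # current residuals
            rho_new = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])   # AR(1) estimate
            # Quasi-difference the data; the first observation is rescaled
            # and kept, which is what distinguishes Prais-Winsten from
            # Cochrane-Orcutt.
            w = np.sqrt(1.0 - rho_new**2)
            Xs = np.vstack([w * X[0], X[1:] - rho_new * X[:-1]])
            ys = np.concatenate([[w * y[0]], y[1:] - rho_new * y[:-1]])
            beta, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
            if abs(rho_new - rho) < tol:
                break
            rho = rho_new
        b, a = beta
        return a, b, rho_new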

The “Robust” method is an advanced Model I (fixed x values) regression which is robust to outliers. It sometimes gives strange results, but can be very successful for errors that are almost normally distributed apart from a few far-off values. The algorithm is “Least Trimmed Squares”, based on the “FastLTS” code of Rousseeuw & van Driessen (1999). Parametric error estimates are not available, but Past gives bootstrapped confidence intervals on slope and intercept (beware: this is extremely slow for large data sets).
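
The idea behind Least Trimmed Squares is to minimize the sum of the h smallest squared residuals rather than all of them. Here is a toy Python sketch of that objective, with random elemental starts and concentration steps in the spirit of FastLTS; it is illustrative only, not Past's implementation:

    import numpy as np

    def lts_line(x, y, n_starts=500, n_csteps=10, seed=0):
        rng = np.random.default_rng(seed)
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        h = (n + 3) // 2                   # coverage: floor((n+p+1)/2), p = 2
        X = np.column_stack([np.ones(n), x])
        best, best_obj = None, np.inf
        for _ in range(n_starts):
            idx = rng.choice(n, 2, replace=False)           # elemental start
            beta, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
            for _ in range(n_csteps):                       # concentration steps
                keep = np.argsort((y - X @ beta)**2)[:h]    # h best-fitting points
                beta, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
            obj = np.sort((y - X @ beta)**2)[:h].sum()      # trimmed objective
            if obj < best_obj:
                best, best_obj = beta, obj
        return best[1], best[0]            # slope a, intercept b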

Both x and y values can be log-transformed (base 10), in effect fitting your data to the 'allometric' function y = 10^b x^a. An a value around 1 indicates that a straight-line ('isometric') fit may be more applicable.
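
A small Python illustration of why this works, using made-up data: taking base-10 logs of both columns turns the power law into a straight line.

    import numpy as np

    x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
    y = 3.0 * x**0.75                   # exact allometry: a = 0.75, 10^b = 3
    a, b = np.polyfit(np.log10(x), np.log10(y), 1)
    print(a, 10**b)                     # recovers 0.75 and 3.0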

Statistics

The values for a and b, their errors, Pearson's r correlation, and the probability that the columns are not correlated are given. Note that r² is simply the Pearson coefficient squared; it does not adjust for the regression method.

The calculation of standard errors for slope and intercept assumes normal distribution of residuals and independence between the independent variable and the variance of the residuals. If these assumptions are strongly violated, it is preferable to use the bootstrapped 95 percent confidence intervals (1999 replicates).
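
A bootstrapped slope interval of this kind can be sketched in Python as follows, resampling data pairs with replacement and using the 1999 replicates quoted above. This is an illustration with a simple percentile interval; Past's resampling details may differ:

    import numpy as np

    def bootstrap_slope_ci(x, y, n_boot=1999, seed=0):
        rng = np.random.default_rng(seed)
        x, y = np.asarray(x, float), np.asarray(y, float)
        n = len(x)
        slopes = np.empty(n_boot)
        for i in range(n_boot):
            idx = rng.integers(0, n, n)                 # resample (x, y) pairs
            slopes[i] = np.polyfit(x[idx], y[idx], 1)[0]
        return np.percentile(slopes, [2.5, 97.5])       # 95% percentile interval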

The permutation test on correlation (r²) uses 9999 replicates.
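
The logic of such a permutation test can be sketched in Python: shuffle one column, recompute the correlation, and count how often the shuffled value meets or exceeds the observed one. Illustrative only; not Past's code:

    import numpy as np

    def permutation_p_for_r(x, y, n_perm=9999, seed=0):
        rng = np.random.default_rng(seed)
        x, y = np.asarray(x, float), np.asarray(y, float)
        r_obs = np.corrcoef(x, y)[0, 1]
        hits = sum(abs(np.corrcoef(x, rng.permutation(y))[0, 1]) >= abs(r_obs)
                   for _ in range(n_perm))
        return (hits + 1) / (n_perm + 1)   # count the observed value itself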

Confidence band for the regression

In OLS regression (not RMA/MA/Robust/Prais-Winsten), a 95 percent "Working-Hotelling" confidence band for the fitted line is available.
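The Working-Hotelling band has a standard textbook form, sketched here in Python (scipy is used for the F quantile). This illustrates the formula; it is not Past's code:

    import numpy as np
    from scipy.stats import f

    def working_hotelling_band(x, y, xg, level=0.95):
        # Simultaneous confidence band for the fitted line, evaluated at
        # the grid points xg.
        x, y, xg = (np.asarray(v, float) for v in (x, y, xg))
        n = len(x)
        a, b = np.polyfit(x, y, 1)
        s2 = ((y - (a*x + b))**2).sum() / (n - 2)       # residual variance
        sxx = ((x - x.mean())**2).sum()
        se = np.sqrt(s2 * (1/n + (xg - x.mean())**2 / sxx))
        W = np.sqrt(2 * f.ppf(level, 2, n - 2))         # simultaneous width
        yg = a*xg + b
        return yg - W*se, yg + W*se                     # lower, upper band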

Confidence band for the forecast

In OLS regression, a 95 percent confidence band for forecasting is also given.
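The forecast (prediction) band differs from the confidence band for the line by an extra term accounting for the scatter of individual observations. A textbook-formula sketch in Python, not Past's code:

    import numpy as np
    from scipy.stats import t

    def forecast_band(x, y, xg, level=0.95):
        x, y, xg = (np.asarray(v, float) for v in (x, y, xg))
        n = len(x)
        a, b = np.polyfit(x, y, 1)
        s2 = ((y - (a*x + b))**2).sum() / (n - 2)
        sxx = ((x - x.mean())**2).sum()
        # Note the leading '1 +': new observations scatter about the line.
        se = np.sqrt(s2 * (1 + 1/n + (xg - x.mean())**2 / sxx))
        tcrit = t.ppf(0.5 + level/2, n - 2)
        yg = a*xg + b
        return yg - tcrit*se, yg + tcrit*se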

Zero intercept

Forces the regression line through the origin. This also affects the calculation of the slope and its standard error. All five methods handle this option.
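
For the OLS case, the textbook formulas for regression through the origin can be sketched as follows (illustrative only; note the n − 1 degrees of freedom, since only one parameter is fitted):

    import numpy as np

    def ols_through_origin(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        a = (x @ y) / (x @ x)                        # slope: sum(xy)/sum(x^2)
        s2 = ((y - a*x)**2).sum() / (len(x) - 1)     # one parameter fitted
        return a, np.sqrt(s2 / (x @ x))              # slope and its std. error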

Residuals

The Residuals window reports the distances from each data point to the regression line, in the x and y directions. Only the latter is of interest when using ordinary linear regression rather than RMA or MA. The residuals can be copied back to the spreadsheet and inspected for normality and for independence between the independent variable and the residual variance (homoskedasticity).

Durbin-Watson test

The Durbin-Watson test for positive autocorrelation of residuals in y (violating an assumption of OLS regression) is given in the Residuals window. The test statistic ranges from zero (complete positive autocorrelation) through 2 (no autocorrelation) to 4 (complete negative autocorrelation). For n ≤ 400, an exact p value for the null hypothesis of no positive autocorrelation is calculated using the PAN algorithm (Farebrother 1980, with later corrections). The test is not accurate when using the Zero intercept option.
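
The statistic itself is simple to compute from the residual sequence; the exact PAN p value of Farebrother (1980) is considerably more involved and is not reproduced here. A Python sketch:

    import numpy as np

    def durbin_watson(e):
        e = np.asarray(e, float)
        # Near 0: strong positive autocorrelation; near 2: none;
        # near 4: strong negative autocorrelation.
        return (np.diff(e)**2).sum() / (e**2).sum()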

Breusch-Pagan test

The Breusch-Pagan test for heteroskedasticity, i.e. nonstationary variance of residuals (violating an assumption of OLS regression), is given in the Residuals window. The test statistic is LM = nr², where r is the correlation coefficient between the x values and the squared residuals. The null hypothesis of the test is homoskedasticity.
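
A Python sketch of the statistic as described above. Referring LM to a chi-squared distribution with one degree of freedom is the standard choice for a single predictor, but is an assumption on my part; the text above only specifies the statistic:

    import numpy as np
    from scipy.stats import chi2

    def breusch_pagan(x, e):
        x, e = np.asarray(x, float), np.asarray(e, float)
        r = np.corrcoef(x, e**2)[0, 1]      # x vs. squared residuals
        LM = len(x) * r**2
        return LM, chi2.sf(LM, 1)           # statistic and assumed p value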

Exponential functions

Your data can be fitted to an exponential function y = e^b e^(ax) by first log-transforming just your y column (in the Transform menu) and then performing a straight-line fit.
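
A small illustration with made-up data: taking the natural log of y alone turns the exponential into a straight line.

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = 2.0 * np.exp(0.5 * x)           # exact exponential: a = 0.5, e^b = 2
    a, b = np.polyfit(x, np.log(y), 1)
    print(a, np.exp(b))                 # recovers 0.5 and 2.0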

Prediction (forecasting)

Rows with a ‘?’ for the y value will be included in the table under the ‘Prediction’ tab. The predicted y value is calculated for the given x, together with a 95% prediction interval calculated as above (confidence band for the forecast). If the ‘log-log’ option was selected, the back-transformed prediction and interval are also given for convenience. Note that this prediction is strictly valid only for the OLS model, but will be approximately correct for the RMA and MA models as well.

Missing data: Supported by row deletion.

For mathematical details, see the Past manual.

References

Farebrother, R.W. 1980. Pan's procedure for the tail probabilities of the Durbin-Watson statistic. Applied Statistics 29:224–227.

Rousseeuw, P.J. & van Driessen, K. 1999. Computing LTS regression for large data sets. Institute of Mathematical Statistics Bulletin.

Warton, D.I., Wright, I.J., Falster, D.S. & Westoby, M. 2006. Bivariate line-fitting methods for allometry. Biological Reviews 81:259-291.

Wooldridge, J.M. 2012. Introductory Econometrics – a Modern Approach (5th ed.). South-Western Cengage Learning.
