Econometrics Dr. Robert Jantzen

A Brief Guide to Classical Regression "Problems"


I.    Desirable Properties
II.   OLS Assumptions
III.  Specification Bias
IV.   Error Normality
V.    Unequal Error Variances
VI.   Correlated Errors
VII.  (Multi)collinearity
VIII. Logit Regression


          This guide provides a brief description of the desirable properties and assumptions of the classical ordinary least squares (OLS) regression model.  It also reviews the more common situations where such assumptions are not likely to hold true.


 
I.  Desirable Properties of Population Parameter Estimators

       The estimators (formulas) used to estimate the population parameters (i.e., coefficients) in a multiple regression model should be unbiased, be efficient, have minimum mean square error (MSE) and be consistent.  An unbiased estimator is one that generates, over repeated samples, parameter estimates whose expected (average) value equals the true population parameter.  An efficient estimator is one that generates, over repeated samples, parameter estimates with the smallest variance among unbiased estimators.  A minimum mean square error estimator is one that, over repeated samples, minimizes the sum of the variance and the squared bias of the estimates.  Finally, a consistent estimator is one whose estimates converge to the true population parameter as the sample size grows large.
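
       The repeated-sampling idea behind these properties can be illustrated with a small simulation.  The sketch below is not part of the original guide: the model (Y = 1 + 2X + error), the sample sizes and the function names are all assumptions chosen for illustration.  It draws many samples, estimates the slope in each, and reports the bias, variance and MSE of the estimates.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(n, reps, true_slope=2.0, seed=0):
    """Draw `reps` samples of size n from an assumed model and estimate the slope in each."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(0, 10) for _ in range(n)]
        # Assumed population model: Y = 1 + true_slope * X + normal error.
        y = [1.0 + true_slope * xi + rng.gauss(0, 3) for xi in x]
        slopes.append(ols_slope(x, y))
    return slopes


def summarize(slopes, true_slope=2.0):
    """Return (bias, variance, mse) of the slope estimates across the repeated samples."""
    bias = statistics.fmean(slopes) - true_slope
    variance = statistics.pvariance(slopes)
    mse = statistics.fmean([(b - true_slope) ** 2 for b in slopes])
    return bias, variance, mse


if __name__ == "__main__":
    for n in (20, 200):
        bias, variance, mse = summarize(simulate_slopes(n, reps=500))
        print(f"n={n}: bias={bias:.4f} variance={variance:.4f} mse={mse:.4f}")
```

       In such a run the bias should be close to zero for both sample sizes, the MSE should equal the variance plus the squared bias, and the variance should shrink as n grows, which is the behavior consistency describes.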




 
II.  Classical ordinary least squares (OLS) regression assumptions

      The formulas used by classical ordinary least squares (OLS) regression to estimate the population parameters in a regression model will be unbiased, be efficient, have minimum mean square error (MSE) and be consistent, if the following assumptions hold true:

  • the model is correctly specified, e.g., all relevant explainers are included in the regression.
  • the error terms are normally distributed.
  • the error terms have constant variance.
  • the error terms are independent of each other.
       If the above assumptions are "violated" the classical regression formulas may not be unbiased, efficient, have minimum mean square error (MSE) or be consistent.




 
III.  Specification Bias

        Regression models can be misspecified in one of two ways:

        1.  The regression includes an irrelevant variable.  An irrelevant variable is one whose population regression coefficient is zero.  The consequences of including an irrelevant variable are minor, namely a small loss of efficiency.  The parameter estimators remain unbiased.

        2.  The regression excludes a relevant variable.  A relevant variable is one whose population regression coefficient is not zero.  Relevant variables are often excluded because of missing data.  If a relevant variable is excluded, the estimated coefficients of the included variables may be biased.  The direction of the bias on each included explainer's coefficient can be deduced using the following equation:

      Sign of bias on any bi =   [the sign (+/-) of the omitted variable's coefficient]
                                           times [the sign (+/-) of the partial correlation between
                                           the omitted variable and the explainer whose
                                           coefficient is being examined]

      If the bias is positive it means that the estimated coefficient is larger than the true parameter value.  If the bias is negative it means that the estimated coefficient is smaller than the true parameter value.

      For example, assume that the variable Y is determined by both X2 and X3, but a regression is run with Y as the dependent variable and X2 as the only explainer.  Also assume that Y is positively related to X3 (i.e., b3 > 0) and that X2 and X3 are positively related to each other.  Following the above formula, the estimated coefficient on X2 (i.e., b2) will have positive bias (+ times +), meaning that it will tend to be bigger than the true population coefficient.  NOTE:  even if a relevant variable is excluded, an included variable's coefficient will not be biased if there is zero correlation between the excluded variable and the included variable.
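
      The X2/X3 example can be checked with simulated data.  The following is a minimal sketch under assumed values that do not come from the guide: the true model is taken to be Y = 1 + 2*X2 + 3*X3 + error, and X3 is generated as link*X2 plus noise.  The function names are hypothetical.  With link = 0.5 the short regression of Y on X2 alone should give a slope near 2 + 3(0.5) = 3.5 rather than the true 2; with link = 0 the slope should stay near 2.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope from a simple OLS regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def sign(value):
    """Return +1, -1 or 0 according to the sign of value."""
    return (value > 0) - (value < 0)


def sign_of_bias(omitted_coef, correlation):
    """Sign rule from the text: sign of omitted coefficient times sign of correlation."""
    return sign(omitted_coef) * sign(correlation)


def short_regression_slope(link, n=5000, seed=1):
    """Estimate b2 from a regression of Y on X2 only, with X3 wrongly omitted."""
    rng = random.Random(seed)
    x2 = [rng.gauss(0, 1) for _ in range(n)]
    # X3 is related to X2 through the assumed `link` coefficient.
    x3 = [link * a + rng.gauss(0, 1) for a in x2]
    # Assumed true model: b2 = 2 and b3 = 3 (> 0).
    y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x2, x3)]
    return ols_slope(x2, y)


if __name__ == "__main__":
    print("predicted sign of bias:", sign_of_bias(3.0, 0.5))
    print("slope with correlated omitted X3:", round(short_regression_slope(0.5), 3))
    print("slope with uncorrelated omitted X3:", round(short_regression_slope(0.0), 3))
```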




 
IV.  Error (residual) terms that are not normally distributed

        The error (residual) term is the difference between the actual value of Y (the dependent variable) and the regression model's expected (forecast) value of Y.  If the error terms are not normally distributed, inferences about the regression coefficients (using t-tests) and the overall equation (using the F-test) will become unreliable.  However, as long as the sample is large (namely, the sample size minus the number of estimated coefficients is at least 30) and the error terms are not extremely different from a normal distribution, such tests are likely to be robust.  Whether the error terms are normally distributed can be assessed by using methods like the normal probability plot.
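
        A normal probability plot pairs each sorted residual with the standard normal quantile expected at its rank; points close to a straight line suggest normality.  The sketch below is an illustration only: the plotting positions (i - 0.5)/n are one common convention assumed here, the function names are hypothetical, and the correlation between the two columns is used as a rough numerical summary of how straight the plot is.

```python
import math
import random
import statistics
from statistics import NormalDist


def normal_plot_points(residuals):
    """Pair each sorted residual with its theoretical standard normal quantile."""
    n = len(residuals)
    ordered = sorted(residuals)
    # Plotting positions (i - 0.5) / n are an assumed convention.
    quantiles = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, ordered))


def pearson(u, v):
    """Pearson correlation between two equal-length lists."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)


def plot_straightness(residuals):
    """Correlation between theoretical quantiles and sorted residuals (near 1 = straight)."""
    points = normal_plot_points(residuals)
    return pearson([p[0] for p in points], [p[1] for p in points])


if __name__ == "__main__":
    rng = random.Random(0)
    normal_errors = [rng.gauss(0, 1) for _ in range(500)]
    skewed_errors = [rng.expovariate(1.0) for _ in range(500)]
    print("normal errors:", round(plot_straightness(normal_errors), 4))
    print("skewed errors:", round(plot_straightness(skewed_errors), 4))
```

        The skewed (exponential) errors should give a visibly lower value than the normal errors, mirroring the curved pattern such errors produce on the plot.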




 
V.  Error terms with unequal variances ( = heteroskedasticity)

      If the error terms are distributed with constant variance, they are said to be homoskedastic. Frequently, however, in cross-sectional studies (studies that compare individuals, firms, etc. at a single point in time) the size of the error terms will be influenced by the size of the explanatory variables.  If the error terms vary systematically in size, they are said to be heteroskedastic.  The use of classical (OLS) regression with heteroskedastic errors, while yielding unbiased estimates of the regression coefficients, will generate biased estimates of the standard errors of the estimated coefficients, making t-tests on the coefficients unreliable.
     A cursory examination of plots of the estimated error terms against the regression explainers can suggest whether the errors have constant variance (i.e., are homoskedastic).  A formal test for homoskedasticity can be conducted using the Breusch-Pagan (B-P) statistic, which is distributed as chi-squared with degrees of freedom equal to the number of explanatory variables (k) used in the test.  If the sample's B-P statistic is >= the critical chi-squared value, the null hypothesis that the errors are homoskedastic should be rejected.
       Most basic statistics programs (like Excel/PHStat/SPSS) do not estimate the B-P statistic.  The Gretl program, however, calculates the sample B-P statistic and also generates corrected standard errors for the coefficients, allowing reliable t-tests (see the Gretl guide).
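
       For readers who want to see what the test computes, the sketch below works through a one-explainer case by hand.  It uses the n times R-squared form of the statistic (Koenker's variant of the Breusch-Pagan test), which is an assumption about the variant and may differ from what a given program reports.  The simulated data, the function names and the constant CHI2_5PCT_1DF (the 5% chi-squared critical value for 1 degree of freedom, 3.841) are illustrative.

```python
import random
import statistics

CHI2_5PCT_1DF = 3.841  # 5% critical value, chi-squared with 1 degree of freedom


def fit_line(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


def r_squared(x, y):
    """R squared from a simple OLS regression of y on x."""
    intercept, slope = fit_line(x, y)
    my = statistics.fmean(y)
    ss_res = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot


def breusch_pagan_lm(x, y):
    """n * R^2 from regressing the squared OLS residuals on x (Koenker's variant)."""
    intercept, slope = fit_line(x, y)
    squared_residuals = [(b - intercept - slope * a) ** 2 for a, b in zip(x, y)]
    return len(x) * r_squared(x, squared_residuals)


def simulate(n, heteroskedastic, seed=0):
    """Generate (x, y) where the error spread grows with x if heteroskedastic is True."""
    rng = random.Random(seed)
    x = [rng.uniform(1, 10) for _ in range(n)]
    y = [1 + 2 * a + rng.gauss(0, 0.5 * a if heteroskedastic else 2.0) for a in x]
    return x, y


if __name__ == "__main__":
    for flag in (True, False):
        lm = breusch_pagan_lm(*simulate(500, flag))
        verdict = "reject" if lm >= CHI2_5PCT_1DF else "do not reject"
        print(f"heteroskedastic={flag}: LM={lm:.2f} -> {verdict} homoskedasticity")
```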




 
VI.  Error terms that are not independent ( = serial correlation)

       Classical (OLS) regression assumes that the size of each error term is not influenced by the size of other error terms (the error terms are independent of each other).  This assumption is very likely to be violated in time-series studies (studies that examine the behavior of a dependent variable across time).   In time-series studies, each period's error term is likely to be correlated to the previous period's error, a process termed serial correlation (or autocorrelation) of the error terms.  Serial (auto) correlation arises because if the regression model starts to overpredict values for the dependent variable, it is likely to do so for several time periods in a row. Similarly, underpredictions are also likely to occur for contiguous time periods.
        If the error terms are serially correlated, the use of classical (OLS) regression is unwarranted.  First, the coefficient estimators are not efficient, yielding sample coefficients that will vary excessively from the true parameters.  Second, the standard errors of the estimated coefficients will usually be biased downward (too small), generating excessively large sample t values (leading to unwarranted rejections of coefficient null hypotheses).  Third, the standard error of the regression will also be biased downward, overstating the predictive power of the regression.
        A cursory examination of a plot of the error terms over time can indicate whether the errors follow patterns over time.  The Durbin-Watson (D-W) test formally tests whether the null hypothesis that the error terms are not serially correlated should be rejected.  The D-W test involves comparing the sample's D-W statistic, usually reported with the ordinary least squares regression output, to a table containing two critical values.  If the sample D-W statistic is less than the lower Dl critical value, the null hypothesis is rejected, indicating that the error terms are serially correlated.  If the sample D-W statistic is greater than the upper Du critical value, the null hypothesis is not rejected, indicating there is insufficient evidence to conclude that the errors are correlated.  If Dl <= sample D-W <= Du, the test is inconclusive.
       Most basic statistics programs (like Excel/PHStat/SPSS) will estimate the D-W statistic, but will not estimate "corrected" results if the error terms are serially correlated.  The Gretl program, however, generates the appropriate results in the presence of serial correlation (see the Gretl guide).
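
       The D-W statistic itself is simple to compute: it is the sum of squared period-to-period changes in the residuals divided by the sum of squared residuals, so values near 2 suggest no serial correlation and values near 0 suggest strong positive serial correlation.  The sketch below is illustrative only.  For brevity it applies the formula to simulated error series rather than to residuals from a fitted regression, and the critical values passed to dw_decision are placeholders, not values from a D-W table.

```python
import random


def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def ar1_errors(n, rho, seed=0):
    """Simulate errors where each period's error equals rho times the last plus noise."""
    rng = random.Random(seed)
    errors = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        errors.append(rho * errors[-1] + rng.gauss(0, 1))
    return errors


def dw_decision(d, d_lower, d_upper):
    """Apply the three-way rule from the text for a given pair of table critical values."""
    if d < d_lower:
        return "reject: errors are serially correlated"
    if d > d_upper:
        return "do not reject: insufficient evidence of serial correlation"
    return "inconclusive"


if __name__ == "__main__":
    # Placeholder critical values; look up the real Dl and Du for your n and k.
    d_lower, d_upper = 1.5, 1.7
    for rho in (0.8, 0.0):
        d = durbin_watson(ar1_errors(2000, rho))
        print(f"rho={rho}: D-W={d:.3f} -> {dw_decision(d, d_lower, d_upper)}")
```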




 
VII.  Collinearity between the explanatory variables ( = multicollinearity)

        If the explanatory variables are highly related to each other (i.e., they are highly collinear), then it will be difficult for regression analysis to separate the independent effect of each variable from the others.  The practical consequence of such high collinearity (also known as multicollinearity) is that the estimated coefficients will have large standard errors, thereby generating small sample t statistics for the coefficients and, in some cases, coefficients that have unexpected signs.  However, it's important to remember that multicollinearity has no practical consequences for a model that is being used only for forecasting, because t tests on each of the independent variable coefficients are unimportant.

       Multicollinearity can be diagnosed by examining the simple correlations between the explanatory variables and the effects on the regression results if highly correlated explainers are excluded from the estimated regression.  If a highly correlated explainer is excluded from a regression, and the R squared changes very little but some of the other variables' coefficients and t values change a lot, the results probably reflect multicollinearity.  Remember, however, that regressions that exclude a highly collinear relevant variable will, by definition, suffer from specification bias.  Because these results assign the influence of the excluded variable to the other explainers included in the regression, they should be interpreted with care in view of the likely specification biases (see above discussion).

        Because multicollinearity is a feature of the sample data (the explanatory variables are highly collinear), the only way to estimate more precisely the independent effect of each of the explainers without generating specification biases is to increase the sample size.
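
        Why a larger sample helps can be seen from the textbook formula for a regression with two explainers, where the variance of a slope estimate is the error variance divided by [the sum of squared deviations of that explainer times (1 - r squared)], with r the simple correlation between the two explainers.  The sketch below is an illustration outside the original guide: the function names are hypothetical, and the sum of squared deviations is approximated as n times the explainer's variance.

```python
import math


def variance_inflation(r):
    """Factor by which collinearity multiplies a slope's variance: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)


def slope_std_error(sigma, n, sd_x, r):
    """Approximate standard error of one slope in a two-explainer regression.

    sigma: standard deviation of the error term
    n:     sample size
    sd_x:  standard deviation of the explainer
    r:     simple correlation between the two explainers
    """
    return sigma / (sd_x * math.sqrt(n * (1.0 - r ** 2)))


if __name__ == "__main__":
    for r in (0.0, 0.9, 0.99):
        se = slope_std_error(sigma=3.0, n=50, sd_x=2.0, r=r)
        print(f"r={r}: inflation={variance_inflation(r):.2f} std error={se:.3f}")
    # Quadrupling the sample size halves the standard error at any given r.
    print(slope_std_error(3.0, 200, 2.0, 0.9) / slope_std_error(3.0, 50, 2.0, 0.9))
```

        The standard error rises sharply as r approaches 1 but falls with the square root of the sample size, which is why more data is the remedy that avoids dropping a relevant variable.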




 
VIII.  Logit regression

        Frequently analysts are interested in identifying what factors explain the behavior of categorical binary (1,0) variables.  Examples include whether employees quit or stay, whether persons live or die (a key life insurance question), and whether internet site visitors buy or do not buy.  Because each of these dependent variables is a categorical (1,0) dummy variable, the use of classical ordinary least squares (OLS) regression is inappropriate.  In these cases, OLS would generate estimated coefficients and corresponding t-statistics that are biased.  The preferred method for estimating the independent effect of several explanatory variables on a (1,0) dummy dependent variable is the logit regression method.  The logit regression method can be used with large samples because it provides consistent estimates of the regression coefficients.  Logit regression also provides sample t statistics that can be used to test hypotheses about the population coefficient signs, and it allows the "goodness of fit" to be gauged by showing how accurately the model predicts how the dependent variable actually behaved in the sample.

       Each estimated coefficient generated by logit regression measures the change in the log of the odds of the dependent variable taking on the value of 1 (vs. 0) if the explanatory variable changes by 1 unit.   A more useful way to view the effects of each explainer can be found by transforming each of the estimated explanatory variable coefficients according to the following formula:

        transformed bi = (logit bi) (proportion of 1s) (proportion of 0s)

        where the proportions of 1s and 0s are the fractions of the sample that fell into each of the two categories of the dependent variable.   Each of the transformed coefficients shows approximately how much the probability of being a 1 (vs. 0) will change if the explanatory variable changes by one unit.
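
        The transformation is easy to apply by hand.  It follows from the logit model itself: the change in the probability for a one-unit change in an explainer is the coefficient times P times (1 - P), and the formula evaluates P at the sample proportion of 1s.  The sketch below uses a made-up coefficient and sample; the function names are hypothetical.

```python
import math


def transformed_coefficient(logit_b, outcomes):
    """Apply: transformed b = (logit b) * (proportion of 1s) * (proportion of 0s)."""
    proportion_ones = sum(outcomes) / len(outcomes)
    proportion_zeros = 1.0 - proportion_ones
    return logit_b * proportion_ones * proportion_zeros


def odds_multiplier(logit_b):
    """Factor by which the odds of a 1 are multiplied per one-unit change: exp(b)."""
    return math.exp(logit_b)


if __name__ == "__main__":
    # Made-up sample: 3 of 10 cases are 1s; made-up logit coefficient of 0.8.
    sample = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    print("change in probability per unit:", transformed_coefficient(0.8, sample))
    print("odds multiplier per unit:", odds_multiplier(0.8))
```

        With 30% of the sample being 1s, a logit coefficient of 0.8 translates into a change in probability of about 0.8 x 0.3 x 0.7 = 0.168 per unit.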

       Because the dependent variable only takes on values of 1 or 0, the R squared value is an inadequate measure of how well the independent variables explain why some cases are 1s and others are 0s.  A better measure of the estimated model's predictive power can be found by comparing how many cases the model correctly classified into each of the two groups.  Such a classification table is typically provided with the logit results.
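
       A classification table can also be built by hand from the model's predicted probabilities.  In the sketch below the data are made up, the function names are hypothetical, and the 0.5 cutoff for predicting a 1 is an assumption (it is a common default, but a given program may use or allow a different one).

```python
def classification_table(actual, predicted_prob, cutoff=0.5):
    """Count cases by (actual value, predicted value), predicting 1 when prob >= cutoff."""
    table = {(1, 1): 0, (1, 0): 0, (0, 1): 0, (0, 0): 0}
    for value, prob in zip(actual, predicted_prob):
        predicted = 1 if prob >= cutoff else 0
        table[(value, predicted)] += 1
    return table


def percent_correct(table):
    """Share of all cases the model placed in the correct group, in percent."""
    total = sum(table.values())
    return 100.0 * (table[(1, 1)] + table[(0, 0)]) / total


if __name__ == "__main__":
    # Made-up outcomes and predicted probabilities for six cases.
    actual = [1, 1, 0, 0, 1, 0]
    probs = [0.9, 0.4, 0.2, 0.6, 0.7, 0.1]
    table = classification_table(actual, probs)
    print("actual 1, predicted 1:", table[(1, 1)])
    print("actual 1, predicted 0:", table[(1, 0)])
    print("actual 0, predicted 1:", table[(0, 1)])
    print("actual 0, predicted 0:", table[(0, 0)])
    print("percent correctly classified:", round(percent_correct(table), 1))
```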

       Most basic statistics programs (like PHStat/Excel/SPSS) do not estimate the logit regression model.  However, the Gretl program can be used to estimate a logit regression (see the Gretl guide).
