Econometrics | Dr. Robert Jantzen
V. Error terms with unequal variances (= heteroskedasticity)
If the error terms are distributed with constant variance, they are said
to be homoskedastic. Frequently, however, in cross-sectional
studies (studies that compare individuals, firms, etc. at a single
point in time) the size of the error terms is related to the size
of the explanatory variables. If the size of the error terms varies
systematically, the errors are said to be heteroskedastic. Classical
(OLS) regression with heteroskedastic errors still yields unbiased
estimates of the regression coefficients, but it generates biased estimates
of the standard errors of those coefficients, which makes t-tests
on the coefficients unreliable.
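The claim that OLS coefficients stay unbiased while the reported standard errors go wrong can be checked with a small Monte Carlo experiment. The Python sketch below assumes a hypothetical cross-section in which the error standard deviation grows with the square of the explanatory variable (0.1 × x²). It repeatedly draws new errors, re-estimates the slope, and compares the actual spread of the slope estimates with the average standard error reported by the classical OLS formula. The helper names and all numeric settings are illustrative assumptions.

```python
import math
import random
import statistics

def ols_slope_and_se(x, y):
    """Simple OLS of y on x with an intercept; returns (slope, classical SE of slope)."""
    n = len(x)
    x_bar, y_bar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # The classical formula assumes one common error variance s^2 for every observation.
    s2 = sum(e * e for e in residuals) / (n - 2)
    return slope, math.sqrt(s2 / sxx)

def simulate(n=100, reps=2000, true_slope=2.0, seed=1):
    rng = random.Random(seed)
    # Fixed cross-section of explanatory values spread evenly from 1 to 10.
    x = [1 + 9 * i / (n - 1) for i in range(n)]
    slopes, reported_ses = [], []
    for _ in range(reps):
        # Heteroskedastic errors: assumed error SD of 0.1 * x^2, so bigger x means bigger errors.
        y = [5.0 + true_slope * xi + rng.gauss(0, 0.1 * xi ** 2) for xi in x]
        b, se = ols_slope_and_se(x, y)
        slopes.append(b)
        reported_ses.append(se)
    return statistics.fmean(slopes), statistics.stdev(slopes), statistics.fmean(reported_ses)

mean_slope, actual_sd, avg_reported_se = simulate()
print(f"average slope estimate:      {mean_slope:.3f} (true value 2.0)")
print(f"actual sampling SD of slope: {actual_sd:.3f}")
print(f"average OLS-reported SE:     {avg_reported_se:.3f}")
```

In this particular setup the slope estimates center on the true value, while the classical formula understates their true sampling variability. With other patterns of heteroskedasticity the bias in the reported standard errors can run in the opposite direction.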
VI. Error terms that are not independent (= serial correlation)
Classical (OLS) regression assumes that the size of each error term is
not influenced by the size of other error terms (the error terms are independent
of each other). This assumption is very likely to be violated in
time-series studies (studies that examine the behavior of a dependent variable
across time). In time-series studies, each period's
error term is likely to be correlated with the previous period's error, a
process termed serial correlation (or autocorrelation) of
the error terms. Serial correlation arises because
if the regression model starts to overpredict the dependent
variable, it is likely to keep doing so for several time periods in a row. Similarly,
underpredictions tend to occur in consecutive time periods.
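Serial correlation is easy to see in simulated time-series data. The Python sketch below assumes an AR(1) error process, a common textbook model in which each period's error equals rho times the previous period's error plus fresh noise. The value rho = 0.8 and the other settings are illustrative assumptions, and the helper names are hypothetical. The sketch fits a time trend by OLS and reports two things: the correlation between consecutive residuals, and the average length of runs of same-sign residuals, that is, how many periods in a row the model tends to over- or underpredict.

```python
import random
import statistics

def ar1_errors(n, rho, sigma, rng):
    """Errors where each period's error = rho * previous error + fresh noise."""
    errors = [rng.gauss(0, sigma)]
    for _ in range(n - 1):
        errors.append(rho * errors[-1] + rng.gauss(0, sigma))
    return errors

def trend_residuals(y):
    """Residuals from an OLS regression of y on a time trend t = 0, 1, ..., n-1."""
    t = list(range(len(y)))
    t_bar, y_bar = statistics.fmean(t), statistics.fmean(y)
    slope = (sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
             / sum((ti - t_bar) ** 2 for ti in t))
    intercept = y_bar - slope * t_bar
    return [yi - intercept - slope * ti for ti, yi in zip(t, y)]

def lag1_autocorrelation(e):
    """Correlation between each residual and the previous period's residual."""
    e_bar = statistics.fmean(e)
    num = sum((e[i] - e_bar) * (e[i - 1] - e_bar) for i in range(1, len(e)))
    den = sum((ei - e_bar) ** 2 for ei in e)
    return num / den

def mean_run_length(e):
    """Average number of consecutive periods in which residuals keep the same sign."""
    sign_changes = sum(1 for i in range(1, len(e)) if (e[i] > 0) != (e[i - 1] > 0))
    return len(e) / (sign_changes + 1)

rng = random.Random(7)
results = {}
for rho in (0.0, 0.8):
    errors = ar1_errors(400, rho, 1.0, rng)
    y = [10 + 0.5 * t + u for t, u in enumerate(errors)]
    e = trend_residuals(y)
    results[rho] = (lag1_autocorrelation(e), mean_run_length(e))
    print(f"rho = {rho}: lag-1 residual autocorrelation = {results[rho][0]:.2f}, "
          f"average run of same-sign residuals = {results[rho][1]:.1f} periods")
```

With rho = 0 the same-sign runs average about two periods, as they would for coin flips. With rho = 0.8 the runs are considerably longer, which is the pattern of repeated over- and underprediction described above.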
VII. Collinearity between the explanatory variables (= multicollinearity)
If the explanatory variables are highly related to each other (i.e., they are highly collinear), regression analysis will have difficulty separating the independent effect of each variable from the effects of the others. The practical consequence of such high collinearity (also known as multicollinearity) is that the estimated coefficients will have large standard errors, which produces small sample t statistics for the coefficients, coefficients with unexpected signs, or both. However, it's important to remember that multicollinearity has no practical consequences for a model that is being used only for forecasting, because t-tests on the individual coefficients are unimportant for that purpose.

Multicollinearity can be diagnosed by examining the simple correlations between the explanatory variables and by observing how the regression results change when highly correlated explainers are excluded from the estimated regression. If excluding a highly correlated explainer changes the R squared very little but changes some of the other variables' coefficients and t values a lot, the results probably reflect multicollinearity. Remember, however, that a regression that excludes a relevant, highly collinear variable will, by definition, suffer from specification bias. Because such a regression assigns the influence of the excluded variable to the other explainers it includes, its results should be interpreted with care in view of the likely specification biases (see above discussion).

Because multicollinearity arises from the explanatory variables being highly collinear, the only way to estimate the independent effect of each explainer more precisely without generating specification biases is to increase the sample size.
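The exclusion diagnostic described above can be tried on simulated data. In this Python sketch the helpers and the data-generating numbers are assumptions for illustration: x2 is constructed as a near copy of x1, and both truly affect y with coefficients of 1.0. With both included, each coefficient has a large standard error. Excluding x2 barely moves R squared, but the coefficient on x1 jumps to roughly 2 (the sum of both true effects), because x1 absorbs x2's influence. That shift is the specification bias the text warns about.

```python
import math
import random
import statistics

def centered(v):
    m = statistics.fmean(v)
    return [vi - m for vi in v]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def correlation(a, b):
    ca, cb = centered(a), centered(b)
    return dot(ca, cb) / math.sqrt(dot(ca, ca) * dot(cb, cb))

def fit_two(y, x1, x2):
    """OLS of y on x1, x2 and an intercept, solved from the 2x2 normal equations
    on mean-centered data. Returns (b1, se1, b2, se2, r_squared)."""
    yc, c1, c2 = centered(y), centered(x1), centered(x2)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, yc), dot(c2, yc)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    resid = [yi - b1 * u - b2 * v for yi, u, v in zip(yc, c1, c2)]
    s2 = dot(resid, resid) / (len(y) - 3)
    # Var(b1) = s^2 * s22 / det = s^2 / (s11 * (1 - r12^2)), which blows up as r12 -> 1.
    se1 = math.sqrt(s2 * s22 / det)
    se2 = math.sqrt(s2 * s11 / det)
    r_squared = 1 - dot(resid, resid) / dot(yc, yc)
    return b1, se1, b2, se2, r_squared

def fit_one(y, x1):
    """OLS of y on x1 and an intercept. Returns (b1, r_squared)."""
    yc, c1 = centered(y), centered(x1)
    b1 = dot(c1, yc) / dot(c1, c1)
    resid = [yi - b1 * u for yi, u in zip(yc, c1)]
    return b1, 1 - dot(resid, resid) / dot(yc, yc)

rng = random.Random(3)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
# x2 is almost a copy of x1 (assumed: tiny independent noise with SD 0.05).
x2 = [a + rng.gauss(0, 0.05) for a in x1]
# Both explainers truly matter, each with coefficient 1.0.
y = [1 + 1.0 * a + 1.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

r12 = correlation(x1, x2)
b1, se1, b2, se2, r2_full = fit_two(y, x1, x2)
b1_alone, r2_alone = fit_one(y, x1)
print(f"simple correlation of x1 and x2: {r12:.4f}")
print(f"both included: b1 = {b1:.2f} (t = {b1 / se1:.2f}), "
      f"b2 = {b2:.2f} (t = {b2 / se2:.2f}), R^2 = {r2_full:.3f}")
print(f"x2 excluded:   b1 = {b1_alone:.2f}, R^2 = {r2_alone:.3f}")
```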
VIII. Logit regression
Analysts are frequently interested in identifying the factors that explain the behavior of categorical binary (1,0) variables. Examples include why employees quit or stay, which persons will live or die (a key life insurance question), and which internet site visitors will buy or not buy. Because each of these dependent variables is a categorical (1,0) dummy variable, classical ordinary least squares (OLS) regression is inappropriate. In these cases, OLS would generate estimated coefficients and corresponding t statistics that are biased, and its predicted values can fall outside the 0 to 1 range of a probability. The preferred method for estimating the independent effect of several explanatory variables on a (1,0) dummy dependent variable is logit regression. Logit regression is appropriate for large samples because it provides consistent estimates of the regression coefficients (estimates that converge on the true values as the sample grows). It also provides sample t statistics that can be used to test hypotheses about the signs of the population coefficients, and it allows the "goodness of fit" to be gauged by showing how accurately the model predicts how the dependent variable actually behaved in the sample.

Each estimated coefficient generated by logit regression measures the change in the log of the odds of the dependent variable taking on the value 1 (vs. 0) when the explanatory variable changes by 1 unit. A more useful way to view the effect of each explainer is to transform each estimated coefficient according to the following formula:

transformed b_i = (logit b_i) × (proportion of 1s) × (proportion of 0s)

where the proportions of 1s and 0s are the fractions of the sample that fell into each of the two categories of the dependent variable. Each transformed coefficient shows approximately how much the probability of being a 1 (vs. 0) will change if the explanatory variable changes by one unit. The approximation comes from the slope of the logit probability curve, b_i × p × (1 - p), evaluated at p equal to the sample proportion of 1s.
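The notes do not say how statistics programs compute logit estimates; one standard approach is maximum likelihood solved with Newton-Raphson iterations. The Python sketch below uses a hypothetical helper `fit_logit` with one explanatory variable plus an intercept, on simulated data with assumed true coefficients of -1.0 and 0.8. It reports the log-odds coefficient, its large-sample t (z) statistic, the odds multiplier exp(b1), and the transformed coefficient b1 × (share of 1s) × (share of 0s). For comparison, it also averages b1 × p × (1 - p) over each case's fitted probability p.

```python
import math
import random
import statistics

def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Maximum likelihood logit of y (0/1) on x plus an intercept via Newton-Raphson.
    Returns (b0, b1, se_b0, se_b1); SEs come from the inverse information matrix."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1 - pi) for pi in p]
        # Score: gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix [[a, c], [c, d]] (the negative Hessian).
        a = sum(w)
        c = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - c * c
        # Newton step = inverse information times score.
        step0 = (d * g0 - c * g1) / det
        step1 = (a * g1 - c * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if max(abs(step0), abs(step1)) < tol:
            break
    # Information evaluated at the last iterate (essentially the converged estimates).
    return b0, b1, math.sqrt(d / det), math.sqrt(a / det)

rng = random.Random(11)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
# Simulated 0/1 outcomes from an assumed logit model with b0 = -1.0 and b1 = 0.8.
y = [1 if rng.random() < 1 / (1 + math.exp(-(-1.0 + 0.8 * xi))) else 0 for xi in x]

b0, b1, se0, se1 = fit_logit(x, y)
share_ones = statistics.fmean(y)
transformed_b1 = b1 * share_ones * (1 - share_ones)
p_hat = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
avg_marginal = statistics.fmean([b1 * p * (1 - p) for p in p_hat])

print(f"logit b1 = {b1:.3f} (z = {b1 / se1:.1f}); odds multiply by {math.exp(b1):.2f} per unit of x")
print(f"transformed b1 = b1 * (share of 1s) * (share of 0s) = {transformed_b1:.3f}")
print(f"average of b1 * p * (1 - p) over fitted probabilities = {avg_marginal:.3f}")
```

At the maximum likelihood estimates of a logit with an intercept, the fitted probabilities average exactly to the share of 1s. As a result, the transformed coefficient exceeds the case-by-case average by b1 times the variance of the fitted probabilities, and the two figures are close when the fitted probabilities do not vary too widely.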
Because the dependent variable only takes on values of 1 or 0, the R squared value is an inadequate measure of how well the independent variables explain why some cases are 1s and others are 0s. A better measure of the estimated model's predictive power is how many cases the model correctly classifies into each of the two groups; such a classification table is typically provided with the logit results. Many basic statistics tools (Excel's built-in regression tool, for example) do not estimate the logit regression model. However, the Gretl program can be used to estimate a logit regression (see the Gretl guide).
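A classification table simply cross-tabulates actual outcomes against predicted ones. This Python sketch uses ten hypothetical cases with made-up fitted probabilities. It assumes the common convention of predicting a 1 whenever the fitted probability is at least 0.5; programs may let you choose another cutoff. The helper `classification_table` is hypothetical. The sketch also reports the share of cases that would be classified correctly by always guessing the more common outcome, a useful benchmark for judging the model's own share.

```python
from collections import Counter

def classification_table(actual, prob, cutoff=0.5):
    """Count (actual, predicted) pairs; a case is predicted to be a 1 when its
    fitted probability is at least `cutoff`."""
    predicted = [1 if p >= cutoff else 0 for p in prob]
    return Counter(zip(actual, predicted))

# Hypothetical outcomes and fitted logit probabilities for ten cases (illustrative only).
actual = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
prob = [0.81, 0.62, 0.35, 0.20, 0.55, 0.10, 0.30, 0.71, 0.45, 0.05]

table = classification_table(actual, prob)
share_correct = (table[(1, 1)] + table[(0, 0)]) / len(actual)
# Benchmark: always predicting the more common outcome.
baseline = max(actual.count(1), actual.count(0)) / len(actual)

print("            predicted 1  predicted 0")
print(f"actual 1    {table[(1, 1)]:>11}  {table[(1, 0)]:>11}")
print(f"actual 0    {table[(0, 1)]:>11}  {table[(0, 0)]:>11}")
print(f"share correctly classified: {share_correct:.0%} "
      f"(always guessing the majority outcome: {baseline:.0%})")
```

With these ten hypothetical cases the model classifies 8 of 10 correctly, compared with 6 of 10 for the majority guess.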