Adding X as an Instrumental Variable in GMM Estimation

In summary, the conversation discusses a GMM equation specification that uses intraday dummy variables as instrumental variables. The original poster asks whether it is legitimate to add another variable to the moment conditions and runs into a "near singular matrix" error arising from the orthogonality conditions. After trying several specifications, the thread concludes that OLS and GMM produce the same coefficient estimates, but because the residuals do not meet the OLS assumptions, the standard errors differ.
  • #1
vienna_quant
Hello,

I have a question concerning the GMM equation specification.
Say we partition each day into 7 intraday intervals. We want to estimate the 7 intraday interval moments for a variable Y observed in those 7 intervals over a period of T days, meaning we have t = T*7 total observations for Y.
I first estimate the following model:

Y(t) = c + c(1)*dummy1 + c(2)*dummy2 + c(6)*dummy6 + c(7)*dummy7 + error(t)
(where c is a constant and dummy1-dummy7 are dummy variables taking the value 1 or 0, indicating whether the observation occurs in the corresponding interval). I use dummy1, dummy2, dummy6, and dummy7 as instrumental variables (in the moment conditions)

in order to see whether the observations in periods 1, 2, 6, and 7 (in the morning and/or afternoon) are statistically different from those in intervals 3-5. The results should be very similar to the OLS regression results, except that GMM does not rely on error homoscedasticity or the absence of autocorrelation in the errors.
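
For concreteness, here is a minimal sketch of this specification in Python with statsmodels rather than the poster's EViews; the series y is a random placeholder and T is arbitrary, so only the structure, not the numbers, matters:

[code]
import numpy as np
import statsmodels.api as sm

# Hypothetical setup: T days x 7 intraday intervals, stacked into T*7 obs.
T = 250
interval = np.tile(np.arange(1, 8), T)      # interval index 1..7, day by day
rng = np.random.default_rng(42)
y = rng.normal(size=T * 7)                  # placeholder for the observed Y(t)

# Dummies for intervals 1, 2, 6, 7; intervals 3-5 form the baseline.
D = np.column_stack([(interval == k).astype(float) for k in (1, 2, 6, 7)])
X = sm.add_constant(D)                      # constant c plus the four dummies

# OLS with HAC (Newey-West) standard errors reproduces the GMM point
# estimates for this just-identified setup while relaxing the
# homoscedasticity and no-autocorrelation assumptions.
fit = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 7})
print(fit.params)
print(fit.bse)
[/code]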

QUESTION: Is it legitimate to add another variable, c(8)*X(t) (where X has the same number of observations t as Y), to the moment conditions, so that the equation takes the form

Y(t) = c + c(1)*dummy1 + c(2)*dummy2 + c(6)*dummy6 + c(7)*dummy7 + c(8)*X(t) + error(t)

- With OLS this is clearly no problem, but with GMM? I am not sure whether one can use a moment c(8)*X(t) that enters the estimation for each of the Y(t) observations. When I enter X as an additional instrumental variable in the orthogonality conditions, I get an error message: near singular matrix ...


thanx for advice
f. :smile: :smile: :smile:
 
  • #2
Do you get a similar message if you try an OLS package? Have you tried? It may be that your X is highly correlated with the dummies, or you may be running out of degrees of freedom (too few data points).
 
  • #3
Thanks for the quick answer!

I do not get the message with OLS, and the data set has more than 70,000 observations, so I don't think degrees of freedom are the problem.

i tried the following model:
Y(t) = c + c(1)*dummy1 + c(2)*dummy2 + c(3)*dummy3 + c(4)*dummy4 + c(5)*dummy5 + c(6)*dummy6 + c(7)*dummy7 + error(t)


The problem comes not from the estimation equation (OLS has no trouble with it) but from the orthogonality conditions I specified:
period1*error = 0
period2*error = 0
...
period7*error = 0

So I have an orthogonality condition for each interval, with 7 parameters to estimate and 7 orthogonality conditions, and I get the "near singular matrix" error message (in EViews).
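
The likely culprit is exact collinearity: the 7 interval dummies sum to 1 in every observation, so a constant plus all 7 dummies is rank-deficient and the instrument cross-product matrix Z'Z cannot be inverted. A small numpy check, reusing the hypothetical setup sketched above:

[code]
import numpy as np

T = 100
interval = np.tile(np.arange(1, 8), T)
D = np.column_stack([(interval == k).astype(float) for k in range(1, 8)])
const = np.ones((T * 7, 1))

Z_full = np.hstack([const, D])         # constant + all 7 dummies
Z_drop = np.hstack([const, D[:, :6]])  # constant + 6 dummies

# The 7 dummies sum to the constant column, so Z_full loses one rank.
print(np.linalg.matrix_rank(Z_full), "of", Z_full.shape[1])  # 7 of 8
print(np.linalg.matrix_rank(Z_drop), "of", Z_drop.shape[1])  # 7 of 7
[/code]

Dropping either one dummy or the constant restores full rank, which matches what post #5 reports below.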

Interestingly, the following model is no problem:
Y(t) = c + c(1)*X(t) + error(t)

where X(t) spans the complete sample, meaning for each observed Y there is an X value.
Orthogonality condition:
X*error = 0

There is no problem estimating this model, even though I have an orthogonality condition for each estimated parameter.

thanx for adWISE
f.
 
  • #4
If you have 7 periods, can you specify 7 dummies and an intercept? Shouldn't there be 6 dummies, or else no intercept? Although I cannot think why OLS wouldn't complain in that case, either.

If that's not it, then the GMM package's matrix inversion algorithm must be complaining about too many zeroes in the matrix, making it near-singular. I guess this can be a problem especially if the GMM inversion method involves submatrices, some of which can be zero matrices because there are so many zeroes overall.

A colleague of mine wrote his Ph.D. thesis on this subject (inverting sparse matrices during estimation of regression equations), so I guess that this can be a problem for many people.

Another solution that I can think of is to play with precision limits. If the OLS package has a lower precision limit than GMM, it may not see a problem where GMM does because of its higher precision.
 
  • #5
Thanks for your answers so far; I tried some other forums, but no one could give any advice.


If you have 7 periods, can you specify 7 dummies and an intercept? Shouldn't there be 6 dummies, or no intercept?
--> 7 dummies and no intercept (with all dummies as orthogonality conditions) returns "near singular matrix" as well --> 6 dummies and a constant is no problem!

--> After some trial-and-error testing I found out:

Interestingly, it is possible to specify the equation with 7 dummies and underspecified orthogonality conditions (6 conditions, say period1*error = 0 ... period6*error = 0). Is the underspecification a big deal? The results don't change much when adding or changing orthogonality conditions. I am really no expert in this field, but it seems as if the orthogonality conditions make little difference: the residuals are not normally distributed (skewness 0.55, kurtosis -3.7, and a Jarque-Bera test with p-value 0.000), and in fact it estimates exactly the same model as OLS. OLS estimation yields equal standard errors for all coefficients, while GMM yields slightly different standard errors but the same estimated coefficients as OLS. I am really puzzled.

I am not sure if I should use my model (as it is for a PhD ...)

Thanks for the suggestions;
any advice is highly appreciated.
f.
 
  • #6
Orthogonality conditions cannot guarantee normally distributed errors. OLS imposes orthogonality (by construction), but there is no guarantee that the actual residuals generated by OLS will be normal. I am not too familiar with GMM or EViews, but it sounds like you may have to explicitly specify model options that correct for non-normal errors. E.g., suppose the GMM errors are heteroscedastic. Don't you need to call some kind of correction module (or subroutine) that will correct for heteroscedasticity? Or does EViews do this automatically?

Incidentally, do you need an orthogonality condition for your intercept as well?

That OLS produces the same coefficients as GMM does not surprise me. After all, the expected value of each dummy coefficient is [itex]\overline{y_i}[/itex] for the subsample indicated by that dummy (all i such that d_i = 1). But if the residuals violate the OLS assumptions (e.g., homoscedasticity), then OLS is inefficient and its reported standard errors are no longer reliable.
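
A minimal sketch of this point, with hypothetical data whose error variance differs by interval: OLS and a White-robust fit (the just-identified GMM analogue here) agree on the coefficients, which are simply the interval means, and differ only in the standard errors.

[code]
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
T = 500
interval = np.tile(np.arange(1, 8), T)
D = np.column_stack([(interval == k).astype(float) for k in range(1, 8)])

# Interval means 1..7, with error variance growing across intervals.
y = D @ np.arange(1.0, 8.0) + rng.normal(scale=interval, size=T * 7)

ols = sm.OLS(y, D).fit()                # classical (homoscedastic) SEs
rob = sm.OLS(y, D).fit(cov_type="HC0")  # White heteroscedasticity-robust SEs

print(np.allclose(ols.params, rob.params))  # True: identical coefficients
print(ols.bse)  # roughly equal across the 7 dummies
print(rob.bse)  # spread out, reflecting the per-interval variances
[/code]

Note that the classical SEs come out (nearly) equal for all seven dummies, just as reported in post #5.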
 
  • #7
Totally correct: the parameter estimates are the mean values for each subperiod; at least this was also what I expected for OLS. Interestingly, the OLS standard errors are smaller than the GMM errors.

As written in my first post, my model does not incorporate an intercept at all, and ideally it would be

Y(t) = c(1)*dummy1 + c(2)*dummy2 + c(3)*dummy3 + c(4)*dummy4 + c(5)*dummy5 + c(6)*dummy6 + c(7)*dummy7 + c(8)*V(t) + error(t)

where Y(t) is a vector spanning all observations. The only problem remaining is that I can't use all periods in the orthogonality conditions. Nevertheless, it does not seem to make a big difference in the estimates when I change which parameters enter the orthogonality conditions. I will just use 6 of the seven periods and additionally V(t), then.

One last question:
Which tests are used to compare the distributions of 2 different samples of equal size (not tests of means and standard deviations!)? I have used chi-squared so far but want to test more; I don't know of any other appropriate tests.
Thanks,
f.
 
  • #8
vienna_quant said:
The only problem remaining is that I can't use all periods in the orthogonality conditions. Nevertheless, it does not seem to make a big difference in the estimates when I change which parameters enter the orthogonality conditions. I will just use 6 of the seven periods and additionally V(t), then.
My guess is that there must be some kind of adding-up condition such that when you impose 6 of the orthogonality conditions, the 7th is automatically satisfied. You may want to research this.
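
That guess can be checked directly: the 7 dummies sum to 1 in every observation, so the 7 dummy moment conditions add up to the moment condition of a constant (which many packages include in the instrument list by default), and any 6 of them together with the constant's imply the 7th. A quick numerical sketch under the same hypothetical setup:

[code]
import numpy as np

T = 50
interval = np.tile(np.arange(1, 8), T)
D = np.column_stack([(interval == k).astype(float) for k in range(1, 8)])
e = np.random.default_rng(2).normal(size=T * 7)   # any residual vector

lhs = sum(D[:, k] @ e for k in range(7))  # sum of the 7 dummy moments
rhs = np.ones(T * 7) @ e                  # the (implicit) constant's moment
print(np.isclose(lhs, rhs))               # True: one condition is redundant
[/code]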
vienna_quant said:
One last question:
Which tests are used to compare the distributions of 2 different samples of equal size (not tests of means and standard deviations!)? I have used chi-squared so far but want to test more; I don't know of any other appropriate tests.
There are several non-parametric tests for assessing whether 2 samples are from the same distribution. For example, the "runs" test. Suppose the two samples are [itex]u_1<...<u_n[/itex] and [itex]v_1<...<v_n[/itex], and suppose you "mix" the samples. If the resulting mix looks something like [itex]u_1< v_1 < u_2 < u_3 < u_4 < v_2 < v_3 <[/itex] ... [itex] < u_{n-1} < v_{n-1} < v_n < u_n[/itex], then the chance that they are from the same distribution is greater than if they looked like [itex]u_1<...<u_n<v_1<...<v_n[/itex]. The latter example has a smaller number of runs (only two: first all u's, then all v's) than the former (at least seven runs: one u, one v, u's, v's, ..., u's, v's, one u). This and similar tests are usually described in standard probability textbooks such as Mood, Graybill and Boes.
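
A minimal sketch of this two-sample (Wald-Wolfowitz) runs test in Python, using the usual normal approximation for the number of runs and ignoring ties for simplicity; the two-sample Kolmogorov-Smirnov test (scipy.stats.ks_2samp) is another common choice.

[code]
import numpy as np
from scipy import stats

def runs_test_2samp(u, v):
    """Wald-Wolfowitz two-sample runs test (normal approximation)."""
    labels = np.r_[np.zeros(len(u)), np.ones(len(v))]
    mixed = np.argsort(np.r_[u, v])   # pool and sort both samples
    s = labels[mixed]                 # 0/1 sequence of sample labels
    runs = 1 + np.count_nonzero(s[1:] != s[:-1])
    n1, n2 = len(u), len(v)
    n = n1 + n2
    mean = 2 * n1 * n2 / n + 1
    var = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - mean) / np.sqrt(var)
    # Too few runs -> the samples cluster separately -> distributions differ.
    return runs, z, 2 * stats.norm.sf(abs(z))

rng = np.random.default_rng(3)
u, v = rng.normal(size=200), rng.normal(size=200)
print(runs_test_2samp(u, v))   # large p-value: same distribution
[/code]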
 
  • #9
thank you EnumaElish for taking the time!
this forum seems to be a really good place with friendly people!
keep on going like that!
best
f.
 

FAQ: Adding X as an Instrumental Variable in GMM Estimation

What is the Generalized Method of Moments (GMM) in statistics?

The Generalized Method of Moments (GMM) is a statistical method used to estimate the parameters of a statistical model. It generalizes the classical method of moments, which matches sample moments to the moments implied by the model. GMM applies whenever there are at least as many moment conditions as parameters; when there are more moments than parameters (overidentification), it combines them in an efficient way, making it a flexible and widely applicable method.

How does GMM differ from other statistical methods?

GMM differs from other statistical methods in that it does not require the distribution of the data to be known. Instead, it relies on the moments of the data, which can be estimated from the data itself. This makes GMM a more flexible method that can be applied to a wider range of models and data sets.

What are the steps involved in applying GMM?

The first step in applying GMM is to specify a model and the moment conditions to be used for estimation. Next, the parameters of the model are estimated by finding the values that minimize a weighted distance between the sample moments and the model-implied moments; this is typically done using numerical optimization methods. Finally, the estimated parameters are used to make inferences about the model, as in the sketch below.
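
A minimal, self-contained sketch of these steps in Python: two parameters (mean and variance) are estimated from three moment conditions (the third assumes a symmetric distribution), so the model is overidentified, and the identity matrix serves as a simple first-step weighting matrix.

[code]
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=5000)   # hypothetical sample

def gbar(theta):
    """Sample averages of the three moment conditions."""
    mu, sig2 = theta
    g1 = x - mu                    # E[x - mu] = 0
    g2 = (x - mu) ** 2 - sig2      # E[(x - mu)^2 - sig2] = 0
    g3 = (x - mu) ** 3             # E[(x - mu)^3] = 0 (symmetry assumption)
    return np.array([g1.mean(), g2.mean(), g3.mean()])

def objective(theta):
    g = gbar(theta)
    return g @ np.eye(3) @ g       # quadratic form g' W g with W = I

res = minimize(objective, x0=[0.0, 1.0], method="Nelder-Mead")
print(res.x)   # close to (2.0, 1.5**2)
[/code]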

What are the advantages of using GMM?

One of the main advantages of GMM is its flexibility. It can be applied to a wide range of models and data sets and does not require the distribution of the data to be known. GMM also tends to be efficient, since it can combine all available moment conditions for estimation. Additionally, with a suitable weighting matrix, GMM delivers standard errors that are robust to heteroscedasticity and autocorrelation.

What are some common applications of GMM?

GMM has various applications in economics, finance, and other fields. It is commonly used in econometrics to estimate parameters of models such as panel data models, time series models, and structural models. GMM is also commonly used in financial risk management, asset pricing, and forecasting. Additionally, GMM has been applied to problems in biology, physics, and engineering.
