Econometrics Exam

The exercise was created on 2025-07-16 by dalt. Number of questions: 98.





  • What is the primary objective of using a Linear Probability Model (LPM), and what is its main functional form? The Linear Probability Model (LPM) is used when the dependent variable can take only two values, zero or one, and its objective is to estimate the probability that the outcome equals one. It does so by expressing that probability as a linear combination of the independent variables. Its functional form is P(Y=1 | X_1, X_2, ..., X_k) = β_0 + β_1·X_1 + ⋯ + β_k·X_k.
  • List two main disadvantages of the Linear Probability Model (LPM) The Linear Probability Model (LPM) has two primary limitations. Firstly, a significant drawback is that the predicted probabilities generated by the model can sometimes fall outside the logical range of zero to one. This is problematic because probabilities, by their very nature, must exist within this interval, making any predictions outside of it nonsensical. Secondly, the LPM inherently suffers from heteroscedasticity. This means that the variance of the error term is not constant across all observations. Specifically, the variability of the errors depends directly on the predicted probability, as expressed by the formula: the variance of y equals the predicted probability of y being one multiplied by one minus the predicted probability of y being one. This non-constant variance can lead to inefficient parameter estimates and unreliable standard errors
  • How do the Probit and Logit models differ from the Linear Probability Model (LPM) in how they estimate the probability of a binary outcome? While the Linear Probability Model (LPM) directly estimates the probability of a binary outcome, specifically when the dependent variable Y equals 1, as a simple linear combination of its predictors, Probit and Logit models adopt a different approach. Instead of a direct linear relationship, these models estimate the probability as a non-linear function. They achieve this by employing a "link function," denoted as G. This link function serves to transform the linear combination of predictors, often written as X multiplied by beta, into a probability that is guaranteed to fall within the valid range of 0 to 1. To illustrate, for the Linear Probability Model, the probability of Y equaling 1 is simply X times beta. However, for Probit and Logit models, this probability is calculated as G of X times beta, where G is the specific non-linear transformation that constrains the output to be a valid probability.
  • What distributional assumptions do the Probit and Logit models make about the error terms? The core distinction between Probit and Logit models lies in their underlying assumptions about the distribution of the error term. The Probit model posits that the errors follow a standard normal distribution. Consequently, it employs the cumulative density function, or CDF, of the normal distribution, typically denoted by Φ(⋅), as its link function. This means the probability of the outcome is derived from the standard normal CDF applied to the linear combination of predictors. In contrast, the Logit model assumes that the errors conform to a logistic distribution. As such, it utilizes the logistic function, often represented as Λ(⋅), as its link function. This function, which has an S-shaped curve similar to the normal CDF but with slightly heavier tails, transforms the linear combination of predictors into a probability.
  • How is the interpretation of a coefficient in a Linear Probability Model (LPM) different from the interpretation of a coefficient in a Probit or Logit model? In the context of a Linear Probability Model, or LPM, the coefficient β j ​ has a straightforward interpretation: it directly represents the marginal effect. This means that β j ​ quantifies the exact change in the probability of the dependent variable Y equaling 1 for every one-unit increase in the predictor variable X j ​ . However, in Probit and Logit models, the interpretation of the coefficient β j ​ is not as direct in terms of probability. Unlike the LPM, the marginal effect in these models is not constant; instead, it depends on the specific values of all the variables included in the model. To calculate the marginal effect in Probit and Logit models, one must multiply the derivative of the link function, denoted as G ′ (⋅), by β j ​ . This derivative, G ′ (Xβ), indicates how sensitive the probability is to changes in the linear combination of predictors at a given point, thus making the marginal effect conditional on the current values of X.
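To make the contrast concrete, here is a small illustrative R sketch (simulated data, made-up variable names) that fits an LPM with lm() and a probit with glm(), and computes the probit's average marginal effect as the mean of G′(Xβ)·β_j, where G′ is the standard normal density:

    set.seed(1)
    n  <- 500
    x1 <- rnorm(n); x2 <- rnorm(n)
    y  <- rbinom(n, 1, pnorm(-0.3 + 0.8 * x1 - 0.5 * x2))      # simulated binary outcome

    lpm    <- lm(y ~ x1 + x2)                                   # LPM: coefficient = marginal effect
    probit <- glm(y ~ x1 + x2, family = binomial(link = "probit"))

    xb  <- predict(probit, type = "link")                       # linear index Xb for each observation
    ame <- mean(dnorm(xb)) * coef(probit)["x1"]                 # average marginal effect of x1: mean G'(Xb) * b1
    c(LPM = unname(coef(lpm)["x1"]), probit_AME = unname(ame))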
  • What is the primary goal of Principal Component Analysis (PCA)? The primary goal of Principal Component Analysis (PCA) is to reduce the dimensionality of a dataset with a large number of interrelated variables. It achieves this by transforming the original variables into a new, smaller set of uncorrelated variables called principal components, which capture most of the original data's variance.
  • What is a key difference between Principal Component Regression (PCR) and Partial Least Squares Regression (PLSR)? The key difference lies in how they construct their components. PCR creates principal components that explain the most variance in the predictors (X 1 ​ ,...,X p ​ ) alone, without considering the response variable, Y. PLSR creates components by giving more weight to predictors that are most strongly correlated with the response variable, Y, with the explicit goal of improving prediction.
  • In the context of shrinkage methods, what is the fundamental difference between the penalty term used in Ridge regression versus Lasso regression? The fundamental difference is the type of norm used for the penalty. Ridge regression uses a quadratic penalty (the L2 norm), λ·Σ β_j², which shrinks coefficients towards zero but rarely sets them exactly to zero. Lasso regression uses an absolute-value penalty (the L1 norm), λ·Σ |β_j|, which can shrink some coefficients exactly to zero, effectively performing variable selection.
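A minimal sketch of the two penalties, assuming the glmnet package is available (in glmnet, alpha = 0 gives the Ridge/L2 penalty and alpha = 1 the Lasso/L1 penalty); the data are simulated, so the numbers are only illustrative:

    library(glmnet)                                   # assumption: glmnet is installed
    set.seed(2)
    X <- matrix(rnorm(100 * 10), 100, 10)             # 10 predictors
    y <- X[, 1] - 2 * X[, 2] + rnorm(100)

    ridge <- glmnet(X, y, alpha = 0)                  # L2 penalty: shrinks, rarely exactly zero
    lasso <- glmnet(X, y, alpha = 1)                  # L1 penalty: can zero out coefficients
    coef(ridge, s = 0.5)                              # all coefficients remain non-zero
    coef(lasso, s = 0.5)                              # several coefficients are exactly zero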
  • What are the two essential conditions for a variable Z to be a valid instrument for an endogenous variable X? For a variable Z to be a valid instrument, it must satisfy two conditions: Instrument relevance: the instrument Z must be correlated with the endogenous predictor X (Cov(X, Z) ≠ 0). Instrument exogeneity: the instrument Z must be uncorrelated with the regression error term ϵ (Cov(Z, ϵ) = 0).
  • What is a "dummy variable," and how is it used in regression analysis? A dummy variable is a binary variable, typically coded as 0 or 1, used to incorporate qualitative (categorical) predictors into a regression model. For a categorical variable with two levels (e.g., yes/no), the dummy variable takes the value 1 for one category and 0 for the other (the reference level). This allows the model to estimate how the mean of the response variable changes when the category changes, holding other predictors constant.
  • In the Classical Linear Regression Model, what is the linearity assumption (A1)? The linearity assumption (A1) states that the model is linear in its parameters (β_k) and the error term (ϵ_i). The relationship can be written as: y_i = x_i1·β_1 + x_i2·β_2 + ⋯ + x_iK·β_K + ϵ_i
  • What is the full rank assumption (A2) in the context of the sample data matrix, X? The full rank assumption (A2) requires that the n×K sample data matrix, X, has full column rank. This means there is no exact linear relationship among any of the independent variables in the model, a condition necessary to estimate the parameters.
  • Explain the exogeneity assumption (A3) of the independent variables. What does E[ϵ|X] = 0 imply? The exogeneity assumption (A3) states that there is no correlation between the error term (ϵ) and the independent variables (X). The expression E[ϵ|X] = 0 means that the conditional expected value of the disturbance, given the predictors, is zero.
  • What are the two conditions specified by the homoscedasticity and nonautocorrelation assumption (A4)? The two conditions are: Homoscedasticity: each disturbance, ϵ_i, has the same finite variance, σ². Nonautocorrelation: every disturbance, ϵ_i, is uncorrelated with every other disturbance, ϵ_j (for i ≠ j).
  • Under the assumption of normality (A6), what is the distribution of the error term, ϵ? Under the normality assumption, the disturbances (ϵ), conditional on X, are normally distributed with a mean of 0 and a variance of σ²I. This is written as ϵ|X ~ N(0, σ²I).
  • What is the matrix formula for the Ordinary Least Squares (OLS) estimator, b? The OLS estimator b is the vector of coefficients that minimizes the Residual Sum of Squares (RSS). Its solution is given by the matrix formula: b = (X′X)⁻¹X′y
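As a quick check of the formula on simulated data, the R sketch below computes b = (X′X)⁻¹X′y by hand and compares it with the coefficients returned by lm():

    set.seed(3)
    n <- 200
    X <- cbind(1, rnorm(n), rnorm(n))                    # design matrix with intercept column
    y <- drop(X %*% c(1, 2, -1)) + rnorm(n)

    b_manual <- drop(solve(t(X) %*% X) %*% t(X) %*% y)   # b = (X'X)^(-1) X'y
    b_lm     <- coef(lm(y ~ X[, 2] + X[, 3]))            # same estimates via lm()
    rbind(b_manual, b_lm)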
  • What is the difference between the population error term (ϵ_i) and the sample residual (e_i)? The population error term, ϵ_i = y_i − x_i′β, represents the true, unobservable disturbance for the i-th observation. The sample residual, e_i = y_i − x_i′b, is the observable estimate of the error term, calculated using the estimated coefficients (b) from the sample data.
  • The total variation in the dependent variable, represented by the Total Sum of Squares (SST), can be decomposed into two parts. What are they? The Total Sum of Squares (SST) is decomposed into the Regression Sum of Squares (SSR) and the Sum of Squared Errors (SSE). The formula is: SST = SSR + SSE
  • How is the coefficient of determination, R², calculated using the ANOVA decomposition sums of squares? The coefficient of determination, R², measures the proportion of the total variation in the dependent variable that is explained by the regression model. It is calculated as: R² = SSR/SST = 1 − SSE/SST
  • Why is the Adjusted R-squared ( R 2 or R Adj 2 ​ ) often preferred over the standard R 2 for model comparison? The standard R 2 will never decrease when a new variable is added to a model, even if that variable is irrelevant. The Adjusted R-squared is preferred because it incorporates a penalty for adding extra predictors (loss of degrees of freedom) and only increases if the new variable improves the model fit more than expected by chance.
  • According to the Gauss-Markov theorem, what property does the OLS estimator, b, have? The Gauss-Markov theorem states that under the classical linear model assumptions (A1-A5), the OLS estimator, b, is the Best Linear Unbiased Estimator (BLUE). This means it is the estimator with the minimum variance among all linear and unbiased estimators.
  • What is omitted variable bias? Omitted variable bias occurs when a relevant predictor (one with a non-zero coefficient) that is correlated with other predictors is excluded from the model. This causes the OLS estimator for the included variables to be biased and inconsistent because it incorrectly attributes the effect of the omitted variable to the included ones
  • What happens to the OLS estimator if you include an irrelevant variable in your regression model? Including an irrelevant variable (a variable whose true coefficient is zero) in a regression model does not cause bias in the OLS estimates of the other coefficients. However, it typically increases the variance of the estimators, making them less precise.
  • When testing a single hypothesis, what is a Type I error? A Type I error occurs when we incorrectly reject a null hypothesis that is actually true. The probability of committing a Type I error is known as the size of the test.
  • What is the power of a test? The power of a test is the probability that it will correctly reject a null hypothesis that is false. It is calculated as one minus the probability of a Type II error.
  • In matrix notation, what is the formula for the vector of residuals, e? The vector of least squares residuals is the difference between the actual values (y) and the fitted values (Xb). The formula is: e=y−Xb
  • What is the "hat matrix," denoted by P, and what is its function? The hat matrix, P=X(X ′ X) −1 X ′ , is a projection matrix that transforms the vector of observed values (y) into the vector of fitted values ( y ^ ​ ). It "puts the hat on y," such that y ^ ​ =Py.
  • What is the "residual maker" matrix, denoted by M, and what is its relationship with the hat matrix P? The residual maker, M=I−X(X ′ X) −1 X ′ , is a matrix that transforms the vector of observed values (y) into the vector of residuals (e), such that e=My. It is related to the hat matrix by the formula M=I−P.
  • What does the property X ′ e=0 imply about the relationship between the residuals and the predictor variables? The property X ′ e=0 means that the vector of residuals (e) is orthogonal (uncorrelated) to each column (predictor variable) in the data matrix X
  • If a regression model includes an intercept, what does the property X ′ e=0 imply about the sum of the residuals? If the first column of X is a column of ones (representing the intercept), the property X ′ e=0 implies that the least squares residuals must sum to zero (∑e i ​ =0)
  • What is an unbiased estimator for the error variance, σ²? An unbiased estimator for σ², denoted as s², is calculated by dividing the sum of squared residuals (e′e) by the degrees of freedom (n − K). The formula is: s² = e′e / (n − K)
  • The variance of the OLS estimator b, conditional on X, is given by the formula Var[b∣X]=σ 2 (X ′ X) −1 . What are the key factors that influence this variance? The variance of the OLS estimator depends on: Error variance (σ 2 ): A larger error variance leads to less precise estimates. Sample size (n): A larger sample size generally decreases the variance and increases precision. Predictor variance/sparsity: More spread-out (disperse) predictor values lead to more precise estimates, as it reduces the magnitude of the (X ′ X) −1 term.
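Tying the last two cards together, this illustrative R sketch (simulated data) computes s² = e′e/(n − K) and s²(X′X)⁻¹ by hand and checks them against vcov() from lm():

    set.seed(5)
    n <- 200; K <- 3
    X <- cbind(1, rnorm(n), rnorm(n))
    y <- drop(X %*% c(1, 2, -1)) + rnorm(n, sd = 2)

    fit <- lm(y ~ X[, 2] + X[, 3])
    e   <- residuals(fit)
    s2  <- sum(e^2) / (n - K)                         # s^2 = e'e / (n - K)
    V   <- s2 * solve(t(X) %*% X)                     # estimated Var[b | X] = s^2 (X'X)^(-1)
    all.equal(unname(V), unname(vcov(fit)))           # TRUE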
  • When is the Seemingly Unrelated Regressions (SUR) model more efficient than running separate OLS regressions for each equation? SUR is more efficient than separate OLS estimations when the error terms of the different equations are correlated. OLS ignores this cross-equation correlation, whereas SUR uses it to improve the efficiency of the coefficient estimates
  • What is the purpose of the Hausman test in the context of simultaneous equations? The Hausman test is used to determine if endogeneity is a significant problem in the model. It compares the OLS estimates to the 2SLS (or IV) estimates. A statistically significant difference (a small p-value) suggests that OLS is biased and inconsistent, and that the 2SLS model is preferred.
  • What is an interaction term in a regression model, and what does its coefficient represent? An interaction term is the product of two or more predictor variables (e.g., X 1 ​ ×X 2 ​ ) included in the model. Its coefficient ( β 3 ​ in Y=β 0 ​ +β 1 ​ X 1 ​ +β 2 ​ X 2 ​ +β 3 ​ X 1 ​ X 2 ​ +ϵ) represents the change in the slope of one predictor for a one-unit change in the other predictor. It captures how the effect of one variable on Y depends on the level of another variable.
  • In model selection, what is the main difference between the penalty term used by AIC versus BIC? The main difference is the severity of the penalty for model complexity. BIC uses a penalty term of log(n)×npar(model), while AIC uses 2×npar(model). Because log(n) is greater than 2 for any sample size n≥8, BIC applies a harsher penalty for additional parameters and tends to select more parsimonious (simpler) models than AIC.
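A small illustration in R: with an irrelevant predictor x3 added, AIC() and BIC() penalize the larger model differently (names and data are made up):

    set.seed(6)
    n  <- 100
    x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)    # x3 is irrelevant to y
    y  <- 1 + 2 * x1 - x2 + rnorm(n)

    small <- lm(y ~ x1 + x2)
    big   <- lm(y ~ x1 + x2 + x3)
    AIC(small, big)                                   # penalty of 2 per parameter
    BIC(small, big)                                   # penalty of log(n) per parameter; harsher here since log(100) > 2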
  • Briefly describe the three search strategies that can be used in stepwise selection. The three main strategies are: Forward selection: Starts with a null model and sequentially adds the predictor that provides the most significant improvement. Backward elimination: Starts with the full model containing all predictors and sequentially removes the least significant predictor. Both (or Hybrid): A combination of forward and backward steps, where at each stage the procedure considers adding or removing a variable to find the optimal model.
  • What is a partial regression coefficient? A partial regression coefficient is a coefficient in a multiple regression model. It measures the effect of a one-unit change in its corresponding predictor on the outcome variable, holding all other predictors in the model constant.
  • What is a "log-linear" regression model? Give an example. A log-linear model is one where the logarithm of the dependent variable is regressed on the logarithms of the independent variables. The general form is: ln(y)=β 1 ​ +∑ k ​ β k ​ ln(X k ​ )+ϵ This model is often used for modeling demand and production functions.
  • In a simple linear regression, what is the formula for the leverage statistic, h_i? For a simple linear regression, the leverage statistic h_i for the i-th observation is calculated as: h_i = 1/n + (X_i − X̄)² / Σ_{j=1..n} (X_j − X̄)²
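A quick check of the leverage formula in R against the built-in hatvalues() function, using simulated data:

    set.seed(7)
    x <- rnorm(30)
    y <- 1 + 2 * x + rnorm(30)
    fit <- lm(y ~ x)

    h_manual <- 1 / length(x) + (x - mean(x))^2 / sum((x - mean(x))^2)
    all.equal(h_manual, unname(hatvalues(fit)))       # TRUE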
  • What is the primary purpose of creating a "residuals vs. fitted" plot in model diagnostics? The primary purpose of a "residuals vs. fitted" plot is to check for non-linearity in the model and heteroscedasticity (non-constant variance of the errors). A random cloud of points around zero is desired, while patterns like a curve or a funnel shape indicate a violation of model assumptions.
  • How can a QQ-plot (quantile-quantile plot) be used to assess the normality of residuals? A QQ-plot compares the standardized residuals from the model against the theoretical quantiles of a standard normal distribution. If the residuals are normally distributed, the points on the plot will align closely with the diagonal line. Systematic deviations from the line indicate a departure from normality.
  • What does the Breusch-Pagan test check for? The Breusch-Pagan test is a formal statistical test used to check for heteroscedasticity. It tests the null hypothesis that the error variances are all equal (homoscedasticity) against an alternative that the error variance depends on the values of the predictors
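An illustrative use of the test in R, assuming the lmtest package is installed; the simulated errors have a spread that grows with x, so the test should reject homoscedasticity:

    library(lmtest)                                   # assumption: lmtest is installed
    set.seed(8)
    x <- runif(100, 1, 10)
    y <- 1 + 2 * x + rnorm(100, sd = x)               # error spread grows with x (heteroscedastic)
    fit <- lm(y ~ x)
    bptest(fit)                                       # small p-value -> reject the null of homoscedasticity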
  • What is the Durbin-Watson test used for in time series analysis? The Durbin-Watson test is used to detect the presence of autocorrelation (or serial correlation) in the residuals of a regression model. This is a common issue in time series data where errors in one period may be correlated with errors in previous periods.
  • What is an "outlier," and what is a "high-leverage point"? How are they different? An outlier is an observation with a response value (Y i ​ ) that is far from the regression plane, resulting in a large residual. Outliers primarily affect the estimated error variance ( s 2 ) and goodness-of-fit measures like R 2 . A high-leverage point is an observation with an extreme value for one or more predictor variables (X ij ​ ). These points can be highly influential on the estimation of the regression coefficients themselves.
  • When conducting a first-stage regression for an instrumental variable, what is the common "rule of thumb" for the F-statistic to avoid the weak instrument problem? The common rule of thumb is that the F-statistic from the first-stage regression should be greater than 10. An F-statistic below 10 suggests that the instrument is weak, meaning it is not strongly correlated with the endogenous variable, which can lead to biased IV estimates
  • What is the purpose of the Hansen J-Test (also known as a test for over-identifying restrictions)? The Hansen J-Test is used in IV regression when there are more instruments than endogenous variables (the model is over-identified). It tests the null hypothesis that the additional instruments are valid (i.e., uncorrelated with the error term). A rejection of the null suggests that at least one of the instruments is not valid.
  • Why is it necessary to standardize data before applying shrinkage methods like Ridge and Lasso? It is necessary to standardize the predictors because shrinkage methods apply a penalty to the size of the coefficients. If the predictors are on different scales, the penalty will unfairly affect those with larger scales, regardless of their importance. Standardization ensures that the penalty is applied equitably to all coefficients.
  • What is the key difference in the output of Ridge regression versus Lasso regression in terms of variable selection? The key difference is that Lasso regression can perform automatic variable selection, while Ridge cannot. The L 1 ​ penalty in Lasso can force some coefficients to be exactly zero, effectively removing those predictors from the model. The L 2 ​ penalty in Ridge only shrinks coefficients towards zero; they never become exactly zero
  • What is Elastic Net regression? Elastic Net regression is a shrinkage method that combines the penalties of both Ridge ( L 2 ​ ) and Lasso (L 1 ​ ) regression. It is defined by a mixing parameter, α, where α=0 corresponds to Ridge and α=1 corresponds to Lasso. It is useful for handling high-dimensional data with correlated predictors.
  • What does the term "over-identified" mean in the context of instrumental variables regression? A model is over-identified when the number of available valid instruments (m) is greater than the number of endogenous predictors (k).
  • What does the term "exactly identified" mean in instrumental variables regression? A model is exactly identified when the number of available valid instruments (m) is equal to the number of endogenous predictors (k).
  • What does it mean if a model is "under-identified" in IV regression, and can it be estimated? A model is under-identified when the number of available valid instruments (m) is less than the number of endogenous predictors (k). In this case, the model coefficients cannot be estimated using IV methods.
  • In the two-stage least squares (2SLS) estimation process, what is the objective of the first stage? The objective of the first stage is to decompose the variation in the endogenous predictor(s) into two parts: a problem-free component explained by the instruments and exogenous variables, and a problematic component that is correlated with the error term. This is done by regressing each endogenous variable on all the instruments and exogenous variables
  • In the two-stage least squares (2SLS) estimation process, what happens in the second stage? In the second stage, the original endogenous variables are replaced with their predicted values from the first stage. The dependent variable is then regressed on these predicted values and the original exogenous variables using OLS to obtain consistent estimates of the model coefficients.
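The two stages can be run by hand in R as a rough illustration (simulated data with a made-up confounder u). Note that doing the second stage manually gives the right point estimates but not the correct standard errors, which is why dedicated 2SLS routines are used in practice:

    set.seed(9)
    n <- 1000
    z <- rnorm(n)                                     # instrument
    u <- rnorm(n)                                     # unobserved confounder (makes x endogenous)
    x <- 0.8 * z + u + rnorm(n)                       # endogenous predictor
    y <- 1 + 2 * x + u + rnorm(n)                     # structural equation; true coefficient on x is 2

    first  <- lm(x ~ z)                               # stage 1: regress x on the instrument
    second <- lm(y ~ fitted(first))                   # stage 2: replace x with its fitted values
    coef(lm(y ~ x))["x"]                              # plain OLS: biased away from 2
    coef(second)[2]                                   # 2SLS point estimate: close to 2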
  • What is a "partial correlation coefficient"? A partial correlation coefficient measures the correlation between two variables after removing the linear effects of one or more other variables. In regression, it is the correlation between the residuals of two variables that have each been regressed on a common set of control variables.
  • What is the relationship between the t-statistic for a coefficient and the partial correlation coefficient? The squared partial correlation coefficient (r*²_yz) can be calculated directly from the squared t-statistic (t_z²) of a coefficient and the model's degrees of freedom (DF) using the formula: r*²_yz = t_z² / (t_z² + DF)
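A quick numerical check of the formula in R: the value t_z²/(t_z² + DF) matches the squared correlation between the residuals of y on x1 and of x2 on x1 (simulated data):

    set.seed(10)
    n <- 100
    x1 <- rnorm(n); x2 <- rnorm(n)
    y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)

    fit <- lm(y ~ x1 + x2)
    t2  <- summary(fit)$coefficients["x2", "t value"]^2
    DF  <- df.residual(fit)
    t2 / (t2 + DF)                                            # squared partial correlation of y and x2
    cor(residuals(lm(y ~ x1)), residuals(lm(x2 ~ x1)))^2      # same number via residual-on-residual correlation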
  • What are the four main types of models for discrete or limited dependent variables mentioned in the course material? Qualitative choice models, which analyze situations where the outcome is a choice from a set of discrete alternatives, can be broadly categorized into four main types. First, Binary Choice models deal with scenarios where an individual has to select between only two possible options. Examples include deciding whether to say "yes" or "no," or making a "purchase" or "don't purchase" decision. Second, Multinomial Choice models are employed when an individual chooses from more than two distinct alternatives that do not have an inherent order. For instance, this type of model would be used to analyze a consumer's choice of a specific brand of car or their preferred mode of transportation, such as bus, train, or car, where none of these options are considered inherently "better" or "worse" than another in a ranked sense. Third, Ordered Choice models are appropriate when the outcome represents a ranking or an intensity of preference. A common application is in analyzing survey responses, where individuals might select from options like "strongly disagree," "disagree," "neutral," "agree," or "strongly agree." Here, there's a clear sequential order to the choices. Finally, Event Counts models are used when the outcome is a non-negative integer that signifies the number of times a particular event occurs. An example would be modeling the number of visits an individual makes to a doctor within a given period.
  • What is the primary method used to estimate the coefficients in Probit and Logit models? The coefficients in Probit and Logit models are estimated using Maximum Likelihood Estimation (MLE). MLE finds the coefficient values that maximize the log-likelihood function, which is the log of the joint probability of observing the actual outcomes in the sample.
  • What is McFadden's Pseudo R-squared, and what does it measure? McFadden's Pseudo R-squared is a goodness-of-fit measure for models with discrete outcomes, like Probit and Logit. It compares the log-likelihood of the full model ( LL ur ​ ) with the log-likelihood of a model containing only an intercept (LL 0 ​ ). It measures the improvement in model fit over the null model.
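An illustrative computation in R: fit a logit by MLE with glm(), fit an intercept-only model, and form McFadden's pseudo R-squared as 1 − LL_ur/LL_0 (simulated data):

    set.seed(11)
    n <- 500
    x <- rnorm(n)
    y <- rbinom(n, 1, plogis(-0.5 + 1.2 * x))

    full <- glm(y ~ x, family = binomial(link = "logit"))     # fitted by maximum likelihood
    null <- glm(y ~ 1, family = binomial(link = "logit"))     # intercept-only model
    1 - as.numeric(logLik(full)) / as.numeric(logLik(null))   # McFadden's pseudo R-squared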
  • When interpreting the confidence interval for a coefficient, what do you conclude if the interval contains the value zero? If the confidence interval for a coefficient contains zero, you conclude that the coefficient is not statistically significant at that confidence level. This means you cannot reject the null hypothesis that the true coefficient is zero.
  • What is the formula for a t-statistic when testing the null hypothesis that a coefficient β_k is equal to some value β_k⁰? The formula for the t-statistic is the estimated coefficient minus its hypothesized value, divided by its estimated standard error: t_k = (b_k − β_k⁰) / SE(b_k)
  • What is the most common null hypothesis for a t-test on a single coefficient, and what does it mean if you reject it? The most frequently tested null hypothesis in statistical modeling is that the coefficient β j ​ is equal to zero, often written as H 0 ​ :β j ​ =0. When this null hypothesis is rejected, it indicates that there is statistically significant evidence to suggest that the predictor variable X j ​ has a linear effect on the dependent variable Y. In simpler terms, rejecting this hypothesis implies that X j ​ is a meaningful factor in explaining the variations in Y.
  • What is the purpose of the F-statistic for the overall significance of a regression? The F-statistic for overall significance tests the joint null hypothesis that all regression slope coefficients (excluding the intercept) are simultaneously equal to zero. A significant F-statistic indicates that the model as a whole has explanatory power.
  • What is a "testable restriction" on a model? Give an example. A testable restriction refers to a hypothesis concerning the parameters of a statistical model that can be formally evaluated using the available data. For instance, one common type of testable restriction involves examining whether two specific coefficients are equal, which would be stated as the null hypothesis that β k ​ equals β j ​ . Another example could be testing if a particular set of coefficients sums to a specific value, such as the null hypothesis that β 2 ​ plus β 3 ​ plus β 4 ​ equals 1. Such restrictions allow researchers to investigate specific theoretical propositions about the relationships within their models.
  • What is the Wald test used for? The Wald test is a general method used to test the validity of one or more linear restrictions on the regression coefficients. It measures how far the estimated coefficients are from satisfying the restrictions imposed by the null hypothesis. Both the t-test and the F-test are special cases of the Wald test.
  • What is meant by the "percent correctly predicted" for a binary outcome model? The "percent correctly predicted" is a goodness-of-fit measure that calculates the proportion of observations for which the model's prediction matches the actual outcome. For a binary model, a prediction is typically classified as 1 if the predicted probability is > 0.5, and 0 otherwise.
  • If you add a variable to a regression, what is guaranteed to happen to the Sum of Squared Residuals (SSE or e ′ e)? Adding a variable to a regression will never increase the Sum of Squared Residuals (SSE). It will either decrease or stay the same. This is because the OLS procedure will always find a fit that is at least as good as the fit without the variable
  • What are the two components of the Mean Squared Error (MSE) of an estimator? The Mean Squared Error, or MSE, of an estimator is a measure of the average of the squares of the errors. It inherently comprises two key components: the squared bias of the estimator and its variance. Expressed formally, the formula for the Mean Squared Error of an estimator, denoted as θ ^ , is given by the square of the difference between the expected value of the estimator and the true parameter θ, added to the variance of the estimator. This can be written as: MSE[ θ ^ ] = (E[ θ ^ ]−θ) 2 +Var[ θ ^ ].
  • What is the main idea behind shrinkage methods in regression? The main idea behind shrinkage methods is to intentionally introduce a small amount of bias into the coefficient estimates in order to achieve a larger reduction in their variance. This trade-off often leads to a lower overall Mean Squared Error (MSE) and can produce simpler, more interpretable models.
  • In the context of polynomial regression, what does the poly(x, degree=2) function in R typically create? The poly(x, degree=2) function generates a matrix of orthogonal polynomials for x up to the second degree, i.e., columns representing the first- and second-degree polynomial terms. Because the columns are orthogonal, the terms corresponding to X and X² are uncorrelated with each other, which helps mitigate the multicollinearity that can arise when both the linear and the squared term of a variable are included in a regression model.
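A brief illustration of the orthogonality point in R (simulated data; the exact numbers are not meaningful):

    set.seed(12)
    x <- runif(100, 0, 10)
    y <- 1 + x - 0.05 * x^2 + rnorm(100)

    P <- poly(x, degree = 2)                          # orthogonal polynomial basis
    cor(x, x^2)                                       # raw terms: strongly correlated
    cor(P[, 1], P[, 2])                               # orthogonal columns: essentially zero
    coef(lm(y ~ poly(x, degree = 2)))                 # quadratic fit using the orthogonal basis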
  • What is Leave-One-Out Cross-Validation (LOOCV)? Leave-One-Out Cross-Validation (LOOCV) is a method for assessing a model's predictive performance. It iteratively fits the model on all but one of the observations (n − 1 observations) and then uses that model to predict the single observation that was left out. This process is repeated n times, and the average error across all predictions is calculated.
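A minimal R sketch of LOOCV for a linear model, assuming a hypothetical data frame dat with response y:

    n   <- nrow(dat)
    err <- numeric(n)
    for (i in seq_len(n)) {
      fit    <- lm(y ~ ., data = dat[-i, ])        # fit on the n - 1 remaining observations
      pred   <- predict(fit, newdata = dat[i, ])   # predict the single left-out observation
      err[i] <- (dat$y[i] - pred)^2
    }
    mean(err)                                      # LOOCV estimate of the prediction error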
  • What is the key difference in model selection consistency between AIC and BIC? The key difference is that BIC is a consistent model selection criterion, while AIC is not. This means that as the sample size (n) grows infinitely large, BIC is guaranteed to select the true data-generating model (if it is among the candidates). AIC, which is asymptotically equivalent to LOOCV, tends to select models that are too complex.
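As a quick illustration (hypothetical nested models on a data frame dat), R's built-in AIC() and BIC() make the different penalties easy to compare:

    m1 <- lm(y ~ x1,      data = dat)
    m2 <- lm(y ~ x1 + x2, data = dat)
    AIC(m1, m2)   # penalty of 2 per estimated parameter
    BIC(m1, m2)   # penalty of log(n) per parameter, so BIC favours smaller models as n grows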
  • What is a MANOVA, and when is it used? MANOVA stands for Multivariate Analysis of Variance. It is used in the context of multivariate regression (models with multiple dependent variables) to test the joint significance of predictors across all the response variables simultaneously.
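A minimal R sketch, assuming hypothetical response variables y1 and y2 and predictors x1, x2 in a data frame dat:

    fit <- manova(cbind(y1, y2) ~ x1 + x2, data = dat)
    summary(fit, test = "Pillai")   # joint test of each predictor across both responses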
  • What is "piecewise linear regression"? Piecewise linear regression is a method used to model relationships that can be described by different linear functions over different ranges of a predictor variable. The model is constructed by defining "knots" or break-points and fitting separate linear segments, often using dummy variables and interaction terms to connect them.
  • What are the three main types of nonparametric regression discussed in the lab materials? The three primary types of nonparametric regression techniques mentioned are Kernel Regression, Local Polynomial Regression, often known as LOESS, and Smoothing Splines. Kernel Regression operates by employing a kernel function to produce a smoothed estimate of the underlying regression function. This method essentially weighs nearby data points more heavily when making predictions, effectively creating a locally weighted average. Local Polynomial Regression, or LOESS, focuses on fitting simple polynomial models to localized subsets of the data. Instead of fitting a single global model, LOESS constructs a series of local polynomial fits, which are then combined to form a smooth curve. Finally, Smoothing Splines work by fitting a flexible curve to the data. This is achieved by minimizing a specific criterion that strikes a balance between how well the curve fits the observed data points and a penalty for the curve's roughness or wiggliness, thereby promoting a smoother overall fit.
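A short R sketch of two of these fits on a hypothetical (x, y) sample stored in a data frame dat:

    fit_loess  <- loess(y ~ x, data = dat, span = 0.5)   # local polynomial (LOESS) fit
    fit_spline <- smooth.spline(dat$x, dat$y)            # smoothing spline with a roughness penalty
    plot(dat$x, dat$y)
    ord <- order(dat$x)
    lines(dat$x[ord], fitted(fit_loess)[ord], col = "blue")
    lines(fit_spline, col = "red")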
  • If you fit a regression of log(y) on x, what type of transformation model have you estimated? This is an exponential transformation model, also known as a log-level model. The underlying relationship it assumes is y = e^(β0 + β1·x + ε), which becomes linear after taking logs: log(y) = β0 + β1·x + ε.
  • What is the purpose of a biplot in Principal Component Analysis (PCA)? A biplot is a visualization that overlays the scores of the observations (the data points projected onto the principal components) with the loadings of the original variables (represented as arrows). It helps to simultaneously visualize the relationships between observations and how the original variables contribute to the principal components.
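For instance, with R's built-in USArrests data:

    pc <- prcomp(USArrests, scale. = TRUE)
    biplot(pc)   # points are the observation scores, arrows are the variable loadings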
  • What is the relationship between the eigenvalues of the covariance matrix (Σ) and the variance of the principal components (Γ_j)? The variance of the j-th principal component, Var(Γ_j), is equal to the j-th largest eigenvalue (λ_j) of the covariance matrix Σ.
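This can be checked numerically in R; a sketch using the built-in USArrests data:

    X  <- scale(USArrests)
    eigen(cov(X))$values   # eigenvalues of the covariance matrix of the scaled data
    pc <- prcomp(X)
    apply(pc$x, 2, var)    # variances of the principal component scores: the same numbers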
  • When viewing a correlation matrix plot (e.g., from corrplot), how can you visually assess the strength and direction of the relationship between two variables? In a typical corrplot visualization, both the color and size of the circles indicate the strength and direction of the correlation. Dark blue often represents strong positive correlation (+1), dark red represents strong negative correlation (-1), and lighter or smaller circles represent weaker correlations closer to 0.
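A minimal example with the corrplot package and R's built-in mtcars data:

    library(corrplot)
    corrplot(cor(mtcars), method = "circle")   # circle size and colour encode the correlations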
  • In the "random utility" framework, how is an individual's binary choice modeled? In binary choice modeling, the decision-making process is conceptualized by comparing the utility an individual derives from two distinct options, let's call them option 'a' and option 'b'. These utilities are denoted as U a ​ and U b ​ , respectively. The core assumption of this model is that an individual will select option 'a' if and only if the utility they receive from option 'a' is greater than the utility they receive from option 'b'. This choice mechanism can be formally expressed using an indicator function. Specifically, the outcome variable Y will be 1 if U a ​ is greater than U b ​ , and 0 otherwise. This is written as Y=1(U a ​ >U b ​ ). Equivalently, this can be simplified by defining a new utility difference, U=U a ​ −U b ​ . In this case, the individual chooses option 'a' if this utility difference U is greater than zero, which is then expressed as Y=1(U>0).
  • In hypothesis testing with a two-sided alternative, how does the p-value relate to the chosen significance level, α, in the decision rule? The decision rule compares the calculated p-value to the predetermined significance level α. If the p-value is less than α, you reject the null hypothesis H0: the observed data would be sufficiently unlikely if the null hypothesis were true. Conversely, if the p-value is not below α, you do not reject H0, because the data do not provide enough evidence against it.
  • What problem does the Box-Cox transformation aim to solve? The Box-Cox transformation is used to stabilize variance and make the data more closely resemble a normal distribution. It is a family of power transformations applied to the response variable (Y) to correct for violations of normality or homoscedasticity, but it requires the response to be strictly positive.
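A typical workflow in R uses boxcox() from the MASS package; the model and variable names below are hypothetical:

    library(MASS)
    fit <- lm(y ~ x1 + x2, data = dat)   # y must be strictly positive
    bc  <- boxcox(fit)                   # profiles the log-likelihood over a grid of lambda values
    lambda <- bc$x[which.max(bc$y)]      # the lambda with the highest log-likelihood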
  • If predictors are perfectly orthogonal (uncorrelated), how do the multiple regression slope coefficients relate to the slopes from individual simple regressions? If the predictors in a multiple regression (that includes a constant) are perfectly uncorrelated, then the multiple regression slopes are identical to the slopes obtained from running individual simple regressions of the dependent variable on each predictor separately.
  • What is the one standard error rule in the context of tuning parameter selection via cross-validation? The one standard error rule is a principle of parsimony used to select a tuning parameter (like λ in shrinkage). Instead of choosing the parameter that gives the absolute minimum cross-validated error, it selects the simplest model (largest λ) whose error is within one standard error of that minimum.
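With the glmnet package this rule is built in; a sketch assuming a numeric predictor matrix X and response vector y:

    library(glmnet)
    cvfit <- cv.glmnet(X, y, alpha = 1)   # cross-validated lasso
    cvfit$lambda.min                      # lambda giving the minimum CV error
    cvfit$lambda.1se                      # largest lambda within one SE of that minimum (1-SE rule)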
  • What is a key advantage of Principal Components Regression (PCR) in dealing with multicollinearity? A key advantage of PCR is that it eliminates multicollinearity by design: the principal components it uses as predictors are, by construction, orthogonal (uncorrelated) to each other.
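A minimal sketch with the pls package (the data frame dat with response y is hypothetical):

    library(pls)
    fit <- pcr(y ~ ., data = dat, scale = TRUE, validation = "CV")
    summary(fit)   # cross-validated error by number of (mutually orthogonal) components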
  • What is the formula for the Wald statistic for testing the significance of a single coefficient β_k against the null hypothesis H0: β_k = 0? The Wald statistic, which is simply the t-statistic in this case, is the estimated coefficient divided by its estimated standard error: t_k = b_k / SE(b_k).
  • For a linear regression, what is the expected value of the OLS estimator, E[b]? What property does this demonstrate? The expected value of the OLS estimator is the true population parameter, β. This is written as E[b]=β. This demonstrates the property of unbiasedness, meaning that on average, the OLS estimator will be equal to the true value it is trying to estimate.
  • Can a high R² value guarantee that a model's assumptions (like linearity and homoscedasticity) are met? No, a high R² does not guarantee that model assumptions are met. It is possible to have a high R² for a model that clearly violates the linearity and homoscedasticity assumptions, which is why diagnostic checks are crucial.
  • What is an "ANOVA table" used for in regression analysis? An ANOVA (Analysis of Variance) table decomposes the total variability of the dependent variable (Total Sum of Squares) into the portion explained by the regression (Regression Sum of Squares) and the portion that remains unexplained (Sum of Squared Errors). It provides the necessary components to calculate the R 2 and the overall F-statistic for the model.
  • What are the three algebraic properties of the least squares solution when the model includes an intercept? Firstly, the sum of the least squares residuals is zero: the positive and negative residuals exactly cancel. Secondly, the regression hyperplane passes through the point of means of the data, i.e., ȳ = x̄′b, so the point formed by the averages of the dependent and independent variables lies on the fitted line or plane. Thirdly, the mean of the fitted values equals the mean of the observed values of the dependent variable, so the least squares fit is centered on the data.
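These properties are easy to confirm numerically in R; a sketch with hypothetical variables y, x1, x2 in a data frame dat:

    fit <- lm(y ~ x1 + x2, data = dat)
    sum(residuals(fit))                       # property 1: essentially zero
    predict(fit, newdata = data.frame(x1 = mean(dat$x1), x2 = mean(dat$x2))) - mean(dat$y)
                                              # property 2: the fitted plane passes through the means
    mean(fitted(fit)) - mean(dat$y)           # property 3: essentially zero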
  • How does the Frisch-Waugh-Lovell (FWL) theorem describe the calculation of a single coefficient in a multiple regression? The FWL theorem states that the coefficient on a variable z in a multiple regression can be found by a two-step process. First, regress both y and z on all other explanatory variables (X) to get their respective residuals (y* and z*). Second, the coefficient on z in the original multiple regression equals the coefficient from a simple regression of y* on z*.
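A numerical sketch of the theorem in R (all variable names hypothetical):

    full <- lm(y ~ x1 + x2 + z, data = dat)
    ry   <- residuals(lm(y ~ x1 + x2, data = dat))   # y purged of the other regressors
    rz   <- residuals(lm(z ~ x1 + x2, data = dat))   # z purged of the other regressors
    coef(lm(ry ~ rz))["rz"]                          # equals coef(full)["z"]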
  • In a regression with a dummy variable D and a continuous variable X, what does the model Y = β0 + β1·X + β2·D + ε allow for? This model allows for a different intercept for each group defined by the dummy variable D, while keeping the slope the same for both groups. If D = 0, the intercept is β0; if D = 1, the intercept is β0 + β2.
  • Why might an econometrician be wary of using automated model selection procedures like stepwise AIC for causal inference? Stepwise selection is an atheoretical process that optimizes for predictive fit (like minimizing AIC), not for identifying causal relationships. It may drop a crucial confounding variable that is necessary to obtain an unbiased estimate of a causal effect, or it might include variables that are good predictors but have no causal link, leading to a biased and misinterpreted model.
  • What is a key difference in the statistical inference (e.g., distribution of coefficients) between a standard linear model and a generalized linear model (GLM) like logistic regression? In a standard linear model (with the normality assumption), inference is exact, meaning the t-distribution of the coefficients holds for any sample size. In a GLM, inference is asymptotic, meaning the normal distribution for the coefficients is an approximation that becomes accurate only for large sample sizes.
  • What is the "odds ratio," and how is it derived from the coefficients of a logistic regression model? The odds ratio is the factor by which the odds of the outcome (P(Y=1)/P(Y=0)) change for a one-unit increase in a predictor. It is calculated by exponentiating the logistic regression coefficient,
  • What is the primary trade-off that shrinkage methods like Ridge and Lasso navigate? Shrinkage methods navigate the bias-variance trade-off. They introduce a small amount of bias in the coefficient estimates to achieve a significant reduction in variance, with the goal of lowering the overall Mean Squared Error (MSE) of the predictions.
  • In the context of Two-Stage Least Squares (2SLS), why can't you just use the standard OLS standard errors in the second stage? The standard OLS formulas for standard errors are incorrect for the second stage of 2SLS because they do not account for the fact that the regressors (the fitted values X̂ from the first stage) are themselves estimated and contain uncertainty. Using the standard OLS formulas would lead to incorrect standard errors, confidence intervals, and hypothesis tests. Proper 2SLS software uses a corrected formula for the variance of the estimators.
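A sketch contrasting a proper 2SLS fit, using ivreg() from the AER package, with a naive manual second stage; here x is the endogenous regressor and z the instrument, and all names are hypothetical:

    library(AER)
    iv <- ivreg(y ~ x | z, data = dat)
    summary(iv)                              # coefficient on x with valid 2SLS standard errors
    xhat  <- fitted(lm(x ~ z, data = dat))   # manual first stage
    naive <- lm(y ~ xhat, data = dat)        # same point estimate, but its OLS SEs are invalid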

Shared exercise: https://spellic.com/swe/ovning/ekonometritenta.12568817.html