Suppose you have data on a continuous variable, \(Y\), and on two dichotomous variables, \(X\) and \(Z\), each taking the values 0 and 1.

The following table shows the cell means, \(\bar{Y}\), for each combination of values of \(X\) and \(Z\):

         X = 0   X = 1
Z = 0      5       7
Z = 1      1       2

The number of observations in each cell is:

         X = 0   X = 1
Z = 0     300      10
Z = 1     100      40
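The cell counts are far from balanced, and that imbalance matters for the questions that follow. As a minimal sketch (in Python with only the standard library, although the exercise itself is framed in R), the counts can be stored in a dictionary keyed by (x, z). The total at each level of X and the proportion of those observations with Z = 1 then follow directly. The names `n`, `n_at_x` and `prop_z1_given_x` are illustrative only.

```python
from fractions import Fraction

# Cell counts from the table, keyed by (x, z).
n = {(0, 0): 300, (1, 0): 10, (0, 1): 100, (1, 1): 40}

def n_at_x(x):
    """Total number of observations with X = x (summing over both values of Z)."""
    return n[(x, 0)] + n[(x, 1)]

def prop_z1_given_x(x):
    """Proportion of observations with Z = 1 among those with X = x, as an exact fraction."""
    return Fraction(n[(x, 1)], n_at_x(x))

for x in (0, 1):
    print("X =", x, " n =", n_at_x(x), " P(Z = 1 | X) =", prop_z1_given_x(x))
```

For these counts, 100 of the 400 observations with X = 0 have Z = 1 (a proportion of 1/4), compared with 40 of the 50 observations with X = 1 (4/5).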

Questions:

  1. Draw the Paik diagram for this data.
  2. What are the ‘conditional effects of X’?
  3. What is the ‘marginal effect of X’?
  4. If you were to fit the model ‘Y ~ X*Z’ in R, i.e. the model \[\hat{Y} = \hat{\beta}_0 + X \hat{\beta}_1 + Z \hat{\beta}_2 + X Z\hat{\beta}_3,\] what would the values of the estimated regression coefficients be? (Hint: it is a saturated model, so it reproduces the four cell means \(\bar{Y}\) exactly.)
  5. How would you get the ‘conditional effects of X’ from the regression coefficients? Express your answer in the form of a ‘hypothesis matrix’ multiplying the vector of estimated coefficients, i.e. a matrix \({\mathrm L}\) such that your answer has the form \({\mathrm L} \hat{{\boldsymbol\beta}}\).
  6. What is the interpretation of the interaction coefficient, \(\hat{\beta}_3\)?
  7. Why is the use of the word ‘effects’ problematic in this context? Can you think of a better word?
  8. “Challenging – or tedious – so not on the quiz?” How would you get the marginal effect of X from the regression coefficients? What is the connection with the chain rule in multivariate calculus?
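One way to check answers to questions 3, 4, 5 and 8 numerically is the minimal Python sketch below (Python rather than R, using the standard library's `fractions` module so the arithmetic is exact). It relies on the hint in question 4: because the model is saturated, the coefficients can be read directly off the table of cell means. It then applies a hypothesis matrix by plain matrix-vector multiplication and computes the marginal effect of X in two ways, from count-weighted means and from the coefficients. The names `mat_vec`, `L_cond`, `L_marg`, `p_z1` and `marginal_mean` are illustrative and do not come from any package.

```python
from fractions import Fraction

# Cell means of Y and cell counts, keyed by (x, z), copied from the two tables.
ybar = {(0, 0): 5, (1, 0): 7, (0, 1): 1, (1, 1): 2}
n = {(0, 0): 300, (1, 0): 10, (0, 1): 100, (1, 1): 40}

# The saturated model Y ~ X*Z fits the four cell means exactly, so the
# coefficients beta = [b0, b1, b2, b3] can be read off the table.
b0 = ybar[(0, 0)]
b1 = ybar[(1, 0)] - ybar[(0, 0)]
b2 = ybar[(0, 1)] - ybar[(0, 0)]
b3 = ybar[(1, 1)] - ybar[(1, 0)] - ybar[(0, 1)] + ybar[(0, 0)]
beta = [b0, b1, b2, b3]

def mat_vec(L, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(l * x for l, x in zip(row, v)) for row in L]

# Hypothesis matrix for the conditional effects of X: first row at Z = 0, second at Z = 1.
L_cond = [[0, 1, 0, 0],
          [0, 1, 0, 1]]
conditional_effects = mat_vec(L_cond, beta)

def p_z1(x):
    """Proportion of observations with Z = 1 among those with X = x."""
    return Fraction(n[(x, 1)], n[(x, 0)] + n[(x, 1)])

def marginal_mean(x):
    """Mean of Y over all observations with X = x (cell means weighted by cell counts)."""
    total = n[(x, 0)] + n[(x, 1)]
    return Fraction(n[(x, 0)] * ybar[(x, 0)] + n[(x, 1)] * ybar[(x, 1)], total)

marginal_effect = marginal_mean(1) - marginal_mean(0)

# The same marginal effect from the coefficients: E(Y | X = x) = b0 + b1*x + (b2 + b3*x)*p_x,
# so the difference is a one-row hypothesis matrix whose entries depend on P(Z = 1 | X).
p0, p1 = p_z1(0), p_z1(1)
L_marg = [[0, 1, p1 - p0, p1]]
marginal_effect_from_beta = mat_vec(L_marg, beta)[0]

print("beta:", beta)
print("conditional effects of X:", conditional_effects)
print("marginal effect of X:", marginal_effect, "=", marginal_effect_from_beta)
```

With these tables both conditional effects of X are positive (2 at Z = 0 and 1 at Z = 1), yet the marginal effect is -1. Most observations with X = 1 (40 of 50) have Z = 1, where the cell means of Y are low, so the marginal comparison mixes the effect of X with the shift in the distribution of Z. One way to read the row of `L_marg` is as two pieces. The first, \(\hat{\beta}_1 + \hat{\beta}_3 p_1\), is the conditional effect of X averaged over the distribution of Z at X = 1. The second, \(\hat{\beta}_2 (p_1 - p_0)\), is the coefficient of Z times the change in \(P(Z = 1 \mid X)\). Together they form a discrete analogue of the chain rule \(\frac{dY}{dX} = \frac{\partial Y}{\partial X} + \frac{\partial Y}{\partial Z}\frac{dZ}{dX}\).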