This version: December 04 2023 01:15

Where there is no uncertainty, there cannot be truth – Richard Feynman

I am a firm believer that before you use a method, you should know how to break it. Describing how to break something should be an essential part of describing a new piece of statistical methodology (or, for that matter, of resurrecting an existing one). – Dan Simpson

To teach how to live without certainty, and yet without being paralysed by hesitation, is perhaps the chief thing that philosophy, in our age, can still do for those who study it. — Bertrand Russell

Negative of Himmelblau’s function:
\(f(x,y) = - (x^2 + y -11)^2 - (x + y^2 -7)^2\), see p. 57, problem 2.7 of the textbook.
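This is the surface shown in the rotating plot at the top of the page. Here is a minimal static sketch in base R, a stand-in for the interactive plot, using persp():

f <- function(x, y) -(x^2 + y - 11)^2 - (x + y^2 - 7)^2   # negative Himmelblau
x <- seq(-5, 5, length.out = 80)
y <- seq(-5, 5, length.out = 80)
z <- outer(x, y, f)
persp(x, y, z, theta = 35, phi = 25, col = "lightblue",
      shade = 0.5, ticktype = "detailed")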

Announcements

September 18:

September 14:

Quick links

Calendar

Classes meet on Mondays and Wednesdays from 1:00 to 2:30. The location on Mondays is Ross South 123 and, on Wednesdays, Life Sciences Building (LSB) 107.
Optional Tutorial: Fridays from 3:30 pm to 4:30 pm on Zoom at Tutorial Zoom link

  • The tutorial scheduled for Friday, September 29 will take place on Monday, October 2 at 3:30 pm.

Informal chat: Tentatively Fridays, 4:30 pm to 5:30 pm at Office hour Zoom link
Instructor: Georges Monette
Important dates: 2023-2024 Academic Calendar
Textbook: Givens, G.H., Hoeting, J.A. (2013) Computational Statistics, 2nd ed., Wiley. Some copies are available in the bookstore. Errata for the second edition
Email: Messages about course content should be posted publicly to Piazza so the discussion can engage and benefit everyone. You may also post messages and questions as personal messages to the instructor. If they are of general interest and don’t contain personal information, they will usually be made public for the benefit of the entire class unless you specifically request that the message remain private.

Day 1: Wednesday, September 6

Files used today: Regression Review

Assignment 1 (individual): Setting things up

Due Wednesday, September 13, 9 pm

  • Summary:
    1. Install R and RStudio
    2. Get a free Github account
    3. Install git on your computer
    4. Connect with Piazza, create a LOG post and introduce yourself
  • 1. Install R and RStudio following these instructions. If you already have R and RStudio, update them to the latest versions.
  • 2. Get a free Github account: If you don’t have one, follow this excellent source of advice from Jenny Bryan on using git and Github with R and RStudio.
    • Be sure to read Jenny Bryan’s advice before choosing a name.
    • CAUTION: Avoid installing ‘Github for Windows’ (which is not the same as ‘Git for Windows’) from the Github site at this stage.
    • An excellent additional reference is Hadley Wickham’s chapter on Git and Github in his book on R Packages
  • 3. Install git on your computer using these instructions.
    • If you are curious about how git is used, have a look at this tutorial! You will be using it mainly through RStudio and won’t need to master its command-line interface.
    • As a final step: In the RStudio menu click on Tools | Global Options ... | Terminal. Then, in the Shell section, click on the box to the right of New terminals open with:
      • On a PC, select Git Bash
      • On a Mac, select Bash
    • You don’t need to do anything else at this time. We will see how to set up SSH keys to connect to Github through the RStudio terminal in a few lectures.
  • 4. Connect with Piazza and post about your experiences installing the above:
    • Join the MATH 6630 Piazza site by going to this URL: piazza.com/yorku.ca/winter2023/math6630
      • Use the access code blackwell when prompted.
      • Create a post on Piazza to introduce yourself to your colleagues:
        • The title should read: Introduction: followed by the name you prefer to use in class, e.g. Jon Smith
        • The first line should show the name of your Github account, e.g. Github: jsmith. If you don’t have one already, leave this blank for now. You can come back and edit it later.
        • Follow this with information on which computing languages you are familiar with and which ones you are proficient in.
        • Share some interests: hobbies, favourite musicians, movies, restaurants, etc.
        • Click on the assn1 folder when you submit your post.
      • Create another post entitled ‘Getting started’.
        • Describe problems you encountered installing R, RStudio, and git
        • Describe problems registering with Github
        • How could the instructions be improved to make the process easier?
        • If you couldn’t complete the installation, describe the problem or error message(s) you encountered.
        • Click on the assn1 folder before submitting your post.
        • If you are hoping to get an answer to your post, select Question as the Post Type.

Assignment 2 (teams): \(p\)-values

  • Deadlines: See the course description for the meaning of these deadlines.
    1. Wednesday, September 20 at noon
    2. Friday, September 22 at 9 pm
    3. Sunday, September 24 at 9 pm

This exercise will be done in teams of 4 or 5 members, which will be posted on Piazza. All members of a team work on a similar problem but with different parameters. This way you can help each other with the concepts, but each team member must, ultimately, do their own work.

The purpose of the assignment is to explore the meaning of p-values. Before starting, stop and reflect on what it means for an experiment to ‘achieve’ a p-value of 0.049. What meaning can we give to the quantity ‘0.049’? How is it related to the probability that the null hypothesis is correct?

To keep things very simple, suppose you want to test \(H_0: \mu = 0\) versus \(H_1: \mu > 0\) and you are designing an experiment in which you plan to take a sample of independent random variables, \(X_1, X_2, \ldots, X_n\), which are iid \(\textrm{N}(\mu,1)\), i.e. the variance is known to be equal to 1. (The problem is also easy, but takes a bit more coding, if you assume a two-sided alternative \(H_1: \mu \neq 0\); to help focus on the essential concepts, we’ll use a one-sided alternative.) You plan to use the usual test based on \(\bar{X}_n\), rejecting \(H_0\) for large values of \(\bar{X}_n\).

An applied example would be testing for a change in value of a response when all subjects are submitted to the same conditions and the measurement error of the response is known. In that example \(X_i\) would be the ‘gain score’, i.e. post-test response minus the pre-test response exhibited by the \(i\)th subject.

Note: Do the assignment using R and an R Markdown file. You can use the Markdown sample file in Assignment 1 as a template. Upload both a pdf file and your R script to Piazza.

Let the selected probability of Type I error be \(\alpha = 0.05\). Depending on your team, you have a budget that allows you to collect a sample of \(n\) observations as follows:

Team                                   \(n\)
A: Akaike (Hirotsugu Akaike)             2
B: Blackwell (David Blackwell)           3
C: Cox (Gertrude Mary Cox)               5
D: David (Florence Nightingale David)   10
E: Efron (Bradley Efron)                15
F: Freedman (David Freedman)            20

Depending on your position on the team, you are interested in answering the questions below using the following values of \(\mu_1\):

Position in team   \(\mu_1\)   Cohen’s terminology for effect size (\(\mu_1/\sigma\))
1                  all         combine ROC curves for the results below
2                  0.2         small effect size
3                  0.5         medium effect size
4                  0.8         large effect size
5                  1           very large effect size
  1. What is the probability that \(p \le 0.05\) if \(H_0: \mu = 0\) is true?
  2. What is the probability that \(p \le 0.05\) if \(\mu = \mu_1\)?
  3. What is the power of this test if \(\mu = \mu_1\)?
  4. Plot the ROC curve: power as a function of \(\alpha\). To generate the ROC curve, use a reasonable range of critical values for the test and plot the probability of rejecting under \(H_0\) in the horizontal direction versus the probability of rejecting under \(H_A\) in the vertical direction. (A computational sketch appears after this list.)
  5. What is the interpretation of the area under the ROC curve?
  6. Suppose that you collect the data and that the observed \(p\)-value is 0.049. What can you say about the probability that \(H_0\) is true?
  7. Suppose that, before running the experiment, you were willing to give \(H_0\) and \(H_1: \mu = \mu_1\) equal probability of 1/2.
    1. What is the probability that \(H_0\) is true given the event that \(p \le 0.05\)?
    2. What is the probability that \(H_0\) is true given the event that \(p = 0.049\)?
    3. What is the probability that \(H_0\) is true given the event that \(p = 0.005\)?
    4. What is the probability that \(H_0\) is true given the event that \(p = 0.0005\)?
  8. As a team discuss and post your reflections: Hypothesis testing is often presented as a process that parallels that of determining guilt in a criminal process. We start with a presumption of innocence, i.e. that \(H_0\) is true. We then hear evidence and consider whether it contradicts the presumption of innocence ‘beyond a reasonable doubt.’ Suppose we quantify the presumption of innocence to mean that the prior probability of guilt is small, e.g. \(P(H_1) \le .05\). How small an observed \(p\)-value do you need to obtain in order to ‘flip’ the presumption of innocence to ‘guilt beyond a reasonable doubt’, if that is defined as \(P(H_0 | \mathrm{data}) \le .05\)?
  9. Ditto: What \(p\)-value would we need if the presumption of innocence and guilt beyond a reasonable doubt correspond to \(P(H_1) \le 0.001\) and \(P(H_0|\mathrm{data}) \le 0.001\), respectively?
  10. Ditto: Courts have often adopted a criterion of \(p < 0.05\) in imitation of the common practice among many researchers. Comment on the possible consequences.
  11. Ditto: Have a look at this xkcd cartoon. How does the Bayesian statistician derive a probability to make a decision in this example? Show the details.
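To see how such calculations might look in R, here is a minimal sketch using hypothetical values (team B, \(n = 3\), and position 3, \(\mu_1 = 0.5\)); adapt \(n\) and \(\mu_1\) to your own team and position. The test statistic is \(Z = \sqrt{n}\,\bar{X}_n \sim \textrm{N}(\sqrt{n}\,\mu, 1)\):

n <- 3; mu1 <- 0.5; alpha <- 0.05
crit <- qnorm(1 - alpha)                   # critical value for Z
1 - pnorm(crit)                            # question 1: alpha by construction
1 - pnorm(crit - sqrt(n) * mu1)            # questions 2-3: power at mu = mu1
# ROC curve: sweep the critical value c and plot
# P(reject | H0) against P(reject | mu = mu1)
c_seq <- seq(-4, 4, length.out = 200)
plot(1 - pnorm(c_seq), 1 - pnorm(c_seq - sqrt(n) * mu1), type = "l",
     xlab = "P(reject | H0)", ylab = "P(reject | H1)")
abline(0, 1, lty = 2)                      # the 'useless test' diagonal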

To delve more deeply into these issues you can read Wasserstein and Lazar (2016) and Wasserstein, Schirm, and Lazar (2019). Concerns about \(p\)-values have been around for a long time, see Schervish (1996). For a short overview see Bergland (2019). There are two interesting and influential papers by John Ioannidis (2005), (2019).

For an entertaining take on related issues see Last Week Tonight: Scientific Studies by John Oliver (2016) (warning: contains strong language and political irony that many could consider offensive – watch at your own risk!).

Preparing for the next class: Play your way through these scripts:

Step your way through the scripts and experiment as you go along. If you have questions or comments, post them on Piazza.

These scripts give you a tour of various features of R. They are meant to provide reminders of things you can do when a need arises.

What’s the best way to learn R? That depends on your learning style, your previous experience and how deep you want to go.

See the discussion in ‘Day 2’. Start or join a discussion on Piazza on ‘best ways to learn R’.

Day 2: Monday, September 11

Correction: There was a problem installing the ‘spida2’ package in R. The workaround is to install the ‘gplots’ package before installing the ‘spida2’ package:

install.packages('gplots')
devtools::install_github('gmonette/spida2')

The ‘spida2’ package should be reinstalled regularly since it’s likely to be updated frequently during the course. Once ‘gplots’ has been installed, it does not need to be reinstalled each time you reinstall ‘spida2’.

Tutorials: Start this week on Friday, September 15 from 3:30 pm to 4:30 pm.

Student hours: Start this week on Friday, September 15, from 4:30 pm to 5:30 pm.

Teams: I will assign teams this evening for those who have already registered with Piazza so you can start working on assignments. So register soon if you want to have a team and, just as importantly, a position on your team.

Learning R

Why R? What about SAS, SPSS, Python, Julia, among others?

SAS is a very high quality, intensely engineered, environment for statistical analysis. It is widely used by large corporations. New procedures in SAS are developed and thoroughly tested by a team of 1,000 or more SAS engineers before being released. It currently has more than 300 procedures.

R is an environment for statistical programming and development that has accumulated many somewhat inconsistent layers developed over time by people of varying abilities, many of whom work largely independently of each other. There is no centralized quality testing except to check whether code and documentation run before a new package is added to R’s main repository, CRAN. When this page was last updated (December 4, 2023), CRAN had 20,119 packages.

In addition, a large number of packages under development are available through other repositories, such as Github.

The development of SAS began in 1966 and that of R (in the form of its predecessor, S, at Bell Labs) in 1976.

The ‘design’ of R (using ‘R’ to refer to both R and to S) owes a lot to the design of Unix. The idea is to create a toolbox of simple tools that you link together yourself to perform an analysis. In Unix, which survives mainly as Linux, commands were simple tools linked together by pipes so that the output of one command becomes the input of the next. To do anything you need to put the commands together yourself.

The same is true of R. It’s extremely flexible but at the cost of requiring you to know what you want to do and to be able to use its many tools in combination with each other to achieve your goal. Many decisions in R’s design were intended to make it easy to use interactively. Often the result is a language that is very quirky for programming.
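As a small illustration of the toolbox philosophy (assuming R 4.1 or later for the native |> pipe), each step below is a simple tool whose output feeds the next, much like a Unix pipeline:

mtcars |>
  subset(cyl == 4) |>              # keep 4-cylinder cars
  transform(kpl = 0.425 * mpg) |>  # convert miles per gallon to km per litre
  with(mean(kpl))                  # average fuel economy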

SAS, in contrast, requires you to select options to run large procedures that purport to do the entire job.

This is an old joke: If someone publishes a journal article about a new statistical method, it might be added to SAS in 5 to 10 years. It won’t be added to SPSS until 5 years after there’s a textbook written about it, maybe another 10 to 15 years after its appearance in SAS.

It was added to R two years ago because the new method was developed as a package in R long before being published.

So why become a statistician? So you can have the breadth and depth of understanding needed to apply the latest statistical ideas with the intelligence and discernment to use them effectively.

So expect to have a symbiotic relationship with R. You need R to have access to the tools that implement the latest ideas in statistics. R needs you because it takes people like you to use R effectively.

The role of R in this course is to help us

  • have access to tools that expand our ability to explore and analyze data,
  • learn how to develop and implement new statistical methods, i.e. learn how to build new tools, and
  • deepen our understanding of the use of statistics for scientific discovery as well as for business applications.

It’s very challenging to find a good way to ‘learn R’. It depends on where you are and where you want to go. There is now a plethora of on-line courses. See the blog post: The 5 Most Effective Ways to Learn R

In my opinion, ultimately, the best way is to

  • play your way through the ‘official’ manuals on CRAN, starting with ‘An Introduction to R’ along with ‘R Data Import/Export’. Note, however, that these materials were developed before the current mounting concern with reproducible research, and some of the advice should be deprecated, e.g. using ‘attach’ and ‘detach’ with data.frames;
  • read the CRAN task views in areas that interest you;
  • have a look at the over 496,438 questions tagged ‘r’ on stackoverflow; and
  • at every opportunity, use R Markdown documents (like the sample script you ran when you installed R) to work on assignments, projects, etc.

Using R is like playing the piano. You can read and learn all the theory you want; ultimately, you learn by playing.

We will discuss the scripts CAR_1.R and CAR_2.R in the tutorial this week. Copy them to a file in RStudio and play with them line by line. You can also run an entire script with Control-Shift-K to see how scripts can be used for reproducible research.

Exercises:

  • From New 6630 questions
    • 5.1, 5.2, 5.3, 5.4, 5.5,
    • 5.6.23, 5.6.24, 5.6.25, 5.6.26, 5.6.27,
    • 6.1, 6.2, 6.3, 6.4, 6.5,
    • 7.4, 7.5, 7.6, 7.7, 7.8,
    • 8.1, 8.2, 8.3, 8.4, 8.5,
    • 8.6, 8.7, 8.8, 8.9, 8.10,
    • 8.18.a, 8.18.b, 8.18.c, 8.18.d, 8.18.e,
    • 8.36.a, 8.36.b, 8.36.c, 8.36.d, 8.36.e,
    • 8.51.a, 8.51.b, 8.51.c, (write functions that would work on matrices of any size), 8.61.a, 8.62.a
    • 12.1, 12.3, 12.5, 12.7, 12.9,

Assignment 3 (teams)

  • Do the exercises above. Each member of a team does 10 questions. Use the random permutation of the numbers 1 to 5 below to determine the starting exercise for each member of the team: the first member starts at the question whose order is the first random number, the second member at the question whose order is the second random number, and so on. Each member then continues with every fifth question after their starting question. For example, the third student in each team gets the random number 2 in the sequence below and would do the 2nd, 7th, 12th, 17th, 22nd, 27th, 32nd, 37th, 42nd and 47th questions.
  • In other words, to make things less complicated: note that each of the 10 lines of questions above has 5 questions. So
    • Team Member #1 uses the first random number and does the 3rd question in each line,
    • Team Member #2 does the 1st question in each line,
    • Team Member #3 does the 2nd question in each line,
    • Team Member #4 does the 4th question in each line, and
    • Team Member #5 does the 5th question in each line.
  • Random numbers: 3 1 2 4 5
  • Deadlines: See the course description for the meaning of these deadlines.
    1. Friday, September 29 at noon
    2. Sunday, October 1 at noon
    3. Monday, October 2 at 9 pm
  • IMPORTANT:
    • Upload the answer to each question in a single Piazza post (post it as a Piazza ‘Note’, not as a Piazza ‘Question’) with the title: “A3 5.1” for the first question, etc.
    • You can answer the question directly in the posting or by uploading a pdf file and the R script or Rmarkdown script that generated it.
    • When providing help or comments, do so as “followup discussion”.

If you want a deeper understanding of R, consider reading Hadley Wickham, Advanced R

Day 3: Wednesday, September 13

Continuation of Day 2

Day 4: Monday, September 18

Files used today:

  • Regression Review
  • Leverage and Mahalanobis Distance:
  • Play through the Simple Regression R script to get information on using Added Variable Plots (also known as Partial Regression Leverage Plots) and Partial Residual Plots (also known as Component-Plus-Residual Plots).
  • The script has a short section on ROC curves at the end.
  • Continue working on your own with sample scripts in R/CAR_1 and R/CAR_2.

Comments on Leverage and Mahalanobis Distance:

Some formulas are worth remembering because they work for multiple regression as well as simple regression. We’ve seen:
\[ SE(\hat{\beta}_i) = \frac{1}{\sqrt{n}}\frac{s_e}{s_{x_i|others}} \]
Here’s another one. You know that the diagonal elements, \(h_{ii}\), of the hat matrix
\[ H = X(X'X)^{-1}X' \]
are the leverages showing how much a change in one particular \(y_i\) would change the fitted value \(\hat{y}_i\), i.e.
\[ h_{ii} = \frac{\partial \hat{y}_i}{\partial y_i} \]
It turns out that there’s nothing mysterious about where the \(h_{ii}\)s come from. They have a simple relationship with the squared Mahalanobis distance in the predictor space:
\[ h_{ii} = \frac{1}{n}(1 + Z_i^2) \]
where, in simple regression, \(Z_i\) is the Z-score (standardized with the MLE estimate of the standard deviation, dividing by \(n\) instead of \(n-1\)). \(|Z_i|\) also happens to be the Mahalanobis distance in one dimension.

The formula generalizes to multiple regression if you take \(Z_i\) to be Mahalanobis distance, so Mahalanobis distance is just a generalization of the Z-score to higher dimensions.
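A quick numerical check of this identity, a minimal sketch with simulated data, using hatvalues() to extract the \(h_{ii}\) from a fitted model:

set.seed(1)
x <- rnorm(20)
y <- 2 + 3 * x + rnorm(20)
fit <- lm(y ~ x)
n <- length(x)
z <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))  # MLE sd: divide by n
all.equal(unname(hatvalues(fit)), (1 + z^2) / n)  # TRUE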

See the links above on Leverage and Mahalanobis Distance.

Comments:

  • Start reading Hadley Wickham, Advanced R Chapters 1-3 and work on the exercises.
  • Post questions about R on Piazza or come to the tutorials.

Comment on AVPs: The fact that a simple regression applied to the AVP leads to an estimate of the slope that

  • is the same as that of the associated multiple regression,
  • has a correlation equal to the partial correlation of the predictor in the multiple regression,
  • has the same residuals as the residuals in the multiple regression,
  • has the same residual standard error as that of the multiple regression, and finally
  • has the same standard error for the estimation of the slope as that of the multiple regression

provided one calculates these quantities using maximum likelihood estimates (it’s all asymptotically true if one uses the degree-of-freedom adjusted estimates), is the content of the Frisch-Waugh-Lovell Theorem, which was ‘discovered’ in steps over a period of approximately 30 years. The first version applied to lagged time series and was discovered by Ragnar Frisch. He shared the first Nobel Memorial Prize in Economics in 1969, for his general contributions, not just the start of this theorem. However, like other statistical ‘objects’ that have names associated with them, the actual original theorem was published by Udny Yule in 1911 (I am shamelessly using Wikipedia, and its gifted contributors, as a source). The theorem was refined over time by Frederick Waugh and Michael Lovell, who each added portions of it. Lovell is known for an elegant proof.

The theorem shows how the AVP is a window into the way a multiple regression determines an individual coefficient.

Possible quiz questions:

Look at the link to possible questions for the quiz on Wednesday under “Quick Links” above.

Simpson’s Paradox:

  • The principle behind Simpson’s Paradox is the same whether the variables involved are continuous or categorical. The many numerical examples don’t give me a feeling of understanding what’s behind it; I need a way to visualize it. But visualizations are quite different depending on the character of each of the X, Y and Z variables. (A small simulation appears after this list.)
  • Perhaps the easiest case to visualize involves a continuous X and Y and a categorical conditioning variable Z.
  • See this example I made up.
    • What might be plausible names for the X and Y variables?
    • Note: I found a graphic like this somewhere on the web but I couldn’t find it again for this class, so this is my amateurish attempt at recreating it. If you find something similar on the web, please post the link to Piazza.
    • With apologies to anyone who has never seen “The Simpsons”.
  • The rotating 3d plot at the top of this page is an example in which all three variables are continuous.
  • We’ll see some interesting examples where all three variables are dichotomous, using Paik-Agresti diagrams.
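Here is a minimal simulation of that easiest case, with made-up data: within each level of a categorical Z the slope of Y on X is negative, yet the pooled slope is positive because the group means line up the other way.

set.seed(42)
z <- rep(1:3, each = 50)                     # categorical conditioning variable
x <- 2 * z + rnorm(150)
y <- 3 * z - x + rnorm(150)                  # within-group slope is -1
coef(lm(y ~ x))["x"]                         # pooled slope: positive
sapply(split(data.frame(x, y), z),
       function(d) coef(lm(y ~ x, data = d))["x"])  # close to -1 in each group
plot(x, y, col = z, pch = 16)
abline(lm(y ~ x), lwd = 2)                   # the misleading pooled fit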

Day 5: Wednesday, September 20

Links:

Day 7: Wednesday, September 27

Announcement:

  • The tutorial scheduled for Friday, September 29 will take place on Monday, October 2 at 3:30 pm.

Links:

Background material:

  • Causal Questions
    • Overview of confounders, mediators, colliders and moderators
    • Paik-Agresti diagram
    • Lord’s Paradox
  • Understanding Output in Regression
    • This is a bit old and needs updating but it might be useful when you explore regression models and want to ask meaningful creative questions.
  • Three Basic Theorems
    • These are proofs of things we use in the course. You are not responsible for these proofs but the techniques, primarily using the properties of orthogonal projections (i.e. symmetric idempotent matrices) are very useful when working with the theory of linear regression.
    • The second theorem is the proof that the AVP does what we claim it does. It’s the ‘Frisch-Waugh-Lovell’ theorem of econometrics.
    • The third theorem shows why linear propensity scores work because they produce the same \(\hat{\beta}\) as a multiple regression with the same predictors as those used for the propensity score. This serves as a warning that the choice of predictors for the propensity score isn’t different from the choice of predictors for a multiple regression model: i.e. you still need to block backdoor paths and avoid mediators and colliders.
  • The Causal Zoo
    • Simulation based on a causal graph showing the consequences of using models that include or exclude various types of variables. The graph on page 9 summarizes the consequences of different choices by plotting the expected value of \(\hat{\beta}_X\) versus its variance. The ‘causal effect’ of \(X\) is 4, so any model whose estimator has a different expected value is biased. Look at what happens if you include the collider in this model: Y ~ Z + Col. This shows how models matter when you want to give a coefficient a causal interpretation.

Assignment 4 (teams)

  • Question 16, entitled ‘Paradoxes, Fallacies and One Correct Statement’, has 23 statements. Your assignment is to discuss and comment on each of these statements, following the usual process:
    • team member 1 does question 3 and every fifth question after that,
    • team member 2 does question 5 and every fifth question after that,
    • team member 3 does question 1 and every fifth question after that,
    • team member 4 does question 2 and every fifth question after that,
    • team member 5 does question 4 and every fifth question after that.
  • Deadlines:
    1. Friday, October 6
    2. Sunday, October 15 (there’s lots of time to discuss these questions)
    3. Monday, October 16

Day 8: Monday, October 2

Announcement:

Sample questions for the quiz on Wednesday:

  • “If you drop a variable that is not significant in a multiple linear regression, the other estimated coefficients should not change very much.”
    Comment. If this statement is false, describe how you would look for a counterexample and provide a plausible one.
  • “If a linear regression on two variables and an intercept leads to the rejection of the joint null hypothesis \(H_0: \beta_1 = \beta_2 = 0\), then it must be possible to reject at least one of the hypotheses \(H_0: \beta_1 = 0\) or \(H_0: \beta_2 = 0\).”
    Comment. If this statement is false, describe how you would look for a counterexample and provide a plausible one.
  • Consider a linear regression on two variables and an intercept with coefficients \(\beta_1\) and \(\beta_2\). The simple linear regressions have regression coefficients \(\gamma_1\) and \(\gamma_2\). Is it possible for a data set to lead to rejection of the hypothesis \(H_0: \beta_1 = \beta_2 = 0\) yet not reject either \(H_0: \gamma_1 = 0\) or \(H_0: \gamma_2 = 0\)?
    If so, provide a plausible example. What does this say about the ability of forward stepwise regression to find a model that fits well?
  • In multiple regression, when would you want to exclude a variable even though it is highly significant? Give an example.
  • In multiple regression, when would you want to include a variable even though it is not significant? Give an example.

Links:

Background material:

  • Causal Questions
    • Overview of confounders, mediators, colliders and moderators
    • Paik-Agresti diagram
    • Lord’s Paradox
  • Understanding Output in Regression
    • This is a bit old and needs updating but it might be useful when you explore regression models and want to ask meaningful creative questions.
  • Three Basic Theorems – slightly revised
    • See also Three Basic Theorems Notes
    • These are proofs of things we use in the course. You are not responsible for these proofs but the techniques, primarily using the properties of orthogonal projections (i.e. symmetric idempotent matrices) are very useful when working with the theory of linear regression.
    • The second theorem is the proof that the AVP does what we claim it does. It’s the ‘Frisch-Waugh-Lovell’ theorem of econometrics.
    • The third theorem shows why linear propensity scores work because they produce the same \(\hat{\beta}\) as a multiple regression with the same predictors as those used for the propensity score. This serves as a warning that the choice of predictors for the propensity score isn’t different from the choice of predictors for a multiple regression model: i.e. you still need to block backdoor paths and avoid mediators and colliders.
  • The Causal Zoo
    • Simulation based on a causal graph showing the consequences of using models that include or exclude various types of variables. The graph on page 9 summarizes the consequences of different choices by plotting the expected value of \(\hat{\beta}_X\) versus its variance. The ‘causal effect’ of \(X\) is 4, so any model whose estimator has a different expected value is biased. Look at what happens if you include the collider in this model: Y ~ Z + Col. This shows how models matter when you want to give a coefficient a causal interpretation.

Day 9: Wednesday, October 4

Continuation of previous

Day 10: Monday, October 16

Review

Day 11: Wednesday, October 18

Midterm test

Day 12: Monday, October 23

Links:

  • Computational Statistics data sets and scripts
  • Slides for Chapter 1
  • Chapter 1 Exercises:
    1. Work out the details of the score moments for a sample of size \(n\) from the \(N(\mu,\sigma)\) distribution.
    2. Find the canonical parameters, canonical sufficient statistics and cumulant-generating function for a sample from the \(N(\mu, \sigma)\) distribution treated as an exponential family.
    3. Derive the posterior density for \(\lambda\) given an observation from an exponential(\(\lambda\)) distribution using an exponential(\(\theta\)) prior for \(\lambda\). Compare with the posterior density obtained using an improper uniform prior on \(\lambda\): compare the modes, expected values and standard deviations of the two posteriors.
    4. Find the profile likelihood for \(\mu\) and for \(\sigma\) for a sample of size \(n\) from the \(N(\mu,\sigma)\) distribution.
    5. Find the estimated variance matrix for a Wald test based on the Hessian evaluated at the MLE for a sample of size \(n\) from the \(N(\mu,\sigma)\) distribution.

Day 13: Wednesday, October 25

Links:

Assignment 5 (teams)

  • See the Piazza Team post for which of the exercises listed under the previous day, and which problems in Chapter 2 of the textbook, should be done by each team member.
  • Deadlines (updated):
    1. Friday, November 10
    2. Sunday, November 12
    3. Monday, November 13

Day 13: Monday, October 30

Links:

Some sample questions for quiz on November 1:

Note: You should know the definitions of the Bernoulli, Binomial, Multinomial, Normal, Multivariate Normal, Poisson, Exponential, Gamma, and Uniform distributions. If others are needed, their definitions will be given in the quiz. You should also know the definition of moment-generating functions, the characterization of exponential families, and the definition and derivation of the score and Fisher information identities.

  • Consider a statistical model given by a family of densities \(f(x|\theta)\) for an outcome \(x\) with parameter \(\theta\) in an open interval in \(\mathbb{R}\). Derive, under standard regularity assumptions, the expected value of the gradient of the log-likelihood evaluated at the ‘true’ value of \(\theta\).
  • Consider a statistical model given by a family of densities \(f(x|\theta)\) for an outcome \(x\) with parameter \(\theta\) in an open interval in \(\mathbb{R}\). Derive, under standard regularity assumptions, the identity relating the variance of the gradient of the log-likelihood evaluated at the ‘true’ value of \(\theta\) with the Fisher information.
  • Consider the problem of finding an extremum of the real function \(\frac{\log(x)}{1+x} - x^2\). Suppose you are applying the Newton method starting at \(x=1\). Find the next value in the sequence. (A numerical check appears after this list.)
  • Let the random vector \(x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\) have the following density for unknown real parameters \(\theta_1, \theta_2 >0\): \[ f(x_1, x_2) = \frac{1}{\theta_1 \theta_2}\exp(-x_1/\theta_1 - x_2/\theta_2) \quad \textrm{for } x_1, x_2 > 0 \]
    1. Derive the score function as a function of \(x_1, x_2, \theta_1, \theta_2\).
    2. Derive the Fisher Information.
  • Show that the Exponential(\(\lambda\)) distribution for unknown \(\lambda > 0\) belongs to an exponential family but the Uniform(\(0,\theta\)) distribution for unknown \(\theta > 0\) does not.
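For the Newton question above, you can check a hand computation against R’s symbolic differentiation. A minimal sketch of one step of \(x_{t+1} = x_t - f'(x_t)/f''(x_t)\):

g <- deriv(~ log(x)/(1 + x) - x^2, "x", function.arg = "x", hessian = TRUE)
x0 <- 1
val <- g(x0)
f1 <- attr(val, "gradient")[1]  # f'(1) = -1.5
f2 <- attr(val, "hessian")[1]   # f''(1) = -3
x0 - f1 / f2                    # next value in the sequence: 0.5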

Day 14: Wednesday, November 1

Links:

Day 15: Monday, November 6

Links:

Day 16: Wednesday, November 8

Links:

Day 17: Monday, November 13

Links:

Sample Quiz Questions for Quiz 4:

  1. Let \(f\) be a twice continuously differentiable real function on \(\mathbb{R}^p\), with a negative definite Hessian. Derive the form of the Newton update to find a maximum of \(f\), given that a Newton update is the maximum of a local quadratic approximation of \(f\).

  2. Consider the problem of finding the root of a real function \(f\) that is twice continuously differentiable on a closed interval \([a,b] \subset \mathbb{R}\) with \(a < b\). Suppose \(f\) has a single root at \(x^* \in (a,b)\). Let \(G(x) = f(x) + x\). Show that \(x^*\) is a fixed point of \(G\) but fixed-point iteration is not a good strategy for finding \(x^*\) if \(|G'(x^*)| > 1\).

  3. Discuss why Fisher Scoring might sometimes be a better strategy for maximizing a log-likelihood than Newton’s method. (A small comparison on a logistic regression appears after this list.)

  4. Let \(f(x|\theta)\) be a multivariate density for \(x \in \mathbb{R}^p\), with respect to a supporting measure, with the form: \[ f(x|\theta) = \exp \left\{ \sum_{i=1}^p \theta_i x_i - A(\theta) + c(x) \right\} \] with \(\theta \in \Omega \subset \mathbb{R}^p\), where \(\Omega\) is an open set in which the density integrates to 1.

    1. Prove that \(\operatorname{E}(X|\theta) = A'(\theta)\).

    2. Prove that \(\operatorname{Var}(X|\theta) = A''(\theta)\).

  5. Let \(Y\) have a distribution belonging to a scaled exponential family with density of the form: \[ f(y|\theta) = \exp \left\{ \frac{y \theta - b(\theta)}{a(\phi)} + c(y, \phi) \right\} \] with respect to a supporting measure. Let \(\mu(\theta) = E(Y|\theta)\) and let \(\theta\) be modelled as a linear function of two parameters, \(\beta_1\) and \(\beta_2\) with: \[ \theta = z_1 \beta_1 + z_2 \beta_2 \] Show that the gradient of the log-likelihood as a function of \(\beta_1, \beta_2\) has the form \[ \ell'(\beta_1,\beta_2) = \frac{1}{a(\phi)}\begin{pmatrix} z_1 \\ z_2 \end{pmatrix} (y - \mu) \]
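As a companion to question 3, here is a hedged sketch comparing Fisher scoring with glm() for a logistic regression on simulated data; for the logistic (canonical) link the expected and observed information coincide, so Newton’s method and Fisher scoring take the same steps here.

set.seed(1)
x <- rnorm(100)
ybin <- rbinom(100, 1, plogis(0.5 + x))
X <- cbind(1, x)
beta <- c(0, 0)
for (iter in 1:10) {
  p <- as.vector(plogis(X %*% beta))
  W <- p * (1 - p)                                             # E(-Hessian) weights
  beta <- beta + solve(t(X) %*% (W * X), t(X) %*% (ybin - p))  # scoring step
}
cbind(scoring = beta, glm = coef(glm(ybin ~ x, family = binomial)))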

Day 18: Wednesday, November 15

Links:

Day 19: Monday, November 20

Links:

Day 20: Wednesday, November 22

Links:

Sample Quiz Questions for Monday, November 27

  1. Suppose \((x_i,y_i)\) are pairs of random observations where \(x_i \sim \operatorname{Poisson}(\lambda)\) and \(y_i \sim \operatorname{Poisson}(\alpha\lambda)\), with \(\lambda\) and \(\alpha\) unknown parameters. You have obtained data on 3 pairs, \(i = 1, 2, 3\) but \(y_3\) has gone missing.

    Derive the E step and the M step of the EM algorithm to estimate \(\lambda\) and \(\alpha\) with maximum likelihood. (A numerical sketch for this question appears after this list.)

  2. Suppose \((x_i,y_i), i = 1,...,n\) are pairs of random observations where \(x_i \sim \operatorname{Poisson}(\lambda)\) and \(y_i \sim \operatorname{Poisson}(\alpha\lambda)\), with \(\lambda\) and \(\alpha\) unknown parameters. Because of the way the data are obtained, \(y_i\) is missing whenever \(x_i > 3\).

    Derive the E step and the M step of the EM algorithm to estimate \(\lambda\) and \(\alpha\) with maximum likelihood.

  3. Do the two problems above when the observations are from the exponential distribution instead of Poisson.
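For question 1, a hedged numerical sketch with made-up data: the complete-data log-likelihood is linear in the missing \(y_3\), so the E step just imputes \(\operatorname{E}(y_3|\lambda,\alpha) = \alpha\lambda\), and the M step is the complete-data MLE, \(\hat\lambda = \bar{x}\) and \(\hat\alpha = \bar{y}/\bar{x}\).

x <- c(4, 6, 5)                 # hypothetical observed x's
y <- c(9, 11, NA)               # y3 is missing
lambda <- mean(x)               # lambda-hat = xbar at every M step
alpha <- 1                      # starting value
for (iter in 1:50) {
  y3 <- alpha * lambda                                 # E step: impute E(y3)
  alpha <- (sum(y, na.rm = TRUE) + y3) / (3 * lambda)  # M step
}
c(lambda = lambda, alpha = alpha)  # alpha converges to (y1 + y2)/(2 * xbar)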

Day 23: Monday, December 4

Sample Questions (largely from the 2nd half of the course):

Consider the following sample test questions in addition to the quiz questions, midterm test questions, and sample questions for the midterm and the quizzes.

Review DAGs and the use of added-variable plots with the Frisch-Waugh-Lovell Theorem to consider the relative values of \(SE(\hat{\beta}_X)\) with different models.

The final exam will cover material from the entire course.

More theoretical:

  1. Suppose you have access to a uniform random number generator. Describe how you could use the Metropolis-Hastings algorithm to generate a Markov chain with a Normal distribution with mean 0 and unit standard deviation as its stationary distribution. Write R code for a function that generates the next observation from this chain. (A hedged sketch appears after this list.)

  2. Let \(h\) on \(\mathbb{R}\) be proportional to a probability density function \(f\), i.e. \(h(x) = c f(x)\) with \(c\) unknown. Let \(g\) be a probability density on \(\mathbb{R}\) that is symmetric around 0, from which you can generate random observations. Describe how you would use random observations with pdf \(g\) to generate a Markov chain with \(f\) as a stationary distribution.

  3. Prove that the process in the previous question has the distribution given by \(f\) as stationary distribution.

  4. Consider estimating the average lifetime of some electronic component, a ‘gizmo’, whose lifetime has an exponential(\(\theta\)) distribution: \[f(x|\theta) = e^{-x/\theta} /\theta \quad x > 0, \theta > 0\] Suppose \(N\) gizmos are observed for a year and \(n\) (\(1 <n < N\)) burn out during the year, with lifetimes recorded as: \(x_1,x_2,...,x_n\) where \(x\) is measured in years (fractional, of course). The remaining \(N-n\) are still functioning at the end of the year.

    1. Set up the E step of the EM algorithm to estimate \(\theta\). Recall that exponential lifetimes are ‘memoryless’, i.e. the distribution of the additional life of a gizmo that survives one year is exponential(\(\theta\)).
    2. Derive the M step, i.e. how do you obtain \(\theta_{t+1}\) as a function of \(\theta_t\)?
  5. Prove that each step in the EM algorithm cannot reduce the log-likelihood. Assume that the densities in the model, \(f(x|\theta)\), are positive with the same support.
    Assume known as a general fact that the Kullback-Leibler divergence from the distribution \(f(\cdot|\psi_1)\) to \(f(\cdot|\psi_2)\) is non-negative, i.e. \[\operatorname{E}\{\log f(x|\psi_1) - \log f(x|\psi_2) \mid \psi_1\} \ge 0\]

  6. Describe the Newton-Raphson method for finding the MLE of a statistical model. Give a sketch of a proof, using Taylor series expansions, that it converges to the maximum if the starting value is sufficiently close to the maximum.

  7. Describe the Newton-Raphson and the Fisher Scoring method for finding MLEs and compare some advantages and disadvantages of each method.

  8. Describe the Nelder-Mead algorithm in detail. (You’ll forget the details after the exam but it’s worth having to learn it at some point in your career.) Be ready to apply it for one or two steps in a simple example.
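For question 1, here is a hedged sketch of one possible answer, assuming a random-walk proposal built from the uniform generator; the acceptance ratio uses the \(\textrm{N}(0,1)\) density \(\propto e^{-x^2/2}\):

next_state <- function(state, delta = 1) {
  proposal <- state + runif(1, -delta, delta)  # symmetric uniform proposal
  ratio <- exp((state^2 - proposal^2) / 2)     # N(0,1) density ratio
  if (runif(1) < ratio) proposal else state    # accept w.p. min(1, ratio)
}
# usage: a chain of 10,000 draws with N(0,1) as stationary distribution
x <- numeric(1e4)
for (i in 2:1e4) x[i] <- next_state(x[i - 1])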

Applied:

The following 4 questions are based on the ‘Vocab’ data set used in Assignment 3.
Recall that it records a vocabulary score for over 30,000 subjects tested over the years between 1974 and 2016. The questions below refer to the following output in R. Note that the variable ‘Year’ is measured in years since 2000.

library(car)
summary(Vocab)
           year          sex          education       vocabulary    
      Min.   :1974   Female:17148   Min.   : 0.00   Min.   : 0.000  
      1st Qu.:1987   Male  :13203   1st Qu.:12.00   1st Qu.: 5.000  
      Median :1994                  Median :12.00   Median : 6.000  
      Mean   :1995                  Mean   :13.03   Mean   : 6.004  
      3rd Qu.:2006                  3rd Qu.:15.00   3rd Qu.: 7.000  
      Max.   :2016                  Max.   :20.00   Max.   :10.000
data <- within(Vocab,
               {
                 Year <- year - 2000   # years since 2000
                 Year2 <- Year^2
                 Year3 <- Year^3
                 Yearf <- factor(Year)
               }
)
fit1 <- lm(vocabulary ~ factor(Year) * sex, data = data)           # Year as a factor
fit2 <- lm(vocabulary ~ (Year + Year2 + Year3) * sex, data = data) # cubic in Year
fit3 <- lm(vocabulary ~ Year * sex, data = data)                   # linear in Year

summary(fit2)
     
     Call:
     lm(formula = vocabulary ~ (Year + Year2 + Year3) * sex, data = data)
     
     Residuals:
         Min      1Q  Median      3Q     Max 
     -6.1730 -1.0826  0.0162  1.0994  4.1273 
     
     Coefficients:
                     Estimate Std. Error t value Pr(>|t|)    
     (Intercept)    6.141e+00  2.790e-02 220.123  < 2e-16 ***
     Year           1.215e-02  2.793e-03   4.351 1.36e-05 ***
     Year2         -8.358e-04  1.880e-04  -4.445 8.83e-06 ***
     Year3         -4.831e-05  1.017e-05  -4.749 2.06e-06 ***
     sexMale       -1.203e-01  4.226e-02  -2.847  0.00441 ** 
     Year:sexMale  -4.446e-03  4.230e-03  -1.051  0.29331    
     Year2:sexMale  3.865e-04  2.842e-04   1.360  0.17376    
     Year3:sexMale  2.488e-05  1.533e-05   1.623  0.10461    
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     
     Residual standard error: 2.114 on 30343 degrees of freedom
     Multiple R-squared:  0.001568, Adjusted R-squared:  0.001337 
     F-statistic: 6.806 on 7 and 30343 DF,  p-value: 4.252e-08
summary(fit3)
     
     Call:
     lm(formula = vocabulary ~ Year * sex, data = data)
     
     Residuals:
         Min      1Q  Median      3Q     Max 
     -6.0848 -1.0590 -0.0145  1.0799  4.1114 
     
     Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
     (Intercept)   6.0504270  0.0172503 350.742  < 2e-16 ***
     Year          0.0021493  0.0012801   1.679  0.09315 .  
     sexMale      -0.0799758  0.0260553  -3.069  0.00215 ** 
     Year:sexMale  0.0009994  0.0019247   0.519  0.60359    
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
     
     Residual standard error: 2.115 on 30347 degrees of freedom
     Multiple R-squared:  0.0006394,    Adjusted R-squared:  0.0005406 
     F-statistic: 6.472 on 3 and 30347 DF,  p-value: 0.000225
anova(fit3, fit2, fit1)
     Analysis of Variance Table
     
     Model 1: vocabulary ~ Year * sex
     Model 2: vocabulary ~ (Year + Year2 + Year3) * sex
     Model 3: vocabulary ~ factor(Year) * sex
       Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
     1  30347 135692                                  
     2  30343 135566  4    126.03 7.0624 1.116e-05 ***
     3  30307 135208 36    358.24 2.2305 3.234e-05 ***
     ---
     Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# prediction grid: every observed Year for each sex
pred <- within(
    with(data, expand.grid(Year = unique(Year), sex = unique(sex))),
    {
      Year2 <- Year ^ 2
      Year3 <- Year ^ 3
    }
  )

pred$fit1 <- predict(fit1, newdata = pred)
pred$fit2 <- predict(fit2, newdata = pred)
pred$fit3 <- predict(fit3, newdata = pred)

library(latticeExtra)
     Loading required package: lattice
xyplot(fit1+fit2+fit3 ~ Year, pred, groups = sex, type = 'b')

  1. Is the test whose p-value appears in the third line of the output of the anova function a legitimate hypothesis test? Why or why not? Describe in words what the p-value is indicating, if anything.

  2. Using the model fit2, estimate the rate at which the gender gap was changing in the year 2010. How would you go about obtaining a confidence interval for this estimate?

  3. Answer the previous question using the model fit3. Which model do you think provides a better answer to this question? Why?

  4. Describe the interpretation of the estimated coefficients for sexMale and for Year:sexMale in the summary output for fit3.

  5. Describe how you would simulate an observation from a distribution with density \(f(x) = \exp\{-x/2\}/2, \; x > 0\) using the inverse CDF. Write the code for a function that would generate \(n\) observations from this distribution using runif as a uniform random number generator. (A hedged sketch follows.)
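A minimal sketch for this last question: the CDF is \(F(x) = 1 - e^{-x/2}\), so \(F^{-1}(u) = -2\log(1-u)\), and since \(1-U\) is also Uniform(0,1) we can draw \(-2\log(U)\) directly.

rexp2 <- function(n) -2 * log(runif(n))   # inverse-CDF sampler, mean 2
# quick check against the target density
hist(rexp2(1e4), breaks = 50, freq = FALSE)
curve(exp(-x / 2) / 2, add = TRUE, col = "red")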

References

Alzahawi, Shilaan. 2021. “Building Your Website Using R {Blogdown}.” Shilaan Alzahawi. May 12, 2021. https://www.shilaan.com/post/building-your-website-using-r-blogdown/.
“Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” 2017. Stack Overflow. March 9, 2017. https://stackoverflow.com/a/42702950.
Barndorff-Nielsen, O. E., D. R. Cox, and N. Reid. 1986. “The Role of Differential Geometry in Statistical Theory.” International Statistical Review / Revue Internationale de Statistique 54 (1): 83–96. https://doi.org/10.2307/1403260.
“Basic Linear Algebra Subprograms.” 2023. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Basic_Linear_Algebra_Subprograms&oldid=1162434953.
Bates, Casey. 2020. “23 RStudio Tips, Tricks, and Shortcuts.” Dataquest. June 10, 2020. https://www.dataquest.io/blog/rstudio-tips-tricks-shortcuts/.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” J. Stat. Soft. 67 (1). https://doi.org/10.18637/jss.v067.i01.
Benjamini, Yoav, Richard D. De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry I. Graubard, Xuming He, et al. 2021. “ASA President’s Task Force Statement on Statistical Significance and Replicability.” CHANCE 34 (4): 10–11. https://doi.org/10.1080/09332480.2021.2003631.
Berger, Adam. 1996. “Convexity, Maximum Likelihood and All That.” https://www.cs.cmu.edu/~roni/11761/Presentations/convexity-maximum-likelihood-and.pdf.
Bergland, Christopher. 2019. “Rethinking P-Values: Is "Statistical Significance" Useless?” Psychology Today. March 22, 2019. https://www.psychologytoday.com/blog/the-athletes-way/201903/rethinking-p-values-is-statistical-significance-useless.
Best, Joel. 2005. “Lies, Calculations and Constructions: Beyond "How to Lie with Statistics".” Statistical Science 20 (3): 210–14. https://www.jstor.org/stable/20061175.
Betancourt, M. J., Simon Byrne, Samuel Livingstone, and Mark Girolami. 2014. “The Geometric Foundations of Hamiltonian Monte Carlo.” October 19, 2014. http://arxiv.org/abs/1410.5110.
Bezanson, Jeff, Alan Edelman, Stefan Karpinski, and Viral B Shah. 2017. “Julia: A Fresh Approach to Numerical Computing.” SIAM Review 59 (1): 65–98.
Biecek, Przemyslaw, Hubert Baniecki, Mateusz Krzyzinski, and Dianne Cook. 2023. “Performance Is Not Enough: A Story of the Rashomon’s Quartet.” March 17, 2023. http://arxiv.org/abs/2302.13356.
“BLAS (Basic Linear Algebra Subprograms).” n.d. Accessed July 18, 2023. https://www.netlib.org/blas/#_history.
Braun, Julia, Leonhard Held, and Bruno Ledergerber. 2012. “Predictive Cross-validation for the Choice of Linear Mixed-Effects Models with Application to Data from the Swiss HIV Cohort Study.” Biometrics 68 (1): 53–61. https://doi.org/10.1111/j.1541-0420.2011.01621.x.
Braun, W. John, and Duncan J. Murdoch. 2021. A First Course in Statistical Programming with R. Third. Cambridge University Press. https://books.google.com?id=NzorEAAAQBAJ.
Broman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.
Brown, Andrew W., Douglas G. Altman, Tom Baranowski, J. Martin Bland, John A. Dawson, Nikhil V. Dhurandhar, Shima Dowla, et al. 2019. “Childhood Obesity Intervention Studies: A Narrative Review and Guide for Investigators, Authors, Editors, Reviewers, Journalists, and Readers to Guard Against Exaggerated Effectiveness Claims.” Obesity Reviews 20 (11): 1523–41. https://doi.org/10.1111/obr.12923.
Bungartz, Hans-Joachim, Christian Carbogno, Martin Galgon, Thomas Huckle, Simone Köcher, Hagen-Henrik Kowalski, Pavel Kus, et al. 2020. “ELPA: A Parallel Solver for the Generalized Eigenvalue Problem.” In Advances in Parallel Computing, edited by Ian Foster, Gerhard R. Joubert, Luděk Kučera, Wolfgang E. Nagel, and Frans Peters. IOS Press. https://doi.org/10.3233/APC200095.
Statistics Canada. 2016. 2016 Census of Population [Canada] Public Use Microdata File (PUMF): Individuals File. http://odesi.ca/#/details?uri.
Canon, Stephen. 2013. “Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” Stack Overflow. July 25, 2013. https://stackoverflow.com/a/17858345.
Carroll, R. J., and David Ruppert. 1996. “The Use and Misuse of Orthogonal Regression in Linear Error-in-Variables Models.” American Statistician, February. https://journals-scholarsportal-info.ezproxy.library.yorku.ca/pdf/00031305/v50i0001/1_tuamoorilem_1.xml_en.
“Case Study: Group Work Gone Awry.” n.d. Google Docs. Accessed November 2, 2022. https://docs.google.com/document/d/136VgrrFxkeJSO47g0Bwbj5JqXXumRvf0P1pC9lO91pQ/edit?usp=embed_facebook.
Chow, Vinci. 2022. AMD BLAS/LAPACK Optimization in 2022.” SCRP CUHK Economics. February 7, 2022. http://localhost:4000/blog/analysis/2022/02/07/mkl-optimization.html.
Colby, Emily, and Eric Bair. 2013. “Cross-Validation for Nonlinear Mixed Effects Models.” Journal of Pharmacokinetics and Pharmacodynamics 40 (2): 243–52. https://doi.org/10.1007/s10928-013-9313-5.
“Computational Statistics.” n.d. Accessed July 18, 2023. https://onlinelibrary.wiley.com/doi/epub/10.1002/9781118555552.
“Cornish–Fisher Expansion.” 2023. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Cornish%E2%80%93Fisher_expansion&oldid=1137666825.
Cox, D. R., and N. Reid. 1987. “Parameter Orthogonality and Approximate Conditional Inference.” Journal of the Royal Statistical Society: Series B (Methodological) 49 (1): 1–18. https://doi.org/10.1111/j.2517-6161.1987.tb01422.x.
Cunningham, Scott. 2021a. “Regression Discontinuity.” In Causal Inference: The Mixtape. Yale University Press. https://mixtape.scunning.com/06-regression_discontinuity.
———. 2021b. Causal Inference: The Mixtape. Yale University Press. https://mixtape.scunning.com/.
“Data Organization in Spreadsheets.” n.d. Accessed May 8, 2023. https://www.tandfonline.com/doi/epdf/10.1080/00031305.2017.1375989?needAccess=true&role=button.
“Data to Viz | A Collection of Graphic Pitfalls.” n.d. Accessed September 3, 2023. https://www.data-to-viz.com/caveats.html.
Dempster, Arthur. 1968. “Elements of Continuous Multivariate Analysis.”
“Differential Geometry in Statistical Inference - Encyclopedia of Mathematics.” n.d. Accessed April 21, 2023. https://encyclopediaofmath.org/wiki/Differential_geometry_in_statistical_inference.
Efron, Bradley. 2018. “Curvature and Inference for Maximum Likelihood Estimates.” Ann. Statist. 46 (4). https://doi.org/10.1214/17-AOS1598.
Farimani, Foad S. 2017. “Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” Stack Overflow. February 13, 2017. https://stackoverflow.com/a/42212642.
Fox, John. 2016. Applied Regression Analysis and Generalized Linear Models. 3rd ed. Sage Publications.
———. 2021. “Answers to Odd-Numbered Exercises for Chapter 25 Fox, Applied Regression Analysis and Generalized Linear Models, Third Edition (Sage, 2016).” https://www.john-fox.ca/AppliedRegression/chap-25-odd-exercise-answers.pdf.
———. 2023a. “Answers to Odd-Numbered Exercises for Chapter 26 Fox, Applied Regression Analysis and Generalized Linear Models, Third Edition (Sage, 2016).” https://www.john-fox.ca/AppliedRegression/chap-26-odd-exercise-answers.pdf.
———. 2023b. “Applied Regression Analysis and Generalized Linear Models, Third Edition Supplement: Chapter 26 (Draft) :Causal Inferences From Observational Data: Directed Acyclic Graphs and Potential Outcomes.” Applied Regression Analysis and Generalized Linear Models, Third Edition Supplement: Chapter 26 (Draft) Causal Inferences From Observational Data: Directed Acyclic Graphs and Potential Outcomes. March 31, 2023. https://www.john-fox.ca/AppliedRegression/chap-26.pdf.
———. 2023c. “Applied Regression Analysis and Generalized Linear Models, Third Edition Supplement: Chapter 25 (Draft) Bayesian Estimation of Regression Models.” April 1, 2023. https://www.john-fox.ca/AppliedRegression/chap-25.pdf.
———. n.d. “Blau and Duncan Stratification Data Set.” Accessed October 10, 2023. https://www.john-fox.ca/AppliedRegression/BlauDuncan.txt.
Friendly, Michael, Georges Monette, and John Fox. 2013. “Elliptical Insights: Understanding Statistical Methods Through Elliptical Geometry.” Statist. Sci. 28 (1): 1–39. https://doi.org/10.1214/12-STS402.
Gandhi, Ratnik, and Amoli Rajgor. 2017. “Updating Singular Value Decomposition for Rank One Matrix Perturbation.” July 26, 2017. http://arxiv.org/abs/1707.08369.
Gao, Kaifeng, Gang Mei, Francesco Piccialli, Salvatore Cuomo, Jingzhi Tu, and Zenan Huo. 2020. “Julia Language in Machine Learning: Algorithms, Applications, and Open Issues.” Computer Science Review 37 (August): 100254. https://doi.org/10.1016/j.cosrev.2020.100254.
Geijn, Robert van de. 2018. “Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” Stack Overflow. September 3, 2018. https://stackoverflow.com/a/52156861.
Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 2014. “Understanding Predictive Information Criteria for Bayesian Models.” Stat Comput 24 (6): 997–1016. https://doi.org/10.1007/s11222-013-9416-2.
Gelman, Andrew, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su (Columbia University). n.d. “Bayesian Generalized Linear Models and an Appropriate Default Prior.” Logistic Regression.
“General Tips for Making Your Report Appealing and Easy To Skim.” n.d. Accessed April 29, 2023. https://www.ahrq.gov/talkingquality/resources/design/general-tips/index.html.
Gentle, James E., Wolfgang Karl Härdle, and Yuichi Mori, eds. 2012. Handbook of Computational Statistics: Concepts and Methods. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-21551-3.
George, Brandon J., T. Mark Beasley, Andrew W. Brown, John Dawson, Rositsa Dimova, Jasmin Divers, TaShauna U. Goldsby, et al. 2016. “Common Scientific and Statistical Errors in Obesity Research: Common Statistical Errors in Obesity Research.” Obesity 24 (4): 781–90. https://doi.org/10.1002/oby.21449.
“Getting Started With GitHub.” The Turing Way. n.d. Accessed August 30, 2023. https://the-turing-way.netlify.app/collaboration/github-novice.html#.
Givens, Geof H, and Jennifer A Hoeting. 2013. Computational Statistics. 2nd ed. Wiley.
———. 2014. “Errata for Computational Statistics, Second Edition.”
Givens, Geof, and J. A. Hoeting. 2013. “Web Page for Computational Statistics, by G. H. Givens and J. A. Hoeting.” 2013. https://www.stat.colostate.edu/computationalstatistics/.
Greenland, Sander. 2010. “Simpson’s Paradox From Adding Constants in Contingency Tables as an Example of Bayesian Noncollapsibility.” The American Statistician 64 (4): 340–44. https://doi.org/10.1198/tast.2010.10006.
Greenland, Sander, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. 2016. “Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” Eur J Epidemiol 31 (4): 337–50. https://doi.org/10.1007/s10654-016-0149-3.
“Halton Sequence.” 2023. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Halton_sequence&oldid=1170076511.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2008. The Elements of Statistical Learning, Data Mining, Inference, and Prediction. 2nd ed. Springer. https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf.
Holtz, Yan, and Conor Healy. n.d. “The Issue with Error Bars.” Accessed September 3, 2023. https://www.data-to-viz.com/caveat/www.data-to-viz.com/caveat/error_bar.html.
Hitchcock, David B. 2003. “A History of the Metropolis-Hastings Algorithm.” The American Statistician 57 (4): 254–57. https://www.jstor.org/stable/30037292.
“How R Searches and Finds Stuff — Study Notes.” n.d. Accessed April 5, 2023. https://askming.github.io/study_notes/Stats_Comp/Note-How%20R%20searches%20and%20finds%20stuff.html.
Hyndman. 2014. “Rob J Hyndman - Fast Computation of Cross-Validation in Linear Models.” March 17, 2014. https://robjhyndman.com/hyndsight/loocv-linear-models/.
IBM Technology, dir. 2022. R Vs Python. https://www.youtube.com/watch?v=4lcwTGA7MZw.
“Introduction.” n.d. Accessed September 4, 2023. https://cran.r-project.org/web/packages/rgl/vignettes/WebGL.html.
Ioannidis, John P A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): 6.
Ioannidis, John P. A. 2019. “What Have We (Not) Learnt from Millions of Scientific Papers with P Values?” The American Statistician 73 (March): 20–25. https://doi.org/10.1080/00031305.2018.1447512.
Kass, Robert E. 1989. “The Geometry of Asymptotic Inference.” Statist. Sci. 4 (3). https://doi.org/10.1214/ss/1177012480.
Kass, Robert E., Brian S. Caffo, Marie Davidian, Xiao-Li Meng, Bin Yu, and Nancy Reid. 2016. “Ten Simple Rules for Effective Statistical Practice.” Edited by Fran Lewitter. PLoS Comput Biol 12 (6): e1004961. https://doi.org/10.1371/journal.pcbi.1004961.
Kass, Robert E., and Paul W. Vos. 1997. Geometrical Foundations of Asymptotic Inference. Wiley Series in Probability and Statistics. New York: Wiley.
Kennedy, William J., and James E. Gentle. 2021. Statistical Computing. New York: Routledge. https://doi.org/10.1201/9780203738672.
King, Gary. 1986. “How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science.” American Journal of Political Science 30 (3): 666–87. https://doi.org/10.2307/2111095.
King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. https://doi.org/10.2307/2669316.
Lafit, Ginette. (2021) 2023. “Estimating Linear Mixed-Models with Julia Using R.” https://github.com/ginettelafit/MixedModelswithRandJulia.
Lakens, Daniël. 2022. “Improving Your Statistical Inferences.” 2022. https://doi.org/10.5281/zenodo.6409077.
“LaTeX for Complete Novices.” n.d. Accessed August 26, 2023. https://www.dickimaw-books.com/latex/novices/index.html.
Lauritzen, Steffen. n.d. “Maximum Likelihood in Exponential Families.”
Li, Changcheng. 2019. “JuliaCall: An R Package for Seamless Integration Between R and Julia.” JOSS 4 (35): 1284. https://doi.org/10.21105/joss.01284.
Loy, Adam, Heike Hofmann, and Dianne Cook. 2016. “Model Choice and Diagnostics for Linear Mixed-Effects Models Using Statistics on Street Corners.” December 6, 2016. http://arxiv.org/abs/1502.06988.
Ma, Yan, Madhu Mazumdar, and Stavros G. Memtsoudis. 2012. “Beyond Repeated-Measures Analysis of Variance: Advanced Statistical Methods for the Analysis of Longitudinal Data in Anesthesia Research.” Regional Anesthesia and Pain Medicine 37 (1): 99–105. https://doi.org/10.1097/AAP.0b013e31823ebc74.
Maechler, Martin. n.d. “Rmpfr: R MPFR - Multiple Precision Floating-Point Reliable.”
makhlaghi. 2019. “What Is the Relation Between BLAS, LAPACK and ATLAS.” Forum post. Stack Overflow. January 24, 2019. https://stackoverflow.com/q/17858104.
mathematical.coffee. 2016a. “Importing Common YAML in Rstudio/Knitr Document.” Forum post. Stack Overflow. October 6, 2016. https://stackoverflow.com/q/39885363.
———. 2016b. “Answer to "Importing Common YAML in Rstudio/Knitr Document".” Stack Overflow. October 7, 2016. https://stackoverflow.com/a/39909079.
matloff. 2015. “OpenMP Tutorial, with R Interface.” R-bloggers. January 17, 2015. https://www.r-bloggers.com/2015/01/openmp-tutorial-with-r-interface/.
MavropaliasG. 2023. “My Setup as a Researcher. How to Write, Run Statistics, and Work Seamlessly with R, Obsidian, Linux, and Zotero, and Collaborate with Senior Professors Who Only Accept MS Word Files!” Reddit post. r/ObsidianMD. March 28, 2023. https://www.reddit.com/r/ObsidianMD/comments/124cd8y/my_setup_as_a_researcher_how_to_write_run/.
Mühleisen, Hannes, and Mark Raasveldt. 2023. Duckdb: DBI Package for the DuckDB Database Management System. Manual. https://CRAN.R-project.org/package=duckdb.
Müller, Samuel, J. L. Scealy, and A. H. Welsh. 2013. “Model Selection in Linear Mixed Models.” Statistical Science 28 (2): 135–67. https://www.jstor.org/stable/43288485.
Muralidharan, Karthik, Mauricio Romero, and Kaspar Wüthrich. n.d. “Factorial Designs, Model Selection, and (Incorrect) Inference in Randomized Experiments.”
Nature. 2023. “Points of Significance. Editorial.” Nat Hum Behav 7 (3): 293–94. https://doi.org/10.1038/s41562-023-01586-w.
“Netlib BLAS FAQ.” 2017. 2017. https://www.netlib.org/blas/faq.html.
Nuzzo, Regina. 2014. “P Values, the ‘Gold Standard’ of Statistical Validity, Are Not as Reliable as Many Scientists Assume.” Nature 506 (February): 150–52.
Oliver, John, dir. 2016. Last Week Tonight with John Oliver: Scientific Studies. HBO. https://www.youtube.com/watch?v=0Rnq1NpHdmw.
Oxberry, Geoff. 2012. “Answer to "Updatable SVD Implementation in Python, C, or Fortran?".” Computational Science Stack Exchange. July 3, 2012. https://scicomp.stackexchange.com/a/2686.
Ozgur, Ceyhun, Taylor Colliau, Grace Rogers, Zachariah Hughes, and Elyse "Bennie" Myer-Tyson. 2017. “MatLab vs. Python vs. R.” Journal of Data Science 15 (3): 355–71. https://doi.org/10.6339/JDS.201707_15(3).0001.
Peng, Roger D. n.d. “1.3 Textbooks vs. Computers.” In Advanced Statistical Computing. Accessed November 10, 2023. https://bookdown.org/rdpeng/advstatcomp/textbooks-vs.-computers.html.
Pocock, Stuart J, and James H Ware. 2009. “Translating Statistical Findings into Plain English.” The Lancet 373 (9679): 1926–28. https://doi.org/10.1016/S0140-6736(09)60499-2.
QMNET - Quantitative Methods Network, dir. 2020. Missing Data Imputation with Low Rank Models. https://www.youtube.com/watch?v=OjXPskXO8No.
Rao, C. 2017. “Book Review: Multivariate Statistical Methods, A Primer.” Journal of Modern Applied Statistical Methods 16 (1). https://doi.org/10.22237/jmasm/1493599260.
Ratz, Arthur V. 2021. “Can QR Decomposition Be Actually Faster? Schwarz-Rutishauser Algorithm.” Medium. April 9, 2021. https://towardsdatascience.com/can-qr-decomposition-be-actually-faster-schwarz-rutishauser-algorithm-a32c0cde8b9b.
Reid, N. 2003. “Asymptotics and the Theory of Inference.” Ann. Statist. 31 (6). https://doi.org/10.1214/aos/1074290325.
Reid, Nancy. 2010. “Likelihood Inference.” WIREs Comp Stat 2 (5): 517–25. https://doi.org/10.1002/wics.110.
Roche, Alexis. 2012. “EM Algorithm and Variants: An Informal Tutorial.” http://arxiv.org/abs/1105.1476.
Rodrigues, Bruno. n.d. Reproducible Analytical Pipelines - Master’s of Data Science. Accessed October 30, 2022. https://rap4mads.eu/.
Rothman, Kenneth J. 2014. “Six Persistent Research Misconceptions.” J GEN INTERN MED 29 (7): 1060–64. https://doi.org/10.1007/s11606-013-2755-z.
Hossenfelder, Sabine, dir. 2021. How I Learned to Love Pseudoscience. https://www.youtube.com/watch?v=bWV0XIn-rvY.
Santillan, Carlos. 2018. “Improving R Performance by Installing Optimized BLAS/LAPACK Libraries.” July 25, 2018. https://csantill.github.io/RPerformanceWBLAS/.
Sawitzki, G. 2009. “Web Page for Computational Statistics: An Introduction to R.” StatLab Heidelberg. 2009. https://sintro.r-forge.r-project.org/.
Schervish, Mark J. 1996. “P Values: What They Are and What They Are Not.” The American Statistician 50 (3): 203–6.
Schmidt, Anthony. 2021. “A Zotero Workflow for R.” Anthony Schmidt. October 25, 2021. https://www.anthonyschmidt.co/post/2021-10-25-a-zotero-workflow-for-r/.
Sharp, Julia L. 2022. “Statistical Collaboration Training Videos.” 2022. https://www.youtube.com/playlist?list=PLCqpxiFnahaBFnIXIWmTE1eYpa2ov5VWL.
———. 2023. “Setting the Stage: Statistical Collaboration Training Videos.” 2023. https://sites.google.com/site/julialsharp/other-resources/statistical-collaboration-training-videos.
Sherrington, Malcolm. 2015. Mastering Julia. Packt Publishing Ltd.
Silk, Matthew. 2019. “Mixed Model Diagnostics.” 2019. https://dfzljdn9uc3pi.cloudfront.net/2020/9522/1/MixedModelDiagnostics.html.
Soch, Joram. 2019. “The Book of Statistical Proofs.” The Book of Statistical Proofs. 2019. https://statproofbook.github.io/.
Econometrics and Free Software. 2022. “A Linux Live USB as a Statistical Programming Dev Environment.” R-bloggers. October 29, 2022. https://www.r-bloggers.com/2022/10/a-linux-live-usb-as-a-statistical-programming-dev-environment/.
“Stat 3701 Lecture Notes: R Generic Functions.” n.d. Accessed February 15, 2023. https://www.stat.umn.edu/geyer/3701/notes/generic.html.
Statistical Tools for High-Throughput Data Analysis. 2023. “A Complete Guide to 3D Visualization Device System in R - R Software and Data Visualization - Easy Guides - Wiki - STHDA.” 2023. http://www.sthda.com/english/wiki/a-complete-guide-to-3d-visualization-device-system-in-r-r-software-and-data-visualization.
Stone, M. 1977. “A Unified Approach to Coordinate-Free Multivariate Analysis.” Ann Inst Stat Math 29 (1): 43–57. https://doi.org/10.1007/BF02532773.
strboul. 2019. “Answer to "Importing Common YAML in Rstudio/Knitr Document".” Stack Overflow. October 21, 2019. https://stackoverflow.com/a/58491399.
Posit Team. 2022. “Posit.” September 14, 2022. https://www.posit.co/.
tillsten. 2013. “Updatable SVD Implementation in Python, C, or Fortran?” Forum post. Computational Science Stack Exchange. February 27, 2013. https://scicomp.stackexchange.com/q/2678.
Coding Club UC3M. 2019. “Simple yet Elegant Object-Oriented Programming in R with S3.” May 28, 2019. https://codingclubuc3m.rbind.io/post/2019-05-28/.
Unknown. n.d. “Lme4 Convergence Warnings: Troubleshooting.” Accessed September 23, 2023. https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html.
user1436187. 2015. “Updating SVD Decomposition After Adding One New Row to the Matrix.” Forum post. Cross Validated. October 18, 2015. https://stats.stackexchange.com/q/177007.
“Using oneMKL with R.” n.d. Intel. Accessed July 18, 2023. https://www.intel.com/content/www/us/en/developer/articles/technical/using-onemkl-with-r.html.
usεr11852. 2015. “Answer to "Updating SVD Decomposition After Adding One New Row to the Matrix".” Cross Validated. October 31, 2015. https://stats.stackexchange.com/a/179539.
Vehtari, Aki. n.d. “Tutorial on Model Assessment, Selection and Inference After Selection.”
Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.
Wasserstein, Ronald L., Allen L. Schirm, and Nicole A. Lazar. 2019. “Moving to a World Beyond ‘p \(<\) 0.05’.” The American Statistician 73 (March): 1–19. https://doi.org/10.1080/00031305.2019.1583913.
Wicherts, Jelte M., Coosje L. S. Veldkamp, Hilde E. M. Augusteijn, Marjan Bakker, Robbie C. M. Van Aert, and Marcel A. L. M. Van Assen. 2016. “Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking.” Front. Psychol. 7 (November). https://doi.org/10.3389/fpsyg.2016.01832.
Wickham, Hadley. 2019. Advanced R. 2nd ed. CRC Press. https://adv-r.hadley.nz/.
Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. 2017. “Good Enough Practices in Scientific Computing.” Edited by Francis Ouellette. PLoS Comput Biol 13 (6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510.
Xie, Yihui. (2011) 2022. “Knitr.” https://github.com/yihui/knitr/blob/36efc00013d423b8f098ff37ba524c9d29810fa0/inst/examples/knitr-spin.R.

  1. Google Scholar

  2. Github

  3. R is to S as Linux is to Unix as GNU C is to C as GNU C++ is to C++, and so on. S, Unix, C, and C++ were created at Bell Labs in the late 1960s and the 1970s. R, Linux, GNU C, and GNU C++ are public-license re-engineerings of the proprietary S, Unix, C, and C++, respectively.