This version: November 27 2024 23:38

Where there is no uncertainty, there cannot be truth – Richard Feynman

I am a firm believer that before you use a method, you should know how to break it. Describing how to break something should be an essential part of describing a new piece of statistical methodology (or, for that matter, of resurrecting an existing one). – Dan Simpson

To teach how to live without certainty, and yet without being paralysed by hesitation, is perhaps the chief thing that philosophy, in our age, can still do for those who study it. — Bertrand Russell

Five random log-likelihoods
Five random log-likelihoods

Announcements

  • 2024-11-13
    • The final exam for the course and the comprehensive exam will take place on Monday, December 9, 2024, from 4pm to 6pm (7pm for the comprehensive exam) in Room HNE 030 (Health, Nursing and Environmental Studies Building).
  • 2024-11-12
    • Quiz originally scheduled for Thursday, November 14 will take place on Tuesday, November 19.
  • 2024-10-01
  • 2024-09-11
    • The midterm test will be held on Tuesday, October 22, NOT on Thursday, October 17, as originally announced in the course description, which is during the fall reading week.
    • Voluntary tutorials begin on Friday, September 13, at 9 am at https://yorku.zoom.us/j/92413559596

Calendar

Classes meet on Tuesdays and Thursdays from 2:30 to 4:00 in ACW 104.
Instructor: Georges Monette
Important dates: 2024-2025 Academic Calendar
Textbook:

References:

  • Gentle, J. E. (2009) Computational Statistics, Springer
  • Efron, B., Hastie, T. (2021) Computer Age Statistical Inference: Algorithms, Evidence and Data Science Student Edition, Cambridge Press

Email: Messages about course content should be posted publicly to Piazza so the discussion can engage and benefit everyone.

Day 1: Thursday, September 5

Files used today: Regression Review

Assignment 0 (individual): Connect with Piazza

Due Friday, September 7, 5 pm

  • Join the MATH 6630 Piazza site by going to this URL: piazza.com/yorku.ca/fall2024/math6630
    • Use the access code blackwell when prompted.
    • Create a post on Piazza to introduce yourself to your colleagues:
      • The title should read: Introduction: followed by the name you prefer to use in class, e.g. Jon Smith
      • The first line should show the name of your Github account, e.g. Github: jsmith. If you don’t have one already, leave this blank for now. You can come back and edit it later.
      • Follow this with information on what computing languages you are familiar with? Which one are you proficient with?
      • Share some interests: hobbies, favourite philosopher, statistician, author, musician, movies, restaurants, etc.
      • Click on the assn0 folders when you submit your post.
      • On Saturday, I will assign teams and need you to have connected with Piazza to put you in a team.
  • While you’re on Piazza, post your first ‘feedback’. See the course description for details

Assignment 1 (individual): Setting things up

Due Wednesday, September 11, 9 pm

  • Summary:
    1. Install R and RStudio
    2. Get a free Github account
    3. Install git on your computer
    4. Give feedback on any problems you ran into – because the world keeps changing and these instructions need to be updated.
  • 1. Install R and RStudio following these instructions. If you already have R and RStudio, update them to the latest versions.
  • 2. Get a free Github account: If you don’t have one, follow this excellent source of advice from Jenny Bryan on using git and Github with R and RStudio.
    • Be sure to read Jenny Bryan’s advice before choosing a name.
    • CAUTION: Avoid installing ‘Github for Windows’ (which is not the same as ’Git for Windows) from the Github site at this stage.
    • An excellent additional reference is Hadley Wickham’s chapter on Git and Github in his book on R Packages
  • 3. Install git on your computer using these instructions.
    • If you are curious about how git is used have a look at this tutorial! You will be using it mainly through RStudio and won’t need to master how to use with a command-line interface.
    • As a final step: In the RStudio menu click on Tools | Global Options ... | Terminal. Then click on the box in Shell paragraph to the right of New terminals open with:
      • On a PC, select Git Bash
      • On a Mac, select Bash
    • You don’t need to do anything else at this time. We will see how to set up SSH keys to connect to Github through the RStudio terminal in a few lectures.
  • 4. Post on Piazza about your experiences installing the above: - Create a post entitled ‘Getting started’. - Describe problems you encountered installing R, RStudio, and git - Describe problems registering with Github - How could the instructions be improved to make the process easier? - If you couldn’t complete the installation, describe the problem or error message(s) you encountered. - Click on the on the assn1 folder before submitting your post. - If you are hoping that someone will provide answers to questions in your post select Question as the Post Type. Otherwise, select Note.

Assignment 2 (teams) \(p\)-values

  • Deadlines: See the course description for the meaning of these deadlines.
    1. Friday, September 14 at noon
    2. Sunday, September 16 at 9 pm
    3. Monday, September 17 at 9 pm

This exercise will be done in teams of 4 or 5 members which will be posted on Piazza. All members of a team work on a similar problem but with different parameters. This way, you can help each other with the concepts but each team member must, ultimately, do their own work.

The purpose of the assignment is to explore the meaning of p-values. Before starting, stop and reflect on what it means for an experiment to ‘achieve’ a p-value of 0.049. What meaning can we give to the quantity ‘0.049’? How is it related to the probability that the null hypothesis is correct?

To keep things very simple suppose you want to test \(H_0: \mu =0\) versus \(H_1: \mu > 0\) (note: the problem is also easy but takes a bit more coding if you assume a two-sided alternative: \(H_0: \mu =0\) versus \(H_1: \mu \neq 0\) – to help focus on the essential concepts, we’ll use a one-sided alternative) and you are designing an experiment in which you plan to take a sample of independent random variables, \(X_1, X_2, ... , X_n\) which are iid \(\textrm{N}(\mu,1)\), i.e. the variance is known to be equal to 1. You plan to use the usual test based on \(\bar{X}_n\) rejecting \(H_0\) for values of \(\bar{X}_n\) that are far from 0.

An applied example would be testing for a change in value of a response when all subjects are submitted to the same conditions and the measurement error of the response is known. In that example \(X_i\) would be the ‘gain score’, i.e. post-test response minus the pre-test response exhibited by the \(i\)th subject.

Note: Do the assignment using R and using a Markdown file. You can use the Markdown sample file in Assignment 1 as a template. Upload both a pdf file and your R script to Piazza.

Let the selected probability of Type I error be \(\alpha = 0.05\). Depending on your team, you have a budget that allows you to collect a sample of \(n\) observations as follows:

Team \(n\)
A: Akaike (Hirotsugu Akaike) 5
B: Blackwell (David Blackwell) 8
C: Cox (Gertrude Mary Cox) 10
D: David (Florence Nightingale David) 15

Depending on your position on the team, you are interested in answering the questions below using the following values of \(\mu_1\):

Position in Team \(\mu_1\) Cohen’s terminology for effect size: \(\mu_1/\sigma\)
1 all combine ROC curves for results below
2 0.2 small effect size
3 0.5 medium effect size
4 0.8 large effect size
5 1 very large effect size
  1. What is the probability that \(p \le 0.05\) if \(H_0: \mu = 0\) is true?
  2. What is the probability that \(p \le 0.05\) if \(\mu = \mu_1\)?
  3. What is the power of this test if \(\mu = \mu_1\)?
  4. Plot the ROC curve: Power (\(\beta\) as a function of \(\alpha\)). To generate the ROC curve, use a reasonable range of values for critical values of the test and plot the probability of rejecting under \(H_0\) in the horizontal direction versus the probability of rejecting of under \(H_A\) in the vertical direction.
  5. What is the interpretation of the area under the ROC curve?
  6. Suppose that you collect the data and that the observed \(p\)-value is 0.049. What can you say about the probability that \(H_0\) is true?
  7. Suppose that, before running the experiment, you were willing to give \(H_0\) and \(H_1: \mu = \mu_1\) equal probability of 1/2.
    1. What is the probability that \(H_0\) is true given the event that \(p \le 0.05\)?
    2. What is the probability that \(H_0\) is true given the event that \(p = 0.049\)?
    3. What is the probability that \(H_0\) is true given the event that \(p = 0.005\)?
    4. What is the probability that \(H_0\) is true given the event that \(p = 0.0005\)?
  8. As a team discuss and post your reflections: Hypothesis testing is often presented as a process that parallels that of determining guilt in a criminal process. We start with a presumption of innocence, i.e. that \(H_0\) is true. We then hear evidence and consider whether it contradicts the presumption of innocence ‘beyond a reasonable doubt.’ Suppose we quantify the presumption of innocence to mean that the prior probability of guilt is small, e.g. \(P(H_1) \le .05\). How small an observed \(p\)-value do you need to obtain in order to ‘flip’ the presumption of innocence to ‘guilt beyond a reasonable doubt’ if that is defined as \(P(H_0 | \mathrm{data}) \le .05\).
  9. Ditto: What \(p\)-value would we need if the presumption of innocence and guilt beyond a reasonable doubt correspond to \(P(H_1) \le 0.001\) and \(P(H_0|\mathrm{data}) \le 0.001\), respectively?
  10. Ditto: Courts have often adopted a criterion of \(p < 0.05\) in imitation of the common practice among many researchers. Comment on the possible consequences.
  11. Ditto: Have a look at this xkcd cartoon. How does the Bayesian statistician derive a probability to make a decision in this example? Show the details.

To delve more deeply into these issues you can read Wasserstein and Lazar (2016) and Wasserstein, Schirm, and Lazar (2019). Concerns about \(p\)-values have been around for a long time, see Schervish (1996). For a short overview see Bergland (2019). There are two interesting and influential papers by John Ioannidis (2005), (2019).

For an entertaining take on related issues see Last Week Tonight: Scientific Studies by John Oliver (2016) (warning: contains strong language and political irony that many could consider offensive – watch at your own risk!).

Preparing for the next class: Play your way through these scripts:

Step your way through the scripts and experiment as you go along. If you have questions or comments, post them on Piazza.

These scripts give you a tour of various basic features of R. They are very basic and meant to provide reminders of things you can do when a need arises. If you are new to R you probably know some other computing language well and these scripts can introduce you to R syntax.

What’s the best way to learn R? That depends on your learning style, your previous experience and how deep you want to go.

See the discussion in ‘Day 2’. Start or join a discussion on Piazza on ‘best ways to learn R’.

Day 2: Tuesday, September 10

The ‘spida2’ package should be reinstalled regularly since it’s likely to be updated frequently during the course.

Tutorials: Start this week on Friday, September 13 at 9 am.

Learning R

Why R? What about SAS, SPSS, Python, Julia, among others?

SAS is a very high quality, intensely engineered, environment for statistical analysis. It is widely used by large corporations. New procedures in SAS are developed and thoroughly tested by a team of 1,000 or more SAS engineers before being released. It currently has more than 300 procedures.

R is an environment for statistical programming and development that has accumulated many somewhat inconsistent layers developed over time by people of varying abilities, many of whom work largely independently of each other. There is no centralized quality testing except to check whether code and documentation run before a new package is added to R’s main repository, CRAN.

When this page was last updated (at 23:38 on November 27 2024), CRAN had 21,708 packages.

In addition, a large number of packages under development are available through other repositories, such as Github.

The development of SAS began in 1966 and that of R (in the form of its predecessor, S, at Bell Labs) in 1976.

The ‘design’ of R (using ‘R’ to refer to both R and to S) owes a lot to the design of Unix. The idea is to create a toolbox of simple tools that you link together yourself to perform an analysis.
Unix, now mainly as Linux1, commands were simple tools linked together by pipes so the output of one command is the input of the next. To do anything you need to put the commands together yourself.

The same is true of R. It’s extremely flexible but at the cost of requiring you to know what you want to do and to be able to use its many tools in combination with each other to achieve your goal. Many decisions in R’s design were intended to make it easy to use interactively. Often the result is a language that is very quirky for programming.

SAS, in contrast, requires you to select options to run large procedures that purport to do the entire job.

This is an old joke: If someone publishes a journal article about a new statistical method, it might be added to SAS in 5 to 10 years. It won’t be added to SPSS until 5 years after there’s a textbook written about it, maybe another 10 to 15 years after its appearance in SAS.

It was added to R two years ago because the new method was developed as a package in R long before being published.

So why become a statistician? So you can have the breadth and depth of understanding that someone needs to apply the latest statistical ideas with the intelligence and discernment to use them effectively.

So expect to have a symbiotic relationship with R. You need R to have access to the tools that implement the latest ideas in statistics. R needs you because it takes people like you to use R effectively.

The role of R in this course is to help us

  • have access to tools to expand our ability to explore and analyze data, and
  • learn how to develop and implement new statistical methods. i.e. learn how to build new tools
  • deepen our understanding of the use of statistics for scientific discovery as well as for business applications

It’s very challenging to find a good way to ‘learn R’. It depend on where you are and where you want to go. Now, there’s a plethora of on-line courses. See the blog post: The 5 Most Effective Ways to Learn R

In my opinion, ultimately, the best way is to

  • play your way through the ‘official’ manuals on CRAN starting with ‘An Introduction to R’ along with ‘R Data Import/Export’. Note however that these materials were developed before the current mounting concern with reproducible research and some of the advice should be deprecated, e.g. using ‘attach’ and ‘detach’ with data.frames.
  • read the CRAN task views in areas that interest you.
  • Have a look at the over 496,438 questions tagged ‘r’ on stackoverflow.
  • At every opportunity, use R Markdown documents (like the sample script you ran when you installed R) to work on assignments, project, etc.

Using R is like playing the piano. You can read and learn all the theory you want, ultimately you learn by playing.

We will discuss the scripts CAR_1.R and CAR_2.R in the tutorial this week. Copy them to a file in RStudio and play with them line by line. You can also run an entire script with Control-Shift-K to see how scripts can be used for reproducible research.

Exercises:

  • From New 6630 questions
    • 5.1, 5.2, 5.3, 5.4, 5.5,
    • 5.6.23, 5.6.24, 5.6.25, 5.6.26, 5.6.27,
    • 6.1, 6.2, 6.3, 6.4, 6.5,
    • 7.4, 7.5, 7.6, 7.7, 7.8,
    • 8.1, 8.2, 8.3, 8.4, 8.5,
    • 8.6, 8.7, 8.8, 8.9, 8.10,
    • 8.18.a, 8.18.b, 8.18.c, 8.18.d, 8.18.e,
    • 8.36.a, 8.36.b, 8.36.c, 8.36.d, 8.36.e,
    • 8.51.a, 8.51.b, 8.51.c, (write functions that would work on matrices of any size), 8.61.a, 8.62.a
    • 12.1, 12.3, 12.5, 12.7, 12.9,

Assignment 3 (teams)

  • Do the exercises above. Each member of a team does 10 questions. Use the random permutation of the numbers 1 to 5 below to determine the order of the starting exercise for each member of the team. The first member does the question whose order is the first random number, the second student does the question whose order is the second random number and so on. Then you continue to do the all of
    the fifth succeeding questions. For example, the third student in each team gets the random number 2 in the sequence below and would do the 2nd, 7th, 12th, 17th, 22nd, 27th, 32nd, 37th, 42nd and 47th questions.
  • In other words: Trying to make things less complicated! Note that each of the 10 lines of questions above has 5 questions. So
    • Team Member #1 uses the first random number and does the 3rd question in each line,
    • Team Member #2 does the 1st question in each line,
    • Team Member #3 does the 2nd question in each line,
    • Team Member #4 does the 4th question in each line, and
    • Team Member #5 does the 5th question in each line.
  • Random numbers: 3 1 2 4 5
  • Deadlines: See the course description for the meaning of these deadlines.
    1. Friday, September 29 at noon
    2. Sunday, October 1 at noon
    3. Monday, October 2 at 9 pm
  • IMPORTANT:
    • Upload the answer to each question in a single Piazza post (post it as a Piazza ‘Note’, not as a Piazza ‘Question’) with the title: “A3 5.1” for the first question, etc.
    • You can answer the question directly in the posting or by uploading a pdf file and the R script or Rmarkdown script that generated it.
    • When providing help or comments, do so as “followup discussion”.

If you want a deeper understanding of R, consider reading Hadley Wickham, Advanced R

Preparation for next class

Day 3: Thursday, September 12

Continuation of Day 2

The Problem of Causality

  • A useful oversimplification:

    • When studying relationships among variables, there two kinds of data:
      • Observational: no variable is randomly assigned and manipulated
      • Experimental: a ‘treatment’ is randomly assigned and applied
    • There are two purposes for making inferences about relationships among variables:
      • Making good predictions: when we see X we want to guess or predict Y
      • Making causal inferences: what would happen if we ‘do X’? What would happen if we were to assign different levels of X? What would happen if we were to intervene by setting X at different levels?
  • The fundamental \(2 \times 2\) fundamental table of Statistics

Predictive Inference Causal Inference
Observational Data OK: what you learn in Regression Modern causal inference
Experimental Data Hard but rarely an issue OK: what you learn in Experimental Design

Day 4: Tuesday, September 17

Files to play with:

We will work through some of the following files in the near future but I encourage you to play your way through them on your own: try different models, try different displays of data, diagnostics, etc. I’ll be working on updating them so they will change. Of course, I welcome your suggestions.

Files we’ll start using today:

  • Simple Regression ++
    • Major concepts explored:
      • Correlation and paradoxes related to regression to the mean
      • Understanding the roles and effects of different types of outliers
      • Visualizing significance at a glance from a scatterplot
      • The power of added variable plots to see what’s happening in a regression.
    • Simple Regression ++ with annotations
  • Multiple Regression
    • You can visualize the mathematics of multiple regression with ellipses that reveal a simple relationship between data and inference
    • Simpson’s Paradox again, with continuous variables
    • To answer whether there’s a causal effect, control for confounders, but not for mediators (or colliders)
    • To answer how the effect works, you may want to control mediators
    • To answer how to use a model for prediction use the variables available for prediction. Confounding and mediation is not relevant
    • Visualizing and interpreting parameters with interactions
  • Leverage and Mahalanobis Distance:
  • Play through the Simple Regression R script to get information on using Added Variable Plots (also known as Partial Regression Leverage Plots) and Partial Residual Plots (also known as Component-Plus-Residual Plots).
  • The script has a short section on ROC curves at the end.
  • Continue working on your own with sample scripts in R/CAR_1 and R/CAR_2.

Comments on Leverage and Mahalanobis Distance:

Some formulas are worth remembering because they work for multiple regression as well as simple regression. We’ve seen: \[ SE(\hat{\beta}_i) = \frac{1}{\sqrt{n}}\frac{s_e}{s_{x_i|others}} \] Here’s another one: You know that the diagonal elements, \(h_{ii}\) of the hat matrix \[ H = X(X'X)^{-1}X' \] are the leverages showing how much a change in one particular \(y_i\) would change the fitted value \(\hat{y}_i\). i.e. \[ h_{ii} = \frac{\partial \hat{y}_i}{\partial y_i} \] It turns out that there’s nothing mysterious about where the \(h_{ii}\)s come from. They have a simple relationship with the squared Mahalanobis distance in the predictor space: \[ h_{ii} = \frac{1}{n}(1 + Z_i^2) \] where, in simple regression, \(Z_i\) is the Z-score (standardized with the MLE estimate of standard deviation, dividing by \(n\) instead of \(n-1\)). \(|Z_i|\) also happens to be Mahalanobis distance in one dimension.

The formula generalizes to multiple regression if you take \(Z_i\) to be Mahalanobis distance, so Mahalanobis distance is just a generalization of the Z-score to higher dimensions.

See the links above on Leverage and Mahalanobis Distance.

Comments:

  • Start reading Hadley Wickham, Advanced R Chapters 1-3 and work on the exercises.
  • Post questions about R on Piazza or come to the tutorials.

Comment on AVPs: The fact that a simple regression applied to the AVP leads to an estimate of the slope that

  • is the same as that of the associated multiple regression,
  • has a correlation equal to the partial correlation of the predictor in the multiple regression,
  • has the same residuals as the residuals in the multiple regression,
  • has the same residual standard error as that of the multiple regression, and finally
  • has the same standard error for the estimation of the slope as that of the multiple regression

provided one calculates these quantities using maximum likelihood estimates (it’s all asymptotically true if one uses the degree-of-freedom adjusted estimates), is the content of the Frisch-Waugh-Lovell Theorem that was ‘discovered’ in steps over a period of approximately 30 years. The first version applied to lagged time series and was discovered by Ragnar Frisch. He got the first Nobel Prize in Economics in 1931, for his general contributions, not just the start of this theorem. However, like other statistical ‘objects’ that have names associated with them, the actual original theorem was published by Udny Yule in 1911 (I am shamelessly using Wikipedia, and its gifted contributors, as a source). The theorem was refined over time by Frederick Waugh and Michael Lovell who each added portions of it. Lovell is known for an elegant proof.

The theorem shows how the AVP is the window into the way a multiple regression determines an individual coefficient.

Possible quiz questions:

Look at the link to possible questions for the quiz on Thursday under “Quick Links” above.

Simpsons’ Paradox:

  • The principle behind Simpsons’ Paradox is the same whether the variables involved are continuous or categorical. The many numerical examples don’t give me a feeling of understanding what’s behind it. I need a way to visualize it. But visualizations are quite different depending on the character of each of the X, Y and Z variables.
  • Perhaps the easiest case to visualize involves a continuous X and Y and a categorical conditioning variable Z.
  • See this example I made up.
    • What might be plausible names for the X and Y variables?
    • Note: I found a graphic like this somewhere on the web but I couldn’t find it again for this class, so this is my amateurish attempt at recreating it. If you find something similar on the web, please post the link to Piazza.
    • With apologies to anyone who has never seen “The Simpsons”.
  • The rotating 3d plot at the top of this page is an example in which all three variables are continuous.
  • We’ll see some interesting examples where all three variables are dichotomous, using Paik-Agresti diagrams.

Day 5: Thursday, September 19

Continuation of previous day.

Links:

  • More references on R:

Day 7: Thursday, September 26

Links:

Background material:

  • Causal Questions
    • Overview of confounders, mediators, colliders and moderators
    • Paik-Agresti diagram
    • Lord’s Paradox
  • Understanding Output in Regression
    • This is a bit old and needs updating but it might be useful when you explore regression models and want to ask meaningful creative questions.
  • Three Basic Theorems
    • These are proofs of things we use in the course. You are not responsible for these proofs but the techniques, primarily using the properties of orthogonal projections (i.e. symmetric idempotent matrices) are very useful when working with the theory of linear regression.
    • The second theorem is the proof that the AVP does what we claim it does. It’s the ‘Frisch-Waugh-Lovell’ theorem of econometrics.
    • The third theorem shows why linear propensity scores work because they produce the same \(\hat{\beta}\) as a multiple regression with the same predictors as those use for the propensity score. This serves as a warning that the choice of predictors for the propensity score isn’t different from the choice of predictors for a multiple regression model: i.e. you still need to block backdoor paths and avoid mediators and colliders.
  • The Causal Zoo
    • Simulation based on a causal graph showing the consequences of using models that include or exclude various types of variables. See the graph on page 9 summarizes the consequences of different choices by plotting the expected value of \(\hat{\beta}_X\) versus its variance. The ‘causal effect’ of \(X\) is 4, so any model whose estimator has a different expected value is biased. Look at what happens if you include the collider in this model: Y ~ Z + Col. This shows how models matter when you want to give a coefficient a causal interpretation.

Day 8: Tuesday, October 1

Continuation of previous

Day 9: Thursday, October 3

Two key ideas:

  • Backdoor criterion: Using a linear DAG to identify causal models for the effect of X on Y:
    If your model satisfies these two conditions, then the regression coefficient for X will provide an unbiased estimate of the causal effect of X:
    • Find the backdoor paths:
      • Make sure each backdoor path is blocked either:
        • by controlling for one or more non-colliders along the path, or
        • by the presence of an uncontrolled collider along the path.
        • if a collider is controlled, either by inclusion in the model or because it acted as a selection variable for the sample, e.g. only one level of the collider has been observed, then the path needs to be blocked through some other variable along the path.
    • Do not control for any descendant of X
  • Comparing two models that both provide causal estimates of the effect of X:
    • Consider which one yields a smaller \(SE(\hat{\beta}_X)\) by comparing \(s_e\) (how well the models predict Y) and \(s_{X|Zs in the models}\). If the two models differ in their effect on these two quantities in opposite directions, then you can compare them with respect to \(SE(\hat{\beta}_X)\).
  • Linear propensity score:
    • Consider a linear model Y ~ X + Zs where Zs can represent one or more than one variable.
    • The AVP theorem (FWL theorem) says that the regression coefficient for X in the multiple regression is the same as the regression coefficient in the simple regression of \(Y - \hat{Y}\) against \(X - \hat{X}\), where \(\hat{Y}\) and \(\hat{X}\) are obtained by regression on Zs. Moreover \(SE(\hat{\beta}_X)\) is the same (if you ignore differences in denominator degrees of freedom, e.g. if you use the MLE estimators for the residual variance).
    • The linear propensity score theorem says that \(\hat{\beta}_X\) is also the same in the regression of the original \(Y\) on \(X\) and \(\hat{X}\) with the inclusion of an intercept. In other words, if we keep the ‘propensity score’ \(\hat{X}\) in the model, we can throw away the Zs. The \(SE(\hat{\beta}_X)\), however, may be larger. But the model may be simpler and you may have more confidence in its validity. This is the trade-off in using propensity scores for causal inference. Note that you can add ‘covariates’ to a propensity score model.

Sample questions for the quiz on Tuesday:

  • “If you drop a variable that is not significant in a multiple linear regression, the other estimated coefficients should not change very much.”
    • Comment.
    • If the statement is false describe how you would look for a counterexample? Provide a plausible counterexample.
  • “If a linear regression on two variables and an intercept leads to the rejection of the joint null hypothesis: \(H_0: \beta_1 = \beta_2 = 0\), then it must be possible to reject at least one of the hypotheses: \(H_0: \beta_1 = 0\) or \(H_0: \beta_2 = 0\).”
    Comment. If this statement is false describe how you would look for a counterexample. Provide a plausible counterexample.
  • Consider a linear regression on two variables and an intercept with coefficients \(\beta_1\) and \(\beta_2\). The simple linear regressions have regression coefficients \(\gamma_1\) and \(\gamma_2\). Is it possible for a data set to lead to rejection of the hypothesis \(H_0: \beta_1 = \beta_2 = 0\) yet not reject either \(H_0: \gamma_1 = 0\) nor \(H_0: \gamma_2 = 0\).
    If so, provide a plausible example. What would this say about the ability of forward stepwise regression to find a model that fits well.
  • In multiple regression, when would you want to exclude a variable even though it is highly significant? Give an example.
  • In multiple regression, when would you want to include a variable even though it is not significant? Give an example.
  • Consider the linear causal DAG below and the following models:
    1. Y ~ X
    2. Y ~ X + Z6
    3. Y ~ X + Z1
    4. Y ~ X + Z1 + Z4
    5. Y ~ X + Z1 + Z3
    6. Y ~ X + Z3 + Z6
    7. Y ~ X + Z1 + Z5
    8. Y ~ X + Z2 + Z6 + Z7
    9. Y ~ X + Z1 + Z4
    10. Y ~ X + Z2 + Z8
    1. For each of these models discuss briefly whether fitting the model would produce an unbiased estimate of the causal effect of X.
    2. Choose two pairs of models that provide an unbiased estimate of the causal effect of X. Order them, if possible from the information in the DAG, according to the expected standard deviation of \(\hat{\beta}_X\). Briefly state the basis for your ordering.
    3. Are there reasons why you might prefer to use a model that the DAG would identify as having a larger standard deviation of \(\hat{\beta}_X\)?

Links:

Background material:

  • Causal Questions
    • Overview of confounders, mediators, colliders and moderators
    • Paik-Agresti diagram
    • Lord’s Paradox (not covered)
  • Understanding Output in Regression
    • This is a bit old and needs updating but it might be useful when you explore regression models and want to ask meaningful creative questions.
  • Three Basic Theorems – slightly revised
    • See also Three Basic Theorems Notes
    • These are proofs of things we use in the course. You are not responsible for these proofs but the techniques, primarily using the properties of orthogonal projections (i.e. symmetric idempotent matrices) are very useful when working with the theory of linear regression.
    • The second theorem is the proof that the AVP does what we claim it does. It’s the ‘Frisch-Waugh-Lovell’ theorem of econometrics.
    • The third theorem shows why linear propensity scores work because they produce the same \(\hat{\beta}\) as a multiple regression with the same predictors as those use for the propensity score. This serves as a warning that the choice of predictors for the propensity score isn’t different from the choice of predictors for a multiple regression model: i.e. you still need to block backdoor paths and avoid mediators and colliders.
  • The Causal Zoo
    • Simulation based on a causal graph showing the consequences of using models that include or exclude various types of variables. See the graph on page 9 summarizes the consequences of different choices by plotting the expected value of \(\hat{\beta}_X\) versus its variance. The ‘causal effect’ of \(X\) is 4, so any model whose estimator has a different expected value is biased. Look at what happens if you include the collider in this model: Y ~ Z + Col. This shows how models matter when you want to give a coefficient a causal interpretation.

Day 10: Tuesday, October 8

  • Chapter 1 of Givens and Hoeting
    • Chapter 1 Notes – Part 1
      • Jacobians of various dimensions
      • Likelihood
      • The Bayesian/Frequentist divide
      • Log-likelihood, its gradient, and its Hessian: the simple theory behind a great discovery: Fisher Information
      • The variance of the score and the variance of the MLE
      • The ‘easy’ case: exponential families

Day 11: Thursday, October 10

  • Preparation for the midterm on October 22:

Day 12: Tuesday, October 22

Midterm Test

Day 13: Thursday, October 24

Day 14: Tuesday, October 29

Project

  • Choose an algorithm or method of analysis you find interesting, e.g TMLE, lme in the nlme package, glmmTMB, …
  • Figure out how it works.
  • Describe the algorithm.
  • Show examples of its application.
  • Break it.
  • Show how you broke it.
  • Describe how it broke and how to avoid using it in ways that will break it.
  • Prepare a 15-minute presentation.
  • Collaborate with your team on Piazza so I can give you a grade for your contributions and participation.
  • It’s okay if two or more teams choose the same algorithm as long as teams work independently. It would be very interesting to see how different teams working on the same problem would inevitably come to different results.
  • Project presentations will take place on the last day of class.

Class

  • Slides for Chapter 1
  • Messy folder of files for Chapter 1
  • Computational Statistics data sets and scripts
    • Note that the style of R programming is quite basic so don’t take the scripts as examples of good R style.
  • Chapter 1 Exercises and sample quiz questions:
    1. Work out the details of the score moments for a sample of size \(n\) from the \(N(\mu,\sigma)\) distribution.
    2. Find the canonical parameters, canonical sufficient statistics and cumulant-generating function for a sample from the \(N(\mu, \sigma)\) distribution treated as an exponential family.
    3. Derive the posterior density for \(\lambda\) given an observation from an exponential(\(\lambda\)) distribution using an exponential(\(\theta\)) prior for \(\lambda\). Compare with a) a posterior density using an improper uniform prior on \(\lambda\). Compare the modes, expected values and the standard deviations of the posteriors.
    4. Find the profile likelihood for \(\mu\) and for \(\sigma\) for a sample of \(n\) from the \(N(\mu,\sigma)\).
    5. Find the estimated variance matrix for a Wald test based on the Hessian evaluated at the MLE for a sample of \(n\) from the \(N(\mu,\sigma)\) distribution.

Day 15: Thursday, October 31

  • Discussion of Bayesian inference and implications for the interpretation p-values
  • Distinguishing among statements: “p < 0.05”, “p = 0.049”, “p = 0.00001”
  • Some advocate solving the problem by only reporting “p < 0.05”. Does this solve a problem by pretending it doesn’t exist?

Day 16: Tuesday, November 5

Links:

Some sample questions for quiz on November 7:

Note: You should know the definitions of the Bernoulli, Binominal, Multinomial, Normal, Multivariate Normal, Poisson, Exponential, Gamma, and Uniform distributions. If others are needed, their definitions will given in the quiz. You should also know the definition of moment-generating functions, the characterization of exponential families, and the definition and derivation of the score and Fisher information identities.

  • Consider a statistical model given by a family of densities \(f(x|\theta)\) for an outcome \(x\) with parameter \(\theta\) in a open interval in \(\mathbb{R}\). Derive, under standard regularity assumptions, the expected value of the gradient of the log-likelihood evaluated at the ‘true’ value of \(\theta\).
  • Consider the problem of finding an extremum of the real function \(\frac{\log(x)}{1+x} - x^2\). Suppose you are applying the Newton method starting at \(x=1\). Find the next value in the sequence.
  • Let the random vector \(x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}\) have the following density for unknown real parameters \(\theta_1, \theta_2 >0\): \[ f(x_1, x_2) = \frac{1}{\theta_1 \theta_2}\exp(-x_1/\theta_1 - x_2/\theta_2) \quad \textrm{for } x_1, x_2 > 0 \]
    1. Derive the score function as a function of \(x_1, x_2, \theta_1, \theta_2\).
    2. Derive the Fisher Information.
  • Show that the Exponential(\(\lambda\)) distribution for unknown \(\lambda > 0\) belongs to an exponential family but the Uniform(\(0,\theta\)) distribution for unknown \(\theta \in \mathbb{R}\) does not.

Day 14: Thursday, November 7

Continuation

Day 15: Tuesday, November 12

Announcement:

  • Quiz originally scheduled for Thursday, November 14 will take place on Tuesday, November 19.

Links:

Day 16: Thursday, November 14

Sample quiz question for November 19:

Consider estimating the average lifetime of some electronic component, a ‘gizmo’, whose lifetime has an Exponential(\(\psi\)) distribution: \[f(x|\psi) = \psi \exp\{-\psi x\}\quad x > 0, \psi > 0\] using the ‘rate’ parametrization for the Exponential distribution.

Suppose N gizmos are observed for a year and n (1 < n < N ) burn out during the year, with lifetimes recorded as: \(x_1 , x_2 , ..., x_n\), where \(x\) is measured in years (fractional, of course). The remaining \(N − n\) are still functioning at the end of the year.

  1. Set up the E step of the EM algorithm to estimate \(\psi\). Recall that exponential lifetimes are ‘memoryless’, i.e. the distribution of the additional life of a gizmo that survives one year is Exponential(\(\psi\)).
  2. Derive the M step, i.e. how do you obtain \(\psi_{t+1}\) as a function of \(\psi_t\)?
  3. [Not for the quiz] Prove that each step in the EM algorithm cannot reduce the log-likelihood. Assume that the densities in the model, \(f(x|\theta)\) are positive with the same support.

Day 17: Tuesday, November 19

Day 18: Thursday, November 21

Assignment 4 (teams)

Here are some sample final exam questions for the course. I apologize if there are still some redundancies and errors.

Select the 20 hardest question for your team and let each team member work on five of these questions. Your best strategy is to try to choose the questions you find the hardest.

Post, compare and discuss your answers within your team.

Make your answers and discussion public to the class by December 2.

Links:

References

AdamO. 2018. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. April 6, 2018. https://stats.stackexchange.com/a/339065.
Alzahawi, Shilaan. 2021. “Building Your Website Using R {Blogdown}.” Shilaan Alzahawi. May 12, 2021. https://www.shilaan.com/post/building-your-website-using-r-blogdown/.
“Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” 2017. Stack Overflow. March 9, 2017. https://stackoverflow.com/a/42702950.
Bååth, Rasmus. 2013. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. October 7, 2013. https://stats.stackexchange.com/a/72128.
Barndorff-Nielsen, O. E., D. R. Cox, and N. Reid. 1986. “The Role of Differential Geometry in Statistical Theory.” International Statistical Review / Revue Internationale de Statistique 54 (1): 83–96. https://doi.org/10.2307/1403260.
“Basic Linear Algebra Subprograms.” 2023. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Basic_Linear_Algebra_Subprograms&oldid=1162434953.
Bates, Casey. 2020. “23 RStudio Tips, Tricks, and Shortcuts.” Dataquest. June 10, 2020. https://www.dataquest.io/blog/rstudio-tips-tricks-shortcuts/.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” J. Stat. Soft. 67 (1). https://doi.org/10.18637/jss.v067.i01.
Benjamini, Yoav, Richard D. De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry I. Graubard, Xuming He, et al. 2021. ASA President’s Task Force Statement on Statistical Significance and Replicability.” CHANCE 34 (4): 10–11. https://doi.org/10.1080/09332480.2021.2003631.
Berger, Adam. 1996. “Convexity, Maximum Likelihood and All That.” https://www.cs.cmu.edu/~roni/11761/Presentations/convexity-maximum-likelihood-and.pdf.
Bergland, Christopher. 2019. “Rethinking P-Values: Is "Statistical Significance" Useless?” Psychology Today. March 22, 2019. https://www.psychologytoday.com/blog/the-athletes-way/201903/rethinking-p-values-is-statistical-significance-useless.
Best, Joel. 2005. “Lies, Calculations and Constructions: Beyond "How to Lie with Statistics".” Statistical Science 20 (3): 210–14. https://www.jstor.org/stable/20061175.
Betancourt, M. J., Simon Byrne, Samuel Livingstone, and Mark Girolami. 2014. “The Geometric Foundations of Hamiltonian Monte Carlo.” October 19, 2014. http://arxiv.org/abs/1410.5110.
Bezanson, Jeff, Alan Edelman, Stefan Karpinski, and Viral B Shah. 2017. “Julia: A Fresh Approach to Numerical Computing.” SIAM Review 59 (1): 65–98.
Biecek, Przemyslaw, Hubert Baniecki, Mateusz Krzyzinski, and Dianne Cook. 2023. “Performance Is Not Enough: A Story of the Rashomon’s Quartet.” March 17, 2023. http://arxiv.org/abs/2302.13356.
BLAS (Basic Linear Algebra Subprograms).” n.d. Accessed July 18, 2023. https://www.netlib.org/blas/#_history.
Braun, Julia, Leonhard Held, and Bruno Ledergerber. 2012. “Predictive Cross‐validation for the Choice of Linear MixedEffects Models with Application to Data from the Swiss HIV Cohort Study.” Biometrics 68 (1): 53–61. https://doi.org/10.1111/j.1541-0420.2011.01621.x.
Braun, W. John, and Duncan J. Murdoch. 2021. A First Course in Statistical Programming with R. Third. Cambridge University Press. https://books.google.com?id=NzorEAAAQBAJ.
Broman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.
Brown, Andrew W., Douglas G. Altman, Tom Baranowski, J. Martin Bland, John A. Dawson, Nikhil V. Dhurandhar, Shima Dowla, et al. 2019. “Childhood Obesity Intervention Studies: A Narrative Review and Guide for Investigators, Authors, Editors, Reviewers, Journalists, and Readers to Guard Against Exaggerated Effectiveness Claims.” Obesity Reviews 20 (11): 1523–41. https://doi.org/10.1111/obr.12923.
Bungartz, Hans-Joachim, Christian Carbogno, Martin Galgon, Thomas Huckle, Simone Köcher, Hagen-Henrik Kowalski, Pavel Kus, et al. 2020. ELPA: A Parallel Solver for the Generalized Eigenvalue Problem1.” In Advances in Parallel Computing, edited by Ian Foster, Gerhard R. Joubert, Luděk Kučera, Wolfgang E. Nagel, and Frans Peters. IOS Press. https://doi.org/10.3233/APC200095.
Canada, Statistics. 2016. 2016 Census of Population [Canada] Public Use Microdata File (PUMF): Individuals File. http://odesi.ca/#/details?uri.
Canon, Stephen. 2013. “Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” Stack Overflow. July 25, 2013. https://stackoverflow.com/a/17858345.
Carroll, R. J., and David Ruppert. 1996. “The Use and Misuse of Orthogonal Regression in Linear Error-in-Variables Models.” American Statistician, February. https://journals-scholarsportal-info.ezproxy.library.yorku.ca/pdf/00031305/v50i0001/1_tuamoorilem_1.xml_en.
“Case Study: Group Work Gone Awry.” n.d. Google Docs. Accessed November 2, 2022. https://docs.google.com/document/d/136VgrrFxkeJSO47g0Bwbj5JqXXumRvf0P1pC9lO91pQ/edit?usp=embed_facebook.
Chow, Vinci. 2022. AMD BLAS/LAPACK Optimization in 2022.” SCRP CUHK Economics. February 7, 2022. http://localhost:4000/blog/analysis/2022/02/07/mkl-optimization.html.
Colby, Emily, and Eric Bair. 2013. “Cross-Validation for Nonlinear Mixed Effects Models.” Journal of Pharmacokinetics and Pharmacodynamics 40 (2): 243–52. https://doi.org/10.1007/s10928-013-9313-5.
“Computational Statistics.” n.d. Accessed July 18, 2023. https://onlinelibrary.wiley.com/doi/epub/10.1002/9781118555552.
“Cornish–Fisher Expansion.” 2023. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Cornish%E2%80%93Fisher_expansion&oldid=1137666825.
Cox, D. R., and N. Reid. 1987. “Parameter Orthogonality and Approximate Conditional Inference.” Journal of the Royal Statistical Society: Series B (Methodological) 49 (1): 1–18. https://doi.org/10.1111/j.2517-6161.1987.tb01422.x.
Cunningham, Scott. 2021a. “Causal Inference The Mixtape - 6  Regression Discontinuity.” In. Yale University Press. https://mixtape.scunning.com/06-regression_discontinuity.
———. 2021b. Causal Inference: The Mixtape. Yale University Press. https://mixtape.scunning.com/.
“Data Organization in Spreadsheets.” n.d. Accessed May 8, 2023. https://www.tandfonline.com/doi/epdf/10.1080/00031305.2017.1375989?needAccess=true&role=button.
“Data to Viz | A Collection of Graphic Pitfalls.” n.d. Accessed September 3, 2023. https://www.data-to-viz.com/caveats.html.
Dempster, Arthur. 1968. “Elements of Continuous Multivariate Analysis.”
“Differential Geometry in Statistical Inference - Encyclopedia of Mathematics.” n.d. Accessed April 21, 2023. https://encyclopediaofmath.org/wiki/Differential_geometry_in_statistical_inference.
Efron, Bradley. 2018. “Curvature and Inference for Maximum Likelihood Estimates.” Ann. Statist. 46 (4). https://doi.org/10.1214/17-AOS1598.
Farimani, Foad S. 2017. “Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” Stack Overflow. February 13, 2017. https://stackoverflow.com/a/42212642.
Fox, John. 2016. Applied Regression Analysis and Generalized Linear Models. 3rd ed. Sage Publications.
———. 2021. “Answers to Odd-Numbered Exercises for Chapter 25 Fox, Applied Regression Analysis and Generalized Linear Models, Third Edition (Sage, 2016).” https://www.john-fox.ca/AppliedRegression/chap-25-odd-exercise-answers.pdf.
———. 2023a. “Answers to Odd-Numbered Exercises for Chapter 26 Fox, Applied Regression Analysis and Generalized Linear Models, Third Edition (Sage, 2016).” https://www.john-fox.ca/AppliedRegression/chap-26-odd-exercise-answers.pdf.
———. 2023b. “Applied Regression Analysis and Generalized Linear Models, Third Edition Supplement: Chapter 26 (Draft) :Causal Inferences From Observational Data: Directed Acyclic Graphs and Potential Outcomes.” Applied Regression Analysis; Generalized Linear Models, Third Edition Supplement: Chapter 26 (Draft) Causal Inferences From Observational Data: Directed Acyclic Graphs; Potential Outcomes. March 31, 2023. https://www.john-fox.ca/AppliedRegression/chap-26.pdf.
———. 2023c. “Applied Regression Analysis and Generalized Linear Models, Third Edition Supplement: Chapter 25 (Draft) Bayesian Estimation of Regression Models.” April 1, 2023. https://www.john-fox.ca/AppliedRegression/chap-25.pdf.
———. n.d. “Blau and Duncan Stratification Data Set.” Accessed October 10, 2023. https://www.john-fox.ca/AppliedRegression/BlauDuncan.txt.
Friendly, Michael, Georges Monette, and John Fox. 2013. “Elliptical Insights: Understanding Statistical Methods Through Elliptical Geometry.” Statist. Sci. 28 (1): 1–39. https://doi.org/10.1214/12-STS402.
Galdino, Manoel. 2011. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. May 23, 2011. https://stats.stackexchange.com/a/11137.
Gandhi, Ratnik, and Amoli Rajgor. 2017. “Updating Singular Value Decomposition for Rank One Matrix Perturbation.” July 26, 2017. http://arxiv.org/abs/1707.08369.
Gao, Kaifeng, Gang Mei, Francesco Piccialli, Salvatore Cuomo, Jingzhi Tu, and Zenan Huo. 2020. “Julia Language in Machine Learning: Algorithms, Applications, and Open Issues.” Computer Science Review 37 (August): 100254. https://doi.org/10.1016/j.cosrev.2020.100254.
Geijn, Robert van de. 2018. “Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” Stack Overflow. September 3, 2018. https://stackoverflow.com/a/52156861.
Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 2014. “Understanding Predictive Information Criteria for Bayesian Models.” Stat Comput 24 (6): 997–1016. https://doi.org/10.1007/s11222-013-9416-2.
Gelman, Andrew, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung SuColumbia University. n.d. “Bayesian Generalized Linear Models and an Appropriate Default Prior.” Logistic Regression.
“General Tips for Making Your Report Appealing and Easy To Skim.” n.d. Accessed April 29, 2023. https://www.ahrq.gov/talkingquality/resources/design/general-tips/index.html.
Gentle, James E., Wolfgang Karl Härdle, and Yuichi Mori, eds. 2012. Handbook of Computational Statistics: Concepts and Methods. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-21551-3.
George, Brandon J., T. Mark Beasley, Andrew W. Brown, John Dawson, Rositsa Dimova, Jasmin Divers, TaShauna U. Goldsby, et al. 2016. “Common Scientific and Statistical Errors in Obesity Research: Common Statistical Errors in Obesity Research.” Obesity 24 (4): 781–90. https://doi.org/10.1002/oby.21449.
“Getting Started With GitHubThe Turing Way.” n.d. Accessed August 30, 2023. https://the-turing-way.netlify.app/collaboration/github-novice.html#.
Geyer, Charles J. n.d. “The Wilks, Wald, and Rao Tests.”
Givens, Geof H, and Jennifer A Hoeting. 2013. Computational Statistics. 2nd ed. Wiley.
———. 2014. “Errata for Computational Statistics, Second Edition.”
Givens, Geof, and J. A. Hoeting. 2013. “Web Page for Computational Statistics, by G. H. Givens and J. A. Hoeting.” 2013. https://www.stat.colostate.edu/computationalstatistics/.
Greenland, Sander. 2010. “Simpson’s Paradox From Adding Constants in Contingency Tables as an Example of Bayesian Noncollapsibility.” The American Statistician 64 (4): 340–44. https://doi.org/10.1198/tast.2010.10006.
Greenland, Sander, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. 2016. “Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” Eur J Epidemiol 31 (4): 337–50. https://doi.org/10.1007/s10654-016-0149-3.
“Halton Sequence.” 2023. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Halton_sequence&oldid=1170076511.
Harrell, Frank. 2021. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. July 25, 2021. https://stats.stackexchange.com/a/535873.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2008. The Elements of Statistical Learning, Data Mining, Inference, and Prediction. 2nd ed. Springer. https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf.
Healy, Yan Holtz and Conor. n.d. “The Issue with Error Bars.” Accessed September 3, 2023. https://www.data-to-viz.com/caveat/www.data-to-viz.com/caveat/error_bar.html.
Hitchcock, David B. 2003. “A History of the Metropolis-Hastings Algorithm.” The American Statistician 57 (4): 254–57. https://www.jstor.org/stable/30037292.
“How R Searches and Finds Stuff — Study Notes.” n.d. Accessed April 5, 2023. https://askming.github.io/study_notes/Stats_Comp/Note-How%20R%20searches%20and%20finds%20stuff.html.
Hyndman. 2014. “Rob J Hyndman - Fast Computation of Cross-Validation in Linear Models.” March 17, 2014. https://robjhyndman.com/hyndsight/loocv-linear-models/.
IBM Technology, dir. 2022. R Vs Python. https://www.youtube.com/watch?v=4lcwTGA7MZw.
“Ina Beaveridge (Nee Soesina Adriana Hensen) | Obituaries | McCall Gardens Funeral and Cremation Service | Victoria BC.” n.d. McCall Gardens Funeral; Cremation Service | Victoria, BC. Accessed November 11, 2024. https://www.mccallgardens.com/obituaries/ina-beaveridgesoesina-adrian-hensen/.
“Introduction.” n.d. Accessed September 4, 2023. https://cran.r-project.org/web/packages/rgl/vignettes/WebGL.html.
Ioannidis, John P A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): 6. https://journals.plos.org/plosmedicine/article/file?id=10.1371%2Fjournal.pmed.0020124&type=printable.
Ioannidis, John P. A. 2019. “What Have We (Not) Learnt from Millions of Scientific Papers with P Values?” The American Statistician 73 (March): 20–25. https://doi.org/10.1080/00031305.2018.1447512.
Ionides, Edward. n.d. “Profile Likelihood, Smoothed Likelihood, and Monte Carlo Likelihood.”
Kass, Robert E. 1989. “The Geometry of Asymptotic Inference.” Statist. Sci. 4 (3). https://doi.org/10.1214/ss/1177012480.
Kass, Robert E., Brian S. Caffo, Marie Davidian, Xiao-Li Meng, Bin Yu, and Nancy Reid. 2016. “Ten Simple Rules for Effective Statistical Practice.” Edited by Fran Lewitter. PLoS Comput Biol 12 (6): e1004961. https://doi.org/10.1371/journal.pcbi.1004961.
Kass, Robert E., and Paul W. Vos. 1997. Geometrical Foundations of Asymptotic Inference. Wiley Series in Probability and Statistics. New York: Wiley.
Kennedy, WIlliam J., and James E. Gentle. 2021. Statistical Computing. New York: Routledge. https://doi.org/10.1201/9780203738672.
King, Gary. 1986. “How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science.” American Journal of Political Science 30 (3): 666–87. https://doi.org/10.2307/2111095.
King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. https://doi.org/10.2307/2669316.
Lafit, Ginette. (2021) 2023. “Estimating Linear Mixed-Models with Julia Using R.” https://github.com/ginettelafit/MixedModelswithRandJulia.
Lakens, Daniël. 2022. “Improving Your Statistical Inferences.” 2022. https://doi.org/10.5281/zenodo.6409077.
LaTeX for Complete Novices.” n.d. Accessed August 26, 2023. https://www.dickimaw-books.com/latex/novices/index.html.
Lauritzen, Steffen. n.d. “Maximum Likelihood in Exponential Families.”
Li, Changcheng. 2019. JuliaCall: An R Package for Seamless Integration Between R and Julia.” JOSS 4 (35): 1284. https://doi.org/10.21105/joss.01284.
Loy, Adam, Heike Hofmann, and Dianne Cook. 2016. “Model Choice and Diagnostics for Linear Mixed-Effects Models Using Statistics on Street Corners.” December 6, 2016. http://arxiv.org/abs/1502.06988.
Ma, Yan, Madhu Mazumdar, and Stavros G. Memtsoudis. 2012. “Beyond Repeated-Measures Analysis of Variance: Advanced Statistical Methods for the Analysis of Longitudinal Data in Anesthesia Research.” Regional Anesthesia and Pain Medicine 37 (1): 99–105. https://doi.org/10.1097/AAP.0b013e31823ebc74.
Maechler, Martin. n.d. “Rmpfr: R MPFR - Multiple Precision Floating-Point Reliable.”
makhlaghi. 2019. “What Is the Relation Between BLAS, LAPACK and ATLAS.” Forum post. Stack Overflow. January 24, 2019. https://stackoverflow.com/q/17858104.
mathematical.coffee. 2016a. “Importing Common YAML in Rstudio/Knitr Document.” Forum post. Stack Overflow. October 6, 2016. https://stackoverflow.com/q/39885363.
———. 2016b. “Answer to "Importing Common YAML in Rstudio/Knitr Document".” Stack Overflow. October 7, 2016. https://stackoverflow.com/a/39909079.
matloff. 2015. OpenMP Tutorial, with R Interface | R-bloggers.” January 17, 2015. https://www.r-bloggers.com/2015/01/openmp-tutorial-with-r-interface/.
MavropaliasG. 2023. “My Setup as a Researcher. How to Write, Run Statistics, and Work Seamlessly with R, Obsidian, Linux, and Zotero, and Collaborate with Senior Professors Who Only Accept MS Word Files!” Reddit Post. r/ObsidianMD. March 28, 2023. www.reddit.com/r/ObsidianMD/comments/124cd8y/my_setup_as_a_researcher_how_to_write_run/.
Monica, Scortchi-Reinstate. 2013. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. September 1, 2013. https://stats.stackexchange.com/a/68917.
Mühleisen, Hannes, and Mark Raasveldt. 2023. Duckdb: DBI Package for the DuckDB Database Management System. Manual. https://CRAN.R-project.org/package=duckdb.
Müller, Samuel, J. L. Scealy, and A. H. Welsh. 2013. “Model Selection in Linear Mixed Models.” Statistical Science 28 (2): 135–67. https://www.jstor.org/stable/43288485.
Muralidharan, Karthik, Mauricio Romero, and Kaspar Wüthrich. n.d. “Factorial Designs, Model Selection, and (Incorrect) Inference in Randomized Experiments.”
Nature. 2023. “Points of Significance. Editorial.” Nat Hum Behav 7 (3): 293–94. https://doi.org/10.1038/s41562-023-01586-w.
“Netlib BLAS FAQ.” 2017. 2017. https://www.netlib.org/blas/faq.html.
Nuzzo, Regina. 2014. “P Values, the ‘Gold Standard’ of Statistical Validity, Are Not as Reliable as Many Scientists Assume.” Nature 506 (February): 150–52.
Oliver, John, dir. 2016. Last Week Tonight with John Oliver: Scientific Studies. HBO. https://www.youtube.com/watch?v=0Rnq1NpHdmw.
Oxberry, Geoff. 2012. “Answer to "Updatable SVD Implementation in Python, C, or Fortran?".” Computational Science Stack Exchange. July 3, 2012. https://scicomp.stackexchange.com/a/2686.
Ozgur, Ceyhun, Taylor Colliau, Grace Rogers, Zachariah Hughes, and Elyse "Bennie" Myer-Tyson. 2017. “MatLab vs. Python vs. R.” Journal of Data Science 15 (3): 355–71. https://doi.org/10.6339/JDS.201707_15(3).0001.
Peng, Roger D. n.d. 1.3 Textbooks Vs. Computers | Advanced Statistical Computing. Accessed November 10, 2023. https://bookdown.org/rdpeng/advstatcomp/textbooks-vs.-computers.html.
Pocock, Stuart J, and James H Ware. 2009. “Translating Statistical Findings into Plain English.” The Lancet 373 (9679): 1926–28. https://doi.org/10.1016/S0140-6736(09)60499-2.
QMNET - Quantitative Methods Network, dir. 2020. Missing Data Imputation with Low Rank Models. https://www.youtube.com/watch?v=OjXPskXO8No.
Rao, C. 2017. “Book Review: Multivariate Statistical Methods, A Primer.” Journal of Modern Applied Statistical Methods 16 (1). https://doi.org/10.22237/jmasm/1493599260.
Ratz, Arthur V. 2021. “Can QR Decomposition Be Actually Faster? Schwarz-Rutishauser Algorithm.” Medium. April 9, 2021. https://towardsdatascience.com/can-qr-decomposition-be-actually-faster-schwarz-rutishauser-algorithm-a32c0cde8b9b.
Reid, N. 2003. “Asymptotics and the Theory of Inference.” Ann. Statist. 31 (6). https://doi.org/10.1214/aos/1074290325.
Reid, Nancy. 2010. “Likelihood Inference: Likelihood Inference.” WIREs Comp Stat 2 (5): 517–25. https://doi.org/10.1002/wics.110.
Roche, Alexis. 2012. EM Algorithm and Variants: An Informal Tutorial.” http://arxiv.org/abs/1105.1476.
Rodrigues, Bruno. n.d. Reproducible Analytical Pipelines - Master’s of Data Science. Accessed October 30, 2022. https://rap4mads.eu/.
Rothman, Kenneth J. 2014. “Six Persistent Research Misconceptions.” J GEN INTERN MED 29 (7): 1060–64. https://doi.org/10.1007/s11606-013-2755-z.
Sabbe, Nick. 2011. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. May 22, 2011. https://stats.stackexchange.com/a/11110.
Sabine Hossenfelder, dir. 2021. How I Learned to Love Pseudoscience. https://www.youtube.com/watch?v=bWV0XIn-rvY.
Santillan, Carlos. 2018. “Improving R Perfomance by Installing Optimized BLAS/LAPACK Libraries.” July 25, 2018. https://csantill.github.io/RPerformanceWBLAS/.
Sawitzki, G. 2009. “Web Page for Computational Statistics: An Introduction to R.” StatLab Heidelberg. 2009. https://sintro.r-forge.r-project.org/.
Schervish, Mark J. 1996. “P Values: What They Are and What They Are Not.” The American Statistician 30 (3): 203–6.
Schmidt, Anthony. 2021. “A Zotero Workflow for R.” Anthony Schmidt. October 25, 2021. https://www.anthonyschmidt.co/post/2021-10-25-a-zotero-workflow-for-r/.
Sharp, Julia L. 2022. “Statistical Collaboration Training Videos.” 2022. https://www.youtube.com/playlist?list=PLCqpxiFnahaBFnIXIWmTE1eYpa2ov5VWL.
———. 2023. “Setting the Stage: Statistical Collaboration Training Videos.” 2023. https://sites.google.com/site/julialsharp/other-resources/statistical-collaboration-training-videos.
Sherrington, Malcolm. 2015. Mastering Julia. Packt Publishing Ltd.
Silk, Matthew. 2019. “Mixed Model Diagnostics.” 2019. https://dfzljdn9uc3pi.cloudfront.net/2020/9522/1/MixedModelDiagnostics.html.
Soch, Joram. 2019. “The Book of Statistical Proofs.” The Book of Statistical Proofs. 2019. https://statproofbook.github.io/.
Software, Econometrics and Free. 2022. “A Linux Live USB as a Statistical Programming Dev Environment | R-bloggers.” October 29, 2022. https://www.r-bloggers.com/2022/10/a-linux-live-usb-as-a-statistical-programming-dev-environment/.
“Stat 3701 Lecture Notes: R Generic Functions.” n.d. Accessed February 15, 2023. https://www.stat.umn.edu/geyer/3701/notes/generic.html.
Statistical Tools for High-Throughput Data Analysis. 2023. “A Complete Guide to 3D Visualization Device System in R - R Software and Data Visualization - Easy Guides - Wiki - STHDA.” 2023. http://www.sthda.com/english/wiki/a-complete-guide-to-3d-visualization-device-system-in-r-r-software-and-data-visualization.
StayLearning. 2015. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. December 19, 2015. https://stats.stackexchange.com/a/187561.
Stone, M. 1977. “A Unified Approach to Coordinate-Free Multivariate Analysis.” Ann Inst Stat Math 29 (1): 43–57. https://doi.org/10.1007/BF02532773.
strboul. 2019. “Answer to "Importing Common YAML in Rstudio/Knitr Document".” Stack Overflow. October 21, 2019. https://stackoverflow.com/a/58491399.
Team, Posit. 2022. “Posit.” Posit. September 14, 2022. https://www.posit.co/.
tillsten. 2013. “Updatable SVD Implementation in Python, C, or Fortran?” Forum post. Computational Science Stack Exchange. February 27, 2013. https://scicomp.stackexchange.com/q/2678.
UC3M, Coding Club. 2019. “Simple yet Elegant Object-Oriented Programming in R with S3.” Coding Club UC3M. May 28, 2019. https://codingclubuc3m.rbind.io/post/2019-05-28/.
Unknown. n.d. “Lme4 Convergence Warnings: Troubleshooting.” Accessed September 23, 2023. https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html.
user1436187. 2015. “Updating SVD Decomposition After Adding One New Row to the Matrix.” Forum post. Cross Validated. October 18, 2015. https://stats.stackexchange.com/q/177007.
user333. 2022. “How to Deal with Perfect Separation in Logistic Regression?” Forum post. Cross Validated. August 27, 2022. https://stats.stackexchange.com/q/11109.
user78229. 2015. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. November 26, 2015. https://stats.stackexchange.com/a/183763.
“Using oneMKL with R.” n.d. Intel. Accessed July 18, 2023. https://www.intel.com/content/www/us/en/developer/articles/technical/using-onemkl-with-r.html.
usεr11852. 2015. “Answer to "Updating SVD Decomposition After Adding One New Row to the Matrix".” Cross Validated. October 31, 2015. https://stats.stackexchange.com/a/179539.
Vehtari, Aki. n.d. TUTORIAL ON MODEL ASSESMENT, SELECTION AND INFERENCE AFTER SELECTION.”
Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on p -Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.
Wasserstein, Ronald L., Allen L. Schirm, and Nicole A. Lazar. 2019. “Moving to a World Beyond p \(<\) 0.05’.” The American Statistician 73 (March): 1–19. https://doi.org/10.1080/00031305.2019.1583913.
Wicherts, Jelte M., Coosje L. S. Veldkamp, Hilde E. M. Augusteijn, Marjan Bakker, Robbie C. M. Van Aert, and Marcel A. L. M. Van Assen. 2016. “Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking.” Front. Psychol. 7 (November). https://doi.org/10.3389/fpsyg.2016.01832.
Wickham, Hadley. 2019. Advanced R. 2nd ed. CRC Press. https://adv-r.hadley.nz/.
Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. 2017. “Good Enough Practices in Scientific Computing.” Edited by Francis Ouellette. PLoS Comput Biol 13 (6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510.
Xie, Yihui. (2011) 2022. “Knitr.” https://github.com/yihui/knitr/blob/36efc00013d423b8f098ff37ba524c9d29810fa0/inst/examples/knitr-spin.R.
Zhu, Xiaorui. 2021. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. July 25, 2021. https://stats.stackexchange.com/a/535860.

  1. R is to S as Linux is to Unix as GNU C is to C as GNU C++ is to C …. S, Unix, C and C++ were created at Bell Labs in the late 60s and the 70s. R, Linux, GNU C and GNU C++ are public license re-engineerings of the proprietary S, Unix, C and C++ respectively.↩︎