This version: January 19 2025 23:32

Doubt is not a pleasant condition, but certainty is absurd. – Voltaire

Where there is no uncertainty, there cannot be truth – Richard Feynman

To teach how to live without certainty, and yet without being paralysed by hesitation, is perhaps the chief thing that philosophy, in our age, can still do for those who study it. — Bertrand Russell

How harmful is smoking? Cigarette consumption and life expectancy in 189 countries in 2004. Correlation is not causation: this is an example of the ecological fallacy, which is itself an example of a deeper phenomenon, Simpson’s Paradox.

Announcements

  • 2025-01-12: Note that all quizzes are graded out of 20 marks, regardless of what eclass says.


Calendar

Classes meet on Mondays, Wednesdays and Fridays from 9:30 to 10:20 in SC 222.
Tutorial: On Zoom (click here), weekly: Tuesdays at 9am
Instructor: Georges Monette
Office hours: On Zoom (click here), weekly: Tuesdays at 10am
Email: Messages about course content or course organization should be posted publicly on Piazza so that everyone can benefit from the questions and discussions.

Day 1: Monday, January 6

  • Lecture slides and Quiz 1 questions

  • Course description

  • This evening, I will resend invitations to join Piazza to all students currently registered in the course.

    • I will use your York e-mail address. If you don’t read email at your York email address, please make sure that it’s forwarded to an email address that you do read regularly.
    • Please do not change your e-mail address on Piazza, because your York email address is used to identify you so that you get credit for your contributions. It makes my life much easier if I don’t have to cut and paste your grades manually.
  • Topic 1: AMSTAT News Survey of Master’s Graduates (2023 US Statistics Graduates surveyed in mid-2024)

  • Topic 2: The meaning of \(p\)-values

    Consider the simplest of statistical tests. Suppose you plan to sample 100 observations from a population that is known to have a normal distribution with variance equal to 1 but unknown non-negative mean \(\mu\).

    You plan to test \(H_0: \mu = 0\) versus the one-sided alternative \(H_A: \mu > 0\).

    Consider what happens if you use a 5% test, i.e. reject the null hypothesis if \(p < 0.05\).

    Suppose you reject \(H_0\). What does this tell you about \(H_0\)?

    What if the \(p\)-value is 0.049? Or what if the \(p\)-value is 0.0003?
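As a concrete starting point, here is a minimal Python sketch of the test described above (an illustration only; the course itself uses R). With \(n = 100\) and \(\sigma = 1\), the test statistic is \(z = \sqrt{n}\,\bar{x}/\sigma = 10\bar{x}\), which is standard normal under \(H_0\), and the one-sided \(p\)-value is \(P(Z \ge z)\). The function names are hypothetical, chosen for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def one_sided_p_value(xbar, n=100, sigma=1.0):
    """p-value for H0: mu = 0 vs HA: mu > 0, computed from the sample mean."""
    z = (n ** 0.5) * xbar / sigma  # test statistic, N(0, 1) under H0
    return 1 - STD_NORMAL.cdf(z)   # P(Z >= z) under H0


def critical_mean(alpha=0.05, n=100, sigma=1.0):
    """Smallest sample mean that leads to rejecting H0 at level alpha."""
    return STD_NORMAL.inv_cdf(1 - alpha) * sigma / n ** 0.5


if __name__ == "__main__":
    print(round(critical_mean(), 4))  # about 0.1645 for a 5% test
    print(one_sided_p_value(0.2), one_sided_p_value(0.35))
```

Under these assumptions, a 5% test rejects \(H_0\) whenever \(\bar{x}\) exceeds roughly 0.1645.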

Assignment 1 (individual): Joining Piazza

Due Thursday, January 9, 9 pm

  • Join Piazza using the invitation sent to your yorku.ca email address. The access code, if you need to use it, is blackwell.
  • Send a private message to the instructor confirming that you have joined Piazza. Use the folder assn1. If you have any private questions about the course, e.g. whether you have the required prerequisites, you can ask them in this message.

Day 2: Wednesday, January 8

Assignment 2 (individual): NHST and \(p\)-values

Due: Monday, January 13 at 11:59 pm. Submit your answers on Piazza, preferably as a pdf file generated with R Markdown, but any other format is ok. Use the folder assn2 on Piazza for any posts submitted for this assignment. Do as much of the assignment as you can. You will be graded on evidence of ‘honest effort’.

Discussion (The questions in this part are rhetorical, i.e. you don’t need to answer them)

The purpose of this assignment is to reawaken the skills you learned in MATH 2131 and MATH 3131, and to see whether some important statistical concepts really mean what most people who use them think they mean.

Consider the simplest of statistical tests. Suppose you plan to sample 100 observations from a population that is known to have a normal distribution with variance equal to 1.0 but unknown non-negative mean \(\mu\).

You plan to test \(H_0: \mu = 0\) versus one-sided alternative \(H_A: \mu > 0\).

Consider what happens if you use a 5% test, i.e. reject the null hypothesis if \(p < 0.05\).

Suppose you reject \(H_0\). What does this tell you about \(H_0\)?

We know that the \(p\)-value tells you something about the probability of an event related to the data given \(H_0\). So, although we know that the \(p\)-value isn’t the actual probability of \(H_0\) given the data, it’s natural to feel that they are somehow related. After all, why would we use a \(p\)-value unless it tells us something about \(H_0\)? We aren’t interested in the data. What we are really interested in is \(H_0\).

So it’s useful to actually explore the connection between \(p\)-values and the probability of \(H_0\). But we can’t get an actual probability for \(H_0\) without also hypothesizing a prior distribution on the hypotheses: we need a hypothetical probability for \(H_0\) and a hypothetical distribution over the possible values in \(H_A\).

In this exercise, you will consider the posterior probability of \(H_0\) given different possibilities for prior probabilities to get a sense of the strength of the evidence against \(H_0\) when you have a \(p\)-value of, for example, 0.049, or of 0.001.
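One way to make this connection concrete, assuming (as in Question 1 below) a point alternative \(\mu = \mu_A\) and a prior probability \(\pi_0\) for \(H_0\): if we condition only on the event \(p < \alpha\), Bayes’ rule gives \(P(H_0 \mid p < \alpha) = \dfrac{\pi_0\,\alpha}{\pi_0\,\alpha + (1-\pi_0)\,\mathrm{power}(\mu_A)}\), where \(\mathrm{power}(\mu_A) = P(p < \alpha \mid \mu = \mu_A)\). The Python sketch below is illustrative only; the function names are hypothetical and this is not an official solution.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def power(mu_a, alpha=0.05, n=100, sigma=1.0):
    """P(p < alpha) when the true mean is mu_a, for the one-sided z-test."""
    z_crit = Z.inv_cdf(1 - alpha)
    shift = (n ** 0.5) * mu_a / sigma  # mean of the z statistic when mu = mu_a
    return 1 - Z.cdf(z_crit - shift)


def posterior_h0_given_reject(prior_h0, mu_a, alpha=0.05, n=100, sigma=1.0):
    """P(H0 | p < alpha) by Bayes' rule with the point alternative mu = mu_a."""
    weight_h0 = prior_h0 * alpha                           # P(H0) * P(reject | H0)
    weight_ha = (1 - prior_h0) * power(mu_a, alpha, n, sigma)  # P(HA) * P(reject | HA)
    return weight_h0 / (weight_h0 + weight_ha)


if __name__ == "__main__":
    print(posterior_h0_given_reject(0.5, 0.2))
```

Note that this treats every \(p\)-value below \(\alpha\) alike; it ignores how far below \(\alpha\) the observed \(p\)-value is.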

Assignment:

  1. In the scenario above, suppose that, before getting the data, \(H_0\) and \(H_A\) are considered equally probable, and that \(H_A\) is the point alternative \(\mu = \mu_A = 0.2\).
    1. What is the posterior probability of \(H_0\) given that \(p < 0.05\)? Note that this ignores the actual value of the \(p\)-value, as long as it’s less than 0.05. This is the posterior probability over all possible values of the \(p\)-value that are less than 0.05.
    2. But in an analysis, we actually know the \(p\)-value. What is the posterior probability of \(H_0\) given that \(p = 0.049\)? Or given that \(p = 0.001\)?
    3. Discuss the implications of these results.
  2. (Somewhat challenging) Now, consider different possibilities for \(\mu_A\). For what value of \(\mu_A\) would we achieve the strongest evidence against \(H_0\) if \(p=0.049\) (i.e. minimum posterior probability), and what is that posterior probability? What is the answer if \(p = 0.001\)?
  3. Write a program or function (your choice of programming language) that calculates the posterior probability of \(H_0\), in the current scenario, given the following inputs:
    1. the observed \(p\)-value,
    2. the prior probability of \(H_0\), and
    3. the hypothetical value of the alternative, \(\mu_A\).
  4. In one graph, plot the posterior probability of \(H_0\) as a function of \(\mu_A\) given \(p\)-values of 0.049 and 0.001, assuming a prior probability of 0.5 for \(H_0\). Repeat assuming a prior probability of 0.05 for \(H_0\).
  5. (More challenging) Suppose you would like to consider what \(p\)-value you would need to ‘legally disprove’ \(H_0\) in the sense of reversing a large prior probability of \(H_0\) (e.g. a presumption of innocence) into a small posterior probability of \(H_0\) (e.g. ‘proof’ beyond a reasonable doubt). Suppose that ‘presumption of innocence’ is taken to be a prior probability of 0.95 and that proof beyond a reasonable doubt corresponds to a posterior probability less than 0.05. What \(p\)-value would achieve this?[1]
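A minimal sketch of one possible function for Question 3, under the same assumptions (point alternative \(\mu = \mu_A\), known \(\sigma = 1\), \(n = 100\)). Because the \(p\)-value is a one-to-one function of \(z = \Phi^{-1}(1-p)\), conditioning on the observed \(p\) is the same as conditioning on \(z\), so the posterior odds of \(H_0\) are the prior odds times \(\phi(z)/\phi(z - \sqrt{n}\,\mu_A/\sigma)\). The names are hypothetical, and this is one approach rather than the expected answer.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal


def posterior_h0(p_value, prior_h0, mu_a, n=100, sigma=1.0):
    """P(H0 | observed one-sided p-value), H0: mu = 0 vs point HA: mu = mu_a.

    Conditioning on the exact p-value is equivalent to conditioning on the
    z statistic z = Phi^{-1}(1 - p); the posterior is computed from the
    normal densities of z under H0 and under HA.
    """
    z = Z.inv_cdf(1 - p_value)         # z statistic that yields this p-value
    shift = (n ** 0.5) * mu_a / sigma  # mean of z when mu = mu_a
    like_h0 = Z.pdf(z)                 # density of z under H0
    like_ha = Z.pdf(z - shift)         # density of z under HA
    weight_h0 = prior_h0 * like_h0
    return weight_h0 / (weight_h0 + (1 - prior_h0) * like_ha)


if __name__ == "__main__":
    # Illustrative grid of alternatives; a finer grid could be used for plotting
    for mu_a in [0.1, 0.2, 0.3, 0.4]:
        print(mu_a,
              round(posterior_h0(0.049, 0.5, mu_a), 3),
              round(posterior_h0(0.001, 0.5, mu_a), 3))
```

Evaluating `posterior_h0` over a fine grid of \(\mu_A\) values is one way to explore Questions 2 and 4.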

As you work on this assignment, refer back occasionally to the questions in Quiz 1. How is your thinking about these questions evolving? If the same questions were to appear in Quiz 3, how would you answer them?

Assignment 3 (individual): Setting things up

Due Sunday, January 12, 9 pm

  • Summary:
    1. Install (or update) R and RStudio
    2. Get a free Github account
    3. Install git on your computer
    4. Post publicly on Piazza if you run into problems. Help others if you can. Before the deadline on Sunday, post at least one public message commenting on your experiences installing software. Use the folder ‘assn3’.
  • 1. Install R and RStudio following these instructions. If you have already installed R and RStudio, update them to the latest versions.
  • 2. Get a free Github account: If you don’t have one, first give some thought to choosing a username. Here’s an excellent source of advice from Jenny Bryan.
    • CAUTION: Avoid installing ‘Github for Windows’ from the Github site. It is not the same as ‘Git for Windows’.
  • 3. Install git on your computer using the instructions on Jenny Bryan’s webpage.
    • If you are curious about how git is used, have a look at this tutorial!
    • As a final step: In the RStudio menu, click on Tools | Global Options ... | Terminal. Then, in the Shell section, click on the box to the right of ‘New terminals open with:’
      • On a PC, select Git Bash
      • On a Mac, select Bash
    • You don’t need to do anything else at this time. We will see how to set up SSH keys to connect to Github through the RStudio terminal in a few lectures.
  • 4. Post questions on Piazza if you run into problems and, if everything goes well, post that on Piazza too. Use the folder assn3.

Day 3: Friday, January 10

Announcements:

Day 4: Monday, January 13

Quiz on Wednesday:

Sample question:

Given the following table of conditional mean survival rates in percent (Y) by gender (G: F or M) for two treatments (X: A or B) for a disease D:

          X = A    X = B
G = F      90.0     70.0
G = M      40.0     30.0

The following are the numbers of individuals in each group:

          X = A    X = B
G = F       100      300
G = M       400      100

  1. Draw the trapezoid of mean survival rates.
  2. Find the marginal effect of X (comparing treatment B to treatment A)
  3. Find the conditional (specific) effects of X
  4. Find the marginal effect of G.
  5. Draw the lines corresponding to the Paik-Agresti diagram and the Liu-Meng diagram.
  6. Would the conditional effects or the marginal effect be more appropriate as a measure of the effectiveness of this treatment, assuming the Ms and Fs were assigned randomly (albeit not with equal probability) to the treatments?
  7. Suppose variable G stands for ‘Gastric Complications’ that can be caused by the treatment (F is few and M is many). Again, subjects (a random mix of men and women) were randomly assigned to the two treatments. Would the conditional effects or the marginal effect be more appropriate as a measure of the effectiveness of this treatment?
  8. Discuss your answers to 6 and 7 briefly.
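The arithmetic for parts 2 to 4 can be sketched as follows, assuming that a marginal mean is the average of the conditional means weighted by the group sizes in the second table, and that effects are differences (B minus A for X; M minus F for G, a direction the question leaves open). This is an illustrative Python sketch; the dictionary layout and function name are hypothetical.

```python
# Conditional mean survival rates (percent) and group sizes from the tables above
rates = {("F", "A"): 90.0, ("F", "B"): 70.0,
         ("M", "A"): 40.0, ("M", "B"): 30.0}
counts = {("F", "A"): 100, ("F", "B"): 300,
          ("M", "A"): 400, ("M", "B"): 100}


def marginal_mean(level, position):
    """Count-weighted mean survival, averaging over the other variable.

    position = 1: `level` is a treatment (A or B); average over G.
    position = 0: `level` is a gender (F or M); average over X.
    """
    cells = [k for k in rates if k[position] == level]
    total = sum(counts[k] for k in cells)
    return sum(rates[k] * counts[k] for k in cells) / total


# Conditional (specific) effects of X, B minus A, within each level of G
conditional_effects = {g: rates[(g, "B")] - rates[(g, "A")] for g in ("F", "M")}
# Marginal effect of X (B minus A), ignoring G
marginal_effect_x = marginal_mean("B", 1) - marginal_mean("A", 1)
# Marginal effect of G (M minus F), ignoring X
marginal_effect_g = marginal_mean("M", 0) - marginal_mean("F", 0)

print(conditional_effects)  # {'F': -20.0, 'M': -10.0}
print(marginal_effect_x)    # 10.0
print(marginal_effect_g)    # -37.0
```

With these numbers the conditional effects of X are negative while the marginal effect is positive, the reversal characteristic of Simpson’s Paradox.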

Assignment 4 (class): Setting things up

Due Sunday, January 19, 9 pm

Work your way through Regression Review: Regression in R / R script and post notes/questions/etc. using the folder assn4.

  • Formulate questions and post them.

  • Find errors (e.g. code that no longer works because things change) and flag them.

  • Raise topics for discussion.

  • Reply to other students’ postings.

  • Topic 1: How to estimate everything

  • Topic 2: Some real data on Smoking

Day 5: Wednesday, January 15

Announcement: Tutorial on Zoom (click here), every Tuesday from 1:30 to 2:30 pm.

Quiz Today:

Day 6: Friday, January 17

Concepts and Theory

Using R:

Learning R:

Why R? What about SAS, SPSS, Python, among others?

SAS is a very high-quality, intensively engineered environment for statistical analysis. It is widely used by large corporations. New procedures in SAS are developed and thoroughly tested by a team of 1,000 or more SAS engineers before being released. It currently has more than 300 procedures.

R is an environment for statistical programming and development that has accumulated many somewhat inconsistent layers developed over time by people of varying abilities, many of whom work largely independently of each other. There is no centralized quality testing except to check whether code and documentation run before a new package is added to R’s main repository, CRAN. When this page was last updated, CRAN had 21,911 packages.

In addition, a large number of packages under development are available through other repositories, such as Github.

The development of SAS began in 1966 and that of R (in the form of its predecessor, S, at Bell Labs) in 1976.

The ‘design’ of R (using ‘R’ to refer to both R and S) owes a lot to the design of Unix. The idea is to create a toolbox of simple tools that you link together yourself to perform an analysis. Unix commands (Unix now lives on mainly as Linux[2]) were simple tools linked together by pipes, so that the output of one command became the input of the next. To do anything, you need to put the commands together yourself.

The same is true of R. It’s extremely flexible but at the cost of requiring you to know what you want to do and to be able to use its many tools in combination with each other to achieve your goal. Many decisions in R’s design were intended to make it easy to use interactively. Often the result is a language that is very quirky for programming.

SAS, in contrast, requires you to select options to run large procedures that purport to do the entire job.

This is an old joke: If someone publishes a journal article about a new statistical method, it might be added to SAS in 5 to 10 years. It won’t be added to SPSS until 5 years after there’s a textbook written about it, maybe another 10 to 15 years after its appearance in SAS.

It was added to R two years ago because the new method was developed as a package in R long before being published.

So why become a statistician? So you can have the breadth and depth of understanding that someone needs to apply the latest statistical ideas with the intelligence and discernment to use them effectively.

So expect to have a symbiotic relationship with R. You need R to have access to the tools that implement the latest ideas in statistics. R needs you because it takes people like you to use R effectively.

The role of R in this course is to help us

  • have access to tools that expand our ability to explore and analyze data,
  • learn how to develop and implement new statistical methods, i.e. learn how to build new tools, and
  • deepen our understanding of the use of statistics for scientific discovery as well as for business applications.

It’s very challenging to find a good way to ‘learn R’. It depends on where you are and where you want to go. There is now a plethora of online courses. See the blog post: The 5 Most Effective Ways to Learn R

In my opinion, ultimately, the best way is to

  • play your way through the ‘official’ manuals on CRAN, starting with ‘An Introduction to R’ along with ‘R Data Import/Export’. Note, however, that these materials were developed before the current mounting concern with reproducible research, and some of their advice should now be regarded as deprecated, e.g. using ‘attach’ and ‘detach’ with data frames,
  • read the CRAN task views in areas that interest you,
  • have a look at the half million questions tagged ‘r’ on Stack Overflow, and
  • use R Markdown documents (like the sample script you ran when you installed R) at every opportunity to work on assignments, projects, etc.

Using R is like playing the piano. You can read and learn all the theory you want, but ultimately you learn by playing.

Copy the following scripts as files in RStudio:

Play with them line by line.

Post questions arising from these scripts to the ‘question’ folder on Piazza. We will take up some questions in class and others in tutorials scheduled to deal with questions on R.

Day 7: Monday, January 20

Assignment 5 (teams)

Exercises:

  • From 4939 questions
    • 5.1, 5.2, 5.3, 5.4, 5.5
    • 5.6.23, 5.6.24, 5.6.25, 5.6.26, 5.6.27
    • 6.1, 6.2, 6.3, 6.4, 6.5
    • 7.4, 7.5, 7.6, 7.7, 7.8
    • 8.1, 8.2, 8.3, 8.4, 8.5
    • 8.6, 8.7, 8.8, 8.9, 8.10
    • 8.18.a, 8.18.b, 8.18.c, 8.18.d, 8.18.e
    • 8.36.a, 8.36.b, 8.36.c, 8.36.d, 8.36.e
    • 8.51.a, 8.51.b, 8.51.c (write R functions that would work on matrices of any size), 8.61.a, 8.62.a
    • 12.1, 12.3, 12.5, 12.7, 12.9
  • Do the exercises above. There are 5 members in each team. Randomly assign the numbers 1 to 5 to members of your team (without replacement). Member number 1 does the first question in each row, Member number 2 does the second question in each row, etc.
  • Deadlines: See the course description for the meaning of these deadlines.
    1. Sunday, January 26 at noon
    2. Tuesday, January 28 at noon
    3. Thursday, January 30 at 9 pm
  • IMPORTANT:
    • Upload the answer to each question in a single Piazza post (post it as a Piazza ‘Note’, not as a Piazza ‘Question’) with the title: “A5 5.1” for the first question, etc. (That’s “A5” for assignment 5 and “5.1” for the question). You can add more text after “A5 5.1”, e.g. “A5 5.1
    • You can answer the question directly in the posting or by uploading a pdf file and the R script or Rmarkdown script that generated it.
    • When providing help or comments, do so as “followup discussion”.
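For teams that want a reproducible record of the random assignment, here is a small Python sketch: it samples the numbers 1 to 5 without replacement and lists the Piazza post titles (in the “A5 5.1” format) that each member is responsible for. The member names and the seed are hypothetical placeholders, and the function names are illustrative.

```python
import random

# Exercise rows from the list above: question k in each row goes to member k
ROWS = [
    ["5.1", "5.2", "5.3", "5.4", "5.5"],
    ["5.6.23", "5.6.24", "5.6.25", "5.6.26", "5.6.27"],
    ["6.1", "6.2", "6.3", "6.4", "6.5"],
    ["7.4", "7.5", "7.6", "7.7", "7.8"],
    ["8.1", "8.2", "8.3", "8.4", "8.5"],
    ["8.6", "8.7", "8.8", "8.9", "8.10"],
    ["8.18.a", "8.18.b", "8.18.c", "8.18.d", "8.18.e"],
    ["8.36.a", "8.36.b", "8.36.c", "8.36.d", "8.36.e"],
    ["8.51.a", "8.51.b", "8.51.c", "8.61.a", "8.62.a"],
    ["12.1", "12.3", "12.5", "12.7", "12.9"],
]


def assign_numbers(members, seed=None):
    """Give each member a distinct number 1..len(members), sampled without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    numbers = rng.sample(range(1, len(members) + 1), len(members))
    return dict(zip(members, numbers))


def post_titles(number):
    """Piazza post titles ("A5 <question>") for the member with this number."""
    return [f"A5 {row[number - 1]}" for row in ROWS]


if __name__ == "__main__":
    team = ["Ana", "Ben", "Chen", "Dev", "Eli"]  # hypothetical team members
    for name, k in assign_numbers(team, seed=2025).items():
        print(name, k, post_titles(k))
```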

Continuation

Concepts and Theory

Using R:

Topic 2 (continuing): Regression Review

Bibliography

“A Data.table and Dplyr Tour · Home.” n.d. Accessed December 20, 2024. https://atrebas.github.io/post/2019-03-03-datatable-dplyr/.
Agresti, Alan. 2007. An Introduction to Categorical Data Analysis. Second. John Wiley.
Anscombe, Francis J. 1973. “Graphs in Statistical Analysis.” The American Statistician 27 (1): 17–21.
Aschwanden, Christie. 2015. “Science Isn’t Broken: It’s Just a Hell of a Lot Harder Than We Give It Credit For.” FiveThirtyEight.com, August. https://fivethirtyeight.com/features/science-isnt-broken/.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1). https://doi.org/10.18637/jss.v067.i01.
Bergland, Christopher. 2019. “Rethinking P-Values: Is "Statistical Significance" Useless?” Psychology Today. March 22, 2019. https://www.psychologytoday.com/blog/the-athletes-way/201903/rethinking-p-values-is-statistical-significance-useless.
Bouman, Judith A., Anthony Hauser, Simon L. Grimm, Martin Wohlfender, Samir Bhatt, Elizaveta Semenova, Andrew Gelman, Christian L. Althaus, and Julien Riou. 2024. “Bayesian Workflow for Time-Varying Transmission in Stratified Compartmental Infectious Disease Transmission Models.” Edited by Samuel V. Scarpino. PLOS Computational Biology 20 (4): e1011575. https://doi.org/10.1371/journal.pcbi.1011575.
Chung, Kai Lai. 1974. A Course in Probability Theory. 3rd ed.
“Datasets - UCI Machine Learning Repository.” n.d. Accessed December 17, 2023. https://archive.ics.uci.edu/datasets?orderBy=DateDonated&sort=desc.
Dmitry. 2024. “DHARMa Residual Pattern.” Forum post. Cross Validated. October 15, 2024. https://stats.stackexchange.com/q/655401.
“Essay3.” n.d. Accessed January 14, 2025. https://jwilson.coe.uga.edu/emt668/emat6680.2000/umberger/EMAT6690smu/Essay3smu/Essay3smu.html.
Evans, Michael J, and Jeffrey S Rosenthal. 2009. Probability and Statistics: The Science of Uncertainty. Second. Macmillan. http://www.utstat.toronto.edu/mikevans/jeffrosenthal/book.pdf.
Fox, John. 2016. Applied Regression Analysis and Generalized Linear Models. 3rd ed. Sage Publications.
Fox, John, and Jangman Hong. 2009. “Effect Displays in R for Multinomial and Proportional-Odds Logit Models: Extensions to the effects Package.” Journal of Statistical Software 32 (1): 1–24. http://www.jstatsoft.org/v32/i01/.
Fox, John, and Sanford Weisberg. 2019. An R Companion to Applied Regression. 3rd ed. Sage Publications.
Friendly, Michael. 2017. “An Introduction to R Graphics.” SCS Short Course, March. http://www.datavis.ca/courses/RGraphics/.
Friendly, Michael, Georges Monette, John Fox, et al. 2013. “Elliptical Insights: Understanding Statistical Methods Through Elliptical Geometry.” Statistical Science 28 (1): 1–39.
Gigerenzer, Gerd. 2004. “Mindless Statistics.” The Journal of Socio-Economics 33 (5): 587–606. https://doi.org/10.1016/j.socec.2004.09.033.
Giles, Philip. 2001. “An Overview of the Survey of Labour and Income Dynamics (SLID).” Canadian Studies in Population 28 (2): 363–75. http://www.canpopsoc.ca/CanPopSoc/assets/File/publications/journal/CSPv28n2p363.pdf.
Gillespie, Colin, and Robin Lovelace. n.d. Efficient R Programming. Accessed December 26, 2019. https://csgillespie.github.io/efficientR/.
“glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling.” 2024. ResearchGate, October. https://doi.org/10.32614/RJ-2017-066.
Gunter, Bert, and Christopher Tong. 2017. “What Are the Odds!? The ‘Airport Fallacy’ and Statistical Inference.” Significance 14 (4): 38–41. https://doi.org/10.1111/j.1740-9713.2017.01057.x.
Hernán, Miguel A, and James M Robins. 2020. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/.
Ioannidis, John P A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): e124. https://journals.plos.org/plosmedicine/article/file?id=10.1371%2Fjournal.pmed.0020124&type=printable.
Ioannidis, John P. A. 2019. “What Have We (Not) Learnt from Millions of Scientific Papers with P Values?” The American Statistician 73 (March): 20–25. https://doi.org/10.1080/00031305.2018.1447512.
J, Alex. 2024. “Answer to "DHARMa Residual Pattern".” Cross Validated. October 10, 2024. https://stats.stackexchange.com/a/655624.
Jansen, Jeroen P., Christopher H. Schmid, and Georgia Salanti. 2012. “Directed Acyclic Graphs Can Help Understand Bias in Indirect and Mixed Treatment Comparisons.” Journal of Clinical Epidemiology 65 (7): 798–807. https://doi.org/10.1016/j.jclinepi.2012.01.002.
Johnson, Paul, and John Gruber. n.d. “R Markdown Basics,” 19.
Kahneman, Daniel. 2011. Thinking, Fast and Slow.
Kass, Robert E., Brian S. Caffo, Marie Davidian, Xiao-Li Meng, Bin Yu, and Nancy Reid. 2016. “Ten Simple Rules for Effective Statistical Practice.” PLOS Computational Biology 12 (6): e1004961. https://doi.org/10.1371/journal.pcbi.1004961.
Liu, Keli, and Xiao-Li Meng. n.d. “A Fruitful Resolution to Simpson’s Paradox via Multi-Resolution Inference.”
McNamara, Amelia, and Nicholas J Horton. 2017. “Wrangling Categorical Data in R.” PeerJ Preprints 5 (August): e3163v2. https://doi.org/10.7287/peerj.preprints.3163v2.
Meng, Xiao-Li. 1994. “Posterior Predictive p-Values.” The Annals of Statistics 22 (3): 1142–60. https://www.jstor.org/stable/2242219.
Meyer, Cosima. 2022. “Understanding the Basics of Package Writing in R | R-bloggers.” October 16, 2022. https://www.r-bloggers.com/2022/10/understanding-the-basics-of-package-writing-in-r/.
Monette, Georges, John Fox, Michael Friendly, and Heather Krause. 2018. “Spida2: Collection of Tools Developed for the Summer Programme in Data Analysis 2000-2012.” https://github.com/gmonette/spida2.
Murnane, Richard J, and John B Willett. 2010. Methods Matter: Improving Causal Inference in Educational and Social Science Research. Oxford University Press.
Nguyen, Mike. n.d. Chapter 9 Nonlinear and Generalized Linear Mixed Models | A Guide on Data Analysis. Accessed December 21, 2024. https://bookdown.org/mike/data_analysis/nonlinear-and-generalized-linear-mixed-models.html.
Oliver, John, dir. 2016. Last Week Tonight with John Oliver: Scientific Studies. HBO. https://www.youtube.com/watch?v=0Rnq1NpHdmw.
Paik, Minja. 1985. “A Graphic Representation of a Three-Way Contingency Table: Simpson’s Paradox and Correlation.” The American Statistician 39 (1): 53. https://doi.org/10.2307/2683907.
Pearl, Judea, and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect. Basic Books.
Presnell, Brett. n.d. “An Introduction to Categorical Data Analysis Using R,” 38.
Schervish, Mark J. 1996. “P Values: What They Are and What They Are Not.” The American Statistician 50 (3): 203–6.
Sellke, Thomas, M. J. Bayarri, and James O. Berger. 2001. “Calibration of p Values for Testing Precise Null Hypotheses.” The American Statistician 55 (1): 62–71. https://doi.org/10.1198/000313001300339950.
Snijders, Tom A. B., and Roel J. Bosker. 2012. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, Second Edition. Sage.
Staff, CANSSI. 2022. “Graduate Program Listings - CANSSI.” September 27, 2022. https://canssi.ca/graduate-program-listings/.
“The Data.table R Package Cheat Sheet.” n.d. Accessed December 20, 2024. https://www.datacamp.com/cheat-sheet/the-datatable-r-package-cheat-sheet.
“Thinking, Fast and Slow.” 2024. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Thinking,_Fast_and_Slow&oldid=1263058396.
Thompson, Laura A. 2009. “R (and S-PLUS) Manual to Accompany Agresti’s Categorical Data Analysis (2002) 2nd Edition.” http://www.stat.ufl.edu/~aa/cda/Thompson_manual.pdf.
Vélez, D., L. R. Pericchi, and M. E. Pérez. 2022. “From \(p\)-Values to Posterior Probabilities of Hypothesis.” February 14, 2022. https://doi.org/10.48550/arXiv.2202.06864.
Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on \(p\)-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.
Wasserstein, Ronald L., Allen L. Schirm, and Nicole A. Lazar. 2019. “Moving to a World Beyond ‘\(p < 0.05\)’.” The American Statistician 73 (March): 1–19. https://doi.org/10.1080/00031305.2019.1583913.
Wickham, Hadley. 2014. Advanced R. CRC Press. http://adv-r.had.co.nz/.
———. 2015. R Packages. CRC Press. http://r-pkgs.had.co.nz/.
Yee, Thomas W. 2022. “On the Hauck-Donner Effect in Wald Tests: Detection, Tipping Points, and Parameter Space Characterization.” Journal of the American Statistical Association 117 (540): 1763–74. https://doi.org/10.1080/01621459.2021.1886936.

  1. Note that legal definitions of the “presumption of innocence” and of “proof beyond a reasonable doubt” are never formulated, to my knowledge, in such specific probabilistic terms. Nevertheless, this would seem to me to be a minimal interpretation of these concepts. For some purposes, in the United States, a \(p\)-value less than 0.05 is considered sufficient for some forms of evidence.↩︎

  2. R is to S as Linux is to Unix as GNU C is to C as GNU C++ is to C …. S, Unix, C and C++ were created at Bell Labs in the late 60s and the 70s. R, Linux, GNU C and GNU C++ are public license re-engineerings of the proprietary S, Unix, C and C++ respectively.↩︎