(Updated: September 11 2024 00:11)

Doubt is not a pleasant condition, but certainty is absurd. — Voltaire

1 Goals of MATH 6630

This course aims to accomplish two goals. The first is to seek a unification of statistical concepts learned as undergraduates with an understanding of their interpretation in statistical applications. As undergraduates, we learn to drive the statistical machine. Now we learn where it can take us and what routes get us there.

We already know how to run a regression, even a complex one with interactions and non-linear terms. But what do the coefficients mean? How should the scientific or business question we are seeking to answer influence our choice of models? In particular, how does the critical distinction between causal and predictive inference affect the entire approach to statistical modelling and the interpretation of our results?

The second goal of the course is to understand what’s happening under the hood. You can pick up thousands of statistical methods and techniques by browsing through the packages of modern software like R, Python and Julia. If you’re using cutting-edge tools, you are probably using tools built by a small group of researchers, tools that haven’t been thoroughly tested or vetted. To understand whether they really do what they claim to do, you need to be able to look under the hood. When they break down (e.g. fail to converge), as they often do, you need to know how to figure out what’s going on and how to fix it.

So the primary goal of this course is not to learn a large disparate collection of statistical tools. Instead, we want to learn how to learn new statistical methods critically, something you will be doing your entire careers.

To achieve this, we will learn the major computational approaches that are used in statistical methods. This will allow you to critically understand the methods you use, to improve them, and to build your own.

I assume you already know how to drive. We will learn something about where to take the car and how to fix it when it breaks down.

List of topics:

Basics of R.
Review of linear models, foundations of inference and issues in applied statistics:
- Current controversies: reproducibility and p-values.
- The interpretation of multiple regression: modelling for causal vs predictive interpretations of coefficients.
- Linear algebra and geometry of regression: The role of Simpson’s Paradox and Paik-Agresti diagrams in causal interpretation. Correlation, partial correlation, Schur complements, variance, quadratic forms, data and confidence ellipses.
- Overview of causal graphs and the backdoor criterion.
- The added variable plot and the Frisch-Waugh-Lovell Theorem.
- Review of likelihood theory, foundation of inference, Bayesian inference.
Statistical limit theory.
Computational Statistics:
- Optimization
  - Univariate: Newton’s method, Fisher scoring, secant method.
  - Multivariate: Newton and quasi-Newton methods, Nelder-Mead, Gauss-Seidel.
  - Combinatorial methods.
  - The EM algorithm and EM optimization
- Numerical integration
- Simulation and Monte Carlo integration
- Markov Chain Monte Carlo
  - Metropolis-Hastings
  - Gibbs
  - Hamiltonian Monte Carlo
- Bootstrapping, Cross-validation, Jackknife.

Some habits we’ll try to form:

Writing reproducible code: Performing analyses in R Markdown documents which you can rerun to get the same results!
Working on group projects by collaboratively sharing and editing documents using Git and GitHub, Google Docs, etc.
Building a toolkit collaboratively with Git and GitHub.
Constructive collaboration with mutual positive reinforcement.

Some useful ancillary skills:

Programming techniques:
- Using R to create wrappers for programs in C and C++.
Using regular expressions.
Object-oriented programming
…

2 Course work and grades

Quizzes: 10% Bi-weekly quizzes on Thursdays starting Thursday, September 19.
- The lowest grade is dropped.
- If you can’t attend a quiz for medical or other reasons beyond your control, the weight of the quiz will be transferred to the final exam.
- The dates of the quizzes are:
  - Quiz 1: Sep 19
  - Quiz 2: Oct 3
  - Quiz 3: Oct 31
  - Quiz 4: Nov 14
  - Quiz 5: Nov 28
Mid-term test: 15%
- In-class closed-book midterm on Tuesday, October 22.
- If, for any reason, the course is meeting online, the midterm will consist of 20-minute individual oral tests held over Zoom at individually scheduled times during the weekend of October 19-20, with open mic and camera. You can use your book, notes and the web, but no human help during the test.
Final exam: 25% Written closed-book 2-hour exam during the regular exam period. Note that for those taking the MATH 6630 exam to satisfy the comprehensive requirement, the exam includes additional questions and lasts 3 hours.
Project: 20%
You will work on a team project in which you develop a new method, or modify, extend or improve an existing method, and prepare a report including analyses or sample analyses, graphical displays and a careful discussion of your work. The project has seven components:
1. Keep a post for your team with the title ‘Diary’ in which each person keeps track of their work and contributions to the project.
2. A description of your plans, including the method you plan to use or develop and your general approach.
3. An interim report on your progress submitted in early November (deadline November 12) which your team will discuss with the instructor to get feedback.
4. A ‘.R’ or ‘.Rmd’ script using Markdown that produces a detailed analysis and presentation of your work, including diagnostics, etc. This output can be quite detailed.
5. A ‘.R’ script using Markdown that produces an attractive and readable report with your main findings prepared in a way that would be suitable for a publication. You need to include all relevant references, data sources, etc. Aim for a maximum of 30 pages.
6. Slides for a 10-minute presentation discussed below. The slides should be prepared with R Markdown using the ioslides format or another appropriate format. You will collaborate using R, RStudio, R Markdown, Git and GitHub.
7. You will prepare a brief summary of your project for a 10-minute presentation in late November. The 10-minute limit is strict. Be aware that it takes careful preparation and rehearsing to give a good presentation in such a short time. You must rehearse as a group ahead of time. The presentation will be followed by a 5-minute question and discussion period.
  - The grade is based on the overall quality of the project (5%), on your personal contribution to it (10%), and on your understanding of the issues and concepts in the project as shown in the final presentation and in project meetings with the instructor (5%).
Assignments: 20%
- Combination of individual and team assignments. Assigned approximately weekly. Most are done on Piazza. Some will involve contributions to GitHub R repositories.
  - Some assignments may have a higher weight than others.
  - All team members should feel responsible for helping each other to prepare and understand all solutions.
  - For team assignments, different questions will usually be assigned to different members of the team.
  - Team assignments are done in three steps. Usually, for an assignment given on Thursday:
    - Step 1: to be completed by deadline #1, usually the Friday nine days later at noon:
      - The team member responsible for a question posts a tentative solution on Piazza before deadline #1.
      - It must have a title of the form specified for the assignment.
      - The solution must start by repeating the question so someone looking at the solution can tell what question it solves.
      - For math, you can use the LaTeX editor in Piazza. You can also make sketches on paper, photograph them and upload the photograph to Piazza. Use Markdown in R as much as possible.
      - When you first submit the post, make it private to your team and use the folder assn X, where X is the number of the assignment.
      - Each post remains private to your team until after deadline #3.
      - You get full marks for effort in making an honest attempt; it does not have to be completely correct.
    - Step 2: to be completed by all teammates by deadline #2, usually the following Sunday at noon:
      - Provide feedback on the solutions posted by your teammates: suggestions for improvements, improving coding in R, pointing out inconsistencies or errors, broadening the answer to cover a broader range of cases, etc.
    - Step 3: to be completed by deadline #3, usually the following Monday at noon:
      - The team member responsible for a question reviews the suggestions made by teammates and incorporates them into the answer before deadline #3. Only after deadline #3 and before the next class, make the solutions public to the class.
      - I will select some solutions as interesting sample solutions and add them to the star folder. Being added to the star folder does not necessarily imply that a solution is correct, nor does it mean that it’s the best solution. It just means that I found some aspect of it interesting and illustrative of the issues presented in the question. Conversely, not getting a star does not mean that you don’t have an excellent solution. Sometimes you can learn as much or more from a solution with ‘errors’ than from a perfect solution.
Class and Piazza contributions: 5%
- Contribute actively in class and post on average once a week on Piazza (posting 18 items the last week doesn’t count!):
  - post or edit questions and provide answers about course material
  - post comments and/or links to something on the web that is interesting and relevant to statistics and add a summary and critique of the content and relevance.
Weekly feedback every Friday evening and quiz questions: 5%
Every Friday starting Friday, September 6, create a post that is private to the Instructor (it may be made public later during the weekend or you can make it public yourself) with information on each of the following:
- What idea attracted your curiosity the most during the week?
- What questions are you left with?
- What was hardest to understand?
- A quiz question based on the material of the week.
If you miss or are late for a component of the course for a medical, compassionate or technical reason beyond your control, the weight of that portion of the requirements for the course will be transferred to the final examination.

3 Prerequisites

The prerequisite for taking this course is to have been admitted to a graduate program in Statistics in the Department of Mathematics and Statistics. If you are enrolled in a different program, you need to get the permission of the instructor to take this course.

This entails a few assumptions which may not be correct for every member of the class:

An advanced course in mathematical statistics, such as MATH 3131 at York.
Experience with linear models and generalized linear models.
A basic working knowledge of the R programming language. If you are not very familiar with R, you are probably quite familiar with a language like Matlab, Python or even, perhaps, SAS, and you will have little difficulty picking up R. See me if this is the case and we can discuss ways of learning R.

4 Textbook

Geof H. Givens and Jennifer A. Hoeting. (2013) Computational Statistics 2nd ed., Wiley
- Web page: Computational Statistics, by G. H. Givens and J. A. Hoeting
  - includes datasets and code
  - Some copies are available at the York Bookstore.

5 References

Michael Evans and Jeffrey Rosenthal (2009) Probability and Statistics – The Science of Uncertainty, 2nd ed., available online
Bradley Efron and Trevor Hastie (2016) Computer Age Statistical Inference: Algorithms, Evidence and Data Science
Hadley Wickham (2014) Advanced R
Hadley Wickham (2015) R Packages

6 Getting Help

Post questions and comments about the course material on Piazza. Post your questions to the entire class so everyone can benefit from the discussion and answer. I will monitor Piazza and participate if other students don’t have an answer.
If you have a personal question for the instructor, you can post it on Piazza as a private posting. This should only be used for personal questions that are of no interest to the rest of the class.
If you happen to post a private question whose answer is of general interest to the class and that contains no personal information, I will assume that you consent to it being posted to the whole class unless you explicitly request otherwise.
You can ask your teammates or other classmates directly.
You can see the instructor during office hours or after class.

7 Some reflections on teams

The project and many activities are done in semi-randomly assigned teams that will be assigned during the first few weeks of class.

Working with a diverse team that you didn’t select yourself gives you the opportunity to have experiences that will give you great anecdotes to use in your future job interviews.

When you land the job, you will be much more likely to show the kind of leadership and productivity in team work that is invaluable in the modern workplace.

Once teams are assigned, you will be able to communicate directly with your team by posting messages on Piazza and directing them to your team.

The more work you do on an assignment the better prepared you are to do well on the term test and on the final exam. But you shouldn’t hog the work – let others do their part too. Everyone should make sure that they understand the whole assignment. Discuss the assignment with your team members to make sure everyone understands the key points and difficulties of each question.

8 Course policies

8.1 Missed deadlines

Late activities or projects will have their weight transferred to the final exam.

8.2 Missed term test

If you miss the term test, the weight of the test will be transferred to the final exam.

8.3 Use of computers and cell phones in class

You should bring your laptop to class to use it for purposes related to the class such as taking notes, annotating slides posted on the web or trying out commands in R. Be aware that some pedagogical research suggests that taking handwritten notes leads to deeper learning for most students. I don’t think that this is true for all students, which is one reason why I would not consider suggesting that students forgo the use of computers.

It is natural to think that you do not affect anyone else if you are doing your own thing in class on your laptop, phone or tablet. This, of course, is incorrect. People seated around you cannot help but be distracted. The instructor gets distracted when members of the class are clearly lost in a different dimension. Therefore, I request that you not use electronic devices for activities unrelated to the class because this creates distractions for other students and for me.

8.4 Academic honesty

Familiarize yourself with the York University Senate Policy on Academic Honesty. Violations of academic honesty are treated very seriously in university.

Always cite your sources for any information you use. This can be as simple as providing links to websites you have visited to get information.

The course policy regarding the use of ChatGPT is … use it BUT use it critically and share your experiences so we can all learn about its strengths and pitfalls.

References

AdamO. 2018. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. April 6, 2018. https://stats.stackexchange.com/a/339065.

Alzahawi, Shilaan. 2021. “Building Your Website Using R {Blogdown}.” Shilaan Alzahawi. May 12, 2021. https://www.shilaan.com/post/building-your-website-using-r-blogdown/.

“Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” 2017. Stack Overflow. March 9, 2017. https://stackoverflow.com/a/42702950.

Bååth, Rasmus. 2013. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. October 7, 2013. https://stats.stackexchange.com/a/72128.

Barndorff-Nielsen, O. E., D. R. Cox, and N. Reid. 1986. “The Role of Differential Geometry in Statistical Theory.” International Statistical Review / Revue Internationale de Statistique 54 (1): 83–96. https://doi.org/10.2307/1403260.

“Basic Linear Algebra Subprograms.” 2023. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Basic_Linear_Algebra_Subprograms&oldid=1162434953.

Bates, Casey. 2020. “23 RStudio Tips, Tricks, and Shortcuts.” Dataquest. June 10, 2020. https://www.dataquest.io/blog/rstudio-tips-tricks-shortcuts/.

Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using Lme4.” J. Stat. Soft. 67 (1). https://doi.org/10.18637/jss.v067.i01.

Benjamini, Yoav, Richard D. De Veaux, Bradley Efron, Scott Evans, Mark Glickman, Barry I. Graubard, Xuming He, et al. 2021. “ASA President’s Task Force Statement on Statistical Significance and Replicability.” CHANCE 34 (4): 10–11. https://doi.org/10.1080/09332480.2021.2003631.

Berger, Adam. 1996. “Convexity, Maximum Likelihood and All That.” https://www.cs.cmu.edu/~roni/11761/Presentations/convexity-maximum-likelihood-and.pdf.

Bergland, Christopher. 2019. “Rethinking P-Values: Is "Statistical Significance" Useless?” Psychology Today. March 22, 2019. https://www.psychologytoday.com/blog/the-athletes-way/201903/rethinking-p-values-is-statistical-significance-useless.

Best, Joel. 2005. “Lies, Calculations and Constructions: Beyond "How to Lie with Statistics".” Statistical Science 20 (3): 210–14. https://www.jstor.org/stable/20061175.

Betancourt, M. J., Simon Byrne, Samuel Livingstone, and Mark Girolami. 2014. “The Geometric Foundations of Hamiltonian Monte Carlo.” October 19, 2014. http://arxiv.org/abs/1410.5110.

Bezanson, Jeff, Alan Edelman, Stefan Karpinski, and Viral B Shah. 2017. “Julia: A Fresh Approach to Numerical Computing.” SIAM Review 59 (1): 65–98.

Biecek, Przemyslaw, Hubert Baniecki, Mateusz Krzyzinski, and Dianne Cook. 2023. “Performance Is Not Enough: A Story of the Rashomon’s Quartet.” March 17, 2023. http://arxiv.org/abs/2302.13356.

“BLAS (Basic Linear Algebra Subprograms).” n.d. Accessed July 18, 2023. https://www.netlib.org/blas/#_history.

Braun, Julia, Leonhard Held, and Bruno Ledergerber. 2012. “Predictive Cross‐validation for the Choice of Linear Mixed‐Effects Models with Application to Data from the Swiss HIV Cohort Study.” Biometrics 68 (1): 53–61. https://doi.org/10.1111/j.1541-0420.2011.01621.x.

Braun, W. John, and Duncan J. Murdoch. 2021. A First Course in Statistical Programming with R. Third. Cambridge University Press. https://books.google.com?id=NzorEAAAQBAJ.

Broman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.

Brown, Andrew W., Douglas G. Altman, Tom Baranowski, J. Martin Bland, John A. Dawson, Nikhil V. Dhurandhar, Shima Dowla, et al. 2019. “Childhood Obesity Intervention Studies: A Narrative Review and Guide for Investigators, Authors, Editors, Reviewers, Journalists, and Readers to Guard Against Exaggerated Effectiveness Claims.” Obesity Reviews 20 (11): 1523–41. https://doi.org/10.1111/obr.12923.

Bungartz, Hans-Joachim, Christian Carbogno, Martin Galgon, Thomas Huckle, Simone Köcher, Hagen-Henrik Kowalski, Pavel Kus, et al. 2020. “ELPA: A Parallel Solver for the Generalized Eigenvalue Problem.” In Advances in Parallel Computing, edited by Ian Foster, Gerhard R. Joubert, Luděk Kučera, Wolfgang E. Nagel, and Frans Peters. IOS Press. https://doi.org/10.3233/APC200095.

Statistics Canada. 2016. 2016 Census of Population [Canada] Public Use Microdata File (PUMF): Individuals File. http://odesi.ca/#/details?uri.

Canon, Stephen. 2013. “Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” Stack Overflow. July 25, 2013. https://stackoverflow.com/a/17858345.

Carroll, R. J., and David Ruppert. 1996. “The Use and Misuse of Orthogonal Regression in Linear Error-in-Variables Models.” American Statistician, February. https://journals-scholarsportal-info.ezproxy.library.yorku.ca/pdf/00031305/v50i0001/1_tuamoorilem_1.xml_en.

“Case Study: Group Work Gone Awry.” n.d. Google Docs. Accessed November 2, 2022. https://docs.google.com/document/d/136VgrrFxkeJSO47g0Bwbj5JqXXumRvf0P1pC9lO91pQ/edit?usp=embed_facebook.

Chow, Vinci. 2022. “AMD BLAS/LAPACK Optimization in 2022.” SCRP CUHK Economics. February 7, 2022. http://localhost:4000/blog/analysis/2022/02/07/mkl-optimization.html.

Colby, Emily, and Eric Bair. 2013. “Cross-Validation for Nonlinear Mixed Effects Models.” Journal of Pharmacokinetics and Pharmacodynamics 40 (2): 243–52. https://doi.org/10.1007/s10928-013-9313-5.

“Computational Statistics.” n.d. Accessed July 18, 2023. https://onlinelibrary.wiley.com/doi/epub/10.1002/9781118555552.

“Cornish–Fisher Expansion.” 2023. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Cornish%E2%80%93Fisher_expansion&oldid=1137666825.

Cox, D. R., and N. Reid. 1987. “Parameter Orthogonality and Approximate Conditional Inference.” Journal of the Royal Statistical Society: Series B (Methodological) 49 (1): 1–18. https://doi.org/10.1111/j.2517-6161.1987.tb01422.x.

Cunningham, Scott. 2021a. “Causal Inference The Mixtape - 6 Regression Discontinuity.” In Causal Inference: The Mixtape. Yale University Press. https://mixtape.scunning.com/06-regression_discontinuity.

———. 2021b. Causal Inference: The Mixtape. Yale University Press. https://mixtape.scunning.com/.

“Data Organization in Spreadsheets.” n.d. Accessed May 8, 2023. https://www.tandfonline.com/doi/epdf/10.1080/00031305.2017.1375989?needAccess=true&role=button.

“Data to Viz | A Collection of Graphic Pitfalls.” n.d. Accessed September 3, 2023. https://www.data-to-viz.com/caveats.html.

Dempster, Arthur. 1968. “Elements of Continuous Multivariate Analysis.”

“Differential Geometry in Statistical Inference - Encyclopedia of Mathematics.” n.d. Accessed April 21, 2023. https://encyclopediaofmath.org/wiki/Differential_geometry_in_statistical_inference.

Efron, Bradley. 2018. “Curvature and Inference for Maximum Likelihood Estimates.” Ann. Statist. 46 (4). https://doi.org/10.1214/17-AOS1598.

Farimani, Foad S. 2017. “Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” Stack Overflow. February 13, 2017. https://stackoverflow.com/a/42212642.

Fox, John. 2016. Applied Regression Analysis and Generalized Linear Models. 3rd ed. Sage Publications.

———. 2021. “Answers to Odd-Numbered Exercises for Chapter 25 Fox, Applied Regression Analysis and Generalized Linear Models, Third Edition (Sage, 2016).” https://www.john-fox.ca/AppliedRegression/chap-25-odd-exercise-answers.pdf.

———. 2023a. “Answers to Odd-Numbered Exercises for Chapter 26 Fox, Applied Regression Analysis and Generalized Linear Models, Third Edition (Sage, 2016).” https://www.john-fox.ca/AppliedRegression/chap-26-odd-exercise-answers.pdf.

———. 2023b. “Applied Regression Analysis and Generalized Linear Models, Third Edition Supplement: Chapter 26 (Draft) Causal Inferences From Observational Data: Directed Acyclic Graphs and Potential Outcomes.” March 31, 2023. https://www.john-fox.ca/AppliedRegression/chap-26.pdf.

———. 2023c. “Applied Regression Analysis and Generalized Linear Models, Third Edition Supplement: Chapter 25 (Draft) Bayesian Estimation of Regression Models.” April 1, 2023. https://www.john-fox.ca/AppliedRegression/chap-25.pdf.

———. n.d. “Blau and Duncan Stratification Data Set.” Accessed October 10, 2023. https://www.john-fox.ca/AppliedRegression/BlauDuncan.txt.

Friendly, Michael, Georges Monette, and John Fox. 2013. “Elliptical Insights: Understanding Statistical Methods Through Elliptical Geometry.” Statist. Sci. 28 (1): 1–39. https://doi.org/10.1214/12-STS402.

Galdino, Manoel. 2011. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. May 23, 2011. https://stats.stackexchange.com/a/11137.

Gandhi, Ratnik, and Amoli Rajgor. 2017. “Updating Singular Value Decomposition for Rank One Matrix Perturbation.” July 26, 2017. http://arxiv.org/abs/1707.08369.

Gao, Kaifeng, Gang Mei, Francesco Piccialli, Salvatore Cuomo, Jingzhi Tu, and Zenan Huo. 2020. “Julia Language in Machine Learning: Algorithms, Applications, and Open Issues.” Computer Science Review 37 (August): 100254. https://doi.org/10.1016/j.cosrev.2020.100254.

Geijn, Robert van de. 2018. “Answer to "What Is the Relation Between BLAS, LAPACK and ATLAS".” Stack Overflow. September 3, 2018. https://stackoverflow.com/a/52156861.

Gelman, Andrew, Jessica Hwang, and Aki Vehtari. 2014. “Understanding Predictive Information Criteria for Bayesian Models.” Stat Comput 24 (6): 997–1016. https://doi.org/10.1007/s11222-013-9416-2.

Gelman, Andrew, Aleks Jakulin, Maria Grazia Pittau, and Yu-Sung Su. n.d. “Bayesian Generalized Linear Models and an Appropriate Default Prior.” Logistic Regression.

“General Tips for Making Your Report Appealing and Easy To Skim.” n.d. Accessed April 29, 2023. https://www.ahrq.gov/talkingquality/resources/design/general-tips/index.html.

Gentle, James E., Wolfgang Karl Härdle, and Yuichi Mori, eds. 2012. Handbook of Computational Statistics: Concepts and Methods. Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-21551-3.

George, Brandon J., T. Mark Beasley, Andrew W. Brown, John Dawson, Rositsa Dimova, Jasmin Divers, TaShauna U. Goldsby, et al. 2016. “Common Scientific and Statistical Errors in Obesity Research: Common Statistical Errors in Obesity Research.” Obesity 24 (4): 781–90. https://doi.org/10.1002/oby.21449.

“Getting Started With GitHub — The Turing Way.” n.d. Accessed August 30, 2023. https://the-turing-way.netlify.app/collaboration/github-novice.html#.

Geyer, Charles J. n.d. “The Wilks, Wald, and Rao Tests.”

Givens, Geof H, and Jennifer A Hoeting. 2013. Computational Statistics. 2nd ed. Wiley.

———. 2014. “Errata for Computational Statistics, Second Edition.”

Givens, Geof, and J. A. Hoeting. 2013. “Web Page for Computational Statistics, by G. H. Givens and J. A. Hoeting.” 2013. https://www.stat.colostate.edu/computationalstatistics/.

Greenland, Sander. 2010. “Simpson’s Paradox From Adding Constants in Contingency Tables as an Example of Bayesian Noncollapsibility.” The American Statistician 64 (4): 340–44. https://doi.org/10.1198/tast.2010.10006.

Greenland, Sander, Stephen J. Senn, Kenneth J. Rothman, John B. Carlin, Charles Poole, Steven N. Goodman, and Douglas G. Altman. 2016. “Statistical Tests, P Values, Confidence Intervals, and Power: A Guide to Misinterpretations.” Eur J Epidemiol 31 (4): 337–50. https://doi.org/10.1007/s10654-016-0149-3.

“Halton Sequence.” 2023. In Wikipedia. https://en.wikipedia.org/w/index.php?title=Halton_sequence&oldid=1170076511.

Harrell, Frank. 2021. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. July 25, 2021. https://stats.stackexchange.com/a/535873.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2008. The Elements of Statistical Learning, Data Mining, Inference, and Prediction. 2nd ed. Springer. https://hastie.su.domains/ElemStatLearn/printings/ESLII_print12_toc.pdf.

Holtz, Yan, and Conor Healy. n.d. “The Issue with Error Bars.” Accessed September 3, 2023. https://www.data-to-viz.com/caveat/www.data-to-viz.com/caveat/error_bar.html.

Hitchcock, David B. 2003. “A History of the Metropolis-Hastings Algorithm.” The American Statistician 57 (4): 254–57. https://www.jstor.org/stable/30037292.

“How R Searches and Finds Stuff — Study Notes.” n.d. Accessed April 5, 2023. https://askming.github.io/study_notes/Stats_Comp/Note-How%20R%20searches%20and%20finds%20stuff.html.

Hyndman, Rob J. 2014. “Fast Computation of Cross-Validation in Linear Models.” March 17, 2014. https://robjhyndman.com/hyndsight/loocv-linear-models/.

IBM Technology, dir. 2022. R Vs Python. https://www.youtube.com/watch?v=4lcwTGA7MZw.

“Introduction.” n.d. Accessed September 4, 2023. https://cran.r-project.org/web/packages/rgl/vignettes/WebGL.html.

Ioannidis, John P A. 2005. “Why Most Published Research Findings Are False.” PLoS Medicine 2 (8): 6. https://journals.plos.org/plosmedicine/article/file?id=10.1371%2Fjournal.pmed.0020124&type=printable.

Ioannidis, John P. A. 2019. “What Have We (Not) Learnt from Millions of Scientific Papers with P Values?” The American Statistician 73 (March): 20–25. https://doi.org/10.1080/00031305.2018.1447512.

Kass, Robert E. 1989. “The Geometry of Asymptotic Inference.” Statist. Sci. 4 (3). https://doi.org/10.1214/ss/1177012480.

Kass, Robert E., Brian S. Caffo, Marie Davidian, Xiao-Li Meng, Bin Yu, and Nancy Reid. 2016. “Ten Simple Rules for Effective Statistical Practice.” Edited by Fran Lewitter. PLoS Comput Biol 12 (6): e1004961. https://doi.org/10.1371/journal.pcbi.1004961.

Kass, Robert E., and Paul W. Vos. 1997. Geometrical Foundations of Asymptotic Inference. Wiley Series in Probability and Statistics. New York: Wiley.

Kennedy, William J., and James E. Gentle. 2021. Statistical Computing. New York: Routledge. https://doi.org/10.1201/9780203738672.

King, Gary. 1986. “How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science.” American Journal of Political Science 30 (3): 666–87. https://doi.org/10.2307/2111095.

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44 (2): 347–61. https://doi.org/10.2307/2669316.

Lafit, Ginette. (2021) 2023. “Estimating Linear Mixed-Models with Julia Using R.” https://github.com/ginettelafit/MixedModelswithRandJulia.

Lakens, Daniël. 2022. “Improving Your Statistical Inferences.” 2022. https://doi.org/10.5281/zenodo.6409077.

“LaTeX for Complete Novices.” n.d. Accessed August 26, 2023. https://www.dickimaw-books.com/latex/novices/index.html.

Lauritzen, Steffen. n.d. “Maximum Likelihood in Exponential Families.”

Li, Changcheng. 2019. “JuliaCall: An R Package for Seamless Integration Between R and Julia.” JOSS 4 (35): 1284. https://doi.org/10.21105/joss.01284.

Loy, Adam, Heike Hofmann, and Dianne Cook. 2016. “Model Choice and Diagnostics for Linear Mixed-Effects Models Using Statistics on Street Corners.” December 6, 2016. http://arxiv.org/abs/1502.06988.

Ma, Yan, Madhu Mazumdar, and Stavros G. Memtsoudis. 2012. “Beyond Repeated-Measures Analysis of Variance: Advanced Statistical Methods for the Analysis of Longitudinal Data in Anesthesia Research.” Regional Anesthesia and Pain Medicine 37 (1): 99–105. https://doi.org/10.1097/AAP.0b013e31823ebc74.

Maechler, Martin. n.d. “Rmpfr: R MPFR - Multiple Precision Floating-Point Reliable.”

makhlaghi. 2019. “What Is the Relation Between BLAS, LAPACK and ATLAS.” Forum post. Stack Overflow. January 24, 2019. https://stackoverflow.com/q/17858104.

mathematical.coffee. 2016a. “Importing Common YAML in Rstudio/Knitr Document.” Forum post. Stack Overflow. October 6, 2016. https://stackoverflow.com/q/39885363.

———. 2016b. “Answer to "Importing Common YAML in Rstudio/Knitr Document".” Stack Overflow. October 7, 2016. https://stackoverflow.com/a/39909079.

matloff. 2015. “OpenMP Tutorial, with R Interface | R-bloggers.” January 17, 2015. https://www.r-bloggers.com/2015/01/openmp-tutorial-with-r-interface/.

MavropaliasG. 2023. “My Setup as a Researcher. How to Write, Run Statistics, and Work Seamlessly with R, Obsidian, Linux, and Zotero, and Collaborate with Senior Professors Who Only Accept MS Word Files!” Reddit Post. r/ObsidianMD. March 28, 2023. www.reddit.com/r/ObsidianMD/comments/124cd8y/my_setup_as_a_researcher_how_to_write_run/.

Scortchi - Reinstate Monica. 2013. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. September 1, 2013. https://stats.stackexchange.com/a/68917.

Mühleisen, Hannes, and Mark Raasveldt. 2023. Duckdb: DBI Package for the DuckDB Database Management System. Manual. https://CRAN.R-project.org/package=duckdb.

Müller, Samuel, J. L. Scealy, and A. H. Welsh. 2013. “Model Selection in Linear Mixed Models.” Statistical Science 28 (2): 135–67. https://www.jstor.org/stable/43288485.

Muralidharan, Karthik, Mauricio Romero, and Kaspar Wüthrich. n.d. “Factorial Designs, Model Selection, and (Incorrect) Inference in Randomized Experiments.”

Nature. 2023. “Points of Significance. Editorial.” Nat Hum Behav 7 (3): 293–94. https://doi.org/10.1038/s41562-023-01586-w.

“Netlib BLAS FAQ.” 2017. 2017. https://www.netlib.org/blas/faq.html.

Nuzzo, Regina. 2014. “P Values, the ‘Gold Standard’ of Statistical Validity, Are Not as Reliable as Many Scientists Assume.” Nature 506 (February): 150–52.

Oliver, John, dir. 2016. Last Week Tonight with John Oliver: Scientific Studies. HBO. https://www.youtube.com/watch?v=0Rnq1NpHdmw.

Oxberry, Geoff. 2012. “Answer to "Updatable SVD Implementation in Python, C, or Fortran?".” Computational Science Stack Exchange. July 3, 2012. https://scicomp.stackexchange.com/a/2686.

Ozgur, Ceyhun, Taylor Colliau, Grace Rogers, Zachariah Hughes, and Elyse "Bennie" Myer-Tyson. 2017. “MatLab vs. Python vs. R.” Journal of Data Science 15 (3): 355–71. https://doi.org/10.6339/JDS.201707_15(3).0001.

Peng, Roger D. n.d. 1.3 Textbooks Vs. Computers | Advanced Statistical Computing. Accessed November 10, 2023. https://bookdown.org/rdpeng/advstatcomp/textbooks-vs.-computers.html.

Pocock, Stuart J, and James H Ware. 2009. “Translating Statistical Findings into Plain English.” The Lancet 373 (9679): 1926–28. https://doi.org/10.1016/S0140-6736(09)60499-2.

QMNET - Quantitative Methods Network, dir. 2020. Missing Data Imputation with Low Rank Models. https://www.youtube.com/watch?v=OjXPskXO8No.

Rao, C. 2017. “Book Review: Multivariate Statistical Methods, A Primer.” Journal of Modern Applied Statistical Methods 16 (1). https://doi.org/10.22237/jmasm/1493599260.

Ratz, Arthur V. 2021. “Can QR Decomposition Be Actually Faster? Schwarz-Rutishauser Algorithm.” Medium. April 9, 2021. https://towardsdatascience.com/can-qr-decomposition-be-actually-faster-schwarz-rutishauser-algorithm-a32c0cde8b9b.

Reid, N. 2003. “Asymptotics and the Theory of Inference.” Ann. Statist. 31 (6). https://doi.org/10.1214/aos/1074290325.

Reid, Nancy. 2010. “Likelihood Inference.” WIREs Comp Stat 2 (5): 517–25. https://doi.org/10.1002/wics.110.

Roche, Alexis. 2012. “EM Algorithm and Variants: An Informal Tutorial.” http://arxiv.org/abs/1105.1476.

Rodrigues, Bruno. n.d. Reproducible Analytical Pipelines - Master’s of Data Science. Accessed October 30, 2022. https://rap4mads.eu/.

Rothman, Kenneth J. 2014. “Six Persistent Research Misconceptions.” J GEN INTERN MED 29 (7): 1060–64. https://doi.org/10.1007/s11606-013-2755-z.

Sabbe, Nick. 2011. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. May 22, 2011. https://stats.stackexchange.com/a/11110.

Hossenfelder, Sabine, dir. 2021. How I Learned to Love Pseudoscience. https://www.youtube.com/watch?v=bWV0XIn-rvY.

Santillan, Carlos. 2018. “Improving R Perfomance by Installing Optimized BLAS/LAPACK Libraries.” July 25, 2018. https://csantill.github.io/RPerformanceWBLAS/.

Sawitzki, G. 2009. “Web Page for Computational Statistics: An Introduction to R.” StatLab Heidelberg. 2009. https://sintro.r-forge.r-project.org/.

Schervish, Mark J. 1996. “P Values: What They Are and What They Are Not.” The American Statistician 50 (3): 203–6.

Schmidt, Anthony. 2021. “A Zotero Workflow for R.” Anthony Schmidt. October 25, 2021. https://www.anthonyschmidt.co/post/2021-10-25-a-zotero-workflow-for-r/.

Sharp, Julia L. 2022. “Statistical Collaboration Training Videos.” 2022. https://www.youtube.com/playlist?list=PLCqpxiFnahaBFnIXIWmTE1eYpa2ov5VWL.

———. 2023. “Setting the Stage: Statistical Collaboration Training Videos.” 2023. https://sites.google.com/site/julialsharp/other-resources/statistical-collaboration-training-videos.

Sherrington, Malcolm. 2015. Mastering Julia. Packt Publishing Ltd.

Silk, Matthew. 2019. “Mixed Model Diagnostics.” 2019. https://dfzljdn9uc3pi.cloudfront.net/2020/9522/1/MixedModelDiagnostics.html.

Soch, Joram. 2019. “The Book of Statistical Proofs.” The Book of Statistical Proofs. 2019. https://statproofbook.github.io/.

Econometrics and Free Software. 2022. “A Linux Live USB as a Statistical Programming Dev Environment | R-bloggers.” October 29, 2022. https://www.r-bloggers.com/2022/10/a-linux-live-usb-as-a-statistical-programming-dev-environment/.

“Stat 3701 Lecture Notes: R Generic Functions.” n.d. Accessed February 15, 2023. https://www.stat.umn.edu/geyer/3701/notes/generic.html.

Statistical Tools for High-Throughput Data Analysis. 2023. “A Complete Guide to 3D Visualization Device System in R - R Software and Data Visualization - Easy Guides - Wiki - STHDA.” 2023. http://www.sthda.com/english/wiki/a-complete-guide-to-3d-visualization-device-system-in-r-r-software-and-data-visualization.

StayLearning. 2015. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. December 19, 2015. https://stats.stackexchange.com/a/187561.

Stone, M. 1977. “A Unified Approach to Coordinate-Free Multivariate Analysis.” Ann Inst Stat Math 29 (1): 43–57. https://doi.org/10.1007/BF02532773.

strboul. 2019. “Answer to "Importing Common YAML in Rstudio/Knitr Document".” Stack Overflow. October 21, 2019. https://stackoverflow.com/a/58491399.

Posit Team. 2022. “Posit.” Posit. September 14, 2022. https://www.posit.co/.

tillsten. 2013. “Updatable SVD Implementation in Python, C, or Fortran?” Forum post. Computational Science Stack Exchange. February 27, 2013. https://scicomp.stackexchange.com/q/2678.

Coding Club UC3M. 2019. “Simple yet Elegant Object-Oriented Programming in R with S3.” Coding Club UC3M. May 28, 2019. https://codingclubuc3m.rbind.io/post/2019-05-28/.

Unknown. n.d. “Lme4 Convergence Warnings: Troubleshooting.” Accessed September 23, 2023. https://rstudio-pubs-static.s3.amazonaws.com/33653_57fc7b8e5d484c909b615d8633c01d51.html.

user1436187. 2015. “Updating SVD Decomposition After Adding One New Row to the Matrix.” Forum post. Cross Validated. October 18, 2015. https://stats.stackexchange.com/q/177007.

user333. 2022. “How to Deal with Perfect Separation in Logistic Regression?” Forum post. Cross Validated. August 27, 2022. https://stats.stackexchange.com/q/11109.

user78229. 2015. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. November 26, 2015. https://stats.stackexchange.com/a/183763.

“Using oneMKL with R.” n.d. Intel. Accessed July 18, 2023. https://www.intel.com/content/www/us/en/developer/articles/technical/using-onemkl-with-r.html.

usεr11852. 2015. “Answer to "Updating SVD Decomposition After Adding One New Row to the Matrix".” Cross Validated. October 31, 2015. https://stats.stackexchange.com/a/179539.

Vehtari, Aki. n.d. “TUTORIAL ON MODEL ASSESMENT, SELECTION AND INFERENCE AFTER SELECTION.”

Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.

Wasserstein, Ronald L., Allen L. Schirm, and Nicole A. Lazar. 2019. “Moving to a World Beyond ‘p < 0.05’.” The American Statistician 73 (March): 1–19. https://doi.org/10.1080/00031305.2019.1583913.

Wicherts, Jelte M., Coosje L. S. Veldkamp, Hilde E. M. Augusteijn, Marjan Bakker, Robbie C. M. Van Aert, and Marcel A. L. M. Van Assen. 2016. “Degrees of Freedom in Planning, Running, Analyzing, and Reporting Psychological Studies: A Checklist to Avoid p-Hacking.” Front. Psychol. 7 (November). https://doi.org/10.3389/fpsyg.2016.01832.

Wickham, Hadley. 2019. Advanced R. 2nd ed. CRC Press. https://adv-r.hadley.nz/.

Wilson, Greg, Jennifer Bryan, Karen Cranston, Justin Kitzes, Lex Nederbragt, and Tracy K. Teal. 2017. “Good Enough Practices in Scientific Computing.” Edited by Francis Ouellette. PLoS Comput Biol 13 (6): e1005510. https://doi.org/10.1371/journal.pcbi.1005510.

Xie, Yihui. (2011) 2022. “Knitr.” https://github.com/yihui/knitr/blob/36efc00013d423b8f098ff37ba524c9d29810fa0/inst/examples/knitr-spin.R.

Zhu, Xiaorui. 2021. “Answer to "How to Deal with Perfect Separation in Logistic Regression?".” Cross Validated. July 25, 2021. https://stats.stackexchange.com/a/535860.

Course Description

MATH 6630: Applied Statistics I – Fall 2024

Georges Monette

September 2024