Statistical Methods Seminar Description
Overview: This 5-day summer camp will introduce participants to the R software platform for data analysis. R is a freely available, open-source software platform that is growing in both popularity and capability.
Learn R for Data Science! This 5-day seminar on data science with the R language complements statistical knowledge with the practical skills to clean, prepare, and visualize data before analyses are run, as well as the skills to tabulate, plot, and export statistical results. The core of the seminar will cover modules from the free online book R for Data Science (http://r4ds.had.co.nz). Prior programming or statistical experience is not required, but a general understanding of computers and of basic statistics such as mean, variance, and correlation is helpful. Experience with SAS, SPSS, or Stata can also be helpful.
The textbook offers a graphic that serves as a map of the data science process. [Figure: the textbook's map of the data science process]
Several modules from the textbook will be covered extensively:
- Explore (import, visualize, and describe data). This includes a good introduction to R's rich graphics capabilities.
- Wrangle (transform and recode variables, select data subsets, aggregate, reshape, and merge datasets).
- Programming (looping, conditional IF/ELSE logic, creating functions, workspace management).
The module on Modeling will receive less emphasis because modeling is what statistics seminars already teach; however, we will code several regression and structural equation (SEM) models. Visualization will cover both pre-model exploratory plots and post-model plots of results and diagnostics.
The textbook focuses on an emerging set of "best practice" packages called the tidyverse, and the seminar will be organized around writing good tidyverse R code. However, the seminar will also introduce the fundamental data structures and principles of the base-R language elements underlying the tidyverse. We will offer key side-by-side comparisons of some tasks coded in base R versus tidyverse R code. Other seminar topics will review R's file and web input/output (I/O) capabilities, making attractive reports with R Markdown, and a short introduction to text-processing functions (including regular expressions). As time permits, demonstrations of how similar tasks are done in SPSS or SAS will be provided.
Instructor: Daniel E. Bontempo Ph.D.
Dr. Bontempo has a BA in computer science from Loyola University in New Orleans, as well as both an MAS in applied statistics and a quantitative PhD in Human Development from the Pennsylvania State University. Over his career as a research scientist and statistical consultant, Dr. Bontempo has delivered advanced multilevel/multivariate modeling solutions for complex longitudinal datasets at the Penn State Prevention Science Research Center, the Oregon State Center for Healthy Aging, the Kansas University Institute for Lifespan Studies, and the Texas Tech Institute for Measurement, Methodology, Analysis, and Policy.
Robust, readable code has always been at the center of Dr. Bontempo's statistical practice. From teaching computer programming in the 1980s, to scripting complex data management and analyses in commercial packages such as SAS, Stata, and SPSS, to his more recent use of tidyverse R programming methods, Dr. Bontempo has maintained a focus on clean, readable, robust code and scripts in the service of reproducible research and the implementation of both programming and research-project best practices.
The camp is ideal for life-science investigators, biostatisticians, program evaluators, and R&D researchers: anyone who is interested in data analysis with R.
Software and Computer Support
A laptop with the latest version of R installed is highly recommended. R can be downloaded for free from the R Project website: https://www.r-project.org/. R's built-in text editor is very basic, so an additional editor or Integrated Development Environment (IDE) is highly recommended. The RStudio IDE, which can also be downloaded for free, will be used in class. The following three editors and editor/plug-in combinations work very well with R:
- RStudio (https://www.rstudio.com/)
- Emacs with ESS (Emacs Speaks Statistics) (http://vgoulet.act.ulaval.ca/en/emacs/)
- Notepad++ (https://notepad-plus-plus.org/) with NppToR (http://sourceforge.net/projects/npptor/)
R for Data Science (first edition) was published by O'Reilly in January 2017. It is available for free online at http://r4ds.had.co.nz/ and also as a paperback or ebook from booksellers.
Suggested Additional Readings
Matloff, N. (2011). The art of R programming: A tour of statistical software design. San Francisco: No Starch Press.
Verzani, J. (2014). Using R for introductory statistics. Boca Raton, FL: CRC Press.