COVID-19 UPDATE: The safety of our students and staff is our top priority. Therefore, Stats Camp will be holding seminars online via live, interactive Zoom discussion groups. Our goal is to expand interactivity and provide one-on-one consulting time via virtual breakout rooms. We are offering a discount code worth $200 off a future camp, plus 1 hour of post-camp consultation as an added value. Registrations will be accepted up to 12 hours prior to the seminar start date and time. All seminars will be conducted in CDT and will be recorded. The recordings will be made available to you within 3-5 business days of the live recording date. Access to the recorded videos will be granted for 1 year from the date of the seminar. Have questions? Contact us.
A note on open source and comparison of R, Python, and SAS
Meet R’s data frame
Algorithms and Problem Solving
R as a functional language
TidyVerse & wrangling with robust code
Practical issues: features of the RStudio IDE, working with file and directory structures, the question of saving the R workspace.
Importing data into R (Pt#1)
Saving/exporting data (Pt#1)
Exploring Data & More about R Fundamentals
Getting statistical and graphical summaries of variables in data frames.
R fundamental data-objects, object class, object structure, and factors
R-fundamentals vectorization and recycling
A comment on base-R versus TidyVerse
Visualizing and comparing data-frame subsets with dplyr
A quick word about pipes (%>%), dplyr, factors, and attributes
Introducing ggplot2 for more advanced visualization and plotting
A closer look at the grammar of graphics
ggplot geom versus aesthetic, and the concept of mapping
Wrangle Data with the Tidyverse and base-R
Revisiting pipes vs naming objects
A closer look at dplyr’s main functions
Import Pt#2 using haven to import from Excel, SAS, SPSS
Saving data Pt#2, R-binary files and external files
dplyr’s helper functions
The “tidy” concept
Recoding and transforming variables
Recasting data-objects from one type to another
Reshaping data-frames wide vs long/tall
Merge and Join data-frames
Using string functions and dplyr helper functions to wrangle subsets of variables
Regular expressions (regex) for complex pattern matching
Working with dates and times
Working with list objects
The benefits of reusing code
Looping to reuse code
base-R’s for() versus TidyVerse map()
R’s apply functions versus TidyVerse map() map2() pmap() & walk()
Writing user-defined functions
RStudio tools for debugging
Review of R environments
Using user-defined functions inside map() loops
Models and model objects
Formulas specify the model
Simple linear regression with the lm() function
Simple CFA models with the lavaan() function
Extracting results from model fit objects into tables
Plotting model results & model-predicted values
Plotting marginal values
Making diagnostic plots
R Markdown for clean reports
Communicating “Research” vs reporting statistical analyses
Graphics for communication
More advanced File I/O functions
Reading and scraping websites
Advanced text processing with regular expressions
Introduction to parallel processing
Statistical Methods Seminar Description
Overview: This 5-day summer camp will introduce participants to the R software platform for data analysis. R is a freely available, open source software platform that is growing in both popularity and capacity.
Learn R for Data Science! This 5-day data science seminar with the R language complements statistical knowledge with the practical skills to clean, prepare, and visualize data before analyses are run, as well as the skills to tabulate, plot, and export statistical results. The core of the seminar will cover modules from the free online book R for Data Science (http://r4ds.had.co.nz). Prior programming or statistical experience is not required, but a general understanding of computers and of basic statistics such as mean, variance, and correlation is helpful. Experience with SAS, SPSS, or Stata can also be helpful.
The textbook offers this graphic as a map of the data science process: [Figure: data science workflow diagram from the textbook]
Several modules from the textbook will be covered extensively: (1) Explore (import, visualize, and describe data), which includes a good introduction to the rich graphic capabilities in R; (2) Wrangle (transform/recode variables, select data subsets, aggregate, reshape, and merge datasets); and (3) Programming (looping, conditional IF/ELSE logic, creating functions, workspace management). The module on Modeling will receive less emphasis because this is what statistics seminars teach; however, we will code several regression and SEM models. Visualization will cover both pre-model exploratory plots and post-model plots of results and diagnostics.
The textbook focuses on an emerging set of “best practice” functions called the TidyVerse, and the seminar will be organized around writing good TidyVerse R-code. However, our seminar will also introduce the fundamental data structures and principles of the base-R language elements underlying the TidyVerse. We will offer key side-by-side comparisons of some tasks coded in base-R versus TidyVerse R-code. Other seminar topics will review R’s file and website I/O capabilities, making attractive reports with R Markdown, and a short introduction to text processing functions (including regular expressions). As time permits, demonstrations of how similar tasks are done in SPSS or SAS will be provided.
Instructor: Daniel E. Bontempo Ph.D.
Dr. Bontempo has a BA in computer science from Loyola University in New Orleans, and both an MAS in applied statistics and a quantitative PhD in Human Development from the Pennsylvania State University. Over his career as a research scientist and statistical consultant, Dr. Bontempo has delivered advanced multilevel/multivariate modeling solutions for complex longitudinal datasets at the Penn State Prevention Science Research Center, The Oregon State Center for Healthy Aging, the Kansas University Institute for Lifespan Studies, and the Texas Tech Institute for Measurement, Methodology, Analysis, and Policy.
Robust, readable code has always been at the center of Dr. Bontempo’s statistical practice. From teaching computer programming in the 1980s, to scripting complex data management and analyses in commercial packages such as SAS, Stata, and SPSS, to more recent use of Tidyverse R-programming methods, Dr. Bontempo has maintained a focus on clean, readable, robust code and scripts in the service of reproducible research and the implementation of both programming and research project best practices.
The camp is ideal for life-science investigators, biostatisticians, program evaluators, R&D researchers, and anyone else who is interested in data analysis with R.
Instructor Will Provide Materials Download Link and Password on First Day of Seminar:
Software and Computer Support
A laptop with the latest version of R installed is highly recommended. R can be downloaded, for free, at the R-project website: https://www.r-project.org/. R’s built-in text editor is very primitive, so an additional editor or Integrated Development Environment (IDE) is highly recommended. In class the RStudio IDE will be used, and this can also be downloaded free. The following are three text editor/plug-in combinations that work very well with R: