R Programming for Data Science

Session 1: June 4 – 8, 2018
Albuquerque, NM – Embassy Suites

$1,795 Faculty/Professional or $1,095 Student/Post-Doc

Course Outline

Software installations (R, RStudio, R packages)
A note on open source and comparison of R, Python, and SAS
Meet R’s data frame
Algorithms and Problem Solving
R as a functional language
Tidyverse & wrangling with robust code
Practical issues: features of the RStudio IDE, working with file and directory structures, the question of saving the R workspace.
Importing data into R (Pt#1)
Saving/exporting data (Pt#1)
Exploring Data & More about R Fundamentals
Getting statistical and graphical summaries of variables in data frames.
R fundamental data-objects, object class, object structure, and factors
R fundamentals: vectorization and recycling
A comment on base-R versus Tidyverse
Visualizing and comparing data-frame subsets with dplyr
A quick word about pipes (%>%), dplyr, factors, and attributes
Introducing ggplot2 for more advanced visualization and plotting
A closer look at the grammar of graphics
ggplot geom versus aesthetic, and the concept of mapping
Wrangle Data with the Tidyverse and base-R
Revisiting pipes vs naming objects
A closer look at dplyr’s main functions
Import Pt#2: using readxl and haven to import from Excel, SAS, SPSS
Saving data Pt#2: R-binary files and external files
dplyr’s helper functions
The “tidy” concept
Recoding and transforming variables
Recasting data-objects from one type to another
Reshaping data-frames wide vs long/tall
Merge and Join data-frames
Using string functions and dplyr helper functions to wrangle subsets of variables
Regular expressions (regex) for complex pattern matching
Working with dates and times
Working with list objects
The benefits of reusing code
Looping to reuse code
base-R’s for() versus Tidyverse map()
R’s apply functions versus Tidyverse map(), map2(), pmap() & walk()
Writing user-defined functions
RStudio tools for debugging
Review of R environments
Using user-defined functions inside map() loops
Models and model objects
Formulas specify the model
Simple Linear Regression with the lm() function
Simple CFA models with the lavaan() function
Extracting results from model fit objects into tables
Plotting model results & model-predicted values
Plotting marginal values
Making diagnostic plots
R Markdown for clean reports
Communicating “Research” vs reporting statistical analyses
Graphics for communication
Supplemental Topics
More advanced File I/O functions
Reading and scraping websites
Advanced text processing with regular expressions
Introduction to parallel processing

Statistical Methods Course Description

Overview: This 5-day summer camp will introduce participants to the R software platform for data analysis. R is a freely available, open source software platform that is growing in both popularity and capacity.

Learn R for Data Science! This 5-day data science course with the R language complements statistical knowledge with the practical skills to clean, prepare, and visualize data before analyses are run, as well as the skills to tabulate, plot, and export statistical results. The core of the course will cover modules from the free online book R for Data Science (http://r4ds.had.co.nz). Prior programming or statistical experience is not required, but a general understanding of computers and basic statistics such as mean, variance, and correlation is helpful. Experience with SAS, SPSS, or Stata can also be helpful.

The textbook offers a graphic as a map of the data science process.

Several modules from the textbook will be covered extensively:

  1. Explore (import, visualize, and describe data). This includes a good introduction to the rich graphic capabilities in R.
  2. Wrangle (transform/recode variables, select data subsets, aggregate, reshape, and merge datasets).
  3. Program (looping, conditional IF/ELSE logic, creating functions, workspace management).

The module on Modeling will receive less emphasis because this is what statistics courses teach; however, we will code several regression and SEM models. Visualization will cover both pre-model exploratory plots and post-model plots of results and diagnostics.

The textbook focuses on an emerging set of “best practice” functions called the Tidyverse, and the course will be organized around writing good Tidyverse R code. However, our course will also introduce the fundamental data structures and principles of the base-R language elements underlying the Tidyverse. We will offer key side-by-side comparisons of some tasks coded in base-R versus Tidyverse R code. Other course topics will review R’s file and website I/O capabilities, making attractive reports with R Markdown, and a short introduction to text processing functions (including regular expressions). As time permits, demonstrations of how similar tasks are done in SPSS or SAS will be provided.
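As a rough illustration of what a base-R versus Tidyverse comparison can look like, the sketch below computes the same group means both ways. It is not taken from the course materials: the built-in mtcars dataset, the hp > 100 cutoff, and the object names are assumptions chosen for illustration, and the dplyr package is assumed to be installed.

```r
# Illustrative sketch only: the dataset, cutoff, and object names are
# assumptions, not examples from the course itself.
library(dplyr)

# Base R: subset rows with logical indexing, then aggregate by group.
powerful <- mtcars[mtcars$hp > 100, ]
base_result <- aggregate(mpg ~ cyl, data = powerful, FUN = mean)

# Tidyverse: the same steps expressed as a pipeline of dplyr verbs.
tidy_result <- mtcars %>%
  filter(hp > 100) %>%
  group_by(cyl) %>%
  summarise(mpg = mean(mpg))

# Compare the group means produced by the two approaches.
all.equal(base_result$mpg, tidy_result$mpg)
```

The pipeline version reads top to bottom as a sequence of verbs, while the base-R version relies on indexing and a formula interface; both are ordinary R and can be mixed freely.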

Instructor: Daniel E. Bontempo Ph.D.

Dr. Bontempo has a BA in computer science from Loyola University in New Orleans, and both an MAS in applied statistics and a quantitative PhD in Human Development from the Pennsylvania State University. Over his career as a research scientist and statistical consultant, Dr. Bontempo has delivered advanced multilevel/multivariate modeling solutions for complex longitudinal datasets at the Penn State Prevention Science Research Center, The Oregon State Center for Healthy Aging, the Kansas University Institute for Lifespan Studies, and the Texas Tech Institute for Measurement, Methodology, Analysis, and Policy.

Robust, readable code has always been at the center of Dr. Bontempo’s statistical practice. From teaching computer programming in the 1980s, to scripting complex data management and analyses in commercial packages such as SAS, Stata, and SPSS, to more recent use of Tidyverse R-programming methods, Dr. Bontempo has maintained a focus on clean, readable, robust code and scripts in the service of reproducible research and the implementation of both programming and research project best practices.

Course Audience

The camp is ideal for life-science investigators, biostatisticians, program evaluators, and R & D researchers—anyone who is interested in data analysis with R.

Software and Computer Support

A laptop with the latest version of R installed is highly recommended. R can be downloaded, for free, at the R-project website: https://www.r-project.org/. R’s built-in text editor is very primitive, so an additional editor or Integrated Development Environment (IDE) is highly recommended. In class the RStudio IDE will be used, and this can also be downloaded free. The following are three text editor/plug-in combinations that work very well with R:

  1. RStudio (https://www.rstudio.com/)
  2. EMACS with ESS (http://vgoulet.act.ulaval.ca/en/emacs/)
  3. Notepad++ (https://notepad-plus-plus.org/) with NppToR (http://sourceforge.net/projects/npptor/)


R for Data Science, published by O’Reilly, January 2017, first edition. Available for free online at http://r4ds.had.co.nz/ and also available in print or as an ebook from booksellers.

Suggested Additional Readings

Matloff, N. (2011). The art of R programming: A tour of statistical software design. San Francisco: No Starch Press.

Verzani, J. (2014). Using R for introductory statistics. Boca Raton, FL: CRC Press.

Why Should You Attend?

  • Get One-on-One Consultation With the Instructor
  • Professional Networking
  • Peer Socializing
  • Collaboration
  • All Course Resources
  • Breakfast (Embassy guests), Lunches, & Snacks Daily