http://uit.unt.edu/research

Data Science and Analytics


NOTE: Please read the FAQ thoroughly before contacting our office.
Web link for requesting support and/or appointments with DSA staff

Please participate in the DSA Client Feedback Survey.

Do-it-yourself Introduction to R

R is a free statistical programming language and environment. It is completely free to anyone, like the air you breathe.

For more information on why everyone should be using R, see here.

The goal of this site is to help newcomers overcome the intimidation associated with learning the very basics of R and to show them the tools for continued use. Let's get started.

Some assumptions: This site assumes you are using a Windows operating system (Mac folks see here) and have a basic understanding of file structures and paths. You will also need administrator privileges to install R. Some of the notes linked on this page are standard HTML pages; most of the links on this page are in R script file format (they have the file extension .R). Beyond that, the site and any instructions or links on it should be self-explanatory. It is STRONGLY recommended that you progress through the modules in order.

A brief explanation of this site is here.

UPDATE NOTE: As of July 5, 2019, the current R version is 3.6.1.

These pages have been tested for use with SRware Iron, Firefox, and Chrome; other browsers may display the pages incorrectly.

Course Materials and Supplemental Materials

Part I: Introduction

An Adobe PDF instructional manual covering the first 3 modules; it contains the same content as the HTML pages linked in the 3 modules below.

Module 1: Download and Install R.

First, start with these Introductory Notes 1.

Ready to download and install?

Then click here and choose "Windows"; you will only need the "base" install.

Introductory Notes 2

Module 2: Packages/Libraries

Installing and Updating Packages

Loading and using Libraries

     An example of an Rprofile.site file

How to find Help 1

Keeping R up to date with new packages
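The package tasks in this module (installing, updating, loading) come down to a few base R calls. The sketch below is a minimal illustration under stated assumptions, not the code in the linked notes: the helper names `find_missing()` and `install_missing()` are hypothetical, the mirror is the public cloud.r-project.org CRAN mirror, and installing requires an internet connection.

```r
# Hypothetical helper: return the packages in 'pkgs' that are not installed.
# requireNamespace() returns FALSE (rather than raising an error) when a package is absent.
find_missing <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Hypothetical helper: install only what is missing (needs an internet connection).
install_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  needed <- find_missing(pkgs)
  if (length(needed) > 0) {
    install.packages(needed, repos = repos, dependencies = TRUE)
  }
  invisible(needed)
}

# Typical interactive session (commented out so the sketch runs offline):
# install_missing(c("psych", "car"))   # install any packages that are missing
# update.packages(ask = FALSE)         # update installed packages from CRAN
# library(psych)                       # attach a package for this session

# 'stats' and 'utils' ship with every base installation, so nothing is missing here.
find_missing(c("stats", "utils"))
```

A call such as `install_missing(c("psych", "car"))` followed by `library(psych)` installs whatever is absent and then attaches the package for the current session.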

Module 3: Getting Data into R

Read Data into R using Rcmdr  

Read data into R directly from a web address (URL)

Browsing to find the data you want to import into R

Importing many Excel files; each with multiple sheets.
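Reading a plain-text data file takes one base R call, and `read.csv()` accepts either a local path or a web address. The minimal sketch below writes a small made-up table to a temporary file so it runs offline; the commented URL is a placeholder, not a real data location.

```r
# Build a tiny illustrative data set (values are made up) and save it as CSV.
example <- data.frame(id = 1:3, group = c("a", "b", "a"), score = c(12.5, 9.0, 14.25))
path <- tempfile(fileext = ".csv")
write.csv(example, path, row.names = FALSE)

# Read it back; the same call works with an http(s) address in place of 'path', e.g.
# dat <- read.csv("https://example.com/some-data.csv")   # placeholder URL
dat <- read.csv(path, header = TRUE, stringsAsFactors = FALSE)

str(dat)       # check variable names and types after import
summary(dat)   # quick sanity check of the imported values
```

For SPSS .sav files, the 'foreign' package listed on this page provides `read.spss()`.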

Module 4: Initial Data Processing

Some Initial Processing 1

Some Descriptive statistics and associated graphical displays

A very easy way to produce simple, customizable graphs without writing any script!

Some Slicing and Dicing of data (restructure data from Long to Wide)

Recommended (*NEW) technique for Multiple Missing Value Imputation

Good (but slow) technique for Multiple Missing Value Imputation

Some (*OLDER) Robust techniques for Multiple Missing Value Imputation

Some quick 'prettyR' examples for obtaining common summary and descriptive statistics.

Some examples of common variable conversions

Some examples of Recoding Likert response variables to numeric

Creating Composite Scores (i.e., indicator variables) with factor analysis.

Creating Composite Scores (i.e., indicator variables) with principal component analysis.

Using the 'paste' function to create a character string sequence.

Multivariate outlier detection with Mahalanobis' distance.
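As a rough sketch of the Mahalanobis approach (not the linked script itself), the base `mahalanobis()` function gives each case's squared distance from the multivariate centroid, which can be compared against a chi-square cutoff. The data are simulated, and the 0.999 cutoff is an assumption; other cutoffs are also in use.

```r
set.seed(1)                               # reproducible illustration
x <- matrix(rnorm(200), ncol = 2)         # 100 simulated cases on 2 variables
x[1, ] <- c(8, -8)                        # plant one obvious multivariate outlier

# Squared Mahalanobis distance of each case from the centroid
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Under multivariate normality d2 is roughly chi-square with p = ncol(x) df;
# flag cases beyond an assumed conservative cutoff (the 0.999 quantile)
cutoff  <- qchisq(0.999, df = ncol(x))
flagged <- which(d2 > cutoff)
flagged
```

Because the ordinary mean and covariance are themselves pulled toward outliers, robust estimates (for example from the 'robustbase' or 'mvoutlier' packages listed on this page) are often preferred in practice.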

 

Persistent links to documentation for each package used on this site.

foreign  Rcmdr  XLConnect  car  Hmisc  mix  mvnmle  MBESS  MASS  psych  QuantPsyc  relaimpo  boot  mvoutlier  robust  robustbase  leaps  chemometrics  sem  WRS2  VIM  Amelia  lattice  latticist  lmtest  polycor  nls2  locfit  sm  homals  mlogit  ca  bootStepAIC  BMA  lavaan  RcmdrPlugin.epack  tseries  GPArotation  SeqKnn  rrcovNA  lme4  prettyR  multilevel  arm  MCMCpack  coda  scatterplot3d  gvlma  lmSupport  BayesFactor  mvtnorm  LearnBayes  MatchIt  norm  DAAG  Design  plspm  semPLS  perturb  fortunes  RColorBrewer  quantmod  rrp  fdrtool  multtest  yacca  yhat  rJava  fractaldim  xlsxjars  xlsx  tm  SnowballC  akima  doParallel  foreach  missForest  semPlot  GrapheR  bindata  semTools  beepr  ggmap  ggplot2  GGally  rworldmap  plyr  mgcv  matrixpls

A script file which can be used to install the packages listed above (and their dependencies).

List of packages included with every 'base' installation of R

 

Only a small fraction of the help available for using R.

Research and Statistical Support statistical resources workshop

The R Project for Statistical Computing

The Comprehensive R Archive Network (CRAN)

CRAN Contributed Documents which are 'How-to' guides and 'Getting Started with R' tutorials.

CRAN Task Views offer 'how-to' information on many common tasks in R.

Massive searchable database for R packages: https://rdrr.io/

The complete R Language Definition (on CRAN)

RStudio homepage.

R-Forge for the latest, in-development packages

One R Reference Card (there are several), a 'data mining' specific R Reference Card

R-specific search engine: RSeek.

A repository of R-related blogs and blog posts: R-bloggers

Very helpful site if you are coming to R from SAS, SPSS, or Stata: Quick-R

Informative site for folks in Psychology and related fields (site includes an entire textbook: here).

Dr. Rich Herrington's fairly comprehensive R & S-Plus web page.

Dr. Thomas Lumley's two-day R short course notes

Dr. Paul Johnson's very helpful Rtips (a.k.a. Stats 'R' Us)

CRAN Task View: Multivariate Statistics

YouTube Statistics with R (part 1 of many)

Some links to graph galleries:

http://r-statistics.co/Top50-Ggplot2-Visualizations-MasterList-R-Code.html

 

Part II: Intermediate

Module 5: Basic Group Differences

T-tests & Analysis of Variance

Be careful and thorough when assessing group equality.

Some examples of False Discovery Rate (FDR) control
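A minimal sketch of the three ideas in this module using only base R and built-in data sets: a Welch t-test, a one-way ANOVA, and Benjamini-Hochberg FDR adjustment. The data sets (mtcars, PlantGrowth) and the p-values fed to `p.adjust()` are illustrative choices, not taken from the linked scripts.

```r
# Welch two-sample t-test: miles per gallon by transmission type (am: 0 = automatic, 1 = manual)
tt <- t.test(mpg ~ am, data = mtcars)
tt$p.value

# One-way ANOVA: plant dry weight by treatment group
fit <- aov(weight ~ group, data = PlantGrowth)
summary(fit)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Controlling the False Discovery Rate across several tests with Benjamini-Hochberg
raw_p <- c(0.001, 0.008, 0.020, 0.041, 0.300)   # illustrative p-values, not real results
adj_p <- p.adjust(raw_p, method = "BH")
cbind(raw_p, adj_p)
```

Each adjusted value is never smaller than its raw p-value, which is why a test that looks significant on its own (0.041 here) can fail to survive the FDR correction.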

Module 6: Correlation & Linear Regression Models

Basic & Robust Correlations 

Example of the importance of heterogeneous correlations

Basic & Robust Linear Regression Models

Some simulation with Linear (OLS) regression and a second example

Some regression diagnostics, including the Global Test of Linear Model Assumptions

Some implementations of Cross Validation applied to OLS regression.

Data generation for and exploration of Cross Validation (data used in link directly above)

A quick illustration of 'Funkiness' among the variables of a seemingly good regression model, and a second example here

Odd multicollinearity patterns with varying variances and means -- null and alternative models.

A very handy package (examples & explanations included) for working with Linear Models

Demonstrations of finding the right OLS parameters (might be useful for "in class" demos)

Illustration of why too many predictors are a bad thing

Assessing multicollinearity with the 'perturb' package

A simple example of matching cases to reduce multicollinearity prior to modeling
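The sketch below fits an OLS model with `lm()` and estimates its out-of-sample error with a hand-rolled 5-fold cross-validation loop. It uses the built-in mtcars data; the predictors and the choice of k = 5 are assumptions for illustration, and the linked scripts may implement cross validation differently.

```r
set.seed(42)                                   # reproducible fold assignment
fit <- lm(mpg ~ wt + hp, data = mtcars)        # OLS fit on the full built-in data
summary(fit)$r.squared                         # in-sample fit

k <- 5
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))   # random fold labels
cv_mse <- numeric(k)
for (i in seq_len(k)) {
  train_set <- mtcars[folds != i, ]
  hold_out  <- mtcars[folds == i, ]
  fold_fit  <- lm(mpg ~ wt + hp, data = train_set)
  pred      <- predict(fold_fit, newdata = hold_out)
  cv_mse[i] <- mean((hold_out$mpg - pred)^2)   # out-of-sample mean squared error
}
mean(cv_mse)                                   # cross-validated MSE
mean(residuals(fit)^2)                         # in-sample MSE, usually the smaller of the two
```

A robust counterpart, `MASS::rlm()`, accepts the same formula interface as `lm()`.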

Module 7: Other Regression Related Analysis

Testing Mediation with the Aroian test and OLS regression

Testing Moderation with Simple Slopes Analysis using OLS regression

An example of Binomial or Binary logistic regression (data made with this script)

Demonstration to understand logistic regression coefficients and reference categories.

An example of Multinomial Logistic Regression

An example of Discriminant Function Analysis

Brief example of Categorical Regression with Optimal Scaling

Exploration of Linear Mixed Effects Models (e.g., Hierarchical Linear Modeling).

Introduction and demonstration of simple (bivariate) smoothers.

Brief examples of NON-linear regression analysis with graphs

Gentle introduction to Generalized Additive Modeling (GAM).
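Binary logistic regression in base R is a `glm()` call with `family = binomial`. The minimal sketch below uses the built-in mtcars data (transmission type predicted from weight), an illustrative choice rather than the data used in the linked script; exponentiated coefficients are odds ratios.

```r
# Probability that a car has a manual transmission (am = 1) as a function of
# weight (wt, in 1000 lbs), using the built-in mtcars data
fit <- glm(am ~ wt, data = mtcars, family = binomial)
summary(fit)$coefficients

exp(coef(fit))   # odds ratios: multiplicative change in the odds per 1000 lbs

# Predicted probabilities for two hypothetical car weights
probs <- predict(fit, newdata = data.frame(wt = c(2.5, 3.5)), type = "response")
probs
```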

 


The Rice Virtual Lab in Statistics: http://onlinestatbook.com/rvls/ 

David Lane's site offers a good refresher for some basic content: http://davidmlane.com/hyperstat/index.html

STATSOFT: http://www.statsoft.com/textbook/

A great deal of useful information is available at the Personality Project including an entire textbook for "using R for personality research."


Part III: Advanced

Module 8: Principal Components Analysis, Factor Analysis and Related

Principal Components Analysis, Maximum Likelihood Factor Analysis & scale reliability

Creating Composite Scores (i.e., indicator variables) with factor analysis.

Creating Composite Scores (i.e., indicator variables) with principal component analysis.

Determining the most appropriate number of factors to extract using VSS

Demonstration of factor rotations and what they do.

Data generation and simulation of various Factor Analytic Models.

Data generation and fitting of binary item factor models.

An example of Bootstrapped Factor Analysis

An example of Hierarchical Factor Analysis

Simple 2 & 3 variable Correspondence Analysis

Brief examples of Multidimensional Scaling
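A brief sketch of the two core techniques in this module with base R: `prcomp()` for principal components and `factanal()` for maximum likelihood factor analysis. The USArrests data and the simulated two-factor item set are illustrative assumptions, not the data used in the linked scripts.

```r
# Principal components on the standardized built-in USArrests data (4 variables)
pca <- prcomp(USArrests, scale. = TRUE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)   # proportion of total variance per component
round(var_explained, 3)
pca$rotation                                    # loadings
pc1_score <- pca$x[, 1]                         # first-component composite score per state

# Maximum likelihood factor analysis on simulated items with two underlying factors
set.seed(7)
f1 <- rnorm(300)
f2 <- rnorm(300)
items <- data.frame(
  a1 = f1 + rnorm(300, sd = 0.6), a2 = f1 + rnorm(300, sd = 0.6), a3 = f1 + rnorm(300, sd = 0.6),
  b1 = f2 + rnorm(300, sd = 0.6), b2 = f2 + rnorm(300, sd = 0.6), b3 = f2 + rnorm(300, sd = 0.6)
)
ml <- factanal(items, factors = 2, rotation = "varimax")
print(ml$loadings, cutoff = 0.3)                # hide small loadings for readability
```

With this simulated structure the a-items and b-items should load on separate factors, which makes it easy to see what the rotation is doing.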

Module 9: Structural Equation Modeling (SEM) and Related

SEM example run in R using SEMData.sav (same as this: Stage 1 and Stage 2)

An intuitive way to do latent variable modeling (CFA, SEM, & latent growth)

Dr. Rich Herrington's CFA & SEM in R simulation which runs in your browser

Estimation of a bi-factor model in packages 'sem' & 'lavaan' using simulated data

Multiple-group bi-factor model tested with the Satorra-Bentler Difference Test

Iterated simulation of Multi-group bi-factor model showing SB Difference Test behavior

Exploring Measurement Invariance in CFA & SEM models.

Example SEM with one hierarchical latent factor

Some examples of Partial Least Squares (PLS) modeling (including SEM/Path Modeling).

Generation of data for the PLS script above.

Moderation & Mediation in PLS Modeling 

A more efficient and consistent way to do PLS Path Modeling.

Module 10: Time Series and Related

Brief example of how to get R to recognize a column of dates.

Brief example(s) of Time Series analysis (using BP oil stock prices; circa Apr. 2010)

Brief look at some financial indices and how to access them.

How to export tables, graphs, and raw data to Excel

How to scrape data from a webpage.

Super Bowl 50 ticket price absurdity!
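Getting R to recognize dates usually means converting a character column with `as.Date()` and a format string that matches the text. The prices below are made-up numbers, not the BP data from the linked script, and the month/day/year format is an assumption about how the raw file is written.

```r
# Illustrative prices (made-up numbers)
raw <- data.frame(
  day   = c("04/01/2010", "04/05/2010", "04/06/2010", "04/07/2010"),
  price = c(60.48, 61.02, 60.75, 59.90)
)

# Convert the character column to class "Date"; the format string must match the text
raw$day <- as.Date(raw$day, format = "%m/%d/%Y")
class(raw$day)          # "Date"
diff(raw$day)           # gaps between observations, in days

# Daily log returns, a common starting point for financial series
raw$log_return <- c(NA, diff(log(raw$price)))

# A regular (e.g., monthly) series is stored as a 'ts' object with a start and frequency
monthly <- ts(c(5, 7, 6, 9, 8, 10), start = c(2010, 1), frequency = 12)
time(monthly)
```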

Module 11: Bayesian Methods

Recommended Bayesian Books (Last updated: Sep. 12, 2011). Summary of each here.

Explicit Bayes: a very simple introduction to Bayesian calculation and inference.

Using Bayes Factors for Bayesian versions of t-test & one-way ANOVA

Using Bayesian Model Averaging to address the variable selection problem

Brief example of Bayesian Generalized Linear Modeling (e.g., regression)

Brief example of Bayesian Factor Analysis with MCMC methods

How Bayes Rule was used to find a missing H-bomb and nuclear sub.
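In the spirit of "explicit Bayes", the sketch below computes a posterior by brute force on a grid and checks it against the exact conjugate answer. The data (7 successes in 10 trials) and the flat Beta(1, 1) prior are illustrative assumptions.

```r
theta      <- seq(0, 1, length.out = 1001)      # grid of candidate proportions
prior      <- dbeta(theta, 1, 1)                # flat prior
likelihood <- dbinom(7, size = 10, prob = theta)
posterior  <- prior * likelihood
posterior  <- posterior / sum(posterior)        # normalize so the grid weights sum to 1

grid_mean <- sum(theta * posterior)

# The Beta prior is conjugate to the binomial, so the exact posterior is Beta(1 + 7, 1 + 3)
exact_mean <- (1 + 7) / (1 + 7 + 1 + 3)         # = 8/12
c(grid = grid_mean, exact = exact_mean)

# 95% equal-tailed credible interval from the exact posterior
qbeta(c(0.025, 0.975), 8, 4)
```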

Module 12: Miscellaneous

Some common and useful Data Simulation functions.

Generating a Normal Curve

Generating an ogive, binomial, logistic function

Create strong passwords

Basic example of Parallel Processing

Basic example of Text Mining.

Basic example of Geographical Mapping Data.

Simulation of the Central Limit Theorem with graphs

Beware of Simpson's Paradox

The Two Envelopes Problem in R

The German Tank Problem in R

Demonstration and simulation of the Kruskal Count card trick.

Can Sums-of-Squares be equal for two groups of different sizes?

A minor Monte Carlo-like Flea Circus.

A reminder about the importance of graphing data: Anscombe's Quartet

Some examples of Interesting Small Models with Graphs

Fun with Plots and Graphs (a limited and random collection of examples)

The world famous SpiroGraph and graduated color wheels. Other interesting graphs.

A few examples of Spirals and Spheres

How to create SVG graphs and why that is a useful skill.

"What do you mean by asymptotic?"

Introduction to simple GIF creation in R using ImageMagick

Never at the (exact) same point twice! Strange Attractors from the field of Chaos

Automated variable selection using the bootstrapped stepwise AIC (not model based)

Testing your computer's power using a speed testing script.

Playing around with Piano Composition using R.

The Standard Formula for average weight of "normal" humans.

The first 7 Magic Squares....

Making R noisy....

Some R humor   

The Matrix

Simulated Blackjack game in R
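A minimal Central Limit Theorem simulation, assuming an Exponential(1) population (an illustrative choice, not necessarily what the linked script uses): means of repeated samples cluster around the population mean with spread close to 1/sqrt(n), and their histogram looks approximately normal.

```r
set.seed(123)          # reproducible simulation
n    <- 30             # size of each sample
reps <- 10000          # number of samples drawn

# Exponential(rate = 1) is strongly skewed, with mean 1 and standard deviation 1
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

mean(sample_means)     # close to the population mean, 1
sd(sample_means)       # close to 1 / sqrt(n), about 0.183

# The histogram of the means looks roughly normal despite the skewed population
hist(sample_means, breaks = 50, freq = FALSE,
     main = "Means of 30 Exponential(1) draws", xlab = "Sample mean")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, lwd = 2)
```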

 


Statistical learning textbooks:

http://www-bcf.usc.edu/~gareth/ISL/index.html

http://statweb.stanford.edu/~tibs/ElemStatLearn/

 


The DSA SPSS short course
The DSA SAS short course


Contact Information

Jon Starkweather, PhD

Jonathan.Starkweather@unt.edu 940-565-4066

Richard Herrington, PhD

Richard.Herrington@unt.edu

940-565-2140


Last updated: 2019.01.24 by Jon Starkweather.

Copyright 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019 by Jonathan D. Starkweather.
