DSA R Introduction Course: Module 2 Help

Data Science and Analytics

Please participate in the DSA Client Feedback Survey.

Back to the Do it yourself Introduction to R

(1) Reading data into R directly from a URL.

Reading data into R from the web is very easy: you simply give the URL of the file as the location argument of the reading function. There are three common functions for reading, or importing, data into R, regardless of where the data is stored: 'read.table' (for most text files, which have the extension .txt), 'read.csv' (for comma separated values files, which have the extension .csv), and 'read.spss' (for SPSS data files, which have the extension .sav). Note, however, that as discussed in the previous tutorial, you must first load the 'foreign' library before importing SPSS data files (.sav), because 'read.spss' comes from that package.
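A minimal sketch of the point that the same function is used no matter where the data lives. The tiny data set is made up, and the example.com URL is a placeholder (so that line is left commented out); the script writes a temporary .csv file and reads it back with 'read.csv'.

```r
# Made-up example data, written to a temporary local file
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,10", "2,12", "3,NA"), tmp)

# Reading a local file: the first argument is simply the file path
local.data <- read.csv(tmp)

# Reading from the web uses the same call; only the location string changes.
# The URL below is a placeholder, so the line is left commented out:
# web.data <- read.csv("http://www.example.com/path/to/data.csv")

str(local.data)
unlink(tmp)  # delete the temporary file
```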

An example of reading a text (.txt) file into R and naming the data "example.3":
# Read a whitespace-separated text file with a header row directly from the URL
example.3 <- read.table("http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module3/ExampleData3.txt",
  header=TRUE, sep="", na.strings="NA", dec=".", strip.white=TRUE)

summary(example.3)

Notice that the function is 'read.table'. The first argument to that function is the location of the file, given in quotation marks; here it is the URL. The 'header' argument specifies whether or not the first line (top row) of the file contains the variable names. The 'sep' argument specifies what character separates the data values in each row (e.g., a space, a tab, or a comma between the columns of data); the value "" used here means any amount of white space. The 'na.strings' argument specifies which string(s) in the file should be treated as missing values; "NA" is the R default. The 'dec' argument specifies the character used as the decimal point. The 'strip.white' argument specifies whether or not leading and trailing white space is removed from unquoted character fields; it is only used when 'sep' has been specified. The 'read.table' function can also be used with comma separated values files, but the 'read.csv' function is usually more convenient: the two functions are virtually the same, except that 'read.csv' already defaults to header=TRUE and sep=",", so those arguments do not need to be supplied.
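A small, self-contained sketch of the arguments described above. It uses made-up data passed through the 'text' argument of 'read.table' instead of a URL, so it runs without a network connection. The missing-value code -99 and the comma decimal mark are illustrative assumptions chosen so that 'na.strings' and 'dec' do something visible.

```r
# Made-up whitespace-separated text: -99 marks a missing value,
# and a comma is used as the decimal mark
txt <- c("group score",
         "a 1,5",
         "b -99",
         "c 2,25")
d1 <- read.table(text = txt, header = TRUE, sep = "", na.strings = "-99", dec = ",")
d1$score   # 1.5, NA, 2.25

# strip.white = TRUE trims spaces around unquoted character fields when sep is given
names.txt <- c("name,score", " Ann ,3", "Bob ,4")
d2 <- read.csv(text = names.txt, strip.white = TRUE)
as.character(d2$name)   # "Ann" "Bob"

# read.csv behaves like read.table with CSV-friendly defaults (header = TRUE, sep = ",")
csv.txt <- c("id,score", "1,10", "2,12")
a <- read.csv(text = csv.txt)
b <- read.table(text = csv.txt, header = TRUE, sep = ",")
identical(a, b)
```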

An example of reading an SPSS (.sav) file into R and naming the data "example.1":

library(foreign)

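# Read the .sav file from the URL as a data frame, converting
# variables that have SPSS value labels into factors
example.1 <- read.spss("http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module3/ExampleData1.sav",
  use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)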

summary(example.1)

Notice above that the 'foreign' library must be loaded first (perhaps SPSS data is considered foreign to R). The first argument to the 'read.spss' function specifies the location (URL) of the file being retrieved. The 'use.value.labels' argument specifies whether the stored values (FALSE) or the value labels (TRUE) will be used for variables that have value labels (i.e., grouping variables; e.g., gender, with values 1 or 2 and value labels "Male" or "Female"). If this argument is set to FALSE, such a variable is kept as numeric; if it is set to TRUE, the variable is converted to a factor. The 'max.value.labels' argument limits that conversion: only variables with value labels and at most this many unique values are converted to factors (Inf, as used here, means no limit). The 'to.data.frame' argument specifies whether or not the data will be returned as a data frame; if FALSE, the result is a list with one component per variable.
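The sketch below does not read an SPSS file; it only illustrates, with made-up codes, the kind of difference 'use.value.labels' and 'to.data.frame' make. The 1 = "Male", 2 = "Female" coding is borrowed from the example in the text, and 'factor()' is used here only to approximate the conversion 'read.spss' performs, not to reproduce its internals.

```r
# Hypothetical stored codes for a 'gender' variable
codes <- c(1, 2, 2, 1, 2)

# Similar to use.value.labels = FALSE: the variable stays numeric
codes

# Similar to use.value.labels = TRUE: a factor whose levels are the value labels
gender <- factor(codes, levels = c(1, 2), labels = c("Male", "Female"))
table(gender)   # Male 2, Female 3

# With to.data.frame = FALSE, read.spss returns a list (one component per variable);
# as.data.frame() turns such a list into a data frame
spss.list <- list(id = 1:5, gender = gender)
df <- as.data.frame(spss.list)
str(df)
```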

The 'summary' function used in both examples above provides, for each numeric variable in the data, the minimum, 1st quartile, median, mean, 3rd quartile, and maximum, plus the number of missing values (NA's) when any are present. When a variable is a factor (e.g., gender), summary returns the number of cases (rows) at each level of the factor (e.g., "Males" = 103, "Females" = 121). The summary function is perhaps one of the most often used functions in R, and certainly the most frequently used function on this web site. It can be applied to vectors (numeric or factor), matrices, data frames, lists, and fitted model objects (e.g., a regression model, a factor analysis, etc.). As its name implies, it provides a summary of whatever object is passed to it, and the output (or returned value) varies widely depending on the type of object.
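A short sketch of 'summary' applied to different kinds of objects. The scores and gender values are made up for illustration; the model is an ordinary 'lm' regression used only to show that fitted models can be summarized too.

```r
# Made-up data for illustration
score  <- c(4, 8, 15, 16, 23, NA)
gender <- factor(c("Male", "Female", "Female", "Male", "Female", "Female"))
dat    <- data.frame(score, gender)

summary(score)    # Min., 1st Qu., Median, Mean, 3rd Qu., Max., and NA's
summary(gender)   # number of cases at each level of the factor
summary(dat)      # a summary for each variable in the data frame

# A fitted model object: rows with missing values are dropped by default
fit <- lm(score ~ gender, data = dat)
summary(fit)      # coefficients, standard errors, R-squared, etc.
```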

In future tutorial notes, we will be using the R Console and script files; remember that any script can be copied and pasted into the R Console. The script files can also be downloaded and then opened in the R Console or in R Commander using 'File', 'Open script file...' from the Console or Rcmdr menu bar.

When reading the script files, you'll notice the common convention of using # to start a comment. Everything after a # on a line is a comment, which R ignores (it is not working code); lines that do not start with # are working code.
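A tiny illustration of the comment convention; the values are arbitrary.

```r
# This whole line is a comment, so R ignores it
x <- c(2, 4, 6)   # code runs up to the '#'; the rest of the line is a comment
mean(x)           # returns 4
# mean(x) * 100   this line is "commented out", so it is not run
```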


Contact Information
Jon Starkweather, PhD	Jonathan.Starkweather@unt.edu	940-565-4066
Richard Herrington, PhD	Richard.Herrington@unt.edu	940-565-2140

Last updated: 2018.11.06 by Jon Starkweather.