UIT
| Help Desk |
Training |
About Us
|
Publications
| DSA Home
(1) Reading data into R directly from a URL.
Reading data into R from the web is easy: you simply give the URL of
the file as its location in the function call. There are three common
functions for reading, or importing, data into R, regardless of where
the data is stored. Those functions are 'read.table' (for most text
files -- which usually have the extension .txt), 'read.csv' (for comma
separated values files -- which have the extension .csv), and
'read.spss' (for SPSS data files -- which have the extension .sav).
Note, however, as was discussed in the previous tutorial, that you must
load the 'foreign' library before importing SPSS data files (.sav).
An example of reading a text (.txt) file into R and naming the data "example.3":
example.3 <- read.table("http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module3/ExampleData3.txt",
header=TRUE, sep="", na.strings="NA", dec=".", strip.white=TRUE)
summary(example.3)
Notice that the function is 'read.table'. The first argument to that
function is the location of the file, in quotation marks, given here as
a URL. The 'header' argument specifies whether or not the file has a
header of names as the first line (top row). The 'sep' argument
specifies what character separates each data point (e.g., a space, a
comma, a tab, etc. between each of the columns of data); the empty
string used here, sep="", means any white space. The 'na.strings'
argument specifies which strings identify missing values -- the common
R default being "NA". The 'dec' argument specifies what character is
used for a decimal point. The 'strip.white' argument specifies whether
or not the function will remove the white space from before and after
unquoted character fields; it is used only when 'sep' has been
specified. The 'read.table' function can be used with comma separated
values files, but the 'read.csv' function is more convenient for them
(i.e. the two functions are virtually the same, except that 'read.csv'
already defaults to header=TRUE and sep=",", so it does not need the
'sep' or 'strip.white' arguments).
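To make the comparison between 'read.table' and 'read.csv' concrete, here is a minimal sketch that runs without an internet connection. The data values and the names 'txt_file', 'csv_file', 'from_txt' and 'from_csv' are made up for illustration; they are not part of the tutorial's example files.

```r
# Hypothetical data written to temporary files so the sketch runs offline.
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id score group",
             "1 10.5 a",
             "2 NA b",
             "3 7.25 a"), txt_file)

# White-space separated text: sep = "" means "any run of white space".
from_txt <- read.table(txt_file, header = TRUE, sep = "",
                       na.strings = "NA", dec = ".")

# The same values as a comma separated file, with padding around the fields.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("id,score,group",
             "1, 10.5, a",
             "2, NA, b",
             "3, 7.25, a"), csv_file)

# read.csv already defaults to header = TRUE and sep = ",";
# strip.white = TRUE removes the padding around each field.
from_csv <- read.csv(csv_file, na.strings = "NA", strip.white = TRUE)

summary(from_txt)
summary(from_csv)
```

Both calls should produce the same three columns. Replacing the temporary file name with a URL in quotation marks reads the file from the web instead.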
An example of reading an
SPSS (.sav) file into R and naming the data "example.1":
library(foreign)
example.1 <- read.spss("http://bayes.acs.unt.edu:8083/BayesContent/class/Jon/R_SC/Module3/ExampleData1.sav",
use.value.labels=TRUE, max.value.labels=Inf, to.data.frame=TRUE)
summary(example.1)
Notice above that the 'foreign' library must be loaded first; perhaps
SPSS data is considered foreign to R... At any rate, you'll notice the
first argument to the 'read.spss' function specifies the location (URL)
of the file being retrieved. The 'use.value.labels' argument specifies
whether the values (FALSE) or the value labels (TRUE) will be displayed
for factor variables (i.e. grouping variables; e.g., gender; values = 1
or 2, value labels = "Male" or "Female"). Note that if this argument is
set to FALSE, the variable will be considered numeric, and when it is
set to TRUE, the variable will be considered a factor. The
'max.value.labels' argument sets a limit: only variables with value
labels and at most this many unique values are converted to factors
(Inf, as used here, means no limit). The 'to.data.frame' argument
specifies whether or not the data will be returned as a data frame; if
FALSE, the data will be returned as a list with one component for each
variable.
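Because 'read.spss' needs an actual .sav file, the following is only a rough sketch of what 'use.value.labels' changes. It mimics the effect with base R's 'factor' function on made-up codes rather than calling 'read.spss' itself. The object names and data are hypothetical.

```r
# Hypothetical SPSS-style codes for a gender variable: 1 = "Male", 2 = "Female".
gender_codes <- c(1, 2, 2, 1, 2)

# Roughly what use.value.labels = FALSE leaves you with: plain numbers.
as_numeric <- gender_codes

# Roughly what use.value.labels = TRUE gives: a factor showing the labels.
as_factor <- factor(gender_codes, levels = c(1, 2),
                    labels = c("Male", "Female"))

summary(as_numeric)  # minimum, quartiles, median, mean, maximum
summary(as_factor)   # number of cases at each level
```

The numeric version is summarized as if the codes were measurements, which is rarely meaningful for a grouping variable. The factor version is summarized as counts per group.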
The 'summary' function, used in both examples above, simply provides
the minimum value, 1st quartile value, median, mean, 3rd quartile
value, maximum value, and how many cells contain missing data for each
numeric variable in the data. When a variable is a factor (e.g.,
gender), summary returns the number of cases/rows at each level of the
factor (e.g. "Male" = 103, "Female" = 121). The summary function is
perhaps one of the most often used functions in R and certainly the
most frequently used function on this web site. It can be applied to
vectors (numeric or factor), matrices, data frames, lists, and fitted
model objects (e.g., a regression model, a factor analysis, etc.). As
its name implies, it simply provides a summary of whatever object is
passed to it, and the output (or returned values) varies widely
depending on the object on which it is run.
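As a small illustration of how the output of 'summary' depends on the object, the sketch below applies it to a numeric vector, a factor, a data frame, and a fitted regression model. All data and object names here are made up for the example.

```r
# Hypothetical toy data.
scores <- c(12, 15, NA, 20, 18)   # numeric vector with one missing value
gender <- factor(c("Male", "Female", "Female", "Male", "Female"))
toy    <- data.frame(scores = scores, gender = gender,
                     hours = c(1, 2, 3, 4, 5))

num_summary <- summary(scores)  # Min., 1st Qu., Median, Mean, 3rd Qu., Max., NA's
fac_summary <- summary(gender)  # number of cases at each level
df_summary  <- summary(toy)     # one block of summaries per variable
fit_summary <- summary(lm(scores ~ hours, data = toy))  # coefficients, R-squared, etc.

print(num_summary)
print(fac_summary)
print(df_summary)
print(fit_summary)
```

Note that the regression is fitted on the four complete rows only, because 'lm' drops rows with missing values by default.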
In future tutorial notes, we will be using the R Console and script
files, but remember that all scripts can be copied and pasted into the
R Console. The script files can also be downloaded and then opened with
the R Console or in R Commander using ‘File’, ‘Open script file…’ in
the Console or Rcmdr top task bar.
When reading the script files, you'll notice the common convention of
using # to start a comment line (which is not working code), while
lines without # are working code.
Last
updated: 2018.11.06 by Jon Starkweather.