Computer Tools for Research and Data Analysis

Creation date: 2/25/99
Author: Karl Ho

Objectives: This course introduces the fundamentals of using computers for research and familiarizes researchers with the computing environment at the University of North Texas.  Its objectives include:

1. Understanding the Data Preparation Process;
2. Managing Data Sets for Analysis;
3. Performing Exploratory Data Analysis.

Topics:

  1. Introduction
  2. Data Collection
  3. Data Storage and Transferal
  4. Data Exposition
  5. Data Analysis

1. Introduction

This course focuses on one of the integral building blocks of the scientific research process: data. It covers the gathering and management of data needed for scientific research: collecting data, storing and transferring data, and exploring data as a preliminary analysis.  The last session also illustrates how to present the data and how to perform simple statistical analysis on them.

2. Data Collection

Successful research requires good planning for gathering the data to be analyzed.  The very first step of planning for data collection can alone determine whether the project succeeds.  Hence, principal investigators and researchers must ensure that the data are collected in an appropriate manner using the right instruments, recorded error-free with the right tools, and imported into a statistical application with detailed documentation.

In most cases, data are extracted from their original raw form (e.g. a questionnaire or a measuring instrument log) and recorded using computer-recognizable codes such as ASCII alphanumeric values.  These values can be typed into a word processing file or an electronic spreadsheet, and most present-day statistical applications can read files in these two categories of formats.  For Windows applications such as SPSS, data entry can also be done using the data editor or tables built into the software.  The most commonly used method, however, is the ASCII text file.  Electronic spreadsheet programs are usually bound by the amount of computer memory and, in some cases, stop working when memory runs short or out.  Direct entry into the data table or editor of a statistical program also suffers from memory limits and from a lack of flexibility in controlling the variable formats.  With ASCII text files, users can set up programs to assign formats and control details such as decimal places in a more flexible manner.
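As a rough illustration of assigning formats when reading an ASCII text file, the sketch below slices fixed-width records into named fields and applies implied decimal places. It is written in Python rather than a statistical package purely for illustration; the column layout, field names, and sample records are hypothetical and not taken from any course data set.

```python
# Hypothetical layout of one fixed-width ASCII record (an assumption for this sketch).
# Each entry: (field name, start column, end column, implied decimal places).
# Columns are 1-based and inclusive, as in most codebooks.
LAYOUT = [
    ("id", 1, 3, 0),
    ("age", 4, 5, 0),
    ("income", 6, 11, 2),  # "004525" is read as 45.25
]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width line into named numeric fields."""
    record = {}
    for name, start, end, decimals in layout:
        raw = line[start - 1:end].strip()
        if raw == "":
            record[name] = None  # a blank field is treated as missing
        elif decimals:
            record[name] = int(raw) / 10 ** decimals
        else:
            record[name] = int(raw)
    return record


if __name__ == "__main__":
    for line in ["00134004525", "00229      "]:
        print(parse_record(line))
```

Keeping the layout in one table separate from the parsing loop mirrors the role of a format statement: the data file stays untouched, and a change in decimal places only requires editing the layout.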

File Formats:

ASCII - American Standard Code for Information Interchange
EBCDIC - Extended Binary Coded Decimal Interchange Code
Binary
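The difference between the two character codes can be seen by encoding the same text both ways. The short Python sketch below is only an illustration; it assumes code page 037 (`cp037`), which is one common EBCDIC variant, and the sample text is made up.

```python
text = "AGE 34"

ascii_bytes = text.encode("ascii")
ebcdic_bytes = text.encode("cp037")  # assumption: cp037 as the EBCDIC code page

# The same six characters are stored as different byte values.
print(ascii_bytes.hex(" "))   # 41 47 45 20 33 34
print(ebcdic_bytes.hex(" "))  # c1 c7 c5 40 f3 f4

# Decoding with the matching code recovers the original text;
# decoding with the wrong one would produce unreadable characters.
assert ebcdic_bytes.decode("cp037") == text
```

This is why a text file moved between a mainframe and a PC has to be translated from one code to the other, while a binary file must be moved byte for byte.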

Editors:

Shift Key - Internet Language

3. Data Storage and Transferal

After collecting and recording the data in a file, you need to pay attention to how you keep the files.  It is ALWAYS advisable to keep second and even third copies of each file and to BACKUP the files somewhere other than just a floppy disk.  You may also transfer the files to your computer accounts, such as CMS or UNIX sol accounts, so that if your computer burns down or crashes, you still have a copy on a remote computer.
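One way to automate the second and third copies is sketched below in Python. This is a minimal sketch under assumptions: the function names and directory arguments are hypothetical, and the checksum comparison is an added safeguard that the notes do not prescribe.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def back_up(source, backup_dirs):
    """Copy source into each backup directory and verify every copy."""
    source = Path(source)
    expected = file_digest(source)
    copies = []
    for directory in backup_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves the file's modification time
        target = Path(shutil.copy2(source, directory))
        if file_digest(target) != expected:
            raise IOError(f"backup {target} does not match {source}")
        copies.append(target)
    return copies
```

A call such as `back_up("survey.dat", ["D:/backup", "E:/backup"])` (hypothetical paths) would leave two verified copies in addition to the original.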

Different formats

SPSS system file - *.sys, *.sav
SPSS portable file - *.exp, *.por
SAS system file - *.sd2
Lotus Spreadsheet - *.wk*
Excel Spreadsheet - *.xls
ASCII/DOS text - *.dat, *.prn, *.txt

What mode/formats are they in:

*.sys - binary/ASCII
*.sav - binary/ASCII
*.exp/*.por - binary/ASCII
*.sd2 - binary/ASCII
*.dat - binary/ASCII
*.xls - binary/ASCII
*.wk* - binary/ASCII

How do you store the files on a remote computer?  First, you need a computer account, which you can get from the Computer Center.  Specify whether you need a UNIX sol account (restricted to faculty and graduate students) or a mainframe CMS account.  Consult your professor about the latter.

Using FTP to transfer files
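The transfer mode matters: a text file sent in binary mode, or a binary file sent in ASCII mode, may arrive unusable. The Python sketch below uses the standard `ftplib` module to pick the mode from the file extension. It is a sketch under assumptions: the host, user name, and password are placeholders supplied by the caller, and the classification of extensions as plain text (the ASCII/DOS text types listed above plus SPSS portable files) is an assumption of this example rather than a statement from the notes.

```python
from ftplib import FTP
from pathlib import Path

# Assumption: these extensions hold plain text; everything else is sent as binary.
ASCII_SUFFIXES = {".dat", ".prn", ".txt", ".por", ".exp"}


def transfer_mode(filename):
    """Return "ascii" or "binary" based on the file extension."""
    suffix = Path(filename).suffix.lower()
    return "ascii" if suffix in ASCII_SUFFIXES else "binary"


def upload(host, user, password, filename):
    """Upload one file to a remote account in the appropriate mode."""
    remote_name = Path(filename).name
    # ftplib expects the local file opened in binary mode for both calls
    with FTP(host) as ftp, open(filename, "rb") as handle:
        ftp.login(user, password)
        if transfer_mode(filename) == "ascii":
            ftp.storlines(f"STOR {remote_name}", handle)
        else:
            ftp.storbinary(f"STOR {remote_name}", handle)
```

When in doubt about a file type, binary mode is the safer default for files that will be read back on the same kind of machine, since it copies the bytes unchanged.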


4. Data Exposition

Most researchers are anxious to SEE the data: what they look like, how they are distributed, what can be drawn from them, etc.  First of all, you can use a view table or a spreadsheet to inspect the figures and contents of a data set.  Once you have read the data into a statistical application, you can also examine different dimensions of the variables or plot them in univariate, bivariate and multivariate charts:

Histogram using SPSS
Matrix Scatter Plot using SPSS
Time Series Plot using SAS
Surface Plot using SAS
World Map using Excel
Texas GIS Map using SAS
Trellis Plot using S-Plus
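Even before a chart is drawn, the distribution of a single variable can be checked with a few summary figures and a text histogram. The Python sketch below is only an illustration of that first look; the function names are hypothetical and the scores in the usage note are made up.

```python
from collections import Counter
from statistics import mean, median, stdev


def describe(values):
    """Return basic summary statistics for one numeric variable."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 in the denominator)
    }


def text_histogram(values):
    """Return one line per distinct value, with one star per observation."""
    counts = Counter(values)
    lines = [f"{value:>3} | {'*' * counts[value]}" for value in sorted(counts)]
    return "\n".join(lines)


if __name__ == "__main__":
    happy = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up scores
    print(describe(happy))
    print(text_histogram(happy))
```

For the made-up scores above, the histogram shows a symmetric shape peaking at 3, which matches the mean and median both being 3.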

5. Data Analysis

Once the data are well managed and visualized, you can apply further statistical analysis to study the "data generating process" and the relationships among the variables.  There are two approaches to analyzing the data: a point-and-click approach and a programming approach.  The former is easy to use and suitable for most beginners.  For long-term research projects, the latter is highly recommended because: 1. it gives more flexibility in manipulating the data; 2. some sophisticated procedures and specifications are not available from the menus; 3. it makes replication possible.

Software supported at UNT RSS office:

Software/Version*   Windows   Mac   CMS          MVS          UNIX
SAS                 6.12/7    6.1   6.08/TS425   6.09/TS450   6.12
SPSS                8.0.1/9   6.1   4.1          4.1          6.1
Eviews              3.1       -     -            -            -
LISREL              8.2/8.3   -     7            7            -
S-Plus              4.5       -     -            -            5.0

red font - latest version


The following are a few SPSS sample programs:

T-test

T-TEST
  /TESTVAL=0
  /MISSING=ANALYSIS
  /VARIABLES=happy
  /CRITERIA=CIN (.95) .
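The arithmetic behind a one-sample test against a test value of 0 can be sketched in a few lines of Python. This is an illustration of the textbook formula, t = (mean - test value) / (sd / sqrt(n)), not a reproduction of SPSS output; the function name is hypothetical, and the confidence interval is left out because it needs a t-distribution table or a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, test_value=0.0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # stdev uses n - 1
    t = (mean(values) - test_value) / standard_error
    return t, n - 1


if __name__ == "__main__":
    happy = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # made-up scores
    t, df = one_sample_t(happy)
    print(f"t = {t:.3f} with {df} degrees of freedom")
```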

 

 


One-way ANOVA

ONEWAY
  happy BY educ
  /MISSING ANALYSIS .
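A one-way analysis of variance compares the variation between group means with the variation within groups. The Python sketch below computes the F ratio from that definition; it is a minimal illustration with a hypothetical function name and made-up scores, and it does not compute the p-value.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the scores around their own group mean
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


if __name__ == "__main__":
    # Made-up scores for three education groups
    by_educ = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
    print(one_way_anova(by_educ))
```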

 

 


GLM - General Factorial

UNIANOVA
  life  BY educ  WITH age
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /CRITERIA = ALPHA(.05)
  /DESIGN = age educ .

Linear Regression

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT happy
  /METHOD=ENTER age educ  .
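Entering two predictors at once amounts to an ordinary least-squares fit with an intercept. The Python sketch below solves the normal equations (X'X)b = X'y directly; it is a bare-bones illustration with a hypothetical function name and made-up numbers, it returns only the coefficients, and it assumes the predictors are not perfectly collinear.

```python
def ols(rows, y):
    """Return least-squares coefficients [intercept, b1, b2, ...].

    rows holds one tuple of predictor values per case; y holds the outcomes.
    """
    X = [[1.0] + list(row) for row in rows]  # prepend the intercept column
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


if __name__ == "__main__":
    # Made-up cases: (age, educ) pairs and an outcome built as 1 + 2*age - 1*educ
    predictors = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
    outcome = [1, 4, 3, 6, 6]
    print(ols(predictors, outcome))
```

Because the made-up outcome is an exact linear function of the two predictors, the fit recovers the intercept 1 and slopes 2 and -1 up to rounding error.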

 

 


ACS short courses

Application      Course Title                                    Time
Research Tools   Computer Tools for Research and Data Analysis   3 Hours
SAS              Introduction to SAS                             3 Hours
                 Workshop in S-Plus Programming I                3 Hours
                 Workshop in S-Plus Programming II               3 Hours
                 Mapping Data Using SAS and Excel                3 Hours
SPSS             Introduction to SPSS                            3 Hours
                 Workshop in S-Plus Programming I                3 Hours
                 Workshop in S-Plus Programming II               3 Hours
S-Plus           Introduction to S-Plus                          3 Hours
                 Workshop in S-Plus Programming I                3 Hours
                 Workshop in S-Plus Programming II               3 Hours

Last updated: 01/18/06 by Karl Ho