
Module 2: Displaying Data

Jon Starkweather, PhD

jonathan.starkweather@unt.edu
Consultant
Research and Statistical Support



http://www.unt.edu







1. Collecting Data

1.1. Context of an example study

Example Study:
Do students in a statistics class really pay attention to the slides being presented?
We set up hidden digital video cameras to clearly record the faces (and facial expressions) of each student in class.
Next, we alter a typical slide presentation so that in the middle of the presentation (middle of class time) a single, high-resolution image appears for 5 seconds, then disappears.

  • The image chosen displays a 90-year-old man wearing only bright red lipstick and a red string bikini.

Using the videos, we recorded the time (in 100ths of a second) between the image appearing and each student's first visible reaction.

1.2. Variable(s) of Interest

Data Collected: Reaction Time

Table 1: Raw Data
Students 1-10   Students 11-20   Students 21-30   Students 31-40
     60               55               55               57
     50               53               51               56
     62               57               56               52
     61               59               56               58
     59               54               54               53
     59               58               55               56
     57               56               54               51
     57               60               56               55
     58               58               53               56
     52               55               57               56


Reaction Time in 100ths of a second.

1.3. Demographic Variables


Generally, when we collect data, we are also interested in the individuals we collect it from, whether those individuals are persons, schools, corporations, etc.
So, we will often collect information on demographic variables, which help us understand the nature of the individuals (i.e., demographic variables tell us something about our sample).

  • With human participants: age, gender/sex, ethnicity, location, income or socio-economic status (SES), etc.

In our current example, we might also take note of the students' gender and their class standing.

  • Male (0) or Female (1)
  • Freshman (1), Sophomore (2), Junior (3), Senior (4)

Data: What do we have?

Our data consists of 40 individuals or cases.

  • In this example: humans, specifically students in a statistics class

Our data also consists of three variables.

  • Reaction: the reaction time of each student, expressed in 100ths of a second.
  • Gender: male or female
  • Class Standing: Freshman, Sophomore, Junior, Senior
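One way to picture a single case (one row of the data) in software is as a small record holding the three variables, using the numeric codes listed above. The sketch below is a hypothetical layout of our own (the class `Case` and its field names are not from any package); it only checks that each code belongs to its coding scheme and can translate the codes back into labels.

```python
from dataclasses import dataclass

# Coding schemes used in the example study.
GENDER_LABELS = {0: "Male", 1: "Female"}
STANDING_LABELS = {1: "Freshman", 2: "Sophomore", 3: "Junior", 4: "Senior"}


@dataclass(frozen=True)
class Case:
    """One student (case): a hypothetical record layout for this data set."""
    reaction: int        # reaction time in 100ths of a second
    gender: int          # 0 = Male, 1 = Female
    class_standing: int  # 1 = Freshman, 2 = Sophomore, 3 = Junior, 4 = Senior

    def __post_init__(self) -> None:
        # Reject values that fall outside the coding schemes.
        if self.reaction < 0:
            raise ValueError("reaction time cannot be negative")
        if self.gender not in GENDER_LABELS:
            raise ValueError(f"unknown gender code: {self.gender}")
        if self.class_standing not in STANDING_LABELS:
            raise ValueError(f"unknown class standing code: {self.class_standing}")

    def describe(self) -> str:
        # Translate the numeric codes back into readable labels.
        return (f"{GENDER_LABELS[self.gender]}, "
                f"{STANDING_LABELS[self.class_standing]}, "
                f"reaction = {self.reaction / 100:.2f} s")


# Made-up gender and class standing values, for illustration only.
print(Case(reaction=60, gender=1, class_standing=3).describe())
```

Storing the codes (rather than the words) is what makes counting categories easy later; the labels are only needed when we present the results.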


2. Displaying Data in Tables

Displaying the data

So, what do we do with our data now?

  • Generally, the raw data is not terribly useful.
  • So, the first thing we do is display the data so that we can begin to interpret it.
  • There are many ways of displaying data:
    • We usually start with frequency tables
    • Then use the frequencies to create some figures.

Tables & Figures

A key distinction to make when displaying data (and when talking about displaying data) is the one between Tables and Figures.

  • A Table is made up of straight lines and text content that can be typeset:
    • letters and numbers
    • e.g., Table 1 above, which showed our raw data
  • A Figure is any graphical representation:
    • Wiring schematic, photograph, thermal image, MRI image, topographical map, ...
    • And common graphs, as we will see shortly.

2.1. Frequency Tables


  • There are many kinds of frequency tables and figures; each displays how frequently (how many times) each score occurred, which is commonly called the "distribution" of scores.
  • Frequency tables display:
    • All the possible values of a variable, in order, in the first column.
    • Then the frequency of occurrence of each value in the next column.
      • Frequencies are calculated by simply counting the number of individuals having the specified value.
  • Optionally, percentages, cumulative frequencies, and/or cumulative percentages are listed in subsequent columns.

Frequency Table: Continuous Variable

Table 2: Reaction Time Frequency Table
Reaction Time   Frequency
      50            1
      51            2
      52            2
      53            3
      54            3
      55            5
      56            8
      57            5
      58            4
      59            3
      60            2
      61            1
      62            1
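The counts in Table 2 come from simply counting how many students had each value in the raw data of Table 1. A minimal Python sketch of that counting, assuming the scores are typed in column by column from Table 1 and are whole numbers; the function name `frequency_table` is our own, not part of any statistics package.

```python
from collections import Counter

# Raw reaction times from Table 1 (100ths of a second),
# one line per column: students 1-10, 11-20, 21-30, 31-40.
reaction = [60, 50, 62, 61, 59, 59, 57, 57, 58, 52,
            55, 53, 57, 59, 54, 58, 56, 60, 58, 55,
            55, 51, 56, 56, 54, 55, 54, 56, 53, 57,
            57, 56, 52, 58, 53, 56, 51, 55, 56, 56]


def frequency_table(scores):
    """Return (value, frequency) pairs for every whole value from min to max.

    Values that never occur are listed with a frequency of 0, so the table
    shows all possible values of the variable in order.
    """
    counts = Counter(scores)  # a Counter returns 0 for values it has not seen
    return [(value, counts[value]) for value in range(min(scores), max(scores) + 1)]


for value, freq in frequency_table(reaction):
    print(f"{value:>3}  {freq}")
```

Listing zero-frequency values keeps the "all possible values, in order" layout described above, which matters when a variable has gaps.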

Frequency Table: Categorical Variable

Table 3: Class Standing Frequency Table
Class Standing   Code   Frequency   Percent   Cumulative Percent
Freshman          1         5        12.5           12.5
Sophomore         2        12        30.0           42.5
Junior            3        16        40.0           82.5
Senior            4         7        17.5          100.0


Note: the optional percentage and cumulative percentage columns are also displayed.
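The percent column is each frequency divided by the total number of cases (here 40), times 100; the cumulative percent adds up the frequency of that row and every row above it before dividing. A short Python sketch that computes both columns of Table 3 from its frequencies (the function name `percent_columns` is illustrative only):

```python
# Frequencies of class standing from Table 3, in code order 1-4.
class_standing = [("Freshman", 5), ("Sophomore", 12), ("Junior", 16), ("Senior", 7)]


def percent_columns(freq_table):
    """Add percent and cumulative percent columns to (label, frequency) rows."""
    total = sum(freq for _, freq in freq_table)
    rows = []
    running = 0
    for label, freq in freq_table:
        running += freq  # cumulative frequency up to and including this row
        rows.append((label, freq, 100 * freq / total, 100 * running / total))
    return rows


for label, freq, pct, cum_pct in percent_columns(class_standing):
    print(f"{label:<10} {freq:>3} {pct:6.1f} {cum_pct:6.1f}")
```

The last cumulative percent is always 100, which makes a handy check that no category was left out.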


Table 4: Gender Frequency Table
Gender   Code   Frequency
Male      0        14
Female    1        26

2.2. Stem and Leaf plots


Stem and Leaf plots can be thought of as grouped frequency tables. Each 'stem' is the part of an individual's score that is associated with a group, and each 'leaf' is the part associated with the individual. For example, a reaction time of 57 has stem 5 and leaf 7.

Table 5: Reaction Time Stem and Leaf plot
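Frequency   Stem   Leaf
     3       5     0 1 1
     5       5     2 2 3 3 3
     8       5     4 4 4 5 5 5 5 5
    13       5     6 6 6 6 6 6 6 6 7 7 7 7 7
     7       5     8 8 8 8 9 9 9
     3       6     0 0 1
     1       6     2

Stem = tens digit, leaf = ones digit; each stem is split across rows holding the leaves 0-1, 2-3, 4-5, 6-7, and 8-9.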
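A split-stem display like this can be built mechanically: split each score into its tens digit (stem) and ones digit (leaf), then group the leaves two values per row. A minimal Python sketch working from the raw scores of Table 1; the function name `split_stem_leaf` and the `leaves_per_row` parameter are our own, and rows that would be empty are simply omitted in this sketch.

```python
from collections import defaultdict

# Raw reaction times from Table 1 (100ths of a second),
# one line per column: students 1-10, 11-20, 21-30, 31-40.
reaction = [60, 50, 62, 61, 59, 59, 57, 57, 58, 52,
            55, 53, 57, 59, 54, 58, 56, 60, 58, 55,
            55, 51, 56, 56, 54, 55, 54, 56, 53, 57,
            57, 56, 52, 58, 53, 56, 51, 55, 56, 56]


def split_stem_leaf(scores, leaves_per_row=2):
    """Build a stem-and-leaf display with each stem split into several rows.

    stem = tens digit, leaf = ones digit; with leaves_per_row=2 the rows for
    a stem hold leaves 0-1, 2-3, 4-5, 6-7 and 8-9. Returns
    (frequency, stem, leaves) tuples in ascending order; empty rows are omitted.
    """
    rows = defaultdict(list)
    for score in sorted(scores):
        stem, leaf = divmod(score, 10)
        rows[(stem, leaf // leaves_per_row)].append(leaf)
    return [(len(leaves), stem, leaves)
            for (stem, _), leaves in sorted(rows.items())]


for freq, stem, leaves in split_stem_leaf(reaction):
    print(f"{freq:>3}  {stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Setting `leaves_per_row=10` gives one row per stem, which is the unsplit layout used for the age data.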


A more typical example of a Stem and Leaf plot, this time for age. Each stem represents a decade (for example, stem 2 with leaf 5 is an age of 25).

Table 6: Age Stem and Leaf plot
Frequency   Stem   Leaf
    21       2     011222333344555667899
    16       3     1112233455677889
    14       4     00122345556789
    13       5     2233455556667
     5       6     00126
     3       7     126
     1       8     2

3. Figures and/or Graphs

3.1. Graphing Data


  • Displaying data in Frequency tables is fairly easy and informative when:
    • There are relatively few cases (individuals)
    • There are relatively few possible values (of the variable).
  • Often, however, there are far more cases or possible values than this (or the cases are spread thinly across many values), resulting in long, sparse, uninformative frequency tables.
  • It is almost always beneficial to go beyond the frequency tables and create some sort of graphical display to make the data more interpretable.

3.2. Bar Graphs


Bar Graphs display frequencies graphically and are used when the variable is categorical (discrete).
Recall our demographic variables from our example.

  • Gender
  • Class Standing

Bar Graph of Gender

Figure 1: Gender Bar Graph

Bar Graph of Class Standing

Figure 2: Class Standing Bar Graph
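Graphing software normally draws bar graphs, but the idea can be sketched in plain text: one bar per category, bar length proportional to frequency, and the bars kept apart because the categories are distinct. A minimal Python sketch using the class standing frequencies from Table 3; the function name `text_bar_graph`, the horizontal layout, and the `#` symbol are arbitrary choices for illustration.

```python
# Frequencies of class standing from Table 3, in code order 1-4.
class_standing = {"Freshman": 5, "Sophomore": 12, "Junior": 16, "Senior": 7}


def text_bar_graph(freqs, symbol="#"):
    """Return one text line per category; each symbol stands for one case.

    A blank line separates the bars, mirroring the gaps between the bars of
    a bar graph for a categorical variable.
    """
    width = max(len(label) for label in freqs)
    lines = []
    for label, count in freqs.items():
        lines.append(f"{label:<{width}} | {symbol * count} ({count})")
        lines.append("")  # gap between bars
    return lines[:-1]  # drop the trailing gap


print("\n".join(text_bar_graph(class_standing)))
```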

3.3. Histograms


Histograms are used to graphically display frequencies and are especially useful when the variable's values are numerous.
Recall that our example measured reaction time in 100ths of a second.

  • Suppose we had over a thousand students in our example, and
  • Their reactions ranged from a very late 450 (4.5 seconds) to a very quick 3 (0.03 of a second).
Then a frequency table would be very long, and it would be beneficial to use a Histogram instead. A Histogram:
  • Looks like a bar graph, but the 'bars' have no space between them, and
  • Each bar represents multiple values of the variable (a range of values, often called a bin).
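Grouping scores into bins is the step that turns raw scores into histogram bars. A minimal Python sketch, assuming equal-width bins that start at the smallest score and include their lower limit but not their upper limit; the bin width of 3 is an arbitrary choice for illustration and is not necessarily the binning used in Figure 3. The function name `bin_counts` is our own.

```python
from collections import Counter

# Raw reaction times from Table 1 (100ths of a second),
# one line per column: students 1-10, 11-20, 21-30, 31-40.
reaction = [60, 50, 62, 61, 59, 59, 57, 57, 58, 52,
            55, 53, 57, 59, 54, 58, 56, 60, 58, 55,
            55, 51, 56, 56, 54, 55, 54, 56, 53, 57,
            57, 56, 52, 58, 53, 56, 51, 55, 56, 56]


def bin_counts(scores, width, start=None):
    """Count scores in equal-width bins [start, start+width), [start+width, ...).

    Returns (lower, upper, count) tuples with upper exclusive. Every bin from
    the lowest to the highest score is listed, even if empty, so the bars sit
    side by side with no gaps. start must not exceed the smallest score.
    """
    if start is None:
        start = min(scores)
    counts = Counter((score - start) // width for score in scores)
    last = (max(scores) - start) // width
    return [(start + i * width, start + (i + 1) * width, counts[i])
            for i in range(last + 1)]


# Labels assume whole-number scores (upper - 1 is the last value in a bin).
for lower, upper, count in bin_counts(reaction, width=3):
    print(f"{lower}-{upper - 1}: {'#' * count}")
```

With a width of 3, the 17 scores from 56 to 58 share a single bar, so the separate counts of 8, 5, and 4 from Table 2 can no longer be seen; this is the information loss listed as Con 1 in the Pros and Cons of Histograms.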

Histogram of Reaction Time

Figure 3: Reaction Time Histogram (example study data)

Histogram of a much larger and more widely distributed sample of Reaction Time

Figure 4: Expanded Histogram (not example study data)

Pros and Cons of Histograms

  • Pro 1: Makes a large distribution of scores easy to interpret graphically.
  • Pro 2: Makes spotting outliers easy.
    • Outlier: an extreme case, i.e., a case whose value lies far out at one end or the other of the distribution.
  • Con 1: Some information is lost when bars represent more than one value, because the individual scores within a bar can no longer be distinguished.

3.4. Boxplots


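Boxplots are good for showing where the bulk of the data lies in relation to the tails (whiskers) of a distribution. The box spans the middle half of the scores (from the first to the third quartile), and the line inside the box marks the median.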
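The numbers behind a boxplot are the quartiles of the distribution. A minimal Python sketch using the standard library's `statistics.quantiles` on the Table 1 reaction times. Assumptions: the whiskers follow Tukey's common convention of reaching the most extreme scores within 1.5 times the interquartile range (IQR) of the box, with anything beyond flagged as an outlier. Statistical packages differ slightly in how they compute quartiles and draw whiskers, so Figure 5 may not match this sketch exactly. The function name `boxplot_summary` is our own.

```python
import statistics

# Raw reaction times from Table 1 (100ths of a second),
# one line per column: students 1-10, 11-20, 21-30, 31-40.
reaction = [60, 50, 62, 61, 59, 59, 57, 57, 58, 52,
            55, 53, 57, 59, 54, 58, 56, 60, 58, 55,
            55, 51, 56, 56, 54, 55, 54, 56, 53, 57,
            57, 56, 52, 58, 53, 56, 51, 55, 56, 56]


def boxplot_summary(scores, k=1.5):
    """Return quartiles, whisker ends and outliers for a boxplot.

    Whiskers reach the most extreme scores lying within k * IQR of the box
    (Tukey's convention); scores beyond that are returned as outliers.
    """
    q1, median, q3 = statistics.quantiles(scores, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [s for s in scores if low_fence <= s <= high_fence]
    outliers = sorted(s for s in scores if s < low_fence or s > high_fence)
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "lower_whisker": min(inside), "upper_whisker": max(inside),
        "outliers": outliers,
    }


print(boxplot_summary(reaction))
```

With these data the quartiles come out as 54, 56, and 58 (IQR = 4), the fences sit at 48 and 64, and no score is flagged as an outlier; the "inclusive" method of `statistics.quantiles` gives the same quartiles here.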

Figure 5: Reaction Time Boxplot

Multiple Group boxplots

Here, side-by-side boxplots show the reaction time distributions of males and females.

Figure 6: Reaction Time by Gender Boxplot

3.5. Scatterplots


Scatterplots are used to show how two (or more) variables are distributed together. Here, the 'plain' scatterplot shows reaction time and age.

Figure 7: Reaction Time and Age


Figure 8: Reaction Time and Age with boxplots

Scatterplot Matrix: more than 2 variables

Figure 9: Scatterplot Matrix with histograms on the diagonal

4. Summary of Module 2

4.1. Additional Considerations/Issues

Additional Thoughts on Displaying Data

  • Using tables and figures makes the data more interpretable, or accessible, for ourselves and others.
  • While producing graphs is incredibly easy with computers, we must not forget that the goals of displaying data are understanding the data and conveying it to others.
    • Not simply producing nice graphs.
  • When creating graphs, be very careful about how you scale each axis.
    • Even a simple graph has great power to shape how readers perceive the data.
  • An example follows.

Scale Importance

Here is exactly the same information as displayed in Figure 6.

  • Changing the scale has made the two genders look more similar by compressing the boxplots.
Figure 10: Reaction Time by Gender Boxplot

4.2. Summary of Module 2


Module 2 covered the following topics:

  • Collecting Data
    • Context of an example study
    • Variable(s) of interest
    • Demographic variables
  • Displaying Data in Tables
    • Frequency Tables
    • Stem & Leaf plots
  • Displaying Data in Graphs
    • Bar Graphs
    • Histograms
    • Boxplots
    • Scatterplots
  • Additional Considerations & Issues

4.3. What's next

This concludes Module 2

Next time: Module 3.

  • Next time we'll begin covering descriptive statistics.
  • Until next time, have a nice day.




These pages were last updated on: October 4, 2010



This document was created in LaTeX and converted to HTML using LaTeX2HTML.



jds0282 2010-10-04