Examplonia¹ is a small island country of approximately one million
adult citizens.
The population is dispersed across three regions (western, central, and
eastern) and among 15 cities. The census bureau of Examplonia maintains
records of each adult citizen. You can find a description of the 62
variables collected from Examplonia’s citizens below.
Instructors at the University of North Texas (UNT) interested in
obtaining a random sample of Examplonia’s data can request one from
this page’s author. When requesting a sample, please indicate the
following: your first and last name (as listed in the university
directory); the course prefix, number, and title of the class for which
you are requesting the sample; the percentage of (random) missing
values you want the sample to contain (e.g., 0%, 5%, 10%); and the
sample size (i.e., the number of citizens/cases). Requested sample data
files will be sent as attachments through UNT email. Each file is plain
text (your_name_date.txt) in comma-delimited format, with variable
names across the top row of data and, if any percentage of missing
values is requested, NA marking each missing value.
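A file in the format described above can be read with any tool that
handles comma-delimited text. The following is a minimal sketch using
Python's standard csv module, assuming the literal string NA marks a
missing value. The inline text stands in for a real file; its three
columns and their values are made up for illustration, and the helper
name read_sample is hypothetical.

```python
import csv
import io

# Hypothetical stand-in for a requested sample file. A real file would
# contain all 62 columns and could be opened with open("your_name_date.txt").
sample_text = """id,age,income
1,34,41000
2,NA,38500
3,51,NA
"""

def read_sample(handle):
    """Return a list of row dicts, with the string 'NA' replaced by None."""
    reader = csv.DictReader(handle)  # the first row supplies the variable names
    return [
        {name: (None if value == "NA" else value) for name, value in row.items()}
        for row in reader
    ]

rows = read_sample(io.StringIO(sample_text))

# Count the missing values in each variable.
missing_per_variable = {
    name: sum(1 for row in rows if row[name] is None) for name in rows[0]
}
print(missing_per_variable)
```

All values are read as text here; numeric variables such as age or
income would still need to be converted before analysis.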
Section 1: Demographic Variables
The first 7 columns of data are used to describe the characteristics of
the citizens.
Variable name: id
Each adult citizen is assigned a sequential id number, which simply
identifies them among their peers.
Variable name: region
Each citizen’s region of residence. There are three regions in
Examplonia: I, II, and III, which correspond to west, central, and
east.
Variable name: city.names
Each citizen’s city of residence; there are 15 cities in Examplonia.
The western region (I) contains the cities: Seeatile, Portly, San
Francis, Los Angelinas, and San Dingo. The central region (II) contains
the cities: Fargis, Omah, Tilsa, Astin, and El Piaso. The eastern
region (III) contains the cities: Bahston, New Jork, Washlesston,
Carlot, and Myami.
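Because each city belongs to exactly one region, the two variables can
be cross-checked against each other. The sketch below assumes the data
files store the region as the text codes I, II, and III and spell the
city names exactly as listed above; the names CITIES_BY_REGION,
REGION_OF_CITY, and region_matches_city are hypothetical.

```python
# Region codes and city names as listed in the description above.
CITIES_BY_REGION = {
    "I": ["Seeatile", "Portly", "San Francis", "Los Angelinas", "San Dingo"],
    "II": ["Fargis", "Omah", "Tilsa", "Astin", "El Piaso"],
    "III": ["Bahston", "New Jork", "Washlesston", "Carlot", "Myami"],
}

# Invert the mapping so that each city points to its region.
REGION_OF_CITY = {
    city: region
    for region, cities in CITIES_BY_REGION.items()
    for city in cities
}

def region_matches_city(region, city):
    """True when the recorded region is the one that contains the city."""
    return REGION_OF_CITY.get(city) == region

print(region_matches_city("II", "Tilsa"))
print(region_matches_city("I", "Myami"))
```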
Variable name: gender
Each citizen’s gender; male or female.
Variable name: age
Each citizen’s age; number of years.
Variable name: education
The number of years of formal education for each citizen.
Variable name: income
The annual income of each citizen.
Section 2: Engagement/Activity Survey (3 subscales/domains)
The next 14 columns of data represent survey questions assessing the
levels of cognitive, physical, and social engagement of each citizen.
Variable names: q1.cognitive.1, q2.cognitive.2, q3.cognitive.3,
q4.cognitive.4
These four variables contain the 4-point Likert responses to questions
assessing the cognitive engagement of citizens. Response choices were:
Strongly Agree, Agree, Disagree, and Strongly Disagree. Strong
agreement indicates greater cognitive engagement or activity of
intellectual abilities.
Variable names: q5.physical.1, q6.physical.2, q7.physical.3,
q8.physical.4, q9.physical.5
These five variables contain the 5-point Likert responses to questions
assessing the physical activity (engagement) of citizens. Response
choices were: Hyperactive, More Active, No Difference, Not Very Active,
and Lethargic. Hyperactivity represents greater (i.e., more frequent
and intense) physical activity.
Variable names: q10.social.1, q11.social.2, q12.social.3,
q13.social.4, q14.social.5
These five variables contain the 4-point Likert responses to questions
assessing the social engagement of citizens. Response choices were:
Strongly Agree, Agree, Disagree, and Strongly Disagree. Strong
agreement indicates more social engagement, community activity, and
person-to-person activity.
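Analyses of the three subscales usually need the response labels
converted to numbers. The sketch below shows one way this might be
done. The numeric codes (higher number = more engagement), the use of a
mean as the subscale score, and the assumption that responses are
stored as the text labels listed above are all illustrative choices,
not part of the data description.

```python
from statistics import mean

# Assumed numeric coding: a higher number means more engagement.
AGREEMENT_SCALE = {
    "Strongly Agree": 4,
    "Agree": 3,
    "Disagree": 2,
    "Strongly Disagree": 1,
}
ACTIVITY_SCALE = {
    "Hyperactive": 5,
    "More Active": 4,
    "No Difference": 3,
    "Not Very Active": 2,
    "Lethargic": 1,
}

def subscale_score(responses, scale):
    """Mean of the coded responses, skipping missing (None) items.

    Returns None when every item is missing.
    """
    coded = [scale[response] for response in responses if response is not None]
    return mean(coded) if coded else None

# Hypothetical responses from one citizen (None marks a missing item).
cognitive = ["Agree", "Strongly Agree", None, "Disagree"]
physical = ["More Active", "Hyperactive", "No Difference",
            "More Active", "Not Very Active"]

print(subscale_score(cognitive, AGREEMENT_SCALE))
print(subscale_score(physical, ACTIVITY_SCALE))
```

Averaging over the answered items keeps scores comparable when a sample
contains missing values; summing the items instead would penalize
citizens with missing responses.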
Section 3: Personality and Socio-Political Values (6 subscales/domains)
The next 34 columns of data contain numeric scores assessing a variety
of personality characteristics and citizen values or opinions.
Variable names: neuroticism, extroversion, openness,
conscientiousness, agreeableness
These five variables contain scores similar to those produced by the
NEO PI-R (Costa & McCrae, 1992), which reflect each citizen’s
propensity to display certain behaviors and thought patterns. Higher
scores indicate more of the variable’s personality trait.
Variable names: nuclear, coal, nat.gas.electric, wind, solar
These five variables contain scores which reflect each citizen’s
opinion on the use of various energy sources for domestic electricity
production (household, commercial, and governmental use). Higher scores
indicate more favorable views toward the widespread use of a particular
source of energy.
Variable names: automobile, bus, train, bicycle, walk
These five variables contain scores which reflect each citizen’s
opinion on the use of various transportation methods. Issues of
personal and mass transit were addressed across a variety of distances
(e.g., short trips and long commutes). Higher scores represent more
favorable views toward the widespread use/utility of a particular
variable’s transportation method.
Variable names: gasoline, nat.gas.car, hybrid, electric, other
These five variables contain scores which reflect each citizen’s
opinion on the use of various vehicle types, specifically vehicle
propulsion methods. Note that diesel was included in the ‘gasoline’
variable, and steam, hydrogen, and biofuels (e.g., ethanol) were
included in the ‘other’ variable. Higher scores represent more
favorable views toward the widespread use/utility of a particular
variable’s propulsion method.
Variable names: animal.extinction, plant.extinction, severe.storms,
ice.melt, sea.rise
These five variables contain scores which reflect each citizen’s
opinion on the consequences of human-induced climate change. Higher
scores reflect greater anxiety about the impact of a particular
consequence.
Variable names: religion, abortion, national.debt, unemployment,
healthcare, public.edu, campaign.finance, business.regulation
These eight variables contain scores which indicate each citizen’s
opinion on specific social or political issues facing Examplonia.
Higher scores represent strong beliefs that a particular issue
represents a threat to national solidarity.
Variable name: social.responsibility
This variable contains a score which represents the level of obligation
each citizen feels toward the healthy maintenance of their country and
its people. Higher scores represent a greater sense of social
responsibility.
Section 4: Health
The next 7 columns of data contain information about the health of each
citizen.
Variable name: tobacco.user
This variable simply reports whether or not the citizen is a tobacco
user: yes or no.
Variable name: blood.type
This variable contains the blood type of each citizen: O positive, A
positive, B positive, AB positive, O negative, A negative, B negative,
and AB negative.
Variable name: bmi
This variable contains the Body Mass Index (BMI) of each citizen.
Higher numbers indicate greater body weight relative to height and,
generally, greater body fat.
Variable name: sys.bp
This variable contains the average Systolic Blood Pressure of each
citizen, taken at the beginning and end of their most recent doctor’s
visit. Higher numbers represent higher blood pressure; extremely high
or low numbers represent a cardiovascular health risk.
Variable name: wbc.count
This variable contains each citizen’s White Blood Cell (WBC) count,
measured as the number of cells per microliter (mcL). Higher numbers
indicate higher concentrations of WBC. Higher numbers generally
indicate better health, while extremely low numbers can indicate a risk
to immune system health.
Variable name: glucose
This variable contains each citizen’s blood glucose level, measured in
millimoles per liter (mmol/L). Extremely low levels can indicate a risk
of hypoglycemia; extremely high levels can indicate a risk of
hyperglycemia, and long-term hyperglycemia can present a risk for
developing diabetes.
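Glucose is recorded here in mmol/L, but glucose levels are also
commonly reported in mg/dL, the unit used for the cholesterol variable.
The sketch below converts between the two using the molar mass of
glucose (about 180.16 g/mol); the function names are hypothetical.

```python
GLUCOSE_MG_PER_MMOL = 180.16  # molar mass of glucose: g/mol, i.e. mg per mmol
DL_PER_L = 10                 # deciliters in one liter

def glucose_mmol_to_mg_dl(mmol_per_l):
    """Convert a blood glucose level from mmol/L to mg/dL."""
    return mmol_per_l * GLUCOSE_MG_PER_MMOL / DL_PER_L

def glucose_mg_dl_to_mmol(mg_per_dl):
    """Convert a blood glucose level from mg/dL to mmol/L."""
    return mg_per_dl * DL_PER_L / GLUCOSE_MG_PER_MMOL

# A made-up value of 5.5 mmol/L is roughly 99 mg/dL.
print(round(glucose_mmol_to_mg_dl(5.5), 1))
```

This factor applies to glucose only; other substances have different
molar masses and therefore different conversion factors.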
Variable name: ldl.cholesterol
This variable contains the blood level of Low-Density Lipoprotein (LDL)
cholesterol for each citizen, measured in milligrams per deciliter
(mg/dL). Extremely high levels of LDL cholesterol are associated with a
risk for cardiovascular disease.
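Taken together, the four sections account for 7 + 14 + 34 + 7 = 62
columns. Assuming a sample file keeps its columns in the order
described above, each section can be selected by position. The sketch
below derives zero-based column slices from the section sizes; the
section labels and the placeholder header are hypothetical.

```python
from itertools import accumulate

# Number of columns in each section, in the order described above.
# The section labels are made up for this example.
SECTION_SIZES = [
    ("demographic", 7),
    ("engagement", 14),
    ("personality_values", 34),
    ("health", 7),
]

def section_slices(sizes):
    """Map each section label to the slice of column positions it covers."""
    ends = list(accumulate(count for _, count in sizes))
    starts = [0] + ends[:-1]
    return {
        label: slice(start, end)
        for (label, _), start, end in zip(sizes, starts, ends)
    }

SLICES = section_slices(SECTION_SIZES)

# Placeholder header standing in for the 62 real variable names.
header = [f"v{i}" for i in range(1, 63)]
print(header[SLICES["health"]])
```

Selecting by variable name is more robust than selecting by position,
since the set of variables may change over time.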
Footnotes:
¹ Examplonia is a fictional country which provides some (somewhat)
meaningful context
for statistical analysis examples. The population data for Examplonia
was generated to provide a statistical population from which random
samples could be drawn for the completion of example statistical
analysis problems. The current population data (March 15, 2012)
contains a variety of univariate and multivariate statistical models
and/or effects. The idea for creating Examplonia originated (for this
author) with Bethlehem (2009).
Other notes:
The population of Examplonia can change over time: new variables can be
added, and the nature of the relationships between variables may be
adjusted.
References
Bethlehem, J. (2009). Applied Survey Methods: A Statistical
Perspective. Hoboken, NJ: John Wiley & Sons.
Costa, P. T., Jr., & McCrae, R. R. (1992). NEO PI-R professional
manual. Odessa, FL: Psychological Assessment Resources, Inc.