http://uit.unt.edu/research

Data Science and Analytics

Please participate in the DSA Client Feedback Survey.

Examplonia

Examplonia[1] is a small island country of approximately one million adult citizens. The population is dispersed across three regions (western, central, and eastern) and among 15 cities. The census bureau of Examplonia maintains records of each adult citizen. The 62 variables collected from Examplonia’s citizens are described below.

Instructors at the University of North Texas (UNT) who would like a random sample of Examplonia’s data can request one from this page's author. When requesting a sample, please indicate the following: your first and last name (as listed in the university directory); the course prefix, number, and title of the class for which you are requesting the sample; the percentage of (random) missing values you want the sample to contain (e.g., 0%, 5%, 10%); and the sample size (i.e., the number of citizens/cases) you would like the sample to contain. Requested sample data files will be sent as attachments through UNT email. Each file is plain text (your_name_date.txt) in comma-delimited format, with variable names across the top row of data and, if any percentage of missing values is requested, NA marking each missing value.
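As a rough illustration of working with a file in the layout described above (header row, comma-delimited, NA for missing), the following Python sketch reads a small made-up excerpt. The data values are invented, only the seven demographic columns are shown, and treating age, education, and income as numeric is an assumption rather than something the file format guarantees.

```python
import csv
import io

# Hypothetical three-row excerpt in the described layout: a header row,
# comma-delimited fields, and NA marking missing values. Values are made up.
SAMPLE_TEXT = """id,region,city.names,gender,age,education,income
1,I,Portly,female,34,16,52000
2,II,Omah,male,NA,12,31000
3,III,Myami,female,58,NA,NA
"""

# Assumption: these columns hold numbers; all other columns are kept as text.
NUMERIC_COLUMNS = ("age", "education", "income")


def read_sample(handle):
    """Read a comma-delimited sample, turning NA into None and numbers into floats."""
    rows = []
    for record in csv.DictReader(handle):
        row = {name: (None if value == "NA" else value) for name, value in record.items()}
        for name in NUMERIC_COLUMNS:
            if row.get(name) is not None:
                row[name] = float(row[name])
        rows.append(row)
    return rows


def missing_fraction(rows, column):
    """Share of rows in which the given column is missing."""
    return sum(row[column] is None for row in rows) / len(rows)


rows = read_sample(io.StringIO(SAMPLE_TEXT))
print(len(rows), missing_fraction(rows, "income"))
```

For a real sample file, passing `open("your_name_date.txt", newline="")` in place of the `io.StringIO` object should work the same way, provided the file matches the layout described above.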

 


Section 1: Demographic Variables

The first 7 columns of data are used to describe the characteristics of the citizens.

Variable name: id

Each adult citizen's record is assigned a sequential id number, which simply identifies that citizen among their peers.

Variable name: region

Each citizen’s region of residence. There are three regions in Examplonia: I, II, and III, which correspond to west, central, and east.

Variable name: city.names

Each citizen’s city of residence; there are 15 cities in Examplonia. The western region (I) contains the cities: Seeatile, Portly, San Francis, Los Angelinas, and San Dingo. The central region (II) contains the cities: Fargis, Omah, Tilsa, Astin, and El Piaso. The eastern region (III) contains the cities: Bahston, New Jork, Washlesston, Carlot, and Myami.
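Because each city belongs to exactly one region, the listing above can serve as a consistency check on a sample. The sketch below transcribes it into a Python lookup. It assumes the data file stores regions as "I", "II", "III" and city names spelled exactly as above, which may not match the actual encoding.

```python
# Region-to-city lookup transcribed from the description above. The exact
# strings stored in the data file are an assumption; adjust if they differ.
CITIES_BY_REGION = {
    "I": ["Seeatile", "Portly", "San Francis", "Los Angelinas", "San Dingo"],
    "II": ["Fargis", "Omah", "Tilsa", "Astin", "El Piaso"],
    "III": ["Bahston", "New Jork", "Washlesston", "Carlot", "Myami"],
}

# Invert the mapping so each city points back to its region.
REGION_BY_CITY = {
    city: region
    for region, cities in CITIES_BY_REGION.items()
    for city in cities
}


def region_matches_city(region, city):
    """Return True when the city is listed under the given region."""
    return REGION_BY_CITY.get(city) == region


print(region_matches_city("II", "Omah"))
print(region_matches_city("I", "Myami"))
```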

Variable name: gender

Each citizen’s gender; male or female.

Variable name: age

Each citizen’s age; number of years.

Variable name: education

The number of years of formal education for each citizen.

Variable name: income

The annual income of each citizen.

 


Section 2: Engagement/Activity Survey (3 subscales/domains)

The next 14 columns of data represent survey questions assessing the levels of cognitive, physical, and social engagement of each citizen.

Variable names: q1.cognitive.1, q2.cognitive.2, q3.cognitive.3, q4.cognitive.4

These four variables contain the 4-point Likert responses to questions assessing the cognitive engagement of citizens. Response choices were: Strongly Agree, Agree, Disagree, and Strongly Disagree. Strong agreement indicates greater cognitive engagement or activity of intellectual abilities.

Variable names: q5.physical.1, q6.physical.2, q7.physical.3, q8.physical.4, q.9.physical.5

These five variables contain the 5-point Likert responses to questions assessing the physical activity (engagement) of citizens. Response choices were: Hyperactive, More Active, No Difference, Not Very Active, and Lethargic. Hyperactivity represents greater (i.e. more frequent and intense) physical activity.

Variable names: q10.social.1, q11.social.2, q12.social.3, q13.social.4, q14.social.5

These five variables contain the 4-point Likert responses to questions assessing the social engagement of citizens. Response choices were: Strongly Agree, Agree, Disagree, and Strongly Disagree. Strong agreement indicates more social engagement, community activity, and person-to-person activity.
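Analyses of these items usually need the responses as numbers. The sketch below shows one possible coding in Python, in which the first listed response choice receives the highest score so that higher values mean more engagement. This coding is an assumption for illustration: the sample files may already store numeric codes, or may use a different direction.

```python
# Response labels in the order listed above, from most to least engaged.
AGREEMENT_SCALE = ["Strongly Agree", "Agree", "Disagree", "Strongly Disagree"]
ACTIVITY_SCALE = ["Hyperactive", "More Active", "No Difference",
                  "Not Very Active", "Lethargic"]


def make_scorer(labels):
    """Map labels to integers so the first label gets the highest score."""
    top = len(labels)
    return {label: top - position for position, label in enumerate(labels)}


def subscale_mean(responses, scorer):
    """Mean item score, skipping missing (None) responses; None if all are missing."""
    scores = [scorer[response] for response in responses if response is not None]
    return sum(scores) / len(scores) if scores else None


# Hypothetical responses of one citizen to the four cognitive items.
cognitive = ["Strongly Agree", "Agree", None, "Disagree"]
print(subscale_mean(cognitive, make_scorer(AGREEMENT_SCALE)))  # (4 + 3 + 2) / 3 = 3.0
```

Averaging the available items, rather than summing them, keeps subscale scores comparable when a sample was requested with some percentage of missing values.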

 


Section 3: Personality and Socio-Political Values (6 subscales/domains)

The next 34 columns of data contain numeric scores assessing a variety of personality characteristics and citizen values or opinions.

Variable names: neuroticism, extroversion, openness, conscientiousness, agreeableness

These five variables contain scores similar to those produced by the NEO PI-R (Costa & McCrae, 1992), which reflect each citizen’s propensity to display certain behaviors and thought patterns. Higher scores indicate more of the variable’s personality trait.

Variable names: nuclear, coal, nat.gas.electric, wind, solar

These five variables contain scores which reflect each citizen’s opinion on the use of various energy sources (domestic electricity production – for household, commercial, and governmental use). Higher scores indicate more favorable views toward the widespread use of a particular source of energy.

Variable names: automobile, bus, train, bicycle, walk

These five variables contain scores which reflect each citizen’s opinion on the use of various transportation methods. Issues of personal and mass transit were addressed across a variety of distances (e.g., short trips, long commutes). Higher scores represent more favorable views toward the widespread use/utility of a particular variable’s transportation method.

Variable names: gasoline, nat.gas.car, hybrid, electric, other

These five variables contain scores which reflect each citizen’s opinion on the use of various vehicle types, specifically vehicle propulsion methods. Note that diesel was included in the ‘gasoline’ variable, and that steam, hydrogen, and biofuels (e.g., ethanol) were included in the ‘other’ variable. Higher scores represent more favorable views toward the widespread use/utility of a particular variable’s propulsion method.

Variable names: animal.extinction, plant.extinction, severe.storms, ice.melt, sea.rise

These five variables contain scores which reflect each citizen’s opinion on the consequences of human-induced climate change. Higher scores reflect greater anxiety about the impact of a particular consequence.

Variable names: religion, abortion, national.debt, unemployment, healthcare, public.edu, campaign.finance, business.regulation

These eight variables contain scores which indicate each citizen’s opinion on specific social or political issues facing Examplonia. Higher scores represent stronger beliefs that a particular issue represents a threat to national solidarity.

Variable name: social.responsibility

This variable contains a score which represents the level of obligation each citizen feels toward the healthy maintenance of their country and its people. Higher scores represent a greater sense of social responsibility.

 


Section 4: Health

The next 7 columns of data contain information about the health of each citizen.

Variable name: tobacco.user

This variable simply reports whether or not the citizen is a tobacco user: yes or no.

Variable name: blood.type

This variable contains the blood type of each citizen: O positive, A positive, B positive, AB positive, O negative, A negative, B negative, and AB negative.

Variable name: bmi

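This variable contains the Body Mass Index (BMI) of each citizen. BMI is body weight in kilograms divided by the square of height in meters; higher numbers indicate more body mass relative to height and are often associated with greater body fat percentages.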
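For readers unfamiliar with the index, the standard BMI calculation can be sketched in Python as follows. Weight and height are not among Examplonia's variables, so the inputs here are purely hypothetical; the data set supplies only the resulting bmi value.

```python
def body_mass_index(weight_kg, height_m):
    """Standard BMI: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical person: 70 kg and 1.75 m tall.
print(round(body_mass_index(70.0, 1.75), 1))  # 22.9
```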

Variable name: sys.bp

This variable contains the average Systolic Blood Pressure of each citizen, taken at the beginning and end of their most recent doctor’s visit. Higher numbers represent higher blood pressure; extremely high or low numbers represent a cardiovascular health risk.

Variable name: wbc.count

This variable contains each citizen’s White Blood Cell (WBC) count, measured as the number of cells per microliter (mcL). Higher numbers indicate higher concentrations of WBC and generally indicate better health; extremely low numbers can indicate a risk to immune system health.

Variable name: glucose

This variable contains each citizen’s blood glucose level, measured in millimoles per liter (mmol/L). Extremely low levels can indicate a risk of hypoglycemia; extremely high levels can indicate a risk of hyperglycemia, and long-term hyperglycemia can present a risk for developing diabetes.
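Glucose is often reported in mg/dL rather than mmol/L, so a conversion can be handy when comparing values against other references. The Python sketch below uses the molar mass of glucose (approximately 180.16 g/mol), which gives a factor of about 18.016 mg/dL per mmol/L. The function name and the example value are illustrative only and are not part of the data set.

```python
# 1 mmol/L of glucose is about 18.016 mg/dL (molar mass ~180.16 g/mol, divided by 10).
GLUCOSE_MG_DL_PER_MMOL_L = 18.016


def glucose_mmol_to_mg_dl(mmol_per_l):
    """Convert a blood glucose level from mmol/L to mg/dL."""
    return mmol_per_l * GLUCOSE_MG_DL_PER_MMOL_L


# Hypothetical glucose value of 5.5 mmol/L.
print(round(glucose_mmol_to_mg_dl(5.5), 1))  # 99.1
```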

Variable name: ldl.cholesterol

This variable contains the blood level of Low-Density Lipoprotein (LDL) cholesterol for each citizen, measured in milligrams per deciliter (mg/dL). Extremely high levels of LDL cholesterol are associated with a risk for cardiovascular disease.


Footnotes:

[1] Examplonia is a fictional country which provides (somewhat) meaningful context for statistical analysis examples. The population data for Examplonia was generated to provide a statistical population from which random samples could be drawn for the completion of example statistical analysis problems. The current population data (March 15, 2012) contains a variety of univariate and multivariate statistical models and/or effects. The idea for creating Examplonia originated (for this author) with Bethlehem (2009).

 

Other notes:  

The population of Examplonia can change over time: new variables can be added, and the nature of the relationships between variables may be adjusted.

 

 

References

Bethlehem, J. (2009). Applied Survey Methods: A Statistical Perspective. Hoboken, NJ: John Wiley & Sons.

Costa, P. T., Jr., & McCrae, R. R. (1992). NEO PI-R professional manual. Odessa, FL: Psychological Assessment Resources, Inc.

This page has been tested for use with Firefox; other browsers may display the pages incorrectly.

Contact Information

Jon Starkweather, PhD

Jonathan.Starkweather@unt.edu

940-565-4066

Richard Herrington, PhD

Richard.Herrington@unt.edu

940-565-2140

Last updated: 2018.11.02 by Jon Starkweather.

Copyright 2012, 2013, 2014, 2015, 2016, 2017, 2018 by Jonathan D. Starkweather.
