
Module 5: Introduction to Hypothesis Testing

Jon Starkweather, PhD

jonathan.starkweather@unt.edu
Consultant
Research and Statistical Support


http://www.unt.edu



1. Hypothesis Testing

Hypothesis testing is a systematic procedure for deciding whether the results of a research study, which examines a sample, support a particular theory or practical innovation, which applies to a population. The type of hypothesis testing done in much of the social sciences is called Null Hypothesis Significance Testing (NHST).

Decisions, Decisions...

What is Null Hypothesis Significance Testing all about? It uses an inferential procedure to examine the credibility of a hypothesis about a population, based on the probability of the sample data. Notice: NHST is not about testing the probability of a hypothesis or theory!

NHST gets its name from its use of the null hypothesis. The term originally referred to a hypothesis of no difference; modern interpretation and use allow more precise specification.

Symbols of NHST

Typically we use population symbols when expressing hypotheses.

The null hypothesis uses the symbol $H_0$ and is read as 'H-oh' or 'H-naught'. Example null hypothesis in symbols: $H_0: \mu_1 = \mu_2$, which is read as: we hypothesize that the mean of population 1 is equal to the mean of population 2.

The alternative hypothesis uses the symbol $H_1$ and is read as 'H-one'. Example alternative hypothesis in symbols: $H_1: \mu_1 \neq \mu_2$, which is read as: we hypothesize that the mean of population 1 is not equal to the mean of population 2.

Steps of NHST (one version among many)

Step 1: Define the populations of interest and restate the research question as null and alternative hypotheses about the populations.
Step 2: Determine the characteristics of the comparison distribution.
Step 3: Determine the cutoff sample score on the comparison distribution at which the null should be rejected (typically associated with $p = .05$).
Step 4: Determine your sample's score on the comparison distribution (i.e., compute your sample statistic).
Step 5: Compare and make a decision. To reject or not to reject? That is the question!

Logic of NHST

If we believe something to be different, why do we start by hypothesizing that things are the same?
Falsification

It is very difficult to prove things "true", but it is not difficult to show things are "not true". Falsification provides a basis for statistical testing, gives us somewhere to start (i.e., something to test), and provides the first piece of evidence for our empirical decisions.

More on this 'falsification' thingy...

No amount of confirmation can achieve certainty for some types of statements.

Not verifiable: "There are no pink swans." "All swans are purple." You would have to collect all swans to verify these statements.

Verifiable: "There are blue swans." "Not all swans are green." You would need only one example to verify these statements.

Confirmed theory is not truth, but merely conjecture. Nothing is proven true, only supported by evidence. But theory can be falsified with more certainty.

2. Example Z-test

Example: Dog IQ scores

Research question: Are dogs on cartoons smarter than 'regular' dogs? In order to investigate, we use Scooby Doo as a representative of dogs on cartoons and we measure Scooby Doo's IQ ($X = 123$).

Step 1: Define the populations and state the null and alternative hypotheses.
Population 1: Dogs on cartoons (represented by Scooby).
Population 2: Dogs not on cartoons.
$H_0: \mu_1 = \mu_2$
$H_1: \mu_1 > \mu_2$

Step 2: Determine the comparison distribution.
The population distribution of IQ among dogs not on cartoons is known to be normal and has $\mu = 100$ and $\sigma = 15$. Therefore, we can use the standard normal distribution (Z-score distribution) as our comparison distribution.

Step 3: Determine the cutoff sample score, also called the critical value, associated with our probability cutoff (.05).
$p = .05$ corresponds to a Z-score of 1.64; this is our critical score on the comparison distribution. Recall the table of Z-score values from the previous module:
http://www.sjsu.edu/faculty/gerstman/EpiInfo/z-table.htm
The $p = .05$ is how we represent the top 5 percent of the comparison distribution as a cutoff point. The Z-score associated with $p = .05$ at the higher end of the standard normal curve is 1.64.
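The cutoff can also be computed rather than read from a table. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library `statistics.NormalDist`; the variable names (`alpha`, `standard_normal`, `z_crit`) are illustrative and are not part of the original module.

```python
from statistics import NormalDist

alpha = 0.05                     # probability cutoff used in the example
standard_normal = NormalDist()   # mean 0, standard deviation 1

# Z-score separating the lower 95% from the top 5% (upper tail only)
z_crit = standard_normal.inv_cdf(1 - alpha)
print(round(z_crit, 3))

# Proportion of the standard normal distribution below the table value 1.64
below_table_value = standard_normal.cdf(1.64)
print(round(below_table_value, 4))
```

The computed cutoff is approximately 1.645; the value 1.64 used in this module is the neighboring table entry, below which about 94.95% of scores fall.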
Put another way, the Z-score 1.64 is the cutoff point between the lower 95% (or, more precisely, 94.95%) and the top 5% of scores.

Critical Value

In this example, we are saying that 95% of the IQ scores in population 2 (dogs not on cartoons) fall below this point.

Recall, we are testing whether or not dogs on cartoons (represented by Scooby Doo) are significantly smarter than dogs not on cartoons. If we find Scooby's IQ is greater than our cutoff (if Scooby's Z-score is greater than 1.64), then we have evidence he is significantly 'smarter' than dogs not on cartoons. That is, we have evidence that he does not come from population 2, but instead represents a different population, namely population 1.

Step 4: Determine your sample's score on the comparison distribution (i.e., compute your statistic).
Calculated Z-score (Z-calc) = 1.53, which, when we look in the Z-score table, corresponds to $p = .0630$ (.9370, or 93.70%, of scores fall below it). To find the $p$ value, look for 1.5 on the left and 0.03 on the top of the table linked below, which together correspond to a Z-score of 1.53.
http://www.sjsu.edu/faculty/gerstman/EpiInfo/z-table.htm

Step 5: Compare and make a decision. To reject or not to reject? That is the question!
Compare our critical (crit) value to our calculated (calc) value (Z-scores or $p$ values):
$Z_{calc} = 1.53 < Z_{crit} = 1.64$
$p_{calc} = .0630 > p_{crit} = .05$
Although a bit confusing, the second comparison is equivalent to the first and is typically the one used when reporting results (e.g., in research articles). Both indicate that Scooby's IQ was not far enough from the mean of population 2 to distinguish him as a member of population 1. Thus, we have no evidence for population 1 or for the idea that dogs on cartoons are smarter than dogs not on cartoons. We fail to reject the null hypothesis; we never say we accept the null or the alternative hypothesis.

3. p values

But what does the $p$ value mean?

The probability obtained tells us: if the null hypothesis were true, the probability of obtaining a sample statistic as extreme as (or more extreme than) the one observed.
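The whole Scooby Doo example, including the probability obtained, can be sketched in a few lines. This is an illustration only, assuming Python 3.8 or later and the standard-library `statistics.NormalDist`. The population values `mu = 100` and `sigma = 15` are assumptions, chosen because they reproduce the module's Z-calc of 1.53; all variable names are hypothetical.

```python
from statistics import NormalDist

# Assumed parameters for population 2 (they reproduce Z-calc = 1.53)
mu, sigma = 100, 15
x = 123        # Scooby Doo's IQ, from the example
alpha = 0.05   # one-tailed probability cutoff

standard_normal = NormalDist()

z_calc = (x - mu) / sigma                     # sample score on the comparison distribution
z_crit = standard_normal.inv_cdf(1 - alpha)   # critical value for the upper tail
p_calc = 1 - standard_normal.cdf(z_calc)      # probability of a score this high or higher if the null is true

# Comparing Z-scores and comparing p values lead to the same decision
decision = "reject the null" if z_calc > z_crit else "fail to reject the null"
print(f"Z-calc = {z_calc:.2f}, Z-crit = {z_crit:.2f}, p = {p_calc:.3f}: {decision}")
```

Under these assumptions the sketch reaches the same decision as the table lookup: Z-calc falls short of Z-crit, the $p$ value is roughly .063 (above .05), and we fail to reject the null. The small difference from the tabled .0630 arises because the table is entered with Z rounded to 1.53.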
Stated another way, the $p$ value tells us how extreme our result is, assuming the null is true. If it is more extreme than our stated critical value (i.e., lower probability than .05), then we reject the null hypothesis. If it is less extreme than our stated critical value (i.e., greater probability than .05), then we fail to reject the null hypothesis. We never accept either hypothesis, nor do we reject the alternative hypothesis. We either reject or fail to reject the null hypothesis.

More on probability obtained.

It is important to recognize that the $p$ values are what tie our calculated sample statistic to the comparison distribution, and vice versa.

If Z-calc is larger than Z-crit, we reject the null because it indicates our sample (representing population 1) is more extreme than our critical value (representing population 2). Interpretation: Population 1 is significantly different from population 2 ($p < .05$).

If Z-crit is larger than Z-calc, we fail to reject the null because it indicates our sample (population 1) is not extreme enough to differentiate it from population 2. Interpretation: Population 1 is not significantly different from population 2 ($p > .05$); we have no evidence that they differ (as if only population 2 exists).

4. One or Two Tails?

One-tailed vs. Two-tailed

Note that our example alternative hypothesis was directional, dictating a 'one-tailed' test. We were interested in whether or not population 1 was significantly greater than population 2. This is a precise alternative hypothesis.

We were not asking the question "is population 1 significantly different from population 2", which is a more general, non-directional hypothesis and dictates a two-tailed test. It is less specific because population 1 could be greater or less than population 2.

In a one-tailed test, the .05 critical level we typically use gets applied to one side (high or low) of the distribution.
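To make the one-tailed versus two-tailed distinction concrete, the cutoffs on the standard normal distribution can be computed directly. The following is a minimal sketch, assuming Python 3.8 or later and the standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the original module.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()

# One-tailed: the full .05 sits in a single tail
one_tailed_upper = standard_normal.inv_cdf(1 - alpha)   # "greater than" hypothesis
one_tailed_lower = standard_normal.inv_cdf(alpha)       # "less than" hypothesis

# Two-tailed: .05 is split, .025 in each tail, so the cutoffs are +/- this value
two_tailed = standard_normal.inv_cdf(1 - alpha / 2)

print(round(one_tailed_upper, 3), round(one_tailed_lower, 3), round(two_tailed, 3))
```

The one-tailed cutoffs are about 1.645 and -1.645, while the two-tailed cutoffs are about 1.96 and -1.96. Splitting the .05 pushes each cutoff further from the mean, so a result such as Z-calc = 1.53 that misses the one-tailed cutoff also misses the two-tailed one.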
Note that one-tailed test hypotheses can go in either direction, greater than or less than; it simply depends on how you state your alternative hypothesis, which is often a function of how much you know about the topic you are examining.

When doing a two-tailed test, we are interested in statistics associated with extreme values at either end of the comparison distribution. Take .05 and split it: .025 at each end, which together make up the .05 critical level we generally use to determine statistical significance.

Directional Alternative Hypotheses
Use all of the .05 lumped at one end of the distribution.
Must be specified prior to collecting data.

Non-directional Alternative Hypotheses
Use the .05 split in half, with one half (.025) at each end of the distribution.

[Figure: Directional alternative hypothesis, population 1 greater than population 2]
[Figure: Directional alternative hypothesis, population 1 less than population 2]
[Figure: Non-directional alternative hypothesis, population 1 different than population 2]

5. Decision Errors

Decision errors occur when the right procedures and calculations lead to the wrong decisions.

Type I error: Rejecting the null hypothesis when in fact it is true. Also called Alpha error. Symbol: $\alpha$

Type II error: Failing to reject the null when in fact it is false. Also called Beta error. Symbol: $\beta$

[Table: Decision Possibilities]

More on Type I Error

Social sciences tend to place more emphasis on Type I error than on Type II error. When initially designing a study, the Alpha level should be set early on (prior to data collection). Alpha is the probability of committing a Type I error when the null is true. Typically, social science uses the conventional .05. This means we are willing to run the risk of drawing a sample that is extreme enough (in the most extreme 5% of the comparison distribution) to reject the null when the null is in fact true.
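The meaning of a .05 Alpha level can be checked with a small simulation: if every sample really comes from the null distribution, about 5% of them should still land beyond the cutoff. The following is a rough sketch, assuming Python 3.8 or later and only the standard library; the seed, the number of trials, and the variable names are arbitrary choices made for illustration.

```python
import random
from statistics import NormalDist

random.seed(0)   # fixed seed so the run is repeatable
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)   # one-tailed cutoff, upper end

trials = 100_000
# Each trial draws one Z-score from the null (standard normal) distribution,
# so every rejection counted here is a Type I error.
false_rejections = sum(random.gauss(0, 1) > z_crit for _ in range(trials))
type_one_rate = false_rejections / trials
print(type_one_rate)
```

The printed proportion should sit close to .05, which is the sense in which Alpha is the risk we accept of rejecting a null that is in fact true.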
If Scooby's IQ were 160 but all the other cartoon dogs were especially dumb, we would reject the null when in fact there was no difference between the IQ of cartoon dogs (population 1) and the IQ of dogs not in cartoons (population 2).

Type I Error continued

The problem with Alpha is that we never know if we have committed the error, unless we fail to reject the null, in which case we know for certain that we have not committed a Type I error. But we may then have committed a Type II error.

If Scooby were especially dumb, but all the other cartoon dogs were especially bright, then we would have failed to reject the null when in fact there was a real difference in IQ between dogs in cartoons and dogs not in cartoons.

[Table: Decision Probabilities]

6. Concerns and Controversies

NHST Concerns and Controversies

First, much of this module will be revisited and reinforced in later modules.

The null is never true, strictly speaking. There will always be some (numerical) difference between any two populations, so why do we use a 'null' hypothesis? It provides a starting point; in later modules we will see that statistical significance is not "everything" and should not be the only thing we consider when making empirical decisions.

Most controversies come from misinterpretation of, or over-reliance on, the $p$ value. It does not represent the probability of the null hypothesis or the alternative hypothesis. Nor does it say anything about how likely your result is to replicate.

One study alone should never be used to make serious decisions; findings must be replicated across multiple samples in order to have strong evidence for (and confidence in) research findings.

7. Summary of Module 5

Module 5 covered the following topics:
Rules and Steps of Hypothesis Testing.
Individual Z-test.
$p$ values.
1-tailed and 2-tailed tests.
Decision Errors.
Concerns and Controversies of NHST.

All of these topics will be revisited consistently in future modules.
This concludes Module 5. Next time, in Module 6, we'll begin covering hypothesis testing with means of samples.

Until next time; have a nice day.

These pages were last updated on October 8, 2010. This document was created in LaTeX and converted to HTML using LaTeX2HTML.