Module 7: Additions to Significance Testing

Jon Starkweather, PhD

jonathan.starkweather@unt.edu
Consultant
Research and Statistical Support


http://www.unt.edu



Effect Size
From a score to a distribution of scores

Effect Size

Keep in mind, there are two types of effect sizes:

Measures of Difference
- Allow comparison across samples and variables with differing variance.
- Equivalent to Z-scores.
- Note: sometimes there is no need to standardize (the units of the scale have inherent meaning).

Measures of Variance Accounted For
- Amount of explained variance vs. total variance.
- Such as $R^2$ and $\eta^2$.

For now, we will deal with Measures of Difference.

Effect Size

Effect size is a standardized measure of the difference (lack of overlap) between populations. It is the magnitude of the experimental effect.

Effect size:
- Increases with greater differences between means,
- Decreases with greater standard deviations in the population, but
- Is not affected by sample size.

Calculating Effect Size

There are many measures of effect size; for now we will be using Cohen's d:

$$d = \frac{\mu_1 - \mu_2}{\sigma}$$

Notice that within this formula we are removing the influence of the population standard deviation ($\sigma$). This produces the standardized effect size. A raw score effect size (i.e., without dividing by $\sigma$) is virtually useless for comparing results measured on different scales. The standardization allows us to compare effect sizes obtained from different research studies.

Remember Scooby...?

Population 1: Dogs on cartoons. Sample: Scooby, Pluto, and Goofy ( ).
Population 2: Dogs not on cartoons ( ).

Please note the effect size is greater than 1. This will not always be the case, but the value of Cohen's d can be greater than 1.

Remember Scooby...part 2?

Population 1: Dogs on cartoons. Sample: Scooby, Underdog, and Scrappy ( ).
Population 2: Dogs not on cartoons ( ).

The effect size is not greater than 1, but this may still be considered a large effect size.

Interpreting Cohen's d

One way: effect size conventions suggested by Cohen.
- Small = 0.20
- Medium = 0.50
- Large = 0.80 and greater

A better way: rational judgment based on a thorough understanding of the phenomena and the previous literature.
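As a rough illustration of the Cohen's d calculation and Cohen's conventions described above, here is a minimal Python sketch. The function names `cohens_d` and `cohen_label` are hypothetical, and the numbers (means of 110 and 100, a population standard deviation of 15) are made up for illustration; they are not the values from the Scooby examples.

```python
def cohens_d(mean_1, mean_2, sigma):
    """Standardized difference between two population means."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (mean_1 - mean_2) / sigma


def cohen_label(d):
    """Label |d| using Cohen's conventions (0.20, 0.50, 0.80)."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"


# Made-up numbers for illustration only (assumed, not from the module).
d = cohens_d(mean_1=110, mean_2=100, sigma=15)
print(round(d, 2), cohen_label(d))  # prints: 0.67 medium
```

Note that `cohens_d` takes no sample size argument, which mirrors the point that effect size is not affected by sample size. The labels are only the conventional cutoffs and are no substitute for judgment grounded in the literature.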
It may be that an effect size of 0.90 is small relative to previous findings in which $d$ ran as high as 1.90.

Statistical Power

Definition: The probability that the study will produce a statistically significant result if the null hypothesis is false. In other words, the ability to detect a significant effect if one is present.

Important to note: "if the null hypothesis is false." If you get a significant result when the null is true, then you have committed a Type I error.

General equation for power:

$$\text{Power} = 1 - \beta$$

where $\beta$ is the probability of a Type II error.

Two kinds of Power analysis

A priori Power
- Used when planning a study.
- Used to determine the sample size necessary to achieve a specified power level.

Post hoc Power
- Used when evaluating a study: what chance did the study have of finding significant results?
- Not really useful. If you do the power analysis and conduct your study accordingly, then you did what you could. Saying afterward, "I would have found significance but did not have enough power or enough participants," is not going to impress anyone.

A priori Power

We can use all of the following to calculate how many subjects / participants we need for our study:
- Decide on an acceptable level of power.
- Set the significance level (usually .05).
- Figure out the desirable or expected effect size.
- Calculate the n needed to reach that level of power at that significance level and effect size.

A priori Effect Size?

Figure out an effect size before I conduct my study? There are several ways to do this:
- Base it on substantive knowledge: what you know about the situation and the scale of measurement.
- Base it on previous literature / research.
- Use Cohen's conventions (not recommended).

An acceptable level of power?

Why not set power at .99?
- Practicalities: the cost of increasing power (usually done by increasing sample size) can be high.
- Increasing power decreases the Type II error rate (good). However, if power is raised by relaxing the significance level (for example, from .05 to .10) rather than by increasing sample size, the Type I error rate also increases (bad).
- Power has a range of 0 to 1 (it is a probability), with a higher number indicating greater power.
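The a priori steps above (choose a power level, set the significance level, pick an expected effect size, then solve for n) can be sketched in Python. This is a minimal sketch that assumes a one-sample Z test with a known population standard deviation; other analyses need different formulas, and a dedicated power program remains the safer choice for real planning. The function names `z_test_power` and `required_n` are hypothetical, and the inputs (d = 0.5, power = .80) are made-up planning values.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def z_test_power(d, n, alpha=0.05, tails=2):
    """Power of a one-sample Z test when the true effect size is d."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / tails)  # start of rejection region
    shift = abs(d) * sqrt(n)  # how far the alternative sits from the null, in SE units
    power = 1 - STD_NORMAL.cdf(z_crit - shift)
    if tails == 2:
        # Small chance of rejecting in the tail opposite to the true effect.
        power += STD_NORMAL.cdf(-z_crit - shift)
    return power


def required_n(d, power=0.80, alpha=0.05, tails=2):
    """Smallest n reaching the requested power (normal approximation)."""
    if d == 0:
        raise ValueError("d must be nonzero")
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / tails)
    z_beta = STD_NORMAL.inv_cdf(power)
    return ceil(((z_alpha + z_beta) / abs(d)) ** 2)


# Assumed planning values for illustration only.
n = required_n(d=0.5, power=0.80, alpha=0.05, tails=2)
print(n, round(z_test_power(0.5, n), 3))
```

Trying `power=0.99` in `required_n` shows why a very high power target is costly: the required sample size grows quickly as the target approaches 1.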
Influences on Power

Table 1: Influences on Power

Feature of Study    High Power    Low Power
Effect Size         larger        smaller
Sample Size         larger        smaller
Sig. Level          high (.10)    low (.001)
Tailed Test         1-tailed      2-tailed
Type of analysis    varies        varies

Carrying out the calculation of Power

The easiest way: when you have to implement power calculations, you can use specialist programs. Many websites offer free applications to conduct power analysis.

G-power: http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3

Calculating Power

The more difficult way:

First, convert your critical Z value into a raw score. This defines the point on your Null Distribution where the rejection region begins.

[Figure: Null Distribution]

Calculating Power continued

Next, calculate the Z-score for a raw score of 114.22 on the Alternative Distribution. Finally, look in the Z-score table to identify beta and power.

http://www.sjsu.edu/faculty/gerstman/EpiInfo/z-table.htm

[Figure: Alternative Distribution]

Practical Significance

Statistical vs. Practical Significance

Statistical significance is determined by a dichotomous decision based on the p value.
- If $p < \alpha$ (usually .05), reject the null hypothesis.
- If $p \geq \alpha$, fail to reject the null hypothesis.

Practical significance has more to do with the effect size and the meaningfulness of the results in practical terms.
- If $p < \alpha$, reject the null; but if the effect size ($d$) is very small, your results are not likely to be influential or useful.

More on Practical Significance

Keep in mind, almost any nonzero effect will be statistically significant with a large enough sample! However, the results may not be meaningful or useful.

Remember Scooby and Friends...
Example 1: ( ); reject the null because ( ).
Example 2 (from Module 6 handout): ( ); fail to reject the null because ( ).
Hypothetically, you could get a result like this: ( )

Concluding Thoughts

Always report as much information as you can, meaning:
- The calculated sample statistic
- The sample size
- The critical level (.05)
- The obtained p value
- The effect size (d)
- The power, if it was used a priori to calculate sample size and the appropriate sample size was obtained (G-power application).

Remember, p values do not indicate the size of an effect.

Use a priori power and effect size to determine the minimum sample size before collecting data, then gather that amount of data.

Post hoc power is virtually meaningless.

Summary of Module 7

Module 7 covered the following topics:
- Cohen's d effect size.
- Statistical Power.
- Practical significance.

Many of these topics will be revisited consistently in future modules.

This concludes Module 7.

Next time: Module 8, where we'll begin covering t tests.

Until next time, have a nice day.

This page was last updated on: October 12, 2010. This document was created in LaTeX and converted to HTML using LaTeX2HTML.
