# Theory-Based Inference

Using Simulation/Randomization to Introduce p-value in Week 1

Soma Roy, Department of Statistics, Cal Poly, San Luis Obispo

ICOTS9, Flagstaff, Arizona, July 15, 2014

## Overview

• Background and Motivation
• Philosophy and Approach
• Course Materials
• Sample Examples/Explorations
• Advantages of this Approach

## Background

• The Stat 101 course
  • Algebra-based introductory statistics for nonmajors
• Recent changes in Stat 101
  • Content, pedagogy, and use of technology
  • Implementation of GAISE (2002) guidelines
  • A more modern course compared to twenty years ago
• Still, one traditional aspect
  • Sequencing of topics

## Background (contd.)

Typical traditional sequence of topics:

• Part I: Descriptive statistics (graphical and numerical)
• Part II: Data collection (types of studies)
• Part III: Probability (e.g., normal distribution, z-scores, looking up z-tables to calculate probabilities); sampling distribution/CLT
• Part IV: Inference (tests of hypotheses and confidence intervals)

## Motivation

Concerns with using the typical traditional sequence of topics:

• Puts inference at the very end of the quarter/semester
• Leaves very little time for students to develop a strong conceptual understanding of the logic of inference, and of the reasoning process behind:
  • Statistical significance
  • p-values (evaluation and interpretation)
  • Confidence intervals
  • Estimation of the parameter of interest

## Motivation (contd.)

Concerns with using the typical traditional sequence of topics (contd.):

• Content (Parts I, II, III, and IV) appears disconnected and compartmentalized
• Not successful at presenting the big picture of the entire statistical investigation process

## Recent attempts to change the sequence of topics

• Chance and Rossman (ISCAM, 2005) introduce statistical inference in week 1 or 2 of a 10-week quarter in a calculus-based introductory statistics course.
• Malone et al. (2010) discuss reordering topics so that inference methods for one categorical variable are introduced in week 3 of a 15-week semester, in Stat 101-type courses.

## Our Philosophy

• Expose students to the logic of statistical inference early
• Give them time to develop and strengthen their understanding of the core concepts of:
  • Statistical significance and p-value (interpretation and evaluation)
  • Interval estimation
• Help students see that the core logic of inference stays the same regardless of data type and data structure
• Give students the opportunity to discover how the study design connects with the scope of inference

## Our Approach: Key features

• Introduce the concept of p-value in week 1
• Present the entire 6-Step statistical investigation process
• Use a spiral approach to repeat the 6-Step statistical investigation process in different scenarios

## Implementation of Our Approach

Order of topics: inference (tests of significance and confidence intervals) for

• One proportion
• One mean
• Two proportions
• Two means
• Paired data
• More than 2 means
• r × k tables
• Regression

## Implementation (contd.)

• The key question is the same every time: Is the observed result surprising (unlikely) to have happened by random chance alone?
• First through simulation/randomization, and then through theory-based methods, every time:
  • Start with a tactile simulation/randomization using coins, dice, cards, etc.
  • Follow up with technology: purposefully designed (free) web applets (instead of commercial software) that are self-explanatory and offer lots of visual explanation
  • Wrap up with a theory-based method, if available

## Implementation (contd.)

So, we start with Part IV. What about Part I and Part II (descriptive statistics and data collection)?

• Descriptive statistics are introduced as and when the need arises (step 3 of the 6-Step process), e.g., segmented bar charts for comparing two proportions.
• Random sampling is discussed early on, but the discussion of random assignment (and experiments vs. observational studies) is saved for the comparison of two groups.
• These concepts fall under the discussion of scope of inference (generalization and causation), which is step 5 of the 6-Step process.

## Implementation (contd.)

What about Part III (probability and sampling distribution; theoretical distributions)?

• Probability and sampling distribution are integrated into inference
  • Students start working with the null distribution and p-value in week 1.
• Theoretical distributions are introduced as alternative paths to approximating a p-value in certain situations
  • Show that (under certain conditions) the theory can predict what the simulation will show
  • Show the limitations of the theory-based approach

## Course materials

• Challenge: no existing textbook has these features:
  • Sequencing of topics; process of statistical investigations
  • Just-in-time introduction of descriptive statistics and data collection concepts
  • Alternating between simulation/randomization-based and theory-based inference
• So, we developed our own materials
• Another key feature: an instructor can choose between
  • an exposition-based example, or
  • an activity-based exploration

## Example 1: Introduction to chance models

• Research question: Can chimpanzees solve problems?
• A trained adult chimpanzee named Sarah was shown videotapes of 8 different problems a human was having (Premack and Woodruff, 1978).
• After each problem, she was shown two photographs, one of which showed a potential solution to the problem.
• Sarah picked the correct photograph 7 out of 8 times.
• Question to students: What are two possible explanations for why Sarah got 7 correct out of 8?

## Example 1: Intro to chance models (contd.)

Generally, students can come up with the two possible explanations:

1. Sarah guesses in such situations, and got 7 correct just by chance
2. Sarah tends to do better than guess in such situations

• Question: Given her performance, which explanation do you find more plausible? Typically, students pick explanation #2 as the more plausible explanation for her performance.
• Question: How do you rule out explanation #1?

## Example 1: Intro to chance models (contd.)

• Simulate what Sarah's results could-have-been had she been just guessing
• Coin tossing seems like a reasonable mechanism to model just guessing each time
  • How many tosses?
  • How many repetitions?
  • What to record after each repetition?
• Thus, we establish the need to mimic the actual study, but now assuming Sarah is just guessing, to generate the pattern of just-guessing results

## Example 1: Intro to chance models (contd.)

Here are the results of 35 repetitions (for a class size of 35):

[Figure: dotplot of the 35 results]

Question: What next? How can we use the above dotplot to decide whether Sarah's performance is surprising (i.e., unlikely) to have happened by chance alone?

Aspects of the distribution to discuss: center and variability; typical and atypical values.

## The One Proportion applet

• Move to the applet to increase the number of repetitions
• Question: Does the long-run guessing pattern convince you that Sarah does better than guess in such situations? Explain.
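The coin-tossing simulation for Sarah can be sketched in a few lines of Python. This is only an illustrative stand-in for the tactile activity and the One Proportion applet, not the applet's actual code; the function names, the seed, and the choice of 10,000 repetitions are assumptions made for this sketch.

```python
import random

def simulate_guessing(n_questions=8, n_reps=10_000, p_correct=0.5, seed=2014):
    """Return the number of correct picks in each repetition of 'just guessing'."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for reproducibility
    return [
        # One repetition: n_questions coin tosses, counting heads as correct picks.
        sum(rng.random() < p_correct for _ in range(n_questions))
        for _ in range(n_reps)
    ]

def tail_proportion(results, observed):
    """Proportion of simulated results at least as large as the observed result."""
    return sum(r >= observed for r in results) / len(results)

results = simulate_guessing()
prop_7_or_more = tail_proportion(results, observed=7)
print(f"Proportion of repetitions with 7 or more correct: {prop_7_or_more:.3f}")
```

A small proportion here means that 7 or more correct out of 8 rarely happens under just guessing, which is the sense in which Sarah's observed result sits in the tail of the could-have-been pattern.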

## Example 1: Intro to chance models (contd.)

For this first example/exploration, we are deliberate about:

• Getting across the idea of "is the observed result surprising to have happened by chance alone?"
• Using a simple 50-50 model
• Having the observed result be quite clearly in the tail of the null distribution
• Avoiding terminology such as parameter, hypotheses, null distribution, and p-value

## Example 1: Intro to chance models (contd.)

Follow-up or "Think about it" questions:

• What if Sarah had got 5 correct out of 8? Would her performance be more convincing, less convincing, or similarly convincing that she tends to do better than guess?
• What if Sarah had got 14 correct out of 16 questions?
• Based on Sarah's results, can we conclude that all chimpanzees tend to do better than guess?

Step 6 of the 6-Step Statistical Investigation Process
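One way to check intuition on the follow-up questions is to compute the chance of each result under the 50-50 guessing model. The sketch below uses the exact binomial tail probability in place of a simulation; that substitution is an assumption of this example rather than something the slides do, and `upper_tail` is a hypothetical helper name.

```python
from math import comb

def upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by guessing."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

for correct, total in [(5, 8), (7, 8), (14, 16)]:
    print(f"{correct} of {total}: {upper_tail(total, correct):.4f}")
# 5 of 8: 0.3633
# 7 of 8: 0.0352
# 14 of 16: 0.0021
```

Under guessing, 5 or more out of 8 is quite common, so it would be less convincing than 7 out of 8, while 14 or more out of 16 is rarer still and would be more convincing, even though 14/16 and 7/8 are the same proportion.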

## Example 2: Measuring the strength of evidence

• Research question: Does psychic functioning exist?
• Utts (1995) cites research from various studies involving the Ganzfeld technique
  • A receiver sitting in a different room has to choose the picture (from 4 choices) being sent by the sender
  • Out of 329 sessions, 106 produced a hit (Bem and Honorton, 1994)
• Key question: Is the observed number of hits surprising (i.e., unlikely) to have happened by chance alone?

## Example 2: Measuring the strength of evidence (contd.)

• Question: What is the probability of getting a hit by chance?
  • 0.25 (because 1 out of 4)
  • Can't use a coin. How about a spinner?
• Same logic as before:
  • Use simulation to generate what the pattern/distribution for the number of hits could-have-been if receivers are randomly choosing an image from 4 choices
  • Compare the observed number of hits (106) to this pattern

## The One Proportion applet

• Question: Is the observed number of hits surprising (i.e., unlikely) to have happened by chance alone?
• What's a measure of how unlikely? The tail proportion: the p-value!
• Approx. p-value = 0.002
• Note that the statistic can be either the number of hits or the proportion of hits

## Example 2: Measuring the strength of evidence (contd.)

Natural follow-ups:

• The standardized statistic (or z-score) as a measure of how far the observed result is in the tail of the null distribution
• Theoretical distribution: the normal model, and normal approximation-based p-value
• Examples of studies where the normal approximation is not a valid approach
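For the Ganzfeld data (106 hits in 329 sessions, chance probability 0.25), the simulation-based p-value and its theory-based counterpart can be sketched side by side. This is a minimal sketch, not the applet's code: the constant and function names, the seed, and the 5,000 repetitions are assumptions, and the normal calculation uses no continuity correction.

```python
import math
import random

N_SESSIONS = 329      # number of sessions in the study
OBSERVED_HITS = 106   # observed number of hits
P_CHANCE = 0.25       # probability of a hit under random choice among 4 pictures

def simulate_hits(n_reps=5_000, seed=2014):
    """Simulate the number of hits in N_SESSIONS sessions, n_reps times, under chance."""
    rng = random.Random(seed)  # arbitrary fixed seed for reproducibility
    return [
        sum(rng.random() < P_CHANCE for _ in range(N_SESSIONS))
        for _ in range(n_reps)
    ]

# Simulation-based p-value: proportion of repetitions with at least 106 hits.
sim = simulate_hits()
sim_p_value = sum(h >= OBSERVED_HITS for h in sim) / len(sim)

# Theory-based approach: standardized statistic and normal upper-tail probability.
p_hat = OBSERVED_HITS / N_SESSIONS
se_null = math.sqrt(P_CHANCE * (1 - P_CHANCE) / N_SESSIONS)
z = (p_hat - P_CHANCE) / se_null                     # about 3.02
normal_p_value = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z) for standard normal Z

print(f"simulated p-value: {sim_p_value:.4f}")
print(f"z = {z:.2f}, normal-approximation p-value: {normal_p_value:.4f}")
```

Both p-values come out small and of the same order of magnitude. Comparing them is one way to show that, under suitable conditions, the theory predicts what the simulation shows.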

For this example we are deliberate about:

• Formalizing terminology such as hypotheses, parameter vs. statistic (with symbols), null distribution, and p-value
• Moving away from the 50-50 model
• Still staying with a one-sided alternative to facilitate the understanding of what the p-value measures, but in a simpler scenario

## What comes next

• Two-sided tests for one proportion
• Sampling from a finite population
• Tests of significance for one mean
• Confidence intervals: for one proportion, and for one mean
• Observational studies vs. experiments
• Comparing two groups: simulating randomization tests

## Advantages of this approach

• Does not rely on a formal discussion of probability, and hence can be used to introduce statistical inference as early as week 1
• Provides a lot of opportunity for activity/exploration-based learning
• Students seem to find it easier to interpret the p-value
• Students seem to find it easier to remember that smaller p-values provide stronger evidence against the null

## Advantages of this approach (contd.)

• Allows one to use the spiral approach to deepen student understanding throughout the course
• Allows one to use other statistics that don't have theoretical distributions, for example, difference in medians or relative risk (without getting into logs)
• Most importantly, this approach is more fun for instructors (not that I am biased)
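To illustrate the point about statistics without a convenient theoretical distribution, here is a rough sketch of a randomization test that uses the difference in medians. The two groups of values are made up purely for illustration and do not come from the talk or any study; the function names, seed, and number of shuffles are likewise assumptions.

```python
import random
from statistics import median

# Hypothetical measurements for two groups (invented for illustration only).
group_a = [12.1, 14.3, 15.0, 15.8, 16.2, 17.5, 18.9, 21.4]
group_b = [9.8, 10.5, 11.2, 12.0, 12.6, 13.1, 14.7, 15.3]

def diff_in_medians(a, b):
    """Statistic of interest: median of group a minus median of group b."""
    return median(a) - median(b)

def randomization_test(a, b, n_reps=5_000, seed=2014):
    """One-sided randomization test: reshuffle group labels and recompute the statistic."""
    rng = random.Random(seed)
    observed = diff_in_medians(a, b)
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_reps):
        rng.shuffle(pooled)  # mimic re-doing the random assignment under the null
        shuffled_diff = diff_in_medians(pooled[:len(a)], pooled[len(a):])
        if shuffled_diff >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_reps

observed_diff, p_value = randomization_test(group_a, group_b)
print(f"observed difference in medians: {observed_diff:.2f}")
print(f"approximate one-sided p-value: {p_value:.4f}")
```

The logic is the same as in the one-proportion examples: generate the could-have-been pattern under chance alone, then see how far in the tail the observed statistic falls.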

## Assessment results

• Beth Chance and Karen McGaughey, "Impact of simulation/randomization-based curriculum on student understanding of p-values and confidence intervals." Session 6B, Thursday, 10:55 am
• Nathan Tintle, "Quantitative evidence for the use of simulation and randomization in the introductory statistics course." Session 8A; see Proceedings
• Todd Swanson and Jill VanderStoep, "Student attitudes towards statistics from a randomization-based curriculum." Session 1F; see Proceedings

## Acknowledgements

• Thank you for listening!
• National Science Foundation DUE/TUES-114069, 1323210
• If you'd like to know more: Workshop on Saturday, July 19, 8:00 am to 5:00 pm, "Modifying introductory courses to use simulation methods as the primary introduction to statistical inference." Presenters: Beth Chance, Kari Lock Morgan, Patti Lock, Robin Lock, Allan Rossman, Todd Swanson, Jill VanderStoep

## Resources

• Course materials: Introduction to Statistical Investigations (Fall 2014, John Wiley and Sons) by Nathan Tintle, Beth Chance, George Cobb, Allan Rossman, Soma Roy, Todd Swanson, Jill VanderStoep. http://www.math.hope.edu/isi/
• Applets: http://www.rossmanchance.com/ISIapplets.html
• [email protected]
