Practice your way to a higher statistics score
The adage that "practice makes perfect" is never truer than with math problems. Statistics Workbook For Dummies with Online Practice provides succinct content reviews for every topic, with plenty of examples and practice problems for each concept, in the book and online. Every lesson begins with a concept review, followed by a few example problems and plenty of practice problems. There's a step-by-step solution for every problem, with tips and tricks to help with comprehension and retention. New for this edition, free online practice quizzes for each chapter provide extra opportunities to test your knowledge and understanding.
* Get FREE access to chapter quizzes in an online test bank
* Work along with each chapter or use the test bank for final exam review
* Discover which statistical measures are most meaningful
Scoring high in your Statistics class has never been easier!
Number of pages: 503
Statistics Workbook For Dummies®, 2nd Edition with Online Practice
Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com
Copyright © 2019 by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc., and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc., is not associated with any product or vendor mentioned in this book.
LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: THE PUBLISHER AND THE AUTHOR MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES OR PROMOTIONAL MATERIALS. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR EVERY SITUATION. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING LEGAL, ACCOUNTING, OR OTHER PROFESSIONAL SERVICES. IF PROFESSIONAL ASSISTANCE IS REQUIRED, THE SERVICES OF A COMPETENT PROFESSIONAL PERSON SHOULD BE SOUGHT. NEITHER THE PUBLISHER NOR THE AUTHOR SHALL BE LIABLE FOR DAMAGES ARISING HEREFROM. THE FACT THAT AN ORGANIZATION OR WEBSITE IS REFERRED TO IN THIS WORK AS A CITATION AND/OR A POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE AUTHOR OR THE PUBLISHER ENDORSES THE INFORMATION THE ORGANIZATION OR WEBSITE MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. FURTHER, READERS SHOULD BE AWARE THAT INTERNET WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ.
For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit https://hub.wiley.com/community/support/dummies.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Control Number: 2019931550
ISBN 978-1-119-54751-8 (pbk); ISBN 978-1-119-54767-9 (ebk); ISBN 978-1-119-54768-6
About This Book
Icons Used in This Book
Beyond the Book
Where to Go from Here
Part 1: Getting Off to a Statistically Significant Start
Chapter 1: Summarizing Categorical Data: Counts and Percents
Counting On the Frequency
Relating with Percentages
Interpreting Counts and Percents with Caution
Answers to Problems in Summarizing Categorical Data
Chapter 2: Summarizing Quantitative Data: Means, Medians, and More
Finding and Interpreting Measures of Center
Finding and Interpreting Measures of Spread
Using Percentiles and the Interquartile Range
Answers to Problems in Summarizing Quantitative Data
Chapter 3: Organizing Categorical Data: Charts and Graphs
Making, Interpreting, and Evaluating Pie Charts
Making, Interpreting, and Evaluating Bar Graphs
Answers to Problems in Organizing Categorical Data
Chapter 4: Organizing Quantitative Data: Charts and Graphs
Creating a Histogram
Making Sense of Histograms
Straightening Out Skewed Data
Spotting a Misleading Histogram
Making Box Plots
Interpreting Box Plots
Looking at Line Graphs
Understanding the Empirical Rule
Answers to Problems in Organizing Quantitative Data
Part 2: Probability, Distributions, and the Central Limit Theorem (Are You Having Fun Yet?)
Chapter 5: Understanding Probability Basics
Grasping the Rules of Probability
Avoiding Probability Misconceptions
Making Predictions Using Probability
Answers to Problems in Probability
Chapter 6: Measures of Relative Standing and the Normal Distribution
Mastering the Normal Distribution
Finding and Interpreting Standard (Z) Scores
Knowing Where You Stand with Percentiles
Finding Probabilities for a Normal Distribution
Finding the Percentile (Backwards Normal)
Answers to Problems in Normal Distribution
Chapter 7: The Binomial Distribution
Characterizing the Binomial Distribution
Finding Probabilities Using the Binomial Formula for small n
Finding Probabilities Using the Binomial Table for Medium-Sized n
Calculating the Mean and Variance of the Binomial
Estimating Probabilities in Large Cases — the Normal Approximation
Answers to Problems in the Binomial Distribution
Chapter 8: The t-Distribution
Getting to Know the t-Distribution
Working with the t-Table and Degrees of Freedom
Answers to Problems in the t-Distribution
Chapter 9: Demystifying Sampling Distributions and the Central Limit Theorem
Exactly What Is a Sampling Distribution?
Clearing Up the Central Limit Theorem (Once and for All)
Finding Probabilities with the Central Limit Theorem
When Your Sample’s Too Small: Employing the t-Distribution
Answers to Problems in Sampling Distributions and the Central Limit Theorem
Part 3: Guesstimating and Hypothesizing with Confidence
Chapter 10: Making Sense of Margin of Error
Reviewing Margin of Error
Calculating the Margin of Error for Means and Proportions
Increasing and Decreasing Margin of Error
Interpreting Margin of Error Correctly
Answers to Problems in Making Sense of Margin of Error
Chapter 11: Calculating Confidence Intervals
Walking through a Confidence Interval
Deriving a Confidence Interval for a Population Mean
Figuring a Confidence Interval for a Population Proportion
Calculating a Confidence Interval for the Difference of Two Means
Computing a Confidence Interval for the Difference of Two Proportions
Answers to Problems in Calculating Confidence Intervals
Chapter 12: Deciphering Your Confidence Interval
Interpreting Confidence Intervals the Right Way
Evaluating Confidence Interval Results: What the Formulas Don’t Tell You
Answers to Problems in Confidence Intervals
Chapter 13: Testing Hypotheses
Walking Through a Hypothesis Test
Testing a Hypothesis about a Population Mean
Testing a Hypothesis about a Population Proportion
Testing for a Difference between Two Population Means
Testing for a Mean Difference (Paired t-Test)
Testing a Hypothesis about Two Population Proportions
Answers to Problems in Testing Hypotheses
Chapter 14: Taking the Guesswork Out of p-Values and Type I and II Errors
Understanding What p-Values Measure
Test (Statistic) Time: Figuring Out p-Values
The Value Breakdown: Interpreting p-Values Properly
Deciphering Type I Errors
Deciphering and Distinguishing Type II Errors
Answers to Problems in p-Values and Type I and II Errors
Part 4: Statistical Studies and the Hunt for a Meaningful Relationship
Chapter 15: Examining Polls and Surveys
Planning and Designing a Survey
Selecting a Random Sample
Carrying Out a Survey Properly
Interpreting and Evaluating Survey Results
Answers to Problems in Polls and Surveys
Chapter 16: Evaluating Experiments
Distinguishing Experiments from Observational Studies
Designing a Good Experiment
Looking for Cause and Effect: Interpreting Experiment Results
Answers to Problems in Evaluating Experiments
Chapter 17: Looking for Links in Categorical Data: Two-Way Tables
Understanding Two-Way Tables Inside and Out
Working with Intersection, Unions, and the Addition Rule
Figuring Marginal Probabilities
Nailing Down Conditional Probabilities and the Multiplication Rule
Inspecting the Independence of Categorical Variables
Answers to Problems in Two-Way Tables
Chapter 18: Searching for Links in Quantitative Data: Correlation and Regression
Relating X and Y with a Scatterplot
Toeing the Line of Correlation
Picking Out the Best Fitting Regression Line
Interpreting the Regression Line and Making Predictions
Checking the Fit of the Regression Line
Answers to Problems in Correlation and Regression
Part 5: The Part of Tens
Chapter 19: Math Review: Ten Steps to a Better Grade
Know Your Math Symbols
Uproot Roots and Powers
Treat Fractions with Extra Care
Obey the Order of Operations
Avoid Rounding Errors
Get Comfortable with Formulas
Stay Calm When Formulas Get Tough
Feel Fine about Functions
Know When Your Answer Is Wrong
Show Your Work
Chapter 20: Top Ten Statistical Formulas
Mean (or Average)
Sample Standard Deviation
Margin of Error for the Sample Mean
Sample Size Needed for Estimating μ
Test Statistic for the Mean
Margin of Error for the Sample Proportion
Sample Size Needed for Estimating p
Test Statistic for the Proportion
Chapter 21: Ten Ways to Spot Common Statistical Mistakes
Searching for and Specifying Bias
Marking the Margin of Error
Scanning for Sample Size
Studying Sample Selection (Gotta Be Random)
Checking for Confounding Variables
Doing the Math
Detecting Selective Reporting
Avoiding the Anecdote
Appendix: Tables for Reference
About the Author
Connect with Dummies
End User License Agreement
Table 11-1 Common Confidence Levels and the Matching
Table 13-1 Critical Values for Hypothesis Tests Using the
Table 17-1 The Gender and Political Affiliation of 200 Survey Participants
FIGURE 4-1: Histograms have several common patterns.
FIGURE 6-1: Laypeople call it a bell curve; you call it a normal distribution.
FIGURE 6-2: The standard normal distribution has mean 0 and standard deviation 1...
distributions for different sample sizes compared to the
Table of Contents
Perhaps you’re taking a statistics class, or you’re about to take one. You may understand some of the basic ideas, but you have questions and want a place to go for a little extra help to give you an edge. And you also want a heads-up as to what instructors really think about when they write their exams. Well, look no further; help has arrived in the form of Statistics Workbook For Dummies, 2nd Edition.
This workbook helps you become more comfortable with and confident about statistics. Through plenty of practical problems that take you from step one all the way to your final exam, you review the concepts you know, identify areas where you need to focus more work, and address the little things that can make the difference between a B and an A.
As a statistics professor who has taught tens of thousands of students over the years, I have noticed that certain problems keep cropping up and causing my pen to take points off exams over and over again. And believe me: I want nothing more than to put my red pen away. So I give you all my secrets about what professors really want you to know, the kinds of questions they ask, and the types of answers they love and hate to see (so you can avoid the latter). And I focus only on the topics that you absolutely need to know, with minimal background information.
The major objectives of this workbook are for you to understand, calculate, and interpret the most common statistical formulas and techniques; get a handle on basic probability; gain confidence with difficult statistical topics such as the central limit theorem and p-values; know which statistical technique to use in different situations (for example, when to employ what kind of confidence interval); and evaluate and pinpoint problems with studies, polls, and experiments.
Although I wrote this workbook to serve as a companion to Statistics For Dummies, 2nd Edition (also published by Wiley and written by yours truly), this workbook works quite well with any introductory statistics textbook.
You may be asking how this workbook is different from other workbooks on the shelf. Well, here are a few ways, listed in order of importance:
Plenty of excellent practice problems to lead you down the path of examination success, chosen by me, a card-carrying member of the “million statistics exam question writing and grading” club. I provide all answers at the end of each chapter.
for you to work through the problems directly in the section you’re working on, so you can easily refer to your notes later when you need them.
Not only answers, but also clear, complete explanations to go with them. Explanations help you know exactly how to approach a problem, what information you need to solve it, and common problems you need to avoid.
A view inside a professor’s mind to help you determine the most popular questions, the answers we look for, and the answers that make us pull our hair out.
Tips, strategies, and warnings based on my vast experience with students of all backgrounds and learning styles (and my grading experience).
An example accompanying each section, directly followed by the solution. Use the example as a reference when you work the other problems.
A focus on problem-solving skills to help you develop a problem-solving strategy when you take exams. I don’t show you how I would do the problems; I help you see how you can do the problems. And believe me, there’s a big difference!
The nonlinear approach allows you to skip around in the workbook and still have easy access to and understanding of any given topic.
to help you process, remember, and put into practice statistical definitions, techniques, and processes.
Clear and concise step-by-step procedures that intuitively explain how to work through statistics problems and remember the process.
I also used a few conventions while writing this book that you should be aware of:
The most important convention that you need to be aware of deals with my dual use of the word “statistics.” In some situations, I refer to statistics as a subject of study or as a field of research. For example, “Statistics is really quite an interesting subject!” (Note I said statistics “is” in this case.) In other situations, I refer to statistics as the plural of statistic, in a numerical sense. For example, “The most common statistics are the mean and the standard deviation.” (Notice my use of the word “are” in this case.)
I also use data in a plural form (“the data are” rather than “the data is”). The battle rages on between statisticians over which way is right, but I go with the plural form.
I use Ho to represent the null hypothesis in a hypothesis test. Although this is a commonly used notation, others might use the notation H₀ to mean the same thing.
I use * to indicate a multiplication sign.
This book is for you if you have some exposure to statistics already and want more opportunities to enjoy success through additional practice of the skills and techniques. Or perhaps you’re taking a statistics class and could use some extra support (and insider information). Or maybe you just really want to understand p-values because they keep you awake at night (been there, done that).
Note: If you’re totally new to the subject of statistics, I suggest that you first read Statistics For Dummies, 2nd Edition (Wiley), because I cover the various concepts of statistics in much more detail in that book (but any introductory text will suffice). After you feel comfortable and confident with the material, you can try the problems in this workbook. Or, as an alternative, you can use this workbook to practice along with what you read in Statistics For Dummies, 2nd Edition.
Icons in this workbook draw your attention to certain features that occur on a regular basis. Think of them as road signs that you encounter on a trip. Here are the road signs you encounter on your journey through this workbook.
Each section of this workbook begins with a brief overview of the topic. After the intro, you see an example problem with a fully worked solution for use as a reference as you work the practice problems. You can quickly locate the example problems by looking for this icon.
I use this icon for particular ideas that I hope you’ll remember long after you read this workbook.
This icon points out helpful hints, ideas, or shortcuts that save you time or give you alternative ways to think about a particular concept. I also use this icon to “get down to the nitty-gritty” discussing the types of questions your instructor may ask you and why, revealing what instructors really look for in your answers, and giving you a heads-up on the types of errors that really make them nuts (so you can avoid them at all costs).
This icon refers to specific ways that you may get tripped up while working a certain kind of problem and how to avoid those problems. Commit these items to memory while it still doesn’t cost you any points (in other words, before the exam takes place).
Be sure to check out the free Cheat Sheet for a handy guide that covers tips and tricks for answering statistics questions. To get this Cheat Sheet, simply go to www.dummies.com and enter “Statistics Workbook For Dummies” in the Search box.
You also have the opportunity to complete online quizzes for Chapters 1 through 18 that test your knowledge of the concepts in each chapter. To gain access to the online practice, all you have to do is register by following these simple steps:
Find your PIN access code located on the inside front cover of this book.
Find your product (Statistics Workbook For Dummies, 2nd Edition) and then follow the on-screen prompts to activate your PIN.
Now you’re ready to go! You can go back to the program at testbanks.wiley.com as often as you want — simply log on with the username and password you created during your initial login. No need to enter the access code a second time.
Tip: If you have trouble with your PIN or can’t find it, contact Wiley Product Technical Support at 877-762-2974 or go to support.wiley.com.
I wrote this workbook in a nonlinear way, so you can start anywhere and still understand what’s happening. However, I can make some recommendations to readers who are interested in knowing where to start:
If you want to get right into the number-crunching aspects of statistics (finding the mean, median, standard deviation, and so on), I suggest starting with
If you want to break down the normal distribution or the central limit theorem, go to
If you’re most worried about confidence intervals and hypothesis tests, jump to
If you want to develop your skills evaluating and making sense of the results of medical studies, polls, surveys, and experiments, start with
If you want to nail down data collected on two variables (correlation and the like), head directly to
If you want tips on math, common statistical formulas, or ways to spot statistical mistakes, head to
IN THIS PART …
Get down to the basics of number crunching.
Make and interpret charts and graphs.
Crank out and understand descriptive statistics.
Develop important skills for critiquing others’ statistics.
IN THIS CHAPTER
Making tables to summarize categorical data
Highlighting the difference between frequencies and relative frequencies
Interpreting and evaluating tables
Categorical data is data in which individuals are placed into groups or categories, for example, gender, region, or type of movie. Summarizing categorical data involves boiling down all the information into just a few numbers that tell its basic story. Because categorical data involves pieces of data that belong in categories, you have to look at how many individuals fall into each group and summarize the numbers appropriately. In this chapter, you practice making, interpreting, and evaluating frequency and relative frequency tables for categorical data.
One way to summarize categorical data is to simply count, or tally up, the number of individuals that fall into each category. The number of individuals in any given category is called the frequency (or count) for that category. If you list all the possible categories along with the frequency for each, you create a frequency table. The total of all the frequencies should equal the size of the sample (because you place each individual in one category).
See the following for an example of summarizing data by using a frequency table.
Q. Suppose that you take a sample of 10 people and ask them all whether they own a cellphone. Each person falls into one of two categories: yes or no. The data are shown in the following table.
Summarize this data in a frequency table.
What’s an advantage of summarizing categorical data?
A. Data summaries boil down the data quickly and clearly.
The frequency table for this data is shown in the following table.
A data summary allows you to see patterns in the data, which aren’t clear if you look only at the original data.
Own a Cellphone?    Frequency
Yes                 7
No                  3
Total               10
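If you'd rather let a computer do the tallying, the idea can be sketched in a few lines of Python. This is only an illustration: the list of raw answers below is an assumption (the order of the ten responses is made up so that it contains 7 "yes" and 3 "no"), and `Counter` is Python's standard-library tallying tool.

```python
from collections import Counter

# Hypothetical raw responses: 7 "yes" and 3 "no".
# The order of the answers is an assumption made for illustration.
responses = ["yes", "no", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes"]

# Count how many individuals fall into each category.
frequency = Counter(responses)

# The frequencies should add up to the sample size,
# because each person lands in exactly one category.
total = sum(frequency.values())

for category, count in frequency.items():
    print(f"{category:<6}{count:>4}")
print(f"{'Total':<6}{total:>4}")
```

Printing the total at the bottom mirrors the advice in this section: it lets a reader confirm that every individual was counted once.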
1 You survey 20 shoppers to see what type of soft drink they like best, Brand A or Brand B. The results are: A, A, B, B, B, B, B, B, A, A, A, B, A, A, A, A, B, B, A, A. Which brand do the shoppers prefer? Make a frequency table and explain your answer.
2 A local city government asks voters to vote on a tax levy for the local school district. A total of 18,726 citizens vote on the issue. The yes count comes in at 10,479, and the rest of the voters said no.
Show the results in a frequency table.
Why is it important to include the total number at the bottom of a frequency table?
3 A zoo asks 1,000 people whether they’ve been to the zoo in the last year. The surveyors count that 592 say yes, 198 say no, and 210 don’t respond.
Show the results in a frequency table.
Explain why you need to include the people who don’t respond.
4 Suppose that instead of showing the number in each group, you show just the percentage (called a relative frequency). What’s one advantage a relative frequency table has over a frequency table?
Another way to summarize categorical data is to show the percentage of individuals who fall into each category, thereby creating a relative frequency. The relative frequency of a given category is the frequency (number of individuals in that category) divided by the total sample size, multiplied by 100 to get the percentage. For example, if you survey 50 people and 10 are in favor of a certain issue, the relative frequency of the “in-favor” category is 10 ÷ 50 = 0.20 times 100, which gives you 20 percent. If you list all the possible categories along with their relative frequencies, you create a relative frequency table. The total of all the relative frequencies should equal 100 percent (subject to possible round-off error).
See the following for an example of summarizing data by using a relative frequency table.
Q. Using the cellphone data from the following table, make a relative frequency table and interpret the results.
A. The following table shows a relative frequency table for the cellphone data. Seventy percent of the people sampled reported owning cellphones, and 30 percent admitted to being technologically behind the times.
Own a Cellphone?    Relative Frequency
Yes                 70%
No                  30%
Total               100%
You get the 70 percent by taking (7 ÷ 10) * 100, and you calculate the 30 percent by taking (3 ÷ 10) * 100.
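The same arithmetic can be sketched as a small Python function. Treat this as one possible way to do it, not a required method; the function name `relative_frequencies` and the dictionary layout are assumptions made for this illustration.

```python
def relative_frequencies(counts):
    """Return each category's share of the total, as a percentage."""
    total = sum(counts.values())
    # Frequency divided by the total sample size, times 100.
    return {category: 100 * count / total for category, count in counts.items()}


# Frequencies from the cellphone example: 7 yes, 3 no.
cellphone_counts = {"Yes": 7, "No": 3}
percents = relative_frequencies(cellphone_counts)

print(percents)                 # percentage for each category
print(sum(percents.values()))   # should be 100, subject to round-off
```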
5 You survey 20 shoppers to see what type of soft drink they like best, Brand A or Brand B. The results are: A, A, B, B, B, B, B, B, A, A, A, B, A, A, A, A, B, B, A, A. Which brand do the shoppers prefer?
Use a relative frequency table to determine the preferred brand.
In general, if you had to choose, which is easier to interpret: frequencies or relative frequencies? Explain.
6 A local city government asked voters in the last election to vote on a tax levy for the local school district. A record 18,726 voted on the issue. The yes count came in at 10,479, and the rest of the voters checked the no box. Show the results in a relative frequency table.
7 A zoo surveys 1,000 people to find out whether they’ve been to the zoo in the last year. The surveyors count that 592 say yes, 198 say no, and 210 don’t respond. Make a relative frequency table and use it to find the response rate (percentage of people who respond to the survey).
8 Name one disadvantage that comes with creating a relative frequency table compared to using a frequency table.
Not all summaries of categorical data are fair and accurate. Knowing what to look for can help you keep your eyes open for misleading and incomplete information.
Instructors often ask you to “interpret the results.” In this case, your instructor wants you to use the statistics available to talk about how they relate to the given situation. In other words, what do the results mean to the person who collects the data?
With relative frequency tables, don’t forget to check whether all categories sum to 1 or 100 percent (subject to round-off error), and remember to look for some indicator as to total sample size.
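Those two checks can be written down as a short Python sketch. Everything here is an assumption made for illustration: the function name, the half-point allowance for round-off, and the made-up tables. None of it comes from a real study.

```python
def check_relative_frequency_table(percents, sample_size=None, tolerance=0.5):
    """Return a list of warnings for a relative frequency table.

    tolerance is an assumed allowance (in percentage points) for round-off.
    """
    warnings = []
    total = sum(percents.values())
    if abs(total - 100) > tolerance:
        warnings.append(f"Percents sum to {total:g}, not 100: a category may be missing.")
    if sample_size is None:
        warnings.append("No total sample size is reported.")
    return warnings


# Made-up tables, for illustration only.
incomplete = {"Category A": 40, "Category B": 25}
problems = check_relative_frequency_table(incomplete)

complete = {"Category A": 40, "Category B": 25, "Category C": 35}
no_problems = check_relative_frequency_table(complete, sample_size=200)

print(problems)
print(no_problems)
```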
See the following for an example of critiquing a data summary.
Q. You watch a commercial where the manufacturer of a new cold medicine (“Nocold”) compares it to the leading brand. The results are shown in the following table.
How Nocold Compares
At least as good
What kind of table is this?
Interpret the results. (Did the new cold medicine beat out the leading brand?)
What important details are missing from this table?
A. Much like the cold medicines I always take, the table about “Nocold” does “Nogood.”
This table is an incomplete relative frequency table. The remaining category is “not as good” for the Nocold brand, and the advertiser doesn’t show it. But you can do the math and see that 100% − 65% = 35% of the people say that the leading brand is better.
If you put the two groups together, 65% of the patients say that Nocold is at least as good as the leading brand, and almost half of the patients say Nocold is much better.
What’s missing? The remaining percentage (to keep all possible results in perspective). But more importantly, the total sample size is missing. You don’t know whether the surveyors sampled 10 people, 100 people, or 1,000 people. This means that the precision of the results is unknown. (Precision means how consistent the results will be from sample to sample; it’s related to sample size, as you see in
9 Suppose that you ask 1,000 people to identify from a list of five vacation spots which ones they’ve already visited. The frequencies you receive are Disney World: 216; New Orleans: 312; Las Vegas: 418; New York City: 359; and Washington, D.C.: 188.
Explain why creating a traditional relative frequency table doesn’t make sense here.
How can you summarize this data with percents in a way that makes sense?
10 If you have only a frequency table, can you find the corresponding relative frequency table? Conversely, if you have only a relative frequency table, can you find the corresponding frequency table? Explain.
1 Eleven shoppers prefer Brand A, and nine shoppers prefer Brand B. The frequency table is shown in the following table. Brand A got more votes, but the results are pretty close.
2 Frequencies are fine for summarizing data as long as you keep the total number in perspective.
The results are shown in the following table. Because the total is 18,726, and the yes count is 10,479, the no count is the difference between the two, which is 18,726 − 10,479 = 8,247.
The total is important because it helps keep the frequencies in perspective when you compare them to each other.
3 This problem shows the importance of reporting not only the results of participants who respond but also what percentage of the total actually respond.
The results are shown in the following table.
If you don’t show the nonrespondents, the total doesn’t add up to 1,000 (the number surveyed). An alternative way to show the data is to base it on only the respondents, but the results would be biased. You can’t definitively say that the nonrespondents would respond the same way as the respondents.
Gone to the Zoo in the Last Year?    Frequency
Yes                                  592
No                                   198
Didn’t respond                       210
Total                                1,000
4 Showing the percents rather than counts means making a relative frequency table rather than a frequency table. One advantage of a relative frequency table is that everything sums to 100 percent, making it easier to interpret the results, especially if you have a large number of categories.
5 Relative frequencies do just what they say: They help you relate the results to each other (by finding percentages).
Eleven shoppers out of the 20 prefer Brand A, and nine shoppers out of the 20 prefer Brand B. The relative frequency table is shown in the following table. Brand A got more votes, but the results are pretty close, with 55 percent of the shoppers preferring Brand A, and 45 percent preferring Brand B.
You often have an easier time interpreting percents, because when you need to interpret counts, you have to put them in perspective in terms of “out of how many?”
6 The results are shown in the following table. The yes percentage is (10,479 ÷ 18,726) * 100 ≈ 55.96%. Because the total is 100%, the no percentage is 100% − 55.96% = 44.04%.
7 You can see the relative frequency table that follows this answer. Knowing the response rate is critical for interpreting the results of a survey. The higher the response rate, the better. The response rate is 59.2% + 19.8% = 79%, the total percentage of people who responded in any way (yes or no) to the survey. (Note that 21% is the nonresponse rate.)
Gone to the Zoo in the Last Year?    Relative Frequency
Yes                                  59.2%
No                                   19.8%
Didn’t respond                       21.0%
Total (n = 1,000)                    100%
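For readers who like to double-check with code, here is a minimal Python sketch of the response-rate calculation, using the zoo counts from the problem. The variable names are assumptions made for this illustration.

```python
# Counts from the zoo survey in Problem 7.
zoo_counts = {"Yes": 592, "No": 198, "No response": 210}

surveyed = sum(zoo_counts.values())               # everyone who was asked
responded = zoo_counts["Yes"] + zoo_counts["No"]  # anyone who gave an answer

response_rate = 100 * responded / surveyed
nonresponse_rate = 100 * zoo_counts["No response"] / surveyed

print(f"Response rate: {response_rate:.1f}%")
print(f"Nonresponse rate: {nonresponse_rate:.1f}%")
```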
8 One disadvantage of a relative frequency table is that if you see only the percents, you don’t know how many people participated in the study; therefore, you don’t know how precise the results are. You can get around this problem by putting the total sample size somewhere at the top or bottom of your relative frequency table.
When making a relative frequency table, include the total sample size somewhere on the table.
9 Be careful about how you interpret tables where an individual can be in more than one category at the same time.
The frequencies don’t sum to 1,000, because people have the option to choose multiple locations or none at all, so each person doesn’t end up in exactly one group. If you take the grand total of all the frequencies (1,493) and divide each frequency by 1,493 to get a relative frequency, the relative frequencies sum to 1 (or 100 percent). But what does that mean? It makes it hard to interpret these percents because they don’t account for the total number of people.
One way you can summarize this data is by showing the percentage of people who have been at each location separately (compared to the percentage who haven’t been there before). These percents add up to 1 for each location. The following table shows the results summarized with this method.
The table isn’t a relative frequency table; however, it uses relative frequencies.
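Vacation Spot       % Who Have Been There    % Who Haven’t Been There
Disney World        21.6%                    78.4%
New Orleans         31.2%                    68.8%
Las Vegas           41.8%                    58.2%
New York City       35.9%                    64.1%
Washington, D.C.    18.8%                    81.2%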
Not all tables involving percents should sum to 1. Don’t force tables to sum to 1 when they shouldn’t; do make sure you understand whether each individual can fall under more than one category. In those cases, a typical relative frequency table isn’t appropriate.
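The location-by-location summary can be sketched in Python as follows. The counts come from Problem 9; the variable names and the layout of the output are assumptions for illustration. Notice that each percentage is based on the 1,000 people surveyed, not on the grand total of 1,493 checkmarks.

```python
surveyed = 1000  # number of people asked

# Each person could check several places (or none),
# so these counts don't have to sum to 1,000.
visited = {
    "Disney World": 216,
    "New Orleans": 312,
    "Las Vegas": 418,
    "New York City": 359,
    "Washington, D.C.": 188,
}

table = {}
for place, count in visited.items():
    been = 100 * count / surveyed   # percent who have been there
    not_been = 100 - been           # percent who haven't
    table[place] = (been, not_been)
    print(f"{place:<18}{been:>6.1f}%{not_been:>8.1f}%")
```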
10 You can always sum all the frequencies to get a total and then find each relative frequency by taking the frequency divided by the total. However, if you have only the percents, you can’t go back and find the original counts unless you know the total number of individuals. Suppose that you know that 80 percent of the people in a survey like ice cream. How many people in the survey like ice cream? If the total number of respondents is 100, \(0.80 \times 100 = 80\) people like ice cream. If the total is 50, you’re looking at \(0.80 \times 50 = 40\) positive answers. If the total is 5, you deal only with \(0.80 \times 5 = 4\). This illustrates why relative frequency tables need to have the total sample size somewhere.
Watch for total sample sizes when given a relative frequency table. Don’t be misled by percentages alone, thinking they’re always based on large sample sizes, because many are not.
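To see how much the sample size matters, here is a minimal Python sketch of the ice cream example. The function name `count_from_percent` is made up for illustration; it simply turns a percentage back into a count once you know the total.

```python
def count_from_percent(percent, total):
    """Return the number of individuals that a percentage represents."""
    return percent * total / 100

# The same 80 percent corresponds to very different counts.
for total in (100, 50, 5):
    print(f"n = {total}: {count_from_percent(80, total):.0f} people like ice cream")
```

The percentage is identical in all three cases; only the total tells you whether it rests on 80 people or on 4.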
IN THIS CHAPTER
Calculating measures for center and spread
Interpreting your results properly
Compiling the results for a data discussion
Before data are organized in a chart or graph, the first step is to summarize them — that is, find a few numbers and/or words that can tell the story of the data in a nutshell. For quantitative data, the most important characteristics are the shape of the data (which you see in Chapter 4), where the center is located, and how much variability or spread is in the data. You may also want to point out any outliers in the data (numbers that appear far from the rest). And like everything else in statistics, there’s room for people to stretch the truth in how they choose to summarize their data (or in what they choose not to tell you). So it’s good to know the big ideas of how data are summarized and what to look for in terms of interpreting and evaluating data summaries. That’s what you practice in this chapter.
The most common way to summarize quantitative data is to describe where the center is. One way of thinking about what center means is to ask, “What’s a typical value in this data set?” You can measure the center of a data set in different ways, and the method you choose can greatly influence the conclusions people make about the data.
The mean of a data set is also known as the average. To find the mean, add all the numbers in the data set and divide by the number of numbers. The notation for the sample mean is \(\bar{x}\), and the formula for the sample mean is \(\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}\). In this case the capital sigma stands for sum, and the subscript “i” starts at 1 and ends with n, so you are summing each value in the data set from \(x_1\) to \(x_n\). Then divide by n.
The median of a data set is the true middle value when the data are ordered from smallest to largest. To find the median, order the data and pick the middle number(s). If you have an odd number of numbers, only one value is in the middle. If you have an even number of numbers, you pinpoint two values in the middle and determine the average of the two to get the median.
See the following for an example of calculating the mean and median.
Q. Find the mean and the median of the following data set: 1, 6, 5, 7, 3, 2.5, 2, −2, 1, 0.
A. The mean is \(1 + 6 + 5 + 7 + 3 + 2.5 + 2 + (-2) + 1 + 0 = 25.5\), divided by 10 (because you have 10 numbers), which equals 2.55. To find the median, order the numbers: −2, 0, 1, 1, 2, 2.5, 3, 5, 6, 7. Now find the middle number. In this case, there are two middle values: 2 and 2.5. Take the average: \(\frac{2 + 2.5}{2} = 2.25\).
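If you want to check this example by computer, the following Python sketch uses the standard library's `statistics` module. It takes the value missing from the extracted data list to be −2, which is the only value consistent with the stated mean of 2.55 and with the ordered list starting below 0.

```python
from statistics import mean, median

# Data from the example; -2 is inferred from the stated mean of 2.55.
data = [1, 6, 5, 7, 3, 2.5, 2, -2, 1, 0]

data_mean = mean(data)      # sum of the values divided by how many there are
data_median = median(data)  # sorts internally, then averages the two middle values

print(data_mean)    # 2.55
print(data_median)  # 2.25
```

Notice that `median` orders the data for you, while `mean` gives the same answer in any order.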
1 Does the mean have to be one of the numbers in the data set? Explain.
2 Does the median have to be one of the numbers in the data set? Explain.
3 Why do you have to order the data to calculate the median but not for the mean?
4 Suppose that you have an outlier in a data set (a number that stands out away from the rest). How does an outlier affect the mean and the median of that data set?
5 Suppose that you find the mean for a certain data set.
Whatever the data actually are, the mean should always lie between the largest and smallest values of the data set. Explain why.
When can the mean be the largest value in the data set?
6 Give an example of two different data sets containing three numbers each that both have the same median and mean. Explain why the median isn’t enough to tell the whole story about a data set.
7 Suppose that the mean and median salary at a company is $50,000, and all employees get a $1,000 raise.
What happens to the mean?
What happens to the median?
8 Suppose that the mean and median salary at a company is $50,000, and all employees get a 10% raise.
What happens to the mean?
What happens to the median?
Variation is one of the most important concepts in statistics. It measures how much the values in a sample or a population fluctuate. Values that appear close together indicate a small amount of variation. Values that are spread out indicate a large amount of variation.
A very crude measure of spread is the range. The statistical definition of range is the biggest number in the data set minus the smallest number. The range is a single value, not a pair of values, and is entirely based on only two numbers. Both numbers can be outliers, which is why range can be a crude measure of spread.
Because range is such a crude measure, by far the most common measure of variation is the standard deviation. The standard deviation represents the “typical” distance from any point in the data set to the mean. Roughly speaking, standard deviation gives you the average distance from the mean.
To find the standard deviation of data from a sample, you first find the mean (refer to the previous section). After that, follow these steps:
Take each number in the data set, subtract the mean, and square the result.
Add up all these so-called “squared deviations” and divide by \(n - 1\) (where \(n\) is the size of the data set).
Take the square root to undo the squaring you did earlier.
The notation for sample standard deviation is s, and the formula is \(s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}\).
See the following for an example of calculating and interpreting standard deviation.
Q. Find and interpret the standard deviation of the following data set: 1, 2, 3, 4, 5.
A. First, the mean of this data set is 3 (see the previous section in this chapter for mean info). After you calculate the mean, find the deviations from the mean and square them: \(1 - 3 = -2\), and \(-2\) squared equals 4; \(2 - 3 = -1\), and \(-1\) squared equals 1; \(3 - 3 = 0\), and 0 squared equals 0; \(4 - 3 = 1\), and 1 squared equals 1; and finally, \(5 - 3 = 2\), and 2 squared equals 4. Sum these values up to get \(4 + 1 + 0 + 1 + 4 = 10\). Divide 10 by \(n - 1 = 4\) (because \(n = 5\)) to get \(\frac{10}{4} = 2.5\). The final step is to take the square root of 2.5, which gives you \(\sqrt{2.5} \approx 1.58\). This answer means the data are, on average, about 1.58 steps from the mean (3).
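The same steps can be sketched in Python. This is only an illustration of the by-hand procedure for the data set 1, 2, 3, 4, 5; the variable names are made up, and in practice the standard library's `statistics.stdev` does the same job in one call.

```python
import math
from statistics import stdev

data = [1, 2, 3, 4, 5]
n = len(data)

mean = sum(data) / n                                   # step 0: find the mean (3.0)
squared_deviations = [(x - mean) ** 2 for x in data]   # step 1: subtract the mean, square
variance = sum(squared_deviations) / (n - 1)           # step 2: add up, divide by n - 1
s = math.sqrt(variance)                                # step 3: undo the squaring

print(squared_deviations)  # [4.0, 1.0, 0.0, 1.0, 4.0]
print(variance)            # 2.5
print(round(s, 2))         # 1.58

# The library function agrees with the step-by-step result.
print(math.isclose(s, stdev(data)))  # True
```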
9 What’s the smallest standard deviation you can figure, and when would that happen?
10 Choose four numbers from 1 to 5, with repetitions allowed, to create the largest standard deviation possible.
11 Suppose that the mean salary at a company is $50,000, and all employees get a $1,000 raise. What happens to the standard deviation?
12 Suppose that the mean salary at a company is $50,000, and all employees get a 10% raise. What happens to the standard deviation?
When dealing with skewed data (data that aren’t symmetric but rather lopsided off to one side), it’s often better to work with the median as the measure of center, because it’s not affected by the skewness as much as the mean is. And along this line, you can measure the spread of skewed data by focusing mainly on the range of the middle 50 percent of the data — called the interquartile range, or IQR. To understand IQR, you need to review percentiles — the kth percentile is a point in the data set where k% of the data lies below it. So if your height is at the 70th percentile, for example, that means 70 percent of the people are shorter than you. Certain numbers that represent the 25th and 75th percentiles have special names because they divide the data into quarters; they are called the first quartile, or Q1, and the third quartile, or Q3, respectively.
Note that computers can calculate quartiles for you, and that’s how you’ll normally get them done.
But, for example, suppose that you have 10 numbers (ordered) 1, 3, 5, 6, 7, 8, 10, 12, 13, 13. The Q3 would be the number that is three-fourths of the way through the data. One way to think of it is to find the median first (7.5 here), which would be the middle of the data. The median now divides the data into two halves, the upper half and the lower half. The Q1 would be the median of the lower half of the data (half of a half is a quarter), which here is the median of 1, 3, 5, 6, 7, which is 5. The Q3 would be the median of the upper half of the data; the median of 8, 10, 12, 13, 13 is 12. (Other computer programs may give slightly different answers to finding quartiles, but you get the big picture here.)
When you calculate the IQR you are ignoring any skewness or outliers that may lie on either end. To find the IQR, you find the range of the middle 50% of the data; you take \(Q_3 - Q_1\). So if the 75th percentile (Q3) for height was 60 inches and the 25th percentile (Q1) was 53 inches, then the IQR for height would be \(60 - 53 = 7\) inches.
Q. Find the first quartile, the third quartile, and the IQR of the numbers 1, 2, 3, 4, 5.
A. To find the first quartile, find the median, or halfway point in the data set (which here is 3). Then look at the lower half of the numbers and find the median of that, which here is 2, so the first quartile is 2. Similarly, the third quartile is the median of the upper half of the data, which here is 4. The IQR is \(Q_3 - Q_1 = 4 - 2 = 2\).
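Here is a rough Python sketch of the median-of-halves approach used in these examples. The function name `quartiles` is made up, and one detail is an assumption: when the data set has an odd number of values, the middle value is counted in both halves, because that is the choice that reproduces the answers of 2 and 4 in the example above. Software packages often use other rules and may give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    Assumption: with an odd number of values, the middle value
    belongs to both the lower half and the upper half.
    """
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2        # size of each half
    lower = data[:half]        # smallest values
    upper = data[n - half:]    # largest values
    return median(lower), median(data), median(upper)

q1, med, q3 = quartiles([1, 2, 3, 4, 5])
print(q1, med, q3, "IQR =", q3 - q1)   # 2 3 4 IQR = 2

q1, med, q3 = quartiles([1, 3, 5, 6, 7, 8, 10, 12, 13, 13])
print(q1, med, q3, "IQR =", q3 - q1)   # 5 7.5 12 IQR = 7
```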
13 Find the median and the IQR of the numbers 1, 2, 3, 4, 5, 6.
14 Find the median and the IQR of the numbers 2, 2, 2, 2, 2.
15 Is the IQR affected by outliers or skewness? Why or why not?
1 The mean (or average) doesn’t have to be one of the numbers in the data set, but it can be. For example, in the data set 1, 2, the mean is 1.5, which isn’t in the data set; however, in the data set 1, 2, 3, the mean is 2.
2 The median will be one of the numbers in the data set if the set has an odd number of values in it, because the set has one distinct middle value in that case. If the set has an even number of values, you find the median by averaging the two middle values, and the answer may or may not be one of the values in the data set. For example, if the data set is 1, 2, 3, 4, the median is 2.5, which isn’t included in the data set; however, if the data set is 1, 3, 3, 4, the median is \(\frac{3 + 3}{2} = 3\), which is included.
3 If you don’t order the data to find the median, you get a different answer. For example, look at the data set 1, 5, 2. The median is 2, but if you don’t order the data, it would be 5. And if you reorder the same data set to be 2, 1, 5, you get a different answer for the median: 1. So you should always order the data from smallest to largest to always get the same answer for the median. For the mean, you add up all the values in the data set and divide by the size of the data set. Using the commutative property for addition (and you thought you’d never use algebra later in life!), you know that \(a + b = b + a\). Even if you reorder the data, you still get the same sum. So you don’t have to order the data to always get the same answer for the mean of a given data set.
4 Outliers attract the mean toward them and away from the rest of the data. For example, the mean and the median of the data set 1, 2, 3 is 2. Suppose that you have the data set 1, 2, 297. The mean is now \(1 + 2 + 297 = 300\) divided by 3, which is 100. However, the median of the data set 1, 2, 297 is still 2.
Outliers affect the mean, but they don’t affect the median. The mean gets pulled in the direction of the outlier and may not truly represent a “typical” value in the data set.
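A quick check in Python, using the two small data sets from this answer, shows the mean being pulled toward the outlier while the median stays put. This is just a sketch built on the standard `statistics` module.

```python
from statistics import mean, median

without_outlier = [1, 2, 3]
with_outlier = [1, 2, 297]

print(mean(without_outlier), median(without_outlier))  # 2 2
print(mean(with_outlier), median(with_outlier))        # 100 2
```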
5 This problem gives you one way to check your answer to see if it makes sense.
Because it averages out all the data in the set, the mean has to be somewhere between the largest and smallest values in the data set.
The mean could equal the maximum value in a data set if all the values in the data set are the same; otherwise, any other value that isn’t at the maximum pulls the mean down.
6 Many answers are possible. The key is to put the same number in the middle. One possible answer: data set 1: 100, 200, 300; data set 2: 199, 200, 201. The mean and median of both data sets is 200. These two data sets have the same center with totally different ranges (or spreads). If you want to tell the story about a data set, the center isn’t enough because it can’t distinguish between two data sets with different spreads.
7 This problem really points out what happens to the measures of center when you add any constant to all the values in the data set.
The mean also increases by $1,000 to $51,000, because you literally pick up all the salaries, move them up $1,000 on the number line, and put them back down, which moves the mean by the same amount.
The median also increases by the same amount, to $51,000, for the same reason.
Adding or subtracting a constant to or from all the values in a data set changes the mean and median by that same constant. Be careful — that constant could be negative as well as positive.
8 This scenario highlights what happens when you multiply all the data by a constant. Here, the constant is 1.1, because you take the old salary, call it X, and add 10% of X to it: \(X + 0.10X\). But \(X + 0.10X = 1.10X\), so, in other words, 1.10 times the original salary gets you the new salary.
The mean also increases by 10% to become $55,000 (that is, \(1.1 \times \$50{,}000\)), because you multiply each value in the data set by 1.1.
The median also increases by 10% to become $55,000 for the same reason.
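You can try both raises on a small made-up company to see the pattern. The three salaries below are hypothetical, chosen only so that the mean and median both start at $50,000; the 10% raise is written as multiplying by 11/10 to keep the arithmetic free of floating-point noise.

```python
from statistics import mean, median

# Hypothetical salaries with mean and median both equal to 50,000.
salaries = [40_000, 50_000, 60_000]

flat_raise = [s + 1_000 for s in salaries]       # everyone gets $1,000
percent_raise = [s * 11 / 10 for s in salaries]  # everyone gets 10%

print(mean(flat_raise), median(flat_raise))        # 51000 51000
print(mean(percent_raise), median(percent_raise))  # 55000.0 55000.0
```

Adding a constant shifts both measures of center by that constant; multiplying by a constant multiplies both by that constant.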
9 The standard deviation can’t be negative because of the squaring that goes on in its calculation. However, it can be 0, although it happens only when the data set has no deviation in it — in other words, when all the data are exactly the same value. For example, 1, 1, 1 or 2, 2, 2, 2, 2 are two data sets with a standard deviation of 0.
10 If you choose 1, 1, 5, 5, you get the largest standard deviation possible, because these numbers are as far as possible from the mean (which is 3).
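If you'd rather not take this on faith, a brute-force search settles it. The sketch below tries every way of choosing four numbers from 1 to 5 (repetitions allowed, order ignored) and keeps the choice with the largest sample standard deviation.

```python
from itertools import combinations_with_replacement
from statistics import stdev

# Every unordered choice of four values from 1..5, repetitions allowed.
candidates = combinations_with_replacement(range(1, 6), 4)

# Pick the choice whose sample standard deviation is largest.
best = max(candidates, key=stdev)

print(best)                   # (1, 1, 5, 5)
print(round(stdev(best), 2))  # 2.31
```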
11 Adding a constant to the data doesn’t change the standard deviation, because you just relocate the data in a different spot on the number line; you don’t change how far apart the values are from the mean.
12 Multiplying by a constant changes the standard deviation. If you multiply an entire data set by 1.1, the spread increases. Suppose that two employees have salaries of $30,000 and $50,000 — right now, the figures are $20,000 apart. With a 10% raise, they become $33,000 and $55,000, making them $22,000 apart (the rich get richer, and the poor get less rich). If you recalculate the standard deviation, you find that it goes up here by a factor of 1.1 as well.
The new standard deviation becomes c times the old standard deviation, when you multiply the data set by a nonnegative constant c. If you multiply the data by a negative constant c, the new standard deviation becomes |c| times the old standard deviation (again, because of the squaring that goes on, the negative sign disappears). Also note that if c is a number between 0 and 1, the new standard deviation gets smaller than the old one.
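These rules are easy to check numerically. The sketch below uses the two salaries from answer 12 ($30,000 and $50,000); the constants −2 and 0.5 are arbitrary picks to illustrate the negative case and the between-0-and-1 case.

```python
from math import isclose
from statistics import stdev

salaries = [30_000, 50_000]
old_sd = stdev(salaries)

shifted = [x + 1_000 for x in salaries]  # add a constant
scaled = [x * 1.1 for x in salaries]     # 10% raise
flipped = [x * -2 for x in salaries]     # negative constant (arbitrary example)
shrunk = [x * 0.5 for x in salaries]     # constant between 0 and 1 (arbitrary example)

print(isclose(stdev(shifted), old_sd))        # True: adding a constant changes nothing
print(isclose(stdev(scaled), 1.1 * old_sd))   # True: spread grows by a factor of 1.1
print(isclose(stdev(flipped), 2 * old_sd))    # True: |-2| = 2 times the old value
print(isclose(stdev(shrunk), 0.5 * old_sd))   # True: spread gets smaller
```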
13 The median is 3.5 and divides the data into the lower half (1, 2, 3) and the upper half (4, 5, 6). The median of the lower half is Q1, which is 2, and the median of the upper half is Q3, which is 5. IQR is then \(Q_3 - Q_1 = 5 - 2 = 3\).
14 In this case, the median is 2, Q1 equals 2, and Q3 also equals 2, so the IQR is \(Q_3 - Q_1 = 2 - 2 = 0\). This makes sense because there is no spread in this data set anywhere, least of all in the middle 50 percent of the data.
15 IQR is not affected by skewness or outliers because it measures only the range in the middle 50 percent of the data. It does not pay attention at all to the numbers on the outside edges, which is where skewness or outliers show up.
IN THIS CHAPTER
Making charts and graphs
Making sense of the information presented in charts and graphs
Spotting misleading charts and graphs
Organizing categorical data involves listing each of the possible categories that the responses can take on, along with the number or percentage of individuals in each group. In this chapter, you practice making, interpreting, evaluating, and critiquing charts and graphs for categorical data.
A pie chart is a circle, or pie, whose slices show the percentage of individuals that fall into each category. Larger slices signify categories that include more individuals than the smaller slices. Before you make a pie chart, you can first summarize the data in table format. A frequency table shows how many individuals fall into each category (the sum of which is the total sample size). A relative frequency table shows what percentage of individuals fall into each category by taking the frequencies and dividing by the total sample size. The relative frequencies should sum to 1 or 100 percent (subject to possible round-off error).
Not all pie charts are correct, clear, and fair, however. Here are some common problems with pie charts that you should look out for:
Important information is missing from the pie chart, such as the total number of individuals.
The categories aren’t broken down far enough; for example, a pie chart contains only three slices, and the biggest one is vaguely described as “other.”
The number of categories is so large that you can’t really see anything (such as a pie chart where every individual represents a slice of the pie).
Gaining experience with pie charts helps you see some of the nuances that affect the credibility of their results. See the following for an example of interpreting and critiquing a pie chart.
Q. A hardware store wants to know what percentage of its customers are women. The manager takes a random sample of 76 customers who enter the store and records their gender. Twenty-two customers are females; the rest are males. I summarize the results in the following pie chart.
Describe the results.
How can this pie chart be improved?
A. Apparently, the DIY craze is popular with women, too.
The results of the pie chart show that the percentage of female customers appears to be around \(\frac{1}{3}\) (or around 33%).
You can improve the chart by showing the exact percentages in each slice. (The actual percentages are females: 28.9%; males: 71.1%.)
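The exact percentages come straight from the counts in the problem (22 females out of 76 customers, so 54 males). A short Python sketch of the relative frequency calculation, with the total sample size printed alongside, might look like this; the dictionary layout is just one convenient way to hold the frequency table.

```python
# Frequency table from the hardware store example.
counts = {"Females": 22, "Males": 76 - 22}
total = sum(counts.values())

# Relative frequency = frequency divided by total sample size.
percents = {group: round(100 * count / total, 1) for group, count in counts.items()}

print(f"Total sample size: {total}")
for group, pct in percents.items():
    print(f"{group}: {pct}%")
```

Printing the total with the percentages follows the advice from earlier answers: percents alone don't tell you how many people were sampled.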
1 Suppose that 375 individuals are asked what type of vehicle they own: SUV, truck, or car. See the following frequency table.
Make a relative frequency table of these results.
Make a pie chart of these results.
Interpret the results.