Statistical Techniques in Business & Economics

LIND
MARCHAL
WATHEN

Seventeenth Edition


The McGraw-Hill/Irwin Series in Operations and Decision Sciences

SUPPLY CHAIN MANAGEMENT

Benton
Purchasing and Supply Chain
Management
Third Edition

Bowersox, Closs, Cooper, and Bowersox
Supply Chain Logistics Management
Fourth Edition

Burt, Petcavage, and Pinkerton
Supply Management
Eighth Edition

Johnson, Leenders, and Flynn
Purchasing and Supply Management
Fourteenth Edition

Simchi-Levi, Kaminsky, and Simchi-Levi
Designing and Managing the Supply
Chain: Concepts, Strategies, Case
Studies
Third Edition

PROJECT MANAGEMENT

Brown and Hyer
Managing Projects: A Team-Based
Approach
First Edition

Larson and Gray
Project Management: The Managerial
Process
Fifth Edition

SERVICE OPERATIONS MANAGEMENT

Fitzsimmons and Fitzsimmons
Service Management: Operations,
Strategy, Information Technology
Eighth Edition

MANAGEMENT SCIENCE

Hillier and Hillier
Introduction to Management Science: A
Modeling and Case Studies Approach
with Spreadsheets
Fifth Edition

Stevenson and Ozgur
Introduction to Management Science
with Spreadsheets
First Edition

MANUFACTURING CONTROL SYSTEMS

Jacobs, Berry, Whybark, and Vollmann
Manufacturing Planning & Control for
Supply Chain Management
Sixth Edition

BUSINESS RESEARCH METHODS

Cooper and Schindler
Business Research Methods
Twelfth Edition

BUSINESS FORECASTING

Wilson, Keating, and John Galt Solutions,
Inc.
Business Forecasting
Sixth Edition

LINEAR STATISTICS AND REGRESSION

Kutner, Nachtsheim, and Neter
Applied Linear Regression Models
Fourth Edition

BUSINESS SYSTEMS DYNAMICS

Sterman
Business Dynamics: Systems Thinking
and Modeling for a Complex World
First Edition

OPERATIONS MANAGEMENT

Cachon and Terwiesch
Matching Supply with Demand:
An Introduction to Operations
Management
Third Edition

Finch
Interactive Models for Operations and
Supply Chain Management
First Edition

Jacobs and Chase
Operations and Supply Chain
Management
Fourteenth Edition

Jacobs and Chase
Operations and Supply Chain
Management: The Core
Third Edition

Jacobs and Whybark
Why ERP? A Primer on SAP
Implementation
First Edition

Schroeder, Goldstein, and
Rungtusanatham
Operations Management in the Supply
Chain: Decisions and Cases
Sixth Edition

Stevenson
Operations Management
Eleventh Edition

Swink, Melnyk, Cooper, and Hartley
Managing Operations across the Supply
Chain
Second Edition

PRODUCT DESIGN

Ulrich and Eppinger
Product Design and Development
Fifth Edition

BUSINESS MATH

Slater and Wittry
Math for Business and Finance: An
Algebraic Approach
First Edition

Slater and Wittry
Practical Business Math Procedures
Eleventh Edition

Slater and Wittry
Practical Business Math Procedures,
Brief Edition
Eleventh Edition

BUSINESS STATISTICS

Bowerman, O’Connell, and Murphree
Business Statistics in Practice
Seventh Edition

Bowerman, O’Connell, Murphree, and
Orris
Essentials of Business Statistics
Fourth Edition

Doane and Seward
Applied Statistics in Business and
Economics
Fourth Edition

Lind, Marchal, and Wathen
Basic Statistics for Business and
Economics
Eighth Edition

Lind, Marchal, and Wathen
Statistical Techniques in Business and
Economics
Seventeenth Edition

Jaggia and Kelly
Business Statistics: Communicating with
Numbers
First Edition

Jaggia and Kelly
Essentials of Business Statistics:
Communicating with Numbers
First Edition

Statistical Techniques in
BUSINESS & ECONOMICS

SEVENTEENTH EDITION

DOUGLAS A. LIND
Coastal Carolina University and The University of Toledo

WILLIAM G. MARCHAL
The University of Toledo

SAMUEL A. WATHEN
Coastal Carolina University

STATISTICAL TECHNIQUES IN BUSINESS & ECONOMICS, SEVENTEENTH EDITION
Published by McGraw-Hill Education, 2 Penn Plaza, New York, NY 10121. Copyright © 2018 by
McGraw-Hill Education. All rights reserved. Printed in the United States of America. Previous editions
© 2015, 2012, and 2010. No part of this publication may be reproduced or distributed in any form or
by any means, or stored in a database or retrieval system, without the prior written consent of
McGraw-Hill Education, including, but not limited to, in any network or other electronic storage or transmission,
or broadcast for distance learning.

Some ancillaries, including electronic and print components, may not be available to customers outside
the United States.

This book is printed on acid-free paper.

1 2 3 4 5 6 7 8 9 LWI 21 20 19 18 17 16

ISBN 978-1-259-66636-0
MHID 1-259-66636-0

Chief Product Officer, SVP Products & Markets: G. Scott Virkler
Vice President, General Manager, Products & Markets: Marty Lange
Vice President, Content Design & Delivery: Betsy Whalen
Managing Director: Tim Vertovec
Senior Brand Manager: Charles Synovec
Director, Product Development: Rose Koos
Product Developers: Michele Janicek / Ryan McAndrews
Senior Director, Digital Content Development: Douglas Ruby
Marketing Manager: Trina Maurer
Director, Content Design & Delivery: Linda Avenarius
Program Manager: Mark Christianson
Content Project Managers: Harvey Yep (Core) / Bruce Gin (Assessment)
Buyer: Susan K. Culbertson
Design: Matt Backhaus
Cover Image: © Corbis / Glow Images
Content Licensing Specialists: Melissa Homer (Image) / Beth Thole (Text)
Typeface: 9.5/11 Proxima Nova
Compositor: Aptara®, Inc.
Printer: LSC Communications

All credits appearing on page or at the end of the book are considered to be an extension of the
copyright page.

Library of Congress Cataloging-in-Publication Data

Names: Lind, Douglas A., author. | Marchal, William G., author. | Wathen,
Samuel Adam, author.
Title: Statistical techniques in business & economics/Douglas A. Lind,
Coastal Carolina University and The University of Toledo, William G.
Marchal, The University of Toledo, Samuel A. Wathen, Coastal Carolina University.
Other titles: Statistical techniques in business and economics
Description: Seventeenth Edition. | Dubuque, IA : McGraw-Hill Education,
[2017] | Revised edition of the authors’ Statistical techniques in
business & economics, [2015]
Identifiers: LCCN 2016054310| ISBN 9781259666360 (alk. paper) | ISBN
1259666360 (alk. paper)
Subjects: LCSH: Social sciences—Statistical methods. |
Economics—Statistical methods. | Commercial statistics.
Classification: LCC HA29 .M268 2017 | DDC 519.5—dc23 LC record available at
https://lccn.loc.gov/2016054310

The Internet addresses listed in the text were accurate at the time of publication. The inclusion of a
website does not indicate an endorsement by the authors or McGraw-Hill Education, and McGraw-Hill
Education does not guarantee the accuracy of the information presented at these sites.

mheducation.com/highered

DEDICATION

To Jane, my wife and best friend, and our sons, their wives, and our
grandchildren: Mike and Sue (Steve and Courtney), Steve and Kathryn
(Kennedy, Jake, and Brady), and Mark and Sarah (Jared, Drew, and Nate).

Douglas A. Lind

To Oscar Sambath Marchal, Julian Irving Horowitz, Cecilia Marchal
Nicholson and Andrea.

William G. Marchal

To my wonderful family: Barb, Hannah, and Isaac.

Samuel A. Wathen

A NOTE FROM THE AUTHORS

Over the years, we received many compliments on this text and understand that it’s a
favorite among students. We accept that as the highest compliment and continue to
work very hard to maintain that status.

The objective of Statistical Techniques in Business and Economics is to provide
students majoring in management, marketing, finance, accounting, economics, and
other fields of business administration with an introductory survey of descriptive and
inferential statistics. To illustrate the application of statistics, we use many examples and
exercises that focus on business applications, but also relate to the current world of the
college student. A previous course in statistics is not necessary, and the mathematical
requirement is first-year algebra.

In this text, we show beginning students every step needed to be successful in
a basic statistics course. This step-by-step approach enhances performance,
accelerates preparedness, and significantly improves motivation. Understanding the
concepts, seeing and doing plenty of examples and exercises, and comprehending
the application of statistical methods in business and economics are the focus of
this book.

The first edition of this text was published in 1967. At that time, locating relevant
business data was difficult. That has changed! Today, locating data is not a problem.
The number of items you purchase at the grocery store is automatically recorded at
the checkout counter. Phone companies track the time of our calls, the length of calls,
and the identity of the person called. Credit card companies maintain information on
the number, time and date, and amount of our purchases. Medical devices
automatically monitor our heart rate, blood pressure, and temperature from remote locations.
A large amount of business information is recorded and reported almost instantly.
CNN, USA Today, and MSNBC, for example, all have websites that track stock prices
in real time.

Today, the practice of data analytics is widely applied to “big data.” The practice
of data analytics requires skills and knowledge in several areas. Computer skills are
needed to process large volumes of information. Analytical skills are needed to
evaluate, summarize, organize, and analyze the information. Critical thinking skills
are needed to interpret and communicate the results of processing the
information.

Our text supports the development of basic data analytical skills. In this edition,
we added a new section at the end of each chapter called Data Analytics. As you
work through the text, this section provides the instructor and student with
opportunities to apply statistical knowledge and statistical software to explore several
business environments. Interpretation of the analytical results is an integral part of these
exercises.

A variety of statistical software is available to complement our text. Microsoft Excel
includes an add-in with many statistical analyses. Megastat is an add-in available for
Microsoft Excel. Minitab and JMP are stand-alone statistical software available to
download for either PC or MAC computers. In our text, Microsoft Excel, Minitab, and Megastat
are used to illustrate statistical software analyses. When a software application is
presented, the software commands for the application are available in Appendix C. We use
screen captures within the chapters, so the student becomes familiar with the nature of
the software output.

Because of the availability of computers and software, it is no longer necessary to
dwell on calculations. We have replaced many of the calculation examples with
interpretative ones, to assist the student in understanding and interpreting the statistical results.
In addition, we place more emphasis on the conceptual nature of the statistical topics.
While making these changes, we still continue to present, as best we can, the key
concepts, along with supporting interesting and relevant examples.

WHAT’S NEW IN THE SEVENTEENTH EDITION?
We have made many changes to examples and exercises throughout the text. The
section on “Enhancements” to our text details them. The major change to the text is in
response to user interest in the area of data analytics. Our approach is to provide
instructors and students with the opportunity to combine statistical knowledge, computer
and statistical software skills, and interpretative and critical thinking skills. A set of new
and revised exercises is included at the end of chapters 1 through 18 in a section titled
“Data Analytics.”

In these sections, exercises refer to three data sets. The North Valley Real Estate
sales data set lists 105 homes currently on the market. The Lincolnville School District
bus data lists information on 80 buses in the school district’s bus fleet. The authors
designed these data so that students will be able to use statistical software to explore the
data and find realistic relationships in the variables. The Baseball Statistics for the 2016
season is updated from the previous edition.

The intent of the exercises is to provide the basis of a continuing case analysis. We
suggest that instructors select one of the data sets and assign the corresponding
exercises as each chapter is completed. Instructor feedback regarding student performance
is important. Students should retain a copy of each chapter’s results and interpretations
to develop a portfolio of discoveries and findings. These will be helpful as students
progress through the course and use new statistical techniques to further explore the
data. The ideal ending for these continuing data analytics exercises is a comprehensive
report based on the analytical findings.

We know that working with a statistics class to develop a very basic competence in
data analytics is challenging. Instructors will be teaching statistics. In addition,
instructors will be faced with choosing statistical software and supporting students in
developing or enhancing their computer skills. Finally, instructors will need to assess student
performance based on assignments that include both statistical and written
components. Using a mentoring approach may be helpful.

We hope that you and your students find this new feature interesting and engaging.

HOW ARE CHAPTERS ORGANIZED TO ENGAGE
STUDENTS AND PROMOTE LEARNING?

Chapter Learning Objectives
Each chapter begins with a set of learning objectives designed to provide focus
for the chapter and motivate student learning. These objectives, located in the
margins next to the topic, indicate what the student should be able to do after
completing each section in the chapter.

Chapter Opening Exercise
A representative exercise opens the chapter and shows how the chapter content can be applied to a real-world
situation.

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO2-1 Summarize qualitative variables with frequency and relative frequency tables.

LO2-2 Display a frequency table using a bar or pie chart.

LO2-3 Summarize quantitative variables with frequency and relative frequency distributions.

LO2-4 Display a frequency distribution using a histogram or frequency polygon.

MERRILL LYNCH recently completed a study of online investment portfolios for a sample
of clients. For the 70 participants in the study, organize these data into a frequency
distribution. (See Exercise 43 and LO2-3.)

Chapter 2
Describing Data:
FREQUENCY TABLES, FREQUENCY DISTRIBUTIONS,
AND GRAPHIC PRESENTATION


Introduction to the Topic
Each chapter starts with a review of the important concepts of the previous
chapter and provides a link to the material in the current chapter. This
step-by-step approach increases comprehension by providing continuity
across the concepts.


INTRODUCTION
The United States automobile retailing industry is highly competitive. It is dominated by
megadealerships that own and operate 50 or more franchises, employ over 10,000
people, and generate several billion dollars in annual sales. Many of the top dealerships
are publicly owned with shares traded on the New York Stock Exchange
or NASDAQ. In 2014, the largest megadealership was AutoNation (ticker
symbol AN), followed by Penske Auto Group (PAG), Group 1 Automotive,
Inc. (ticker symbol GPI), and the privately owned Van Tuyl Group.

These large corporations use statistics and analytics to summarize
and analyze data and information to support their decisions. As an
example, we will look at the Applewood Auto Group. It owns four
dealerships and sells a wide range of vehicles. These include the popular
Korean brands Kia and Hyundai, BMW and Volvo sedans and luxury
SUVs, and a full line of Ford and Chevrolet cars and trucks.

Ms. Kathryn Ball is a member of the senior management team at
Applewood Auto Group, which has its corporate offices adjacent to Kane
Motors. She is responsible for tracking and analyzing vehicle sales and
the profitability of those vehicles. Kathryn would like to summarize the profit earned on
the vehicles sold with tables, charts, and graphs that she would review monthly. She
wants to know the profit per vehicle sold, as well as the lowest and highest amount of
profit. She is also interested in describing the demographics of the buyers. What are
their ages? How many vehicles have they previously purchased from one of the
Applewood dealerships? What type of vehicle did they purchase?

The Applewood Auto Group operates four dealerships:

• Tionesta Ford Lincoln sells Ford and Lincoln cars and trucks.
• Olean Automotive Inc. has the Nissan franchise as well as the General Motors
  brands of Chevrolet, Cadillac, and GMC Trucks.
• Sheffield Motors Inc. sells Buick, GMC trucks, Hyundai, and Kia.
• Kane Motors offers the Chrysler, Dodge, and Jeep line as well as BMW and Volvo.

Every month, Ms. Ball collects data from each of the four dealerships
and enters them into an Excel spreadsheet. Last month the Applewood
Auto Group sold 180 vehicles at the four dealerships. A copy of the first
few observations appears to the left. The variables collected include:

• Age: the age of the buyer at the time of the purchase.
• Profit: the amount earned by the dealership on the sale of each vehicle.
• Location: the dealership where the vehicle was purchased.
• Vehicle type: SUV, sedan, compact, hybrid, or truck.
• Previous: the number of vehicles previously purchased at any of the
  four Applewood dealerships by the consumer.

The entire data set is available at the McGraw-Hill website
(www.mhhe.com/lind17e) and in Appendix A.4 at the end of the text.


CONSTRUCTING FREQUENCY TABLES
Recall from Chapter 1 that techniques used to describe a set of data are called
descriptive statistics. Descriptive statistics organize data to show the general pattern of the
data, to identify where values tend to concentrate, and to expose extreme or unusual
data values. The first technique we discuss is a frequency table.

LO2-1 Summarize qualitative variables with frequency and relative frequency tables.

FREQUENCY TABLE A grouping of qualitative data into mutually exclusive and
collectively exhaustive classes showing the number of observations in each class.
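
As a rough illustration of this definition, the following Python sketch counts the observations in each class and also reports the relative frequency. The list of vehicle types is hypothetical (it is not the Applewood data set), and the function name frequency_table is an assumption introduced only for this example.

```python
from collections import Counter

def frequency_table(values):
    """Return {class: (frequency, relative frequency)} for qualitative data."""
    counts = Counter(values)
    n = len(values)
    return {cls: (count, count / n) for cls, count in counts.items()}

# Hypothetical sample of vehicle types; not taken from the Applewood data.
vehicle_types = ["SUV", "Sedan", "Truck", "SUV", "Compact",
                 "Sedan", "SUV", "Hybrid", "Sedan", "SUV"]

table = frequency_table(vehicle_types)
for cls, (freq, rel) in sorted(table.items()):
    print(f"{cls:<8} {freq:>3} {rel:>6.2f}")
```

Because each observation falls in exactly one class, the frequencies add up to the number of observations and the relative frequencies add up to 1.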


Example/Solution
After important concepts are introduced,
a solved example is given. This example
provides a how-to illustration and shows
a relevant business application that
helps students answer the question,
“How can I apply this concept?”

DESCRIBING DATA: DISPLAYING AND EXPLORING DATA

INTRODUCTION
Chapter 2 began our study of descriptive statistics. In order to transform raw or
ungrouped data into a meaningful form, we organize the data into a frequency distribution.
We present the frequency distribution in graphic form as a histogram or a frequency
polygon. This allows us to visualize where the data tend to cluster, the largest and the
smallest values, and the general shape of the data.

In Chapter 3, we first computed several measures of location, such as the mean,
median, and mode. These measures of location allow us to report a typical value in the
set of observations. We also computed several measures of dispersion, such as the
range, variance, and standard deviation. These measures of dispersion allow us to
describe the variation or the spread in a set of observations.

We continue our study of descriptive statistics in this chapter. We study (1) dot plots,
(2) stem-and-leaf displays, (3) percentiles, and (4) box plots. These charts and statistics
give us additional insight into where the values are concentrated as well as the general
shape of the data. Then we consider bivariate data. In bivariate data, we observe two
variables for each individual or observation. Examples include the number of hours a
student studied and the points earned on an examination; if a sampled product meets
quality specifications and the shift on which it is manufactured; or the amount of
electricity used in a month by a homeowner and the mean daily high temperature in the region
for the month. These charts and graphs provide useful insights as we use business
analytics to enhance our understanding of data.

DOT PLOTS
Recall for the Applewood Auto Group data, we summarized the profit earned on the
180 vehicles sold with a frequency distribution using eight classes. When we
organized the data into the eight classes, we lost the exact value of the observations. A
dot plot, on the other hand, groups the data as little as possible, and we do not lose
the identity of an individual observation. To develop a dot plot, we display a dot for
each observation along a horizontal number line indicating the possible values of the
data. If there are identical observations or the observations are too close to be shown
individually, the dots are “piled” on top of each other. This allows us to see the shape
of the distribution, the value about which the data tend to cluster, and the largest and
smallest observations. Dot plots are most useful for smaller data sets, whereas
histograms tend to be most useful for large data sets. An example will show how to
construct and interpret dot plots.

LO4-1 Construct and interpret a dot plot.

EXAMPLE

The service departments at Tionesta Ford Lincoln and Sheffield Motors Inc., two
of the four Applewood Auto Group dealerships, were both open 24 days last
month. Listed below is the number of vehicles serviced last month at the two
dealerships. Construct dot plots and report summary statistics to compare the
two dealerships.

Tionesta Ford Lincoln

Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
  23       33         27          28        39        26
  30       32         28          33        35        32
  29       25         36          31        32        27
  35       32         35          37        36        30
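
One way to see what a dot plot of these counts looks like is the Python sketch below. It is only a text-based approximation of the chart described above, using the Tionesta Ford Lincoln values from the table; the function name text_dot_plot is an assumption made for this illustration, and it piles one asterisk per observation above each value on the number line.

```python
from collections import Counter

# Vehicles serviced per day at Tionesta Ford Lincoln (from the table above).
tionesta = [23, 33, 27, 28, 39, 26,
            30, 32, 28, 33, 35, 32,
            29, 25, 36, 31, 32, 27,
            35, 32, 35, 37, 36, 30]

def text_dot_plot(values):
    """Return the lines of a text dot plot, one column per integer value."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)
    lines = []
    # Build rows from the tallest pile of dots down to the number line.
    for level in range(max(counts.values()), 0, -1):
        lines.append("".join(" * " if counts[v] >= level else "   " for v in axis))
    lines.append("".join(f"{v:>2} " for v in axis))
    return lines

plot_lines = text_dot_plot(tionesta)
print("\n".join(plot_lines))
print("min:", min(tionesta), "max:", max(tionesta),
      "mean:", round(sum(tionesta) / len(tionesta), 2))
```

The summary line reports the smallest and largest observations and the mean, which are the kinds of statistics the example asks for alongside the plot.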


Self-Reviews
Self-Reviews are interspersed throughout each chapter and follow
Example/Solution sections. They help students monitor their progress and
provide immediate reinforcement for that particular technique. Answers
are in Appendix E.


calculate quartiles. Excel 2013 and Excel 2016 offer both methods. The Excel function,
Quartile.exc, will result in the same answer as Equation 4–1. The Excel function,
Quartile.inc, will result in the Excel Method answers.

SELF-REVIEW 4–2

The Quality Control department of Plainsville Peanut Company is responsible for checking
the weight of the 8-ounce jar of peanut butter. The weights of a sample of nine jars
produced last hour are:

7.69 7.72 7.80 7.86 7.90 7.94 7.97 8.06 8.09

(a) What is the median weight?
(b) Determine the weights corresponding to the first and third quartiles.
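
The two quartile conventions mentioned before this Self-Review can be compared with Python's standard statistics module. This is a sketch under two assumptions: that Equation 4–1 is the (n + 1)P/100 location formula, and that the module's "exclusive" and "inclusive" methods correspond to Excel's Quartile.exc and Quartile.inc respectively. The jar weights are the nine values listed in the Self-Review.

```python
from statistics import median, quantiles

# Weights (ounces) of the nine jars from the Self-Review.
weights = [7.69, 7.72, 7.80, 7.86, 7.90, 7.94, 7.97, 8.06, 8.09]

# method="exclusive" locates the p-th quantile at position (n + 1) * p;
# method="inclusive" locates it at position (n - 1) * p + 1.
q1_exc, q2_exc, q3_exc = quantiles(weights, n=4, method="exclusive")
q1_inc, q2_inc, q3_inc = quantiles(weights, n=4, method="inclusive")

print("median:", median(weights))
print("exclusive Q1, Q3:", round(q1_exc, 3), round(q3_exc, 3))
print("inclusive Q1, Q3:", round(q1_inc, 3), round(q3_inc, 3))
```

With nine observations the two conventions agree on the median but give different first and third quartiles, which is why it matters which method a software package uses.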

EXERCISES

11. Determine the median and the first and third quartiles in the following data.

46 47 49 49 51 53 54 54 55 55 59

12. Determine the median and the first and third quartiles in the following data.

5.24 6.02 6.67 7.30 7.59 7.99 8.03 8.35 8.81 9.45
9.61 10.37 10.39 11.86 12.22 12.71 13.07 13.59 13.89 15.42

13. The Thomas Supply Company Inc. is a distributor of gas-powered generators.
As with any business, the length of time customers take to pay their invoices is
important. Listed below, arranged from smallest to largest, is the time, in days, for a
sample of The Thomas Supply Company Inc. invoices.

13 13 13 20 26 27 31 34 34 34 35 35 36 37 38
41 41 41 45 47 47 47 50 51 53 54 56 62 67 82

a. Determine the first and third quartiles.
b. Determine the second decile and the eighth decile.
c. Determine the 67th percentile.

14. Kevin Horn is the national sales manager for National Textbooks Inc. He
has a sales staff of 40 who visit college professors all over the United States.
Each Saturday morning he requires his sales staff to send him a report. This
report includes, among other things, the number of professors visited during the
previous week. Listed below, ordered from smallest to largest, are the number
of visits last week.

38 40 41 45 48 48 50 50 51 51 52 52 53 54 55 55 55 56 56 57
59 59 59 62 62 62 63 64 65 66 66 67 67 69 69 71 77 78 79 79

a. Determine the median number of calls.
b. Determine the first and third quartiles.
c. Determine the first decile and the ninth decile.
d. Determine the 33rd percentile.


Statistics in Action
Statistics in Action articles are scattered throughout the text, usually about
two per chapter. They provide unique, interesting applications and historical
insights in the field of statistics.


The General Rule of Addition
The outcomes of an experiment may not be mutually exclusive. For example, the Florida
Tourist Commission selected a sample of 200 tourists who visited the state during the
year. The survey revealed that 120 tourists went to Disney World and 100 went to Busch
Gardens near Tampa. What is the probability that a person selected visited either Disney
World or Busch Gardens? If the special rule of addition is used, the probability of selecting
a tourist who went to Disney World is .60, found by 120/200. Similarly, the probability of a
tourist going to Busch Gardens is .50. The sum of these probabilities is 1.10. We know,
however, that this probability cannot be greater than 1. The explanation is that many
tourists visited both attractions and are being counted twice! A check of the survey responses
revealed that 60 out of 200 sampled did, in fact, visit both attractions.

To answer our question, “What is the probability a selected person visited either
Disney World or Busch Gardens?” (1) add the probability that a tourist visited Disney
World and the probability he or she visited Busch Gardens, and (2) subtract the
probability of visiting both. Thus:

P(Disney or Busch) = P(Disney) + P(Busch) − P(both Disney and Busch)
                   = .60 + .50 − .30 = .80

When two events both occur, the probability is called a joint probability. The
probability (.30) that a tourist visits both attractions is an example of a joint probability.
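
The calculation above can be reproduced with a short Python sketch that starts from the survey counts. The function name prob_a_or_b and the variable names are assumptions made for this illustration only.

```python
def prob_a_or_b(p_a, p_b, p_a_and_b):
    """Add the two probabilities, then subtract the joint probability once."""
    return p_a + p_b - p_a_and_b

total = 200   # tourists sampled
disney = 120  # visited Disney World
busch = 100   # visited Busch Gardens
both = 60     # visited both attractions

p_either = prob_a_or_b(disney / total, busch / total, both / total)
print(round(p_either, 2))
```

Subtracting the joint probability removes the double count of the 60 tourists who appear in both the Disney World and the Busch Gardens totals.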


The following Venn diagram shows two events that are not mutually exclusive. The two
events overlap to illustrate the joint event that some people have visited both attractions.

SELF-REVIEW 5–3

A sample of employees of Worldwide Enterprises is to be surveyed about a new health
care plan. The employees are classified as follows:

Classification   Event   Number of Employees
Supervisors      A         120
Maintenance      B          50
Production       C       1,460
Management       D         302
Secretarial      E          68

(a) What is the probability that the first person selected is:
    (i) either in maintenance or a secretary?
    (ii) not in management?
(b) Draw a Venn diagram illustrating your answers to part (a).
(c) Are the events in part (a)(i) complementary or mutually exclusive or both?

STATISTICS IN ACTION

If you wish to get some attention at the next gathering you attend, announce
that you believe that at least two people present were born on the same date,
that is, the same day of the year but not necessarily the same year. If there
are 30 people in the room, the probability of a duplicate is .706. If there are
60 people in the room, the probability is .994 that at least two people share
the same birthday. With as few as 23 people the chances are even, that is .50,
that at least two people share the same birthday. Hint: To compute this, find
the probability everyone was born on a different day and use the complement
rule. Try this in your class.
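
The hint can be followed directly in a few lines of Python. This is a minimal sketch that assumes 365 equally likely birthdays and ignores leap years; the function name prob_shared_birthday is introduced here only for illustration.

```python
def prob_shared_birthday(people, days=365):
    """P(at least two of `people` share a birthday), via the complement rule."""
    p_all_different = 1.0
    for k in range(people):
        # The (k + 1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1.0 - p_all_different

for n in (23, 30, 60):
    print(n, round(prob_shared_birthday(n), 3))
```

Under these assumptions the sketch gives about .507 for 23 people, .706 for 30, and .994 for 60, in line with the figures quoted above.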


Definitions
Definitions of new terms or terms unique to
the study of statistics are set apart from the
text and highlighted for easy reference and
review. They also appear in the Glossary at
the end of the book.

A SURVEY OF PROBABILITY CONCEPTS

[Venn diagram: two overlapping regions labeled P(Disney) = .60 and P(Busch) = .50; the overlap is labeled P(Disney and Busch) = .30]

JOINT PROBABILITY A probability that measures the likelihood two or more
events will happen concurrently.

So the general rule of addition, which is used to compute the probability of two
events that are not mutually exclusive, is:

GENERAL RULE OF ADDITION    P(A or B) = P(A) + P(B) − P(A and B)    [5–4]

For the expression P(A or B), the word or suggests that A may occur or B may occur.
This also includes the possibility that A and B may both occur. This use of or is sometimes
called an inclusive or. You could also write P(A or B or both) to emphasize that the union of
the events includes the intersection of A and B.

If we compare the general and special rules of addition, the important difference is
determining if the events are mutually exclusive. If the events are mutually exclusive, then
the joint probability P(A and B) is 0 and we could use the special rule of addition.
Otherwise, we must account for the joint probability and use the general rule of addition.

EXAMPLE

What is the probability that a card chosen at random from a standard deck of cards
will be either a king or a heart?

SOLUTION

We may be inclined to add the probability of a king and the probability of a heart. But this
creates a problem. If we do that, the king of hearts is counted with the kings and also
with the hearts. So, if we simply add the probability of a king (there are 4 in a deck of 52
cards) to the probability of a heart (there are 13 in a deck of 52 cards) and report that 17
out of 52 cards meet the requirement, we have counted the king of hearts twice. We
need to subtract 1 card from the 17 so the king of hearts is counted only once. Thus,
there are 16 cards that are either hearts or kings. So the probability is 16/52 = .3077.

Card             Probability          Explanation
King             P(A) = 4/52          4 kings in a deck of 52 cards
Heart            P(B) = 13/52         13 hearts in a deck of 52 cards
King of Hearts   P(A and B) = 1/52    1 king of hearts in a deck of 52 cards
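
The same answer can be checked by listing the deck and counting, as in the Python sketch below. The rank and suit labels and the variable names are illustrative choices, and exact fractions are used so the result from the general rule of addition can be compared with a direct count of the qualifying cards.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

kings = {card for card in deck if card[0] == "K"}
hearts = {card for card in deck if card[1] == "hearts"}

p_king = Fraction(len(kings), len(deck))
p_heart = Fraction(len(hearts), len(deck))
p_both = Fraction(len(kings & hearts), len(deck))

# General rule of addition versus a direct count of kings-or-hearts.
p_rule = p_king + p_heart - p_both
p_count = Fraction(len(kings | hearts), len(deck))
print(p_rule, round(float(p_rule), 4))
```

Both routes count the king of hearts exactly once, so they agree on 16 cards out of 52.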


Formulas
Formulas that are used for the first time are boxed and numbered for
reference. In addition, a formula card is bound into the back of
the text that lists all the key formulas.


16. Two coins are tossed. If A is the event “two heads” and B is the event “two tails,” are
A and B mutually exclusive? Are they complements?

17. The probabilities of the events A and B are .20 and .30, respectively. The probability
that both A and B occur is .15. What is the probability of either A or B occurring?

18. Let P(X) = .55 and P(Y) = .35. Assume the probability that they both occur is .20.
What is the probability of either X or Y occurring?

19. Suppose the two events A and B are mutually exclusive. What is the probability of
their joint occurrence?

20. A student is taking two courses, history and math. The probability the student will
pass the history course is .60, and the probability of passing the math course is .70.
The probability of passing both is .50. What is the probability of passing at least one?

21. The aquarium at Sea Critters Depot contains 140 fish. Eighty of these fish are green
swordtails (44 female and 36 male) and 60 are orange swordtails (36 female and
24 male). A fish is randomly captured from the aquarium:

a. What is the probability the selected fish is a green swordtail?
b. What is the probability the selected fish is male?
c. What is the probability the selected fish is a male green swordtail?
d. What is the probability the selected fish is either a male or a green swordtail?

22. A National Park Service survey of visitors to the Rocky Mountain region revealed
that 50% visit Yellowstone Park, 40% visit the Tetons, and 35% visit both.

a. What is the probability a vacationer will visit at least one of these attractions?
b. What is the probability .35 called?
c. Are the events mutually exclusive? Explain.

RULES OF MULTIPLICATION
TO CALCULATE PROBABILITY
In this section, we discuss the rules for computing the likelihood that two events both
happen, or their joint probability. For example, 16% of the 2016 tax returns were pre-
pared by H&R Block and 75% of those returns showed a refund. What is the likelihood
a person’s tax form was prepared by H&R Block and the person received a refund?
Venn diagrams illustrate this as the intersection of two events. To find the likelihood of
two events happening, we use the rules of multiplication. There are two rules of multipli-
cation: the special rule and the general rule.

Special Rule of Multiplication
The special rule of multiplication requires that two events A and B are independent.
Two events are independent if the occurrence of one event does not alter the probabil-
ity of the occurrence of the other event.

LO5-4
Calculate probabilities
using the rules of
multiplication.

INDEPENDENCE The occurrence of one event has no effect on the probability of
the occurrence of another event.

One way to think about independence is to assume that events A and B occur at differ-
ent times. For example, when event B occurs after event A occurs, does A have any effect
on the likelihood that event B occurs? If the answer is no, then A and B are independent
events. To illustrate independence, suppose two coins are tossed. The outcome of a coin
toss (head or tail) is unaffected by the outcome of any other prior coin toss (head or tail).

For two independent events A and B, the probability that A and B will both occur is
found by multiplying the two probabilities. This is the special rule of multiplication and
is written symbolically as:

SPECIAL RULE OF MULTIPLICATION P(A and B) = P(A)P(B) [5–5]
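As a quick illustration of formula [5–5], the sketch below applies the special rule to the two-coin example and cross-checks the answer by listing all four outcomes. It assumes fair coins, and the function name `joint_probability_independent` is hypothetical, chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

def joint_probability_independent(p_a, p_b):
    """Special rule of multiplication: valid only when A and B are independent."""
    return p_a * p_b

# Assumption: two fair coins, so P(head) = 1/2 on each toss.
p_head = Fraction(1, 2)
p_two_heads = joint_probability_independent(p_head, p_head)

# Cross-check by enumerating every equally likely outcome of two tosses.
outcomes = list(product("HT", repeat=2))
enumerated = Fraction(sum(1 for o in outcomes if o == ("H", "H")), len(outcomes))

print(p_two_heads, enumerated)
```

The two values agree because the outcome of one toss does not change the probabilities for the other. If the events were not independent, the product would not match the enumeration.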


Exercises
Exercises are included after sec-
tions within the chapter and at
the end of the chapter. Section
exercises cover the material stud-
ied in the section. Many exercises
have data files available to import
into statistical software. They are
indicated with the FILE icon.
Answers to the odd-numbered
exercises are in Appendix D.

DESCRIBING DATA: NUMERICAL MEASURES 79

INTERPRETATION AND USES
OF THE STANDARD DEVIATION
The standard deviation is commonly used as a measure to compare the spread in two
or more sets of observations. For example, the standard deviation of the biweekly
amounts invested in the Dupree Paint Company profit-sharing plan is computed to be
$7.51. Suppose these employees are located in Georgia. If the standard deviation for a
group of employees in Texas is $10.47, and the means are about the same, it indicates
that the amounts invested by the Georgia employees are not dispersed as much as
those in Texas (because $7.51 < $10.47). Since the amounts invested by the Georgia
employees are clustered more closely about the mean, the mean for the Georgia em-
ployees is a more reliable measure than the mean for the Texas group.

Chebyshev’s Theorem
We have stressed that a small standard deviation for a set of values indicates that these
values are located close to the mean. Conversely, a large standard deviation reveals that
the observations are widely scattered about the mean. The Russian mathematician P. L.
Chebyshev (1821–1894) developed a theorem that allows us to determine the minimum
proportion of the values that lie within a specified number of standard deviations of the
mean. For example, according to Chebyshev’s theorem, at least three out of every four,
or 75%, of the values must lie between the mean plus two standard deviations and the
mean minus two standard deviations. This relationship applies regardless of the shape of
the distribution. Further, at least eight of nine values, or 88.9%, will lie between plus three
standard deviations and minus three standard deviations of the mean. At least 24 of 25
values, or 96%, will lie between plus and minus five standard deviations of the mean.

Chebyshev’s theorem states:

LO3-5
Explain and apply
Chebyshev’s theorem
and the Empirical Rule.

STATISTICS IN ACTION

Most colleges report the
“average class size.” This
information can be mislead-
ing because average class
size can be found in several
ways. If we find the number
of students in each class at
a particular university, the
result is the mean number
of students per class. If we
compile a list of the class
sizes for each student and
find the mean class size, we
might find the mean to be
quite different. One school
found the mean number of
students in each of its 747
classes to be 40. But when

(continued)

CHEBYSHEV’S THEOREM For any set of observations (sample or population), the
proportion of the values that lie within k standard deviations of the mean is at least
1 − 1/k², where k is any value greater than 1.
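The bound in the theorem is easy to tabulate. The following minimal sketch evaluates 1 − 1/k² for the three values of k mentioned above (2, 3, and 5); the function name `chebyshev_lower_bound` is an assumption made for this example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k greater than 1")
    return 1 - 1 / k**2

# k = 2, 3, and 5 correspond to the 75%, 88.9%, and 96% figures in the text.
for k in (2, 3, 5):
    print(k, round(chebyshev_lower_bound(k), 3))
```

These are lower bounds that hold for any distribution shape; a particular data set may have a much larger share of its values inside the interval.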

For Exercises 47–52, do the following:

a. Compute the sample variance.
b. Determine the sample standard deviation.

47. Consider these values a sample: 7, 2, 6, 2, and 3.
48. The following five values are a sample: 11, 6, 10, 6, and 7.
49. Dave’s Automatic Door, referred to in Exercise 37, installs automatic garage
door openers. Based on a sample, following are the times, in minutes, required to
install 10 door openers: 28, 32, 24, 46, 44, 40, 54, 38, 32, and 42.

50. The sample of eight companies in the aerospace industry, referred to in Exer-
cise 38, was surveyed as to their return on investment last year. The results are
10.6, 12.6, 14.8, 18.2, 12.0, 14.8, 12.2, and 15.6.

51. The Houston, Texas, Motel Owner Association conducted a survey regarding
weekday motel rates in the area. Listed below is the room rate for business-class
guests for a sample of 10 motels.

$101 $97 $103 $110 $78 $87 $101 $80 $106 $88

52. A consumer watchdog organization is concerned about credit card debt. A
survey of 10 young adults with credit card debt of more than $2,000 showed they
paid an average of just over $100 per month against their balances. Listed below
are the amounts each young adult paid last month.

$110 $126 $103 $93 $99 $113 $87 $101 $109 $100

EXERCISES


Computer Output
The text includes many software examples, using
Excel, MegaStat®, and Minitab. The software results are
illustrated in the chapters. Instructions for a particular
software example are in Appendix C.

64 CHAPTER 3

EXAMPLE

Table 2–4 on page 26 shows the profit on the sales of 180 vehicles at Applewood
Auto Group. Determine the mean and the median selling price.

SOLUTION

The mean, median, and modal amounts of profit are reported in the following
output (highlighted in the screen shot). (Reminder: The instructions to create the
output appear in the Software Commands in Appendix C.) There are 180 vehicles
in the study, so using a calculator would be tedious and prone to error.

Software Solution
We can use a statistical software package to find many measures of location.
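The text relies on Excel, MegaStat, and Minitab for this output. As a rough equivalent, the sketch below uses Python's standard `statistics` module to report the same three measures of location. The profit values are invented for illustration and are not the Applewood Auto Group data.

```python
import statistics

# Hypothetical profits in dollars, made up for this example only.
profits = [1200, 1500, 1500, 1800, 2100, 900, 1700]

mean_profit = statistics.mean(profits)      # arithmetic mean
median_profit = statistics.median(profits)  # middle value after sorting
mode_profit = statistics.mode(profits)      # most frequent value

print(round(mean_profit, 2), median_profit, mode_profit)
```

With 180 observations the same three calls would apply unchanged, which is the practical advantage of software over a hand calculator.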

a. What is the arithmetic mean of the Alaska unemployment rates?
b. Find the median and the mode for the unemployment rates.
c. Compute the arithmetic mean and median for just the winter (Dec–Mar) months.
Is it much different?

22. Big Orange Trucking is designing an information system for use in “in-cab”
communications. It must summarize data from eight sites throughout a region to
describe typical conditions. Compute an appropriate measure of central location for
the variables wind direction, temperature, and pavement.

City              Wind Direction    Temperature    Pavement

Anniston, AL      West              89             Dry
Atlanta, GA       Northwest         86             Wet
Augusta, GA       Southwest         92             Wet
Birmingham, AL    South             91             Dry
Jackson, MS       Southwest         92             Dry
Meridian, MS      South             92             Trace
Monroe, LA        Southwest         93             Wet
Tuscaloosa, AL    Southwest         93             Trace


HOW DOES THIS TEXT REINFORCE
STUDENT LEARNING?


BY CHAPTER

Chapter Summary
Each chapter contains a brief summary
of the chapter material, including vocab-
ulary, definitions, and critical formulas.

202 CHAPTER 6

the number of transmission services, muffler replacements, and oil changes per day at
Avellino’s Auto Shop. They follow Poisson distributions with means of 0.7, 2.0, and
6.0, respectively.

In summary, the Poisson distribution is a family of discrete distributions. All that is
needed to construct a Poisson probability distribution is the mean number of defects,
errors, or other random variable, designated as μ.
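Because the mean μ is the only parameter, a Poisson table can be generated from a few lines of code. The sketch below uses the standard Poisson formula P(x) = μ^x e^(−μ)/x! and the three means quoted for Avellino’s Auto Shop. The function name `poisson_probability` is an assumption for this example, and listing only x = 0 through 4 is an arbitrary choice.

```python
import math

def poisson_probability(x, mu):
    """Probability of exactly x occurrences when the mean number is mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)

# Means quoted in the text: transmission services, muffler replacements, oil changes.
for mu in (0.7, 2.0, 6.0):
    row = [round(poisson_probability(x, mu), 4) for x in range(5)]
    print(mu, row)
```

Changing μ changes the whole distribution, which is what the text means by calling the Poisson a family of distributions.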

From actuary tables, Washington Insurance Company determined the likelihood that a man
age 25 will die within the next year is .0002. If Washington Insurance sells 4,000 policies to
25-year-old men this year, what is the probability they will pay on exactly one policy?

SELF-REVIEW 6–6

31. In a Poisson distribution μ = 0.4.
a. What is the probability that x = 0?
b. What is the probability that x > 0?

32. In a Poisson distribution μ = 4.
a. What is the probability that x = 2?
b. What is the probability that x ≤ 2?
c. What is the probability that x > 2?

33. Ms. Bergen is a loan officer at Coast Bank and Trust. From her years of experience,
she estimates that the probability is .025 that an applicant will not be able to repay
his or her installment loan. Last month she made 40 loans.

a. What is the probability that three loans will be defaulted?
b. What is the probability that at least three loans will be defaulted?

34. Automobiles arrive at the Elkhart exit of the Indiana Toll Road at the rate of two per
minute. The distribution of arrivals approximates a Poisson distribution.

a. What is the probability that no automobiles arrive in a particular minute?
b. What is the probability that at least one automobile arrives during a particular
minute?

35. It is estimated that 0.5% of the callers to the Customer Service department of Dell
Inc. will receive a busy signal. What is the probability that of today’s 1,200 callers at
least 5 received a busy signal?

36. In the past, schools in Los Angeles County have closed an average of 3 days each
year for weather emergencies. What is the probability that schools in Los Angeles
County will close for 4 days next year?

EXERCISES

CHAPTER SUMMARY

I. A random variable is a numerical value determined by the outcome of an experiment.
II. A probability distribution is a listing of all possible outcomes of an experiment and the
probability associated with each outcome.
A. A discrete probability distribution can assume only certain values. The main features are:

1. The sum of the probabilities is 1.00.
2. The probability of a particular outcome is between 0.00 and 1.00.
3. The outcomes are mutually exclusive.

B. A continuous distribution can assume an infinite number of values within a specific range.
III. The mean and variance of a probability distribution are computed as follows.

A. The mean is equal to:

μ = Σ[xP(x)] (6–1)
B. The variance is equal to:

σ² = Σ[(x − μ)²P(x)] (6–2)
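Formulas (6–1) and (6–2) translate directly into code. The sketch below applies them to a small discrete distribution that is purely hypothetical (values 0 through 3 with probabilities .1, .2, .3, and .4) and first checks the features of a discrete distribution listed in the summary.

```python
# Hypothetical discrete distribution, invented for illustration: value -> probability.
distribution = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}

# Features from the summary: each probability lies in [0, 1] and they sum to 1.00.
assert all(0.0 <= p <= 1.0 for p in distribution.values())
assert abs(sum(distribution.values()) - 1.0) < 1e-9

# Formula (6-1): mean = sum of x * P(x).
mu = sum(x * p for x, p in distribution.items())

# Formula (6-2): variance = sum of (x - mu)^2 * P(x).
variance = sum((x - mu) ** 2 * p for x, p in distribution.items())

print(round(mu, 4), round(variance, 4))
```

The standard deviation, if needed, is the square root of the variance computed here.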


Pronunciation Key
This section lists the mathematical symbol,
its meaning, and how to pronounce it. We
believe this will help the student retain the
meaning of the symbol and generally en-
hance course communications.

168 CHAPTER 5

PRONUNCIATION KEY

SYMBOL MEANING PRONUNCIATION

P(A) Probability of A P of A

P(∼A) Probability of not A P of not A
P(A and B) Probability of A and B P of A and B

P(A or B) Probability of A or B P of A or B

P(A | B) Probability of A given B has happened P of A given B

nPr Permutation of n items selected r at a time Pnr

nCr Combination of n items selected r at a time Cnr
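For readers who want to see the last two symbols in action, here is a small sketch using `math.perm` and `math.comb` from Python's standard library (available in Python 3.8 and later). The values n = 8 and r = 3 are arbitrary and chosen only for illustration.

```python
import math

n, r = 8, 3  # hypothetical values: 8 items, 3 selected at a time

# nPr: ordered arrangements, n! / (n - r)!
permutations = math.perm(n, r)

# nCr: unordered selections, n! / (r! (n - r)!)
combinations = math.comb(n, r)

# Each unordered selection of r items can be arranged in r! orders.
assert permutations == combinations * math.factorial(r)

print(permutations, combinations)
```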

CHAPTER EXERCISES

47. The marketing research department at Pepsico plans to survey teenagers about a newly
developed soft drink. Each will be asked to compare it with his or her favorite soft drink.
a. What is the experiment?
b. What is one possible event?

48. The number of times a particular event occurred in the past is divided by the number of
occurrences. What is this approach to probability called?

49. The probability that the cause and the cure for all cancers will be discovered before the
year 2020 is .20. What viewpoint of probability does this statement illustrate?

50. Berdine’s Chicken Factory has several stores in the Hilton Head, South Carolina,
area. When interviewing applicants for server positions, the owner would like to in-
clude information on the amount of tip a server can expect to earn per check (or bill).
A study of 500 recent checks indicated the server earned the following amounts in
tips per 8-hour shift.

Amount of Tip       Number

$0 up to $20           200
20 up to 50            100
50 up to 100            75
100 up to 200           75
200 or more             50

Total                  500

a. What is the probability of a tip of $200 or more?
b. Are the categories “$0 up to $20,” “$20 up to $50,” and so on considered mutually
exclusive?
c. If the probabilities associated with each outcome were totaled, what would that total be?
d. What is the probability of a tip of up to $50?
e. What is the probability of a tip of less than $200?

51. Winning all three “Triple Crown” races is considered the greatest feat of a pedigree
racehorse. After a successful Kentucky Derby, Corn on the Cob is a heavy favorite at 2
to 1 odds to win the Preakness Stakes.
a. If he is a 2 to 1 favorite to win the Belmont Stakes as well, what is his probability of
winning the Triple Crown?
b. What do his chances for the Preakness Stakes have to be in order for him to be
“even money” to earn the Triple Crown?

52. The first card selected from a standard 52-card deck is a king.

a. If it is returned to the deck, what is the probability that a king will be drawn on the
second selection?

b. If the king is not replaced, what is the probability that a king will be drawn on the
second selection?


Chapter Exercises
Generally, the end-of-chapter exercises
are the most challenging and integrate
the chapter concepts. The answers and
worked-out solutions for all odd-
numbered exercises are in Appendix D
at the end of the text. Many exercises
are noted with a data file icon in the
margin. For these exercises, there are
data files in Excel format located on the
text’s website, www.mhhe.com/Lind17e.
These files help students use statistical
software to solve the exercises.

348 CHAPTER 10

The major characteristics of the t distribution are:
1. It is a continuous distribution.
2. It is mound-shaped and symmetrical.
3. It is flatter, or more spread out, than the standard normal distribution.
4. There is a family of t distributions, depending on the number of degrees of freedom.

V. There are two types of errors that can occur in a test of hypothesis.
A. A Type I error occurs when a true null hypothesis is rejected.

1. The probability of making a Type I error is equal to the level of significance.
2. This probability is designated by the Greek letter α.

B. A Type II error occurs when a false null hypothesis is not rejected.
1. The probability of making a Type II error is designated by the Greek letter β.
2. The likelihood of a Type II error must be calculated comparing the hypothesized
distribution to an alternate distribution based on sample results.

PRONUNCIATION KEY

SYMBOL MEANING PRONUNCIATION

H0 Null hypothesis H sub zero

H1 Alternate hypothesis H sub one

α/2 Two-tailed significance level Alpha divided by 2
x̄c Limit of the sample mean x bar sub c

μ0 Assumed population mean mu sub zero

CHAPTER EXERCISES

25. According to the local union president, the mean gross income of plumbers in the Salt
Lake City area follows the normal probability distribution with a mean of $45,000 and a
standard deviation of $3,000. A recent investigative reporter for KYAK TV found, for a
sample of 120 plumbers, the mean gross income was $45,500. At the .10 significance
level, is it reasonable to conclude that the mean income is not equal to $45,000? Deter-
mine the p-value.

26. Rutter Nursery Company packages its pine bark mulch in 50-pound bags. From a
long history, the production department reports that the distribution of the bag weights
follows the normal distribution and the standard deviation of the packaging process is
3 pounds per bag. At the end of each day, Jeff Rutter, the production manager, weighs
10 bags and computes the mean weight of the sample. Below are the weights of
10 bags from today’s production.

45.6 47.7 47.6 46.3 46.2 47.4 49.2 55.8 47.5 48.5

a. Can Mr. Rutter conclude that the mean weight of the bags is less than 50 pounds?
Use the .01 significance level.

b. In a brief report, tell why Mr. Rutter can use the z distribution as the test statistic.
c. Compute the p-value.

27. A new weight-watching company, Weight Reducers International, advertises that those
who join will lose an average of 10 pounds after the first two weeks. The standard devi-
ation is 2.8 pounds. A random sample of 50 people who joined the weight reduction
program revealed a mean loss of 9 pounds. At the .05 level of significance, can we
conclude that those joining Weight Reducers will lose less than 10 pounds? Determine
the p-value.

28. Dole Pineapple Inc. is concerned that the 16-ounce can of sliced pineapple is being
overfilled. Assume the standard deviation of the process is .03 ounce. The quality-con-
trol department took a random sample of 50 cans and found that the arithmetic mean
weight was 16.05 ounces. At the 5% level of significance, can we conclude that the
mean weight is greater than 16 ounces? Determine the p-value.


Data Analytics
The goal of the Data Analytics sec-
tions is to develop analytical skills.
The exercises present a real world
context with supporting data. The data
sets are printed in Appendix A and
available to download from the text’s
website www.mhhe.com/Lind17e. Statistical
software is required to analyze the data
and respond to the exercises. Each data
set is used to explore questions and dis-
cover findings that relate to a real world
context. For each business context, a
story is uncovered as students progress
from chapters one to seventeen.

244 CHAPTER 7

68. In establishing warranties on HDTVs, the manufacturer wants to set the limits so that few
will need repair at the manufacturer’s expense. On the other hand, the warranty period
must be long enough to make the purchase attractive to the buyer. For a new HDTV, the
mean number of months until repairs are needed is 36.84 with a standard deviation of
3.34 months. Where should the warranty limits be set so that only 10% of the HDTVs
need repairs at the manufacturer’s expense?

69. DeKorte Tele-Marketing Inc. is considering purchasing a machine that randomly selects
and automatically dials telephone numbers. DeKorte Tele-Marketing makes most of its
calls during the evening, so calls to business phones are wasted. The manufacturer of
the machine claims that its programming reduces the calling to business phones to 15%
of all calls. To test this claim, the director of purchasing at DeKorte programmed the
machine to select a sample of 150 phone numbers. What is the likelihood that more
than 30 of the phone numbers selected are those of businesses, assuming the manu-
facturer’s claim is correct?

70. A carbon monoxide detector in the Wheelock household activates once every 200 days
on average. Assume this activation follows the exponential distribution. What is the
probability that:
a. There will be an alarm within the next 60 days?
b. At least 400 days will pass before the next alarm?
c. It will be between 150 and 250 days until the next warning?
d. Find the median time until the next activation.

71. “Boot time” (the time between the appearance of the Bios screen to the first file that is
loaded in Windows) on Eric Mouser’s personal computer follows an exponential distribu-
tion with a mean of 27 seconds. What is the probability his “boot” will require:
a. Less than 15 seconds?
b. More than 60 seconds?
c. Between 30 and 45 seconds?
d. What is the point below which only 10% of the boots occur?

72. The time between visits to a U.S. emergency room for a member of the general popula-
tion follows an exponential distribution with a mean of 2.5 years. What proportion of the
population:
a. Will visit an emergency room within the next 6 months?
b. Will not visit the ER over the next 6 years?
c. Will visit an ER next year, but not this year?
d. Find the first and third quartiles of this distribution.

73. The times between failures on a personal computer follow an exponential distribution
with a mean of 300,000 hours. What is the probability of:
a. A failure in less than 100,000 hours?
b. No failure in the next 500,000 hours?
c. The next failure occurring between 200,000 and 350,000 hours?
d. What are the mean and standard deviation of the time between failures?

DATA ANALYTICS

(The data for these exercises are available at the text website: www.mhhe.com/lind17e.)

74. Refer to the North Valley Real Estate data, which report information on homes sold
during the last year.
a. The mean selling price (in $ thousands) of the homes was computed earlier to be $357.0,
with a standard deviation of $160.7. Use the normal distribution to estimate the percentage
of homes selling for more than $500,000. Compare this to the actual results. Is price
normally distributed? Try another test. If price is normally distributed, how many homes
should have a price greater than the mean? Compare this to the actual number of homes.
Construct a frequency distribution of price. What do you observe?

b. The mean days on the market is 30 with a standard deviation of 10 days. Use
the normal distribution to estimate the number of homes on the market more than
24 days. Compare this to the actual results. Try another test. If days on the market
is normally distributed, how many homes should be on the market more than the
mean number of days? Compare this to the actual number of homes. Does the normal


Software Commands
Software examples using Excel, Mega-
Stat®, and Minitab are included through-
out the text. The explanations of the
computer input commands are placed at
the end of the text in Appendix C.


11–2. The Minitab commands for the two-sample t-test on page 368
are:

a. Put the amount absorbed by the Store brand in C1 and the
amount absorbed by the Name brand paper towel in C2.

b. From the toolbar, select Stat, Basic Statistics, and then
2-Sample, and click OK.

c. In the next dialog box, select Samples in different columns,
select C1 Store for the First column and C2 Name for
the Second, click the box next to Assume equal variances,
and click OK.

11–3. The Excel commands for the paired t-test on page 373 are:
a. Enter the data into columns B and C (or any other two columns)
in the spreadsheet, with the variable names in the
first row.

b. Select the Data tab on the top menu. Then, on the far right,
select Data Analysis. Select t-Test: Paired Two Sample for
Means, and then click OK.

c. In the dialog box, indicate that the range of Variable 1 is
from B1 to B11 and Variable 2 from C1 to C11, the
Hypothesized Mean Difference is 0, click Labels, Alpha is
.05, and the Output Range is E1. Click OK.

CHAPTER 12
12–1. The Excel commands for the test of variances on page 391 are:
a. Enter the data for U.S. 25 in column A and for I-75 in column B.
Label the two columns.

b. Select the Data tab on the top menu. Then, on the far right,
select Data Analysis. Select F-Test: Two-Sample for
Variances, then click OK.

c. The range of the first variable is A1:A8, and B1:B9 for the
second. Click on Labels, enter 0.05 for Alpha, select D1 for
the Output Range, and click OK.

12–2. The Excel commands for the one-way ANOVA on page 400 are:
a. Key in data into four columns labeled Northern, WTA, Pocono,
and Branson.

b. Select the Data tab on the top menu. Then, on the far right,
select Data Analysis. Select ANOVA: Single Factor, then
click OK.

c. In the subsequent dialog box, make the input range A1:D8,
click on Grouped by Columns, click on Labels in first row,
the Alpha text box is 0.05, and finally select Output Range
as F1 and click OK.

c. In the dialog box, indicate that the range of Variable 1 is
from A1 to A6 and Variable 2 from B1 to B7, the Hypothe-
sized Mean Difference is 0, click Labels, Alpha is 0.05,
and the Output Range is D1. Click OK.



Answers to Self-Review
The worked-out solutions to the Self-Reviews are pro-
vided at the end of the text in Appendix E.


16–7 a.
                     Rank
   x     y      x       y       d        d²

  805    23     5.5      1      4.5     20.25
  777    62     3.0      9     −6.0     36.00
  820    60     8.5      8      0.5      0.25
  682    40     1.0      4     −3.0      9.00
  777    70     3.0     10     −7.0     49.00
  810    28     7.0      2      5.0     25.00
  805    30     5.5      3      2.5      6.25
  840    42    10.0      5      5.0     25.00
  777    55     3.0      7     −4.0     16.00
  820    51     8.5      6      2.5      6.25

                                 0      193.00

rs = 1 − 6(193)/[10(99)] = −.170

b. H0: ρ = 0; H1: ρ ≠ 0. Reject H0 if t < −2.306 or t > 2.306.

t = −.170 √[(10 − 2)/(1 − (−0.170)²)] = −0.488

H0 is not rejected. We have not shown a relationship
between the two tests.
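The ranking and arithmetic in Self-Review 16–7 can be reproduced with a short script. The sketch below assigns average ranks to tied values, sums the squared rank differences, and then applies rs = 1 − 6Σd²/[n(n² − 1)] and the t statistic shown above. The helper `average_ranks` is a hypothetical name written for this example. Because the script carries rs at full precision instead of rounding it to −.170 first, its t value comes out near −0.487, not exactly the printed −0.488.

```python
import math

x = [805, 777, 820, 682, 777, 810, 805, 840, 777, 820]
y = [23, 62, 60, 40, 70, 28, 30, 42, 55, 51]

def average_ranks(values):
    """Rank from smallest to largest; tied values share the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1   # rank of the first occurrence
        count = ordered.count(v)       # number of tied observations
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

rank_x = average_ranks(x)
rank_y = average_ranks(y)

d = [rx - ry for rx, ry in zip(rank_x, rank_y)]
sum_d2 = sum(di**2 for di in d)

n = len(x)
r_s = 1 - 6 * sum_d2 / (n * (n**2 - 1))
t = r_s * math.sqrt((n - 2) / (1 - r_s**2))

print(sum_d2, round(r_s, 3), round(t, 3))
```

Either way, t falls between −2.306 and 2.306, so the decision not to reject H0 is unchanged.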

CHAPTER 17
17–1 1.

Country          Amount    Index (Base = US)
China            822.7     932.8
Japan            110.7     125.5
United States     88.2     100.0
India             86.5      98.1
Russia            71.5      81.1

China produced 832.8% more steel than the US.

2. a.

Year Average Hourly Earnings Index (1995 = Base)
1995 11.65 100.0
2000 14.02 120.3
2005 16.13 138.5
2013 19.97 171.4
2016 21.37 183.4

The 2016 average wage increased 83.4% from 1995.

b.

Year Average Hourly Earnings Index (1995 – 2000 = Base)
1995 11.65 90.8
2000 14.02 109.2
2005 16.13 125.7
2013 19.97 155.6
2016 21.37 166.5

The 2016 average wage increased 66.5% from the average of 1995 and 2000.
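Both index columns in Self-Review 17–1, problem 2, follow from one calculation: divide each year's earnings by the base value and multiply by 100. The sketch below is a minimal illustration; the function name `simple_index` is an assumption introduced here.

```python
earnings = {1995: 11.65, 2000: 14.02, 2005: 16.13, 2013: 19.97, 2016: 21.37}

def simple_index(series, base_value):
    """Express every value as a percentage of the chosen base value."""
    return {year: value / base_value * 100 for year, value in series.items()}

# Part a: 1995 is the base period.
index_1995 = simple_index(earnings, earnings[1995])

# Part b: the base is the average of the 1995 and 2000 earnings.
base_avg = (earnings[1995] + earnings[2000]) / 2
index_avg = simple_index(earnings, base_avg)

for year in earnings:
    print(year, round(index_1995[year], 1), round(index_avg[year], 1))
```

Subtracting 100 from an index value gives the percent change relative to the base.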

17–2 1. a. P1 = ($85/$75)(100) = 113.3
P2 = ($45/$40)(100) = 112.5
P = (113.3 + 112.5)/2 = 112.9
b. P = ($130/$115)(100) = 113.0

c. P = [$85(500) + $45(1,200)]/[$75(500) + $40(1,200)] (100)
= ($96,500/$85,500)(100) = 112.9

d. P = [$85(520) + $45(1,300)]/[$75(520) + $40(1,300)] (100)
= ($102,700/$91,000)(100) = 112.9

e. P = √[(112.9)(112.9)] = 112.9
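The five answers in Self-Review 17–2 can be checked with the sketch below. Parts c, d, and e look like the Laspeyres, Paasche, and Fisher's ideal price indexes; that reading is an assumption, since the answer key does not name them. The variable and function names are hypothetical.

```python
import math

p0 = [75, 40]      # base-period prices
pt = [85, 45]      # current-period prices
q0 = [500, 1200]   # quantities used as weights in part c
qt = [520, 1300]   # quantities used as weights in part d

def weighted_price_index(p_base, p_now, q):
    """Aggregate price index with quantities q as weights."""
    numerator = sum(p * w for p, w in zip(p_now, q))
    denominator = sum(p * w for p, w in zip(p_base, q))
    return numerator / denominator * 100

part_a = sum(pn / pb * 100 for pn, pb in zip(pt, p0)) / len(p0)  # mean of simple indexes
part_b = sum(pt) / sum(p0) * 100                                 # simple aggregate index
part_c = weighted_price_index(p0, pt, q0)
part_d = weighted_price_index(p0, pt, qt)
part_e = math.sqrt(part_c * part_d)                              # geometric mean of c and d

print([round(v, 1) for v in (part_a, part_b, part_c, part_d, part_e)])
```

The weighted indexes differ only in which period's quantities serve as weights, which is why c and d land so close together here.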

17–3 a. P = [$4(9,000) + $5(200) + $8(5,000)]/[$3(10,000) + $1(600) + $10(3,000)] (100)
= ($77,000/$60,600)(100) = 127.1

b. The value of sales went up 27.1% from 2001 to 2017

17–4 a.
For 2011

Item              Weight

Cotton            ($0.25/$0.20)(100)(.10) =  12.50
Autos             (1,200/1,000)(100)(.30) =  36.00
Money turnover    (90/80)(100)(.60)       =  67.50
Total                                       116.00

For 2016

Item              Weight

Cotton            ($0.50/$0.20)(100)(.10) =  25.00
Autos             (900/1,000)(100)(.30)   =  27.00
Money turnover    (75/80)(100)(.60)       =  56.25
Total                                       108.25

b. Business activity increased 16% from 2004 to 2009. It
increased 8.25% from 2004 to 2014.

17–5 In terms of the base period, Jon’s salary was $14,637 in 2000
and $17,944 in 2016. This indicates that take-home pay in-
creased at a faster rate than the rate of prices paid for food,
transportation, etc.

17–6 $0.42, found by ($1.00/238.132)(100). The purchasing power
has declined by $0.58.
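The calculation in Self-Review 17–6 is a single division, sketched below. The index value 238.132 is taken from the answer; treating it as a price index with a base of 100 is an assumption, and the function name `purchasing_power_of_dollar` is hypothetical.

```python
def purchasing_power_of_dollar(price_index, base=100.0):
    """Value of $1.00 today, expressed in base-period dollars."""
    return 1.00 / price_index * base

value = purchasing_power_of_dollar(238.132)
decline = 1.00 - value

print(round(value, 2), round(decline, 2))
```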

17–7
Year IPI PPI

2007 111.07 92.9
2008 107.12 100.2
2009 94.80 95.3
2010 100.00 100.0
2011 102.93 107.8
2012 105.80 110.1
2013 107.83 110.5
2014 110.98 111.5
2015 111.32 105.8

The Industrial Production Index (IPI)
increased 11.32% from 2010 to 2015. The
Producer Price Index (PPI) increased 5.8%.

CHAPTER 18
18–1

Year Number Produced Moving Average

2011 2
2012 6 4
2013 4 5
2014 5 4
2015 3 6
2016 10
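The moving averages in Self-Review 18–1 are consistent with a three-year average centered on the middle year, for example (2 + 6 + 4)/3 = 4 for 2012. The sketch below reproduces the column under that assumption. The function name `centered_moving_average` is hypothetical, and the helper handles odd window lengths only.

```python
produced = {2011: 2, 2012: 6, 2013: 4, 2014: 5, 2015: 3, 2016: 10}

def centered_moving_average(series, window=3):
    """Average each run of `window` consecutive years, keyed by the middle year.

    Assumes an odd window so that a single middle year exists.
    """
    years = sorted(series)
    half = window // 2
    result = {}
    for i in range(half, len(years) - half):
        block = [series[y] for y in years[i - half:i + half + 1]]
        result[years[i]] = sum(block) / window
    return result

moving_avg = centered_moving_average(produced)
print(moving_avg)
```

The first and last years have no moving average because a full three-year window is not available for them.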


BY SECTION

Section Reviews
After selected groups of chapters
(1–4, 5–7, 8 and 9, 10–12, 13 and
14, 15 and 16, and 17 and 18), a
Section Review is included. Much
like a review before an exam, these
include a brief overview of the chap-
ters and problems for review.

126 A REVIEW OF CHAPTERS 1–4

DATA ANALYTICS

44. Refer to the North Valley real estate data recorded on homes sold during the last
year. Prepare a report on the selling prices of the homes based on the answers to the
following questions.
a. Compute the minimum, maximum, median, and the first and the third quartiles of
price. Create a box plot. Comment on the distribution of home prices.
b. Develop a scatter diagram with price on the vertical axis and the size of the home on
the horizontal. Is there a relationship between these variables? Is the relationship
direct or indirect?

c. For homes without a pool, develop a scatter diagram with price on the vertical axis
and the size of the home on the horizontal. Do the same for homes with a pool. How
do the relationships between price and size for homes without a pool and homes
with a pool compare?

45. Refer to the Baseball 2016 data that report information on the 30 Major League
Baseball teams for the 2016 season.
a. In the data set, the year opened, is the first year of operation for that stadium. For
each team, use this variable to create a new variable, stadium age, by subtracting
the value of the variable, year opened, from the current year. Develop a box plot
with the new variable, age. Are there any outliers? If so, which of the stadiums are
outliers?

b. Using the variable, salary, create a box plot. Are there any outliers? Compute the
quartiles using formula (4–1). Write a brief summary of your analysis.

c. Draw a scatter diagram with the variable, wins, on the vertical axis and salary on the
horizontal axis. What are your conclusions?

d. Using the variable, wins, draw a dot plot. What can you conclude from this plot?

46. Refer to the Lincolnville School District bus data.

a. Referring to the maintenance cost variable, develop a box plot. What are the mini-
mum, first quartile, median, third quartile, and maximum values? Are there any
outliers?

b. Using the median maintenance cost, develop a contingency table with bus manufac-
turer as one variable and whether the maintenance cost was above or below the
median as the other variable. What are your conclusions?

A REVIEW OF CHAPTERS 1–4
This section is a review of the major concepts and terms introduced in Chapters 1–4. Chapter 1 began by describing the
meaning and purpose of statistics. Next we described the different types of variables and the four levels of measurement.
Chapter 2 was concerned with describing a set of observations by organizing it into a frequency distribution and then
portraying the frequency distribution as a histogram or a frequency polygon. Chapter 3 began by describing measures of
location, such as the mean, weighted mean, median, geometric mean, and mode. This chapter also included measures of
dispersion, or spread. Discussed in this section were the range, variance, and standard deviation. Chapter 4 included
several graphing techniques such as dot plots, box plots, and scatter diagrams. We also discussed the coefficient of skew-
ness, which reports the lack of symmetry in a set of data.

Throughout this section we stressed the importance of statistical software, such as Excel and Minitab. Many computer
outputs in these chapters demonstrated how quickly and effectively a large data set can be organized into a frequency
distribution, several of the measures of location or measures of variation calculated, and the information presented in
graphical form.


Cases
The review also includes continuing
cases and several small cases that let
students make decisions using tools
and techniques from a variety of
chapters.

5. Refer to the following diagram.

[Diagram: horizontal axis scaled 0, 40, 80, 120, 160, 200; two points marked with asterisks]

a. What is the graph called?
b. What are the median, and first and third quartile values?
c. Is the distribution positively skewed? Tell how you know.
d. Are there any outliers? If yes, estimate these values.
e. Can you determine the number of observations in the study?

A REVIEW OF CHAPTERS 1–4 129

CASES

A. Century National Bank
The following case will appear in subsequent review sections. Assume that you work in the Planning Department of
the Century National Bank and report to Ms. Lamberg. You will need to do some data analysis and prepare a short
written report. Remember, Mr. Selig is the president of the bank, so you will want to ensure that your report is
complete and accurate. A copy of the data appears in Appendix A.6.

Century National Bank has offices in several cities in the Midwest and the southeastern part of the United
States. Mr. Dan Selig, president and CEO, would like to know the characteristics of his checking account
customers. What is the balance of a typical customer? How many other bank services do the checking account
customers use? Do the customers use the ATM service and, if so, how often? What about debit cards? Who uses
them, and how often are they used?

To better understand the customers, Mr. Selig asked Ms. Wendy Lamberg, director of planning, to select a sample
of customers and prepare a report. To begin, she has appointed a team from her staff. You are the head of the
team and responsible for preparing the report. You select a random sample of 60 customers. In addition to the
balance in each account at the end of last month, you determine (1) the number of ATM (automatic teller machine)
transactions in the last month; (2) the number of other bank services (a savings account, a certificate of
deposit, etc.) the customer uses; (3) whether the customer has a debit card (this is a bank service in which
charges are made directly to the customer’s account); and (4) whether or not interest is paid on the checking
account. The sample includes customers from the branches in Cincinnati, Ohio; Atlanta, Georgia; Louisville,
Kentucky; and Erie, Pennsylvania.

1. Develop a graph or table that portrays the checking balances. What is the balance of a typical customer?
Do many customers have more than $2,000 in their accounts? Does it appear that there is a difference in
the distribution of the accounts among the four branches? Around what value do the account balances tend
to cluster?

2. Determine the mean and median of the checking account balances. Compare the mean and the median
balances for the four branches. Is there a difference among the branches? Be sure to explain the difference
between the mean and the median in your report.

3. Determine the range and the standard deviation of the checking account balances. What do the first and
third quartiles show? Determine the coefficient of skewness and indicate what it shows. Because
Mr. Selig does not deal with statistics daily, include a brief description and interpretation of the standard
deviation and other measures.

B. Wildcat Plumbing Supply Inc.:
Do We Have Gender Differences?

Wildcat Plumbing Supply has served the plumbing needs of Southwest Arizona for more than 40 years.
The company was founded by Mr. Terrence St. Julian and is run today by his son Cory. The company has
grown from a handful of employees to more than 500 today. Cory is concerned about several positions within
the company where he has men and women doing essentially the same job but at different pay. To investigate,
he collected the information below. Suppose you are a student intern in the Accounting Department and
have been given the task to write a report summarizing the situation.

Yearly Salary ($000)    Women    Men

Less than 30                2      0
30 up to 40                 3      1
40 up to 50                17      4
50 up to 60                17     24
60 up to 70                 8     21
70 up to 80                 3      7
80 or more                  0      3

To kick off the project, Mr. Cory St. Julian held a meeting
with his staff and you were invited. At this meeting, it was
suggested that you calculate several measures of


Practice Test
The Practice Test is intended to
give students an idea of content
that might appear on a test and
how the test might be structured.
The Practice Test includes both
objective questions and problems
covering the material studied in
the section.


location, create charts or draw graphs such as a cumulative frequency distribution, and determine the quartiles
for both men and women. Develop the charts and write the report summarizing the yearly salaries of employees
at Wildcat Plumbing Supply. Does it appear that there are pay differences based on gender?
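
One possible starting point for the cumulative frequency distributions mentioned in the case is sketched below in Python. The class counts are copied from the salary table in the case; the variable names, the helper `cumulative_percent`, and the choice to report cumulative percentages rather than raw cumulative counts are illustrative assumptions, not part of the case.

```python
from itertools import accumulate

# Salary classes ($000) and counts copied from the Wildcat Plumbing Supply table.
classes = ["Less than 30", "30 up to 40", "40 up to 50", "50 up to 60",
           "60 up to 70", "70 up to 80", "80 or more"]
women = [2, 3, 17, 17, 8, 3, 0]
men = [0, 1, 4, 24, 21, 7, 3]


def cumulative_percent(counts):
    """Return the running total of counts as a percent of the group total."""
    total = sum(counts)
    return [100 * running / total for running in accumulate(counts)]


women_cum = cumulative_percent(women)
men_cum = cumulative_percent(men)

# Print the two cumulative distributions side by side, one row per class.
print(f"{'Yearly salary ($000)':<22}{'Women cum. %':>14}{'Men cum. %':>12}")
for label, w, m in zip(classes, women_cum, men_cum):
    print(f"{label:<22}{w:>14.1f}{m:>12.1f}")
```

Reading the two cumulative columns class by class shows where the distributions separate; interpreting that gap is left to the written report.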

C. Kimble Products: Is There a Difference
In the Commissions?

At the January national sales meeting, the CEO of Kimble Products was questioned extensively regarding the
company policy for paying commissions to its sales representatives. The company sells sporting goods to two major
markets. There are 40 sales representatives who call directly on large-volume customers, such as the athletic
departments at major colleges and universities and professional sports franchises. There are 30 sales
representatives who represent the company to retail stores located in shopping malls and large discounters such as
Kmart and Target.

Upon his return to corporate headquarters, the CEO asked the sales manager for a report comparing the
commissions earned last year by the two parts of the sales team. The information is reported below. Write a brief
report. Would you conclude that there is a difference? Be sure to include information in the report on both the
central tendency and dispersion of the two groups.

Commissions Earned by Sales Representatives
Calling on Large Retailers ($)

1,116 681 1,294 12 754 1,206 1,448 870 944 1,255
1,213 1,291 719 934 1,313 1,083 899 850 886 1,556
886 1,315 1,858 1,262 1,338 1,066 807 1,244 758 918

Commissions Earned by Sales Representatives
Calling on Athletic Departments ($)

354 87 1,676 1,187 69 3,202 680 39 1,683 1,106
883 3,140 299 2,197 175 159 1,105 434 615 149
1,168 278 579 7 357 252 1,602 2,321 4 392
416 427 1,738 526 13 1,604 249 557 635 527
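
The measures of central tendency and dispersion that the case asks for could be computed along the following lines. This is only a sketch: the two lists are copied from the tables above, while the function name `summarize` and the particular set of measures reported (mean, median, range, sample standard deviation) are assumptions chosen for illustration.

```python
import statistics

# Commissions ($) copied from the two tables in the case.
retail = [1116, 681, 1294, 12, 754, 1206, 1448, 870, 944, 1255,
          1213, 1291, 719, 934, 1313, 1083, 899, 850, 886, 1556,
          886, 1315, 1858, 1262, 1338, 1066, 807, 1244, 758, 918]
athletic = [354, 87, 1676, 1187, 69, 3202, 680, 39, 1683, 1106,
            883, 3140, 299, 2197, 175, 159, 1105, 434, 615, 149,
            1168, 278, 579, 7, 357, 252, 1602, 2321, 4, 392,
            416, 427, 1738, 526, 13, 1604, 249, 557, 635, 527]


def summarize(values):
    """Return measures of location and dispersion for one group."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation
    }


for name, data in (("Large retailers", retail),
                   ("Athletic departments", athletic)):
    s = summarize(data)
    print(f"{name}: n={s['n']}, mean={s['mean']:.1f}, median={s['median']:.1f}, "
          f"range={s['range']}, std dev={s['std_dev']:.1f}")
```

Comparing the mean with the median in each group gives a quick indication of skewness, which may be worth mentioning in the report alongside the dispersion measures.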

PRACTICE TEST

There is a practice test at the end of each review section. The tests are in two parts. The first part contains several objective
questions, usually in a fill-in-the-blank format. The second part is problems. In most cases, it should take 30 to 45
minutes to complete the test. The problems require a calculator. Check the answers in the Answer Section in the back of
the book.

Part 1—Objective
1. The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in
making effective decisions is called ______.
2. Methods of organizing, summarizing, and presenting data in an informative way are called ______.
3. The entire set of individuals or objects of interest or the measurements obtained from all
individuals or objects of interest are called the ______.
4. List the two types of variables.
5. The number of bedrooms in a house is an example of a ______. (discrete variable,
continuous variable, or qualitative variable; pick one)
6. The jersey numbers of Major League Baseball players are an example of what level of measurement?
7. The classification of students by eye color is an example of what level of measurement?
8. The sum of the differences between each value and the mean is always equal to what value?
9. A set of data contained 70 observations. How many classes would the 2^k method suggest to
construct a frequency distribution?
10. What percent of the values in a data set are always larger than the median?
11. The square of the standard deviation is the ______.
12. The standard deviation assumes a negative value when ______. (all the values are negative,
at least half the values are negative, or never; pick one)
13. Which of the following is least affected by an outlier? (mean, median, or range; pick one)
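
For checking work on question 9, the 2^k rule can be expressed as a short search: choose the smallest number of classes k such that 2^k is greater than the number of observations n. The sketch below assumes that reading of the rule; the function name `classes_by_2k_rule` is hypothetical.

```python
def classes_by_2k_rule(n):
    """Return the smallest k for which 2**k exceeds n observations."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


# Suggested number of classes for a data set of 70 observations.
print(classes_by_2k_rule(70))
```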

Part 2—Problems
1. The Russell 2000 index of stock prices increased by the following amounts over the last 3 years.

18% 4% 2%

What is the geometric mean increase for the 3 years?
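
A way to check an answer to this problem is sketched below. It assumes the usual treatment of percent increases: convert each to a growth factor (1 plus the rate), take the n-th root of the product, and subtract 1. The function name `geometric_mean_rate` is an illustrative choice, and `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean_rate(percent_changes):
    """Geometric mean rate of increase for a list of percent changes."""
    # Convert each percent change to a growth factor, e.g. 18% -> 1.18.
    growth_factors = [1 + p / 100 for p in percent_changes]
    # n-th root of the product of the factors, minus 1 to get a rate.
    return math.prod(growth_factors) ** (1 / len(growth_factors)) - 1


rate = geometric_mean_rate([18, 4, 2])
print(f"Geometric mean increase: {rate:.2%}")
```

The geometric mean is used here instead of the arithmetic mean because yearly percent increases compound.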


Required=Results


McGraw-Hill Connect®
Learn Without Limits
Connect is a teaching and learning platform
that is proven to deliver better results for
students and instructors.

Connect empowers students by continually
adapting to deliver precisely what they
need, when they need it, and how they need
it, so your class time is more engaging and
effective.

Analytics

Connect Insight®
Connect Insight is Connect’s new one-of-a-kind visual analytics dashboard, now available for both instructors and
students, that provides at-a-glance information regarding student performance, which is immediately
actionable. By presenting assignment, assessment, and topical performance results together with a time metric
that is easily visible for aggregate or individual results, Connect Insight gives the user the ability to take a
just-in-time approach to teaching and learning, which was never before available. Connect Insight presents data
that empowers students and helps instructors improve class performance in a way that is efficient and effective.

Students can view their results for any Connect course.

73% of instructors who use Connect require it; instructor satisfaction increases by 28% when Connect is required.

Mobile

Connect’s new, intuitive mobile interface gives students and instructors flexible and convenient, anytime–anywhere
access to all components of the Connect platform.

Using Connect improves retention rates by 19.8%, passing rates by 12.7%, and exam scores by 9.1%.

Adaptive

THE ADAPTIVE READING EXPERIENCE DESIGNED TO TRANSFORM THE WAY STUDENTS READ

SmartBook®
Proven to help students improve grades and study more efficiently, SmartBook contains the same content as the
print book, but actively tailors that content to the needs of the individual. SmartBook’s adaptive technology
provides precise, personalized instruction on what the student should do next, guiding the student to master
and remember key concepts, targeting gaps in knowledge and offering customized feedback, and driving the
student toward comprehension and retention of the subject matter. Available on tablets, SmartBook puts learning
at the student’s fingertips, anywhere, anytime.

Over 8 billion questions have been answered, making McGraw-Hill Education products more intelligent,
reliable, and precise.

More students earn A’s and B’s when they use McGraw-Hill Education Adaptive products.

www.mheducation.com

INSTRUCTOR LIBRARY
The Connect® Business Statistics Instructor Library is your repository for additional resources to improve student
engagement in and out of class. You can select and use any asset that enhances your lecture, including:

• Solutions Manual The Solutions Manual, carefully revised by the authors, contains solutions to all basic,
intermediate, and challenge problems found at the end of each chapter.

• Test Bank The Test Bank, revised by Wendy Bailey of Troy University, contains hundreds of true/false,
multiple-choice, and short-answer/discussion questions, updated based on the revisions of the authors. The level
of difficulty varies, as indicated by the easy, medium, and difficult labels.

• PowerPoint Presentations Prepared by Stephanie Campbell of Mineral Area College, the presentations
contain exhibits, tables, key points, and summaries in a visually stimulating collection of slides.

• Excel Templates There are templates for various end-of-chapter problems that have been set up as Excel
spreadsheets, all denoted by an icon. Students can easily download and save the files, then use the data to
solve end-of-chapter problems.

MEGASTAT® FOR MICROSOFT EXCEL®
MegaStat® by J. B. Orris of Butler University is a full-featured Excel statistical analysis add-in that is available on the
MegaStat website at www.mhhe.com/megastat (for purchase). MegaStat works with recent versions of Microsoft Excel®
(Windows and Mac OS X). See the website for details on supported versions.

Once installed, MegaStat will always be available on the Excel add-ins ribbon with no expiration date or data
limitations. MegaStat performs statistical analyses within an Excel workbook. When a MegaStat menu item is selected, a
dialog box pops up for data selection and options. Since MegaStat is an easy-to-use extension of Excel, students
can focus on learning statistics without being distracted by the software. Ease-of-use features include Auto Expand
for quick data selection and Auto Label detect.

MegaStat does most calculations found in introductory statistics textbooks, such as computing descriptive statistics,
creating frequency distributions, and computing probabilities as well as hypothesis testing, ANOVA, chi-square
analysis, and regression analysis (simple and multiple). MegaStat output is carefully formatted and appended to an
output worksheet.

Video tutorials are included that provide a walkthrough using MegaStat for typical business statistics topics. A
context-sensitive help system is built into MegaStat and a User’s Guide is included in PDF format.

MINITAB®/SPSS®/JMP®
Minitab® Version 17, SPSS® Student Version 18.0, and
JMP® Student Edition Version 8 are software products
that are available to help students solve the exercises
with data files. Each software product can be packaged
with any McGraw-Hill business statistics text.

ADDITIONAL RESOURCES


ACKNOWLEDGMENTS

This edition of Statistical Techniques in Business and Economics is the product of many people: students, colleagues, reviewers, and
the staff at McGraw-Hill Education. We thank them all. We wish to express our sincere gratitude to the reviewers:

Stefan Ruediger, Arizona State University
Anthony Clark, St. Louis Community College
Umair Khalil, West Virginia University
Leonie Stone, SUNY Geneseo
Golnaz Taghvatalab, Central Michigan University
John Yarber, Northeast Mississippi Community College
John Beyers, University of Maryland
Mohammad Kazemi, University of North Carolina Charlotte
Anna Terzyan, Loyola Marymount University
Lee O. Cannell, El Paso Community College

Their suggestions and thorough reviews of the previous edition and the manuscript of this edition
make this a better text.

Special thanks go to a number of people. Shelly Moore, College of Western Idaho, and John
Arcaro, Lakeland Community College, checked the Connect exercises for accuracy. Ed Pappanastos, Troy
University, built new data sets and revised SmartBook. Rene Ordonez, Southern Oregon University,
built the Connect guided examples. Wendy Bailey, Troy University, prepared the test bank. Stephanie
Campbell, Mineral Area College, prepared the PowerPoint decks. Vickie Fry, Westmoreland County
Community College, provided countless hours of digital accuracy checking and support.

We also wish to thank the staff at McGraw-Hill. This includes Dolly Womack, Senior Brand Manager;
Michele Janicek, Product Developer Coordinator; Camille Corum and Ryan McAndrews, Product
Developers; Harvey Yep and Bruce Gin, Content Project Managers; and others we do not know
personally, but who have made valuable contributions.


ENHANCEMENTS TO STATISTICAL TECHNIQUES
IN BUSINESS & ECONOMICS, 17E

MAJOR CHANGES MADE TO INDIVIDUAL
CHAPTERS:

CHAPTER 1 What Is Statistics?
• Revised Self-Review 1-2.

• New Section describing Business Analytics and its integration
with the text.

• Updated exercises 2, 3, 17, and 19.

• New Data Analytics section with new data and questions.

CHAPTER 2 Describing Data: Frequency Tables,
Frequency Distributions, and Graphic Presentation
• Revised chapter introduction.

• Added more explanation about cumulative relative frequency
distributions.

• Updated exercises 47 and 48 using real data.

• New Data Analytics section with new data and questions.

CHAPTER 3 Describing Data:
Numerical Measures
• Updated Self-Review 3-2.

• Updated Exercises 16, 18, 73, 77, and 82.

• New Data Analytics section with new data and questions.

CHAPTER 4 Describing Data: Displaying and
Exploring Data
• Updated exercise 22 with 2016 New York Yankee player salaries.

• New Data Analytics section with new data and questions.

CHAPTER 5 A Survey of Probability Concepts
• Revised the Example/Solution in the section on Bayes’ Theorem.

• Updated exercises 45 and 58 using real data.

• New Data Analytics section with new data and questions.

CHAPTER 6 Discrete Probability Distributions
• Expanded discussion of random variables.

• Revised the Example/Solution in the section on Poisson
distribution.

• Updated exercises 18, 58, and 68.

• New Data Analytics section with new data and questions.

CHAPTER 7 Continuous Probability Distributions
• Revised Self-Review 7-1.

• Revised the Example/Solutions using Uber as the context.

• Updated exercises 19, 22, 28, 36, 47, and 64.

• New Data Analytics section with new data and questions.

CHAPTER 8 Sampling Methods and the Central
Limit Theorem
• New Data Analytics section with new data and questions.

CHAPTER 9 Estimation and Confidence Intervals
• New Self-Review 9-3 problem description.

• Updated exercises 5, 6, 12, 14, 23, 24, 33, 41, 43, and 61.

• New Data Analytics section with new data and questions.

CHAPTER 10 One-Sample Tests
of Hypothesis
• Revised the Example/Solutions using an airport cell phone parking lot as the context.

• Revised the section on Type II error to include an additional
example.

• New Type II error exercises, 23 and 24.

• Updated exercises 19, 31, 32, and 43.

• New Data Analytics section with new data and questions.

CHAPTER 11 Two-Sample Tests
of Hypothesis
• Updated exercises 5, 9, 12, 26, 27, 30, 32, 34, 40, 42, and 46.

• New Data Analytics section with new data and questions.

CHAPTER 12 Analysis of Variance
• Revised Self-Reviews 12-1 and 12-3.

• Updated exercises 10, 21, 24, 33, 38, 42, and 44.

• New Data Analytics section with new data and questions.

CHAPTER 13 Correlation and Linear Regression
• Added a new conceptual formula to relate the standard error to the regression ANOVA table.

• Updated exercises 36, 41, 42, 43, and 57.

• New Data Analytics section with new data and questions.

CHAPTER 14 Multiple Regression Analysis
• Updated exercises 19, 21, 23, 24, and 25.

• New Data Analytics section with new data and questions.

CHAPTER 15 Nonparametric Methods: Nominal
Level Hypothesis Tests
• Updated the context of Manelli Perfume Company Example/Solution.

• Revised the “Hypothesis Test of Unequal Expected Frequencies” Example/Solution.

• Updated exercises 3, 31, 42, 46, and 61.

• New Data Analytics section with new data and questions.


CHAPTER 16 Nonparametric Methods: Analysis of
Ordinal Data
• Revised the “Sign Test” Example/Solution.

• Revised the “Testing a Hypothesis About a Median” Example/
Solution.

• Revised the “Wilcoxon Rank-Sum Test for Independent Populations” Example/Solution.

• Revised Self-Reviews 16-3 and 16-6.

• Updated exercise 25.

• New Data Analytics section with new data and questions.

CHAPTER 17 Index Numbers
• Revised Self-Reviews 17-1, 17-2, 17-3, 17-4, 17-5, 17-6, 17-7.

• Updated dates, illustrations, and examples.

• New Data Analytics section with new data and questions.

CHAPTER 18 Time Series and Forecasting
• Updated dates, illustrations, and examples.

• New Data Analytics section with new data and questions.

CHAPTER 19 Statistical Process Control and
Quality Management
• Updated 2016 Malcolm Baldrige National Quality Award winners.

• Updated exercises 13, 22, and 25.


BRIEF CONTENTS

1 What is Statistics? 1
2 Describing Data: Frequency Tables, Frequency Distributions, and Graphic Presentation 18
3 Describing Data: Numerical Measures 51
4 Describing Data: Displaying and Exploring Data 94 (Review Section)
5 A Survey of Probability Concepts 132
6 Discrete Probability Distributions 175
7 Continuous Probability Distributions 209 (Review Section)
8 Sampling Methods and the Central Limit Theorem 250
9 Estimation and Confidence Intervals 282 (Review Section)
10 One-Sample Tests of Hypothesis 318
11 Two-Sample Tests of Hypothesis 353
12 Analysis of Variance 386 (Review Section)
13 Correlation and Linear Regression 436
14 Multiple Regression Analysis 488 (Review Section)
15 Nonparametric Methods: Nominal Level Hypothesis Tests 545
16 Nonparametric Methods: Analysis of Ordinal Data 582 (Review Section)
17 Index Numbers 621
18 Time Series and Forecasting 653 (Review Section)
19 Statistical Process Control and Quality Management 697
20 An Introduction to Decision Theory 728

Appendixes:
Data Sets, Tables, Software Commands, Answers 745

Glossary 847

Index 851


CONTENTS

1 What is Statistics? 1
Introduction 2

Why Study Statistics? 2

What is Meant by Statistics? 3

Types of Statistics 4

Descriptive Statistics 4
Inferential Statistics 5

Types of Variables 6

Levels of Measurement 7

Nominal-Level Data 7
Ordinal-Level Data 8
Interval-Level Data 9
Ratio-Level Data 10

EXERCISES 11

Ethics and Statistics 12

Basic Business Analytics 12

Chapter Summary 13

Chapter Exercises 14

Data Analytics 17

2 Describing Data:
FREQUENCY TABLES, FREQUENCY DISTRIBUTIONS, AND GRAPHIC PRESENTATION 18
Introduction 19

Constructing Frequency Tables 19

Relative Class Frequencies 20

Graphic Presentation
of Qualitative Data 21

EXERCISES 25

Constructing Frequency
Distributions 26

Relative Frequency Distribution 30

EXERCISES 31

Graphic Presentation of a Distribution 32

Histogram 32
Frequency Polygon 35

EXERCISES 37

Cumulative Distributions 38

EXERCISES 41

Chapter Summary 42

Chapter Exercises 43

Data Analytics 49

3 Describing Data:
NUMERICAL MEASURES 51

Introduction 52

Measures of Location 52

The Population Mean 53
The Sample Mean 54
Properties of the Arithmetic
Mean 55

EXERCISES 56

The Median 57
The Mode 59

EXERCISES 61

The Relative Positions of the Mean,
Median, and Mode 62

EXERCISES 63

Software Solution 64

The Weighted Mean 65

EXERCISES 66

The Geometric Mean 66

EXERCISES 68

Why Study Dispersion? 69

Range 70
Variance 71

EXERCISES 73

Population Variance 74
Population Standard Deviation 76

EXERCISES 76

Sample Variance and Standard
Deviation 77
Software Solution 78

EXERCISES 79

Interpretation and Uses of the Standard
Deviation 79

Chebyshev’s Theorem 79
The Empirical Rule 80

A Note from the Authors vi


EXERCISES 81

The Mean and Standard Deviation
of Grouped Data 82

Arithmetic Mean of Grouped Data 82
Standard Deviation of Grouped Data 83

EXERCISES 85

Ethics and Reporting Results 86

Chapter Summary 86

Pronunciation Key 88

Chapter Exercises 88

Data Analytics 92

4 Describing Data:
DISPLAYING AND EXPLORING DATA 94

Introduction 95

Dot Plots 95

Stem-and-Leaf Displays 96

EXERCISES 101

Measures of Position 103

Quartiles, Deciles, and Percentiles 103

EXERCISES 106

Box Plots 107

EXERCISES 109

Skewness 110

EXERCISES 113

Describing the Relationship between
Two Variables 114

Contingency Tables 116

EXERCISES 118

Chapter Summary 119

Pronunciation Key 120

Chapter Exercises 120

Data Analytics 126

Problems 127

Cases 129

Practice Test 130

5 A Survey of Probability
Concepts 132
Introduction 133

What is a Probability? 134

Approaches to Assigning Probabilities 136

Classical Probability 136
Empirical Probability 137
Subjective Probability 139

EXERCISES 140

Rules of Addition for Computing
Probabilities 141

Special Rule of Addition 141
Complement Rule 143
The General Rule of Addition 144

EXERCISES 146

Rules of Multiplication
to Calculate Probability 147

Special Rule of Multiplication 147
General Rule of Multiplication 148

Contingency Tables 150

Tree Diagrams 153

EXERCISES 155

Bayes’ Theorem 157

EXERCISES 161

Principles of Counting 161

The Multiplication Formula 161
The Permutation Formula 163
The Combination Formula 164

EXERCISES 166

Chapter Summary 167

Pronunciation Key 168

Chapter Exercises 168

Data Analytics 173

6 Discrete Probability
Distributions 175
Introduction 176

What is a Probability Distribution? 176

Random Variables 178

Discrete Random Variable 179
Continuous Random Variable 179

The Mean, Variance, and Standard Deviation of a
Discrete Probability Distribution 180

Mean 180
Variance and Standard Deviation 180

EXERCISES 182

Binomial Probability Distribution 184

How Is a Binomial Probability
Computed? 185
Binomial Probability Tables 187

EXERCISES 190

Cumulative Binomial Probability
Distributions 191

EXERCISES 193

Hypergeometric Probability Distribution 193


EXERCISES 197

Poisson Probability Distribution 197

EXERCISES 202

Chapter Summary 202

Chapter Exercises 203

Data Analytics 208

7 Continuous Probability
Distributions 209
Introduction 210

The Family of Uniform Probability
Distributions 210

EXERCISES 213

The Family of Normal Probability Distributions 214

The Standard Normal Probability
Distribution 217

Applications of the Standard Normal
Distribution 218
The Empirical Rule 218

EXERCISES 220

Finding Areas under the Normal Curve 221

EXERCISES 224

EXERCISES 226

EXERCISES 229

The Normal Approximation
to the Binomial 229

Continuity Correction Factor 230
How to Apply the Correction Factor 232

EXERCISES 233

The Family of Exponential Distributions 234

EXERCISES 238

Chapter Summary 239

Chapter Exercises 240

Data Analytics 244

Problems 246

Cases 247

Practice Test 248

8 Sampling Methods and the
Central Limit Theorem 250
Introduction 251

Sampling Methods 251

Reasons to Sample 251
Simple Random Sampling 252
Systematic Random Sampling 255
Stratified Random Sampling 255
Cluster Sampling 256

EXERCISES 257

Sampling “Error” 259

Sampling Distribution of the Sample Mean 261

EXERCISES 264

The Central Limit Theorem 265

EXERCISES 271

Using the Sampling Distribution of the
Sample Mean 273

EXERCISES 275

Chapter Summary 275

Pronunciation Key 276

Chapter Exercises 276

Data Analytics 281

9 Estimation and Confidence
Intervals 282
Introduction 283

Point Estimate for a Population Mean 283

Confidence Intervals for a Population Mean 284

Population Standard Deviation, Known σ 284
A Computer Simulation 289

EXERCISES 291

Population Standard Deviation, σ Unknown 292
EXERCISES 299

A Confidence Interval for a Population
Proportion 300

EXERCISES 303

Choosing an Appropriate Sample Size 303

Sample Size to Estimate a Population Mean 304
Sample Size to Estimate a Population
Proportion 305

EXERCISES 307

Finite-Population Correction Factor 307

EXERCISES 309

Chapter Summary 310

Chapter Exercises 311

Data Analytics 315

Problems 316

Cases 317

Practice Test 317

10 One-Sample Tests
of Hypothesis 318
Introduction 319

What is Hypothesis Testing? 319


Six-Step Procedure for Testing
a Hypothesis 320

Step 1: State the Null Hypothesis (H0) and the
Alternate Hypothesis (H1) 320
Step 2: Select a Level of Significance 321
Step 3: Select the Test Statistic 323
Step 4: Formulate the Decision Rule 323
Step 5: Make a Decision 324
Step 6: Interpret the Result 324

One-Tailed and Two-Tailed Hypothesis Tests 325

Hypothesis Testing for a Population Mean: Known
Population Standard Deviation 327

A Two-Tailed Test 327
A One-Tailed Test 330

p-Value in Hypothesis Testing 331

EXERCISES 333

Hypothesis Testing for a Population Mean:
Population Standard Deviation Unknown 334

EXERCISES 339

A Statistical Software Solution 340

EXERCISES 342

Type II Error 343

EXERCISES 346

Chapter Summary 347

Pronunciation Key 348

Chapter Exercises 348

Data Analytics 352

11 Two-Sample Tests
of Hypothesis 353
Introduction 354

Two-Sample Tests of Hypothesis: Independent
Samples 354

EXERCISES 359

Comparing Population Means with Unknown
Population Standard Deviations 360

Two-Sample Pooled Test 360

EXERCISES 364

Unequal Population Standard
Deviations 366

EXERCISES 369

Two-Sample Tests of Hypothesis:
Dependent Samples 370

Comparing Dependent
and Independent Samples 373

EXERCISES 375

Chapter Summary 377

Pronunciation Key 378

Chapter Exercises 378

Data Analytics 385

12 Analysis of Variance 386
Introduction 387

Comparing Two Population Variances 387

The F Distribution 387
Testing a Hypothesis of Equal Population
Variances 388

EXERCISES 391

ANOVA: Analysis of Variance 392

ANOVA Assumptions 392
The ANOVA Test 394

EXERCISES 401

Inferences about Pairs of Treatment
Means 402

EXERCISES 404

Two-Way Analysis of Variance 406

EXERCISES 411

Two-Way ANOVA with Interaction 412

Interaction Plots 412
Testing for Interaction 413
Hypothesis Tests for Interaction 415

EXERCISES 417

Chapter Summary 418

Pronunciation Key 420

Chapter Exercises 420

Data Analytics 429

Problems 431

Cases 433

Practice Test 434

13 Correlation and
Linear Regression 436
Introduction 437

What is Correlation Analysis? 437

The Correlation Coefficient 440

EXERCISES 445

Testing the Significance of the Correlation
Coefficient 447

EXERCISES 450

Regression Analysis 451

Least Squares Principle 451
Drawing the Regression Line 454

EXERCISES 457

Testing the Significance of the Slope 459


EXERCISES 461

Evaluating a Regression Equation’s
Ability to Predict 462

The Standard Error of Estimate 462
The Coefficient of Determination 463

EXERCISES 464

Relationships among the Correlation
Coefficient, the Coefficient of
Determination, and the Standard
Error of Estimate 464

EXERCISES 466

Interval Estimates of Prediction 467

Assumptions Underlying Linear
Regression 467
Constructing Confidence and Prediction
Intervals 468

EXERCISES 471

Transforming Data 471

EXERCISES 474

Chapter Summary 475

Pronunciation Key 477

Chapter Exercises 477

Data Analytics 487

14 Multiple Regression
Analysis 488
Introduction 489

Multiple Regression Analysis 489

EXERCISES 493

Evaluating a Multiple Regression Equation 495

The ANOVA Table 495
Multiple Standard Error of Estimate 496
Coefficient of Multiple Determination 497
Adjusted Coefficient of Determination 498

EXERCISES 499

Inferences in Multiple Linear
Regression 499

Global Test: Testing the Multiple
Regression Model 500
Evaluating Individual Regression
Coefficients 502

EXERCISES 505

Evaluating the Assumptions of Multiple
Regression 506

Linear Relationship 507
Variation in Residuals Same for Large
and Small ŷ Values 508
Distribution of Residuals 509
Multicollinearity 509
Independent Observations 511

Qualitative Independent Variables 512

Regression Models with Interaction 515

Stepwise Regression 517

EXERCISES 519

Review of Multiple Regression 521

Chapter Summary 527

Pronunciation Key 528

Chapter Exercises 529

Data Analytics 539

Problems 541

Cases 542

Practice Test 543

15 Nonparametric Methods:
NOMINAL LEVEL HYPOTHESIS TESTS 545

Introduction 546

Test a Hypothesis of a Population
Proportion 546

EXERCISES 549

Two-Sample Tests about Proportions 550

EXERCISES 554

Goodness-of-Fit Tests: Comparing Observed and
Expected Frequency Distributions 555

Hypothesis Test of Equal Expected
Frequencies 555

EXERCISES 560

Hypothesis Test of Unequal Expected
Frequencies 562

Limitations of Chi-Square 563

EXERCISES 565

Testing the Hypothesis That a Distribution is
Normal 566

EXERCISES 569

Contingency Table Analysis 570

EXERCISES 573

Chapter Summary 574

Pronunciation Key 575

Chapter Exercises 576

Data Analytics 581

16 Nonparametric Methods:
ANALYSIS OF ORDINAL DATA 582

Introduction 583

The Sign Test 583


EXERCISES 587

Using the Normal Approximation to the
Binomial 588

EXERCISES 590

Testing a Hypothesis About a Median 590

EXERCISES 592

Wilcoxon Signed-Rank Test for Dependent
Populations 592

EXERCISES 596

Wilcoxon Rank-Sum Test for Independent
Populations 597

EXERCISES 601

Kruskal-Wallis Test: Analysis of Variance by
Ranks 601

EXERCISES 605

Rank-Order Correlation 607

Testing the Significance of rs 609

EXERCISES 610

Chapter Summary 612

Pronunciation Key 613

Chapter Exercises 613

Data Analytics 616

Problems 618

Cases 619

Practice Test 619

17 Index Numbers 621
Introduction 622

Simple Index Numbers 622

Why Convert Data to Indexes? 625
Construction of Index Numbers 625

EXERCISES 627

Unweighted Indexes 628

Simple Average of the Price Indexes 628
Simple Aggregate Index 629

Weighted Indexes 629

Laspeyres Price Index 629
Paasche Price Index 631
Fisher’s Ideal Index 632

EXERCISES 633

Value Index 634

EXERCISES 635

Special-Purpose Indexes 636

Consumer Price Index 637
Producer Price Index 638
Dow Jones Industrial Average (DJIA) 638

EXERCISES 640

Consumer Price Index 640

Special Uses of the Consumer Price Index 641
Shifting the Base 644

EXERCISES 646

Chapter Summary 647

Chapter Exercises 648

Data Analytics 652

18 Time Series and
Forecasting 653
Introduction 654

Components of a Time Series 654

Secular Trend 654
Cyclical Variation 655
Seasonal Variation 656
Irregular Variation 656

A Moving Average 657

Weighted Moving Average 660

EXERCISES 663

Linear Trend 663

Least Squares Method 665

EXERCISES 667

Nonlinear Trends 668

EXERCISES 669

Seasonal Variation 670

Determining a Seasonal Index 671

EXERCISES 676

Deseasonalizing Data 677

Using Deseasonalized Data to Forecast 678

EXERCISES 680

The Durbin-Watson Statistic 680

EXERCISES 686

Chapter Summary 686

Chapter Exercises 686

Data Analytics 693

Problems 695

Practice Test 696

19 Statistical Process
Control and Quality
Management 697
Introduction 698

A Brief History of Quality Control 698

Six Sigma 700


Sources of Variation 701

Diagnostic Charts 702

Pareto Charts 702
Fishbone Diagrams 704

EXERCISES 705

Purpose and Types of Quality Control Charts 705

Control Charts for Variables 706
Range Charts 709

In-Control and Out-of-Control Situations 711

EXERCISES 712

Attribute Control Charts 713

p-Charts 713
c-Bar Charts 716

EXERCISES 718

Acceptance Sampling 719

EXERCISES 722

Chapter Summary 722

Pronunciation Key 723

Chapter Exercises 724

20 An Introduction to
Decision Theory 728
Introduction 729

Elements of a Decision 729

Decision Making Under Conditions
of Uncertainty 730

Payoff Table 730
Expected Payoff 731

EXERCISES 732

Opportunity Loss 733

EXERCISES 734

Expected Opportunity Loss 734

EXERCISES 735

Maximin, Maximax, and
Minimax Regret Strategies 735

Value of Perfect Information 736

Sensitivity Analysis 737

EXERCISES 738

Decision Trees 739

Chapter Summary 740

Chapter Exercises 741

APPENDIXES 745

Appendix A: Data Sets 746

Appendix B: Tables 756

Appendix C: Software Commands 774

Appendix D: Answers to Odd-Numbered
Chapter Exercises 785

Review Exercises 829

Solutions to Practice Tests 831

Appendix E: Answers to Self-Review 834

Glossary 847

Index 851

What is Statistics? 1

BEST BUY sells Fitbit wearable technology products that track a person’s physical
activity and sleep quality. The Fitbit technology collects daily information on a person’s
number of steps so that a person can track calories consumed. The information can be
synced with a cell phone and displayed with a Fitbit app. Assume you know the daily
number of Fitbit Flex 2 units sold last month at the Best Buy store in Collegeville,
Pennsylvania. Describe a situation where the number of units sold is considered a
sample. Illustrate a second situation where the number of units sold is considered a
population. (See Exercise 11 and LO1-3.)

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO1-1 Explain why knowledge of statistics is important.

LO1-2 Define statistics and provide an example of how statistics is applied.

LO1-3 Differentiate between descriptive and inferential statistics.

LO1-4 Classify variables as qualitative or quantitative, and discrete or continuous.

LO1-5 Distinguish between nominal, ordinal, interval, and ratio levels of measurement.

LO1-6 List the values associated with the practice of statistics.


2 CHAPTER 1

INTRODUCTION
Suppose you work for a large company and your supervisor asks you to decide if a new
version of a smartphone should be produced and sold. You start by thinking about the
product’s innovations and new features. Then, you stop and realize the consequences
of the decision. The product will need to make a profit so the pricing and the costs of
production and distribution are all very important. The decision to introduce the product
is based on many alternatives. So how will you know? Where do you start?

If you lack long experience in the industry, you must begin to develop the intelligence
that will make you an expert. You select three other people to work with and meet
with them. The conversation focuses on what you need to know and what information
and data you need. In your meeting, many questions are asked. How many competitors
are already in the market? How are smartphones priced? What design features do com-
petitors’ products have? What features does the market require? What do customers
want in a smartphone? What do customers like about the existing products? The answers
will be based on business intelligence consisting of data and information collected
through customer surveys, engineering analysis, and market research. In the end, your
presentation to support your decision regarding the introduction of a new smartphone is
based on the statistics that you use to summarize and organize your data, the statistics
that you use to compare the new product to existing products, and the statistics to esti-
mate future sales, costs, and revenues. The statistics will be the focus of the conversa-
tion that you will have with your supervisor about this very important decision.

As a decision maker, you will need to acquire and analyze data to support your
decisions. The purpose of this text is to develop your knowledge of basic statistical
techniques and methods and how to apply them to develop the business and personal
intelligence that will help you make decisions.

WHY STUDY STATISTICS?
If you look through your university catalogue, you will find that statistics is required for
many college programs. As you investigate a future career in accounting, economics,
human resources, finance, business analytics, or another business area, you also will
discover that statistics is required as part of these college programs. So why is
statistics a requirement in so many disciplines?

A major driver of the requirement for statistics knowledge is the tech-
nologies available for capturing data. Examples include the technology
that Google uses to track how Internet users access websites. As people
use Google to search the Internet, Google records every search and then
uses these data to sort and prioritize the results for future Internet
searches. One recent estimate indicates that Google processes 20,000
terabytes of information per day. Big-box retailers like Target, Walmart,
Kroger, and others scan every purchase and use the data to manage the
distribution of products, to make decisions about marketing and sales,
and to track daily and even hourly sales. Police departments collect and
use data to provide city residents with maps that communicate informa-
tion about crimes committed and their location. Every organization is col-
lecting and using data to develop knowledge and intelligence that will
help people make informed decisions, and to track the implementation of
their decisions. The graphic to the left shows the amount of data gener-
ated every minute (www.domo.com). A good working knowledge of sta-
tistics is useful for summarizing and organizing data to provide information
that is useful and supportive of decision making. Statistics is used to make
valid comparisons and to predict the outcomes of decisions.

In summary, there are at least three reasons for studying statistics: (1) data are
collected everywhere and require statistical knowledge to make the information useful,
(2) statistical techniques are used to make professional and personal decisions, and
(3) no matter what your career, you will need a knowledge of statistics to understand the
world and to be conversant in your career. An understanding of statistics and statistical
methods will help you make more effective personal and professional decisions.

LO1-1
Explain why knowledge
of statistics is important.

WHAT IS MEANT BY STATISTICS?
This question can be rephrased in two, subtly different ways: what are statistics and
what is statistics? To answer the first question, a statistic is a number used to communi-
cate a piece of information. Examples of statistics are:

• The inflation rate is 2%.
• Your grade point average is 3.5.
• The price of a new Tesla Model S sedan is $79,570.

Each of these statistics is a numerical fact and communicates a very limited piece of in-
formation that is not very useful by itself. However, if we recognize that each of these
statistics is part of a larger discussion, then the question “what is statistics” is applicable.
Statistics is the set of knowledge and skills used to organize, summarize, and analyze
data. The results of statistical analysis will start interesting conversations in the search
for knowledge and intelligence that will help us make decisions. For example:

• The inflation rate for the calendar year was 0.7%. By applying statistics we could
compare this year’s inflation rate to the past observations of inflation. Is it higher,
lower, or about the same? Is there a trend of increasing or decreasing inflation? Is
there a relationship between interest rates and government bonds?

• Your grade point average (GPA) is 3.5. By collecting data and applying statistics,
you can determine the required GPA to be admitted to the Master of Business
Administration program at the University of Chicago, Harvard, or the University of
Michigan. You can determine the likelihood that you would be admitted to a partic-
ular program. You may be interested in interviewing for a management position
with Procter & Gamble. What GPA does Procter & Gamble require for college grad-
uates with a bachelor’s degree? Is there a range of acceptable GPAs?

• You are budgeting for a new car. You would like to own an electric car with a small
carbon footprint. The price for the Tesla Model S Sedan is $79,570. By collecting
additional data and applying statistics, you can analyze the alternatives. For exam-
ple, another choice is a hybrid car that runs on both gas and electricity such as a
2015 Toyota Prius. It can be purchased for about $28,659. Another hybrid, the
Chevrolet Volt, costs $33,995. What are the differences in the cars’ specifications?
What additional information can be collected and summarized so that you can
make a good purchase decision?

Another example of using statistics to provide information to evaluate decisions is the
distribution and market share of Frito-Lay products. Data are collected on each of the
Frito-Lay product lines. These data include the market share and the pounds of product
sold. Statistics is used to present this information in a bar chart in Chart 1–1. It clearly
shows Frito-Lay’s dominance in the potato, corn, and tortilla chip markets. It also shows
the absolute measure of pounds of each product line consumed in the United States.

These examples show that statistics is more than the presentation of numerical in-
formation. Statistics is about collecting and processing information to create a conversa-
tion, to stimulate additional questions, and to provide a basis for making decisions.
Specifically, we define statistics as:

STATISTICS The science of collecting, organizing, presenting, analyzing, and
interpreting data to assist in making more effective decisions.

LO1-2
Define statistics and
provide an example of
how statistics is applied.

STATISTICS IN ACTION

A feature of our textbook is called Statistics in Action. Read each one carefully to get an
appreciation of the wide application of statistics in management, economics, nursing, law
enforcement, sports, and other disciplines.

• In 2015, Forbes published a list of the richest Americans. William Gates, founder of
Microsoft Corporation, is the richest. His net worth is estimated at $76.0 billion.
(www.forbes.com)

• In 2015, the four largest privately owned American companies, ranked by revenue, were
Cargill, Koch Industries, Dell, and Albertsons. (www.forbes.com)

• In the United States, a typical high school graduate earns $668 per week, a typical
college graduate with a bachelor’s degree earns $1,101 per week, and a typical college
graduate with a master’s degree earns $1,326 per week. (www.bls.gov/emp/ep_chart_001.htm)


In this book, you will learn the basic techniques and applications of statistics that
you can use to support your decisions, both personal and professional. To start, we will
differentiate between descriptive and inferential statistics.

TYPES OF STATISTICS
When we use statistics to generate information for decision making from data, we use
either descriptive statistics or inferential statistics. Their application depends on the
questions asked and the type of data available.

Descriptive Statistics
Masses of unorganized data—such as the census of population, the weekly earnings of
thousands of computer programmers, and the individual responses of 2,000 registered
voters regarding their choice for president of the United States—are of little value as is.
However, descriptive statistics can be used to organize data into a meaningful form. We
define descriptive statistics as:

DESCRIPTIVE STATISTICS Methods of organizing, summarizing, and presenting
data in an informative way.

LO1-3
Differentiate between
descriptive and inferential
statistics.

The following are examples that apply descriptive statistics to summarize a large
amount of data and provide information that is easy to understand.

• There are a total of 46,837 miles of interstate highways in the United States. The
interstate system represents only 1% of the nation’s total roads but carries more
than 20% of the traffic. The longest is I-90, which stretches from Boston to Seattle,
a distance of 3,099 miles. The shortest is I-878 in New York City, which is 0.70 mile
in length. Alaska does not have any interstate highways, Texas has the most inter-
state miles at 3,232, and New York has the most interstate routes with 28.

• The average person spent $133.91 on traditional Valentine’s Day merchandise in
2014. This is an increase of $2.94 from 2013. As in previous years, men spent
more than twice the amount women spent on the holiday. The average man spent
$108.38 to impress the people in his life while women only spent $48.41.

Statistical methods and techniques to generate descriptive statistics are presented
in Chapters 2 and 4. These include organizing and summarizing data with frequency
distributions and presenting frequency distributions with charts and graphs. In addition,
statistical measures to summarize the characteristics of a distribution are discussed in
Chapter 3.

[Bar chart: millions of pounds sold (horizontal axis, 0 to 800) for Frito-Lay and the
rest of the industry, by snack category. Frito-Lay share labels, in category order:
Potato Chips 64%, Tortilla Chips 75%, Pretzels 26%, Extruded Snacks 56%, Corn Chips 82%.]

CHART 1–1 Frito-Lay Volume and Share of Major Snack Chip Categories in U.S. Supermarkets


Inferential Statistics
Sometimes we must make decisions based on a limited set of data. For example, we
would like to know the operating characteristics, such as fuel efficiency measured by
miles per gallon, of sport utility vehicles (SUVs) currently in use. If we spent a lot of time,
money, and effort, all the owners of SUVs could be surveyed. In this case, our goal
would be to survey the population of SUV owners.

POPULATION The entire set of individuals or objects of interest or the
measurements obtained from all individuals or objects of interest.

However, based on inferential statistics, we can survey a limited number of SUV owners
and collect a sample from the population.

SAMPLE A portion, or part, of the population of interest.

Samples often are used to obtain reliable estimates of population parameters. (Sampling
is discussed in Chapter 8.) In the process, we make trade-offs between the time,
money, and effort to collect the data and the error of estimating a population parameter.
The process of sampling SUVs is illustrated in the following graphic. In this example, we
would like to know the mean or average SUV fuel efficiency. To estimate the mean of the
population, six SUVs are sampled and the mean of their MPG is calculated.

[Graphic: a population (all items) and a sample (items selected from the population)]

So, the sample of six SUVs represents evidence from the population that we use to
reach an inference or conclusion about the average MPG for all SUVs. The process of
sampling from a population with the objective of estimating properties of a population is
called inferential statistics.

INFERENTIAL STATISTICS The methods used to estimate a property of a
population on the basis of a sample.
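
The sampling process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text supplies no SUV data, so the population of MPG values is invented, and the names `population_mpg` and `sample_mpg` are hypothetical. Only the sample size of six comes from the example.

```python
import random
from statistics import mean

# Hypothetical population: MPG ratings for 1,000 SUVs (made-up values, not real data).
random.seed(1)  # fixed seed so the sketch is repeatable
population_mpg = [round(random.uniform(15.0, 30.0), 1) for _ in range(1000)]

# Inferential step: survey only six SUVs instead of all 1,000.
sample_mpg = random.sample(population_mpg, k=6)

sample_mean = mean(sample_mpg)          # statistic computed from the sample
population_mean = mean(population_mpg)  # parameter we normally could not observe

print(f"Sample mean MPG (n=6): {sample_mean:.2f}")
print(f"Population mean MPG:   {population_mean:.2f}")
print(f"Estimation error:      {sample_mean - population_mean:+.2f}")
```

In practice the population mean is unknown, which is why the sample mean serves as the estimate. The last line is printed here only to show the trade-off between sampling effort and estimation error.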

STATISTICS IN ACTION

Where did statistics get its start? In 1662 John Graunt published an article called
“Natural and Political Observations Made upon Bills of Mortality.” The author’s
“observations” were the result of a study and analysis of a weekly church publication
called “Bill of Mortality,” which listed births, christenings, and deaths and their causes.
Graunt realized that the Bills of Mortality represented only a fraction of all births and
deaths in London. However, he used the data to reach broad conclusions or inferences about
the impact of disease, such as the plague, on the general population. His logic is an
example of statistical inference. His analysis and interpretation of the data are thought
to mark the start of statistics.


Inferential statistics is widely applied to learn something about a population in busi-
ness, agriculture, politics, and government, as shown in the following examples:

• Television networks constantly monitor the popularity of their programs by hiring
Nielsen and other organizations to sample the preferences of TV viewers. For example,
9.0% of a sample of households with TVs watched The Big Bang Theory during the
week of November 2, 2015 (www.nielsen.com). These program ratings are used to
make decisions about advertising rates and whether to continue or cancel a program.

• In 2015, a sample of U.S. Internal Revenue Service tax preparation volunteers were
tested with three standard tax returns. The sample indicated that tax returns were
completed with a 49% accuracy rate. In other words there were errors on about half
of the returns. In this example, the statistics are used to make decisions about how
to improve the accuracy rate by correcting the most common errors and improving
the training of volunteers.

A feature of our text is self-review problems. There are a number of them inter-
spersed throughout each chapter. The first self-review follows. Each self-review tests
your comprehension of preceding material. The answer and method of solution are
given in Appendix E. You can find the answer to the following self-review under 1–1 in
Appendix E. We recommend that you solve each one and then check your answer.

SELF-REVIEW 1–1

The answers are in Appendix E.

The Atlanta-based advertising firm Brandon and Associates asked a sample of 1,960
consumers to try a newly developed chicken dinner by Boston Market. Of the 1,960 sampled,
1,176 said they would purchase the dinner if it is marketed.
(a) Is this an example of descriptive statistics or inferential statistics? Explain.
(b) What could Brandon and Associates report to Boston Market regarding acceptance of
the chicken dinner in the population?

TYPES OF VARIABLES
There are two basic types of variables: (1) qualitative and (2) quantitative (see Chart 1–2).
When an object or individual is observed and recorded as a nonnumeric characteristic, it
is a qualitative variable or an attribute. Examples of qualitative variables are gender,
beverage preference, type of vehicle owned, state of birth, and eye color. When a variable
is qualitative, we usually count the number of observations for each category and determine
what percent are in each category. For example, if we observe the variable eye color,
what percent of the population has blue eyes and what percent has brown eyes? If the
variable is type of vehicle, what percent of the total number of cars sold last month were
SUVs? Qualitative variables are often summarized in charts and bar graphs (Chapter 2).

LO1-4
Classify variables as
qualitative or quantitative,
and discrete or continuous.

Types of Variables

Qualitative
• Brand of PC
• Marital status
• Hair color

Quantitative, Discrete
• Children in a family
• Strokes on a golf hole
• TV sets owned

Quantitative, Continuous
• Amount of income tax paid
• Weight of a student
• Yearly rainfall in Tampa, FL

CHART 1–2 Summary of the Types of Variables

When a variable can be reported numerically, it is called a quantitative variable.
Examples of quantitative variables are the balance in your checking account, the num-
ber of gigabytes of data used on your cell phone plan last month, the life of a car battery
(such as 42 months), and the number of people employed by a company.

Quantitative variables are either discrete or continuous. Discrete variables can as-
sume only certain values, and there are “gaps” between the values. Examples of dis-
crete variables are the number of bedrooms in a house (1, 2, 3, 4, etc.), the number of
cars arriving at Exit 25 on I-4 in Florida near Walt Disney World in an hour (326, 421,
etc.), and the number of students in each section of a statistics course (25 in section A,
42 in section B, and 18 in section C). We count, for example, the number of cars arriving
at Exit 25 on I-4, and we count the number of statistics students in each section. Notice
that a home can have 3 or 4 bedrooms, but it cannot have 3.56 bedrooms. Thus, there
is a “gap” between possible values. Typically, discrete variables are counted.

Observations of a continuous variable can assume any value within a specific range.
Examples of continuous variables are the air pressure in a tire and the weight of a shipment
of tomatoes. Other examples are the ounces of raisins in a box of raisin bran cereal and the
duration of flights from Orlando to San Diego. Grade point average (GPA) is a continuous
variable. We could report the GPA of a particular student as 3.2576952. The usual practice
is to round to 3 places—3.258. Typically, continuous variables result from measuring.

LEVELS OF MEASUREMENT
Data can be classified according to levels of measurement. The level of measurement
determines how data should be summarized and presented. It also will indicate the type
of statistical analysis that can be performed. Here are two examples of the relationship
between measurement and how we apply statistics. There are six colors of candies in a
bag of M&Ms. Suppose we assign brown a value of 1, yellow 2, blue 3, orange 4, green 5,
and red 6. What kind of variable is the color of an M&M? It is a qualitative variable.
Suppose someone summarizes M&M color by adding the assigned color values, divides the sum
by the number of M&Ms, and reports that the mean color is 3.56. How do we interpret this
statistic? You are correct in concluding that it has no meaning as a measure of M&M
color. Because color is a qualitative variable, we can only report the count and
percentage of each color in a bag of M&Ms. As a second example, in a high school track
meet there are eight competitors in the 400-meter run. We report the order of finish and
that the mean finish is 4.5. What does the mean finish tell us? Nothing! In both of these
instances, we have not used the appropriate statistics for the level of measurement.
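
A short Python sketch shows why the mean finish carries no information. It assumes only what the example states, eight runners finishing in places 1 through 8; the runner labels are hypothetical. However the places are assigned, the mean is always 4.5.

```python
import random
from statistics import mean

# Order of finish for eight runners: the places 1 through 8, each used once.
finish_places = list(range(1, 9))
print(mean(finish_places))  # 4.5

# Shuffle who finished where (runner labels are hypothetical).
runners = ["A", "B", "C", "D", "E", "F", "G", "H"]
random.shuffle(finish_places)
results = dict(zip(runners, finish_places))

# The mean place is unchanged, so it tells us nothing about the race.
print(mean(list(results.values())))  # 4.5
```

Because the mean of the places 1 through n is always (n + 1)/2, the statistic describes the size of the field, not the performance of any runner.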

There are four levels of measurement: nominal, ordinal, interval, and ratio. The low-
est, or the most primitive, measurement is the nominal level. The highest is the ratio
level of measurement.

Nominal-Level Data
For the nominal level of measurement, observations of a qualitative variable are mea-
sured and recorded as labels or names. The labels or names can only be classified and
counted. There is no particular order to the labels.

LO1-5
Distinguish between
nominal, ordinal, interval,
and ratio levels of
measurement.


NOMINAL LEVEL OF MEASUREMENT Data recorded at the nominal level of
measurement is represented as labels or names. They have no order. They can
only be classified and counted.


The classification of the six colors of M&M milk chocolate candies is an example of
the nominal level of measurement. We simply classify the candies by color. There is no
natural order. That is, we could report the brown candies first, the orange first, or any of
the other colors first. Recording the variable gender is another example of the nominal
level of measurement. Suppose we count the number of students entering a football
game with a student ID and report how many are men and how many are women. We
could report either the men or the women first. For the data measured at the nominal
level, we are limited to counting the number in each category of the variable. Often, we
convert these counts to percentages. For example, a random sample of M&M candies
reports the following percentages for each color:

Color Percent in a bag

Blue 24%
Green 20%
Orange 16%
Yellow 14%
Red 13%
Brown 13%

To process the data for a variable measured at the nominal level, we often numer-
ically code the labels or names. For example, if we are interested in measuring the
home state for students at East Carolina University, we would assign a student’s home
state of Alabama a code of 1, Alaska a code of 2, Arizona a 3, and so on. Using this
procedure with an alphabetical listing of states, Wisconsin is coded 49 and Wyoming
50. Realize that the number assigned to each state is still a label or name. The reason
we assign numerical codes is to facilitate counting the number of students from each
state with statistical software. Note that assigning numbers to the states does not give
us license to manipulate the codes as numerical information. Specifically, in this exam-
ple, 1 + 2 = 3 corresponds to Alabama + Alaska = Arizona. Clearly, the nominal level
of measurement does not permit any mathematical operation that has any valid
interpretation.
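
The coding procedure can be sketched in Python as follows. The alphabetical list of states reproduces the codes given in the text (Alabama 1, Wisconsin 49, Wyoming 50). The student data are made up for illustration, and names such as `state_code` and `student_codes` are hypothetical.

```python
from collections import Counter

# The 50 states in alphabetical order; the position in the list supplies the code.
STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Colorado",
    "Connecticut", "Delaware", "Florida", "Georgia", "Hawaii", "Idaho",
    "Illinois", "Indiana", "Iowa", "Kansas", "Kentucky", "Louisiana", "Maine",
    "Maryland", "Massachusetts", "Michigan", "Minnesota", "Mississippi",
    "Missouri", "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey",
    "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont", "Virginia",
    "Washington", "West Virginia", "Wisconsin", "Wyoming",
]
state_code = {name: code for code, name in enumerate(STATES, start=1)}
code_to_state = {code: name for name, code in state_code.items()}

print(state_code["Alabama"], state_code["Wisconsin"], state_code["Wyoming"])  # 1 49 50

# Hypothetical home-state codes for eight students (made-up data).
student_codes = [33, 33, 46, 33, 40, 46, 33, 10]

# Valid at the nominal level: count the students in each category.
counts = Counter(code_to_state[code] for code in student_codes)
print(counts.most_common())

# Not valid: arithmetic on the codes. 1 + 2 = 3 would claim
# "Alabama + Alaska = Arizona", which has no interpretation.
```

The codes exist only so that software can tally categories; the counts, not the code values, are the summary.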

Ordinal-Level Data
The next higher level of measurement is the ordinal level. For this level of measure-
ment a qualitative variable or attribute is either ranked or rated on a relative scale.

ORDINAL LEVEL OF MEASUREMENT Data recorded at the ordinal level of
measurement is based on a relative ranking or rating of items based on a defined
attribute or qualitative variable. Variables based on this level of measurement are
only ranked or counted.

For example, many businesses make decisions about where to locate their facil-
ities; in other words, where is the best place for their business? Business Facilities
(www.businessfacilities.com) publishes a list of the top 10 states for the “best
business climate.” The 2016 rankings are listed below. They are based on the
evaluation of many different factors, including the cost of labor, business tax climate,
quality of life, transportation infrastructure, educated workforce, and economic
growth potential.

Best Business Climate

 1. Florida
 2. Utah
 3. Texas
 4. Georgia
 5. Indiana
 6. Tennessee
 7. Nebraska
 8. North Carolina
 9. Virginia
10. Washington

This is an example of an ordinal scale because the states are ranked in order of
best to worst business climate. That is, we know the relative order of the states based
on the attribute. For example, in 2016 Florida had the best business climate and Utah
was second. Indiana was fifth, and that was better than Tennessee but not as good as
Georgia. Notice we cannot say that Florida’s business climate is five times better than
Indiana’s business climate because the magnitude of the differences between the
states is not known. To put it another way, we do not know if the magnitude of the
difference between Florida and Utah is the same as between Texas and Georgia.

Another example of the ordinal level measure is based on a scale that measures an
attribute. This type of scale is used when students rate instructors on a variety of attri-
butes. One attribute may be: “Overall, how do you rate the quality of instruction in this
class?” A student’s response is recorded on a relative scale of inferior, poor, average,
good, and superior. An important characteristic of using a relative measurement scale
is that we cannot distinguish the magnitude of the differences between groups. We do
not know if the difference between “Superior” and “Good” is the same as the difference
between “Poor” and “Inferior.”

Table 1–1 lists the frequencies of 60 student ratings of instructional quality for Pro-
fessor James Brunner in an Introduction to Finance course. The data are summarized
based on the order of the scale used to rate the instructor. That is, they are summarized
by the number of students who indicated a rating of superior (6), good (26), and so on.
We also can convert the frequencies to percentages. About 43.3% (26/60) of the stu-
dents rated the instructor as good.

TABLE 1–1 Rating of a Finance Professor

Rating Frequency Percentage

Superior 6 10.0%
Good 26 43.3%
Average 16 26.7%
Poor 9 15.0%
Inferior 3 5.0%
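
The conversion from frequencies to percentages in Table 1–1 can be reproduced with a minimal Python sketch. The frequencies are taken from the table; the variable names are illustrative only.

```python
# Frequencies from Table 1–1, listed in the order of the rating scale.
ratings = {"Superior": 6, "Good": 26, "Average": 16, "Poor": 9, "Inferior": 3}

total = sum(ratings.values())  # 60 students in all

# Percentage for each rating, rounded to one decimal place as in the table.
percentages = {label: round(100 * count / total, 1) for label, count in ratings.items()}

for label, count in ratings.items():
    print(f"{label:<9}{count:>4}{percentages[label]:>7.1f}%")
```

Note that the sketch only counts and ranks the ratings. It never averages them, which is consistent with the ordinal level of measurement.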

Interval-Level Data
The interval level of measurement is the next highest level. It includes all the character-
istics of the ordinal level, but, in addition, the difference or interval between values is
meaningful.

INTERVAL LEVEL OF MEASUREMENT For data recorded at the interval level of
measurement, the interval or the distance between values is meaningful. The interval
level of measurement is based on a scale with a known unit of measurement.

The Fahrenheit temperature scale is an example of the interval level of measurement.
Suppose the high temperatures on three consecutive winter days in Boston are 28, 31,
and 20 degrees Fahrenheit. These temperatures can be easily ranked, but we can also
determine the interval or distance between temperatures. This is possible because 1 de-
gree Fahrenheit represents a constant unit of measurement. That is, the distance between
10 and 15 degrees Fahrenheit is 5 degrees, and is the same as the 5-degree distance
between 50 and 55 degrees Fahrenheit. It is also important to note that 0 is just a point
on the scale. It does not represent the absence of the condition. The measurement of
zero degrees Fahrenheit does not represent the absence of heat or cold. But by our own
measurement scale, it is cold! A major limitation of a variable measured at the interval
level is that we cannot make statements similar to 20 degrees Fahrenheit is twice as
warm as 10 degrees Fahrenheit.
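
One way to see why differences are meaningful on an interval scale while ratios are not is to re-express the same temperatures on another interval scale. The sketch below uses the standard Fahrenheit-to-Celsius conversion, which the text does not discuss and which is brought in here only as an illustration; the function name is hypothetical.

```python
def fahrenheit_to_celsius(deg_f: float) -> float:
    """Standard conversion between the two temperature scales."""
    return (deg_f - 32) * 5 / 9

# Differences are meaningful: both gaps are 5 degrees Fahrenheit ...
gap_low = 15 - 10
gap_high = 55 - 50
print(gap_low == gap_high)  # True

# ... and the two gaps remain equal to each other after converting to Celsius.
gap_low_c = fahrenheit_to_celsius(15) - fahrenheit_to_celsius(10)
gap_high_c = fahrenheit_to_celsius(55) - fahrenheit_to_celsius(50)
print(round(gap_low_c, 2), round(gap_high_c, 2))

# Ratios are not meaningful: "20 is twice 10" holds only for the Fahrenheit numbers.
ratio_f = 20 / 10
ratio_c = fahrenheit_to_celsius(20) / fahrenheit_to_celsius(10)
print(ratio_f, round(ratio_c, 2))
```

The ratio changes with the choice of scale because zero is an arbitrary point on both scales, so "twice as warm" describes the numbers rather than the temperatures.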


Another example of the interval scale of measurement is women’s dress sizes.
Listed below is information on several dimensions of a standard U.S. woman’s dress.

Size Bust (in) Waist (in) Hips (in)

8 32 24 35
10 34 26 37
12 36 28 39
14 38 30 41
16 40 32 43
18 42 34 45
20 44 36 47
22 46 38 49
24 48 40 51
26 50 42 53
28 52 44 55

Why is the “size” scale an interval measurement? Observe that as the size changes
by two units (say from size 10 to size 12 or from size 24 to size 26), each of the mea-
surements increases by 2 inches. To put it another way, the intervals are the same.

There is no natural zero point for dress size. A “size 0” dress does not have “zero”
material. Instead, it would have a 24-inch bust, 16-inch waist, and 27-inch hips. More-
over, the ratios are not reasonable. If you divide a size 28 by a size 14, you do not get
the same answer as dividing a size 20 by a size 10. Neither ratio is equal to two, as the
“size” number would suggest. In short, if the distances between the numbers make
sense, but the ratios do not, then you have an interval scale of measurement.
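
The interval argument for dress sizes can be checked with a brief Python sketch. The bust measurements are copied from the table above; everything else (variable names, the choice of the bust column) is an illustrative assumption.

```python
# Bust measurements (inches) by dress size, taken from the table above.
bust = {8: 32, 10: 34, 12: 36, 14: 38, 16: 40, 18: 42,
        20: 44, 22: 46, 24: 48, 26: 50, 28: 52}

# Interval property: every 2-unit step in size adds the same 2 inches.
sizes = sorted(bust)
steps = [bust[b] - bust[a] for a, b in zip(sizes, sizes[1:])]
print(set(steps))  # {2}

# Ratio property fails: doubling the size number does not double the measurement,
# and the two "doublings" do not even agree with each other.
ratio_28_to_14 = bust[28] / bust[14]
ratio_20_to_10 = bust[20] / bust[10]
print(round(ratio_28_to_14, 3), round(ratio_20_to_10, 3))  # 1.368 1.294
```

The same check on the waist or hips column gives equal steps and unequal ratios as well.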

Ratio-Level Data
Almost all quantitative variables are recorded on the ratio level of measurement. The ratio
level is the “highest” level of measurement. It has all the characteristics of the interval level,
but, in addition, the 0 point and the ratio between two numbers are both meaningful.

RATIO LEVEL OF MEASUREMENT Data recorded at the ratio level of measurement
are based on a scale with a known unit of measurement and a meaningful
interpretation of zero on the scale.

Examples of the ratio scale of measurement include wages, units of production,
weight, changes in stock prices, distance between branch offices, and height. Money is
also a good illustration. If you have zero dollars, then you have no money, and a wage
of $50 per hour is two times the wage of $25 per hour. Weight also is measured at the
ratio level of measurement. If a scale is correctly calibrated, then it will read 0 when
nothing is on the scale. Further, something that weighs 1 pound is half as heavy as
something that weighs 2 pounds.

Table 1–2 illustrates the ratio scale of measurement for the variable, annual income
for four father-and-son combinations. Observe that the senior Lahey earns twice as
much as his son. In the Rho family, the son makes twice as much as the father.

Name Father Son

Lahey $80,000 $ 40,000
Nale 90,000 30,000
Rho 60,000 120,000
Steele 75,000 130,000

TABLE 1–2 Father–Son Income Combinations
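
As a quick check of the ratios in Table 1–2, the sketch below divides each father's income by his son's. The second calculation, which restates incomes in thousands of dollars, is our own addition to show that a ratio-level comparison does not depend on the unit chosen.

```python
# Annual incomes (dollars) from Table 1-2, stored as (father, son).
incomes = {
    "Lahey": (80_000, 40_000),
    "Nale": (90_000, 30_000),
    "Rho": (60_000, 120_000),
    "Steele": (75_000, 130_000),
}

# Father's income as a multiple of the son's income.
father_to_son = {name: father / son for name, (father, son) in incomes.items()}

# The same ratios with incomes restated in thousands of dollars.
father_to_son_thousands = {
    name: (father / 1000) / (son / 1000) for name, (father, son) in incomes.items()
}

for name, ratio in father_to_son.items():
    print(f"{name:<8}{ratio:.2f}")
```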


Chart 1–3 summarizes the major characteristics of the various levels of measure-
ment. The level of measurement will determine the type of statistical methods that can
be used to analyze a variable. Statistical methods to analyze variables measured on a
nominal level are discussed in Chapter 15; methods for ordinal-level variables are dis-
cussed in Chapter 16. Statistical methods to analyze variables measured on an interval
or ratio level are presented in Chapters 9 through 14.

Levels of Measurement

Nominal: Data may only be classified.
• Jersey numbers of football players
• Make of car

Ordinal: Data are ranked.
• Your rank in class
• Team standings in the Southeastern Conference

Interval: Meaningful difference between values.
• Temperature
• Dress size

Ratio: Meaningful 0 point and ratio between values.
• Number of patients seen
• Number of sales calls made
• Distance to class

CHART 1–3 Summary and Examples of the Characteristics for Levels of Measurement

SELF-REVIEW 1–2

(a) The mean age of people who listen to talk radio is 42.1 years. What level of measurement
is used to assess the variable age?

(b) In a survey of luxury-car owners, 8% of the U.S. population owned luxury cars. In
California and Georgia, 14% of people owned luxury cars. Two variables are included
in this information. What are they and how are they measured?

EXERCISES

The answers to the odd-numbered exercises are in Appendix D.

1. What is the level of measurement for each of the following variables?
a. Student IQ ratings.
b. Distance students travel to class.
c. The jersey numbers of a sorority soccer team.
d. A student’s state of birth.
e. A student’s academic class (that is, freshman, sophomore, junior, or senior).
f. Number of hours students study per week.

2. Slate is a daily magazine on the Web. Its business activities can be described by a
number of variables. What is the level of measurement for each of the following
variables?
a. The number of hits on their website on Saturday between 8:00 am and 9:00 am.
b. The departments, such as food and drink, politics, foreign policy, sports, etc.
c. The number of weekly hits on the Sam’s Club ad.
d. The number of years each employee has been employed with Slate.

3. On the Web, go to your favorite news source and find examples of each type of
variable. Write a brief memo that lists the variables and describes them in terms of
qualitative or quantitative, discrete or continuous, and the measurement level.


ETHICS AND STATISTICS
Following events such as Wall Street money manager Bernie Madoff’s Ponzi scheme,
which swindled billions from investors, and financial misrepresentations by Enron and
Tyco, business students need to understand that these events were based on the mis-
representation of business and financial information. In each case, people within each
organization reported financial information to investors that indicated the companies
were performing much better than the actual situation. When the true financial informa-
tion was reported, the companies were worth much less than advertised. The result was
many investors lost all or nearly all of the money they had invested.

The article “Statistics and Ethics: Some Advice for Young Statisticians,” in The American
Statistician 57, no. 1 (2003), offers guidance. The authors advise us to practice statistics
with integrity and honesty, and urge us to “do the right thing” when collecting, organizing,
summarizing, analyzing, and interpreting numerical information. The real contribution of
statistics to society is a moral one. Financial analysts need to provide information that truly
reflects a company’s performance so as not to mislead individual investors. Information
regarding product defects that may be harmful to people must be analyzed and reported
with integrity and honesty. The authors of The American Statistician article further indicate
that when we practice statistics, we need to maintain “an independent and principled
point-of-view” when analyzing and reporting findings and results.

As you progress through this text, we will highlight ethical issues in the collection,
analysis, presentation, and interpretation of statistical information. We also hope that, as
you learn about using statistics, you will become a more informed consumer of informa-
tion. For example, you will question a report based on data that do not fairly represent
the population, a report that does not include all relevant statistics, one that includes an
incorrect choice of statistical measures, or a presentation that introduces bias in a delib-
erate attempt to mislead or misrepresent.

BASIC BUSINESS ANALYTICS
A knowledge of statistics is necessary to support the increasing need for companies
and organizations to apply business analytics. Business analytics is used to process and
analyze data and information to support a story or narrative of a company’s business,
such as “What makes us profitable?” or “How will our customers respond to a change in
marketing?” In addition to statistics, an ability to use computer software to summarize,
organize, analyze, and present the findings of statistical analysis is essential. In this text,
we will be using very elementary applications of business analytics using common and
available computer software. Throughout our text, we will use Microsoft Excel and, oc-
casionally, Minitab. Universities and colleges usually offer access to Microsoft Excel.
Your computer already may be packaged with Microsoft Excel. If not, the Microsoft
Office package with Excel often is sold at a reduced academic price through your uni-
versity or college. In this text, we use Excel for the majority of the applications. We also
use an Excel “Add-in” called MegaStat. If your instructor requires this package, it is avail-
able at www.mhhe.com/megastat. This add-in gives Excel the capability to produce
additional statistical reports. Occasionally, we use Minitab to illustrate an application.
See www.minitab.com for further information. Minitab also offers discounted academic
pricing. The 2016 version of Microsoft Excel supports the analyses in our text. However,
earlier versions of Excel for Apple Mac computers do not have the necessary add-in. If
you do not have Excel 2016 and are using an Apple Mac computer with Excel, you can
download the free trial version of Stat Plus at www.analystsoft.com. It is a statistical
software package that will integrate with Excel for Mac computers.

LO1-6
List the values associated
with the practice of
statistics.

4. For each of the following, determine whether the group is a sample or a population.
a. The participants in a study of a new cholesterol drug.
b. The drivers who received a speeding ticket in Kansas City last month.
c. People on welfare in Cook County (Chicago), Illinois.
d. The 30 stocks that make up the Dow Jones Industrial Average.

The following example shows the application of Excel to perform a statistical summary.
It refers to sales information from the Applewood Auto Group, a multi-location car sales and
service company. The Applewood information has sales information for 180 vehicle sales.
Each sale is described by several variables: the age of the buyer, whether the buyer is a re-
peat customer, the location of the dealership for the sale, the type of vehicle sold, and the
profit for the sale. The following shows Excel’s summary of statistics for the variable profit.
The summary of profit shows the mean profit per vehicle was $1,843.17, the median profit
was slightly more at $1,882.50, and profit ranged from $294 to $3,292.
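
The same kind of summary can be produced outside Excel. The sketch below uses Python's standard `statistics` module rather than the Excel tools described in this text, and the eight profit values are hypothetical placeholders, not the Applewood data, so the results will not match the figures quoted above.

```python
import statistics

# HYPOTHETICAL profits (dollars) for eight sales. These are placeholders
# for illustration only; the Applewood data set contains 180 sales.
sample_profits = [1200, 1850, 2400, 950, 1600, 3100, 2050, 1750]

summary = {
    "count": len(sample_profits),
    "mean": statistics.mean(sample_profits),
    "median": statistics.median(sample_profits),
    "minimum": min(sample_profits),
    "maximum": max(sample_profits),
    "range": max(sample_profits) - min(sample_profits),
}

for measure, value in summary.items():
    print(f"{measure:<8}{value:>10,.2f}")
```

Replacing `sample_profits` with the full profit column would give the summary for all 180 sales.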

Throughout the text, we will motivate the use of computer software to summarize,
describe, and present information and data. The applications of Excel are supported by
instructions so that you can learn how to apply Excel to do statistical analysis. The in-
structions are presented in Appendix C of this text. These data and other data sets and
files are available on the text’s student website, www.mhhe.com/lind17e.

CHAPTER SUMMARY

I. Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting
data to assist in making more effective decisions.

II. There are two types of statistics.
   A. Descriptive statistics are procedures used to organize and summarize data.
   B. Inferential statistics involve taking a sample from a population and making estimates
      about a population based on the sample results.
      1. A population is an entire set of individuals or objects of interest or the measurements
         obtained from all individuals or objects of interest.
      2. A sample is a part of the population.

III. There are two types of variables.
   A. A qualitative variable is nonnumeric.
      1. Usually we are interested in the number or percent of the observations in each
         category.
      2. Qualitative data are usually summarized in graphs and bar charts.
   B. There are two types of quantitative variables and they are usually reported
      numerically.
      1. Discrete variables can assume only certain values, and there are usually gaps between
         values.
      2. A continuous variable can assume any value within a specified range.

IV. There are four levels of measurement.
   A. With the nominal level, the data are sorted into categories with no particular order to
      the categories.
   B. The ordinal level of measurement presumes that one classification is ranked higher
      than another.
   C. The interval level of measurement has the ranking characteristic of the ordinal level of
      measurement plus the characteristic that the distance between values is a constant
      size.
   D. The ratio level of measurement has all the characteristics of the interval level, plus
      there is a 0 point and the ratio of two values is meaningful.

CHAPTER EXERCISES

5. Explain the difference between qualitative and quantitative variables. Give an example
of qualitative and quantitative variables.

6. Explain the difference between a sample and a population.

7. Explain the difference between a discrete and a continuous variable. Give an example
of each not included in the text.

8. For the following situations, would you collect information using a sample or a population?
Why?
a. Statistics 201 is a course taught at a university. Professor Rauch has taught nearly
1,500 students in the course over the past 5 years. You would like to know the average
grade for the course.

b. As part of a research project, you need to report the average profit as a percentage
of revenue for the #1-ranked corporation in the Fortune 500 for each of the
last 10 years.

c. You are looking forward to graduation and your first job as a salesperson for one of
five large pharmaceutical corporations. Planning for your interviews, you will need to
know about each company’s mission, profitability, products, and markets.

d. You are shopping for a new MP3 music player such as the Apple iPod. The manu-
facturers advertise the number of music tracks that can be stored in the memory.
Usually, the advertisers assume relatively short, popular songs to estimate the
number of tracks that can be stored. You, however, like Broadway musical tunes
and they are much longer. You would like to estimate how many Broadway tunes
will fit on your MP3 player.

9. Exits along interstate highways were formerly numbered successively from the western
or southern border of a state. However, the Department of Transportation has recently
changed most of them to agree with the numbers on the mile markers along the
highway.
a. What level of measurement were data on the consecutive exit numbers?
b. What level of measurement are data on the milepost numbers?
c. Discuss the advantages of the newer system.

10. A poll solicits a large number of college undergraduates for information on the following
variables: the name of their cell phone provider (AT&T, Verizon, and so on), the numbers
of minutes used last month (200, 400, for example), and their satisfaction with the ser-
vice (Terrible, Adequate, Excellent, and so forth). What is the level of measurement for
each of these three variables?

11. Best Buy sells Fitbit wearable technology products that track a person’s activity. For ex-
ample, the Fitbit technology collects daily information on a person’s number of steps so
that a person can track calories consumed. The information can be synced with a cell
phone and displayed with a Fitbit app. Assume you know the daily number of Fitbit Flex
2 units sold last month at the Best Buy store in Collegeville, Pennsylvania. Describe a
situation where the number of units sold is considered a sample. Illustrate a second sit-
uation where the number of units sold is considered a population.

12. Using the concepts of sample and population, describe how a presidential election is
unlike an “exit” poll of the electorate.

13. Place these variables in the following classification tables. For each table, summarize your
observations and evaluate if the results are generally true. For example, salary is reported
as a continuous quantitative variable. It is also a continuous ratio-scaled variable.
a. Salary
b. Gender
c. Sales volume of MP3 players
d. Soft drink preference
e. Temperature
f. SAT scores
g. Student rank in class
h. Rating of a finance professor
i. Number of home video screens

                Discrete Variable    Continuous Variable
Qualitative
Quantitative                         a. Salary

                Discrete             Continuous
Nominal
Ordinal
Interval
Ratio                                a. Salary

14. Using data from such publications as the Statistical Abstract of the United States,
Forbes, or any news source, give examples of variables measured with nominal, ordinal,
interval, and ratio scales.

15. The Struthers Wells Corporation employs more than 10,000 white-collar workers in its
sales offices and manufacturing facilities in the United States, Europe, and Asia. A sam-
ple of 300 U.S. workers revealed 120 would accept a transfer to a location outside the
United States. On the basis of these findings, write a brief memo to Ms. Wanda Carter,
Vice President of Human Services, regarding all white-collar workers in the firm and
their willingness to relocate.

16. AVX Home Entertainment, Inc., recently began a “no-hassles” return policy. A sample
of 500 customers who recently returned items showed 400 thought the policy was
fair, 32 thought it took too long to complete the transaction, and the rest had no opin-
ion. On the basis of this information, make an inference about customer reaction to
the new policy.

17. The Wall Street Journal’s website, www.wsj.com, reported the number of cars
and light-duty trucks sold through October of 2014 and October of 2015. The top sixteen
manufacturers are listed here. Sales data are often reported in this way to compare
current sales to last year’s sales.

Year-to-Date Sales

Manufacturer                  Through October 2015  Through October 2014
General Motors Corp.                     2,562,840             2,434,707
Ford Motor Company                       2,178,587             2,065,612
Toyota Motor Sales USA Inc.              2,071,446             1,975,368
Chrysler                                 1,814,268             1,687,313
American Honda Motor Co Inc.             1,320,217             1,281,777
Nissan North America Inc.                1,238,535             1,166,389
Hyundai Motor America                      638,195               607,539
Kia Motors America Inc.                    526,024               489,711
Subaru of America Inc.                     480,331               418,497
Volkswagen of America Inc.                 294,602               301,187
Mercedes-Benz                              301,915               281,728
BMW of North America Inc.                  279,395               267,193
Mazda Motor of America Inc.                267,158               259,751
Audi of America Inc.                       165,103               146,133
Mitsubishi Motors N A, Inc.                 80,683                64,564
Volvo                                       53,803                47,823

a. Using computer software, compare the October 2015 sales to the October 2014
sales for each manufacturer by computing the difference. Make a list of the manufacturers
that increased sales compared to 2014; make a list of manufacturers that decreased
sales.

b. Using computer software, compare 2014 sales to 2015 sales for each manufacturer
by computing the percentage change in sales. Make a list of the manufacturers in
order of increasing percentage changes. Which manufacturers are in the top five in
percentage change? Which manufacturers are in the bottom five in percentage
change?

c. Using computer software, first sort the data using the 2015 year-to-date sales.
Then, design a bar graph to illustrate the 2014 and 2015 year-to-date sales for the
top 12 manufacturers. Also, design a bar graph to illustrate the percentage change
for the top 12 manufacturers. Compare these two graphs and prepare brief written
comments.

18. The following chart depicts the average amounts spent by consumers on holiday gifts.
Write a brief report summarizing the amounts spent during the holidays. Be sure to include
the total amount spent and the percent spent by each group.

19. The following chart depicts the earnings in billions of dollars for ExxonMobil for the period
2003 until 2014. Write a brief report discussing the earnings at ExxonMobil during
the period. Was one year higher than the others? Did the earnings increase, decrease,
or stay the same over the period?

Year    Earnings ($ billions)
2003    14.5
2004    16.7
2005    24.3
2006    26.2
2007    26.5
2008    45.2
2009    19.3
2010    30.5
2011    41.1
2012    44.9
2013    32.6
2014    32.5

[Chart: ExxonMobil Annual Earnings. Vertical axis: Dollars (billions), 0 to 50; horizontal axis: Year, 2003 to 2014.]

DATA ANALYTICS

20. Refer to the North Valley Real Estate data, which report information on homes sold
in the area last year. Consider the following variables: selling price, number of bedrooms,
township, and mortgage type.
a. Which of the variables are qualitative and which are quantitative?
b. How is each variable measured? Determine the level of measurement for each of the
variables.

21. Refer to the Baseball 2016 data, which report information on the 30 Major League
Baseball teams for the 2016 season. Consider the following variables: number of wins,
payroll, season attendance, whether the team is in the American or National League,
and the number of home runs hit.
a. Which of these variables are quantitative and which are qualitative?
b. Determine the level of measurement for each of the variables.

22. Refer to the Lincolnville School District bus data, which report information on the
school district’s bus fleet.
a. Which of the variables are qualitative and which are quantitative?
b. Determine the level of measurement for each variable.

2 Describing Data:
FREQUENCY TABLES, FREQUENCY DISTRIBUTIONS,
AND GRAPHIC PRESENTATION

MERRILL LYNCH recently completed a study of online investment portfolios for a sample
of clients. For the 70 participants in the study, organize these data into a frequency
distribution. (See Exercise 43 and LO2-3.)

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO2-1 Summarize qualitative variables with frequency and relative frequency tables.

LO2-2 Display a frequency table using a bar or pie chart.

LO2-3 Summarize quantitative variables with frequency and relative frequency distributions.

LO2-4 Display a frequency distribution using a histogram or frequency polygon.

INTRODUCTION
The United States automobile retailing industry is highly competitive. It is dominated by
megadealerships that own and operate 50 or more franchises, employ over 10,000
people, and generate several billion dollars in annual sales. Many of the top dealerships
are publicly owned with shares traded on the New York Stock Exchange
or NASDAQ. In 2014, the largest megadealership was AutoNation (ticker
symbol AN), followed by Penske Auto Group (PAG), Group 1 Automotive,
Inc. (ticker symbol GPI), and the privately owned Van Tuyl Group.

These large corporations use statistics and analytics to summarize
and analyze data and information to support their decisions. As an example,
we will look at the Applewood Auto Group. It owns four dealerships
and sells a wide range of vehicles. These include the popular
Korean brands Kia and Hyundai, BMW and Volvo sedans and luxury
SUVs, and a full line of Ford and Chevrolet cars and trucks.

Ms. Kathryn Ball is a member of the senior management team at
Applewood Auto Group, which has its corporate offices adjacent to Kane
Motors. She is responsible for tracking and analyzing vehicle sales and
the profitability of those vehicles. Kathryn would like to summarize the profit earned on
the vehicles sold with tables, charts, and graphs that she would review monthly. She
wants to know the profit per vehicle sold, as well as the lowest and highest amount of
profit. She is also interested in describing the demographics of the buyers. What are
their ages? How many vehicles have they previously purchased from one of the Apple-
wood dealerships? What type of vehicle did they purchase?

The Applewood Auto Group operates four dealerships:

• Tionesta Ford Lincoln sells Ford and Lincoln cars and trucks.
• Olean Automotive Inc. has the Nissan franchise as well as the General Motors
  brands of Chevrolet, Cadillac, and GMC Trucks.
• Sheffield Motors Inc. sells Buick, GMC trucks, Hyundai, and Kia.
• Kane Motors offers the Chrysler, Dodge, and Jeep line as well as BMW and Volvo.

Every month, Ms. Ball collects data from each of the four dealerships
and enters them into an Excel spreadsheet. Last month the Applewood
Auto Group sold 180 vehicles at the four dealerships. A copy of the first
few observations appears to the left. The variables collected include:

• Age: the age of the buyer at the time of the purchase.
• Profit: the amount earned by the dealership on the sale of each vehicle.
• Location: the dealership where the vehicle was purchased.
• Vehicle type: SUV, sedan, compact, hybrid, or truck.
• Previous: the number of vehicles previously purchased at any of the
  four Applewood dealerships by the consumer.

The entire data set is available at the McGraw-Hill website (www.mhhe.com/lind17e)
and in Appendix A.4 at the end of the text.

CONSTRUCTING FREQUENCY TABLES
Recall from Chapter 1 that techniques used to describe a set of data are called descrip-
tive statistics. Descriptive statistics organize data to show the general pattern of the
data, to identify where values tend to concentrate, and to expose extreme or unusual
data values. The first technique we discuss is a frequency table.

LO2-1
Summarize qualitative
variables with frequency
and relative frequency
tables.

FREQUENCY TABLE A grouping of qualitative data into mutually exclusive and
collectively exhaustive classes showing the number of observations in each class.


In Chapter 1, we distinguished between qualitative and quantitative variables. To
review, a qualitative variable is nonnumeric, that is, it can only be classified into distinct
categories. Examples of qualitative data include political affiliation (Republican, Demo-
crat, Independent, or other), state of birth (Alabama, . . . , Wyoming), and method of
payment for a purchase at Barnes & Noble (cash, digital wallet, debit, or credit). On the
other hand, quantitative variables are numerical in nature. Examples of quantitative data
relating to college students include the price of their textbooks, their age, and the num-
ber of credit hours they are registered for this semester.

In the Applewood Auto Group data set, there are five variables for each vehicle
sale: age of the buyer, amount of profit, dealer that made the sale, type of vehicle sold,
and number of previous purchases by the buyer. The dealer and the type of vehicle are
qualitative variables. The amount of profit, the age of the buyer, and the number of pre-
vious purchases are quantitative variables.

Suppose Ms. Ball wants to summarize last month’s sales by location. The
first step is to sort the vehicles sold last month according to their location and
then tally, or count, the number sold at each of the four locations:
Tionesta, Olean, Sheffield, and Kane. The four locations are used to develop a
frequency table with four mutually exclusive (distinctive) classes. Mutually exclusive
classes means that a particular vehicle can be assigned to only one class. In
addition, the frequency table must be collectively exhaustive. That is, every vehicle
sold last month is accounted for in the table. If every vehicle is included in the
frequency table, the table will be collectively exhaustive and the total number of
vehicles will be 180. How do we obtain these counts? Excel provides a tool
called a Pivot Table that will quickly and accurately establish the four classes and
do the counting. The Excel results follow in Table 2–1. The table shows a total of
180 vehicles and, of the 180 vehicles, 52 were sold at Kane Motors.

TABLE 2–1 Frequency Table for Vehicles Sold Last Month at Applewood Auto Group by Location

Location Number of Cars

Kane 52
Olean 40
Sheffield 45
Tionesta 43

Total 180
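
The tally behind Table 2–1 can also be sketched in a few lines of Python. This is not the Excel Pivot Table procedure described above; it is an illustration using the standard library's `Counter`. Because the raw data are not reproduced here, the `location_column` list is a stand-in rebuilt from the counts in Table 2–1.

```python
from collections import Counter

# Stand-in for the Location column: one entry per vehicle sold,
# rebuilt from the counts in Table 2-1 (an assumption for illustration).
location_column = (
    ["Kane"] * 52 + ["Olean"] * 40 + ["Sheffield"] * 45 + ["Tionesta"] * 43
)

# Each vehicle carries exactly one location, so the classes are mutually exclusive.
frequency_table = Counter(location_column)

# Collectively exhaustive: the class counts must add up to the number of sales.
total = sum(frequency_table.values())
assert total == len(location_column)

for location in sorted(frequency_table):
    print(f"{location:<10}{frequency_table[location]:>5}")
print(f"{'Total':<10}{total:>5}")
```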

Relative Class Frequencies
You can convert class frequencies to relative class frequencies to show the fraction of the
total number of observations in each class. A relative frequency captures the relationship
between a class frequency and the total number of observations. In the vehicle sales ex-
ample, we may want to know the percentage of total cars sold at each of the four locations.
To convert a frequency table to a relative frequency table, each of the class frequencies is
divided by the total number of observations. Again, this is easily accomplished using Excel.
The fraction of vehicles sold last month at the Kane location is 0.289, found by 52 divided
by 180. The relative frequency for each location is shown in Table 2–2.

TABLE 2–2 Relative Frequency Table of Vehicles Sold by Location Last Month at Applewood Auto Group

Location Number of Cars Relative Frequency Found by

Kane 52 .289 52/180
Olean 40 .222 40/180
Sheffield 45 .250 45/180
Tionesta 43 .239 43/180

Total 180 1.000
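
The "Found by" column of Table 2–2 is a single division per class. A minimal sketch, using the counts from Table 2–1 and variable names of our own choosing:

```python
# Class frequencies from Table 2-1.
frequencies = {"Kane": 52, "Olean": 40, "Sheffield": 45, "Tionesta": 43}
total = sum(frequencies.values())

# Relative frequency = class frequency / total number of observations.
relative = {location: count / total for location, count in frequencies.items()}

# Round to three decimals, as in Table 2-2.
rounded = {location: round(value, 3) for location, value in relative.items()}

for location, value in rounded.items():
    print(f"{location:<10}{frequencies[location]:>4}{value:>8.3f}")
print(f"{'Total':<10}{total:>4}{sum(relative.values()):>8.3f}")
```

The relative frequencies always sum to 1, which is a useful check that no class was left out.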


GRAPHIC PRESENTATION OF QUALITATIVE DATA
The most common graphic form to present a qualitative variable is a bar chart. In most
cases, the horizontal axis shows the variable of interest. The vertical axis shows the
frequency or fraction of each of the possible outcomes. A distinguishing feature of a bar
chart is there is distance or a gap between the bars. That is, because the variable of in-
terest is qualitative, the bars are not adjacent to each other. Thus, a bar chart graphically
describes a frequency table using a series of uniformly wide rectangles, where the
height of each rectangle is the class frequency.

LO2-2
Display a frequency table
using a bar or pie chart.

BAR CHART A graph that shows qualitative classes on the horizontal axis and the
class frequencies on the vertical axis. The class frequencies are proportional to the
heights of the bars.

PIE CHART A chart that shows the proportion or percentage that each class
represents of the total number of frequencies.

We use the Applewood Auto Group data as an example (Chart 2–1). The variables
of interest are the location where the vehicle was sold and the number of vehicles sold
at each location. We label the horizontal axis with the four locations and scale the verti-
cal axis with the number sold. The variable location is of nominal scale, so the order of
the locations on the horizontal axis does not matter. In Chart 2–1, the locations are
listed alphabetically. The locations could also be in order of decreasing or increasing
frequencies.

The height of the bars, or rectangles, corresponds to the number of vehicles at
each location. There were 52 vehicles sold last month at the Kane location, so the
height of the Kane bar is 52; the height of the bar for the Olean location is 40.

[Bar chart: vertical axis, Number of Vehicles Sold (0 to 50); horizontal axis, Location (Kane, Olean, Sheffield, Tionesta).]

CHART 2–1 Number of Vehicles Sold by Location
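
For readers without charting software at hand, a rough text version of Chart 2–1 can be printed directly. This sketch is only a stand-in for the Excel chart: the bars run horizontally rather than vertically, and the helper `text_bar_chart` is a name we made up for illustration.

```python
# Class frequencies from Table 2-1, listed alphabetically as in Chart 2-1.
frequencies = {"Kane": 52, "Olean": 40, "Sheffield": 45, "Tionesta": 43}


def text_bar_chart(counts):
    """Return one line per class; each '#' stands for one vehicle sold."""
    lines = []
    for label, count in counts.items():
        bar = "#" * count  # bar length equals the class frequency
        lines.append(f"{label:<10} {bar} {count}")
    return lines


chart_lines = text_bar_chart(frequencies)
print("\n".join(chart_lines))
```

Each class gets its own separate line, mirroring the gaps between bars that mark a qualitative variable.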

Another useful type of chart for depicting qualitative information is a pie chart.

We explain the details of constructing a pie chart using the information in Table 2–3,
which shows the frequency and percent of cars sold by the Applewood Auto Group for
each vehicle type.


The first step to develop a pie chart is to mark the percentages 0, 5, 10, 15, and so
on evenly around the circumference of a circle (see Chart 2–2). To plot the 40% of total
sales represented by sedans, draw a line from the center of the circle to 0 and another
line from the center of the circle to 40%. The area in this “slice” represents the number
of sedans sold as a percentage of the total sales. Next, add the SUV’s percentage of
total sales, 30%, to the sedan’s percentage of total sales, 40%. The result is 70%. Draw
a line from the center of the circle to 70%, so the area between 40 and 70 shows the
sales of SUVs as a percentage of total sales. Continuing, add the 15% of total sales for
compact vehicles, which gives us a total of 85%. Draw a line from the center of the circle
to 85, so the “slice” between 70% and 85% represents the number of compact vehicles
sold as a percentage of the total sales. The remaining 10% for truck sales and 5% for
hybrid sales are added to the chart using the same method.

Vehicle Type Number Sold Percent Sold

Sedan 72 40
SUV 54 30
Compact 27 15
Truck 18 10
Hybrid 9 5

Total 180 100

TABLE 2–3 Vehicle Sales by Type at Applewood Auto Group
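
The running totals used to place the pie slices (40%, 70%, 85%, and so on) are cumulative percentages. The sketch below reproduces them from the counts in Table 2–3. The conversion to degrees is our own addition, based on the 360 degrees in a full circle; the text itself works only in percentages.

```python
from itertools import accumulate

# Number sold by vehicle type, in the order used in Table 2-3.
vehicle_counts = {"Sedan": 72, "SUV": 54, "Compact": 27, "Truck": 18, "Hybrid": 9}
total = sum(vehicle_counts.values())

# Percent of total sales for each vehicle type.
percents = {vehicle: 100 * count / total for vehicle, count in vehicle_counts.items()}

# Each slice ends at the running total and starts where the previous one ended.
ends = list(accumulate(percents.values()))
starts = [0.0] + ends[:-1]
slices = {
    vehicle: (start, end) for vehicle, start, end in zip(percents, starts, ends)
}

# Central angle of each slice in degrees (addition for illustration).
angles = {vehicle: 360 * count / total for vehicle, count in vehicle_counts.items()}

for vehicle, (start, end) in slices.items():
    print(f"{vehicle:<8} from {start:5.1f}% to {end:5.1f}%  ({angles[vehicle]:5.1f} degrees)")
```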

[Pie chart: slices for Sedan (0% to 40%), SUV (40% to 70%), Compact (70% to 85%), Truck (85% to 95%), and Hybrid (95% to 100%).]

CHART 2–2 Pie Chart of Vehicles by Type

Because each slice of the pie represents the relative frequency of each vehicle
type as a percentage of the total sales, we can easily compare them:

• The largest percentage of sales is for sedans.
• Sedans and SUVs together account for 70% of vehicle sales.
• Hybrids account for 5% of vehicle sales, in spite of being on the market for only a
  few years.

We can use Excel software to quickly count the number of cars for each vehicle
type and create the frequency table, bar chart, and pie chart shown in the following
summary. The Excel tool is called a Pivot Table. The instructions to produce these de-
scriptive statistics and charts are given in Appendix C.


Pie and bar charts both serve to illustrate frequency and relative frequency ta-
bles. When is a pie chart preferred to a bar chart? In most cases, pie charts are used
to show and compare the relative differences in the percentage of observations for
each value or class of a qualitative variable. Bar charts are preferred when the goal is
to compare the number or frequency of observations for each value or class of a
qualitative variable. The following Example/Solution shows another application of bar
and pie charts.

EXAMPLE

SkiLodges.com is test marketing its new website and is interested in how easy its
website design is to navigate. It randomly selected 200 regular Internet users and
asked them to perform a search task on the website. Each person was asked to
rate the relative ease of navigation as poor, good, excellent, or awesome. The re-
sults are shown in the following table:

Awesome 102
Excellent 58
Good 30
Poor 10

1. What type of measurement scale is used for ease of navigation?
2. Draw a bar chart for the survey results.
3. Draw a pie chart for the survey results.

S O L U T I O N

The data are measured on an ordinal scale. That is, the scale is ranked in relative
ease of navigation when moving from “awesome” to “poor.” The interval between
each rating is unknown so it is impossible, for example, to conclude that a rating of
good is twice the value of a poor rating.

We can use a bar chart to graph the data. The vertical scale shows the
relative frequency and the horizontal scale shows the values of the ease-of-
navigation variable.

24 CHAPTER 2

A pie chart can also be used to graph these data. The pie chart emphasizes that more
than half of the respondents rate the relative ease of using the website awesome.
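The relative frequencies plotted in both charts come from dividing each count by the 200 respondents. A minimal Python sketch of that conversion follows. The variable names are ours, and the ratings are listed in the same order as the survey table.

```python
# Ratings listed from best to worst, as in the survey table.
ratings = {"Awesome": 102, "Excellent": 58, "Good": 30, "Poor": 10}

total = sum(ratings.values())
relative = {rating: 100 * count / total for rating, count in ratings.items()}

for rating, percent in relative.items():
    print(f"{rating:10s} {percent:4.0f}%")

# The pie chart's main message: one rating accounts for more than half.
print(relative["Awesome"] > 50)
```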

[Bar chart: Ease of Navigation of SkiLodges.com website. Horizontal axis: Ease of Navigation
(Poor, Good, Excellent, Awesome); vertical axis: Relative Frequency %, scaled from 0 to 60.]

[Pie chart: Ease of Navigation of SkiLodges.com website. Awesome 51%, Excellent 29%, Good 15%,
Poor 5%.]

S E L F - R E V I E W 2–1

The answers are in Appendix E.

DeCenzo Specialty Food and Beverage Company has been serving a cola drink with
an additional flavoring, Cola-Plus, that is very popular among its customers. The company
is interested in customer preferences for Cola-Plus versus Coca-Cola, Pepsi, and a lemon-lime
beverage. They ask 100 randomly sampled customers to take a taste test and select the
beverage they prefer most. The results are shown in the following table:

Beverage Number

Cola-Plus 40
Coca-Cola 25
Pepsi 20
Lemon-Lime 15

Total 100

(a) Is the data qualitative or quantitative? Why?
(b) What is the table called? What does it show?
(c) Develop a bar chart to depict the information.
(d) Develop a pie chart using the relative frequencies.

E X E R C I S E S

The answers to the odd-numbered exercises are at the end of the book in Appendix D.

1. A pie chart shows the relative market share of cola products. The “slice” for Pepsi-Cola
has a central angle of 90 degrees. What is its market share?

2. In a marketing study, 100 consumers were asked to select the best digital music
player from the iPod, the iRiver, and the Magic Star MP3. To summarize the consumer
responses with a frequency table, how many classes would the frequency
table have?

3. A total of 1,000 residents in Minnesota were asked which season they preferred.
One hundred liked winter best, 300 liked spring, 400 liked summer, and 200 liked
fall. Develop a frequency table and a relative frequency table to summarize this
information.

4. Two thousand frequent business travelers are asked which midwestern city they
prefer: Indianapolis, Saint Louis, Chicago, or Milwaukee. One hundred liked Indianapolis
best, 450 liked Saint Louis, 1,300 liked Chicago, and the remainder preferred
Milwaukee. Develop a frequency table and a relative frequency table to
summarize this information.

5. Wellstone Inc. produces and markets replacement covers for cell phones in five
different colors: bright white, metallic black, magnetic lime, tangerine orange, and
fusion red. To estimate the demand for each color, the company set up a kiosk in
the Mall of America for several hours and asked randomly selected people which
cover color was their favorite. The results follow:

Bright white 130
Metallic black 104
Magnetic lime 325
Tangerine orange 455
Fusion red 286

a. What is the table called?
b. Draw a bar chart for the table.
c. Draw a pie chart.
d. If Wellstone Inc. plans to produce 1 million cell phone covers, how many of
each color should it produce?

6. A small business consultant is investigating the performance of several companies.
The fourth-quarter sales for last year (in thousands of dollars) for the selected
companies were:

Fourth-Quarter Sales
Company ($ thousands)

Hoden Building Products $ 1,645.2
J & R Printing Inc. 4,757.0
Long Bay Concrete Construction 8,913.0
Mancell Electric and Plumbing 627.1
Maxwell Heating and Air Conditioning 24,612.0
Mizelle Roofing & Sheet Metals 191.9

The consultant wants to include a chart in his report comparing the sales of the six
companies. Use a bar chart to compare the fourth-quarter sales of these corpora-
tions and write a brief report summarizing the bar chart.


CONSTRUCTING FREQUENCY DISTRIBUTIONS
In Chapter 1 and earlier in this chapter, we distinguished between qualitative and quantitative
data. In the previous section, using the Applewood Auto Group data, we summarized
two qualitative variables: the location of the sale and the type of vehicle sold. We created
frequency and relative frequency tables and depicted the results in bar and pie charts.

The Applewood Auto Group data also includes several quantitative variables: the
age of the buyer, the profit earned on the sale of the vehicle, and the number of previ-
ous purchases. Suppose Ms. Ball wants to summarize last month’s sales by profit earned
for each vehicle. We can describe profit using a frequency distribution.

LO2-3
Summarize quantitative
variables with frequency
and relative frequency
distributions.

FREQUENCY DISTRIBUTION A grouping of quantitative data into mutually exclusive
and collectively exhaustive classes showing the number of observations in each class.

How do we develop a frequency distribution? The following example shows the steps to
construct a frequency distribution. Remember, our goal is to construct tables, charts,
and graphs that will quickly summarize the data by showing the location, extreme
values, and shape of the data’s distribution.

TABLE 2–4 Profit on Vehicles Sold Last Month by the Applewood Auto Group (maximum: $3,292; minimum: $294)

$1,387 $2,148 $2,201 $ 963 $ 820 $2,230 $3,043 $2,584 $2,370
1,754 2,207 996 1,298 1,266 2,341 1,059 2,666 2,637
1,817 2,252 2,813 1,410 1,741 3,292 1,674 2,991 1,426
1,040 1,428 323 1,553 1,772 1,108 1,807 934 2,944
1,273 1,889 352 1,648 1,932 1,295 2,056 2,063 2,147
1,529 1,166 482 2,071 2,350 1,344 2,236 2,083 1,973
3,082 1,320 1,144 2,116 2,422 1,906 2,928 2,856 2,502
1,951 2,265 1,485 1,500 2,446 1,952 1,269 2,989 783
2,692 1,323 1,509 1,549 369 2,070 1,717 910 1,538
1,206 1,760 1,638 2,348 978 2,454 1,797 1,536 2,339
1,342 1,919 1,961 2,498 1,238 1,606 1,955 1,957 2,700
443 2,357 2,127 294 1,818 1,680 2,199 2,240 2,222
754 2,866 2,430 1,115 1,824 1,827 2,482 2,695 2,597
1,621 732 1,704 1,124 1,907 1,915 2,701 1,325 2,742
870 1,464 1,876 1,532 1,938 2,084 3,210 2,250 1,837
1,174 1,626 2,010 1,688 1,940 2,639 377 2,279 2,842
1,412 1,762 2,165 1,822 2,197 842 1,220 2,626 2,434
1,809 1,915 2,231 1,897 2,646 1,963 1,401 1,501 1,640
2,415 2,119 2,389 2,445 1,461 2,059 2,175 1,752 1,821
1,546 1,766 335 2,886 1,731 2,338 1,118 2,058 2,487

E X A M P L E

Ms. Kathryn Ball of the Applewood Auto Group wants to summarize the quantitative
variable profit with a frequency distribution and display the distribution with charts
and graphs. With this information, Ms. Ball can easily answer the following questions:
What is the typical profit on each sale? What is the largest or maximum profit
on any sale? What is the smallest or minimum profit on any sale? Around what value
do the profits tend to cluster?

S O L U T I O N

To begin, we need the profits for each of the 180 vehicle sales listed in Table 2–4.
This information is called raw or ungrouped data because it is simply a listing
of the individual, observed profits. It is possible to search the list and find the
smallest or minimum profit ($294) and the largest or maximum profit ($3,292), but
that is about all. It is difficult to determine a typical profit or to visualize where the
profits tend to cluster. The raw data are more easily interpreted if we summarize
the data with a frequency distribution. The steps to create this frequency distribu-
tion follow.

Step 1: Decide on the number of classes. A useful recipe to determine the
number of classes (k) is the “2 to the k rule.” This guide suggests you
select the smallest number (k) for the number of classes such that 2^k
(in words, 2 raised to the power of k) is greater than the number of
observations (n). In the Applewood Auto Group example, there were
180 vehicles sold. So n = 180. If we try k = 7, which means we would
use 7 classes, 2^7 = 128, which is less than 180. Hence, 7 is too few
classes. If we let k = 8, then 2^8 = 256, which is greater than 180. So the
recommended number of classes is 8.
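The trial-and-error search in Step 1 is easy to automate. The following Python sketch is one way to do it. The function name `number_of_classes` is a label we chose for illustration.

```python
def number_of_classes(n):
    """Smallest k such that 2**k is greater than n (the "2 to the k rule")."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

# Applewood Auto Group: 2**7 = 128 is too small, 2**8 = 256 is large enough.
print(number_of_classes(180))  # 8
```

Note that the rule requires 2^k to be strictly greater than n, so a data set of exactly 128 observations also calls for 8 classes.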

Step 2: Determine the class interval. Generally, the class interval is the
same for all classes. The classes all taken together must cover at
least the distance from the minimum value in the data up to the max-
imum value. Expressing these words in a formula:

i ≥ (Maximum Value − Minimum Value) / k

where i is the class interval, and k is the number of classes.

For the Applewood Auto Group, the minimum value is $294 and
the maximum value is $3,292. If we need 8 classes, the interval
should be:

i ≥ (Maximum Value − Minimum Value) / k = ($3,292 − $294) / 8 = $374.75

In practice, this interval size is usually rounded up to some conve-
nient number, such as a multiple of 10 or 100. The value of $400 is a
reasonable choice.
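The calculation in Step 2, including the rounding up to a convenient number, can be sketched in Python as follows. The function name `class_interval` and the parameter `round_to` are our own, and rounding up to the next multiple of 100 is only one of several reasonable conventions.

```python
import math

def class_interval(minimum, maximum, k, round_to=100):
    """Return (smallest allowable interval, interval rounded up to a multiple of round_to)."""
    smallest = (maximum - minimum) / k
    rounded = math.ceil(smallest / round_to) * round_to
    return smallest, rounded

# Applewood Auto Group: minimum $294, maximum $3,292, 8 classes.
print(class_interval(294, 3292, 8))  # (374.75, 400)
```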

Step 3: Set the individual class limits. State clear class limits so you can
put each observation into only one category. This means you must
avoid overlapping or unclear class limits. For example, classes such
as “$1,300–$1,400” and “$1,400–$1,500” should not be used
because it is not clear whether the value $1,400 is in the first or
second class. In this text, we will generally use the format $1,300
up to $1,400 and $1,400 up to $1,500 and so on. With this format,
it is clear that $1,399 goes into the first class and $1,400 in the
second.

Because we always round the class interval up to get a conve-
nient class size, we cover a larger than necessary range. For ex-
ample, using 8 classes with an interval of $400 in the Applewood
Auto Group example results in a range of 8($400) = $3,200. The
actual range is $2,998, found by ($3,292 − $294). Comparing that
value to $3,200, we have an excess of $202. Because we need to
cover only the range (Maximum − Minimum), it is natural to put ap-
proximately equal amounts of the excess in each of the two tails.
Of course, we also should select convenient class limits. A guide-
line is to make the lower limit of the first class a multiple of the
class interval. Sometimes this is not possible, but the lower limit
should at least be rounded. So here are the classes we could use
for these data.

Classes

$ 200 up to $ 600
600 up to 1,000
1,000 up to 1,400
1,400 up to 1,800
1,800 up to 2,200
2,200 up to 2,600
2,600 up to 3,000
3,000 up to 3,400

Step 4: Tally the vehicle profit into the classes and determine the number of
observations in each class. To begin, the profit from the sale of the first
vehicle in Table 2–4 is $1,387. It is tallied in the $1,000 up to $1,400
class. The second profit in the first row of Table 2–4 is $2,148. It is tallied
in the $1,800 up to $2,200 class. The other profits are tallied in a similar
manner. When all the profits are tallied, the table would appear as:

Profit Frequency

$ 200 up to $ 600 |||| |||
600 up to 1,000 |||| |||| |
1,000 up to 1,400 |||| |||| |||| |||| |||
1,400 up to 1,800 |||| |||| |||| |||| |||| |||| |||| |||
1,800 up to 2,200 |||| |||| |||| |||| |||| |||| |||| |||| ||||
2,200 up to 2,600 |||| |||| |||| |||| |||| ||
2,600 up to 3,000 |||| |||| |||| ||||
3,000 up to 3,400 ||||

The number of observations in each class is called the class
frequency. In the $200 up to $600 class there are 8 observations,
and in the $600 up to $1,000 class there are 11 observations. There-
fore, the class frequency in the first class is 8 and the class frequency
in the second class is 11. There are a total of 180 observations in the
entire set of data. So the sum of all the frequencies should be equal
to 180. The results of the frequency distribution are in Table 2–5.
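Steps 3 and 4 can be sketched in a few lines of Python. To keep the example short, we tally only the nine profits in the first row of Table 2–4. The function names `build_classes` and `tally` are ours, and each class is treated as "lower limit up to, but not including, upper limit."

```python
def build_classes(lower_limit, width, k):
    """Return k (lower, upper) pairs; a class holds values with lower <= value < upper."""
    return [(lower_limit + j * width, lower_limit + (j + 1) * width) for j in range(k)]

def tally(values, classes):
    """Count how many values fall in each "lower up to upper" class."""
    counts = [0] * len(classes)
    for value in values:
        for j, (lower, upper) in enumerate(classes):
            if lower <= value < upper:
                counts[j] += 1
                break
        else:
            raise ValueError(f"{value} is outside every class")
    return counts

classes = build_classes(200, 400, 8)

# Profits from the first row of Table 2-4 only.
first_row = [1387, 2148, 2201, 963, 820, 2230, 3043, 2584, 2370]
counts = tally(first_row, classes)

for (lower, upper), count in zip(classes, counts):
    print(f"${lower:,} up to ${upper:,}: {count}")
```

Because the upper limit is excluded, a profit of $1,399 is counted in the $1,000 up to $1,400 class and a profit of $1,400 in the next class, exactly as described in Step 3. Running the same tally on all 180 profits should give the class frequencies reported in Table 2–5.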

TABLE 2–5 Frequency Distribution of Profit for Vehicles Sold Last Month at Applewood Auto Group

Profit Frequency

$ 200 up to $ 600 8
600 up to 1,000 11
1,000 up to 1,400 23
1,400 up to 1,800 38
1,800 up to 2,200 45
2,200 up to 2,600 32
2,600 up to 3,000 19
3,000 up to 3,400 4

Total 180

Now that we have organized the data into a frequency distribution (see Table 2–5),
we can summarize the profits of the vehicles for the Applewood Auto Group.
Observe the following:

1. The profits from vehicle sales range between $200 and $3,400.
2. The vehicle profits are classified using a class interval of $400. The class interval
is determined by subtracting consecutive lower or upper class limits. For
example, the lower limit of the first class is $200, and the lower limit of the
second class is $600. The difference is the class interval of $400.

3. The profits are concentrated between $1,000 and $3,000. The profit on 157
vehicles, or 87%, was within this range.

4. For each class, we can determine the typical profit or class midpoint. It is half-
way between the lower or upper limits of two consecutive classes. It is com-
puted by adding the lower or upper limits of consecutive classes and dividing
by 2. Referring to Table 2–5, the lower class limit of the first class is $200, and
the next class limit is $600. The class midpoint is $400, found by ($600 +
$200)/2. The midpoint best represents, or is typical of, the profits of the vehi-
cles in that class. Applewood sold 8 vehicles with a typical profit of $400.

5. The largest concentration, or highest frequency, of vehicles sold is in the $1,800 up
to $2,200 class. There are 45 vehicles in this class. The class midpoint is $2,000.
So we say that the typical profit in the class with the highest frequency is $2,000.

By presenting this information to Ms. Ball, we give her a clear picture of the distribu-
tion of the vehicle profits for last month.

We admit that arranging the information on profits into a frequency distribution
does result in the loss of some detailed information. That is, by organizing the data
into a frequency distribution, we cannot pinpoint the exact profit on any vehicle,
such as $1,387, $2,148, or $2,201. Further, we cannot tell that the actual minimum
profit for any vehicle sold is $294 or that the maximum profit was $3,292. However,
the lower limit of the first class and the upper limit of the last class convey essen-
tially the same meaning. Likely, Ms. Ball will make the same judgment if she knows
the smallest profit is about $200 that she will if she knows the exact profit is $294.
The advantages of summarizing the 180 profits into a more understandable and
organized form more than offset this disadvantage.
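The class midpoints and the 87% concentration figure in the observations above follow directly from Table 2–5. The Python sketch below repeats the arithmetic, with variable names of our own choosing.

```python
# Lower class limits and frequencies from Table 2-5; the class interval is $400.
lower_limits = [200, 600, 1000, 1400, 1800, 2200, 2600, 3000]
frequencies = [8, 11, 23, 38, 45, 32, 19, 4]
width = 400

# Midpoint: halfway between a class's lower limit and the next class's lower limit.
midpoints = [(lower + lower + width) / 2 for lower in lower_limits]

# Position of the class with the highest frequency.
peak = frequencies.index(max(frequencies))

# Vehicles with a profit of $1,000 up to $3,000 (the third through seventh classes).
in_range = sum(f for lower, f in zip(lower_limits, frequencies) if 1000 <= lower < 3000)
share = in_range / sum(frequencies)

print(midpoints[0], midpoints[peak], in_range, round(100 * share))  # 400.0 2000.0 157 87
```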

Adjusted Gross Income   Number of Returns (in thousands)

No adjusted gross income 178.2
$ 1 up to 5,000 1,204.6
5,000 up to 10,000 2,595.5
10,000 up to 15,000 3,142.0
15,000 up to 20,000 3,191.7
20,000 up to 25,000 2,501.4
25,000 up to 30,000 1,901.6
30,000 up to 40,000 2,502.3
40,000 up to 50,000 1,426.8
50,000 up to 75,000 1,476.3
75,000 up to 100,000 338.8
100,000 up to 200,000 223.3
200,000 up to 500,000 55.2
500,000 up to 1,000,000 12.0
1,000,000 up to 2,000,000 5.1
2,000,000 up to 10,000,000 3.4
10,000,000 or more 0.6

TABLE 2–6 Adjusted Gross Income for Individuals Filing Income Tax Returns

STATISTICS IN ACTION

In 1788, James Madison,
John Jay, and Alexander
Hamilton anonymously
published a series of essays
entitled The Federalist.
These Federalist papers
were an attempt to convince
the people of New York
that they should ratify the
Constitution. In the course
of history, the authorship
of most of these papers
became known, but 12 remained
contested. Through
the use of statistical analysis,
and particularly studying
the frequency distributions
of various words, we can
now conclude that James
Madison is the likely author
of the 12 papers. In fact,
the statistical evidence that
Madison is the author is
overwhelming.

When we summarize raw data with frequency distributions, equal class intervals are preferred.
However, in certain situations unequal class intervals may be necessary to avoid a
large number of classes with very small frequencies. Such is the case in Table 2–6. The
U.S. Internal Revenue Service uses unequal-sized class intervals for adjusted gross
income on individual tax returns to summarize the number of individual tax returns. If
we use our method to find equal class intervals, the 2^k rule results in 25 classes, and
a class interval of $400,000, assuming $0 and $10,000,000 as the minimum and maximum
values for adjusted gross income. Using equal class intervals, the first 13 classes in Table 2–6
would be combined into one class of about 99.9% of all tax returns and 24 classes for the
0.1% of the returns with an adjusted gross income above $400,000. Using equal class intervals
does not provide a good understanding of the raw data. In this case, good judgment in
the use of unequal class intervals, as demonstrated in Table 2–6, is required to show the
distribution of the number of tax returns filed, especially for incomes under $500,000.
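The figures quoted for the tax-return data (25 classes, a $400,000 interval, and about 99.9% of returns in the first class) can be checked with a few lines of Python. This is only a sketch of the arithmetic using the counts in Table 2–6, and the variable names are ours. Note that the 13th class in Table 2–6 runs up to $500,000, so the share computed here slightly overstates the share of returns below $400,000.

```python
# Number of returns (in thousands) for the 17 classes of Table 2-6, top to bottom.
returns_thousands = [178.2, 1204.6, 2595.5, 3142.0, 3191.7, 2501.4, 1901.6, 2502.3,
                     1426.8, 1476.3, 338.8, 223.3, 55.2, 12.0, 5.1, 3.4, 0.6]

total_returns = sum(returns_thousands) * 1000   # about 20.8 million returns

# "2 to the k rule": smallest k with 2**k greater than the number of observations.
k = 1
while 2 ** k <= total_returns:
    k += 1

# Equal class interval, assuming a minimum of $0 and a maximum of $10,000,000.
interval = (10_000_000 - 0) / k

# Share of returns in the first 13 classes of Table 2-6 (up to $500,000).
share_first_13 = sum(returns_thousands[:13]) / sum(returns_thousands)

print(k, interval, round(100 * share_first_13, 1))  # 25 400000.0 99.9
```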

S E L F - R E V I E W 2–2

In the first quarter of last year, the 11 members of the sales staff at Master Chemical Company
earned the following commissions:

$1,650 $1,475 $1,510 $1,670 $1,595 $1,760 $1,540 $1,495 $1,590 $1,625 $1,510

(a) What are the values such as $1,650 and $1,475 called?
(b) Using $1,400 up to $1,500 as the first class, $1,500 up to $1,600 as the second class,
and so forth, organize the quarterly commissions into a frequency distribution.
(c) What are the numbers in the right column of your frequency distribution called?
(d) Describe the distribution of quarterly commissions, based on the frequency distribution.
What is the largest concentration of commissions earned? What is the smallest,
and the largest? What is the typical amount earned?

Relative Frequency Distribution
It may be desirable, as we did earlier with qualitative data, to convert class frequencies
to relative class frequencies to show the proportion of the total number of observations
in each class. In our vehicle profits, we may want to know what percentage of the vehicle
profits are in the $1,000 up to $1,400 class. To convert a frequency distribution to a
relative frequency distribution, each of the class frequencies is divided by the total number
of observations. From the distribution of vehicle profits, Table 2–5, the relative frequency
for the $1,000 up to $1,400 class is 0.128, found by dividing 23 by 180. That
is, profit on 12.8% of the vehicles sold is between $1,000 and $1,400. The relative frequencies
for the remaining classes are shown in Table 2–7.

TABLE 2–7 Relative Frequency Distribution of Profit for Vehicles Sold Last Month at Applewood Auto Group

Profit Frequency Relative Frequency Found by

$ 200 up to $ 600 8 .044 8/180
600 up to 1,000 11 .061 11/180
1,000 up to 1,400 23 .128 23/180
1,400 up to 1,800 38 .211 38/180
1,800 up to 2,200 45 .250 45/180
2,200 up to 2,600 32 .178 32/180
2,600 up to 3,000 19 .106 19/180
3,000 up to 3,400 4 .022 4/180

Total 180 1.000
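The "Found by" column of Table 2–7 amounts to one division per class. A minimal Python sketch of the conversion follows. The dictionary keys are shorthand labels of our own for the classes.

```python
# Class frequencies from Table 2-5, keyed by "lower-upper" class labels.
frequencies = {
    "200-600": 8, "600-1,000": 11, "1,000-1,400": 23, "1,400-1,800": 38,
    "1,800-2,200": 45, "2,200-2,600": 32, "2,600-3,000": 19, "3,000-3,400": 4,
}

total = sum(frequencies.values())

# Relative frequency = class frequency / total number of observations.
relative = {label: f / total for label, f in frequencies.items()}
rounded = {label: round(value, 3) for label, value in relative.items()}

for label, value in rounded.items():
    print(f"{label:>12} {frequencies[label]:4d} {value:.3f}")
```

The unrounded relative frequencies always sum to 1. Rounded values can occasionally miss 1.000 by a small amount, although in Table 2–7 they happen to sum exactly.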

There are many software packages that perform
statistical calculations. Throughout this text, we will
show the output from Microsoft Excel, MegaStat (a
Microsoft Excel add-in), and Minitab (a statistical
software package). Because Excel is most readily
available, it is used most frequently.

Within the earlier Graphic Presentation of Qualitative Data section, we used the Pivot Table
tool in Excel to create a frequency table. To create the table to the left, we use the same
Excel tool to compute frequency and relative frequency distributions for the profit variable
in the Applewood Auto Group data. The necessary steps are given in the Software Commands
section in Appendix C.

S E L F - R E V I E W 2–3

Barry Bonds of the San Francisco Giants established a new single-season Major League
Baseball home run record by hitting 73 home runs during the 2001 season. Listed below is
the sorted distance of each of the 73 home runs.

320 320 347 350 360 360 360 361 365 370
370 375 375 375 375 380 380 380 380 380
380 390 390 391 394 396 400 400 400 400
405 410 410 410 410 410 410 410 410 410
410 410 411 415 415 416 417 417 420 420
420 420 420 420 420 420 429 430 430 430
430 430 435 435 436 440 440 440 440 440
450 480 488

(a) For this data, show that seven classes would be used to create a frequency distribution
using the 2^k rule.
(b) Show that a class interval of 30 would summarize the data in seven classes.
(c) Construct frequency and relative frequency distributions for the data with
seven classes and a class interval of 30. Start the first class with a lower limit
of 300.
(d) How many home runs traveled a distance of 360 up to 390 feet?
(e) What percentage of the home runs traveled a distance of 360 up to 390 feet?
(f) What percentage of the home runs traveled a distance of 390 feet or more?

E X E R C I S E S
This icon indicates that the data are available at the text website: www.mhhe.com/Lind17e.
You will be able to download the data directly into Excel or Minitab from this site.

7. A set of data consists of 38 observations. How many classes would you recommend
for the frequency distribution?

8. A set of data consists of 45 observations between $0 and $29. What size would
you recommend for the class interval?

9. A set of data consists of 230 observations between $235 and $567. What class
interval would you recommend?

10. A set of data contains 53 observations. The minimum value is 42 and the maximum
value is 129. The data are to be organized into a frequency distribution.

a. How many classes would you suggest?
b. What would you suggest as the lower limit of the first class?

11. Wachesaw Manufacturing Inc. produced the following number of units in the
last 16 days.

27 27 27 28 27 25 25 28
26 28 26 28 31 30 26 26

The information is to be organized into a frequency distribution.
a. How many classes would you recommend?
b. What class interval would you suggest?
c. What lower limit would you recommend for the first class?
d. Organize the information into a frequency distribution and determine the relative
frequency distribution.
e. Comment on the shape of the distribution.

12. The Quick Change Oil Company has a number of outlets in the metropolitan Seattle
area. The daily numbers of oil changes at the Oak Street outlet in the past 20 days are:

65 98 55 62 79 59 51 90 72 56
70 62 66 80 94 79 63 73 71 85

The data are to be organized into a frequency distribution.
a. How many classes would you recommend?
b. What class interval would you suggest?
c. What lower limit would you recommend for the first class?
d. Organize the number of oil changes into a frequency distribution.
e. Comment on the shape of the frequency distribution. Also determine the relative
frequency distribution.

13. The manager of the BiLo Supermarket in Mt. Pleasant, Rhode Island, gathered
the following information on the number of times a customer visits the store during
a month. The responses of 51 customers were:

5 3 3 1 4 4 5 6 4 2 6 6 6 7 1
1 14 1 2 4 4 4 5 6 3 5 3 4 5 6
8 4 7 6 5 9 11 3 12 4 7 6 5 15 1
1 10 8 9 2 12

a. Starting with 0 as the lower limit of the first class and using a class interval of 3,
organize the data into a frequency distribution.
b. Describe the distribution. Where do the data tend to cluster?
c. Convert the distribution to a relative frequency distribution.

14. The food services division of Cedar River Amusement Park Inc. is studying the
amount of money spent per day on food and drink by families who visit the amuse-
ment park. A sample of 40 families who visited the park yesterday revealed they
spent the following amounts:

$77 $18 $63 $84 $38 $54 $50 $59 $54 $56 $36 $26 $50 $34 $44
41 58 58 53 51 62 43 52 53 63 62 62 65 61 52
60 60 45 66 83 71 63 58 61 71

a. Organize the data into a frequency distribution, using seven classes and 15 as
the lower limit of the first class. What class interval did you select?

b. Where do the data tend to cluster?
c. Describe the distribution.
d. Determine the relative frequency distribution.

GRAPHIC PRESENTATION OF A DISTRIBUTION
Sales managers, stock analysts, hospital administrators, and other busy executives of-
ten need a quick picture of the distributions of sales, stock prices, or hospital costs.
These distributions can often be depicted by the use of charts and graphs. Three charts
that will help portray a frequency distribution graphically are the histogram, the fre-
quency polygon, and the cumulative frequency polygon.

LO2-4
Display a distribution
using a histogram or
frequency polygon.

Histogram
A histogram for a frequency distribution based on quantitative data is similar to the
bar chart showing the distribution of qualitative data. The classes are marked on the
horizontal axis and the class frequencies on the vertical axis. The class frequencies
are represented by the heights of the bars. However, there is one important difference
based on the nature of the data. Quantitative data are usually measured using
scales that are continuous, not discrete. Therefore, the horizontal axis represents all
possible values, and the bars are drawn adjacent to each other to show the continuous
nature of the data.

HISTOGRAM A graph in which the classes are marked on the horizontal axis and
the class frequencies on the vertical axis. The class frequencies are represented by
the heights of the bars, and the bars are drawn adjacent to each other.

E X A M P L E

Below is the frequency distribution of the profits on vehicle sales last month at the
Applewood Auto Group.

Profit Frequency

$ 200 up to $ 600 8
600 up to 1,000 11
1,000 up to 1,400 23
1,400 up to 1,800 38
1,800 up to 2,200 45
2,200 up to 2,600 32
2,600 up to 3,000 19
3,000 up to 3,400 4

Total 180

Construct a histogram. What observations can you reach based on the information
presented in the histogram?

S O L U T I O N

The class frequencies are scaled along the vertical axis (Y-axis) and either the class
limits or the class midpoints along the horizontal axis. To illustrate the construction
of the histogram, the first three classes are shown in Chart 2–3.

[Chart: the first three bars of the histogram, with heights 8, 11, and 23. Horizontal axis:
Profit $ (200, 600, 1,000, 1,400); vertical axis: Number of Vehicles (class frequency),
marked at 8, 16, 24, and 32.]

CHART 2–3 Construction of a Histogram

From Chart 2–3 we note the profit on eight vehicles was $200 up to $600. There-
fore, the height of the column for that class is 8. There are 11 vehicle sales where
the profit was $600 up to $1,000. So, logically, the height of that column is 11. The
height of the bar represents the number of observations in the class.

This procedure is continued for all classes. The complete histogram is shown in
Chart 2–4. Note that there is no space between the bars. This is a feature of the
histogram. Why is this so? Because the variable profit, plotted on the horizontal
axis, is a continuous variable. In a bar chart, the scale of measurement is usually
nominal and the vertical bars are separated. This is an important distinction be-
tween the histogram and the bar chart.

We can make the following statements using Chart 2–4. They are the same as the
observations based on Table 2–5.

1. The profits from vehicle sales range between $200 and $3,400.
2. The vehicle profits are classified using a class interval of $400. The class interval
is determined by subtracting consecutive lower or upper class limits. For
example, the lower limit of the first class is $200, and the lower limit of the
second class is $600. The difference is the class interval of $400.

3. The profits are concentrated between $1,000 and $3,000. The profit on 157
vehicles, or 87%, was within this range.

4. For each class, we can determine the typical profit or class midpoint. It is halfway
between the lower or upper limits of two consecutive classes. It is computed by
adding the lower or upper limits of consecutive classes and dividing by 2. Refer-
ring to Chart 2–4, the lower class limit of the first class is $200, and the next class
limit is $600. The class midpoint is $400, found by ($600 + $200)/2. The mid-
point best represents, or is typical of, the profits of the vehicles in that class.
Applewood sold 8 vehicles with a typical profit of $400.

5. The largest concentration, or highest frequency of vehicles sold, is in the $1,800 up
to $2,200 class. There are 45 vehicles in this class. The class midpoint is $2,000.
So we say that the typical profit in the class with the highest frequency is $2,000.

Thus, the histogram provides an easily interpreted visual representation of a
frequency distribution. We should also point out that we would have made the
same observations and the shape of the histogram would have been the same had
we used a relative frequency distribution instead of the actual frequencies. That is,
if we use the relative frequencies of Table 2–7, the result is a histogram of the same
shape as Chart 2–4. The only difference is that the vertical axis would have been
reported in percentage of vehicles instead of the number of vehicles. The Excel
commands to create Chart 2–4 are given in Appendix C.
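To see that the shape does not change when relative frequencies replace the actual frequencies, the following Python sketch prints a rough text histogram both ways. It is not a substitute for the Excel chart. The bars are drawn sideways, one row per class, and the function name `text_histogram` is our own.

```python
# Class labels and frequencies from Table 2-5.
classes = ["200-600", "600-1,000", "1,000-1,400", "1,400-1,800",
           "1,800-2,200", "2,200-2,600", "2,600-3,000", "3,000-3,400"]
frequencies = [8, 11, 23, 38, 45, 32, 19, 4]

def text_histogram(labels, heights):
    """Return one text row per class; the bar length is the rounded height."""
    rows = []
    for label, height in zip(labels, heights):
        bar = "#" * round(height)
        rows.append(f"{label:>12} |{bar} {height:g}")
    return rows

total = sum(frequencies)
percents = [100 * f / total for f in frequencies]

# Histogram of the number of vehicles.
for row in text_histogram(classes, frequencies):
    print(row)
print()
# Histogram of the percentage of vehicles: shorter bars, same shape.
for row in text_histogram(classes, percents):
    print(row)
```

Every percentage is the corresponding frequency multiplied by the same constant, 100/180, so the tallest bar, the shortest bar, and the overall outline are the same in both versions.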

[Histogram: Horizontal axis: Profit, in classes 200–600, 600–1,000, 1,000–1,400, 1,400–1,800,
1,800–2,200, 2,200–2,600, 2,600–3,000, and 3,000–3,400; vertical axis: Frequency, marked
from 0 to 40. Bar heights: 8, 11, 23, 38, 45, 32, 19, and 4.]

CHART 2–4 Histogram of the Profit on 180 Vehicles Sold at the Applewood Auto Group

Frequency Polygon
A frequency polygon also shows the shape of a distribution and is similar to a histo-
gram. It consists of line segments connecting the points formed by the intersections of
the class midpoints and the class frequencies. The construction of a frequency polygon
is illustrated in Chart 2–5. We use the profits from the cars sold last month at the Apple-
wood Auto Group. The midpoint of each class is scaled on the X-axis and the class
frequencies on the Y-axis. Recall that the class midpoint is the value at the center of a
class and represents the typical values in that class. The class frequency is the number
of observations in a particular class. The profit earned on the vehicles sold last month
by the Applewood Auto Group is repeated below.

STATISTICS IN ACTION

Florence Nightingale is
known as the founder of
the nursing profession.
However, she also saved
many lives by using statisti-
cal analysis. When she
encountered an unsanitary
condition or an undersup-
plied hospital, she improved
the conditions and then
used statistical data to
document the improve-
ment. Thus, she was able
to convince others of the
need for medical reform,
particularly in the area of
sanitation. She developed
original graphs to demon-
strate that, during the
Crimean War, more soldiers
died from unsanitary condi-
tions than were killed in
combat.

[Frequency polygon: Horizontal axis: Profit $, marked at 0, 400, 800, 1,200, 1,600, 2,000,
2,400, 2,800, 3,200, and 3,600; vertical axis: Frequency, marked at 8, 16, 24, 32, 40, and 48.]

CHART 2–5 Frequency Polygon of Profit on 180 Vehicles Sold at Applewood Auto Group

As noted previously, the $200 up to $600 class is represented by the midpoint $400. To
construct a frequency polygon, move horizontally on the graph to the midpoint, $400, and
then vertically to 8, the class frequency, and place a dot. The x and the y values of this
point are called the coordinates. The coordinates of the next point are x = 800 and
y = 11. The process is continued for all classes. Then the points are connected in order.
That is, the point representing the lowest class is joined to the one representing the
second class and so on. Note in Chart 2–5 that, to complete the frequency polygon,
midpoints of $0 and $3,600 are added to the X-axis to “anchor” the polygon at zero
frequencies. These two values, $0 and $3,600, were derived by subtracting the class
interval of $400 from the lowest midpoint ($400) and by adding $400 to the highest
midpoint ($3,200) in the frequency distribution.
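The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the variable names are ours, and the class limits and frequencies are taken from the Applewood profit table.

```python
# Lower class limits and class frequencies from the Applewood profit table.
lower_limits = [200, 600, 1000, 1400, 1800, 2200, 2600, 3000]
frequencies = [8, 11, 23, 38, 45, 32, 19, 4]
class_interval = 400

# Midpoint of each class: the lower limit plus half the class interval.
midpoints = [low + class_interval / 2 for low in lower_limits]

# One (x, y) coordinate per class.
points = list(zip(midpoints, frequencies))

# Anchor the polygon at zero frequency one class interval beyond each end.
left_anchor = (midpoints[0] - class_interval, 0)
right_anchor = (midpoints[-1] + class_interval, 0)
polygon = [left_anchor] + points + [right_anchor]

for x, y in polygon:
    print(f"x = {x:>7,.0f}   y = {y}")
```

The first and last coordinates are the anchor points at $0 and $3,600. Connecting the listed points in order gives the polygon.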

Profit                   Midpoint   Frequency

$  200 up to $  600      $  400          8
   600 up to  1,000         800         11
 1,000 up to  1,400       1,200         23
 1,400 up to  1,800       1,600         38
 1,800 up to  2,200       2,000         45
 2,200 up to  2,600       2,400         32
 2,600 up to  3,000       2,800         19
 3,000 up to  3,400       3,200          4

Total                                  180

Both the histogram and the frequency polygon allow us to get a quick picture of the main
characteristics of the data (highs, lows, points of concentration, etc.). Although the two
representations are similar in purpose, the histogram has the advantage of depicting each
class as a rectangle, with the height of the rectangular bar representing the number in
each class. The frequency polygon, in turn, has an advantage over the histogram. It allows
us to compare directly two or more frequency distributions. Suppose Ms. Ball wants to
compare the profit per vehicle sold at Applewood Auto Group with a similar auto group,
Fowler Motors in Grayling, Michigan. To do this, two frequency polygons are constructed,
one on top of the other, as in Chart 2–6.

[Two overlaid frequency polygons, labeled Fowler Motors and Applewood: X-axis, Profit $ (0 to 3,600 in steps of 400); Y-axis, Frequency (0 to 56 in steps of 8)]

CHART 2–6 Distribution of Profit at Applewood Auto Group and Fowler Motors

Two things are clear from the chart:

• The typical vehicle profit is larger at Fowler Motors: about $2,000 for Applewood
and about $2,400 for Fowler.

• There is less variation or dispersion in the profits at Fowler Motors than at
Applewood. On the chart, the Applewood polygon runs from $0 to $3,600, whereas the
Fowler Motors polygon runs from $800 to the same upper value, $3,600.

The total number of cars sold at the two dealerships is about the same, so a direct
comparison is possible. If the difference in the total number of cars sold is large, then
converting the frequencies to relative frequencies and then plotting the two distributions
would allow a clearer comparison.
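As a minimal sketch of that conversion, the Python below divides each class frequency by the total number of observations. The Applewood frequencies come from the table in this section. The second list is a hypothetical dealership with far fewer sales, invented purely for illustration, and the function name is ours.

```python
def relative_frequencies(frequencies):
    """Return each class frequency as a fraction of the total."""
    total = sum(frequencies)
    return [f / total for f in frequencies]

# Class frequencies for Applewood (180 vehicles), from the profit table.
applewood = [8, 11, 23, 38, 45, 32, 19, 4]

# HYPOTHETICAL dealership with only 45 vehicles, invented for illustration.
small_dealer = [2, 3, 6, 9, 12, 8, 4, 1]

applewood_rel = relative_frequencies(applewood)
small_dealer_rel = relative_frequencies(small_dealer)

midpoints = [400, 800, 1200, 1600, 2000, 2400, 2800, 3200]
for mid, a, s in zip(midpoints, applewood_rel, small_dealer_rel):
    print(f"midpoint ${mid:>5,}   Applewood {a:6.1%}   hypothetical dealer {s:6.1%}")
```

Because each list of relative frequencies sums to 1, the two polygons can be drawn on the same vertical scale even though the dealerships sold very different numbers of vehicles.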

S E L F - R E V I E W 2–4

The annual imports of a selected group of electronic suppliers are shown in the following
frequency distribution.

Imports ($ millions)    Number of Suppliers

 2 up to  5                      6
 5 up to  8                     13
 8 up to 11                     20
11 up to 14                     10
14 up to 17                      1

(a) Portray the imports as a histogram.
(b) Portray the imports as a relative frequency polygon.
(c) Summarize the important facets of the distribution (such as classes with the highest
and lowest frequencies).

E X E R C I S E S

15. Molly’s Candle Shop has several retail stores in the coastal areas of North and
South Carolina. Many of Molly’s customers ask her to ship their purchases. The following
chart shows the number of packages shipped per day for the last 100 days. For example,
the first class shows that there were 5 days when the number of packages shipped was
0 up to 5.

[Chart: X-axis, Number of Packages (0 to 35 in classes of 5); Y-axis, Frequency (0 to 30); class frequencies 5, 13, 28, 23, 18, 10, 3]

a. What is this chart called?
b. What is the total number of packages shipped?
c. What is the class interval?
d. What is the number of packages shipped in the 10 up to 15 class?
e. What is the relative frequency of packages shipped in the 10 up to 15 class?
f. What is the midpoint of the 10 up to 15 class?
g. On how many days were there 25 or more packages shipped?

16. The following chart shows the number of patients admitted daily to Memorial Hospital
through the emergency room.

[Chart: X-axis, Number of Patients (ticks at 2, 4, 6, 8, 10, 12); Y-axis, Frequency (0 to 30 in steps of 10)]

a. What is the midpoint of the 2 up to 4 class?
b. How many days were 2 up to 4 patients admitted?
c. What is the class interval?
d. What is this chart called?

17. The following frequency distribution reports the number of frequent flier miles,
reported in thousands, for employees of Brumley Statistical Consulting Inc. during
the most recent quarter.

Frequent Flier Miles (000)    Number of Employees

 0 up to  3                            5
 3 up to  6                           12
 6 up to  9                           23
 9 up to 12                            8
12 up to 15                            2
Total                                 50

a. How many employees were studied?
b. What is the midpoint of the first class?
c. Construct a histogram.
d. A frequency polygon is to be drawn. What are the coordinates of the plot for the
first class?
e. Construct a frequency polygon.
f. Interpret the frequent flier miles accumulated using the two charts.

18. A large Internet retailer is studying the lead time (elapsed time between when an
order is placed and when it is filled) for a sample of recent orders. The lead times
are reported in days.

Lead Time (days)    Frequency

 0 up to  5              6
 5 up to 10              7
10 up to 15             12
15 up to 20              8
20 up to 25              7
Total                   40

a. How many orders were studied?
b. What is the midpoint of the first class?
c. What are the coordinates of the first class for a frequency polygon?
d. Draw a histogram.
e. Draw a frequency polygon.
f. Interpret the lead times using the two charts.

Cumulative Distributions
Consider once again the distribution of the profits on vehicles sold by the Applewood
Auto Group. Suppose we were interested in the number of vehicles that sold for a profit of
less than $1,400. These values can be approximated by developing a cumulative
frequency distribution and portraying it graphically in a cumulative frequency polygon.
Or, suppose we were interested in the profit earned on the lowest-selling 40% of the
vehicles. These values can be approximated by developing a cumulative relative frequency
distribution and portraying it graphically in a cumulative relative frequency polygon.

E X A M P L E

The frequency distribution of the profits earned at Applewood Auto Group is
repeated from Table 2–5.

Profit Frequency

$ 200 up to $ 600 8
600 up to 1,000 11
1,000 up to 1,400 23
1,400 up to 1,800 38
1,800 up to 2,200 45
2,200 up to 2,600 32
2,600 up to 3,000 19
3,000 up to 3,400 4

Total 180


Construct a cumulative frequency polygon to answer the following question: sixty
of the vehicles earned a profit of less than what amount? Construct a cumulative
relative frequency polygon to answer this question: seventy-five percent of the
vehicles sold earned a profit of less than what amount?

S O L U T I O N

As the names imply, a cumulative frequency distribution and a cumulative frequency
polygon require cumulative frequencies. To construct a cumulative frequency distribution,
refer to the preceding table and note that there were eight vehicles in which the profit
earned was less than $600. Those 8 vehicles, plus the 11 in the next higher class, for a
total of 19, earned a profit of less than $1,000. The cumulative frequency for the next
higher class is 42, found by 8 + 11 + 23. This process is continued for all the classes.
All the vehicles earned a profit of less than $3,400. (See Table 2–8.)

TABLE 2–8 Cumulative Frequency Distribution for Profit on Vehicles Sold Last Month at Applewood Auto Group

Profit               Cumulative Frequency   Found by

Less than $  600              8             8
Less than  1,000             19             8 + 11
Less than  1,400             42             8 + 11 + 23
Less than  1,800             80             8 + 11 + 23 + 38
Less than  2,200            125             8 + 11 + 23 + 38 + 45
Less than  2,600            157             8 + 11 + 23 + 38 + 45 + 32
Less than  3,000            176             8 + 11 + 23 + 38 + 45 + 32 + 19
Less than  3,400            180             8 + 11 + 23 + 38 + 45 + 32 + 19 + 4

TABLE 2–9 Cumulative Relative Frequency Distribution for Profit on Vehicles Sold Last Month at Applewood Auto Group

Profit               Cumulative Frequency   Cumulative Relative Frequency

Less than $  600              8               8/180 = 0.044 =   4.4%
Less than  1,000             19              19/180 = 0.106 =  10.6%
Less than  1,400             42              42/180 = 0.233 =  23.3%
Less than  1,800             80              80/180 = 0.444 =  44.4%
Less than  2,200            125             125/180 = 0.694 =  69.4%
Less than  2,600            157             157/180 = 0.872 =  87.2%
Less than  3,000            176             176/180 = 0.978 =  97.8%
Less than  3,400            180             180/180 = 1.000 = 100%

To construct a cumulative relative frequency distribution, we divide the cumulative
frequencies by the total number of observations, 180. As shown in Table 2–9, the
cumulative relative frequency of the fourth class is 80/180 = 0.444, or about 44%. This
means that about 44% of the vehicles sold for a profit of less than $1,800.
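The running totals in Tables 2–8 and 2–9 can be reproduced with a short Python sketch. The variable names are ours, and the frequencies are those of the Applewood profit distribution.

```python
from itertools import accumulate

# Upper class limits and class frequencies for the Applewood profits.
upper_limits = [600, 1000, 1400, 1800, 2200, 2600, 3000, 3400]
frequencies = [8, 11, 23, 38, 45, 32, 19, 4]

# Cumulative frequency: running total of the class frequencies.
cumulative = list(accumulate(frequencies))

# Cumulative relative frequency: running total divided by all observations.
total = cumulative[-1]
cumulative_relative = [c / total for c in cumulative]

for limit, cum, rel in zip(upper_limits, cumulative, cumulative_relative):
    print(f"Less than ${limit:>5,}   {cum:>3}   {rel:.3f} = {rel:.1%}")
```

Each printed row corresponds to one row of Table 2–9. The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.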

To plot a cumulative frequency distribution, scale the upper limit of each class along
the X-axis and the corresponding cumulative frequencies along the Y-axis. To provide
additional information, you can label the vertical axis on the right in terms of
cumulative relative frequencies. In the Applewood Auto Group, the vertical axis on the
left is labeled from 0 to 180 and on the right from 0 to 100%. Note, as an example, that
50% on the right axis should be opposite 90 vehicles on the left axis.

To begin, the first plot is at x = 200 and y = 0. None of the vehicles sold for a
profit of less than $200. The profit on 8 vehicles was less than $600, so the next
plot is at x = 600 and y = 8. Continuing, the next plot is x = 1,000 and y = 19. There
were 19 vehicles that sold for a profit of less than $1,000. The rest of the points are
plotted and then the dots connected to form Chart 2–7.

We should point out that the shape of the distribution is the same if we use cumulative
relative frequencies instead of the cumulative frequencies. The only difference is that
the vertical axis is scaled in percentages. In the following charts, a percentage scale
is added to the right side of the graphs to help answer questions about cumulative
relative frequencies.

[Cumulative frequency polygon: X-axis, Profit $ (200 to 3,400 in steps of 400); left Y-axis, Number of Vehicles Sold (0 to 180 in steps of 20); right Y-axis, Percent of Vehicles Sold (0 to 100 in steps of 25)]

CHART 2–7 Cumulative Frequency Polygon for Profit on Vehicles Sold Last
Month at Applewood Auto Group

To use Chart 2–7 to find the amount of profit on 75% of the cars sold, draw a horizontal
line from the 75% mark on the right-hand vertical axis over to the polygon, then drop
down to the X-axis and read the amount of profit. The value on the X-axis is about
$2,300, so we estimate that 75% of the vehicles sold earned a profit of $2,300 or less
for the Applewood group.

To find the highest profit earned on 60 of the 180 vehicles, we use Chart 2–7 to locate
the value of 60 on the left-hand vertical axis. Next, we draw a horizontal line from the
value of 60 to the polygon and then drop down to the X-axis and read the profit. It is
about $1,600, so we estimate that 60 of the vehicles sold for a profit of less than
$1,600. We can also make estimates of the percentage of vehicles that sold for less than
a particular amount. To explain, suppose we want to estimate the percentage of vehicles
that sold for a profit of less than $2,000. We begin by locating the value of $2,000 on
the X-axis, move vertically to the polygon, and then horizontally to the vertical axis on
the right. The value is about 56%, so we conclude that about 56% of the vehicles sold for
a profit of less than $2,000.
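Reading a value off the polygon amounts to linear interpolation along one of its straight segments. The Python sketch below does the three readings numerically. It is an illustration only: the helper function `interpolate` is our own, and the polygon vertices are the class limits and cumulative frequencies from Table 2–8, with the starting point (200, 0) added.

```python
# Vertices of the cumulative frequency polygon: (profit, cumulative frequency).
profits = [200, 600, 1000, 1400, 1800, 2200, 2600, 3000, 3400]
cum_freq = [0, 8, 19, 42, 80, 125, 157, 176, 180]
total = cum_freq[-1]

def interpolate(x, xs, ys):
    """Linearly interpolate y at x along the segments joining (xs[i], ys[i])."""
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x lies outside the plotted range")

# Profit below which 75% of the vehicles fall (horizontal axis from vertical).
profit_75pct = interpolate(0.75 * total, cum_freq, profits)

# Profit below which 60 of the vehicles fall.
profit_60 = interpolate(60, cum_freq, profits)

# Percentage of vehicles with profit below $2,000 (vertical axis from horizontal).
pct_below_2000 = 100 * interpolate(2000, profits, cum_freq) / total

print(f"75% of vehicles earned less than about ${profit_75pct:,.0f}")
print(f"60 vehicles earned less than about ${profit_60:,.0f}")
print(f"About {pct_below_2000:.1f}% of vehicles earned less than $2,000")
```

The computed values are close to the visual estimates in the text. Small differences are expected, because a reading from a chart is only approximate.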

S E L F - R E V I E W 2–5

A sample of the hourly wages of 15 employees at Home Depot in Brunswick, Georgia, was
organized into the following table.

Hourly Wages      Number of Employees

$ 8 up to $10              3
 10 up to  12              7
 12 up to  14              4
 14 up to  16              1

(a) What is the table called?
(b) Develop a cumulative frequency distribution and portray the distribution in a
cumulative frequency polygon.
(c) On the basis of the cumulative frequency polygon, how many employees earn less
than $11 per hour?

E X E R C I S E S

19. The cumulative frequency and the cumulative relative frequency polygon for the
distribution of hourly wages of a sample of certified welders in the Atlanta, Georgia,
area is shown in the graph.

[Chart: X-axis, Hourly Wage (0 to 30 in steps of 5); left Y-axis, Frequency (ticks at 10, 20, 30, 40); right Y-axis, Percent (ticks at 25, 50, 75, 100)]

a. How many welders were studied?
b. What is the class interval?
c. About how many welders earn less than $10.00 per hour?
d. About 75% of the welders make less than what amount?
e. Ten of the welders studied made less than what amount?
f. What percent of the welders make less than $20.00 per hour?

20. The cumulative frequency and the cumulative relative frequency polygon for a
distribution of selling prices ($000) of houses sold in the Billings, Montana, area is
shown in the graph.

[Chart: X-axis, Selling Price ($000) (0 to 350 in steps of 50); left Y-axis, Frequency (ticks at 50, 100, 150, 200); right Y-axis, Percent (ticks at 25, 50, 75, 100)]

a. How many homes were studied?
b. What is the class interval?
c. One hundred homes sold for less than what amount?
d. About 75% of the homes sold for less than what amount?
e. Estimate the number of homes in the $150,000 up to $200,000 class.
f. About how many homes sold for less than $225,000?

21. The frequency distribution representing the number of frequent flier miles accumulated
by employees at Brumley Statistical Consulting Inc. is repeated from Exercise 17.

Frequent Flier Miles
(000) Frequency

0 up to 3 5
3 up to 6 12
6 up to 9 23
9 up to 12 8
12 up to 15 2

Total 50

a. How many employees accumulated less than 3,000 miles?
b. Convert the frequency distribution to a cumulative frequency distribution.
c. Portray the cumulative distribution in the form of a cumulative frequency polygon.
d. Based on the cumulative relative frequencies, about 75% of the employees
accumulated how many miles or less?

22. The frequency distribution of order lead time of the retailer from Exercise 18 is
repeated below.

Lead Time (days) Frequency

0 up to 5 6
5 up to 10 7
10 up to 15 12
15 up to 20 8
20 up to 25 7

Total 40

a. How many orders were filled in less than 10 days? In less than 15 days?
b. Convert the frequency distribution to cumulative frequency and cumulative relative
frequency distributions.
c. Develop a cumulative frequency polygon.
d. About 60% of the orders were filled in less than how many days?

C H A P T E R S U M M A R Y

I. A frequency table is a grouping of qualitative data into mutually exclusive and
collectively exhaustive classes showing the number of observations in each class.
II. A relative frequency table shows the fraction of the observations in each class.
III. A bar chart is a graphic representation of a frequency table.
IV. A pie chart shows the proportion each distinct class represents of the total number of
observations.
V. A frequency distribution is a grouping of data into mutually exclusive and collectively
exhaustive classes showing the number of observations in each class.
   A. The steps in constructing a frequency distribution are
      1. Decide on the number of classes.
      2. Determine the class interval.
      3. Set the individual class limits.
      4. Tally the raw data into classes and determine the frequency in each class.
   B. The class frequency is the number of observations in each class.
   C. The class interval is the difference between the lower limits of two consecutive
      classes.
   D. The class midpoint is halfway between the lower limits of two consecutive classes.
VI. A relative frequency distribution shows the percent of observations in each class.
VII. There are several methods for graphically portraying a frequency distribution.
   A. A histogram portrays the frequencies in the form of a rectangle or bar for each class.
      The height of the rectangles is proportional to the class frequencies.
   B. A frequency polygon consists of line segments connecting the points formed by the
      intersection of the class midpoint and the class frequency.
   C. A graph of a cumulative frequency distribution shows the number of observations less
      than a given value.
   D. A graph of a cumulative relative frequency distribution shows the percent of
      observations less than a given value.
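The four construction steps listed under item V can be sketched in Python. This is a rough illustration under stated assumptions: step 1 uses the common "2 to the k" guideline (the smallest k with 2^k greater than the number of observations), the function names are ours, and the small data set is hypothetical, invented only to exercise the code.

```python
def number_of_classes(n):
    """Step 1: smallest k such that 2**k > n (the '2 to the k' guideline)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def minimum_class_interval(low, high, k):
    """Step 2: the interval must be at least (high - low) / k; round up in practice."""
    return (high - low) / k

def tally(data, lower_limit, interval, k):
    """Steps 3 and 4: classes run 'lower up to upper'; count observations in each."""
    counts = [0] * k
    for value in data:
        index = int((value - lower_limit) // interval)
        if not 0 <= index < k:
            raise ValueError(f"{value} falls outside the classes")
        counts[index] += 1
    return counts

# HYPOTHETICAL data, invented for illustration only.
sample = [3, 7, 12, 18, 19, 9, 14, 5, 11, 16]

k = number_of_classes(len(sample))                           # 4 classes for 10 observations
smallest = minimum_class_interval(min(sample), max(sample), k)  # 4.0, rounded up to 5 below
counts = tally(sample, lower_limit=0, interval=5, k=k)

for i, count in enumerate(counts):
    print(f"{i * 5:>2} up to {(i + 1) * 5:>2}: {count}")
```

With 180 observations the same guideline gives 8 classes, which agrees with the eight classes in the Applewood profit distribution. Choosing a convenient interval and lower limit (here 5 and 0) remains a judgment call, so the code takes them as inputs.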

C H A P T E R E X E R C I S E S

23. Describe the similarities and differences of qualitative and quantitative variables. Be
sure to include the following:
a. What level of measurement is required for each variable type?
b. Can both types be used to describe both samples and populations?

24. Describe the similarities and differences between a frequency table and a frequency
distribution. Be sure to include which requires qualitative data and which requires quan-
titative data.

25. Alexandra Damonte will be building a new resort in Myrtle Beach, South Carolina. She
must decide how to design the resort based on the type of activities that the resort will
offer to its customers. A recent poll of 300 potential customers showed the following
results about customers’ preferences for planned resort activities:

Like planned activities 63
Do not like planned activities 135
Not sure 78
No answer 24

a. What is the table called?
b. Draw a bar chart to portray the survey results.
c. Draw a pie chart for the survey results.
d. If you are preparing to present the results to Ms. Damonte as part of a report, which
graph would you prefer to show? Why?

26. Speedy Swift is a package delivery service that serves the greater Atlanta, Georgia,
metropolitan area. To maintain customer loyalty, one of Speedy Swift’s performance
objectives is on-time delivery. To monitor its performance, each delivery is measured on
the following scale: early (package delivered before the promised time), on-time (package
delivered within 5 minutes of the promised time), late (package delivered more than
5 minutes past the promised time), or lost (package never delivered). Speedy Swift’s
objective is to deliver 99% of all packages either early or on-time. Speedy collected the
following data for last month’s performance:

On-time On-time Early Late On-time On-time On-time On-time Late On-time
Early On-time On-time Early On-time On-time On-time On-time On-time On-time
Early On-time Early On-time On-time On-time Early On-time On-time On-time
Early On-time On-time Late Early Early On-time On-time On-time Early
On-time Late Late On-time On-time On-time On-time On-time On-time On-time
On-time Late Early On-time Early On-time Lost On-time On-time On-time
Early Early On-time On-time Late Early Lost On-time On-time On-time
On-time On-time Early On-time Early On-time Early On-time Late On-time
On-time Early On-time On-time On-time Late On-time Early On-time On-time
On-time On-time On-time On-time On-time Early Early On-time On-time On-time


a. What kind of variable is delivery performance? What scale is used to measure delivery
performance?

b. Construct a frequency table for delivery performance for last month.
c. Construct a relative frequency table for delivery performance last month.
d. Construct a bar chart of the frequency table for delivery performance for last month.
e. Construct a pie chart of on-time delivery performance for last month.
f. Write a memo reporting the results of the analyses. Include your tables and graphs with
written descriptions of what they show. Conclude with a general statement of last
month’s delivery performance as it relates to Speedy Swift’s performance objectives.

27. A data set consists of 83 observations. How many classes would you recommend for a
frequency distribution?

28. A data set consists of 145 observations that range from 56 to 490. What size class inter-
val would you recommend?

29. The following is the number of minutes to commute from home to work for a group
of 25 automobile executives.

28 25 48 37 41 19 32 26 16 23 23 29 36
31 26 21 32 25 31 43 35 42 38 33 28

a. How many classes would you recommend?
b. What class interval would you suggest?
c. What would you recommend as the lower limit of the first class?
d. Organize the data into a frequency distribution.
e. Comment on the shape of the frequency distribution.

30. The following data give the weekly amounts spent on groceries for a sample of 45
households.

$271 $363 $159 $ 76 $227 $337 $295 $319 $250
279 205 279 266 199 177 162 232 303
192 181 321 309 246 278 50 41 335
116 100 151 240 474 297 170 188 320
429 294 570 342 279 235 434 123 325

a. How many classes would you recommend?
b. What class interval would you suggest?
c. What would you recommend as the lower limit of the first class?
d. Organize the data into a frequency distribution.

31. A social scientist is studying the use of iPods by college students. A sample of 45
students revealed they played the following number of songs yesterday.

4 6 8 7 9 6 3 7 7 6 7 1 4 7 7
4 6 4 10 2 4 6 3 4 6 8 4 3 3 6
8 8 4 6 4 6 5 5 9 6 8 8 6 5 10

Organize the information into a frequency distribution.
a. How many classes would you suggest?
b. What is the most suitable class interval?
c. What is the lower limit of the initial class?
d. Create the frequency distribution.
e. Describe the shape of the distribution.

32. David Wise handles his own investment portfolio, and has done so for many years.
Listed below is the holding time (recorded to the nearest whole year) between purchase
and sale for his collection of 36 stocks.

8 8 6 11 11 9 8 5 11 4 8 5 14 7 12 8 6 11 9 7
9 15 8 8 12 5 9 8 5 9 10 11 3 9 8 6

a. How many classes would you propose?
b. What class interval would you suggest?
c. What quantity would you use for the lower limit of the initial class?

d. Using your responses to parts (a), (b), and (c), create a frequency distribution.
e. Describe the shape of the frequency distribution.

33. You are exploring the music in your iTunes library. The total play counts over the past
year for the 27 songs on your “smart playlist” are shown below. Make a frequency distribu-
tion of the counts and describe its shape. It is often claimed that a small fraction of a person’s
songs will account for most of their total plays. Does this seem to be the case here?

128 56 54 91 190 23 160 298 445 50
578 494 37 677 18 74 70 868 108 71
466 23 84 38 26 814 17

34. The monthly issues of the Journal of Finance are available on the Internet. The
table below shows the number of times an issue was downloaded over the last
33 months. Suppose that you wish to summarize the number of downloads with a
frequency distribution.

312 2,753 2,595 6,057 7,624 6,624 6,362 6,575 7,760 7,085 7,272
5,967 5,256 6,160 6,238 6,709 7,193 5,631 6,490 6,682 7,829 7,091
6,871 6,230 7,253 5,507 5,676 6,974 6,915 4,999 5,689 6,143 7,086

a. How many classes would you propose?
b. What class interval would you suggest?
c. What quantity would you use for the lower limit of the initial class?
d. Using your responses to parts (a), (b), and (c), create a frequency distribution.
e. Describe the shape of the frequency distribution.

35. The following histogram shows the scores on the first exam for a statistics class.

[Histogram: X-axis, Score (50 to 100 in classes of 10); Y-axis, Frequency (0 to 25 in steps of 5); bar heights 3, 14, 21, 12, 6]

a. How many students took the exam?
b. What is the class interval?
c. What is the class midpoint for the first class?
d. How many students earned a score of less than 70?

36. The following chart summarizes the selling price of homes sold last month in the
Sarasota, Florida, area.

[Chart: X-axis, Selling Price ($000) (0 to 350 in steps of 50); left Y-axis, Frequency (ticks at 50, 100, 150, 200, 250); right Y-axis, Percent (ticks at 25, 50, 75, 100)]

a. What is the chart called?
b. How many homes were sold during the last month?
c. What is the class interval?
d. About 75% of the houses sold for less than what amount?
e. One hundred seventy-five of the homes sold for less than what amount?


37. A chain of sport shops catering to beginning skiers, headquartered in Aspen,
Colorado, plans to conduct a study of how much a beginning skier spends on his or her
initial purchase of equipment and supplies. Based on these figures, it wants to explore
the possibility of offering combinations, such as a pair of boots and a pair of skis, to
induce customers to buy more. A sample of 44 cash register receipts revealed these
initial purchases:

$140 $ 82 $265 $168 $ 90 $114 $172 $230 $142
86 125 235 212 171 149 156 162 118
139 149 132 105 162 126 216 195 127
161 135 172 220 229 129 87 128 126
175 127 149 126 121 118 172 126

a. Arrive at a suggested class interval.
b. Organize the data into a frequency distribution using a lower limit of $70.
c. Interpret your findings.

38. The numbers of outstanding shares for 24 publicly traded companies are listed in
the following table.

Company                          Number of Outstanding Shares (millions)

Southwest Airlines                   738
FirstEnergy                          418
Harley Davidson                      226
Entergy                              178
Chevron                            1,957
Pacific Gas and Electric             430
DuPont                               932
Westinghouse                          22
Eversource                           314
Facebook                           1,067
Google, Inc.                          64
Apple                                941
Costco                               436
Home Depot                         1,495
DTE Energy                           172
Dow Chemical                       1,199
Eastman Kodak                        272
American Electric Power              485
ITT Corporation                       93
Ameren                               243
Virginia Electric and Power          575
Public Service Electric & Gas        506
Consumers Energy                     265
Starbucks                            744

a. Using the number of outstanding shares, summarize the companies with a frequency
distribution.

b. Display the frequency distribution with a frequency polygon.
c. Create a cumulative frequency distribution of the outstanding shares.
d. Display the cumulative frequency distribution with a cumulative frequency polygon.
e. Based on the cumulative relative frequency distribution, 75% of the companies have
less than “what number” of outstanding shares?
f. Write a brief analysis of this group of companies based on your statistical summaries
of “number of outstanding shares.”

39. A recent survey showed that the typical American car owner spends $2,950 per year on
operating expenses. Below is a breakdown of the various expenditure items. Draw an
appropriate chart to portray the data and summarize your findings in a brief report.

Expenditure Item Amount

Fuel $ 603
Interest on car loan 279
Repairs 930
Insurance and license 646
Depreciation 492

Total $2,950


40. Midland National Bank selected a sample of 40 student checking accounts. Below
are their end-of-the-month balances.

$404 $ 74 $234 $149 $279 $215 $123 $ 55 $ 43 $321
87 234 68 489 57 185 141 758 72 863

703 125 350 440 37 252 27 521 302 127
968 712 503 489 327 608 358 425 303 203

a. Tally the data into a frequency distribution using $100 as a class interval and $0 as
the starting point.

b. Draw a cumulative frequency polygon.
c. The bank considers any student with an ending balance of $400 or more a “preferred
customer.” Estimate the percentage of preferred customers.
d. The bank is also considering a service charge to the lowest 10% of the ending balances.
What would you recommend as the cutoff point between those who have to pay a service
charge and those who do not?

41. Residents of the state of South Carolina earned a total of $69.5 billion in adjusted gross
income. Seventy-three percent of the total was in wages and salaries; 11% in dividends,
interest, and capital gains; 8% in IRAs and taxable pensions; 3% in business income
pensions; 2% in Social Security; and the remaining 3% from other sources. Develop a
pie chart depicting the breakdown of adjusted gross income. Write a paragraph summa-
rizing the information.

42. A recent study of home technologies reported the number of hours of personal
computer usage per week for a sample of 60 persons. Excluded from the study were
people who worked out of their home and used the computer as a part of their work.

9.3 5.3 6.3 8.8 6.5 0.6 5.2 6.6 9.3 4.3
6.3 2.1 2.7 0.4 3.7 3.3 1.1 2.7 6.7 6.5
4.3 9.7 7.7 5.2 1.7 8.5 4.2 5.5 5.1 5.6
5.4 4.8 2.1 10.1 1.3 5.6 2.4 2.4 4.7 1.7
2.0 6.7 1.1 6.7 2.2 2.6 9.8 6.4 4.9 5.2
4.5 9.3 7.9 4.6 4.3 4.5 9.2 8.5 6.0 8.1

a. Organize the data into a frequency distribution. How many classes would you sug-
gest? What value would you suggest for a class interval?

b. Draw a histogram. Describe your result.

43. Merrill Lynch recently completed a study regarding the size of online investment
portfolios (stocks, bonds, mutual funds, and certificates of deposit) for a sample of
clients in the 40 up to 50 years old age group. Listed following is the value of all the
investments in thousands of dollars for the 70 participants in the study.

$669.9 $ 7.5 $ 77.2 $ 7.5 $125.7 $516.9 $ 219.9 $645.2
301.9 235.4 716.4 145.3 26.6 187.2 315.5 89.2
136.4 616.9 440.6 408.2 34.4 296.1 185.4 526.3
380.7 3.3 363.2 51.9 52.2 107.5 82.9 63.0
228.6 308.7 126.7 430.3 82.0 227.0 321.1 403.4
39.5 124.3 118.1 23.9 352.8 156.7 276.3 23.5
31.3 301.2 35.7 154.9 174.3 100.6 236.7 171.9
221.1 43.4 212.3 243.3 315.4 5.9 1,002.2 171.7
295.7 437.0 87.8 302.1 268.1 899.5

a. Organize the data into a frequency distribution. How many classes would you sug-
gest? What value would you suggest for a class interval?

b. Draw a histogram. Financial experts suggest that this age group of people have at
least five times their salary saved. As a benchmark, assume an investment portfolio
of $500,000 would support retirement in 10–15 years. In writing, summarize your
results.


44. A total of 5.9% of the prime-time viewing audience watched shows on ABC, 7.6%
watched shows on CBS, 5.5% on Fox, 6.0% on NBC, 2.0% on Warner Brothers, and
2.2% on UPN. A total of 70.8% of the audience watched shows on other cable net-
works, such as CNN and ESPN. You can find the latest information on TV viewing from
the following website: http://www.nielsen.com/us/en/top10s.html/. Develop a pie
chart or a bar chart to depict this information. Write a paragraph summarizing your
findings.

45. Refer to the following chart:

Contact for Job Placement at Wake Forest University

[Chart slices: Networking and Connections, 70%; Job Posting Websites, 20%; On-Campus Recruiting, 10%]

a. What is the name given to this type of chart?
b. Suppose that 1,000 graduates will start a new job shortly after graduation. Estimate
the number of graduates whose first contact for employment occurred through networking
and other connections.

c. Would it be reasonable to conclude that about 90% of job placements were made
through networking, connections, and job posting websites? Cite evidence.

46. The following chart depicts the annual revenues, by type of tax, for the state of Georgia.

Annual Revenue State of Georgia

[Chart slices: Sales, 44.54%; Income, 43.34%; Corporate, 8.31%; License, 2.9%; Other, 0.9%]

a. What percentage of the state revenue is accounted for by sales tax and individual
income tax?

b. Which category will generate more revenue: corporate taxes or license fees?
c. The total annual revenue for the state of Georgia is $6.3 billion. Estimate the amount
of revenue in billions of dollars for sales taxes and for individual taxes.

47. In 2014, the United States exported a total of $376 billion worth of products to Canada.
The five largest categories were:

Product                  Amount ($ billions)

Vehicles                      $63.3
Machinery                      59.7
Electrical machinery           36.6
Mineral fuel and oil           24.8
Plastic                        17.0

a. Use a software package to develop a bar chart.
b. What percentage of the United States’ total exports to Canada is represented by the
two categories “Machinery” and “Electrical Machinery”?
c. What percentage of the top five exported products do “Machinery” and “Electrical
Machinery” represent?

48. In the United States, the industrial revolution of the early 20th century changed
farming by making it more efficient. For example, in 1910 U.S. farms used 24.2 million
horses and mules and only about 1,000 tractors. By 1960, 4.6 million tractors were
used and only 3.2 million horses and mules. An outcome of making farming more
efficient is the reduction of the number of farms from over 6 million in 1920 to about
2.2 million farms today. Listed below is the number of farms, in thousands, for each of
the 50 states. Summarize the data and write a paragraph that describes your findings.

50 12 5 28 59 19 35 22 80 5
8 48 3 75 25 77 46 68 10 69
77 25 13 20 35 6 52 61 36 38
88 1 75 246 59 50 44 98 74 2
32 42 7 31 28 9 8 44 25 37

49. One of the most popular candies in the United States is M&M’s produced by the Mars
Company. In the beginning M&M’s were all brown. Now they are produced in red, green,
blue, orange, brown, and yellow. Recently, the purchase of a 14-ounce bag of M&M’s
Plain had 444 candies with the following breakdown by color: 130 brown, 98 yellow,
96 red, 35 orange, 52 blue, and 33 green. Develop a chart depicting this information
and write a paragraph summarizing the results.

50. The number of families who used the Minneapolis YWCA day care service was
recorded during a 30-day period. The results are as follows:

31 49 19 62 24 45 23 51 55 60
40 35 54 26 57 37 43 65 18 41
50 56 4 54 39 52 35 51 63 42

a. Construct a cumulative frequency distribution.
b. Sketch a graph of the cumulative frequency polygon.
c. How many days saw fewer than 30 families utilize the day care center?
d. Based on cumulative relative frequencies, how busy were the highest 80% of the days?

DATA ANALYTICS

51. Refer to the North Valley Real Estate data that reports information on homes sold
during the last year. For the variable price, select an appropriate class interval and orga-
nize the selling prices into a frequency distribution. Write a brief report summarizing
your findings. Be sure to answer the following questions in your report.
a. Around what values of price do the data tend to cluster?
b. Based on the frequency distribution, what is the typical selling price in the first class?
What is the typical selling price in the last class?


c. Draw a cumulative relative frequency distribution. Using this distribution, fifty
percent of the homes sold for what price or less? Estimate the lower price of the
top ten percent of homes sold. About what percent of the homes sold for less than
$300,000?

d. Refer to the variable bedrooms. Draw a bar chart showing the number of homes sold
with 2, 3, 4 or more bedrooms. Write a description of the distribution.

52. Refer to the Baseball 2016 data that report information on the 30 Major League
Baseball teams for the 2016 season. Create a frequency distribution for the Team Salary
variable and answer the following questions.
a. What is the typical salary for a team? What is the range of the salaries?
b. Comment on the shape of the distribution. Does it appear that any of the teams have
a salary that is out of line with the others?
c. Draw a cumulative relative frequency distribution of team salary. Using this
distribution, forty percent of the teams have a salary of less than what amount? About how
many teams have a total salary of more than $220 million?

53. Refer to the Lincolnville School District bus data. Select the variable referring to
the number of miles traveled since the last maintenance, and then organize these data
into a frequency distribution.
a. What is a typical amount of miles traveled? What is the range?
b. Comment on the shape of the distribution. Are there any outliers in terms of miles
driven?
c. Draw a cumulative relative frequency distribution. Forty percent of the buses
were driven fewer than how many miles? How many buses were driven less than
10,500 miles?

d. Refer to the variables regarding the bus manufacturer and the bus capacity. Draw a
pie chart of each variable and write a description of your results.

CHAPTER 3
Describing Data: NUMERICAL MEASURES

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO3-1 Compute and interpret the mean, the median, and the mode.

LO3-2 Compute a weighted mean.

LO3-3 Compute and interpret the geometric mean.

LO3-4 Compute and interpret the range, variance, and standard deviation.

LO3-5 Explain and apply Chebyshev’s theorem and the Empirical Rule.

LO3-6 Compute the mean and standard deviation of grouped data.

THE KENTUCKY DERBY is held the first Saturday in May at Churchill Downs in Louisville,
Kentucky. The race track is one and one-quarter miles. The table in Exercise 82 shows the
winners since 1990, their margin of victory, the winning time, and the payoff on a $2 bet.
Determine the mean and median for the variables winning time and payoff on a $2 bet.
(See Exercise 82 and LO3-1.)



INTRODUCTION
Chapter 2 began our study of descriptive statistics. To summarize raw data into a mean-
ingful form, we organized qualitative data into a frequency table and portrayed the re-
sults in a bar chart. In a similar fashion, we organized quantitative data into a frequency
distribution and portrayed the results in a histogram. We also looked at other graphical
techniques such as pie charts to portray qualitative data and frequency polygons to
portray quantitative data.

This chapter is concerned with two numerical ways of describing quantitative vari-
ables, namely, measures of location and measures of dispersion. Measures of location
are often referred to as averages. The purpose of a measure of location is to pinpoint
the center of a distribution of data. An average is a
measure of location that shows the central value
of the data. Averages appear daily on TV, on vari-
ous websites, in the newspaper, and in other jour-
nals. Here are some examples:

• The average U.S. home changes ownership
every 11.8 years.

• An American receives an average of 568
pieces of mail per year.

• The average American home has more TV
sets than people. There are 2.73 TV sets and
2.55 people in the typical home.

• The average American couple spends
$20,398 for their wedding, while their budget
is 50% less. This does not include the cost of
a honeymoon or engagement ring.

• The average price of a theater ticket in the
United States is $8.31, according to the
National Association of Theater Owners.

If we consider only measures of location in a set of data, or if we compare sev-
eral sets of data using central values, we may draw an erroneous conclusion. In
addition to measures of location, we should consider the dispersion—often called
the variation or the spread—in the data. As an illustration, suppose the average
annual income of executives for Internet-related companies is $80,000, and the
average income for executives in pharmaceutical firms is also $80,000. If we
looked only at the average incomes, we might wrongly conclude that the distribu-
tions of the two salaries are the same. However, we need to examine the disper-
sion or spread of the distributions of salary. A look at the salary ranges indicates
that this conclusion of equal distributions is not correct. The salaries for the execu-
tives in the Internet firms range from $70,000 to $90,000, but salaries for the mar-
keting executives in pharmaceuticals range from $40,000 to $120,000. Thus, we
conclude that although the average salaries are the same for the two industries,
there is much more spread or dispersion in salaries for the pharmaceutical execu-
tives. To describe the dispersion, we will consider the range, the variance, and the
standard deviation.

MEASURES OF LOCATION
We begin by discussing measures of location. There is not just one measure of location;
in fact, there are many. We will consider five: the arithmetic mean, the median, the mode,
the weighted mean, and the geometric mean. The arithmetic mean is the most widely
used and widely reported measure of location. We study the mean as both a population
parameter and a sample statistic.



STATISTICS IN ACTION

Did you ever meet the “average” American man? Well, his name is Robert (that is the
nominal level of measurement) and he is 31 years old (that is the ratio level), is
69.5 inches tall (again the ratio level of measurement), weighs 172 pounds, wears a size
9½ shoe, has a 34-inch waist, and wears a size 40 suit. In addition, the average man eats
4 pounds of potato chips, watches 1,456 hours of TV, and eats 26 pounds of bananas each
year, and also sleeps 7.7 hours per night.

The average American woman is 5′ 4″ tall and weighs 140 pounds, while the average
American model is 5′ 11″ tall and weighs 117 pounds. On any given day, almost half of the
women in the United States are on a diet. Idolized in the 1950s, Marilyn Monroe would be
considered overweight by today’s standards. She fluctuated between a size 14 and a
size 18 dress, and was a healthy and attractive woman.


The Population Mean
Many studies involve all the values in a population. For example, there are 12 sales as-
sociates employed at the Reynolds Road Carpet Outlet. The mean amount of commis-
sion they earned last month was $1,345. This is a population value because we
considered the commission of all the sales associates. Other examples of a population
mean would be:

• The mean closing price for Johnson & Johnson stock for the last 5 days is $95.47.
• The mean number of hours of overtime worked last week by the six welders in the
welding department of Butts Welding Inc. is 6.45 hours.
• Caryn Tirsch began a website last month devoted to organic gardening. The mean
number of hits on her site for the 31 days in July was 84.36.

For raw data—that is, data that have not been grouped in a frequency distribution—
the population mean is the sum of all the values in the population divided by the num-
ber of values in the population. To find the population mean, we use the following
formula.

$$\text{Population mean} = \frac{\text{Sum of all the values in the population}}{\text{Number of values in the population}}$$

Instead of writing out in words the full directions for computing the population mean
(or any other measure), it is more convenient to use the shorthand symbols of mathe-
matics. The mean of the population using mathematical symbols is:

POPULATION MEAN  $\mu = \dfrac{\Sigma x}{N}$  (3–1)

where:
μ represents the population mean. It is the Greek lowercase letter “mu.”
N is the number of values in the population.
x represents any particular value.
Σ is the Greek capital letter “sigma” and indicates the operation of adding.
Σx is the sum of the x values in the population.

Any measurable characteristic of a population is called a parameter. The mean of a
population is an example of a parameter.

PARAMETER A characteristic of a population.

EXAMPLE

There are 42 exits on I-75 through the state of Kentucky. Listed below are the
distances between exits (in miles).

11 4 10 4 9 3 8 10 3 14 1 10 3 5
2 2 5 6 1 2 2 3 7 1 3 7 8 10
1 4 7 5 2 2 5 1 1 3 3 1 2 1


Why is this information a population? What is the mean number of miles between
exits?

SOLUTION

This is a population because we are considering all the exits on I-75 in Kentucky.
We add the distances between each of the 42 exits. The total distance is 192 miles.
To find the arithmetic mean, we divide this total by 42. So the arithmetic mean is
4.57 miles, found by 192/42. From formula (3–1):

$$\mu = \frac{\Sigma x}{N} = \frac{11 + 4 + 10 + \cdots + 1}{42} = \frac{192}{42} = 4.57$$

How do we interpret the value of 4.57? It is the typical number of miles between
exits. Because we considered all the exits on I-75 in Kentucky, this value is a popu-
lation parameter.
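
The arithmetic in this solution can be reproduced with a short Python sketch. The function name `population_mean` is an illustrative choice, not something defined in the text; the data are the 42 distances listed in the example.

```python
# Distances (in miles) between the 42 exits on I-75 in Kentucky, from the example.
distances = [
    11, 4, 10, 4, 9, 3, 8, 10, 3, 14, 1, 10, 3, 5,
    2, 2, 5, 6, 1, 2, 2, 3, 7, 1, 3, 7, 8, 10,
    1, 4, 7, 5, 2, 2, 5, 1, 1, 3, 3, 1, 2, 1,
]


def population_mean(values):
    """Formula (3-1): the sum of all values divided by the number of values N."""
    return sum(values) / len(values)


N = len(distances)        # number of values in the population
total = sum(distances)    # sum of the x values
mu = population_mean(distances)

print(f"N = {N}, sum = {total}, mu = {mu:.2f}")
```

Because every exit is included, the printed value is the parameter μ itself rather than an estimate of it.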

The Sample Mean
As explained in Chapter 1, we often select a sample from the population
to estimate a specific characteristic of the population. Smucker’s quality
assurance department needs to be assured that the amount of orange
marmalade in the jar labeled as containing 12 ounces actually contains
that amount. It would be very expensive and time-consuming to check
the weight of each jar. Therefore, a sample of 20 jars is selected, the
mean of the sample is determined, and that value is used to estimate the
amount in each jar.

For raw data—that is, ungrouped data—the mean is the sum of all
the sampled values divided by the total number of sampled values. To
find the mean for a sample:

$$\text{Sample mean} = \frac{\text{Sum of all the values in the sample}}{\text{Number of values in the sample}}$$

The mean of a sample and the mean of a population are computed in the same
way, but the shorthand notation used is different. The formula for the mean of a
sample is:

SAMPLE MEAN  $\bar{x} = \dfrac{\Sigma x}{n}$  (3–2)

where:
$\bar{x}$ represents the sample mean. It is read “x bar.”
n is the number of values in the sample.
x represents any particular value.
Σ is the Greek capital letter “sigma” and indicates the operation of adding.
Σx is the sum of the x values in the sample.

The mean of a sample, or any other measure based on sample data, is called a statistic.
If the mean weight of a sample of 10 jars of Smucker’s orange marmalade is 11.5
ounces, this is an example of a statistic.
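
Formula (3–2) translates directly into code. The sketch below is one possible way to do it, using the 12 monthly-minute values from the Verizon example in this section; the variable names are illustrative assumptions, and `statistics.mean` from the Python standard library is used only as a cross-check.

```python
from statistics import mean

# Monthly minutes used by a random sample of 12 Verizon clients (from the example).
minutes = [90, 77, 94, 89, 119, 112, 91, 110, 92, 100, 113, 83]

n = len(minutes)              # number of values in the sample
x_bar = sum(minutes) / n      # formula (3-2): sum of x divided by n

# The standard library computes the same quantity.
library_mean = mean(minutes)

print(f"n = {n}, sum = {sum(minutes)}, x-bar = {x_bar}")
```

The computation is identical to the population case; only the notation and the interpretation (a statistic rather than a parameter) differ.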

STATISTIC A characteristic of a sample.

EXAMPLE

Verizon is studying the number of monthly minutes used by clients in a particular
cell phone rate plan. A random sample of 12 clients showed the following number
of minutes used last month.

90 77 94 89 119 112
91 110 92 100 113 83

What is the arithmetic mean number of minutes used last month?

SOLUTION

Using formula (3–2), the sample mean is:

$$\text{Sample mean} = \frac{\text{Sum of all values in the sample}}{\text{Number of values in the sample}}$$

$$\bar{x} = \frac{\Sigma x}{n} = \frac{90 + 77 + \cdots + 83}{12} = \frac{1{,}170}{12} = 97.5$$

The arithmetic mean number of minutes used last month by the sample of cell
phone users is 97.5 minutes.

Properties of the Arithmetic Mean
The arithmetic mean is a widely used measure of location. It has several important
properties:

1. To compute a mean, the data must be measured at the interval or ratio level.
Recall from Chapter 1 that ratio-level data include such data as ages, incomes, and
weights, with the distance between numbers being constant.

2. All the values are included in computing the mean.

3. The mean is unique. That is, there is only one mean in a set of data. Later in the
chapter, we will discover a measure of location that might appear twice, or more
than twice, in a set of data.

4. The sum of the deviations of each value from the mean is zero. Expressed
symbolically:

$$\Sigma(x - \bar{x}) = 0$$

As an example, the mean of 3, 8, and 4 is 5. Then:

$$\Sigma(x - \bar{x}) = (3 - 5) + (8 - 5) + (4 - 5) = -2 + 3 - 1 = 0$$

Thus, we can consider the mean as a balance point for a set of data. To illustrate,
we have a long board with the numbers 1, 2, 3, . . . , 9 evenly spaced on it. Suppose
three bars of equal weight were placed on the board at numbers 3, 4, and 8, and the
balance point was set at 5, the mean of the three numbers. We would find that the
board is balanced perfectly! The deviations below the mean (−3) are equal to the
deviations above the mean (+3). Shown schematically:

[Diagram: a board numbered 1 through 9 balanced at $\bar{x}$ = 5, with bars at 3, 4, and 8 and deviations of −2, −1, and +3]
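
The balance-point property can be checked numerically. This is a minimal sketch using the values 3, 8, and 4 from the illustration; the variable names are assumptions made for the example.

```python
# Values from the balance-point illustration.
values = [3, 8, 4]

x_bar = sum(values) / len(values)            # the mean, 5.0
deviations = [x - x_bar for x in values]     # each value minus the mean

# Split the deviations into those below and those above the mean.
below = sum(d for d in deviations if d < 0)
above = sum(d for d in deviations if d > 0)

print("deviations:", deviations)
print("below the mean:", below, "above the mean:", above)
print("sum of deviations:", sum(deviations))
```

With data that are not whole numbers, floating-point rounding can leave a sum that is extremely close to zero rather than exactly zero, so a tolerance check such as `math.isclose` is the safer comparison in that case.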

The mean does have a weakness. Recall that the mean uses the value of every
item in a sample, or population, in its computation. If one or two of these values are
either extremely large or extremely small compared to the majority of data, the mean
might not be an appropriate average to represent the data. For example, suppose
the annual incomes of a sample of financial planners at Merrill Lynch are $62,900,
$61,600, $62,500, $60,800, and $1,200,000. The mean income is $289,560. Obvi-
ously, it is not representative of this group because all but one financial planner has
an income in the $60,000 to $63,000 range. One income ($1.2 million) is unduly
affecting the mean.
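
The effect of the single large income can be seen by computing the mean with and without it. This is an illustrative sketch using the five incomes quoted above; dropping the last value is done here only to show its influence, not as a recommended way to handle extreme values.

```python
# Annual incomes of the sample of financial planners, from the text.
incomes = [62_900, 61_600, 62_500, 60_800, 1_200_000]

mean_all = sum(incomes) / len(incomes)

# The same calculation after setting aside the $1.2 million income.
typical = incomes[:-1]
mean_without_extreme = sum(typical) / len(typical)

print(f"mean of all five incomes:       ${mean_all:,.0f}")
print(f"mean without the largest value: ${mean_without_extreme:,.0f}")
```

One value out of five moves the mean from roughly $62,000 to nearly $290,000, which is why the mean is not representative of this group.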

SELF-REVIEW 3–1

1. The annual incomes of a sample of middle-management employees at Westinghouse
are $62,900, $69,100, $58,300, and $76,800.
(a) Give the formula for the sample mean.
(b) Find the sample mean.
(c) Is the mean you computed in (b) a statistic or a parameter? Why?
(d) What is your best estimate of the population mean?
2. The six students in Computer Science 411 are a population. Their final course grades
are 92, 96, 61, 86, 79, and 84.
(a) Give the formula for the population mean.
(b) Compute the mean course grade.
(c) Is the mean you computed in (b) a statistic or a parameter? Why?

EXERCISES

The answers to the odd-numbered exercises are in Appendix D.

1. Compute the mean of the following population values: 6, 3, 5, 7, 6.
2. Compute the mean of the following population values: 7, 5, 7, 3, 7, 4.
3. a. Compute the mean of the following sample values: 5, 9, 4, 10.
b. Show that $\Sigma(x - \bar{x}) = 0$.
4. a. Compute the mean of the following sample values: 1.3, 7.0, 3.6, 4.1, 5.0.
b. Show that $\Sigma(x - \bar{x}) = 0$.
5. Compute the mean of the following sample values: 16.25, 12.91, 14.58.
6. Suppose you go to the grocery store and spend $61.85 for the purchase of 14
items. What is the mean price per item?

For Exercises 7–10, (a) compute the arithmetic mean and (b) indicate whether it is a
statistic or a parameter.

7. There are 10 salespeople employed by Midtown Ford. The number of new cars
sold last month by the respective salespeople were: 15, 23, 4, 19, 18, 10, 10, 8,
28, 19.

8. A mail-order company counted the number of incoming calls per day to the
company’s toll-free number during the first 7 days in May: 14, 24, 19, 31, 36, 26, 17.

9. The Cambridge Power and Light Company selected a random sample of 20
residential customers. Following are the amounts, to the nearest dollar, the
customers were charged for electrical service last month:

54 48 58 50 25 47 75 46 60 70
67 68 39 35 56 66 33 62 65 67

10. A Human Resources manager at Metal Technologies studied the overtime
hours of welders. A sample of 15 welders showed the following number of overtime
hours worked last month.

13 13 12 15 7 15 5 12
6 7 12 10 9 13 12

11. AAA Heating and Air Conditioning completed 30 jobs last month with a mean
revenue of $5,430 per job. The president wants to know the total revenue for the month.
Based on the limited information, can you compute the total revenue? What is it?

12. A large pharmaceutical company hires business administration graduates to sell its
products. The company is growing rapidly and dedicates only 1 day of sales
training for new salespeople. The company’s goal for new salespeople is $10,000 per
month. The goal is based on the current mean sales for the entire company, which
is $10,000 per month. After reviewing the retention rates of new employees, the
company finds that only 1 in 10 new employees stays longer than 3 months.
Comment on using the current mean sales per month as a sales goal for new
employees. Why do new employees leave the company?

The Median
We have stressed that, for data containing one or two very large or very small values,
the arithmetic mean may not be representative. The center for such data is better
described by a measure of location called the median.

To illustrate the need for a measure of location other than the arithmetic mean,
suppose you are seeking to buy a condominium in Palm Aire. Your real estate agent says
that the typical price of the units currently available is $110,000. Would you still want to
look? If you had budgeted your maximum purchase price at $75,000, you might think
they are out of your price range. However, checking the prices of the individual units
might change your mind. They are $60,000, $65,000, $70,000, and $80,000, and a
superdeluxe penthouse costs $275,000. The arithmetic mean price is $110,000, as the
real estate agent reported, but one price ($275,000) is pulling the arithmetic mean
upward, causing it to be an unrepresentative average. It does seem that a price around
$70,000 is a more typical or representative average, and it is. In cases such as this, the
median provides a more valid measure of location.

MEDIAN The midpoint of the values after they have been ordered from the
minimum to the maximum values.

The median price of the units available is $70,000. To determine this, we order the
prices from the minimum value ($60,000) to the maximum value ($275,000) and select
the middle value ($70,000). For the median, the data must be at least an ordinal level of
measurement.

Prices Ordered from                    Prices Ordered from
Minimum to Maximum                     Maximum to Minimum

$ 60,000                               $275,000
  65,000                                 80,000
  70,000        ← Median →               70,000
  80,000                                 65,000
 275,000                                 60,000

Note that there is the same number of prices below the median of $70,000 as
above it. The median is, therefore, unaffected by extremely low or high prices. Had the
highest price been $90,000, or $300,000, or even $1 million, the median price would
still be $70,000. Likewise, had the lowest price been $20,000 or $50,000, the median
price would still be $70,000.
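
The claim that the median ignores the size of the extreme price can be checked directly. The sketch below is an illustration only: it swaps the highest condominium price for the alternatives mentioned above and recomputes the median and, for contrast, the mean. The names `results` and `top_price` are assumptions made for the example.

```python
from statistics import mean, median

# Condominium prices from the Palm Aire illustration.
prices = [60_000, 65_000, 70_000, 80_000, 275_000]

# Replace the highest price with each alternative mentioned in the text
# and recompute both measures each time.
results = []
for top_price in (90_000, 275_000, 300_000, 1_000_000):
    trial = sorted(prices[:-1] + [top_price])
    results.append((top_price, median(trial), mean(trial)))

for top_price, med, avg in results:
    print(f"highest = {top_price:>9,}  median = {med:,}  mean = {avg:,.0f}")
```

The median column stays at 70,000 in every row, while the mean changes with each substitution.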

In the previous illustration, there are an odd number of observations (five). How is
the median determined for an even number of observations? As before, the observa-
tions are ordered. Then, by convention, to obtain a unique value, we calculate the mean
of the two middle observations. So for an even number of observations, the median may
not be one of the given values.

EXAMPLE

Facebook is a popular social networking website. Users can add friends and send
them messages, and update their personal profiles to notify friends about them-
selves and their activities. A sample of 10 adults revealed they spent the following
number of hours last month using Facebook.

3 5 7 5 9 1 3 9 17 10

Find the median number of hours.

SOLUTION

Note that the number of adults sampled is even (10). The first step, as before, is
to order the hours using Facebook from the minimum value to the maximum
value. Then identify the two middle times. The arithmetic mean of the two middle
observations gives us the median hours. Arranging the values from minimum to
maximum:

1 3 3 5 5 7 9 9 10 17

The median is found by averaging the two middle values. The middle values are
5 hours and 7 hours, and the mean of these two values is 6. We conclude that the
typical adult Facebook user spends 6 hours per month at the website. Notice that
the median is not one of the values. Also, half of the times are below the median
and half are above it.
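
The steps in this solution (order the values, find the middle, average the two middle values when the count is even) can be sketched as a small function. The function name `median_by_hand` is hypothetical; `statistics.median` from the Python standard library follows the same convention and is included as a cross-check.

```python
from statistics import median


def median_by_hand(values):
    """Order the values, then take the middle one (odd count)
    or the mean of the two middle ones (even count)."""
    ordered = sorted(values)
    count = len(ordered)
    middle = count // 2
    if count % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hours spent on Facebook last month by the sample of 10 adults.
hours = [3, 5, 7, 5, 9, 1, 3, 9, 17, 10]

ordered_hours = sorted(hours)
facebook_median = median_by_hand(hours)

print("ordered:", ordered_hours)
print("median:", facebook_median, "| library:", median(hours))
```

The same function also handles the odd-count case, for example the five condominium prices discussed earlier.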


The major properties of the median are:

1. It is not affected by extremely large or small values. Therefore, the median is a
valuable measure of location when such values do occur.

2. It can be computed for ordinal-level data or higher. Recall from Chapter 1 that
ordinal-level data can be ranked from low to high.

The Mode
The mode is another measure of location.

MODE The value of the observation that appears most frequently.

The mode is especially useful in summarizing nominal-level data. As an example of
its use for nominal-level data, a company has developed five bath oils. The bar chart in
Chart 3–1 shows the results of a marketing survey designed to find which bath oil con-
sumers prefer. The largest number of respondents favored Lamoure, as evidenced by
the highest bar. Thus, Lamoure is the mode.

[Bar chart: number of responses (vertical axis, 0 to 400) by bath oil (Amor, Lamoure, Soothing, Smell Nice, Far Out); the Lamoure bar is the tallest and is labeled as the mode]

CHART 3–1 Number of Respondents Favoring Various Bath Oils

EXAMPLE

Recall the data regarding the distance in miles between exits on I-75 in Kentucky.
The information is repeated below.

11 4 10 4 9 3 8 10 3 14 1 10 3 5
2 2 5 6 1 2 2 3 7 1 3 7 8 10
1 4 7 5 2 2 5 1 1 3 3 1 2 1

What is the modal distance?

SOLUTION

The first step is to organize the distances into a frequency table. This will help us
determine the distance that occurs most frequently.

Distance in Miles between Exits    Frequency

 1                                  8
 2                                  7
 3                                  7
 4                                  3
 5                                  4
 6                                  1
 7                                  3
 8                                  2
 9                                  1
10                                  4
11                                  1
14                                  1

Total                              42

The distance that occurs most often is 1 mile. This happens eight times; that is,
there are eight exits that are 1 mile apart. So the modal distance between exits
is 1 mile.

Which of the three measures of location (mean, median, or mode) best
represents the central location of these data? Is the mode the best measure of location
to represent the Kentucky data? No. The mode assumes only the nominal scale of
measurement and the variable miles is measured using the ratio scale. We
calculated the mean to be 4.57 miles. See page 54. Is the mean the best measure of
location to represent these data? Probably not. There are several cases in which
the distance between exits is large. These values are affecting the mean, making it
too large and not representative of the distances between exits. What about the
median? The median distance is 3 miles. That is, half of the distances between exits
are 3 miles or less. In this case, the median of 3 miles between exits is probably a
more representative measure of the distance between exits.

In summary, we can determine the mode for all levels of data: nominal, ordinal,
interval, and ratio. The mode also has the advantage of not being affected by extremely
high or low values.

The mode does have disadvantages, however, that cause it to be used less
frequently than the mean or median. For many sets of data, there is no mode because no
value appears more than once. For example, there is no mode for this set of price data
because every value occurs once: $19, $21, $23, $20, and $18. Conversely, for some
data sets there is more than one mode. Suppose the ages of the individuals in a stock
investment club are 22, 26, 27, 27, 31, 35, and 35. Both the ages 27 and 35 are modes.
Thus, this grouping of ages is referred to as bimodal (having two modes). One would
question the use of two modes to represent the location of this set of age data.
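
The frequency table for the I-75 distances and the bimodal ages example can both be reproduced with the Python standard library. This is a sketch under the assumption that `collections.Counter` and `statistics.multimode` (Python 3.8 or later) are available; the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

# Distances (in miles) between exits on I-75 in Kentucky.
distances = [
    11, 4, 10, 4, 9, 3, 8, 10, 3, 14, 1, 10, 3, 5,
    2, 2, 5, 6, 1, 2, 2, 3, 7, 1, 3, 7, 8, 10,
    1, 4, 7, 5, 2, 2, 5, 1, 1, 3, 3, 1, 2, 1,
]

# Build the frequency table: distance -> number of occurrences.
frequency = Counter(distances)
for miles in sorted(frequency):
    print(f"{miles:>2} miles: {frequency[miles]}")

# The mode is the distance with the largest frequency.
modal_distance, modal_count = frequency.most_common(1)[0]
print(f"modal distance: {modal_distance} mile ({modal_count} occurrences)")

# Ages in the stock investment club: two values tie for most frequent.
ages = [22, 26, 27, 27, 31, 35, 35]
age_modes = multimode(ages)
print("modes of ages:", age_modes)

# Prices where every value occurs exactly once.
prices = [19, 21, 23, 20, 18]
price_modes = multimode(prices)
print("multimode of prices:", price_modes)
```

Note one difference in convention: when every value occurs once, `multimode` returns all of the values, whereas the text treats such a data set as having no mode.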

SELF-REVIEW 3–2

1. A sample of single persons in Towson, Texas, receiving Social Security payments
revealed these monthly benefits: $852, $598, $580, $1,374, $960, $878, and $1,130.
(a) What is the median monthly benefit?
(b) How many observations are below the median? Above it?
2. The number of work stoppages in the United States over the last 10 years are 22, 20,
21, 15, 5, 11, 19, 19, 15, and 11.
(a) What is the median number of stoppages?
(b) How many observations are below the median? Above it?
(c) What is the modal number of work stoppages?


EXERCISES

13. What would you report as the modal value for a set of observations if there were a total of:
a. 10 observations and no two values were the same?
b. 6 observations and they were all the same?
c. 6 observations and the values were 1, 2, 3, 3, 4, and 4?

For Exercises 14–16, determine the (a) mean, (b) median, and (c) mode.

14. The following is the number of oil changes for the last 7 days at the Jiffy Lube
located at the corner of Elm Street and Pennsylvania Avenue.

41 15 39 54 31 15 33

15. The following is the percent change in net income from last year to this year for a
sample of 12 construction companies in Denver.

5 1 −10 −6 5 12 7 8 6 5 −1 11

16. The following are the ages of the 10 people in the Java Coffee Shop at the Southwyck
Shopping Mall at 10 a.m.

21 41 20 23 24 33 37 42 23 29

17. Several indicators of long-term economic growth in the United States and
their annual percent change are listed below.

Economic Indicator              Percent Change

Inflation                       4.5%
Exports                         4.7
Imports                         2.3
Real disposable income          2.9
Consumption                     2.7
Real GNP                        2.9
Investment (residential)        3.6
Investment (nonresidential)     2.1
Productivity (total)            1.4
Productivity (manufacturing)    5.2

a. What is the median percent change?
b. What is the modal percent change?

18. Sally Reynolds sells real estate along the coastal area of Northern California.
Below are her total annual commissions between 2005 and 2015. Find the mean,
median, and mode of the commissions she earned for the 11 years.

Year    Amount (thousands)

2005    292.16
2006    233.80
2007    206.97
2008    202.67
2009    164.69
2010    206.53
2011    237.51
2012    225.57
2013    255.33
2014    248.14
2015    269.11

19. The accounting firm of Rowatti and Koppel specializes in income tax returns
for self-employed professionals, such as physicians, dentists, architects, and
lawyers. The firm employs 11 accountants who prepare the returns. For last year, the
number of returns prepared by each accountant was:

58 75 31 58 46 65 60 71 45 58 80

Find the mean, median, and mode for the number of returns prepared by each
accountant. If you could report only one, which measure of location would you
recommend reporting?

20. The demand for the video games provided by Mid-Tech Video Games Inc. has
exploded in the last several years. Hence, the owner needs to hire several new
technical people to keep up with the demand. Mid-Tech gives each applicant a
special test that Dr. McGraw, the designer of the test, believes is closely related to
the ability to create video games. For the general population, the mean on this test
is 100. Below are the scores on this test for the applicants.

95 105 120 81 90 115 99 100 130 10

The president is interested in the overall quality of the job applicants based on this
test. Compute the mean and the median scores for the 10 applicants. What would
you report to the president? Does it seem that the applicants are better than the
general population?

The Relative Positions of the Mean, Median, and Mode
Refer to the histogram in Chart 3–2. It is a symmetric distribution, which is also
mound-shaped. This distribution has the same shape on either side of the center. If the
histogram were folded in half, the two halves would be identical. For a symmetric,
mound-shaped distribution such as this one, the mode, median, and mean are located at
the center and are equal. They are all equal to 30 years in Chart 3–2. We should point
out that there are symmetric distributions that are not mound-shaped; for those, the
mean and median still coincide at the center, but the mode need not.

[Histogram: frequency by age, symmetric and mound-shaped; Mean = 30, Median = 30, Mode = 30]

CHART 3–2 A Symmetric Distribution

The number of years corresponding to the highest point of the curve is the mode
(30 years). Because the distribution is symmetrical, the median corresponds to the point
where the distribution is cut in half (30 years). Also, because the arithmetic mean is the
balance point of a distribution (as shown in the Properties of the Arithmetic Mean
section on page 56), and the distribution is symmetric, the arithmetic mean is 30. Logically,
any of the three measures would be appropriate to represent the distribution’s center.

If a distribution is nonsymmetrical, or skewed, the relationship among the three
measures changes. In a positively skewed distribution, such as the distribution of
weekly income in Chart 3–3, the arithmetic mean is the largest of the three measures.
Why? Because the mean is influenced more than the median or mode by a few
extremely high values. The median is generally the next largest measure in a positively
skewed frequency distribution. The mode is the smallest of the three measures.

If the distribution is highly skewed, the mean would not be a good measure to use.
The median and mode would be more representative.

Conversely, if a distribution is negatively skewed, such as the distribution of tensile
strength in Chart 3–4, the mean is the lowest of the three measures. The mean is, of
course, influenced by a few extremely low observations. The median is greater than the
arithmetic mean, and the modal value is the largest of the three measures. Again, if the
distribution is highly skewed, the mean should not be used to represent the data.
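
The ordering described for a positively skewed distribution can be checked on data already used in this chapter. The I-75 exit distances have many short gaps and a few long ones, so they serve as a convenient illustration; this is a sketch using the Python standard library, and the variable names are assumptions.

```python
from statistics import mean, median, mode

# Distances (in miles) between exits on I-75 in Kentucky (see the earlier examples).
distances = [
    11, 4, 10, 4, 9, 3, 8, 10, 3, 14, 1, 10, 3, 5,
    2, 2, 5, 6, 1, 2, 2, 3, 7, 1, 3, 7, 8, 10,
    1, 4, 7, 5, 2, 2, 5, 1, 1, 3, 3, 1, 2, 1,
]

d_mode = mode(distances)      # most frequent value
d_median = median(distances)  # middle of the ordered values
d_mean = mean(distances)      # pulled upward by the few long distances

print(f"mode = {d_mode}, median = {d_median}, mean = {d_mean:.2f}")
print("mode < median < mean:", d_mode < d_median < d_mean)
```

The result matches the pattern in the text for positive skew: the mode is the smallest of the three measures and the mean is the largest. For a negatively skewed data set the inequality would be expected to run the other way.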

[Frequency curve of weekly income, skewed to the right: Mode = 25, Median = 29, Mean = 60]

CHART 3–3 A Positively Skewed Distribution

[Frequency curve of tensile strength, skewed to the left: Mean = 45, Median = 76, Mode = 80]

CHART 3–4 A Negatively Skewed Distribution

The weekly sales from a sample of Hi-Tec electronic supply stores were organized into a
frequency distribution. The mean of weekly sales was computed to be $105,900, the
median $105,000, and the mode $104,500.
(a) Sketch the sales in the form of a smoothed frequency polygon. Note the location of the
mean, median, and mode on the X-axis.
(b) Is the distribution symmetrical, positively skewed, or negatively skewed? Explain.

S E L F - R E V I E W 3–3

21. The unemployment rate in the state of Alaska by month is given in the table below:

Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

8.7 8.8 8.7 7.8 7.3 7.8 6.6 6.5 6.5 6.8 7.3 7.6

E X E R C I S E S

64 CHAPTER 3

E X A M P L E

Table 2–4 on page 26 shows the profit on the sales of 180 vehicles at Applewood
Auto Group. Determine the mean and the median profit.

S O L U T I O N

The mean, median, and modal amounts of profit are reported in the following
output (highlighted in the screen shot). (Reminder: The instructions to create the
output appear in the Software Commands in Appendix C.) There are 180 vehicles
in the study, so using a calculator would be tedious and prone to error.

Software Solution
We can use a statistical software package to find many measures of location.

a. What is the arithmetic mean of the Alaska unemployment rates?
b. Find the median and the mode for the unemployment rates.
c. Compute the arithmetic mean and median for just the winter (Dec–Mar) months.
Is it much different?

22. Big Orange Trucking is designing an information system for use in “in-cab”
communications. It must summarize data from eight sites throughout a region to
describe typical conditions. Compute an appropriate measure of central location for
the variables wind direction, temperature, and pavement.

City              Wind Direction   Temperature   Pavement

Anniston, AL      West             89            Dry
Atlanta, GA       Northwest        86            Wet
Augusta, GA       Southwest        92            Wet
Birmingham, AL    South            91            Dry
Jackson, MS       Southwest        92            Dry
Meridian, MS      South            92            Trace
Monroe, LA        Southwest        93            Wet
Tuscaloosa, AL    Southwest        93            Trace


THE WEIGHTED MEAN
The weighted mean is a convenient way to compute the arithmetic mean when there
are several observations of the same value. To explain, suppose the nearby Wendy’s
Restaurant sold medium, large, and Biggie-sized soft drinks for $1.84, $2.07, and $2.40,
respectively. Of the last 10 drinks sold, 3 were medium, 4 were large, and 3 were Biggie-
sized. To find the mean price of the last 10 drinks sold, we could use formula (3–2).

\[
\bar{x} = \frac{\$1.84 + \$1.84 + \$1.84 + \$2.07 + \$2.07 + \$2.07 + \$2.07 + \$2.40 + \$2.40 + \$2.40}{10} = \frac{\$21.00}{10} = \$2.10
\]

The mean selling price of the last 10 drinks is $2.10.

An easier way to find the mean selling price is to determine the weighted mean.
That is, we multiply each observation by the number of times it occurs, add the products,
and divide by the total number of observations. We will refer to the weighted mean as
\(\bar{x}_w\). This is read “x bar sub w.”

\[
\bar{x}_w = \frac{3(\$1.84) + 4(\$2.07) + 3(\$2.40)}{10} = \frac{\$21.00}{10} = \$2.10
\]

In this case, the weights are frequency counts. However, any measure of importance
could be used as a weight. In general, the weighted mean of a set of numbers
designated x1, x2, x3, . . . , xn with the corresponding weights w1, w2, w3, . . . , wn is
computed by:

LO3-2
Compute a weighted
mean.

The mean profit is $1,843.17 and the median is $1,882.50. These two values
are less than $40 apart, so either value is reasonable. We can also see from the Excel
output that there were 180 vehicles sold and their total profit was $331,770.00. We
will describe the meaning of standard error, standard deviation, and other measures
reported on the output later in this chapter and in later chapters.

What can we conclude? The typical profit on a vehicle is about $1,850.
Management at Applewood might use this value for revenue projections. For example,
if the dealership could increase the number of vehicles sold in a month from 180 to
200, this would result in an additional estimated $37,000 of revenue, found by
20($1,850).

WEIGHTED MEAN
\[
\bar{x}_w = \frac{w_1x_1 + w_2x_2 + w_3x_3 + \cdots + w_nx_n}{w_1 + w_2 + w_3 + \cdots + w_n} \tag{3–3}
\]

This may be shortened to:

\[
\bar{x}_w = \frac{\Sigma(wx)}{\Sigma w}
\]

Note that the denominator of a weighted mean is always the sum of the weights.
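
Formula (3–3) translates directly into a few lines of code. The following is a minimal Python sketch, not a prescribed method; the function name `weighted_mean` and the variable names are our own choices for illustration. It reuses the soft drink prices and frequency counts from the discussion above.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w), following formula (3-3)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Soft drink example: each price is weighted by the number of drinks sold at that price.
drink_prices = [1.84, 2.07, 2.40]
drinks_sold = [3, 4, 3]

drink_mean = weighted_mean(drink_prices, drinks_sold)
print(f"${drink_mean:.2f}")  # prints $2.10
```

Because the denominator is the sum of the weights, the same function works whether the weights are frequency counts, as here, or any other measure of importance.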

E X A M P L E

The Carter Construction Company pays its hourly employees $16.50, $19.00, or
$25.00 per hour. There are 26 hourly employees, 14 of whom are paid at the
$16.50 rate, 10 at the $19.00 rate, and 2 at the $25.00 rate. What is the mean
hourly rate paid the 26 employees?


S O L U T I O N

To find the mean hourly rate, we multiply each of the hourly rates by the number of
employees earning that rate. From formula (3–3), the mean hourly rate is

\[
\bar{x}_w = \frac{14(\$16.50) + 10(\$19.00) + 2(\$25.00)}{14 + 10 + 2} = \frac{\$471.00}{26} = \$18.1154
\]

The weighted mean hourly wage is rounded to $18.12.

Springers sold 95 Antonelli men’s suits for the regular price of $400. For the spring sale,
the suits were reduced to $200 and 126 were sold. At the final clearance, the price was
reduced to $100 and the remaining 79 suits were sold.
(a) What was the weighted mean price of an Antonelli suit?
(b) Springers paid $200 a suit for the 300 suits. Comment on the store’s profit per suit if a
salesperson receives a $25 commission for each one sold.

S E L F - R E V I E W 3–4

THE GEOMETRIC MEAN
The geometric mean is useful in finding the average change of percentages, ratios,
indexes, or growth rates over time. It has a wide application in business and economics
because we are often interested in finding the percentage changes in sales, salaries, or
economic figures, such as the gross domestic product, which compound or build on
each other. The geometric mean of a set of n positive numbers is defined as the nth root
of the product of the n values. The formula for the geometric mean is written:

LO3-3
Compute and interpret
the geometric mean.

GEOMETRIC MEAN
\[
GM = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} \tag{3–4}
\]

The geometric mean will always be less than or equal to (never more than) the
arithmetic mean. Also, all the data values must be positive.

23. In June, an investor purchased 300 shares of Oracle (an information technology
company) stock at $20 per share. In August, she purchased an additional 400
shares at $25 per share. In November, she purchased an additional 400 shares, but
the stock declined to $23 per share. What is the weighted mean price per share?

24. The Bookstall Inc. is a specialty bookstore concentrating on used books sold via
the Internet. Paperbacks are $1.00 each, and hardcover books are $3.50. Of the
50 books sold last Tuesday morning, 40 were paperback and the rest were
hardcover. What was the weighted mean price of a book?

25. The Loris Healthcare System employs 200 persons on the nursing staff. Fifty are
nurse’s aides, 50 are practical nurses, and 100 are registered nurses. Nurse’s aides
receive $8 an hour, practical nurses $15 an hour, and registered nurses $24 an
hour. What is the weighted mean hourly wage?

26. Andrews and Associates specialize in corporate law. They charge $100 an hour for
researching a case, $75 an hour for consultations, and $200 an hour for writing a
brief. Last week one of the associates spent 10 hours consulting with her client, 10
hours researching the case, and 20 hours writing the brief. What was the weighted
mean hourly charge for her legal services?

E X E R C I S E S

As an example of the geometric mean, suppose you receive a 5% increase in salary
this year and a 15% increase next year. The average annual percent increase is 9.886%,
not 10.0%. Why is this so? We begin by calculating the geometric mean. Recall, for
example, that a 5% increase in salary is 105%. We will write it as 1.05.

\[
GM = \sqrt{(1.05)(1.15)} = 1.09886
\]

This can be verified by assuming that your monthly earnings were $3,000 to start and you
received two increases of 5% and 15%.

Raise 1 = $3,000(.05) = $150.00
Raise 2 = $3,150(.15) =  472.50
Total                   $622.50

Your total salary increase is $622.50. This is equivalent to two successive increases at
the geometric mean rate of 9.886% (amounts computed with the unrounded rate):

$3,000.00(.09886) = $296.59
$3,296.59(.09886) =  325.91
Total               $622.50
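
The verification above can also be checked numerically. The short Python sketch below is only an illustration; the variable names are our own. It compares the total raise actually received with the total produced by two raises at the geometric mean rate and by two raises at the arithmetic mean rate of 10%.

```python
import math

start_salary = 3000.00
growth_factors = [1.05, 1.15]  # a 5% raise followed by a 15% raise

# Geometric mean of the two growth factors: the square root of their product.
gm = math.sqrt(growth_factors[0] * growth_factors[1])

# Salary after the two actual raises.
actual_final = start_salary * growth_factors[0] * growth_factors[1]

# Salary after two raises at the geometric mean rate.
gm_final = start_salary * gm * gm

# Salary after two raises at the arithmetic mean rate of 10%.
am_final = start_salary * 1.10 * 1.10

print(round(gm - 1, 5))                        # geometric mean rate
print(round(actual_final - start_salary, 2))   # total raise actually received
print(round(gm_final - start_salary, 2))       # total raise at the geometric mean rate
print(round(am_final - start_salary, 2))       # total raise at 10% per year
```

The two raises at the geometric mean rate reproduce the $622.50 total, while two raises of 10% would give $630.00, which overstates the actual increase.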

The following example shows the geometric mean of several percentages.

E X A M P L E

The return on investment earned by Atkins Construction Company for four successive
years was 30%, 20%, −40%, and 200%. What is the geometric mean rate of
return on investment?

S O L U T I O N

The number 1.3 represents the 30% return on investment, which is the “original”
investment of 1.0 plus the “return” of 0.3. The number 0.6 represents the loss of
40%, which is the original investment of 1.0 less the loss of 0.4. This calculation
assumes the total return each period is reinvested or becomes the base for the
next period. In other words, the base for the second period is 1.3 and the base for
the third period is (1.3)(1.2) and so forth.

Then the geometric mean rate of return is 29.4%, found by

\[
GM = \sqrt[n]{(x_1)(x_2)\cdots(x_n)} = \sqrt[4]{(1.3)(1.2)(0.6)(3.0)} = \sqrt[4]{2.808} = 1.294
\]

The geometric mean is the fourth root of 2.808. So, the average rate of return
(compound annual growth rate) is 29.4%.

Notice also that if you compute the arithmetic mean [(30 + 20 − 40 + 200)/4 =
52.5], you would have a much larger number, which would overstate the true rate of
return!
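
The calculation for Atkins Construction can be sketched in Python as follows. This is an illustrative sketch only; the function name `geometric_mean_return` is hypothetical, and `math.prod` requires Python 3.8 or later. Each percent return is first converted to a growth factor (1 plus the return), exactly as in the solution above.

```python
import math


def geometric_mean_return(percent_returns):
    """Geometric mean rate of return, in percent, for a sequence of period returns."""
    factors = [1 + r / 100 for r in percent_returns]
    if any(f <= 0 for f in factors):
        raise ValueError("each growth factor must be positive")
    gm = math.prod(factors) ** (1 / len(factors))  # nth root of the product
    return (gm - 1) * 100


atkins_returns = [30, 20, -40, 200]

gm_rate = geometric_mean_return(atkins_returns)
am_rate = sum(atkins_returns) / len(atkins_returns)

print(f"geometric mean:  {gm_rate:.1f}%")  # prints 29.4%
print(f"arithmetic mean: {am_rate:.1f}%")  # prints 52.5%
```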

A second application of the geometric mean is to find an average percentage
change over a period of time. For example, if you earned $45,000 in 2004 and
$100,000 in 2016, what is your annual rate of increase over the period? It is 6.88%. The
rate of increase is determined from the following formula.

RATE OF INCREASE OVER TIME
\[
GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1 \tag{3–5}
\]

In formula (3–5), n is the number of periods. An example will show the details of
finding the average annual percent increase.


E X A M P L E

During the decade of the 1990s, and into the 2000s, Las Vegas, Nevada, was one
of the fastest-growing cities in the United States. The population increased from
258,295 in 1990 to 613,599 in 2014. This is an increase of 355,304 people, or a
137.56% increase over the period. The population has more than doubled. What is
the average annual percent increase?

S O L U T I O N

There are 24 years between 1990 and 2014, so n = 24. Then the geometric mean
formula (3–5) as applied to this problem is:

\[
GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at start of period}}} - 1.0 = \sqrt[24]{\frac{613{,}599}{258{,}295}} - 1.0 = 1.0367 - 1.0 = .0367
\]

To summarize, the steps to compute the geometric mean rate of increase are:

1. Divide the value at the end of the period by the value at the beginning of the
period.
2. Find the nth root of the ratio, where n is the number of periods.
3. Subtract one.

The value of .0367 indicates that the average annual growth over the period
was 3.67%. To put it another way, the population of Las Vegas increased at a rate of
3.67% per year from 1990 to 2014.
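
The three steps of formula (3–5) can be sketched in Python. The function name `annual_rate_of_increase` is our own, chosen for illustration; the population figures are those of the Las Vegas example.

```python
def annual_rate_of_increase(start_value, end_value, periods):
    """Average per-period rate of increase, following formula (3-5)."""
    if start_value <= 0 or end_value <= 0 or periods <= 0:
        raise ValueError("values and number of periods must be positive")
    ratio = end_value / start_value   # step 1: end value divided by start value
    root = ratio ** (1 / periods)     # step 2: nth root of the ratio
    return root - 1                   # step 3: subtract one


las_vegas_rate = annual_rate_of_increase(258_295, 613_599, 2014 - 1990)
print(f"{las_vegas_rate:.4f}")  # prints 0.0367
```

Note that `periods` is the number of intervals between the two dates (24 here), not the number of yearly observations.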

1. The percent increases in sales for the last 4 years at Combs Cosmetics were 4.91, 5.75,
8.12, and 21.60.
(a) Find the geometric mean percent increase.
(b) Find the arithmetic mean percent increase.
(c) Is the arithmetic mean equal to or greater than the geometric mean?
2. Production of Cablos trucks increased from 23,000 units in 1996 to 120,520 in 2016.
Find the geometric mean annual percent increase.

S E L F - R E V I E W 3–5

27. Compute the geometric mean of the following monthly percent increases: 8, 12,
14, 26, and 5.

28. Compute the geometric mean of the following weekly percent increases: 2, 8, 6, 4,
10, 6, 8, and 4.

29. Listed below is the percent increase in sales for the MG Corporation over the last 5
years. Determine the geometric mean percent increase in sales over the period.

9.4 13.8 11.7 11.9 14.7

30. In 2001, a total of 40,244,000 taxpayers in the United States filed their individual
tax returns electronically. By the year 2015, the number increased to 128,653,000.
What is the geometric mean annual increase for the period?

31. The Consumer Price Index is reported monthly by the U.S. Bureau of Labor
Statistics. It reports the change in prices for a market basket of goods from one period to
another. The index for 2000 was 172.2. By 2015, it increased to 236.525. What
was the geometric mean annual increase for the period?

32. JetBlue Airways is an American low-cost airline headquartered in New York City. Its
main base is John F. Kennedy International Airport. JetBlue’s revenue in 2002 was
$635.2 million. By 2014, revenue had increased to $5,817.0 million. What was the
geometric mean annual increase for the period?

E X E R C I S E S


WHY STUDY DISPERSION?
A measure of location, such as the mean, median, or mode, only describes the center of
the data. It is valuable from that standpoint, but it does not tell us anything about the
spread of the data. For example, if your nature guide told you that the river ahead
averaged 3 feet in depth, would you want to wade across on foot without additional
information? Probably not. You would want to know something about the variation in the depth.
Is the maximum depth of the river 3.25 feet and the minimum 2.75 feet? If that is the
case, you would probably agree to cross. What if you learned the river depth ranged
from 0.50 foot to 5.5 feet? Your decision would probably be not to cross. Before making
a decision about crossing the river, you want information on both the typical depth and
the dispersion in the depth of the river.

A small value for a measure of dispersion indicates that the data are clustered closely,
say, around the arithmetic mean. The mean is therefore considered representative of the
data. Conversely, a large measure of dispersion indicates that the mean is not reliable.
Refer to Chart 3–5. The 100 employees of Hammond Iron Works Inc., a steel fabricating
company, are organized into a histogram based on the number of years of employment
with the company. The mean is 4.9 years, but the spread of the data is from 6 months to
16.8 years. The mean of 4.9 years is not very representative of all the employees.

LO3-4
Compute and interpret
the range, variance, and
standard deviation.

STATISTICS IN ACTION

The U.S. Postal Service has tried to become more “user friendly” in the last several
years. A recent survey showed that customers were interested in more consistency in the
time it takes to make a delivery. Under the old conditions, a local letter might take only
one day to deliver, or it might take several. “Just tell me how many days ahead I need to
mail the birthday card to Mom so it gets there on her birthday, not early, not late,” was a
common complaint. The level of consistency is measured by the standard deviation of the
delivery times.

CHART 3–5 Histogram of Years of Employment at Hammond Iron Works Inc. (x-axis: Years; y-axis: Employees)

33. In 2000, there were 720,000 cell phone subscribers worldwide. By 2015, the
number of cell phone subscribers increased to 752,000,000. What is the geometric
mean annual increase for the period?

34. The information below shows the cost for a year of college in public and private
colleges in 2002–03 and 2015–16. What is the geometric mean annual increase
for the period for the two types of colleges? Compare the rates of increase.

Type of College    2002–03    2015–16

Public             $ 4,960    $23,893
Private             18,056     32,405

A second reason for studying the dispersion in a set of data is to compare the
spread in two or more distributions. Suppose, for example, that the new Vision Quest
LCD computer monitor is assembled in Baton Rouge and also in Tucson. The arithmetic
mean hourly output in both the Baton Rouge plant and the Tucson plant is 50. Based on
the two means, you might conclude that the distributions of the hourly outputs are
identical. Production records for 9 hours at the two plants, however, reveal that this
conclusion is not correct (see Chart 3–6). Baton Rouge production varies from 48 to 52
assemblies per hour. Production at the Tucson plant is more erratic, ranging from 40 to
60 per hour. Therefore, the hourly output for Baton Rouge is clustered near the mean of
50; the hourly output for Tucson is more dispersed.

We will consider several measures of dispersion. The range is based on the maximum
and minimum values in the data set; that is, only two values are considered. The
variance and the standard deviation use all the values in a data set and are based on
deviations from the arithmetic mean.

Range
The simplest measure of dispersion is the range. It is the difference between the
maximum and minimum values in a data set. In the form of an equation:

RANGE Range = Maximum value − Minimum value (3–6)

CHART 3–6 Hourly Production of Computer Monitors at the Baton Rouge and Tucson Plants (x-axis: Hourly Production; Baton Rouge output spans 48 to 52, Tucson output spans 40 to 60, and the mean is 50 at both plants)

The range is widely used in production management and control applications because
it is very easy to calculate and understand.

E X A M P L E

Refer to Chart 3–6 above. Find the range in the number of computer monitors produced
per hour for the Baton Rouge and the Tucson plants. Interpret the two ranges.

S O L U T I O N

The range of the hourly production of computer monitors at the Baton Rouge plant
is 4, found by the difference between the maximum hourly production of 52 and
the minimum of 48. The range in the hourly production for the Tucson plant is 20
computer monitors, found by 60 − 40. We therefore conclude that (1) there is less
dispersion in the hourly production in the Baton Rouge plant than in the Tucson
plant because the range of 4 computer monitors is less than a range of 20 computer
monitors and (2) the production is clustered more closely around the mean of
50 at the Baton Rouge plant than at the Tucson plant (because a range of 4 is less
than a range of 20). Thus, the mean production in the Baton Rouge plant (50 computer
monitors) is a more representative measure of location than the mean of 50
computer monitors for the Tucson plant.

Variance
A limitation of the range is that it is based on only two values, the maximum and the
minimum; it does not take into consideration all of the values. The variance does. It
measures the mean amount by which the values in a population, or sample, vary from
their mean. In terms of a definition:

VARIANCE The arithmetic mean of the squared deviations from the mean.

The following example illustrates how the variance is used to measure
dispersion.

E X A M P L E

The chart below shows the number of cappuccinos sold at the Starbucks in the Orange
County airport and the Ontario, California, airport between 4 and 5 p.m. for a sample of
5 days last month.

Determine the mean, median, range, and variance for each location. Comment on
the similarities and differences in these measures.

S O L U T I O N

The mean, median, and range for each of the airport locations are reported as part
of an Excel spreadsheet.



Notice that all three of the measures are exactly the same. Does this indicate that
there is no difference in the two sets of data? We get a clearer picture if we calculate
the variance. First, for Orange County:

\[
\text{Variance} = \frac{\Sigma(x - \mu)^2}{N} = \frac{(-30)^2 + (-10)^2 + 0^2 + 10^2 + 30^2}{5} = \frac{2{,}000}{5} = 400
\]

The variance is 400. That is, the average squared deviation from the mean is 400.
The following shows the detail of determining the variance for the number of
cappuccinos sold at the Ontario Airport.

\[
\text{Variance} = \frac{\Sigma(x - \mu)^2}{N} = \frac{(-30)^2 + (-5)^2 + 0^2 + 5^2 + 30^2}{5} = \frac{1{,}850}{5} = 370
\]

So the mean, median, and range of the cappuccinos sold are the same at the
two airports, but the variances are different. The variance at Orange County is 400,
but it is 370 at Ontario.
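
The comparison can be reproduced with Python's standard `statistics` module. This is a sketch under one stated assumption: the daily counts are not listed in the text here, so the two lists below are reconstructed from the deviations shown in the variance calculations and the mean of 50. The helper name `summarize` is our own. Because the calculations above divide by N, the sketch uses `statistics.pvariance`, which also divides by the number of observations.

```python
import statistics

# Assumed daily counts, reconstructed as 50 plus each deviation shown above.
orange_county = [20, 40, 50, 60, 80]
ontario = [20, 45, 50, 55, 80]


def summarize(counts):
    """Return the mean, median, range, and variance (dividing by N) of the counts."""
    return {
        "mean": statistics.mean(counts),
        "median": statistics.median(counts),
        "range": max(counts) - min(counts),
        "variance": statistics.pvariance(counts),
    }


for name, counts in [("Orange County", orange_county), ("Ontario", ontario)]:
    print(name, summarize(counts))
```

Only the variance differs between the two locations, which is the point of the example: the mean, median, and range alone cannot distinguish the two sets of data.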

Let’s interpret and compare the results of our measures for the two Starbucks
airport locations. The mean and median of the two locations are exactly the same,
50 cappuccinos sold. These measures of location suggest the two distributions are
the same. The range for both locations is also the same, 60. However, recall that
the range provides limited information about the dispersion because it is based on
only two values, the minimum and maximum.

The variances are not the same for the two airports. The variance is based on
the differences between each observation and the arithmetic mean. It shows the
closeness or clustering of the data relative to the mean or center of the distribution.
Compare the variance for Orange County of 400 to the variance for Ontario of 370.
Because the Ontario variance is smaller, we conclude that sales at the Ontario Starbucks
are more concentrated (that is, nearer the mean of 50) than sales at the Orange County
location.

The variance has an important advantage over the range. It uses all the values in
the computation. Recall that the range uses only the highest and the lowest values.

The weights of containers being shipped to Ireland are (in thousands of pounds):

95 103 105 110 104 105 112 90

(a) What is the range of the weights?
(b) Compute the arithmetic mean weight.
(c) Compute the variance of the weights.

S E L F - R E V I E W 3–6

For Exercises 35–38, (a) calculate the range, (b) calculate the arithmetic mean,
(c) calculate the variance, and (d) interpret the statistics.

35. During last weekend’s sale, there were five customer service representatives
on duty at the Electronic Super Store. The numbers of HDTVs these representatives
sold were 5, 8, 4, 10, and 3.

36. The Department of Statistics at Western State University offers eight sections
of basic statistics. Following are the numbers of students enrolled in these sections:
34, 46, 52, 29, 41, 38, 36, and 28.

37. Dave’s Automatic Door installs automatic garage door openers. The following
list indicates the number of minutes needed to install 10 door openers: 28, 32, 24,
46, 44, 40, 54, 38, 32, and 42.

38. All eight companies in the aerospace industry were surveyed as to their return
on investment last year. The results are: 10.6%, 12.6%, 14.8%, 18.2%, 12.0%,
14.8%, 12.2%, and 15.6%.

39. Ten young adults living in California rated the taste of a newly developed sushi
pizza topped with tuna, rice, and kelp on a scale of 1 to 50, with 1 indicating
they did not like the taste and 50 that they did. The ratings were:

34 39 40 46 33 31 34 14 15 45

In a parallel study, 10 young adults in Iowa rated the taste of the same pizza. The
ratings were:

28 25 35 16 25 29 24 26 17 20

As a market researcher, compare the potential for sushi pizza in the two markets.

40. The personnel files of all eight employees at the Pawnee location of Acme
Carpet Cleaners Inc. revealed that during the last 6-month period they lost the
following number of days due to illness:

2 0 6 3 10 4 1 2

E X E R C I S E S


Population Variance
In the previous example, we developed the concept of variance as a measure of
dispersion. Similar to the mean, we can calculate the variance of a population or the variance
of a sample. The formula to compute the population variance is:

POPULATION VARIANCE
\[
\sigma^2 = \frac{\Sigma(x - \mu)^2}{N} \tag{3–7}
\]

where:

σ2 is the population variance (σ is the lowercase Greek letter sigma). It is read as
“sigma squared.”

x is the value of a particular observation in the population.
μ is the arithmetic mean of the population.
N is the number of observations in the population.

The process for computing the variance is implied by the formula.

1. Begin by finding the mean.
2. Find the difference between each observation and the mean, and square that
difference.
3. Sum all the squared differences.
4. Divide the sum of the squared differences by the number of items in the
population.

So the population variance is the mean of the squared difference between each value
and the mean. For populations whose values are near the mean, the variance will be
small. For populations whose values are dispersed from the mean, the population
variance will be large.

The variance overcomes the weakness of the range by using all the values in the
population, whereas the range uses only the maximum and minimum values. We
overcome the issue where Σ(x − μ) = 0 by squaring the differences. Squaring the differences
will always result in nonnegative values. The following is another example that
illustrates the calculation and interpretation of the variance.

E X A M P L E

The number of traffic citations issued last year by month in Beaufort County, South
Carolina, is reported below.

Citations by Month

January February March April May June July August September October November December
19 17 22 18 28 34 45 39 38 44 34 10

Determine the population variance.

All eight employees during the same period at the Chickpee location of Acme
Carpets revealed they lost the following number of days due to illness:

2 0 1 0 5 0 1 0

As the director of human resources, compare the two locations. What would you
recommend?


S O L U T I O N

Because we are studying all the citations for a year, the data comprise a population.
To determine the population variance, we use formula (3–7). The table below
details the calculations.

Month        Citations (x)    x − μ    (x − μ)²

January           19           −10        100
February          17           −12        144
March             22            −7         49
April             18           −11        121
May               28            −1          1
June              34             5         25
July              45            16        256
August            39            10        100
September         38             9         81
October           44            15        225
November          34             5         25
December          10           −19        361
Total            348             0      1,488

1. We begin by determining the arithmetic mean of the population. The total number
of citations issued for the year is 348, so the mean number issued per
month is 29.

\[
\mu = \frac{\Sigma x}{N} = \frac{19 + 17 + \cdots + 10}{12} = \frac{348}{12} = 29
\]

2. Next we find the difference between each observation and the mean. This is
shown in the third column of the table. Recall on page 55 in this chapter, the
Verizon example showed that the sum of the differences between each value
and the mean is 0. This principle is repeated here. The sum of the differences
between the mean and the number of citations each month is 0.

3. The next step is to square the difference for each month. That is shown in the
fourth column of the table. All the squared differences will be positive. Note
that squaring a negative value, or multiplying a negative value by itself, always
results in a positive value.

4. The squared differences are totaled. The total of the fourth column is 1,488.
That is the term Σ(x − μ)².

5. Finally, we divide the sum of the squared differences by N, the number of
observations in the population.

\[
\sigma^2 = \frac{\Sigma(x - \mu)^2}{N} = \frac{1{,}488}{12} = 124
\]

So, the population variance for the number of citations is 124.
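
The five steps can be traced one at a time in Python. This is an illustrative sketch only; the variable names are our own, and the data are the monthly citation counts from the table above.

```python
citations = [19, 17, 22, 18, 28, 34, 45, 39, 38, 44, 34, 10]

n_months = len(citations)
mu = sum(citations) / n_months                 # step 1: population mean
deviations = [x - mu for x in citations]       # step 2: x - mu for each month
squared = [d ** 2 for d in deviations]         # step 3: square each difference
sum_squared = sum(squared)                     # step 4: total the squared differences
population_variance = sum_squared / n_months   # step 5: divide by N

print(mu)                    # prints 29.0
print(sum(deviations))       # prints 0.0
print(sum_squared)           # prints 1488.0
print(population_variance)   # prints 124.0
```

The second printed value confirms that the deviations from the mean sum to zero, which is why they must be squared before averaging.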

Like the range, the variance can be used to compare the dispersion in two or
more sets of observations. For example, the variance for the number of citations
issued in Beaufort County was just computed to be 124. If the variance in the number
of citations issued in Marlboro County, South Carolina, is 342.9, we conclude
that (1) there is less dispersion in the distribution of the number of citations issued
in Beaufort County than in Marlboro County (because 124 is less than 342.9) and
(2) the number of citations in Beaufort County is more closely clustered around the
mean of 29 than for the number of citations issued in Marlboro County. Thus the
mean number of citations issued in Beaufort County is a more representative measure
of location than the mean number of citations in Marlboro County.


Population Standard Deviation
When we compute the variance, it is important to understand the unit of measure and
what happens when the differences in the numerator are squared. That is, in the previous
example, the number of monthly citations is the variable. When we calculate the
variance, the unit of measure for the variance is citations squared. Using “squared
citations” as a unit of measure is cumbersome.

There is a way out of this difficulty. By taking the square root of the population
variance, we can transform it to the same unit of measurement used for the original data. The
square root of 124 citations squared is 11.14 citations. The units are now simply citations.
The square root of the population variance is the population standard deviation.

POPULATION STANDARD DEVIATION
\[
\sigma = \sqrt{\frac{\Sigma(x - \mu)^2}{N}} \tag{3–8}
\]
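
As a quick check of formula (3–8), Python's standard `statistics` module provides `pstdev`, which divides by N before taking the square root. The sketch below applies it to the monthly citation counts; the variable names are our own.

```python
import math
import statistics

citations = [19, 17, 22, 18, 28, 34, 45, 39, 38, 44, 34, 10]

sigma_squared = statistics.pvariance(citations)   # in citations squared
sigma = statistics.pstdev(citations)              # in citations

print(sigma_squared)     # prints 124
print(round(sigma, 2))   # prints 11.14
assert math.isclose(sigma, math.sqrt(sigma_squared))
```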

The Philadelphia office of PricewaterhouseCoopers hired five accounting trainees this year.
Their monthly starting salaries were $3,536; $3,173; $3,448; $3,121; and $3,622.
(a) Compute the population mean.
(b) Compute the population variance.
(c) Compute the population standard deviation.
(d) The Pittsburgh office hired six trainees. Their mean monthly salary was $3,550, and
the standard deviation was $250. Compare the two groups.

S E L F - R E V I E W 3–7

41. Consider these five values a population: 8, 3, 7, 3, and 4.
a. Determine the mean of the population.
b. Determine the variance.

42. Consider these six values a population: 13, 3, 8, 10, 8, and 6.
a. Determine the mean of the population.
b. Determine the variance.

43. The annual report of Dennis Industries cited these primary earnings per common
share for the past 5 years: $2.68, $1.03, $2.26, $4.30, and $3.58. If we assume
these are population values, what is:

a. The arithmetic mean primary earnings per share of common stock?
b. The variance?

44. Referring to Exercise 43, the annual report of Dennis Industries also gave these
returns on stockholder equity for the same 5-year period (in percent): 13.2, 5.0,
10.2, 17.5, and 12.9.

a. What is the arithmetic mean return?
b. What is the variance?

45. Plywood Inc. reported these returns on stockholder equity for the past 5 years: 4.3,
4.9, 7.2, 6.7, and 11.6. Consider these as population values.

a. Compute the range, the arithmetic mean, the variance, and the standard deviation.
b. Compare the return on stockholder equity for Plywood Inc. with that for Dennis
Industries cited in Exercise 44.

46. The annual incomes of the five vice presidents of TMV Industries are $125,000;
$128,000; $122,000; $133,000; and $140,000. Consider this a population.
a. What is the range?
b. What is the arithmetic mean income?
c. What is the population variance? The standard deviation?
d. The annual incomes of officers of another firm similar to TMV Industries were
also studied. The mean was $129,000 and the standard deviation $8,612.
Compare the means and dispersions in the two firms.

E X E R C I S E S


Sample Variance and Standard Deviation
The formula for the population mean is μ = Σx/N. We just changed the symbols for the
sample mean; that is, \(\bar{x} = \Sigma x/n\). Unfortunately, the conversion from the population
variance to the sample variance is not as direct. It requires a change in the denominator.
Instead of substituting n (number in the sample) for N (number in the population), the
denominator is n − 1. Thus the formula for the sample variance is:

SAMPLE VARIANCE
\[
s^2 = \frac{\Sigma(x - \bar{x})^2}{n - 1} \tag{3–9}
\]

where:

s² is the sample variance.
x is the value of each observation in the sample.
\(\bar{x}\) is the mean of the sample.
n is the number of observations in the sample.

Why is this change made in the denominator? Although the use of n is logical since
\(\bar{x}\) is used to estimate μ, dividing by n tends to underestimate the population variance, σ².
The use of (n − 1) in the denominator provides the appropriate correction for this tendency.
Because the primary use of sample statistics like s² is to estimate population parameters
like σ², (n − 1) is preferred to n in defining the sample variance. We will also use this
convention when computing the sample standard deviation.
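
The tendency to underestimate can be seen exactly in a small case. The Python sketch below is an illustration under stated assumptions, not part of the text's examples: the three-value population is hypothetical, and samples of size 2 are drawn with replacement so that every possible ordered sample can be listed. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6]   # hypothetical population, chosen only for illustration
N = len(population)
mu = Fraction(sum(population), N)
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # population variance

n = 2
samples = list(product(population, repeat=n))   # all 9 ordered samples, with replacement


def sum_sq_dev(sample):
    """Sum of squared deviations of the sample values from the sample mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)


# Average, over every possible sample, of the two candidate estimators.
avg_divide_by_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma_sq)                  # prints 8/3
print(avg_divide_by_n)           # prints 4/3
print(avg_divide_by_n_minus_1)   # prints 8/3
```

Averaged over all possible samples, dividing by n gives 4/3, which is too small, while dividing by n − 1 recovers the population variance of 8/3.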

E X A M P L E

The hourly wages for a sample of part-time employees at Home Depot are $12,
$20, $16, $18, and $19. What is the sample variance?

S O L U T I O N

The sample variance is computed by using formula (3–9).

\[ \bar{x} = \frac{\sum x}{n} = \frac{\$85}{5} = \$17 \]

Hourly Wage (x)    x − x̄    (x − x̄)²

     $12            −$5         25
      20              3          9
      16             −1          1
      18              1          1
      19              2          4

     $85              0         40

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{40}{5 - 1} = 10 \text{ in dollars squared} \]
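
The table and formula (3–9) translate directly into a few lines of code. The following is a minimal sketch that reproduces the hand calculation step by step; the variable names are chosen here for illustration.

```python
wages = [12, 20, 16, 18, 19]  # hourly wages from the example, in dollars

n = len(wages)
x_bar = sum(wages) / n                      # sample mean
deviations = [x - x_bar for x in wages]     # the (x - x_bar) column
squared = [d ** 2 for d in deviations]      # the (x - x_bar)^2 column
sample_variance = sum(squared) / (n - 1)    # formula (3-9)

print("mean:", x_bar)
print("sum of deviations:", sum(deviations))
print("sum of squared deviations:", sum(squared))
print("sample variance:", sample_variance)
```

The sum of the deviations serves as a quick arithmetic check, because deviations from the mean always sum to zero.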

78 CHAPTER 3

The sample standard deviation is used as an estimator of the population standard
deviation. As noted previously, the population standard deviation is the square root of
the population variance. Likewise, the sample standard deviation is the square root of
the sample variance. The sample standard deviation is determined by:

SAMPLE STANDARD DEVIATION
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \tag{3–10} \]

E X A M P L E

The sample variance in the previous example involving hourly wages was com-
puted to be 10. What is the sample standard deviation?

S O L U T I O N

The sample standard deviation is $3.16, found by √10. Note again that the sample
variance is in terms of dollars squared, but taking the square root of 10 gives us
$3.16, which is in the same units (dollars) as the original data.

Software Solution
On page 64, we used Excel to determine the mean, median, and mode of profit for
the Applewood Auto Group data. You also will note that it lists the sample variance
and sample standard deviation. Excel, like most other statistical software, assumes
the data are from a sample.
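
The same sample-versus-population distinction appears in other software. As an illustration only (the text itself uses Excel), Python's standard `statistics` module provides `variance` and `stdev`, which divide by n − 1, and `pvariance` and `pstdev`, which divide by N. The sketch below applies all four to the hourly wage data.

```python
from statistics import pstdev, pvariance, stdev, variance

wages = [12, 20, 16, 18, 19]  # hourly wages from the earlier example

# Sample versions: denominator n - 1, as in formulas (3-9) and (3-10).
sample_var = variance(wages)
sample_sd = stdev(wages)

# Population versions: denominator N.
population_var = pvariance(wages)
population_sd = pstdev(wages)

print(f"sample:     variance = {sample_var}, standard deviation = {sample_sd:.2f}")
print(f"population: variance = {population_var}, standard deviation = {population_sd:.2f}")
```

Choosing the wrong function silently changes the answer, so it is worth confirming which denominator a software routine uses before reporting its output.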

S E L F - R E V I E W 3–8

The years of service for a sample of seven employees at a State Farm Insurance claims
office in Cleveland, Ohio, are 4, 2, 5, 4, 5, 2, and 6. What is the sample variance? Compute
the sample standard deviation.


INTERPRETATION AND USES
OF THE STANDARD DEVIATION
The standard deviation is commonly used as a measure to compare the spread in two
or more sets of observations. For example, the standard deviation of the biweekly
amounts invested in the Dupree Paint Company profit-sharing plan is computed to be
$7.51. Suppose these employees are located in Georgia. If the standard deviation for a
group of employees in Texas is $10.47, and the means are about the same, it indicates
that the amounts invested by the Georgia employees are not dispersed as much as
those in Texas (because $7.51 < $10.47). Since the amounts invested by the Georgia
employees are clustered more closely about the mean, the mean for the Georgia em-
ployees is a more reliable measure than the mean for the Texas group.

Chebyshev’s Theorem
We have stressed that a small standard deviation for a set of values indicates that these
values are located close to the mean. Conversely, a large standard deviation reveals that
the observations are widely scattered about the mean. The Russian mathematician P. L.
Chebyshev (1821–1894) developed a theorem that allows us to determine the minimum
proportion of the values that lie within a specified number of standard deviations of the
mean. For example, according to Chebyshev’s theorem, at least three out of every four,
or 75%, of the values must lie between the mean plus two standard deviations and the
mean minus two standard deviations. This relationship applies regardless of the shape of
the distribution. Further, at least eight of nine values, or 88.9%, will lie between plus three
standard deviations and minus three standard deviations of the mean. At least 24 of 25
values, or 96%, will lie between plus and minus five standard deviations of the mean.

LO3-5
Explain and apply
Chebyshev’s theorem
and the Empirical Rule.

STATISTICS IN ACTION

Most colleges report the “average class size.” This information can be misleading
because average class size can be found in several ways. If we find the number of
students in each class at a particular university, the result is the mean number of
students per class. If we compile a list of the class sizes for each student and find the
mean class size, we might find the mean to be quite different. One school found the
mean number of students in each of its 747 classes to be 40. But when

(continued)

Chebyshev’s theorem states:

CHEBYSHEV’S THEOREM For any set of observations (sample or population), the
proportion of the values that lie within k standard deviations of the mean is at least
1 − 1/k², where k is any value greater than 1.
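
The bound is simple to evaluate. The sketch below computes 1 − 1/k² for the values of k mentioned above and then checks the bound against a small, deliberately skewed data set. The function name `chebyshev_min_proportion` and the data values are hypothetical, chosen only for illustration.

```python
from statistics import mean, pstdev


def chebyshev_min_proportion(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 5):
    print(f"k = {k}: at least {chebyshev_min_proportion(k):.1%}")

# Hypothetical, strongly skewed data (illustrative values only).
data = [1, 1, 1, 2, 2, 3, 4, 7, 15, 40]
center, spread = mean(data), pstdev(data)

k = 2
within = [x for x in data if abs(x - center) <= k * spread]
observed = len(within) / len(data)
print(f"observed share within {k} standard deviations: {observed:.0%}")
```

Because the theorem makes no assumption about shape, the observed share can never fall below the bound, although it is often well above it.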

For Exercises 47–52, do the following:

a. Compute the sample variance.
b. Determine the sample standard deviation.

47. Consider these values a sample: 7, 2, 6, 2, and 3.
48. The following five values are a sample: 11, 6, 10, 6, and 7.
49. Dave’s Automatic Door, referred to in Exercise 37, installs automatic garage
door openers. Based on a sample, following are the times, in minutes, required to
install 10 door openers: 28, 32, 24, 46, 44, 40, 54, 38, 32, and 42.

50. The sample of eight companies in the aerospace industry, referred to in Exer-
cise 38, was surveyed as to their return on investment last year. The results are
10.6, 12.6, 14.8, 18.2, 12.0, 14.8, 12.2, and 15.6.

51. The Houston, Texas, Motel Owner Association conducted a survey regarding
weekday motel rates in the area. Listed below is the room rate for business-class
guests for a sample of 10 motels.

$101 $97 $103 $110 $78 $87 $101 $80 $106 $88

52. A consumer watchdog organization is concerned about credit card debt. A
survey of 10 young adults with credit card debt of more than $2,000 showed they
paid an average of just over $100 per month against their balances. Listed below
are the amounts each young adult paid last month.

$110 $126 $103 $93 $99 $113 $87 $101 $109 $100

E X E R C I S E S


it found the mean from a list
of the class sizes of each
student, it was 147. Why
the disparity? Because there
are few students in the
small classes and a larger
number of students in the
larger classes, which has
the effect of increasing the
mean class size when it is
calculated this way. A
school could reduce this
mean class size for each
student by reducing the
number of students in each
class. That is, cut out the
large freshman lecture
classes.

(continued from p. 79)

E X A M P L E

Dupree Paint Company employees contribute a mean of $51.54 to the company’s
profit-sharing plan every two weeks. The standard deviation of biweekly contributions is
$7.51. At least what percent of the contributions lie within plus 3.5 standard deviations
and minus 3.5 standard deviations of the mean, that is between $25.26 and $77.83?

S O L U T I O N

About 92%, found by

\[ 1 - \frac{1}{k^2} = 1 - \frac{1}{(3.5)^2} = 1 - \frac{1}{12.25} = 0.92 \]

The Empirical Rule
Chebyshev’s theorem applies to any set of values; that is, the distribution of values can
have any shape. However, for a symmetrical, bell-shaped distribution such as the one in
Chart 3–7, we can be more precise in explaining the dispersion about the mean. These
relationships involving the standard deviation and the mean are described by the
Empirical Rule, sometimes called the Normal Rule.

EMPIRICAL RULE For a symmetrical, bell-shaped frequency distribution,
approximately 68% of the observations will lie within plus and minus one
standard deviation of the mean; about 95% of the observations will lie within plus
and minus two standard deviations of the mean; and practically all (99.7%) will lie
within plus and minus three standard deviations of the mean.

These relationships are portrayed graphically in Chart 3–7 for a bell-shaped distribution
with a mean of 100 and a standard deviation of 10.

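[Chart 3–7: bell-shaped curve with the horizontal axis marked at 70, 80, 90, 100, 110,
120, and 130, and bands labeled 68%, 95%, and 99.7%.]

CHART 3–7 A Symmetrical, Bell-Shaped Curve Showing the Relationships between the Standard
Deviation and the Percentage of Observations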

Applying the Empirical Rule, if a distribution is symmetrical and bell-shaped, practically
all of the observations lie between the mean plus and minus three standard deviations.
Thus, if x̄ = 100 and s = 10, practically all the observations lie between 100 + 3(10) and
100 − 3(10), or 70 and 130. The estimated range is therefore 60, found by 130 − 70.


Conversely, if we know that the range is 60 and the distribution is bell-shaped, we
can approximate the standard deviation by dividing the range by 6. For this illustration:
range ÷ 6 = 60 ÷ 6 = 10, the standard deviation.
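
The 68%, 95%, and 99.7% figures can be reproduced numerically if we assume the bell shape is exactly a normal distribution, which is an assumption made for this sketch rather than a statement from the text. Under that assumption, the share of values within k standard deviations of the mean is erf(k/√2). The sketch also repeats the range ÷ 6 approximation for the illustration above.

```python
from math import erf, sqrt


def share_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")

# Reversing the three-standard-deviation interval: range / 6 approximates s.
low, high = 70, 130
approx_std_dev = (high - low) / 6
print("approximate standard deviation:", approx_std_dev)
```

The percentages quoted in the Empirical Rule are rounded versions of these values, which is one reason the rule is stated with words such as "about" and "practically all."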

E X A M P L E

A sample of the rental rates at University Park Apartments approximates a symmet-
rical, bell-shaped distribution. The sample mean is $500; the standard deviation is
$20. Using the Empirical Rule, answer these questions:

1. About 68% of the monthly rentals are between what two amounts?
2. About 95% of the monthly rentals are between what two amounts?
3. Almost all of the monthly rentals are between what two amounts?

S O L U T I O N

1. About 68% are between $480 and $520, found by x̄ ± 1s = $500 ± 1($20).
2. About 95% are between $460 and $540, found by x̄ ± 2s = $500 ± 2($20).
3. Almost all (99.7%) are between $440 and $560, found by x̄ ± 3s = $500 ± 3($20).
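
The three answers follow one pattern, x̄ ± ks for k = 1, 2, 3, so they can be generated together. The helper below is a hypothetical convenience function written for this illustration, and it is only meaningful when the data are roughly symmetrical and bell-shaped.

```python
def empirical_rule_intervals(mean, std_dev):
    """Return {approximate share: (low, high)} for 1, 2, and 3 standard deviations."""
    shares = {1: "about 68%", 2: "about 95%", 3: "almost all (99.7%)"}
    return {
        label: (mean - k * std_dev, mean + k * std_dev)
        for k, label in shares.items()
    }


# Rental rates at University Park Apartments: mean $500, standard deviation $20.
rent_intervals = empirical_rule_intervals(500, 20)
for share, (low, high) in rent_intervals.items():
    print(f"{share}: ${low} to ${high}")
```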

S E L F - R E V I E W 3–9

The Pitney Pipe Company is one of several domestic manufacturers of PVC pipe. The quality
control department sampled 600 10-foot lengths. At a point 1 foot from the end of the pipe, they
measured the outside diameter. The mean was 14.0 inches and the standard deviation 0.1 inch.
(a) If we do not know the shape of the distribution of outside pipe diameters, at least what
percent of the observations will be between 13.85 inches and 14.15 inches?
(b) If we assume that the distribution of diameters is symmetrical and bell-shaped, about
95% of the observations will be between what two values?

53. According to Chebyshev’s theorem, at least what percent of any set of observa-
tions will be within 1.8 standard deviations of the mean?

54. The mean income of a group of sample observations is $500; the standard devia-
tion is $40. According to Chebyshev’s theorem, at least what percent of the in-
comes will lie between $400 and $600?

55. The distribution of the weights of a sample of 1,400 cargo containers is symmetric
and bell-shaped. According to the Empirical Rule, what percent of the weights will lie:

a. Between x̄ − 2s and x̄ + 2s?
b. Between x̄ and x̄ + 2s? Above x̄ + 2s?

56. The following graph portrays the distribution of the number of spicy chicken sand-
wiches sold at a nearby Wendy’s for the last 141 days. The mean number of sand-
wiches sold per day is 91.9 and the standard deviation is 4.67.

[Graph: distribution of daily sales; horizontal axis labeled Sales, with tick marks at 90 and 100.]

If we use the Empirical Rule, sales will be between what two values on 68% of the
days? Sales will be between what two values on 95% of the days?

E X E R C I S E S


THE MEAN AND STANDARD DEVIATION
OF GROUPED DATA
In most instances, measures of location, such as the mean, and measures of dispersion,
such as the standard deviation, are determined by using the individual values. Statistical
software packages make it easy to calculate these values, even for large data sets.
However, sometimes we are given only the frequency distribution and wish to estimate
the mean or standard deviation. In the following discussion, we show how we can esti-
mate the mean and standard deviation from data organized into a frequency distribu-
tion. We should stress that a mean or a standard deviation from grouped data is an
estimate of the corresponding actual values.

Arithmetic Mean of Grouped Data
To approximate the arithmetic mean of data organized into a frequency distribution, we
begin by assuming the observations in each class are represented by the midpoint of the
class. The mean of a sample of data organized in a frequency distribution is computed by:

ARITHMETIC MEAN OF GROUPED DATA
\[ \bar{x} = \frac{\sum fM}{n} \tag{3–11} \]

where:

x̄ is the sample mean.
M is the midpoint of each class.
f is the frequency in each class.
fM is the frequency in each class times the midpoint of the class.
ΣfM is the sum of these products.
n is the total number of frequencies.

LO3-6
Compute the mean and
standard deviation of
grouped data.

STATISTICS IN ACTION

During the 2016 Major League Baseball season, DJ LeMahieu of the Colorado Rockies
had the highest batting average at .348. Tony Gwynn hit .394 in the strike-shortened
season of 1994, and Ted Williams hit .406 in 1941. No one has hit over .400 since 1941.
The mean batting average has remained constant at about .260 for more than 100 years,
but the standard deviation declined from .049 to .031. This indicates less dispersion in
the batting averages today and helps explain the lack of any .400 hitters in recent times.

E X A M P L E

The computations for the arithmetic mean of data grouped into a frequency distribution
will be shown based on the Applewood Auto Group profit data. Recall in Chapter 2, in
Table 2–7 on page 30, we constructed a frequency distribution for the vehicle profit.
The information is repeated below. Determine the arithmetic mean profit per vehicle.

Profit Frequency

$ 200 up to $ 600 8
600 up to 1,000 11
1,000 up to 1,400 23
1,400 up to 1,800 38
1,800 up to 2,200 45
2,200 up to 2,600 32
2,600 up to 3,000 19
3,000 up to 3,400 4

Total 180

S O L U T I O N

The mean vehicle profit can be estimated from data grouped into a frequency
distribution. To find the estimated mean, assume the midpoint of each class
is representative of the data values in that class. Recall that the midpoint of a class
is halfway between the lower class limits of two consecutive classes. To find the
midpoint of a particular class, we add the lower limits of two consecutive classes
and divide by 2. Hence, the midpoint of the first class is $400, found by ($200 +
$600)/2. We assume the value of $400 is representative of the eight values in that
class. To put it another way, we assume the sum of the eight values in this class
is $3,200, found by 8($400). We continue the process of multiplying the class
midpoint by the class frequency for each class and then sum these products. The
results are summarized in Table 3–1.

TABLE 3–1 Profit on 180 Vehicles Sold Last Month at Applewood Auto Group

Profit                    Frequency (f)    Midpoint (M)          fM

$  200 up to $  600              8            $  400       $  3,200
   600 up to  1,000             11               800          8,800
 1,000 up to  1,400             23             1,200         27,600
 1,400 up to  1,800             38             1,600         60,800
 1,800 up to  2,200             45             2,000         90,000
 2,200 up to  2,600             32             2,400         76,800
 2,600 up to  3,000             19             2,800         53,200
 3,000 up to  3,400              4             3,200         12,800

Total                          180                         $333,200

Solving for the arithmetic mean using formula (3–11), we get:

\[ \bar{x} = \frac{\sum fM}{n} = \frac{\$333{,}200}{180} = \$1{,}851.11 \]

We conclude that the mean profit per vehicle is about $1,851.

Standard Deviation of Grouped Data
To calculate the standard deviation of data grouped into a frequency distribution, we
need to adjust formula (3–10) slightly. We weight each of the squared differences by the
number of frequencies in each class. The formula is:

STANDARD DEVIATION, GROUPED DATA
\[ s = \sqrt{\frac{\sum f(M - \bar{x})^2}{n - 1}} \tag{3–12} \]

where:

s is the sample standard deviation.
M is the midpoint of the class.
f is the class frequency.
n is the number of observations in the sample.
x̄ is the sample mean.

E X A M P L E

Refer to the frequency distribution for the Applewood Auto Group profit data reported
in Table 3–1. Compute the standard deviation of the vehicle profits.

S O L U T I O N

Following the same practice used earlier for computing the mean of data grouped
into a frequency distribution, f is the class frequency, M the class midpoint, and n
the number of observations.


Profit                    Frequency (f)   Midpoint (M)       fM     (M − x̄)     (M − x̄)²     f(M − x̄)²

$  200 up to $  600              8             400        3,200     −1,451     2,105,401    16,843,208
   600 up to  1,000             11             800        8,800     −1,051     1,104,601    12,150,611
 1,000 up to  1,400             23           1,200       27,600       −651       423,801     9,747,423
 1,400 up to  1,800             38           1,600       60,800       −251        63,001     2,394,038
 1,800 up to  2,200             45           2,000       90,000        149        22,201       999,045
 2,200 up to  2,600             32           2,400       76,800        549       301,401     9,644,832
 2,600 up to  3,000             19           2,800       53,200        949       900,601    17,111,419
 3,000 up to  3,400              4           3,200       12,800      1,349     1,819,801     7,279,204

Total                          180                      333,200                              76,169,780

To find the standard deviation:

Step 1: Subtract the mean from the class midpoint. That is, find (M − x̄) =
($400 − $1,851 = −$1,451) for the first class, for the second class
($800 − $1,851 = −$1,051), and so on.

Step 2: Square the difference between the class midpoint and the mean. For
the first class, it would be ($400 − $1,851)² = 2,105,401, for the second
class ($800 − $1,851)² = 1,104,601, and so on.

Step 3: Multiply the squared difference between the class midpoint and the
mean by the class frequency. For the first class, the value is 8($400 −
$1,851)² = 16,843,208; for the second, 11($800 − $1,851)² =
12,150,611, and so on.

Step 4: Sum the f(M − x̄)². The total is 76,169,780. To find the standard
deviation, we insert these values in formula (3–12).

\[ s = \sqrt{\frac{\sum f(M - \bar{x})^2}{n - 1}} = \sqrt{\frac{76{,}169{,}780}{180 - 1}} = 652.33 \]

The mean and the standard deviation calculated from the data grouped into
a frequency distribution are usually close to the values calculated from raw
data. The grouped data result in some loss of information. For the vehicle profit
example, the mean profit reported in the Excel output on page 64 is $1,843.17
and the standard deviation is $643.63. The respective values estimated from
data grouped into a frequency distribution are $1,851.11 and $652.33. The
difference in the means is $7.94, or about 0.4%. The standard deviations differ
by $8.70, or 1.4%. Based on the percentage difference, the estimates are very
close to the actual values.
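
The grouped-data calculations in Table 3–1 and Steps 1 through 4 can be reproduced with a short script. This is a minimal sketch of formulas (3–11) and (3–12); the variable names are chosen here for illustration. One assumption differs from the hand calculation: the code keeps the unrounded mean ($1,851.11...) rather than $1,851 when forming the deviations, which changes the sum of squares slightly but gives the same standard deviation to the cent.

```python
from math import sqrt

# (lower limit, upper limit, frequency) for each class in Table 3-1
classes = [
    (200, 600, 8), (600, 1000, 11), (1000, 1400, 23), (1400, 1800, 38),
    (1800, 2200, 45), (2200, 2600, 32), (2600, 3000, 19), (3000, 3400, 4),
]

freqs = [f for _, _, f in classes]
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]  # class midpoints M
n = sum(freqs)                                        # total number of frequencies

# Formula (3-11): estimated mean from the class midpoints.
sum_fM = sum(f * M for f, M in zip(freqs, midpoints))
x_bar = sum_fM / n

# Formula (3-12): estimated standard deviation, squared deviations weighted by f.
sum_sq = sum(f * (M - x_bar) ** 2 for f, M in zip(freqs, midpoints))
s = sqrt(sum_sq / (n - 1))

print(f"n = {n}, sum of fM = {sum_fM:,.0f}")
print(f"estimated mean = ${x_bar:,.2f}")
print(f"estimated standard deviation = ${s:,.2f}")
```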

S E L F - R E V I E W 3–10

The net incomes of a sample of twenty container shipping companies were organized into
the following table:

Net Income ($ millions)    Number of Companies

     2 up to  6                     1
     6 up to 10                     4
    10 up to 14                    10
    14 up to 18                     3
    18 up to 22                     2

(a) What is the table called?
(b) Based on the distribution, what is the estimate of the arithmetic mean net income?
(c) Based on the distribution, what is the estimate of the standard deviation?


57. When we compute the mean of a frequency distribution, why do we refer to this as
an estimated mean?

58. Estimate the mean and the standard deviation of the following frequency distribu-
tion showing the number of times students eat at campus dining places in a
month.

Class Frequency

0 up to 5 2
5 up to 10 7
10 up to 15 12
15 up to 20 6
20 up to 25 3

59. Estimate the mean and the standard deviation of the following frequency dis-
tribution showing the ages of the first 60 people in line on Black Friday at a
retail store.

Class Frequency

20 up to 30 7
30 up to 40 12
40 up to 50 21
50 up to 60 18
60 up to 70 12

60. SCCoast, an Internet provider in the Southeast, developed the following frequency
distribution on the age of Internet users. Estimate the mean and the standard
deviation.

Age (years) Frequency

10 up to 20 3
20 up to 30 7
30 up to 40 18
40 up to 50 20
50 up to 60 12

61. The IRS was interested in the number of individual tax forms prepared by small
accounting firms. The IRS randomly sampled 50 public accounting firms with 10
or fewer employees in the Dallas–Fort Worth area. The following frequency ta-
ble reports the results of the study. Estimate the mean and the standard
deviation.

Number
of Clients Frequency

20 up to 30 1
30 up to 40 15
40 up to 50 22
50 up to 60 8
60 up to 70 4

62. Advertising expenses are a significant component of the cost of goods sold. Listed
below is a frequency distribution showing the advertising expenditures for 60
manufacturing companies located in the Southwest. Estimate the mean and the
standard deviation of advertising expenses.

Advertising Expenditure    Number of
($ millions)               Companies

25 up to 35                     5
35 up to 45                    10
45 up to 55                    21
55 up to 65                    16
65 up to 75                     8

Total                          60

E X E R C I S E S

ETHICS AND REPORTING RESULTS
In Chapter 1, we discussed the ethical and unbiased reporting of statistical results.
While you are learning about how to organize, summarize, and interpret data using
statistics, it is also important to understand statistics so that you can be an intelligent
consumer of information.

In this chapter, we learned how to compute numerical descriptive statistics.
Specifically, we showed how to compute and interpret measures of location for a data set: the
mean, median, and mode. We also discussed the advantages and disadvantages for each
statistic. For example, if a real estate developer tells a client that the average home in a
particular subdivision sold for $150,000, we assume that $150,000 is a representative
selling price for all the homes. But suppose that the client also asks what the median sales
price is, and the median is $60,000. Why was the developer only reporting the mean
price? This information is extremely important to a person’s decision making when buying
a home. Knowing the advantages and disadvantages of the mean, median, and mode is
important as we report statistics and as we use statistical information to make decisions.

We also learned how to compute measures of dispersion: range, variance, and
standard deviation. Each of these statistics also has advantages and disadvantages.
Remember that the range provides information about the overall spread of a
distribution. However, it does not provide any information about how the data are clustered or
concentrated around the center of the distribution. As we learn more about statistics,
we need to remember that when we use statistics we must maintain an independent
and principled point of view. Any statistical report requires objective and honest
communication of the results.

C H A P T E R S U M M A R Y

I. A measure of location is a value used to describe the central tendency of a set of data.
A. The arithmetic mean is the most widely reported measure of location.
1. It is calculated by adding the values of the observations and dividing by the total
number of observations.
a. The formula for the population mean of ungrouped or raw data is

\[ \mu = \frac{\sum x}{N} \tag{3–1} \]

b. The formula for the sample mean is

\[ \bar{x} = \frac{\sum x}{n} \tag{3–2} \]

c. The formula for the sample mean of data in a frequency distribution is

\[ \bar{x} = \frac{\sum fM}{n} \tag{3–11} \]

2. The major characteristics of the arithmetic mean are:
a. At least the interval scale of measurement is required.
b. All the data values are used in the calculation.
c. A set of data has only one mean. That is, it is unique.
d. The sum of the deviations from the mean equals 0.

B. The median is the value in the middle of a set of ordered data.
1. To find the median, sort the observations from minimum to maximum and identify
the middle value.
2. The major characteristics of the median are:
a. At least the ordinal scale of measurement is required.
b. It is not influenced by extreme values.
c. Fifty percent of the observations are larger than the median.
d. It is unique to a set of data.

C. The mode is the value that occurs most often in a set of data.
1. The mode can be found for nominal-level data.
2. A set of data can have more than one mode.

D. The weighted mean is found by multiplying each observation by its corresponding weight,
summing the products, and dividing by the sum of the weights.
1. The formula for determining the weighted mean is

\[ \bar{x}_w = \frac{w_1 x_1 + w_2 x_2 + w_3 x_3 + \cdots + w_n x_n}{w_1 + w_2 + w_3 + \cdots + w_n} \tag{3–3} \]

E. The geometric mean is the nth root of the product of n positive values.
1. The formula for the geometric mean is

\[ GM = \sqrt[n]{(x_1)(x_2)(x_3) \cdots (x_n)} \tag{3–4} \]

2. The geometric mean is also used to find the rate of change from one period to another.

\[ GM = \sqrt[n]{\frac{\text{Value at end of period}}{\text{Value at beginning of period}}} - 1 \tag{3–5} \]

3. The geometric mean is always equal to or less than the arithmetic mean.
II. The dispersion is the variation or spread in a set of data.

A. The range is the difference between the maximum and minimum values in a set of data.
1. The formula for the range is

Range = Maximum value − Minimum value (3–6)
2. The major characteristics of the range are:

a. Only two values are used in its calculation.
b. It is influenced by extreme values.
c. It is easy to compute and to understand.

B. The variance is the mean of the squared deviations from the arithmetic mean.
1. The formula for the population variance is

\[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \tag{3–7} \]

2. The formula for the sample variance is

\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \tag{3–9} \]

3. The major characteristics of the variance are:
a. All observations are used in the calculation.
b. The units are somewhat difficult to work with; they are the original units squared.

C. The standard deviation is the square root of the variance.
1. The major characteristics of the standard deviation are:

a. It is in the same units as the original data.
b. It is the square root of the average squared distance from the mean.
c. It cannot be negative.
d. It is the most widely reported measure of dispersion.


2. The formula for the sample standard deviation is

\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \tag{3–10} \]

3. The formula for the standard deviation of grouped data is

\[ s = \sqrt{\frac{\sum f(M - \bar{x})^2}{n - 1}} \tag{3–12} \]

III. We use the standard deviation to describe a frequency distribution by applying
Chebyshev’s theorem or the Empirical Rule.
A. Chebyshev’s theorem states that regardless of the shape of the distribution, at least
1 − 1/k² of the observations will be within k standard deviations of the mean, where k
is greater than 1.

B. The Empirical Rule states that for a bell-shaped distribution about 68% of the values will
be within one standard deviation of the mean, 95% within two, and virtually all within three.
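
As a compact recap of formulas (3–3), (3–4), and (3–5) from the summary, the sketch below computes a weighted mean, a geometric mean, and a geometric mean rate of change. All data values and the helper names `weighted_mean` and `rate_of_change` are hypothetical, introduced only for illustration; `geometric_mean` and `mean` come from Python's standard `statistics` module.

```python
from statistics import geometric_mean, mean


def weighted_mean(values, weights):
    """Formula (3-3): sum of weight times value, divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


def rate_of_change(begin, end, periods):
    """Formula (3-5): geometric mean rate of change per period."""
    return (end / begin) ** (1 / periods) - 1


# Hypothetical data (illustrative values only).
values = [2.0, 4.0, 8.0]
weights = [1, 2, 1]

w_mean = weighted_mean(values, weights)
g_mean = geometric_mean(values)   # formula (3-4)
a_mean = mean(values)
growth = rate_of_change(100, 121, 2)

print("weighted mean:", w_mean)
print(f"geometric mean: {g_mean:.3f}  arithmetic mean: {a_mean:.3f}")
print(f"rate of change per period: {growth:.1%}")
```

With these values the geometric mean is below the arithmetic mean, consistent with item E.3 of the summary.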

P R O N U N C I A T I O N K E Y

SYMBOL   MEANING                                  PRONUNCIATION

μ        Population mean                          mu
Σ        Operation of adding                      sigma
Σx       Adding a group of values                 sigma x
x̄        Sample mean                              x bar
x̄_w      Weighted mean                            x bar sub w
GM       Geometric mean                           G M
ΣfM      Adding the product of the frequencies
         and the class midpoints                  sigma f M
σ²       Population variance                      sigma squared
σ        Population standard deviation            sigma

C H A P T E R E X E R C I S E S

63. The accounting firm of Crawford and Associates has five senior partners. Yesterday the
senior partners saw six, four, three, seven, and five clients, respectively.
a. Compute the mean and median number of clients seen by the partners.
b. Is the mean a sample mean or a population mean?
c. Verify that Σ(x − μ) = 0.

64. Owens Orchards sells apples in a large bag by weight. A sample of seven bags con-
tained the following numbers of apples: 23, 19, 26, 17, 21, 24, 22.
a. Compute the mean and median number of apples in a bag.
b. Verify that Σ(x − x̄) = 0.

65. A sample of households that subscribe to United Bell Phone Company for landline
phone service revealed the following number of calls received per household last week.
Determine the mean and the median number of calls received.

52 43 30 38 30 42 12 46 39 37
34 46 32 18 41 5

66. The Citizens Banking Company is studying the number of times the ATM located
in a Loblaws Supermarket at the foot of Market Street is used per day. Following are the
number of times the machine was used daily over each of the last 30 days. Determine
the mean number of times the machine was used per day.

83 64 84 76 84 54 75 59 70 61
63 80 84 73 68 52 65 90 52 77
95 36 78 61 59 84 95 47 87 60


67. A recent study of the laundry habits of Americans included the time in minutes of
the wash cycle. A sample of 40 observations follows. Determine the mean and the me-
dian of a typical wash cycle.

35 37 28 37 33 38 37 32 28 29
39 33 32 37 33 35 36 44 36 34
40 38 46 39 37 39 34 39 31 33
37 35 39 38 37 32 43 31 31 35

68. Trudy Green works for the True-Green Lawn Company. Her job is to solicit lawn-
care business via the telephone. Listed below is the number of appointments she made
in each of the last 25 hours of calling. What is the arithmetic mean number of appoint-
ments she made per hour? What is the median number of appointments per hour? Write
a brief report summarizing the findings.

9 5 2 6 5 6 4 4 7 2 3 6 3
4 4 7 8 4 4 5 5 4 8 3 3

69. The Split-A-Rail Fence Company sells three types of fence to homeowners in suburban
Seattle, Washington. Grade A costs $5.00 per running foot to install, Grade B costs
$6.50 per running foot, and Grade C, the premium quality, costs $8.00 per running foot.
Yesterday, Split-A-Rail installed 270 feet of Grade A, 300 feet of Grade B, and 100 feet
of Grade C. What was the mean cost per foot of fence installed?

70. Rolland Poust is a sophomore in the College of Business at Scandia Tech. Last semester
he took courses in statistics and accounting, 3 hours each, and earned an A in both. He
earned a B in a 5-hour history course and a B in a 2-hour history of jazz course. In addi-
tion, he took a 1-hour course dealing with the rules of basketball so he could get his li-
cense to officiate high school basketball games. He got an A in this course. What was
his GPA for the semester? Assume that he receives 4 points for an A, 3 for a B, and so
on. What measure of central tendency did you calculate? What method did you use?

71. The table below shows the percent of the labor force that is unemployed and the size of
the labor force for three counties in northwest Ohio. Jon Elsas is the Regional Director of
Economic Development. He must present a report to several companies that are consid-
ering locating in northwest Ohio. What would be an appropriate unemployment rate to
show for the entire region?

County Percent Unemployed Size of Workforce

Wood 4.5 15,300
Ottawa 3.0 10,400
Lucas 10.2 150,600

72. The American Diabetes Association recommends a blood glucose reading of less
than 130 for those with Type 2 diabetes. Blood glucose measures the amount of sugar
in the blood. Below are the readings for February for a person recently diagnosed with
Type 2 diabetes.

112 122 116 103 112 96 115 98 106 111
106 124 116 127 116 108 112 112 121 115
124 116 107 118 123 109 109 106

a. What is the arithmetic mean glucose reading?
b. What is the median glucose reading?
c. What is the modal glucose reading?

73. The first Super Bowl was played in 1967. The cost for a 30-second commercial was
$42,000. The cost of a 30-second commercial for Super Bowl 50 was $4.6 million. What
was the geometric mean rate of increase for the 50 year period?


74. A recent article suggested that, if you earn $25,000 a year today and the inflation
rate continues at 3% per year, you’ll need to make $33,598 in 10 years to have the
same buying power. You would need to make $44,771 if the inflation rate jumped to
6%. Confirm that these statements are accurate by finding the geometric mean rate
of increase.

75. The ages of a sample of Canadian tourists flying from Toronto to Hong Kong were 32,
21, 60, 47, 54, 17, 72, 55, 33, and 41.
a. Compute the range.
b. Compute the standard deviation.

76. The weights (in pounds) of a sample of five boxes being sent by UPS are 12, 6, 7, 3, and 10.
a. Compute the range.
b. Compute the standard deviation.

77. The enrollments of the 13 public universities in the state of Ohio are listed below.

College Enrollment

University of Akron 26,106
Bowling Green State University 18,864
Central State University 1,718
University of Cincinnati 44,354
Cleveland State University 17,194
Kent State University 41,444
Miami University 23,902
Ohio State University 62,278
Ohio University 36,493
Shawnee State University 4,230
University of Toledo 20,595
Wright State University 17,460
Youngstown State University 12,512

a. Is this a sample or a population?
b. What is the mean enrollment?
c. What is the median enrollment?
d. What is the range of the enrollments?
e. Compute the standard deviation.

78. Health issues are a concern of managers, especially as they evaluate the cost of medi-
cal insurance. A recent survey of 150 executives at Elvers Industries, a large insurance
and financial firm located in the Southwest, reported the number of pounds by which the
executives were overweight. Compute the mean and the standard deviation.

Pounds Overweight Frequency

0 up to 6 14
6 up to 12 42
12 up to 18 58
18 up to 24 28
24 up to 30 8

79. The Apollo space program lasted from 1967 until 1972 and included 13 missions.
The missions lasted from as little as 7 hours to as long as 301 hours. The duration of
each flight is listed below.

9 195 241 301 216 260 7 244 192 147
10 295 142

a. Explain why the flight times are a population.
b. Find the mean and median of the flight times.
c. Find the range and the standard deviation of the flight times.

DESCRIBING DATA: NUMERICAL MEASURES 91

80. Creek Ratz is a very popular restaurant located along the coast of northern Florida.
They serve a variety of steak and seafood dinners. During the summer beach season,
they do not take reservations or accept “call ahead” seating. Management of the restau-
rant is concerned with the time a patron must wait before being seated for dinner. Listed
below is the wait time, in minutes, for the 25 tables seated last Saturday night.

28 39 23 67 37 28 56 40 28 50
51 45 44 65 61 27 24 61 34 44
64 25 24 27 29

a. Explain why the times are a population.
b. Find the mean and median of the times.
c. Find the range and the standard deviation of the times.

81. A sample of 25 undergraduates reported the following dollar amounts of enter-
tainment expenses last year:

684 710 688 711 722 698 723 743 738 722 696 721 685
763 681 731 736 771 693 701 737 717 752 710 697

a. Find the mean, median, and mode of this information.
b. What are the range and standard deviation?
c. Use the Empirical Rule to establish an interval that includes about 95% of the
observations.

82. The Kentucky Derby is held the first Saturday in May at Churchill Downs in
Louisville, Kentucky. The race track is one and one-quarter miles. The following table
shows the winners since 1990, their margin of victory, the winning time, and the
payoff on a $2 bet.

Year Winner Winning Margin (lengths) Winning Time (minutes) Payoff on a $2 Win Bet

1990 Unbridled 3.5 2.03333 10.80
1991 Strike the Gold 1.75 2.05000 4.80
1992 Lil E. Tee 1 2.05000 16.80
1993 Sea Hero 2.5 2.04000 12.90
1994 Go For Gin 2 2.06000 9.10
1995 Thunder Gulch 2.25 2.02000 24.50
1996 Grindstone nose 2.01667 5.90
1997 Silver Charm head 2.04000 4.00
1998 Real Quiet 0.5 2.03667 8.40
1999 Charismatic neck 2.05333 31.30
2000 Fusaichi Pegasus 1.5 2.02000 2.30
2001 Monarchos 4.75 1.99950 10.50
2002 War Emblem 4 2.01883 20.50
2003 Funny Cide 1.75 2.01983 12.80
2004 Smarty Jones 2.75 2.06767 4.10
2005 Giacomo 0.5 2.04583 50.30
2006 Barbaro 6.5 2.02267 6.10
2007 Street Sense 2.25 2.03617 4.90
2008 Big Brown 4.75 2.03033 6.80
2009 Mine That Bird 6.75 2.04433 103.20
2010 Super Saver 2.50 2.07417 18.00
2011 Animal Kingdom 2.75 2.034 43.80
2012 I’ll Have Another 1.5 2.03050 32.60
2013 Orb 2.5 2.04817 12.80
2014 California Chrome 1.75 2.0610 7.00
2015 American Pharoah 1.00 2.05033 7.80

a. Determine the mean and median for the variables winning time and payoff on a $2 bet.
b. Determine the range and standard deviation of the variables winning time and payoff
on a $2 bet.
c. Refer to the variable winning margin. What is the level of measurement? What
measure of location would be most appropriate?

83. The manager of the local Walmart Supercenter is studying the number of items
purchased by customers in the evening hours. Listed below is the number of items for a
sample of 30 customers.

15 8 6 9 9 4 18 10 10 12
12 4 7 8 12 10 10 11 9 13
5 6 11 14 5 6 6 5 13 5

a. Find the mean and the median of the number of items.
b. Find the range and the standard deviation of the number of items.
c. Organize the number of items into a frequency distribution. You may want to review the
guidelines in Chapter 2 for establishing the class interval and the number of classes.
d. Find the mean and the standard deviation of the data organized into a frequency
distribution. Compare these values with those computed in part (a). Why are they different?

84. The following frequency distribution reports the electricity cost for a sample of 50
two-bedroom apartments in Albuquerque, New Mexico, during the month of May last year.

Electricity Cost Frequency

$ 80 up to $100 3
100 up to 120 8
120 up to 140 12
140 up to 160 16
160 up to 180 7
180 up to 200 4

Total 50

a. Estimate the mean cost.
b. Estimate the standard deviation.
c. Use the Empirical Rule to estimate the proportion of costs within two standard
deviations of the mean. What are these limits?

85. Bidwell Electronics Inc. recently surveyed a sample of employees to determine how far
they lived from corporate headquarters. The results are shown below. Compute the
mean and the standard deviation.

Distance (miles) Frequency M

0 up to 5 4 2.5
5 up to 10 15 7.5
10 up to 15 27 12.5
15 up to 20 18 17.5
20 up to 25 6 22.5

DATA ANALYTICS

86. Refer to the North Valley Real Estate data and prepare a report on the sales prices
of the homes. Be sure to answer the following questions in your report.
a. Around what values of price do the data tend to cluster? What is the mean sales
price? What is the median sales price? Is one measure more representative of the
typical sales prices than the others?
b. What is the range of sales prices? What is the standard deviation? About 95% of the
sales prices are between what two values? Is the standard deviation a useful statistic
for describing the dispersion of sales price?
c. Repeat (a) and (b) using FICO score.

87. Refer to the Baseball 2016 data, which report information on the 30 Major League
Baseball teams for the 2016 season. Refer to the variable team salary.
a. Prepare a report on the team salaries. Be sure to answer the following questions in
your report.
1. Around what values do the data tend to cluster? Specifically what is the mean
team salary? What is the median team salary? Is one measure more representative
of the typical team salary than the others?
2. What is the range of the team salaries? What is the standard deviation? About
95% of the salaries are between what two values?
b. Refer to the information on the average salary for each year. In 2000 the average
player salary was $1.99 million. By 2016 the average player salary had increased to
$4.40 million. What was the rate of increase over the period?

88. Refer to the Lincolnville School District bus data. Prepare a report on the mainte-
nance cost for last month. Be sure to answer the following questions in your report.
a. Around what values do the data tend to cluster? Specifically what was the mean
maintenance cost last month? What is the median cost? Is one measure more
representative of the typical cost than the others?
b. What is the range of maintenance costs? What is the standard deviation? About 95%
of the maintenance costs are between what two values?

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO4-1 Construct and interpret a dot plot.

LO4-2 Construct and describe a stem-and-leaf display.

LO4-3 Identify and compute measures of position.

LO4-4 Construct and analyze a box plot.

LO4-5 Compute and interpret the coefficient of skewness.

LO4-6 Create and interpret a scatter diagram.

LO4-7 Develop and explain a contingency table.

MCGIVERN JEWELERS recently posted an advertisement on a social media site reporting
the shape, size, price, and cut grade for 33 of its diamonds in stock. Develop a box plot of
the variable price and comment on the result. (See Exercise 37 and LO4-4.)

CHAPTER 4
Describing Data:
DISPLAYING AND EXPLORING DATA


INTRODUCTION
Chapter 2 began our study of descriptive statistics. In order to transform raw or un-
grouped data into a meaningful form, we organize the data into a frequency distribution.
We present the frequency distribution in graphic form as a histogram or a frequency
polygon. This allows us to visualize where the data tend to cluster, the largest and the
smallest values, and the general shape of the data.

In Chapter 3, we first computed several measures of location, such as the mean,
median, and mode. These measures of location allow us to report a typical value in the
set of observations. We also computed several measures of dispersion, such as the
range, variance, and standard deviation. These measures of dispersion allow us to de-
scribe the variation or the spread in a set of observations.

We continue our study of descriptive statistics in this chapter. We study (1) dot plots,
(2) stem-and-leaf displays, (3) percentiles, and (4) box plots. These charts and statistics
give us additional insight into where the values are concentrated as well as the general
shape of the data. Then we consider bivariate data. In bivariate data, we observe two
variables for each individual or observation. Examples include the number of hours a
student studied and the points earned on an examination; if a sampled product meets
quality specifications and the shift on which it is manufactured; or the amount of electric-
ity used in a month by a homeowner and the mean daily high temperature in the region
for the month. These charts and graphs provide useful insights as we use business
analytics to enhance our understanding of data.

DOT PLOTS
Recall for the Applewood Auto Group data, we summarized the profit earned on the
180 vehicles sold with a frequency distribution using eight classes. When we orga-
nized the data into the eight classes, we lost the exact value of the observations. A
dot plot, on the other hand, groups the data as little as possible, and we do not lose
the identity of an individual observation. To develop a dot plot, we display a dot for
each observation along a horizontal number line indicating the possible values of the
data. If there are identical observations or the observations are too close to be shown
individually, the dots are “piled” on top of each other. This allows us to see the shape
of the distribution, the value about which the data tend to cluster, and the largest and
smallest observations. Dot plots are most useful for smaller data sets, whereas histo-
grams tend to be most useful for large data sets. An example will show how to con-
struct and interpret dot plots.

EXAMPLE

The service departments at Tionesta Ford Lincoln and Sheffield Motors Inc., two
of the four Applewood Auto Group dealerships, were both open 24 days last
month. Listed below is the number of vehicles serviced last month at the two
dealerships. Construct dot plots and report summary statistics to compare the
two dealerships.

Tionesta Ford Lincoln

Monday Tuesday Wednesday Thursday Friday Saturday

23 33 27 28 39 26
30 32 28 33 35 32
29 25 36 31 32 27
35 32 35 37 36 30

Sheffield Motors Inc.

Monday Tuesday Wednesday Thursday Friday Saturday

31 35 44 36 34 37
30 37 43 31 40 31
32 44 36 34 43 36
26 38 37 30 42 33

SOLUTION

The Minitab system provides a dot plot and outputs the mean, median, maximum,
and minimum values, and the standard deviation for the number of cars serviced
at each dealership over the last 24 working days.

The dot plots, shown in the center of the output, graphically illustrate the distribu-
tions for each dealership. The plots show the difference in the location and dis-
persion of the observations. By looking at the dot plots, we can see that the
number of vehicles serviced at the Sheffield dealership is more widely dispersed
and has a larger mean than at the Tionesta dealership. Several other features of
the number of vehicles serviced are:

• Tionesta serviced the fewest cars in any day, 23.
• Sheffield serviced 26 cars during their slowest day, which is 4 cars fewer than
  the next lowest day.
• Tionesta serviced exactly 32 cars on four different days.
• The numbers of cars serviced cluster around 36 for Sheffield and 32 for Tionesta.

From the descriptive statistics, we see Sheffield serviced a mean of 35.83 vehicles
per day. Tionesta serviced a mean of 31.292 vehicles per day during the same
period. So Sheffield typically services 4.54 more vehicles per day. There is also
more dispersion, or variation, in the daily number of vehicles serviced at Sheffield
than at Tionesta. How do we know this? The standard deviation is larger at Shef-
field (4.96 vehicles per day) than at Tionesta (4.112 cars per day).
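
The comparison above can be reproduced without Minitab. The following is a minimal sketch in Python, assuming only the standard library; the helper names `dot_plot` and `summarize` are illustrative and do not come from the text. It prints a rough text dot plot for each dealership on a shared axis and the same summary statistics discussed in the solution.

```python
from collections import Counter
from statistics import mean, median, stdev

# Daily vehicles serviced last month, read row by row from the two tables.
tionesta = [23, 33, 27, 28, 39, 26, 30, 32, 28, 33, 35, 32,
            29, 25, 36, 31, 32, 27, 35, 32, 35, 37, 36, 30]
sheffield = [31, 35, 44, 36, 34, 37, 30, 37, 43, 31, 40, 31,
             32, 44, 36, 34, 43, 36, 26, 38, 37, 30, 42, 33]


def dot_plot(values, low, high):
    """Return a text dot plot: identical observations are piled vertically."""
    counts = Counter(values)
    tallest = max(counts.values())
    rows = []
    for level in range(tallest, 0, -1):
        rows.append("".join("  ." if counts[v] >= level else "   "
                            for v in range(low, high + 1)))
    # Axis labels, right-aligned under each column of dots.
    rows.append("".join(f"{v:>3}" for v in range(low, high + 1)))
    return "\n".join(rows)


def summarize(values):
    """Sample statistics of the kind discussed in the solution."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),  # sample standard deviation (divides by n - 1)
        "min": min(values),
        "max": max(values),
    }


# Use one axis for both plots so location and spread can be compared by eye.
low = min(tionesta + sheffield)
high = max(tionesta + sheffield)
for name, data in (("Tionesta", tionesta), ("Sheffield", sheffield)):
    print(name)
    print(dot_plot(data, low, high))
    print({key: round(value, 3) for key, value in summarize(data).items()})
```

Plotting both dealerships on the same axis is a design choice of this sketch; it makes the larger mean and wider spread at Sheffield visible directly from the piles of dots.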

STEM-AND-LEAF DISPLAYS
In Chapter 2, we showed how to organize data into a frequency distribution so we could
summarize the raw data into a meaningful form. The major advantage to organizing the
data into a frequency distribution is we get a quick visual picture of the shape of the
distribution without doing any further calculation. To put it another way, we can see
where the data are concentrated and also determine whether there are any extremely
large or small values. There are two disadvantages, however, to organizing the data into
a frequency distribution: (1) we lose the exact identity of each value and (2) we are not
sure how the values within each class are distributed. To explain, the Theater of the
Republic in Erie, Pennsylvania, books live theater and musical performances. The
theater’s capacity is 160 seats. Last year, the 45 performances featured eight different
plays and twelve different bands. The following frequency distribution shows that
80 up to 90 people attended two of the 45 performances; there were seven performances
where 90 up to 100 people attended. However, is the attendance within this class clustered
about 90, spread evenly throughout the class, or clustered near 99? We cannot tell.

Attendance Frequency

80 up to 90 2
90 up to 100 7
100 up to 110 6
110 up to 120 9
120 up to 130 8
130 up to 140 7
140 up to 150 3
150 up to 160 3

Total 45

One technique used to display quantitative information in a condensed form
and provide more information than the frequency distribution is the stem-and-leaf
display. An advantage of the stem-and-leaf display over a frequency distribution is
we do not lose the identity of each observation. From the frequency distribution
above, we cannot recover the individual values in the 90 up to 100 class. To illustrate
the construction of a stem-and-leaf display using the number of people attending each
performance, suppose the seven observations in the 90 up to 100 class are 96, 94, 93,
94, 95, 96, and 97. The stem value is the leading digit or digits, in this case 9. The
leaves are the trailing digits. The stem is placed to the left of a vertical line and the
leaf values to the right.

The values in the 90 up to 100 class would appear as follows:

9 ∣ 6 4 3 4 5 6 7

It is also customary to sort the values within each stem from smallest to largest. Thus,
the second row of the stem-and-leaf display would appear as follows:

9 ∣ 3 4 4 5 6 6 7

With the stem-and-leaf display, we can quickly observe that 94 people attended two
performances and the number attending ranged from 93 to 97. A stem-and-leaf display
is similar to a frequency distribution with more information, that is, the identity of the
observations is preserved.

STEM-AND-LEAF DISPLAY A statistical technique to present a set of data. Each
numerical value is divided into two parts. The leading digit(s) becomes the stem
and the trailing digit the leaf. The stems are located along the vertical axis, and the
leaf values are stacked against each other along the horizontal axis.

The following example explains the details of developing a stem-and-leaf display.

EXAMPLE

Listed in Table 4–1 is the number of people attending each of the 45 performances
at the Theater of the Republic last year. Organize the data into a stem-and-leaf
display. Around what values does attendance tend to cluster? What is the smallest
attendance? The largest attendance?

SOLUTION

From the data in Table 4–1, we note that the smallest attendance is 88. So we will
make the first stem value 8. The largest attendance is 156, so we will have the
stem values begin at 8 and continue to 15. The first number in Table 4–1 is 96,
which has a stem value of 9 and a leaf value of 6. Moving across the top row, the
second value is 93 and the third is 88. After the first 3 data values are considered,
the chart is as follows.

Stem Leaf

8 8
9 6 3
10
11
12
13
14
15

Organizing all the data, the stem-and-leaf chart looks as follows.

Stem Leaf

8 8 9
9 6 3 5 6 4 4 7
10 8 7 3 4 6 3
11 7 3 2 7 2 1 9 8 3
12 7 5 7 0 5 5 0 4
13 9 5 2 9 4 6 8
14 8 2 3
15 6 5 5

The usual procedure is to sort the leaf values from the smallest to largest. The last
line, the row referring to the values in the 150s, would appear as:

15 ∣ 5 5 6

TABLE 4–1 Number of People Attending Each of the 45 Performances at the Theater of
the Republic

96 93 88 117 127 95 113 96 108 94 148 156
139 142 94 107 125 155 155 103 112 127 117 120
112 135 132 111 125 104 106 139 134 119 97 89
118 136 125 143 120 103 113 124 138

The final table would appear as follows, where we have sorted all of the leaf values.

Stem Leaf

8 8 9
9 3 4 4 5 6 6 7
10 3 3 4 6 7 8
11 1 2 2 3 3 7 7 8 9
12 0 0 4 5 5 5 7 7
13 2 4 5 6 8 9 9
14 2 3 8
15 5 5 6

You can draw several conclusions from the stem-and-leaf display. First, the minimum
number of people attending is 88 and the maximum is 156. There were two performances
with fewer than 90 people attending, and three performances with 150 or
more. You can observe, for example, that for the three performances with more than
150 people attending, the actual attendances were 155, 155, and 156. The concentration
of attendance is between 110 and 130. There were nine performances with attendance
between 110 and 119 and eight performances between 120 and 129. We
can also tell that within the 120 to 129 group the actual attendances were spread
evenly throughout the class. That is, 120 people attended two performances, 124 people
attended one performance, 125 people attended three performances, and 127 people
attended two performances.
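
The construction described above (split each value into a stem and a leaf, then sort the leaves within each stem) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `stem_and_leaf` is an assumption, and it treats the last digit of each whole-number value as the leaf, as in the attendance example.

```python
from collections import defaultdict

# Table 4–1: attendance at each of the 45 performances.
attendance = [
    96, 93, 88, 117, 127, 95, 113, 96, 108, 94, 148, 156,
    139, 142, 94, 107, 125, 155, 155, 103, 112, 127, 117, 120,
    112, 135, 132, 111, 125, 104, 106, 139, 134, 119, 97, 89,
    118, 136, 125, 143, 120, 103, 113, 124, 138,
]


def stem_and_leaf(values):
    """Map each stem (leading digits) to its sorted leaves (trailing digit)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 96 -> stem 9, leaf 6
        rows[stem].append(leaf)
    # Keep stems with no observations so gaps in the data remain visible.
    return {stem: rows.get(stem, []) for stem in range(min(rows), max(rows) + 1)}


display = stem_and_leaf(attendance)
for stem, leaves in display.items():
    print(f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}")
```

Counting the leaves in a row gives the class frequency, so `len(display[11])` returns the number of performances with attendance from 110 to 119.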

We also can generate this information on the Minitab software system. We have
named the variable Attendance. The Minitab output is below. You can find the Minitab
commands that will produce this output in Appendix C.

The Minitab solution provides some additional information regarding cumulative totals.
In the column to the left of the stem values are numbers such as 2, 9, 15, and so on. The
number 9 indicates there are 9 observations that have occurred before the value of 100.
The number 15 indicates that 15 observations have occurred prior to 110. About halfway
down the column the number 9 appears in parentheses. The parentheses indicate that the
middle value or median appears in that row and there are nine values in this group. In this
case, we describe the middle value as the value below which half of the observations oc-
cur. There are a total of 45 observations, so the middle value, if the data were arranged
from smallest to largest, would be the 23rd observation; its value is 118. After the median,
the values begin to decline. These values represent the “more than” cumulative totals.
There are 21 observations of 120 or more, 13 of 130 or more, and so on.
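
The cumulative column just described can be rebuilt from the row counts alone. The sketch below is an approximation of that column, not Minitab's own algorithm: it assumes an odd number of observations, so that the median falls inside a single row, and the variable names are illustrative.

```python
from itertools import accumulate

# Stems and leaf counts from the sorted stem-and-leaf display of Table 4–1.
stems = [8, 9, 10, 11, 12, 13, 14, 15]
counts = [2, 7, 6, 9, 8, 7, 3, 3]

n = sum(counts)
median_rank = (n + 1) / 2  # rank of the middle value when n is odd

from_top = list(accumulate(counts))                 # "less than" cumulative totals
from_bottom = list(accumulate(counts[::-1]))[::-1]  # "more than" cumulative totals

depths = []
for i, count in enumerate(counts):
    before = from_top[i] - count
    if before < median_rank <= from_top[i]:
        depths.append(f"({count})")         # row containing the median
    elif from_top[i] < median_rank:
        depths.append(str(from_top[i]))     # rows above the median row
    else:
        depths.append(str(from_bottom[i]))  # rows below the median row

for depth, stem in zip(depths, stems):
    print(f"{depth:>4}  {stem:>2}")
```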

Which is the better choice, a dot plot or a stem-and-leaf chart? This is really a matter
of personal choice and convenience. For presenting data, especially with a large num-
ber of observations, you will find dot plots are more frequently used. You will see dot
plots in analytical literature, marketing reports, and occasionally in annual reports. If you
are doing a quick analysis for yourself, stem-and-leaf tallies are handy and easy, partic-
ularly on a smaller set of data.

SELF-REVIEW 4–1

1. The number of employees at each of the 142 Home Depot stores in the Southeast
region is shown in the following dot plot.

[Dot plot not reproduced. Horizontal axis: Number of employees, labeled 80 to 104 in steps of 4.]

(a) What are the maximum and minimum numbers of employees per store?
(b) How many stores employ 91 people?
(c) Around what values does the number of employees per store tend to cluster?
2. The rate of return for 21 stocks is:

8.3 9.6 9.5 9.1 8.8 11.2 7.7 10.1 9.9 10.8
10.2 8.0 8.4 8.1 11.6 9.6 8.8 8.0 10.4 9.8 9.2

Organize this information into a stem-and-leaf display.
(a) How many rates are less than 9.0?
(b) List the rates in the 10.0 up to 11.0 category.
(c) What is the median?
(d) What are the maximum and the minimum rates of return?

EXERCISES

1. Describe the differences between a histogram and a dot plot. When might a dot
plot be better than a histogram?
2. Describe the differences between a histogram and a stem-and-leaf display.
3. Consider the following chart.

[Chart not reproduced. Horizontal axis labeled 1 through 7.]

a. What is this chart called?
b. How many observations are in the study?
c. What are the maximum and the minimum values?
d. Around what values do the observations tend to cluster?

4. The following chart reports the number of cell phones sold at a big-box retail store
for the last 26 days.

[Chart not reproduced. Horizontal axis tick labels: 4, 9, 14, 19.]

a. What are the maximum and the minimum numbers of cell phones sold in a day?
b. What is a typical number of cell phones sold?

5. The first row of a stem-and-leaf chart appears as follows: 62 | 1 3 3 7 9. Assume
whole number values.
a. What is the “possible range” of the values in this row?
b. How many data values are in this row?
c. List the actual values in this row of data.

6. The third row of a stem-and-leaf chart appears as follows: 21 | 0 1 3 5 7 9. Assume
whole number values.
a. What is the “possible range” of the values in this row?
b. How many data values are in this row?
c. List the actual values in this row of data.

7. The following stem-and-leaf chart shows the number of units produced per day in a
factory.

Stem Leaf
3 8
4
5 6
6 0133559
7 0236778
8 59
9 00156
10 36

a. How many days were studied?
b. How many observations are in the first class?
c. What are the minimum value and the maximum value?
d. List the actual values in the fourth row.
e. List the actual values in the second row.
f. How many values are less than 70?
g. How many values are 80 or more?
h. What is the median?
i. How many values are between 60 and 89, inclusive?

8. The following stem-and-leaf chart reports the number of prescriptions filled per day
at the pharmacy on the corner of Fourth and Main Streets.

Stem Leaf
12 689
13 123
14 6889
15 589
16 35
17 24568
18 268
19 13456
20 034679
21 2239
22 789
23 00179
24 8
25 13
26
27 0

a. How many days were studied?
b. How many observations are in the last class?
c. What are the maximum and the minimum values in the entire set of data?
d. List the actual values in the fourth row.
e. List the actual values in the next to the last row.
f. On how many days were less than 160 prescriptions filled?
g. On how many days were 220 or more prescriptions filled?
h. What is the middle value?
i. How many days did the number of filled prescriptions range between 170 and 210?

9. A survey of the number of phone calls made by a sample of 16 Verizon sub-
scribers last week revealed the following information. Develop a stem-and-leaf
chart. How many calls did a typical subscriber make? What were the maximum and
the minimum number of calls made?

52 43 30 38 30 42 12 46 39
37 34 46 32 18 41 5

10. Aloha Banking Co. is studying ATM use in suburban Honolulu. Yesterday, for a
sample of 30 ATM's, the bank counted the number of times each machine was
used. The data is presented in the table. Develop a stem-and-leaf chart to summa-
rize the data. What were the typical, minimum, and maximum number of times each
ATM was used?

83 64 84 76 84 54 75 59 70 61
63 80 84 73 68 52 65 90 52 77
95 36 78 61 59 84 95 47 87 60

MEASURES OF POSITION
The standard deviation is the most widely used measure of dispersion. However, there
are other ways of describing the variation or spread in a set of data. One method is to
determine the location of values that divide a set of observations into equal parts. These
measures include quartiles, deciles, and percentiles.

Quartiles divide a set of observations into four equal parts. To explain further, think of
any set of values arranged from the minimum to the maximum. In Chapter 3, we called the
middle value of a set of data arranged from the minimum to the maximum the median.
That is, 50% of the observations are larger than the median and 50% are smaller. The
median is a measure of location because it pinpoints the center of the data. In a similar
fashion, quartiles divide a set of observations into four equal parts. The first quartile, usu-
ally labeled Q1, is the value below which 25% of the observations occur, and the third
quartile, usually labeled Q3, is the value below which 75% of the observations occur.

Similarly, deciles divide a set of observations into 10 equal parts and percentiles
into 100 equal parts. So if you found that your GPA was in the 8th decile at your univer-
sity, you could conclude that 80% of the students had a GPA lower than yours and 20%
had a higher GPA. If your GPA was in the 92nd percentile, then 92% of students had a
GPA less than your GPA and only 8% of students had a GPA greater than your GPA. Per-
centile scores are frequently used to report results on such national standardized tests
as the SAT, ACT, GMAT (used to judge entry into many master of business administration
programs), and LSAT (used to judge entry into law school).

Quartiles, Deciles, and Percentiles
To formalize the computational procedure, let Lp refer to the location of a desired percen-
tile. So if we want to find the 92nd percentile we would use L92, and if we wanted the
median, the 50th percentile, then L50. For a number of observations, n, the location of
the Pth percentile, can be found using the formula:

LOCATION OF A PERCENTILE

$$L_p = (n + 1)\frac{P}{100}$$   [4–1]

An example will help to explain further.

EXAMPLE

Morgan Stanley is an investment company with offices located throughout the
United States. Listed below are the commissions earned last month by a sample of
15 brokers at the Morgan Stanley office in Oakland, California.

$2,038 $1,758 $1,721 $1,637 $2,097 $2,047 $2,205 $1,787 $2,287
1,940 2,311 2,054 2,406 1,471 1,460

Locate the median, the first quartile, and the third quartile for the commissions
earned.

SOLUTION

The first step is to sort the data from the smallest commission to the largest.

$1,460 $1,471 $1,637 $1,721 $1,758 $1,787 $1,940 $2,038
2,047 2,054 2,097 2,205 2,287 2,311 2,406

The median value is the observation in the center and is the same as the 50th
percentile, so P equals 50. So the median or L50 is located at (n + 1)(50/100), where n
is the number of observations. In this case, that is position number 8, found by
(15 + 1)(50/100). The eighth-ranked commission is $2,038. So we conclude this is the
median and that half the brokers earned commissions more than $2,038 and half
earned less than $2,038. The result using formula (4–1) to find the median is the same
as the method presented in Chapter 3.

Recall the definition of a quartile. Quartiles divide a set of observations into
four equal parts. Hence 25% of the observations will be less than the first quartile.
Seventy-five percent of the observations will be less than the third quartile. To
locate the first quartile, we use formula (4–1), where n = 15 and P = 25:

$$L_{25} = (n + 1)\frac{P}{100} = (15 + 1)\frac{25}{100} = 4$$

and to locate the third quartile, n = 15 and P = 75:

$$L_{75} = (n + 1)\frac{P}{100} = (15 + 1)\frac{75}{100} = 12$$

Therefore, the first and third quartile values are located at positions 4 and 12,
respectively. The fourth value in the ordered array is $1,721 and the twelfth is
$2,205. These are the first and third quartiles.

In the above example, the location formula yielded a whole number. That is, we
wanted to find the first quartile and there were 15 observations, so the location formula
indicated we should find the fourth ordered value. What if there were 20 observations
in the sample, that is n = 20, and we wanted to locate the first quartile? From the
location formula (4–1):

$$L_{25} = (n + 1)\frac{P}{100} = (20 + 1)\frac{25}{100} = 5.25$$

We would locate the fifth value in the ordered array and then move .25 of the distance
between the fifth and sixth values and report that as the first quartile. Like the median,
the quartile does not need to be one of the actual values in the data set.

To explain further, suppose a data set contained the six values 91, 75, 61, 101, 43,
and 104. We want to locate the first quartile. We order the values from the minimum to
the maximum: 43, 61, 75, 91, 101, and 104. The first quartile is located at

$$L_{25} = (n + 1)\frac{P}{100} = (6 + 1)\frac{25}{100} = 1.75$$

The location formula tells us that the first quartile is located between the first and the
second values and it is .75 of the distance between the first and the second values. The
first value is 43 and the second is 61. So the distance between these two values is 18.
To locate the first quartile, we need to move .75 of the distance between the first and
second values, so .75(18) = 13.5. To complete the procedure, we add 13.5 to the first
value, 43, and report that the first quartile is 56.5.

We can extend the idea to include both deciles and percentiles. To locate the 23rd
percentile in a sample of 80 observations, we would look for the 18.63 position.

$$L_{23} = (n + 1)\frac{P}{100} = (80 + 1)\frac{23}{100} = 18.63$$

To find the value corresponding to the 23rd percentile, we would locate the 18th value
and the 19th value and determine the distance between the two values. Next, we would
multiply this difference by 0.63 and add the result to the smaller value. The result would
be the 23rd percentile.
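
The locate-then-interpolate procedure of formula (4–1) can be written as a short Python sketch. The function names `percentile_location` and `percentile_value` are illustrative, and clamping a location that falls outside the range 1 to n is an assumption of this sketch that the text does not discuss.

```python
import math


def percentile_location(n, p):
    """Location L_p = (n + 1) * P / 100 from formula (4–1); ranks start at 1."""
    return (n + 1) * p / 100


def percentile_value(values, p):
    """Value of the P-th percentile, interpolating between neighboring ranks."""
    ordered = sorted(values)
    location = percentile_location(len(ordered), p)
    # Assumption (not in the text): clamp locations outside 1..n to the ends.
    location = min(max(location, 1), len(ordered))
    lower_rank = math.floor(location)
    fraction = location - lower_rank
    lower = ordered[lower_rank - 1]  # convert 1-based rank to 0-based index
    if fraction == 0:
        return lower
    upper = ordered[lower_rank]
    return lower + fraction * (upper - lower)


commissions = [2038, 1758, 1721, 1637, 2097, 2047, 2205, 1787, 2287,
               1940, 2311, 2054, 2406, 1471, 1460]
six_values = [91, 75, 61, 101, 43, 104]

print(percentile_value(commissions, 25))   # first quartile
print(percentile_value(commissions, 50))   # median
print(percentile_value(commissions, 75))   # third quartile
print(percentile_value(six_values, 25))    # interpolated first quartile
print(percentile_location(80, 23))         # location of the 23rd percentile
```

For the Morgan Stanley commissions this returns 1721, 2038, and 2205, and for the six-value data set it returns 56.5, matching the worked results.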

Statistical software is very helpful when describing and summarizing data. Excel,
Minitab, and MegaStat, a statistical analysis Excel add-in, all provide summary statistics
that include quartiles. For example, the Minitab summary of the Morgan Stanley com-
mission data, shown below, includes the first and third quartiles, and other statistics.
Based on the reported quartiles, 25% of the commissions earned were less than
$1,721 and 75% were less than $2,205. These are the same values we calculated
using formula (4–1).

There are ways other than formula (4–1) to locate quartile values. For example, another
method uses 0.25n + 0.75 to locate the position of the first quartile and 0.75n + 0.25 to
locate the position of the third quartile. We will call this the Excel Method. In the Morgan
Stanley data, this method would place the first quartile at position 4.5 (.25 × 15 + .75)
and the third quartile at position 11.5 (.75 × 15 + .25). The first quartile would be
interpolated as one-half the distance between the fourth- and the fifth-ranked values.
Based on this method, the first quartile is $1,739.50, found by
($1,721 + 0.5[$1,758 − $1,721]). The third quartile, at position 11.5, would be $2,151, or
one-half the distance between the eleventh- and the twelfth-ranked values, found by
($2,097 + 0.5[$2,205 − $2,097]). Excel, as shown in the Morgan Stanley and Applewood
examples, can compute quartiles using either of the two methods. Please note the text
uses formula (4–1) to calculate quartiles.

Is the difference between the two methods important? No. Usually it is just a nuisance.
In general, both methods calculate values that will support the statement that
approximately 25% of the values are less than the value of the first quartile, and
approximately 75% of the data values are less than the value of the third quartile. When
the sample is large, the difference in the results from the two methods is small. For
example, in the Applewood Auto Group data there are 180 vehicles. For the variable
profit, formula (4–1) gives quartiles of $1,415.50 and $2,275.50, and the Excel Method
gives $1,422.50 and $2,268.50. Forty-five of the 180 values (25%) are less than both
values of the first quartile, and 135 of the 180 values (75%) are less than
both values of the third quartile.

When using Excel, be careful to understand the method used to calculate quartiles. Excel
2013 and Excel 2016 offer both methods. The Excel function QUARTILE.EXC will result in the
same answer as formula (4–1). The Excel function QUARTILE.INC will result in the Excel
Method answers.

[Excel output: Morgan Stanley Commissions]

Method             Quartile 1   Quartile 3
Equation 4-1           1721         2205
Alternate Method       1739.5       2151

[Excel output: Applewood Auto Group, variable Profit]

Method             Quartile 1   Quartile 3
Equation 4-1           1415.5       2275.5
Alternate Method       1422.5       2268.5

STATISTICS IN ACTION

John W. Tukey (1915–2000) received a PhD in mathematics from Princeton in 1939. However,
when he joined the Fire Control Research Office during World War II, his interest in
abstract mathematics shifted to applied statistics. He developed effective numerical and
graphical methods for studying patterns in data. Among the graphics he developed are the
stem-and-leaf diagram and the box-and-whisker plot or box plot. From 1960 to 1980, Tukey
headed the statistical division of NBC’s election night vote projection team. He became
renowned in 1960 for preventing an early call of victory for Richard Nixon in the
presidential election won by John F. Kennedy.

S E L F - R E V I E W 4–2

The Quality Control department of Plainsville Peanut Company is responsible for checking
the weight of the 8-ounce jar of peanut butter. The weights of a sample of nine jars
produced last hour are:

7.69 7.72 7.80 7.86 7.90 7.94 7.97 8.06 8.09

(a) What is the median weight?
(b) Determine the weights corresponding to the first and third quartiles.

E X E R C I S E S

11. Determine the median and the first and third quartiles in the following data.

46 47 49 49 51 53 54 54 55 55 59

12. Determine the median and the first and third quartiles in the following data.

5.24 6.02 6.67 7.30 7.59 7.99 8.03 8.35 8.81 9.45
9.61 10.37 10.39 11.86 12.22 12.71 13.07 13.59 13.89 15.42

13. The Thomas Supply Company Inc. is a distributor of gas-powered generators.
As with any business, the length of time customers take to pay their invoices is
important. Listed below, arranged from smallest to largest, is the time, in days, for a
sample of The Thomas Supply Company Inc. invoices.

13 13 13 20 26 27 31 34 34 34 35 35 36 37 38
41 41 41 45 47 47 47 50 51 53 54 56 62 67 82

a. Determine the first and third quartiles.
b. Determine the second decile and the eighth decile.
c. Determine the 67th percentile.

14. Kevin Horn is the national sales manager for National Textbooks Inc. He
has a sales staff of 40 who visit college professors all over the United States.
Each Saturday morning he requires his sales staff to send him a report. This
report includes, among other things, the number of professors visited during the
previous week. Listed below, ordered from smallest to largest, are the number
of visits last week.

38 40 41 45 48 48 50 50 51 51 52 52 53 54 55 55 55 56 56 57
59 59 59 62 62 62 63 64 65 66 66 67 67 69 69 71 77 78 79 79

a. Determine the median number of visits.
b. Determine the first and third quartiles.
c. Determine the first decile and the ninth decile.
d. Determine the 33rd percentile.


BOX PLOTS
A box plot is a graphical display, based on quartiles, that helps us picture a set of data.
To construct a box plot, we need only five statistics: the minimum value, Q1 (the first
quartile), the median, Q3 (the third quartile), and the maximum value. An example will
help to explain.

LO4-4
Construct and analyze a
box plot.

E X A M P L E

Alexander’s Pizza offers free delivery of its pizza within 15 miles. Alex, the owner,
wants some information on the time it takes for delivery. How long does a typical
delivery take? Within what range of times will most deliveries be completed? For a
sample of 20 deliveries, he determined the following information:

Minimum value = 13 minutes

Q1 = 15 minutes

Median = 18 minutes

Q3 = 22 minutes

Maximum value = 30 minutes

Develop a box plot for the delivery times. What conclusions can you make about
the delivery times?

S O L U T I O N

The first step in drawing a box plot is to create an appropriate scale along the
horizontal axis. Next, we draw a box that starts at Q1 (15 minutes) and ends at Q3
(22 minutes). Inside the box we place a vertical line to represent the median (18
minutes). Finally, we extend horizontal lines from the box out to the minimum
value (13 minutes) and the maximum value (30 minutes). These horizontal lines
outside of the box are sometimes called “whiskers” because they look a bit like a
cat’s whiskers.

[Figure: Box plot of delivery times. Horizontal axis: Minutes, scaled from 12 to 32. The
minimum value, Q1, the median, Q3, and the maximum value are labeled on the plot.]

The box plot also shows the interquartile range of delivery times between
Q1 and Q3. The interquartile range is 7 minutes and indicates that 50% of the
deliveries are between 15 and 22 minutes.

The box plot also reveals that the distribution of delivery times is positively skewed.
In Chapter 3, we defined skewness as the lack of symmetry in a set of data. How do we
know this distribution is positively skewed? In this case, there are actually two pieces
of information that suggest this. First, the dashed line (whisker) to the right of the box,
from 22 minutes (Q3) to the maximum time of 30 minutes, is longer than the dashed line to
the left, from 15 minutes (Q1) to the minimum value of 13 minutes. To put it another way,
the 25% of the data larger than the third quartile is more spread out than the 25% less
than the first quartile. A second indication of positive skewness is that the median is
not in the center of the box. The distance from the first quartile to the median is smaller
than the distance from the median to the third quartile. We know that the number of
delivery times between 15 minutes and 18 minutes is the same as the number of delivery
times between 18 minutes and 22 minutes, so the times above the median are spread over a
wider interval than the times below it.
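
The two indications of skewness can be checked with simple arithmetic on the five statistics. The following Python sketch uses the delivery-time values from the example; the variable names are illustrative assumptions, not part of any statistical package.

```python
# Five statistics from the Alexander's Pizza example (minutes).
minimum, q1, median, q3, maximum = 13, 15, 18, 22, 30

interquartile_range = q3 - q1      # width of the box
left_whisker = q1 - minimum        # spread of the lowest 25% of the values
right_whisker = maximum - q3       # spread of the highest 25% of the values
lower_half_of_box = median - q1    # distance from Q1 to the median
upper_half_of_box = q3 - median    # distance from the median to Q3

print("Interquartile range:", interquartile_range)
print("Left whisker:", left_whisker, " Right whisker:", right_whisker)
print("Q1 to median:", lower_half_of_box, " Median to Q3:", upper_half_of_box)

# Both indications described in the text must hold.
suggests_positive_skew = (right_whisker > left_whisker
                          and upper_half_of_box > lower_half_of_box)
print("Both indications point to positive skewness:", suggests_positive_skew)
```

The right whisker (8 minutes) is longer than the left whisker (2 minutes), and the upper part of the box (4 minutes) is wider than the lower part (3 minutes), which agrees with the conclusion of positive skewness.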

E X A M P L E

Refer to the Applewood Auto Group data. Develop a box plot for the variable age of
the buyer. What can we conclude about the distribution of the age of the buyer?

S O L U T I O N

Minitab was used to develop the following chart and summary statistics.

The median age of the purchaser is 46 years, 25% of the purchasers are less than
40 years of age, and 25% are more than 52.75 years of age. Based on the sum-
mary information and the box plot, we conclude:

• Fifty percent of the purchasers are between the ages of 40 and 52.75 years.
• The distribution of ages is fairly symmetric. There are two reasons for this conclusion.
The length of the whisker above 52.75 years (Q3) is about the same length as the whisker
below 40 years (Q1). Also, the area in the box between 40 years and the median of 46 years
is about the same as the area between the median and 52.75.

There are three asterisks (*) above 70 years. What do they indicate? In a box
plot, an asterisk identifies an outlier. An outlier is a value that is inconsistent with
the rest of the data. It is defined as a value that is more than 1.5 times the inter-
quartile range smaller than Q1 or larger than Q3. In this example, an outlier would
be a value larger than 71.875 years, found by:

Outlier > Q3 + 1.5(Q3 − Q1) = 52.75 + 1.5(52.75 − 40) = 71.875

An outlier would also be a value less than 20.875 years.

Outlier < Q1 − 1.5(Q3 − Q1) = 40 − 1.5(52.75 − 40) = 20.875
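
The outlier limits can be reproduced with a few lines of Python. This is a minimal sketch of the 1.5 × interquartile range rule stated above; the function name and the short list of ages are assumptions made for illustration (only the ages 72, 72, and 73 come from the Applewood discussion).

```python
def outlier_limits(q1, q3):
    """Return the lower and upper limits beyond which a value is an outlier."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


lower_limit, upper_limit = outlier_limits(40, 52.75)
print("Lower limit:", lower_limit)   # 20.875
print("Upper limit:", upper_limit)   # 71.875

# Illustrative ages: 21 and 46 are hypothetical; 72, 72, and 73 are the
# three oldest purchasers mentioned in the text.
ages_to_check = [21, 46, 72, 72, 73]
outliers = [age for age in ages_to_check
            if age < lower_limit or age > upper_limit]
print("Outliers:", outliers)
```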

From the box plot, we conclude there are three purchasers 72 years of age or
older and none less than 21 years of age. Technical note: In some cases, a single
asterisk may represent more than one observation because of the limitations of the
software and space available. It is a good idea to check the actual data. In this
instance, there are three purchasers 72 years old or older; two are 72 and one is 73.

S E L F - R E V I E W 4–3

The following box plot shows the assets in millions of dollars for credit unions in Seattle,
Washington.

[Figure: Box plot of credit union assets. Horizontal axis scaled from 0 to 100.]

What are the smallest and largest values, the first and third quartiles, and the median?
Would you agree that the distribution is symmetrical? Are there any outliers?

E X E R C I S E S

15. The box plot below shows the amount spent for books and supplies per year by
students at four-year public colleges.

[Figure: Box plot. Horizontal axis scaled from $0 to $1,750 in steps of $350.]

a. Estimate the median amount spent.
b. Estimate the first and third quartiles for the amount spent.
c. Estimate the interquartile range for the amount spent.
d. Beyond what point is a value considered an outlier?
e. Identify any outliers and estimate their value.
f. Is the distribution symmetrical or positively or negatively skewed?

16. The box plot shows the undergraduate in-state tuition per credit hour at four-year
public colleges.

[Figure: Box plot with an asterisk (*) marked. Horizontal axis scaled from $0 to $1,500 in
steps of $300.]

a. Estimate the median.
b. Estimate the first and third quartiles.
c. Determine the interquartile range.
d. Beyond what point is a value considered an outlier?
e. Identify any outliers and estimate their value.
f. Is the distribution symmetrical or positively or negatively skewed?

17. In a study of the gasoline mileage of model year 2016 automobiles, the mean miles
per gallon was 27.5 and the median was 26.8. The smallest value in the study was
12.70 miles per gallon, and the largest was 50.20. The first and third quartiles were
17.95 and 35.45 miles per gallon, respectively. Develop a box plot and comment
on the distribution. Is it a symmetric distribution?


SKEWNESS
In Chapter 3, we described measures of central location for a distribution of data by re-
porting the mean, median, and mode. We also described measures that show the amount
of spread or variation in a distribution, such as the range and the standard deviation.

Another characteristic of a distribution is the shape. There are four shapes com-
monly observed: symmetric, positively skewed, negatively skewed, and bimodal. In a
symmetric distribution the mean and median are equal and the data values are evenly
spread around these values. The shape of the distribution below the mean and median
is a mirror image of the distribution above the mean and median. A distribution of values is
skewed to the right or positively skewed if there is a single peak, but the values extend
much farther to the right of the peak than to the left of the peak. In this case, the mean
is larger than the median. In a negatively skewed distribution there is a single peak, but
the observations extend farther to the left, in the negative direction, than to the right. In
a negatively skewed distribution, the mean is smaller than the median. Positively
skewed distributions are more common. Salaries often follow this pattern. Think of the
salaries of those employed in a small company of about 100 people. The president and
a few top executives would have very large salaries relative to the other workers and
hence the distribution of salaries would exhibit positive skewness. A bimodal distribu-
tion will have two or more peaks. This is often the case when the values are from two or
more populations. This information is summarized in Chart 4–1.
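
The relationship between the mean and the median in a positively skewed distribution can be illustrated with a small Python sketch. The salary figures below are hypothetical, chosen only to mimic a small company in which a few executives earn far more than the other workers.

```python
from statistics import mean, median

# Hypothetical monthly salaries: eight workers and two highly paid executives.
salaries = [2800, 2900, 3000, 3000, 3100, 3100, 3200, 3300, 9000, 15000]

salary_mean = mean(salaries)
salary_median = median(salaries)

print("Mean:  ", salary_mean)
print("Median:", salary_median)

# In a positively skewed distribution the mean is pulled toward the large values.
print("Mean larger than median:", salary_mean > salary_median)
```

The two large salaries pull the mean well above the median, which is the pattern described for positively skewed data.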

LO4-5
Compute and interpret
the coefficient of
skewness.

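[Figure: four frequency polygons, each with Frequency on the vertical axis.
Symmetric: Ages (Years); the mean and median coincide at 45.
Positively Skewed: Monthly Salaries; the median and mean are marked, with $3,000 and $4,000 on the axis.
Negatively Skewed: Test Scores; the mean and median are marked, with 75 and 80 on the axis.
Bimodal: Outside Diameter (Inches); the mean is marked, with .98 and 1.04 on the axis.]

CHART 4–1 Shapes of Frequency Polygons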

18. A sample of 28 time shares in the Orlando, Florida, area revealed the following
daily charges for a one-bedroom suite. For convenience, the data are ordered
from smallest to largest. Construct a box plot to represent the data. Comment on
the distribution. Be sure to identify the first and third quartiles and the median.

$116 $121 $157 $192 $207 $209 $209
229 232 236 236 239 243 246
260 264 276 281 283 289 296
307 309 312 317 324 341 353

There are several formulas in the statistical literature used to calculate skewness.
The simplest, developed by Professor Karl Pearson (1857–1936), is based on the difference
between the mean and the median.

PEARSON’S COEFFICIENT OF SKEWNESS

\[ sk = \frac{3(\bar{x} - \text{Median})}{s} \]   [4–2]

Using this relationship, the coefficient of skewness can range from −3 up to 3. A value
near −3, such as −2.57, indicates considerable negative skewness. A value such as
1.63 indicates moderate positive skewness. A value of 0, which will occur when the
mean and median are equal, indicates the distribution is symmetrical and there is no
skewness present.

In this text, we present output from Minitab and Excel. Both of these software packages
compute a value for the coefficient of skewness based on the cubed deviations
from the mean. The formula is:

SOFTWARE COEFFICIENT OF SKEWNESS

\[ sk = \frac{n}{(n - 1)(n - 2)} \left[ \sum \left( \frac{x - \bar{x}}{s} \right)^{3} \right] \]   [4–3]

Formula (4–3) offers an insight into skewness. The core of the right-hand side of the
formula is the difference between each value and the mean, divided by the standard
deviation. That is the portion (x − x̄)/s of the formula. This idea is called standardizing.
We will discuss the idea of standardizing a value in more detail in Chapter 7 when we
describe the normal probability distribution. At this point, observe that the result is to
report the difference between each value and the mean in units of the standard deviation.
If this difference is positive, the particular value is larger than the mean; if the
difference is negative, the particular value is smaller than the mean. When we cube these
values, we retain the information on the direction of the difference. Recall that in the
formula for the standard deviation [see formula (3–10)] we squared the difference between
each value and the mean, so that the result was all nonnegative values.

If the set of data values under consideration is symmetric, when we cube the standardized
values and sum over all the values, the result would be near zero. If there are
several large values, clearly separate from the others, the sum of the cubed differences
would be a large positive value. If there are several small values clearly separate from
the others, the sum of the cubed differences will be negative.

An example will illustrate the idea of skewness.

STATISTICS IN ACTION

The late Stephen Jay Gould (1941–2002) was a professor of zoology and professor of geology
at Harvard University. In 1982, he was diagnosed with cancer and had an expected survival
time of 8 months. However, never one to be discouraged, he researched the disease and found
that the distribution of survival time is dramatically skewed to the right. Not only do 50%
of similar cancer patients survive more than 8 months, but the survival time could be years
rather than months! In fact, Dr. Gould lived another 20 years. Based on his experience, he
wrote a widely published essay titled “The Median Is Not the Message.”

E X A M P L E

Following are the earnings per share for a sample of 15 software companies for the
year 2016. The earnings per share are arranged from smallest to largest.

$0.09 $0.13 $0.41 $0.51 $1.12 $1.20 $1.49 $3.18
 3.50  6.36  7.83  8.92 10.13 12.99 16.40

Compute the mean, median, and standard deviation. Find the coefficient of
skewness using Pearson’s estimate and the software methods. What is your
conclusion regarding the shape of the distribution?

S O L U T I O N

These are sample data, so we use formula (3–2) to determine the mean.

\[ \bar{x} = \frac{\sum x}{n} = \frac{\$74.26}{15} = \$4.95 \]


The median is the middle value in a set of data, arranged from smallest to largest.
In this case, there is an odd number of observations, so the middle value is the
median. It is $3.18.

We use formula (3–10) on page 78 to determine the sample standard deviation.

\[ s = \sqrt{\frac{\sum (x - \bar{x})^{2}}{n - 1}} = \sqrt{\frac{(\$0.09 - \$4.95)^{2} + \cdots + (\$16.40 - \$4.95)^{2}}{15 - 1}} = \$5.22 \]

Pearson’s coefficient of skewness is 1.017, found by

\[ sk = \frac{3(\bar{x} - \text{Median})}{s} = \frac{3(\$4.95 - \$3.18)}{\$5.22} = 1.017 \]

This indicates there is moderate positive skewness in the earnings per share data.
We obtain a similar, but not exactly the same, value from the software method.

The details of the calculations are shown in Table 4–2. To begin, we find the differ-
ence between each earnings per share value and the mean and divide this result
by the standard deviation. We have referred to this as standardizing. Next, we cube,
that is, raise to the third power, the result of the first step. Finally, we sum the cubed
values. The details for the first company, that is, the company with an earnings per
share of $0.09, are:

\[ \left( \frac{x - \bar{x}}{s} \right)^{3} = \left( \frac{0.09 - 4.95}{5.22} \right)^{3} = (-0.9310)^{3} = -0.8070 \]

When we sum the 15 cubed values, the result is 11.8274. That is, the term
Σ[(x − x̄)/s]³ = 11.8274. To find the coefficient of skewness, we use formula (4–3),
with n = 15.

\[ sk = \frac{n}{(n - 1)(n - 2)} \sum \left( \frac{x - \bar{x}}{s} \right)^{3} = \frac{15}{(15 - 1)(15 - 2)} (11.8274) = 0.975 \]
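
Both coefficients can be computed directly from the data. The following Python sketch applies formulas (4–2) and (4–3) to the earnings per share values; the function names are illustrative assumptions. Because the code carries the mean and standard deviation at full precision rather than rounding them to $4.95 and $5.22, its results may differ from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import median


def sample_mean_and_sd(values):
    """Return the sample mean and the sample standard deviation (n - 1 divisor)."""
    n = len(values)
    x_bar = sum(values) / n
    s = sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))
    return x_bar, s


def pearson_skewness(values):
    """Formula (4-2): 3(mean - median) / s."""
    x_bar, s = sample_mean_and_sd(values)
    return 3 * (x_bar - median(values)) / s


def software_skewness(values):
    """Formula (4-3): n / ((n - 1)(n - 2)) times the sum of cubed standardized values."""
    n = len(values)
    x_bar, s = sample_mean_and_sd(values)
    cubed_sum = sum(((x - x_bar) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_sum


earnings = [0.09, 0.13, 0.41, 0.51, 1.12, 1.20, 1.49, 3.18,
            3.50, 6.36, 7.83, 8.92, 10.13, 12.99, 16.40]

print("Pearson's coefficient:", round(pearson_skewness(earnings), 3))
print("Software coefficient: ", round(software_skewness(earnings), 3))
```

Both values are positive and close to 1, which agrees with the conclusion of moderate positive skewness.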

TABLE 4–2 Calculation of the Coefficient of Skewness

Earnings per Share   (x − x̄)/s   [(x − x̄)/s]³
       0.09            −0.9310       −0.8070
       0.13            −0.9234       −0.7873
       0.41            −0.8697       −0.6579
       0.51            −0.8506       −0.6154
       1.12            −0.7337       −0.3950
       1.20            −0.7184       −0.3708
       1.49            −0.6628       −0.2912
       3.18            −0.3391       −0.0390
       3.50            −0.2778       −0.0214
       6.36             0.2701        0.0197
       7.83             0.5517        0.1679
       8.92             0.7605        0.4399
      10.13             0.9923        0.9772
      12.99             1.5402        3.6539
      16.40             2.1935       10.5537
                                     11.8274

We conclude that the earnings per share values are somewhat positively
skewed. The following Minitab summary reports the descriptive measures, such as
the mean, median, and standard deviation of the earnings per share data. Also
included are the coefficient of skewness and a histogram with a bell-shaped curve
superimposed.

S E L F - R E V I E W 4–4

A sample of five data entry clerks employed in the Horry County Tax Office revised the
following number of tax records last hour: 73, 98, 60, 92, and 84.
(a) Find the mean, median, and the standard deviation.
(b) Compute the coefficient of skewness using Pearson’s method.
(c) Calculate the coefficient of skewness using the software method.
(d) What is your conclusion regarding the skewness of the data?

E X E R C I S E S

For Exercises 19–22:

a. Determine the mean, median, and the standard deviation.
b. Determine the coefficient of skewness using Pearson’s method.
c. Determine the coefficient of skewness using the software method.

19. The following values are the starting salaries, in $000, for a sample of five
accounting graduates who accepted positions in public accounting last year.

36.0 26.0 33.0 28.0 31.0

20. Listed below are the salaries, in $000, for a sample of 15 chief financial
officers in the electronics industry.

$516.0 $548.0 $566.0 $534.0 $586.0 $529.0
546.0 523.0 538.0 523.0 551.0 552.0
486.0 558.0 574.0

21. Listed below are the commissions earned ($000) last year by the 15 sales
representatives at Furniture Patch Inc.

$ 3.9 $ 5.7 $ 7.3 $10.6 $13.0 $13.6 $15.1 $15.8 $17.1
17.4 17.6 22.3 38.6 43.2 87.7

22. Listed below are the salaries for the 2016 New York Yankees Major League
Baseball team.

Player Salary Player Salary

CC Sabathia $25,000,000 Dustin Ackley $3,200,000
Mark Teixeira 23,125,000 Martin Prado 3,000,000
Masahiro Tanaka 22,000,000 Didi Gregorius 2,425,000
Jacoby Ellsbury 21,142,857 Aaron Hicks 574,000
Alex Rodriguez 21,000,000 Austin Romine 556,000
Brian McCann 17,000,000 Chasen Shreve 533,400
Carlos Beltran 15,000,000 Greg Bird 525,300
Brett Gardner 13,500,000 Luis Severino 521,300
Chase Headley 13,000,000 Bryan Mitchell 516,650
Aroldis Chapman 11,325,000 Kirby Yates 511,900
Andrew Miller 9,000,000 Mason Williams 509,700
Starlin Castro 7,857,143 Ronald Torreyes 508,600
Nathan Eovaldi 5,600,000 John Barbato 507,500
Michael Pineda 4,300,000 Dellin Betances 507,500
Ivan Nova 4,100,000 Luis Cessa 507,500

DESCRIBING THE RELATIONSHIP BETWEEN
TWO VARIABLES
In Chapter 2 and the first section of this chapter, we presented graphical techniques
to summarize the distribution of a single variable. We used a histogram in Chapter 2
to summarize the profit on vehicles sold by the Applewood Auto Group. Earlier in
this chapter, we used dot plots and stem-and-leaf displays to visually summarize a set
of data. Because we are studying a single variable, we refer to this as univariate data.

LO4-6
Create and interpret a
scatter diagram.

There are situations where we wish to study and visually portray the relationship
between two variables. When we study the relationship between two variables, we refer
to the data as bivariate. Data analysts frequently wish to understand such
relationships. Here are some examples:

• Tybo and Associates is a law firm that advertises extensively on local TV. The
partners are considering increasing their advertising budget. Before doing so, they
would like to know the relationship between the amount spent per month on
advertising and the total amount of billings for that month. To put it another way,
will increasing the amount spent on advertising result in an increase in billings?

• Coastal Realty is studying the selling prices of homes. What variables seem to be
related to the selling price of homes? For example, do larger homes sell for more
than smaller ones? Probably. So Coastal might study the relationship between the
area in square feet and the selling price.

• Dr. Stephen Givens is an expert in human development. He is studying the
relationship between the height of fathers and the height of their sons. That is, do
tall fathers tend to have tall children? Would you expect LeBron James, the 6′8″,
250 pound professional basketball player, to have relatively tall sons?

One graphical technique we use to show the relationship between variables is called a
scatter diagram.

To draw a scatter diagram, we need two variables. We scale one variable
along the horizontal axis (X-axis) of a graph and the other variable along the vertical
axis (Y-axis). Usually one variable depends to some degree on the other. In the
third example above, the height of the son depends on the height of the father. So
we scale the height of the father on the horizontal axis and that of the son on the
vertical axis.

We can use statistical software, such as Excel, to perform the plotting function for
us. Caution: You should always be careful of the scale. By changing the scale of either
the vertical or the horizontal axis, you can affect the apparent visual strength of the
relationship.
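
The idea of scaling one variable along each axis can be sketched in a few lines of Python. The function below is a hypothetical illustration, not a feature of Excel or Minitab: it places each pair of values on a character grid, with the first variable along the horizontal axis and the second along the vertical axis. The bus ages and maintenance costs are made-up values used only to show the technique.

```python
def ascii_scatter(xs, ys, width=30, height=10):
    """Return a list of text rows with one '*' for each (x, y) pair.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column (horizontal) and a row (vertical).
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 of the grid is the top
    return ["".join(line) for line in grid]


# Hypothetical data: age of a bus (years) and yearly maintenance cost ($).
ages = [1, 2, 2, 3, 4, 5, 5, 6]
costs = [1200, 1500, 2400, 3100, 4000, 5200, 6100, 7500]

rows = ascii_scatter(ages, costs)
for line in rows:
    print("|" + line)
print("+" + "-" * 30)
```

Changing the `width` or `height` arguments stretches or compresses the display, which is a simple way to see how the choice of scale affects the apparent strength of a relationship.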

Following are three scatter diagrams (Chart 4–2). The one on the left shows a
rather strong positive relationship between the age in years and the maintenance
cost last year for a sample of 10 buses owned by the city of Cleveland, Ohio. Note
that as the age of the bus increases, the yearly maintenance cost also increases. The
example in the center, for a sample of 20 vehicles, shows a rather strong inverse
relationship between the odometer reading and the auction price. That is, as the number
of miles driven increases, the auction price decreases. The example on the right de-
picts the relationship between the height and yearly salary for a sample of 15 shift
supervisors. This graph indicates there is little relationship between their height and
yearly salary.

[Figure: three scatter diagrams.
Left: Age of Buses and Maintenance Cost. Horizontal axis: Age (years), 0 to 6. Vertical axis: Cost (annual), $0 to $10,000.
Center: Auction Price versus Odometer. Horizontal axis: Odometer, 10,000 to 50,000. Vertical axis: Auction price, $12,000 to $24,000.
Right: Height versus Salary. Horizontal axis: Height (inches), 54 to 63. Vertical axis: Salary ($000), 90 to 125.]

CHART 4–2 Three Examples of Scatter Diagrams.

E X A M P L E

In the introduction to Chapter 2, we presented data from the Applewood Auto
Group. We gathered information concerning several variables, including the profit
earned from the sale of 180 vehicles sold last month. In addition to the amount of
profit on each sale, one of the other variables is the age of the purchaser. Is there a
relationship between the profit earned on a vehicle sale and the age of the pur-
chaser? Would it be reasonable to conclude that more profit is made on vehicles
purchased by older buyers?

S O L U T I O N

We can investigate the relationship between vehicle profit and the age of the buyer
with a scatter diagram. We scale age on the horizontal, or X-axis, and the profit on
the vertical, or Y-axis. We assume profit depends on the age of the purchaser. As
people age, they earn more income and purchase more expensive cars which, in
turn, produce higher profits. We use Excel to develop the scatter diagram. The
Excel commands are in Appendix C.

[Figure: Profit and Age of Buyer at Applewood Auto Group. Horizontal axis: Age (Years),
0 to 80. Vertical axis: Profit per Vehicle ($), $0 to $3,500.]

The scatter diagram shows a rather weak positive relationship between the two
variables. It does not appear there is much relationship between the vehicle profit
and the age of the buyer. In Chapter 13, we will study the relationship between
variables more extensively, even calculating several numerical measures to
express the relationship between variables.

In the preceding example, there is a weak positive, or direct, relationship between the
variables. There are, however, many instances where there is a relationship between
the variables, but that relationship is inverse or negative. For example:

• The value of a vehicle and the number of miles driven. As the number of miles
increases, the value of the vehicle decreases.

• The premium for auto insurance and the age of the driver. Auto rates tend to be the
highest for younger drivers and less for older drivers.

• For many law enforcement personnel, as the number of years on the job increases,
the number of traffic citations decreases. This may be because personnel become
more liberal in their interpretations or they may be in supervisor positions and not
in a position to issue as many citations. But in any event, as the number of years on
the job increases, the number of citations decreases.

CONTINGENCY TABLES
A scatter diagram requires that both of the variables be at least interval scale. In the
Applewood Auto Group example, both age and vehicle profit are ratio scale variables.
Height is also ratio scale as used in the discussion of the relationship between the
height of fathers and the height of their sons. What if we wish to study the relationship
between two variables when one or both are nominal or ordinal scale? In this case, we
tally the results in a contingency table.

LO4-7
Develop and explain a
contingency table.

A contingency table is a cross-tabulation that simultaneously summarizes two variables
of interest. For example:

• Students at a university are classified by gender and class (freshman, sophomore,
junior, or senior).

• A product is classified as acceptable or unacceptable and by the shift (day, after-
noon, or night) on which it is manufactured.

• A voter in a school bond referendum is classified as to party affiliation (Democrat,
Republican, other) and the number of children that voter has attending school in the
district (0, 1, 2, etc.).

CONTINGENCY TABLE A table used to classify observations according to two
identifiable characteristics.

E X A M P L E

There are four dealerships in the Applewood Auto Group. Suppose we want to com-
pare the profit earned on each vehicle sold by the particular dealership. To put it
another way, is there a relationship between the amount of profit earned and the
dealership?

S O L U T I O N

In a contingency table, both variables only need to be nominal or ordinal. In this
example, the variable dealership is a nominal variable and the variable profit is a
ratio variable. To convert profit to an ordinal variable, we classify the variable profit
into two categories, those cases where the profit earned is more than the median
and those cases where it is less. On page 64, we calculated the median profit for all
sales last month at Applewood Auto Group to be $1,882.50.

Contingency Table Showing the Relationship between Profit and Dealership

Above/Below
Median Profit    Kane   Olean   Sheffield   Tionesta   Total
Above              25      20          19         26      90
Below              27      20          26         17      90
Total              52      40          45         43     180

By organizing the information into a contingency table, we can compare the profit
at the four dealerships. We observe the following:

• From the Total column on the right, 90 of the 180 cars sold had a profit above
the median and half below. From the definition of the median, this is
expected.

• For the Kane dealership, 25 out of the 52, or 48%, of the cars sold were sold
for a profit more than the median.

• The percentages of profits above the median for the other dealerships are 50%
for Olean, 42% for Sheffield, and 60% for Tionesta.

We will return to the study of contingency tables in Chapter 5 during the study of
probability and in Chapter 15 during the study of nonparametric methods of analysis.
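
The percentages quoted in the solution can be verified from the counts in the contingency table. The short Python sketch below stores the two rows of the table as dictionaries; the variable names are illustrative assumptions.

```python
# Counts from the contingency table: vehicles with profit above or below the median.
above = {"Kane": 25, "Olean": 20, "Sheffield": 19, "Tionesta": 26}
below = {"Kane": 27, "Olean": 20, "Sheffield": 26, "Tionesta": 17}

percent_above = {}
for dealership in above:
    total = above[dealership] + below[dealership]   # column total
    percent_above[dealership] = 100 * above[dealership] / total
    print(f"{dealership}: {above[dealership]} of {total} above the median "
          f"({percent_above[dealership]:.0f}%)")

# Row totals: half of the 180 vehicles fall on each side of the median.
print("Total above:", sum(above.values()), " Total below:", sum(below.values()))
```

Rounded to whole percentages, the results are 48% for Kane, 50% for Olean, 42% for Sheffield, and 60% for Tionesta.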


S E L F - R E V I E W 4–5

The rock group Blue String Beans is touring the United States. The following chart shows
the relationship between concert seating capacity and revenue in $000 for a sample of
concerts.

[Figure: Horizontal axis: Seating Capacity, 5800 to 7300. Vertical axis: Amount ($000),
2 to 8.]

(a) What is the diagram called?
(b) How many concerts were studied?
(c) Estimate the revenue for the concert with the largest seating capacity.
(d) How would you characterize the relationship between revenue and seating capacity?
Is it strong or weak, direct or inverse?

E X E R C I S E S

23. Develop a scatter diagram for the following sample data. How would you
describe the relationship between the values?

x-Value   y-Value   x-Value   y-Value
   10        6         11        6
    8        2         10        5
    9        6          7        2
   11        5          7        3
   13        7         11        7

24. Silver Springs Moving and Storage Inc. is studying the relationship between the
number of rooms in a move and the number of labor hours required for the move.
As part of the analysis, the CFO of Silver Springs developed the following scatter
diagram.

[Figure: Scatter diagram. Horizontal axis: Rooms, 1 to 5. Vertical axis: Hours, 0 to 40.]

a. How many moves are in the sample?
b. Does it appear that more labor hours are required as the number of rooms
increases, or do labor hours decrease as the number of rooms increases?

25. The Director of Planning for Devine Dining Inc. wishes to study the relationship be-
tween the gender of a guest and whether the guest orders dessert. To investigate the
relationship, the manager collected the following information on 200 recent customers.

Gender

Dessert Ordered Male Female Total

Yes 32 15 47
No 68 85 153

Total 100 100 200

a. What is the level of measurement of the two variables?
b. What is the above table called?
c. Does the evidence in the table suggest men are more likely to order dessert
than women? Explain why.

26. Ski Resorts of Vermont Inc. is considering a merger with Gulf Shores Beach Resorts
Inc. of Alabama. The board of directors surveyed 50 stockholders concerning their
position on the merger. The results are reported below.

Opinion

Number of Shares Held Favor Oppose Undecided Total

Under 200 8 6 2 16
200 up to 1,000 6 8 1 15
Over 1,000 6 12 1 19

Total 20 26 4 50

a. What level of measurement is used in this table?
b. What is this table called?
c. What group seems most strongly opposed to the merger?

C H A P T E R S U M M A R Y

I. A dot plot shows the range of values on the horizontal axis and the number of observa-
tions for each value on the vertical axis.
A. Dot plots report the details of each observation.
B. They are useful for comparing two or more data sets.

II. A stem-and-leaf display is an alternative to a histogram.
A. The leading digit is the stem and the trailing digit the leaf.
B. The advantages of a stem-and-leaf display over a histogram include:

1. The identity of each observation is not lost.
2. The digits themselves give a picture of the distribution.
3. The cumulative frequencies are also shown.
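
As an illustration of how a stem-and-leaf display is assembled, the following minimal Python sketch groups whole numbers by stem and leaf. The function name `stem_and_leaf` and the sample values are hypothetical and chosen only for illustration. The sketch does not print empty stems or the cumulative frequencies that statistical software may add.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (all digits but the last) and leaf (last digit)."""
    display = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        display[stem].append(leaf)
    return dict(display)


# Hypothetical sample, not taken from the text.
sample = [12, 15, 21, 21, 27, 34]
for stem, leaves in stem_and_leaf(sample).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because the values are sorted before grouping, the leaves within each stem appear in ascending order, so each observation can still be read back from the display.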

III. Measures of location also describe the shape of a set of observations.
A. Quartiles divide a set of observations into four equal parts.

1. Twenty-five percent of the observations are less than the first quartile, 50% are
less than the second quartile, and 75% are less than the third quartile.

2. The interquartile range is the difference between the third quartile and the first
quartile.

B. Deciles divide a set of observations into 10 equal parts and percentiles into 100
equal parts.
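
The location of a percentile can be sketched in a few lines of Python. This sketch assumes that formula (4–1), which is not reproduced in this summary, is the usual location rule Lp = (n + 1)P/100, with linear interpolation between neighboring ordered values when the location is not a whole number. The function name `percentile_value` and the sample values are hypothetical.

```python
def percentile_value(values, p):
    """Return the p-th percentile using the location Lp = (n + 1) * p / 100 (assumed rule)."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100      # position in the ordered list, counting from 1
    lower = int(location)             # whole part of the position
    fraction = location - lower       # distance toward the next ordered value
    if lower < 1:                     # location falls before the first observation
        return ordered[0]
    if lower >= n:                    # location falls at or beyond the last observation
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


# Hypothetical sample, not taken from the text.
sample = [2, 4, 6, 8, 10, 12, 14]
q1 = percentile_value(sample, 25)
median = percentile_value(sample, 50)
q3 = percentile_value(sample, 75)
print(q1, median, q3, "interquartile range:", q3 - q1)
```

Other software may use a slightly different location rule, so quartiles from a spreadsheet can differ a little from values found this way.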

120 CHAPTER 4

IV. A box plot is a graphic display of a set of data.
A. A box is drawn enclosing the regions between the first quartile and the third quartile.

1. A line is drawn inside the box at the median value.
2. Dotted line segments are drawn from the third quartile to the largest value to
show the highest 25% of the values and from the first quartile to the smallest
value to show the lowest 25% of the values.

B. A box plot is based on five statistics: the maximum and minimum values, the first and
third quartiles, and the median.
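
The five statistics behind a box plot can be computed with Python's standard library, as in the minimal sketch below. It relies on `statistics.quantiles` with `method="exclusive"`, which places the quartiles at positions based on n + 1. The outlier limits of 1.5 times the interquartile range beyond the quartiles are a common convention assumed here, not something stated in this summary. The function names and the sample values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, median, q3, ordered[-1]


def find_outliers(values, k=1.5):
    """Return values more than k interquartile ranges beyond Q1 or Q3 (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    return [v for v in values if v < lower_limit or v > upper_limit]


# Hypothetical sample, not taken from the text.
sample = [2, 4, 6, 8, 10, 12, 14, 40]
print(five_number_summary(sample))
print(find_outliers(sample))
```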

V. The coefficient of skewness is a measure of the symmetry of a distribution.
A. There are two formulas for the coefficient of skewness.

1. The formula developed by Pearson is:

$$sk = \frac{3(\bar{x} - \text{Median})}{s} \qquad [4\text{–}2]$$

2. The coefficient of skewness computed by statistical software is:

$$sk = \frac{n}{(n-1)(n-2)}\left[\sum\left(\frac{x - \bar{x}}{s}\right)^{3}\right] \qquad [4\text{–}3]$$
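
Both coefficients in formulas [4–2] and [4–3] can be checked with a short Python sketch. It assumes that s is the sample standard deviation (divisor n − 1), which is what `statistics.stdev` computes. The function names and the sample values are hypothetical.

```python
import statistics


def pearson_skewness(values):
    """Formula [4-2]: 3 * (mean - median) / s."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)      # sample standard deviation (assumed)
    return 3 * (mean - median) / s


def software_skewness(values):
    """Formula [4-3]: n / ((n - 1)(n - 2)) times the sum of cubed standardized deviations."""
    n = len(values)                   # needs at least three observations
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    total = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * total


# Hypothetical sample with one large value, so both coefficients come out positive.
sample = [1, 2, 3, 4, 10]
print(round(pearson_skewness(sample), 3), round(software_skewness(sample), 3))
```

The two formulas measure skewness on different scales, so they usually agree in sign but not in size.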

VI. A scatter diagram is a graphic tool to portray the relationship between two variables.
A. Both variables are measured with interval or ratio scales.
B. If the scatter of points moves from the lower left to the upper right, the variables under
consideration are directly or positively related.
C. If the scatter of points moves from the upper left to the lower right, the variables are
inversely or negatively related.

VII. A contingency table is used to classify nominal-scale observations according to two
characteristics.
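
Tallying a contingency table from raw observations can be sketched in Python as follows. Each observation is assumed to be a pair of category labels. The function name `contingency_table` and the sample records are hypothetical and are not data from the text.

```python
from collections import Counter


def contingency_table(records):
    """Count how often each (row category, column category) pair occurs."""
    counts = Counter(records)
    rows = sorted({row for row, _ in records})
    columns = sorted({column for _, column in records})
    # A Counter returns 0 for pairs that never occur, so every cell is filled.
    return {row: {column: counts[(row, column)] for column in columns} for row in rows}


# Hypothetical observations: (group, response).
records = [("A", "yes"), ("A", "no"), ("B", "yes"), ("A", "yes")]
for row, cells in contingency_table(records).items():
    print(row, cells, "total:", sum(cells.values()))
```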

P R O N U N C I A T I O N K E Y

SYMBOL MEANING PRONUNCIATION

Lp Location of percentile L sub p

Q1 First quartile Q sub 1

Q3 Third quartile Q sub 3

C H A P T E R E X E R C I S E S

27. A sample of students attending Southeast Florida University is asked the number of so-
cial activities in which they participated last week. The chart below was prepared from
the sample data.

[Chart: horizontal axis labeled Activities, scaled from 0 to 4]

a. What is the name given to this chart?
b. How many students were in the study?
c. How many students reported attending no social activities?

28. Doctor’s Care is a walk-in clinic, with locations in Georgetown, Moncks Corner, and
Aynor, at which patients may receive treatment for minor injuries, colds, and flu, as well
as physical examinations. The following charts report the number of patients treated in
each of the three locations last month.

[Charts: Patients on the horizontal axis (10 to 50), one chart for each Location: Georgetown, Moncks Corner, and Aynor]

Describe the number of patients served at the three locations each day. What are the
maximum and minimum numbers of patients served at each of the locations?

29. Below is the number of customers who visited Smith’s True-Value hardware store
in Bellville, Ohio, over the last twenty-three days. Make a stem-and-leaf display of this
variable.

46 52 46 40 42 46 40 37 46 40 52 32 37 32 52
40 32 52 40 52 46 46 52

30. The top 25 companies (by market capitalization) operating in the Washington, DC,
area along with the year they were founded and the number of employees are given
below. Make a stem-and-leaf display of each of these variables and write a short de-
scription of your findings.

Company Name Year Founded Employees

AES Corp. 1981 30,000
American Capital Ltd. 1986 484
AvalonBay Communities Inc. 1978 1,767
Capital One Financial Corp. 1995 31,800
Constellation Energy Group Inc. 1816 9,736
Coventry Health Care Inc. 1986 10,250
Danaher Corp. 1984 45,000
Dominion Resources Inc. 1909 17,500
Fannie Mae 1938 6,450
Freddie Mac 1970 5,533
Gannett Co. 1906 49,675
General Dynamics Corp. 1952 81,000
Genworth Financial Inc. 2004 7,200
Harman International Industries Inc. 1980 11,246
Host Hotels & Resorts Inc. 1927 229
Legg Mason 1899 3,800
Lockheed Martin Corp. 1995 140,000
Marriott International Inc. 1927 151,000
MedImmune LLC 1988 2,516
NII Holdings Inc. 1996 7,748
Norfolk Southern Corp. 1982 30,594
Pepco Holdings Inc. 1896 5,057
Sallie Mae 1972 11,456
T. Rowe Price Group Inc. 1937 4,605
The Washington Post Co. 1877 17,100

31. In recent years, due to low interest rates, many homeowners refinanced their
home mortgages. Linda Lahey is a mortgage officer at Down River Federal Savings
and Loan. Below is the amount refinanced for 20 loans she processed last week.
The data are reported in thousands of dollars and arranged from smallest to
largest.

59.2 59.5 61.6 65.5 66.6 72.9 74.8 77.3 79.2
83.7 85.6 85.8 86.6 87.0 87.1 90.2 93.3 98.6
100.2 100.7

a. Find the median, first quartile, and third quartile.
b. Find the 26th and 83rd percentiles.
c. Draw a box plot of the data.

32. A study is made by the recording industry in the United States of the number
of music CDs owned by 25 senior citizens and 30 young adults. The information is
reported below.

Seniors

28 35 41 48 52 81 97 98 98 99
118 132 133 140 145 147 153 158 162 174
177 180 180 187 188

Young Adults

81 107 113 147 147 175 183 192 202 209
233 251 254 266 283 284 284 316 372 401
417 423 490 500 507 518 550 557 590 594

a. Find the median and the first and third quartiles for the number of CDs owned by
senior citizens. Develop a box plot for the information.

b. Find the median and the first and third quartiles for the number of CDs owned by
young adults. Develop a box plot for the information.

c. Compare the number of CDs owned by the two groups.

33. The corporate headquarters of Bank.com, an on-line banking company, is located
in downtown Philadelphia. The director of human resources is making a study of the
time it takes employees to get to work. The city is planning to offer incentives to each
downtown employer if they will encourage their employees to use public transportation.
Below is a listing of the time to get to work this morning according to whether the
employee used public transportation or drove a car.

Public Transportation

23 25 25 30 31 31 32 33 35 36
37 42

Private

32 32 33 34 37 37 38 38 38 39
40 44

a. Find the median and the first and third quartiles for the time it took employees using
public transportation. Develop a box plot for the information.

b. Find the median and the first and third quartiles for the time it took employees who
drove their own vehicle. Develop a box plot for the information.

c. Compare the times of the two groups.

34. The following box plot shows the number of daily newspapers published in each
state and the District of Columbia. Write a brief report summarizing the number
published. Be sure to include information on the values of the first and third quartiles,
the median, and whether there is any skewness. If there are any outliers, estimate
their value.

[Box plot: Number of Newspapers, horizontal axis from 0 to 100; four asterisks mark outliers]

35. Walter Gogel Company is an industrial supplier of fasteners, tools, and springs. The
amounts of its invoices vary widely, from less than $20.00 to more than $400.00. During
the month of January the company sent out 80 invoices. Here is a box plot of these in-
voices. Write a brief report summarizing the invoice amounts. Be sure to include infor-
mation on the values of the first and third quartiles, the median, and whether there is any
skewness. If there are any outliers, approximate the value of these invoices.

[Box plot: Invoice Amount, horizontal axis from 0 to 250; one asterisk marks an outlier]

36. The American Society of PeriAnesthesia Nurses (ASPAN; www.aspan.org) is a
national organization serving nurses practicing in ambulatory surgery, preanesthesia, and
postanesthesia care. The organization consists of the 40 components listed below.

State/Region Membership

Alabama 95
Arizona 399
Maryland, Delaware, DC 531
Connecticut 239
Florida 631
Georgia 384
Hawaii 73
Illinois 562
Indiana 270
Iowa 117
Kentucky 197
Louisiana 258
Michigan 411
Massachusetts 480
Maine 97
Minnesota, Dakotas 289
Missouri, Kansas 282
Mississippi 90
Nebraska 115
North Carolina 542
Nevada 106
New Jersey, Bermuda 517
Alaska, Idaho, Montana, Oregon, Washington 708
New York 891
Ohio 708
Oklahoma 171
Arkansas 68
California 1,165
New Mexico 79
Pennsylvania 575
Rhode Island 53
Colorado 409
South Carolina 237
Texas 1,026
Tennessee 167
Utah 67
Virginia 414
Vermont, New Hampshire 144
Wisconsin 311
West Virginia 62

Use statistical software to answer the following questions.
a. Find the mean, median, and standard deviation of the number of members per
component.
b. Find the coefficient of skewness, using the software. What do you conclude about
the shape of the distribution of component size?
c. Compute the first and third quartiles using formula (4–1).
d. Develop a box plot. Are there any outliers? Which components are outliers? What are
the limits for outliers?

37. McGivern Jewelers is located in the Levis Square Mall just south of Toledo, Ohio.
Recently it posted an advertisement on a social media site reporting the shape, size,
price, and cut grade for 33 of its diamonds currently in stock. The information is
reported below.

Shape Size (carats) Price Cut Grade Shape Size (carats) Price Cut Grade

Princess 5.03 $44,312 Ideal cut Round 0.77 $2,828 Ultra ideal cut
Round 2.35 20,413 Premium cut Oval 0.76 3,808 Premium cut
Round 2.03 13,080 Ideal cut Princess 0.71 2,327 Premium cut
Round 1.56 13,925 Ideal cut Marquise 0.71 2,732 Good cut
Round 1.21 7,382 Ultra ideal cut Round 0.70 1,915 Premium cut
Round 1.21 5,154 Average cut Round 0.66 1,885 Premium cut
Round 1.19 5,339 Premium cut Round 0.62 1,397 Good cut
Emerald 1.16 5,161 Ideal cut Round 0.52 2,555 Premium cut
Round 1.08 8,775 Ultra ideal cut Princess 0.51 1,337 Ideal cut
Round 1.02 4,282 Premium cut Round 0.51 1,558 Premium cut
Round 1.02 6,943 Ideal cut Round 0.45 1,191 Premium cut
Marquise 1.01 7,038 Good cut Princess 0.44 1,319 Average cut
Princess 1.00 4,868 Premium cut Marquise 0.44 1,319 Premium cut
Round 0.91 5,106 Premium cut Round 0.40 1,133 Premium cut
Round 0.90 3,921 Good cut Round 0.35 1,354 Good cut
Round 0.90 3,733 Premium cut Round 0.32 896 Premium cut
Round 0.84 2,621 Premium cut

a. Develop a box plot of the variable price and comment on the result. Are there any
outliers? What is the median price? What are the values of the first and the third
quartiles?

b. Develop a box plot of the variable size and comment on the result. Are there any
outliers? What is the median size? What are the values of the first and the third
quartiles?

c. Develop a scatter diagram between the variables price and size. Be sure to put price
on the vertical axis and size on the horizontal axis. Does there seem to be an association
between the two variables? Is the association direct or inverse? Does any point
seem to be different from the others?

d. Develop a contingency table for the variables shape and cut grade. What is the most
common cut grade? What is the most common shape? What is the most common
combination of cut grade and shape?

38. Listed below is the amount of commissions earned last month for the eight mem-
bers of the sales staff at Best Electronics. Calculate the coefficient of skewness using
both methods. Hint: Use of a spreadsheet will expedite the calculations.

980.9 1,036.5 1,099.5 1,153.9 1,409.0 1,456.4 1,718.4 1,721.2

39. Listed below is the number of car thefts in a large city over the last week. Calculate
the coefficient of skewness using both methods. Hint: Use of a spreadsheet will expe-
dite the calculations.

3 12 13 7 8 3 8

40. The manager of Information Services at Wilkin Investigations, a private investigation firm,
is studying the relationship between the age (in months) of a combination printer, copier,
and fax machine and its monthly maintenance cost. For a sample of 15 machines, the
manager developed the following chart. What can the manager conclude about the re-
lationship between the variables?

[Chart: Months on the horizontal axis (34 to 49), Monthly Maintenance Cost on the vertical axis ($80 to $130)]

41. An auto insurance company reported the following information regarding the age
of a driver and the number of accidents reported last year. Develop a scatter diagram for
the data and write a brief summary.

Age Accidents Age Accidents

16 4 23 0
24 2 27 1
18 5 32 1
17 4 22 3

42. Wendy’s offers eight different condiments (mustard, catsup, onion, mayonnaise, pickle,
lettuce, tomato, and relish) on hamburgers. A store manager collected the following in-
formation on the number of condiments ordered and the age group of the customer.
What can you conclude regarding the information? Who tends to order the most or least
number of condiments?

Age

Number of Condiments Under 18 18 up to 40 40 up to 60 60 or older

0 12 18 24 52
1 21 76 50 30
2 39 52 40 12
3 or more 71 87 47 28

43. Here is a table showing the number of employed and unemployed workers 20 years or
older by gender in the United States.

Number of Workers (000)

Gender Employed Unemployed

Men 70,415 4,209
Women 61,402 3,314

a. How many workers were studied?
b. What percent of the workers were unemployed?
c. Compare the percent unemployed for the men and the women.

126 A REVIEW OF CHAPTERS 1–4

D A T A A N A L Y T I C S

44. Refer to the North Valley real estate data recorded on homes sold during the last
year. Prepare a report on the selling prices of the homes based on the answers to the
following questions.
a. Compute the minimum, maximum, median, and the first and the third quartiles of
price. Create a box plot. Comment on the distribution of home prices.
b. Develop a scatter diagram with price on the vertical axis and the size of the home on
the horizontal. Is there a relationship between these variables? Is the relationship
direct or inverse?

c. For homes without a pool, develop a scatter diagram with price on the vertical axis
and the size of the home on the horizontal. Do the same for homes with a pool. How
do the relationships between price and size for homes without a pool and homes
with a pool compare?

45. Refer to the Baseball 2016 data that report information on the 30 Major League
Baseball teams for the 2016 season.
a. In the data set, the variable year opened is the first year of operation for that stadium.
For each team, use this variable to create a new variable, stadium age, by subtracting
the value of year opened from the current year. Develop a box plot
with the new variable, stadium age. Are there any outliers? If so, which of the stadiums are
outliers?

b. Using the variable, salary, create a box plot. Are there any outliers? Compute the
quartiles using formula (4–1). Write a brief summary of your analysis.

c. Draw a scatter diagram with the variable, wins, on the vertical axis and salary on the
horizontal axis. What are your conclusions?

d. Using the variable, wins, draw a dot plot. What can you conclude from this plot?

46. Refer to the Lincolnville School District bus data.
a. Referring to the maintenance cost variable, develop a box plot. What are the minimum,
first quartile, median, third quartile, and maximum values? Are there any
outliers?

b. Using the median maintenance cost, develop a contingency table with bus manufac-
turer as one variable and whether the maintenance cost was above or below the
median as the other variable. What are your conclusions?

A REVIEW OF CHAPTERS 1–4
This section is a review of the major concepts and terms introduced in Chapters 1–4. Chapter 1 began by describing the
meaning and purpose of statistics. Next we described the different types of variables and the four levels of measurement.
Chapter 2 was concerned with describing a set of observations by organizing it into a frequency distribution and then
portraying the frequency distribution as a histogram or a frequency polygon. Chapter 3 began by describing measures of
location, such as the mean, weighted mean, median, geometric mean, and mode. This chapter also included measures of
dispersion, or spread. Discussed in this section were the range, variance, and standard deviation. Chapter 4 included
several graphing techniques such as dot plots, box plots, and scatter diagrams. We also discussed the coefficient of skew-
ness, which reports the lack of symmetry in a set of data.

Throughout this section we stressed the importance of statistical software, such as Excel and Minitab. Many computer
outputs in these chapters demonstrated how quickly and effectively a large data set can be organized into a frequency
distribution, several of the measures of location or measures of variation calculated, and the information presented in
graphical form.

P R O B L E M S

1. The duration in minutes of a sample of 50 power outages last year in the state of
South Carolina is listed below.

124 14 150 289 52 156 203 82 27 248
39 52 103 58 136 249 110 298 251 157
186 107 142 185 75 202 119 219 156 78
116 152 206 117 52 299 58 153 219 148
145 187 165 147 158 146 185 186 149 140

Use a statistical software package such as Excel or Minitab to help answer the following
questions.
a. Determine the mean, median, and standard deviation.
b. Determine the first and third quartiles.
c. Develop a box plot. Are there any outliers? Do the durations follow a symmetric
distribution or are they skewed? Justify your answer.
d. Organize the outage durations into a frequency distribution.
e. Write a brief summary of the results in parts a to d.

2. Listed below are the 45 U.S. presidents and their age as they began their terms in
office.

Number Name Age

1 Washington 57
2 J. Adams 61
3 Jefferson 57
4 Madison 57
5 Monroe 58
6 J. Q. Adams 57
7 Jackson 61
8 Van Buren 54
9 W. H. Harrison 68
10 Tyler 51
11 Polk 49
12 Taylor 64
13 Fillmore 50
14 Pierce 48
15 Buchanan 65
16 Lincoln 52
17 A. Johnson 56
18 Grant 46
19 Hayes 54
20 Garfield 49
21 Arthur 50
22 Cleveland 47
23 B. Harrison 55
24 Cleveland 55
25 McKinley 54
26 T. Roosevelt 42
27 Taft 51
28 Wilson 56
29 Harding 55
30 Coolidge 51
31 Hoover 54
32 F. D. Roosevelt 51
33 Truman 60
34 Eisenhower 62
35 Kennedy 43
36 L. B. Johnson 55
37 Nixon 56
38 Ford 61
39 Carter 52
40 Reagan 69
41 G. H. W. Bush 64
42 Clinton 46
43 G. W. Bush 54
44 Obama 47
45 Trump 70

Use a statistical software package such as Excel or Minitab to help answer the following
questions.
a. Determine the mean, median, and standard deviation.
b. Determine the first and third quartiles.
c. Develop a box plot. Are there any outliers? Do the ages follow a symmetric
distribution or are they skewed? Justify your answer.
d. Organize the distribution of ages into a frequency distribution.
e. Write a brief summary of the results in parts a to d.

3. Listed below is the 2014 median household income for the 50 states and the
District of Columbia.
https://www.census.gov/hhes/www/income/data/historical/household/

State Amount

Alabama 42,278
Alaska 67,629
Arizona 49,254
Arkansas 44,922
California 60,487
Colorado 60,940
Connecticut 70,161
Delaware 57,522
D.C. 68,277
Florida 46,140
Georgia 49,555
Hawaii 71,223
Idaho 53,438
Illinois 54,916
Indiana 48,060
Iowa 57,810
Kansas 53,444
Kentucky 42,786
Louisiana 42,406
Maine 51,710
Maryland 76,165
Massachusetts 63,151
Michigan 52,005
Minnesota 67,244
Mississippi 35,521
Missouri 56,630
Montana 51,102
Nebraska 56,870
Nevada 49,875
New Hampshire 73,397
New Jersey 65,243
New Mexico 46,686
New York 54,310
North Carolina 46,784
North Dakota 60,730
Ohio 49,644
Oklahoma 47,199
Oregon 58,875
Pennsylvania 55,173
Rhode Island 58,633
South Carolina 44,929
South Dakota 53,053
Tennessee 43,716
Texas 53,875
Utah 63,383
Vermont 60,708
Virginia 66,155
Washington 59,068
West Virginia 39,552
Wisconsin 58,080
Wyoming 55,690

Use a statistical software package such as Excel or Minitab to help answer the following
questions.
a. Determine the mean, median, and standard deviation.
b. Determine the first and third quartiles.
c. Develop a box plot. Are there any outliers? Do the amounts follow a symmetric
distribution or are they skewed? Justify your answer.
d. Organize the incomes into a frequency distribution.
e. Write a brief summary of the results in parts a to d.

4. A sample of 12 homes sold last week in St. Paul, Minnesota, revealed the following
information. Draw a scatter diagram. Can we conclude that, as the size of the home
(reported below in thousands of square feet) increases, the selling price (reported in
$ thousands) also increases?

Home Size                    Selling Price    Home Size                    Selling Price
(thousands of square feet)   ($ thousands)    (thousands of square feet)   ($ thousands)

1.4                          100              1.3                          110
1.3                          110              0.8                           85
1.2                          105              1.2                          105
1.1                          120              0.9                           75
1.4                           80              1.1                           70
1.0                          105              1.1                           95

5. Refer to the following diagram.

[Diagram: horizontal axis from 0 to 200; two asterisks mark individual points]

a. What is the graph called?
b. What are the median, and first and third quartile values?
c. Is the distribution positively skewed? Tell how you know.
d. Are there any outliers? If yes, estimate these values.
e. Can you determine the number of observations in the study?

C A S E S

A. Century National Bank
The following case will appear in subsequent review sec-
tions. Assume that you work in the Planning Department of
the Century National Bank and report to Ms. Lamberg. You
will need to do some data analysis and prepare a short writ-
ten report. Remember, Mr. Selig is the president of the bank,
so you will want to ensure that your report is complete and
accurate. A copy of the data appears in Appendix A.6.
Century National Bank has offices in several cities in
the Midwest and the southeastern part of the United
States. Mr. Dan Selig, president and CEO, would like to
know the characteristics of his checking account custom-
ers. What is the balance of a typical customer?
How many other bank services do the checking ac-
count customers use? Do the customers use the ATM ser-
vice and, if so, how often? What about debit cards? Who
uses them, and how often are they used?
To better understand the customers, Mr. Selig asked
Ms. Wendy Lamberg, director of planning, to select a sam-
ple of customers and prepare a report. To begin, she has
appointed a team from her staff. You are the head of the
team and responsible for preparing the report. You select a
random sample of 60 customers. In addition to the balance
in each account at the end of last month, you determine
(1) the number of ATM (automatic teller machine) transac-
tions in the last month; (2) the number of other bank ser-
vices (a savings account, a certificate of deposit, etc.) the
customer uses; (3) whether the customer has a debit card
(this is a bank service in which charges are made directly to
the customer’s account); and (4) whether or not interest is
paid on the checking account. The sample includes cus-
tomers from the branches in Cincinnati, Ohio; Atlanta,
Georgia; Louisville, Kentucky; and Erie, Pennsylvania.

1. Develop a graph or table that portrays the checking
balances. What is the balance of a typical customer?
Do many customers have more than $2,000 in their
accounts? Does it appear that there is a difference in
the distribution of the accounts among the four
branches? Around what value do the account bal-
ances tend to cluster?

2. Determine the mean and median of the checking ac-
count balances. Compare the mean and the median
balances for the four branches. Is there a difference
among the branches? Be sure to explain the difference
between the mean and the median in your report.

3. Determine the range and the standard deviation of
the checking account balances. What do the first and
third quartiles show? Determine the coefficient of
skewness and indicate what it shows. Because
Mr. Selig does not deal with statistics daily, include a
brief description and interpretation of the standard
deviation and other measures.

B. Wildcat Plumbing Supply Inc.:
Do We Have Gender Differences?

Wildcat Plumbing Supply has served the plumbing
needs of Southwest Arizona for more than 40 years.
The company was founded by Mr. Terrence St. Julian
and is run today by his son Cory. The company has
grown from a handful of employees to more than 500
today. Cory is concerned about several positions within
the company where he has men and women doing es-
sentially the same job but at different pay. To investi-
gate, he collected the information below. Suppose you
are a student intern in the Accounting Department and
have been given the task to write a report summarizing
the situation.

Yearly Salary ($000) Women Men

Less than 30 2 0
30 up to 40 3 1
40 up to 50 17 4
50 up to 60 17 24
60 up to 70 8 21
70 up to 80 3 7
80 or more 0 3

To kick off the project, Mr. Cory St. Julian held a meeting
with his staff and you were invited. At this meeting, it was
suggested that you calculate several measures of location,
create charts or draw graphs such as a cumulative frequency
distribution, and determine the quartiles
for both men and women. Develop the charts and write
the report summarizing the yearly salaries of employees
at Wildcat Plumbing Supply. Does it appear that there are
pay differences based on gender?

C. Kimble Products: Is There a Difference
In the Commissions?

At the January national sales meeting, the CEO of Kimble
Products was questioned extensively regarding the com-
pany policy for paying commissions to its sales represen-
tatives. The company sells sporting goods to two major

markets. There are 40 sales representatives who call di-
rectly on large-volume customers, such as the athletic de-
partments at major colleges and universities and
professional sports franchises. There are 30 sales repre-
sentatives who represent the company to retail stores lo-
cated in shopping malls and large discounters such as
Kmart and Target.
Upon his return to corporate headquarters, the CEO
asked the sales manager for a report comparing the com-
missions earned last year by the two parts of the sales
team. The information is reported below. Write a brief re-
port. Would you conclude that there is a difference? Be
sure to include information in the report on both the cen-
tral tendency and dispersion of the two groups.

Commissions Earned by Sales Representatives
Calling on Large Retailers ($)

1,116 681 1,294 12 754 1,206 1,448 870 944 1,255
1,213 1,291 719 934 1,313 1,083 899 850 886 1,556
886 1,315 1,858 1,262 1,338 1,066 807 1,244 758 918

Commissions Earned by Sales Representatives
Calling on Athletic Departments ($)

354 87 1,676 1,187 69 3,202 680 39 1,683 1,106
883 3,140 299 2,197 175 159 1,105 434 615 149
1,168 278 579 7 357 252 1,602 2,321 4 392
416 427 1,738 526 13 1,604 249 557 635 527

P R A C T I C E T E S T

There is a practice test at the end of each review section. The tests are in two parts. The first part contains several objec-
tive questions, usually in a fill-in-the-blank format. The second part is problems. In most cases, it should take 30 to 45
minutes to complete the test. The problems require a calculator. Check the answers in the Answer Section in the back of
the book.

Part 1—Objective
1. The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in
making effective decisions is called ______.
2. Methods of organizing, summarizing, and presenting data in an informative way are
called ______.
3. The entire set of individuals or objects of interest or the measurements obtained from all
individuals or objects of interest are called the ______.
4. List the two types of variables.
5. The number of bedrooms in a house is an example of a ______. (discrete variable,
continuous variable, qualitative variable; pick one)
6. The jersey numbers of Major League Baseball players are an example of what level of
measurement?
7. The classification of students by eye color is an example of what level of measurement?
8. The sum of the differences between each value and the mean is always equal to what value?
9. A set of data contained 70 observations. How many classes would the 2^k method suggest to
construct a frequency distribution?
10. What percent of the values in a data set are always larger than the median?
11. The square of the standard deviation is the ______.
12. The standard deviation assumes a negative value when ______. (all the values are negative,
at least half the values are negative, or never; pick one)
13. Which of the following is least affected by an outlier? (mean, median, or range; pick one)

Part 2—Problems
1. The Russell 2000 index of stock prices increased by the following amounts over the last 3 years.

18% 4% 2%

What is the geometric mean increase for the 3 years?

2. The information below refers to the selling prices ($000) of homes sold in Warren, Pennsylvania, during 2016.

Selling Price ($000) Frequency

120.0 up to 150.0 4
150.0 up to 180.0 18
180.0 up to 210.0 30
210.0 up to 240.0 20
240.0 up to 270.0 17
270.0 up to 300.0 10
300.0 up to 330.0 6

a. What is the class interval?
b. How many homes were sold in 2016?
c. How many homes sold for less than $210,000?
d. What is the relative frequency of the 210 up to 240 class?
e. What is the midpoint of the 150 up to 180 class?
f. The selling prices range between what two amounts?

3. A sample of eight college students revealed they owned the following number of CDs.

52 76 64 79 80 74 66 69

a. What is the mean number of CDs owned?
b. What is the median number of CDs owned?
c. What is the 40th percentile?
d. What is the range of the number of CDs owned?
e. What is the standard deviation of the number of CDs owned?

4. An investor purchased 200 shares of the Blair Company for $36 each in July of 2013, 300 shares
at $40 each in September 2015, and 500 shares at $50 each in January 2016. What is the
investor’s weighted mean price per share?

5. During the 50th Super Bowl, 30 million pounds of snack food were eaten. The chart below depicts
this information.

[Chart: Potato Chips 37%, Tortilla Chips 28%, Pretzels 14%, Popcorn 13%, Snack Nuts 8%]

a. What is the name given to this graph?
b. Estimate, in millions of pounds, the amount of potato chips eaten during the game.
c. Estimate the relationship of potato chips to popcorn. (twice as much, half as much, three
times, none of these; pick one)
d. What percent of the total do potato chips and tortilla chips comprise?

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO5-1 Define the terms probability, experiment, event, and outcome.

LO5-2 Assign probabilities using a classical, empirical, or subjective approach.

LO5-3 Calculate probabilities using the rules of addition.

LO5-4 Calculate probabilities using the rules of multiplication.

LO5-5 Compute probabilities using a contingency table.

LO5-6 Calculate probabilities using Bayes’ theorem.

LO5-7 Determine the number of outcomes using principles of counting.

RECENT SURVEYS indicate 60% of tourists to China visited the Forbidden City, the
Temple of Heaven, the Great Wall, and other historical sites in or near Beijing. Forty percent
visited Xi’an and its magnificent terra-cotta soldiers, horses, and chariots, which lay buried
for over 2,000 years. Thirty percent of the tourists went to both Beijing and Xi’an. What is the
probability that a tourist visited at least one of these places? (See Exercise 76 and LO5-3.)

Chapter 5: A Survey of Probability Concepts

INTRODUCTION
The emphasis in Chapters 2, 3, and 4 is on descriptive statistics. In Chapter 2, we orga-
nize the profits on 180 vehicles sold by the Applewood Auto Group into a frequency
distribution. This frequency distribution shows the smallest and the largest profits and
where the largest concentration of data occurs. In Chapter 3, we use numerical mea-
sures of location and dispersion to locate a typical profit on vehicle sales and to exam-
ine the variation in the profit of a sale. We describe the variation in the profits with such
measures of dispersion as the range and the standard deviation. In Chapter 4, we de-
velop charts and graphs, such as a scatter diagram or a dot plot, to further describe the
data graphically.

Descriptive statistics is concerned with summarizing data collected from past
events. We now turn to the second facet of statistics, namely, computing the chance
that something will occur in the future. This facet of statistics is called statistical infer-
ence or inferential statistics.

Seldom does a decision maker have complete information to make a decision. For
example:

• Toys and Things, a toy and puzzle manufacturer, recently developed a new game based on
sports trivia. It wants to know whether sports buffs will purchase the game. “Slam Dunk” and
“Home Run” are two of the names under consideration. To investigate, the president of Toys
and Things decided to hire a market research firm. The firm selected a sample of 800
consumers from the population and asked each respondent for a reaction to the new game and
its proposed titles. Using the sample results, the company can estimate the proportion of
the population that will purchase the game.

• The quality assurance department of a U.S. Steel mill must assure management that
the quarter-inch wire being produced has an acceptable tensile strength. Clearly,
not all the wire produced can be tested for tensile strength because testing re-
quires the wire to be stretched until it breaks—thus destroying it. So a random sam-
ple of 10 pieces is selected and tested. Based on the test results, all the wire
produced is deemed to be either acceptable or unacceptable.

• Other questions involving uncertainty are: Should the daytime drama Days of Our
Lives be discontinued immediately? Will a newly developed mint-flavored cereal be
profitable if marketed? Will Charles Linden be elected to county auditor in Batavia
County?

Statistical inference deals with conclusions about a population based on a sample
taken from that population. (The populations for the preceding illustrations are all con-
sumers who like sports trivia games, all the quarter-inch steel wire produced, all televi-
sion viewers who watch soaps, all who purchase breakfast cereal, and so on.)

Because there is uncertainty in decision making, it is important that all the known
risks involved be scientifically evaluated. Helpful in this evaluation is probability
theory, often referred to as the science of uncertainty. Probability theory allows the
decision maker to analyze the risks and minimize the gamble inherent, for example,
in marketing a new product or accepting an incoming shipment possibly containing
defective parts.

Because probability concepts are so important in the field of statistical inference (to
be discussed starting with Chapter 8), this chapter introduces the basic language of
probability, including such terms as experiment, event, subjective probability, and addi-
tion and multiplication rules.

STATISTICS IN ACTION

Government statistics show there are about 1.7 automobile-caused fatalities for every
100,000,000 vehicle-miles. If you drive 1 mile to the store to buy your lottery ticket and then
return home, you have driven 2 miles. Thus the probability that you will join this statistical
group on your next 2-mile round trip is 2 × 1.7/100,000,000 = 0.000000034. This can also be
stated as “One in 29,411,765.” Thus, if you drive to the store to buy your Powerball ticket, your
chance of being killed (or killing someone else) is more than 4 times greater than the chance
that you will win the Powerball Jackpot, one chance in 120,526,770.
http://www.durangobill.com/PowerballOdds.html
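
The arithmetic in this comparison can be checked with a few lines of Python. This is only a sketch: the figures are the ones quoted above, and the variable names are our own.

```python
# Figures quoted in the sidebar above.
fatalities_per_mile = 1.7 / 100_000_000   # about 1.7 fatalities per 100,000,000 vehicle-miles
round_trip_miles = 2

p_fatality = round_trip_miles * fatalities_per_mile
one_in = 1 / p_fatality                   # the same probability stated as "one in N"

powerball_one_in = 120_526_770            # jackpot odds quoted in the sidebar
ratio = powerball_one_in / one_in         # how many times more likely the fatality is

print(f"P(fatality on the trip) = {p_fatality:.9f}")
print(f"One in {one_in:,.0f}")
print(f"Ratio to the jackpot chance = {ratio:.2f}")
```

The ratio comes out a little above 4, which matches the statement "more than 4 times greater."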

WHAT IS A PROBABILITY?
No doubt you are familiar with terms such as probability, chance, and likelihood. They
are often used interchangeably. The weather forecaster announces that there is a 70%
chance of rain for Super Bowl Sunday. Based on a survey of consumers who tested a
newly developed toothpaste with a banana flavor, the probability is .03 that, if marketed,
it will be a financial success. (This means that the chance of the banana-flavor tooth-
paste being accepted by the public is rather remote.) What is a probability? In general,
it is a numerical value that describes the chance that something will happen.

LO5-1
Define the terms
probability, experiment,
event, and outcome.

PROBABILITY A value between zero and one, inclusive, describing the relative
possibility (chance or likelihood) an event will occur.

A probability is frequently expressed as a decimal, such as .70, .27, or .50, or a
percent such as 70%, 27% or 50%. It also may be reported as a fraction such as 7/10,
27/100, or 1/2. It can assume any number from 0 to 1, inclusive. Expressed as a per-
centage, the range is between 0% and 100%, inclusive. If a company has only five sales
regions, and each region’s name or number is written on a slip of paper and the slips put
in a hat, the probability of selecting one of the five regions is 1. The probability of select-
ing from the hat a slip of paper that reads “Pittsburgh Steelers” is 0. Thus, the probability
of 1 represents something that is certain to happen, and the probability of 0 represents
something that cannot happen.

The closer a probability is to 0, the more improbable it is the event will happen. The
closer the probability is to 1, the more likely it will happen. The relationship is shown in
the following diagram along with a few of our personal beliefs. You might, however, se-
lect a different probability for Slo Poke’s chances to win the Kentucky Derby or for an
increase in federal taxes.

[Diagram: a probability scale running from 0.00 (cannot happen) to 1.00 (sure to happen) in
steps of 0.10. Marked along the scale, from left to right: probability our sun will disappear
this year; chance Slo Poke will win the Kentucky Derby; chance of a head in single toss of a
coin; chance of an increase in federal taxes; chance of rain in Florida this year.]

Sometimes, the likelihood of an event is expressed using the term odds. To explain,
someone says the odds are “five to two” that an event will occur. This means that in a
total of seven trials (5 + 2), the event will occur five times and not occur two times. Using
odds, we can compute the probability that the event occurs as 5/(5 + 2) or 5/7. So, if the
odds in favor of an event are x to y, the probability of the event is x/(x + y).
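
The conversion from odds to probability is easy to sketch in Python. The function names below are our own, and the reverse conversion (probability back to odds) is an addition for illustration that follows from the same formula.

```python
from fractions import Fraction

def odds_to_probability(x, y):
    """Probability of an event whose odds in favor are x to y."""
    return Fraction(x, x + y)

def probability_to_odds(p):
    """Odds in favor (x, y) for a probability p given as a Fraction."""
    return p.numerator, p.denominator - p.numerator

p = odds_to_probability(5, 2)
print(p)                       # 5/7
print(probability_to_odds(p))  # (5, 2)
```

Using `Fraction` keeps the result exact, so odds of "five to two" come back as 5/7 rather than a rounded decimal.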

Three key words are used in the study of probability: experiment, outcome, and
event. These terms are used in our everyday language, but in statistics they have spe-
cific meanings.

EXPERIMENT A process that leads to the occurrence of one and only one of
several possible results.

This definition is more general than the one used in the physical sciences, where
we picture someone manipulating test tubes or microscopes. In reference to proba-
bility, an experiment has two or more possible results, and it is uncertain which will
occur.

OUTCOME A particular result of an experiment.

EVENT A collection of one or more outcomes of an experiment.

For example, the tossing of a coin is an experiment. You are unsure of the outcome.
When a coin is tossed, one particular outcome is a “head.” The alternative outcome is a
“tail.” Similarly, asking 500 college students if they would travel more than 100 miles to
attend a Mumford and Sons concert is an experiment. In this experiment, one possible
outcome is that 273 students indicate they would travel more than 100 miles to attend
the concert. Another outcome is that 317 students would attend the concert. Still an-
other outcome is that 423 students indicate they would attend the concert. When one
or more of the experiment’s outcomes are observed, we call this an event.

Examples to clarify the definitions of the terms experiment, outcome, and event are
presented in the following figure.

In the die-rolling experiment, there are six possible outcomes, but there are many
possible events. When counting the number of members of the board of directors for
Fortune 500 companies over 60 years of age, the number of possible outcomes can be
anywhere from zero to the total number of members. There are an even larger number
of possible events in this experiment.

Experiment: Roll a die
All possible outcomes: Observe a 1; Observe a 2; Observe a 3; Observe a 4; Observe a 5;
Observe a 6
Some possible events: Observe an even number; Observe a number greater than 4; Observe a
number 3 or less

Experiment: Count the number of members of the board of directors for Fortune 500 companies
who are over 60 years of age
All possible outcomes: None is over 60; One is over 60; Two are over 60; ...; 29 are over 60;
...; 48 are over 60; ...
Some possible events: More than 13 are over 60; Fewer than 20 are over 60

S E L F - R E V I E W 5–1

Video Games Inc. recently developed a new video game. Its playability is to be tested by
80 veteran game players.
(a) What is the experiment?
(b) What is one possible outcome?
(c) Suppose 65 of the 80 players testing the new game said they liked it. Is 65 a probability?
(d) The probability that the new game will be a success is computed to be −1.0. Comment.
(e) Specify one possible event.

APPROACHES TO ASSIGNING PROBABILITIES
In this section, we describe three ways to assign a probability to an event: classical,
empirical, and subjective. The classical and empirical methods are objective and are
based on information and data. The subjective method is based on a person’s belief or
estimate of an event’s likelihood.

LO5-2
Assign probabilities using
a classical, empirical, or
subjective approach.

Classical Probability
Classical probability is based on the assumption that the outcomes of an experiment are
equally likely. Using the classical viewpoint, the probability of an event happening is
computed by dividing the number of favorable outcomes by the number of possible outcomes:

CLASSICAL PROBABILITY   Probability of an event = (Number of favorable outcomes) / (Total number of possible outcomes)   [5–1]

E X A M P L E

Consider an experiment of rolling a six-sided die. What is the probability of the
event “an even number of spots appear face up”?

S O L U T I O N

The possible outcomes are:

a one-spot

a two-spot

a three-spot

a four-spot

a five-spot

a six-spot

There are three “favorable” outcomes (a two, a four, and a six) in the collection of
six equally likely possible outcomes. Therefore:

Probability of an even number = 3/6 = .5

where 3 is the number of favorable outcomes and 6 is the total number of possible outcomes.
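
As a minimal sketch, the classical calculation can be written in Python by listing the outcomes and counting the favorable ones. The helper `classical_probability` is our own name, and it assumes every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """Favorable outcomes divided by possible outcomes, assuming
    every outcome in sample_space is equally likely."""
    favorable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favorable), len(sample_space))

die = {1, 2, 3, 4, 5, 6}
even = {2, 4, 6}

p_even = classical_probability(even, die)
print(p_even, float(p_even))   # 1/2 0.5
```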

The mutually exclusive concept appeared earlier in our study of frequency distri-
butions in Chapter 2. Recall that we create classes so that a particular value is included
in only one of the classes and there is no overlap between classes. Thus, only one of
several events can occur at a particular time.

MUTUALLY EXCLUSIVE The occurrence of one event means that none of the
other events can occur at the same time.

COLLECTIVELY EXHAUSTIVE At least one of the events must occur when an
experiment is conducted.

EMPIRICAL PROBABILITY The probability of an event happening is the fraction of
the time similar events happened in the past.

LAW OF LARGE NUMBERS Over a large number of trials, the empirical probability
of an event will approach its true probability.

The variable “gender” presents mutually exclusive outcomes, male and female. An
employee selected at random is either male or female but cannot be both. A manufac-
tured part is acceptable or unacceptable. The part cannot be both acceptable and unac-
ceptable at the same time. In a sample of manufactured parts, the event of selecting an
unacceptable part and the event of selecting an acceptable part are mutually exclusive.

If an experiment has a set of events that includes every possible outcome, such as
the events “an even number” and “an odd number” in the die-tossing experiment, then
the set of events is collectively exhaustive. For the die-tossing experiment, every out-
come will be either even or odd. So the set is collectively exhaustive.
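
Both properties can be checked mechanically when events are written as sets of outcomes. The sketch below uses function names of our own, and the second pair of events ("a number 4 or larger" and "a number 2 or smaller") is included only to show a set that is mutually exclusive but not collectively exhaustive.

```python
from itertools import combinations

def mutually_exclusive(events):
    """True if no outcome belongs to more than one event."""
    return all(a.isdisjoint(b) for a, b in combinations(events, 2))

def collectively_exhaustive(events, sample_space):
    """True if every possible outcome belongs to at least one event."""
    return set().union(*events) == set(sample_space)

die = {1, 2, 3, 4, 5, 6}
even, odd = {2, 4, 6}, {1, 3, 5}
four_or_larger, two_or_smaller = {4, 5, 6}, {1, 2}

# Even/odd: mutually exclusive and collectively exhaustive.
print(mutually_exclusive([even, odd]), collectively_exhaustive([even, odd], die))

# 4 or larger / 2 or smaller: mutually exclusive, but the outcome 3 is not covered.
print(mutually_exclusive([four_or_larger, two_or_smaller]),
      collectively_exhaustive([four_or_larger, two_or_smaller], die))
```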

If the set of events is collectively exhaustive and the events are mutually exclusive,
the sum of the probabilities is 1. Historically, the classical approach to probability was
developed and applied in the 17th and 18th centuries to games of chance, such as
cards and dice. It is unnecessary to do an experiment to determine the probability of an
event occurring using the classical approach because the total number of outcomes is
known before the experiment. The flip of a coin has two possible outcomes; the roll of a
die has six possible outcomes. We can logically arrive at the probability of getting a tail
on the toss of one coin or three heads on the toss of three coins.

The classical approach to probability can also be applied to lotteries. In South
Carolina, one of the games of the Education Lottery is “Pick 3.” A person buys a lottery
ticket and selects three numbers between 0 and 9. Once per week, the three numbers
are randomly selected from a machine that tumbles three containers each with balls
numbered 0 through 9. One way to win is to match the numbers and the order of the
numbers. Given that 1,000 possible outcomes exist (000 through 999), the probability
of winning with any three-digit number is 0.001, or 1 in 1,000.
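
The count of 1,000 outcomes can be confirmed by enumerating every ordered draw. This is only a sketch; the ticket number is hypothetical.

```python
from itertools import product

digits = range(10)                            # balls numbered 0 through 9
outcomes = list(product(digits, repeat=3))    # ordered triples (0, 0, 0) ... (9, 9, 9)

ticket = (2, 5, 9)                            # hypothetical ticket
favorable = [o for o in outcomes if o == ticket]

print(len(outcomes))                          # 1000
print(len(favorable) / len(outcomes))         # 0.001
```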

Empirical Probability
Empirical or relative frequency is the second type of objective probability. It is based
on the number of times an event occurs as a proportion of a known number of trials.

The formula to determine an empirical probability is:

Empirical probability = (Number of times the event occurs) / (Total number of observations)

The empirical approach to probability is based on what is called the law of large numbers.
The key to establishing probabilities empirically is that more observations will provide a
more accurate estimate of the probability.

To explain the law of large numbers, suppose we toss a fair coin. The result of each
toss is either a head or a tail. With just one toss of the coin the empirical probability for
heads is either zero or one. If we toss the coin a great number of times, the probability
of the outcome of heads will approach .5. The following table reports the results of
seven different experiments of flipping a fair coin 1, 10, 50, 100, 500, 1,000, and
10,000 times and then computing the relative frequency of heads. Note as we increase
the number of trials, the empirical probability of a head appearing approaches .5, which
is its value based on the classical approach to probability.

Number of Trials   Number of Heads   Relative Frequency of Heads
          1                 0          .00
         10                 3          .30
         50                26          .52
        100                52          .52
        500               236          .472
      1,000               494          .494
     10,000             5,027          .5027

What have we demonstrated? Based on the classical definition of probability, the likeli-
hood of obtaining a head in a single toss of a fair coin is .5. Based on the empirical or
relative frequency approach to probability, the probability of the event happening ap-
proaches the same value based on the classical definition of probability.
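
A simulation gives a feel for the law of large numbers. The sketch below tosses a simulated fair coin with Python's standard `random` module; the seed is arbitrary, and the simulated counts will differ from the table above, which reports its own runs.

```python
import random

def relative_frequency_of_heads(n_tosses, rng):
    """Toss a simulated fair coin n_tosses times; return the share of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

rng = random.Random(5)   # arbitrary seed so the run is repeatable
for n in (1, 10, 50, 100, 500, 1_000, 10_000):
    print(f"{n:>6}  {relative_frequency_of_heads(n, rng):.4f}")
```

With one toss the relative frequency can only be 0 or 1; as the number of tosses grows, the printed values should settle near .5.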

This reasoning allows us to use the empirical or relative frequency approach to
finding a probability. Here are some examples.

• Last semester, 80 students registered for Business Statistics 101 at Scandia University.
Twelve students earned an A. Based on this information and the empirical approach
to assigning a probability, we estimate the likelihood a student at Scandia will earn
an A is .15.

• Stephen Curry of the Golden State Warriors made 363 out of 400 free throw
attempts during the 2015–16 NBA season. Based on the empirical approach to
probability, the likelihood of him making his next free throw attempt is .908.

Life insurance companies rely on past data to determine the acceptability of an appli-
cant as well as the premium to be charged. Mortality tables list the likelihood a person
of a particular age will die within the upcoming year. For example, the likelihood a
20-year-old female will die within the next year is .00105.

The empirical concept is illustrated with the following example.

E X A M P L E

On February 1, 2003, the Space Shuttle Columbia broke apart during reentry. This was the second
disaster in 113 space shuttle missions for NASA. On the basis of this information, what is
the probability that a future mission is successfully completed?

S O L U T I O N

We use letters or numbers to simplify the equations. P stands for probability and
A represents the event of a successful mission. In this case, P(A) stands for the
probability a future mission is successfully completed.

Probability of a successful flight = (Number of successful flights) / (Total number of flights)

P(A) = 111/113 = .98

We can use this as an estimate of probability. In other words, based on past experience,
the probability is .98 that a future space shuttle mission will be safely completed.
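
The empirical calculations in this section all follow the same pattern, sketched here in Python. The function name is our own; the counts are the ones given in the text.

```python
def empirical_probability(times_event_occurred, total_observations):
    """Relative frequency of an event in past observations."""
    return times_event_occurred / total_observations

print(empirical_probability(12, 80))               # A grades at Scandia: 0.15
print(empirical_probability(363, 400))             # free throws: 0.9075, reported as .908
print(round(empirical_probability(111, 113), 2))   # successful missions: 0.98
```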

Subjective Probability
If there is little or no experience or information on which to base a probability, it is esti-
mated subjectively. Essentially, this means an individual evaluates the available opin-
ions and information and then estimates or assigns the probability. This probability is
called a subjective probability.

SUBJECTIVE CONCEPT OF PROBABILITY The likelihood (probability) of a particular
event happening that is assigned by an individual based on whatever information is
available.

Illustrations of subjective probability are:

1. Estimating the likelihood the New England Patriots will play in the Super Bowl next
year.

2. Estimating the likelihood you are involved in an automobile accident during the
next 12 months.

3. Estimating the likelihood the U.S. budget deficit will be reduced by half in the next
10 years.

The types of probability are summarized in Chart 5–1. A probability statement al-
ways assigns a likelihood to an event that has not yet occurred. There is, of course,
considerable latitude in the degree of uncertainty that surrounds this probability, based
primarily on the knowledge possessed by the individual concerning the underlying pro-
cess. The individual possesses a great deal of knowledge about the toss of a die and
can state that the probability that a one-spot will appear face up on the toss of a true die
is one-sixth. But we know very little concerning the acceptance in the marketplace of a
new and untested product. For example, even though a market research director tests
a newly developed product in 40 retail stores and states that there is a 70% chance that
the product will have sales of more than 1 million units, she has limited knowledge of
how consumers will react when it is marketed nationally. In both cases (the case of the
person rolling a die and the testing of a new product), the individual is assigning a prob-
ability value to an event of interest, and a difference exists only in the predictor’s confi-
dence in the precision of the estimate. However, regardless of the viewpoint, the same
laws of probability (presented in the following sections) will be applied.

CHART 5–1 Summary of Approaches to Probability

Approaches to Probability
  Objective
    Classical Probability: based on equally likely outcomes
    Empirical Probability: based on relative frequencies
  Subjective: based on available information

S E L F - R E V I E W 5–2

1. One card will be randomly selected from a standard 52-card deck. What is the probability
the card will be a queen? Which approach to probability did you use to answer this question?

2. The Center for Child Care reports on 539 children and the marital status of their parents.
There are 333 married, 182 divorced, and 24 widowed parents. What is the probability a
particular child chosen at random will have a parent who is divorced? Which approach did
you use?

3. What is the probability you will save one million dollars by the time you retire? Which
approach to probability did you use to answer this question?

E X E R C I S E S

1. Some people are in favor of reducing federal taxes to increase consumer spending
and others are against it. Two persons are selected and their opinions are recorded.
Assuming no one is undecided, list the possible outcomes.

2. A quality control inspector selects a part to be tested. The part is then declared
acceptable, repairable, or scrapped. Then another part is tested. List the possible
outcomes of this experiment regarding two parts.

3. A survey of 34 students at the Wall College of Business showed the following
majors:

Accounting    10
Finance        5
Economics      3
Management     6
Marketing     10

From the 34 students, suppose you randomly select a student.
a. What is the probability he or she is a management major?
b. Which concept of probability did you use to make this estimate?

4. A large company must hire a new president. The Board of Directors prepares a list
of five candidates, all of whom are equally qualified. Two of these candidates are
members of a minority group. To avoid bias in the selection of the candidate, the
company decides to select the president by lottery.

a. What is the probability one of the minority candidates is hired?
b. Which concept of probability did you use to make this estimate?

5. In each of the following cases, indicate whether classical, empirical, or subjective
probability is used.

a. A baseball player gets a hit in 30 out of 100 times at bat. The probability is .3
that he gets a hit in his next at bat.

b. A seven-member committee of students is formed to study environmental issues.
What is the likelihood that any one of the seven is randomly chosen as the
spokesperson?

c. You purchase a ticket for the Lotto Canada lottery. Over 5 million tickets were
sold. What is the likelihood you will win the $1 million jackpot?

d. The probability of an earthquake in northern California in the next 10 years
above 5.0 on the Richter Scale is .80.

6. A firm will promote two employees out of a group of six men and three women.
a. List all possible outcomes.
b. What probability concept would be used to assign probabilities to the outcomes?

7. A sample of 40 oil industry executives was selected to test a questionnaire. One
question about environmental issues required a yes or no answer.

a. What is the experiment?
b. List one possible event.
c. Ten of the 40 executives responded yes. Based on these sample responses,
what is the probability that an oil industry executive will respond yes?
d. What concept of probability does this illustrate?
e. Are each of the possible outcomes equally likely and mutually exclusive?

8. A sample of 2,000 licensed drivers revealed the following number of speeding
violations.

Number of Violations   Number of Drivers
0                      1,910
1                         46
2                         18
3                         12
4                          9
5 or more                  5
Total                  2,000

a. What is the experiment?
b. List one possible event.
c. What is the probability that a particular driver had exactly two speeding violations?
d. What concept of probability does this illustrate?

9. Bank of America customers select their own three-digit personal identification number
(PIN) for use at ATMs.

a. Think of this as an experiment and list four possible outcomes.
b. What is the probability that a customer will pick 259 as their PIN?
c. Which concept of probability did you use to answer (b)?

10. An investor buys 100 shares of AT&T stock and records its price change daily.
a. List several possible events for this experiment.
b. Which concept of probability did you use in (a)?

RULES OF ADDITION FOR COMPUTING PROBABILITIES
There are two rules of addition, the special rule of addition and the general rule of
addition. We begin with the special rule of addition.

LO5-3
Calculate probabilities
using the rules of
addition.

Special Rule of Addition
When we use the special rule of addition, the events must be mutually exclusive.
Recall that mutually exclusive means that when one event occurs, none of the other
events can occur at the same time. An illustration of mutually exclusive events in the
die-tossing experiment is the events “a number 4 or larger” and “a number 2 or
smaller.” If the outcome is in the first group {4, 5, and 6}, then it cannot also be in the
second group {1 and 2}. Another illustration is a product coming off the assembly line
cannot be defective and satisfactory at the same time.

If two events A and B are mutually exclusive, the special rule of addition states that
the probability of one or the other event’s occurring equals the sum of their
probabilities. This rule is expressed in the following formula:

SPECIAL RULE OF ADDITION P(A or B) = P(A) + P(B) [5–2]

For three mutually exclusive events designated A, B, and C, the rule is written:

P(A or B or C) = P(A) + P(B) + P(C)

An example will show the details.

English logician J. Venn (1834–1923) developed a diagram to portray graphically
the outcome of an experiment. The mutually exclusive concept and various other rules
for combining probabilities can be illustrated using this device. To construct a Venn dia-
gram, a space is first enclosed representing the total of all possible outcomes. This
space is usually in the form of a rectangle. An event is then represented by a circular
area that is drawn inside the rectangle proportional to the probability of the event. The
following Venn diagram represents the mutually exclusive concept. There is no overlap-
ping of events, meaning that the events are mutually exclusive. In the following Venn
diagram, assume the events A, B, and C are about equally likely.

E X A M P L E

A machine fills plastic bags with a mixture
of beans, broccoli, and other vegetables.
Most of the bags contain the correct
weight, but because of the variation in the
size of the beans and other vegetables, a
package might be underweight or over-
weight. A check of 4,000 packages filled
in the past month revealed:

Weight         Event   Number of Packages   Probability of Occurrence
Underweight    A              100            .025  (found by 100/4,000)
Satisfactory   B            3,600            .900
Overweight     C              300            .075
Total                       4,000           1.000

What is the probability that a particular package will be either underweight or
overweight?

S O L U T I O N

The outcome “underweight” is the event A. The outcome “overweight” is the event
C. Applying the special rule of addition:

P(A or C) = P(A) + P(C) = .025 + .075 = .10

Note that the events are mutually exclusive, meaning that a package of mixed veg-
etables cannot be underweight, satisfactory, and overweight at the same time. They
are also collectively exhaustive; that is, a selected package must be either under-
weight, satisfactory, or overweight.
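
The same calculation can be sketched in Python starting from the package counts. The dictionary keys and variable names are our own labels for the three weight events.

```python
counts = {"underweight": 100, "satisfactory": 3_600, "overweight": 300}
total = sum(counts.values())                  # 4,000 packages

# Empirical probability of each mutually exclusive event.
p = {event: n / total for event, n in counts.items()}

# Special rule of addition: P(A or C) = P(A) + P(C).
p_under_or_over = p["underweight"] + p["overweight"]

print(p)
print(round(p_under_or_over, 3))              # 0.1
print(round(sum(p.values()), 3))              # 1.0, the events are collectively exhaustive
```

Adding the probabilities is valid here only because a package cannot fall into two weight categories at once.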

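[Venn diagram of the mutually exclusive concept: three non-overlapping circles labeled
Event A, Event B, and Event C inside a rectangle]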

Complement Rule
The probability that a bag of mixed vegetables selected is underweight, P(A), plus the
probability that it is not an underweight bag, written P(∼A) and read “not A,” must logi-
cally equal 1. This is written:

P(A) + P(∼A) = 1

This can be revised to read:

COMPLEMENT RULE P(A) = 1 − P(∼A) [5–3]

This is the complement rule. It is used to determine the probability of an event occurring
by subtracting the probability of the event not occurring from 1. This rule is useful because
sometimes it is easier to calculate the probability of an event happening by determining
the probability of it not happening and subtracting the result from 1. Notice that the events
A and ∼A are mutually exclusive and collectively exhaustive. Therefore, the probabilities of
A and ∼A sum to 1. A Venn diagram illustrating the complement rule is shown as:

[Venn diagram: a circle labeled Event A inside a rectangle; the area outside the circle is
labeled ∼A]

E X A M P L E

Referring to the previous example/solution, the probability a bag of mixed vegeta-
bles is underweight is .025 and the probability of an overweight bag is .075. Use
the complement rule to show the probability of a satisfactory bag is .900. Show the
solution using a Venn diagram.

S O L U T I O N

The probability the bag is unsatisfactory equals the probability the bag is over-
weight plus the probability it is underweight. That is, P(A or C) = P(A) + P(C) = .025 +
.075 = .100. The bag is satisfactory if it is not underweight or overweight, so
P(B) = 1 − [P(A) + P(C)] = 1 − [.025 + .075] = 0.900. The Venn diagram portraying
this situation is:

[Venn diagram: regions A (.025) and C (.075) inside a rectangle; the remaining area is
not (A or C), with probability .90]
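
A short Python sketch of the complement rule for the mixed-vegetable bags follows. The function and variable names are our own; the probabilities are those given in the example.

```python
def complement(p_not_a):
    """Complement rule: P(A) = 1 - P(~A)."""
    return 1 - p_not_a

p_underweight = 0.025
p_overweight = 0.075

# Underweight and overweight are mutually exclusive, so their probabilities add.
p_unsatisfactory = p_underweight + p_overweight
p_satisfactory = complement(p_unsatisfactory)

print(round(p_satisfactory, 3))   # 0.9
```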

The General Rule of Addition
The outcomes of an experiment may not be mutually exclusive. For example, the Florida
Tourist Commission selected a sample of 200 tourists who visited the state during the
year. The survey revealed that 120 tourists went to Disney World and 100 went to Busch
Gardens near Tampa. What is the probability that a person selected visited either Disney
World or Busch Gardens? If the special rule of addition is used, the probability of selecting
a tourist who went to Disney World is .60, found by 120/200. Similarly, the probability of a
tourist going to Busch Gardens is .50. The sum of these probabilities is 1.10. We know,
however, that this probability cannot be greater than 1. The explanation is that many tour-
ists visited both attractions and are being counted twice! A check of the survey responses
revealed that 60 out of 200 sampled did, in fact, visit both attractions.

To answer our question, “What is the probability a selected person visited either
Disney World or Busch Gardens?” (1) add the probability that a tourist visited Disney
World and the probability he or she visited Busch Gardens, and (2) subtract the proba-
bility of visiting both. Thus:

P(Disney or Busch) = P(Disney) + P(Busch) − P(both Disney and Busch)
= .60 + .50 − .30 = .80

When two events both occur, the probability is called a joint probability. The prob-
ability (.30) that a tourist visits both attractions is an example of a joint probability.
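
The tourist calculation can be sketched in Python from the survey counts. The variable names are our own; the second calculation counts each tourist once and should agree with the formula.

```python
sample_size = 200
disney = 120      # tourists who went to Disney World
busch = 100       # tourists who went to Busch Gardens
both = 60         # tourists who went to both attractions

p_disney = disney / sample_size
p_busch = busch / sample_size
p_both = both / sample_size                   # joint probability

# P(Disney or Busch) = P(Disney) + P(Busch) - P(both Disney and Busch)
p_either = p_disney + p_busch - p_both
print(round(p_either, 2))                     # 0.8

# Same answer by counting each tourist once.
visited_at_least_one = disney + busch - both
print(visited_at_least_one / sample_size)     # 0.8
```

Without the subtraction, the 60 tourists who visited both attractions would be counted twice, which is how the impossible total of 1.10 arises.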

The following Venn diagram shows two events that are not mutually exclusive. The two
events overlap to illustrate the joint event that some people have visited both attractions.

S E L F - R E V I E W 5–3

A sample of employees of Worldwide Enterprises is to be surveyed about a new health
care plan. The employees are classified as follows:

Classification   Event   Number of Employees
Supervisors      A              120
Maintenance      B               50
Production       C            1,460
Management       D              302
Secretarial      E               68

(a) What is the probability that the first person selected is:
(i) either in maintenance or a secretary?
(ii) not in management?
(b) Draw a Venn diagram illustrating your answers to part (a).
(c) Are the events in part (a)(i) complementary or mutually exclusive or both?

STATISTICS IN ACTION

If you wish to get some
attention at the next gath-
ering you attend, announce
that you believe that at
least two people present
were born on the same
date—that is, the same
day of the year but not
necessarily the same year.
If there are 30 people in
the room, the probability of
a duplicate is .706. If there
are 60 people in the room,
the probability is .994 that
at least two people share the
same birthday. With as few
as 23 people the chances
are even, that is .50, that at
least two people share the
same birthday. Hint: To
compute this, find the
probability everyone was
born on a different day and
use the complement rule.
Try this in your class.
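
The hint above can be checked numerically. The following is a minimal sketch, assuming 365 equally likely birthdays and ignoring leap years (assumptions the text does not state); the function name prob_shared_birthday is ours, chosen for illustration.

```python
def prob_shared_birthday(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes `days` equally likely birthdays (an illustrative assumption).
    """
    p_all_different = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    # Complement rule: P(at least one match) = 1 - P(all different).
    return 1.0 - p_all_different


for n in (23, 30, 60):
    print(n, round(prob_shared_birthday(n), 3))
```

Under these assumptions the values for 30 and 60 people round to the .706 and .994 quoted above, and the value for 23 people is just over .50.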


[Venn diagram: two overlapping circles labeled P(Disney) = .60 and P(Busch) = .50; the overlap is labeled P(Disney and Busch) = .30.]

JOINT PROBABILITY A probability that measures the likelihood two or more
events will happen concurrently.

So the general rule of addition, which is used to compute the probability of two
events that are not mutually exclusive, is:

GENERAL RULE OF ADDITION P(A or B) = P(A) + P(B) − P(A and B) [5–4]

For the expression P(A or B), the word or suggests that A may occur or B may occur.
This also includes the possibility that both A and B occur. This use of or is sometimes
called an inclusive or. You could also write P(A or B or both) to emphasize that the union of
the events includes the intersection of A and B.

If we compare the general and special rules of addition, the important difference is
determining if the events are mutually exclusive. If the events are mutually exclusive, then
the joint probability P(A and B) is 0 and we could use the special rule of addition. Other-
wise, we must account for the joint probability and use the general rule of addition.
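
The comparison between the two addition rules can be sketched in a few lines of Python. This is an illustration only: the function name prob_a_or_b is ours, and the second call uses made-up probabilities to show the mutually exclusive case.

```python
def prob_a_or_b(p_a, p_b, p_a_and_b=0.0):
    """General rule of addition: P(A or B) = P(A) + P(B) - P(A and B).

    With p_a_and_b = 0 (mutually exclusive events) this reduces to the
    special rule of addition.
    """
    return p_a + p_b - p_a_and_b


# Tourist survey from the text: 60 of the 200 sampled visited both attractions.
p_either = prob_a_or_b(0.60, 0.50, 60 / 200)
print(round(p_either, 2))

# Hypothetical mutually exclusive events (values chosen for illustration).
p_exclusive = prob_a_or_b(0.10, 0.25)
print(round(p_exclusive, 2))
```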

E X A M P L E

What is the probability that a card chosen at random from a standard deck of cards
will be either a king or a heart?

S O L U T I O N

We may be inclined to add the probability of a king and the probability of a heart. But this
creates a problem. If we do that, the king of hearts is counted with the kings and also
with the hearts. So, if we simply add the probability of a king (there are 4 in a deck of 52
cards) to the probability of a heart (there are 13 in a deck of 52 cards) and report that 17
out of 52 cards meet the requirement, we have counted the king of hearts twice. We
need to subtract 1 card from the 17 so the king of hearts is counted only once. Thus,
there are 16 cards that are either hearts or kings. So the probability is 16/52 = .3077.

Card Probability Explanation

King P(A) = 4/52 4 kings in a deck of 52 cards
Heart P(B) = 13/52 13 hearts in a deck of 52 cards
King of Hearts P(A and B) = 1/52 1 king of hearts in a deck of 52 cards


From formula (5–4):

P(A or B) = P(A) + P(B) − P(A and B)
= 4/52 + 13/52 − 1/52
= 16/52, or .3077
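
As a check on the card example, the sketch below builds a 52-card deck and compares formula (5–4) with a direct count of the cards that are kings or hearts. The variable names and the way the deck is represented are our own choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

kings = {card for card in deck if card[0] == "K"}
hearts = {card for card in deck if card[1] == "hearts"}

p_king = Fraction(len(kings), len(deck))            # 4/52
p_heart = Fraction(len(hearts), len(deck))          # 13/52
p_both = Fraction(len(kings & hearts), len(deck))   # 1/52, the king of hearts

# General rule of addition versus counting the union directly.
p_by_rule = p_king + p_heart - p_both
p_by_count = Fraction(len(kings | hearts), len(deck))

print(p_by_rule, p_by_count, round(float(p_by_rule), 4))
```

Fraction reduces 16/52 to 4/13, which is the same value, .3077 to four decimal places.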

A Venn diagram portrays these outcomes, which are not mutually exclusive.

[Venn diagram: two overlapping circles, Kings (A) and Hearts (B); the overlap, A and B, is labeled "Both."]

Routine physical examinations are conducted annually as part of a health service program
for General Concrete Inc. employees. It was discovered that 8% of the employees need
corrective shoes, 15% need major dental work, and 3% need both corrective shoes and
major dental work.
(a) What is the probability that an employee selected at random will need either corrective
shoes or major dental work?
(b) Show this situation in the form of a Venn diagram.

S E L F - R E V I E W 5–4

11. The events A and B are mutually exclusive. Suppose P(A) = .30 and P(B) = .20.
What is the probability of either A or B occurring? What is the probability that neither
A nor B will happen?

12. The events X and Y are mutually exclusive. Suppose P(X) = .05 and P(Y) = .02.
What is the probability of either X or Y occurring? What is the probability that neither
X nor Y will happen?

13. A study of 200 advertising firms revealed their income after taxes:

Income after Taxes Number of Firms

Under $1 million 102
$1 million to $20 million 61
$20 million or more 37

a. What is the probability an advertising firm selected at random has under $1 million
in income after taxes?

b. What is the probability an advertising firm selected at random has either an in-
come between $1 million and $20 million, or an income of $20 million or more?
What rule of probability was applied?

14. The chair of the board of directors says, “There is a 50% chance this company will earn a
profit, a 30% chance it will break even, and a 20% chance it will lose money next quarter.”

a. Use an addition rule to find the probability the company will not lose money next
quarter.

b. Use the complement rule to find the probability it will not lose money next quarter.

15. Suppose the probability you will get an A in this class is .25 and the probability you
will get a B is .50. What is the probability your grade will be above a C?

E X E R C I S E S


16. Two coins are tossed. If A is the event “two heads” and B is the event “two tails,” are
A and B mutually exclusive? Are they complements?

17. The probabilities of the events A and B are .20 and .30, respectively. The probability
that both A and B occur is .15. What is the probability of either A or B occurring?

18. Let P(X) = .55 and P(Y) = .35. Assume the probability that they both occur is .20.
What is the probability of either X or Y occurring?

19. Suppose the two events A and B are mutually exclusive. What is the probability of
their joint occurrence?

20. A student is taking two courses, history and math. The probability the student will
pass the history course is .60, and the probability of passing the math course is .70.
The probability of passing both is .50. What is the probability of passing at least one?

21. The aquarium at Sea Critters Depot contains 140 fish. Eighty of these fish are green
swordtails (44 female and 36 male) and 60 are orange swordtails (36 female and
24 males). A fish is randomly captured from the aquarium:

a. What is the probability the selected fish is a green swordtail?
b. What is the probability the selected fish is male?
c. What is the probability the selected fish is a male green swordtail?
d. What is the probability the selected fish is either a male or a green swordtail?

22. A National Park Service survey of visitors to the Rocky Mountain region revealed
that 50% visit Yellowstone Park, 40% visit the Tetons, and 35% visit both.

a. What is the probability a vacationer will visit at least one of these attractions?
b. What is the probability .35 called?
c. Are the events mutually exclusive? Explain.

RULES OF MULTIPLICATION
TO CALCULATE PROBABILITY
In this section, we discuss the rules for computing the likelihood that two events both
happen, or their joint probability. For example, 16% of the 2016 tax returns were pre-
pared by H&R Block and 75% of those returns showed a refund. What is the likelihood
a person’s tax form was prepared by H&R Block and the person received a refund?
Venn diagrams illustrate this as the intersection of two events. To find the likelihood of
two events happening, we use the rules of multiplication. There are two rules of multipli-
cation: the special rule and the general rule.

Special Rule of Multiplication
The special rule of multiplication requires that two events A and B are independent.
Two events are independent if the occurrence of one event does not alter the probabil-
ity of the occurrence of the other event.

LO5-4
Calculate probabilities
using the rules of
multiplication.

INDEPENDENCE The occurrence of one event has no effect on the probability of
the occurrence of another event.

One way to think about independence is to assume that events A and B occur at differ-
ent times. For example, when event B occurs after event A occurs, does A have any effect
on the likelihood that event B occurs? If the answer is no, then A and B are independent
events. To illustrate independence, suppose two coins are tossed. The outcome of a coin
toss (head or tail) is unaffected by the outcome of any other prior coin toss (head or tail).

For two independent events A and B, the probability that A and B will both occur is
found by multiplying the two probabilities. This is the special rule of multiplication and
is written symbolically as:

SPECIAL RULE OF MULTIPLICATION P(A and B) = P(A)P(B) [5–5]


For three independent events, A, B, and C, the special rule of multiplication used to
determine the probability that all three events will occur is:

P(A and B and C) = P(A)P(B)P(C)

STATISTICS IN ACTION

In 2000 George W. Bush
won the U.S. presidency by
the slimmest of margins.
Many election stories
resulted, some involving
voting irregularities, others
raising interesting election
questions. In a local Michigan
election, there was a tie
between two candidates
for an elected position. To
break the tie, the candi-
dates drew a slip of paper
from a box that contained
two slips of paper, one
marked “Winner” and the
other unmarked. To deter-
mine which candidate
drew first, election officials
flipped a coin. The winner
of the coin flip also drew the
winning slip of paper. But
was the coin flip really
necessary? No, because
the two events are indepen-
dent. Winning the coin flip
did not alter the probability
of either candidate drawing
the winning slip of paper.

From experience, Teton Tire knows the probability is .95 that a particular XB-70 tire will last
60,000 miles before it becomes bald or fails. An adjustment is made on any tire that does
not last 60,000 miles. You purchase four XB-70s. What is the probability all four tires will
last at least 60,000 miles?

S E L F - R E V I E W 5–5

General Rule of Multiplication
If two events are not independent, they are referred to as dependent. To illustrate depen-
dency, suppose there are 10 cans of soda in a cooler; 7 are regular and 3 are diet. A can
is selected from the cooler. The probability of selecting a can of diet soda is 3/10, and the
probability of selecting a can of regular soda is 7/10. Then a second can is selected from
the cooler, without returning the first. The probability the second is diet depends on
whether the first one selected was diet or not. The probability that the second is diet is:

2/9, if the first can is diet. (Only two cans of diet soda remain in the cooler.)
3/9, if the first can selected is regular. (All three diet sodas are still in the cooler.)

E X A M P L E

A survey by the American Automobile Association (AAA) revealed 60% of its mem-
bers made airline reservations last year. Two members are selected at random.
What is the probability both made airline reservations last year?

S O L U T I O N

The probability the first member made an airline reservation last year is .60, written
P(R1) = .60, where R1 refers to the fact that the first member made a reservation.
The probability that the second member selected made a reservation is also .60, so
P(R2) = .60. Because the number of AAA members is very large, you may assume
that R1 and R2 are independent. Consequently, using formula (5–5), the probability
they both make a reservation is .36, found by:

P(R1 and R2) = P(R1)P(R2) = (.60)(.60) = .36

All possible outcomes can be shown as follows. R means a reservation is made,
and ∼R means no reservation is made.

With the probabilities and the complement rule, we can compute the joint probability
of each outcome. For example, the probability that neither member makes a
reservation is .16. Further, the probability that exactly one of the two members
makes a reservation (special rule of addition) is .48 (.24 + .24). You can also observe that the
outcomes are mutually exclusive and collectively exhaustive. Therefore, the probabilities
sum to 1.00.

Outcomes Joint Probability

R1 R2 (.60)(.60) = .36
R1 ∼R2 (.60)(.40) = .24
∼R1 R2 (.40)(.60) = .24
∼R1 ∼R2 (.40)(.40) = .16
Total 1.00
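
The table of outcomes can be reproduced with a short sketch that applies the special rule of multiplication to every pair of outcomes. It assumes, as the solution does, that the two members are independent; the labels "R" and "~R" and the variable names are ours.

```python
from itertools import product

# Probability that one member did (R) or did not (~R) make a reservation.
p_single = {"R": 0.60, "~R": 0.40}

# Special rule of multiplication for independent events: P(A and B) = P(A)P(B).
joint = {
    (first, second): p_single[first] * p_single[second]
    for first, second in product(p_single, repeat=2)
}

for outcome, p in joint.items():
    print(outcome, round(p, 2))

total = sum(joint.values())                               # should be 1.00
p_exactly_one = joint[("R", "~R")] + joint[("~R", "R")]   # .24 + .24
print(round(total, 2), round(p_exactly_one, 2))
```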


The fraction 2/9 (or 3/9) is called a conditional probability because its value is conditional
on (dependent on) whether a diet or regular soda was the first selection from the cooler.

CONDITIONAL PROBABILITY The probability of a particular event occurring,
given that another event has occurred.

In the general rule of multiplication, the conditional probability is required to compute
the joint probability of two events that are not independent. For two events, A and B,
that are not independent, the conditional probability is represented as P(B | A), and ex-
pressed as the probability of B given A. Or the probability of B is conditional on the oc-
currence and effect of event A. Symbolically, the general rule of multiplication for two
events that are not independent is:

GENERAL RULE OF MULTIPLICATION P(A and B) = P(A)P(B | A) [5–6]

We can extend the general rule of multiplication to more than two events. For three
events A, B, and C, the formula is:

P(A and B and C) = P(A)P(B | A)P(C | A and B)

In the case of the golf shirt example, the probability of selecting three white shirts with-
out replacement is:

P(W1 and W2 and W3) = P(W1)P(W2 | W1)P(W3 | W1 and W2) = (9/12)(8/11)(7/10) = .38

So the likelihood of selecting three shirts without replacement and all being white is .38.

E X A M P L E

A golfer has 12 golf shirts in his closet. Suppose 9 of these shirts are white and the
others blue. He gets dressed in the dark, so he just grabs a shirt and puts it on. He
plays golf two days in a row and does not launder and return the used shirts to the
closet. What is the likelihood both shirts selected are white?

S O L U T I O N

The event that the first shirt selected is white is W1. The probability is P(W1) = 9/12
because 9 of the 12 shirts are white. The event that the second shirt selected is
also white is identified as W2. The conditional probability that the second shirt
selected is white, given that the first shirt selected is also white, is P(W2 | W1) = 8/11.
Why is this so? Because after the first shirt is selected, there are only 11 shirts
remaining in the closet and 8 of these are white. To determine the probability of
2 white shirts being selected, we use formula (5–6).

P(W1 and W2) = P(W1)P(W2 | W1) = (9/12)(8/11) = .55

So the likelihood of selecting two shirts and finding them both to be white is .55.
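
The golf shirt calculation generalizes to any number of draws without replacement. The sketch below applies the general rule of multiplication one draw at a time, using exact fractions; the function name prob_all_of_one_kind is ours, introduced only for illustration.

```python
from fractions import Fraction


def prob_all_of_one_kind(favorable, total, draws):
    """P(every one of `draws` selections is favorable), without replacement.

    Each factor is a conditional probability: after k favorable items are
    removed, favorable - k of the remaining total - k items are favorable.
    """
    p = Fraction(1)
    for k in range(draws):
        p *= Fraction(favorable - k, total - k)
    return p


two_white = prob_all_of_one_kind(9, 12, 2)     # (9/12)(8/11)
three_white = prob_all_of_one_kind(9, 12, 3)   # (9/12)(8/11)(7/10)
print(two_white, round(float(two_white), 2))
print(three_white, round(float(three_white), 2))
```

The two results round to the .55 and .38 reported in the text.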


The board of directors of Tarbell Industries consists of eight men and four women. A
four-member search committee is to be chosen at random to conduct a nationwide search
for a new company president.
(a) What is the probability all four members of the search committee will be women?
(b) What is the probability all four members will be men?
(c) Does the sum of the probabilities for the events described in parts (a) and (b) equal 1?
Explain.

S E L F - R E V I E W 5–6

CONTINGENCY TABLES
Often we tally the results of a survey in a two-way table and use the results of this tally
to determine various probabilities. We described this idea on page 116 in Chapter 4. To
review, we refer to a two-way table as a contingency table.

LO5-5
Compute probabilities
using a contingency
table.

CONTINGENCY TABLE A table used to classify sample observations according to
two or more identifiable categories or classes.

A contingency table is a cross-tabulation that simultaneously summarizes two
variables of interest and their relationship. The level of measurement can be nominal.
Below are several examples.

• One hundred fifty adults were asked their gender and the number of Facebook
accounts they used. The following table summarizes the results.

Gender

Facebook Accounts Men Women Total

0 20 40 60
1 40 30 70
2 or more 10 10 20

Total 70 80 150

• The American Coffee Producers Association reports the following information on
age and the amount of coffee consumed in a month.

Coffee Consumption

Age (Years) Low Moderate High Total

Under 30 36 32 24 92
30 up to 40 18 30 27 75
40 up to 50 10 24 20 54
50 and over 26 24 29 79

Total 90 110 100 300

According to this table, each of the 300 respondents is classified according to two
criteria: (1) age and (2) the amount of coffee consumed.

The following example shows how the rules of addition and multiplication are used
when we employ contingency tables.


E X A M P L E

Last month, the National Association of Theater Managers conducted a survey of 500
randomly selected adults. The survey asked respondents their age and the number
of times they saw a movie in a theater. The results are summarized in Table 5–1.

TABLE 5–1 Number of Movies Attended per Month by Age

                                       Age
                       Less than 30   30 up to 60   60 or Older
Movies per Month            B1            B2            B3         Total

0             A1            15            50            10           75
1 or 2        A2            25           100            75          200
3, 4, or 5    A3            55            60            60          175
6 or more     A4             5            15            30           50

Total                      100           225           175          500

The association is interested in understanding the probabilities that an adult will see
a movie in a theater, especially for adults 60 and older. This information is useful for
making decisions regarding discounts on tickets and concessions for seniors.
Determine the probability of:

1. Selecting an adult who attended 6 or more movies per month.
2. Selecting an adult who attended 2 or fewer movies per month.
3. Selecting an adult who attended 6 or more movies per month or is 60 years of
age or older.
4. Selecting an adult who attended 6 or more movies per month given the person
is 60 years of age or older.
5. Selecting an adult who attended 6 or more movies per month and is 60 years
of age or older.

Determine the independence of:

6. Number of movies per month attended and the age of the adult.

S O L U T I O N

Table 5–1 is called a contingency table. In a contingency table, an individual or an
object is classified according to two criteria. In this example, a sampled adult is classi-
fied by age and by the number of movies attended per month. The rules of addition
[formulas (5–2) and (5–4)] and the rules of multiplication [formulas (5–5) and (5–6)]
allow us to answer the various probability questions based on the contingency table.

1. To find the probability that a randomly selected adult attended 6 or more mov-
ies per month, focus on the row labeled “6 or more” (also labeled A4) in Table
5–1. The table shows that 50 of the total of 500 adults are in this class. Using
the empirical approach, the probability is computed:

P(6 or more) = P(A4) = 50/500 = .10

This probability indicates 10% of the 500 adults attend 6 or more movies per
month.

2. To determine the probability of randomly selecting an adult who went to 2 or
fewer movies per month, two outcomes must be combined: attending 0 movies
per month and attending 1 or 2 movies per month. These two outcomes
are mutually exclusive. That is, a person can only be classified as attending 0
movies per month, or 1 or 2 movies per month, not both. Because the two
outcomes are mutually exclusive, we use the special rule of addition [formula
(5–2)] by adding the probabilities of attending no movies and attending 1 or
2 movies:

P[(attending 0) or (attending 1 or 2)] = P(A1) + P(A2) = 75/500 + 200/500 = .55

So 55% of the adults in the sample attended 2 or fewer movies a month.

3. To determine the probability of randomly selecting an adult who went to “6 or
more” movies per month or whose age is “60 or older,” we again use the
rules of addition. However, in this case the outcomes are not mutually exclusive.
Why is this? Because a person can attend 6 or more movies per
month, be 60 or older, or be both. So the two groups are not mutually exclusive
because it is possible that a person would be counted in both groups. To
determine this probability, the general rule of addition [formula (5–4)] is used.

P[(6 or more) or (60 or older)] = P(A4) + P(B3) − P(A4 and B3)
= 50/500 + 175/500 − 30/500 = .39

So 39% of the adults are either 60 or older, attend 6 or more movies per month,
or both.

4. To determine the probability of selecting a person who attends 6 or more movies
per month given that the person is 60 or older, focus only on the column
labeled B3 in Table 5–1. That is, we are only interested in the 175 adults who
are 60 or older. Of these 175 adults, 30 attended 6 or more movies. Rearranging the
general rule of multiplication [formula (5–6)] to solve for the conditional probability:

P[(6 or more) given (60 or older)] = P(A4 | B3) = P(A4 and B3)/P(B3) = (30/500)/(175/500) = 30/175 = .17

So 17% of the adults who are 60 or older attend 6 or more movies
per month. This is called a conditional probability because the probability is
based on the “condition” of being the age of 60 or older. Recall that in part (1),
10% of all adults attend 6 or more movies per month; here we see that 17% of
adults who are 60 or older attend 6 or more movies per month. This is valuable
information for theater managers regarding the characteristics of their customers.

5. The probability a person attended 6 or more movies and is 60 or older is based
on two conditions and they must both happen. That is, the two outcomes “6 or
more movies” (A4) and “60 or older” (B3) must occur jointly. To find this joint
probability we use the general rule of multiplication [formula (5–6)].

P[(6 or more) and (60 or older)] = P(A4 and B3) = P(A4)P(B3 | A4)

To compute the joint probability, first compute the simple probability of the first
outcome, A4, randomly selecting a person who attends 6 or more movies. To
find the probability, refer to row A4 in Table 5–1. There are 50 of 500 adults
that attended 6 or more movies. So P(A4) = 50/500.

Next, compute the conditional probability P(B3 | A4). This is the probability of
selecting an adult who is 60 or older given that the person attended 6 or more
movies. The conditional probability is:

P[(60 or older) given (6 or more)] = P(B3 | A4) = 30/50

Using these two probabilities, the joint probability that an adult attends 6 or
more movies and is 60 or older is:

P[(6 or more) and (60 or older)] = P(A4 and B3) = P(A4)P(B3| A4)
= (50/500)(30/50) = .06


Based on the sample information from Table 5–1, the probability that an adult
is both 60 or older and attended 6 or more movies is .06, or 6%. It is important to know
that the 6% is relative to all 500 adults.

Is there another way to determine this joint probability without using the
general rule of multiplication formula? Yes. Look directly at the cell where row
A4, attends 6 or more movies, and column B3, 60 or older, intersect. There are
30 adults in this cell that meet both criteria, so P(A4 and B3) = 30/500 = .06.
This is the same as computed with the formula.

6. Are the events independent? We can answer this question with the help of the
results in part 4. In part 4 we found the probability of selecting an adult who
attended 6 or more movies given that the adult was 60 or older was .17. If age
is not a factor in movie attendance then we would expect the probability of a
person who is less than 30 that attended 6 or more movies to also be 17%. That
is, the two conditional probabilities would be the same. The probability that an
adult attends 6 or more movies per month given the adult is less than 30 years
old is:

P[(6 or more) given (less than 30)] = P(A4 | B1) = 5/100 = .05

Because these two probabilities are not the same, the number of movies at-
tended and age are not independent. To put it another way, for the 500 adults,
age is related to the number of movies attended. In Chapter 15, we investigate
this concept of independence in greater detail.
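
The six parts of this example can be recomputed directly from the counts in Table 5–1. The sketch below is one way to organize the calculation; the dictionary layout and helper names (p_row, p_col, p_joint, p_row_given_col) are our own, and the independence check compares each joint probability with the product of its marginals, which is equivalent to comparing conditional probabilities as the text does.

```python
from fractions import Fraction

# Counts from Table 5-1: rows A1..A4 are movies per month, columns B1..B3 are age.
counts = {
    "A1": {"B1": 15, "B2": 50, "B3": 10},
    "A2": {"B1": 25, "B2": 100, "B3": 75},
    "A3": {"B1": 55, "B2": 60, "B3": 60},
    "A4": {"B1": 5, "B2": 15, "B3": 30},
}
ages = ("B1", "B2", "B3")

n = sum(sum(row.values()) for row in counts.values())              # 500 adults
row_total = {a: sum(row.values()) for a, row in counts.items()}
col_total = {b: sum(counts[a][b] for a in counts) for b in ages}


def p_row(a):
    return Fraction(row_total[a], n)


def p_col(b):
    return Fraction(col_total[b], n)


def p_joint(a, b):
    return Fraction(counts[a][b], n)


def p_row_given_col(a, b):
    # Conditional probability: restrict attention to column b only.
    return Fraction(counts[a][b], col_total[b])


part1 = p_row("A4")                                         # .10
part2 = p_row("A1") + p_row("A2")                           # special rule of addition
part3 = p_row("A4") + p_col("B3") - p_joint("A4", "B3")     # general rule of addition
part4 = p_row_given_col("A4", "B3")                         # 30/175
part5 = p_joint("A4", "B3")                                 # 30/500
# Part 6: independence would require P(A and B) = P(A)P(B) in every cell.
independent = all(p_joint(a, b) == p_row(a) * p_col(b) for a in counts for b in ages)

for name, value in [("1", part1), ("2", part2), ("3", part3), ("4", part4), ("5", part5)]:
    print(name, value, round(float(value), 2))
print("independent:", independent)
```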

Refer to Table 5–1 on page 151 to find the following probabilities.
(a) What is the probability of selecting an adult that is 30 up to 60 years old?
(b) What is the probability of selecting an adult who is under 60 years of age?
(c) What is the probability of selecting an adult who is less than 30 years old or attended
no movies?
(d) What is the probability of selecting an adult who is less than 30 years old and went to
no movies?

S E L F - R E V I E W 5–7

Tree Diagrams
A tree diagram is a visual that is helpful in organizing and calculating probabilities for
problems similar to the previous example/solution. This type of problem involves sev-
eral stages and each stage is illustrated with a branch of the tree. The branches of a
tree diagram are labeled with probabilities. We will use the information in Table 5–1 to
show the construction of a tree diagram.

1. We begin the construction by drawing a box with the variable, age, on the left to
represent the root of the tree (see Chart 5–2).

2. There are three main branches going out from the root. The upper branch rep-
resents the outcome that an adult is less than 30 years old. The branch is labeled
with the probability, P(B1) = 100/500. The next branch represents the outcome that
adults are 30 up to 60 years old. This branch is labeled with the probability P(B2) =
225/500. The remaining branch is labeled P(B3) = 175/500.

3. Four branches “grow” out of each of the three main branches. These branches represent
the four categories of movies attended per month: 0; 1 or 2; 3, 4, or 5; and
6 or more. The upper branches of the tree represent the conditional probabilities
of each movie category given that an adult is less than 30 years old. These
are written P(A1 | B1), P(A2 | B1), P(A3 | B1), and P(A4 | B1) where A1 refers to attending
no movies; A2 attending 1 or 2 movies per month; A3 attending 3, 4, or 5 movies


Age (root)                    Movies per month     Conditional probability   Joint probability

30 years old or younger       0 movies             15/100 = .15              (100/500)(15/100) = .03
100/500 = .20                 1 or 2 movies        25/100 = .25              (100/500)(25/100) = .05
                              3, 4, or 5 movies    55/100 = .55              (100/500)(55/100) = .11
                              6 or more movies     5/100 = .05               (100/500)(5/100) = .01

30 up to 60 years old         0 movies             50/225 = .22              (225/500)(50/225) = .10
225/500 = .45                 1 or 2 movies        100/225 = .44             (225/500)(100/225) = .20
                              3, 4, or 5 movies    60/225 = .27              (225/500)(60/225) = .12
                              6 or more movies     15/225 = .07              (225/500)(15/225) = .03

60 years old or older         0 movies             10/175 = .06              (175/500)(10/175) = .02
175/500 = .35                 1 or 2 movies        75/175 = .43              (175/500)(75/175) = .15
                              3, 4, or 5 movies    60/175 = .34              (175/500)(60/175) = .12
                              6 or more movies     30/175 = .17              (175/500)(30/175) = .06

CHART 5–2 Tree Diagram Showing Age and Number of Movies Attended

per month; and A4 attending 6 or more movies per month. For the upper branch of
the tree, these probabilities are 15/100, 25/100, 55/100, and 5/100. We write the
conditional probabilities in a similar fashion on the other branches.

4. Finally we determine the various joint probabilities. For the top branches, the events
are an adult attends no movies per month and is 30 years old or younger; an adult
attends 1 or 2 movies and is 30 years old or younger; an adult attends 3, 4, or 5
movies per month and is 30 years old or younger; and an adult attends 6 or more
movies per month and is 30 years old or younger. These joint probabilities are
shown on the right side of Chart 5–2. To explain, the joint probability that a ran-
domly selected adult is less than 30 years old and attends 0 movies per month is:

P(B1 and A1) = P(B1)P(A1| B1) = (
100
500)(

15
100) = .03

The tree diagram summarizes all the probabilities based on the contingency table in
Table 5–1. For example, the conditional probabilities show that the 60-and-older
group has the highest percentage, 17%, attending 6 or more movies per month. The
30-to-60-year-old group has the highest percentage, 22%, of seeing no movies per
month. Based on the joint probabilities, 20% of the adults sampled attend 1 or
2 movies per month and are 30 up to 60 years of age. As you can see, there are
many observations that we can make based on the information presented in the
tree diagram.
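
The four construction steps can also be followed in code. The sketch below walks each path of the tree, multiplying the probability on the main branch by the conditional probability on the secondary branch, as in formula (5–6). The data layout and names are our own; the counts come from Table 5–1.

```python
from fractions import Fraction

movie_categories = ["0", "1 or 2", "3, 4, or 5", "6 or more"]
# Each age group (a main branch) maps to its column of counts in Table 5-1.
counts_by_age = {
    "less than 30": [15, 25, 55, 5],
    "30 up to 60": [50, 100, 60, 15],
    "60 or older": [10, 75, 60, 30],
}
n = sum(sum(column) for column in counts_by_age.values())  # 500

joint = {}
for age, column in counts_by_age.items():
    age_total = sum(column)
    p_age = Fraction(age_total, n)                  # main branch, e.g. 100/500
    for movies, count in zip(movie_categories, column):
        p_movies_given_age = Fraction(count, age_total)   # secondary branch
        joint[(age, movies)] = p_age * p_movies_given_age

for (age, movies), p in joint.items():
    print(f"{age:>13} | {movies:>10} | {float(p):.2f}")

# The twelve paths are mutually exclusive and collectively exhaustive.
print(sum(joint.values()))
```

Each joint probability reduces to the cell count divided by 500, which is why the tree and the contingency table agree.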

Consumers were surveyed on the relative number of visits to a Sears store (often, occa-
sional, and never) and if the store was located in an enclosed mall (yes and no). When
variables are measured nominally, such as these data, the results are usually summarized
in a contingency table.

Enclosed Mall

Visits Yes No Total

Often 60 20 80
Occasional 25 35 60
Never 5 50 55

90 105 195

What is the probability of selecting a shopper who:

(a) Visited a Sears store often?
(b) Visited a Sears store in an enclosed mall?
(c) Visited a Sears store in an enclosed mall or visited a Sears store often?
(d) Visited a Sears store often, given that the shopper went to a Sears store in an
enclosed mall?

In addition:

(e) Are the number of visits and the enclosed mall variables independent?
(f) What is the probability of selecting a shopper who visited a Sears store often and it
was in an enclosed mall?
(g) Draw a tree diagram and determine the various joint probabilities.

S E L F - R E V I E W 5–8

23. Suppose P(A) = .40 and P(B | A) = .30. What is the joint probability of A and B?
24. Suppose P(X1) = .75 and P(Y2 | X1) = .40. What is the joint probability of X1 and Y2?
25. A local bank reports that 80% of its customers maintain a checking account, 60%
have a savings account, and 50% have both. If a customer is chosen at random, what
is the probability the customer has either a checking or a savings account? What is
the probability the customer does not have either a checking or a savings account?

26. All Seasons Plumbing has two service trucks that frequently need repair. If the prob-
ability the first truck is available is .75, the probability the second truck is available
is .50, and the probability that both trucks are available is .30, what is the probabil-
ity neither truck is available?

27. Refer to the following table.

First Event

Second Event A1 A2 A3 Total

B1 2 1 3 6
B2 1 2 1 4

Total 3 3 4 10

E X E R C I S E S


a. Determine P(A1).
b. Determine P(B1 | A2).
c. Determine P(B2 and A3).

28. Three defective electric toothbrushes were accidentally shipped to a drugstore by
Cleanbrush Products along with 17 nondefective ones.

a. What is the probability the first two electric toothbrushes sold will be returned to
the drugstore because they are defective?

b. What is the probability the first two electric toothbrushes sold will not be
defective?

29. Each salesperson at Puchett, Sheets, and Hogan Insurance Agency is rated
either below average, average, or above average with respect to sales ability. Each
salesperson also is rated with respect to his or her potential for advancement—
either fair, good, or excellent. These traits for the 500 salespeople were cross-
classified into the following table.

Potential for Advancement

Sales Ability Fair Good Excellent

Below average 16 12 22
Average 45 60 45
Above average 93 72 135

a. What is this table called?
b. What is the probability a salesperson selected at random will have above average
sales ability and excellent potential for advancement?
c. Construct a tree diagram showing all the probabilities, conditional probabilities,
and joint probabilities.

30. An investor owns three common stocks. Each stock, independent of the others, has
equally likely chances of (1) increasing in value, (2) decreasing in value, or (3) remaining
the same value. List the possible outcomes of this experiment. Estimate the
probability at least two of the stocks increase in value.

31. A survey of 545 college students asked: What is your favorite winter sport?
And, what type of college do you attend? The results are summarized below:

Favorite Winter Sport

College Type Snowboarding Skiing Ice Skating Total

Junior College 68 41 46 155
Four-Year College 84 56 70 210
Graduate School 59 74 47 180
Total 211 171 163 545

Using these 545 students as the sample, a student from this study is randomly
selected.

a. What is the probability of selecting a student whose favorite sport is skiing?
b. What is the probability of selecting a junior-college student?
c. If the student selected is a four-year-college student, what is the probability that
the student prefers ice skating?
d. If the student selected prefers snowboarding, what is the probability that the
student is in junior college?
e. If a graduate student is selected, what is the probability that the student prefers
skiing or ice skating?

32. If you ask three strangers about their birthdays, what is the probability (a) All were
born on Wednesday? (b) All were born on different days of the week? (c) None was
born on Saturday?


BAYES’ THEOREM
In the 18th century, Reverend Thomas Bayes, an English Presbyterian minister, pon-
dered this question: Does God really exist? Being interested in mathematics, he at-
tempted to develop a formula to arrive at the probability God does exist based on
evidence available to him on earth. Later Pierre-Simon Laplace refined Bayes’ work and
gave it the name “Bayes’ theorem.” The formula for Bayes’ theorem is:

LO5-6
Calculate probabilities
using Bayes’ theorem.

BAYES’ THEOREM  $P(A_i \mid B) = \frac{P(A_i)P(B \mid A_i)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2)}$  [5–7]

Assume in formula (5–7) that the events A1 and A2 are mutually exclusive and collectively
exhaustive, and Ai refers to either event A1 or A2. Hence A1 and A2 are in this case com-
plements. The meaning of the symbols used is illustrated by the following example.

Suppose 5% of the population of Umen, a fictional Third World country, have a dis-
ease that is peculiar to that country. We will let A1 refer to the event “has the disease”
and A2 refer to the event “does not have the disease.” Thus, we know that if we select a
person from Umen at random, the probability the individual chosen has the disease is
.05, or P(A1) = .05. This probability, P(A1) = P(has the disease) = .05, is called the prior
probability. It is given this name because the probability is assigned before any empiri-
cal data are obtained.

STATISTICS IN ACTION

A recent study by the National Collegiate Athletic Association (NCAA) reported
that of 150,000 senior boys playing on their high school basketball team, 64 would
make a professional team. To put it another way, the odds of a high school senior
basketball player making a professional team are 1 in 2,344. From the same study:

1. The odds of a high school senior basketball player playing some college
basketball are about 1 in 40.
2. The odds of a high school senior playing college basketball as a senior in
college are about 1 in 60.
3. If you play basketball as a senior in college, the odds of making a professional
team are about 1 in 37.5.

PRIOR PROBABILITY The initial probability based on the present level of
information.

POSTERIOR PROBABILITY A revised probability based on additional information.

The prior probability a person is not afflicted with the disease is therefore .95, or
P(A2) = .95, found by 1 − .05.

There is a diagnostic technique to detect the disease, but it is not very accurate. Let
B denote the event “test shows the disease is present.” Assume that historical evidence
shows that if a person actually has the disease, the probability that the test will indicate
the presence of the disease is .90. Using the conditional probability definitions devel-
oped earlier in this chapter, this statement is written as:

P(B | A1) = .90

Assume the probability is .15 that for a person who actually does not have the disease
the test will indicate the disease is present.

P(B | A2) = .15

Let’s randomly select a person from Umen and perform the test. The test results in-
dicate the disease is present. What is the probability the person actually has the disease?
In symbolic form, we want to know P(A1 | B), which is interpreted as: P(has the disease |
the test results are positive). The probability P(A1 | B) is called a posterior probability.

With the help of Bayes’ theorem, formula (5–7), we can determine the posterior
probability.

$P(A_1 \mid B) = \frac{P(A_1)P(B \mid A_1)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2)} = \frac{(.05)(.90)}{(.05)(.90) + (.95)(.15)} = \frac{.0450}{.1875} = .24$

158 CHAPTER 5

So the probability that a person has the disease, given that he or she tested posi-
tive, is .24. How is the result interpreted? If a person is selected at random from the
population, the probability that he or she has the disease is .05. If the person is tested
and the test result is positive, the probability that the person actually has the disease is
increased about fivefold, from .05 to .24.
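
The Umen calculation can be reproduced in a few lines of Python. This is a minimal sketch of formula (5–7) for two complementary events; the function name `posterior_two_events` and its parameter names are illustrative choices, not part of the text.

```python
def posterior_two_events(prior_a1, p_b_given_a1, p_b_given_a2):
    """Return P(A1 | B) from formula (5-7), assuming A2 is the complement of A1."""
    prior_a2 = 1 - prior_a1                  # P(A2) = 1 - P(A1)
    joint_a1 = prior_a1 * p_b_given_a1       # P(A1 and B)
    joint_a2 = prior_a2 * p_b_given_a2       # P(A2 and B)
    return joint_a1 / (joint_a1 + joint_a2)  # divide by P(B)


# Umen example: 5% have the disease, the test detects it 90% of the time,
# and it gives a false positive 15% of the time.
umen = posterior_two_events(prior_a1=0.05, p_b_given_a1=0.90, p_b_given_a2=0.15)
print(round(umen, 2))  # 0.24
```

Raising the false-positive rate `p_b_given_a2` lowers the posterior, which is why an inaccurate test leaves so much doubt even after a positive result.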

In the preceding problem, we had only two mutually exclusive and collectively ex-
haustive events, A1 and A2. If there are n such events, A1, A2, . . ., An, Bayes’ theorem,
formula (5–7), becomes

$P(A_i \mid B) = \frac{P(A_i)P(B \mid A_i)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2) + \cdots + P(A_n)P(B \mid A_n)}$

With the preceding notation, the calculations for the Umen problem are summarized in
the following table.

                  Prior          Conditional     Joint           Posterior
Event,            Probability,   Probability,    Probability,    Probability,
Ai                P(Ai)          P(B ∣ Ai)       P(Ai and B)     P(Ai ∣ B)
Disease, A1       .05            .90             .0450           .0450/.1875 = .24
No disease, A2    .95            .15             .1425           .1425/.1875 = .76
                                                 P(B) = .1875                 1.00

Another illustration of Bayes’ theorem follows.

E X A M P L E

A manufacturer of cell phones purchases a microchip, called
the LS-24, from three suppliers: Hall Electronics, Schuller
Sales, and Crawford Components. Forty-five percent of the
LS-24 chips are purchased from Hall Electronics, 30% from
Schuller Sales, and the remaining 25% from Crawford Compo-
nents. The manufacturer has extensive histories on the three
suppliers and knows that 3% of the LS-24 chips from Hall Elec-
tronics are defective, 6% of chips from Schuller Sales are de-
fective, and 4% of the chips purchased from Crawford
Components are defective.

When the LS-24 chips arrive from the three suppliers, they
are placed directly in a bin and not inspected or otherwise
identified by supplier. A worker selects a chip for installation
and finds it defective. What is the probability that it was manu-
factured by Schuller Sales?

S O L U T I O N

As a first step, let’s summarize some of the information given in the problem
statement.

• There are three mutually exclusive and collectively exhaustive events, that is,
three suppliers.

A1 The LS-24 was purchased from Hall Electronics.
A2 The LS-24 was purchased from Schuller Sales.
A3 The LS-24 was purchased from Crawford Components.

• The prior probabilities are:

P(A1) = .45 The probability the LS-24 was manufactured by Hall
Electronics.

P(A2) = .30 The probability the LS-24 was manufactured by Schuller
Sales.

P(A3) = .25 The probability the LS-24 was manufactured by Crawford
Components.

• The additional information can be either:

B1 The LS-24 is defective, or
B2 The LS-24 is not defective.

• The following conditional probabilities are given.

P(B1 | A1) = .03 The probability that an LS-24 chip produced by Hall Elec-
tronics is defective.

P(B1 | A2) = .06 The probability that an LS-24 chip produced by Schuller
Sales is defective.

P(B1 | A3) = .04 The probability that an LS-24 chip produced by Crawford
Components is defective.

• A chip is selected from the bin. Because the chips are not identified by supplier,
we are not certain which supplier manufactured the chip. We want to deter-
mine the probability that the defective chip was purchased from Schuller Sales.
The probability is written P(A2 | B1).

Look at Schuller’s quality record. It is the worst of the three suppliers. Schuller supplies
30% of the chips, but 6% of its chips are defective. Now that we have found a defective
LS-24 chip, we suspect that P(A2 | B1) is greater than P(A2) = .30. That is,
we expect the revised probability to be greater than .30. But how much greater?
Bayes’ theorem can give us the answer. As a first step, consider the tree diagram in
Chart 5–3.

Prior probability           Conditional probability              Joint probability

A1 = Hall, P(A1) = .45      B1 = Defective, P(B1 | A1) = .03     P(A1 and B1) = P(A1)P(B1 | A1) = (.45)(.03) = .0135
                            B2 = Good,      P(B2 | A1) = .97     P(A1 and B2) = P(A1)P(B2 | A1) = (.45)(.97) = .4365

A2 = Schuller, P(A2) = .30  B1 = Defective, P(B1 | A2) = .06     P(A2 and B1) = P(A2)P(B1 | A2) = (.30)(.06) = .0180
                            B2 = Good,      P(B2 | A2) = .94     P(A2 and B2) = P(A2)P(B2 | A2) = (.30)(.94) = .2820

A3 = Crawford, P(A3) = .25  B1 = Defective, P(B1 | A3) = .04     P(A3 and B1) = P(A3)P(B1 | A3) = (.25)(.04) = .0100
                            B2 = Good,      P(B2 | A3) = .96     P(A3 and B2) = P(A3)P(B2 | A3) = (.25)(.96) = .2400

                                                                 Total 1.000

CHART 5–3 Tree Diagram of the Cell Phone Manufacturing Problem

REVISION OF CHART 5-3
The events are dependent, so the prior probability in the first branch is multiplied
by the conditional probability in the second branch to obtain the joint probability.
The joint probability is reported in the last column of Chart 5–3. To construct the
tree diagram of Chart 5–3, we used a time sequence that moved from the supplier
to the determination of whether the chip was defective.
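
The branch-by-branch multiplication described above can be sketched in Python. The dictionary names `priors`, `p_defective`, and `joint` are illustrative assumptions; the numbers are the ones given in the example.

```python
# Prior probabilities P(Ai): share of LS-24 chips bought from each supplier.
priors = {"Hall": 0.45, "Schuller": 0.30, "Crawford": 0.25}
# Conditional probabilities P(B1 | Ai): defect rate for each supplier.
p_defective = {"Hall": 0.03, "Schuller": 0.06, "Crawford": 0.04}

# Joint probability for every branch of the tree: prior times conditional.
joint = {}
for supplier, prior in priors.items():
    joint[(supplier, "defective")] = prior * p_defective[supplier]
    joint[(supplier, "good")] = prior * (1 - p_defective[supplier])

for branch, probability in joint.items():
    print(branch, round(probability, 4))

# The six branches cover every possible outcome, so they sum to 1.
total = sum(joint.values())
print(round(total, 3))  # 1.0
```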

What we need to do is reverse the time process. That is, instead of mov-
ing from left to right in Chart 5–3, we need to move from right to left. We have a
defective chip, and we want to determine the likelihood that it was purchased
from Schuller Sales. How is that accomplished? We first look at the joint probabil-
ities as relative frequencies out of 10,000 cases. For example, the likelihood of
a defective LS-24 chip that was produced by Hall Electronics is .0135. So of
10,000 cases, we would expect to find 135 defective chips produced by Hall
Electronics. We observe that in 415 of 10,000 cases the LS-24 chip selected for
assembly is defective, found by 135 + 180 + 100. Of these 415 defective chips,
180 were produced by Schuller Sales. Thus, the probability that the defective
LS-24 chip was purchased from Schuller Sales is 180/415 = .4337. We have now
determined the revised probability of P(A2 | B1). Before we found the defective
chip, the likelihood that it was purchased from Schuller Sales was .30. This likelihood
has been increased to .4337. What have we accomplished by using Bayes’
theorem? Once we find the defective part, we conclude that it is much more
likely to be a product of Schuller Sales. The increase in the probability is rather
dramatic, moving from .30 to .4337.

This information is summarized in the following table.

             Prior          Conditional     Joint           Posterior
Event,       Probability,   Probability,    Probability,    Probability,
Ai           P(Ai)          P(B1 ∣ Ai)      P(Ai and B1)    P(Ai ∣ B1)
Hall         .45            .03             .0135           .0135/.0415 = .3253
Schuller     .30            .06             .0180           .0180/.0415 = .4337
Crawford     .25            .04             .0100           .0100/.0415 = .2410
                                            P(B1) = .0415                1.0000

The probability the defective LS-24 chip came from Schuller Sales can be formally
found by using Bayes’ theorem. We compute P(A2 | B1), where A2 refers to Schuller
Sales and B1 to the fact that the selected LS-24 chip was defective.

$P(A_2 \mid B_1) = \frac{P(A_2)P(B_1 \mid A_2)}{P(A_1)P(B_1 \mid A_1) + P(A_2)P(B_1 \mid A_2) + P(A_3)P(B_1 \mid A_3)} = \frac{(.30)(.06)}{(.45)(.03) + (.30)(.06) + (.25)(.04)} = \frac{.0180}{.0415} = .4337$

This is the same result obtained from Chart 5–3 and from the conditional probabil-
ity table.
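
For any number of mutually exclusive and collectively exhaustive events, the same revision can be written as a short Python function. This is a sketch under stated assumptions: the function name `posteriors` is hypothetical, and `fractions.Fraction` is used only so that the arithmetic is exact.

```python
from fractions import Fraction


def posteriors(priors, likelihoods):
    """Return ([P(Ai | B) for each i], P(B)) using the n-event form of Bayes' theorem."""
    joints = [p * l for p, l in zip(priors, likelihoods)]  # P(Ai and B)
    p_b = sum(joints)                                      # P(B)
    return [j / p_b for j in joints], p_b


# Order of events: Hall (A1), Schuller (A2), Crawford (A3).
priors = [Fraction(45, 100), Fraction(30, 100), Fraction(25, 100)]
likelihoods = [Fraction(3, 100), Fraction(6, 100), Fraction(4, 100)]  # P(B1 | Ai)

post, p_b1 = posteriors(priors, likelihoods)
print(float(p_b1))                         # 0.0415
print([round(float(p), 4) for p in post])  # [0.3253, 0.4337, 0.241]
```

The exact posterior for Schuller Sales is 180/415, the same ratio obtained from the 10,000-case count.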

S E L F - R E V I E W 5–9

Refer to the preceding example and solution.

(a) Design a formula to find the probability the part selected came from Crawford Components,
given that it was a good chip.

(b) Compute the probability using Bayes’ theorem.

PRINCIPLES OF COUNTING
If the number of possible outcomes in an experiment is small, it is relatively easy to count
them. There are six possible outcomes, for example, resulting from the roll of a die, namely
1, 2, 3, 4, 5, or 6 spots showing.

If, however, there are a large number of possible outcomes, such as the number of
heads and tails for an experiment with 10 tosses, it would be tedious to count all the
possibilities. They could have all heads, one head and nine tails, two heads and eight
tails, and so on. To facilitate counting, we describe three formulas: the multiplication
formula (not to be confused with the multiplication rule described earlier in the chapter),
the permutation formula, and the combination formula.

The Multiplication Formula
We begin with the multiplication formula.

LO5-7
Determine the number of
outcomes using principles
of counting.

MULTIPLICATION FORMULA If there are m ways of doing one thing and n ways of
doing another thing, there are m × n ways of doing both.

In terms of a formula:

MULTIPLICATION FORMULA Total number of arrangements = (m)(n) [5–8]

This can be extended to more than two events. For three events m, n, and o:

Total number of arrangements = (m)(n)(o)

E X E R C I S E S

33. P(A1) = .60, P(A2) = .40, P(B1 | A1) = .05, and P(B1 | A2) = .10. Use Bayes’ theorem to
determine P(A1 | B1).

34. P(A1) = .20, P(A2) = .40, P(A3) = .40, P(B1 | A1) = .25, P(B1 | A2) = .05, and P(B1 | A3) = .10.
Use Bayes’ theorem to determine P(A3 | B1).

35. The Ludlow Wildcats baseball team, a minor league team in the Cleveland Indians
organization, plays 70% of their games at night and 30% during the day. The team wins
50% of their night games and 90% of their day games. According to today’s newspaper,
they won yesterday. What is the probability the game was played at night?

36. Dr. Stallter has been teaching basic statistics for many years. She knows that 80%
of the students will complete the assigned problems. She has also determined that
among those who do their assignments, 90% will pass the course. Among those
students who do not do their homework, 60% will pass. Mike Fishbaugh took statistics
last semester from Dr. Stallter and received a passing grade. What is the probability
that he completed the assignments?

37. The credit department of Lion’s Department Store in Anaheim, California, reported
that 30% of their sales are cash, 30% are paid with a credit card, and 40% with a
debit card. Twenty percent of the cash purchases, 90% of the credit card purchases,
and 60% of the debit card purchases are for more than $50. Ms. Tina Stevens just
purchased a new dress that cost $120. What is the probability that she paid cash?

38. One-fourth of the residents of the Burning Ridge Estates leave their garage doors
open when they are away from home. The local chief of police estimates that 5% of
the garages with open doors will have something stolen, but only 1% of those
closed will have something stolen. If a garage is robbed, what is the probability the
doors were left open?

E X A M P L E

An automobile dealer wants to advertise that for $29,999 you can buy a convert-
ible, a two-door sedan, or a four-door model with your choice of either wire wheel
covers or solid wheel covers. Based on the number of models and wheel covers,
how many different vehicles can the dealer offer?

S O L U T I O N

Of course, the dealer could determine the total number of different cars by pictur-
ing and counting them. There are six.

Convertible with
wire wheel covers

Convertible with
solid wheel covers

Four-door with
wire wheel covers

Two-door with
solid wheel covers

Four-door with
solid wheel covers

Two-door with
wire wheel covers

We can employ the multiplication formula as a check (where m is the number of
models and n the wheel cover type). From formula (5–8):

Total possible arrangements = (m)(n) = (3)(2) = 6

It was not difficult to count all the possible model and wheel cover combinations
in this example. Suppose, however, that the dealer decided to offer eight models and
six types of wheel covers. It would be tedious to picture and count all the possible
alternatives. Instead, the multiplication formula can be used. In this case, there are
(m)(n) = (8)(6) = 48 possible arrangements.

Note in the preceding applications of the multiplication formula that there were two
or more groupings from which you made selections. The automobile dealer, for exam-
ple, offered a choice of models and a choice of wheel covers. If a home builder offered
you four different exterior styles of a home to choose from and three interior floor plans,
the multiplication formula would be used to find how many different arrangements were
possible. There are 12 possibilities.
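
The multiplication formula can be checked by enumerating the pairs directly. The sketch below uses Python's `itertools.product` to list every model and wheel cover pairing from the dealer example; the list names are illustrative.

```python
from itertools import product

models = ["convertible", "two-door sedan", "four-door"]  # m = 3
wheel_covers = ["wire", "solid"]                         # n = 2

# Every (model, wheel cover) pairing the dealer can offer.
vehicles = list(product(models, wheel_covers))
for model, cover in vehicles:
    print(f"{model} with {cover} wheel covers")

print(len(vehicles))  # 6, which equals (m)(n) = (3)(2)

# With eight models and six wheel cover types, counting by hand is tedious,
# but the enumeration still agrees with (8)(6) = 48.
larger = len(list(product(range(8), range(6))))
print(larger)  # 48
```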

S E L F - R E V I E W 5–10

1. The Women’s Shopping Network on cable TV offers sweaters and slacks for women.
The sweaters and slacks are offered in coordinating colors. If sweaters are available in
five colors and the slacks are available in four colors, how many different outfits can be
advertised?

2. Pioneer manufactures three models of Wifi Internet radios, two MP3 docking stations,
four different sets of speakers, and three CD carousel changers. When the four types
of components are sold together, they form a “system.” How many different systems
can the electronics firm offer?

The Permutation Formula
The multiplication formula is applied to find the number of possible arrangements for
two or more groups. In contrast, we use the permutation formula to find the number of
possible arrangements when there is a single group of objects. Illustrations of this type
of problem are:

• Three electronic parts, a transistor, an LED, and a synthesizer, are assembled into a
plug-in component for a HDTV. The parts can be assembled in any order. How
many different ways can the three parts be assembled?

• A machine operator must make four safety checks before starting his machine. It
does not matter in which order the checks are made. In how many different ways
can the operator make the checks?

One order for the first illustration might be the transistor first, the LED second, and the
synthesizer third. This arrangement is called a permutation.

PERMUTATION Any arrangement of r objects selected from a single group of n
possible objects.

Note that the arrangements a b c and b a c are different permutations. The formula to
count the total number of different permutations is:

PERMUTATION FORMULA  ${}_nP_r = \frac{n!}{(n - r)!}$  [5–9]

where:
n is the total number of objects.
r is the number of objects selected.

Before we solve the two problems illustrated, note that permutations and combinations (to be
discussed shortly) use a notation called n factorial. It is written n! and means the product
n(n − 1)(n − 2)(n − 3) ⋅ ⋅ ⋅ (1). For instance, 5! = 5 · 4 · 3 · 2 · 1 = 120.

Many of your calculators have a button with x! that will perform this calculation for
you. It will save you a great deal of time. For example, the Texas Instruments Pro Scientific
calculator has a single key labeled 10^x, LOG, and x!. The factorial is the “third function”
of that key, so check your users’ manual or the Internet for instructions.

The factorial notation can also be canceled when the same number appears in both
the numerator and the denominator, as shown below.

$\frac{6!\,3!}{4!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1\,(3 \cdot 2 \cdot 1)}{4 \cdot 3 \cdot 2 \cdot 1} = 180$

By definition, zero factorial, written 0!, is 1. That is, 0! = 1.
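
Formula (5–9) and the factorial notation can be checked with a few lines of Python. This is a minimal sketch: the helper name `n_p_r` is an illustrative choice, while `math.factorial` and `itertools.permutations` are standard library functions.

```python
from itertools import permutations
from math import factorial


def n_p_r(n, r):
    """Formula (5-9): the number of permutations of r objects chosen from n."""
    return factorial(n) // factorial(n - r)


print(factorial(5))                                  # 120
print(factorial(0))                                  # 1, by definition
print(factorial(6) * factorial(3) // factorial(4))   # 180

# The three electronic parts, all three arranged (n = 3, r = 3).
print(n_p_r(3, 3))  # 6

# Cross-check by listing every ordering of the three parts.
orders = list(permutations(["transistor", "LED", "synthesizer"]))
print(len(orders))  # 6
```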

E X A M P L E

Referring to the group of three electronic parts that are to be assembled in any
order, in how many different ways can they be assembled?

S O L U T I O N

There are three electronic parts to be assembled, so n = 3. Because all three are to
be inserted into the plug-in component, r = 3. Solving using formula (5–9) gives:

${}_nP_r = \frac{n!}{(n - r)!} = \frac{3!}{(3 - 3)!} = \frac{3!}{0!} = \frac{3!}{1} = 6$

We can check the number of permutations arrived at by using the permutation formula.
We determine how many “spaces” have to be filled and the possibilities for
each “space.” In the problem involving three electronic parts, there are three locations
in the plug-in unit for the three parts. There are three possibilities for the first
place, two for the second (one has been used up), and one for the third, as follows:

(3)(2)(1) = 6 permutations

The six ways in which the three electronic parts, lettered A, B, C, can be arranged are:

ABC BAC CAB ACB BCA CBA

In the previous example, we selected and arranged all the objects, that is n = r. In
many cases, only some objects are selected and arranged from the n possible
objects. We explain the details of this application in the following example.

E X A M P L E

The Fast Media Company is producing a one-minute video advertisement. In the
production process, eight different video segments were made. To make the one-minute
ad, they can only select three of the eight segments. How many different ways can
the eight video segments be arranged in the three spaces available in the ad?

S O L U T I O N

There are eight possibilities for the first available space in the ad, seven for the
second space (one has been used up), and six for the third space. Thus:

(8)(7)(6) = 336,

that is, there are a total of 336 different possible arrangements. This could also be
found by using formula (5–9). If n = 8 video segments and r = 3 spaces available,
the formula leads to

${}_nP_r = \frac{n!}{(n - r)!} = \frac{8!}{(8 - 3)!} = \frac{8!}{5!} = \frac{(8)(7)(6)\,5!}{5!} = 336$

The Combination Formula
If the order of the selected objects is not important, any selection is called a combination.
Logically, the number of combinations is never more than the number of permutations,
and it is smaller whenever two or more objects are selected.
The formula to count the number of r object combinations from a set of n objects is:

COMBINATION FORMULA  ${}_nC_r = \frac{n!}{r!(n - r)!}$  [5–10]

For example, if executives Able, Baker, and Chauncy are to be chosen as a committee
to negotiate a merger, there is only one possible combination of these three; the com-
mittee of Able, Baker, and Chauncy is the same as the committee of Baker, Chauncy,
and Able. Using the combination formula:

${}_nC_r = \frac{n!}{r!(n - r)!} = \frac{3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1\,(1)} = 1$

E X A M P L E

The Grand 16 movie theater uses teams of three employees to work the conces-
sion stand each evening. There are seven employees available to work each eve-
ning. How many different teams can be scheduled to staff the concession stand?

S O L U T I O N

According to formula (5–10), there are 35 combinations, found by

${}_7C_3 = \frac{n!}{r!(n - r)!} = \frac{7!}{3!(7 - 3)!} = \frac{7!}{3!\,4!} = 35$

The seven employees taken three at a time would create the possibility of 35 differ-
ent teams.

When the number of permutations or combinations is large, the calculations are tedious.
Computer software and handheld calculators have “functions” to compute these numbers.
In Excel, the selection of three video segments from the eight available
at the Fast Media Company is computed with =PERMUT(8,3). There are a total of 336 arrangements.

The number of teams at the Grand 16 movie theater, where three employees are chosen
from seven possible employees, is computed with =COMBIN(7,3), which returns 35.
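
The same counts can be reproduced in Python, as a sketch. `math.perm` and `math.comb` are standard library functions (Python 3.8 or later); the employee labels `E1` through `E7` are hypothetical stand-ins for the seven Grand 16 employees.

```python
from itertools import combinations
from math import comb, factorial, perm

# Fast Media: arrange 3 of 8 video segments, order matters.
print(perm(8, 3))  # 336

# Grand 16: choose 3 of 7 employees, order does not matter.
print(comb(7, 3))  # 35

# Listing the teams explicitly gives the same count.
employees = ["E1", "E2", "E3", "E4", "E5", "E6", "E7"]  # hypothetical labels
teams = list(combinations(employees, 3))
print(len(teams))  # 35

# Each team of three can be ordered in 3! = 6 ways, so the number of
# permutations is the number of combinations times r!.
assert perm(7, 3) == comb(7, 3) * factorial(3)
```

The final assertion shows why the combination formula carries the extra r! in its denominator: dividing by r! removes the orderings that a combination treats as identical.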

S E L F - R E V I E W 5–11

1. A musician wants to write a score based on only five chords: B-flat, C, D, E, and G.
However, only three chords out of the five will be used in succession, such as C, B-flat,
and E. Repetitions, such as B-flat, B-flat, and E, will not be permitted.

(a) How many permutations of the five chords, taken three at a time, are possible?
(b) Using formula (5–9), how many permutations are possible?

2. The 10 numbers 0 through 9 are to be used in code groups of four to identify an item
of clothing. Code 1083 might identify a blue blouse, size medium; the code group
2031 might identify a pair of pants, size 18; and so on. Repetitions of numbers are not
permitted. That is, the same number cannot be used twice (or more) in a total
sequence. For example, 2256, 2562, or 5559 would not be permitted. How many
different code groups can be designed?

3. In the preceding example/solution involving the Grand 16 movie theater, there were
35 possible teams of three taken from seven employees.

(a) Use formula (5–10) to show this is true.
(b) The manager of the theater wants to plan for staffing the concession stand with
teams of five employees on the weekends to serve the larger crowds. From the
seven employees, how many teams of five employees are possible?

4. In a lottery game, three numbers are randomly selected from a tumbler of balls
numbered 1 through 50.

(a) How many permutations are possible?
(b) How many combinations are possible?

E X E R C I S E S

39. Solve the following:
a. 40!/35!
b. 7P4
c. 5C2

40. Solve the following:
a. 20!/17!
b. 9P3
c. 7C2

41. A pollster randomly selected 4 of 10 available people. How many different groups
of 4 are possible?

42. A telephone number consists of seven digits, the first three representing the
exchange. How many different telephone numbers are possible within the 537
exchange?

43. An overnight express company must include five cities on its route. How many
different routes are possible, assuming that it does not matter in which order the cities
are included in the routing?

44. A representative of the Environmental Protection Agency (EPA) wants to select
samples from 10 landfills. The director has 15 landfills from which she can collect
samples. How many different samples are possible?

45. Sam Snead’s restaurant in Conway, South Carolina, offers an early bird special
from 4–6 pm each week day evening. If each patron selects a Starter Selection (4
options), an Entrée (8 options), and a Dessert (3 options), how many different
meals are possible?

46. A company is creating three new divisions and seven managers are eligible
to be appointed head of a division. How many different ways could the three
new heads be appointed? Hint: Assume the division assignment makes a
difference.

C H A P T E R S U M M A R Y

I. A probability is a value between 0 and 1 inclusive that represents the likelihood a particular
event will happen.
A. An experiment is the observation of some activity or the act of taking some
measurement.
B. An outcome is a particular result of an experiment.
C. An event is the collection of one or more outcomes of an experiment.

II. There are three definitions of probability.
A. The classical definition applies when there are n equally likely outcomes to an
experiment.
B. The empirical definition occurs when the number of times an event happens is
divided by the number of observations.
C. A subjective probability is based on whatever information is available.

III. Two events are mutually exclusive if by virtue of one event happening the other cannot
happen.

IV. Events are independent if the occurrence of one event does not affect the occurrence
of another event.

V. The rules of addition refer to the probability that any of two or more events can occur.
A. The special rule of addition is used when events are mutually exclusive.

P(A or B) = P(A) + P(B) [5–2]

B. The general rule of addition is used when the events are not mutually exclusive.

P(A or B) = P(A) + P(B) − P(A and B) [5–4]

C. The complement rule is used to determine the probability of an event happening by
subtracting the probability of the event not happening from 1.

P(A) = 1 − P(~A) [5–3]

VI. The rules of multiplication are applied when two or more events occur simultaneously.
A. The special rule of multiplication refers to events that are independent.

P(A and B) = P(A)P(B) [5–5]

B. The general rule of multiplication refers to events that are not independent.

P(A and B) = P(A)P(B | A) [5–6]

C. A joint probability is the likelihood that two or more events will happen at the same time.
D. A conditional probability is the likelihood that an event will happen, given that
another event has already happened.
E. Bayes’ theorem is a method of revising a probability, given that additional information
is obtained. For two mutually exclusive and collectively exhaustive events:

$P(A_1 \mid B) = \frac{P(A_1)P(B \mid A_1)}{P(A_1)P(B \mid A_1) + P(A_2)P(B \mid A_2)}$ [5–7]

VII. There are three counting rules that are useful in determining the number of outcomes in
an experiment.
A. The multiplication formula states that if there are m ways one event can happen and n
ways another event can happen, then there are mn ways the two events can happen.

Number of arrangements = (m)(n) [5–8]

B. A permutation is an arrangement in which the order of the objects selected from a
specific pool of objects is important.

${}_nP_r = \frac{n!}{(n - r)!}$ [5–9]

C. A combination is an arrangement where the order of the objects selected from a
specific pool of objects is not important.

${}_nC_r = \frac{n!}{r!(n - r)!}$ [5–10]

P R O N U N C I A T I O N K E Y

SYMBOL        MEANING                                        PRONUNCIATION
P(A)          Probability of A                               P of A
P(∼A)         Probability of not A                           P of not A
P(A and B)    Probability of A and B                         P of A and B
P(A or B)     Probability of A or B                          P of A or B
P(A | B)      Probability of A given B has happened          P of A given B
nPr           Permutation of n items selected r at a time    Pnr
nCr           Combination of n items selected r at a time    Cnr

C H A P T E R E X E R C I S E S

47. The marketing research department at Pepsico plans to survey teenagers about a newly
developed soft drink. Each will be asked to compare it with his or her favorite soft drink.
a. What is the experiment?
b. What is one possible event?

48. The number of times a particular event occurred in the past is divided by the number of
occurrences. What is this approach to probability called?

49. The probability that the cause and the cure for all cancers will be discovered before the
year 2020 is .20. What viewpoint of probability does this statement illustrate?

50. Berdine’s Chicken Factory has several stores in the Hilton Head, South Carolina,
area. When interviewing applicants for server positions, the owner would like to in-
clude information on the amount of tip a server can expect to earn per check (or bill).
A study of 500 recent checks indicated the server earned the following amounts in
tips per 8-hour shift.

Amount of Tip      Number
$0 up to $20          200
20 up to 50           100
50 up to 100           75
100 up to 200          75
200 or more            50
Total                 500

a. What is the probability of a tip of $200 or more?
b. Are the categories “$0 up to $20,” “$20 up to $50,” and so on considered mutually
exclusive?
c. If the probabilities associated with each outcome were totaled, what would that total be?
d. What is the probability of a tip of up to $50?
e. What is the probability of a tip of less than $200?

51. Winning all three “Triple Crown” races is considered the greatest feat of a pedigree
racehorse. After a successful Kentucky Derby, Corn on the Cob is a heavy favorite at 2
to 1 odds to win the Preakness Stakes.
a. If he is a 2 to 1 favorite to win the Belmont Stakes as well, what is his probability of
winning the Triple Crown?
b. What do his chances for the Preakness Stakes have to be in order for him to be
“even money” to earn the Triple Crown?

52. The first card selected from a standard 52-card deck is a king.
a. If it is returned to the deck, what is the probability that a king will be drawn on the
second selection?
b. If the king is not replaced, what is the probability that a king will be drawn on the
second selection?
c. What is the probability that a king will be selected on the first draw from the deck and
another king on the second draw (assuming that the first king was not replaced)?

53. Armco, a manufacturer of traffic light systems, found that under accelerated-life tests,
95% of the newly developed systems lasted 3 years before failing to change signals
properly.
a. If a city purchased four of these systems, what is the probability all four systems
would operate properly for at least 3 years?
b. Which rule of probability does this illustrate?
c. Using letters to represent the four systems, write an equation to show how you arrived
at the answer to part (a).

54. Refer to the following picture.

[Picture: a diagram with one region labeled B and the remaining region labeled ∼B]

a. What is the picture called?
b. What rule of probability is illustrated?
c. B represents the event of choosing a family that receives welfare payments. What
does P(B) + P(∼B) equal?

55. In a management trainee program at Claremont Enterprises, 80% of the trainees are
female and 20% male. Ninety percent of the females attended college, and 78% of the
males attended college.
a. A management trainee is selected at random. What is the probability that the person
selected is a female who did not attend college?
b. Are gender and attending college independent? Why?
c. Construct a tree diagram showing all the probabilities, conditional probabilities, and
joint probabilities.
d. Do the joint probabilities total 1.00? Why?

56. Assume the likelihood that any flight on Delta Airlines arrives within 15 minutes of the
scheduled time is .90. We randomly selected a Delta flight on four different days.
a. What is the likelihood all four of the selected flights arrived within 15 minutes of the
scheduled time?
b. What is the likelihood that none of the selected flights arrived within 15 minutes of
the scheduled time?
c. What is the likelihood at least one of the selected flights did not arrive within 15 minutes
of the scheduled time?

57. There are 100 employees at Kiddie Carts International. Fifty-seven of the employees are
hourly workers, 40 are supervisors, 2 are secretaries, and the remaining employee is
the president. Suppose an employee is selected:
a. What is the probability the selected employee is an hourly worker?
b. What is the probability the selected employee is either an hourly worker or a
supervisor?
c. Refer to part (b). Are these events mutually exclusive?
d. What is the probability the selected employee is neither an hourly worker nor a
supervisor?

58. DJ LeMahieu of the Colorado Rockies had the highest batting average in the 2016 Major
League Baseball season. His average was .348. So assume the probability of getting a
hit is .348 for each time he batted. In a particular game, assume he batted three times.
a. This is an example of what type of probability?
b. What is the probability of getting three hits in a particular game?
c. What is the probability of not getting any hits in a game?
d. What is the probability of getting at least one hit?


59. Four women’s college basketball teams are participating in a single-elimination holiday
basketball tournament. If one team is favored in its semifinal match by odds of 2 to 1
and another squad is favored in its contest by odds of 3 to 1, what is the probability that:
a. Both favored teams win their games?
b. Neither favored team wins its game?
c. At least one of the favored teams wins its game?

60. There are three clues labeled “daily double” on the game show Jeopardy. If three
equally matched contenders play, what is the probability that:
a. A single contestant finds all three “daily doubles”?
b. The returning champion gets all three of the “daily doubles”?
c. Each of the players selects precisely one of the “daily doubles”?

61. Brooks Insurance Inc. wishes to offer life insurance to men age 60 via the Internet. Mor-
tality tables indicate the likelihood of a 60-year-old man surviving another year is .98. If
the policy is offered to five men age 60:
a. What is the probability all five men survive the year?
b. What is the probability at least one does not survive?

62. Forty percent of the homes constructed in the Quail Creek area include a security sys-
tem. Three homes are selected at random:
a. What is the probability all three of the selected homes have a security system?
b. What is the probability none of the three selected homes has a security system?
c. What is the probability at least one of the selected homes has a security system?
d. Did you assume the events to be dependent or independent?

63. Refer to Exercise 62, but assume there are 10 homes in the Quail Creek area and 4 of
them have a security system. Three homes are selected at random:
a. What is the probability all three of the selected homes have a security system?
b. What is the probability none of the three selected homes has a security system?
c. What is the probability at least one of the selected homes has a security system?
d. Did you assume the events to be dependent or independent?

64. There are 20 families living in the Willbrook Farms Development. Of these families, 10
prepared their own federal income taxes for last year, 7 had their taxes prepared by a
local professional, and the remaining 3 by H&R Block.
a. What is the probability of selecting a family that prepared their own taxes?
b. What is the probability of selecting two families, both of which prepared their own taxes?
c. What is the probability of selecting three families, all of which prepared their own taxes?
d. What is the probability of selecting two families, neither of which had their taxes
prepared by H&R Block?

65. The board of directors of Saner Automatic Door Company consists of 12 members, 3 of
whom are women. A new policy and procedures manual is to be written for the company.
A committee of three is randomly selected from the board to do the writing.
a. What is the probability that all members of the committee are men?
b. What is the probability that at least one member of the committee is a woman?

66. A recent survey reported in BloombergBusinessweek dealt with the salaries of CEOs
at large corporations and whether company shareholders made money or lost money.

                           CEO Paid More      CEO Paid Less
                           Than $1 Million    Than $1 Million    Total

Shareholders made money           2                 11             13
Shareholders lost money           4                  3              7

Total                             6                 14             20

If a company is randomly selected from the list of 20 studied, what is the probability:
a. The CEO made more than $1 million?
b. The CEO made more than $1 million or the shareholders lost money?
c. The CEO made more than $1 million given the shareholders lost money?
d. Of selecting two CEOs and finding they both made more than $1 million?

67. Althoff and Roll, an investment firm in Augusta, Georgia, advertises extensively in the
Augusta Morning Gazette, the newspaper serving the region. The Gazette marketing
staff estimates that 60% of Althoff and Roll’s potential market read the newspaper. It is
further estimated that 85% of those who read the Gazette remember the Althoff and Roll
advertisement.
a. What percent of the investment firm’s potential market sees and remembers the
advertisement?
b. What percent of the investment firm’s potential market sees, but does not remember,
the advertisement?

68. An Internet company located in Southern California has season tickets to the Los Angeles
Lakers basketball games. The company president always invites one of the four vice
presidents to attend games with him, and claims he selects the person to attend at random.
One of the four vice presidents has not been invited to attend any of the last five
Lakers home games. What is the likelihood this could be due to chance?

69. A computer-supply retailer purchased a batch of 1,000 CD-R disks and attempted to
format them for a particular application. There were 857 perfect CDs, 112 CDs were
usable but had bad sectors, and the remainder could not be used at all.
a. What is the probability a randomly chosen CD is not perfect?
b. If the disk is not perfect, what is the probability it cannot be used at all?

70. An investor purchased 100 shares of Fifth Third Bank stock and 100 shares of Santee
Electric Cooperative stock. The probability the bank stock will appreciate over a year is
.70. The probability the electric utility will increase over the same period is .60. Assume
the two events are independent.
a. What is the probability both stocks appreciate during the period?
b. What is the probability the bank stock appreciates but the utility does not?
c. What is the probability at least one of the stocks appreciates?

71. Flashner Marketing Research Inc. specializes in providing assessments of the prospects
for women’s apparel shops in shopping malls. Al Flashner, president, reports that he
assesses the prospects as good, fair, or poor. Records from previous assessments show
that 60% of the time the prospects were rated as good, 30% of the time fair, and 10% of
the time poor. Of those rated good, 80% made a profit the first year; of those rated fair,
60% made a profit the first year; and of those rated poor, 20% made a profit the first
year. Connie’s Apparel was one of Flashner’s clients. Connie’s Apparel made a profit last
year. What is the probability that it was given an original rating of poor?

72. Two boxes of men’s Old Navy shirts were received from the factory. Box 1 contained
25 mesh polo shirts and 15 Super-T shirts. Box 2 contained 30 mesh polo shirts and
10 Super-T shirts. One of the boxes was selected at random, and a shirt was chosen at
random from that box to be inspected. The shirt was a mesh polo shirt. Given this infor-
mation, what is the probability that the mesh polo shirt came from Box 1?

73. With each purchase of a large pizza at Tony’s Pizza, the customer receives a coupon
that can be scratched to see if a prize will be awarded. The probability of winning a free
soft drink is 0.10, and the probability of winning a free large pizza is 0.02. You plan to
eat lunch tomorrow at Tony’s. What is the probability:
a. That you will win either a large pizza or a soft drink?
b. That you will not win a prize?
c. That you will not win a prize on three consecutive visits to Tony’s?
d. That you will win at least one prize on one of your next three visits to Tony’s?

74. For the daily lottery game in Illinois, participants select three numbers between 0 and 9.
A number cannot be selected more than once, so a winning ticket could be, say, 307
but not 337. Purchasing one ticket allows you to select one set of numbers. The winning
numbers are announced on TV each night.
a. How many different outcomes (three-digit numbers) are possible?
b. If you purchase a ticket for the game tonight, what is the likelihood you will win?
c. Suppose you purchase three tickets for tonight’s drawing and select a different number
for each ticket. What is the probability that you will not win with any of the
tickets?

75. Several years ago, Wendy’s Hamburgers advertised that there are 256 different ways to
order your hamburger. You may choose to have, or omit, any combination of the follow-
ing on your hamburger: mustard, ketchup, onion, pickle, tomato, relish, mayonnaise, and
lettuce. Is the advertisement correct? Show how you arrive at your answer.


76. Recent surveys indicate 60% of tourists to China visited the Forbidden City, the Temple
of Heaven, the Great Wall, and other historical sites in or near Beijing. Forty percent
visited Xi’an with its magnificent terra-cotta soldiers, horses, and chariots, which lay bur-
ied for over 2,000 years. Thirty percent of the tourists went to both Beijing and Xi’an.
What is the probability that a tourist visited at least one of these places?

77. A new chewing gum has been developed that is helpful to those who want to stop
smoking. If 60% of those people chewing the gum are successful in stopping smoking,
what is the probability that in a group of four smokers using the gum at least one quits
smoking?

78. Reynolds Construction Company has agreed not to erect all “look-alike” homes in a new
subdivision. Five exterior designs are offered to potential home buyers. The builder has
standardized three interior plans that can be incorporated in any of the five exteriors.
How many different ways can the exterior and interior plans be offered to potential
home buyers?

79. A new sports car model has defective brakes 15% of the time and a defective steering
mechanism 5% of the time. Let’s assume (and hope) that these problems occur inde-
pendently. If one or the other of these problems is present, the car is called a “lemon.” If
both of these problems are present, the car is a “hazard.” Your instructor purchased one
of these cars yesterday. What is the probability it is:
a. A lemon?
b. A hazard?

80. The state of Maryland has license plates with three numbers followed by three letters.
How many different license plates are possible?

81. There are four people being considered for the position of chief executive officer of
Dalton Enterprises. Three of the applicants are over 60 years of age. Two are female, of
which only one is over 60.
a. What is the probability that a candidate is over 60 and female?
b. Given that the candidate is male, what is the probability he is less than 60?
c. Given that the person is over 60, what is the probability the person is female?

82. Tim Bleckie is the owner of Bleckie Investment and Real Estate Company. The com-
pany recently purchased four tracts of land in Holly Farms Estates and six tracts in
Newburg Woods. The tracts are all equally desirable and sell for about the same
amount.
a. What is the probability that the next two tracts sold will be in Newburg Woods?
b. What is the probability that of the next four sold at least one will be in Holly Farms?
c. Are these events independent or dependent?

83. A computer password consists of four characters. The characters can be one of the 26
letters of the alphabet. Each character may be used more than once. How many differ-
ent passwords are possible?

84. A case of 24 cans contains 1 can that is contaminated. Three cans are to be chosen
randomly for testing.
a. How many different combinations of three cans could be selected?
b. What is the probability that the contaminated can is selected for testing?

85. A puzzle in the newspaper presents a matching problem. The names of 10 U.S. presi-
dents are listed in one column, and their vice presidents are listed in random order in
the second column. The puzzle asks the reader to match each president with his vice
president. If you make the matches randomly, how many matches are possible? What is
the probability all 10 of your matches are correct?

86. Two components, A and B, operate in series. Being in series means that for the system
to operate, both components A and B must work. Assume the two components are in-
dependent. What is the probability the system works under these conditions? The prob-
ability A works is .90 and the probability B functions is also .90.

87. Horwege Electronics Inc. purchases TV picture tubes from four different suppliers.
Tyson Wholesale supplies 20% of the tubes, Fuji Importers 30%, Kirkpatricks 25%, and
Parts Inc. 25%. Tyson Wholesale tends to have the best quality, as only 3% of its tubes
arrive defective. Fuji Importers’ tubes are 4% defective, Kirkpatricks’ 7%, and Parts Inc.’s
are 6.5% defective.
a. What is the overall percent defective?


b. A defective picture tube was discovered in the latest shipment. What is the probabil-
ity that it came from Tyson Wholesale?

88. ABC Auto Insurance classifies drivers as good, medium, or poor risks. Drivers who apply
to them for insurance fall into these three groups in the proportions 30%, 50%, and 20%,
respectively. The probability a “good” driver will have an accident is .01, the probability
a “medium” risk driver will have an accident is .03, and the probability a “poor” driver
will have an accident is .10. The company sells Mr. Brophy an insurance policy and he
has an accident. What is the probability Mr. Brophy is:
a. A “good” driver?
b. A “medium” risk driver?
c. A “poor” driver?

89. You take a trip by air that involves three independent flights. If there is an 80% chance
each specific leg of the trip is on time, what is the probability all three flights arrive
on time?

90. The probability a D-Link network server is down is .05. If you have three independent
servers, what is the probability that at least one of them is operational?

91. Twenty-two percent of all light emitting diode (LED) displays are manufactured by Sam-
sung. What is the probability that in a collection of three independent LED HDTV pur-
chases, at least one is a Samsung?

DATA ANALYTICS

92. Refer to the North Valley Real Estate data, which report information on homes sold
during the last year.
a. Sort the data into a table that shows the number of homes that have a pool versus
the number that don’t have a pool in each of the five townships. If a home is selected
at random, compute the following probabilities:
   1. The home has a pool.
   2. The home is in Township 1 or has a pool.
   3. Given that it is in Township 3, that it has a pool.
   4. The home has a pool and is in Township 3.
b. Sort the data into a table that shows the number of homes that have a garage attached
versus those that don’t in each of the five townships. If a home is selected at
random, compute the following probabilities:
   1. The home has a garage attached.
   2. The home does not have a garage attached, given that it is in Township 5.
   3. The home has a garage attached and is in Township 3.
   4. The home does not have a garage attached or is in Township 2.

93. Refer to the Baseball 2016 data, which reports information on the 30 Major League
Baseball teams for the 2016 season. Set up three variables:
• Divide the teams into two groups, those that had a winning season and those that did
not. That is, create a variable to count the teams that won 81 games or more, and
those that won 80 or less.
• Create a new variable for attendance, using three categories: attendance less than
2.0 million, attendance of 2.0 million up to 3.0 million, and attendance of 3.0 million
or more.
• Create a variable that shows the teams that play in a stadium less than 20 years old
versus one that is 20 years old or more.

Answer the following questions.
a. Create a table that shows the number of teams with a winning season versus those
with a losing season by the three categories of attendance. If a team is selected at
random, compute the following probabilities:
   1. The team had a winning season.
   2. The team had a winning season or attendance of more than 3.0 million.
   3. The team had a winning season given attendance was more than 3.0 million.
   4. The team has a winning season and attracted fewer than 2.0 million fans.
b. Create a table that shows the number of teams with a winning season versus those
that play in new or old stadiums. If a team is selected at random, compute the following
probabilities:
   1. Selecting a team with a stadium that is at least 20 years old.
   2. The likelihood of selecting a team with a winning record and playing in a new
      stadium.
   3. The team had a winning record or played in a new stadium.

94. Refer to the Lincolnville school bus data. Set up a variable that divides the age of
the buses into three groups: new (less than 5 years old), medium (5 but less than 10 years),
and old (10 or more years). The median maintenance cost is $4,179. Based on this
value, create a variable for those less than or equal to the median (low maintenance)
and those more than the median (high maintenance cost). Finally, develop a table to
show the relationship between maintenance cost and age of the bus.
a. What percentage of the buses are less than five years old?
b. What percentage of the buses less than five years old have low maintenance costs?
c. What percentage of the buses ten or more years old have high maintenance costs?
d. Does maintenance cost seem to be related to the age of the bus? Hint: Compare the
maintenance cost of the old buses with the cost of the new buses. Would you conclude
maintenance cost is independent of the age?

6 Discrete Probability Distributions

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO6-1 Identify the characteristics of a probability distribution.

LO6-2 Distinguish between discrete and continuous random variables.

LO6-3 Compute the mean, variance, and standard deviation of a discrete probability distribution.

LO6-4 Explain the assumptions of the binomial distribution and apply it to calculate probabilities.

LO6-5 Explain the assumptions of the hypergeometric distribution and apply it to calculate
probabilities.

LO6-6 Explain the assumptions of the Poisson distribution and apply it to calculate probabilities.

RECENT STATISTICS SUGGEST that 15% of those who visit a retail site on the Web
make a purchase. A retailer wished to verify this claim. To do so, she selected a sample
of 16 “hits” to her site and found that 4 had actually made a purchase. What is the
likelihood of exactly four purchases? How many purchases should she expect? What is
the likelihood that four or more “hits” result in a purchase? (See Exercise 49 and LO6-4.)



INTRODUCTION
Chapters 2 through 4 are devoted to descriptive statistics. We describe raw data by or-
ganizing the data into a frequency distribution and portraying the distribution in tables,
graphs, and charts. Also, we compute a measure of location—such as the arithmetic
mean, median, or mode—to locate a typical value near the center of the distribution.
The range and the standard deviation are used to describe the spread in the data.
These chapters focus on describing something that has already happened.

Starting with Chapter 5, the emphasis changes—we begin examining something
that could happen. We note that this facet of statistics is called statistical inference. The
objective is to make inferences (statements) about a population based on a number of
observations, called a sample, selected from the population. In Chapter 5, we state that
a probability is a value between 0 and 1 inclusive, and we examine how probabilities
can be combined using rules of addition and multiplication.

This chapter begins the study of probability distributions. A probability distribu-
tion is like a relative frequency distribution. However, instead of describing the past, it
is used to provide estimates of the likelihood of future events. Probability distributions
can be described by measures of location and dispersion so we show how to com-
pute a distribution’s mean, variance, and standard deviation. We also discuss three
frequently occurring discrete probability distributions: the binomial, hypergeometric,
and Poisson.

WHAT IS A PROBABILITY DISTRIBUTION?
A probability distribution defines or describes the likelihoods for a range of possible
future outcomes. For example, Spalding Golf Products, Inc. assembles golf clubs with
three components: a club head, a shaft, and a grip. From experience, five percent of the
shafts received from its Asian supplier are defective. As part of Spalding’s statistical
process control, the company inspects twenty shafts from each arriving shipment. Because
the probability of a defective shaft is five percent, in a sample of twenty shafts we would
expect one shaft to be defective and the other nineteen shafts to be acceptable. But by
using a probability distribution, we can completely
describe the range of possible outcomes. For example, we would know the probability
that none of the twenty shafts are defective, or that two, or three, or four, or continuing
up to twenty shafts in the sample are defective. Given the small probability of a defec-
tive shaft, the probability distribution would show that there is a very small probability of
four or more defective shafts.
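
The shaft inspection can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that each of the twenty shafts is defective independently with probability .05, so that the number of defective shafts follows the binomial distribution discussed later in this chapter. The function name `defect_distribution` is a hypothetical helper, not part of the text.

```python
from math import comb

def defect_distribution(n, p):
    """Return [P(0), P(1), ..., P(n)] for the number of defective items.

    Assumption of this sketch: each item is defective independently
    with probability p (a binomial model).
    """
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

probs = defect_distribution(20, 0.05)

p_none = probs[0]                 # probability of no defective shafts
p_four_or_more = sum(probs[4:])   # probability of four or more defective shafts
expected = sum(x * px for x, px in enumerate(probs))  # expected number defective

print(f"P(0 defective)  = {p_none:.4f}")
print(f"P(4 or more)    = {p_four_or_more:.4f}")
print(f"Expected number = {expected:.2f}")
```

Under these assumptions the probability of four or more defective shafts is roughly .016, which agrees with the statement that such an outcome has a very small probability.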

LO6-1
Identify the characteristics
of a probability distribution.

PROBABILITY DISTRIBUTION A listing of all the outcomes of an experiment and
the probability associated with each outcome.

The important characteristics of a probability distribution are:

CHARACTERISTICS OF A PROBABILITY DISTRIBUTION

1. The probability of a particular outcome is between 0 and 1 inclusive.
2. The outcomes are mutually exclusive.
3. The list of outcomes is exhaustive. So the sum of the probabilities of the out-

comes is equal to 1.
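
These characteristics can be checked mechanically. The following is a minimal sketch, assuming the distribution is stored as a Python dictionary that maps each outcome to its probability; the function name `is_probability_distribution` and the two sample distributions are made up for illustration.

```python
def is_probability_distribution(dist, tol=1e-9):
    """Check the characteristics of a discrete probability distribution.

    `dist` maps each outcome to its probability. Dictionary keys are
    distinct, which stands in for mutually exclusive outcomes here.
    """
    # Characteristic 1: every probability lies between 0 and 1 inclusive.
    each_in_range = all(0.0 <= p <= 1.0 for p in dist.values())
    # Characteristic 3: an exhaustive list of outcomes sums to 1.
    sums_to_one = abs(sum(dist.values()) - 1.0) <= tol
    return each_in_range and sums_to_one

fair_coin = {"H": 0.5, "T": 0.5}
not_exhaustive = {"win": 0.10, "lose": 0.70}  # probabilities sum to only .80

print(is_probability_distribution(fair_coin))       # True
print(is_probability_distribution(not_exhaustive))  # False
```

The small tolerance `tol` is a design choice of the sketch: it allows for rounding error when probabilities are stored as decimal approximations.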

How can we generate a probability distribution? The following example will explain.


EXAMPLE

Suppose we are interested in the number of heads showing face up on three
tosses of a coin. This is the experiment. The possible results are zero heads, one
head, two heads, and three heads. What is the probability distribution for the num-
ber of heads?

SOLUTION

There are eight possible outcomes. A tail might appear face up on the first toss,
another tail on the second toss, and another tail on the third toss of the coin. Or
we might get a tail, tail, and head, in that order. We use the multiplication formula
for counting outcomes (5–8). There are (2)(2)(2) or 8 possible results. These
results are shown in the following table.

Possible           Coin Toss            Number of
Result      First   Second   Third      Heads

   1          T       T        T          0
   2          T       T        H          1
   3          T       H        T          1
   4          T       H        H          2
   5          H       T        T          1
   6          H       T        H          2
   7          H       H        T          2
   8          H       H        H          3

Note that the outcome “zero heads” occurred only once, “one head” occurred
three times, “two heads” occurred three times, and the outcome “three heads”
occurred only once. That is, “zero heads” happened one out of eight times.
Thus, the probability of zero heads is one-eighth, the probability of one head is
three-eighths, and so on. The probability distribution is shown in Table 6–1.
Because one of these outcomes must happen, the total of the probabilities of all
possible events is 1.000. This is always true. The same information is shown in
Chart 6–1.

TABLE 6–1 Probability Distribution for the Events of Zero, One, Two, and Three Heads Showing
Face Up on Three Tosses of a Coin

Number of     Probability
Heads,        of Outcome,
x             P(x)

0             1/8 = .125
1             3/8 = .375
2             3/8 = .375
3             1/8 = .125
Total         8/8 = 1.000
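
The counting argument in the solution can be reproduced by listing the outcomes directly. The sketch below assumes a fair coin, so that the eight sequences are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All (2)(2)(2) = 8 equally likely sequences of three tosses.
outcomes = list(product("TH", repeat=3))

# Count how many sequences give each number of heads.
counts = Counter(seq.count("H") for seq in outcomes)

# Probability of x heads = (sequences with x heads) / (total sequences).
distribution = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x, p in distribution.items():
    print(f"x = {x}: P(x) = {p} = {float(p):.3f}")

print("Total:", sum(distribution.values()))
```

Exact fractions are used so that the results can be compared directly with the entries of Table 6–1.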


SELF-REVIEW 6–1

The possible outcomes of an experiment involving the roll of a six-sided die are a one-spot,
a two-spot, a three-spot, a four-spot, a five-spot, and a six-spot.
(a) Develop a probability distribution for the number of possible spots.
(b) Portray the probability distribution graphically.
(c) What is the sum of the probabilities?

RANDOM VARIABLES
In any experiment of chance, the outcomes occur randomly, so a variable that records the
outcome is called a random variable. For example, rolling a single die is an experiment: Any one of six possible
outcomes can occur. Some experiments result in outcomes that are measured with
quantitative variables (such as dollars, weight, or number of children), and other experi-
mental outcomes are measured with qualitative variables (such as color or religious
preference). A few examples will further illustrate what is meant by a random variable.
• The number of employees absent from the day shift on Monday. The number might
be 0, 1, 2, 3, . . . The number absent is the random variable.
• The hourly wage of a sample of 50 plumbers in Jacksonville, FL. The hourly wage is
the random variable.
• The number of defective lightbulbs produced in an hour at the Cleveland Electric
Company, Inc.
• The grade level (Freshman, Sophomore, Junior, or Senior) of the members of the
St. James High School Varsity girls’ basketball team. The grade level is the random
variable, and notice that it is a qualitative variable.
• The number of participants in the 2016 New York City Marathon.
• The daily number of drivers charged with driving under the influence of alcohol in
Brazoria County, Texas, last month.

A random variable is defined as follows:

LO6-2
Distinguish between
discrete and continuous
random variables.

RANDOM VARIABLE A variable measured or observed as the result of an
experiment. By chance, the variable can have different values.

Refer to the coin-tossing example in Table 6–1. We write the probability of x as P(x).
So the probability of zero heads is P(0 heads) = .125, and the probability of one head is
P(1 head) = .375, and so forth. The sum of these mutually exclusive probabilities is 1;
that is, from Table 6–1, .125 + .375 + .375 + .125 = 1.00.

CHART 6–1 Graphical Presentation of the Number of Heads Resulting from Three Tosses of a Coin
and the Corresponding Probability
[Bar chart: horizontal axis, Number of Heads (0, 1, 2, 3); vertical axis, Probability P(x), with ticks at 0, 1/8, 1/4, 3/8, and 1/2]


In Chapter 5 we defined the terms experiment, outcome, and event. Consider the
example we just described regarding the experiment of tossing a fair coin three times.
In this case the random variable is the number of heads that appear in the three tosses.
There are eight possible outcomes to this experiment. These outcomes are shown in
the following diagram.

Possible outcomes for three coin tosses

TTT
TTH, THT, HTT
THH, HTH, HHT
HHH

The event {one head} occurs and the random variable x = 1.

So, one possible outcome is that a tail appears on each toss: TTT. This single out-
come would describe the event of zero heads appearing in three tosses. Another pos-
sible outcome is a head followed by two tails: HTT. If we wish to determine the event of
exactly one head appearing in the three tosses, we must consider the three possible
outcomes: TTH, THT, and HTT. These three outcomes describe the event of exactly one
head appearing in three tosses.

In this experiment, the random variable is the number of heads in three tosses. The
random variable can have four different values, 0, 1, 2, or 3. The outcomes of the exper-
iment are unknown. But, using probability, we can compute the probability of a single
head in three tosses as 3/8 or 0.375. As shown in Chapter 5, the probability of each
value of the random variable can be computed to create a probability distribution for the
random variable, number of heads in three tosses of a coin.

There are two types of random variables: discrete and continuous.

Discrete Random Variable
A discrete random variable can assume only a certain number of separated values. For
example, the Bank of the Carolinas counts the number of credit cards carried for a group
of customers. The data are summarized with the following relative frequency table.

Number of Credit Cards Relative Frequency

0 .03
1 .10
2 .18
3 .21
4 or more .48
Total 1.00

In this frequency table, the number of cards carried is the discrete random variable.

DISCRETE RANDOM VARIABLE A random variable that can assume only certain
clearly separated values.

A discrete random variable can, in some cases, assume fractional or decimal val-
ues. To be a discrete random variable, these values must be separated—that is, have
distance between them. As an example, a department store offers coupons with dis-
counts of 10%, 15%, and 25%. In terms of probability, we could compute the probability
that a customer would use a 10% coupon versus a 15% or 25% coupon.


Continuous Random Variable
On the other hand, a continuous random variable can assume an infinite number of values
within a given range. It is measured on a continuous interval or ratio scale. Examples include:
• The times of commercial flights between Atlanta and Los Angeles are 4.67 hours,

5.13 hours, and so on. The random variable is the time in hours and is measured on
a continuous scale of time.

• The annual snowfall in Minneapolis, Minnesota. The random variable is the amount
of snow, measured on a continuous scale.

As with discrete random variables, the likelihoods of the values of a continuous random
variable can be summarized with a probability distribution. For example, with a probability
distribution for the flight time between Atlanta and Los Angeles, we could say that there is a
probability of 0.90 that the flight will be less than 4.5 hours. This also implies that there is a
probability of 0.10 that the flight will be more than 4.5 hours. With a probability distribution for
snowfall in Minneapolis, we could say that there is a probability of 0.25 that the annual snowfall
will exceed 48 inches. This also implies that there is a probability of 0.75 that annual snowfall
will be less than 48 inches. Notice that these examples refer to a continuous range of values.

THE MEAN, VARIANCE, AND STANDARD
DEVIATION OF A DISCRETE PROBABILITY
DISTRIBUTION
In Chapter 3, we discussed measures of location and variation for a frequency distribu-
tion. The mean reports the central location of the data, and the variance describes the
spread in the data. In a similar fashion, a probability distribution is summarized by its
mean and variance. We identify the mean of a probability distribution by the lowercase
Greek letter mu (μ) and the standard deviation by the lowercase Greek letter sigma (σ).

Mean
The mean is a typical value used to represent the central location of a probability distri-
bution. It also is the long-run average value of the random variable. The mean of a prob-
ability distribution is also referred to as its expected value. It is a weighted average
where the possible values of a random variable are weighted by their corresponding
probabilities of occurrence.

The mean of a discrete probability distribution is computed by the formula:

LO6-3
Compute the mean,
variance, and standard
deviation of a probability
distribution.

MEAN OF A PROBABILITY DISTRIBUTION μ = Σ[xP(x)] (6–1)

where P(x) is the probability of a particular value x. In other words, multiply each x value
by its probability of occurrence, and then add these products.

Variance and Standard Deviation
The mean is a typical value used to summarize a discrete probability distribution. However,
it does not describe the amount of spread (variation) in a distribution. The variance
does this. The formula for the variance of a probability distribution is:

VARIANCE OF A PROBABILITY DISTRIBUTION σ² = Σ[(x − μ)²P(x)] (6–2)

The computational steps are:

1. Subtract the mean from each value of the random variable, and square this difference.
2. Multiply each squared difference by its probability.
3. Sum the resulting products to arrive at the variance.


The standard deviation, σ, is found by taking the positive square root of σ²; that is,
σ = √σ²

An example will help explain the details of the calculation and interpretation of the
mean and standard deviation of a probability distribution.

E X A M P L E

John Ragsdale sells new cars for Pelican Ford.
John usually sells the largest number of cars
on Saturday. He has developed the following
probability distribution for the number of cars
he expects to sell on a particular Saturday.

Number of Cars Sold, x    Probability, P(x)
0                         .1
1                         .2
2                         .3
3                         .3
4                         .1
Total                     1.0

1. What type of distribution is this?
2. On a typical Saturday, how many cars does John expect to sell?
3. What is the variance of the distribution?

S O L U T I O N

1. This is a discrete probability distribution for the random variable called “number of
cars sold.” Note that John expects to sell only within a certain range of cars; he
does not expect to sell 5 cars or 50 cars. Further, he cannot sell half a car. He can
sell only 0, 1, 2, 3, or 4 cars. Also, the outcomes are mutually exclusive—he cannot
sell a total of both 3 and 4 cars on the same Saturday. The probabilities of the
possible outcomes sum to 1. Hence, these circumstances qualify as a probability
distribution.

2. The mean number of cars sold is computed by weighting the number of cars
sold by the probability of selling that number and adding or summing the prod-
ucts, using formula (6–1):

μ = Σ[xP(x)]
= 0(.1) + 1(.2) + 2(.3) + 3(.3) + 4(.1)
= 2.1

These calculations are summarized in the following table.

Number of Cars Sold, x    Probability, P(x)    x · P(x)
0                         .1                   0.0
1                         .2                   0.2
2                         .3                   0.6
3                         .3                   0.9
4                         .1                   0.4
Total                     1.0                  μ = 2.1

How do we interpret a mean of 2.1? This value indicates that, over a large num-
ber of Saturdays, John Ragsdale expects to sell a mean of 2.1 cars a day. Of
course, it is not possible for him to sell exactly 2.1 cars on any particular Satur-
day. However, the expected value can be used to predict the arithmetic mean
number of cars sold on Saturdays in the long run. For example, if John works
50 Saturdays during a year, he can expect to sell (50)(2.1) or 105 cars just on
Saturdays. Thus, the mean is sometimes called the expected value.

3. The following table illustrates the steps to calculate the variance using formula
(6–2). The first two columns repeat the probability distribution. In column three,
the mean is subtracted from each value of the random variable. In column four,
the differences from column three are squared. In the fifth column, each
squared difference in column four is multiplied by the corresponding probabil-
ity. The variance is the sum of the values in column five.

Number of Cars Sold, x    Probability, P(x)    (x − μ)    (x − μ)²    (x − μ)²P(x)
0                         .1                   0 − 2.1    4.41        0.441
1                         .2                   1 − 2.1    1.21        0.242
2                         .3                   2 − 2.1    0.01        0.003
3                         .3                   3 − 2.1    0.81        0.243
4                         .1                   4 − 2.1    3.61        0.361
                                                                      σ² = 1.290

Recall that the standard deviation, σ, is the positive square root of the variance. In
this example, σ = √σ² = √1.290 = 1.136 cars. How do we apply a standard deviation of
1.136 cars? If salesperson Rita Kirsch also sold a mean of 2.1 cars on Saturdays,
and the standard deviation in her sales was 1.91 cars, we would conclude that there
is more variability in the Saturday sales of Ms. Kirsch than in those of Mr. Ragsdale
(because 1.91 > 1.136).
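
The calculations in this example can also be checked with a few lines of code. The following is a minimal Python sketch, not part of the text's software instructions; the function name `summarize` and the dictionary layout are illustrative assumptions. It applies formulas (6–1) and (6–2) directly to John Ragsdale's distribution.

```python
import math

def summarize(dist):
    """Return (mean, variance, standard deviation) of a discrete distribution.

    `dist` maps each value x to its probability P(x). The name and layout
    are illustrative choices, not notation from the text.
    """
    if not math.isclose(sum(dist.values()), 1.0):
        raise ValueError("probabilities must sum to 1")
    mean = sum(x * p for x, p in dist.items())                    # formula (6-1)
    variance = sum((x - mean) ** 2 * p for x, p in dist.items())  # formula (6-2)
    return mean, variance, math.sqrt(variance)

# Probability distribution for the number of cars sold on a Saturday.
cars_sold = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.3, 4: 0.1}
mean, variance, std_dev = summarize(cars_sold)
print(f"mean = {mean:.1f}, variance = {variance:.3f}, std dev = {std_dev:.3f}")
```

The printed values agree with the tables above: a mean of 2.1, a variance of 1.290, and a standard deviation of 1.136 cars.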

S E L F - R E V I E W 6–2

The Pizza Palace offers three sizes of cola. The smallest size sells for $1.99, the medium for
$2.49, and the large for $2.89. Thirty percent of the drinks sold are small, 50% are medium,
and 20% are large. Create a probability distribution for the random variable price and
answer the following questions.
(a) Is this a discrete probability distribution? Indicate why or why not.
(b) Compute the mean amount charged for a cola.
(c) What is the variance in the amount charged for a cola? The standard deviation?

E X E R C I S E S

1. Compute the mean and variance of the following discrete probability distribution.

x P(x)

0 .2
1 .4
2 .3
3 .1

2. Compute the mean and variance of the following discrete probability distribution.

x P(x)

2 .5
8 .3
10 .2

3. Compute the mean and variance of the following probability distribution.

x P(x)

5 .1
10 .3
15 .2
20 .4

4. Which of these variables are discrete and which are continuous random variables?
a. The number of new accounts established by a salesperson in a year.
b. The time between customer arrivals to a bank ATM.
c. The number of customers in Big Nick’s barber shop.
d. The amount of fuel in your car’s gas tank.
e. The number of minorities on a jury.
f. The outside temperature today.

5. The information below is the number of daily emergency service calls made
by the volunteer ambulance service of Walterboro, South Carolina, for the last 50
days. To explain, there were 22 days on which there were two emergency calls,
and 9 days on which there were three emergency calls.

Number of Calls Frequency

0 8
1 10
2 22
3 9
4 1

Total 50

a. Convert this information on the number of calls to a probability distribution.
b. Is this an example of a discrete or continuous probability distribution?
c. What is the mean number of emergency calls per day?
d. What is the standard deviation of the number of calls made daily?

6. The director of admissions at Kinzua University in Nova Scotia estimated the
distribution of student admissions for the fall semester on the basis of past experi-
ence. What is the expected number of admissions for the fall semester? Compute
the variance and the standard deviation of the number of admissions.

Admissions Probability

1,000 .6
1,200 .3
1,500 .1

7. Belk Department Store is having a special sale this weekend. Customers
charging purchases of more than $50 to their Belk credit card will be given a spe-
cial Belk Lottery card. The customer will scratch off the card, which will indicate the
amount to be taken off the total amount of the purchase. Listed below are the
amount of the prize and the percent of the time that amount will be deducted from
the total amount of the purchase.

Prize Amount Probability

$ 10 .50
25 .40
50 .08
100 .02

a. What is the mean amount deducted from the total purchase amount?
b. What is the standard deviation of the amount deducted from the total purchase?

8. The Downtown Parking Authority of Tampa, Florida, reported the following
information for a sample of 250 customers on the number of hours cars are parked
and the amount they are charged.

Number of Hours Frequency Amount Charged

1 20 $ 3
2 38 6
3 53 9
4 45 12
5 40 14
6 13 16
7 5 18
8 36 20

250

a. Convert the information on the number of hours parked to a probability distribu-
tion. Is this a discrete or a continuous probability distribution?

b. Find the mean and the standard deviation of the number of hours parked. How
would you answer the question: How long is a typical customer parked?

c. Find the mean and the standard deviation of the amount charged.

BINOMIAL PROBABILITY DISTRIBUTION
The binomial probability distribution is a widely occurring discrete probability distribu-
tion. To describe experimental outcomes with a binomial distribution, there are four re-
quirements. The first requirement is there are only two possible outcomes on a particular
experimental trial. For example, on a test, a true/false question is either answered cor-
rectly or incorrectly. In a resort, a housekeeping supervisor reviews an employee’s work
and evaluates it as acceptable or unacceptable. A key characteristic of the two out-
comes is that they must be mutually exclusive. This means that the answer to a true/
false question must be either correct or incorrect but cannot be both correct and incor-
rect at the same time. Another example is the outcome of a sales call. Either a customer
purchases or does not purchase the product, but the sale cannot result in both out-
comes. Frequently, we refer to the two possible outcomes of a binomial experiment as
a “success” and a “failure.” However, this distinction does not imply that one outcome is
good and the other is bad, only that there are two mutually exclusive outcomes.

The second binomial requirement is that the random variable is the number of suc-
cesses for a fixed and known number of trials. For example, we flip a coin five times and
count the number of times a head appears in the five flips, we randomly select 10 em-
ployees and count the number who are older than 50 years of age, or we randomly se-
lect 20 boxes of Kellogg’s Raisin Bran and count the number that weigh more than the
amount indicated on the package. In each example, we count the number of successes
from the fixed number of trials.

A third requirement is that we know the probability of a success and it is the same
for each trial. Three examples are:

• For a test with 10 true/false questions, we know there are 10 trials and the proba-
bility of correctly guessing the answer for any of the 10 trials is 0.5. Or, for a test
with 20 multiple-choice questions with four options and only one correct answer,
we know that there are 20 trials and the probability of randomly guessing the
correct answer for each of the 20 trials is 0.25.

LO6-4
Explain the assumptions of
the binomial distribution
and apply it to calculate
probabilities.


• Bones Albaugh is a Division I college
basketball player who makes 70% of
his foul shots. If he has five opportuni-
ties in tonight’s game, the likelihood he
will be successful on each of the five
attempts is 0.70.

• In a recent poll, 18% of adults indicated
a Snickers bar was their favorite candy
bar. We select a sample of 15 adults
and ask each for his or her favorite
candy bar. The likelihood a Snickers
bar is the answer for each adult is 0.18.

The final requirement of a binomial probability distribution is that each trial is indepen-
dent of any other trial. Independent means there is no pattern to the trials. The outcome of
a particular trial does not affect the outcome of any other trial. Two examples are:

• A young family has two children, both boys. The probability of a third birth being a
boy is still .50. That is, the gender of the third child is independent of the gender of
the other two.

• Suppose 20% of the patients served in the emergency room at Waccamaw Hospital
do not have insurance. If the second patient served on the afternoon shift today did
not have insurance, that does not affect the probability the third, the tenth, or any of
the other patients will or will not have insurance.

BINOMIAL PROBABILITY EXPERIMENT

1. An outcome on each trial of an experiment is classified into one of two mutually
exclusive categories: a success or a failure.
2. The random variable is the number of successes in a fixed number of trials.
3. The probability of success is the same for each trial.
4. The trials are independent, meaning that the outcome of one trial does not affect
the outcome of any other trial.

How Is a Binomial Probability Computed?
To construct a particular binomial probability, we use (1) the number of trials and (2) the
probability of success on each trial. For example, if the Hannah Landscaping Company
plants 10 Norfolk pine trees today knowing that 90% of these trees survive, we can
compute the binomial probability that exactly 8 trees survive. In this case the number of
trials is the 10 trees, the probability of success is .90, and the number of successes is
eight. In fact, we can compute a binomial probability for any number of successes from
0 to 10 surviving trees.

A binomial probability is computed by the formula:

BINOMIAL PROBABILITY FORMULA $P(x) = {}_{n}C_{x}\,\pi^{x}(1 - \pi)^{n-x}$ (6–3)

where:
C denotes a combination.
n is the number of trials.
x is the random variable defined as the number of successes.
π is the probability of a success on each trial.

We use the Greek letter π (pi) to denote a binomial population parameter. Do not con-
fuse it with the mathematical constant 3.1416.


E X A M P L E

There are five flights daily from Pittsburgh via American Airlines into the Bradford
Regional Airport in Bradford, Pennsylvania. Suppose the probability that any flight
arrives late is .20. What is the probability that none of the flights are late today?
What is the probability that exactly one of the flights is late today?

S O L U T I O N

We can use formula (6–3). The probability that a particular flight is late is .20, so let
π = .20. There are five flights, so n = 5, and X, the random variable, refers to the
number of successes. In this case, a “success” is a flight that arrives late. The ran-
dom variable, x, can be equal to 0 late flights in the five trials, 1 late flight in the five
trials, or 2, 3, 4, or 5. The probability for no late arrivals, x = 0, is,

$P(0) = {}_{n}C_{x}(\pi)^{x}(1 - \pi)^{n-x} = {}_{5}C_{0}(.20)^{0}(1 - .20)^{5-0} = (1)(1)(.3277) = .3277$

The probability that exactly one of the five flights will arrive late today is .4096,
found by

$P(1) = {}_{n}C_{x}(\pi)^{x}(1 - \pi)^{n-x} = {}_{5}C_{1}(.20)^{1}(1 - .20)^{5-1} = (5)(.20)(.4096) = .4096$

The entire binomial probability distribution with π = .20 and n = 5 is shown in the
following bar chart. We observe that the probability of exactly three late flights is
.0512 and from the bar chart that the distribution of the number of late arrivals is
positively skewed.

[Bar chart: Probability Distribution for the Number of Late Flights. Vertical axis: Probability (.00 to .45); horizontal axis: Number of Late Flights. Bar heights: 0 late flights, 0.3277; 1, 0.4096; 2, 0.2048; 3, 0.0512; 4, 0.0064; 5, 0.0003.]
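
The probabilities in this distribution can be reproduced with a short program. The sketch below is one possible Python version of formula (6–3); the function name `binomial_pmf` is an illustrative assumption, and `math.comb` from the standard library supplies the combination nCx.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """Formula (6-3): P(x) = nCx * pi**x * (1 - pi)**(n - x)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

# Five daily flights, each late with probability .20.
n, pi = 5, 0.20
late_flights = {x: binomial_pmf(x, n, pi) for x in range(n + 1)}
for x, p in late_flights.items():
    print(f"P({x}) = {p:.4f}")
```

Rounded to four decimal places, the six values match the bar heights: .3277, .4096, .2048, .0512, .0064, and .0003.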


The mean (μ) and the variance (σ²) of a binomial distribution are computed in a
“shortcut” fashion by:

MEAN OF A BINOMIAL DISTRIBUTION μ = nπ (6–4)

VARIANCE OF A BINOMIAL DISTRIBUTION σ² = nπ(1 − π) (6–5)

For the example regarding the number of late flights, recall that π = .20 and n = 5.
Hence:
μ = nπ = (5)(.20) = 1.0

σ² = nπ(1 − π) = 5(.20)(1 − .20) = .80

The mean of 1.0 and the variance of .80 can be verified from formulas (6–1) and
(6–2). The probability distribution from the bar chart shown earlier and the details of the
calculations are shown below. Because the probabilities in the table are rounded to four
decimal places, the variance computed from them is .7997, which rounds to .80.

Number of Late Flights, x    P(x)      xP(x)     x − μ    (x − μ)²    (x − μ)²P(x)
0                            0.3277    0.0000    −1       1           0.3277
1                            0.4096    0.4096    0        0           0
2                            0.2048    0.4096    1        1           0.2048
3                            0.0512    0.1536    2        4           0.2048
4                            0.0064    0.0256    3        9           0.0576
5                            0.0003    0.0015    4        16          0.0048
                                       μ = 1.0000                     σ² = 0.7997
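
The same verification can be carried out without rounding the probabilities first. The following Python sketch (variable names are illustrative assumptions) computes the mean and variance both ways: from the shortcut formulas (6–4) and (6–5), and from the general definitions (6–1) and (6–2) applied to unrounded binomial probabilities.

```python
from math import comb, isclose

n, pi = 5, 0.20

# Unrounded binomial probabilities from formula (6-3).
pmf = [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]

# General definitions, formulas (6-1) and (6-2).
mean_def = sum(x * p for x, p in enumerate(pmf))
var_def = sum((x - mean_def) ** 2 * p for x, p in enumerate(pmf))

# Shortcut formulas (6-4) and (6-5).
mean_short = n * pi
var_short = n * pi * (1 - pi)

print(f"mean:     definition {mean_def:.4f}, shortcut {mean_short:.4f}")
print(f"variance: definition {var_def:.4f}, shortcut {var_short:.4f}")
print("agree:", isclose(mean_def, mean_short) and isclose(var_def, var_short))
```

With unrounded probabilities the two approaches agree, which suggests that the small gap between .7997 and .80 in the table comes only from rounding each P(x) to four decimal places.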

Binomial Probability Tables
Formula (6–3) can be used to build a binomial probability distribution for any value of
n and π. However, for a larger n, the calculations take more time. For convenience,
the tables in Appendix B.1 show the result of using the formula for various values of
n and π. Table 6–2 shows part of Appendix B.1 for n = 6 and various values of π.

TABLE 6–2 Binomial Probabilities for n = 6 and Selected Values of π

n = 6
Probability

x\π .05 .1 .2 .3 .4 .5 .6 .7 .8 .9 .95
0 .735 .531 .262 .118 .047 .016 .004 .001 .000 .000 .000
1 .232 .354 .393 .303 .187 .094 .037 .010 .002 .000 .000
2 .031 .098 .246 .324 .311 .234 .138 .060 .015 .001 .000
3 .002 .015 .082 .185 .276 .313 .276 .185 .082 .015 .002
4 .000 .001 .015 .060 .138 .234 .311 .324 .246 .098 .031
5 .000 .000 .002 .010 .037 .094 .187 .303 .393 .354 .232
6 .000 .000 .000 .001 .004 .016 .047 .118 .262 .531 .735
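
A table of this kind can be generated rather than looked up. The sketch below is one way to do so in Python; the helper names `binomial_table` and `round3` are assumptions made for illustration. It rounds half up with the `decimal` module so that an entry such as .3125 is reported as .313, the convention the printed table appears to follow.

```python
from decimal import Decimal, ROUND_HALF_UP
from math import comb

def binomial_table(n, pis):
    """Rows of P(x) for x = 0..n, with one column per probability in `pis`."""
    return [
        [comb(n, x) * pi**x * (1 - pi) ** (n - x) for pi in pis]
        for x in range(n + 1)
    ]

def round3(p):
    """Round a probability to three decimal places, rounding halves up."""
    return Decimal(str(p)).quantize(Decimal("0.001"), rounding=ROUND_HALF_UP)

pis = [0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
table = binomial_table(6, pis)

print("x     " + " ".join(f"{pi:>5}" for pi in pis))
for x, row in enumerate(table):
    print(f"{x:<5} " + " ".join(str(round3(p)) for p in row))
```

Changing the first argument of `binomial_table` produces the corresponding table for any other number of trials.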

E X A M P L E

In the Southwest, 5% of all cell phone calls are dropped. What is the probability that
out of six randomly selected calls, none was dropped? Exactly one? Exactly two?
Exactly three? Exactly four? Exactly five? Exactly six out of six?


S O L U T I O N

The binomial conditions are met: (a) there are only two possible outcomes (a
particular call is either dropped or not dropped), (b) there are a fixed number of
trials (6), (c) there is a constant probability of success (.05), and (d) the trials are
independent.

Refer to Table 6–2 on the previous page for the probability of exactly zero
dropped calls. Go down the left margin to an x of 0. Now move horizontally to
the column headed by a π of .05 to find the probability. It is .735. The values in
Table 6–2 are rounded to three decimal places.

The probability of exactly one dropped call in a sample of six calls is .232.
The complete binomial probability distribution for n = 6 and π = .05 is:

Number of Dropped Calls, x    Probability of Occurrence, P(x)
0                             .735
1                             .232
2                             .031
3                             .002
4                             .000
5                             .000
6                             .000

Of course, there is a slight chance of getting exactly five dropped calls out of
six random selections. It is .00000178, found by inserting the appropriate values
in the binomial formula:

$P(5) = {}_{6}C_{5}(.05)^{5}(.95)^{1} = (6)(.05)^{5}(.95) = .00000178$

For six out of the six, the exact probability is .000000016. Thus, the probability is
very small that five or six calls will be dropped in six trials.

We can compute the mean, or expected value, and the variance of the distribution
of the number of dropped calls:

μ = nπ = (6)(.05) = 0.30

σ² = nπ(1 − π) = 6(.05)(.95) = 0.285

S E L F - R E V I E W 6–3

Ninety-five percent of the employees at the J. M. Smucker Company plant on Laskey
Road have their bimonthly wages sent directly to their bank by electronic funds trans-
fer. This is also called direct deposit. Suppose we select a random sample of seven
employees.
(a) Does this situation fit the assumptions of the binomial distribution?
(b) What is the probability that all seven employees use direct deposit?
(c) Use formula (6–3) to determine the exact probability that four of the seven sampled
employees use direct deposit.
(d) Use Excel to verify your answers to parts (b) and (c).

Appendix B.1 is limited. It gives probabilities for n values from 1 to 15 and π values
of .05, .10, . . . , .90, and .95. A software program can generate the probabilities for a
specified number of successes, given n and π. The Excel output on the next page shows
the probability when n = 40 and π = .09. Note that the number of successes stops at 15
because the probabilities for 16 to 40 are very close to 0. The instructions are detailed
in the Software Commands in Appendix C.
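
Any general-purpose language can play the same role as the spreadsheet. As a hedged illustration, and not a reproduction of the Excel commands in Appendix C, the Python sketch below lists the binomial probabilities for n = 40 and π = .09 and checks how little probability remains beyond 15 successes.

```python
from math import comb

n, pi = 40, 0.09

# Binomial probabilities for every possible number of successes, 0 through 40.
pmf = [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]

print(" x    P(x)")
for x in range(16):  # the listing stops at 15 successes
    print(f"{x:2d}  {pmf[x]:.4f}")

# Probability left over for 16 to 40 successes.
tail = sum(pmf[16:])
print(f"P(x >= 16) = {tail:.2e}")
```

The leftover probability for 16 or more successes is far below .0001, which is why a listing can reasonably stop at 15.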


Several additional points should be made regarding the binomial probability
distribution.

1. If n remains the same but π increases from .05 to .95, the shape of the distribution
changes. Look at Table 6–3 and Chart 6–2. The distribution for a π of .05 is posi-
tively skewed. As π approaches .50, the distribution becomes symmetrical. As π
goes beyond .50 and moves toward .95, the probability distribution becomes
negatively skewed. Table 6–3 highlights probabilities for n = 10 and a π of .05,
.10, .20, .50, and .70. The graphs of these probability distributions are shown in
Chart 6–2.

2. If π, the probability of success, remains the same but n becomes larger, the shape
of the binomial distribution becomes more symmetrical. Chart 6–3 shows a situa-
tion where π remains constant at .10 but n increases from 7 to 40.
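
Both points can be explored numerically. The sketch below assumes one particular measure of skewness, the third standardized moment Σ[((x − μ)/σ)³P(x)], which is an assumption of this illustration rather than a formula given in this chapter. A positive value indicates positive skew, a negative value indicates negative skew, and a value near zero indicates symmetry.

```python
from math import comb, sqrt

def binomial_pmf(x, n, pi):
    """Formula (6-3)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

def skewness(n, pi):
    """Third standardized moment of a binomial distribution, from its probabilities."""
    pmf = [binomial_pmf(x, n, pi) for x in range(n + 1)]
    mean = sum(x * p for x, p in enumerate(pmf))
    std_dev = sqrt(sum((x - mean) ** 2 * p for x, p in enumerate(pmf)))
    return sum(((x - mean) / std_dev) ** 3 * p for x, p in enumerate(pmf))

# Point 1: n fixed at 10 while pi increases.
for pi in (0.05, 0.10, 0.20, 0.50, 0.70, 0.95):
    print(f"n = 10, pi = {pi:.2f}: skewness = {skewness(10, pi):+.3f}")

# Point 2: pi fixed at .10 while n increases.
for n in (7, 12, 20, 40):
    print(f"n = {n:2d}, pi = 0.10: skewness = {skewness(n, 0.10):+.3f}")
```

Under this measure the skewness falls from positive through zero to negative as π rises, and it shrinks toward zero as n grows with π held at .10.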

[Chart: five panels plotting P(x) against x, the number of successes from 0 to 10, for n = 10 and π = .05, .10, .20, .50, and .70]

CHART 6–2 Graphing the Binomial Probability Distribution for a π of .05, .10, .20, .50, and .70, and
an n of 10

TABLE 6–3 Probability of 0, 1, 2, . . . Successes for a π of .05, .10, .20, .50, and .70, and an n of 10

x \π .05 .1 .2 .3 .4 .5 .6 .7 .8 .9 .95
0 .599 .349 .107 .028 .006 .001 .000 .000 .000 .000 .000
1 .315 .387 .268 .121 .040 .010 .002 .000 .000 .000 .000
2 .075 .194 .302 .233 .121 .044 .011 .001 .000 .000 .000
3 .010 .057 .201 .267 .215 .117 .042 .009 .001 .000 .000
4 .001 .011 .088 .200 .251 .205 .111 .037 .006 .000 .000
5 .000 .001 .026 .103 .201 .246 .201 .103 .026 .001 .000
6 .000 .000 .006 .037 .111 .205 .251 .200 .088 .011 .001
7 .000 .000 .001 .009 .042 .117 .215 .267 .201 .057 .010
8 .000 .000 .000 .001 .011 .044 .121 .233 .302 .194 .075
9 .000 .000 .000 .000 .002 .010 .040 .121 .268 .387 .315
10 .000 .000 .000 .000 .000 .001 .006 .028 .107 .349 .599

[Chart: four panels plotting P(x) against the number of successes (x) for π = .10 and n = 7, 12, 20, and 40]

CHART 6–3 Chart Representing the Binomial Probability Distribution for a π of .10 and an n of 7, 12,
20, and 40

E X E R C I S E S

9. In a binomial situation, n = 4 and π = .25. Determine the probabilities of the following
events using the binomial formula.
a. x = 2
b. x = 3

10. In a binomial situation, n = 5 and π = .40. Determine the probabilities of the following
events using the binomial formula.
a. x = 1
b. x = 2

11. Assume a binomial distribution where n = 3 and π = .60.
a. Refer to Appendix B.1, and list the probabilities for values of x from 0 to 3.
b. Determine the mean and standard deviation of the distribution from the general
definitions given in formulas (6–1) and (6–2).

12. Assume a binomial distribution where n = 5 and π = .30.
a. Refer to Appendix B.1 and list the probabilities for values of x from 0 to 5.
b. Determine the mean and standard deviation of the distribution from the general
definitions given in formulas (6–1) and (6–2).

13. An American Society of Investors survey found 30% of individual investors have
used a discount broker. In a random sample of nine individuals, what is the
probability:
a. Exactly two of the sampled individuals have used a discount broker?
b. Exactly four of them have used a discount broker?
c. None of them has used a discount broker?

14. The U.S. Postal Service reports 95% of first-class mail within the same city is
delivered within 2 days of the time of mailing. Six letters are randomly sent to different
locations.
a. What is the probability that all six arrive within 2 days?
b. What is the probability that exactly five arrive within 2 days?
c. Find the mean number of letters that will arrive within 2 days.
d. Compute the variance and standard deviation of the number that will arrive
within 2 days.

15. Industry standards suggest that 10% of new vehicles require warranty service
within the first year. Jones Nissan in Sumter, South Carolina, sold 12 Nissans
yesterday.
a. What is the probability that none of these vehicles requires warranty service?
b. What is the probability exactly one of these vehicles requires warranty service?
c. Determine the probability that exactly two of these vehicles require warranty
service.
d. Compute the mean and standard deviation of this probability distribution.

16. A telemarketer makes six phone calls per hour and is able to make a sale on
30% of these contacts. During the next 2 hours, find:
a. The probability of making exactly four sales.
b. The probability of making no sales.
c. The probability of making exactly two sales.
d. The mean number of sales in the 2-hour period.

17. A recent survey by the American Accounting Association revealed 23% of
students graduating with a major in accounting select public accounting. Suppose
we select a sample of 15 recent graduates.
a. What is the probability two select public accounting?
b. What is the probability five select public accounting?
c. How many graduates would you expect to select public accounting?

18. It is reported that 41% of American households use a cell phone exclusively
for their telephone service. In a sample of eight households,
a. Find the probability that no household uses a cell phone as their exclusive
telephone service.
b. Find the probability that exactly 5 households exclusively use a cell phone for
telephone service.
c. Find the mean number of households exclusively using cell phones.

Cumulative Binomial Probability Distributions
We may wish to know the probability of correctly guessing the answers to 6 or more
true/false questions out of 10. Or we may be interested in the probability of selecting
fewer than two defectives at random from production during the previous hour. In these
cases, we need cumulative frequency distributions similar to the ones developed in
the Cumulative Distribution section of Chapter 2 on page 38. The following example
will illustrate.

E X A M P L E

A study by the Illinois Department of Transportation concluded that 76.2% of front
seat occupants used seat belts. That is, both occupants of the front seat were using
their seat belts. Suppose we decide to compare that information with current usage.
We select a sample of 12 vehicles.

1. What is the probability the front seat occupants in exactly 7 of the 12 vehicles
selected are wearing seat belts?

2. What is the probability the front seat occupants in at least 7 of the 12 vehicles
are wearing seat belts?

S O L U T I O N

This situation meets the binomial requirements.

• In a particular vehicle, both the front seat occupants are either wearing seat
belts or they are not. There are only two possible outcomes.

• There are a fixed number of trials, 12 in this case, because 12 vehicles are
checked.

• The probability of a “success” (occupants wearing seat belts) is the same from
one vehicle to the next: 76.2%.

• The trials are independent. If the fourth vehicle selected in the sample has all
the occupants wearing their seat belts, this does not have any effect on the
results for the fifth or tenth vehicle.

To find the likelihood the occupants of exactly 7 of the sampled vehicles are wear-
ing seat belts, we use formula (6–3). In this case, n = 12 and π = .762.

$P(x = 7) = {}_{12}C_{7}(.762)^{7}(1 - .762)^{12-7} = 792(.149171)(.000764) = .0902$

So we conclude the likelihood that the occupants of exactly 7 of the 12 sampled
vehicles will be wearing their seat belts is about 9%.

To find the probability that the occupants in seven or more of the vehicles
will be wearing seat belts, we use formula (6–3) from this chapter as well as
the special rule of addition from the previous chapter. See formula (5-2) on
page 141.

Because the events are mutually exclusive (meaning that a particular
sample of 12 vehicles cannot have both a total of 7 and a total of 8 vehi-
cles where the occupants are wearing seat belts), we find the probability of
7 vehicles where the occupants are wearing seat belts, the probability of 8,
and so on up to the probability that occupants of all 12 sample vehicles
are wearing seat belts. The probability of each of these outcomes is then
totaled.

P(x ≥ 7) = P(x = 7) + P(x = 8) + P(x = 9) + P(x = 10) + P(x = 11) + P(x = 12)
= .0902 + .1805 + .2569 + .2467 + .1436 + .0383
= .9562

So the probability of selecting 12 cars and finding that the occupants of 7 or more
vehicles were wearing seat belts is .9562. This information is shown on the
following Excel spreadsheet. There is a slight difference in the software answer
due to rounding. The Excel commands are similar to those detailed in the Software
Commands in Appendix C.
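
The cumulative probability can also be obtained by summing formula (6–3) in code. The following Python sketch is an illustration only, with assumed function names `binomial_pmf` and `prob_at_least`; it is not the Excel procedure referred to above.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """Formula (6-3)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

def prob_at_least(k, n, pi):
    """P(x >= k): add the probabilities of the mutually exclusive outcomes k..n."""
    return sum(binomial_pmf(x, n, pi) for x in range(k, n + 1))

# Twelve vehicles, each with probability .762 that the occupants wear seat belts.
n, pi = 12, 0.762
p_exactly_7 = binomial_pmf(7, n, pi)
p_at_least_7 = prob_at_least(7, n, pi)
p_at_most_6 = 1 - p_at_least_7  # complement rule

print(f"P(x = 7)  = {p_exactly_7:.4f}")
print(f"P(x >= 7) = {p_at_least_7:.4f}")
print(f"P(x <= 6) = {p_at_most_6:.4f}")
```

Because the code adds unrounded terms, its cumulative result can differ from .9562 in the fourth decimal place, the same kind of rounding difference noted for the software answer.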

S E L F - R E V I E W 6–4

A recent study revealed that 40% of women in the San Diego metropolitan area who work
full time also volunteer in the community. Suppose we randomly select eight women in the
San Diego area.
(a) What are the values for n and π?
(b) What is the probability exactly three of the women volunteer in the community?
(c) What is the probability at least one of the women volunteers in the community?

E X E R C I S E S

19. In a binomial distribution, n = 8 and π = .30. Find the probabilities of the following
events.
a. x = 2.
b. x ≤ 2 (the probability that x is equal to or less than 2).
c. x ≥ 3 (the probability that x is equal to or greater than 3).

20. In a binomial distribution, n = 12 and π = .60. Find the following probabilities.
a. x = 5.
b. x ≤ 5.
c. x ≥ 6.

21. In a recent study, 90% of the homes in the United States were found to have
large-screen TVs. In a sample of nine homes, what is the probability that:
a. All nine have large-screen TVs?
b. Less than five have large-screen TVs?
c. More than five have large-screen TVs?
d. At least seven homes have large-screen TVs?

22. A manufacturer of window frames knows from long experience that 5% of the
production will have some type of minor defect that will require an adjustment.
What is the probability that in a sample of 20 window frames:
a. None will need adjustment?
b. At least one will need adjustment?
c. More than two will need adjustment?

23. The speed with which utility companies can resolve problems is very important.
GTC, the Georgetown Telephone Company, reports it can resolve customer
problems the same day they are reported in 70% of the cases. Suppose the
15 cases reported today are representative of all complaints.
a. How many of the problems would you expect to be resolved today? What is the
standard deviation?
b. What is the probability 10 of the problems can be resolved today?
c. What is the probability 10 or 11 of the problems can be resolved today?
d. What is the probability more than 10 of the problems can be resolved today?

24. It is asserted that 80% of the cars approaching an individual toll booth in New
Jersey are equipped with an E-ZPass transponder. Find the probability that in a
sample of six cars:
a. All six will have the transponder.
b. At least three will have the transponder.
c. None will have a transponder.

HYPERGEOMETRIC PROBABILITY DISTRIBUTION
For the binomial distribution to be applied, the probability of a success must stay the
same for each trial. For example, the probability of guessing the correct answer to a
true/false question is .50. This probability remains the same for each question on an
examination. Likewise, suppose that 40% of the registered voters in a precinct are
Republicans. If 27 registered voters are selected at random, the probability of choosing
a Republican on the first selection is .40. The chance of choosing a Republican on the
next selection is also .40, assuming that the sampling is done with replacement, mean-
ing that the person selected is put back in the population before the next person is
selected.

LO6-5
Explain the assumptions
of the hypergeometric
distribution and apply it to
calculate probabilities.

Most sampling, however, is done without replacement. Thus, if the population is
small, the probability of a success will change for each observation. For example, if
the population consists of 20 items, the probability of selecting a particular item
from that population is 1/20. If the sampling is done without replacement, after
the first selection there are only 19 items remaining; the probability of selecting a
particular item on the second selection is only 1/19. For the third selection, the
probability is 1/18, and so on. This assumes that the population is finite—that is, the
number in the population is known and relatively small in number. Examples of a
finite population are 2,842 Republicans in the precinct, 9,241 applications for med-
ical school, and the eighteen Dakota 4x4 Crew Cabs at Helfman Dodge Chrysler
Jeep in Houston, Texas.

Recall that one of the criteria for the binomial distribution is that the probabil-
ity of success remains the same from trial to trial. Because the probability of
success does not remain the same from trial to trial when sampling is from a
relatively small population without replacement, the binomial distribution should not
be used. Instead, the hypergeometric distribution is applied. Therefore, (1) if a
sample is selected from a finite population without replacement and (2) if the size of
the sample n is more than 5% of the size of the population N, then the hypergeo-
metric distribution is used to determine the probability of a specified number of
successes or failures. It is especially appropriate when the size of the population
is small.

The formula for the hypergeometric distribution is:

HYPERGEOMETRIC DISTRIBUTION

\[ P(x) = \frac{({}_{S}C_{x})\,({}_{N-S}C_{n-x})}{{}_{N}C_{n}} \]  [6–6]

where:

N is the size of the population.
S is the number of successes in the population.
x is the number of successes in the sample. It may be 0, 1, 2, 3, . . . .
n is the size of the sample or the number of trials.
C is the symbol for a combination.
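
Formula (6–6) translates directly into a few lines of code. The following is a minimal sketch in Python, assuming the standard library function `math.comb` plays the role of the combination symbol C. The function name `hypergeom_pmf` and the illustrative values (a population of 20 items with 8 successes and a sample of 4) are our own and do not come from the text.

```python
from math import comb


def hypergeom_pmf(x: int, n: int, S: int, N: int) -> float:
    """P(x) from formula (6-6): x successes in a sample of n drawn without
    replacement from a population of N items that contains S successes."""
    # Outcomes that cannot occur get probability zero.
    if x < 0 or x > n or x > S or n - x > N - S:
        return 0.0
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


# Illustrative (assumed) values: N = 20, S = 8, n = 4, x = 2.
p_two = hypergeom_pmf(2, n=4, S=8, N=20)
total = sum(hypergeom_pmf(x, n=4, S=8, N=20) for x in range(5))
print(f"P(2) = {p_two:.4f}, sum over all x = {total:.4f}")
```

Because the function returns zero for impossible outcomes, summing it over every x from 0 to n gives a quick check that the probabilities total 1.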

In summary, a hypergeometric probability distribution has these characteristics:

HYPERGEOMETRIC PROBABILITY EXPERIMENT

1. An outcome on each trial of an experiment is classified into one of two mutually
exclusive categories—a success or a failure.

2. The random variable is the number of successes in a fixed number of trials.
3. The trials are not independent.
4. We assume that we sample from a finite population without replacement and
n/N > 0.05. So, the probability of a success changes for each trial.

The following example illustrates the details of determining a probability using the
hypergeometric distribution.


E X A M P L E

PlayTime Toys Inc. employs 50 people in the
Assembly Department. Forty of the employ-
ees belong to a union and 10 do not. Five
employees are selected at random to form a
committee to meet with management re-
garding shift starting times. What is the
probability that four of the five selected for
the committee belong to a union?

S O L U T I O N

The population in this case is the 50 Assembly Department employees. An employee
can be selected for the committee only once. Hence, the sampling is done without
replacement. Thus, the probability of selecting a union employee, for example,
changes from one trial to the next. The hypergeometric distribution is appropriate for
determining the probability. In this problem,

N is 50, the number of employees.
S is 40, the number of union employees.
x is 4, the number of union employees selected.
n is 5, the number of employees selected.

We wish to find the probability 4 of the 5 committee members belong to a
union. Inserting these values into formula (6–6):

\[ P(4) = \frac{({}_{40}C_{4})\,({}_{50-40}C_{5-4})}{{}_{50}C_{5}} = \frac{(91{,}390)(10)}{2{,}118{,}760} = .431 \]

Thus, the probability of selecting 5 assembly workers at random from the 50 work-
ers and finding 4 of the 5 are union members is .431.


Table 6–4 shows the hypergeometric probabilities of finding 0, 1, 2, 3, 4, and 5
union members on the committee.

Union Members Probability

0 .000
1 .004
2 .044
3 .210
4 .431
5 .311
1.000

TABLE 6–4 Hypergeometric Probabilities (n = 5, N = 50, and S = 40) for the Number of Union Members
on the Committee
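
Table 6–4 can be reproduced by evaluating formula (6–6) for each possible number of union members. The sketch below is one way to do so in Python, assuming `math.comb` for the combinations; the helper name `hypergeom_pmf` is hypothetical.

```python
from math import comb

N, S, n = 50, 40, 5  # employees, union members, committee size


def hypergeom_pmf(x: int, n: int, S: int, N: int) -> float:
    """Formula (6-6) for x successes in a sample of n."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


# One probability for each possible number of union members, 0 through 5.
table = {x: hypergeom_pmf(x, n, S, N) for x in range(n + 1)}

print("Union Members  Probability")
for x, p in table.items():
    print(f"{x:>13}  {p:.3f}")
print(f"{'Total':>13}  {sum(table.values()):.3f}")
```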

Table 6–5 shows a comparison of the results using the binomial distribution and
the hypergeometric distribution. Because 40 of the 50 Assembly Department
employees belong to the union, we let π = .80 for the binomial distribution. The bi-
nomial probabilities for Table 6–5 come from the binomial distribution with n = 5
and π = .80.


As Table 6–5 shows, when the binomial requirement of a constant probability of
success cannot be met, the hypergeometric distribution should be used. There are
clear differences between the probabilities.

However, under certain conditions the results of the binomial distribution can be
used to approximate the hypergeometric. This leads to a rule of thumb: if selected items
are not returned to the population, the binomial distribution can be used to closely ap-
proximate the hypergeometric distribution when n < .05N. In other words, the binomial
will closely approximate the hypergeometric distribution if the sample is less than 5% of
the population. For example, if the population, N, is 150, the number of successes in the
population, S, is 120, and the sample size, n, is five, then the rule of thumb is true. That
is, 5 < 0.05(150), or 5 < 7.5. The sample size is less than 5% of the population. In the
following table, hypergeometric and binomial probability distributions are compared for
this situation. The probabilities are very close.

Number of Union          Hypergeometric        Binomial Probability
Members on Committee     Probability, P(x)     (n = 5 and π = .80)
0                        .000                  .000
1                        .004                  .006
2                        .044                  .051
3                        .210                  .205
4                        .431                  .410
5                        .311                  .328
                         1.000                 1.000

TABLE 6–5 Hypergeometric and Binomial Probabilities for PlayTime Toys Inc. Assembly Department

     Hypergeometric        Binomial Probability
x    Probability, P(x)     (n = 5 and π = .80 = 120/150)
0    .000                  .000
1    .006                  .006
2    .049                  .051
3    .206                  .205
4    .417                  .410
5    .322                  .328
     1.000                 1.000

TABLE 6–6 A Comparison of Hypergeometric and Binomial Probabilities When the Sample Size Is Less
than 0.05(N)
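
The 5% rule of thumb can be checked numerically by measuring the largest gap between the hypergeometric and binomial probabilities in each of the two situations above. This is a sketch only: the helper names `hypergeom_pmf`, `binom_pmf`, and `max_gap` are our own, and "largest absolute difference" is just one convenient way to summarize how close the two distributions are.

```python
from math import comb


def hypergeom_pmf(x: int, n: int, S: int, N: int) -> float:
    """Formula (6-6): sampling without replacement."""
    return comb(S, x) * comb(N - S, n - x) / comb(N, n)


def binom_pmf(x: int, n: int, pi: float) -> float:
    """Binomial probability with constant success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def max_gap(n: int, S: int, N: int) -> float:
    """Largest absolute difference between the two distributions over x = 0..n."""
    pi = S / N
    return max(
        abs(hypergeom_pmf(x, n, S, N) - binom_pmf(x, n, pi)) for x in range(n + 1)
    )


gap_playtime = max_gap(n=5, S=40, N=50)    # n/N = 0.10, rule of thumb not met
gap_larger = max_gap(n=5, S=120, N=150)    # n/N = 0.033, rule of thumb met
print(f"N = 50:  largest difference = {gap_playtime:.4f}")
print(f"N = 150: largest difference = {gap_larger:.4f}")
```

The gap for the population of 150 is markedly smaller than the gap for the population of 50, which is the pattern Tables 6–5 and 6–6 show.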

S E L F - R E V I E W 6–5

Horwege Discount Brokers plans to hire five new financial analysts this year. There is a pool
of 12 approved applicants, and George Horwege, the owner, decides to randomly select
those who will be hired. There are eight men and four women among the approved
applicants. What is the probability that three of the five hired are men?

A hypergeometric distribution can be created using Excel. See the output for Table 6–5
on the left. The necessary steps are given in Appendix C in the back of the text.


POISSON PROBABILITY DISTRIBUTION
The Poisson probability distribution describes the number of times some event occurs
during a specified interval. Examples of an interval may be time, distance, area, or volume.

The distribution is based on two assumptions. The first assumption is that the prob-
ability is proportional to the length of the interval. The second assumption is that the
intervals are independent. To put it another way, the longer the interval, the larger the
probability, and the number of occurrences in one interval does not affect the other in-
tervals. This distribution is a limiting form of the binomial distribution when the probabil-
ity of a success is very small and n is large. It is often referred to as the “law of improbable
events,” meaning that the probability, π, of a particular event’s happening is quite small.
The Poisson distribution is a discrete probability distribution because it is formed by
counting.

LO6-6
Explain the assumptions of
the Poisson distribution
and apply it to calculate
probabilities.

E X E R C I S E S
25. A CD contains 10 songs; 6 are classical and 4 are rock and roll. In a sample of three
songs, what is the probability that exactly two are classical? Assume the samples
are drawn without replacement.

26. A population consists of 15 items, 10 of which are acceptable. In a sample of four
items, what is the probability that exactly three are acceptable? Assume the sam-
ples are drawn without replacement.

27. The Riverton Branch of the National Bank of Wyoming has 10 real estate loans
over $1,000,000. Of these 10 loans, 3 are “underwater.” A loan is underwater if
the amount of the loan is greater than the value of the property. The chief loan
officer decided to randomly select two of these loans to determine if they met all
banking standards. What is the probability that neither of the selected loans is
underwater?

28. The Computer Systems Department has eight faculty, six of whom are tenured.
Dr. Vonder, the chairman, wants to establish a committee of three department
faculty members to review the curriculum. If she selects the committee at
random:

a. What is the probability all members of the committee are tenured?
b. What is the probability that at least one member is not tenured? (Hint: For this
question, use the complement rule.)

29. Keith’s Florists has 15 delivery trucks, used mainly to deliver flowers and flower
arrangements in the Greenville, South Carolina, area. Of these 15 trucks, 6 have
brake problems. A sample of five trucks is randomly selected. What is the
probability that two of those tested have defective brakes?

30. The game called Lotto sponsored by the Louisiana Lottery Commission pays its
largest prize when a contestant matches all 6 of the 40 possible numbers.
Assume there are 40 ping-pong balls each with a single number between 1 and
40. Any number appears only once, and the winning balls are selected without
replacement.

a. The commission reports that the probability of matching all the numbers is 1 in
3,838,380. What is this in terms of probability?

b. Use the hypergeometric formula to find this probability.

The lottery commission also pays if a contestant matches four or five of the six
winning numbers. Hint: Divide the 40 numbers into two groups, winning numbers
and nonwinning numbers.

c. Find the probability, again using the hypergeometric formula, for matching 4 of
the 6 winning numbers.

d. Find the probability of matching 5 of the 6 winning numbers.


The Poisson probability distribution has these characteristics:

POISSON PROBABILITY EXPERIMENT

1. The random variable is the number of times some event occurs during a defined
interval.

2. The probability of the event is proportional to the size of the interval.
3. The intervals do not overlap and are independent.

STATISTICS IN ACTION

Near the end of World War
II, the Germans developed
rocket bombs, which were
fired at the city of London.
The Allied military command
didn’t know whether these
bombs were fired at random
or whether they had an
aiming device. To investi-
gate, the city of London was
divided into 586 square
regions. The distribution
of hits in each square was
recorded as follows:

Hits       0     1     2     3     4     5
Regions    229   221   93    35    7     1

To interpret, the above chart
indicates that 229 regions
were not hit with one of the
bombs. Seven regions were
hit four times. Using the
Poisson distribution, with a
mean of 0.93 hits per region,
the expected number of
regions for each number
of hits is as follows:

Hits       0       1       2       3      4     5 or more
Regions    231.2   215.0   100.0   31.0   7.2   1.6

Because the actual number
of hits was close to the
expected number of hits,
the military command
concluded that the bombs
were falling at random.
The Germans had not
developed a bomb with
an aiming device.
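
The expected counts in the Statistics in Action feature can be recomputed from the Poisson formula. The sketch below assumes the 586 regions and the mean of 0.93 hits per region quoted there; the variable names and the helper `poisson_pmf` are our own. The "5 or more" entry is taken as whatever remains after the first five categories.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Poisson probability of exactly x occurrences when the mean is mu."""
    return mu**x * exp(-mu) / factorial(x)


regions = 586
mu = 0.93
observed = [229, 221, 93, 35, 7, 1]  # regions with 0, 1, 2, 3, 4, 5 hits

# Mean hits per region implied by the observed counts.
sample_mean = sum(hits * count for hits, count in enumerate(observed)) / regions

# Expected number of regions with 0..4 hits, then "5 or more" as the remainder.
expected = [regions * poisson_pmf(k, mu) for k in range(5)]
expected.append(regions - sum(expected))

for k, (obs, exp_count) in enumerate(zip(observed, expected)):
    label = f"{k}" if k < 5 else "5 or more"
    print(f"{label:>9}  observed {obs:>3}  expected {exp_count:6.1f}")
```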

This probability distribution has many applications. It is used as a model to
describe the distribution of errors in data entry, the number of scratches and other
imperfections in newly painted car panels, the number of defective parts in outgoing
shipments, the number of customers waiting to be served at a restaurant or waiting to
get into an attraction at Disney World, and the number of accidents on I–75 during a
three-month period.

The Poisson distribution is described mathematically by the formula:

POISSON DISTRIBUTION

\[ P(x) = \frac{\mu^{x} e^{-\mu}}{x!} \]  (6–7)

MEAN OF A POISSON DISTRIBUTION

\[ \mu = n\pi \]  (6–8)

where:
μ (mu) is the mean number of occurrences (successes) in a particular interval.
e is the constant 2.71828 (base of the Napierian logarithmic system).
x is the number of occurrences (successes).
P(x) is the probability for a specified value of x.

The mean number of successes, μ, is found by nπ, where n is the total number of
trials and π the probability of success.

The variance of the Poisson is equal to its mean. If, for example, the probability that
a check cashed by a bank will bounce is .0003, and 10,000 checks are cashed, the
mean and the variance for the number of bad checks is 3.0, found by μ = nπ =
10,000(.0003) = 3.0.

Recall that for a binomial distribution there are a fixed number of trials. For example,
for a four-question multiple-choice test there can only be zero, one, two, three, or four
successes (correct answers). The random variable, x, for a Poisson distribution, how-
ever, can assume an infinite number of values—that is, 0, 1, 2, 3, 4, 5, . . . However, the
probabilities become very small after the first few occurrences (successes).
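
The bad-check illustration can be used to see both points numerically: the mean and variance coincide, and the probabilities shrink quickly as x grows. The following is a minimal sketch, assuming we cut the infinite list of values off at x = 49 (the omitted tail is negligible for a mean of 3); the helper name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Formula (6-7)."""
    return mu**x * exp(-mu) / factorial(x)


n, pi = 10_000, 0.0003
mu = n * pi  # formula (6-8): about 3.0 bad checks expected

# Truncate the distribution at x = 49; the remaining tail is negligible here.
probs = [poisson_pmf(x, mu) for x in range(50)]

mean = sum(x * p for x, p in enumerate(probs))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))

print(f"mean = {mean:.4f}, variance = {variance:.4f}")
for x in (0, 3, 6, 9, 12):
    print(f"P({x}) = {probs[x]:.6f}")
```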

E X A M P L E

Budget Airlines is a seasonal airline that operates flights from Myrtle Beach, South
Carolina, to various cities in the northeast. The destinations include Boston, Pittsburgh,
Buffalo, and both LaGuardia and JFK airports in New York City. Recently Budget has
been concerned about the number of lost bags. Ann Poston from the Analytics Depart-
ment was asked to study the issue. She randomly selected a sample of 500 flights and
found that a total of twenty bags were lost on the sampled flights.

Show that this situation follows the Poisson distribution. What is the mean
number of bags lost per flight? What is the likelihood that no bags are lost on a
flight? What is the probability at least one bag is lost?

S O L U T I O N

To begin, let’s confirm that the Budget Airlines situation follows a Poisson
distribution. Refer to the highlighted box labeled Poisson Probability Experiment in this
section. We count the number of bags lost on a particular flight. On most flights
there were no bags lost, on a few flights one was lost, and perhaps in very rare
circumstances more than one bag was lost. The continuum or interval is a particular
flight. Each flight is assumed to be independent of any other flight.

Based on the sample information we can estimate the mean number of bags
lost per flight. There were 20 bags lost in 500 flights so the mean number of
bags lost per flight is .04, found by 20/500. Hence μ = .04.

We use formula (6–7) to find the probability of any number of lost bags. In this
case x, the number of lost bags, is 0.

\[ P(0) = \frac{\mu^{x} e^{-\mu}}{x!} = \frac{.04^{0} e^{-0.04}}{0!} = .9608 \]

The probability of exactly one lost bag is:

\[ P(1) = \frac{\mu^{x} e^{-\mu}}{x!} = \frac{.04^{1} e^{-0.04}}{1!} = .0384 \]

The probability of one or more lost bags is:

\[ 1 - P(0) = 1 - \frac{\mu^{x} e^{-\mu}}{x!} = 1 - \frac{.04^{0} e^{-0.04}}{0!} = 1 - .9608 = .0392 \]

These probabilities can also be found using Excel. The commands to compute Poisson
probabilities are in Appendix C.

Part of Appendix B.2 is repeated as Table 6–7. For certain values of μ, the mean
of the Poisson distribution, we can read the probability directly from the table. Turning
to another example, NewYork-LA Trucking Company finds the mean number of
breakdowns on the New York to Los Angeles route is 0.30. From Table 6–7 we can locate
the probability of no breakdowns on a particular run. First find the column headed
“0.30” then read down that column to the row labeled “0”. The value at the
intersection is .7408, so this value is the probability of no breakdowns on a particular run. The
probability of one breakdown is .2222.
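
As an alternative to Excel, the Budget Airlines probabilities can be checked with a short script. This is a sketch in Python using only the standard library; the variable names and the helper `poisson_pmf` are our own.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Formula (6-7)."""
    return mu**x * exp(-mu) / factorial(x)


lost_bags, flights = 20, 500
mu = lost_bags / flights            # estimated mean bags lost per flight, .04

p_none = poisson_pmf(0, mu)         # no bags lost on a flight
p_one = poisson_pmf(1, mu)          # exactly one bag lost
p_at_least_one = 1 - p_none         # complement rule

print(f"mu = {mu:.2f}")
print(f"P(0) = {p_none:.4f}")
print(f"P(1) = {p_one:.4f}")
print(f"P(x >= 1) = {p_at_least_one:.4f}")
```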


Earlier in this section, we mentioned that the Poisson probability distribution is a limiting
form of the binomial. That is, we could estimate a binomial probability using the Poisson.
In the following example, we use the Poisson distribution to estimate a binomial proba-
bility when n, the number of trials, is large and π, the probability of a success, small.

TABLE 6–7 Poisson Table for Various Values of μ (from Appendix B.2)

E X A M P L E

Coastal Insurance Company underwrites insurance for beachfront properties
along the Virginia, North and South Carolina, and Georgia coasts. It uses the esti-
mate that the probability of a named Category III hurricane (sustained winds of
more than 110 miles per hour) or higher striking a particular region of the coast
(for example, St. Simons Island, Georgia) in any one year is .05. If a homeowner
takes a 30-year mortgage on a recently purchased property in St. Simons, what is
the likelihood that the owner will experience at least one hurricane during the
mortgage period?

S O L U T I O N

To use the Poisson probability distribution, we begin by determining the mean
or expected number of storms meeting the criterion hitting St. Simons during the
30-year period. That is:

μ = nπ = 30(.05) = 1.5
where:

n is the number of years, 30 in this case.
π is the probability a hurricane meeting the strength criteria comes ashore.
μ is the mean or expected number of storms in a 30-year period.

To find the probability of at least one storm hitting St. Simons Island, Georgia, we
first find the probability of no storms hitting the coast and subtract that value from 1.

\[ P(x \ge 1) = 1 - P(x = 0) = 1 - \frac{(1.5)^{0} e^{-1.5}}{0!} = 1 - .2231 = .7769 \]

We conclude that the likelihood a hurricane meeting the strength criteria will strike
the beachfront property at St. Simons during the 30-year period when the mortgage
is in effect is .7769. To put it another way, the probability St. Simons will be hit by a
Category III or higher hurricane during the 30-year period is a little more than 75%.

We should emphasize that the continuum, as previously described, still exists.
That is, there are expected to be 1.5 storms hitting the coast per 30-year period.
The continuum is the 30-year period.

μ

x 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9

0 0.9048 0.8187 0.7408 0.6703 0.6065 0.5488 0.4966 0.4493 0.4066
1 0.0905 0.1637 0.2222 0.2681 0.3033 0.3293 0.3476 0.3595 0.3659
2 0.0045 0.0164 0.0333 0.0536 0.0758 0.0988 0.1217 0.1438 0.1647
3 0.0002 0.0011 0.0033 0.0072 0.0126 0.0198 0.0284 0.0383 0.0494
4 0.0000 0.0001 0.0003 0.0007 0.0016 0.0030 0.0050 0.0077 0.0111
5 0.0000 0.0000 0.0000 0.0001 0.0002 0.0004 0.0007 0.0012 0.0020
6 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0001 0.0002 0.0003
7 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
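
The entries of Table 6–7 can be regenerated from formula (6–7), which is useful when the mean of interest is not one of the tabled columns. The sketch below assumes the same layout as the table (means 0.1 through 0.9, x from 0 through 7); the names `poisson_pmf`, `mus`, and `table` are our own.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Formula (6-7)."""
    return mu**x * exp(-mu) / factorial(x)


mus = [k / 10 for k in range(1, 10)]  # 0.1, 0.2, ..., 0.9
table = {x: [poisson_pmf(x, mu) for mu in mus] for x in range(8)}

print("x  " + "  ".join(f"{mu:6.1f}" for mu in mus))
for x, row in table.items():
    print(f"{x}  " + "  ".join(f"{p:6.4f}" for p in row))

# Lookup for a mean of 0.30 (third column): no breakdowns and one breakdown.
p_no_breakdown = table[0][2]
p_one_breakdown = table[1][2]
```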


In the preceding case, we are actually using the Poisson distribution as an estimate
of the binomial. Note that we’ve met the binomial conditions outlined on page 183.

• There are only two possible outcomes: a hurricane hits the St. Simons area or it
does not.
• There are a fixed number of trials, in this case 30 years.
• There is a constant probability of success; that is, the probability of a hurricane
hitting the area is .05 each year.
• The years are independent. That means if a named storm strikes in the fifth year,
that has no effect on any other year.

To find the probability of at least one storm striking the area in a 30-year period
using the binomial distribution:

\[ P(x \ge 1) = 1 - P(x = 0) = 1 - [{}_{30}C_{0}(.05)^{0}(.95)^{30}] = 1 - [(1)(1)(.2146)] = .7854 \]

The probability of at least one hurricane hitting the St. Simons area during the 30-
year period using the binomial distribution is .7854.

Which answer is correct? Why should we look at the problem both ways? The bino-
mial is the more “technically correct” solution. The Poisson can be thought of as an ap-
proximation for the binomial, when n, the number of trials is large, and π, the probability
of a success, is small. We look at the problem using both distributions to emphasize the
convergence of the two discrete distributions. In some instances, using the Poisson may
be the quicker solution, and as you see there is little practical difference in the answers.
In fact, as n gets larger and π smaller, the difference between the two distributions
gets smaller.
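
The convergence of the two distributions can be illustrated numerically. The sketch below first reproduces the two hurricane answers, then holds the mean at 1.5 while n grows and π shrinks. The larger values of n (300 and 3,000) are our own illustrative choices and have no meaning for the mortgage example itself.

```python
from math import comb, exp


def binom_at_least_one(n: int, pi: float) -> float:
    """1 - P(x = 0) using the binomial formula."""
    return 1 - comb(n, 0) * pi**0 * (1 - pi) ** n


def poisson_at_least_one(mu: float) -> float:
    """1 - P(x = 0) using the Poisson formula; P(0) reduces to e^(-mu)."""
    return 1 - exp(-mu)


n, pi = 30, 0.05
p_binomial = binom_at_least_one(n, pi)
p_poisson = poisson_at_least_one(n * pi)
print(f"binomial: {p_binomial:.4f}   Poisson: {p_poisson:.4f}")

# Keep mu = n * pi = 1.5 fixed while n gets larger and pi gets smaller.
gaps = []
for n_big in (30, 300, 3000):
    gap = abs(binom_at_least_one(n_big, 1.5 / n_big) - poisson_at_least_one(1.5))
    gaps.append(gap)
    print(f"n = {n_big:>4}: difference = {gap:.5f}")
```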

The Poisson probability distribution is always positively skewed and the random
variable has no specific upper limit. In the lost bags example/solution, the Poisson
distribution, with μ = 0.04, is highly skewed. As μ becomes larger, the Poisson
distribution becomes more symmetrical. For example, Chart 6–4 shows the distributions of
the number of transmission services, muffler replacements, and oil changes per day at
Avellino’s Auto Shop. They follow Poisson distributions with means of 0.7, 2.0, and
6.0, respectively.

[Chart 6–4: three panels, Transmission Services (μ = 0.7), Muffler Replacements (μ = 2.0), and Oil Changes (μ = 6.0). Horizontal axis: Number of Occurrences. Vertical axis: Probability of Occurrence, P(x), from .00 to .60.]

CHART 6–4 Poisson Probability Distributions for Means of 0.7, 2.0, and 6.0

In summary, the Poisson distribution is a family of discrete distributions. All that is
needed to construct a Poisson probability distribution is the mean number of defects,
errors, or other random variable, designated as μ.
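
The claim that the distribution becomes more symmetrical as μ grows can be quantified with a skewness coefficient. The sketch below computes the third standardized moment directly from the Poisson probabilities; this particular measure of skewness is our choice and is not defined in this section. For a Poisson distribution it equals 1/√μ, a standard result that the code reproduces numerically. The cutoff of 80 terms is an assumption that is ample for the means used here.

```python
from math import exp, factorial


def poisson_pmf(x: int, mu: float) -> float:
    """Formula (6-7)."""
    return mu**x * exp(-mu) / factorial(x)


def skewness(mu: float, upper: int = 80) -> float:
    """Third standardized moment, computed from a truncated Poisson distribution."""
    probs = [poisson_pmf(x, mu) for x in range(upper)]
    mean = sum(x * p for x, p in enumerate(probs))
    variance = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    third = sum((x - mean) ** 3 * p for x, p in enumerate(probs))
    return third / variance**1.5


# Means from the lost bags example and from Chart 6-4.
skews = {mu: skewness(mu) for mu in (0.04, 0.7, 2.0, 6.0)}
for mu, value in skews.items():
    print(f"mu = {mu:>4}: skewness = {value:.3f}")
```

The coefficient stays positive for every mean but falls steadily, from 5.0 at μ = 0.04 to about 0.41 at μ = 6.0.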

S E L F - R E V I E W 6–6

From actuarial tables, Washington Insurance Company determined the likelihood that a man
age 25 will die within the next year is .0002. If Washington Insurance sells 4,000 policies to
25-year-old men this year, what is the probability they will pay on exactly one policy?

E X E R C I S E S

31. In a Poisson distribution μ = 0.4.
a. What is the probability that x = 0?
b. What is the probability that x > 0?

32. In a Poisson distribution μ = 4.
a. What is the probability that x = 2?
b. What is the probability that x ≤ 2?
c. What is the probability that x > 2?

33. Ms. Bergen is a loan officer at Coast Bank and Trust. From her years of experience,
she estimates that the probability is .025 that an applicant will not be able to repay
his or her installment loan. Last month she made 40 loans.
a. What is the probability that three loans will be defaulted?
b. What is the probability that at least three loans will be defaulted?

34. Automobiles arrive at the Elkhart exit of the Indiana Toll Road at the rate of two per
minute. The distribution of arrivals approximates a Poisson distribution.
a. What is the probability that no automobiles arrive in a particular minute?
b. What is the probability that at least one automobile arrives during a particular
minute?

35. It is estimated that 0.5% of the callers to the Customer Service department of Dell
Inc. will receive a busy signal. What is the probability that of today’s 1,200 callers at
least 5 received a busy signal?

36. In the past, schools in Los Angeles County have closed an average of 3 days each
year for weather emergencies. What is the probability that schools in Los Angeles
County will close for 4 days next year?

C H A P T E R S U M M A R Y

I. A random variable is a numerical value determined by the outcome of an experiment.
II. A probability distribution is a listing of all possible outcomes of an experiment and the
probability associated with each outcome.
A. A discrete probability distribution can assume only certain values. The main features are:
1. The sum of the probabilities is 1.00.
2. The probability of a particular outcome is between 0.00 and 1.00.
3. The outcomes are mutually exclusive.
B. A continuous distribution can assume an infinite number of values within a specific range.
III. The mean and variance of a probability distribution are computed as follows.
A. The mean is equal to:

\[ \mu = \Sigma[xP(x)] \]  (6–1)

B. The variance is equal to:

\[ \sigma^{2} = \Sigma[(x - \mu)^{2}P(x)] \]  (6–2)

IV. The binomial distribution has the following characteristics.
A. Each outcome is classified into one of two mutually exclusive categories.
B. The distribution results from a count of the number of successes in a fixed number of
trials.
C. The probability of a success remains the same from trial to trial.
D. Each trial is independent.
E. A binomial probability is determined as follows:

\[ P(x) = {}_{n}C_{x}\,\pi^{x}(1 - \pi)^{n-x} \]  (6–3)

F. The mean is computed as:

\[ \mu = n\pi \]  (6–4)

G. The variance is

\[ \sigma^{2} = n\pi(1 - \pi) \]  (6–5)

V. The hypergeometric distribution has the following characteristics.
A. There are only two possible outcomes.
B. The probability of a success is not the same on each trial.
C. The distribution results from a count of the number of successes in a fixed number of
trials.
D. It is used when sampling without replacement from a finite population.
E. A hypergeometric probability is computed from the following equation:

\[ P(x) = \frac{({}_{S}C_{x})\,({}_{N-S}C_{n-x})}{{}_{N}C_{n}} \]  (6–6)

VI. The Poisson distribution has the following characteristics.
A. It describes the number of times some event occurs during a specified interval.
B. The probability of a “success” is proportional to the length of the interval.
C. Nonoverlapping intervals are independent.
D. It is a limiting form of the binomial distribution when n is large and π is small.
E. A Poisson probability is determined from the following equation:

\[ P(x) = \frac{\mu^{x} e^{-\mu}}{x!} \]  (6–7)

F. The mean and the variance are:

\[ \mu = n\pi \qquad \sigma^{2} = n\pi \]  (6–8)
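
The general formulas (6–1) and (6–2) and the binomial shortcuts (6–4) and (6–5) should agree whenever the distribution is binomial. The following sketch checks this for the binomial distribution with n = 5 and π = .80 used earlier in the chapter; the variable names are our own.

```python
from math import comb

n, pi = 5, 0.80

# Formula (6-3): the full binomial distribution for x = 0, 1, ..., n.
dist = {x: comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)}

# Formulas (6-1) and (6-2): mean and variance of any discrete distribution.
mean = sum(x * p for x, p in dist.items())
variance = sum((x - mean) ** 2 * p for x, p in dist.items())

# Formulas (6-4) and (6-5): binomial shortcuts.
mean_shortcut = n * pi
variance_shortcut = n * pi * (1 - pi)

print(f"mean:     {mean:.4f} (general)  {mean_shortcut:.4f} (shortcut)")
print(f"variance: {variance:.4f} (general)  {variance_shortcut:.4f} (shortcut)")
```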

C H A P T E R E X E R C I S E S

37. What is the difference between a random variable and a probability distribution?
38. For each of the following indicate whether the random variable is discrete or continuous.

a. The length of time to get a haircut.
b. The number of cars a jogger passes each morning while running.
c. The number of hits for a team in a high school girls’ softball game.
d. The number of patients treated at the South Strand Medical Center between 6 and
10 p.m. each night.
e. The distance your car traveled on the last fill-up.
f. The number of customers at the Oak Street Wendy’s who used the drive-through
facility.
g. The distance between Gainesville, Florida, and all Florida cities with a population of
at least 50,000.

39. An investment will be worth $1,000, $2,000, or $5,000 at the end of the year. The
probabilities of these values are .25, .60, and .15, respectively. Determine the mean and
variance of the investment’s dollar value.


40. The following notice appeared in the golf shop at a Myrtle Beach, South Carolina, golf
course.

Blackmoor Golf Club Members

The golf shop is holding a raffle to win a
TaylorMade M1 10.5° Regular Flex Driver ($300 value).

Tickets are $5.00 each.
Only 80 tickets will be sold.

Please see the golf shop to get your tickets!

John Underpar buys a ticket.
a. What are Mr. Underpar’s possible monetary outcomes?
b. What are the probabilities of the possible outcomes?
c. Summarize Mr. Underpar’s “experiment” as a probability distribution.
d. What is the mean or expected value of the probability distribution? Explain your result.
e. If all 80 tickets are sold, what is the expected return to the Club?

41. Croissant Bakery Inc. offers special decorated cakes for birthdays, weddings, and
other occasions. It also has regular cakes available in its bakery. The following table
gives the total number of cakes sold per day and the corresponding probability. Com-
pute the mean, variance, and standard deviation of the number of cakes sold per day.

Number of Cakes
Sold in a Day Probability

12 .25
13 .40
14 .25
15 .10

42. The payouts for the Powerball lottery and their corresponding odds and probabili-
ties of occurrence are shown below. The price of a ticket is $1.00. Find the mean and
standard deviation of the payout. Hint: Don’t forget to include the cost of the ticket and
its corresponding probability.

Divisions Payout Odds Probability

Five plus Powerball $50,000,000 146,107,962 0.000000006844
Match 5 200,000 3,563,609 0.000000280614
Four plus Powerball 10,000 584,432 0.000001711060
Match 4 100 14,255 0.000070145903
Three plus Powerball 100 11,927 0.000083836351
Match 3 7 291 0.003424657534
Two plus Powerball 7 745 0.001340482574
One plus Powerball 4 127 0.007812500000
Zero plus Powerball 3 69 0.014285714286

43. In a recent study, 35% of people surveyed indicated chocolate was their favorite flavor
of ice cream. Suppose we select a sample of 10 people and ask them to name their
favorite flavor of ice cream.
a. How many of those in the sample would you expect to name chocolate?
b. What is the probability exactly four of those in the sample name chocolate?
c. What is the probability four or more name chocolate?

44. Thirty percent of the population in a southwestern community are Spanish-speaking
Americans. A Spanish-speaking person is accused of killing a non-Spanish-speaking Amer-
ican and goes to trial. Of the first 12 potential jurors, only 2 are Spanish-speaking Americans,
and 10 are not. The defendant’s lawyer challenges the jury selection, claiming bias against
her client. The government lawyer disagrees, saying that the probability of this particular jury
composition is common. Compute the probability and discuss the assumptions.


45. An auditor for Health Maintenance Services of Georgia reports 40% of policyholders
55 years or older submit a claim during the year. Fifteen policyholders are randomly
selected for company records.
a. How many of the policyholders would you expect to have filed a claim within the last
year?
b. What is the probability that 10 of the selected policyholders submitted a claim last year?
c. What is the probability that 10 or more of the selected policyholders submitted a
claim last year?
d. What is the probability that more than 10 of the selected policyholders submitted a
claim last year?

46. Tire and Auto Supply is considering a 2-for-1 stock split. Before the transaction is
finalized, at least two-thirds of the 1,200 company stockholders must approve the proposal.
To evaluate the likelihood the proposal will be approved, the CFO selected a sample of
18 stockholders. He contacted each and found 14 approved of the proposed split. What
is the likelihood of this event, assuming two-thirds of the stockholders approve?

47. A federal study reported that 7.5% of the U.S. workforce has a drug problem. A drug
enforcement official for the state of Indiana wished to investigate this statement. In her
sample of 20 employed workers:
a. How many would you expect to have a drug problem? What is the standard deviation?
b. What is the likelihood that none of the workers sampled has a drug problem?
c. What is the likelihood at least one has a drug problem?

48. The Bank of Hawaii reports that 7% of its credit card holders will default at some time in
their life. The Hilo branch just mailed out 12 new cards today.
a. How many of these new cardholders would you expect to default? What is the
standard deviation?
b. What is the likelihood that none of the cardholders will default?
c. What is the likelihood at least one will default?

49. Recent statistics suggest that 15% of those who visit a retail site on the internet make
a purchase. A retailer wished to verify this claim. To do so, she selected a sample of 16
“hits” to her site and found that 4 had actually made a purchase.
a. What is the likelihood of exactly four purchases?
b. How many purchases should she expect?
c. What is the likelihood that four or more “hits” result in a purchase?

50. In Chapter 19, we discuss acceptance sampling. Acceptance sampling is a statistical
method used to monitor the quality of purchased parts and components. To ensure the
quality of incoming parts, a purchaser or manufacturer normally samples 20 parts and
allows one defect.
a. What is the likelihood of accepting a lot that is 1% defective?
b. If the quality of the incoming lot was actually 2%, what is the likelihood of accepting it?
c. If the quality of the incoming lot was actually 5%, what is the likelihood of accepting it?

51. Unilever Inc. recently developed a new body wash with a scent of ginger. Their research
indicates that 30% of men like the new scent. To further investigate, Unilever’s market-
ing research group randomly selected 15 men and asked them if they liked the scent.
What is the probability that six or more men like the ginger scent in the body wash?

52. Dr. Richmond, a psychologist, is studying the daytime television viewing habits of col-
lege students. She believes 45% of college students watch soap operas during the af-
ternoon. To further investigate, she selects a sample of 10.
a. Develop a probability distribution for the number of students in the sample who
watch soap operas.
b. Find the mean and the standard deviation of this distribution.
c. What is the probability of finding exactly four students who watch soap operas?
d. What is the probability less than half of the students selected watch soap operas?

53. A recent study conducted by Penn, Shone, and Borland, on behalf of LastMinute.
com, revealed that 52% of business travelers plan their trips less than two weeks before
departure. The study is to be replicated in the tri-state area with a sample of 12 frequent
business travelers.
a. Develop a probability distribution for the number of travelers who plan their trips
within two weeks of departure.
b. Find the mean and the standard deviation of this distribution.
c. What is the probability exactly 5 of the 12 selected business travelers plan their trips
within two weeks of departure?
d. What is the probability 5 or fewer of the 12 selected business travelers plan their
trips within two weeks of departure?

54. The Internal Revenue Service is studying the category of charitable contributions. A sample of
25 returns is selected from young couples between the ages of 20 and 35 who had an
adjusted gross income of more than $100,000. Of these 25 returns, five had charitable contri-
butions of more than $1,000. Four of these returns are selected for a comprehensive audit.
a. Explain why the hypergeometric distribution is appropriate.
b. What is the probability exactly one of the four audited had a charitable deduction of
more than $1,000?
c. What is the probability at least one of the audited returns had a charitable contribution
of more than $1,000?

55. The law firm of Hagel and Hagel is located in downtown Cincinnati. There are 10 partners in
the firm; 7 live in Ohio and 3 in northern Kentucky. Ms. Wendy Hagel, the managing partner,
wants to appoint a committee of 3 partners to look into moving the firm to northern Kentucky.
If the committee is selected at random from the 10 partners, what is the probability that:
a. One member of the committee lives in northern Kentucky and the others live in Ohio?
b. At least one member of the committee lives in northern Kentucky?

56. Topten is a leading source on energy-efficient products. Their list of the top seven vehi-
cles in terms of fuel efficiency for 2017 includes three Hondas.
a. Determine the probability distribution for the number of Hondas in a sample of two
cars chosen from the top seven.
b. What is the likelihood that in the sample of two at least one Honda is included?

57. The position of chief of police in the city of Corry, Pennsylvania, is vacant. A search com-
mittee of Corry residents is charged with the responsibility of recommending a new
chief to the city council. There are 12 applicants, 4 of whom are either female or mem-
bers of a minority. The search committee decides to interview all 12 of the applicants. To
begin, they randomly select four applicants to be interviewed on the first day, and none
of the four is female or a member of a minority. The local newspaper, the Corry Press,
suggests discrimination in an editorial. What is the likelihood of this occurrence?

58. Listed below is the population by state for the 15 states with the largest popula-
tion. Also included is whether that state’s border touches the Gulf of Mexico, the Atlantic
Ocean, or the Pacific Ocean (coastline).

Rank State Population Coastline

1 California 38,802,500 Yes
2 Texas 26,956,958 Yes
3 Florida 19,893,297 Yes
4 New York 19,746,227 Yes
5 Illinois 12,880,580 No
6 Pennsylvania 12,787,209 No
7 Ohio 11,594,163 No
8 Georgia 10,097,343 Yes
9 North Carolina 9,943,964 Yes
10 Michigan 9,909,877 No
11 New Jersey 8,938,175 Yes
12 Virginia 8,326,289 Yes
13 Washington 7,061,530 Yes
14 Massachusetts 6,745,408 Yes
15 Arizona 6,731,484 No

Note that 5 of the 15 states do not have any coastline. Suppose three states are
selected at random. What is the probability that:
a. None of the states selected has any coastline?
b. Exactly one of the selected states has a coastline?
c. At least one of the selected states has a coastline?

DISCRETE PROBABILITY DISTRIBUTIONS 207

59. The sales of Lexus automobiles in the Detroit area follow a Poisson distribution with a
mean of 3 per day.
a. What is the probability that no Lexus is sold on a particular day?
b. What is the probability that for 5 consecutive days at least one Lexus is sold?

60. Suppose 1.5% of the antennas on new Nokia cell phones are defective. For a random
sample of 200 antennas, find the probability that:
a. None of the antennas is defective.
b. Three or more of the antennas are defective.

61. A study of the checkout lines at the Safeway Supermarket in the South Strand area
revealed that between 4 and 7 p.m. on weekdays there is an average of four custom-
ers waiting in line. What is the probability that you visit Safeway today during this pe-
riod and find:
a. No customers are waiting?
b. Four customers are waiting?
c. Four or fewer are waiting?
d. Four or more are waiting?

62. An internal study by the Technology Services department at Lahey Electronics revealed
company employees receive an average of two non-work-related e-mails per hour. As-
sume the arrival of these e-mails is approximated by the Poisson distribution.
a. What is the probability Linda Lahey, company president, received exactly one
non-work-related e-mail between 4 p.m. and 5 p.m. yesterday?
b. What is the probability she received five or more non-work-related e-mails during the
same period?
c. What is the probability she did not receive any non-work-related e-mails during the
period?

63. Recent crime reports indicate that 3.1 motor vehicle thefts occur each minute in the
United States. Assume that the distribution of thefts per minute can be approximated by
the Poisson probability distribution.
a. Calculate the probability exactly four thefts occur in a minute.
b. What is the probability there are no thefts in a minute?
c. What is the probability there is at least one theft in a minute?

64. Recent difficult economic times have caused an increase in the foreclosure rate of home
mortgages. Statistics from the Penn Bank and Trust Company show their monthly foreclo-
sure rate is now 1 loan out of every 136 loans. Last month the bank approved 300 loans.
a. How many foreclosures would you expect the bank to have last month?
b. What is the probability of exactly two foreclosures?
c. What is the probability of at least one foreclosure?

65. The National Aeronautics and Space Administration (NASA) has experienced two
disasters. The Challenger exploded over the Atlantic Ocean in 1986, and the
Columbia disintegrated on reentry over East Texas in 2003. Based on the first 113
missions, and assuming failures occur at the same rate, consider the next 23 mis-
sions. What is the probability of exactly two failures? What is the probability of no
failures?

66. According to the “January theory,” if the stock market is up for the month of January, it
will be up for the year. If it is down in January, it will be down for the year. According to
an article in The Wall Street Journal, this theory held for 29 out of the last 34 years. Sup-
pose there is no truth to this theory; that is, the probability it is either up or down is .50.
What is the probability this could occur by chance? You will probably need a software
package such as Excel or Minitab.

67. During the second round of the 1989 U.S. Open golf tournament, four golfers scored a
hole in one on the sixth hole. The odds of a professional golfer making a hole in one are
estimated to be 3,708 to 1, so the probability is 1/3,709. There were 155 golfers partic-
ipating in the second round that day. Estimate the probability that four golfers would
score a hole in one on the sixth hole.

68. According to sales information in the first quarter of 2016, 2.7% of new vehicles sold in
the United States were hybrids. This is down from 3.3% for the same period a year ear-
lier. An analyst’s review of the data indicates that the reasons for the sales decline in-
clude the low price of gasoline and the higher price of a hybrid compared to similar
vehicles. Let’s assume these statistics remain the same for 2017. That is, 2.7 percent of
new car sales are hybrids in the first quarter of 2017. For a sample of 40 vehicles sold
in the Richmond, Virginia, area:
a. How many vehicles would you expect to be hybrid?
b. Use the Poisson distribution to find the probability that five of the sales were hybrid
vehicles.
c. Use the binomial distribution to find the probability that five of the sales were hybrid
vehicles.

69. A recent CBS News survey reported that 67% of adults felt the U.S. Treasury should
continue making pennies. Suppose we select a sample of 15 adults.
a. How many of the 15 would we expect to indicate that the Treasury should continue
making pennies? What is the standard deviation?
b. What is the likelihood that exactly eight adults would indicate the Treasury should
continue making pennies?
c. What is the likelihood at least eight adults would indicate the Treasury should
continue making pennies?

D A T A A N A L Y T I C S

70. Refer to the North Valley Real Estate data, which report information on homes sold
in the area last year.
a. Create a probability distribution for the number of bedrooms. Compute the mean and
the standard deviation of this distribution.
b. Create a probability distribution for the number of bathrooms. Compute the mean
and the standard deviation of this distribution.

71. Refer to the Baseball 2016 data. Compute the mean number of home runs per
game. To do this, first find the mean number of home runs per team for 2016. Next,
divide this value by 162 (a season comprises 162 games). Then multiply by 2 because
there are two teams in each game. Use the Poisson distribution to estimate the number
of home runs that will be hit in a game. Find the probability that:
a. There are no home runs in a game.
b. There are two home runs in a game.
c. There are at least four home runs in a game.

Continuous Probability
Distributions 7

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO7-1 Describe the uniform probability distribution and use it to calculate probabilities.

LO7-2 Describe the characteristics of a normal probability distribution.

LO7-3 Describe the standard normal probability distribution and use it to calculate probabilities.

LO7-4 Approximate the binomial probability distribution using the standard normal probability
distribution to calculate probabilities.

LO7-5 Describe the exponential probability distribution and use it to calculate probabilities.

CRUISE SHIPS of the Royal Viking line report that 80% of their rooms are occupied
during September. For a cruise ship having 800 rooms, what is the probability that 665 or
more are occupied in September? (See Exercise 60 and LO7-4.)

© Ilene MacDonald/Alamy Stock Photo

210 CHAPTER 7

INTRODUCTION
Chapter 6 began our study of probability distributions. We considered three discrete
probability distributions: binomial, hypergeometric, and Poisson. These distributions are
based on discrete random variables, which can assume only clearly separated values.
For example, we select for study 10 small businesses that began operations during the
year 2014. The number still operating in 2017 can be 0, 1, 2, . . . , 10. There cannot be
3.7, 12, or −7 still operating in 2017. In this example, only certain outcomes are possi-
ble and these outcomes are represented by clearly separated values. In addition, the
result is usually found by counting the number of successes. We count the number of
the businesses in the study that are still in operation in 2017.

We continue our study of probability distributions by examining continuous proba-
bility distributions. A continuous probability distribution usually results from measuring
something, such as the distance from the dormitory to the classroom, the weight of an
individual, or the amount of bonus earned by CEOs. As an example, at Dave’s Inlet Fish
Shack flounder is the featured, fresh-fish menu item. The distribution of the amount
of flounder sold per day has a mean of 10.0 pounds per day and a standard deviation
of 3.0 pounds per day. This distribution is continuous because Dave, the owner,
“measures” the amount of flounder sold each day. It is important to realize that a contin-
uous random variable has an infinite number of values within a particular range. So, for
a continuous random variable, probability is for a range of values. The probability for a
specific value of a continuous random variable is 0.

This chapter shows how to use three continuous probability distributions: the uni-
form probability distribution, the normal probability distribution, and the exponential
probability distribution.

THE FAMILY OF UNIFORM PROBABILITY
DISTRIBUTIONS
The uniform probability distribution is the simplest distribution for a continuous ran-
dom variable. This distribution is rectangular in shape and is completely defined by
its minimum and maximum values. Here are some examples that follow a uniform
distribution.

• The sales of gasoline at the Kwik Fill in Medina, New York, fol-
low a uniform distribution that varies between 2,000 and 5,000
gallons per day. The random variable is the number of gallons
sold per day and is continuous within the interval between
2,000 gallons and 5,000 gallons.

• Volunteers at the Grand Strand Public Library prepare federal
income tax forms. The time to prepare form 1040-EZ follows a
uniform distribution over the interval between 10 minutes and
30 minutes. The random variable is the number of minutes
to complete the form, and it can assume any value between
10 and 30.

A uniform distribution is shown in Chart 7–1. The distribution’s shape is rectangular and
has a minimum value of a and a maximum of b. Also notice in Chart 7–1 the height of
the distribution is constant or uniform for all values between a and b.

The mean of a uniform distribution is located in the middle of the interval between
the minimum and maximum values. It is computed as:

LO7-1
Describe the uniform
probability distribution
and use it to calculate
probabilities.

© C. Sherburne/PhotoLink/Getty Images

MEAN OF THE UNIFORM DISTRIBUTION

$$\mu = \frac{a + b}{2} \tag{7–1}$$

CONTINUOUS PROBABILITY DISTRIBUTIONS 211

The standard deviation describes the dispersion of a distribution. In the uniform distribution,
the standard deviation is also related to the interval between the maximum and
minimum values.

STANDARD DEVIATION OF THE UNIFORM DISTRIBUTION

$$\sigma = \sqrt{\frac{(b - a)^2}{12}} \tag{7–2}$$

[CHART 7–1 A Continuous Uniform Distribution: a rectangle of constant height 1/(b − a) over
the interval from a to b; the vertical axis is P(x).]

The equation for the uniform probability distribution is:

UNIFORM DISTRIBUTION

$$P(x) = \frac{1}{b - a} \quad \text{if } a \le x \le b \text{ and } 0 \text{ elsewhere} \tag{7–3}$$

As we described in Chapter 6, probability distributions are useful for making probability
statements concerning the values of a random variable. For distributions describing
a continuous random variable, areas within the distribution represent probabilities.
In the uniform distribution, its rectangular shape allows us to apply the area formula for
a rectangle. Recall that we find the area of a rectangle by multiplying its length by its
height. For the uniform distribution, the height of the rectangle is P(x), which is 1/(b − a).
The length or base of the distribution is b − a. So if we multiply the height of the distribution
by its entire range to find the area, the result is always 1.00. To put it another way, the
total area within a continuous probability distribution is equal to 1.00. In general:

$$\text{Area} = (\text{height})(\text{base}) = \frac{1}{(b - a)}(b - a) = 1.00$$

So if a uniform distribution ranges from 10 to 15, the height is 0.20, found by 1/(15 − 10).
The base is 5, found by 15 − 10. The total area is:

$$\text{Area} = (\text{height})(\text{base}) = \frac{1}{(15 - 10)}(15 - 10) = 1.00$$

The following example illustrates the features of a uniform distribution and how we use
it to calculate probabilities.

E X A M P L E

Southwest Arizona State University provides bus service to students while they are
on campus. A bus arrives at the North Main Street and College Drive stop every
30 minutes between 6 a.m. and 11 p.m. during weekdays. Students arrive at the
bus stop at random times. The time that a student waits is uniformly distributed from
0 to 30 minutes.
1. Draw a graph of this distribution.
2. Show that the area of this uniform distribution is 1.00.
3. How long will a student “typically” have to wait for a bus? In other words, what
is the mean waiting time? What is the standard deviation of the waiting times?
4. What is the probability a student will wait more than 25 minutes?
5. What is the probability a student will wait between 10 and 20 minutes?

S O L U T I O N

In this case, the random variable is the length of time a student must wait. Time is
measured on a continuous scale, and the wait times may range from 0 minutes up
to 30 minutes.
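
Every part of this example reduces to rectangle arithmetic, so the hand calculations that follow can be checked with a few lines of code. The Python sketch below is one way to do that. The function names (`uniform_height`, `uniform_mean`, `uniform_sd`, `uniform_prob`) are illustrative choices rather than anything defined in the text, and the code assumes a < b.

```python
import math


def uniform_height(a, b):
    """Height of the uniform distribution, 1/(b - a), as in formula (7-3)."""
    return 1.0 / (b - a)


def uniform_mean(a, b):
    """Mean of the uniform distribution, formula (7-1)."""
    return (a + b) / 2.0


def uniform_sd(a, b):
    """Standard deviation of the uniform distribution, formula (7-2)."""
    return math.sqrt((b - a) ** 2 / 12.0)


def uniform_prob(a, b, lo, hi):
    """P(lo < X < hi) as (height)(base), with the interval clipped to [a, b]."""
    lo, hi = max(lo, a), min(hi, b)
    if hi <= lo:
        return 0.0
    return uniform_height(a, b) * (hi - lo)


# Bus-stop example: waiting time is uniform between 0 and 30 minutes.
a, b = 0, 30
print(f"height     = {uniform_height(a, b):.4f}")        # 0.0333
print(f"total area = {uniform_prob(a, b, a, b):.2f}")    # 1.00
print(f"mean       = {uniform_mean(a, b):.2f}")          # 15.00
print(f"std. dev.  = {uniform_sd(a, b):.2f}")            # 8.66
print(f"P(25 to 30) = {uniform_prob(a, b, 25, 30):.4f}")  # 0.1667
print(f"P(10 to 20) = {uniform_prob(a, b, 10, 20):.4f}")  # 0.3333
```

Clipping the requested interval to [a, b] is a design choice made here so that a question such as "more than 25 minutes" can be passed as the interval from 25 to any upper limit at or above 30.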
1. The graph of the uniform distribution is shown in Chart 7–2. The horizontal line
is drawn at a height of .0333, found by 1/(30 − 0). The range of this distribution
is 30 minutes.

[CHART 7–2 Uniform Probability Distribution of Student Waiting Times. Horizontal axis:
Length of Wait (minutes), 0 to 40; vertical axis: Probability, 0 to .060; the distribution is a
horizontal line at .0333 from 0 to 30.]

2. The times students must wait for the bus are uniform over the interval from
0 minutes to 30 minutes, so in this case a is 0 and b is 30.

$$\text{Area} = (\text{height})(\text{base}) = \frac{1}{(30 - 0)}(30 - 0) = 1.00$$

3. To find the mean, we use formula (7–1).

$$\mu = \frac{a + b}{2} = \frac{0 + 30}{2} = 15$$

The mean of the distribution is 15 minutes, so the typical wait time for bus service
is 15 minutes.

To find the standard deviation of the wait times, we use formula (7–2).

$$\sigma = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(30 - 0)^2}{12}} = 8.66$$

The standard deviation of the distribution is 8.66 minutes. This measures the
variation in the student wait times.

4. The area within the distribution for the interval 25 to 30 represents this particular
probability. From the area formula:

$$P(25 < \text{wait time} < 30) = (\text{height})(\text{base}) = \frac{1}{(30 - 0)}(5) = .1667$$

So the probability a student waits between 25 and 30 minutes is .1667. This
conclusion is illustrated by the following graph.

[Graph: uniform distribution of height .0333 from 0 to 30 with μ = 15; the area between
25 and 30 is shaded, Area = .1667.]

5. The area within the distribution for the interval 10 to 20 represents the probability.

$$P(10 < \text{wait time} < 20) = (\text{height})(\text{base}) = \frac{1}{(30 - 0)}(10) = .3333$$

We can illustrate this probability as follows.

[Graph: uniform distribution of height .0333 from 0 to 30 with μ = 15; the area between
10 and 20 is shaded, Area = .3333.]

S E L F - R E V I E W 7–1

Microwave ovens only last so long. The lifetime of a microwave oven follows a uniform
distribution between 8 and 14 years.
(a) Draw this uniform distribution. What are the height and base values?
(b) Show the total area under the curve is 1.00.
(c) Calculate the mean and the standard deviation of this distribution.
(d) What is the probability a particular microwave oven lasts between 10 and 14 years?
(e) What is the probability a microwave oven will last less than 9 years?

E X E R C I S E S

1. A uniform distribution is defined over the interval from 6 to 10.
a. What are the values for a and b?
b. What is the mean of this uniform distribution?
c. What is the standard deviation?
d. Show that the total area is 1.00.
e. Find the probability of a value more than 7.
f. Find the probability of a value between 7 and 9.

2. A uniform distribution is defined over the interval from 2 to 5.
a. What are the values for a and b?
b. What is the mean of this uniform distribution?
c. What is the standard deviation?
d. Show that the total area is 1.00.
e. Find the probability of a value more than 2.6.
f. Find the probability of a value between 2.9 and 3.7.

214 CHAPTER 7

3. The closing price of Schnur Sporting Goods Inc. common stock is uniformly distributed
between $20 and $30 per share. What is the probability that the stock price will be:
a. More than $27?
b. Less than or equal to $24?

4. According to the Insurance Institute of America, a family of four spends between
$400 and $3,800 per year on all types of insurance. Suppose the money spent is
uniformly distributed between these amounts.
a. What is the mean amount spent on insurance?
b. What is the standard deviation of the amount spent?
c. If we select a family at random, what is the probability they spend less than
$2,000 per year on insurance?
d. What is the probability a family spends more than $3,000 per year?

5. The April rainfall in Flagstaff, Arizona, follows a uniform distribution between 0.5
and 3.00 inches.
a. What are the values for a and b?
b. What is the mean amount of rainfall for the month? What is the standard
deviation?
c. What is the probability of less than an inch of rain for the month?
d. What is the probability of exactly 1.00 inch of rain?
e. What is the probability of more than 1.50 inches of rain for the month?

6. Customers experiencing technical difficulty with their Internet cable service may
call an 800 number for technical support. It takes the technician between 30 seconds
and 10 minutes to resolve the problem. The distribution of this support time follows
the uniform distribution.
a. What are the values for a and b in minutes?
b. What is the mean time to resolve the problem? What is the standard deviation of
the time?
c. What percent of the problems take more than 5 minutes to resolve?
d. Suppose we wish to find the middle 50% of the problem-solving times. What are
the end points of these two times?

THE FAMILY OF NORMAL PROBABILITY
DISTRIBUTIONS

LO7-2
Describe the characteristics
of a normal probability
distribution.

Next we consider the normal probability distribution. Unlike the uniform distribution [see
formula (7–3)], the normal probability distribution has a very complex formula.

NORMAL PROBABILITY DISTRIBUTION

$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\left[\frac{(x - \mu)^2}{2\sigma^2}\right]} \tag{7–4}$$

However, do not be bothered by how complex this formula looks. You are already
familiar with many of the values. The symbols μ and σ refer to the mean and the stan-
dard deviation, as usual. The Greek symbol π is a constant and its value is approximately
22/7 or 3.1416. The letter e is also a constant. It is the base of the natural log system
and is approximately equal to 2.718. x is the value of a continuous random variable. So
a normal distribution is based on—that is, it is defined by—its mean and standard
deviation.

You will not need to make calculations using formula (7–4). Instead you will use a
table, given in Appendix B.3, to find various probabilities. These probabilities can also
be calculated using Excel functions as well as other statistical software.
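
For readers who prefer software to tables, formula (7–4) can be written out directly. The sketch below is a minimal Python illustration; the function name `normal_pdf` and the values μ = 50 and σ = 10 are assumptions chosen only for the demonstration. It compares the hand-coded formula with `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x, written out term by term from formula (7-4)."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


# Illustrative values only; they do not come from the text.
mu, sigma = 50.0, 10.0
library_curve = NormalDist(mu, sigma)

for x in (30.0, 40.0, 50.0, 60.0, 70.0):
    by_hand = normal_pdf(x, mu, sigma)
    by_library = library_curve.pdf(x)
    print(f"x = {x:5.1f}  formula = {by_hand:.6f}  library = {by_library:.6f}")
```

The two columns agree, and the heights at 40 and 60 (and at 30 and 70) match each other because both points in each pair are the same distance from the mean.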

CONTINUOUS PROBABILITY DISTRIBUTIONS 215

The normal probability distribution has the following characteristics:

• It is bell-shaped and has a single peak at the center of the distribution. The arithme-
tic mean, median, and mode are equal and located in the center of the distribution.
The total area under the curve is 1.00. Half the area under the normal curve is to
the right of this center point and the other half, to the left of it.

• It is symmetrical about the mean. If we cut the normal curve vertically at the center
value, the shapes of the curves will be mirror images. Also, the area of each half is 0.5.

• It falls off smoothly in either direction from the central value. That is, the distribution
is asymptotic: The curve gets closer and closer to the X-axis but never actually
touches it. To put it another way, the tails of the curve extend indefinitely in both
directions.

• The location of a normal distribution is determined by the mean, μ. The dispersion
or spread of the distribution is determined by the standard deviation, σ.

These characteristics are shown graphically in Chart 7–3.

STATISTICS IN ACTION

Many variables are approximately normally distributed, such as IQ scores, life
expectancies, and adult height. This implies that nearly all observations occur within
3 standard deviations of the mean. On the other hand, observations that occur beyond
3 standard deviations from the mean are extremely rare. For example, the mean adult
male height is 68.2 inches (about 5 feet 8 inches) with a standard deviation of
2.74 inches. This means that almost all males are between 60.0 inches (5 feet) and
76.4 inches (6 feet 4 inches). LeBron James, a professional basketball player with the
Cleveland Cavaliers, is 80 inches, or 6 feet 8 inches, which is clearly beyond 3 standard
deviations from the mean. The height of a standard doorway is 6 feet 8 inches, and
should be high enough for almost all adult males, except for a rare person like
LeBron James.

As another example, the driver’s seat in most vehicles is set to comfortably fit a person
who is at least 159 cm (62.5 inches) tall. The distribution of heights of adult women is
approximately a normal distribution with a mean of 161.5 cm and a standard deviation
of 6.3 cm. Thus about 35% of adult women will not fit comfortably in
the driver’s seat.
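
The figures quoted in this box can be reproduced with a short calculation. The following Python sketch uses `statistics.NormalDist` from the standard library; the variable names are illustrative, and the code takes the means and standard deviations exactly as stated in the box.

```python
from statistics import NormalDist

# Adult male height in inches, as quoted in the box.
male_mean, male_sd = 68.2, 2.74
lower = male_mean - 3 * male_sd   # three standard deviations below the mean
upper = male_mean + 3 * male_sd   # three standard deviations above the mean
lebron_z = (80 - male_mean) / male_sd  # standard deviations above the mean

# Adult female height in centimeters, as quoted in the box.
women = NormalDist(mu=161.5, sigma=6.3)
share_below_159 = women.cdf(159)  # proportion shorter than 159 cm

print(f"3-sigma range for men: {lower:.1f} to {upper:.1f} inches")
print(f"80 inches is {lebron_z:.2f} standard deviations above the mean")
print(f"Share of women under 159 cm: {share_below_159:.3f}")
```

The last line comes out close to 0.35, which is the basis for the statement that about 35% of adult women fall below the 159 cm design height.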

[CHART 7–3 Characteristics of a Normal Distribution: the normal curve is symmetrical, with
two identical halves; the mean, median, and mode are equal; theoretically, the tails of the
curve extend to −∞ and +∞.]

There is not just one normal probability distribution, but rather a “family” of them.
For example, in Chart 7–4 the probability distributions of length of employee service in
three different plants are compared. In the Camden plant, the mean is 20 years and the
standard deviation is 3.1 years. There is another normal probability distribution for
the length of service in the Dunkirk plant, where μ = 20 years and σ = 3.9 years. In the
Elmira plant, μ = 20 years and σ = 5.0 years. Note that the means are the same but the
standard deviations are different. As the standard deviation gets smaller, the distribution
becomes more narrow and “peaked.”
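
One way to see the "narrow and peaked" effect numerically is to compute the height of each curve at its mean, which from formula (7–4) is 1/(σ√(2π)). The Python sketch below does this for the three plants; the function name `peak_height` is an illustrative choice, not a term from the text.

```python
import math

# Standard deviations of length of service (years) for the three plants.
plant_sigmas = {"Camden": 3.1, "Dunkirk": 3.9, "Elmira": 5.0}


def peak_height(sigma):
    """Height of a normal curve at x = mu: formula (7-4) with the exponent equal to 0."""
    return 1.0 / (sigma * math.sqrt(2 * math.pi))


peak_heights = {plant: peak_height(s) for plant, s in plant_sigmas.items()}

for plant, height in peak_heights.items():
    print(f"{plant:8s} sigma = {plant_sigmas[plant]:.1f}  peak height = {height:.4f}")
```

Because the total area under every normal curve is 1.00, a smaller standard deviation forces the curve to be taller at the center, so Camden has the highest peak and Elmira the lowest.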

[CHART 7–4 Normal Probability Distributions with Equal Means but Different Standard
Deviations: three curves centered at μ = 20 years of service, with σ = 3.1 years (Camden plant),
σ = 3.9 years (Dunkirk plant), and σ = 5.0 years (Elmira plant); the horizontal axis runs from
0 to 40 years.]

216 CHAPTER 7

Chart 7–5 shows the distribution of box weights of three different cereals. The
weights follow a normal distribution with different means but identical standard
deviations.

Finally, Chart 7–6 shows three normal distributions having different means and
standard deviations. They show the distribution of tensile strengths, measured in
pounds per square inch (psi), for three types of cables.

[CHART 7–5 Normal Probability Distributions Having Different Means but Equal Standard
Deviations: Sugar Yummies, μ = 283 grams; Alphabet Gems, μ = 301 grams; Weight Droppers,
μ = 321 grams; each with σ = 1.6 grams.]

[CHART 7–6 Normal Probability Distributions with Different Means and Standard Deviations:
three curves with means of 2,000 psi, 2,107 psi, and 2,186 psi; the curves are labeled
σ = 41 psi, σ = 52 psi, and σ = 26 psi.]

Recall from Chapter 6 that discrete probability distributions show the specific likelihood
that a discrete value will occur. For example, on page 186 the binomial distribution is
used to calculate the probability that none of the five flights arriving at the Bradford
Pennsylvania Regional Airport will be late.

With a continuous probability distribution, areas below the curve define probabili-
ties. The total area under the normal curve is 1.0. This accounts for all possible out-
comes. Because a normal probability distribution is symmetric, the area under the curve
to the left of the mean is 0.5, and the area under the curve to the right of the mean is
0.5. Apply this to the distribution of Sugar Yummies in Chart 7–5. It is normally distrib-
uted with a mean of 283 grams. Therefore, the probability of filling a box with more than
283 grams is 0.5 and the probability of filling a box with less than 283 grams is 0.5. We
also can determine the probability that a box weighs between 280 and 286 grams.
However, to determine this probability we need to know about the standard normal
probability distribution.

CONTINUOUS PROBABILITY DISTRIBUTIONS 217

THE STANDARD NORMAL PROBABILITY
DISTRIBUTION
The number of normal distributions is unlimited, each having a different mean (μ), stan-
dard deviation (σ), or both. While it is possible to provide a limited number of probability
tables for discrete distributions such as the binomial and the Poisson, providing tables
for the infinite number of normal distributions is impractical. Fortunately, one member of
the family can be used to determine the probabilities for all normal probability distribu-
tions. It is called the standard normal probability distribution, and it is unique because
it has a mean of 0 and a standard deviation of 1.

Any normal probability distribution can be converted into a standard normal prob-
ability distribution by subtracting the mean from each observation and dividing this dif-
ference by the standard deviation. The results are called z values or z scores.

LO7-3
Describe the standard
normal probability
distribution and use it to
calculate probabilities.

z VALUE The signed distance between a selected value, designated x, and the
mean, μ, divided by the standard deviation, σ.

So, a z value is the distance from the mean, measured in units of the standard devi-
ation. The formula for this conversion is:

STANDARD NORMAL VALUE

$$z = \frac{x - \mu}{\sigma} \tag{7–5}$$

where:

x is the value of any particular observation or measurement.
μ is the mean of the distribution.
σ is the standard deviation of the distribution.

As we noted in the preceding definition, a z value expresses the distance or differ-
ence between a particular value of x and the arithmetic mean in units of the standard
deviation. Once the normally distributed observations are standardized, the z values are
normally distributed with a mean of 0 and a standard deviation of 1. Therefore, the z
distribution has all the characteristics of any normal probability distribution. These char-
acteristics are listed on page 215 in the Family of Normal Probability Distributions sec-
tion. The table in Appendix B.3 lists the probabilities for the standard normal probability
distribution. A small portion of this table follows.

STATISTICS IN ACTION

An individual’s skills depend on a combination of many hereditary and environmental
factors, each having about the same amount of weight or influence on the skills. Thus,
much like a binomial distribution with a large number of trials, many skills and attributes
follow the normal distribution. For example, the SAT Reasoning Test is the most widely
used standardized test for college admissions in the United States. Scores are based on
a normal distribution with a mean of 1,500 and a standard deviation of 300.

TABLE 7–1 Areas under the Normal Curve

z 0.00 0.01 0.02 0.03 0.04 0.05 . . .

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744
.
.
.
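
The entries in Table 7–1 are areas under the standard normal curve between the mean (z = 0) and a given z value. As a rough check, the portion shown above can be regenerated in Python with `statistics.NormalDist`; the helper name `area_mean_to_z` is an assumption introduced for this sketch, and it relies on the fact that the area to the left of the mean is 0.5.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return STANDARD_NORMAL.cdf(z) - 0.5


# Rebuild the rows z = 1.3 through 1.9 and the columns 0.00 through 0.05.
print("  z   " + "  ".join(f"{h / 100:.2f}  " for h in range(6)))
for tenths in range(13, 20):
    row = [area_mean_to_z(tenths / 10 + h / 100) for h in range(6)]
    print(f"{tenths / 10:.1f}   " + "  ".join(f"{area:.4f}" for area in row))
```

Reading the output the same way as the printed table, the row for 1.9 and the column for 0.05 give the area between the mean and z = 1.95.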

218 CHAPTER 7

Applications of the Standard Normal Distribution
The standard normal distribution is very useful for determining probabilities for any normally
distributed random variable. The basic procedure is to find the z value for a particular value
of the random variable based on the mean and standard deviation of its distribution. Then,
using the z value, we can use the standard normal distribution to find various probabilities.
The following example/solution describes the details of the application.

E X A M P L E

In recent years a new type of taxi service has evolved in more than 300 cities worldwide,
where the customer is connected directly with a driver via a smartphone. The
idea was first developed by Uber Technologies, which is headquartered in San
Francisco, California. It uses the Uber mobile app, which allows customers with a
smartphone to submit a trip request, which is then routed to an Uber driver who picks
up the customer and takes the customer to the desired location. No cash is involved;
payment for the transaction is handled digitally.

Suppose the weekly income of Uber drivers follows the normal probability distribu-
tion with a mean of $1,000 and a standard deviation of $100. What is the z value of in-
come for a driver who earns $1,100 per week? For a driver who earns $900 per week?

S O L U T I O N

Using formula (7–5), the z values corresponding to the two x values ($1,100 and
$900) are:

For x = $1,100:

\[ z = \frac{x - \mu}{\sigma} = \frac{\$1{,}100 - \$1{,}000}{\$100} = 1.00 \]

For x = $900:

\[ z = \frac{x - \mu}{\sigma} = \frac{\$900 - \$1{,}000}{\$100} = -1.00 \]

The z of 1.00 indicates that a weekly income of $1,100 is one standard deviation
above the mean, and a z of −1.00 shows that a $900 income is one standard devi-
ation below the mean. Note that both incomes ($1,100 and $900) are the same
distance ($100) from the mean.
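
The standardization in formula (7–5) can also be sketched in a few lines of Python. This is only an illustration; the function name `z_value` is hypothetical and not part of the text.

```python
def z_value(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Uber weekly income example: mean $1,000, standard deviation $100
z_high = z_value(1100, mu=1000, sigma=100)
z_low = z_value(900, mu=1000, sigma=100)
print(z_high, z_low)  # 1.0 -1.0
```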

S E L F - R E V I E W 7–2

A recent national survey concluded that the typical person consumes 48 ounces of water
per day. Assume daily water consumption follows a normal probability distribution with a
standard deviation of 12.8 ounces.
(a) What is the z value for a person who consumes 64 ounces of water per day? Based on
this z value, how does this person compare to the national average?
(b) What is the z value for a person who consumes 32 ounces of water per day? Based on
this z value, how does this person compare to the national average?

The Empirical Rule
The Empirical Rule is introduced on page 80 of Chapter 3. It states that if a random vari-
able is normally distributed, then:

1. Approximately 68% of the observations will lie within plus and minus one standard
deviation of the mean.

2. About 95% of the observations will lie within plus and minus two standard devia-
tions of the mean.

3. Practically all, or 99.7% of the observations, will lie within plus and minus three stan-
dard deviations of the mean.

Now, knowing how to apply the standard normal probability distribution, we can verify
the Empirical Rule. For example, one standard deviation from the mean is the same as a
z value of 1.00. When we refer to the standard normal probability table, a z value of
1.00 corresponds to a probability of 0.3413. So what percent of the observations will lie
within plus and minus one standard deviation of the mean? We multiply (2)(0.3413),
which equals 0.6826, or approximately 68% of the observations are within plus and
minus one standard deviation of the mean.
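
The same check can be run for all three parts of the Empirical Rule. The sketch below assumes Python's standard-library `statistics.NormalDist` in place of the printed table; the helper name `share_within` is hypothetical. Because it does not round to four table digits, the results can differ from table-based answers in the last decimal place.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal distribution

def share_within(k):
    """Proportion of a normal population within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within plus and minus {k} standard deviation(s): {share_within(k):.4f}")
```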

The Empirical Rule is summarized in the following graph.

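[Graph: normal curve showing 68% of the area within μ ± 1σ, 95% within μ ± 2σ, and practically all within μ ± 3σ. The scale of x (μ − 3σ, μ − 2σ, μ − 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ) converts to the scale of z (−3, −2, −1, 0, 1, 2, 3).]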

Transforming measurements to standard normal deviates changes the scale. The
conversions are also shown in the graph. For example, μ + 1σ is converted to a z value
of 1.00. Likewise, μ − 2σ is transformed to a z value of −2.00. Note that the center of
the z distribution is zero, indicating no deviation from the mean, μ.

E X A M P L E

As part of its quality assurance program, the Autolite Battery Company conducts
tests on battery life. For a particular D-cell alkaline battery, the mean life is 19 hours.
The useful life of the battery follows a normal distribution with a standard deviation
of 1.2 hours. Answer the following questions.

1. About 68% of the batteries failed between what two values?
2. About 95% of the batteries failed between what two values?
3. Virtually all of the batteries failed between what two values?

S O L U T I O N

We can use the Empirical Rule to answer these questions.

1. About 68% of the batteries will fail between 17.8 and 20.2 hours, found by
19.0 ± 1(1.2) hours.

2. About 95% of the batteries will fail between 16.6 and 21.4 hours, found by
19.0 ± 2(1.2) hours.

3. Practically all failed between 15.4 and 22.6 hours, found by 19.0 ± 3(1.2) hours.

This information is summarized on the following chart.

[Chart: normal curve on a scale of hours, with μ − 3σ = 15.4, μ − 2σ = 16.6, μ − 1σ = 17.8, μ = 19.0, μ + 1σ = 20.2, μ + 2σ = 21.4, and μ + 3σ = 22.6. About 68% of the area lies between 17.8 and 20.2, 95% between 16.6 and 21.4, and practically all between 15.4 and 22.6.]

S E L F - R E V I E W 7–3

The distribution of the annual incomes of a group of middle-management employees at
Compton Plastics approximates a normal distribution with a mean of $47,200 and a
standard deviation of $800.
(a) About 68% of the incomes lie between what two amounts?
(b) About 95% of the incomes lie between what two amounts?
(c) Virtually all of the incomes lie between what two amounts?
(d) What are the median and the modal incomes?
(e) Is the distribution of incomes symmetrical?

E X E R C I S E S

7. Explain what is meant by this statement: “There is not just one normal probability
distribution but a ‘family’ of them.”

8. List the major characteristics of a normal probability distribution.

9. The mean of a normal probability distribution is 500; the standard deviation is 10.
a. About 68% of the observations lie between what two values?
b. About 95% of the observations lie between what two values?
c. Practically all of the observations lie between what two values?

10. The mean of a normal probability distribution is 60; the standard deviation is 5.
a. About what percent of the observations lie between 55 and 65?
b. About what percent of the observations lie between 50 and 70?
c. About what percent of the observations lie between 45 and 75?

11. The Kamp family has twins, Rob and Rachel. Both Rob and Rachel graduated from
college 2 years ago, and each is now earning $50,000 per year. Rachel works in
the retail industry, where the mean salary for executives with less than 5 years’
experience is $35,000 with a standard deviation of $8,000. Rob is an engineer. The
mean salary for engineers with less than 5 years’ experience is $60,000 with a
standard deviation of $5,000. Compute the z values for both Rob and Rachel and
comment on your findings.

12. A recent article in the Cincinnati Enquirer reported that the mean labor cost to
repair a heat pump is $90 with a standard deviation of $22. Monte’s Plumbing and
Heating Service completed repairs on two heat pumps this morning. The labor cost
for the first was $75 and it was $100 for the second. Assume the distribution of
labor costs follows the normal probability distribution. Compute z values for each and
comment on your findings.

Finding Areas under the Normal Curve
The next application of the standard normal distribution involves finding the area in a
normal distribution between the mean and a selected value, which we identify as x. The
following example/solution will illustrate the details.

E X A M P L E

In the first example/solution described on page 218 in this section, we reported that the
weekly income of Uber drivers followed the normal distribution with a mean of $1,000
and a standard deviation of $100. That is, μ = $1,000 and σ = $100. What is the likeli-
hood of selecting a driver whose weekly income is between $1,000 and $1,100?

S O L U T I O N

We have already converted $1,100 to a z value of 1.00 using formula (7–5). To repeat:

\[ z = \frac{x - \mu}{\sigma} = \frac{\$1{,}100 - \$1{,}000}{\$100} = 1.00 \]

The probability associated with a z of 1.00 is available in Appendix B.3. A portion
of Appendix B.3 follows. To locate the probability, go down the left column to 1.0,
and then move horizontally to the column headed .00. The value is .3413.

z 0.00 0.01 0.02
. . . .
. . . .
. . . .
0.7 .2580 .2611 .2642
0.8 .2881 .2910 .2939
0.9 .3159 .3186 .3212
1.0 .3413 .3438 .3461
1.1 .3643 .3665 .3686

. . . .
. . . .
. . . .

The area under the normal curve between $1,000 and $1,100 is .3413. We could
also say 34.13% of Uber drivers earn between $1,000 and $1,100 weekly, or the
likelihood of selecting a driver and finding his or her income is between $1,000
and $1,100 is .3413.
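
For readers who want to reproduce the table lookup, the sketch below rebuilds the excerpt of the table shown above. It assumes Python's standard-library `statistics.NormalDist`; the helper name `table_area` is hypothetical.

```python
from statistics import NormalDist

def table_area(z):
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    return NormalDist().cdf(z) - 0.5

# Rebuild the excerpt: rows 0.7 to 1.1, columns 0.00 to 0.02
for row in (0.7, 0.8, 0.9, 1.0, 1.1):
    cells = [f"{table_area(row + col):.4f}" for col in (0.00, 0.01, 0.02)]
    print(f"{row:.1f}", *cells)

# Area between the mean ($1,000) and $1,100 for the Uber income example
area_1000_to_1100 = table_area((1100 - 1000) / 100)
print(round(area_1000_to_1100, 4))  # 0.3413
```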

This information is summarized in the following diagram.

[Diagram: normal curve with the area .3413 between $1,000 (z = 0) and $1,100 (z = 1.0), shown on the scale of dollars and the scale of z.]

In the example/solution just completed, we are interested in the probability be-
tween the mean and a given value. Let’s change the question. Instead of wanting to
know the probability of selecting a random driver who earned between $1,000 and
$1,100, suppose we wanted the probability of selecting a driver who earned less than
$1,100. In probability notation, we write this statement as P(weekly income < $1,100).
The method of solution is the same. We find the probability of selecting a driver who
earns between $1,000, the mean, and $1,100. This probability is .3413. Next, recall
that half the area, or probability, is above the mean and half is below. So the probability
of selecting a driver earning less than $1,000 is .5000. Finally, we add the two probabil-
ities, so .3413 + .5000 = .8413. About 84% of Uber drivers earn less than $1,100 per
week. See the following diagram.

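[Diagram: normal curve with the area .5000 below $1,000 (z = 0) and the area .3413 between $1,000 and $1,100 (z = 1.0), shown on the scale of dollars and the scale of z.]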

Excel will calculate this probability. The necessary commands are in the Software
Commands in Appendix C. The answer is .8413, the same as we calculated.
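
Other software gives the same result. As one hedged illustration (not the Excel commands of Appendix C), Python's standard-library `statistics.NormalDist` returns the cumulative area to the left of a value directly, so no separate addition of .5000 is needed.

```python
from statistics import NormalDist

income = NormalDist(mu=1000, sigma=100)  # weekly income of Uber drivers

# P(weekly income < $1,100): cumulative area to the left of $1,100
p_less_than_1100 = income.cdf(1100)
print(round(p_less_than_1100, 4))  # 0.8413
```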

STATISTICS IN ACTION

Many processes, such as filling soda bottles and canning fruit, are normally
distributed. Manufacturers must guard against both over- and underfilling. If
they put too much in the can or bottle, they are giving away their product. If
they put too little in, the customer may feel cheated and the government may
question the label description. “Control charts,” with limits drawn three standard
deviations above and below the mean, are routinely used to monitor this type of
production process.

E X A M P L E

Refer to the first example/solution discussed on page 218 in this section regarding
the weekly income of Uber drivers. The distribution of weekly incomes follows the
normal probability distribution, with a mean of $1,000 and a standard deviation of
$100. What is the probability of selecting a driver whose income is:

1. Between $790 and $1,000?
2. Less than $790?

S E L F - R E V I E W 7–4

The temperature of coffee sold at the Coffee Bean Cafe follows the normal probability
distribution, with a mean of 150 degrees. The standard deviation of this distribution is 5 degrees.
(a) What is the probability that the coffee temperature is between 150 degrees and 154
degrees?
(b) What is the probability that the coffee temperature is more than 164 degrees?

S O L U T I O N

We begin by finding the z value corresponding to a weekly income of $790. From
formula (7–5):

\[ z = \frac{x - \mu}{\sigma} = \frac{\$790 - \$1{,}000}{\$100} = -2.10 \]

See Appendix B.3. Move down the left margin to the row 2.1 and across that row
to the column headed 0.00. The value is .4821. So the area under the standard
normal curve corresponding to a z value of 2.10 is .4821. However, because the
normal distribution is symmetric, the area between 0 and a negative z value is the
same as that between 0 and the corresponding positive z value. The likelihood of
finding a driver earning between $790 and $1,000 is .4821. In probability nota-
tion, we write P($790 < weekly income < $1,000) = .4821.

z 0.00 0.01 0.02
. . . .
. . . .
. . . .
2.0 .4772 .4778 .4783
2.1 .4821 .4826 .4830
2.2 .4861 .4864 .4868
2.3 .4893 .4896 .4898
. . . .
. . . .
. . . .

The mean divides the normal curve into two identical halves. The area under the
half to the left of the mean is .5000, and the area to the right is also .5000. Be-
cause the area under the curve between $790 and $1,000 is .4821, the area
below $790 is .0179, found by .5000 − .4821. In probability notation, we write
P(weekly income < $790) = .0179.
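
A short sketch can confirm both probabilities and the symmetry argument. It assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

income = NormalDist(mu=1000, sigma=100)  # weekly income of Uber drivers

p_790_to_1000 = income.cdf(1000) - income.cdf(790)  # area between $790 and the mean
p_below_790 = income.cdf(790)                       # lower-tail area below $790

# Symmetry: the mirror-image interval above the mean ($1,000 to $1,210) has the same area
p_1000_to_1210 = income.cdf(1210) - income.cdf(1000)

print(round(p_790_to_1000, 4), round(p_below_790, 4))  # 0.4821 0.0179
```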

So we conclude that 48.21% of the Uber drivers have weekly incomes be-
tween $790 and $1,000. Further, we can anticipate that 1.79% earn less than
$790 per week. This information is summarized in the following diagram.

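[Diagram: normal curve with the area .0179 below $790 (z = −2.10), the area .4821 between $790 and $1,000 (z = 0), and the area .5000 above $1,000, shown on the scale of dollars and the scale of z.]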

E X E R C I S E S

13. A normal population has a mean of 20.0 and a standard deviation of 4.0.
a. Compute the z value associated with 25.0.
b. What proportion of the population is between 20.0 and 25.0?
c. What proportion of the population is less than 18.0?

14. A normal population has a mean of 12.2 and a standard deviation of 2.5.
a. Compute the z value associated with 14.3.
b. What proportion of the population is between 12.2 and 14.3?
c. What proportion of the population is less than 10.0?

15. A recent study of the hourly wages of maintenance crew members for major airlines
showed that the mean hourly salary was $20.50, with a standard deviation of
$3.50. Assume the distribution of hourly wages follows the normal probability
distribution. If we select a crew member at random, what is the probability the crew
member earns:
a. Between $20.50 and $24.00 per hour?
b. More than $24.00 per hour?
c. Less than $19.00 per hour?

16. The mean of a normal probability distribution is 400 pounds. The standard
deviation is 10 pounds.
a. What is the area between 415 pounds and the mean of 400 pounds?
b. What is the area between the mean and 395 pounds?
c. What is the probability of selecting a value at random and discovering that it has
a value of less than 395 pounds?

Another application of the normal distribution involves combining two areas, or
probabilities. One of the areas is to the right of the mean and the other to the left.

E X A M P L E

Continuing the example/solution first discussed on page 218 using the weekly in-
come of Uber drivers, weekly income follows the normal probability distribution,
with a mean of $1,000 and a standard deviation of $100. What is the area under
this normal curve between $840 and $1,200?

S O L U T I O N

The problem can be divided into two parts. For the area between $840 and the
mean of $1,000:

\[ z = \frac{\$840 - \$1{,}000}{\$100} = \frac{-\$160}{\$100} = -1.60 \]

For the area between the mean of $1,000 and $1,200:

\[ z = \frac{\$1{,}200 - \$1{,}000}{\$100} = \frac{\$200}{\$100} = 2.00 \]

The area under the curve for a z of −1.60 is .4452 (from Appendix B.3). The area
under the curve for a z of 2.00 is .4772. Adding the two areas: .4452 + .4772 = .9224.
Thus, the probability of selecting an income between $840 and $1,200 is .9224.
In probability notation, we write P($840 < weekly income < $1,200) = .4452 +
.4772 = .9224. To summarize, 92.24% of the drivers have weekly incomes be-
tween $840 and $1,200. This is shown in a diagram:

[Diagram: normal curve with the area .4452 between $840 (z = −1.6) and $1,000 (z = 0) and the area .4772 between $1,000 and $1,200 (z = 2.00), shown on the scale of dollars and the scale of z. Callout: “What is this probability?”]

Another application of the normal distribution involves determining the area between
values on the same side of the mean.

E X A M P L E

Returning to the weekly income distribution of Uber drivers (μ = $1,000, σ = $100),
what is the area under the normal curve between $1,150 and $1,250?

S O L U T I O N

The situation is again separated into two parts, and formula (7–5) is used. First,
we find the z value associated with a weekly income of $1,250:

\[ z = \frac{\$1{,}250 - \$1{,}000}{\$100} = 2.50 \]

Next we find the z value for a weekly income of $1,150:

\[ z = \frac{\$1{,}150 - \$1{,}000}{\$100} = 1.50 \]

From Appendix B.3, the area associated with a z value of 2.50 is .4938. So the
probability of a weekly income between $1,000 and $1,250 is .4938. Similarly, the
area associated with a z value of 1.50 is .4332, so the probability of a weekly in-
come between $1,000 and $1,150 is .4332. The probability of a weekly income
between $1,150 and $1,250 is found by subtracting the area associated with a
z value of 1.50 (.4332) from that associated with a z of 2.50 (.4938). Thus, the prob-
ability of a weekly income between $1,150 and $1,250 is .0606. In probability no-
tation, we write P($1,150 < weekly income < $1,250) = .4938 − .4332 = .0606.

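[Diagram: normal curve with the area .4332 between $1,000 (z = 0) and $1,150 (z = 1.50) and the area .0606 between $1,150 and $1,250 (z = 2.50), shown on the scale of incomes and the scale of z.]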

To summarize, there are four situations for finding the area under the standard nor-
mal probability distribution.

1. To find the area between 0 and z or (−z), look up the probability directly in the
table.

2. To find the area beyond z or (−z), locate the probability of z in the table and subtract
that probability from .5000.

3. To find the area between two points on different sides of the mean, determine the
z values and add the corresponding probabilities.

4. To find the area between two points on the same side of the mean, determine the
z values and subtract the smaller probability from the larger.
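
The four situations can be collected into a small set of helpers. This is a sketch under stated assumptions: it uses Python's standard-library `statistics.NormalDist` instead of the printed table, and the function names (`table_area`, `area_beyond`, `area_between`) are hypothetical. Because the code does not round each area to four digits before adding or subtracting, a result may differ from a table-based answer in the fourth decimal place.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def table_area(z):
    """Situation 1: area between 0 and z (or -z), as read from the table."""
    return Z.cdf(abs(z)) - 0.5

def area_beyond(z):
    """Situation 2: area beyond z (or -z), found by subtracting from .5000."""
    return 0.5 - table_area(z)

def area_between(z1, z2):
    """Situations 3 and 4: area between two z values."""
    if z1 * z2 < 0:
        # Different sides of the mean: add the two table areas
        return table_area(z1) + table_area(z2)
    # Same side of the mean: subtract the smaller area from the larger
    return abs(table_area(z1) - table_area(z2))

print(f"{area_beyond(-2.10):.4f}")        # income below $790
print(f"{area_between(-1.60, 2.00):.4f}") # income between $840 and $1,200
print(f"{area_between(1.50, 2.50):.4f}")  # income between $1,150 and $1,250
```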

S E L F - R E V I E W 7–5

Refer to Self-Review 7–4. The temperature of coffee sold at the Coffee Bean Cafe follows
the normal probability distribution with a mean of 150 degrees. The standard deviation of
this distribution is 5 degrees.
(a) What is the probability the coffee temperature is between 146 degrees and 156 degrees?
(b) What is the probability the coffee temperature is more than 156 but less than 162 degrees?

E X E R C I S E S

17. A normal distribution has a mean of 50 and a standard deviation of 4.
a. Compute the probability of a value between 44.0 and 55.0.
b. Compute the probability of a value greater than 55.0.
c. Compute the probability of a value between 52.0 and 55.0.

18. A normal population has a mean of 80.0 and a standard deviation of 14.0.
a. Compute the probability of a value between 75.0 and 90.0.
b. Compute the probability of a value of 75.0 or less.
c. Compute the probability of a value between 55.0 and 70.0.

19. Suppose the Internal Revenue Service reported that the mean tax refund for the
year 2016 was $2,800. Assume the standard deviation is $450 and that the
amounts refunded follow a normal probability distribution.
a. What percent of the refunds are more than $3,100?
b. What percent of the refunds are more than $3,100 but less than $3,500?
c. What percent of the refunds are more than $2,250 but less than $3,500?

20. The distribution of the number of viewers for the American Idol television show
follows a normal distribution with a mean of 29 million and a standard deviation of
5 million. What is the probability next week’s show will:
a. Have between 30 and 34 million viewers?
b. Have at least 23 million viewers?
c. Exceed 40 million viewers?

21. WNAE, an all-news AM station, finds that the distribution of the lengths of time
listeners are tuned to the station follows the normal distribution. The mean of the
distribution is 15.0 minutes and the standard deviation is 3.5 minutes. What is the
probability that a particular listener will tune in for:
a. More than 20 minutes?
b. 20 minutes or less?
c. Between 10 and 12 minutes?

22. Among the thirty largest U.S. cities, the mean one-way commute time to work is
25.8 minutes (source: https://deepblue.lib.umich.edu/bitstream/handle/2027.42/112057/103196.pdf?sequence=1&isAllowed=y).
The longest one-way travel time is in New York City,
where the mean time is 39.7 minutes. Assume the distribution of travel times in
New York City follows the normal probability distribution and the standard deviation
is 7.5 minutes.
a. What percent of the New York City commutes are for less than 30 minutes?
b. What percent are between 30 and 35 minutes?
c. What percent are between 30 and 50 minutes?

The previous example/solutions require finding the percent of the observations
located between two observations or the percent of the observations above, or be-
low, a particular observation x. A further application of the normal distribution involves
finding the value of the observation x when the percent above or below the observa-
tion is given.

E X A M P L E

Layton Tire and Rubber Company wishes to
set a minimum mileage guarantee on its
new MX100 tire. Tests reveal the mean
mileage is 67,900 with a standard deviation
of 2,050 miles and that the distribution of
miles follows the normal probability distribu-
tion. Layton wants to set the minimum guar-
anteed mileage so that no more than 4% of
the tires will have to be replaced. What min-
imum guaranteed mileage should Layton
announce?

S O L U T I O N

The facets of this case are shown in the fol-
lowing diagram, where x represents the
minimum guaranteed mileage.

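[Diagram: normal curve on the scale of miles with μ = 67,900. The unknown value x marks the minimum guaranteed mileage; a tire is replaced if its mileage is less than this value. The area below x is 4% or .0400, the area between x and μ is .4600, and the area above μ is .5000.]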

Inserting these values in formula (7–5) for z gives:

\[ z = \frac{x - \mu}{\sigma} = \frac{x - 67{,}900}{2{,}050} \]

There are two unknowns in this equation, z and x. To find x, we first find z, and
then solve for x. Recall from the characteristics of a normal curve that the area to
the left of μ is .5000. The area between μ and x is .4600, found by .5000 − .0400.
Now refer to Appendix B.3. Search the body of the table for the area closest to
.4600. The closest area is .4599. Move to the margins from this value and read
the z value of 1.75. Because the value is to the left of the mean, it is actually
−1.75. These steps are illustrated in Table 7–2.

TABLE 7–2 Selected Areas under the Normal Curve

z      …     .03     .04     .05     .06
.      .      .       .       .       .
.      .      .       .       .       .
.      .      .       .       .       .
1.5         .4370   .4382   .4394   .4406
1.6         .4484   .4495   .4505   .4515
1.7         .4582   .4591   .4599   .4608
1.8         .4664   .4671   .4678   .4686

Knowing that the distance between μ and x is −1.75σ or z = −1.75, we can
now solve for x (the minimum guaranteed mileage):

\[ z = \frac{x - 67{,}900}{2{,}050} \]

\[ -1.75 = \frac{x - 67{,}900}{2{,}050} \]

\[ -1.75(2{,}050) = x - 67{,}900 \]

\[ x = 67{,}900 - 1.75(2{,}050) = 64{,}312 \]

So Layton can advertise that it will replace for free any tire that wears out before it
reaches 64,312 miles, and the company will know that only about 4% of the tires will be
replaced under this plan.

Excel will also find the mileage value. See the following output. The necessary
commands are given in the Software Commands in Appendix C.

S E L F - R E V I E W 7–6

An analysis of the final test scores for Introduction to Business reveals the scores
follow the normal probability distribution. The mean of the distribution is 75 and the
standard deviation is 8. The professor wants to award an A to students whose score is in
the highest 10%. What is the dividing point for those students who earn an A and those
earning a B?
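
For the Layton Tire example, the search through the body of the table can be replaced by an inverse normal calculation. The sketch below assumes Python's standard-library `statistics.NormalDist` and its `inv_cdf` method; the variable names are illustrative. Because it does not round z to −1.75, the mileage it returns is within a mile or two of 64,312 rather than exactly equal to it.

```python
from statistics import NormalDist

mileage = NormalDist(mu=67_900, sigma=2_050)  # tire mileage for the MX100

# Mileage below which 4% of the tires fall (area .0400 in the lower tail)
guarantee = mileage.inv_cdf(0.04)

# The corresponding z value, for comparison with the table-based -1.75
z = (guarantee - mileage.mean) / mileage.stdev

print(f"z = {z:.2f}, minimum guaranteed mileage is about {guarantee:,.0f}")
```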

E X E R C I S E S

23. A normal distribution has a mean of 50 and a standard deviation of 4. Determine
the value below which 95% of the observations will occur.

24. A normal distribution has a mean of 80 and a standard deviation of 14. Determine
the value above which 80% of the values will occur.

25. Assume that the hourly cost to operate a commercial airplane follows the normal
distribution with a mean of $2,100 per hour and a standard deviation of $250.
What is the operating cost for the lowest 3% of the airplanes?

26. The SAT Reasoning Test is perhaps the most widely used standardized test for
college admissions in the United States. Scores are based on a normal distribution
with a mean of 1500 and a standard deviation of 300. Clinton College would like to
offer an honors scholarship to students who score in the top 10% of this test. What
is the minimum score that qualifies for the scholarship?

27. According to media research, the typical American listened to 195 hours of
music in the last year. This is down from 290 hours 4 years earlier. Dick Trythall
is a big country and western music fan. He listens to music while working around
the house, reading, and riding in his truck. Assume the number of hours spent
listening to music follows a normal probability distribution with a standard
deviation of 8.5 hours.
a. If Dick is in the top 1% in terms of listening time, how many hours did he listen
last year?
b. Assume that the distribution of times 4 years earlier also follows the normal
probability distribution with a standard deviation of 8.5 hours. How many hours
did the 1% who listen to the least music actually listen?

28. For the most recent year available, the mean annual cost to attend a private
university in the United States was $42,224. Assume the distribution of annual costs
follows the normal probability distribution and the standard deviation is $4,500.
Ninety-five percent of all students at private universities pay less than what amount?

29. In economic theory, a “hurdle rate” is the minimum return that a person requires
before he or she will make an investment. A research report says that annual
returns from a specific class of common equities are distributed according to a normal
distribution with a mean of 12% and a standard deviation of 18%. A stock screener
would like to identify a hurdle rate such that only 1 in 20 equities is above that
value. Where should the hurdle rate be set?

30. The manufacturer of a laser printer reports the mean number of pages a cartridge
will print before it needs replacing is 12,200. The distribution of pages printed
per cartridge closely follows the normal probability distribution and the standard
deviation is 820 pages. The manufacturer wants to provide guidelines to potential
customers as to how long they can expect a cartridge to last. How many pages
should the manufacturer advertise for each cartridge if it wants to be correct 99%
of the time?

THE NORMAL APPROXIMATION
TO THE BINOMIAL

LO7-4
Approximate the binomial probability distribution using the standard normal
probability distribution to calculate probabilities.

Chapter 6 describes the binomial probability distribution, which is a discrete
distribution. The table of binomial probabilities in Appendix B.1 goes successively from an n of
1 to an n of 15. If a problem involved taking a sample of 60, generating a binomial
distribution for that large a number would be very time-consuming. A more efficient
approach is to apply the normal approximation to the binomial.

We can use the normal distribution (a continuous distribution) as a substitute for
a binomial distribution (a discrete distribution) for large values of n because, as n
increases, a binomial distribution gets closer and closer to a normal distribution.
Chart 7–7 depicts the change in the shape of a binomial distribution with π = .50 from
an n of 1, to an n of 3, to an n of 20. Notice how the case where n = 20 approximates
the shape of the normal distribution.

When can we use the normal approximation to the binomial? The normal probability
distribution is a good approximation to the binomial probability distribution when nπ and
n(1 − π) are both at least 5. However, before we apply the normal approximation, we
must make sure that our distribution of interest is in fact a binomial distribution. Recall
from Chapter 6 that four criteria must be met:

1. There are only two mutually exclusive outcomes to an experiment: a “success” and
a “failure.”

2. The distribution results from counting the number of successes in a fixed number of
trials.

3. The probability of a success, π, remains the same from trial to trial.
4. Each trial is independent.
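
The numerical part of this check (nπ and n(1 − π) both at least 5) is easy to automate; whether the four binomial criteria hold still has to be judged from the setting of the problem. A minimal sketch, with a hypothetical function name:

```python
def normal_approx_ok(n, pi, threshold=5):
    """Rule of thumb from the text: n*pi and n*(1 - pi) must both be at least 5."""
    return n * pi >= threshold and n * (1 - pi) >= threshold

print(normal_approx_ok(80, 0.70))  # n*pi is about 56, n*(1 - pi) is about 24
print(normal_approx_ok(10, 0.10))  # n*pi is about 1, so the rule is not met
```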

Continuity Correction Factor
To show the application of the normal approximation to the binomial and the need for a
correction factor, suppose the management of the Santoni Pizza Restaurant found that
70% of its new customers return for another meal. For a week in which 80 new (first-
time) customers dined at Santoni’s, what is the probability that 60 or more will return for
another meal?

Notice the binomial conditions are met: (1) There are only two possible outcomes—a
customer either returns for another meal or does not return. (2) We can count the number
of successes, meaning, for example, that 57 of the 80 customers return. (3) The trials are
independent, meaning that if the 34th person returns for a second meal, that does not
affect whether the 58th person returns. (4) The probability of a customer returning re-
mains at .70 for all 80 customers.

Therefore, we could use the binomial formula (6–3) described on page 185.

\[ P(x) = {}_{n}C_{x}\, \pi^{x} (1 - \pi)^{n - x} \]

To find the probability 60 or more customers return for another pizza, we need to
first find the probability exactly 60 customers return. That is:

\[ P(x = 60) = {}_{80}C_{60}\, (.70)^{60} (1 - .70)^{20} = .063 \]

Next we find the probability that exactly 61 customers return. It is:

\[ P(x = 61) = {}_{80}C_{61}\, (.70)^{61} (1 - .70)^{19} = .048 \]

[Chart: three bar charts of P(x) against the number of occurrences x, for n = 1, n = 3, and n = 20.]

CHART 7–7 Binomial Distributions for an n of 1, 3, and 20, Where π = .50

We continue this process until we have the probability that all 80 customers return.
Finally, we add the probabilities from 60 to 80. Solving the preceding problem in this
manner is tedious. We can also use statistical software packages to find the various
probabilities. Listed below are the binomial probabilities for n = 80, π = .70, and x, the
number of customers returning, ranging from 43 to 68. The probability of any number of
customers less than 43 or more than 68 returning is less than .001. We can assume
these probabilities are 0.000.

Number                     Number
Returning   Probability    Returning   Probability

   43          .001           56          .097
   44          .002           57          .095
   45          .003           58          .088
   46          .006           59          .077
   47          .009           60          .063
   48          .015           61          .048
   49          .023           62          .034
   50          .033           63          .023
   51          .045           64          .014
   52          .059           65          .008
   53          .072           66          .004
   54          .084           67          .002
   55          .093           68          .001

We can find the probability of 60 or more returning by summing .063 + .048 + . . .
+ .001, which is .197. However, a look at the plot below shows the similarity of this dis-
tribution to a normal distribution. All we need do is “smooth out” the discrete probabili-
ties into a continuous distribution. Furthermore, working with a normal distribution will
involve far fewer calculations than working with the binomial.
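
The exact binomial sum is tedious by hand but short in code. The sketch below applies formula (6–3) with Python's standard-library `math.comb`; the function name `binomial_pmf` is hypothetical. Because it sums unrounded probabilities, including the very small ones beyond 68, the total can differ slightly in the third decimal place from the sum of the rounded table entries.

```python
from math import comb

def binomial_pmf(x, n, pi):
    """Formula (6-3): probability of exactly x successes in n trials."""
    return comb(n, x) * pi**x * (1 - pi)**(n - x)

n, pi = 80, 0.70  # Santoni Pizza: 80 new customers, 70% return

p_exactly_60 = binomial_pmf(60, n, pi)
p_60_or_more = sum(binomial_pmf(x, n, pi) for x in range(60, n + 1))

print(f"P(x = 60)  = {p_exactly_60:.3f}")
print(f"P(x >= 60) = {p_60_or_more:.3f}")
```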

The trick is to let the discrete probability for 56 customers be represented by an
area under the continuous curve between 55.5 and 56.5. Then let the probability for 57
customers be represented by an area between 56.5 and 57.5, and so on. This is just the
opposite of rounding off the numbers to a whole number.

[Chart: bar chart of the binomial probabilities (vertical axis, .01 to .10) against the number of customers returning (horizontal axis, 43 to 68).]

Because we use the normal distribution to determine the binomial probability of 60 or
more successes, we must subtract, in this case, .5 from 60. The value .5 is called the
continuity correction factor. This small adjustment is made because a continuous dis-
tribution (the normal distribution) is being used to approximate a discrete distribution
(the binomial distribution).

How to Apply the Correction Factor
Only four cases may arise. These cases are:

1. For the probability that at least x occur, use the area above (x − .5).
2. For the probability that more than x occur, use the area above (x + .5).
3. For the probability that x or fewer occur, use the area below (x + .5).
4. For the probability that fewer than x occur, use the area below (x − .5).

To use the normal distribution to approximate the probability that 60 or more first-time
Santoni customers out of 80 will return, follow the procedure shown below.

Step 1: Find the z value corresponding to an x of 59.5 using formula (7–5), and
formulas (6–4) and (6–5) for the mean and the variance of a binomial
distribution:

\[ \mu = n\pi = 80(.70) = 56 \]

\[ \sigma^2 = n\pi(1 - \pi) = 80(.70)(1 - .70) = 16.8 \]

\[ \sigma = \sqrt{16.8} = 4.10 \]

\[ z = \frac{x - \mu}{\sigma} = \frac{59.5 - 56}{4.10} = 0.85 \]

Step 2: Determine the area under the normal curve between a μ of 56 and an x of
59.5. From step 1, we know that the z value corresponding to 59.5 is
0.85. So we go to Appendix B.3 and read down the left margin to 0.8, and
then we go horizontally to the area under the column headed by .05. That
area is .3023.

Step 3: Calculate the area beyond 59.5 by subtracting .3023 from .5000 (.5000 −
.3023 = .1977). Thus, .1977 is the probability that 60 or more first-time
Santoni customers out of 80 will return for another meal. In probability
notation, P(customers > 59.5) = .5000 − .3023 = .1977. The facets of this
problem are shown graphically:

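[Diagram: normal curve with μ = 56 (z = 0) and x = 59.5 (z = .85), shown on the scale of x and the scale of z. The area below 56 is .5000, the area between 56 and 59.5 is .3023, and the area above 59.5 is .1977, the probability that 60 or more out of 80 will return to Santoni’s.]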

CONTINUITY CORRECTION FACTOR The value .5 subtracted or added, depending
on the question, to a selected value when a discrete probability distribution is
approximated by a continuous probability distribution.

No doubt you will agree that using the normal approximation to the binomial is a
more efficient method of estimating the probability of 60 or more first-time customers
returning. The result compares favorably with that computed on page 230 using the
binomial distribution. The probability using the binomial distribution is .197, whereas the
probability using the normal approximation is .1977.
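
The comparison can be reproduced with a short Python sketch. It is an illustration only, assuming the Santoni figures above (n = 80, π = .70); the variable names are ours, and the normal area is computed without rounding z, so the approximation comes out slightly different from the table-based .1977.

```python
import math

n, pi = 80, 0.70

# Exact binomial: add P(X = k) for k = 60, 61, ..., 80
exact = sum(math.comb(n, k) * pi**k * (1 - pi)**(n - k) for k in range(60, n + 1))

# Normal approximation with the continuity correction (area above 59.5)
mu = n * pi
sigma = math.sqrt(n * pi * (1 - pi))
z = (59.5 - mu) / sigma
approx = 0.5 * (1 - math.erf(z / math.sqrt(2)))   # upper-tail area beyond z

print(f"exact binomial = {exact:.4f}")
print(f"normal approx. = {approx:.4f}")
```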


S E L F - R E V I E W 7–7

A study by Great Southern Home Insurance revealed that none of the stolen goods were
recovered by the homeowners in 80% of reported thefts.
(a) During a period in which 200 thefts occurred, what is the probability that no stolen
goods were recovered in 170 or more of the robberies?
(b) During a period in which 200 thefts occurred, what is the probability that no stolen
goods were recovered in 150 or more robberies?

31. Assume a binomial probability distribution with n = 50 and π = .25. Compute the
following:

a. The mean and standard deviation of the random variable.
b. The probability that x is 15 or more.
c. The probability that x is 10 or less.

32. Assume a binomial probability distribution with n = 40 and π = .55. Compute the
following:

a. The mean and standard deviation of the random variable.
b. The probability that x is 25 or greater.
c. The probability that x is 15 or less.
d. The probability that x is between 15 and 25, inclusive.

33. Dottie’s Tax Service specializes in federal tax returns for professional clients, such
as physicians, dentists, accountants, and lawyers. A recent audit by the IRS of the
returns she prepared indicated that an error was made on 7% of the returns she
prepared last year. Assuming this rate continues into this year and she prepares 80
returns, what is the probability that she makes errors on:

a. More than six returns?
b. At least six returns?
c. Exactly six returns?

34. Shorty’s Muffler advertises it can install a new muffler in 30 minutes or less. However,
the work standards department at corporate headquarters recently conducted a
study and found that 20% of the mufflers were not installed in 30 minutes or less. The
Maumee branch installed 50 mufflers last month. If the corporate report is correct:

a. How many of the installations at the Maumee branch would you expect to take
more than 30 minutes?

b. What is the likelihood that fewer than eight installations took more than 30 minutes?
c. What is the likelihood that eight or fewer installations took more than 30 minutes?
d. What is the likelihood that exactly 8 of the 50 installations took more than
30 minutes?

35. A study conducted by the nationally known Taurus Health Club revealed that 30%
of its new members are 15 pounds overweight. A membership drive in a metropolitan
area resulted in 500 new members.

a. It has been suggested that the normal approximation to the binomial be used to
determine the probability that 175 or more of the new members are 15 pounds
overweight. Does this problem qualify as a binomial problem? Explain.

b. What is the probability that 175 or more of the new members are 15 pounds
overweight?

c. What is the probability that 140 or more new members are 15 pounds overweight?

36. The website, herecomestheguide.com, suggested that couples planning their wedding
should expect eighty percent of those who are sent an invitation to respond
that they will attend. Rich and Stacy are planning to be married later this year. They
plan to send 200 invitations.

a. How many guests would you expect to accept the invitation?
b. What is the standard deviation?
c. What is the probability 150 or more will accept the invitation?
d. What is the probability exactly 150 will accept the invitation?

E X E R C I S E S


THE FAMILY OF EXPONENTIAL DISTRIBUTIONS
So far in this chapter, we have considered two continuous probability distributions, the
uniform and the normal. The next continuous distribution we consider is the exponential
distribution. This continuous probability distribution usually describes times between
events in a sequence. The actions occur independently at a constant rate per unit of
time or length. Because time is never negative, an exponential random variable is al-
ways positive. The exponential distribution usually describes situations such as:

• The service time for customers at the information desk of the Dallas Public Library.
• The time between “hits” on a website.
• The lifetime of a kitchen appliance.
• The time until the next phone call arrives in a customer service center.

The exponential probability distribution is positively skewed. That differs from the
uniform and normal distributions, which were both symmetric. Moreover, the distribution
is described by only one parameter, which we will identify as λ (pronounced “lambda”).
λ is often referred to as the “rate” parameter. The following chart shows the change in
the shape of the exponential distribution as we vary the value of λ from 1/3 to 1 to 2.
Observe that as we decrease λ, the curve starts lower and declines more slowly, so the distribution looks “less skewed.”

[Chart: Three Exponential Distributions, showing curves for λ = 0.33, λ = 1.0, and λ = 2.0 over x from 0 to 4.]
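
The effect of λ on the shape can be sketched numerically. The Python lines below are an illustration only; they evaluate the curve height λe^(−λx), given as formula (7–6) later in this section, at a few values of x for the three values of λ in the chart. The function name `exponential_density` is our own.

```python
import math

def exponential_density(x, lam):
    """Height of the exponential curve, lam * e^(-lam * x), for x >= 0."""
    return lam * math.exp(-lam * x)

for lam in (1 / 3, 1.0, 2.0):
    heights = [round(exponential_density(x, lam), 3) for x in (0, 1, 2, 3, 4)]
    print(f"lambda = {lam:.2f}: {heights}")
```

Each row starts at a height of λ when x is 0, and the rows for larger λ fall toward zero more quickly.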

Another feature of the exponential distribution is its close relationship to
the Poisson distribution. The Poisson is a discrete probability distribution and
also has a single parameter, μ. We described the Poisson distribution starting
in Chapter 6 on page 197. It too is a positively skewed distribution. To ex-
plain the relationship between the Poisson and the exponential distributions,
suppose customers arrive at a family restaurant during the dinner hour at a
rate of six per hour. The Poisson distribution would have a mean of six. For a
time interval of one hour, we can use the Poisson distribution to find the
probability that one, or two, or ten customers arrive. But suppose instead of
studying the number of customers arriving in an hour, we wish to study the
time between their arrivals. The time between arrivals is a continuous distribution
because time is measured as a continuous random variable. If customers
arrive at a rate of six per hour, then logically the typical or mean time between
arrivals is 1/6 of an hour, or 10 minutes. We need to be careful here to be consistent
with our units, so let’s stay with 1/6 of an hour. So in general, if we know customers
arrive at a certain rate per hour, which we call μ, then we can expect the mean time
between arrivals to be 1/μ. The rate parameter λ of the exponential distribution is
that same arrival rate, so the mean time between arrivals can also be written 1/λ. In
our restaurant arrival example, λ = 6 arrivals per hour and the mean time between
customer arrivals is 1/λ = 1/6 of an hour.
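
The link between the two distributions can be checked with a small Python sketch. It is an illustration under stated assumptions: the arrival rate of six per hour comes from the restaurant example, the exponential rate is taken to be that same six per hour, the quarter-hour interval is a hypothetical choice, and the function names are ours. The probability of no arrivals in an interval (Poisson) should equal the probability that the time to the next arrival exceeds that interval (exponential, using formula (7–7) from later in this section).

```python
import math

def poisson_probability(k, mean):
    """Poisson probability of exactly k arrivals when the expected count is mean."""
    return mean**k * math.exp(-mean) / math.factorial(k)

def exponential_less_than(t, rate):
    """Formula (7-7): probability the time between arrivals is less than t."""
    return 1 - math.exp(-rate * t)

rate = 6                       # arrivals per hour (restaurant example)
mean_gap_minutes = 60 / rate   # mean time between arrivals, in minutes

t = 0.25                       # hypothetical interval: a quarter of an hour
no_arrivals = poisson_probability(0, rate * t)       # Poisson view
gap_exceeds_t = 1 - exponential_less_than(t, rate)   # exponential view

print(mean_gap_minutes)
print(round(no_arrivals, 4), round(gap_exceeds_t, 4))
```

The two probabilities on the last line agree, which is the sense in which the exponential distribution describes the time between Poisson events.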

LO7-5
Describe the exponential probability distribution and use it to calculate probabilities.

The graph of the exponential distribution starts at the value of λ when the random
variable’s (x) value is 0. The distribution declines steadily as we move to the right with
increasing values of x. Formula (7–6) describes the exponential probability distribution
with λ as rate parameter. As we described with the Poisson distribution on page 197,
e is a mathematical constant equal to 2.71828. It is the base for the natural logarithm
system. It is a pleasant surprise that both the mean and the standard deviation of the
exponential probability distribution are equal to 1/λ.

EXPONENTIAL DISTRIBUTION $P(x) = \lambda e^{-\lambda x}$ (7–6)

With continuous distributions, we do not address the probability that a distinct value
will occur. Instead, areas or regions below the graph of the probability distribution
between two specified values give the probability the random variable is in that interval. A
table, such as Appendix B.3 for the normal distribution, is not necessary for the exponential
distribution. The area under the exponential density function is found by a formula,
and the necessary calculations can be accomplished with a handheld calculator
with an $e^x$ key. Most statistical software packages will also calculate exponential
probabilities by inputting the rate parameter, λ, only. The probability of obtaining an arrival
value less than a particular value of x is:

FINDING A PROBABILITY USING THE EXPONENTIAL DISTRIBUTION $P(\text{Arrival time} < x) = 1 - e^{-\lambda x}$ (7–7)

E X A M P L E

Orders for prescriptions arrive at a pharmacy website according to an exponential
probability distribution at a mean of one every 20 seconds. Find the probability the
next order arrives in less than 5 seconds. Find the probability the next order arrives
in more than 40 seconds.

S O L U T I O N

To begin, we determine the rate parameter λ, which in this case is 1/20. To find
the probability, we insert 1/20 for λ and 5 for x in formula (7–7).

$P(\text{Arrival time} < 5) = 1 - e^{-\frac{1}{20}(5)} = 1 - e^{-0.25} = 1 - .7788 = .2212$

So we conclude there is a 22% chance the next order will arrive in less than
5 seconds. The region is identified as the colored area under the curve.

[Chart: Exponential, λ = 1/20, for arrival times from 0 to 100 seconds; the area below 5 seconds is colored.]

The preceding computations addressed the left-tail area of the exponential
distribution with λ = 1/20, that is, the area between 0 and 5 seconds. What if you
are interested in the right-tail area? It is found using the complement rule. See
formula (5–3) in Chapter 5. To put it another way, to find the probability the next
order will arrive in more than 40 seconds, we find the probability the order arrives
in less than 40 seconds and subtract the result from 1.00. We show this in two steps.

1. Find the probability an order is received in less than 40 seconds.

$P(\text{Arrival} < 40) = 1 - e^{-\frac{1}{20}(40)} = 1 - .1353 = .8647$

2. Find the probability an order is received in more than 40 seconds.

P(Arrival > 40) = 1 − P(Arrival < 40) = 1 − .8647 = .1353

We conclude that the likelihood that it will be 40 seconds or more before the next
order is received at the pharmacy is 13.5%.

In the preceding example/solution, when we apply the exponential probability dis-
tribution to compute the probability that the arrival time is greater than 40 seconds, you
probably observed that there is some redundancy. In general, if we wish to find the
likelihood of a time greater than some value x, such as 40, the complement rule is ap-
plied as follows:

$P(\text{Arrival} > x) = 1 - P(\text{Arrival} < x) = 1 - (1 - e^{-\lambda x}) = e^{-\lambda x}$

In other words, when we subtract formula (7–7) from 1 to find the area in the right
tail, the result is $e^{-\lambda x}$. Thus, the probability that more than 40 seconds go by before the
next order arrives is computed without the aid of the complement rule as follows:

$P(\text{Arrival} > 40) = e^{-\frac{1}{20}(40)} = .1353$

The result is shown in the following graph.

[Chart: Exponential, λ = 1/20, for arrival times from 0 to 100 seconds, showing the area above 40 seconds.]

What if you wish to determine the probability that it will take more than 5 seconds
but less than 40 seconds for the next order to arrive? Use formula (7–7) with an x value
of 40 and then subtract the value of formula (7–7) when x is 5.

In symbols, you can write this as:

$P(5 \le x \le 40) = P(\text{Arrival} \le 40) - P(\text{Arrival} \le 5)$

$= \left(1 - e^{-\frac{1}{20}(40)}\right) - \left(1 - e^{-\frac{1}{20}(5)}\right) = .8647 - .2212 = .6435$

We conclude that about 64% of the time, the time between orders will be between
5 seconds and 40 seconds.
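
The three pharmacy probabilities can be reproduced with a few lines of Python. This is a minimal sketch of formula (7–7); the function name `prob_less_than` and the variable names are our own.

```python
import math

def prob_less_than(x, lam):
    """Formula (7-7): P(Arrival time < x) = 1 - e^(-lam * x)."""
    return 1 - math.exp(-lam * x)

lam = 1 / 20   # one order every 20 seconds on average

p_under_5 = prob_less_than(5, lam)                            # left tail
p_over_40 = 1 - prob_less_than(40, lam)                       # right tail, equals e^(-lam * 40)
p_between = prob_less_than(40, lam) - prob_less_than(5, lam)  # between 5 and 40 seconds

print(round(p_under_5, 4), round(p_over_40, 4), round(p_between, 4))
```

Rounded to four places, the three results match the values .2212, .1353, and .6435 found above.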


[Chart: Exponential, λ = 1/20, for arrival times from 0 to 100 seconds.]

Previous examples require finding the percentage of the observations located be-
tween two values or the percentage of the observations above or below a particular
value, x. We can also use formula (7–7) in “reverse” to find the value of the observation
x when the percentage above or below the observation is given. The following exam-
ple/solution illustrates this situation.

E X A M P L E

Compton Computers wishes to set a minimum lifetime guarantee on its new power
supply unit. Quality testing shows the time to failure follows an exponential distribu-
tion with a mean of 4,000 hours. Compton wants a warranty period such that only
5% of the power supply units fail during that period. What value should they set for
the warranty period?

S O L U T I O N

Note that 4,000 hours is a mean and not a rate. Therefore, we must compute λ as
1/4,000, or 0.00025 failure per hour. A diagram of the situation is shown below,
where x represents the minimum guaranteed lifetime.

[Chart: Exponential, λ = 0.00025, for lifetimes from 0 to 12,000 hours.]

We use formula (7–7) and essentially work backward for the solution. In this case, the
mean is 4,000 hours, so the rate parameter is 1/4,000, and we want the area, as shown
in the diagram, to be .05.

$P(\text{Arrival time} < x) = 1 - e^{-\lambda x} = 1 - e^{-\frac{1}{4{,}000}(x)} = .05$


Next, we solve this equation for x. So, we subtract 1 from both sides of the equation
and multiply by −1 to simplify the signs. The result is:

$.95 = e^{-\frac{1}{4{,}000}(x)}$

Next, we take the natural log of both sides and solve for x:

$\ln(.95) = -\dfrac{1}{4{,}000}x$

$-(.051293294) = -\dfrac{1}{4{,}000}x$

$x = 205.17$

Hence, Compton can set the warranty period at 205 hours
and expect about 5% of the power supply units to be returned.
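
Working formula (7–7) in reverse can also be sketched in Python. This is an illustration only; solving 1 − e^(−λx) = p for x gives x = −ln(1 − p)/λ, and the function name `exponential_percentile` is our own.

```python
import math

def exponential_percentile(p, mean):
    """Value x with P(time < x) = p for an exponential distribution with the given mean."""
    lam = 1 / mean                    # convert the mean to a rate
    return -math.log(1 - p) / lam     # solve 1 - e^(-lam * x) = p for x

warranty_hours = exponential_percentile(0.05, 4000)
print(round(warranty_hours, 2))
```

The result agrees with the 205.17 hours found by hand.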

S E L F - R E V I E W 7–8

The time between ambulance arrivals at the Methodist Hospital emergency room follows
an exponential distribution with a mean of 10 minutes.
(a) What is the likelihood the next ambulance will arrive in 15 minutes or less?
(b) What is the likelihood the next ambulance will arrive in more than 25 minutes?
(c) What is the likelihood the next ambulance will arrive in more than 15 minutes but less
than 25?
(d) Find the 80th percentile for the time between ambulance arrivals. (This means only
20% of the runs are longer than this time.)

37. Waiting times to receive food after placing an order at the local Subway sandwich
shop follow an exponential distribution with a mean of 60 seconds. Calculate the
probability a customer waits:

a. Less than 30 seconds.
b. More than 120 seconds.
c. Between 45 and 75 seconds.
d. Fifty percent of the patrons wait less than how many seconds? What is the median?

38. The lifetime of LCD TV sets follows an exponential distribution with a mean of
100,000 hours. Compute the probability a television set:

a. Fails in less than 10,000 hours.
b. Lasts more than 120,000 hours.
c. Fails between 60,000 and 100,000 hours of use.
d. Find the 90th percentile. So 10% of the TV sets last more than what length of time?

39. The Bureau of Labor Statistics’ American Time Use Survey, www.bls.gov/data,
showed that the amount of time spent using a computer for leisure varied greatly
by age. Individuals age 75 and over averaged 0.3 hour (18 minutes) per day using
a computer for leisure. Individuals ages 15 to 19 spend 1.0 hour per day using a
computer for leisure. If these times follow an exponential distribution, find the pro-
portion of each group that spends:

a. Less than 15 minutes per day using a computer for leisure.
b. More than 2 hours.
c. Between 30 minutes and 90 minutes using a computer for leisure.
d. Find the 20th percentile. Eighty percent spend more than what amount of time?

40. The cost per item at a supermarket follows an exponential distribution. There are
many inexpensive items and a few relatively expensive ones. The mean cost per
item is $3.50. What is the percentage of items that cost:

a. Less than $1?
b. More than $4?
c. Between $2 and $3?
d. Find the 40th percentile. Sixty percent of the supermarket items cost more than
what amount?

E X E R C I S E S


C H A P T E R S U M M A R Y

I. The uniform distribution is a continuous probability distribution with the following
characteristics.
A. It is rectangular in shape.
B. The mean and the median are equal.
C. It is completely described by its minimum value a and its maximum value b.
D. It is described by the following equation for the region from a to b:

$P(x) = \dfrac{1}{b - a}$ (7–3)

E. The mean and standard deviation of a uniform distribution are computed as follows:

$\mu = \dfrac{a + b}{2}$ (7–1)

$\sigma = \sqrt{\dfrac{(b - a)^2}{12}}$ (7–2)

II. The normal probability distribution is a continuous distribution with the following
characteristics.
A. It is bell-shaped and has a single peak at the center of the distribution.
B. The distribution is symmetric.
C. It is asymptotic, meaning the curve approaches but never touches the X-axis.
D. It is completely described by its mean and standard deviation.
E. There is a family of normal probability distributions.

1. Another normal probability distribution is created when either the mean or the
standard deviation changes.
2. The normal probability distribution is described by the following formula:

$P(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\left[\frac{(x - \mu)^2}{2\sigma^2}\right]}$ (7–4)

III. The standard normal probability distribution is a particular normal distribution.
A. It has a mean of 0 and a standard deviation of 1.
B. Any normal probability distribution can be converted to the standard normal probability
distribution by the following formula.

$z = \dfrac{x - \mu}{\sigma}$ (7–5)

C. By standardizing a normal probability distribution, we can report the distance of a
value from the mean in units of the standard deviation.

IV. The normal probability distribution can approximate a binomial distribution under certain
conditions.
A. nπ and n(1 − π) must both be at least 5.

1. n is the number of observations.
2. π is the probability of a success.

B. The four conditions for a binomial probability distribution are:
1. There are only two possible outcomes.
2. π (pi) remains the same from trial to trial.
3. The trials are independent.
4. The distribution results from a count of the number of successes in a fixed number
of trials.
C. The mean and variance of a binomial distribution are computed as follows:

$\mu = n\pi$
$\sigma^2 = n\pi(1 - \pi)$

D. The continuity correction factor of .5 is used to extend the continuous value of x one-
half unit in either direction. This correction compensates for approximating a discrete
distribution by a continuous distribution.


V. The exponential probability distribution describes times between events in a sequence.
A. The actions occur independently at a constant rate per unit of time or length.
B. The probabilities are computed using the formula:

$P(x) = \lambda e^{-\lambda x}$ (7–6)

C. It is nonnegative, is positively skewed, declines steadily to the right, and is
asymptotic.

D. The area under the curve is given by the formula

$P(\text{Arrival time} < x) = 1 - e^{-\lambda x}$ (7–7)

E. Both the mean and standard deviation are equal to 1/λ:

$\mu = 1/\lambda$
$\sigma = 1/\lambda$
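
Item E can be checked numerically. The Python sketch below is only an illustration: it approximates the mean and the standard deviation by adding up thin slices under the curve of formula (7–6) for λ = 1/20, the rate used in the pharmacy example. The function name `exponential_moments`, the cutoff of 60 means, and the number of slices are our own choices.

```python
import math

def exponential_moments(lam, upper_multiple=60, steps=200_000):
    """Approximate the mean and standard deviation with a midpoint Riemann sum."""
    upper = upper_multiple / lam     # the area beyond this point is negligible
    dx = upper / steps
    mean = 0.0
    second_moment = 0.0
    for i in range(steps):
        x = (i + 0.5) * dx                        # midpoint of the slice
        area = lam * math.exp(-lam * x) * dx      # area of the slice
        mean += x * area
        second_moment += x * x * area
    return mean, math.sqrt(second_moment - mean**2)

mean, sd = exponential_moments(1 / 20)
print(round(mean, 3), round(sd, 3))
```

Both values come out at about 20, which is 1/λ.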

C H A P T E R E X E R C I S E S

41. The amount of cola in a 12-ounce can is uniformly distributed between 11.96 ounces
and 12.05 ounces.
a. What is the mean amount per can?
b. What is the standard deviation amount per can?
c. What is the probability of selecting a can of cola and finding it has less than 12 ounces?
d. What is the probability of selecting a can of cola and finding it has more than 11.98 ounces?
e. What is the probability of selecting a can of cola and finding it has more than
11.00 ounces?

42. A tube of Listerine Tartar Control toothpaste contains 4.2 ounces. As people use the
toothpaste, the amount remaining in any tube is random. Assume the amount of toothpaste
remaining in the tube follows a uniform distribution. From this information, we can
determine the following information about the amount remaining in a toothpaste tube
without invading anyone’s privacy.
a. How much toothpaste would you expect to be remaining in the tube?
b. What is the standard deviation of the amount remaining in the tube?
c. What is the likelihood there is less than 3.0 ounces remaining in the tube?
d. What is the probability there is more than 1.5 ounces remaining in the tube?

43. Many retail stores offer their own credit cards. At the time of the credit application, the
customer is given a 10% discount on the purchase. The time required for the credit ap-
plication process follows a uniform distribution with the times ranging from 4 minutes to
10 minutes.
a. What is the mean time for the application process?
b. What is the standard deviation of the process time?
c. What is the likelihood a particular application will take less than 6 minutes?
d. What is the likelihood an application will take more than 5 minutes?

44. The time patrons at the Grande Dunes Hotel in the Bahamas spend waiting for an eleva-
tor follows a uniform distribution between 0 and 3.5 minutes.
a. Show that the area under the curve is 1.00.
b. How long does the typical patron wait for elevator service?
c. What is the standard deviation of the waiting time?
d. What percent of the patrons wait for less than a minute?
e. What percent of the patrons wait more than 2 minutes?

45. The net sales and the number of employees for aluminum fabricators with similar char-
acteristics are organized into frequency distributions. Both are normally distributed. For
the net sales, the mean is $180 million and the standard deviation is $25 million. For the
number of employees, the mean is 1,500 and the standard deviation is 120. Clarion
Fabricators had sales of $170 million and 1,850 employees.
a. Convert Clarion’s sales and number of employees to z values.
b. Locate the two z values.
c. Compare Clarion’s sales and number of employees with those of the other fabricators.


46. The accounting department at Weston Materials Inc., a national manufacturer of unat-
tached garages, reports that it takes two construction workers a mean of 32 hours and
a standard deviation of 2 hours to erect the Red Barn model. Assume the assembly
times follow the normal distribution.
a. Determine the z values for 29 and 34 hours. What percent of the garages take between
32 hours and 34 hours to erect?
b. What percent of the garages take between 29 hours and 34 hours to erect?
c. What percent of the garages take 28.7 hours or less to erect?
d. Of the garages, 5% take how many hours or more to erect?

47. In 2015 the United States Department of Agriculture issued a report
(http://www.cnpp.usda.gov/sites/default/files/CostofFoodMar2015.pdf) indicating a family of four spent an
average of about $890 per month on food. Assume the distribution of food expenditures
for a family of four follows the normal distribution, with a standard deviation of $90 per
month.
a. What percent of the families spend more than $430 but less than $890 per month on
food?
b. What percent of the families spend less than $830 per month on food?
c. What percent spend between $830 and $1,000 per month on food?
d. What percent spend between $900 and $1,000 per month on food?

48. A study of long-distance phone calls made from General Electric Corporate Headquar-
ters in Fairfield, Connecticut, revealed the length of the calls, in minutes, follows the
normal probability distribution. The mean length of time per call was 4.2 minutes and
the standard deviation was 0.60 minute.
a. What is the probability that calls last between 4.2 and 5 minutes?
b. What is the probability that calls last more than 5 minutes?
c. What is the probability that calls last between 5 and 6 minutes?
d. What is the probability that calls last between 4 and 6 minutes?
e. As part of her report to the president, the director of communications would like to
report the length of the longest (in duration) 4% of the calls. What is this time?

49. Shaver Manufacturing Inc. offers dental insurance to its employees. A recent study by
the human resource director shows the annual cost per employee followed the
normal probability distribution, with a mean of $1,280 and a standard deviation of
$420 per year.
a. What is the probability that annual dental expenses are more than $1,500?
b. What is the probability that annual dental expenses are between $1,500 and
$2,000?
c. Estimate the probability that an employee had no annual dental expenses.
d. What was the cost for the 10% of employees who incurred the highest dental
expense?

50. The annual commissions earned by sales representatives of Machine Products Inc., a
manufacturer of light machinery, follow the normal probability distribution. The mean
yearly amount earned is $40,000 and the standard deviation is $5,000.
a. What percent of the sales representatives earn more than $42,000 per year?
b. What percent of the sales representatives earn between $32,000 and $42,000?
c. What percent of the sales representatives earn between $32,000 and $35,000?
d. The sales manager wants to award the sales representatives who earn the largest
commissions a bonus of $1,000. He can award a bonus to 20% of the representatives.
What is the cutoff point between those who earn a bonus and those who
do not?

51. According to the South Dakota Department of Health, the number of hours of TV view-
ing per week is higher among adult women than adult men. A recent study showed
women spent an average of 34 hours per week watching TV, and men, 29 hours per
week. Assume that the distribution of hours watched follows the normal distribution for
both groups, and that the standard deviation among the women is 4.5 hours and is
5.1 hours for the men.
a. What percent of the women watch TV less than 40 hours per week?
b. What percent of the men watch TV more than 25 hours per week?
c. How many hours of TV do the 1% of women who watch the most TV per week watch?
Find the comparable value for the men.


52. According to a government study among adults in the 25- to 34-year age group, the mean
amount spent per year on reading and entertainment is $1,994. Assume that the distribution
of the amounts spent follows the normal distribution with a standard deviation of $450.
a. What percent of the adults spend more than $2,500 per year on reading and
entertainment?
b. What percent spend between $2,500 and $3,000 per year on reading and
entertainment?
c. What percent spend less than $1,000 per year on reading and entertainment?

53. Management at Gordon Electronics is considering adopting a bonus system to increase
production. One suggestion is to pay a bonus on the highest 5% of production based on
past experience. Past records indicate weekly production follows the normal distribu-
tion. The mean of this distribution is 4,000 units per week and the standard deviation is
60 units per week. If the bonus is paid on the upper 5% of production, the bonus will be
paid on how many units or more?

54. Fast Service Truck Lines uses the Ford Super Duty F-750 exclusively. Management
made a study of the maintenance costs and determined the number of miles traveled
during the year followed the normal distribution. The mean of the distribution was
60,000 miles and the standard deviation 2,000 miles.
a. What percent of the Ford Super Duty F-750s logged 65,200 miles or more?
b. What percent of the trucks logged more than 57,060 but less than 58,280 miles?
c. What percent of the Fords traveled 62,000 miles or less during the year?
d. Is it reasonable to conclude that any of the trucks were driven more than 70,000
miles? Explain.

55. Best Electronics Inc. offers a “no hassle” returns policy. The daily number of customers
returning items follows the normal distribution. The mean number of customers returning
items is 10.3 per day and the standard deviation is 2.25 per day.
a. For any day, what is the probability that eight or fewer customers returned items?
b. For any day, what is the probability that the number of customers returning items is
between 12 and 14?
c. Is there any chance of a day with no customer returns?

56. A recent news report indicated that 20% of all employees steal from their company each
year. If a company employs 50 people, what is the probability that:
a. Fewer than five employees steal?
b. More than five employees steal?
c. Exactly five employees steal?
d. More than 5 but fewer than 15 employees steal?

57. The Orange County Register, as part of its Sunday health supplement, reported that
64% of American men over the age of 18 consider nutrition a top priority in their lives.
Suppose we select a sample of 60 men. What is the likelihood that:
a. 32 or more consider nutrition important?
b. 44 or more consider nutrition important?
c. More than 32 but fewer than 43 consider nutrition important?
d. Exactly 44 consider diet important?

58. It is estimated that 10% of those taking the quantitative methods portion of the CPA ex-
amination fail that section. Sixty students are taking the exam this Saturday.
a. How many would you expect to fail? What is the standard deviation?
b. What is the probability that exactly two students will fail?
c. What is the probability at least two students will fail?

59. The Georgetown, South Carolina, Traffic Division reported 40% of high-speed chases in-
volving automobiles result in a minor or major accident. If 50 high-speed chases occur in
a year, what is the probability that 25 or more will result in a minor or major accident?

60. Cruise ships of the Royal Viking line report that 80% of their rooms are occupied during
September. For a cruise ship having 800 rooms, what is the probability that 665 or more
are occupied in September?

61. The goal at U.S. airports handling international flights is to clear these flights within 45 min-
utes. Let’s interpret this to mean that 95% of the flights are cleared in 45 minutes, so 5% of the
flights take longer to clear. Let’s also assume that the distribution is approximately normal.
a. If the standard deviation of the time to clear an international flight is 5 minutes, what
is the mean time to clear a flight?
b. Suppose the standard deviation is 10 minutes, not the 5 minutes suggested in part
(a). What is the new mean?
c. A customer has 30 minutes from the time her flight lands to catch her limousine. Assuming
a standard deviation of 10 minutes, what is the likelihood that she will be
cleared in time?

62. The funds dispensed at the ATM located near the checkout line at the Kroger’s in
Union, Kentucky, follow a normal probability distribution with a mean of $4,200 per day and
a standard deviation of $720 per day. The machine is programmed to notify the nearby
bank if the amount dispensed is very low (less than $2,500) or very high (more than $6,000).
a. What percent of the days will the bank be notified because the amount dispensed is
very low?
b. What percent of the time will the bank be notified because the amount dispensed is
high?
c. What percent of the time will the bank not be notified regarding the amount of funds
dispensed?

63. The weights of canned hams processed at Henline Ham Company follow the normal
distribution, with a mean of 9.20 pounds and a standard deviation of 0.25 pound. The
label weight is given as 9.00 pounds.
a. What proportion of the hams actually weigh less than the amount claimed on the
label?
b. The owner, Glen Henline, is considering two proposals to reduce the proportion of
hams below label weight. He can increase the mean weight to 9.25 and leave the
standard deviation the same, or he can leave the mean weight at 9.20 and reduce
the standard deviation from 0.25 pound to 0.15. Which change would you
recommend?

64. A recent Gallup study (http://www.gallup.com/poll/175286/hour-workweek-actually-longer-seven-hours.aspx)
found the typical American works an average of 46.7 hours per week.
The study did not report the shape of the distribution of hours worked or the standard
deviation. It did, however, indicate that 40% of the workers worked less than 40 hours a
week and that 18% worked more than 60 hours.
a. If we assume that the distribution of hours worked is normally distributed, and knowing
40% of the workers worked less than 40 hours, find the standard deviation of the
distribution.
b. If we assume that the distribution of hours worked is normally distributed and 18% of
the workers worked more than 60 hours, find the standard deviation of the
distribution.
c. Compare the standard deviations computed in parts a and b. Is the assumption that
the distribution of hours worked is approximately normal reasonable? Why?

65. Most four-year automobile leases allow up to 60,000 miles. If the lessee goes beyond
this amount, a penalty of 20 cents per mile is added to the lease cost. Suppose the dis-
tribution of miles driven on four-year leases follows the normal distribution. The mean is
52,000 miles and the standard deviation is 5,000 miles.
a. What percent of the leases will yield a penalty because of excess mileage?
b. If the automobile company wanted to change the terms of the lease so that 25% of
the leases went over the limit, where should the new upper limit be set?
c. One definition of a low-mileage car is one that is 4 years old and has been driven
less than 45,000 miles. What percent of the cars returned are considered
low-mileage?

66. The price of shares of Bank of Florida at the end of trading each day for the last year
followed the normal distribution. Assume there were 240 trading days in the year. The
mean price was $42.00 per share and the standard deviation was $2.25 per share.
a. What is the probability that the end-of-day trading price is over $45.00? Estimate the
number of days in a year when the trading price finished above $45.00.
b. What percent of the days was the price between $38.00 and $40.00?
c. What is the minimum share price for the top 15% of end-of-day trading prices?

67. The annual sales of romance novels follow the normal distribution. However, the mean
and the standard deviation are unknown. Forty percent of the time sales are more than
470,000, and 10% of the time sales are more than 500,000. What are the mean and the
standard deviation?

68. In establishing warranties on HDTVs, the manufacturer wants to set the limits so that few
will need repair at the manufacturer’s expense. On the other hand, the warranty period
must be long enough to make the purchase attractive to the buyer. For a new HDTV, the
mean number of months until repairs are needed is 36.84 with a standard deviation of
3.34 months. Where should the warranty limits be set so that only 10% of the HDTVs
need repairs at the manufacturer’s expense?

69. DeKorte Tele-Marketing Inc. is considering purchasing a machine that randomly selects
and automatically dials telephone numbers. DeKorte Tele-Marketing makes most of its
calls during the evening, so calls to business phones are wasted. The manufacturer of
the machine claims that its programming reduces the calling to business phones to 15%
of all calls. To test this claim, the director of purchasing at DeKorte programmed the
machine to select a sample of 150 phone numbers. What is the likelihood that more
than 30 of the phone numbers selected are those of businesses, assuming the manu-
facturer’s claim is correct?

70. A carbon monoxide detector in the Wheelock household activates once every 200 days
on average. Assume this activation follows the exponential distribution. What is the
probability that:
a. There will be an alarm within the next 60 days?
b. At least 400 days will pass before the next alarm?
c. It will be between 150 and 250 days until the next warning?
d. Find the median time until the next activation.

71. “Boot time” (the time between the appearance of the Bios screen to the first file that is
loaded in Windows) on Eric Mouser’s personal computer follows an exponential distribu-
tion with a mean of 27 seconds. What is the probability his “boot” will require:
a. Less than 15 seconds?
b. More than 60 seconds?
c. Between 30 and 45 seconds?
d. What is the point below which only 10% of the boots occur?

72. The time between visits to a U.S. emergency room for a member of the general popula-
tion follows an exponential distribution with a mean of 2.5 years. What proportion of the
population:
a. Will visit an emergency room within the next 6 months?
b. Will not visit the ER over the next 6 years?
c. Will visit an ER next year, but not this year?
d. Find the first and third quartiles of this distribution.

73. The times between failures on a personal computer follow an exponential distribution
with a mean of 300,000 hours. What is the probability of:
a. A failure in less than 100,000 hours?
b. No failure in the next 500,000 hours?
c. The next failure occurring between 200,000 and 350,000 hours?
d. What are the mean and standard deviation of the time between failures?

DATA ANALYTICS

(The data for these exercises are available at the text website: www.mhhe.com/lind17e.)

74. Refer to the North Valley Real Estate data, which report information on homes sold
during the last year.
a. The mean selling price (in $ thousands) of the homes was computed earlier to be $357.0,
with a standard deviation of $160.7. Use the normal distribution to estimate the
percentage of homes selling for more than $500,000. Compare this to the actual results. Is price
normally distributed? Try another test. If price is normally distributed, how many homes
should have a price greater than the mean? Compare this to the actual number of homes.
Construct a frequency distribution of price. What do you observe?

b. The mean days on the market is 30 with a standard deviation of 10 days. Use
the normal distribution to estimate the number of homes on the market more than
24 days. Compare this to the actual results. Try another test. If days on the market
is normally distributed, how many homes should be on the market more than the
mean number of days? Compare this to the actual number of homes. Does the normal
distribution yield a good approximation of the actual results? Create a frequency
distribution of days on the market. What do you observe?

75. Refer to the Baseball 2016 data, which report information on the 30 Major League Base-
ball teams for the 2016 season.
a. The mean attendance per team for the season was 2.439 million, with a standard
deviation of 0.618 million. Use the normal distribution to estimate the number of
teams with attendance of more than 3.5 million. Compare that estimate with the
actual number. Comment on the accuracy of your estimate.

b. The mean team salary was $121 million, with a standard deviation of $40.0 million.
Use the normal distribution to estimate the number of teams with a team salary of
more than $100 million. Compare that estimate with the actual number. Comment on
the accuracy of the estimate.

76. Refer to the Lincolnville School District bus data.
a. Refer to the maintenance cost variable. The mean maintenance cost for last year is
$4,552 with a standard deviation of $2,332. Estimate the number of buses with a
maintenance cost of more than $6,000. Compare that with the actual number. Create
a frequency distribution of maintenance cost. Is the distribution normally distributed?

b. Refer to the variable on the number of miles driven since the last maintenance. The
mean is 11,121 and the standard deviation is 617 miles. Estimate the number of
buses traveling more than 11,500 miles since the last maintenance. Compare that
number with the actual value. Create a frequency distribution of miles since the last
maintenance. Is the distribution normally distributed?

A REVIEW OF CHAPTERS 5–7
The chapters in this section consider methods of dealing with uncertainty. In Chapter 5, we describe the concept of prob-
ability. A probability is a value between 0 and 1 that expresses the likelihood a particular event will occur. We also looked
at methods to calculate probabilities using rules of addition and multiplication; presented principles of counting, including
permutations and combinations; and described situations for using Bayes’ theorem.

Chapter 6 describes discrete probability distributions. Discrete probability distributions list all possible outcomes of an exper-
iment and the probability associated with each outcome. We describe three discrete probability distributions: the binomial
distribution, the hypergeometric distribution, and the Poisson distribution. The requirements for the binomial distribution are
there are only two possible outcomes for each trial, there is a constant probability of success, there are a fixed number of
trials, and the trials are independent. The binomial distribution lists the probabilities for the number of successes in a fixed
number of trials. The hypergeometric distribution is similar to the binomial, but the probability of success is not constant, so
the trials are not independent. The Poisson distribution is characterized by a small probability of success in a large number of
trials. It has the following characteristics: the random variable is the number of times some event occurs in a fixed interval, the
probability of a success is proportional to the size of the interval, and the intervals are independent and do not overlap.
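
The binomial and Poisson formulas summarized above can be checked numerically. The following Python sketch is only an illustration: the function names and the values n = 10, π = .30, and μ = 2.0 are assumptions chosen for the example and do not come from the text.

```python
from math import comb, exp, factorial


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """P(X = x): x successes in n independent trials, each with success probability pi."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x): x occurrences when the mean number of occurrences per interval is mu."""
    return mu**x * exp(-mu) / factorial(x)


# Illustrative values only (assumed for this sketch, not taken from the text).
n, pi = 10, 0.30
binomial_probs = [binomial_pmf(x, n, pi) for x in range(n + 1)]

# A discrete probability distribution lists every outcome, so the
# probabilities should add up to 1 (apart from floating-point rounding).
print("Sum of binomial probabilities:", round(sum(binomial_probs), 6))
print("P(exactly 3 successes):", round(binomial_pmf(3, n, pi), 4))
print("Poisson P(0 events) with mean 2.0:", round(poisson_pmf(0, 2.0), 4))
```

Listing the whole distribution, rather than a single probability, makes it easy to confirm that the outcomes are exhaustive.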

Chapter 7 describes three continuous probability distributions: the uniform distribution, the normal distribution, and the
exponential distribution. The uniform probability distribution is rectangular in shape and is defined by minimum and maxi-
mum values. The mean and the median of a uniform probability distribution are equal, and it does not have a mode.

A normal probability distribution is the most widely used and widely reported distribution. Its major characteristics are that
it is bell-shaped and symmetrical, completely described by its mean and standard deviation, and asymptotic, that is, it falls
smoothly in each direction from its peak but never touches the horizontal axis. There is a family of normal probability
distributions—each with its own mean and standard deviation. There are an unlimited number of normal probability
distributions.

To find the probabilities for any normal probability distribution, we convert a normal distribution to a standard normal prob-
ability distribution by computing z values. A z value is the distance between x and the mean in units of the standard devi-
ation. The standard normal probability distribution has a mean of 0 and a standard deviation of 1. It is useful because the
probability for any event from a normal probability distribution can be computed using standard normal probability tables
(see Appendix B.3).
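
The conversion to z values can also be carried out with software instead of a table. The sketch below uses the NormalDist class from Python's standard library; the mean of 100, standard deviation of 15, and x value of 130 are assumed for illustration and are not from the text.

```python
from statistics import NormalDist


def z_value(x: float, mu: float, sigma: float) -> float:
    """Distance between x and the mean, measured in standard deviations."""
    return (x - mu) / sigma


# The standard normal distribution has a mean of 0 and a standard deviation of 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Illustrative values only (assumed for this sketch).
mu, sigma, x = 100.0, 15.0, 130.0

z = z_value(x, mu, sigma)
p_below_via_z = standard_normal.cdf(z)          # area to the left of z
p_below_direct = NormalDist(mu, sigma).cdf(x)   # same area without standardizing

print("z value:", z)
print("P(X < x) from the standard normal:", round(p_below_via_z, 4))
print("P(X < x) computed directly:", round(p_below_direct, 4))
```

Both routes give the same area, which is why one standard normal table is enough for every member of the normal family.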

The exponential probability distribution describes the time between events in a sequence. These events occur inde-
pendently at a constant rate per unit of time or length. The exponential probability distribution is positively skewed, with λ
as the “rate” parameter. The mean and standard deviation are equal and are the reciprocal of λ.
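
As a minimal sketch of the exponential distribution, the Python code below computes cumulative probabilities from the rate parameter λ. The mean of 20 time units and the function name are assumptions made for this example only.

```python
from math import exp


def exponential_cdf(x: float, rate: float) -> float:
    """P(time until the next event < x) = 1 - e^(-rate * x)."""
    return 1.0 - exp(-rate * x)


# Illustrative value only (assumed): a mean of 20 time units between events.
mean_time = 20.0
rate = 1.0 / mean_time          # lambda is the reciprocal of the mean
std_dev = 1.0 / rate            # the standard deviation equals the mean

p_within_10 = exponential_cdf(10.0, rate)
p_more_than_30 = 1.0 - exponential_cdf(30.0, rate)
p_between_10_and_30 = exponential_cdf(30.0, rate) - exponential_cdf(10.0, rate)

print("P(less than 10):", round(p_within_10, 4))
print("P(more than 30):", round(p_more_than_30, 4))
print("P(between 10 and 30):", round(p_between_10_and_30, 4))
```

The three probabilities sum to 1 because the three intervals cover every possible waiting time.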

PROBLEMS

1. Proactine, a new medicine for acne, is claimed by the manufacturer to be 80% effective.
It is applied to the affected area of a sample of 15 people. What is the probability that:
a. All 15 will show significant improvement?
b. Fewer than 9 of 15 will show significant improvement?
c. 12 or more people will show significant improvement?

2. Customers at the Bank of Commerce of Idaho Falls, Idaho, default at a rate of .005 on
small home-improvement loans. The bank has approved 400 small home-improvement
loans. Assuming the Poisson probability distribution applies to this problem:
a. What is the probability that no homeowners out of the 400 will default?
b. How many of the 400 are expected not to default?
c. What is the probability that three or more homeowners will default on their small
home-improvement loans?

3. A study of the attendance at the University of Alabama’s basketball games revealed that
attendance is normally distributed with a mean of 10,000 and a
standard deviation of 2,000.
a. What is the probability a particular game has an attendance of 13,500 or more?
b. What percent of the games have an attendance between 8,000 and 11,500?
c. Ten percent of the games have an attendance of how many or less?

4. Daniel-James Insurance Company will insure an offshore ExxonMobil oil production
platform against weather losses for one year. The president of Daniel-James estimates the
following losses for that platform (in millions of dollars) with the accompanying probabilities:

Amount of Loss    Probability
($ millions)      of Loss

0                 .98
40                .016
300               .004
a. What is the expected amount Daniel-James will have to pay to ExxonMobil in claims?
b. What is the likelihood that Daniel-James will actually lose less than the expected amount?
c. Given that Daniel-James suffers a loss, what is the likelihood that it is for $300 million?
d. Daniel-James has set the annual premium at $2.0 million. Does that seem like a fair
premium? Will it cover its risk?

5. The distribution of the number of school-age children per family in the Whitehall
Estates area of Grand Junction, Colorado, is:

Number of children 0 1 2 3 4
Percent of families 40 30 15 10 5

a. Determine the mean and standard deviation of the number of school-age children
per family in Whitehall Estates.

b. A new school is planned in Whitehall Estates. An estimate of the number of school-age
children is needed. There are 500 family units. How many children would you estimate?

c. Some additional information is needed about only the families having children. Con-
vert the preceding distribution to one for families with children. What is the mean
number of children among families that have children?

6. The following table shows a breakdown of the 114th U.S. Congress by party affiliation.
(There are two independent senators included in the count of Democratic senators.
There is one vacant House seat.)

Party

Democrats Republicans Total

House 188 246 434
Senate 46 54 100
Total 234 300 534

a. A member of Congress is selected at random. What is the probability of selecting a
Republican?

b. Given that the person selected is a member of the House of Representatives, what is
the probability he or she is a Republican?

c. What is the probability of selecting a member of the House of Representatives or a
Democrat?

CASES

A. Century National Bank
Refer to the Century National Bank data. Is it reasonable
that the distribution of checking account balances approx-
imates a normal probability distribution? Determine the
mean and the standard deviation for the sample of 60
customers. Compare the actual distribution with the theo-
retical distribution. Cite some specific examples and com-
ment on your findings.
Divide the account balances into three groups, of
about 20 each, with the smallest third of the balances in
the first group, the middle third in the second group, and
those with the largest balances in the third group. Next,
develop a table that shows the number in each of the cat-
egories of the account balances by branch. Does it ap-
pear that account balances are related to the branch? Cite
some examples and comment on your findings.

B. Elections Auditor
An item such as an increase in taxes, recall of elected offi-
cials, or an expansion of public services can be placed on
the ballot if a required number of valid signatures are col-
lected on the petition. Unfortunately, many people will
sign the petition even though they are not registered to
vote in that particular district, or they will sign the petition
more than once.
Sara Ferguson, the elections auditor in Venango
County, must certify the validity of these signatures after
the petition is officially presented. Not surprisingly, her
staff is overloaded, so she is considering using statistical
methods to validate the pages of 200 signatures, instead
of validating each individual signature. At a recent profes-
sional meeting, she found that, in some communities in
the state, election officials were checking only five signa-
tures on each page and rejecting the entire page if two or
more signatures were invalid. Some people are con-
cerned that five may not be enough to make a good deci-
sion. They suggest that you should check 10 signatures
and reject the page if 3 or more are invalid.
In order to investigate these methods, Sara asks her
staff to pull the results from the last election and sample
30 pages. It happens that the staff selected 14 pages
from the Avondale district, 9 pages from the Midway dis-
trict, and 7 pages from the Kingston district. Each page
had 200 signatures, and the data below show the number
of invalid signatures on each.
Use the data to evaluate Sara’s two proposals. Calculate
the probability of rejecting a page under each of the
approaches. Would you get about the same results by
examining every single signature? Offer a plan of your own,
and discuss how it might be better or worse than the two
plans proposed by Sara.

Avondale    Midway    Kingston

9           19        38
14          22        39
11          23        41
8           14        39
14          22        41
6           17        39
10          15        39
13          20
8           18
8
9
12
7
13

C. Geoff Applies Data Analytics
Geoff Brown is the manager for a small telemarketing
firm and is evaluating the sales rate of experienced
workers in order to set minimum standards for new hires.
During the past few weeks, he has recorded the number
of successful calls per hour for the staff. These data ap-
pear next along with some summary statistics he worked
out with a statistical software package. Geoff has been a
student at the local community college and has heard of
many different kinds of probability distributions (bino-
mial, normal, hypergeometric, Poisson, etc.). Could you
give Geoff some advice on which distribution to use to fit
these data as well as possible and how to decide when
a probationary employee should be accepted as having
reached full production status? This is important be-
cause it means a pay raise for the employee, and there
have been some probationary employees in the past
who have quit because of discouragement that they
would never meet the standard.

Successful sales calls per hour during the week of
August 14:

4 2 3 1 4 5 5 2 3 2 2 4 5 2 5 3 3 0
1 3 2 8 4 5 2 2 4 1 5 5 4 5 1 2 4

Descriptive statistics:

N      MEAN     MEDIAN          STANDARD DEVIATION
35     3.229    3.000           1.682

MIN    MAX      1ST QUARTILE    3RD QUARTILE
0.0    8.0      2.0             5.0

Analyze the distribution of sales calls. Which distribution
do you think Geoff should use for his analysis? Support
your recommendation with your analysis. What standard
should be used to determine if an employee has reached
“full production” status? Explain your recommendation.

D. CNP Bank Card
Before banks issue a credit card, they usually rate or
score the customer in terms of his or her projected
probability of being a profitable customer. A typical scoring
table appears below.

Age                          Under 25     25–29        30–34        35+
                             (12 pts.)    (5 pts.)     (0 pts.)     (18 pts.)

Time at same address         <1 yr.       1–2 yrs.     3–4 yrs.     5+ yrs.
                             (9 pts.)     (0 pts.)     (13 pts.)    (20 pts.)

Auto age                     None         0–1 yr.      2–4 yrs.     5+ yrs.
                             (18 pts.)    (12 pts.)    (13 pts.)    (3 pts.)

Monthly car payment          None         $1–$99       $100–$299    $300+
                             (15 pts.)    (6 pts.)     (4 pts.)     (0 pts.)

Housing cost                 $1–$199      $200–$399    Owns         Lives with relatives
                             (0 pts.)     (10 pts.)    (12 pts.)    (24 pts.)

Checking/savings accounts    Both         Checking     Savings      Neither
                                          only         only
                             (15 pts.)    (3 pts.)     (2 pts.)     (0 pts.)

The score is the sum of the points on the six items. For
example, Tracy Brown is under 25 years old (12 pts.), has
lived at the same address for 2 years (0 pts.), owns a
4-year-old car (13 pts.), with car payments of $75 (6 pts.),
housing cost of $450 (10 pts.), and a checking account
(3 pts.). She would score 44.
A second chart is then used to convert scores into
the probability of being a profitable customer. A sample
chart of this type appears below.

Score          30     40     50     60     70     80     90
Probability    .70    .78    .85    .90    .94    .95    .96

Tracy’s score of 44 would translate into a probability of
being profitable of approximately .81. In other words, 81%
of customers like Tracy will make money for the bank card
operations.
Here are the interview results for three potential
customers.

Name                         David Born    Edward Brendan    Ann McLaughlin
Age                          42            23                33
Time at same address         9             2                 5
Auto age                     2             3                 7
Monthly car payment          $140          $99               $175
Housing cost                 $450          $650              Owns clear
Checking/savings accounts    Both          Checking only     Neither

1. Score each of these customers and estimate their
probability of being profitable.
2. What is the probability that all three are profitable?
3. What is the probability that none of them are
profitable?
4. Find the entire probability distribution for the number
of profitable customers among this group of three.
5. Write a brief summary of your findings.

PRACTICE TEST

Part 1—Objective
1. Under what conditions will a probability be greater than 1 or 100%?
2. An ____________ is the observation of some activity or the act of taking some type of
measurement.
3. An ____________ is the collection of one or more outcomes to an experiment.
4. A ____________ probability is the likelihood that two or more events will happen at the same time.
5. In a (5a) ____________, the order in which the events are counted is important, but in a
(5b) ____________, it is not important.
6. In a discrete probability distribution, the sum of the possible outcomes is equal to ____________.
7. Which of the following is NOT a requirement of the binomial distribution? (constant
probability of success, three or more outcomes, the result of counts)
8. How many normal distributions are there? (1, 10, 30, 1,000, or infinite; pick one)
9. How many standard normal distributions are there? (1, 10, 30, 1,000, or infinite; pick one)
10. What is the probability of finding a z value between 0 and −0.76?
11. What is the probability of finding a z value greater than 1.67?
12. Two events are ____________ if the occurrence of one event does not affect the
occurrence of another event.
13. Two events are ____________ if by virtue of one event happening the other
cannot happen.
14. Which of the following is not true regarding the normal probability distribution?
(asymptotic, family of distributions, only two outcomes, 50% of the
observations greater than the mean)
15. Which of the following statements best describes the shape of a normal
probability distribution? (bell-shaped, uniform, V-shaped, no constant shape)

Part 2—Problems
1. Fred Friendly, CPA, has 20 tax returns to prepare before the April 15th deadline. It is late at night so he decides to do
two more before going home. In his stack of accounts, 12 are personal, 5 are businesses, and 3 are for charitable
organizations. If he selects the two returns at random, what is the probability:
a. Both are businesses?
b. At least one is a business?

2. The IRS reports that 15% of returns where the adjusted gross income is more than $1,000,000 will be subject to a
computer audit. For the year 2017, Fred Friendly, CPA, completed 16 returns where the adjusted gross income was
more than $1,000,000.
a. What is the probability exactly one of these returns will be audited?
b. What is the probability at least one will be audited?

3. Fred works in a tax office with five other CPAs. There are five parking spots beside the office. In how many different
ways can the cars belonging to the CPAs be arranged in the five spots? Assume they all drive to work.

4. Fred decided to study the number of exemptions claimed on personal tax returns he prepared in 2017. The data are
summarized in the following table.

Exemptions Percent

1 20
2 50
3 20
4 10

a. What is the mean number of exemptions per return?
b. What is the variance of the number of exemptions per return?

5. In a memo to all those involved in tax preparation, the IRS indicated that the mean amount of refund was $1,600 with
a standard deviation of $850. Assume the distribution of the amounts returned follows the normal distribution.
a. What percent of the refunds were between $1,600 and $2,000?
b. What percent of the refunds were between $900 and $2,000?
c. According to the above information, what percent of the refunds were less than $0; that is, the taxpayer owed the IRS?

6. For the year 2017, Fred Friendly completed a total of 80 returns. He developed the following table summarizing the
relationship between number of dependents and whether or not the client received a refund.

Dependents

Refund 1 2 3 or more Total

Yes 20 20 10 50
No 10 20 0 30
Total 30 40 10 80

a. What is the name given to this table?
b. What is the probability of selecting a client who received a refund?
c. What is the probability of selecting a client who received a refund or had one dependent?
d. Given that the client received a refund, what is the probability he or she had one dependent?
e. What is the probability of selecting a client who did not receive a refund and had one dependent?

7. The IRS offers taxpayers the choice of allowing the IRS to compute the amount of their tax refund. During the busy
filing season, the number of returns received at the Springfield Service Center that request this service follows a
Poisson distribution with a mean of three per day. What is the probability that on a particular day:
a. There are no requests?
b. Exactly three requests appear?
c. Five or more requests take place?
d. There are no requests on two consecutive days?

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO8-1 Explain why populations are sampled and describe four methods to sample a population.

LO8-2 Define sampling error.

LO8-3 Demonstrate the construction of a sampling distribution of the sample mean.

LO8-4 Recite the central limit theorem and define the mean and standard error of the sampling
distribution of the sample mean.

LO8-5 Apply the central limit theorem to calculate probabilities.

THE NIKE annual report says that the average American buys 6.5 pairs of sports shoes
per year. Suppose a sample of 81 customers is surveyed and the population standard
deviation of sports shoes purchased per year is 2.1. What is the standard error of the
mean in this experiment? (See Exercise 45 and LO8-4.)

8 Sampling Methods and the Central Limit Theorem

INTRODUCTION
Chapters 2 through 4 emphasize techniques to describe data. To illustrate these techniques,
we organize the profits for the sale of 180 vehicles by the four dealers included in the
Applewood Auto Group into a frequency distribution and compute measures of location and
dispersion. Such measures as the mean and the standard deviation describe the typical
profit and the spread in the profits. In these chapters, the emphasis is on describing the
distribution of the data. That is, we describe something that has already happened.

In Chapter 5, we begin to lay the foundation for statistical inference with the study
of probability. Recall that in statistical inference our goal is to determine something
about a population based only on the sample. The population is the entire group of in-
dividuals or objects under consideration, and the sample is a part or subset of that pop-
ulation. Chapter 6 extends the probability concepts by describing three discrete
probability distributions: the binomial, the hypergeometric, and the Poisson. Chapter 7
describes three continuous probability distributions: the uniform, normal, and exponen-
tial. Probability distributions encompass all possible outcomes of an experiment and the
probability associated with each outcome. We use probability distributions to evaluate
the likelihood something occurs in the future.

This chapter begins our study of sampling. Sampling is a process of selecting items from
a population so we can use this information to make judgments or inferences about the pop-
ulation. We begin this chapter by discussing methods of selecting a sample from a popula-
tion. Next, we construct a distribution of the sample mean to understand how the sample
means tend to cluster around the population mean. Finally, we show that for any population
the shape of this sampling distribution tends to follow the normal probability distribution.
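
The clustering of sample means around the population mean can be previewed with a small simulation. The Python sketch below is an illustration under assumed values: a positively skewed population of 10,000 values with a mean near 50, samples of size 30, and 2,000 repeated samples. None of these numbers come from the text, and the seed is arbitrary.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)  # arbitrary seed so the run is repeatable

# Assumed population: 10,000 positively skewed values with a mean near 50.
population = [rng.expovariate(1 / 50.0) for _ in range(10_000)]
pop_mean = mean(population)


def sample_mean(values: list, n: int, generator: random.Random) -> float:
    """Mean of one simple random sample of size n drawn without replacement."""
    return mean(generator.sample(values, n))


# Draw many samples of size 30 and record the mean of each one.
sample_means = [sample_mean(population, 30, rng) for _ in range(2_000)]

print("Population mean:", round(pop_mean, 2))
print("Mean of the sample means:", round(mean(sample_means), 2))
print("Population standard deviation:", round(stdev(population), 2))
print("Standard deviation of the sample means:", round(stdev(sample_means), 2))
```

In a run of this sketch the mean of the sample means should land close to the population mean, while the sample means vary much less than the individual population values do.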

SAMPLING METHODS
In Chapter 1, we said the purpose of inferential statistics is to find something about a pop-
ulation based on a sample. A sample is a portion or part of the population of interest. In
many cases, sampling is more feasible than studying the entire population. In this section,
we discuss the reasons for sampling, and then several methods for selecting a sample.

Reasons to Sample
When studying characteristics of a population, there are many practical reasons why we
prefer to select portions or samples of a population to observe and measure. Here are
some of the reasons for sampling:

1. To contact the whole population would be time-consuming. A candidate for a
national office may wish to determine her chances for election. A sample poll using
the regular staff and field interviews of a professional polling firm would take only
1 or 2 days. Using the same staff and interviewers and working 7 days a week, it
would take nearly 200 years to contact all the voting population! Even if a large
staff of interviewers could be assembled, the benefit of contacting all of the voters
would probably not be worth the time.

2. The cost of studying all the items in a population may be prohibitive. Public opin-
ion polls and consumer testing organizations, such as Harris Interactive Inc., CBS
News Polls, and Zogby Analytics, usually contact fewer than 2,000 of the nearly
60 million families in the United States. One consumer panel–type organization
charges $40,000 to mail samples and tabulate responses to test a product (such as
breakfast cereal, cat food, or perfume). The same product test using all 60 million
families would be too expensive to be worthwhile.

3. The physical impossibility of checking all items in the population. Some
populations are infinite. It would be impossible to check all the water in Lake Erie for
bacterial levels, so we select samples at various locations. The populations of
fish, birds, snakes, deer, and the like are large and are constantly moving, being
born, and dying. Instead of even attempting to count all the ducks in
Canada or all the fish in Lake Pontchartrain, we make estimates using
various techniques, such as counting all the ducks on a pond
selected at random, tracking fish catches, or netting fish at
predetermined places in the lake.

4. The destructive nature of some tests. If the wine tasters at the
Sutter Home Winery in California drank all the wine to evaluate the
vintage, they would consume the entire crop, and none would be
available for sale. In the area of industrial production, steel plates,
wires, and similar products must have a certain minimum tensile
strength. To ensure that the product meets the minimum standard,
the Quality Assurance Department selects a sample from the
current production. Each piece is stretched until it breaks, and
the breaking point (usually measured in pounds per square inch)
is recorded. Obviously, if all the wire or all the plates were tested for tensile strength,
none would be available for sale or use. For the same reason, only a few seeds are
tested for germination by Burpee Seeds Inc. prior to the planting season.

5. The sample results are adequate. Even if funds were available, it is doubtful the ad-
ditional accuracy of a 100% sample—that is, studying the entire population—is essen-
tial in most problems. For example, the federal government uses a sample of grocery
stores scattered throughout the United States to determine the monthly index of food
prices. The prices of bread, beans, milk, and other major food items are included in
the index. It is unlikely that the inclusion of all grocery stores in the United States
would significantly affect the index because the prices of milk, bread, and other major
foods usually do not vary by more than a few cents from one chain store to another.

Simple Random Sampling
The most widely used sampling method is simple random sampling.

SIMPLE RANDOM SAMPLE A sample selected so that each item or person in the
population has the same chance of being included.

To illustrate the selection process for a simple random sample, suppose the popula-
tion of interest is the 750 Major League Baseball players on the active rosters of the 30
teams at the end of the 2017 season. The president of the players’ union wishes to form
a committee of 10 players to study the issue of concussions. One way of ensuring that
every player in the population has the same chance of being chosen to serve on the Con-
cussion Committee is to write each name of the 750 players on a slip of paper and place
all the slips of paper in a box. After the slips of paper have been thoroughly mixed, the first
selection is made by drawing a slip of paper from the box identifying the first player. The
slip of paper is not returned to the box. This process is repeated nine more times to form
the committee. (Note that the probability of each selection does increase slightly because
the slip is not replaced. However, the differences are very small because the population is
750. The probability of each selection is about 0.0013, rounded to four decimal places.)
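
The slip-drawing procedure can be mimicked in a few lines of Python. This is a minimal sketch rather than part of the original example: the roster is stood in for by the hypothetical player numbers 1 through 750, and `random.sample` draws without replacement, just as a slip is not returned to the box. The helper names and the seed are assumptions made for illustration.

```python
import random

POPULATION_SIZE = 750   # players on the active rosters (from the text)
COMMITTEE_SIZE = 10     # committee members to select (from the text)


def draw_committee(seed=None):
    """Draw 10 distinct player numbers from 1..750 without replacement."""
    rng = random.Random(seed)  # the seed only makes the draw repeatable
    players = range(1, POPULATION_SIZE + 1)
    return rng.sample(players, COMMITTEE_SIZE)


def draw_probabilities():
    """Chance that a given remaining player is picked on each successive draw."""
    return [1 / (POPULATION_SIZE - k) for k in range(COMMITTEE_SIZE)]


committee = draw_committee(seed=2017)
probs = draw_probabilities()
print(sorted(committee))
print([round(p, 5) for p in probs])  # rises slowly from 1/750 to 1/741
```

The list of probabilities makes the parenthetical note concrete: each value is slightly larger than the one before it, yet all of them round to 0.0013.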

Of course, the process of writing all the players’ names on a slip of paper is very
time-consuming. A more convenient method of selecting a random sample is to use a table
of random numbers such as the one in Appendix B.4. In this case the union president
would prepare a list of all 750 players and number each of the players from 1 to 750 with
a computer application. Using a table of random numbers, we would randomly pick a
starting place in the table, and then select 10 three-digit numbers between 001 and 750.
A computer can also generate random numbers. These numbers would correspond with
the 10 players in the list that will be asked to participate on the committee. As the name
simple random sampling implies, the probability of selecting any number between 001 and
750 is the same. Thus, the probability of selecting the player assigned the number 131 is

SAMPLING METHODS AND THE CENTRAL LIMIT THEOREM 253

the same as the probability of selecting player 722 or player 382. Using random numbers
to select players for the committee removes any bias from the selection process.

The following example shows how to select random numbers using a portion of a
random number table illustrated below. First, we choose a starting point in the table. One
way of selecting the starting point is to close your eyes and point at a number in the table.
Any starting point will do. Another way is to randomly pick a column and row. Suppose the
time is 3:04. Using the hour, three o’clock, pick the third column and then, using the min-
utes, four, move down to the fourth row of numbers. The number is 03759. Because there
are only 750 players, we will use the first three digits of a five-digit random number. Thus,
037 is the number of the first player to be a member of the sample. To continue selecting
players, we could move in any direction. Suppose we move right. The first three digits of
the number to the right of 03759 are 447. Player number 447 is the second player se-
lected to be on the committee. The next three-digit number to the right is 961. You skip
961 as well as the next number 784 because there are only 750 players. The third player
selected is number 189. We continue this process until we have 10 players.

STATISTICS IN ACTION

To ensure that an unbiased, representative sample is selected from a population,
lists of random numbers are needed. In 1927, L. Tippett published the first book of
random numbers. In 1938, R. A. Fisher and F. Yates published 15,000 random digits
generated using two decks of cards. In 1955, RAND Corporation published a million
random digits, generated by the random frequency pulses of an electronic roulette
wheel. Since then, computer programs have been developed for generating digits that
are “almost” random and hence are called pseudo-random. The question of whether a
computer program can be used to generate numbers that are truly random remains a
debatable issue.

50525  57454  28455  68226  34656  38884  39018
72507  53380  53827  42486  54465  71819  91199
34986  74297  00144  38676  89967  98869  39744
68851  27305  03759  44723  96108  78489  18910
06738  62879  03910  17350  49169  03850  18910
11448  10734  05837  24397  10420  16712  94496

In the fourth row: 03759 is the starting point, 44723 gives the second player,
and 18910 gives the third player.
Statistical packages such as Minitab and spreadsheet packages such as Excel have
software that will select a simple random sample. The following example/solution uses
Excel to select a random sample from a list of the data.

EXAMPLE

Jane and Joe Miley operate the Foxtrot Inn, a bed and breakfast in Tryon, North
Carolina. There are eight rooms available for rent at this B&B. For each day of June
2017, the number of rooms rented is listed. Use Excel to select a sample of five
nights during the month of June.

June  Rentals     June  Rentals     June  Rentals

 1      0          11     3          21     3
 2      2          12     4          22     2
 3      3          13     4          23     3
 4      2          14     4          24     6
 5      3          15     7          25     0
 6      4          16     0          26     4
 7      2          17     5          27     1
 8      3          18     3          28     1
 9      4          19     6          29     3
10      7          20     2          30     3

SOLUTION

Excel will select the random sample and report the results. On the first sampled
date, four of the eight rooms were rented. On the second sampled date in June,
seven rooms were rented. The information is reported in column D of the Excel
spreadsheet. The Excel steps are listed in the Software Commands in Appendix C.
The Excel system performs the sampling with replacement. This means it is possible
for the same day to appear more than once in a sample.

254 CHAPTER 8

SELF-REVIEW 8–1

The following roster lists the students enrolled in an introductory course in business
statistics. Three students will be randomly selected and asked questions about course
content and method of instruction.
(a) The numbers 00 through 45 are handwritten on slips of paper and placed in a bowl.
The three numbers selected are 31, 7, and 25. Which students are in the sample?
(b) Now use the table of random digits, Appendix B.4, to select your own sample.
(c) What would you do if you encountered the number 59 in the table of random digits?

STAT 264 BUSINESS STATISTICS
9:00 AM - 9:50 AM MW; 118 CARLSON HALL; PROFESSOR LIND

RANDOM NUMBER   NAME   CLASS RANK

00 ANDERSON, RAYMOND SO
01 ANGER, CHERYL RENEE SO
02 BALL, CLAIRE JEANETTE FR
03 BERRY, CHRISTOPHER G FR
04 BOBAK, JAMES PATRICK SO
05 BRIGHT, M. STARR JR
06 CHONTOS, PAUL JOSEPH SO
07 DETLEY, BRIAN HANS JR
08 DUDAS, VIOLA SO
09 DULBS, RICHARD ZALFA JR
10 EDINGER, SUSAN KEE SR
11 FINK, FRANK JAMES SR
12 FRANCIS, JAMES P JR
13 GAGHEN, PAMELA LYNN JR
14 GOULD, ROBYN KAY SO
15 GROSENBACHER, SCOTT ALAN SO
16 HEETFIELD, DIANE MARIE SO
17 KABAT, JAMES DAVID JR
18 KEMP, LISA ADRIANE FR
19 KILLION, MICHELLE A SO
20 KOPERSKI, MARY ELLEN SO
21 KOPP, BRIDGETTE ANN SO
22 LEHMANN, KRISTINA MARIE JR
23 MEDLEY, CHERYL ANN SO
24 MITCHELL, GREG R FR
25 MOLTER, KRISTI MARIE SO
26 MULCAHY, STEPHEN ROBERT SO
27 NICHOLAS, ROBERT CHARLES JR
28 NICKENS, VIRGINIA SO
29 PENNYWITT, SEAN PATRICK SO
30 POTEAU, KRIS E JR
31 PRICE, MARY LYNETTE SO
32 RISTAS, JAMES SR
33 SAGER, ANNE MARIE SO
34 SMILLIE, HEATHER MICHELLE SO
35 SNYDER, LEISHA KAY SR
36 STAHL, MARIA TASHERY SO
37 ST. JOHN, AMY J SO
38 STURDEVANT, RICHARD K SO
39 SWETYE, LYNN MICHELE SO
40 WALASINSKI, MICHAEL SO
41 WALKER, DIANE ELAINE SO
42 WARNOCK, JENNIFER MARY SO
43 WILLIAMS, WENDY A SO
44 YAP, HOCK BAN SO
45 YODER, ARLAN JAY JR


Systematic Random Sampling
The simple random sampling procedure is awkward in some research situations. For example,
Stood’s Grocery Market needs to sample its customers to study the length of time
customers spend in the store. Simple random sampling is not an effective method here.
In practice, we do not have a list of customers, so assigning random numbers to customers
is impossible. Instead, we can use systematic random sampling to select a representative
sample. Using this method for Stood’s Grocery Market, we decide to select 100 customers
over 4 days, Monday through Thursday. We will select 25 customers a day and begin the
sampling at different times each day: 8 a.m., 11 a.m., 4 p.m., and 7 p.m. We write the 4 times
and 4 days on slips of paper and put them in two hats—one hat for the days and the other
hat for the times. We select one slip from each hat. This ensures that the time of day is ran-
domly assigned for each day. Suppose we selected 4 p.m. for the starting time on Monday.
Next we select a random number between 1 and 10; it is 6. Our selection process begins on
Monday at 4 p.m. by selecting the sixth customer to enter the store. Then, we select every
10th (16th, 26th, 36th) customer until we reach the goal of 25 customers. For each of these
sampled customers, we measure the length of time the customer spends in the store.
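
The selection rule for one day can be sketched as follows. This is a minimal illustration: the function names are hypothetical, and customers are identified only by their order of arrival after the starting time.

```python
import random


def random_start(k, seed=None):
    """Random starting position between 1 and k, inclusive."""
    return random.Random(seed).randint(1, k)


def systematic_positions(start, k, sample_size):
    """Arrival positions selected: start, start + k, start + 2k, ..."""
    return [start + i * k for i in range(sample_size)]


# Values from the text: random start of 6, every 10th customer, 25 customers a day.
positions = systematic_positions(start=6, k=10, sample_size=25)
print(positions[:4], positions[-1])  # [6, 16, 26, 36] 246
```

Reaching 25 customers with this rule requires observing 246 arrivals after 4 p.m., which is worth checking against expected store traffic before fixing k.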

SYSTEMATIC RANDOM SAMPLE A random starting point is selected, and then
every kth member of the population is selected.

STRATIFIED RANDOM SAMPLE A population is divided into subgroups, called
strata, and a sample is randomly selected from each stratum.

STATISTICS IN ACTION

Random and unbiased sampling methods are extremely important to make valid
statistical inferences. In 1936, the Literary Digest conducted a straw vote to
predict the outcome of the presidential race between Franklin Roosevelt and Alfred
Landon. Ten million ballots in the form of returnable postcards were sent to
addresses taken from Literary Digest subscribers, telephone directories and
automobile registrations. In 1936 not many people could afford a telephone or an
automobile. Thus, the population that was sampled did not represent the population
of voters. A second problem was with the non-responses. More than 10 million people
were sent surveys, and more than 2.3 million responded. However, no attempt was made
to see whether those responding represented a cross-section of all the voters. On
Election Day, Roosevelt won with 61% of the vote. Landon had 39%. In the mid-1930s
people who had telephones and drove automobiles clearly did not represent American
voters!
Simple random sampling is used in the selection of the days, the times, and the
starting point. But the systematic procedure is used to select the actual customer.

Before using systematic random sampling, we should carefully observe the physi-
cal order of the population. When the physical order is related to the population charac-
teristic, then systematic random sampling should not be used because the sample
could be biased. For example, if we wanted to audit the invoices in a file drawer that
were ordered in increasing dollar amounts, systematic random sampling would not
guarantee an unbiased random sample. Other sampling methods should be used.

Stratified Random Sampling
When a population can be clearly divided into groups based on some characteristic, we
may use stratified random sampling. It guarantees each group is represented in the
sample. The groups are called strata. For example, college students can be grouped as
full time or part time; as male or female; or as freshman, sophomore, junior, or senior.
Usually the strata are formed based on members’ shared attributes or characteristics. A
random sample from each stratum is taken in a number proportional to the stratum’s
size when compared to the population. Once the strata are defined, we apply simple
random sampling within each group or stratum to collect the sample.

For instance, we might study the advertising expenditures for the 352 largest
companies in the United States. The objective of the study is to determine whether
firms with high returns on equity (a measure of profitability) spend more on advertising
than firms with low returns on equity. To make sure the sample is a fair representation
of the 352 companies, the companies are grouped on percent return on equity. Table 8–1
shows the strata and the relative frequencies. If simple random sampling is used,
observe that firms in the 3rd and 4th strata have a high chance of selection (probabil-
ity of 0.87) while firms in the other strata have a small chance of selection (probability
of 0.13). We might not select any firms in stratum 1 or 5 simply by chance. However,
stratified random sampling will guarantee that at least one firm in each of strata 1 and


5 is represented in the sample. Let’s say that 50 firms are selected for intensive study.
Then based on probability, 1 firm, or (0.02)(50), should be randomly selected from
stratum 1. We would randomly select 5, or (0.10)(50), firms from stratum 2. In this
case, the number of firms sampled from each stratum is proportional to the stratum’s
relative frequency in the population. Stratified sampling has the advantage, in some
cases, of more accurately reflecting the characteristics of the population than does
simple random or systematic random sampling.
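
The proportional allocation can be reproduced with a short Python sketch. It is an illustration under assumptions, not the authors' procedure: the firm identifiers are hypothetical placeholders, and rounding each stratum's share happens to give a total of 50 here but will not always do so for other data.

```python
import random

# Number of firms in each stratum, from Table 8-1.
STRATA = {
    "30% and over": 8,
    "20 up to 30%": 35,
    "10 up to 20%": 189,
    "0 up to 10%": 115,
    "Deficit": 5,
}


def proportional_allocation(strata, sample_size):
    """Sample size per stratum, proportional to the stratum's share of firms."""
    total = sum(strata.values())
    return {name: round(sample_size * count / total)
            for name, count in strata.items()}


def stratified_sample(strata, sample_size, seed=None):
    """Simple random sample within each stratum, sized by proportional allocation."""
    rng = random.Random(seed)
    allocation = proportional_allocation(strata, sample_size)
    sample = {}
    for name, count in strata.items():
        # Hypothetical firm labels; a real study would list the actual firms.
        firms = [f"{name} / firm {i}" for i in range(1, count + 1)]
        sample[name] = rng.sample(firms, allocation[name])
    return sample


allocation = proportional_allocation(STRATA, 50)
sample = stratified_sample(STRATA, 50, seed=8)
print(allocation)
```

The allocation matches the last column of Table 8–1: 1, 5, 27, 16, and 1 firms.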

Cluster Sampling
Another common type of sampling is cluster sampling. It is often employed to reduce
the cost of sampling a population scattered over a large geographic area.

TABLE 8–1 Number Selected for a Proportional Stratified Random Sample

Stratum   Profitability (return on equity)   Number of Firms   Relative Frequency   Number Sampled

1         30% and over                              8                0.02                 1*
2         20 up to 30%                             35                0.10                 5*
3         10 up to 20%                            189                0.54                27
4         0 up to 10%                             115                0.33                16
5         Deficit                                   5                0.01                 1

Total                                             352                1.00                50

*0.02 of 50 = 1, 0.10 of 50 = 5, etc.

CLUSTER SAMPLING A population is divided into clusters using naturally occurring
geographic or other boundaries. Then, clusters are randomly selected and a sample
is collected by randomly selecting from each cluster.

Suppose you want to determine the views of residents in the greater Chicago, Illinois,
metropolitan area about state and federal environmental protection policies. Selecting a
random sample of residents in this region and personally contacting each one would be
time-consuming and very expensive. Instead, you could employ cluster sampling by subdi-
viding the region into small units, perhaps by counties. These are often called primary units.

There are 12 counties in the greater Chicago metropolitan area. Suppose you ran-
domly select 3 counties. The 3 chosen are La Porte, Cook, and Kenosha (see Chart 8–1
below). Next, you select a random sample of the residents in each of these counties and
interview them. This is also referred to as sampling through an intermediate unit. In this
case, the intermediate unit is the county. (Note that this is a combination of cluster sam-
pling and simple random sampling.)
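
The two stages can be sketched in Python. This is only an illustration: the county names are read from Chart 8–1, the state tags on the two counties named Lake are an assumption added to tell them apart, and the resident identifiers are placeholders because the text gives no sampling frame.

```python
import random

# The 12 counties shown in Chart 8-1 (state tags on "Lake" are an assumption).
COUNTIES = [
    "La Porte", "Porter", "Lake (IN)", "Cook", "Will", "Grundy",
    "Kendall", "Kane", "McHenry", "Lake (IL)", "DuPage", "Kenosha",
]


def cluster_sample(residents_by_county, n_counties, n_per_county, seed=None):
    """Stage 1: randomly pick counties. Stage 2: randomly pick residents in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(residents_by_county), n_counties)
    return {county: rng.sample(residents_by_county[county], n_per_county)
            for county in chosen}


# Hypothetical frame: 1,000 placeholder resident IDs per county.
frame = {c: [f"{c}-{i:04d}" for i in range(1000)] for c in COUNTIES}
result = cluster_sample(frame, n_counties=3, n_per_county=5, seed=1)
for county, residents in result.items():
    print(county, residents)
```

Interviewers then travel to only three counties instead of twelve, which is the cost saving the method is designed to deliver.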

[Chart 8–1: map showing Lake Michigan and the counties La Porte, Porter, Lake, Cook,
Will, Grundy, Kendall, Kane, McHenry, Lake, DuPage, and Kenosha.]

CHART 8–1 The Counties of the Greater Chicago, Illinois, Metropolitan Area


The discussion of sampling methods in the preceding sections did not include all
the sampling methods available to a researcher. Should you become involved in a major
research project in marketing, finance, accounting, or other areas, you would need to
consult books devoted solely to sample theory and sample design.

SELF-REVIEW 8–2

Refer to Self-Review 8–1 and the class roster on page 254. Suppose a systematic random
sample will select every ninth student enrolled in the class. Initially, the fourth student on
the list was selected at random. That student is numbered 03. Remembering that the random
numbers start with 00, which students will be chosen to be members of the sample?

EXERCISES

1. The following is a list of 24 Marco’s Pizza stores in Lucas County. The stores are
identified by numbering them 00 through 23. Also noted is whether the store is
corporate-owned (C) or manager-owned (M). A sample of four locations is to be selected
and inspected for customer convenience, safety, cleanliness, and other features.

ID No.   Address                  Type

00       2607 Starr Av            C
01       309 W Alexis Rd          C
02       2652 W Central Av        C
03       630 Dixie Hwy            M
04       3510 Dorr St             C
05       5055 Glendale Av         C
06       3382 Lagrange St         M
07       2525 W Laskey Rd         C
08       303 Louisiana Av         C
09       149 Main St              C
10       835 S McCord Rd          M
11       3501 Monroe St           M
12       2040 Ottawa River Rd     C
13       2116 N Reynolds Rd       C
14       3678 Rugby Dr            C
15       1419 South Av            C
16       1234 W Sylvania Av       C
17       4624 Woodville Rd        M
18       5155 S Main              M
19       106 E Airport Hwy        C
20       6725 W Central           M
21       4252 Monroe              C
22       2036 Woodville Rd        C
23       1316 Michigan Av         M

a. The random numbers selected are 08, 18, 11, 54, 02, 41, and 54. Which stores
are selected?
b. Use the table of random numbers to select your own sample of locations.
c. Using systematic random sampling, every seventh location is selected starting
with the third store in the list. Which locations will be included in the sample?
d. Using stratified random sampling, select three locations. Two should be
corporate-owned and one should be manager-owned.

2. The following is a list of 29 hospitals in the Cincinnati, Ohio, and Northern Kentucky
region. Each hospital is assigned a number, 00 through 28. The hospitals are classified
by type, either a general medical/surgical hospital (M/S) or a specialty hospital
(S). We are interested in estimating the average number of full- and part-time nurses
employed in the area hospitals.

ID
Number Name Address Type

00 Bethesda North 10500 Montgomery M/S
Cincinnati, Ohio 45242
01 Ft. Hamilton–Hughes 630 Eaton Avenue M/S
Hamilton, Ohio 45013
02 Jewish Hospital– 4700 East Galbraith Rd. M/S
Kenwood Cincinnati, Ohio 45236
03 Mercy Hospital– 3000 Mack Road M/S
Fairfield Fairfield, Ohio 45014

ID
Number Name Address Type

04 Mercy Hospital– 100 Riverfront Plaza M/S
Hamilton Hamilton, Ohio 45011
05 Middletown Regional 105 McKnight Drive M/S
Middletown, Ohio 45044
06 Clermont Mercy 3000 Hospital Drive M/S
Hospital Batavia, Ohio 45103
07 Mercy Hospital– 7500 State Road M/S
Anderson Cincinnati, Ohio 45255

a. A sample of five hospitals is to be randomly selected. The random numbers are
09, 16, 00, 49, 54, 12, and 04. Which hospitals are included in the sample?
b. Use a table of random numbers to develop your own sample of five hospitals.
c. Using systematic random sampling, every fifth location is selected starting with
the second hospital in the list. Which hospitals will be included in the sample?
d. Using stratified random sampling, select five hospitals. Four should be medical
and surgical hospitals and one should be a specialty hospital. Select an
appropriate sample.

3. Listed below are the 35 members of the Metro Toledo Automobile Dealers Association.
We would like to estimate the mean revenue from dealer service departments. The
members are identified by numbering them 00 through 34.

a. We want to select a random sample of five dealers. The random numbers are 05,
20, 59, 21, 31, 28, 49, 38, 66, 08, 29, and 02. Which dealers would be included
in the sample?

ID
Number Name Address Type

08 Bethesda Oak 619 Oak Street M/S
Hospital Cincinnati, Ohio 45206
09 Children’s Hospital 3333 Burnet Avenue M/S
Medical Center Cincinnati, Ohio 45229
10 Christ Hospital 2139 Auburn Avenue M/S
Cincinnati, Ohio 45219
11 Deaconess Hospital 311 Straight Street M/S
Cincinnati, Ohio 45219
12 Good Samaritan 375 Dixmyth Avenue M/S
Hospital Cincinnati, Ohio 45220
13 Jewish Hospital 3200 Burnet Avenue M/S
Cincinnati, Ohio 45229
14 University Hospital 234 Goodman Street M/S
Cincinnati, Ohio 45267
15 Providence Hospital 2446 Kipling Avenue M/S
Cincinnati, Ohio 45239
16 St. Francis– 3131 Queen City Avenue M/S
St. George Hospital Cincinnati, Ohio 45238
17 St. Elizabeth Medical 401 E. 20th Street M/S
Center, North Unit Covington, Kentucky 41014
18 St. Elizabeth Medical One Medical Village M/S
Center, South Unit Edgewood, Kentucky 41017

ID
Number Name Address Type

19 St. Luke’s Hospital 7380 Turfway Drive M/S
West Florence, Kentucky 41075
20 St. Luke’s Hospital 85 North Grand Avenue M/S
East Ft. Thomas, Kentucky 41042
21 Care Unit Hospital 3156 Glenmore Avenue S
Cincinnati, Ohio 45211
22 Emerson Behavioral 2446 Kipling Avenue S
Science Cincinnati, Ohio 45239
23 Pauline Warfield 1101 Summit Road S
Lewis Center for Cincinnati, Ohio 45237
Psychiatric Treat.
24 Children’s Psychiatric 502 Farrell Drive S
No. Kentucky Covington, Kentucky
41011
25 Drake Center Rehab— 151 W. Galbraith Road S
Long Term Cincinnati, Ohio 45216
26 No. Kentucky Rehab 201 Medical Village S
Hospital—Short Term Edgewood, Kentucky
27 Shriners Burns 3229 Burnet Avenue S
Institute Cincinnati, Ohio 45229
28 VA Medical Center 3200 Vine S
Cincinnati, Ohio 45220

ID Number Dealer

00 Dave White Acura
01 Autofair Nissan
02 Autofair Toyota-Suzuki
03 George Ball’s Buick GMC

Truck
04 York Automotive Group
05 Bob Schmidt Chevrolet
06 Bowling Green Lincoln Mercury

Jeep Eagle
07 Brondes Ford
08 Brown Honda
09 Brown Mazda
10 Charlie’s Dodge

ID Number Dealer

11 Thayer Chevrolet/Toyota
12 Spurgeon Chevrolet Motor

Sales, Inc.
13 Dunn Chevrolet
14 Don Scott Chevrolet
15 Dave White Chevrolet Co.
16 Dick Wilson Infinity
17 Doyle Buick
18 Franklin Park Lincoln Mercury
19 Genoa Motors
20 Great Lakes Ford Nissan
21 Grogan Towne Chrysler
22 Hatfield Motor Sales

ID Number Dealer

23 Kistler Ford, Inc.
24 Lexus of Toledo
25 Mathews Ford Oregon, Inc.
26 Northtown Chevrolet
27 Quality Ford Sales, Inc.
28 Rouen Chrysler Jeep Eagle
29 Saturn of Toledo
30 Ed Schmidt Jeep Eagle
31 Southside Lincoln Mercury
32 Valiton Chrysler
33 Vin Divers
34 Whitman Ford


b. Use the table of random numbers to select your own sample of five dealers.
c. Using systematic random sampling, every seventh dealer is selected starting
with the fourth dealer in the list. Which dealers are included in the sample?

4. Listed next are the 27 Nationwide Insurance agents in the El Paso, Texas, metropolitan
area. The agents are numbered 00 through 26. We would like to estimate the
mean number of years employed with Nationwide.

ID Number   Agent

00   Bly Scott 3332 W Laskey Rd
01   Coyle Mike 5432 W Central Av
02   Denker Brett 7445 Airport Hwy
03   Denker Rollie 7445 Airport Hwy
04   Farley Ron 1837 W Alexis Rd
05   George Mark 7247 W Central Av
06   Gibellato Carlo 6616 Monroe St
07   Glemser Cathy 5602 Woodville Rd
08   Green Mike 4149 Holland Sylvania Rd
09   Harris Ev 2026 Albon Rd
10   Heini Bernie 7110 W Centra
11   Hinckley Dave 14 N Holland Sylvania Rd
12   Joehlin Bob 3358 Navarre Av
13   Keisser David 3030 W Sylvania Av
14   Keisser Keith 5902 Sylvania Av
15   Lawrence Grant 342 W Dussel Dr
16   Miller Ken 2427 Woodville Rd
17   O’Donnell Jim 7247 W Central Av
18   Priest Harvey 5113 N Summit St
19   Riker Craig 2621 N Reynolds Rd
20   Schwab Dave 572 W Dussel Dr
21   Seibert John H 201 S Main
22   Smithers Bob 229 Superior St
23   Smithers Jerry 229 Superior St
24   Wright Steve 105 S Third St
25   Wood Tom 112 Louisiana Av
26   Yoder Scott 6 Willoughby Av

a. We want to select a random sample of four agents. The random numbers are
02, 59, 51, 25, 14, 29, 77, 69, and 18. Which agents would be included in the
sample?
b. Use the table of random numbers to select your own sample of four agents.
c. Using systematic random sampling, every fifth agent is selected starting with
the third agent in the list. Which agents are included in the sample?

SAMPLING “ERROR”

LO8-2
Define sampling error.

In the previous section, we discussed sampling methods that are used to select a sample
that is an unbiased representation of the population. In each method, the selection
of every possible sample of a specified size from a population has a known chance or
probability. This is another way to describe an unbiased sampling method.

Samples are used to estimate population characteristics. For example, the mean of
a sample is used to estimate the population mean. However, since the sample is a part
or portion of the population, it is unlikely that the sample mean would be exactly equal
to the population mean. Similarly, it is unlikely that the sample standard deviation would
be exactly equal to the population standard deviation. We can therefore expect a difference
between a sample statistic and its corresponding population parameter. This
difference is called sampling error.

SAMPLING ERROR The difference between a sample statistic and its corresponding
population parameter.

The following example/solution clarifies the idea of sampling error.

EXAMPLE

Refer to the example/solution on page 253, where we studied the number of rooms
rented at the Foxtrot Inn bed and breakfast in Tryon, North Carolina. The population
is the number of rooms rented each of the 30 days in June 2017. Find the mean of
the population. Select three random samples of 5 days. Calculate the mean rooms
rented for each sample and compare it to the population mean. What is the sampling
error in each case?

SOLUTION

During the month, there were a total of 94 rentals. So the mean number of units
rented per night is 3.13. This is the population mean. Hence we designate this
value with the Greek letter μ.

\[ \mu = \frac{\Sigma x}{N} = \frac{0 + 2 + 3 + \cdots + 3}{30} = \frac{94}{30} = 3.13 \]

The first random sample of five nights resulted in the following number of rooms
rented: 4, 7, 4, 3, and 1. The mean of this sample is 3.80 rooms, which we designate
as \(\bar{x}_1\). The bar over the x reminds us that it is a sample mean and the
subscript 1 indicates it is the mean of the first sample.

\[ \bar{x}_1 = \frac{\Sigma x}{n} = \frac{4 + 7 + 4 + 3 + 1}{5} = \frac{19}{5} = 3.80 \]

The sampling error for the first sample is the difference between the population
mean (3.13) and the first sample mean (3.80). Hence, the sampling error is
\((\bar{x}_1 - \mu) = 3.80 - 3.13 = 0.67\). The second random sample of 5 days from the
population of all 30 days in June revealed the following number of rooms rented: 3,
3, 2, 3, and 6. The mean of these five values is 3.40, found by

\[ \bar{x}_2 = \frac{\Sigma x}{n} = \frac{3 + 3 + 2 + 3 + 6}{5} = \frac{17}{5} = 3.40 \]

The sampling error is \((\bar{x}_2 - \mu) = 3.40 - 3.13 = 0.27\). In the third random sample, the
mean was 1.80 and the sampling error was −1.33.

Each of these differences, 0.67, 0.27, and −1.33, is the sampling error made
in estimating the population mean. Sometimes these errors are positive values,
indicating that the sample mean overestimated the population mean; other times
they are negative values, indicating the sample mean was less than the popula-
tion mean.

In this case, where we have a population of 30 values and samples of 5 values,
there is a very large number of possible samples—142,506 to be exact! To find this
value, use the combination formula (5–10) on page 164. Each of the 142,506 dif-
ferent samples has the same chance of being selected. Each sample may have a
different sample mean and therefore a different sampling error. The value of the
sampling error is based on the particular one of the 142,506 different possible
samples selected. Therefore, the sampling errors are random and occur by chance.
If you summed the sampling errors for all 142,506 samples, the result would equal
zero. This is true because the sample mean is an unbiased estimator of the popula-
tion mean.
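
The figures in this solution can be checked with a short Python sketch. The variable and function names are assumptions made for illustration. The final check enumerates every possible sample of 5 days and works in whole numbers (each error multiplied by 150) so that rounding cannot disturb the total.

```python
from itertools import combinations
from math import comb

# Rooms rented on each of the 30 days of June 2017 (from the Foxtrot Inn example).
RENTALS = [0, 2, 3, 2, 3, 4, 2, 3, 4, 7,
           3, 4, 4, 4, 7, 0, 5, 3, 6, 2,
           3, 2, 3, 6, 0, 4, 1, 1, 3, 3]

mu = sum(RENTALS) / len(RENTALS)  # population mean, 94 / 30


def sampling_error(sample):
    """Sample mean minus the population mean."""
    return sum(sample) / len(sample) - mu


error_1 = sampling_error([4, 7, 4, 3, 1])
error_2 = sampling_error([3, 3, 2, 3, 6])

# Number of different samples of 5 days out of 30.
n_samples = comb(30, 5)

# Sum of (sample mean - mu) over all samples, scaled by 5 * 30 to stay in integers:
# 150 * (sum(s)/5 - 94/30) = 30 * sum(s) - 5 * 94.
scaled_total_error = sum(30 * sum(s) - 5 * sum(RENTALS)
                         for s in combinations(RENTALS, 5))

print(round(mu, 2), round(error_1, 2), round(error_2, 2))
print(n_samples, scaled_total_error)
```

The enumeration confirms both claims in the text: there are 142,506 possible samples, and their sampling errors sum to zero.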


SAMPLING DISTRIBUTION OF THE SAMPLE MEAN
In the previous section, we defined sampling error and presented the results when we
compared a sample statistic, such as the sample mean, to the population mean. To put
it another way, when we use the sample mean to estimate the population mean, how
can we determine how accurate the estimate is? How does:

• A quality-assurance supervisor decide if a machine is filling 20-ounce bottles with
20 ounces of cola based only on a sample of 10 filled bottles?

• FiveThirtyEight.com or Gallup make accurate statements about the demographics
of voters in a presidential race based on relatively small samples from a voting pop-
ulation of nearly 90 million?

To answer these questions, we first develop a sampling distribution of the sample mean.

The sample means in the previous example/solution varied from one sample to the
next. The mean of the first sample of 5 days was 3.80 rooms, and the second sample
mean was 3.40 rooms. The population mean was 3.13 rooms. If we organized the
means of all possible samples of 5 days into a probability distribution, the result is called
the sampling distribution of the sample mean.

LO8-3
Demonstrate the
construction of a
sampling distribution of
the sample mean.

SAMPLING DISTRIBUTION OF THE SAMPLE MEAN A probability distribution of
all possible sample means of a given sample size.

The following example/solution illustrates the construction of a sampling distribu-
tion of the sample mean. We have intentionally used a small population to highlight the
relationship between the population mean and the various sample means.

EXAMPLE

Tartus Industries has seven production employees (considered the population). The
hourly earnings of each employee are given in Table 8–2.

TABLE 8–2 Hourly Earnings of the Production Employees of Tartus Industries

Employee   Hourly Earnings     Employee   Hourly Earnings

Joe        $14                 Jan        $14
Sam         14                 Art         16
Sue         16                 Ted         18
Bob         16

1. What is the population mean?
2. What is the sampling distribution of the sample mean for samples of size 2?
3. What is the mean of the sampling distribution?
4. What observations can be made about the population and the sampling
distribution?

SOLUTION

Here are the solutions to the questions.

1. The population is small so it is easy to calculate the population mean. It is
$15.43, found by:

\[ \mu = \frac{\Sigma x}{N} = \frac{\$14 + \$14 + \$16 + \$16 + \$14 + \$16 + \$18}{7} = \$15.43 \]


TABLE 8–3 Sample Means for All Possible Samples of 2 Employees

Sample   Employees   Hourly Earnings   Sum   Mean

 1       Joe, Sam    $14, $14          $28   $14
 2       Joe, Sue     14, 16            30    15
 3       Joe, Bob     14, 16            30    15
 4       Joe, Jan     14, 14            28    14
 5       Joe, Art     14, 16            30    15
 6       Joe, Ted     14, 18            32    16
 7       Sam, Sue     14, 16            30    15
 8       Sam, Bob     14, 16            30    15
 9       Sam, Jan     14, 14            28    14
10       Sam, Art     14, 16            30    15
11       Sam, Ted     14, 18            32    16
12       Sue, Bob     16, 16            32    16
13       Sue, Jan     16, 14            30    15
14       Sue, Art     16, 16            32    16
15       Sue, Ted     16, 18            34    17
16       Bob, Jan     16, 14            30    15
17       Bob, Art     16, 16            32    16
18       Bob, Ted     16, 18            34    17
19       Jan, Art     14, 16            30    15
20       Jan, Ted     14, 18            32    16
21       Art, Ted     16, 18            34    17

TABLE 8–4 Sampling Distribution of the Sample Mean for n = 2

Sample Mean   Number of Means   Probability

$14                  3             .1429
 15                  9             .4285
 16                  6             .2857
 17                  3             .1429
Total               21            1.0000

We identify the population mean with the Greek letter μ. Recall from earlier
chapters, Greek letters are used to represent population parameters.

2. To arrive at the sampling distribution of the sample mean, we need to select all
possible samples of 2 without replacement from the population, then compute
the mean of each sample. There are 21 possible samples, found by using for-
mula (5–10) on page 164.

\[ {}_{N}C_{n} = \frac{N!}{n!(N - n)!} = \frac{7!}{2!(7 - 2)!} = 21 \]

where N = 7 is the number of items in the population and n = 2 is the number
of items in the sample.

The 21 sample means from all possible samples of 2 that can be drawn from
the population of 7 employees are shown in Table 8–3. These 21 sample
means are used to construct a probability distribution. This is called the sam-
pling distribution of the sample mean, and it is summarized in Table 8–4.

3. Using the data in Table 8–3, the mean of the sampling distribution of the sample
mean is obtained by summing the various sample means and dividing the sum
by the number of samples. The mean of all the sample means is usually written
μ_x̄. The μ reminds us that it is a population value because we have considered all
possible samples of two employees from the population of seven employees.
The subscript x̄ indicates that it is the sampling distribution of the sample mean.

$$\mu_{\bar{x}} = \frac{\text{Sum of all sample means}}{\text{Total number of samples}} = \frac{\$14 + \$15 + \$15 + \cdots + \$16 + \$17}{21} = \frac{\$324}{21} = \$15.43$$
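
The enumeration in steps 2 and 3 can be checked with a short Python sketch. The hourly earnings below are read off the sample listings in Table 8–3 (for example, Joe $14 and Ted $18), and the variable names are illustrative assumptions rather than anything defined in the text.

```python
from collections import Counter
from itertools import combinations
from math import comb

# Hourly earnings of the seven employees, as implied by the pairs in Table 8-3.
earnings = {"Joe": 14, "Sam": 14, "Sue": 16, "Bob": 16,
            "Jan": 14, "Art": 16, "Ted": 18}

# Formula (5-10): number of samples of n = 2 from N = 7 without replacement.
n_samples = comb(len(earnings), 2)

# Mean of every possible sample of two employees.
sample_means = [(a + b) / 2 for a, b in combinations(earnings.values(), 2)]

# Sampling distribution of the sample mean (compare with Table 8-4).
distribution = Counter(sample_means)
for value in sorted(distribution):
    count = distribution[value]
    print(f"${value:.0f}  {count:2d}  {count / n_samples:.4f}")

# Mean of the sample means versus the population mean.
mean_of_means = sum(sample_means) / n_samples
population_mean = sum(earnings.values()) / len(earnings)
print(f"mean of sample means = {mean_of_means:.2f}")
print(f"population mean      = {population_mean:.2f}")
```

The sketch reports 9/21 rounded to four places as .4286, whereas Table 8–4 prints .4285, apparently so that the rounded probabilities sum to 1.0000.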

SAMPLING METHODS AND THE CENTRAL LIMIT THEOREM 263

In summary, we took all possible random samples from a population and for each
sample calculated a sample statistic (the mean amount earned). This example illustrates
important relationships between the population distribution and the sampling distribu-
tion of the sample mean:

1. The mean of the sample means is exactly equal to the population mean.
2. The dispersion of the sampling distribution of the sample mean is narrower than the
population distribution.
3. The sampling distribution of the sample mean tends to become bell-shaped and to
approximate the normal probability distribution.

Given a bell-shaped or normal probability distribution, we will be able to apply con-
cepts from Chapter 7 to determine the probability of selecting a sample with a specified
sample mean. In the next section, we will show the importance of sample size as it re-
lates to the sampling distribution of the sample mean.

SELF-REVIEW 8–3

The years of service of the five executives employed by Standard Chemicals are:

Name          Years

Mr. Snow      20
Ms. Tolson    22
Mr. Kraft     26
Ms. Irwin     24
Mr. Jones     28

4. Refer to Chart 8–2. It shows the population distribution based on the data
in Table 8–2 and the distribution of the sample mean based on the data in
Table 8–4. These observations can be made:

a. The mean of the distribution of the sample mean ($15.43) is equal to the
mean of the population: μ = μ_x̄.

b. The spread in the distribution of the sample mean is less than the spread in the
population values. The sample means range from $14 to $17 while the popula-
tion values vary from $14 up to $18. If we continue to increase the sample size,
the spread of the distribution of the sample mean becomes smaller.

c. The shape of the sampling distribution of the sample mean and the shape of
the frequency distribution of the population values are different. The distri-
bution of the sample mean tends to be more bell-shaped and to approxi-
mate the normal probability distribution.
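
Observation (b) can be made concrete with a small Python sketch that compares the range and the standard deviation of the population with those of the 21 sample means. The population values are again inferred from Table 8–3, and the use of the population standard deviation (`pstdev`) as the measure of spread is an assumption of this sketch.

```python
from itertools import combinations
from statistics import pstdev

# Hourly earnings of the seven employees, as implied by Table 8-3.
earnings = [14, 14, 16, 16, 14, 16, 18]

# Means of all 21 possible samples of two employees.
sample_means = [(a + b) / 2 for a, b in combinations(earnings, 2)]

# Range: largest value minus smallest value.
pop_range = max(earnings) - min(earnings)
means_range = max(sample_means) - min(sample_means)

# Standard deviation, treating each list as a complete population.
pop_sd = pstdev(earnings)
means_sd = pstdev(sample_means)

print(f"population:   range = {pop_range}, std dev = {pop_sd:.2f}")
print(f"sample means: range = {means_range}, std dev = {means_sd:.2f}")
```

Both measures come out smaller for the sample means (a range of $3 against $4, and a standard deviation of roughly 0.90 against 1.40), which agrees with the narrower spread described above.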

CHART 8–2 Distributions of Population Values and Sample Means
[Two panels, each with probability (.10 to .40) on the vertical axis. Top panel: population
distribution of hourly earnings ($14 to $18), with μ marked. Bottom panel: distribution of
the sample mean of hourly earnings, with μ_x̄ marked.]

264 CHAPTER 8

(a) Using the combination formula, how many samples of size 2 are possible?
(b) List all possible samples of two executives from the population and compute their means.
(c) Organize the means into a sampling distribution.
(d) Compare the population mean and the mean of the sample means.
(e) Compare the dispersion in the population with that in the distribution of the sample mean.
(f) A chart portraying the population values follows. Is the distribution of population
values normally distributed (bell-shaped)?

[Chart: frequency (vertical axis, 0 to 1) by years of service (horizontal axis: 20, 22, 24, 26, 28)]

(g) Is the distribution of the sample mean computed in part (c) starting to show some
tendency toward being bell-shaped?

EXERCISES

5. A population consists of the following four values: 12, 12, 14, and 16.
a. List all samples of size 2, and compute the mean of each sample.
b. Compute the mean of the distribution of the sample mean and the population
mean. Compare the two values.
c. Compare the dispersion in the population with that of the sample mean.

6. A population consists of the following five values: 2, 2, 4, 4, and 8.
a. List all samples of size 2, and compute the mean of each sample.
b. Compute the mean of the distribution of sample means and the population
mean. Compare the two values.
c. Compare the dispersion in the population with that of the sample means.

7. A population consists of the following five values: 12, 12, 14, 15, and 20.
a. List all samples of size 3, and compute the mean of each sample.
b. Compute the mean of the distribution of sample means and the population
mean. Compare the two values.
c. Compare the dispersion in the population with that of the sample means.

8. A population consists of the following five values: 0, 0, 1, 3, 6.
a. List all samples of size 3, and compute the mean of each sample.
b. Compute the mean of the distribution of sample means and the population
mean. Compare the two values.
c. Compare the dispersion in the population with that of the sample means.

9. In the law firm Tybo and Associates, there are six partners. Listed is the number of
cases each partner actually tried in court last month.

Partner      Number of Cases

Ruud         3
Wu           6
Sass         3
Flores       3
Wilhelms     0
Schueller    1

a. How many different samples of size 3 are possible?
b. List all possible samples of size 3, and compute the mean number of cases in
each sample.
c. Compare the mean of the distribution of sample means to the population mean.
d. On a chart similar to Chart 8–2, compare the dispersion in the population with
that of the sample means.


10. There are five sales associates at Mid-Motors Ford. The five associates and the
number of cars they sold last week are:

Sales Associate    Cars Sold

Peter Hankish       8
Connie Stallter     6
Juan Lopez          4
Ted Barnes         10
Peggy Chu           6

a. How many different samples of size 2 are possible?
b. List all possible samples of size 2, and compute the mean of each sample.
c. Compare the mean of the sampling distribution of the sample mean with that of
the population.
d. On a chart similar to Chart 8–2, compare the dispersion in sample means with
that of the population.

THE CENTRAL LIMIT THEOREM
In this section, we examine the central limit theorem. Its application to the sampling
distribution of the sample mean, introduced in the previous section, allows us to use the
normal probability distribution to create confidence intervals for the population mean
(described in Chapter 9) and perform tests of hypothesis (described in Chapter 10). The
central limit theorem states that, for large random samples, the shape of the sampling
distribution of the sample mean is close to the normal probability distribution. The
approximation is more accurate for large samples than for small samples. This is one of the
most useful conclusions in statistics. We can reason about the distribution of the sample
mean with absolutely no information about the shape of the population distribution from
which the sample is taken. In other words, the central limit theorem is true for all
population distributions.

LO8-4
Recite the central limit theorem and define the mean and standard error of the sampling
distribution of the sample mean.

CENTRAL LIMIT THEOREM If all samples of a particular size are selected from any
population, the sampling distribution of the sample mean is approximately a normal
distribution. This approximation improves with larger samples.

To further illustrate the central limit theorem, if the population follows a normal
probability distribution, then for any sample size the sampling distribution of the sample
mean will also be normal. If the population distribution is symmetrical (but not normal),
you will see the normal shape of the distribution of the sample mean emerge with samples as
small as 10. On the other hand, if you start with a distribution that is skewed or has thick
tails, it may require samples of 30 or more to observe the normality feature. This concept
is summarized in Chart 8–3 for various population shapes. Observe the convergence to a
normal distribution regardless of the shape of the population distribution.

The idea that a distribution of sample means will converge to normality when the
population is not normal is illustrated in Charts 8–4, 8–5, and 8–6. We will discuss this
example in more detail shortly, but Chart 8–4 is a graph of a discrete probability
distribution that is positively skewed. There are many possible samples of 5 that might be
selected from this population. Suppose we randomly select 25 samples of size 5 from
the population portrayed in Chart 8–4 and compute the mean of each sample. These
results are shown in Chart 8–5. Notice that the shape of the distribution of sample
means has changed from the shape of the original population even though we selected
only 25 of the many possible samples. To put it another way, we selected 25 random
samples of n = 5 from a population that is positively skewed and found the distribution
of sample means is different from the shape of the population. As we take larger samples,
that is, n = 20 instead of n = 5, we will find the distribution of the sample mean will
approach the normal distribution. Chart 8–6 shows the results of 25 random samples of
20 observations each from the same population. Observe the clear trend toward the
normal probability distribution. This is the point of the central limit theorem. The
following example/solution will underscore this condition.

CHART 8–3 Results of the Central Limit Theorem for Several Populations
[Four populations of x, each shown above its sampling distributions of x̄ for n = 2, n = 6,
and n = 30.]

EXAMPLE

Ed Spence began his sprocket business 20 years ago. The business has grown
over the years and now employs 40 people. Spence Sprockets Inc. faces some
major decisions regarding health care for these employees. Before making a final
decision on what health care plan to purchase, Ed decides to form a committee of
five representative employees. The committee will be asked to study the health
care issue carefully and make a recommendation as to what plan best fits the
employees’ needs. Ed feels the views of newer employees toward health care
may differ from those of more experienced employees. If Ed randomly selects this
committee, what can he expect in terms of the mean years with Spence Sprockets
for those on the committee? How does the shape of the distribution of years of
service of all employees (the population) compare with the shape of the sampling
distribution of the mean? The years of service (rounded to the nearest year) of the
40 employees currently on the Spence Sprockets Inc. payroll are as follows.

11 4 18 2 1 2 0 2 2 4
3 4 1 2 2 3 3 19 8 3
7 1 0 2 7 0 4 5 1 14
16 8 9 1 1 2 5 10 2 3

CHART 8–4 Years of Service for Spence Sprockets Inc. Employees

SOLUTION

Chart 8–4 shows a histogram for the frequency distribution of the years of ser-
vice for the population of 40 current employees. This distribution is positively
skewed. Why? Because the business has grown in recent years, the distribution
shows that 29 of the 40 employees have been with the company less than 6
years. Also, there are 11 employees who have worked at Spence Sprockets for
more than 6 years. In particular, four employees have been with the company
12 years or more (count the frequencies above 12). So there is a long tail in the
distribution of service years to the right, that is, the distribution is positively
skewed.

Let’s consider the first of Ed Spence’s problems. He would like to form a
committee of five employees to look into the health care question and suggest
what type of health care coverage would be most appropriate for the majority of
workers. How should he select the committee? If he selects the committee ran-
domly, what might he expect in terms of mean years of service for those on the
committee?

To begin, Ed writes the years of service for each of the 40 employees on
pieces of paper and puts them into an old baseball hat. Next, he shuffles the
pieces of paper and randomly selects five slips of paper. The years of service
for these five employees are 1, 9, 0, 19, and 14 years. Thus, the mean years of
service for these five sampled employees is 8.60 years. How does that com-
pare with the population mean? At this point, Ed does not know the population
mean, but the number of employees in the population is only 40, so he decides
to calculate the mean years of service for all his employees. It is 4.8 years,
found by adding the years of service for all the employees and dividing the
total by 40.

$$\mu = \frac{11 + 4 + 18 + \cdots + 2 + 3}{40} = 4.80$$

The difference between a sample mean (x̄) and the population mean (μ) is
called sampling error. In other words, the difference of 3.80 years between the
sample mean of 8.60 and the population mean of 4.80 is the sampling error. It is
due to chance. Thus, if Ed selected these five employees to constitute the com-
mittee, their mean years of service would be larger than the population mean.

What would happen if Ed put the five pieces of paper back into the baseball
hat and selected another sample? Would you expect the mean of this second
sample to be exactly the same as the previous one? Suppose he selects another
sample of five employees and finds the years of service in this sample to be 7, 4,
4, 1, and 3. This sample mean is 3.80 years. The result of selecting 25 samples
of five employees and computing the mean for each sample is shown in
Table 8–5 and Chart 8–5. There are actually 658,008 possible samples of 5 from
the population of 40 employees, found by the combination formula (5–10) for 40
things taken 5 at a time. Notice the difference in the shape of the population and
the distribution of these sample means. The population of the years of service
for employees (Chart 8–4) is positively skewed, but the distribution of these 25
sample means does not reflect the same positive skew. There is also a differ-
ence in the range of the sample means versus the range of the population. The
population ranged from 0 to 19 years, whereas the sample means range from
1.6 to 8.6 years.
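
The figures quoted in this part of the solution can be reproduced with a few lines of Python. This is a minimal sketch that assumes the 40 years-of-service values are entered exactly as listed in the example; the variable names are illustrative.

```python
from math import comb

# Years of service of the 40 Spence Sprockets employees (the population).
years = [11, 4, 18, 2, 1, 2, 0, 2, 2, 4,
         3, 4, 1, 2, 2, 3, 3, 19, 8, 3,
         7, 1, 0, 2, 7, 0, 4, 5, 1, 14,
         16, 8, 9, 1, 1, 2, 5, 10, 2, 3]

# Population mean, mu.
population_mean = sum(years) / len(years)

# Ed's first two samples of five employees.
sample_a = [1, 9, 0, 19, 14]
sample_b = [7, 4, 4, 1, 3]
mean_a = sum(sample_a) / len(sample_a)
mean_b = sum(sample_b) / len(sample_b)

# Sampling error: sample mean minus population mean.
sampling_error_a = mean_a - population_mean
sampling_error_b = mean_b - population_mean

# Formula (5-10): number of possible samples of 5 from 40.
possible_samples = comb(len(years), 5)

print(f"population mean = {population_mean:.2f}")
print(f"sample A: mean = {mean_a:.2f}, sampling error = {sampling_error_a:+.2f}")
print(f"sample B: mean = {mean_b:.2f}, sampling error = {sampling_error_b:+.2f}")
print(f"possible samples of 5 = {possible_samples:,}")
```

The second sample has a negative sampling error, which shows that a sample mean can fall on either side of the population mean.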

TABLE 8–5 Twenty-Five Random Samples of Five Employees

Sample Data

Sample Obs 1 Obs 2 Obs 3 Obs 4 Obs 5 Sum Mean

A 1 9 0 19 14 43 8.6
B 7 4 4 1 3 19 3.8
C 8 19 8 2 1 38 7.6
D 4 18 2 0 11 35 7.0
E 4 2 4 7 18 35 7.0
F 1 2 0 3 2 8 1.6
G 2 3 2 0 2 9 1.8
H 11 2 9 2 4 28 5.6
I 9 0 4 2 7 22 4.4
J 1 1 1 11 1 15 3.0
K 2 0 0 10 2 14 2.8
L 0 2 3 2 16 23 4.6
M 2 3 1 1 1 8 1.6
N 3 7 3 4 3 20 4.0
O 1 2 3 1 4 11 2.2
P 19 0 1 3 8 31 6.2
Q 5 1 7 14 9 36 7.2
R 5 4 2 3 4 18 3.6
S 14 5 2 2 5 28 5.6
T 2 1 1 4 7 15 3.0
U 3 7 1 2 1 14 2.8
V 0 1 5 1 2 9 1.8
W 0 3 19 4 2 28 5.6
X 4 2 3 4 0 13 2.6
Y 1 1 2 3 2 9 1.8


Now let’s change the example by increasing the size of each sample from 5
employees to 20. Table 8–6 reports the result of selecting 25 samples of 20 em-
ployees each and computing their sample means. These sample means are
shown graphically in Chart 8–6. Compare the shape of this distribution to the
population (Chart 8–4) and to the distribution of sample means where the sample
is n = 5 (Chart 8–5). You should observe two important features:

1. The shape of the distribution of the sample mean is different from that of the
population. In Chart 8–4, the distribution of all employees is positively skewed.
However, as we select random samples from this population, the shape of the
distribution of the sample mean changes. As we increase the size of the sample,
the distribution of the sample mean approaches the normal probability
distribution. This illustrates the central limit theorem.

2. There is less dispersion in the sampling distribution of the sample mean than in
the population distribution. In the population, the years of service ranged from
0 to 19 years. When we selected samples of 5, the sample means ranged from
1.6 to 8.6 years, and when we selected samples of 20, the means ranged from
3.05 to 7.10 years.

TABLE 8–6 Twenty-Five Random Samples of 20 Employees

Sample Data

Sample Obs 1 Obs 2 Obs 3 – Obs 19 Obs 20 Sum Mean

A 3 8 3 – 4 16 79 3.95
B 2 3 8 – 3 1 65 3.25
C 14 5 0 – 19 8 119 5.95
D 9 2 1 – 1 3 87 4.35
E 18 1 2 – 3 14 107 5.35
F 10 4 4 – 2 1 80 4.00
G 5 7 11 – 2 4 131 6.55
H 3 0 2 – 16 5 85 4.25
I 0 0 18 – 2 3 80 4.00
J 2 7 2 – 3 2 81 4.05
K 7 4 5 – 1 2 84 4.20
L 0 3 10 – 0 4 81 4.05
M 4 1 2 – 1 2 88 4.40
N 3 16 1 – 11 1 95 4.75
O 2 19 2 – 2 2 102 5.10
P 2 18 16 – 4 3 100 5.00
Q 3 2 3 – 3 1 102 5.10
R 2 3 1 – 0 2 73 3.65
S 2 14 19 – 0 7 142 7.10
T 0 1 3 – 2 0 61 3.05
U 1 0 1 – 9 3 65 3.25
V 1 9 4 – 2 11 137 6.85
W 8 1 9 – 8 7 107 5.35
X 4 2 0 – 2 5 86 4.30
Y 1 2 1 – 1 18 101 5.05

CHART 8–5 Histogram of Mean Years of Service for 25 Samples of Five Employees

CHART 8–6 Histogram of Mean Years of Service for 25 Samples of 20 Employees

We can also compare the mean of the sample means to the population mean.
The mean of the 25 samples of 20 employees reported in Table 8–6 is 4.676
years.

$$\mu_{\bar{x}} = \frac{3.95 + 3.25 + \cdots + 4.30 + 5.05}{25} = 4.676$$

We use the symbol μ_x̄ to identify the mean of the distribution of the sample mean.
The subscript reminds us that the distribution is of the sample mean. It is read
“mu sub x bar.” We observe that the mean of the sample means, 4.676 years, is
very close to the population mean of 4.80.
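
As a check on the arithmetic, the following Python sketch recomputes the mean and the range of the sample means, using the Mean columns of Tables 8–5 and 8–6 as input. The list names are assumptions made for this illustration.

```python
# Sample means copied from the Mean column of Table 8-5 (samples of n = 5).
means_n5 = [8.6, 3.8, 7.6, 7.0, 7.0, 1.6, 1.8, 5.6, 4.4, 3.0, 2.8, 4.6, 1.6,
            4.0, 2.2, 6.2, 7.2, 3.6, 5.6, 3.0, 2.8, 1.8, 5.6, 2.6, 1.8]

# Sample means copied from the Mean column of Table 8-6 (samples of n = 20).
means_n20 = [3.95, 3.25, 5.95, 4.35, 5.35, 4.00, 6.55, 4.25, 4.00, 4.05,
             4.20, 4.05, 4.40, 4.75, 5.10, 5.00, 5.10, 3.65, 7.10, 3.05,
             3.25, 6.85, 5.35, 4.30, 5.05]

population_mean = 4.80  # computed earlier in the example

for label, means in (("n = 5", means_n5), ("n = 20", means_n20)):
    mean_of_means = sum(means) / len(means)
    spread = max(means) - min(means)
    print(f"{label}: mean of means = {mean_of_means:.3f}, "
          f"range = {min(means)} to {max(means)} (width {spread:.2f}), "
          f"distance from mu = {abs(mean_of_means - population_mean):.3f}")
```

With only 25 of the many possible samples in each table, neither mean of means is expected to equal 4.80 exactly.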

What should we conclude from this example? The central limit theorem indicates
that, regardless of the shape of the population distribution, the sampling distribution
of the sample mean will move toward the normal probability distribution. The larger
the number of observations sampled or selected, the stronger the convergence. The
Spence Sprockets Inc. example shows how the central limit theorem works. We began
with a positively skewed population (Chart 8–4). Next, we selected 25 random sam-
ples of 5 observations, computed the mean of each sample, and finally organized
these 25 sample means into a histogram (Chart 8–5). We observe that the shape of
the sampling distribution of the sample mean is very different from that of the popula-
tion. The population distribution is positively skewed compared to the nearly normal
shape of the sampling distribution of the sample mean.

To further illustrate the effects of the central limit theorem, we increased the num-
ber of observations in each sample from 5 to 20. We selected 25 samples of 20 obser-
vations each and calculated the mean of each sample. Finally, we organized these
sample means into a histogram (Chart 8–6). The shape of the histogram in Chart 8–6 is
clearly moving toward the normal probability distribution.
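
The experiment described above can be repeated by simulation. The sketch below is a hedged illustration, not the procedure used to build Tables 8–5 and 8–6: it assumes 2,000 samples per sample size instead of 25, an arbitrary random seed, and a simple moment-based skewness coefficient (the average cubed standardized deviation) as a rough measure of shape.

```python
import random
from statistics import mean, pstdev

# Years of service of the 40 Spence Sprockets employees (the population).
years = [11, 4, 18, 2, 1, 2, 0, 2, 2, 4,
         3, 4, 1, 2, 2, 3, 3, 19, 8, 3,
         7, 1, 0, 2, 7, 0, 4, 5, 1, 14,
         16, 8, 9, 1, 1, 2, 5, 10, 2, 3]

rng = random.Random(8)  # fixed, arbitrary seed so the run is repeatable


def simulate_means(n, repetitions=2000):
    """Means of `repetitions` random samples of size n, drawn without replacement."""
    return [mean(rng.sample(years, n)) for _ in range(repetitions)]


def skewness(values):
    """Moment-based skewness: the average cubed standardized deviation."""
    m, s = mean(values), pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Simulated sample means for the two sample sizes used in the example.
simulated = {n: simulate_means(n) for n in (5, 20)}

print(f"population: mean = {mean(years):.2f}, skewness = {skewness(years):.2f}")
for n, means in simulated.items():
    print(f"n = {n:2d}: mean of means = {mean(means):.2f}, "
          f"std dev of means = {pstdev(means):.2f}, "
          f"skewness = {skewness(means):.2f}")
```

A skewness near zero indicates a roughly symmetric distribution, so the coefficient should shrink as n goes from 5 to 20, while the mean of the sample means stays near 4.8.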

If you go back to Chapter 6 where several binomial distributions with a “success” pro-
portion of .10 are shown in Chart 6–3 on page 190, you can see yet another demonstration
of the central limit theorem. Observe as n increases from 7 through 12 and 20 up to 40 that
the profile of the probability distributions moves closer and closer to a normal probability
distribution. Chart 8–6 also shows the convergence to normality as n increases. This
again reinforces the fact that, as more observations are sampled from any population
distribution, the shape of the sampling distribution of the sample mean will get closer
and closer to a normal distribution.

The central limit theorem, defined on page 265, does not say anything about
the dispersion of the sampling distribution of the sample mean or about the compar-
ison of the mean of the sampling distribution of the sample mean to the mean of the
population. However, in our Spence Sprockets example, we did observe that there
was less dispersion in the distribution of the sample mean than in the population
distribution by noting the difference in the range in the population and the range of
the sample means. We observe that the mean of the sample means is close to the
mean of the population. It can be demonstrated that the mean of the sampling distribution
is exactly equal to the population mean (i.e., μ_x̄ = μ), and if the standard deviation
in the population is σ, the standard deviation of the sample means is σ∕√n,
where n is the number of observations in each sample. We refer to σ∕√n as the standard
error of the mean. Its longer name is actually the standard deviation of the
sampling distribution of the sample mean.

In this section, we also came to other important conclusions.

1. The mean of the distribution of sample means will be exactly equal to the population
mean if we are able to select all possible samples of the same size from a given
population. That is:

μ = μ_x̄

Even if we do not select all samples, we can expect the mean of the distribution of
sample means to be close to the population mean.
2. There will be less dispersion in the sampling distribution of the sample mean than
in the population. If the standard deviation of the population is σ, the standard
deviation of the distribution of sample means is σ∕√n. Note that when we increase the
size of the sample, the standard error of the mean decreases.

STANDARD ERROR OF THE MEAN

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \tag{8–1}$$
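
To see how formula (8–1) behaves as the sample size grows, the sketch below applies it to the Spence Sprockets population. It assumes the population standard deviation σ is computed from all 40 values, and it applies the formula exactly as stated, with no further adjustment; the helper name `standard_error` is an assumption of this sketch.

```python
from math import sqrt
from statistics import pstdev

# Years of service of the 40 Spence Sprockets employees (the population).
years = [11, 4, 18, 2, 1, 2, 0, 2, 2, 4,
         3, 4, 1, 2, 2, 3, 3, 19, 8, 3,
         7, 1, 0, 2, 7, 0, 4, 5, 1, 14,
         16, 8, 9, 1, 1, 2, 5, 10, 2, 3]


def standard_error(sigma, n):
    """Formula (8-1): standard error of the mean for samples of size n."""
    return sigma / sqrt(n)


sigma = pstdev(years)  # population standard deviation
print(f"sigma = {sigma:.3f}")
for n in (5, 20):
    print(f"n = {n:2d}: standard error = {standard_error(sigma, n):.3f}")
```

Because n appears under a square root, quadrupling the sample size from 5 to 20 halves the standard error.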

SELF-REVIEW 8–4

Refer to the Spence Sprockets Inc. data on page 267. Select 10 random samples of five
employees each. Use the methods described earlier in the chapter and the Table of
Random Numbers (Appendix B.4) to find the employees to include in the sample. Compute
the mean of each sample and plot the sample means on a chart similar to Chart 8–4. What
is the mean of your 10 sample means?

EXERCISES

11. Appendix B.4 is a table of random numbers that are uniformly distributed.
Hence, each digit from 0 to 9 has the same likelihood of occurrence.

a. Draw a graph showing the population distribution of random numbers. What is
the population mean?

b. Following are the first 10 rows of five digits from the table of random numbers in
Appendix B.4. Assume that these are 10 random samples of five values each.
Determine the mean of each sample and plot the means on a chart similar to
Chart 8–4. Compare the mean of the sampling distribution of the sample mean
with the population mean.

0 2 7 1 1
9 4 8 7 3
5 4 9 2 1
7 7 6 4 0
6 1 5 4 5
1 7 1 4 7
1 3 7 4 8
8 7 4 5 5
0 8 9 9 9
7 8 8 0 4

12. Scrapper Elevator Company has 20 sales representatives who sell its product
throughout the United States and Canada. The number of units sold last month by
each representative is listed below. Assume these sales figures to be the popula-
tion values.

2 3 2 3 3 4 2 4 3 2 2 7 3 4 5 3 3 3 3 5

a. Draw a graph showing the population distribution.
b. Compute the mean of the population.
c. Select five random samples of 5 each. Compute the mean of each sample. Use
the methods described in this chapter and Appendix B.4 to determine the items
to be included in the sample.

d. Compare the mean of the sampling distribution of the sample mean to the pop-
ulation mean. Would you expect the two values to be about the same?

e. Draw a histogram of the sample means. Do you notice a difference in the shape
of the distribution of sample means compared to the shape of the population
distribution?

13. Consider all of the coins (pennies, nickels, quarters, etc.) in your pocket or
purse as a population. Make a frequency table beginning with the current
year and counting backward to record the ages (in years) of the coins. For
example, if the current year is 2017, then a coin with 2015 stamped on it is
2 years old.

a. Draw a histogram or other graph showing the population distribution.
b. Randomly select five coins and record the mean age of the sampled coins.
Repeat this sampling process 20 times. Now draw a histogram or other graph
showing the distribution of the sample means.

c. Compare the shapes of the two histograms.

14. Consider the digits in the phone numbers on a randomly selected page of your
local phone book a population. Make a frequency table of the final digit of 30 ran-
domly selected phone numbers. For example, if a phone number is 555-9704, re-
cord a 4.

a. Draw a histogram or other graph of this population distribution. Using the uni-
form distribution, compute the population mean and the population standard
deviation.

b. Also record the sample mean of the final four digits (9704 would lead to a mean
of 5). Now draw a histogram or other graph showing the distribution of the sam-
ple means.

c. Compare the shapes of the two histograms.


USING THE SAMPLING DISTRIBUTION
OF THE SAMPLE MEAN
The previous discussion is important because most business decisions are made on the
basis of sample information. Here are some examples.

1. Arm & Hammer Company wants to ensure that its laundry detergent actually con-
tains 100 fluid ounces, as indicated on the label. Historical summaries from the fill-
ing process indicate the mean amount per container is 100 fluid ounces and the
standard deviation is 2 fluid ounces. At 10 a.m., a quality technician measures 40
containers and finds the mean amount per container is 99.8 fluid ounces. Should
the technician shut down the filling operation?

2. A. C. Nielsen Company provides information to organizations advertising on televi-
sion. Prior research indicates that adult Americans watch an average of 6.0 hours
per day of television. The standard deviation is 1.5 hours. What is the probability
that we could randomly select a sample of 50 adults and find that they watch an
average of 6.5 hours or more of television per day?

3. Haughton Elevator Company wishes to develop specifications for the number of
people who can ride in a new oversized elevator. Suppose the mean weight of an
adult is 160 pounds and the standard deviation is 15 pounds. However, the distri-
bution of weights does not follow the normal probability distribution. It is positively
skewed. For a sample of 30 adults, what is the likelihood that their mean weight is
170 pounds or more?

We can answer the questions in each of these situations using the ideas discussed in the
previous section. In each case, we have a population with information about its mean and
standard deviation. Using this information and sample size, we can determine the distribu-
tion of sample means and compute the probability that a sample mean will fall within a
certain range. The sampling distribution will be normally distributed under two conditions:

1. When the samples are taken from populations known to follow the normal distribu-
tion. In this case, the size of the sample is not a factor.

2. When the shape of the population distribution is not known, sample size is import-
ant. In general, the sampling distribution will be normally distributed as the sample
size approaches infinity. In practice, a sampling distribution will be close to a normal
distribution with samples of at least 30 observations.

We use formula (7–5) from the previous chapter to convert any normal distribution
to the standard normal distribution. Using formula (7–5) to compute z values, we can
use the standard normal table, Appendix B.3, to find the probability that an observation
is within a specific range. The formula for finding a z value is:

$$z = \frac{x - \mu}{\sigma}$$

In this formula, x is the value of the random variable, μ is the population mean, and σ is
the population standard deviation.

However, when we sample from populations, we are interested in the distribution of
x̄, the sample mean, instead of x, the value of one observation. That is the first change
we make in formula (7–5). The second is that we use the standard error of the mean of
n observations instead of the population standard deviation. That is, we use σ∕√n in the
denominator rather than σ. Therefore, to find the likelihood of a sample mean within a
specified range, we first use the following formula to find the corresponding z value.
Then we use Appendix B.3 or statistical software to determine the probability.

LO8-5
Apply the central limit
theorem to calculate
probabilities.

FINDING THE z VALUE OF x̄ WHEN THE POPULATION STANDARD DEVIATION IS KNOWN

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \tag{8–2}$$
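
Formula (8–2) can be applied directly to the second question posed at the start of this section (television viewing: μ = 6.0 hours, σ = 1.5 hours, n = 50, sample mean of 6.5 hours or more). The sketch below is an illustration under the assumption that a sample of 50 is large enough for the normal approximation. It uses `NormalDist` from Python's standard library in place of Appendix B.3, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_for_sample_mean(x_bar, mu, sigma, n):
    """Formula (8-2): z value of a sample mean when sigma is known."""
    return (x_bar - mu) / (sigma / sqrt(n))


def prob_mean_at_least(x_bar, mu, sigma, n):
    """P(sample mean >= x_bar), using the standard normal distribution."""
    z = z_for_sample_mean(x_bar, mu, sigma, n)
    return 1 - NormalDist().cdf(z)


# Television viewing question: mu = 6.0, sigma = 1.5, n = 50, x_bar = 6.5.
z = z_for_sample_mean(6.5, 6.0, 1.5, 50)
p = prob_mean_at_least(6.5, 6.0, 1.5, 50)
print(f"z = {z:.2f}, P(sample mean >= 6.5) = {p:.4f}")
```

The same two functions apply to the other two questions by changing the four arguments.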


The following example/solution will show the application.

EXAMPLE

The Quality Assurance Department for Cola, Inc. maintains records regarding the
amount of cola in its jumbo bottle. The actual amount of cola in each bottle is critical
but varies a small amount from one bottle to the next. Cola, Inc. does not wish to
underfill the bottles because it will have a problem with truth in labeling. On the
other hand, it cannot overfill each bottle because it would be giving cola away,
hence reducing its profits. Records maintained by the Quality Assurance Depart-
ment indicate that the amount of cola follows the normal probability distribution.
The mean amount per bottle is 31.2 ounces and the population standard deviation
is 0.4 ounce. At 8 a.m. today the quality technician randomly selected 16 bottles
from the filling line. The mean amount of cola contained in the bottles is 31.38
ounces. Is this an unlikely result? Is it likely the process is putting too much soda in
the bottles? To put it another way, is the sampling error of 0.18 ounce unusual?

SOLUTION

We use the results of the previous section to find the likelihood that we could select
a sample of 16 (n) bottles from a normal population with a mean of 31.2 (μ) ounces
and a population standard deviation of 0.4 (σ) ounce and find the sample mean to
be 31.38 (x̄) or more. We use formula (8–2) to find the value of z.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{31.38 - 31.20}{0.4/\sqrt{16}} = 1.80$$

The numerator of this equation, x̄ − μ = 31.38 − 31.20 = 0.18, is the sampling error.
The denominator, σ∕√n = 0.4∕√16 = 0.1, is the standard error of the sampling
distribution of the sample mean. So the z value expresses the sampling error in
standard units, that is, in units of the standard error.

Next, we compute the likelihood of a z value greater than 1.80. In Appendix B.3,
locate the probability corresponding to a z value of 1.80. It is .4641. The likeli-
hood of a z value greater than 1.80 is .0359, found by .5000 − .4641.

What do we conclude? It is unlikely, less than a 4% chance, we could select a
sample of 16 observations from a normal population with a mean of 31.2 ounces and
a population standard deviation of 0.4 ounce and find the sample mean equal to or
greater than 31.38 ounces. We conclude the process is putting too much cola in the
bottles. The quality technician should see the production supervisor about reducing
the amount of soda in each bottle. This information is summarized in Chart 8–7.
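
The steps of this solution can be traced in Python. The sketch below mirrors the table lookup: Appendix B.3 gives the area between 0 and z, which is computed here as the cumulative probability minus .5. Using `NormalDist` from the standard library in place of the printed table is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

mu = 31.20      # population mean, ounces
sigma = 0.4     # population standard deviation, ounces
n = 16          # bottles sampled
x_bar = 31.38   # sample mean, ounces

sampling_error = x_bar - mu            # numerator of formula (8-2)
standard_error = sigma / sqrt(n)       # denominator of formula (8-2)
z = sampling_error / standard_error

# Area between 0 and z, the quantity tabulated in Appendix B.3.
area_0_to_z = NormalDist().cdf(z) - 0.5
# Likelihood of a z value greater than the one observed.
upper_tail = 0.5 - area_0_to_z

print(f"sampling error = {sampling_error:.2f}")
print(f"standard error = {standard_error:.2f}")
print(f"z = {z:.2f}")
print(f"area from 0 to z = {area_0_to_z:.4f}")
print(f"P(z > {z:.2f}) = {upper_tail:.4f}")
```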

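CHART 8–7 Sampling Distribution of the Mean Amount of Cola in a Jumbo Bottle
[Normal curve with two horizontal scales: ounces (x̄), with 31.20 and 31.38 marked, and the
z value, with 0 and 1.80 marked. The area between 0 and 1.80 is .4641; the area beyond 1.80
is .0359.]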


SELF-REVIEW 8–5

Refer to the Cola, Inc. information. Suppose the quality technician selected a sample of
16 jumbo bottles that averaged 31.08 ounces. What can you conclude about the filling
process?

EXERCISES

15. A normal population has a mean of 60 and a standard deviation of 12. You select a
random sample of 9. Compute the probability the sample mean is:

a. Greater than 63.
b. Less than 56.
c. Between 56 and 63.

16. A normal population has a mean of 75 and a standard deviation of 5. You select a
sample of 40. Compute the probability the sample mean is:

a. Less than 74.
b. Between 74 and 76.
c. Between 76 and 77.
d. Greater than 77.

17. In a certain section of Southern California, the distribution of monthly rent for a
one-bedroom apartment has a mean of $2,200 and a standard deviation of $250.
The distribution of the monthly rent does not follow the normal distribution. In fact,
it is positively skewed. What is the probability of selecting a sample of 50
one-bedroom apartments and finding the mean to be at least $1,950 per month?

18. According to an IRS study, it takes a mean of 330 minutes for taxpayers to prepare,
copy, and electronically file a 1040 tax form. This distribution of times follows the
normal distribution and the standard deviation is 80 minutes. A consumer watchdog
agency selects a random sample of 40 taxpayers.

a. What is the standard error of the mean in this example?
b. What is the likelihood the sample mean is greater than 320 minutes?
c. What is the likelihood the sample mean is between 320 and 350 minutes?
d. What is the likelihood the sample mean is greater than 350 minutes?

C H A P T E R S U M M A R Y

I. There are many reasons for sampling a population.
A. The results of a sample may adequately estimate the value of the population
   parameter, thus saving time and money.
B. It may be too time-consuming to contact all members of the population.
C. It may be impossible to check or locate all the members of the population.
D. The cost of studying all the items in the population may be prohibitive.
E. Often testing destroys the sampled item and it cannot be returned to the population.

II. In an unbiased or probability sample, all members of the population have a chance of
being selected for the sample. There are several probability sampling methods.
A. In a simple random sample, all members of the population have the same chance of
   being selected for the sample.
B. In a systematic sample, a random starting point is selected, and then every kth item
   thereafter is selected for the sample.
C. In a stratified sample, the population is divided into several groups, called strata, and
   then a random sample is selected from each stratum.
D. In cluster sampling, the population is divided into primary units, then samples are
   drawn from the primary units.
III. The sampling error is the difference between a population parameter and a sample
   statistic.

276 CHAPTER 8

IV. The sampling distribution of the sample mean is a probability distribution of all possible
sample means of the same sample size.
A. For a given sample size, the mean of all possible sample means selected from a
   population is equal to the population mean.
B. There is less variation in the distribution of the sample mean than in the population
   distribution.
C. The standard error of the mean measures the variation in the sampling distribution of
   the sample mean. The standard error is found by:

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$   (8–1)

D. If the population follows a normal distribution, the sampling distribution of the sample
mean will also follow the normal distribution for samples of any size. If the population
is not normally distributed, the sampling distribution of the sample mean will approach
a normal distribution when the sample size is at least 30. Assume the population stan-
dard deviation is known. To determine the probability that a sample mean falls in a
particular region, use the following formula.

$z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$   (8–2)
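Formulas (8–1) and (8–2) can be sketched in a few lines of Python. The values 31.20, 31.38, and n = 16 come from the Cola, Inc. illustration summarized in Chart 8–7; the population standard deviation of 0.40 ounce is an assumption chosen to be consistent with the z value of 1.80 shown in that chart. The function names are illustrative only.

```python
import math

def standard_error(sigma, n):
    """Formula (8-1): standard error of the mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

def z_value(x_bar, mu, sigma, n):
    """Formula (8-2): z = (x_bar - mu) / (sigma / sqrt(n))."""
    return (x_bar - mu) / standard_error(sigma, n)

def normal_cdf(z):
    """P(Z <= z) for the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Cola, Inc. values; sigma = 0.40 is an assumed value (see the note above).
mu, sigma, n, x_bar = 31.20, 0.40, 16, 31.38

se = standard_error(sigma, n)
z = z_value(x_bar, mu, sigma, n)
upper_tail = 1.0 - normal_cdf(z)   # P(sample mean > 31.38)

print(f"standard error = {se:.2f}, z = {z:.2f}, upper-tail probability = {upper_tail:.4f}")
```

Under these assumptions the upper-tail probability agrees with the .0359 shown in Chart 8–7. Using the error function keeps the sketch free of third-party libraries; a printed normal table gives the same result to four decimal places.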

PRONUNCIATION KEY

SYMBOL               MEANING                                                PRONUNCIATION
$\mu_{\bar{x}}$      Mean of the sampling distribution of the sample mean   mu sub x bar
$\sigma_{\bar{x}}$   Population standard error of the sample mean           sigma sub x bar

CHAPTER EXERCISES

19. The 25 retail stores located in the North Towne Square Mall numbered 00 through 24
are:

00 Elder-Beerman
01 Sears
02 Deb Shop
03 Frederick’s of Hollywood
04 Petries
05 Easy Dreams
06 Summit Stationers
07 E. B. Brown Opticians
08 Kay-Bee Toy & Hobby

09 Lion Store
10 Bootleggers
11 Formal Man
12 Leather Ltd.
13 Barnes and Noble
14 Pat’s Hallmark
15 Things Remembered
16 Pearle Vision Express
17 Dollar Tree

18 County Seat
19 Kid Mart
20 Lerner
21 Coach House Gifts
22 Spencer Gifts
23 CPI Photo Finish
24 Regis Hairstylists

a. If the following random numbers are selected, which retail stores should be con-
tacted for a survey? 11, 65, 86, 62, 06, 10, 12, 77, and 04

b. Select a random sample of four retail stores. Use Appendix B.4.
c. A systematic sampling procedure will be used. The first store will be selected and
   then every third store. Which stores will be in the sample?
20. The Medical Assurance Company is investigating the cost of a routine office visit to
family-practice physicians in the Rochester, New York, area. The following is a list of 39
family-practice physicians in the region. Physicians are to be randomly selected and
contacted regarding their charges. The 39 physicians have been coded from 00 to 38.
Also noted is whether they are in practice by themselves (S), have a partner (P), or are in
a group practice (G).

Number  Physician                   Type of Practice
00      R. E. Scherbarth, M.D.      S
01      Crystal R. Goveia, M.D.     P
02      Mark D. Hillard, M.D.       P
03      Jeanine S. Huttner, M.D.    P
04      Francis Aona, M.D.          P
05      Janet Arrowsmith, M.D.      P
06      David DeFrance, M.D.        S
07      Judith Furlong, M.D.        S
08      Leslie Jackson, M.D.        G
09      Paul Langenkamp, M.D.       S
10      Philip Lepkowski, M.D.      S
11      Wendy Martin, M.D.          S
12      Denny Mauricio, M.D.        P
13      Hasmukh Parmar, M.D.        P
14      Ricardo Pena, M.D.          P
15      David Reames, M.D.          P
16      Ronald Reynolds, M.D.       G
17      Mark Steinmetz, M.D.        G
18      Geza Torok, M.D.            S
19      Mark Young, M.D.            P
20      Gregory Yost, M.D.          P
21      J. Christian Zona, M.D.     P
22      Larry Johnson, M.D.         P
23      Sanford Kimmel, M.D.        P
24      Harry Mayhew, M.D.          S
25      Leroy Rodgers, M.D.         S
26      Thomas Tafelski, M.D.       S
27      Mark Zilkoski, M.D.         G
28      Ken Bertka, M.D.            G
29      Mark DeMichiei, M.D.        G
30      John Eggert, M.D.           P
31      Jeanne Fiorito, M.D.        P
32      Michael Fitzpatrick, M.D.   P
33      Charles Holt, D.O.          P
34      Richard Koby, M.D.          P
35      John Meier, M.D.            P
36      Douglas Smucker, M.D.       S
37      David Weldy, M.D.           P
38      Cheryl Zaborowski, M.D.     P

a. The random numbers obtained from Appendix B.4 are 31, 94, 43, 36, 03, 24, 17,
   and 09. Which physicians should be contacted?
b. Select a random sample of four physicians using the random numbers of Appendix
   B.4.
c. Using systematic random sampling, every fifth physician is selected starting with the
   fourth physician in the list. Which physicians will be contacted?
d. Select a sample that includes two physicians in solo practice (S), two in partnership
   (P), and one in group practice (G). Explain your procedure.

21. A population consists of the following three values: 1, 2, and 3.
a. Sampling with replacement, list all possible samples of size 2 and compute the mean
   of every sample.
b. Find the means of the distribution of the sample mean and the population mean.
   Compare the two values.
c. Compare the dispersion of the population with that of the sample mean.
d. Describe the shapes of the two distributions.

22. Based on all student records at Camford University, students spend an average of
5.5 hours per week playing organized sports. The population’s standard deviation is
2.2 hours per week. Based on a sample of 121 students, Healthy Lifestyles Incorporated
(HLI) would like to apply the central limit theorem to make various estimates.
a. Compute the standard error of the sample mean.
b. What is the chance HLI will find a sample mean between 5 and 6 hours?
c. Calculate the probability that the sample mean will be between 5.3 and 5.7 hours.
d. How strange would it be to obtain a sample mean greater than 6.5 hours?

23. The manufacturer of eComputers, an economy-priced computer, recently completed
the design for a new laptop model. eComputer’s top management would like some
assistance in pricing the new laptop. Two market research firms were contacted and
asked to prepare a pricing strategy. Marketing-Gets-Results tested the new eComputers
laptop with 50 randomly selected consumers who indicated they plan to purchase
a laptop within the next year. The second marketing research firm, called
Marketing-Reaps-Profits, test-marketed the new eComputers laptop with 200 current laptop
owners. Which of the marketing research companies’ test results will be more useful?
Discuss why.

24. Answer the following questions in one or two well-constructed sentences.
a. What happens to the standard error of the mean if the sample size is increased?
b. What happens to the distribution of the sample means if the sample size is
   increased?
c. When using sample means to estimate the population mean, what is the benefit of
   using larger sample sizes?

25. There are 25 motels in Goshen, Indiana. The number of rooms in each motel follows:

90 72 75 60 75 72 84 72 88 74 105 115 68 74 80 64 104 82 48 58 60 80 48 58 100

a. Using a table of random numbers (Appendix B.4), select a random sample of five
motels from this population.

b. Obtain a systematic sample by selecting a random starting point among the first five
motels and then select every fifth motel.

c. Suppose the last five motels are “cut-rate” motels. Describe how you would select a
random sample of three regular motels and two cut-rate motels.

26. As a part of their customer-service program, United Airlines randomly selected 10 pas-
sengers from today’s 9 a.m. Chicago–Tampa flight. Each sampled passenger will be in-
terviewed about airport facilities, service, and so on. To select the sample, each
passenger was given a number on boarding the aircraft. The numbers started with 001
and ended with 250.
a. Select 10 usable numbers at random using Appendix B.4.
b. The sample of 10 could have been chosen using a systematic sample. Choose the
   first number using Appendix B.4, and then list the numbers to be interviewed.
c. Evaluate the two methods by giving the advantages and possible disadvantages.
d. What other way could a random sample be selected from the 250 passengers?

27. Suppose your statistics instructor gave six examinations during the semester. You re-
ceived the following exam scores (percent correct): 79, 64, 84, 82, 92, and 77. To com-
pute your final course grade, the instructor decided to randomly select two exam scores,
compute their mean, and use this score to determine your final course grade.
a. Compute the population mean.
b. How many different samples of two test grades are possible?
c. List all possible samples of size 2 and compute the mean of each.
d. Compute the mean of the sample means and compare it to the population mean.
e. If you were a student, would you like this arrangement? Would the result be different
   from dropping the lowest score? Write a brief report.

28. At the downtown office of First National Bank, there are five tellers. Last week, the
tellers made the following number of errors each: 2, 3, 5, 3, and 5.
a. How many different samples of two tellers are possible?
b. List all possible samples of size 2 and compute the mean of each.
c. Compute the mean of the sample means and compare it to the population mean.

29. The Quality Control Department employs five technicians during the day shift. Listed
below is the number of times each technician instructed the production foreman to shut
down the manufacturing process last week.

Technician   Shutdowns
Taylor       4
Hurley       3
Gupta        5
Rousche      3
Huang        2

a. How many different samples of two technicians are possible from this population?
b. List all possible samples of two observations each and compute the mean of each
   sample.
c. Compare the mean of the sample means with the population mean.
d. Compare the shape of the population distribution with the shape of the distribution of
   the sample means.

30. The Appliance Center has six sales representatives at its North Jacksonville outlet. The
following table lists the number of refrigerators sold by each representative last month.

Sales Representative   Number Sold
Zina Craft             54
Woon Junge             50
Ernie DeBrul           52
Jan Niles              48
Molly Camp             50
Rachel Myak            52

a. How many samples of size 2 are possible?
b. Select all possible samples of size 2 and compute the mean number sold.
c. Organize the sample means into a frequency distribution.
d. What is the mean of the population? What is the mean of the sample means?
e. What is the shape of the population distribution?
f. What is the shape of the distribution of the sample mean?

31. Power +, Inc. produces AA batteries used in remote-controlled toy cars. The life of
these batteries follows the normal probability distribution with a mean of 35.0 hours and
a standard deviation of 5.5 hours. As a part of its quality assurance program, Power +,
Inc. tests samples of 25 batteries.
a. What can you say about the shape of the distribution of the sample mean?
b. What is the standard error of the distribution of the sample mean?
c. What proportion of the samples will have a mean useful life of more than 36 hours?
d. What proportion of the samples will have a mean useful life greater than 34.5
   hours?
e. What proportion of the samples will have a mean useful life between 34.5 and 36.0
   hours?

32. Majesty Video Production Inc. wants the mean length of its advertisements to be 30
seconds. Assume the distribution of ad length follows the normal distribution with a
population standard deviation of 2 seconds. Suppose we select a sample of 16 ads
produced by Majesty.
a. What can we say about the shape of the distribution of the sample mean time?
b. What is the standard error of the mean time?
c. What percent of the sample means will be greater than 31.25 seconds?
d. What percent of the sample means will be greater than 28.25 seconds?
e. What percent of the sample means will be greater than 28.25 but less than 31.25
   seconds?

33. Recent studies indicate that the typical 50-year-old woman spends $350 per year for
personal-care products. The distribution of the amounts spent follows a normal distribution
with a standard deviation of $45 per year. We select a random sample of 40 women.
The mean amount spent for those sampled is $335. What is the likelihood of finding a
sample mean this large or larger from the specified population?

34. Information from the American Institute of Insurance indicates the mean amount of life
insurance per household in the United States is $165,000. This distribution follows the
normal distribution with a standard deviation of $40,000.
a. If we select a random sample of 50 households, what is the standard error of the
   mean?
b. What is the expected shape of the distribution of the sample mean?
c. What is the likelihood of selecting a sample with a mean of at least $167,000?
d. What is the likelihood of selecting a sample with a mean of more than $155,000?
e. Find the likelihood of selecting a sample with a mean of more than $155,000 but
   less than $167,000.

35. In the United States, the age of men when they marry for the first time follows the
normal distribution with a mean of 29 years. The standard deviation of the distribution is
2.5 years. For a random sample of 60 men, what is the likelihood that the mean age when
they were first married is less than 29.3 years?

36. A recent study by the Greater Los Angeles Taxi Drivers Association showed that the
mean fare charged for service from Hermosa Beach to Los Angeles International Airport
is $21 and the standard deviation is $3.50. We select a sample of 15 fares.
a. What is the likelihood that the sample mean is between $20 and $23?
b. What must you assume to make the above calculation?

37. Crossett Trucking Company claims that the mean weight of its delivery trucks when they
are fully loaded is 6,000 pounds and the standard deviation is 150 pounds. Assume that
the population follows the normal distribution. Forty trucks are randomly selected and
weighed. Within what limits will 95% of the sample means occur?

38. The mean amount purchased by a typical customer at Churchill’s Grocery Store is $23.50,
with a standard deviation of $5.00. Assume the distribution of amounts purchased follows
the normal distribution. For a sample of 50 customers, answer the following questions.
a. What is the likelihood the sample mean is at least $25.00?
b. What is the likelihood the sample mean is greater than $22.50 but less than $25.00?
c. Within what limits will 90% of the sample means occur?

39. The mean performance score on a physical fitness test for Division I student-athletes is
947 with a standard deviation of 205. If you select a random sample of 60 of these stu-
dents, what is the probability the mean is below 900?

40. Suppose we roll a fair die two times.
a. How many different samples are there?
b. List each of the possible samples and compute the mean.
c. On a chart similar to Chart 8–2, compare the distribution of sample means with the
   distribution of the population.
d. Compute the mean and the standard deviation of each distribution and compare them.

41. Following is a list of the 50 states with the numbers 0 through 49 assigned to them.

Number State

0 Alabama
1 Alaska
2 Arizona
3 Arkansas
4 California
5 Colorado
6 Connecticut
7 Delaware
8 Florida
9 Georgia
10 Hawaii
11 Idaho
12 Illinois
13 Indiana
14 Iowa
15 Kansas
16 Kentucky
17 Louisiana
18 Maine
19 Maryland
20 Massachusetts
21 Michigan
22 Minnesota
23 Mississippi
24 Missouri
25 Montana
26 Nebraska
27 Nevada
28 New Hampshire
29 New Jersey
30 New Mexico
31 New York
32 North Carolina
33 North Dakota
34 Ohio
35 Oklahoma
36 Oregon
37 Pennsylvania
38 Rhode Island
39 South Carolina
40 South Dakota
41 Tennessee
42 Texas
43 Utah
44 Vermont
45 Virginia
46 Washington
47 West Virginia
48 Wisconsin
49 Wyoming

a. You wish to select a sample of eight from this list. The selected random numbers are 45,
15, 81, 09, 39, 43, 90, 26, 06, 45, 01, and 42. Which states are included in the sample?

b. Select a systematic sample of every sixth item using the number 02 as the starting point.
Which states are included?

42. Human Resource Consulting (HRC) surveyed a random sample of 60 Twin Cities construc-
tion companies to find information on the costs of their health care plans. One of the
items being tracked is the annual deductible that employees must pay. The Minnesota
Department of Labor reports that historically the mean deductible amount per employee
is $502 with a standard deviation of $100.

a. Compute the standard error of the sample mean for HRC.
b. What is the chance HRC finds a sample mean between $477 and $527?
c. Calculate the likelihood that the sample mean is between $492 and $512.
d. What is the probability the sample mean is greater than $550?

43. Over the past decade, the mean number of hacking attacks experienced by members of
the Information Systems Security Association is 510 per year with a standard deviation
of 14.28 attacks. The number of attacks per year is normally distributed. Suppose noth-
ing in this environment changes.
a. What is the likelihood this group will suffer an average of more than 600 attacks in
   the next 10 years?
b. Compute the probability the mean number of attacks over the next 10 years is
   between 500 and 600.
c. What is the possibility they will experience an average of less than 500 attacks over
   the next 10 years?

44. An economist uses the price of a gallon of milk as a measure of inflation. She finds that
the average price is $3.82 per gallon and the population standard deviation is $0.33.
You decide to sample 40 convenience stores, collect their prices for a gallon of milk,
and compute the mean price for the sample.
a. What is the standard error of the mean in this experiment?
b. What is the probability that the sample mean is between $3.78 and $3.86?
c. What is the probability that the difference between the sample mean and the
   population mean is less than $0.01?
d. What is the likelihood the sample mean is greater than $3.92?

45. Nike's annual report says that the average American buys 6.5 pairs of sports shoes per
year. Suppose a sample of 81 customers is surveyed and the population standard devi-
ation of sports shoes purchased per year is 2.1.
a. What is the standard error of the mean in this experiment?
b. What is the probability that the sample mean is between 6 and 7 pairs of sports shoes?
c. What is the probability that the difference between the sample mean and the
   population mean is less than 0.25 pair?
d. What is the likelihood the sample mean is greater than 7 pairs?

DATA ANALYTICS

46. Refer to the North Valley Real Estate data, which report information on the homes
sold last year. Assume the 105 homes are a population. Compute the population mean
and the standard deviation of price. Select a sample of 10 homes. Compute the mean.
Determine the likelihood of a sample mean price this high or higher.

47. Refer to the Baseball 2016 data, which report information on the 30 Major League
Baseball teams for the 2016 season. Over the last decade, the mean attendance per
team followed a normal distribution with a mean of 2.45 million per team and a standard
deviation of .71 million. Compute the mean attendance per team for the 2016 season.
Determine the likelihood of a sample mean attendance this large or larger from the
population.

48. Refer to the Lincolnville School District bus data. Information provided by manufac-
turers of school buses suggests the mean maintenance cost per year is $4,400 per bus
with a standard deviation of $1,000. Compute the mean maintenance cost for the Lin-
colnville buses. Does the Lincolnville data seem to be in line with that reported by the
manufacturer? Specifically, what is the probability of Lincolnville’s mean annual mainte-
nance cost, or greater, given the manufacturer’s data?

9 Estimation and Confidence Intervals

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO9-1 Compute and interpret a point estimate of a population mean.

LO9-2 Compute and interpret a confidence interval for a population mean.

LO9-3 Compute and interpret a confidence interval for a population proportion.

LO9-4 Calculate the required sample size to estimate a population proportion or population mean.

LO9-5 Adjust a confidence interval for finite populations.

THE AMERICAN RESTAURANT ASSOCIATION collected information on the number of
meals eaten outside the home per week by young married couples. A survey of 60 couples
showed the sample mean number of meals eaten outside the home was 2.76 meals per
week, with a standard deviation of 0.75 meal per week. Construct a 99% confidence
interval for the population mean. (See Exercise 36 and LO9-2.)

ESTIMATION AND CONFIDENCE INTERVALS 283

INTRODUCTION
The previous chapter began our discussion of sampling. We introduced both the rea-
sons for, and the methods of, sampling. The reasons for sampling were:

• Contacting the entire population is too time-consuming.
• Studying all the items in the population is often too expensive.
• The sample results are usually adequate.
• Certain tests are destructive.
• Checking all the items is physically impossible.

There are several methods of sampling. Simple random sampling is the most widely
used method. With this type of sampling, each member of the population has the same
chance of being selected to be a part of the sample. Other methods of sampling include
systematic sampling, stratified sampling, and cluster sampling.
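The four sampling methods named above can be sketched with Python's standard `random` module. This is a minimal illustration, not a prescribed procedure: the population of 100 numbered items, the strata and cluster names, and all function names are hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Random starting point among the first k items, then every kth item."""
    start = rng.randrange(k)
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Random sample drawn separately from each stratum."""
    return {name: rng.sample(members, n_per_stratum[name])
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly choose primary units (clusters), then sample within each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

rng = random.Random(42)                 # fixed seed so the run is repeatable
population = list(range(1, 101))        # hypothetical items numbered 1..100

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)

# Hypothetical strata: first 80 items "regular", last 20 "cut-rate".
strata = {"regular": population[:80], "cut_rate": population[80:]}
strat = stratified_sample(strata, {"regular": 4, "cut_rate": 1}, rng)

# Hypothetical clusters: four blocks of 25 items each.
clusters = {f"block_{i}": population[25 * i: 25 * (i + 1)] for i in range(4)}
clus = cluster_sample(clusters, n_clusters=2, n_per_cluster=5, rng=rng)

print(srs, sys_sample, strat, clus, sep="\n")
```

In practice the starting point and selections would come from a table of random numbers or a software generator; the seeded generator here only makes the sketch reproducible.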

Chapter 8 assumes information about the population, such as the mean, the stan-
dard deviation, or the shape of the population, is known. In most business situations,
such information is not available. In fact, one purpose of sampling is to estimate some of
these values. For example, you select a sample from a population and use the mean of
the sample to estimate the mean of the population.

This chapter considers several important aspects of sampling. We begin by study-
ing point estimates. A point estimate is a single value (point) computed from sample
information and used to estimate a population value. For example, we may be inter-
ested in the number of hours worked by consultants employed by Boston Consulting
Group. Using simple random sampling, we select 50 consultants and ask each of them
how many hours they worked last week. The sample’s mean is a point estimate of the
unknown population mean. A more informative approach is to present a range of values
where we expect the population parameter to occur. Such a range of values is called a
confidence interval.

Frequently in business we need to determine the size of a sample. How many vot-
ers should a polling organization contact to forecast the election outcome? How many
products do we need to examine to ensure our quality level? This chapter also devel-
ops a strategy for determining the appropriate number of observations in the sample.

POINT ESTIMATE FOR A POPULATION MEAN
A point estimate is a single statistic used to estimate a population parameter. Suppose
Best Buy Inc. wants to estimate the mean age of people who purchase LCD HDTV tele-
visions. They select a random sample of 75 recent purchases, determine the age of
each buyer, and compute the mean age of the buyers in the sample. The mean of this
sample is a point estimate of the population mean.

POINT ESTIMATE The statistic, computed from sample information, that estimates
a population parameter.

The following examples illustrate point estimates of population means.

1. Tourism is a major source of income for many Caribbean countries, such as Barbados.
Suppose the Bureau of Tourism for Barbados wants an estimate of the mean
amount spent by tourists visiting the country. It would not be feasible to contact
each tourist. Therefore, 500 tourists are randomly selected as they depart the
country and asked in detail about their spending while visiting Barbados. The mean
amount spent by the sample of 500 tourists is an estimate of the unknown popula-
tion parameter. That is, we let the sample mean serve as a point estimate of the
population mean.

LO9-1 Compute and interpret a point estimate of a population mean.

STATISTICS IN ACTION

On all new cars, a fuel economy estimate is prominently displayed on the window sticker
as required by the Environmental Protection Agency (EPA). Often, fuel economy is a factor
in a consumer’s choice of a new car because of fuel costs or environmental concerns. The
fuel estimates for a 2016 BMW 328i Sedan (4-cylinder, automatic) are 23 miles per gallon
(mpg) in the city and 35 on the highway. The EPA recognizes that actual fuel economy may
differ from the estimates by noting, “No test can simulate all possible combinations of
conditions and climate, driver behavior, and car care habits. Actual mileage depends on
how, when, and where the vehicle is driven. The EPA has found that the mpg obtained by
most drivers will be within a few mpg of the estimates.”

284 CHAPTER 9

2. Litchfield Home Builders Inc. builds homes in the southeastern region of the
United States. One of the major concerns of new buyers is the date when the
home will be completed. Recently, Litchfield has been telling customers, “Your
home will be completed 45 working days from the date we begin installing dry-
wall.” The customer relations department at Litchfield wishes to compare this
pledge with recent experience. A sample of 50 homes completed this year re-
vealed that the point estimate of the population mean is 46.7 working days from
the start of drywall to the completion of the home. Is it reasonable to conclude
that the population mean is still 45 days and that the difference between the sample
mean (46.7 days) and the proposed population mean (45 days) is
sampling error? In other words, is the sample mean significantly different
from the population mean?

3. Recent medical studies indicate that exercise is an important part of
a person’s overall health. The director of human resources at OCF, a
large glass manufacturer, wants an estimate of the number of hours
per week employees spend exercising. A sample of 70 employees
reveals the mean number of hours of exercise last week is 3.3. This
value is a point estimate of the unknown population mean.

The sample mean, x̄, is not the only point estimate of a population
parameter. For example, p, a sample proportion, is a point estimate
of π, the population proportion; and s, the sample standard
deviation, is a point estimate of σ, the population standard deviation.
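As a small illustration of these three point estimates, the sketch below computes x̄, s, and p from a sample. The ten values are hypothetical weekly exercise hours invented for the example, and the 3-hour threshold used for the proportion is likewise an assumption.

```python
import statistics

# Hypothetical sample: hours of exercise last week for 10 employees.
hours = [2.5, 4.0, 3.0, 0.0, 5.5, 3.5, 1.0, 6.0, 4.5, 3.0]

x_bar = statistics.mean(hours)    # point estimate of the population mean, mu
s = statistics.stdev(hours)       # sample standard deviation (n - 1 divisor), estimates sigma

# Sample proportion exercising at least 3 hours, a point estimate of pi.
p = sum(1 for h in hours if h >= 3.0) / len(hours)

print(f"x_bar = {x_bar:.2f}, s = {s:.2f}, p = {p:.2f}")
```

Note that `statistics.stdev` divides by n − 1, which is the sample standard deviation; `statistics.pstdev` would divide by n and is appropriate only when the data are the whole population.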

CONFIDENCE INTERVALS FOR A POPULATION MEAN
A point estimate, however, tells only part of the story. While we expect the point esti-
mate to be close to the population parameter, we would like to measure how close it
really is. A confidence interval serves this purpose. For example, we estimate the mean
yearly income for construction workers in the New York–New Jersey area is $85,000.
The range of this estimate might be from $81,000 to $89,000. We can describe how
confident we are that the population parameter is in the interval. We might say, for in-
stance, that we are 90% confident that the mean yearly income of construction workers
in the New York–New Jersey area is between $81,000 and $89,000.

CONFIDENCE INTERVAL A range of values constructed from sample data so that
the population parameter is likely to occur within that range at a specified probability.
The specified probability is called the level of confidence.

To compute a confidence interval for a population mean, we will consider two situations:

• We use sample data to estimate μ with x̄ and the population standard deviation (σ)
is known.

• We use sample data to estimate μ with x̄ and the population standard deviation is
unknown. In this case, we substitute the sample standard deviation (s) for the
population standard deviation (σ).

There are important distinctions in the assumptions between these two situations. We
first consider the case where σ is known.

Population Standard Deviation, Known σ
A confidence interval is computed using two statistics: the sample mean, x̄, and the
standard deviation. From previous chapters, you know that the standard deviation is an
important statistic because it measures the dispersion, or variation, of a population or
sampling distribution. In computing a confidence interval, the standard deviation is used
to compute the limits of the confidence interval.

LO9-2 Compute and interpret a confidence interval for a population mean.

To demonstrate the idea of a confidence interval, we start with one simplifying
assumption. That assumption is that we know the value of the population standard
deviation, σ. Typically, we know the population standard deviation in situations
where we have a long history of collected data. Examples are data from monitoring
processes that fill soda bottles or cereal boxes, and the results of the SAT Reason-
ing Test (for college admission). Knowing σ allows us to simplify the development of
a confidence interval because we can use the standard normal distribution from
Chapter 8.

Recall that the sampling distribution of the sample mean is the distribution of
all sample means, x̄, of sample size n from a population. The population standard
deviation, σ, is known. From this information, and the central limit theorem, we
know that the sampling distribution follows the normal probability distribution with a
mean of μ and a standard deviation σ∕√n. Also recall that this value is called the
standard error.

The results of the central limit theorem allow us to make the following general con-
fidence interval statements using z-statistics:

1. Ninety-five percent of all confidence intervals computed from random samples se-
lected from a population will contain the population mean. These intervals are com-
puted using a z-statistic equal to 1.96.

2. Ninety percent of all confidence intervals computed from random samples selected
from a population will contain the population mean. These confidence intervals are
computed using a z-statistic equal to 1.65.

These confidence interval statements provide examples of levels of confidence and
are called a 95% confidence interval and a 90% confidence interval. The 95% and
90% are the levels of confidence and refer to the percentage of similarly constructed
intervals that would include the parameter being estimated—in this case, μ, the popu-
lation mean.
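The meaning of "percentage of similarly constructed intervals" can be checked with a short simulation. The sketch below assumes each interval is formed as x̄ ± z·σ/√n, that is, the sample mean plus or minus z standard errors; the population values (μ = 50, σ = 10), the sample size, and the number of trials are hypothetical.

```python
import math
import random

def coverage_rate(mu, sigma, n, z, trials, seed=0):
    """Fraction of simulated intervals x_bar +/- z*sigma/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)      # z times the standard error
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n from a normal population.
        sample_mean = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if sample_mean - margin <= mu <= sample_mean + margin:
            hits += 1
    return hits / trials

# Hypothetical population: mean 50, standard deviation 10; samples of 30.
rate_95 = coverage_rate(mu=50.0, sigma=10.0, n=30, z=1.96, trials=5000)
rate_90 = coverage_rate(mu=50.0, sigma=10.0, n=30, z=1.65, trials=5000)

print(f"z = 1.96: {rate_95:.3f} of intervals contain mu")
print(f"z = 1.65: {rate_90:.3f} of intervals contain mu")
```

With many trials the two fractions should settle near .95 and .90, though any single run will differ slightly because the samples are random.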

How are the values of 1.96 and 1.65 obtained? First, let’s look for the z value for
a 95% confidence interval. The following diagram and Table 9–1 will help explain.
Table 9–1 is a reproduction of the standard normal table in Appendix B. However,
many rows and columns have been eliminated to allow us to better focus on particular
rows and columns.

1. First, we divide the confidence level in half, so .9500/2 = .4750.
2. Next, we find the value .4750 in the body of Table 9–1. Note that .4750 is located
   in the table at the intersection of a row and a column.
3. Locate the corresponding row value in the left margin, which is 1.9, and the column
   value in the top margin, which is .06. Adding the row and column values gives us a
   z value of 1.96.
4. Thus, the probability of finding a z value between 0 and 1.96 is .4750.
5. Likewise, because the normal distribution is symmetric, the probability of finding a
   z value between −1.96 and 0 is also .4750.
6. When we add these two probabilities, the probability that a z value is between
   −1.96 and 1.96 is .9500.

For the 90% level of confidence, we follow the same steps. First, one-half of the
desired confidence level is .4500. A search of Table 9–1 does not reveal this exact
value. However, it is between two values, .4495 and .4505. As in step three, we locate
each value in the table. The first, .4495, corresponds to a z value of 1.64 and the sec-
ond, .4505, corresponds to a z value of 1.65. To be conservative, we will select the
larger of the two z values, 1.65, and the exact level of confidence is 90.1%, or 2(0.4505).
Next, the probability of finding a z value between −1.65 and 0 is .4505, and the proba-
bility that a z value is between −1.65 and 1.65 is .9010.
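The table lookup described above can also be reproduced numerically. The following is a minimal Python sketch, assuming Python 3.8 or later so that `statistics.NormalDist` from the standard library is available. The function names `z_for_confidence` and `central_area` are hypothetical helpers introduced here for illustration; they do not come from the text.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Return the z value that leaves `level` of the area in the middle
    of the standard normal distribution (for example, level=0.95)."""
    # Half of the confidence level lies between 0 and z, so the
    # cumulative area to the left of z is 0.5 + level / 2.
    return NormalDist().inv_cdf(0.5 + level / 2)

def central_area(z):
    """Return the area under the standard normal curve between -z and z."""
    std_normal = NormalDist()
    return std_normal.cdf(z) - std_normal.cdf(-z)

if __name__ == "__main__":
    for level in (0.95, 0.90):
        print(level, round(z_for_confidence(level), 3))
    # Area covered when the table value 1.65 is used for the 90% level.
    print(round(central_area(1.65), 3))
```

For the 90% level, the computed z is about 1.645, which falls between the table entries 1.64 and 1.65; the text rounds up to 1.65, which covers slightly more than 90% of the area.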

286 CHAPTER 9

How do we determine a 95% confidence interval? The width of the interval is deter-
mined by two factors: (1) the level of confidence, as described in the previous section, and
(2) the size of the standard error of the mean. To find the standard error of the mean, recall
from the previous chapter [see formula (8–1) on page 271] that the standard error of the
mean reports the variation in the distribution of sample means. It is really the standard
deviation of the distribution of sample means. The formula is repeated below:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

where:

$\sigma_{\bar{x}}$ is the symbol for the standard error of the mean. We use a Greek letter because
it is a population value, and the subscript x̄ reminds us that it refers to a sampling
distribution of the sample means.
σ is the population standard deviation.
n is the number of observations in the sample.

The size of the standard error is affected by two values. The first is the standard
deviation of the population. The larger the population standard deviation, σ, the larger
σ∕√n. If the population is homogeneous, resulting in a small population standard devi-
ation, the standard error will also be small. However, the standard error is also affected
by the number of observations in the sample. A large number of observations in the
sample will result in a small standard error of the mean, indicating that there is less
variability in the sample means.
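The two effects described above can be seen directly by evaluating σ∕√n for a few values. This is a small illustrative sketch; the particular values of σ and n are hypothetical and chosen only to make the pattern easy to read, and `standard_error` is a helper name introduced here.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

if __name__ == "__main__":
    # A larger population standard deviation gives a larger standard error.
    for sigma in (2.0, 4.0, 8.0):
        print("sigma =", sigma, "n = 64:", standard_error(sigma, 64))
    # A larger sample gives a smaller standard error.
    for n in (16, 64, 256):
        print("sigma = 4.0, n =", n, ":", standard_error(4.0, n))
```

Because n enters through its square root, the sample size must be quadrupled to cut the standard error in half.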

We can summarize the calculation for a 95% confidence interval using the following
formula:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}}$$

TABLE 9–1 The Standard Normal Table for Selected Values

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07

⫶ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884

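[Diagram: standard normal curve on the scale of z, with an area of .4750 between −1.96 and 0, an area of .4750 between 0 and 1.96, and an area of .025 in each tail.]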

ESTIMATION AND CONFIDENCE INTERVALS 287

Similarly, a 90.1% confidence interval is computed as follows:

$$\bar{x} \pm 1.65\frac{\sigma}{\sqrt{n}}$$

The values 1.96 and 1.65 are z values corresponding to the 95% and the 90.1% con-
fidence intervals, respectively. However, we are not restricted to these values. We can
select any confidence level between 0 and 100% and find the corresponding value for z.
In general, a confidence interval for the population mean when the population follows the
normal distribution and the population standard deviation is known is computed by:

CONFIDENCE INTERVAL FOR A POPULATION MEAN WITH σ KNOWN

$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}} \qquad (9\text{–}1)$$

To explain these ideas, consider the following example. Del Monte
Foods distributes diced peaches in 4.5-ounce plastic cups. To ensure
that each cup contains at least the required amount, Del Monte sets the
filling operation to dispense 4.51 ounces of peaches and gel in each
cup. Of course, not every cup will contain exactly 4.51 ounces of peaches
and gel. Some cups will have more and others less. From historical data,
Del Monte knows that 0.04 ounce is the standard deviation of the filling
process and that the amount, in ounces, follows the normal probability
distribution. The quality control technician selects a sample of 64 cups at
the start of each shift, measures the amount in each cup, computes the
mean fill amount, and then develops a 95% confidence interval for
the population mean. Using the confidence interval, is the process filling
the cups to the desired amount? This morning’s sample of 64 cups had a sample mean
of 4.507 ounces. Based on this information, the 95% confidence interval is:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 4.507 \pm 1.96\frac{0.04}{\sqrt{64}} = 4.507 \pm 0.0098$$

The 95% confidence interval estimates that the population mean is between
4.4972 ounces and 4.5168 ounces of peaches and gel. Recall that the process is set to
fill each cup with 4.51 ounces. Because the desired fill amount of 4.51 ounces is in this
interval, we conclude that the filling process is achieving the desired results. In other
words, it is reasonable to conclude that the sample mean of 4.507 could have come
from a population distribution with a mean of 4.51 ounces.
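The Del Monte calculation can be checked with a few lines of Python. This is a sketch under the assumptions of formula (9–1), namely a known σ and a normally distributed fill amount. The function name `z_interval` is hypothetical, and the z value is computed with `statistics.NormalDist` (Python 3.8 or later) rather than read from the table.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, level=0.95):
    """Confidence interval for a population mean when sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

if __name__ == "__main__":
    # Values from the Del Monte example in the text.
    lower, upper = z_interval(sample_mean=4.507, sigma=0.04, n=64)
    target = 4.51
    print(round(lower, 4), round(upper, 4))
    print("target inside interval:", lower <= target <= upper)
```

The check on `target` mirrors the reasoning in the text: the process is judged to be on target when 4.51 ounces falls between the two limits.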

In this example, we observe that the population mean of 4.51 ounces is in the con-
fidence interval. But this is not always the case. If we selected 100 samples of 64 cups
from the population, calculated the sample mean, and developed a confidence interval
based on each sample, we would expect to find the population mean in about 95 of the
100 intervals. Or, in contrast, about five of the intervals would not contain the population
mean. From Chapter 8, this is called sampling error. The following example details re-
peated sampling from a population.


E X A M P L E

The American Management Association (AMA) is studying the income of store man-
agers in the retail industry. A random sample of 49 managers reveals a sample
mean of $45,420. The standard deviation of this population is $2,050. The associ-
ation would like answers to the following questions:

1. What is the population mean?
2. What is a reasonable range of values for the population mean?
3. How do we interpret these results?

[Diagram: 95% confidence intervals built from six samples of size 49, drawn on the scale of x̄ around the population mean μ, with μ − 1.96σ∕√n and μ + 1.96σ∕√n marked. The intervals from samples 1, 2, 3, 4, and 6 include the population mean; the interval from sample 5 does not.]

S O L U T I O N

Generally, distributions of salary and income are positively skewed because a few
individuals earn considerably more than others, thus skewing the distribution in the
positive direction. Fortunately, the central limit theorem states that the sampling
distribution of the mean approaches a normal distribution as sample size increases. In
this instance, a sample of 49 store managers is large enough that we can assume
that the sampling distribution will follow the normal distribution. Now to answer the
questions posed in the example.

1. What is the population mean? In this case, we do not know. We do know the
sample mean is $45,420. Hence, our best estimate of the unknown population
value is the corresponding sample statistic. Thus, the sample mean of $45,420
is a point estimate of the unknown population mean.

2. What is a reasonable range of values for the population mean? The AMA
decides to use the 95% level of confidence. To determine the corresponding
confidence interval, we use formula (9–1):

$$\bar{x} \pm z\frac{\sigma}{\sqrt{n}} = \$45{,}420 \pm 1.96\frac{\$2{,}050}{\sqrt{49}} = \$45{,}420 \pm \$574$$

The confidence interval limits are $44,846 and $45,994 determined by sub-
tracting $574 and adding $574 to the sample mean. The degree or level of
confidence is 95% and the confidence interval is from $44,846 to $45,994.
The value, $574, is called the margin of error.

3. How do we interpret these results? Suppose we select many samples of
49 store managers, perhaps several hundred. For each sample, we compute the
mean and then construct a 95% confidence interval, such as we did in the previ-
ous section. We could expect about 95% of these confidence intervals to contain
the population mean. About 5% of the intervals would not contain the population
mean annual income, which is μ. However, a particular confidence interval either
contains the population parameter or it does not. The following diagram shows
the results of selecting samples from the population of store managers in the
retail industry, computing the mean of each, and then, using formula (9–1), deter-
mining a 95% confidence interval for the population mean. Note that not all inter-
vals include the population mean. Both the endpoints of the fifth sample are less
than the population mean. We attribute this to sampling error, and it is the risk we
assume when we select the level of confidence.
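The margin of error in the AMA example depends on the level of confidence chosen. The sketch below recomputes the margin for the 95% level used in the solution and, for comparison, for the 90% and 99% levels. The helper name `margin_of_error` is hypothetical, and the z values are computed with `statistics.NormalDist` instead of being read from a table, so results can differ slightly from hand calculations that use rounded z values.

```python
import math
from statistics import NormalDist

def margin_of_error(sigma, n, level):
    """Half-width of the z-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * sigma / math.sqrt(n)

if __name__ == "__main__":
    sample_mean, sigma, n = 45420, 2050, 49   # AMA example values
    for level in (0.90, 0.95, 0.99):
        m = margin_of_error(sigma, n, level)
        print(f"{level:.0%}: {sample_mean - m:,.0f} to {sample_mean + m:,.0f} "
              f"(margin {m:,.0f})")
```

A higher level of confidence requires a larger z value and therefore a wider interval; that is the price of being more certain that the interval contains μ.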


A Computer Simulation
With statistical software, we can create random samples of a desired sample size, n,
from a population. For each sample of n observations with corresponding numerical
values, we can calculate the sample mean. With the sample mean, population standard
deviation, and confidence level, we can determine the confidence interval for each sam-
ple. Then, using all samples and the confidence intervals, we can find the frequency that
the population mean is included in the confidence intervals. The following example
does just that.

E X A M P L E

From many years in the automobile leasing business, Town Bank knows that the
mean distance driven on an automobile with a four-year lease is 50,000 miles and
the standard deviation is 5,000 miles. These are population values. Suppose Town
Bank would like to experiment with the idea of sampling to estimate the population
mean of 50,000 miles. Town Bank decides to choose a sample size of 30 observa-
tions and a 95% confidence interval to estimate the population mean. Based on the
experiment, we want to count the number of confidence intervals that include the
population mean of 50,000. We expect about 95%, or 57 of the 60 intervals, will
include the population mean. To make the calculations easier to understand, we’ll
conduct the study in thousands of miles, instead of miles.

S O L U T I O N

Using statistical software, 60 random samples of 30 observations, n = 30, are gen-
erated and the sample means for each sample computed. Then, using the n of 30
and a standard error of 0.913 (σ∕√n = 5∕√30), a 95% confidence interval is com-
puted for each sample. The results of the experiment are shown next.

Sample Observations Sample 95% Confidence Limits

Sample 1 2 3 4 5 – – – 26 27 28 29 30 Mean Lower Limit Upper Limit

1 56 47 47 48 58 – – – 55 62 48 61 57 51.6 49.811 53.389
2 55 51 52 40 53 – – – 47 54 55 55 45 50.77 48.981 52.559
3 42 46 48 46 41 – – – 50 52 50 47 45 48.63 46.841 50.419
4 52 49 55 47 49 – – – 46 56 49 43 50 49.9 48.111 51.689
5 48 50 53 48 45 – – – 46 51 61 49 47 49.03 47.241 50.819
6 49 44 47 46 48 – – – 51 44 51 52 43 47.73 45.941 49.519
7 50 53 39 50 46 – – – 55 47 43 50 57 50.2 48.411 51.989
8 47 51 49 58 44 – – – 49 57 54 48 48 51.17 49.381 52.959
9 51 44 47 56 45 – – – 45 51 49 49 52 50.33 48.541 52.119
10 45 44 52 52 56 – – – 52 51 52 50 48 50 48.211 51.789
11 43 52 54 46 54 – – – 43 46 49 52 52 51.2 49.411 52.989
12 57 53 48 42 55 – – – 49 44 46 46 48 49.8 48.011 51.589
13 53 39 47 51 53 – – – 42 44 44 55 58 49.6 47.811 51.389
14 56 55 45 43 57 – – – 48 51 52 55 47 49.03 47.241 50.819
15 49 50 39 45 44 – – – 49 43 44 51 51 49.37 47.581 51.159
16 46 44 55 53 55 – – – 44 53 53 43 44 50.13 48.341 51.919
17 64 52 55 55 43 – – – 58 46 52 58 55 52.47 50.681 54.259
18 57 51 60 40 53 – – – 50 51 53 46 52 50.1 48.311 51.889
19 50 49 51 57 45 – – – 53 52 40 45 52 49.6 47.811 51.389
20 45 46 53 57 49 – – – 49 43 43 53 48 49.47 47.681 51.259
21 52 45 51 52 45 – – – 43 49 49 58 53 50.43 48.641 52.219
22 48 48 52 49 40 – – – 50 47 54 51 45 47.53 45.741 49.319
23 48 50 50 53 44 – – – 48 57 52 44 39 49.1 47.311 50.889
24 51 51 40 54 52 – – – 54 45 50 57 48 50.13 48.341 51.919
25 48 63 41 52 41 – – – 48 50 48 44 53 49.33 47.541 51.119
26 47 45 48 59 49 – – – 44 47 49 55 42 49.63 47.841 51.419
27 52 45 60 51 52 – – – 52 50 54 46 52 49.4 47.611 51.189
28 46 48 46 57 51 – – – 51 50 51 41 52 49.33 47.541 51.119
29 46 48 45 42 48 – – – 49 43 59 46 50 48.27 46.481 50.059
30 55 48 47 48 48 – – – 47 59 54 51 42 50.53 48.741 52.319
31 58 49 56 46 46 – – – 44 51 47 51 46 50.77 48.981 52.559
32 53 54 52 58 55 – – – 53 52 45 44 51 50 48.211 51.789
33 50 57 56 51 51 – – – 58 47 50 56 46 49.7 47.911 51.489
34 61 48 49 53 54 – – – 46 46 56 45 54 50.03 48.241 51.819
35 43 42 43 46 49 – – – 49 49 56 51 45 49.43 47.641 51.219
36 39 48 48 51 44 – – – 54 52 47 50 52 50.07 48.281 51.859
37 48 43 57 42 54 – – – 52 50 59 50 52 50.17 48.381 51.959
38 55 43 49 57 45 – – – 41 51 51 52 52 49.5 47.711 51.289
39 47 49 58 54 54 – – – 50 56 51 56 58 50.37 48.581 52.159
40 47 56 41 50 54 – – – 46 56 61 61 45 51.6 49.811 53.389
41 48 47 42 47 62 – – – 44 47 49 55 43 49.43 47.641 51.219
42 46 49 43 36 52 – – – 45 51 46 51 43 47.67 45.881 49.459
43 44 48 49 48 51 – – – 47 52 51 48 49 49.63 47.841 51.419
44 45 52 54 54 49 – – – 49 45 53 50 52 49.07 47.281 50.859
45 54 46 54 45 48 – – – 55 38 56 50 62 49.53 47.741 51.319
46 48 50 49 52 51 – – – 53 57 58 46 50 49.9 48.111 51.689
47 54 55 46 55 50 – – – 56 54 50 55 51 50.5 48.711 52.289
48 45 47 47 63 44 – – – 45 53 42 53 50 50.1 48.311 51.889
49 47 47 48 54 56 – – – 50 48 54 49 51 49.93 48.141 51.719
50 45 61 51 45 54 – – – 55 52 47 45 53 51.03 49.241 52.819
51 49 62 43 49 48 – – – 49 58 42 58 52 51.07 49.281 52.859
52 54 52 62 43 54 – – – 51 57 49 58 55 50.17 48.381 51.959
53 46 50 59 56 46 – – – 50 51 52 54 53 50.47 48.681 52.259
54 52 50 48 48 58 – – – 58 52 43 61 54 51.77 49.981 53.559
55 45 44 46 56 46 – – – 43 45 63 48 56 49.37 47.581 51.159
56 60 50 56 51 43 – – – 45 43 49 59 54 50.37 48.581 52.159
57 59 56 43 47 52 – – – 49 54 50 50 57 49.53 47.741 51.319
58 52 55 48 51 40 – – – 53 51 51 52 47 49.77 47.981 51.559
59 53 50 44 53 52 – – – 47 50 55 46 51 50.07 48.281 51.859
60 55 54 50 52 43 – – – 57 50 48 47 53 52.07 50.281 53.859

To explain, in the first row, the statistical software computed 30 random obser-
vations from a population distribution with a mean of 50 and a standard deviation of
5. To conserve space, only observations 1 through 5 and 26 through 30 are listed.
The first sample’s mean is computed and listed as 51.6. In the next columns, the
upper and lower limits of the 95% confidence interval for the first sample are shown.
The confidence interval calculation for the first sample follows:

$$\bar{x} \pm 1.96\frac{\sigma}{\sqrt{n}} = 51.6 \pm 1.96\frac{5}{\sqrt{30}} = 51.6 \pm 1.789$$

This calculation is repeated for all samples. The results of the experiment show
that 93.33%, or 56 of the 60 confidence intervals, include the population mean of
50. This 93.33% is close to the estimate that 95%, or 57, of the intervals will include the
population mean. Using the complement, we expected 5%, or three, of the intervals
would not include the population mean. The experiment resulted in 6.67%, or four,
of the 60 intervals that did not include the population mean. The particular intervals,
6, 17, 22, and 42, are highlighted in yellow. This is another example of sampling
error, or the possibility that a particular random sample may not be a good
representation of the population. In each of these four samples, the mean of the sample
is either much less or much more than the population mean. Because of random
sampling, the mean of the sample is not a good estimate of the population mean,
and the confidence interval based on the sample’s mean does not include the pop-
ulation mean.
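An experiment of this kind is easy to repeat. The following is a minimal sketch, not the software used in the text: it assumes the population of lease mileages is normal with a mean of 50 and a standard deviation of 5 (in thousands of miles), it does not round the observations to whole numbers as the table above does, and the function name `coverage_count` and the seed value are hypothetical choices made for illustration.

```python
import math
import random

def coverage_count(num_samples, n, mu, sigma, z=1.96, seed=None):
    """Count how many z-based intervals contain the population mean mu."""
    rng = random.Random(seed)
    margin = z * sigma / math.sqrt(n)
    covered = 0
    for _ in range(num_samples):
        # Assumption: observations are drawn from a normal population.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        if sample_mean - margin <= mu <= sample_mean + margin:
            covered += 1
    return covered

if __name__ == "__main__":
    # Town Bank setup, in thousands of miles: 60 samples of 30 observations.
    hits = coverage_count(num_samples=60, n=30, mu=50, sigma=5, seed=1)
    print(hits, "of 60 intervals include the population mean")
```

With only 60 samples, the count will vary from run to run around the expected 57. Raising `num_samples` to several thousand should bring the proportion of intervals that include the mean much closer to 95%.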

S E L F - R E V I E W 9–1

The Bun-and-Run is a franchise fast-food restaurant located in the Northeast specializing in
half-pound hamburgers, fish sandwiches, and chicken sandwiches. Soft drinks and French
fries also are available. The Marketing Department of Bun-and-Run Inc. reports that the
distribution of daily sales for their restaurants follows the normal distribution and that the
population standard deviation is $3,000. A sample of 40 franchises showed the mean daily
sales to be $20,000.
(a) What is the population mean of daily sales for Bun-and-Run franchises?
(b) What is the best estimate of the population mean? What is this value called?
(c) Develop a 95% confidence interval for the population mean of daily sales.
(d) Interpret the confidence interval.

E X E R C I S E S

1. A sample of 49 observations is taken from a normal population with a standard
deviation of 10. The sample mean is 55. Determine the 99% confidence interval for
the population mean.

2. A sample of 81 observations is taken from a normal population with a standard
deviation of 5. The sample mean is 40. Determine the 95% confidence interval for
the population mean.

3. A sample of 250 observations is selected from a normal population with a
population standard deviation of 25. The sample mean is 20.
a. Determine the standard error of the mean.
b. Explain why we can use formula (9–1) to determine the 95% confidence interval.
c. Determine the 95% confidence interval for the population mean.

4. Suppose you know σ and you want an 85% confidence level. What value would you
use as z in formula (9–1)?

5. A research firm conducted a survey to determine the mean amount Americans
spend on coffee during a week. They found the distribution of weekly spending
followed the normal distribution with a population standard deviation of $5. A
sample of 49 Americans revealed that x̄ = $20.
a. What is the point estimate of the population mean? Explain what it indicates.
b. Using the 95% level of confidence, determine the confidence interval for μ.
Explain what it indicates.

6. Refer to the previous exercise. Instead of 49, suppose that 64 Americans were
surveyed about their weekly expenditures on coffee. Assume the sample mean
remained the same.
a. What is the 95% confidence interval estimate of μ?
b. Explain why this confidence interval is narrower than the one determined in the
previous exercise.

7. Bob Nale is the owner of Nale’s Quick Fill. Bob would like to estimate the mean
number of gallons of gasoline sold to his customers. Assume the number of gallons
sold follows the normal distribution with a population standard deviation of 2.30
gallons. From his records, he selects a random sample of 60 sales and finds the
mean number of gallons sold is 8.60.
a. What is the point estimate of the population mean?
b. Develop a 99% confidence interval for the population mean.
c. Interpret the meaning of part (b).

Population Standard Deviation, σ Unknown
In the previous section, we assumed the population standard deviation was known. In the
case involving Del Monte 4.5-ounce cups of peaches, there would likely be a long history
of measurements in the filling process. Therefore, it is reasonable to assume the standard
deviation of the population is available. However, in most sampling situations the popula-
tion standard deviation (σ) is not known. Here are some examples where we wish to esti-
mate the population means and it is unlikely we would know the population standard
deviations. Suppose each of these studies involves students at West Virginia University.

• The Dean of the Business College wants to estimate the mean number of hours full-
time students work at paying jobs each week. He selects a sample of 30 students,
contacts each student, and asks them how many hours they worked last week. From
the sample information, he can calculate the sample mean, but it is not likely he would
know or be able to find the population standard deviation (σ) required in formula (9–1).

• The Dean of Students wants to estimate the distance the typical commuter student
travels to class. She selects a sample of 40 commuter students, contacts each, and
determines the one-way distance from each student’s home to the center of cam-
pus. From the sample data, she calculates the mean travel distance, that is, x. It is
unlikely the standard deviation of the population would be known or available,
again making formula (9–1) unusable.

• The Director of Student Loans wants to estimate the mean amount students owe on
student loans at the time of graduation. The director selects a sample of 20
graduating students and contacts each to find the information. From the sample
information, the director can estimate the mean amount. However, to develop a
confidence interval using formula (9–1), the population standard deviation is neces-
sary. It is not likely this information is available.

Fortunately we can use the sample standard deviation to estimate the population
standard deviation. That is, we use s, the sample standard deviation, to estimate σ, the
population standard deviation. But in doing so, we cannot use formula (9–1). Because
we do not know σ, we cannot use the z distribution. However, there is a remedy. We use
the sample standard deviation and replace the z distribution with the t distribution.

The t distribution is a continuous probability distribution, with many similar charac-
teristics to the z distribution. William Gosset, an English brewmaster, was the first to
study the t distribution. He was particularly concerned with the exact behavior of the
distribution of the following statistic:

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

where s is an estimate of σ. He noticed that using s to estimate σ introduced discrepancies,
especially when s was calculated from a very small sample. The t distribution and the
standard normal distribution are shown graphically in Chart 9–1. Note particularly that
the t distribution is flatter, more spread out, than the standard normal distribution. This is
because the standard deviation of the t distribution is larger than that of the standard
normal distribution.
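The extra spread of the t statistic can be seen in a small simulation. The sketch below assumes a standard normal population (μ = 0, σ = 1) and estimates how often the statistic (x̄ − μ)∕(s∕√n) falls outside ±1.96, the limits that would capture 95% of the values if the statistic followed the z distribution. The function name `fraction_beyond`, the number of trials, and the seed are hypothetical choices made for illustration.

```python
import math
import random

def fraction_beyond(cutoff, n, trials=20000, seed=0):
    """Estimate how often |(x̄ − μ) / (s/√n)| exceeds `cutoff`
    for samples of size n drawn from a standard normal population."""
    rng = random.Random(seed)
    beyond = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        # The sample standard deviation s uses n - 1 in the denominator.
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        t = (mean - 0.0) / (s / math.sqrt(n))
        if abs(t) > cutoff:
            beyond += 1
    return beyond / trials

if __name__ == "__main__":
    for n in (5, 30):
        print("n =", n, "fraction beyond 1.96:", fraction_beyond(1.96, n))
```

For a small sample such as n = 5, the fraction should come out well above 5%, which is why a t value larger than 1.96 is needed for a 95% interval. For n = 30, the fraction should be much closer to 5%.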

8. Dr. Patton is a professor of English. Recently she counted the number of misspelled
words in a group of student essays. She noted the distribution of misspelled words
per essay followed the normal distribution with a population standard deviation of
2.44 words per essay. For her 10 a.m. section of 40 students, the mean number of
misspelled words was 6.05. Construct a 95% confidence interval for the mean number
of misspelled words in the population of student essays.

STATISTICS IN ACTION

The t distribution was created by William Gosset, who was born in England in 1876 and
died there in 1937. He worked for many years at Arthur Guinness, Sons and Company. In
fact, in his later years he was in charge of the Guinness Brewery in London. Guinness
preferred its employees to use pen names when publishing papers, so in 1908, when
Gosset wrote “The Probable Error of a Mean,” he used the name “Student.” In this paper,
he first described the properties of the t distribution and used it to monitor the brewing
process so that the beer met Guinness’ quality standards.

The following characteristics of the t distribution are based on the assumption that
the population of interest is normal, or nearly normal.

• It is, like the z distribution, a continuous distribution.
• It is, like the z distribution, bell-shaped and symmetrical.
• There is not one t distribution, but rather a family of t distributions. All t distributions
have a mean of 0, but their standard deviations differ according to the sample size,
n. There is a t distribution for a sample size of 20, another for a sample size of 22,
and so on. The standard deviation for a t distribution with 5 observations is larger
than for a t distribution with 20 observations.
• The t distribution is more spread out and flatter at the center than the standard
normal distribution (see Chart 9–1). As the sample size increases, however, the t
distribution approaches the standard normal distribution because the errors in using s to
estimate σ decrease with larger samples.

[Chart: the z distribution and the t distribution plotted together, both centered at 0; the t curve is flatter and more spread out.]

CHART 9–1 The Standard Normal Distribution and Student’s t Distribution

Because Student’s t distribution has a greater spread than the z distribution, the
value of t for a given level of confidence is larger in magnitude than the corresponding
z value. Chart 9–2 shows the values of z for a 95% level of confidence and of t for the
same level of confidence when the sample size is n = 5. How we obtained the actual
value of t will be explained shortly. For now, observe that for the same level of
confidence the t distribution is flatter or more spread out than the standard normal
distribution. Note that the 95% confidence interval using a t statistic will be wider compared to
an interval using a z statistic.

[Chart: two panels. Distribution of z: an area of .95 between −1.96 and 1.96 on the scale of z, with .025 in each tail. Distribution of t for n = 5: an area of .95 between −2.776 and 2.776 on the scale of t, with .025 in each tail.]

CHART 9–2 Values of z and t for the 95% Level of Confidence

To develop a confidence interval for the population mean using the t distribution,
we adjust formula (9–1) as follows.

CONFIDENCE INTERVAL FOR THE POPULATION MEAN, σ UNKNOWN

$$\bar{x} \pm t\frac{s}{\sqrt{n}} \qquad (9\text{–}2)$$

To determine a confidence interval for the population mean with an unknown pop-
ulation standard deviation, we:

1. Assume the sampled population is either normal or approximately normal. This as-
sumption may be questionable for small sample sizes, and becomes more valid
with larger sample sizes.

2. Estimate the population standard deviation (σ) with the sample standard devia-
tion (s).

3. Use the t distribution rather than the z distribution.

We should be clear at this point. We base the decision on whether to use the t or
the z on whether or not we know σ, the population standard deviation. If we know
the population standard deviation, then we use z. If we do not know the population
standard deviation, then we must use t. Chart 9–3 summarizes the decision-making
process.

[Flowchart: Assume the population is normal. Is the population standard deviation known? If no, use the t distribution. If yes, use the z distribution.]

CHART 9–3 Determining When to Use the z Distribution or the t Distribution

The following example will illustrate a confidence interval for a population mean
when the population standard deviation is unknown and how to find the appropriate
value of t in a table.

E X A M P L E

A tire manufacturer wishes to investigate the tread life of its tires. A sample of 10
tires driven 50,000 miles revealed a sample mean of 0.32 inch of tread remaining
with a standard deviation of 0.09 inch. Construct a 95% confidence interval
for the population mean. Would it be reasonable for the manufacturer to conclude
that after 50,000 miles the population mean amount of tread remaining is
0.30 inch?

S O L U T I O N

To begin, we assume the population distribution is normal. In this case, we don’t
have a lot of evidence, but the assumption is probably reasonable. We know the
sample standard deviation is .09 inch. We use formula (9–2):

$$\bar{x} \pm t\frac{s}{\sqrt{n}}$$

From the information given, x̄ = 0.32, s = 0.09, and n = 10. To find the value of t,
we use Appendix B.5, a portion of which is reproduced in Table 9–2. Appendix B.5
is also reproduced on the inside back cover of the text. The first step for locating t
is to move across the columns identified for “Confidence Intervals” to the level of
confidence requested. In this case, we want the 95% level of confidence, so we
move to the column headed “95%.” The column on the left margin is identified as
“df.” This refers to the number of degrees of freedom. The number of degrees of
freedom is the number of observations in the sample minus the number of samples,
written n − 1. In this case, it is 10 − 1 = 9. Why did we decide there were 9 degrees
of freedom? When sample statistics are being used, it is necessary to determine the
number of values that are free to vary.

To illustrate the meaning of degrees of freedom: Assume that the mean of
four numbers is known to be 5. The four numbers are 7, 4, 1, and 8. The deviations
of these numbers from the mean must total 0. The deviations of +2, −1, −4, and
+3 do total 0. If the deviations of +2, −1, and −4 are known, then the value of +3
is fixed (restricted) in order to satisfy the condition that the sum of the deviations
must equal 0. Thus, 1 degree of freedom is lost in a sampling problem involving
the standard deviation of the sample because one number (the arithmetic mean)
is known. For a 95% level of confidence and 9 degrees of freedom, we select the
row with 9 degrees of freedom. The value of t is 2.262.

TABLE 9–2 A Portion of the t Distribution

        Confidence Intervals
        80%     90%     95%     98%     99%
        Level of Significance for One-Tailed Test
df      0.10    0.05    0.025   0.010   0.005
        Level of Significance for Two-Tailed Test
        0.20    0.10    0.05    0.02    0.01
1       3.078   6.314   12.706  31.821  63.657
2       1.886   2.920   4.303   6.965   9.925
3       1.638   2.353   3.182   4.541   5.841
4       1.533   2.132   2.776   3.747   4.604
5       1.476   2.015   2.571   3.365   4.032
6       1.440   1.943   2.447   3.143   3.707
7       1.415   1.895   2.365   2.998   3.499
8       1.397   1.860   2.306   2.896   3.355
9       1.383   1.833   2.262   2.821   3.250
10      1.372   1.812   2.228   2.764   3.169

To determine the confidence interval, we substitute the values in formula
(9–2).

$$\bar{x} \pm t\frac{s}{\sqrt{n}} = 0.32 \pm 2.262\frac{0.09}{\sqrt{10}} = 0.32 \pm 0.064$$

The endpoints of the confidence interval are 0.256 and 0.384. How do
we interpret this result? If we repeated this study 200 times, calculating the
95% confidence interval with each sample’s mean and the standard deviation,
we expect 190 of the intervals would include the population mean. Ten of the
intervals would not include the population mean. This is the effect of sampling
error. A further interpretation is to conclude that the population mean is in
this interval. The manufacturer can be reasonably sure (95% confident)
that the mean remaining tread depth is between 0.256 and 0.384 inch.
Because the value of 0.30 is in this interval, it is possible that the mean of the
population is 0.30.

Here is another example to clarify the use of confidence intervals. Suppose an
article in your local newspaper reported that the mean time to sell a residential
property in the area is 60 days. You select a random sample of 20 homes sold in the last
year and find the mean selling time is 65 days. Based on the sample data, you
develop a 95% confidence interval for the population mean. You find that the endpoints
of the confidence interval are 62 days and 68 days. How do you interpret this result?
You can be reasonably confident the population mean is within this range. The value
proposed for the population mean, that is, 60 days, is not included in the interval. It
is not likely that the population mean is 60 days. The evidence indicates the
statement by the local newspaper may not be correct. To put it another way, it seems
unreasonable to obtain the sample you did from a population that had a mean selling
time of 60 days.

The following example will show additional details for determining and interpreting
a confidence interval. We used Minitab to perform the calculations.
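For the tire tread example (n = 10, x̄ = 0.32, s = 0.09), the table lookup and formula (9–2) can be sketched in Python as follows. The dictionary `T_95` simply copies the 95% column of Table 9–2, so the sketch only works for 1 to 10 degrees of freedom. The names `T_95` and `t_interval_95` are hypothetical and introduced here for illustration.

```python
import math

# 95% column of Table 9–2, keyed by degrees of freedom.
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

def t_interval_95(sample_mean, s, n):
    """95% confidence interval for the mean when sigma is unknown."""
    df = n - 1                      # degrees of freedom
    t = T_95[df]                    # table lookup; KeyError beyond df = 10
    margin = t * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

if __name__ == "__main__":
    lower, upper = t_interval_95(sample_mean=0.32, s=0.09, n=10)
    print(round(lower, 3), round(upper, 3))
    print("0.30 inside interval:", lower <= 0.30 <= upper)
```

The margin of error here is about 0.064 inch, which gives the endpoints 0.256 and 0.384 reported in the example.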

E X A M P L E

The manager of the Inlet Square Mall, near Ft. Myers, Florida, wants to estimate the
mean amount spent per shopping visit by customers. A sample of 20 customers
reveals the following amounts spent.

$48.16 $42.22 $46.82 $51.45 $23.78 $41.86 $54.86
37.92 52.64 48.59 50.82 46.94 61.83 61.69
49.17 61.46 51.35 52.68 58.84 43.88

What is the best estimate of the population mean? Determine a 95% confidence
interval. Interpret the result. Would it be reasonable to conclude that the population
mean is $50? What about $60?

S O L U T I O N

The mall manager assumes that the population of the amounts spent follows the
normal distribution. This is a reasonable assumption in this case. Additionally, the
confidence interval technique is quite powerful and tends to commit any errors on the
conservative side if the population is not normal. We should not make the normality
assumption when the population is severely skewed or when the distribution has “thick
tails.” In Chapter 16, we present methods for handling this issue if we cannot make the
normality assumption. In this case, the normality assumption is reasonable.

The population standard deviation is not known. Hence, it is appropriate to
use the t distribution and formula (9–2) to find the confidence interval. We use the
Minitab system to find the mean and standard deviation of this sample. The re-
sults are shown below.

The mall manager does not know the population mean. The sample mean is the
best estimate of that value. From the pictured Minitab output, the mean is $49.348,
which is the best estimate, the point estimate, of the unknown population mean.

We use formula (9–2) to find the confidence interval. The value of t is avail-
able from Appendix B.5. There are n − 1 = 20 − 1 = 19 degrees of freedom. We
move across the row with 19 degrees of freedom to the column for the 95% con-
fidence level. The value at this intersection is 2.093. We substitute these values
into formula (9–2) to find the confidence interval.

\[
\bar{x} \pm t\,\frac{s}{\sqrt{n}} = \$49.348 \pm 2.093\,\frac{\$9.012}{\sqrt{20}} = \$49.348 \pm \$4.218
\]

The endpoints of the confidence interval are $45.130 and $53.566. It is rea-
sonable to conclude that the population mean is in that interval.

The manager of Inlet Square wondered whether the population mean could
have been $50 or $60. The value of $50 is within the confidence interval. It is reason-
able that the population mean could be $50. The value of $60 is not in the confi-
dence interval. Hence, we conclude that the population mean is unlikely to be $60.
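
For readers who want to reproduce these figures without Minitab or Excel, the following is a minimal sketch using only Python's standard library. The t value (2.093) is entered by hand from Appendix B.5 because the standard library has no t distribution; the variable names are ours.

```python
import math
import statistics

# Amounts spent by the 20 sampled Inlet Square Mall customers.
amounts = [
    48.16, 42.22, 46.82, 51.45, 23.78, 41.86, 54.86,
    37.92, 52.64, 48.59, 50.82, 46.94, 61.83, 61.69,
    49.17, 61.46, 51.35, 52.68, 58.84, 43.88,
]
T_95_DF19 = 2.093  # 95% confidence, 19 degrees of freedom (Appendix B.5)

n = len(amounts)
mean = statistics.mean(amounts)   # point estimate of the population mean
s = statistics.stdev(amounts)     # sample standard deviation
margin = T_95_DF19 * s / math.sqrt(n)
lower, upper = mean - margin, mean + margin

print(f"mean = {mean:.3f}, s = {s:.3f}, margin of error = {margin:.3f}")
print(f"95% confidence interval: {lower:.3f} to {upper:.3f}")
for claim in (50, 60):
    inside = lower <= claim <= upper
    print(f"${claim} inside the interval: {inside}")
```

The sketch reports whether each proposed value falls inside the interval, which mirrors the reasoning used for $50 and $60 above.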

The calculations to construct a confidence interval are also available in Excel. The
output follows. Note that the sample mean ($49.348) and the sample standard devia-
tion ($9.012) are the same as those in the Minitab calculations. In the Excel output, the
last line also includes the margin of error, which is the amount that is added and
subtracted from the sample mean to form the endpoints of the confidence interval. This
value is found from

\[
t\,\frac{s}{\sqrt{n}} = 2.093\,\frac{\$9.012}{\sqrt{20}} = \$4.218
\]


Before doing the confidence interval exercises, we would like to point out a useful
characteristic of the t distribution that will allow us to use the t table to quickly find both
z and t values. Earlier in this section on page 293, we detailed the characteristics of the
t distribution. The last point indicated that as we increase the sample size the t distribu-
tion approaches the z distribution. In fact, when we reach an infinitely large sample, the
t distribution is exactly equal to the z distribution.

To explain, Table 9–3 is a portion of Appendix B.5, with the degrees of freedom
between 4 and 99 omitted. To find the appropriate z value for a 95% confidence interval,
we begin by going to the confidence interval section and selecting the column headed
“95%.” Move down that column to the last row, which is labeled “∞,” or infinite degrees
of freedom. The value reported is 1.960, the same value that we found using the
standard normal distribution in Appendix B.3. This confirms the convergence of the
t distribution to the z distribution.

TABLE 9–3 Student’s t Distribution

                  Confidence Interval
                  80%      90%      95%      98%      99%      99.9%

df                Level of Significance for One-Tailed Test, α
(degrees of       0.1      0.05     0.025    0.01     0.005    0.0005
freedom)
                  Level of Significance for Two-Tailed Test, α
                  0.2      0.1      0.05     0.02     0.01     0.001

1                 3.078    6.314    12.706   31.821   63.657   636.619
2                 1.886    2.920    4.303    6.965    9.925    31.599
3                 1.638    2.353    3.182    4.541    5.841    12.924
⋮                 ⋮        ⋮        ⋮        ⋮        ⋮        ⋮
100               1.290    1.660    1.984    2.364    2.626    3.390
120               1.289    1.658    1.980    2.358    2.617    3.373
140               1.288    1.656    1.977    2.353    2.611    3.361
160               1.287    1.654    1.975    2.350    2.607    3.352
180               1.286    1.653    1.973    2.347    2.603    3.345
200               1.286    1.653    1.972    2.345    2.601    3.340
∞                 1.282    1.645    1.960    2.326    2.576    3.291

What does this mean for us? Instead of searching in the body of the z table, we
can go to the last row of the t table and find the appropriate value to build a confi-
dence interval. An additional benefit is that the values have three decimal places.
So, using this table for a 90% confidence interval, go down the column headed
“90%” and see the value 1.645, which is a more precise z value that can be used for
the 90% confidence level. Other z values for 98% and 99% confidence intervals are
also available with three decimals. Note that we will use the t table, which is summa-
rized in Table 9–3, to find the z values with three decimals for all following exercises
and problems.
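
The last row of the t table can also be reproduced in software. The sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module; the helper name `z_for_confidence` is ours, and the confidence levels are the column headings of Table 9–3.

```python
from statistics import NormalDist


def z_for_confidence(confidence):
    """Two-sided z value for a confidence level such as 0.95."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.80, 0.90, 0.95, 0.98, 0.99, 0.999):
    print(f"{level:.1%} confidence: z = {z_for_confidence(level):.3f}")
```

Rounded to three decimals, these values should match the “∞” row of the table.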

S E L F - R E V I E W 9–2

Dottie Kleman is the “Cookie Lady.” She bakes and sells cookies at locations in the
Philadelphia area. Ms. Kleman is concerned about absenteeism among her workers. The
information below reports the number of days absent for a sample of 10 workers during
the last two-week pay period.

4 1 2 2 1 2 2 1 0 3

(a) Determine the mean and the standard deviation of the sample.
(b) What is the population mean? What is the best estimate of that value?
(c) Develop a 95% confidence interval for the population mean. Assume that the
population distribution is normal.
(d) Explain why the t distribution is used as a part of the confidence interval.
(e) Is it reasonable to conclude that the typical worker does not miss any days during a
pay period?

E X E R C I S E S

9. Use Appendix B.5 to locate the value of t under the following conditions.
a. The sample size is 12 and the level of confidence is 95%.
b. The sample size is 20 and the level of confidence is 90%.
c. The sample size is 8 and the level of confidence is 99%.

10. Use Appendix B.5 to locate the value of t under the following conditions.
a. The sample size is 15 and the level of confidence is 95%.
b. The sample size is 24 and the level of confidence is 98%.
c. The sample size is 12 and the level of confidence is 90%.

11. The owner of Britten’s Egg Farm wants to estimate the mean number of eggs
produced per chicken. A sample of 20 chickens shows they produced an average of
20 eggs per month with a standard deviation of 2 eggs per month.
a. What is the value of the population mean? What is the best estimate of this value?
b. Explain why we need to use the t distribution. What assumption do you need to make?
c. For a 95% confidence interval, what is the value of t?
d. Develop the 95% confidence interval for the population mean.
e. Would it be reasonable to conclude that the population mean is 21 eggs? What
about 25 eggs?

12. The U.S. Dairy Industry wants to estimate the mean yearly milk consumption.
A sample of 16 people reveals the mean yearly consumption to be 45 gallons
with a standard deviation of 20 gallons. Assume the population distribution is
normal.
a. What is the value of the population mean? What is the best estimate of this value?
b. Explain why we need to use the t distribution. What assumption do you need to make?
c. For a 90% confidence interval, what is the value of t?
d. Develop the 90% confidence interval for the population mean.
e. Would it be reasonable to conclude that the population mean is 48 gallons?

13. Merrill Lynch Securities and Health Care Retirement Inc. are two large employers
in downtown Toledo, Ohio. They are considering jointly offering child care for
their employees. As a part of the feasibility study, they wish to estimate the mean
weekly child-care cost of their employees. A sample of 10 employees who use
child care reveals the following amounts spent last week.

$107 $92 $97 $95 $105 $101 $91 $99 $95 $104

Develop a 90% confidence interval for the population mean. Interpret the result.

14. The Columbus, Ohio Area Chamber of Commerce wants to estimate the mean
time workers who are employed in the downtown area spend getting to work. A
sample of 15 workers reveals the following number of minutes spent traveling.

14 24 24 19 24 7 31 20
26 23 23 28 16 15 21

Develop a 98% confidence interval for the population mean. Interpret the result.

A CONFIDENCE INTERVAL FOR A POPULATION
PROPORTION

LO9-3
Compute and interpret a confidence interval for a population proportion.

The material presented so far in this chapter uses the ratio scale of measurement. That
is, we use such variables as incomes, weights, distances, and ages. We now want to
consider situations such as the following:

• The career services director at Southern Technical Institute reports that
80% of its graduates enter the job market in a position related to their field
of study.

• A company representative claims that 45% of Burger King sales are made
at the drive-through window.

• A survey of homes in the Chicago area indicated that 85% of the new
construction had central air conditioning.

• A recent survey of married men between the ages of 35 and 50 found that
63% felt that both partners should earn a living.

These examples illustrate the nominal scale of measurement when the
outcome is limited to two values. In these cases, an observation is classified
into one of two mutually exclusive groups. For example, a graduate of Southern
Tech either entered the job market in a position related to his or her field of
study or not. A particular Burger King customer either made a purchase at the
drive-through window or did not make a purchase at the drive-through window.
We can talk about the groups in terms of proportions.

PROPORTION The fraction, ratio, or percent indicating the part of the
sample or the population having a particular trait of interest.

As an example of a proportion, a recent survey indicated that 92 out of 100 people
surveyed favored the continued use of daylight savings time in the summer. The sam-
ple proportion is 92/100, or .92, or 92%. If we let p represent the sample proportion, x
the number of “successes,” and n the number of items sampled, we can determine a
sample proportion as follows.

SAMPLE PROPORTION
\[
p = \frac{x}{n} \tag{9–3}
\]

The population proportion is identified by π. Therefore, π refers to the percent of
successes in the population. Recall from Chapter 6 that π is the proportion of “suc-
cesses” in a binomial distribution. This continues our practice of using Greek letters to
identify population parameters and Roman letters to identify sample statistics.

To develop a confidence interval for a proportion, we need to meet two
requirements:

1. The binomial conditions, discussed in Chapter 6, have been met. These conditions are:
a. The sample data are the number of successes in n trials.
b. There are only two possible outcomes. (We usually label one of the outcomes a
“success” and the other a “failure.”)
c. The probability of a success remains the same from one trial to the next.
d. The trials are independent. This means the outcome on one trial does not affect
the outcome on another.
2. The values nπ and n(1 − π) should both be greater than or equal to 5. This allows us
to invoke the central limit theorem and employ the standard normal distribution,
that is, z, to complete a confidence interval.

Developing a point estimate for a population proportion and a confidence interval
for a population proportion is similar to doing so for a mean. To illustrate, John Gail
is running for Congress from the third district of Nebraska. From a random sample of
100 voters in the district, 60 indicate they plan to vote for him in the upcoming election.
The sample proportion is .60, but the population proportion is unknown. That is, we do
not know what proportion of voters in the population will vote for Mr. Gail. The sample
value, .60, is the best estimate we have of the unknown population parameter. So we let
p, which is .60, be an estimate of π, which is not known.

To develop a confidence interval for a population proportion, we use:

CONFIDENCE INTERVAL FOR A POPULATION PROPORTION
\[
p \pm z\sqrt{\frac{p(1 - p)}{n}} \tag{9–4}
\]

An example will help to explain the details of determining a confidence interval and
interpreting the result.

E X A M P L E

The union representing the Bottle Blowers of America (BBA) is considering a pro-
posal to merge with the Teamsters Union. According to BBA union bylaws, at least
three-fourths of the union membership must approve any merger. A random sam-
ple of 2,000 current BBA members reveals 1,600 plan to vote for the merger pro-
posal. What is the estimate of the population proportion? Develop a 95%
confidence interval for the population proportion. Basing your decision on this
sample information, can you conclude that the necessary proportion of BBA mem-
bers favor the merger? Why?


S O L U T I O N

First, calculate the sample proportion from formula (9–3). It is .80, found by

\[
p = \frac{x}{n} = \frac{1{,}600}{2{,}000} = .80
\]

Thus, we estimate that 80% of the population favor the merger proposal. We deter-
mine the 95% confidence interval using formula (9–4). The z value corresponding
to the 95% level of confidence is 1.96.

\[
p \pm z\sqrt{\frac{p(1 - p)}{n}} = .80 \pm 1.96\sqrt{\frac{.80(1 - .80)}{2{,}000}} = .80 \pm .018
\]

The endpoints of the confidence interval are .782 and .818. The lower endpoint is
greater than .75. Hence, we conclude that the merger proposal will likely pass
because the interval estimate includes only values greater than 75% of the union
membership.
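
The calculation in formula (9–4) is easy to script. The following is a minimal sketch in Python; the function name `proportion_interval` is ours, and because π is unknown the check of the nπ and n(1 − π) requirement uses the sample proportion p as a stand-in, which is an assumption of this sketch.

```python
import math


def proportion_interval(successes, n, z):
    """Return (p, lower, upper) for p +/- z * sqrt(p(1 - p) / n)."""
    p = successes / n
    # Requirement 2, checked with p in place of the unknown pi (an assumption).
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("need n*p and n*(1 - p) to be at least 5")
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, p - margin, p + margin


# BBA merger example: 1,600 of 2,000 sampled members favor the merger.
p, lower, upper = proportion_interval(1600, 2000, z=1.96)
print(f"p = {p:.2f}, interval = {lower:.3f} to {upper:.3f}")
print("Lower endpoint exceeds .75:", lower > 0.75)
```

The final line carries out the decision step of the example: the merger is judged likely to pass only if the whole interval lies above the three-fourths threshold.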

To review the interpretation of the confidence interval: If the poll was conducted
100 times with 100 different samples, we expect the confidence intervals constructed
from 95 of the samples would contain the true population proportion. In addition, the
interpretation of a confidence interval can be very useful in decision making and play a
very important role especially on election night. For example, Cliff Obermeyer is run-
ning for Congress from the 6th District of New Jersey. Suppose 500 voters are con-
tacted upon leaving the polls and 275 indicate they voted for Mr. Obermeyer. We will
assume that the exit poll of 500 voters is a random sample of those voting in the 6th
District. That means that 55% of those in the sample voted for Mr. Obermeyer. Based
on formula (9–3):

\[
p = \frac{x}{n} = \frac{275}{500} = .55
\]

Now, to be assured of election, he must earn more than 50% of the votes in the
population of those voting. At this point, we know a point estimate, which is .55, of the
proportion of voters that will vote for him. But we do not know the percent in the
population that will ultimately vote for the candidate. So the question is: Could we take a
sample of 500 voters from a population where 50% or less of the voters support
Mr. Obermeyer and find that 55% of the sample support him? To put it another way,
could the sampling error, which is p − π = .55 − .50 = .05, be due to chance, or is the
proportion of voters who support Mr. Obermeyer greater than .50? If we develop a
confidence interval for the population proportion and find that the lower endpoint is
greater than .50, then we conclude that the proportion of voters supporting Mr. Obermeyer
is greater than .50. What does that mean? Well, it means he should be elected! What if
.50 is in the interval? Then we conclude that he is not assured of a majority and we
cannot conclude he will be elected. In this case, using the 95% level of confidence and
formula (9–4):

\[
p \pm z\sqrt{\frac{p(1 - p)}{n}} = .55 \pm 1.96\sqrt{\frac{.55(1 - .55)}{500}} = .55 \pm .044
\]

So the endpoints of the confidence interval are .55 − .044 = .506 and .55 + .044 = .594.
The value of .50 is not in this interval. So we conclude that probably more than 50% of
the voters support Mr. Obermeyer and that is enough to get him elected.

Is this procedure ever used? Yes! It is exactly the procedure used by polling organi-
zations, television networks, and surveys of public opinion on election night.

STATISTICS IN ACTION

The results of many surveys include confidence intervals. For example, a recent
survey of 800 TV viewers in Toledo, Ohio, found 44% watched the evening news
on the local CBS affiliate. The article also reported a margin of error of 3.4%.
The margin of error is actually the amount that is added and subtracted from the
point estimate to find the endpoints of a confidence interval. For a 95% level of
confidence, the margin of error is:

\[
z\sqrt{\frac{p(1 - p)}{n}} = 1.96\sqrt{\frac{.44(1 - .44)}{800}} = 0.034
\]

The estimate of the proportion of all TV viewers in Toledo, Ohio, who watch
the local news on CBS is between (.44 − .034) and (.44 + .034), or 40.6%
and 47.4%.


S E L F - R E V I E W 9–3

A market research consultant was hired to estimate the proportion of homemakers who
associate the brand name of a laundry detergent with the container’s shape and color. The
consultant randomly selected 1,400 homemakers. From the sample, 420 were able to
identify the brand by name based only on the shape and color of the container.
(a) Estimate the value of the population proportion.
(b) Develop a 99% confidence interval for the population proportion.
(c) Interpret your findings.

E X E R C I S E S

15. The owner of the West End Kwick Fill Gas Station wishes to determine the
proportion of customers who pay at the pump using a credit card or debit card. He
surveys 100 customers and finds that 80 paid at the pump.
a. Estimate the value of the population proportion.
b. Develop a 95% confidence interval for the population proportion.
c. Interpret your findings.

16. Ms. Maria Wilson is considering running for mayor of Bear Gulch, Montana. Before
completing the petitions, she decides to conduct a survey of voters in Bear Gulch. A
sample of 400 voters reveals that 300 would support her in the November election.
a. Estimate the value of the population proportion.
b. Develop a 99% confidence interval for the population proportion.
c. Interpret your findings.

17. The Fox TV network is considering replacing one of its prime-time crime
investigation shows with a new family-oriented comedy show. Before a final decision is
made, network executives designed an experiment to estimate the proportion of
their viewers who would prefer the comedy show over the crime investigation
show. A random sample of 400 viewers was selected and asked to watch the new
comedy show and the crime investigation show. After viewing the shows, 250
indicated they would watch the new comedy show and suggested it replace the crime
investigation show.
a. Estimate the value of the population proportion of people who would prefer the
comedy show.
b. Develop a 99% confidence interval for the population proportion of people who
would prefer the comedy show.
c. Interpret your findings.

18. Schadek Silkscreen Printing Inc. purchases plastic cups and imprints them with
logos for sporting events, proms, birthdays, and other special occasions. Zack
Schadek, the owner, received a large shipment this morning. To ensure the quality
of the shipment, he selected a random sample of 300 cups and inspected them for
defects. He found 15 to be defective.
a. What is the estimated proportion defective in the population?
b. Develop a 95% confidence interval for the proportion defective.
c. Zack has an agreement with his supplier that if 10% or more of the cups are
defective, he can return the order. Should he return this lot? Explain your decision.

CHOOSING AN APPROPRIATE SAMPLE SIZE
When working with confidence intervals, one important variable is sample size. However,
in practice, sample size is not a variable. It is a decision we make so that our estimate of
a population parameter is a good one. Our decision is based on three variables:

1. The margin of error the researcher will tolerate.
2. The level of confidence desired, for example, 95%.
3. The variation or dispersion of the population being studied.

LO9-4
Calculate the required
sample size to estimate
a population proportion
or population mean.


The first variable is the margin of error. It is designated as E and is the amount that
is added and subtracted to the sample mean (or sample proportion) to determine the
endpoints of the confidence interval. For example, in a study of wages, we may decide
that we want to estimate the mean wage of the population with a margin of error of plus
or minus $1,000. Or, in an opinion poll, we may decide that we want to estimate the
population proportion with a margin of error of plus or minus 3.5%. The margin of error
is the amount of error we are willing to tolerate in estimating a population parameter.
You may wonder why we do not choose small margins of error. There is a trade-off be-
tween the margin of error and sample size. A small margin of error will require a larger
sample and more money and time to collect the sample. A larger margin of error will
permit a smaller sample and result in a wider confidence interval.

The second choice is the level of confidence. In working with confidence intervals,
we logically choose relatively high levels of confidence such as 95% and 99%. To com-
pute the sample size, we need the z-statistic that corresponds to the chosen level of
confidence. The 95% level of confidence corresponds to a z value of 1.96, and a 90%
level of confidence corresponds to a z value of 1.645 (using the t table). Notice that
larger sample sizes (and more time and money to collect the sample) correspond with
higher levels of confidence. Also, notice that we use a z-statistic.

The third choice to determine the sample size is the population standard deviation. If
the population is widely dispersed, a large sample is required to get a good estimate. On the
other hand, if the population is concentrated (homogeneous), the required sample size to
get a good estimate will be smaller. Often, we do not know the population standard devia-
tion. Here are three suggestions to estimate the population standard deviation.

1. Conduct a pilot study. This is the most common method. Suppose we want an
estimate of the number of hours per week worked by students enrolled in the
College of Business at the University of Texas. To test the validity of our question-
naire, we use it on a small sample of students. From this small sample, we compute
the standard deviation of the number of hours worked and use this value as the
population standard deviation.

2. Use a comparable study. Use this approach when there is an estimate of the stan-
dard deviation from another study. Suppose we want to estimate the number of
hours worked per week by refuse workers. Information from certain state or federal
agencies that regularly study the workforce may provide a reliable value to use for
the population standard deviation.

3. Use a range-based approach. To use this approach, we need to know or have an
estimate of the largest and smallest values in the population. Recall from Chapter 3,
the Empirical Rule states that virtually all the observations could be expected to be
within plus or minus 3 standard deviations of the mean, assuming that the distribu-
tion follows the normal distribution. Thus, the distance between the largest and the
smallest values is 6 standard deviations. We can estimate the standard deviation as
one-sixth of the range. For example, the director of operations at University Bank
wants to estimate the number of ATM transactions per month made by college stu-
dents. She believes that the distribution of ATM transactions follows the normal
distribution. The minimum and maximum of ATM transactions per month are 2 and
50, so the range is 48, found by (50 − 2). Then the estimated value of the popula-
tion standard deviation would be eight ATM transactions per month, 48/6.
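
The range-based suggestion reduces to a single division. As a small illustration (the function name is ours, and the result is only as good as the normality assumption behind the Empirical Rule):

```python
def sigma_from_range(smallest, largest):
    """Estimate the standard deviation as one-sixth of the range."""
    return (largest - smallest) / 6


# ATM example: minimum 2 and maximum 50 transactions per month.
estimated_sigma = sigma_from_range(2, 50)
print(f"Estimated standard deviation: {estimated_sigma} transactions per month")
```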

Sample Size to Estimate a Population Mean
To estimate a population mean, we can express the interaction among these three
factors and the sample size in the following formula. Notice that this formula is the
margin of error used to calculate the endpoints of confidence intervals to estimate a
population mean! See formula 9–1.

\[
E = z\,\frac{\sigma}{\sqrt{n}}
\]


Solving this equation for n yields the following result.

SAMPLE SIZE FOR ESTIMATING THE POPULATION MEAN
\[
n = \left(\frac{z\sigma}{E}\right)^{2} \tag{9–5}
\]

where:

n is the size of the sample.
z is the standard normal z-value corresponding to the desired level of confidence.
σ is the population standard deviation.
E is the maximum allowable error.

The result of this calculation is not always a whole number. When the outcome is
not a whole number, the usual practice is to round up any fractional result to the next
whole number. For example, 201.21 would be rounded up to 202.
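
Formula (9–5) together with the round-up rule can be sketched in a few lines of Python. The function name `sample_size_for_mean` is ours; `math.ceil` performs the rounding up to the next whole number. The inputs shown are the values from the city council earnings example in this section.

```python
import math


def sample_size_for_mean(z, sigma, margin_of_error):
    """Sample size n = (z * sigma / E)^2, rounded up to a whole number."""
    return math.ceil((z * sigma / margin_of_error) ** 2)


# z = 1.96 (95% confidence), sigma = $1,000, E = $100.
n_95 = sample_size_for_mean(1.96, 1000, 100)
# Same study at the 99% level of confidence, z = 2.576.
n_99 = sample_size_for_mean(2.576, 1000, 100)
print(f"95% confidence: n = {n_95}")
print(f"99% confidence: n = {n_99}")
```

Trying a few values of z or E in this way makes the trade-off among confidence, margin of error, and sample size easy to see.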

E X A M P L E

A student in public administration wants to estimate the mean monthly earnings of
city council members in large cities. She can tolerate a margin of error of $100 in
estimating the mean. She would also prefer to report the interval estimate with a
95% level of confidence. The student found a report by the Department of Labor
that reported a standard deviation of $1,000. What is the required sample size?

S O L U T I O N

The maximum allowable error, E, is $100. The value of z for a 95% level of
confidence is 1.96, and the value of the standard deviation is $1,000. Substituting these
values into formula (9–5) gives the required sample size as:

\[
n = \left(\frac{z\sigma}{E}\right)^{2} = \left(\frac{(1.96)(\$1{,}000)}{\$100}\right)^{2} = (19.6)^{2} = 384.16
\]

The computed value of 384.16 is rounded up to 385. A sample of 385 is required
to meet the specifications. If the student wants to increase the level of confidence,
for example to 99%, this will require a larger sample. Using the t table with infinite
degrees of freedom, the z value for a 99% level of confidence is 2.576.

\[
n = \left(\frac{z\sigma}{E}\right)^{2} = \left(\frac{(2.576)(\$1{,}000)}{\$100}\right)^{2} = (25.76)^{2} = 663.58
\]

We recommend a sample of 664. Observe how much the change in the confidence
level changed the size of the sample. An increase from the 95% to the 99% level of
confidence resulted in an increase of 279 observations, or 72% [(279/385) × 100].
This would greatly increase the cost of the study, in terms of both time and money.
Hence, the level of confidence should be considered carefully.

Sample Size to Estimate a Population Proportion
To determine the sample size to estimate a population proportion, the same three
variables need to be specified:

1. The margin of error.
2. The desired level of confidence.
3. The variation or dispersion of the population being studied.

For the binomial distribution, the margin of error is:

\[
E = z\sqrt{\frac{\pi(1 - \pi)}{n}}
\]

Solving this equation for n yields the following equation

SAMPLE SIZE FOR THE POPULATION PROPORTION
\[
n = \pi(1 - \pi)\left(\frac{z}{E}\right)^{2} \tag{9–6}
\]

where:

n is the size of the sample.
z is the standard normal z-value corresponding to the desired level of confidence.
π is the population proportion.
E is the maximum allowable error.

As before, the z-value is associated with our choice of confidence level. We also
decide the margin of error, E. However, the population variance of the binomial distribu-
tion is represented by π(1 − π). To estimate the population variance, we need a value of
the population proportion. If a reliable value cannot be determined with a pilot study or
found in a comparable study, then a value of .50 can be used for π. Note that π (1 − π)
has the largest value using 0.50 and, therefore, without a good estimate of the popula-
tion proportion, using 0.50 as an estimate of π overstates the sample size. Using a
larger sample size will not hurt the estimate of the population proportion.

S E L F - R E V I E W 9–4

A university’s office of research wants to estimate the arithmetic mean grade point
average (GPA) of all graduating seniors during the past 10 years. GPAs range between
2.0 and 4.0. The estimate of the population mean GPA should be within plus or minus
.05 of the population mean. Based on prior experience, the population standard
deviation is 0.279. Using a 99% level of confidence, how many student records need to be
selected?

E X A M P L E

The student in the previous example also wants to estimate the proportion of
cities that have private refuse collectors. The student wants to estimate the pop-
ulation proportion with a margin of error of .10, prefers a level of confidence of
90%, and has no estimate for the population proportion. What is the required
sample size?

S O L U T I O N

The estimate of the population proportion is to be within .10, so E = .10. The
desired level of confidence is .90, which corresponds to a z value of 1.645, us-
ing the t table with infinite degrees of freedom. Because no estimate of the
population proportion is available, we use .50. The suggested number of obser-
vations is

\[
n = (.5)(1 - .5)\left(\frac{1.645}{.10}\right)^{2} = 67.65
\]

The student needs a random sample of 68 cities.
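
Formula (9–6) can be sketched the same way as the formula for a mean. In the Python sketch below the function name and the default of .50 for π are our choices; the default reflects the suggestion to use .50 when no estimate of the population proportion is available.

```python
import math


def sample_size_for_proportion(z, margin_of_error, pi=0.5):
    """Sample size n = pi(1 - pi)(z / E)^2, rounded up to a whole number."""
    return math.ceil(pi * (1 - pi) * (z / margin_of_error) ** 2)


# Refuse collector example: 90% confidence (z = 1.645), E = .10, no estimate of pi.
n_cities = sample_size_for_proportion(1.645, 0.10)
print(f"Required sample size: {n_cities}")

# pi = .50 gives the largest requirement; other values need no more observations.
for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"pi = {pi}: n = {sample_size_for_proportion(1.645, 0.10, pi)}")
```

The loop illustrates why .50 is the conservative choice: π(1 − π) is largest at .50, so any other value of π calls for a sample that is no larger.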


E X E R C I S E S

19. A population’s standard deviation is 10. We want to estimate the population mean
within 2, with a 95% level of confidence. How large a sample is required?

20. We want to estimate the population mean within 5, with a 99% level of confidence.
The population standard deviation is estimated to be 15. How large a sample is
required?

21. The estimate of the population proportion should be within plus or minus .05, with
a 95% level of confidence. The best estimate of the population proportion is .15.
How large a sample is required?

22. The estimate of the population proportion should be within plus or minus .10, with
a 99% level of confidence. The best estimate of the population proportion is .45.
How large a sample is required?

23. A large on-demand, video streaming company is designing a large-scale survey to
determine the mean amount of time corporate executives watch on-demand
television. A small pilot survey of 10 executives indicated that the mean time per week is
12 hours, with a standard deviation of 3 hours. The estimate of the mean viewing
time should be within one-quarter hour. The 95% level of confidence is to be used.
How many executives should be surveyed?

24. A processor of carrots cuts the green top off each carrot, washes the carrots, and
inserts six to a package. Twenty packages are inserted in a box for shipment. Each
box of carrots should weigh 20.4 pounds. The processor knows that the standard
deviation of box weight is 0.5 pound. The processor wants to know if the current
packing process meets the 20.4 weight standard. How many boxes must the
processor sample to be 95% confident that the estimate of the population mean is
within 0.2 pound?

25. Suppose the U.S. president wants to estimate the proportion of the population
that supports his current policy toward revisions in the health care system. The
president wants the estimate to be within .04 of the true proportion. Assume a
95% level of confidence. The president’s political advisors found a similar survey
from two years ago that reported that 60% of people supported health care
revisions.
a. How large of a sample is required?
b. How large of a sample would be necessary if no estimate were available for the
proportion supporting current policy?

26. Past surveys reveal that 30% of tourists going to Las Vegas to gamble spend more
than $1,000. The Visitor’s Bureau of Las Vegas wants to update this percentage.
a. How many tourists should be randomly selected to estimate the population
proportion with a 90% confidence level and a 1% margin of error?
b. The Bureau feels the sample size determined above is too large. What can
be done to reduce the sample? Based on your suggestion, recalculate the
sample size.

FINITE-POPULATION CORRECTION FACTOR
The populations we have sampled so far have been very large or infinite. What if the
sampled population is not very large? We need to make some adjustments in the way
we compute the standard error of the sample means and the standard error of the sam-
ple proportions.

A population that has a fixed upper bound is finite. For example, there are 7,640
students enrolled at Eastern Illinois University, there are 40 employees at Spence
Sprockets, there were 917 Jeep Wranglers assembled at the Alexis Avenue plant
yesterday, or there were 65 surgical patients at St. Rose Memorial Hospital in Sarasota
yesterday. A finite population can be rather small; it could be all the students regis-
tered for your statistics class. It can also be very large, such as all senior citizens living
in Florida.

LO9-5
Adjust a confidence
interval for finite
populations.

308 CHAPTER 9

For a finite population, where the total number of objects or individuals is N and the
number of objects or individuals in the sample is n, we need to adjust the standard
errors in the confidence interval formulas. To put it another way, to find the confidence
interval for the mean, we adjust the standard error of the mean in formulas (9–1) and
(9–2). If we are determining the confidence interval for a proportion, then we need to
adjust the standard error of the proportion in formula (9–4).

This adjustment is called the finite-population correction factor. It is often short-
ened to FPC and is:

$$\text{FPC} = \sqrt{\frac{N - n}{N - 1}}$$

Why is it necessary to apply a factor, and what is its effect? Logically, if the sample
is a substantial percentage of the population, the estimate of the population parameter
is more precise. Note the effect of the term (N − n)/(N − 1). Suppose the population is
1,000 and the sample is 100. Then this ratio is (1,000 − 100)/(1,000 − 1), or 900/999.
Taking the square root gives the correction factor, .9492. Multiplying this correction factor
by the standard error reduces the standard error by about 5% (1 − .9492 = .0508). This
reduction in the size of the standard error yields a smaller range of values in estimating
the population mean or the population proportion. If the sample is 200, the correction
factor is .8949, meaning that the standard error has been reduced by more than 10%.
Table 9–4 shows the effects of various sample sizes.

TABLE 9–4 Finite-Population Correction Factor for Selected Samples When the Population Is 1,000

Sample Size    Fraction of Population    Correction Factor
     10                .010                    .9955
     25                .025                    .9879
     50                .050                    .9752
    100                .100                    .9492
    200                .200                    .8949
    500                .500                    .7075
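
As a rough check on Table 9–4, the correction factors can be recomputed directly from the formula. The following is a minimal Python sketch; the function name `fpc` and its argument names are our own labels, not part of the text.

```python
import math

def fpc(population_size: int, sample_size: int) -> float:
    """Finite-population correction factor: sqrt((N - n) / (N - 1))."""
    if population_size < 2 or not 1 <= sample_size <= population_size:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return math.sqrt((population_size - sample_size) / (population_size - 1))

N = 1000  # population size used in Table 9-4
for n in (10, 25, 50, 100, 200, 500):
    # Columns: sample size, fraction of population, correction factor
    print(f"{n:>4}  {n / N:.3f}  {fpc(N, n):.4f}")
```

The larger the share of the population that is sampled, the smaller the factor, and so the smaller the adjusted standard error.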

So if we wished to develop a confidence interval for the mean from a finite popula-
tion and the population standard deviation was unknown, we would adjust formula (9–2)
as follows:

$$\bar{x} \pm t \frac{s}{\sqrt{n}} \left( \sqrt{\frac{N - n}{N - 1}} \right)$$

We would make a similar adjustment to formula (9–4) in the case of a proportion.
The following example summarizes the steps to find a confidence interval for the
mean.
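
For reference, the similar adjustment to formula (9–4) can be written out as follows. This is a sketch that simply multiplies the standard error of the proportion by the same correction factor, using the symbols already defined (p for the sample proportion, n for the sample size, N for the population size):

```latex
p \pm z \sqrt{\frac{p(1 - p)}{n}} \left( \sqrt{\frac{N - n}{N - 1}} \right)
```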

E X A M P L E

There are 250 families residing in Scandia, Pennsylvania. A random sample of 40 of
these families revealed the mean annual church contribution was $450 and the
standard deviation of this was $75.

1. What is the population mean? What is the best estimate of the population
mean?

2. Develop a 90% confidence interval for the population mean. What are the end-
points of the confidence interval?

3. Using the confidence interval, explain why the population mean could be
$445. Could the population mean be $425? Why?

ESTIMATION AND CONFIDENCE INTERVALS 309

S O L U T I O N

First, note the population is finite. That is, there is a limit to the number of people
residing in Scandia, in this case 250.

1. We do not know the population mean. This is the value we wish to estimate.
The best estimate we have of the population mean is the sample mean, which
is $450.

2. The formula to find the confidence interval for a population mean follows.

$$\bar{x} \pm t \frac{s}{\sqrt{n}} \left( \sqrt{\frac{N - n}{N - 1}} \right)$$

In this case, we know x̄ = 450, s = 75, N = 250, and n = 40. We do not know
the population standard deviation, so we use the t distribution. To find the ap-
propriate value of t, we use Appendix B.5, and move across the top row to the
column headed 90%. The degrees of freedom are df = n − 1 = 40 − 1 = 39, so
we move to the cell where the df row of 39 intersects with the column headed
90%. The value is 1.685. Inserting these values in the formula:

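$$\bar{x} \pm t \frac{s}{\sqrt{n}} \left( \sqrt{\frac{N - n}{N - 1}} \right) = \$450 \pm 1.685 \frac{\$75}{\sqrt{40}} \left( \sqrt{\frac{250 - 40}{250 - 1}} \right)$$

$$= \$450 \pm \$19.98 \sqrt{.8434} = \$450 \pm \$18.35$$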

The endpoints of the confidence interval are $431.65 and $468.35.

3. It is likely that the population mean is more than $431.65 but less than
$468.35. To put it another way, could the population mean be $445? Yes, but
it is not likely that it is $425. Why is this so? Because the value $445 is within
the confidence interval and $425 is not within the confidence interval.
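
The arithmetic in step 2 can be reproduced with a short Python sketch. The function name `finite_population_ci` is hypothetical, and the t value of 1.685 is taken from the solution above (df = 39, 90% confidence) rather than computed.

```python
import math

def finite_population_ci(mean, s, n, N, t):
    """Return (lower, upper) for mean ± t * s/sqrt(n) * sqrt((N - n)/(N - 1))."""
    standard_error = s / math.sqrt(n)
    correction = math.sqrt((N - n) / (N - 1))  # finite-population correction factor
    margin = t * standard_error * correction
    return mean - margin, mean + margin

# Values from the Scandia example; t = 1.685 is read from the t table.
lower, upper = finite_population_ci(mean=450, s=75, n=40, N=250, t=1.685)
print(f"${lower:.2f} to ${upper:.2f}")
```

Without the correction factor the margin would be about $19.98; applying it narrows the margin to about $18.35.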

The same study of church contributions in Scandia revealed that 15 of the 40 families sam-
pled attend church regularly. Construct the 95% confidence interval for the proportion of
families attending church regularly.

S E L F - R E V I E W 9–5

27. Thirty-six items are randomly selected from a population of 300 items. The sample
mean is 35 and the sample standard deviation 5. Develop a 95% confidence inter-
val for the population mean.

28. Forty-nine items are randomly selected from a population of 500 items. The sample
mean is 40 and the sample standard deviation 9. Develop a 99% confidence inter-
val for the population mean.

29. The attendance at the Savannah Colts minor league baseball game last night was
400. A random sample of 50 of those in attendance revealed that the mean num-
ber of soft drinks consumed per person was 1.86, with a standard deviation of
0.50. Develop a 99% confidence interval for the mean number of soft drinks con-
sumed per person.

30. There are 300 welders employed at Maine Shipyards Corporation. A sample of
30 welders revealed that 18 graduated from a registered welding course. Con-
struct the 95% confidence interval for the proportion of all welders who graduated
from a registered welding course.

E X E R C I S E S


C H A P T E R S U M M A R Y

I. A point estimate is a single value (statistic) used to estimate a population value
(parameter).

II. A confidence interval is a range of values within which the population parameter is
expected to occur.
A. The factors that determine the width of a confidence interval for a mean are:
1. The number of observations in the sample, n.
2. The variability in the population, usually estimated by the sample standard deviation, s.
3. The level of confidence.
a. To determine the confidence limits when the population standard deviation is
known, we use the z distribution. The formula is

$$\bar{x} \pm z \frac{\sigma}{\sqrt{n}} \tag{9–1}$$

b. To determine the confidence limits when the population standard deviation is
unknown, we use the t distribution. The formula is

$$\bar{x} \pm t \frac{s}{\sqrt{n}} \tag{9–2}$$

III. The major characteristics of the t distribution are:
A. It is a continuous distribution.
B. It is mound-shaped and symmetrical.
C. It is flatter, or more spread out, than the standard normal distribution.
D. There is a family of t distributions, depending on the number of degrees of freedom.

IV. A proportion is a ratio, fraction, or percent that indicates the part of the sample or popu-
lation that has a particular characteristic.
A. A sample proportion, p, is found by x, the number of successes, divided by n, the number
of observations.
B. We construct a confidence interval for a population proportion from the following
formula.

$$p \pm z \sqrt{\frac{p(1 - p)}{n}} \tag{9–4}$$

V. We can determine an appropriate sample size for estimating both means and
proportions.
A. There are three factors that determine the sample size when we wish to estimate the
mean.
1. The margin of error, E.
2. The desired level of confidence.
3. The variation in the population.
4. The formula to determine the sample size for the mean is

$$n = \left( \frac{z\sigma}{E} \right)^{2} \tag{9–5}$$

B. There are three factors that determine the sample size when we wish to estimate a
proportion.
1. The margin of error, E.
2. The desired level of confidence.
3. A value for π to calculate the variation in the population.
4. The formula to determine the sample size for a proportion is

$$n = \pi(1 - \pi) \left( \frac{z}{E} \right)^{2} \tag{9–6}$$

VI. For a finite population, the standard error is adjusted by the factor: $\sqrt{\frac{N - n}{N - 1}}$
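
The sample-size formulas (9–5) and (9–6) in the summary can be sketched in Python as follows. The function names are our own, the inputs in the usage lines are hypothetical numbers chosen only for illustration, and rounding the result up to the next whole observation is an assumption reflecting the usual convention.

```python
import math
from statistics import NormalDist

def z_value(confidence: float) -> float:
    """Two-sided z for a confidence level, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def sample_size_for_mean(z: float, sigma: float, margin: float) -> int:
    """Formula (9-5): n = (z * sigma / E)^2, rounded up (assumed convention)."""
    return math.ceil((z * sigma / margin) ** 2)

def sample_size_for_proportion(z: float, pi: float, margin: float) -> int:
    """Formula (9-6): n = pi * (1 - pi) * (z / E)^2, rounded up (assumed convention)."""
    return math.ceil(pi * (1 - pi) * (z / margin) ** 2)

z = round(z_value(0.95), 2)  # rounded to two decimals, as in a z table
# Hypothetical inputs: sigma = 20 and E = 5 for a mean; pi = .50 and E = .05 for a proportion.
print(sample_size_for_mean(z, sigma=20, margin=5))
print(sample_size_for_proportion(z, pi=0.5, margin=0.05))
```

Using π = .50 gives the largest value of π(1 − π), so it is the cautious choice when no prior estimate of the proportion is available.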


C H A P T E R E X E R C I S E S

31. A random sample of 85 group leaders, supervisors, and similar personnel at General
Motors revealed that, on average, they spent 6.5 years in a particular job before being
promoted. The standard deviation of the sample was 1.7 years. Construct a 95% confi-
dence interval.

32. A state meat inspector in Iowa has been given the assignment of estimating the mean
net weight of packages of ground chuck labeled “3 pounds.” Of course, he realizes that
the weights cannot always be precisely 3 pounds. A sample of 36 packages reveals the
mean weight to be 3.01 pounds, with a standard deviation of 0.03 pound.
a. What is the estimated population mean?
b. Determine a 95% confidence interval for the population mean.

33. As part of their business promotional package, the Milwaukee Chamber of Commerce
would like an estimate of the mean cost per month to lease a one-bedroom apartment.
The mean cost per month for a random sample of 40 apartments currently available for
lease was $884. The standard deviation of the sample was $50.
a. Develop a 98% confidence interval for the population mean.
b. Would it be reasonable to conclude that the population mean is $950 per
month?

34. A recent survey of 50 executives who were laid off during a recent recession revealed it
took a mean of 26 weeks for them to find another position. The standard deviation of
the sample was 6.2 weeks. Construct a 95% confidence interval for the population
mean. Is it reasonable that the population mean is 28 weeks? Justify your answer.

35. Marty Rowatti recently assumed the position of director of the YMCA of South Jersey.
He would like some current data on how long current members of the YMCA have been
members. To investigate, suppose he selects a random sample of 40 current members.
The mean length of membership for the sample is 8.32 years and the standard devia-
tion is 3.07 years.
a. What is the mean of the population?
b. Develop a 90% confidence interval for the population mean.
c. The previous director, in the summary report she prepared as she retired, indicated
the mean length of membership was now “almost 10 years.” Does the sample information
substantiate this claim? Cite evidence.

36. The American Restaurant Association collected information on the number of meals
eaten outside the home per week by young married couples. A survey of 60 couples
showed the sample mean number of meals eaten outside the home was 2.76 meals per
week, with a standard deviation of 0.75 meal per week. Construct a 99% confidence
interval for the population mean.

37. The National Collegiate Athletic Association (NCAA) reported that college football assis-
tant coaches spend a mean of 70 hours per week on coaching and recruiting during
the season. A random sample of 50 assistant coaches showed the sample mean to be
68.6 hours, with a standard deviation of 8.2 hours.
a. Using the sample data, construct a 99% confidence interval for the population
mean.
b. Does the 99% confidence interval include the value suggested by the NCAA? Interpret
this result.
c. Suppose you decided to switch from a 99% to a 95% confidence interval. Without
performing any calculations, will the interval increase, decrease, or stay the same?
Which of the values in the formula will change?

38. The Human Relations Department of Electronics Inc. would like to include a dental
plan as part of the benefits package. The question is: How much does a typical
employee and his or her family spend per year on dental expenses? A sample of
45 employees reveals the mean amount spent last year was $1,820, with a standard
deviation of $660.
a. Construct a 95% confidence interval for the population mean.
b. The information from part (a) was given to the president of Electronics Inc. He indicated
he could afford $1,700 of dental expenses per employee. Is it possible that the
population mean could be $1,700? Justify your answer.


39. A student conducted a study and reported that the 95% confidence interval for the
mean ranged from 46 to 54. He was sure that the mean of the sample was 50, that the
standard deviation of the sample was 16, and that the sample size was at least 30, but
could not remember the exact number. Can you help him out?

40. A recent study by the American Automobile Dealers Association surveyed a random
sample of 20 dealers. The data revealed a mean amount of profit per car sold was
$290, with a standard deviation of $125. Develop a 95% confidence interval for the
population mean of profit per car.

41. A study of 25 graduates of four-year public colleges revealed the mean amount owed
by a student in student loans was $55,051. The standard deviation of the sample was
$7,568. Construct a 90% confidence interval for the population mean. Is it reasonable to
conclude that the mean of the population is actually $55,000? Explain why or why not.

42. An important factor in selling a residential property is the number of times real estate
agents show a home. A sample of 15 homes recently sold in the Buffalo, New York, area
revealed the mean number of times a home was shown was 24 and the standard deviation
of the sample was 5 people. Develop a 98% confidence interval for the population mean.

43. In 2003, the Accreditation Council for Graduate Medical Education (ACGME) imple-
mented new rules limiting work hours for all residents. A key component of these rules is
that residents should work no more than 80 hours per week. The following is the number of
weekly hours worked in 2017 by a sample of residents at the Tidelands Medical Center.

84 86 84 86 79 82 87 81 84 78 74 86

a. What is the point estimate of the population mean for the number of weekly hours
worked at the Tidelands Medical Center?

b. Develop a 90% confidence interval for the population mean.
c. Is the Tidelands Medical Center within the ACGME guideline? Why?

44. PrintTech, Inc. is introducing a new line of ink-jet printers and would like to promote
the number of pages a user can expect from a print cartridge. A sample of 10 cartridges
revealed the following number of pages printed.

2,698 2,028 2,474 2,395 2,372 2,475 1,927 3,006 2,334 2,379

a. What is the point estimate of the population mean?
b. Develop a 95% confidence interval for the population mean.

45. Dr. Susan Benner is an industrial psychologist. She is currently studying stress
among executives of Internet companies. She has developed a questionnaire that she
believes measures stress. A score above 80 indicates stress at a dangerous level. A
random sample of 15 executives revealed the following stress level scores.

94 78 83 90 78 99 97 90 97 90 93 94 100 75 84

a. Find the mean stress level for this sample. What is the point estimate of the popula-
tion mean?

b. Construct a 95% confidence interval for the population mean.
c. According to Dr. Benner’s test, is it reasonable to conclude that the mean stress level
of Internet executives is 80? Explain.

46. Pharmaceutical companies promote their prescription drugs using television advertising.
In a survey of 80 randomly sampled television viewers, 10 indicated that they asked
their physician about using a prescription drug they saw advertised on TV. Develop a
95% confidence interval for the proportion of viewers who discussed a drug seen on TV
with their physician. Is it reasonable to conclude that 25% of the viewers discuss an ad-
vertised drug with their physician?

47. HighTech, Inc. randomly tests its employees about company policies. Last year in the
400 random tests conducted, 14 employees failed the test. Develop a 99% confidence
interval for the proportion of applicants that fail the test. Would it be reasonable to con-
clude that 5% of the employees cannot pass the company policy test? Explain.


48. During a national debate on changes to health care, a cable news service performs an
opinion poll of 500 small-business owners. It shows that 65% of small-business owners
do not approve of the changes. Develop a 95% confidence interval for the proportion
opposing health care changes. Comment on the result.

49. There are 20,000 eligible voters in York County, South Carolina. A random sample of
500 York County voters revealed 350 plan to vote to return Louella Miller to the state
senate. Construct a 99% confidence interval for the proportion of voters in the county
who plan to vote for Ms. Miller. From this sample information, is it reasonable to con-
clude that Ms. Miller will receive a majority of the votes?

50. In a poll to estimate presidential popularity, each person in a random sample of 1,000
voters was asked to agree with one of the following statements:
1. The president is doing a good job.
2. The president is doing a poor job.
3. I have no opinion.

A total of 560 respondents selected the first statement, indicating they thought the
president was doing a good job.
a. Construct a 95% confidence interval for the proportion of respondents who feel the
president is doing a good job.
b. Based on your interval in part (a), is it reasonable to conclude that a majority of the
population believes the president is doing a good job?

51. Police Chief Edward Wilkin of River City reports 500 traffic citations were issued last
month. A sample of 35 of these citations showed the mean amount of the fine was $54,
with a standard deviation of $4.50. Construct a 95% confidence interval for the mean
amount of a citation in River City.

52. The First National Bank of Wilson has 650 checking account customers. A recent sam-
ple of 50 of these customers showed 26 have a Visa card with the bank. Construct the
99% confidence interval for the proportion of checking account customers who have a
Visa card with the bank.

53. It is estimated that 60% of U.S. households subscribe to cable TV. You would like to ver-
ify this statement for your class in mass communications. If you want your estimate to be
within 5 percentage points, with a 95% level of confidence, how many households
should you sample?

54. You need to estimate the mean number of travel days per year for salespeople. The
mean of a small pilot study was 150 days, with a standard deviation of 14 days. If you
must estimate the population mean within 2 days, how many salespeople should you
sample? Use the 90% confidence level.

55. You want to estimate the mean family income in a rural area of central Indiana. The
question is, how many families should be sampled? In a pilot sample of 10 families, the
standard deviation of the sample was $500. The sponsor of the survey wants you to use
the 95% confidence level. The estimate is to be within $100. How many families should
be interviewed?

56. Families USA, a monthly magazine that discusses issues related to health and health
costs, surveyed 20 of its subscribers. It found that the annual health insurance premi-
ums for a family with coverage through an employer averaged $10,979. The standard
deviation of the sample was $1,000.
a. Based on this sample information, develop a 90% confidence interval for the population
mean yearly premium.
b. How large a sample is needed to find the population mean within $250 at 99%
confidence?

57. Passenger comfort is influenced by the amount of pressurization in an airline cabin.
Higher pressurization permits a closer-to-normal environment and a more relaxed flight.
A study by an airline user group recorded the equivalent air pressure on 30 randomly
chosen flights. The study revealed a mean equivalent air pressure of 8,000 feet with a
standard deviation of 300 feet.
a. Develop a 99% confidence interval for the population mean equivalent air
pressure.
b. How large a sample is needed to find the population mean within 25 feet at 95%
confidence?


58. A survey of 25 randomly sampled judges employed by the state of Florida found that
they earned an average wage (including benefits) of $65.00 per hour. The sample stan-
dard deviation was $6.25 per hour.
a. What is the population mean? What is the best estimate of the population mean?
b. Develop a 99% confidence interval for the population mean wage (including benefits)
for these employees.
c. How large a sample is needed to assess the population mean with an allowable error
of $1.00 at 95% confidence?

59. Based on a sample of 50 U.S. citizens, the American Film Institute found that a typical
American spent 78 hours watching movies last year. The standard deviation of this sam-
ple was 9 hours.
a. Develop a 95% confidence interval for the population mean number of hours spent
watching movies last year.
b. How large a sample should be used to be 90% confident the sample mean is within
1.0 hour of the population mean?

60. Dylan Jones kept careful records of the fuel efficiency of his new car. After the first nine
times he filled up the tank, he found the mean was 23.4 miles per gallon (mpg) with a
sample standard deviation of 0.9 mpg.
a. Compute the 95% confidence interval for his mpg.
b. How many times should he fill his gas tank to obtain a margin of error below 0.1 mpg?

61. A survey of 36 randomly selected iPhone owners showed that the purchase price has a
mean of $650 with a sample standard deviation of $24.
a. Compute the standard error of the sample mean.
b. Compute the 95% confidence interval for the mean.
c. How large a sample is needed to estimate the population mean within $10?

62. You plan to conduct a survey to find what proportion of the workforce has two or more
jobs. You decide on the 95% confidence level and a margin of error of 2%. A pilot survey
reveals that 5 of the 50 sampled hold two or more jobs. How many in the workforce
should be interviewed to meet your requirements?

63. A study conducted several years ago reported that 21 percent of public accountants changed
companies within 3 years. The American Institute of CPA’s would like to update the study.
They would like to estimate the population proportion of public accountants who changed
companies within 3 years with a margin of error of 3% and a 95% level of confidence.
a. To update this study, the files of how many public accountants should be studied?
b. How many public accountants should be contacted if no previous estimates of the
population proportion are available?

64. As part of an annual review of its accounts, a discount brokerage selected a random
sample of 36 customers and reviewed the value of their accounts. The mean was
$32,000 with a sample standard deviation of $8,200. What is a 90% confidence interval
for the mean account value of the population of customers?

65. The National Weight Control Registry tries to mine secrets of success from people who
lost at least 30 pounds and kept it off for at least a year. It reports that out of 2,700 reg-
istrants, 459 were on a low-carbohydrate diet (less than 90 grams a day).
a. Develop a 95% confidence interval for the proportion of people on a low-carbohydrate
diet.
b. Is it possible that the population percentage is 18%?
c. How large a sample is needed to estimate the proportion within 0.5%?

66. Near the time of an election, a cable news service performs an opinion poll of 1,000 prob-
able voters. It shows that the Republican contender has an advantage of 52% to 48%.
a. Develop a 95% confidence interval for the proportion favoring the Republican candidate.
b. Estimate the probability that the Democratic candidate is actually leading.
c. Repeat the above analysis based on a sample of 3,000 probable voters.

67. A sample of 352 subscribers to Wired magazine shows the mean time spent using the
Internet is 13.4 hours per week, with a sample standard deviation of 6.8 hours. Find the
95% confidence interval for the mean time Wired subscribers spend on the Internet.

68. The Tennessee Tourism Institute (TTI) plans to sample information center visitors
entering the state to learn the fraction of visitors who plan to camp in the state. Current
estimates are that 35% of visitors are campers. How many visitors would you sample
to estimate the population proportion of campers with a 95% confidence level and an
allowable error of 2%?


D A T A A N A L Y T I C S

69. Refer to the North Valley Real Estate data, which reports information on homes
sold in the area during the last year. Select a random sample of twenty homes.
a. Based on your random sample of twenty homes, develop a 95% confidence interval
for the mean selling price of the homes.
b. Based on your random sample of twenty homes, develop a 95% confidence interval
for the mean days on the market.
c. Based on your random sample of twenty homes, develop a 95% confidence interval
for the proportion of homes with a pool.
d. Suppose that North Valley Real Estate employs several agents. Each agent will be
randomly assigned twenty homes to sell. The agents are highly motivated to sell
homes based on the commissions they earn. They are also concerned about the
twenty homes they are assigned to sell. Using the confidence intervals you created,
write a general memo informing the agents about the characteristics of the homes
they may be assigned to sell.

e. What would you do if your confidence intervals did not include the mean of all 105
homes? How could this happen?

70. Refer to the Baseball 2016 data, which report information on the 30 Major League
Baseball teams for the 2016 season. Assume the 2016 data represents a sample.
a. Develop a 95% confidence interval for the mean number of home runs per team.
b. Develop a 95% confidence interval for the mean batting average by each team.
c. Develop a 95% confidence interval for the mean earned run average (ERA) for each team.

71. Refer to the Lincolnville School District bus data.
a. Develop a 95% confidence interval for the mean bus maintenance cost.
b. Develop a 95% confidence interval for the mean bus odometer miles.
c. Write a business memo to the state transportation official to report your results.

A REVIEW OF CHAPTERS 8–9
We began Chapter 8 by describing the reasons sampling is necessary. We sample because it is often impossible to study
every item, or individual, in some populations. For example, to contact all U.S. bank officers and record their annual incomes
would be too expensive and time-consuming. Also, sampling often destroys the product. A drug manufacturer cannot test
the properties of each vitamin tablet manufactured because there would be none left to sell. Therefore, to estimate a popu-
lation parameter, we select a sample from the population. A sample is a part of the population. Care must be taken to ensure
that every member of our population has a chance of being selected; otherwise, the conclusions might be biased. A number
of probability-type sampling methods can be used, including simple random, systematic, stratified, and cluster sampling.

Regardless of the sampling method selected, a sample statistic is seldom equal to the corresponding population parame-
ter. For example, the mean of a sample is seldom exactly the same as the mean of the population. The difference between
this sample statistic and the population parameter is the sampling error.

In Chapter 8, we demonstrated that, if we select all possible samples of a specified size from a population and calculate the mean
of these samples, the result will be exactly equal to the population mean. We also showed that the dispersion in the distribution
of the sample means is equal to the population standard deviation divided by the square root of the sample size. This result is
called the standard error of the mean. There is less dispersion in the distribution of the sample means than in the population. In
addition, as we increase the number of observations in each sample, we decrease the variation in the sampling distribution.
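
These two results can be verified by brute force on a tiny population. The sketch below uses a hypothetical four-value population and enumerates every possible sample of size 2, once with replacement and once without. Without replacement, the dispersion of the sample means equals the standard error multiplied by the finite-population correction factor.

```python
import math
from itertools import combinations, product
from statistics import mean, pstdev

population = [2, 4, 6, 8]  # hypothetical values, chosen only for illustration
n = 2                      # sample size
N = len(population)
mu = mean(population)
sigma = pstdev(population)  # population standard deviation

# Every possible ordered sample of size n, drawn with replacement.
means_with = [mean(s) for s in product(population, repeat=n)]
# Every possible sample of size n, drawn without replacement.
means_without = [mean(s) for s in combinations(population, n)]

fpc = math.sqrt((N - n) / (N - 1))
print(mean(means_with), mean(means_without), mu)            # all three agree
print(pstdev(means_with), sigma / math.sqrt(n))             # standard error
print(pstdev(means_without), sigma / math.sqrt(n) * fpc)    # with correction factor
```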

The central limit theorem is the foundation of statistical inference. It states that, if the population from which we select the
samples follows the normal probability distribution, the distribution of the sample means will also follow the normal distribution. If the
population is not normal, the distribution of the sample means will approach the normal probability distribution as we increase the size of the sample.

Our focus in Chapter 9 was point estimates and interval estimates. A point estimate is a single value used to estimate a
population parameter. An interval estimate is a range of values within which we expect the population parameter to occur.
For example, based on a sample, we estimate that the mean annual income of all professional house painters in Atlanta,
Georgia (the population), is $45,300. That estimate is called a point estimate. If we state that the population mean is prob-
ably in the interval between $45,200 and $45,400, that estimate is called an interval estimate. The two endpoints
($45,200 and $45,400) are the confidence limits for the population mean. We also described procedures for establishing
a confidence interval for a population mean when the population standard deviation is not known and for a population
proportion. In this chapter, we also provided a method to determine the necessary sample size based on the dispersion in
the population, the level of confidence desired, and the desired precision of the estimate or margin of error.


P R O B L E M S

1. A recent study indicated that women took an average of 8.6 weeks of unpaid leave from
their jobs after the birth of a child. Assume that this distribution follows the normal prob-
ability distribution with a standard deviation of 2.0 weeks. We select a sample of 35
women who recently returned to work after the birth of a child. What is the likelihood
that the mean of this sample is at least 8.8 weeks?

2. The manager of Tee Shirt Emporium reports that the mean number of shirts sold per
week is 1,210, with a standard deviation of 325. The distribution of sales follows the
normal distribution. What is the likelihood of selecting a sample of 25 weeks and finding
the sample mean to be 1,100 or less?

3. The owner of the Gulf Stream Café wished to estimate the mean number of lunch customers
per day. A sample of 40 days revealed a mean of 160 per day, with a standard deviation of
20 per day. Develop a 98% confidence interval for the mean number of customers per day.

4. The manager of the local Hamburger Express wishes to estimate the mean time custom-
ers spend at the drive-through window. A sample of 20 customers experienced a mean
waiting time of 2.65 minutes, with a standard deviation of 0.45 minute. Develop a 90%
confidence interval for the mean waiting time.

5. Defiance Tool and Die has 293 sales offices throughout the world. The VP of Sales is
studying the usage of its copy machines. A random sample of six of the sales offices
revealed the following number of copies made in each selected office last week.

826 931 1,126 918 1,011 1,101

Develop a 95% confidence interval for the mean number of copies per week.

6. John Kleman is the host of KXYZ Radio 55 AM drive-time news in Denver. During his
morning program, John asks listeners to call in and discuss current local and national
news. This morning, John was concerned with the number of hours children under 12
years of age watch TV per day. The last five callers reported that their children watched
the following number of hours of TV last night.

3.0 3.5 4.0 4.5 3.0

Would it be reasonable to develop a confidence interval from these data to show the
mean number of hours of TV watched? If yes, construct an appropriate confidence inter-
val and interpret the result. If no, why would a confidence interval not be appropriate?

7. Historically, Widgets Manufacturing Inc. produces 250 widgets per day. Recently the
new owner bought a new machine to produce more widgets per day. A sample of 16
days’ production revealed a mean of 240 units with a standard deviation of 35. Con-
struct a confidence interval for the mean number of widgets produced per day. Does it
seem reasonable to conclude that the mean daily widget production has changed? Jus-
tify your conclusion.

8. A manufacturer of cell phone batteries wants to estimate the useful life of its battery (in
thousands of hours). The estimate is to be within 0.10 (100 hours). Assume a 95% level
of confidence and that the standard deviation of the useful life of the battery is 0.90
(900 hours). Determine the required sample size.

9. The manager of a home improvement store wishes to estimate the mean amount of money
spent in the store. The estimate is to be within $4.00 with a 95% level of confidence. The
manager does not know the standard deviation of the amounts spent. However, he does
estimate that the range is from $5.00 up to $155.00. How large of a sample is needed?

10. In a sample of 200 residents of Georgetown County, 120 reported they believed the
county real estate taxes were too high. Develop a 95% confidence interval for the pro-
portion of residents who believe the tax rate is too high. Does it seem reasonable to
conclude that 50% of the voters believe that taxes are too high?

11. In recent times, the percent of buyers purchasing a new vehicle via the Internet has
been large enough that local automobile dealers are concerned about its impact on
their business. The information needed is an estimate of the proportion of purchases via
the Internet. How large of a sample of purchasers is necessary for the estimate to be
within 2 percentage points with a 98% level of confidence? Current thinking is that
about 8% of the vehicles are purchased via the Internet.

12. Historically, the proportion of adults over the age of 24 who smoke has been .30. In recent
years, much information has been published and aired on radio and TV that smoking is not
good for one’s health. A sample of 500 adults revealed only 25% of those sampled smoked.
Develop a 98% confidence interval for the proportion of adults who currently smoke. Does it
seem reasonable to conclude that the proportion of adults who smoke has changed?

13. The auditor of the state of Ohio needs an estimate of the proportion of residents who
regularly play the state lottery. Historically, about 40% regularly play, but the auditor
would like some current information. How large a sample is necessary for the estimate
to be within 3 percentage points, with a 98% level of confidence?

CASES

Century National Bank
Refer to the description of Century National Bank at the
end of the Review of Chapters 1–4 on page 129. When Mr.
Selig took over as president of Century several years ago,
the use of debit cards was just beginning. He would like an
update on the use of these cards. Develop a 95% confi-
dence interval for the proportion of customers using these
cards. On the basis of the confidence interval, is it reason-
able to conclude that more than half of the customers use
a debit card? Write a brief report interpreting the results.

PRACTICE TEST

Part 1—Objective
1. If each item in the population has the same chance of being selected, this is called a ________.
2. The difference between the population mean and the sample mean is called the ________.
3. The ________ is the standard deviation of the distribution of sample means.
4. If the sample size is increased, the variance of the sample means will ________.
   (become smaller, become larger, not change)
5. A single value used to estimate a population parameter is called a ________.
6. A range of values within which the population parameter is expected to occur is
   called a ________.
7. Which of the following does not affect the width of a confidence interval?
   (sample size, variation in the population, level of confidence, size of population)
8. The fraction of a population that has a particular characteristic is called a ________.
9. Which of the following is not a characteristic of the t distribution? (positively skewed,
   continuous, mean of zero, based on degrees of freedom)
10. To determine the required sample size of a proportion when no estimate of the
    population proportion is available, what value is used?

Part 2—Problems
1. Americans spend an average (mean) of 12.2 minutes (per day) in the shower. The distribution of times follows the
normal distribution with a population standard deviation of 2.3 minutes. What is the likelihood that the mean time per
day for a sample of 12 Americans was 11 minutes or less?

2. A recent study of 26 Conway, South Carolina, residents revealed they had lived at their current address an average
of 9.3 years. The standard deviation of the sample was 2 years.
a. What is the population mean?
b. What is the best estimate of the population mean?
c. What is the standard error of estimate?
d. Develop a 90% confidence interval for the population mean.

3. A recent federal report indicated that 27% of children ages 2 to 5 ate a vegetable at least five times a week. How
large a sample is needed to estimate the true population proportion within 2% with a 98% level of confidence? Be
sure to use the information contained in the federal report.

4. The Philadelphia Area Transit Authority wishes to estimate the proportion of central city workers that use public trans-
portation to get to work. A sample of 100 workers revealed that 64 used public transportation. Develop a 95% confi-
dence interval for the population proportion.

LEARNING OBJECTIVES
When you have completed this chapter, you will be able to:

LO10-1 Explain the process of testing a hypothesis.

LO10-2 Apply the six-step procedure for testing a hypothesis.

LO10-3 Distinguish between a one-tailed and a two-tailed test of hypothesis.

LO10-4 Conduct a test of a hypothesis about a population mean.

LO10-5 Compute and interpret a p-value.

LO10-6 Use a t statistic to test a hypothesis.

LO10-7 Compute the probability of a Type II error.

10 One-Sample Tests of Hypothesis

DOLE PINEAPPLE INC. is concerned that the 16-ounce can of sliced pineapple is being
overfilled. Assume the standard deviation of the process is .03 ounce. The quality control
department took a random sample of 50 cans and found that the arithmetic mean weight
was 16.05 ounces. At the 5% level of significance, can we conclude that the mean weight
is greater than 16 ounces? Determine the p-value. (See Exercise 26 and LO10-4.)

ONE-SAMPLE TESTS OF HYPOTHESIS 319

INTRODUCTION
Chapter 8 began our study of sampling and statistical inference. We described how we
could select a random sample to estimate the value of a population parameter. For ex-
ample, we selected a sample of five employees at Spence Sprockets, found the number
of years of service for each sampled employee, computed the mean years of service,
and used the sample mean to estimate the mean years of service for all employees. In
other words, we estimated a population parameter from a sample statistic.

Chapter 9 continued the study of statistical inference by developing a confidence
interval. A confidence interval is a range of values within which we expect the popula-
tion parameter to occur. In this chapter, rather than develop a range of values within
which we expect the population parameter to occur, we develop a procedure to test the
validity of a statement about a population parameter. Some examples of statements we
might want to test are:

• The mean speed of automobiles passing milepost 150 on the West Virginia Turnpike is 68
miles per hour.

• The mean number of miles driven by those leasing a Chevy TrailBlazer for 3 years is
32,000 miles.

• The mean time an American family lives in a particular single-family dwelling is
11.8 years.

• In 2016, the mean starting salary for a graduate from a four-year business program is $51,541.

• According to the Kelley Blue Book (www.kbb.com), a 2017 Ford Edge averages 21
miles per gallon in the city.

• The mean cost to remodel a kitchen is $20,000.

This chapter and several of the following chapters cover statistical hypothesis test-
ing. We begin by defining what we mean by a statistical hypothesis and statistical hy-
pothesis testing. Next, we outline the steps in statistical hypothesis testing. Then we
conduct tests of hypothesis for means. In the last section of the chapter, we describe
possible errors due to sampling in hypothesis testing.

WHAT IS HYPOTHESIS TESTING?
The terms hypothesis testing and testing a hypothesis are used interchangeably. Hypoth-
esis testing starts with a statement, or assumption, about a population parameter—such
as the population mean. This statement is referred to as a hypothesis.

HYPOTHESIS A statement about a population parameter subject to verification.

LO10-1
Explain the process of testing a hypothesis.

A hypothesis might be that the mean monthly commission of sales associates in
retail electronics stores, such as hhgregg, is $2,000. We cannot contact all hhgregg
sales associates to determine that the mean is $2,000. The cost of locating and
interviewing every hhgregg electronics sales associate in the United States would be
exorbitant. To test the validity of the hypothesis (μ = $2,000), we must select a sample from
the population of all hhgregg electronics sales associates, calculate sample statistics,
and based on certain decision rules reject or fail to reject the hypothesis. A sample
mean of $1,000 per month is much less than $2,000 per month and we would most
likely reject the hypothesis. However, suppose the sample mean is $1,995. Can we at-
tribute the $5 difference between $1,995 and $2,000 to sampling error? Or is this dif-
ference of $5 statistically significant?

HYPOTHESIS TESTING A procedure based on sample evidence and probability
theory to determine whether the hypothesis is a reasonable statement.

SIX-STEP PROCEDURE FOR TESTING
A HYPOTHESIS
There is a six-step procedure that systematizes hypothesis testing; when we get to
step 6 we are ready to interpret the results of the test based on the decision to reject or
not reject the hypothesis. However, hypothesis testing as used by statisticians does not
provide proof that something is true, in the manner in which a mathematician “proves”
a statement. It does provide a kind of “proof beyond a reasonable doubt,” in the man-
ner of the court system. Hence, there are specific rules of evidence, or procedures, that
are followed. The steps are shown in the following diagram. We will discuss in detail
each of the steps.

LO10-2
Apply the six-step
procedure for testing a
hypothesis.

Step 1: State null and alternate hypotheses
Step 2: Select a level of significance
Step 3: Identify the test statistic
Step 4: Formulate a decision rule
Step 5: Take a sample, arrive at decision
Step 6: Interpret the result

Step 1: State the Null Hypothesis (H0) and the Alternate
Hypothesis (H1)
The first step is to state the hypothesis being tested. It is called the null hypothesis,
designated H0, and read “H sub zero.” The capital letter H stands for hypothesis, and
the subscript zero implies “no difference.” There is usually a “not” or a “no” term in the
null hypothesis, meaning that there is “no change.” For example, the null hypothesis is
that the mean number of miles driven on the steel-belted tire is not different from
60,000. The null hypothesis would be written H0: μ = 60,000. Generally speaking, the
null hypothesis is developed for the purpose of testing. We either reject or fail to reject
the null hypothesis. The null hypothesis is a statement that is not rejected unless our
sample data provide convincing evidence that it is false.

We should emphasize that, if the null hypothesis is not rejected on the basis of the
sample data, we cannot say that the null hypothesis is true. To put it another way, failing
to reject the null hypothesis does not prove that H0 is true; it means we have failed to
disprove H0. To prove without any doubt the null hypothesis is true, the population pa-
rameter would have to be known. To actually determine it, we would have to test, sur-
vey, or count every item in the population. This is usually not feasible. The alternative is
to take a sample from the population.

Often, the null hypothesis begins by stating, “There is no significant difference
between . . .” or “The mean impact strength of the glass is not significantly different
from. . . .” When we select a sample from a population, the sample statistic is usually
numerically different from the hypothesized population parameter. As an illustration,
suppose the hypothesized impact strength of a glass plate is 70 psi, and the mean im-
pact strength of a sample of 12 glass plates is 69.5 psi. We must make a decision about
the difference of 0.5 psi. Is it a true difference, that is, a significant difference, or is the
difference between the sample statistic (69.5) and the hypothesized population param-
eter (70.0) due to chance (sampling)? To answer this question, we conduct a test of
significance, commonly referred to as a test of hypothesis. To define what is meant by a
null hypothesis:

NULL HYPOTHESIS A statement about the value of a population parameter
developed for the purpose of testing numerical evidence.

ALTERNATE HYPOTHESIS A statement that is accepted if the sample data provide
sufficient evidence that the null hypothesis is false.

LEVEL OF SIGNIFICANCE The probability of rejecting the null hypothesis when
it is true.

The alternate hypothesis describes what you will conclude if you reject the null
hypothesis. It is written H1 and is read “H sub one.” It is also referred to as the research
hypothesis. The alternate hypothesis is accepted if the sample data provide us with
enough statistical evidence that the null hypothesis is false.

The following example will help clarify what is meant by the null hypothesis and the
alternate hypothesis. A recent article indicated the mean age of U.S. commercial aircraft
is 15 years. To conduct a statistical test regarding this statement, the first step is to
determine the null and the alternate hypotheses. The null hypothesis represents the
current or reported condition. It is written H0: μ = 15. The alternate hypothesis is that
the statement is not true, that is, H1: μ ≠ 15. It is important to remember that no matter
how the problem is stated, the null hypothesis will always contain the equal sign. The
equal sign (=) will never appear in the alternate hypothesis. Why? Because the null
hypothesis is the statement being tested, and we need a specific value to include in our
calculations. We turn to the alternate hypothesis only if the data suggest the null hypoth-
esis is untrue.

Step 2: Select a Level of Significance
After setting up the null hypothesis and alternate hypothesis, the next step is to state
the level of significance.

The level of significance is designated α, the Greek letter alpha. It is also sometimes
called the level of risk. This may be a more appropriate term because it is the risk you
take of rejecting the null hypothesis when it is really true.

There is no one level of significance that is applied to all tests. A decision is made
to use the .05 level (often stated as the 5% level), the .01 level, the .10 level, or any
other level between 0 and 1. Traditionally, the .05 level is selected for consumer re-
search projects, .01 for quality assurance, and .10 for political polling. You, the re-
searcher, must decide on the level of significance before formulating a decision rule and
collecting sample data.
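One way to see what the level of significance means is to simulate many samples from a population in which the null hypothesis really is true and count how often a test at α = .05 rejects it. The sketch below is only an illustration: the tire-mileage null value (60,000) comes from Step 1, but the standard deviation, sample size, number of trials, and random seed are assumptions chosen for the demonstration, and the z statistic it uses is the one introduced in Step 3.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(mu0=60_000, sigma=5_000, n=25, alpha=0.05,
                      trials=10_000, seed=1):
    """Estimate how often a TRUE H0: mu = mu0 is rejected (H1: mu != mu0)."""
    rng = random.Random(seed)                      # fixed seed: repeatable run
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # alpha split over two tails
    std_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Every sample is drawn from a population where H0 holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / std_error
        if abs(z) > z_crit:
            rejections += 1                        # a Type I error
    return rejections / trials

rate = type_i_error_rate()
print(f"Share of true null hypotheses rejected: {rate:.3f}")
```

The printed share should land close to the chosen α, which is the sense in which α is "the risk you take of rejecting the null hypothesis when it is really true."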

To illustrate how it is possible to reject a true hypothesis, suppose
a firm manufacturing personal computers uses a large number of
printed circuit boards. Suppliers bid on the boards, and the one with the
lowest bid is awarded a sizable contract. Suppose the contract speci-
fies that the computer manufacturer’s quality-assurance department
will randomly sample all incoming shipments of circuit boards. If more
than 6% of the boards sampled are substandard, the shipment will be
rejected. The null hypothesis is that the incoming shipment of boards
meets the quality standards of the contract and contains 6% or less
defective boards. The alternate hypothesis is that more than 6% of the
boards are defective.

A shipment of 4,000 circuit boards was received from Allied
Electronics, and the quality assurance department selected a random
sample of 50 circuit boards for testing. Of the 50 circuit boards
sampled, 4 boards, or 8%, were substandard. The shipment was rejected because it
exceeded the maximum of 6% substandard printed circuit boards. If the shipment
was actually substandard, then the decision to return the boards to the supplier
was correct.

However, because of sampling error, there is a small probability of an incorrect decision.
Suppose there were only 40, or 1%, defective boards in the shipment (well under
the 6% threshold) and 4 of these 40 were randomly selected in the sample of 50. The
sample evidence indicates that the percentage of defective boards is 8% (4 out of 50 is
8%) so we reject the shipment. But, in fact, of the 4,000 boards, there are only 40 defec-
tive units. The true defect rate is 1.00%. In this instance our sample evidence estimates
8% defective but there is only 1% defective in the population. Based on the sample evi-
dence, an incorrect decision was made. In terms of hypothesis testing, we rejected the
null hypothesis when we should have failed to reject the null hypothesis. By rejecting a
true null hypothesis, we committed a Type I error. The probability of committing a Type
I error is represented by the Greek letter alpha (α).

TYPE I ERROR Rejecting the null hypothesis, H0, when it is true.

TYPE II ERROR Not rejecting the null hypothesis when it is false.

The other possible error in hypothesis testing is called Type II error. The probability
of committing a Type II error is designated by the Greek letter beta (β).

The firm manufacturing personal computers would commit a Type II error if, un-
known to the manufacturer, an incoming shipment of printed circuit boards from Allied
Electronics contained 15% substandard boards, yet the shipment was accepted. How
could this happen? A random sample of 50 boards could have 2 (4%) substandard
boards, and 48 good boards. According to the stated procedure, because the sample
contained less than 6% substandard boards, the decision is to accept the shipment. This
is a Type II error. While this event is extremely unlikely, it is possible based on the
process of randomly sampling from a population. In a later section, we show how to
calculate the probability of a Type II error.
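The two circuit-board scenarios can be checked numerically. The sketch below assumes the boards are drawn without replacement from the 4,000-board shipment, so the number of substandard boards in the sample follows a hypergeometric distribution; that modeling choice, and the function names, are assumptions for illustration rather than the method developed later in the chapter.

```python
from math import comb

def prob_defectives(k, lot_size, lot_defective, sample_size):
    """Hypergeometric P(exactly k substandard boards in the sample)."""
    return (comb(lot_defective, k)
            * comb(lot_size - lot_defective, sample_size - k)
            / comb(lot_size, sample_size))

def prob_reject_shipment(lot_size, lot_defective, sample_size=50, max_pct=6):
    """P(more than max_pct percent of the sampled boards are substandard)."""
    # Counts that keep the sample rate at or below the limit (0 to 3 of 50).
    accept_counts = [k for k in range(sample_size + 1)
                     if 100 * k <= max_pct * sample_size]
    p_accept = sum(prob_defectives(k, lot_size, lot_defective, sample_size)
                   for k in accept_counts)
    return 1 - p_accept

# Good shipment (40 of 4,000 substandard) that is rejected anyway: Type I error.
p_type_i = prob_reject_shipment(lot_size=4_000, lot_defective=40)
# Bad shipment (15%, i.e. 600 of 4,000) that is accepted anyway: Type II error.
p_type_ii = 1 - prob_reject_shipment(lot_size=4_000, lot_defective=600)

print(f"P(reject a shipment with 1% substandard)  = {p_type_i:.4f}")
print(f"P(accept a shipment with 15% substandard) = {p_type_ii:.4f}")
```

Both probabilities are small but not zero, which is the point of the example: either error can occur purely through the luck of the draw.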

In retrospect, the researcher cannot study every item or individual in the popula-
tion. Thus, there is a possibility of two types of error—a Type I error, wherein the null
hypothesis is rejected when it should not be rejected, and a Type II error, wherein the
null hypothesis is not rejected when it should have been rejected.

We often refer to the probability of these two possible errors as alpha, α, and
beta, β. Alpha (α) is the probability of making a Type I error, and beta (β) is the probability
of making a Type II error. The following table summarizes the decisions the researcher
could make and the possible consequences.

                    Researcher
Null Hypothesis     Does Not Reject H0    Rejects H0
H0 is true          Correct decision      Type I error
H0 is false         Type II error         Correct decision

Step 3: Select the Test Statistic
There are many test statistics. In this chapter, we use both z and t as the test statistics.
In later chapters, we will use such test statistics as F and χ², called chi-square.

TEST STATISTIC A value, determined from sample information, used to determine
whether to reject the null hypothesis.

In hypothesis testing for the mean (μ) when σ is known, the test statistic z is com-
puted by:

TESTING A MEAN, σ KNOWN   $z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$   (10–1)

The z value is based on the sampling distribution of $\bar{x}$, which follows the normal
distribution with a mean ($\mu_{\bar{x}}$) equal to μ and a standard deviation $\sigma_{\bar{x}}$, which is equal to $\sigma / \sqrt{n}$.
We can thus determine whether the difference between $\bar{x}$ and μ is statistically significant
by finding the number of standard deviations $\bar{x}$ is from μ, using formula (10–1).
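Formula (10–1) translates directly into a few lines of code. The example reuses the sales-commission hypothesis (μ = $2,000) and the $1,995 sample mean mentioned earlier; the population standard deviation of $100 and the sample size of 64 are hypothetical values added only so the arithmetic can be carried out.

```python
from math import sqrt

def z_statistic(sample_mean, hypothesized_mean, sigma, n):
    """Formula (10-1): distance of x-bar from mu in standard errors."""
    standard_error = sigma / sqrt(n)   # standard deviation of the sample mean
    return (sample_mean - hypothesized_mean) / standard_error

# sigma=100 and n=64 are assumed values, not taken from the text.
z = z_statistic(sample_mean=1_995, hypothesized_mean=2_000, sigma=100, n=64)
print(f"z = {z:.2f}")
```

Under these assumed values the sample mean lies less than half a standard error below the hypothesized mean.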

Step 4: Formulate the Decision Rule
A decision rule is a statement of the specific conditions under which the null hypothesis
is rejected and the conditions under which it is not rejected. The region or area of rejec-
tion defines the location of all those values that are so large or so small that the proba-
bility of their occurrence under a true null hypothesis is rather remote.

Chart 10–1 portrays the rejection region for a test of significance that will be con-
ducted later in the chapter.

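[Chart: normal curve on the scale of z, centered at 0. The "Do not reject H0" region lies to
the left of the critical value 1.645 (probability = .95); the region of rejection lies to
the right of it (probability = .05).]

CHART 10–1 Sampling Distribution of the Statistic z, a Right-Tailed Test, .05 Level of Significance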

Note in the chart that:

• The area where the null hypothesis is not rejected is to the left of 1.645. We will
explain how to get the 1.645 value shortly.
• The area of rejection is to the right of 1.645.
• A one-tailed test is being applied. (This will also be explained later.)
• The .05 level of significance was chosen.
• The sampling distribution of the statistic z follows the normal probability
distribution.
• The value 1.645 separates the regions where the null hypothesis is rejected and
where it is not rejected.
• The value 1.645 is the critical value.

CRITICAL VALUE The dividing point between the region where the null hypothesis
is rejected and the region where it is not rejected.
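For a right-tailed z test, the critical value is the point on the standard normal scale with area 1 − α to its left. The sketch below looks that point up with Python's standard library instead of a printed z table; the helper name is an illustrative choice, and the three α levels are the ones mentioned in Step 2.

```python
from statistics import NormalDist

def right_tail_critical_value(alpha):
    """z value that leaves an area of alpha in the upper tail."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.inv_cdf(1 - alpha)

critical_values = {alpha: right_tail_critical_value(alpha)
                   for alpha in (0.10, 0.05, 0.01)}
for alpha, value in critical_values.items():
    print(f"alpha = {alpha:.2f} -> critical z = {value:.3f}")
```

At α = .05 the result rounds to the 1.645 shown in Chart 10–1; a smaller α moves the critical value farther into the tail.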

Step 5: Make a Decision
The fifth step in hypothesis testing is to compute the value of the test statistic, compare
its value to the critical value, and make a decision to reject or not to reject the null
hypothesis. Referring to Chart 10–1, if, based on sample information, z is computed to
be 2.34, the null hypothesis is rejected at the .05 level of significance. The decision to
reject H0 was made because 2.34 lies in the region of rejection, that is, beyond 1.645.
We reject the null hypothesis, reasoning that it is highly improbable that a computed z
value this large is due to sampling error (chance).

Had the computed value been 1.645 or less, say 0.71, the null hypothesis would not be
rejected. It is reasoned that such a small computed value could be attributed to chance,
that is, sampling error. As we have emphasized, only one of two decisions is possible in
hypothesis testing—either reject or do not reject the null hypothesis.

However, because the decision is based on a sample, it is always possible to make
either of two decision errors. It is possible to make a Type I error when the null hypoth-
esis is rejected when it should not be rejected. Or it is also possible to make a Type II
error when the null hypothesis is not rejected and it should have been rejected. Fortu-
nately, we select the probability of making a Type I error, α (alpha), and we can compute
the probabilities associated with a Type II error, β (beta).
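The comparison in Step 5 amounts to a one-line rule. This sketch applies the right-tailed rule from Chart 10–1 to the two computed z values discussed above, plus the boundary case; the function name and the returned strings are illustrative.

```python
def decide_right_tailed(z, critical_value=1.645):
    """Reject H0 only when z falls beyond the critical value."""
    return "reject H0" if z > critical_value else "do not reject H0"

decisions = {z: decide_right_tailed(z) for z in (2.34, 0.71, 1.645)}
for z, decision in decisions.items():
    print(f"z = {z}: {decision}")
```

A computed value exactly equal to 1.645 is treated as "do not reject," matching the statement that a value of "1.645 or less" does not lead to rejection.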

STATISTICS IN ACTION

During World War II, allied military planners needed estimates of the number of German
tanks. The information provided by traditional spying methods was not reliable, but
statistical methods proved to be valuable. For example, espionage and reconnaissance led
analysts to estimate that 1,550 tanks were produced during June 1941. However, using the
serial numbers of captured tanks and statistical analysis, military planners estimated
that only 244 tanks were produced. The actual number produced, as determined from German
production records, was 271. The estimate using statistical analysis turned out to be
much more accurate. A similar type of analysis was used to estimate the number of Iraqi
tanks destroyed during Desert Storm.

Step 6: Interpret the Result
The final step in the hypothesis testing procedure is to interpret the results. The process
does not end with the value of a sample statistic or the decision to reject or not reject
the null hypothesis. What can we say or report based on the results of the statistical
test? Here are two examples:

• An investigative reporter for a Colorado newspaper reports that the mean monthly
income of convenience stores in the state is $130,000. You decide to conduct
a test of hypothesis to verify the report. The null hypothesis and the alternate
hypothesis are:

H0: μ = $130,000

H1: μ ≠ $130,000

A sample of convenience stores provides a sample mean and standard deviation,
and you compute a z statistic. The hypothesis test results in a decision
to not reject the null hypothesis. How do you interpret the result? Be cautious with
your interpretation because by not rejecting the null hypothesis, you did not prove
the null hypothesis to be true. Based on the sample data, the difference between
the sample mean and hypothesized population mean was not large enough to reject
the null hypothesis.

• In a recent speech to students, the dean of the College of Business reported that
the mean credit card debt for college students is $3,000. You decide to conduct a
test of the dean’s statement or hypothesis to investigate the statement’s truth. The
null hypothesis and the alternate hypothesis are:

H0: μ = $3,000

H1: μ ≠ $3,000

A random sample of college students provides a sample mean and standard devia-
tion, and you compute a z statistic. The hypothesis test results in a decision to reject
the null hypothesis. How do you interpret the result? The sample evidence does not
support the dean’s statement. Based on the sample data, the mean amount of stu-
dent credit card debt is different from $3,000. You have disproved the null hypothesis
with a stated probability of a Type I error, α. That is, there is a small probability that the
decision to reject the null hypothesis was an error due to random sampling.

SUMMARY OF THE STEPS IN HYPOTHESIS TESTING

1. Establish the null hypothesis (H0) and the alternate hypothesis (H1).
2. Select the level of significance, that is, α.
3. Select an appropriate test statistic.
4. Formulate a decision rule based on steps 1, 2, and 3 above.
5. Make a decision regarding the null hypothesis based on the sample information.
6. Interpret the results of the test.

Before actually conducting a test of hypothesis, we describe the difference
between a one-tailed and a two-tailed hypothesis test.

ONE-TAILED AND TWO-TAILED
HYPOTHESIS TESTS
Refer to Chart 10–1. It shows a one-tailed test. It is called a one-tailed test because the
rejection region is only in one tail of the curve. In this case, it is in the right, or upper, tail
of the curve. To illustrate, suppose that the packaging department at General Foods
Corporation is concerned that some boxes of Grape Nuts are significantly overweight.
The cereal is packaged in 453-gram boxes, so the null hypothesis is H0: μ ≤ 453. This is
read, “the population mean (μ) is equal to or less than 453.” The alternate hypothesis is,
therefore, H1: μ > 453. This is read, “μ is greater than 453.” Note that the inequality
sign in the alternate hypothesis (>) points to the region of rejection in the upper tail.
(See Chart 10–1.) Also observe that the null hypothesis includes the equal sign. That is,
H0: μ ≤ 453. The equality condition always appears in H0, never in H1.
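The link between the inequality sign in H1 and the location of the rejection region can be sketched in code. The right-tailed branch corresponds to the Grape Nuts example (H1: μ > 453); the left-tailed and two-tailed branches are the standard counterparts, included here for comparison. The z values of 1.80 and −1.80 are hypothetical, since the text gives no sample data for this example.

```python
from statistics import NormalDist

def in_rejection_region(z, alpha, alternate):
    """True if z falls in the rejection region implied by the sign in H1."""
    std_normal = NormalDist()
    if alternate == ">":     # right-tailed: all of alpha in the upper tail
        return z > std_normal.inv_cdf(1 - alpha)
    if alternate == "<":     # left-tailed: all of alpha in the lower tail
        return z < std_normal.inv_cdf(alpha)
    if alternate == "!=":    # two-tailed: alpha / 2 in each tail
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)
    raise ValueError("alternate must be '>', '<', or '!='")

# Hypothetical computed z values, tested at the .05 level of significance.
print(in_rejection_region(1.80, 0.05, ">"))    # upper-tail test
print(in_rejection_region(1.80, 0.05, "!="))   # same z, two-tailed test
print(in_rejection_region(-1.80, 0.05, "<"))   # lower-tail test
```

The same computed z can fall inside the rejection region of a one-tailed test yet outside that of a two-tailed test at the same α, because the two-tailed test places only half of α in each tail.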

LO10-3
Distinguish between a one-tailed and a two-tailed test of hypothesis.

STATISTICS IN ACTION

LASIK is a 15-minute surgical procedure that uses a laser to reshape an eye’s cornea with
the goal of improving eyesight. Research shows that about 5% of all surgeries involve
complications such as glare, corneal haze, overcorrection or undercorrection of vision, and
loss of vision. In a statistical sense, the research tests a null hypothesis that the
surgery will not improve eyesight with the alternative hypothesis that the surgery will
improve eyesight. The sample data of LASIK surgery shows that 5% of all cases result in
complications. The 5% represents a Type I error rate. When a person decides to have the
surgery, he or she expects to reject the null hypothesis. In 5% of future cases, this
expectation will not be met. (Source: American Academy of Ophthalmology Journal,
Vol. 16, no. 43.)

Chart 10–2 portrays a situation where the rejection region is in the left (lower) tail of
the standard normal distribution. As an illustration, consider the problem of automobile
manufacturers, large automobile leasing companies, and other organizations that purchase
large quantities of tires. They want the tires to average, say, 60,000 miles of wear
under normal usage. They will, therefore, reject a shipment of tires if tests reveal that
the mean life of the tires is significantly below 60,000 miles. They gladly accept a shipment
if the mean life is greater than 60,000 miles! They are not concerned with this
possibility, however. They are concerned only if they have sample evidence to conclude
that the tires will average less than 60,000 miles of useful life. Thus, the test is set up to
satisfy the concern of the