(Ebook PDF) Mind On Statistics: Australian & New Zealand 2nd All Chapter Instant Download
MIND ON STATISTICS
• Chapters 6, 7, 10, 11, 5, with the choice of omitting some of the later sections of some chapters,
and then continuing to Chapters 8 and/or 9 in either order.
• The above could be varied by moving Chapter 5 to before Chapter 10 or after Chapters 8 and/or
9 if desired.
• Chapter 12 could be included after Chapter 6 in either the original sequence or in the sequences
above.
• Chapters 6 and 9, preceded by either (Sections 4.5 and 4.6), or (Sections 10.1–10.4 and Sections
11.1, 11.5–11.7), could be followed by Section 14.1.
• Section 14.2 could be preceded by Chapter 6, and either (Sections 4.5 and 4.6), or (Sections 10.1–
10.4 and Sections 11.1, 11.5–11.7), and Chapter 12.
It should be noted that introducing statistical inference via categorical data, with hypothesis
testing as in Chapter 5, and interval estimation as in Chapter 7, has proven remarkably successful
for all disciplines and quantitative inclinations or non-inclinations.
• Each chapter starts with a rich vignette of universal interest, which is revisited at the end of the
chapter as a case study illustrating the material of the chapter.
• The Keep in mind feature captures core concepts, results and procedures.
• The Mind your step feature provides cautions and pointers to common mistakes or
misunderstandings.
• The Signposts at the beginning of each chapter outline what will be covered by each section.
• An extensive Key terms glossary has been added to each chapter, to maximise student
understanding of the key terms as they appear within each chapter of the book.
• Sections 4.5 and 4.6 have been added as described above, and are also intended to optimise
flexibility.
• The number of available datasets has been considerably increased.
• As well as the addition of vignettes, examples have been updated to include new scenarios and
data.
• The Bringing minds together feature replaces ‘For discussion’.
In addition, all the text has been carefully examined and modified in the light of reviewers’
comments and to enhance reader-friendly exposition, accessibility of concepts and language, as
well as progression and flexibility. For example, the former Section 4.5 has (mostly) moved to
become Section 6.5. The alternative procedures in Chapter 7 are now mentioned only at the
chapter’s end. In Chapter 10, Section 10.1 discusses sample statistics generally, and non-
parametric procedures have their own section. More of the sections which provide background or
may be chosen to be omitted are moved further to the end of chapters. The large number of
exercises has been retained.
ACKNOWLEDGEMENTS
Thanks to all the colleagues in Australia, New Zealand, UK, South Africa, Canada, USA and
elsewhere internationally, who have used or commented on my materials and strategies, and with
whom I have shared many discussions on the never-ending challenges, fascinations and
richnesses of teaching, learning and students in statistics. Thanks to the Australian and New
Zealand statistical education communities for their initiatives and dedication in statistics
education. Thanks to all lecturing and tutoring staff who have worked with me at a number of
universities and who care passionately about helping students’ learning in statistics.
Thanks to Jessica Utts and Robert Heckard for the spirit of their book and the opportunity to
build on that spirit.
Thanks to all the reviewers and colleagues who have provided valuable feedback and comments –
both general and detailed – on the book. Your efforts and dedication are much appreciated.
Thanks to the Australian Learning and Teaching Council (ALTC) for the many and far-reaching
opportunities given to me by the award of a national Senior Teaching Fellowship, and the
subsequent support as an ongoing Fellow.
Thanks to the thousands of students who worked with commitment in their statistics courses,
and who developed statistical understanding, confidence and appreciation of the importance and
power of statistical thinking beyond their expectations. Their enthusiasm and teamwork in their
data investigations and problem-solving and their willingness to share their thoughts with me
have contributed to my ongoing learning in understanding students’ learning in statistics.
Thanks to the judges of the Australian Educational Publishing Awards for their comments in
awarding the first edition a joint winner in the 2011 Tertiary Education Awards – Teaching and
Learning category (Adaptation).
My sincere appreciation and gratitude to Fiona Hammond, Emily Spurr, Kylie McInnes,
Michaela Skelly, Greg Alford and the staff of Cengage, without whom this book could not, and
would not, have been written. Finally, my thanks and gratitude to my family for their ongoing
support, encouragement and patience, especially Bernie, Bryony and Jen.
The authors and Cengage Learning would also like to thank the following reviewers for their
incisive and helpful feedback:
Helen MacGillivray
Helen’s university teaching and curriculum design experience extends across many areas of
statistical sciences and their applications, across all levels of subjects, all class sizes and most
disciplines. Her work has received support through many national or university grants, including a
national (ALTC) leadership grant and one of the first national (ALTC) Senior Fellowships. She has
published widely, including textbooks, book chapters and more than 80 refereed, keynote or
invited papers, and delivered approximately 100 local, national or international presentations and
workshops on learning, teaching and assessment in statistics and quantitative learning support.
Helen was the first female President and the first female Honorary Life Member of the
Statistical Society of Australia Inc. (SSAI), and is now a Vice-President of the International
Statistical Institute. She is a past president of the International Association for Statistical
Education, a Fellow of the Royal Statistical Society, and has also been President of the Australian
Mathematical Sciences Council, Board member of the Federation of Australian Scientific and
Technological Sciences and a member of the Institutional Grants Committee of the ARC. She is
joint chair and editor of OZCOTS, the Australian Conference on Teaching Statistics, and member
of the International Programme Executives for both the 8th and 9th International Conferences on
Teaching Statistics. She has been a member of the organising or editorial committees for many
conferences, including World Statistics Conferences, Australian Statistics Conferences, Southern
Hemisphere Conferences on Undergraduate Mathematics and Statistics Teaching and Learning,
and Australasian Engineering Education Conferences. She has chaired reviews of university
departments and centres, and worked as a consultant on teaching statistics in Australian
universities, and with the Royal Statistical Society Centre in Statistical Education and the UK
Learning and Teaching Support Network for Mathematics, Statistics and OR. She is also currently
Australian representative on the editorial board of the journal Teaching Statistics.
Helen has played key roles in mathematics and statistics school education with the Queensland
Studies Authority, the Australian national curriculum and the Australian Mathematical Sciences
Institute’s TIMES project. This has included work on syllabus committees, state panels, core skills
scrutineering and as a statistical adviser in research and moderation. She has given many
professional development workshops for teachers, and a variety of successful extension and
enrichment programs in mathematics and statistics for high school students.
Jessica M Utts
Jessica Utts is a Professor of Statistics at the University of California at Irvine, previously at Davis
where she joined the faculty in 1978. She received her BA in Math and Psychology at SUNY
Binghamton, and her MA and PhD in Statistics at Penn State University. She is the author of Seeing
Through Statistics (3rd edition, 2005) and the co-author with Robert Heckard of Statistical Ideas and
Methods (1st edition, 2006) both published by Duxbury Press, an imprint of Cengage Learning. She is
also the Editor-in-Chief of CYBERSTATS, an interactive online introductory statistics course.
Jessica has been active in the statistics education community at the high school and college level.
She served as a member and then chaired the Advanced Placement Statistics Development
Committee for six years, and was a member of the American Statistical Association task force that
produced the GAISE (Guidelines for Assessment and Instruction in Statistics Education)
recommendations for Elementary Statistics courses. She is the recipient of the Academic Senate
Distinguished Teaching Award and the Magnar Ronning Award for Teaching Excellence, both at
the University of California at Davis. She is also a Fellow of the American Statistical Association,
the Institute of Mathematical Statistics and the American Association for the Advancement of
Science. Beyond statistics education, Jessica’s major contributions have been in applying statistics
to a variety of disciplines, most notably to parapsychology, the laboratory study of psychic
phenomena.
Robert F Heckard
Robert Heckard is a Senior Lecturer in Statistics at the Pennsylvania State University where he has
taught for over 30 years. He has taught introductory and intermediate applied statistics to more
than 15 000 college students. Bob has been awarded several grants to develop multimedia and
web-based instructional materials for teaching statistical concepts. He is the co-author of Statistical
Ideas and Methods (1st edition, 2006) and is a co-author of CYBERSTATS, a web-based introductory
course. As a consultant, he is active in the statistical analysis and design of highway safety
research and has frequently been a consultant in cancer treatment clinical trials.
RESOURCES GUIDE
As you read this text you will find a number of features to enhance your study of
statistics and help you understand its applications. We have added a number of new
features, based on your feedback, to help you navigate through the text and find what
is most important. These features are indicated by the ‘New to this edition!’ icon.
Chapter objectives give you a clear … and what you should be able to do after …
5
Investigating categorical variables and their relationships
• row and column percentages for two categorical variables and their interpretation in context
• the possible effects of other variables on the relationships between two categorical variables
• concepts and principles of testing statistical hypotheses, test statistics and the interpretation of p-values
SIGNPOSTS
Section 5.1 summarises key aspects of collecting, presenting and summarising data on categorical variables and poses
Section 5.2 looks at exploration and presentation of data from more than two categorical variables, and the importance of
considering the effect(s) of possible confounding or lurking variables, including the phenomenon known as
Simpson’s paradox.
Section 5.3 introduces the chi-square test for the statistical hypothesis specifying a set of proportions for one categorical
uses the test procedure of Section 5.3 to discuss p-values and the general principles of statistical hypothesis
Not only is this a voluntary online survey, but the total number of respondents is not given and there is no
disclaimer as in Example 7.1 above. On an episode (24 October 2012) of the Gruen Planet (ABC TV), the Guardian
was praised for its many ways of interacting with, and involving its readers online. How do you think the above
poll could be improved while retaining interaction with readers?
Relevant examples form the basis for discussion in each chapter and walk you step-by-step through real-life uses of statistical concepts.
EXAMPLE 7.8
‘Pension savings gender gap widens’
A report ‘Pension savings gender gap widens’ in the UK’s Guardian by Hilary Osborn, 22 October 2012
(http://www.guardian.co.uk/money/2012/oct/22/pension-savings-gender-gap-widens), includes the following:
The report, which was based on interviews with 5,200 adults, found the number of women saving
nothing at all increased year on year from 23% in 2011 to 26% today. The proportion of men not
saving for retirement stands at 19%.
Women who are saving put aside £203.21 a month on average, down from £227 in previous years,
and 29% are saving on a regular basis.
5 200 is a lot of observations, but the report does not give the breakup into men and women which is critical
information as the emphasis is on comparing men and women. Also it is not clear further in the article if the
29% who are saving on a regular basis is a percentage of the women who are saving or of all the women in the
survey.
In all of the above reports, percentages obtained from data are an integral part of the story, and
all are being used, explicitly or implicitly, as estimates of proportions or probabilities in more
general situations or populations that the data are being assumed to randomly represent. Such
estimates are called point estimates because they provide a single value with no information about
the error of the estimate. Other names that are used are sample estimates or sample statistics. The
quantities they are being used to estimate are called parameters. A parameter is a quantity that is a
measure of some feature or characteristic of a general situation or population.
In considering how appropriate and how good these estimates are, we need to know how the
LINK ME
Keep in mind margin icons and highlighted text has been selected by the author: concepts and knowledge that you ‘need to know’.
$\hat{p}_2$ = sample proportion for the sample from the second situation or population.
The point estimate of the difference between the proportion parameters is the sample statistic
$\hat{p}_1 - \hat{p}_2$ = the difference in sample proportions.
LINK ME Section 7.5
You will have noticed in Section 7.2 that the value 1.96 used in the 95% confidence intervals
comes from the standard normal, and that the interval (−1.96, 1.96) has probability 0.95 lying in it
for the standard normal. Section 7.5 explains that we can use this because for large values of n, the
sample proportion has an approximately normal distribution. Similarly, the distribution of the
difference of proportions in two independent samples is approximately normal for sufficiently
large values of the sample sizes $n_1$ and $n_2$, provided the samples are independent and randomly
selected from the two situations or populations of interest.
If a different confidence level is wanted, the appropriate standard normal multiplier can be
used instead of 1.96.
You can see the similarities between this confidence interval and the traditional approximate
method for a single proportion; these are discussed in Section 7.5.
The conditions that must be met in order to use the above approximate confidence interval are:
1 Independent sample proportions are available. These could be based on two independent,
randomly selected samples on a categorical variable, or from a randomly selected sample on
two categorical variables, one of which has two categories that we wish to compare.
2 The sample sizes must be reasonably large and the proportions not too close to 0 or 1. One ‘rule
of thumb’ that is sometimes used to help ensure that the approximations are reasonable is that
all of the quantities $n_1\hat{p}_1$, $n_1(1-\hat{p}_1)$, $n_2\hat{p}_2$, and $n_2(1-\hat{p}_2)$ are at least 10. These quantities
represent the counts observed in the category of interest, and not in that category, respectively,
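As a rough illustration of the two conditions, the sketch below checks the ‘at least 10’ rule of thumb and then computes an approximate interval for a difference of two proportions. It assumes the interval has the usual large-sample form, $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$, which this excerpt does not print. The function names and the counts are hypothetical and are not taken from Table 7.2.

```python
from math import sqrt
from statistics import NormalDist


def rule_of_thumb_ok(x1, n1, x2, n2, minimum=10):
    """True if the counts in, and not in, the category of interest are all >= minimum.

    x1 and x2 are the observed counts, so x1 = n1 * p1_hat and n1 - x1 = n1 * (1 - p1_hat).
    """
    return min(x1, n1 - x1, x2, n2 - x2) >= minimum


def diff_proportions_ci(x1, n1, x2, n2, level=0.95):
    """Approximate interval for p1 - p2 (assumed large-sample normal form)."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard normal multiplier: about 1.96 when level is 0.95.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Hypothetical counts: 60 of 100 in the first sample, 40 of 100 in the second.
print(rule_of_thumb_ok(60, 100, 40, 100))
low, high = diff_proportions_ci(60, 100, 40, 100)
print(round(low, 3), round(high, 3))
```

Passing a different `level` (for example 0.99) swaps in the corresponding standard normal multiplier in place of 1.96. The independence condition cannot be checked by code; it has to be argued from how the data were collected.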
EXAMPLE 7.11 (CONTINUED)
Lift or stairs and time of day
Consider the data in Table 7.2 to compare proportions of people who use the lift to go up in the morning peak
time with those in the evening peak time. First let us consider the condition we need that we have independent
samples. Do we? We have no way of knowing how many people in the morning sample and in the evening sample
are the same and, if so, is it random if they use the lift or the stairs? The morning and evening samples might be
taken in times or in places in the bus station such that we can assume independence. The bus station is clearly a
very busy place as it was easy to obtain large samples. If we have context knowledge of the bus station and
exactly where and when the samples were taken, we would be able to better comment on this. This illustrates
the importance of reporting as much as possible about the context of the data and how they were collected.
Here all we can do with the information given is to state that we are assuming that the two sample
proportions of morning and evening commuters who use the lift when going up are independent. There is
nothing wrong with stating this assumption. Stating assumptions is very important in all statistical work and all
statistical reports.
where chapter concepts link to concepts in other chapters, to help you connect the theory and concepts in a larger context.
first challenge was to define what they meant by ‘dissolved’ in order to have a consistent measure of time to
dissolve. They decided to define the tablet as ‘dissolved’ when the form of tablet first disappeared, that is, when it
first could not be discerned that there was a tablet. They decided not to stir the liquid to avoid the difficulty of
ensuring the same stirring conditions each time. They chose their experimental conditions to be water temperature
(two temperatures, one approximately room temperature and one cool), pH of water (neutral and a selected
slightly acidic one), and water type (normal and slightly salt water at a selected concentration). Five brands of
soluble aspirin were tested and three tablets from each were tested for each of the eight combinations of
boxes appear throughout each chapter and encourage
Bringing minds together 2.2
For chemistry fans: is there any other information about the design of this experiment that should be
reported?
picture focuses on the comparison, that is, on the parts relative to the whole and this comparison is the same.
The category ‘other’ is very small and could be combined with one of the other categories. Which category would
you choose to combine it with? This depends on the context. In this case, looking at the speeds of the users in
the ‘other’ category of transport shows that they are close in speed to joggers, so they could be combined with
the joggers. An alternative procedure could be to omit them from the dataset because there are so few, but the
usual procedure is not to throw data away without good reason, and in the case of categorical variables, we
often combine categories in real data investigations.
manufacturing process, it is likely that the company would want to be more precise in a 95%
confidence interval for the proportion of defectives. A range of just below 2% up to just over 7%
may not be sufficiently precise in a situation where hundreds of thousands of cells are being sold
by the company. We will see in Section 7.5 how many observations need to be collected in order to
give a more precise estimate for this company. In that section, we will also see that it takes many
more observations to reliably estimate a proportion close to 0.5 than it does to
estimate a proportion close to 0 or 1.
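The claim that a proportion near 0.5 needs more observations than one near 0 or 1 can be sketched with a common planning formula. This is an assumption, since Section 7.5 is not part of this excerpt: set the margin of error of the traditional approximate interval, $z^* \sqrt{p(1-p)/n}$, equal to a target value and solve for n. The function name, the guessed proportions and the target margin below are all hypothetical.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(p_guess, margin, level=0.95):
    """Smallest n for which z * sqrt(p(1 - p) / n) is no larger than the target margin."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)


# Same target margin of error (one percentage point), two different guessed proportions.
for p in (0.05, 0.5):
    print(p, sample_size_for_proportion(p, margin=0.01))
```

Because p(1 − p) is largest at p = 0.5, the required n is largest there for any fixed margin and confidence level.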
including Minitab, Excel, SPSS, or
percentage fall into the categories of a ‘Categorical variables’ box, specify the
SPSS tip
• To create a frequency table for one categorical variable, use Analyze>Descriptive Statistics>Frequencies.
• To create a bar chart or pie chart for one categorical variable, continue from creating frequencies by selecting Charts
and selecting the desired chart.
The history of the pie chart is very interesting. It was developed by William Playfair in 1801.
Florence Nightingale used a form of pie chart with great effect in the presentation of her data on
causes of mortality in military hospitals. Pie charts can be used only for one categorical variable so
have limited use. Unfortunately fancy versions have also been invented that often distort the very
feature that makes a pie chart useful and representative of the data – namely the areas of the
pieces of pie which give the relative frequencies of the categories. Three-dimensional pie charts
and doughnut pie charts are poor graphs because neither the third dimension nor the doughnut
‘hole’ represents any information, and the essential information of the accurate representation of
the relative frequencies is hidden or distorted. Similarly, three-dimensional bar charts are poor
graphs because the third dimension does not represent any information and distorts the
presentation of the information, which is the height of the bars. Section 3.7 comments on some
reasons for graphs not doing the job they are supposed to do. The various ‘innovations’ in pie
charts seem to be some of the worst in this respect.
interesting real-world scenarios, to help develop your understanding of concepts covered in the text.
HOW MANY PASSERSBY WOULD NOTICE THIS?
C. Watterson, QUT
Dataset: Human Curiosity
attention, with categories ‘no attention’, ‘some attention’ and ‘considerable attention’. A large
box was placed in a public thoroughfare for 1 hour on three days with sufficient separation
to be able to assume no overlap of people or
sufficient separation of the experience. On the first day, the box was plain. On the second day, the box had the
visual stimuli as shown in the accompanying photograph. On the third day the box had no visual stimuli but a
mobile phone was constantly ringing inside the box. On each day, observations were taken on 51 randomly
selected ‘groups’ where the number in the group was also recorded (taking values of 1, 2, ...). The numbers that
fell in the category of ‘no attention’ were 35 for the plain box, 14 for the visually decorated box, and 36 for the
box with the phone constantly ringing.
From these data, an approximate 95% confidence interval for the probability that no notice is taken of a
plain box is (0.56, 0.81). And a 95% confidence interval for the probability that no notice is taken of a box with
considerable visual stimuli is (0.15, 0.4).
In the first case, the accurate 95% confidence interval from software is (0.17, 0.41). Notice that the difference
between the traditional 95% approximate confidence interval and the Jeffrey 95% accurate confidence interval is
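The approximate intervals quoted in the ‘How many passersby would notice this?’ case study can be checked with a short sketch. It assumes that the ‘traditional approximate’ interval is the usual large-sample form, $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$; the function name is hypothetical. With 35 and 14 ‘no attention’ groups out of 51, it reproduces (0.56, 0.81) and (0.15, 0.40) to two decimal places.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(count, n, level=0.95):
    """Traditional approximate interval for one proportion (assumed large-sample form)."""
    p_hat = count / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 'No attention' counts out of 51 groups, as given in the case study.
for label, count in (("plain box", 35), ("visually decorated box", 14)):
    low, high = proportion_ci(count, 51)
    print(f"{label}: ({low:.2f}, {high:.2f})")
```

The more accurate interval reported from software, (0.17, 0.41), is computed by a different method and is not reproduced here.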
At the end of each chapter you’ll find several tools to help you to review the chapter and key learning
concepts, and also to help extend your learning.
The summary section recaps the key points from each section of the chapter, giving you a snapshot of the
important concepts covered.
Exercises are found at the end of each chapter. They include questions designed for practice and review, as well
as conceptual and data analysis exercises.
Answers to selected exercises are indicated by blue question numbers in the exercises section. These questions have solutions in the back of the text for checking your answers and guiding your thinking on similar problems.
Exercises marked with # have related datasets that can be found on the CourseMate website. A list of examples and cases that relate to these datasets can also be found on the CourseMate website.
Key terms are bolded when first introduced in the text and are listed with a definition at the end of each chapter. You will also find them defined in the full glossary on the CourseMate site.
Summary
3.1 Exploratory data analysis (EDA)
Exploratory data analysis (EDA) refers to procedures to present data in an informative way, using graphical, pictorial and summary methods.
3.2 Categorical data
Categorical data fall into categories and are reported in terms of frequencies or relative frequencies of observations in categories, either of a single variable or the joint categories of two variables. Tables summarise these, and visual displays can be pie charts and bar charts. Pie charts can be used only for single variables but bar charts can present data from two-way tables for two (or more) categorical variables. Row or column percentages in two-way tables can indicate or illustrate association between two categorical variables.
3.3 Graphs and plots for one continuous variable
A dotplot is a plot of the individual observations of raw data of a continuous variable with a dot for each observation, or for a fixed number of observations in very large datasets. A histogram groups continuous data into ‘bins’ based on intervals chosen by the investigator or by computer software. If equal-sized intervals are chosen the heights of the boxes represent frequencies and relative frequencies of the intervals. A histogram is not a bar chart and the bins must abut each other. A stem-and-leaf plot is like a histogram on its side but it retains the original observed values to a certain number of significant figures, with repeated digits in the ‘leaves’ representing the frequency of the corresponding observed value. A boxplot divides the observations from smallest to largest into four equally sized groups, with identification of the median, the quartiles and individual observations if they are away from the bulk of the data. Each type of plot has its advantages and disadvantages.
3.4 Continuous and categorical data
It is often desired to compare continuous data across categories of categorical variables. Any of the plots of Section 3.3 may be used provided the same scale is used. Dotplots and boxplots tend to be the most useful because they facilitate comparisons.
3.5 More than one continuous variable
Scatterplots are essential tools in exploring relationships between continuous variables. The relationships may not fit simple representations, may be affected by other variables and may involve much variation across the whole range or for particular parts of the range. Scatterplots facilitate and guide exploration and later analysis. Continuous data collected over time and dependent on previous observations in time are usually explored through time series plots.
3.6 Outlying observations
Many datasets have extreme observations which may be part of the natural variability or may need individual investigation to check if they correspond to different conditions or are due to error. No observation should be omitted without good reason. Graphs and plots can help identify observations that may need checking of circumstances. Sometimes such observations provide valuable information.
3.7 Good graphs and bad graphs
Good graphs summarise information in the data in pictures that provide insight and with clear representation. Bad graphs distort or misrepresent information, through unnecessary third dimensions, poor choice of scaling or optical deception.
Exercises
# Denotes dataset is available on the website but is not required to solve the exercise.
Blue-numbered exercises have answers in the back of the text and on the website.
Section 3.2
3.1 In a pilot study of vehicles travelling on residential streets,
a Which plot would you choose to represent these data?
b Either row or column percentages could be useful for
Exercises
Sections 4.1–4.4
4.1 # Student textbooks (refer Example 3.5):
Data were collected on all textbooks from the university bookshop with staff permission. Course notes were not included, nor were general reference books. To qualify, a book was required to have at least five copies on the shelf. Where multiple editions of a book were available, only the most recent was included. The books were classified according to discipline area, whether the cover was hard or soft, and if it came with a CD. The level of colour used was considered to be either full (F), some (Y) or none (N). Each book was weighed, its thickness measured and its price (in $) and year of publication noted.
a The stem-and-leaf plot below is of the prices of textbooks in Law. Use this plot to answer the following questions.
Stem-and-leaf of Law price N = 72
Leaf Unit = 1.0
  4    1  1799
  7    2  369
 10    3  137
 23    4  2355677777899
 29    5  223666
 (8)   6  26667799
 35    7  257
 32    8  1145556
 25    9  13999
 20   10  334667
 14   11  224557799
  5   12  0239
  1   13  4
i Obtain the median of the price for Law texts.
ii Find the lower quartile of the price for Law texts.
iii From the data, estimate the probability that the price for a Law textbook is more than $50.
b The boxplots below are of the prices in A$, classified by colour and type of cover (H = hard, S = soft). For each of the statements following, decide whether the statement is an appropriate one to make based only on the boxplots below.
[Figure: Boxplot of price ($), by cover (H, S) within colour (F, N, Y)]
i Hard cover books are generally more expensive than soft cover books.
ii The prices of hard cover books are generally more variable than those of soft cover books.
iii The average price of a soft cover textbook with no colour is $75.
iv The standard deviation of the prices of soft cover textbooks with no colour is approximately $50.
v Half of the soft cover books with some colour are more expensive than three-quarters of the soft cover textbooks with no colour.
vi The prices of the soft cover books with some colour are skew to the left.
vii Some of the observations should be discarded.
(Data source: Textbooks on the website.)
4.2 Real-estate data:
The data are of 280 houses sold in four regions from 2000–2003. Townhouses and duplexes were omitted from the study.
a The boxplots below are of the selling price per unit land area, across four regions. For each of the statements below, decide whether the statement is an appropriate one to make based only on the boxplots on the next page.
We have already seen the interquartile range and, to a lesser extent, the overall range (maximum – minimum) of the data in commenting on how spread out or how concentrated the data are.
EXAMPLE 4.3 (CONTINUED)
Referring to the fish lengths of Table 4.1, we can find the quartiles. For the 57 observations on fish lengths (without the shark):
• the median is the …th observation = … mm
• each of the two ‘halves’ of the data have … observations in them
• the lower quartile is then halfway between the …th and …th observations from the lower end; this gives … mm
• the upper quartile is halfway between the …th and …th observations from the upper end; this gives … mm
The interquartile distance is then … mm
Another measure of spread is the data standard deviation, usually called the sample standard deviation. We find this by first finding the sample variance. You can think of the sample variance as roughly the average squared distance that values fall from the mean. Put another way, it measures variability by summarising how far individual data values are from the mean.
In words, the sample variance is the total of the squared distances from the sample average, divided by (the number of observations – 1). We will not do an example of finding this ‘by hand’ as all technology aids provide it – from calculators to statistical software. If you use a calculator, check that the function you use has the divisor (n – 1) where n is the number of observations.
General notation
In general notation, with n observations with values $x_1, x_2, \ldots, x_n$, and denoting the data or sample average by $\bar{x}$, the sample variance can be written as
$$\left( \sum_{i=1}^{n} (x_i - \bar{x})^2 \right) / (n - 1)$$
This is usually denoted by $s^2$. The data or sample standard deviation is then the square root of this and is denoted by s.
The formula above should not be used for calculations by hand. Any spreadsheet or calculator with basic statistical functions will calculate s for you. If you need to calculate s by hand, use
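The definition of the sample variance in words (the total of the squared distances from the sample average, divided by the number of observations minus 1) can be sketched directly in code and compared with a library routine. The data values and the function name below are hypothetical; Python's `statistics.variance` and `statistics.stdev` are used only as a cross-check because they also use the divisor (n − 1).

```python
from math import sqrt
import statistics


def sample_variance(values):
    """Total of squared distances from the sample average, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
s2 = sample_variance(data)
s = sqrt(s2)  # the sample standard deviation is the square root of the variance

print(s2, s)
# Cross-check against the standard library, which also divides by (n - 1).
print(statistics.variance(data), statistics.stdev(data))
```

This mirrors the advice to check that a calculator function uses the divisor (n − 1): a routine that divides by n instead would give a smaller value on the same data.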
Online resources
Visit http://login.cengagebrain.com and login using the code card in the front of
this text for 12 months’ access to the CourseMate website. You’ll find an eBook,
interactive self-assessments, datasets, technical manuals, glossary, flashcards,
crosswords, case questions and more tools to help you excel in your studies.