Statistical Methodologies with Medical Applications - Poduri S. R. S. Rao - ebook



Description

This book presents the methodology and applications of a range of important topics in statistics, and is designed for graduate students in Statistics and Biostatistics and for medical researchers. Illustrations and more than ninety exercises with solutions are presented. They are constructed from research findings published in medical journals, summary reports of the Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO), and practical situations. The illustrations and exercises are related to topics such as immunization, obesity, hypertension, lipid levels, diet and exercise, the harmful effects of smoking and air pollution, and the benefits of a gluten‐free diet. This book can be recommended for a one‐ or two‐semester graduate‐level course for students studying Statistics, Biostatistics, Epidemiology and Health Sciences. It will also be useful as a companion for medical researchers and research‐oriented physicians.


Number of pages: 416




Table of Contents

Cover

Title Page

Topics for illustrations, examples and exercises

Preface

List of abbreviations

1 Statistical measures

1.1 Introduction

1.2 Mean, mode and median

1.3 Variance and standard deviation

1.4 Quartiles, deciles and percentiles

1.5 Skewness and kurtosis

1.6 Frequency distributions

1.7 Covariance and correlation

1.8 Joint frequency distribution

1.9 Linear transformation of the observations

1.10 Linear combinations of two sets of observations

Exercises

2 Probability, random variable, expected value and variance

2.1 Introduction

2.2 Events and probabilities

2.3 Mutually exclusive events

2.4 Independent and dependent events

2.5 Addition of probabilities

2.6 Bayes' theorem

2.7 Random variables and probability distributions

2.8 Expected value, variance and standard deviation

2.9 Moments of a distribution

Exercises

3 Odds ratios, relative risk, sensitivity, specificity and the ROC curve

3.1 Introduction

3.2 Odds ratio

3.3 Relative risk

3.4 Sensitivity and specificity

3.5 The receiver operating characteristic (ROC) curve

Exercises

4 Probability distributions, expectations, variances and correlation

4.1 Introduction

4.2 Probability distribution of a discrete random variable

4.3 Discrete distributions

4.4 Continuous distributions

4.5 Joint distribution of two discrete random variables

4.6 Bivariate normal distribution

Exercises

Appendix A4

5 Means, standard errors and confidence limits

5.1 Introduction

5.2 Expectation, variance and standard error (S.E.) of the sample mean

5.3 Estimation of the variance and standard error

5.4 Confidence limits for the mean

5.5 Estimator and confidence limits for the difference of two means

5.6 Approximate confidence limits for the difference of two means

5.7 Matched samples and paired comparisons

5.8 Confidence limits for the variance

5.9 Confidence limits for the ratio of two variances

5.10 Least squares and maximum likelihood methods of estimation

Exercises

Appendix A5

6 Proportions, odds ratios and relative risks: Estimation and confidence limits

6.1 Introduction

6.2 A single proportion

6.3 Confidence limits for the proportion

6.4 Difference of two proportions or percentages

6.5 Combining proportions from independent samples

6.6 More than two classes or categories

6.7 Odds ratio

6.8 Relative risk

Exercises

Appendix A6

7 Tests of hypotheses: Means and variances

7.1 Introduction

7.2 Principal steps for the tests of a hypothesis

7.3 Right‐sided alternative, test statistic and critical region

7.4 Left‐sided alternative and the critical region

7.5 Two‐sided alternative, critical region and the p‐value

7.6 Difference between two means: Variances known

7.7 Matched samples and paired comparison

7.8 Test for the variance

7.9 Test for the equality of two variances

7.10 Homogeneity of variances

Exercises

8 Tests of hypotheses: Proportions and percentages

8.1 A single proportion

8.2 Right‐sided alternative

8.3 Left‐sided alternative

8.4 Two‐sided alternative

8.5 Difference of two proportions

8.6 Specified difference of two proportions

8.7 Equality of two or more proportions

8.8 A common proportion

Exercises

9 The Chisquare statistic

9.1 Introduction

9.2 The test statistic

9.3 Test of goodness of fit

9.4 Test of independence: (r x c) classification

9.5 Test of independence: (2x2) classification

Exercises

Appendix A9

10 Regression and correlation

10.1 Introduction

10.2 The regression model: One independent variable

10.3 Regression on two independent variables

10.4 Multiple regression: The least squares estimation

10.5 Indicator variables

10.6 Regression through the origin

10.7 Estimation of trends

10.8 Logistic regression and the odds ratio

10.9 Weighted Least Squares (WLS) estimator

10.10 Correlation

10.11 Further topics in regression

Exercises

Appendix A10

11 Analysis of variance and covariance: Designs of experiments

11.1 Introduction

11.2 One‐way classification: Balanced design

11.3 One‐way random effects model: Balanced design

11.4 Inference for the variance components and the mean

11.5 One‐way classification: Unbalanced design and fixed effects

11.6 Unbalanced one‐way classification: Random effects

11.7 Intraclass correlation

11.8 Analysis of covariance: The balanced design

11.9 Analysis of covariance: Unbalanced design

11.10 Randomized blocks

11.11 Repeated measures design

11.12 Latin squares

11.13 Cross‐over design

11.14 Two‐way cross‐classification

11.15 Missing observations in the designs of experiments

Exercises

Appendix A11

12 Meta‐analysis

12.1 Introduction

12.2 Illustrations of large‐scale studies

12.3 Fixed effects model for combining the estimates

12.4 Random effects model for combining the estimates

12.5 Alternative estimators for

12.6 Tests of hypotheses and confidence limits for the variance components

Exercises

Appendix A12

13 Survival analysis

13.1 Introduction

13.2 Survival and hazard functions

13.3 Kaplan‐Meier Product‐Limit estimator

13.4 Standard error of $\hat{S}(t_m)$ and confidence limits for $S(t_m)$

13.5 Confidence limits for $S(t_m)$ with the right‐censored observations

13.6 Log‐Rank test for the equality of two survival distributions

13.7 Cox's proportional hazard model

Exercises

Appendix A13 Expected value and variance of $\hat{S}(t_m)$ and confidence limits for $S(t_m)$

14 Nonparametric statistics

14.1 Introduction

14.2 Spearman’s rank correlation coefficient

14.3 The Sign test

14.4 Wilcoxon (1945) Matched‐pairs Signed‐ranks test

14.5 Wilcoxon's test for the equality of the distributions of two non‐normal populations with unpaired sample observations

14.6 McNemar’s (1955) matched pair test for two proportions

14.7 Cochran's (1950) Q‐test for the difference of three or more matched proportions

14.8 Kruskal‐Wallis one‐way ANOVA test by ranks

Exercises

15 Further topics

15.1 Introduction

15.2 Bonferroni inequality and the Joint Confidence Region

15.3 Least significant difference (LSD) for a pair of treatment effects

15.4 Tukey's studentized range test

15.5 Scheffe's simultaneous confidence intervals

15.6 Bootstrap confidence intervals

15.7 Transformations for the ANOVA

Exercises

Appendix A15

Solutions to exercises

Appendix tables

References

Index

End User License Agreement

List of Tables

Chapter 01

Table 1.1 Heights (cm), weights (kg) and BMIs of twenty‐year old boys.

Table 1.2 Summary figures for the heights, weights and BMIs of the 20 boys in Table 1.1.

Table 1.3 Frequency distribution of the heights of the 20 boys in Table 1.1.

Table 1.4 Age (x) and weight (y) of n = 200 adults.

Chapter 03

Table 3.1 Smoking and hypertension.

Table 3.2 Heights (feet), weights (pounds) and relative risks for 25–59‐year‐old persons.

Table 3.3 Results of a diagnostic test.

Chapter 04

Table 4.1 Joint probability distribution of age (X) and weight (Y) of the 200 adults in Table 1.4.

Chapter 06

Table 6.1 HDL of 400 obese and 200 nonobese adults.

Chapter 07

Table 7.1 Power of the test for the mean of the LDL with the right‐sided alternative when

.

Table 7.2 Power of the t‐test for the mean of the LDL with estimated variance: Right‐sided alternative.

Table 7.3 Power of the test for the mean of the HDL with the left‐sided alternative when

.

Table 7.4 Power for the t‐test for the mean of the HDL with the left‐sided alternative when

; estimated variance.

Table 7.5 Power of the test for the mean of the LDL with the two‐sided alternative;

.

Table 7.6 Power for the two‐sided alternative for the mean of the LDL with estimated variance.

Chapter 09

Table 9.1 Observed and expected frequencies of the normal distribution of the 400 weights in 15 classes.

Chapter 10

Table 10.1 ANOVA for the Significance of the Regression with one independent variable.

Table 10.2 Hypotheses, Test Statistics and Confidence Limits; regression on one independent variable.

Table 10.3 ANOVA for the significance of the regression on two independent variables.

Table 10.4 Hypotheses, Test Statistics and Confidence Limits for the regression on two independent variables.

Table 10.5 ANOVA for the significance of the regression on p independent variables.

Table 10.6 Estimators for the regression through the origin.

Chapter 11

Table 11.1 ANOVA for the equality of the treatment effects. One‐way classification.

Table 11.2 Sums of Squares and Cross Products for the Analysis of Covariance.

Table 11.3 Expected values of the mean squares. Randomized blocks design.

Table 11.4 Lipid levels (LDL, HDL) of four age groups after administering four treatments and a control. Randomized blocks design.

Table 11.5 SBPs (top) and DBPs (bottom) of three patients observed over four time intervals. Repeated measures.

Table 11.6 Percentage decrease of the SBP Latin Square design.

Table 11.7 Reduction of the SBP for three treatments. Cross‐over design.

Table 11.8 SBPs for the two‐way weight × fitness program classifications.

Table 11.9 ANOVA for the row and column effects. Unbalanced two‐way Cross classification.

Table 11.10 ANOVA for the row and column effects. Unbalanced two‐way Cross classification with interaction.

Chapter 12

Table 12.1 Average decreases of the SBPs after a medical treatment for 15 groups of adults. Means ($y_i$) and standard deviations ($s_i$).

Chapter 13

Table 13.1 Uncensored observations of Example 13.1. Survival times $t_i$: (5, 8, 12, 18, 24, 24, 24, 35, 42, 50).

Table 13.2 Survival times

.

Appendix

Table T1.1 Heights (cm), Weights (kg) and BMIs of 20 ten‐year‐old boys.

Table T1.2 Heights (cm), Weights (kg) and BMIs of 20 ten‐year‐old girls.

Table T1.3 Heights (cm), Weights (kg) and BMIs of 20 sixteen‐year‐old boys.

Table T1.4 Heights (cm), Weights (kg) and BMIs of 20 sixteen‐year‐old girls.

Table T1.5 Immunization Coverage (percentage) among 1‐year olds in 2004 in the countries of the world.

Table T4.1 US Population in 2006 in millions.

Table T4.2 Number of physicians in the U.S. per 10,000 civilian population.

Table T4.3 Lowest per capita healthcare expenditures in 2006 at the average exchange rate (US$); 150 countries.

Table T5 Expenditure on health in 2006. Averages (top) and standard deviations (bottom).

Table T10.1 (a) Physical and diagnostic measurements of n = 20 adults.

Table T10.1 (b) Variances and Covariances of the measurements.

Table T10.1 (c) Correlations of the measurements.

Table T10.1 (d) Corrected Sums of squares (SS) and Sums of Products (SP) of the measurements.

Table T10.2 Weight and LDL of two groups of adults ($n_1 = 9$, $n_2 = 11$); first group followed by the second.

Table T10.3 Percent of the overweight (includes obesity, BMI > 25) 20–74‐year‐old persons.

Table T10.4 Personal Healthcare Expenditure.

Table T11.1 Age and weight of three groups of size m = 10 adults each (for balanced one‐way).

Table T11.2 Systolic blood pressures (mmHg) of the three groups of the adults in Table 11 before and after an exercise program; balanced case.

Table T11.3 Age (years) and weight (lbs) of n = 30 adults in three age groups of sizes $n_1 = 9$, $n_2 = 13$ and $n_3 = 8$.

Table T11.4 Systolic Blood Pressures (mmHg) for the adults in Table T11.3 before and after an exercise program.

List of Illustrations

Chapter 01

Figure 1.1 Stem and leaf display of the heights of the twenty boys. Leaf unit = 1.0. The median class has (6) observations. The cumulative number of observations below and above the median class are (2, 4, 9) and (5, 2).

Figure 1.2 Box and whiskers plot of the heights of boys in Table 1.1, obtained from Minitab. The middle line of the box is the median $Q_2$. The bottom and top lines are the first and third quartiles $Q_1$ and $Q_3$. The tips of the vertical line, whiskers, are the upper and lower limits and .

Figure 1.3 Histogram of the distribution of the heights of the boys in Table 1.3 obtained from Minitab.

Figure 1.4 Plot of the weights of the twenty‐year‐old boys on their heights from the observations in Table 1.1.

Chapter 04

Figure 4.1 Probabilities under the standard normal distribution; $(1 - \alpha)$ percent of the area is between $-Z_{\alpha/2}$ and $Z_{\alpha/2}$, and $(\alpha/2)$ percent above $Z_{\alpha/2}$ and below $-Z_{\alpha/2}$.

Figure 4.2 Normal distribution fitted to the heights in Table 1.1 of the 20 twenty‐year old boys.

Figure 4.3 Normal probability plot of the heights of the twenty boys in Table 1.1.

Figure 4.4 Number of physicians per 10,000 civilian population in the U.S. in 2006 as presented in Table T4.2.

Figure 4.5 Lowest per capita healthcare expenditure at the average exchange rate (US$); 135 countries; 2006.

Chapter 07

Figure 7.1 Probabilities of Type I and II errors for the right‐sided alternative.

Figure 7.2 Power of the test for the mean with the right‐sided alternative when the standard deviation is known.

Figure 7.3 Power of the test for the mean with the left‐sided alternative when the standard deviation is known.

Figure 7.4 Power for the test for the mean with the two‐sided alternative.

Chapter 10

Figure 10.1 Regression of LDL on weight.

Figure 10.2 Expenditure for all persons in the United States over the nine periods from 1960 to 2010.

Figure 10.3 Logistic regression for the increase of LDL with weight.

Chapter 13

Figure 13.1 Survival times $t_i$: (5, 8, 12, 18, 24, 24, 24, 35, 42, 50) of Example 13.1.


Statistical Methodologies with Medical Applications

Poduri S.R.S. Rao

Professor of Statistics
University of Rochester
Rochester, New York, USA

This edition first published 2017
© 2017 John Wiley & Sons, Ltd

Registered Office
John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging‐in‐Publication Data

Names: Rao, Poduri S.R.S., author.
Title: Statistical methodologies with medical applications / Poduri S.R.S. Rao.
Description: Chichester, West Sussex, United Kingdom ; Hoboken : John Wiley & Sons Inc., 2016. | Includes bibliographical references and index.
Identifiers: LCCN 2016022669 | ISBN 9781119258490 (cloth) | ISBN 9781119258483 (Adobe PDF) | ISBN 9781119258520 (epub)
Subjects: | MESH: Statistics as Topic
Classification: LCC RA409 | NLM WA 950 | DDC 610.2/1–dc23

LC record available at https://lccn.loc.gov/2016022669

A catalogue record for this book is available from the British Library.

Cover Image: Gun2becontinued/Gettyimages

To my grandchildren
Asha, Sita, Maya and Wyatt

Topics for illustrations, examples and exercises

Heights, weights and BMI (Body Mass Index) of sixteen and twenty‐year‐old boys from growth charts

Immunization coverage of one‐year‐olds: Measles, DTP3 and HEP B3 from WHO reports

Medical insurance for children

Sudden Infant Death Syndrome (SIDS)

Population growth rates and fertility

Age, family size, income and health insurance

Healthcare expenditure in Africa, Asia and Europe

Vaccination for flu for different age groups

Emergency department visits for cold symptoms, injuries and other reasons.

Overweight and obesity

Trends of adult obesity

BMI and mortality

Smoking, heart disease and cancer risk

Air pollution and cancer risk

Hypertension, systolic and diastolic blood pressures (SBP, DBP) of males and females.

Cholesterol levels: LDL and HDL

Effects of overweight on LDL

Low‐dose aspirin and reduction of certain types of cancer

Celiac disease and the benefits of gluten‐free diet

Statins and the reduction of LDL

Exercise and its benefits for blood pressure levels

Weight loss with diets of combinations of low and high‐levels of fatty acids and protein

Medical rehabilitation of stroke patients

Functional independence measures of stroke patients from medical rehabilitation

Sources: Reports of WHO, CDC, U.S. Health Statistics; Journal of the American Medical Association (JAMA); New England Journal of Medicine (NEJM), Lancet and other published literature.

Preface

Statistical analysis, evaluation and inference are essential for every type of medical study and clinical experiment. Physicians, medical clinics and laboratories routinely record the blood pressures, cholesterol levels and other relevant diagnostic measurements of patients. Clinical experiments evaluate and compare the effects of medical treatments and procedures. Medical journals report research findings on the relative risks and odds ratios related to hypertension, abnormal cholesterol levels, obesity, the harmful effects of smoking and excessive alcohol consumption, and similar topics.

Estimation of the means, standard deviations, proportions, odds ratios, relative risks and related statistical measures of health‐related characteristics is important for the above types of medical studies. Evaluation of the errors of estimation, ascertaining the confidence limits for the population characteristics of interest, tests of hypotheses and statistical inference, and Chisquare tests for independence and association of categorical variables are important aspects of many medical studies and clinical experiments. Statistical inference is employed, for instance, to assess the relationship between obesity and hypertension and the association between air pollution and bronchial problems. A variety of similar problems require statistical investigations and inference. Regression analysis is widely used to determine the relationship between clinical outcomes and physical attributes. In several clinical investigations, correlations between diagnostic observations are examined to search for the causal factors. Analysis of Variance and Covariance procedures are extensively employed to examine the differences between the effects of medical treatments. All the above types of statistical methods, procedures and techniques required for medical studies, research and evaluations are presented in the following chapters. Topics such as Meta‐analysis, Survival Analysis and Hazard Ratios, and nonparametric statistics are also included.

Following the descriptive statistical measures in the first chapter, definitions of probability, odds ratios and relative risk appear in Chapters 2 and 3. Binomial, normal, Chisquare and related probability distributions essential for the statistical methods and applications are presented in Chapter 4. Estimation of the means, variances, proportions and percentages, odds ratios and relative risks, Standard Errors (S.E.) of the estimators and confidence intervals appear in Chapters 5 and 6. Tests of hypotheses of means, proportions and variances, p‐values, the power of a test and the sample size required for a specified power are the topics for Chapters 7 and 8. The Chisquare tests for goodness of fit and independence are presented in Chapter 9. Linear, multiple and logistic regressions and correlation are the topics for Chapter 10. Chapter 11 presents the Analysis of Variance (ANOVA) and Covariance procedures, Randomized blocks, Latin square designs, fixed and random effects models, and two‐way cross‐classification with and without interaction. Meta‐analysis and Survival Analysis in Chapters 12 and 13 are followed by the nonparametric statistics in Chapter 14. The final chapter contains topics in ANOVA and tests of hypotheses including the Simultaneous Confidence Intervals and Bootstrap Confidence Intervals.

Examples, illustrations and exercises with solutions are presented in each chapter. They are constructed from the observations of practical situations, research studies appearing in The New England Journal of Medicine (NEJM), Journal of the American Medical Association (JAMA), Lancet and other medical journals, and the summaries presented in the Health Statistics of the Centers for Disease Control and Prevention (CDC) in the United States and the World Health Organization (WHO). They are related to a variety of medical topics of general interest including the following: (a) heights, weights and Body Mass Index (BMI) of ten‐to‐twenty‐year‐old boys and girls; (b) immunization of children; (c) overweight, obesity, hypertension and high cholesterol levels of adults; (d) benefits of fat‐free and gluten‐free diets and exercise; and (e) healthcare expenditures and medical insurance.

BMI is the ratio of the weight in kilograms to the square of the height in meters. A person is considered to be of normal weight if the BMI is 18.5–24, overweight if it is 25–29, and obese if it is 30 or more. For the blood cholesterol levels of adults, LDL less than 100 mg/dL and HDL higher than 40 mg/dL are considered optimal. Systolic and diastolic blood pressures (SBP and DBP) of 120/80 mmHg are considered desirable. Illustrations, examples and exercises throughout the chapters are related to these medical measurements and other health‐related topics. Readily available software programs in Excel, Minitab and R are utilized for the solutions of the illustrations, examples and exercises.

The various topics in these chapters are presented at the level of comprehension of the students pursuing statistics, biostatistics, medicine, biological, physical and natural sciences and epidemiological studies. Each topic is illustrated through examples. More than one hundred exercises with solutions are included. This book can be recommended for a one‐semester or two‐quarter course for the above types of students, and also for self‐study. One or two semesters of training in the principles and applications of statistical methods provides adequate preparation to pursue the different topics. The various statistical methods for medical studies presented in this manuscript can also be of interest to clinicians, physicians, and medical students and residents.

I would like to thank the editor, Ms. Kathryn Sharples, for her interest in this project. Thanks to Charles Heckler, Kevin Rader and Nicholas Zaino for their expert reviews of the manuscript. Thanks also to Sarah Briscoe, Isabelle Weir and Patricia Digiorgio for their assistance in assembling the manuscript on the word processor. Special thanks to my wife and daughter, Drs. K.R. Poduri, MD and Ann Hug Poduri, MD, MPH for sharing their medical expertise in selecting the various topics and illustrations throughout the chapters.

Poduri S.R.S. Rao

Professor of Statistics
University of Rochester

List of abbreviations

WHO: World Health Organization

CDC: Centers for Disease Control and Prevention

LDL: Low Density Lipoprotein

HDL: High Density Lipoprotein

LDL and HDL are measures of cholesterol levels in units of milligrams per deciliter (mg/dL)

SBP: Systolic Blood Pressure

DBP: Diastolic Blood Pressure

SBP and DBP are measures of pressure in the blood vessels in units of millimeters of mercury (mmHg)

BMI: Body Mass Index

1 Statistical measures

1.1 Introduction

Medical professionals, hospitals and healthcare centers record heights, weights and other relevant physical measurements of patients along with their blood pressures, cholesterol levels and similar diagnostic measurements. The Centers for Disease Control and Prevention (CDC) in the United States, the World Health Organization (WHO) and several other national and international organizations record and analyze various aspects of the healthcare status of citizens of all age groups. Epidemiological studies and surveys collect and analyze health‐related information on people around the globe. Clinical trials and experiments are conducted for the development of effective and improved medical treatments.

Statistical measures are utilized to analyze the various diagnostic measurements as well as the outcomes of clinical experiments. The mean, mode and median described in the following sections locate the centers of the distributions of the above types of observations. The variance, standard deviation (S.D.) and the related coefficient of variation (C.V.) are the measures of dispersion of a set of observations. The quartiles, deciles and percentiles divide the data respectively into four, ten and one hundred equal parts. The skewness coefficient exhibits the departure of the data from symmetry, and the kurtosis coefficient its peakedness. The measurements of the heights, weights and Body Mass Indexes (BMIs) of a sample of twenty‐year‐old boys obtained from the Chart Tables of the CDC (2008) are presented in Table 1.1. These measurements for the ten‐ and sixteen‐year‐old boys and girls are presented in Appendix Tables T1.1–T1.4.

Table 1.1 Heights (cm), weights (kg) and BMIs of twenty‐year‐old boys.

Height    Weight    BMI
162       54        20.58
163       55        20.70
167       58        20.80
168       59        20.90
170       60        20.76
172       62        20.96
172       63        21.30
173       66        22.05
174       68        22.46
176       72        23.24
176       75        24.21
176       75        24.21
177       78        24.90
178       80        25.25
178       82        25.88
180       84        25.93
184       86        25.40
184       88        25.99
186       95        27.46
188       102       28.86

BMI = Weight/(Height)$^2$, with the weight in kilograms and the height in meters.
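As a quick check on the BMI column of Table 1.1, the following sketch recomputes the index from a weight in kilograms and a height in centimeters. The function name `bmi` and the centimeter-to-meter conversion inside it are illustrative choices, not part of the original text.

```python
def bmi(weight_kg, height_cm):
    """Body Mass Index: weight in kg divided by the squared height in meters."""
    height_m = height_cm / 100.0  # Table 1.1 lists heights in cm
    return weight_kg / height_m ** 2

# First and last rows of Table 1.1 as (height in cm, weight in kg)
rows = [(162, 54), (188, 102)]
for height, weight in rows:
    print(f"height = {height} cm, weight = {weight} kg, BMI = {bmi(weight, height):.2f}")
```

Rounded to two decimals, the two values agree with the first and last BMI entries of the table.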

1.2 Mean, mode and median

The diagnostic measurements of a sample of $n$ individuals can be represented by $x_i$, $i = 1, 2, \ldots, n$. Their mean or average is

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i. \tag{1.1}$$

For the heights of the boys in Table 1.1, the mean becomes $\bar{x} = 3504/20 = 175.2$ cm. Similarly, the mean of their weights is 73.1 kg. For the BMI, which is Weight/(Height)$^2$, the mean becomes 23.59.

The mode is the observation that occurs more frequently than the remaining observations. For the heights of the boys, it is 176 cm. The median is the middle value of the ordered observations. If the number of observations $n$ is odd, it is the $((n + 1)/2)$th observation. If $n$ is even, it is the average of the $(n/2)$th and the next observation. Both the mode and median of the twenty heights of the boys in Table 1.1 are equal to 176 cm, which is slightly larger than the mean of 175.2 cm.
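The three measures of location for the heights in Table 1.1 can be reproduced with a short sketch. It uses Python's standard `statistics` module, which is a choice made here for illustration; the book itself refers to Excel, Minitab and R.

```python
from statistics import mean, median, multimode

# Heights (cm) of the twenty boys in Table 1.1
heights = [162, 163, 167, 168, 170, 172, 172, 173, 174, 176,
           176, 176, 177, 178, 178, 180, 184, 184, 186, 188]

n = len(heights)
x_bar = mean(heights)        # sum of the observations divided by n
med = median(heights)        # n is even: average of the (n/2)th and the next value
modes = multimode(heights)   # most frequent value(s), returned as a list

print(f"n = {n}, mean = {x_bar:.1f} cm, median = {med} cm, mode(s) = {modes}")
```

`multimode` is used rather than `mode` so that a tie between several equally frequent values would be visible instead of silently resolved.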

The mean, mode and median locate the center of the observations. The mean is also known as the first moment $m_1$ of the observations. For healthcare policies, for instance, it is important to examine the average amount of the medical expenditures incurred by families of different sizes or specified ranges of income. At the same time, useful information is provided by the median and modal values of their expenditures. Figure 1.1 is the Stem and Leaf display of the heights in Table 1.1. The cumulative numbers of observations below and above the median appear in the first column. The second and third columns are the stems, with the attached leaves.

  2     16    23
  4     16    78
  9     17    02234
 (6)    17    666788
  5     18    044
  2     18    68

Figure 1.1 Stem and leaf display of the heights of the twenty boys. Leaf unit = 1.0. The median class has (6) observations. The cumulative numbers of observations below and above the median class are (2, 4, 9) and (5, 2).

1.3 Variance and standard deviation

The variance is a measure of the dispersion among the observations, and it is given by

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2. \tag{1.2}$$

The divisor $(n - 1)$ in this expression represents the degrees of freedom (d.f.). If $(n - 1)$ of the observations and the sum or mean of the $n$ observations are known, the remaining observation is automatically determined. The expression in (1.2) can also be written as $s^2 = \sum_{i \neq j}(x_i - x_j)^2/[2n(n - 1)]$, which is one half of the average of the squared differences over the $n(n - 1)$ ordered pairs of the observations. The standard deviation (S.D.) is given by $s$, the positive square root of the variance. The second central moment of the observations is $m_2 = \sum_{i=1}^{n}(x_i - \bar{x})^2/n$, which is the same as $(n - 1)s^2/n$. For the twenty heights of boys in Table 1.1, $s^2 = 51.33$ and $m_2 = 48.76$. The standard deviation becomes $s = 7.16$ cm.

The unit of measurement is attached to both the mean and standard deviation: kg for weight and cm for height. It is kg/(meter‐squared) for the BMI. The coefficient of variation (C.V.) is the ratio of the standard deviation to the mean and is devoid of the unit of measurement of the observations. The mean, variance, standard deviation and C.V. for the above three characteristics for the 20 boys in Table 1.1 are presented in Table 1.2.

Table 1.2 Summary figures for the heights, weights and BMIs of the 20 boys in Table 1.1.

                     Height      Weight      BMI
    Mean             175.2       73.1        23.59
    Variance (s²)    51.33       188.09      6.53
    m2               48.76       178.69      6.21
    S.D. (s)         7.16        13.71       2.56
    C.V. (%)         4.09        18.76       10.85
    m3               –18.86      913.69      5.24
    K1               –0.055      0.383       0.341
    m4               5690        70901       75.93
    K2               2.39        2.22        1.97

1.4 Quartiles, deciles and percentiles

Any set of data can be arranged in an ascending order and divided into four parts with one quarter of the observations in each part. Twenty‐five percent of the observations are below the first quartile Q1 and 75 percent above. Similarly, half the number of observations are below the median, which is the second quartile Q2, and half above. Three‐quarters of the observations are below the third quartile Q3 and one‐fourth above. As seen in Section 1.2, the median of the heights in Table 1.1 is 176 cm. The average of the fifth and sixth observations is 171 cm, which is the first quartile. Similarly, the third quartile is 179 cm, which is the average of the fifteenth and sixteenth observations. The box and whiskers plot in Figure 1.2 presents the positions of these quartiles.
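The quartiles quoted above can be reproduced with a short Python sketch. It follows the rule used in this section, that is, the medians of the lower and upper halves of the ordered data; the function name `quartiles` is hypothetical and the heights are read off Figure 1.1. Statistical packages often interpolate between order statistics instead, and may report slightly different values.

```python
from statistics import median

# Heights (cm) of the twenty boys, read off the stem and leaf display in Figure 1.1.
heights = [162, 163, 167, 168, 170, 172, 172, 173, 174, 176,
           176, 176, 177, 178, 178, 180, 184, 184, 186, 188]


def quartiles(data):
    """Q1, Q2, Q3 as medians of the lower half, all data and the upper half (n >= 2)."""
    ordered = sorted(data)
    half = len(ordered) // 2   # the middle observation is left out when n is odd
    return median(ordered[:half]), median(ordered), median(ordered[-half:])


q1, q2, q3 = quartiles(heights)
print(q1, q2, q3)  # 171.0 176.0 179.0
```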

Figure 1.2 Box and whiskers plot of the heights of boys in Table 1.1, obtained from Minitab. The middle line of the box is the median Q2. The bottom and top lines are the first and third quartiles Q1 and Q3. The tips of the vertical line, the whiskers, mark the lower and upper limits.

Ten percent of the observations are below the first decile and 90 percent above. Ninety percent of the observations are below the ninth decile and 10 percent above. One percent of the observations are below the first percentile and 99 percent above. Similarly, 99 percent of the observations are below the ninety‐ninth percentile and 1 percent above.

1.5 Skewness and kurtosis

Physical or diagnostic measurements $x_i$, $i = (1, 2, \ldots, n)$, of a group of individuals may not be symmetrically distributed about their mean. The third central moment, $m_3 = \sum_{i=1}^{n}(x_i - \bar{x})^3/n$, will be zero if the observations are symmetrically distributed about the mean. It will be positive if the observations are skewed to the right and negative if they are skewed to the left. For symmetrically distributed observations, the third, fifth, seventh and all the other odd central moments will be zero. The Pearsonian coefficient of skewness is given by $K_1 = m_3/m_2^{3/2}$, which, unlike $m_2$ and $m_3$, does not depend on the unit of measurement of the observations. For any set of observations symmetrically distributed about its mean, $m_3 = 0$ and hence $K_1 = 0$. For positively skewed observations, $m_3$ and $K_1$ are positive. For negatively skewed observations, they are negative. For the heights of the boys in Table 1.1, $m_3 = -18.86$ and $K_1 = -0.055$. These heights are slightly negatively skewed.

The fourth central moment of the observations, $m_4 = \sum_{i=1}^{n}(x_i - \bar{x})^4/n$, becomes large as the distribution of the observations becomes peaked and small as it becomes flat. The Pearsonian coefficient of kurtosis is given by $K_2 = m_4/m_2^2$, which does not depend on the unit of measurement. For the normal distribution, which is extensively employed for statistical analysis and inference, $m_4 = 3m_2^2$ and $K_2 = 3$. For the observations on all the three characteristics in Table 1.1, the fourth moments are large, as seen from Table 1.2, but $K_2$ is smaller than three.
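The central moments and the two Pearsonian coefficients for the heights can be verified with the following Python sketch. It assumes the heights as read from Figure 1.1, and that K1 and K2 are the ratios m3/m2^(3/2) and m4/m2^2, which reproduce the Height column of Table 1.2; the helper `central_moment` is a hypothetical name.

```python
# Heights (cm) of the twenty boys, read off the stem and leaf display in Figure 1.1.
heights = [162, 163, 167, 168, 170, 172, 172, 173, 174, 176,
           176, 176, 177, 178, 178, 180, 184, 184, 186, 188]


def central_moment(data, r):
    """r-th central moment with divisor n (hypothetical helper)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** r for x in data) / n


m2 = central_moment(heights, 2)
m3 = central_moment(heights, 3)
m4 = central_moment(heights, 4)

k1 = m3 / m2 ** 1.5   # coefficient of skewness
k2 = m4 / m2 ** 2     # coefficient of kurtosis

print(round(m3, 2), round(k1, 3))  # -18.86 -0.055
print(round(m4), round(k2, 2))     # 5690 2.39
```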

1.6 Frequency distributions

Any set of clinical measurements or medical observations can be classified into a convenient number of groups and presented as the frequency distribution. The CDC, National Center for Health Statistics (NCHS) and other organizations present various health‐related measurements of the U.S. population in the form of summary tables. These measurements are obtained from periodic or continual surveys of the population in the country and also from the administrative medical records of the population. They are arranged according to age groups, education, income levels, male‐female classification and other characteristics of interest. Similar summary figures are presented by the WHO and healthcare organizations throughout the world. For the sake of illustration, the twenty heights of the boys in Table 1.1 are arranged in Table 1.3 into seven classes of the same width of five, and displayed as the histogram in Figure 1.3.

Table 1.3 Frequency distribution of the heights of the 20 boys in Table 1.1.

    Class           Mid-value (x_i)   Frequency (n_i)   Relative frequency (f_i)
    157.5–162.5     160               1                 0.05
    162.5–167.5     165               2                 0.10
    167.5–172.5     170               4                 0.20
    172.5–177.5     175               6                 0.30
    177.5–182.5     180               3                 0.15
    182.5–187.5     185               3                 0.15
    187.5–192.5     190               1                 0.05
    Total                             n = 20            Σ f_i = 1

Figure 1.3 Histogram of the distribution of the heights of the boys in Table 1.3, obtained from Minitab.

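In general, the n observations can be divided into k classes with $n_i$ observations in the ith class, $i = (1, 2, \ldots, k)$. The mid‐values of the classes can be denoted by $(x_1, x_2, \ldots, x_k)$.

With the above notation, the mean of the n observations becomes

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \sum_{i=1}^{k} f_i x_i, \tag{1.3}$$

where $f_i = n_i/n$ is the relative frequency in the ith class and $\sum_{i=1}^{k} f_i = 1$. From the above table and (1.3), the mean of the heights is

$$\bar{x} = 0.05(160) + 0.10(165) + 0.20(170) + 0.30(175) + 0.15(180) + 0.15(185) + 0.05(190) = 175.25 \text{ cm}.$$

Since the 20 observations are grouped, this mean differs slightly from the actual value of 175.2 cm.

For the grouped data, the second moment becomes

$$m_2 = \sum_{i=1}^{k} f_i (x_i - \bar{x})^2. \tag{1.4}$$

Now, $s^2 = n m_2/(n-1)$. From (1.4), for the heights of the boys, $m_2 = 56.19$ and $s^2 = 59.14$, which differ from the actual values 48.76 and 51.33 as a result of the grouping. From the grouped data, the third and fourth central moments are obtained from $m_3 = \sum_{i=1}^{k} f_i (x_i - \bar{x})^3$ and $m_4 = \sum_{i=1}^{k} f_i (x_i - \bar{x})^4$. In general, the rth central moment for the grouped data is given by $m_r = \sum_{i=1}^{k} f_i (x_i - \bar{x})^r$.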
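The grouped-data computations can be sketched in Python from the mid-values and frequencies of Table 1.3. The sketch works with the frequencies n_i and divides by n, which is the same as weighting by the relative frequencies f_i; the helper name `grouped_moment` is hypothetical.

```python
# Mid-values and frequencies of the seven classes in Table 1.3.
mid_values = [160, 165, 170, 175, 180, 185, 190]
frequencies = [1, 2, 4, 6, 3, 3, 1]

n = sum(frequencies)

# Mean of the grouped data: sum of n_i * x_i divided by n.
mean = sum(ni * xi for ni, xi in zip(frequencies, mid_values)) / n


def grouped_moment(r):
    """r-th central moment of the grouped data (hypothetical helper)."""
    return sum(ni * (xi - mean) ** r for ni, xi in zip(frequencies, mid_values)) / n


m2 = grouped_moment(2)
s2 = n * m2 / (n - 1)   # variance with divisor n - 1

print(mean, round(m2, 2), round(s2, 2))  # 175.25 56.19 59.14
```

The same helper gives the third and fourth central moments of the grouped data as `grouped_moment(3)` and `grouped_moment(4)`.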

1.7 Covariance and correlation

The heights and weights of the 20 boys in Table 1.1 can be denoted by $(x_i, y_i)$, $i = (1, 2, \ldots, n)$. With the subscripts (x, y) for these characteristics, as presented in Table 1.2, the standard deviations of these characteristics are $s_x = 7.16$ and $s_y = 13.71$. Their covariance is given by

$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}). \tag{1.5}$$

It is the sum of the cross‐products of the deviations of $(x_i, y_i)$ from their means divided by (n – 1). It can also be expressed as $(\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y})/(n-1)$. The sample correlation coefficient of (x, y) is

$$r = \frac{s_{xy}}{s_x s_y}. \tag{1.6}$$

It will be positive if y increases with x and negative if y decreases as x increases. In general, the covariance can be positive or negative. It can range from a large negative value to a large positive value, and the units of measurement of both x and y are attached to it. The correlation coefficient, however, ranges from –1 to +1, and it is devoid of the units of measurement of the two characteristics. If x and y are not linearly related, $s_{xy}$ and r will be zero or close to zero. For the heights and weights of the twenty‐year‐old boys in Table 1.1, the covariance $s_{xy}$ and the correlation r are obtained from (1.5), (1.6) and Table 1.2. In this case, these two characteristics are highly positively correlated, as expected. Figure 1.4 displays the relationship of the weights and heights of the twenty boys in Table 1.1.
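The following Python sketch illustrates the covariance, its alternative expression and the correlation coefficient. For brevity it uses the five pairs of weight gains from the illustration in Section 1.10 rather than the heights and weights of Table 1.1. The function names and the conversion factor 0.4536 kg per lb are assumptions made for the illustration.

```python
from math import isclose, sqrt

# Gains in weights (lbs) of five adults on two occasions (illustration in Section 1.10).
x = [5, 10, 10, 5, 5]
y = [10, 5, 10, -5, 10]


def covariance(u, v):
    """Sum of cross-products of deviations from the means, divided by n - 1."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v)) / (n - 1)


def correlation(u, v):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(u, v) / sqrt(covariance(u, u) * covariance(v, v))


n = len(x)
s_xy = covariance(x, y)

# Alternative expression: (sum of x_i * y_i minus n * x_bar * y_bar) / (n - 1).
s_xy_alt = (sum(a * b for a, b in zip(x, y)) - n * (sum(x) / n) * (sum(y) / n)) / (n - 1)

r = correlation(x, y)

# Changing the unit of x from lbs to kg rescales the covariance but not r.
x_kg = [0.4536 * a for a in x]

print(s_xy, s_xy_alt, round(r, 2))        # 3.75 3.75 0.21
print(isclose(correlation(x_kg, y), r))   # True
```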

Figure 1.4 Plot of the weights of the twenty‐year‐old boys against their heights, from the observations in Table 1.1.

1.8 Joint frequency distribution

When the number of observations on two variables (x, y) is not small, they can be grouped into the joint frequency distribution. National and international organizations present the health‐related characteristics in this form. For the sake of illustration, age (x) and weight (y) of a sample of adults classified into rows and columns are presented in Table 1.4.

Table 1.4 Age (x) and weight (y) of n = 200 adults.

                                 Weight (lbs)
    Age (years)     140–150     150–160     160–170     Total
    30–40           20          30          20          70
    40–50           15          35          40          90
    50–60           5           15          20          40
    Total           40          80          80          200

With the first and second subscripts $i = (1, 2, \ldots, r)$ and $j = (1, 2, \ldots, c)$ representing the rows and columns respectively, the cell in the ith row and jth column consists of $n_{ij}$ adults. The total numbers of observations in the ith row and jth column respectively are $n_{i.} = \sum_{j=1}^{c} n_{ij}$ and $n_{.j} = \sum_{i=1}^{r} n_{ij}$. The overall sample size becomes $n = \sum_{i=1}^{r} n_{i.} = \sum_{j=1}^{c} n_{.j}$. The row and column totals are the marginal totals. They provide the frequency distributions of the age and weight respectively. The means, variances and standard deviations for the row and column classifications are obtained from these distributions as described in Section 1.6. With the mid‐values $(x_1, x_2, \ldots, x_r)$ of the row classification and $(y_1, y_2, \ldots, y_c)$ of the column classification, the covariance of (x, y) is obtained from

$$s_{xy} = \frac{1}{n-1}\sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}(x_i - \bar{x})(y_j - \bar{y}). \tag{1.7}$$

The correlation coefficient is found from $r = s_{xy}/(s_x s_y)$.

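From Table 1.4, with the mid‐values (35, 45, 55) and the divisor (n – 1) = 199, the mean, variance and standard deviation of the age are

$\bar{x} = [70(35) + 90(45) + 40(55)]/200 = 43.5$, $s_x^2 = [70(35 - 43.5)^2 + 90(45 - 43.5)^2 + 40(55 - 43.5)^2]/199 = 53.02$ and $s_x = 7.28$.

Similarly, for the weight, $\bar{y} = 157$, $s_y^2 = 56.28$ and $s_y = 7.50$. From (1.7),

$s_{xy} = 2100/199 = 10.55$.

The correlation of age and weight now becomes $r = 10.55/(7.28 \times 7.50) = 0.19$, which is not very high.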
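The computations for the joint frequency distribution can be sketched in Python from the cell counts of Table 1.4. The mid-values of the age and weight classes are used for x and y. The divisor (n – 1) is an assumption in line with (1.5); the correlation is the same with either divisor, since it cancels in the ratio.

```python
from math import sqrt

# Mid-values of the age (rows) and weight (columns) classes of Table 1.4.
ages = [35, 45, 55]
weights = [145, 155, 165]

# Cell counts n_ij: rows are age classes, columns are weight classes.
counts = [[20, 30, 20],
          [15, 35, 40],
          [5, 15, 20]]

row_totals = [sum(row) for row in counts]        # marginal distribution of age
col_totals = [sum(col) for col in zip(*counts)]  # marginal distribution of weight
n = sum(row_totals)

x_bar = sum(ni * x for ni, x in zip(row_totals, ages)) / n
y_bar = sum(nj * y for nj, y in zip(col_totals, weights)) / n

# Sums of squares and of cross-products of the deviations, weighted by the counts.
ss_x = sum(ni * (x - x_bar) ** 2 for ni, x in zip(row_totals, ages))
ss_y = sum(nj * (y - y_bar) ** 2 for nj, y in zip(col_totals, weights))
cross = sum(counts[i][j] * (ages[i] - x_bar) * (weights[j] - y_bar)
            for i in range(len(ages)) for j in range(len(weights)))

s_xy = cross / (n - 1)            # covariance, assuming the divisor n - 1
r = cross / sqrt(ss_x * ss_y)     # correlation, free of the choice of divisor

print(x_bar, y_bar, round(s_xy, 2), round(r, 2))  # 43.5 157.0 10.55 0.19
```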

1.9 Linear transformation of the observations

For computations, it may be convenient to transform the data first. For instance, we may subtract 170 from each of the heights in Table 1.1, and divide the result by 10. The new observations now become $u_i = (x_i - 170)/10$. We may also first divide each height by 100 and then subtract 5. Now, $u_i = x_i/100 - 5$. In either case, the new observations take the form of $u_i = a x_i + b$, where (a, b) are positive or negative constants. The mean of the transformed observations becomes

$$\bar{u} = a\bar{x} + b. \tag{1.8}$$

Their variance becomes

$$s_u^2 = a^2 s_x^2. \tag{1.9}$$

With the above type of transformation, computations for $\bar{u}$ and $s_u^2$ become simple. Now, $\bar{x}$ is obtained from $(\bar{u} - b)/a$ and $s_x^2$ from $s_u^2/a^2$. Note that adding or subtracting a constant displaces the mean, but it has no effect on the variance. Multiplying $x_i$ by the constant a results in multiplying its variance by $a^2$, and the standard deviation by $|a|$.

As found earlier, the average of the heights of the twenty boys is 175.2 cm. To convert $x_i$ in cm to $y_i$ in inches, $y_i = x_i/2.54$. Now, the average height is $\bar{y} = 175.2/2.54 = 68.98$ inches or close to 5 feet 9 inches. The variance becomes $s_y^2 = 51.33/(2.54)^2 = 7.96$ and $s_y = 2.82$ inches.
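The effect of a linear transformation on the mean and variance can be checked numerically. The sketch below assumes the heights as read from Figure 1.1 and applies the first transformation of this section, written as a times x plus b with a = 1/10 and b = –17, followed by the conversion from cm to inches.

```python
from math import isclose
from statistics import mean, stdev, variance

# Heights (cm) of the twenty boys, read off the stem and leaf display in Figure 1.1.
heights = [162, 163, 167, 168, 170, 172, 172, 173, 174, 176,
           176, 176, 177, 178, 178, 180, 184, 184, 186, 188]

# (x - 170) / 10 written as a * x + b.
a, b = 1 / 10, -17
u = [a * x + b for x in heights]

# The mean is shifted and scaled, the variance is only scaled by a squared.
print(isclose(mean(u), a * mean(heights) + b))           # True
print(isclose(variance(u), a ** 2 * variance(heights)))  # True

# Recovering the original summary figures from the transformed ones.
x_bar = (mean(u) - b) / a
s2_x = variance(u) / a ** 2
print(round(x_bar, 1), round(s2_x, 2))                   # 175.2 51.33

# Conversion from cm to inches: a = 1 / 2.54 and b = 0.
inches = [x / 2.54 for x in heights]
print(round(mean(inches), 2), round(variance(inches), 2), round(stdev(inches), 2))
# 68.98 7.96 2.82
```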

1.10 Linear combinations of two sets of observations

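Consider the gains in weights $(x_i, y_i)$, $i = (1, 2, \ldots, n)$, of a sample of n adults on two occasions. The total $t_i = x_i + y_i$, the difference $d_i = x_i - y_i$, or a weighted combination $u_i = a x_i + b y_i$ with specified constants (a, b) may be of interest. The mean and variance of $t_i$ are

$$\bar{t} = \bar{x} + \bar{y} \tag{1.10}$$

and

$$V(t_i) = s_x^2 + s_y^2 + 2s_{xy}, \tag{1.11}$$

where $(s_x^2, s_y^2)$ are the variances and $s_{xy}$ the covariance of x and y. The standard deviations of x and y are $s_x$ and $s_y$, and the sample correlation is $r = s_{xy}/(s_x s_y)$. The standard deviation $s_t$ of $t_i$ is the positive square root of $V(t_i)$.

Similarly, the mean and variance of $d_i$ are $\bar{d} = \bar{x} - \bar{y}$ and $V(d_i) = s_x^2 + s_y^2 - 2s_{xy}$, and the standard deviation $s_d$ of $d_i$ is the positive square root of $V(d_i)$. If $u_i = a x_i + b y_i + c$, where (a, b, c) are constants, $\bar{u} = a\bar{x} + b\bar{y} + c$ and $V(u_i) = a^2 s_x^2 + b^2 s_y^2 + 2ab\,s_{xy}$. The standard deviation $s_u$ of $u_i$ is obtained from the square root of $V(u_i)$.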

For an illustration, consider the gains in weights (lbs) $(x_i, y_i)$ of five adults on two occasions: (5, 10), (10, 5), (10, 10), (5, –5), and (5, 10); the fourth adult lost 5 lbs on the second occasion.

From these observations, the mean, variance and standard deviation of $x_i$ are (7, 7.5, 2.74). Corresponding figures for $y_i$ are (6, 42.5, 6.52). The covariance and correlation are $s_{xy} = 3.75$ and $r = 0.21$. The mean, variance and standard deviation of $t_i = x_i + y_i$ and $d_i = x_i - y_i$ respectively become (13, 57.5, 7.58) and (1, 42.5, 6.52). With $a = 1/4$, $b = 3/4$ and $c = -5$, the mean, variance, and standard deviation of $u_i = a x_i + b y_i + c$ become (1.25, 25.78, 5.08).
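The figures of this illustration can be checked by computing the total, difference and weighted combination directly and comparing them with the variance formulas of this section. In the sketch below the difference is taken as x minus y, and the constants a = 1/4, b = 3/4 and c = –5 are an assumption, chosen because they reproduce the stated mean of 1.25 and standard deviation of 5.08.

```python
from math import isclose, sqrt
from statistics import mean, variance

# Gains in weights (lbs) of the five adults on the two occasions.
x = [5, 10, 10, 5, 5]
y = [10, 5, 10, -5, 10]
n = len(x)

# Covariance of x and y with divisor n - 1.
s_xy = sum((xi - mean(x)) * (yi - mean(y)) for xi, yi in zip(x, y)) / (n - 1)

# Assumed constants of the weighted combination (see the note above).
a, b, c = 0.25, 0.75, -5

t = [xi + yi for xi, yi in zip(x, y)]              # totals
d = [xi - yi for xi, yi in zip(x, y)]              # differences, x minus y
u = [a * xi + b * yi + c for xi, yi in zip(x, y)]  # weighted combination

# Direct computation agrees with the formulas for the variances.
print(isclose(variance(t), variance(x) + variance(y) + 2 * s_xy))   # True
print(isclose(variance(d), variance(x) + variance(y) - 2 * s_xy))   # True
print(isclose(variance(u),
              a ** 2 * variance(x) + b ** 2 * variance(y) + 2 * a * b * s_xy))  # True

for name, values in (("t", t), ("d", d), ("u", u)):
    print(name, mean(values), round(variance(values), 2), round(sqrt(variance(values)), 2))
```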

Exercises

1.1. Find the summary figures for the 20 ten‐year‐old boys and girls in Tables T1.1 and T1.2.

1.2. (a) Find the means and standard deviations of the three characteristics for the sixteen‐year‐old boys and girls in Tables T1.3 and T1.4. (b) Find the means and S.D.s for the heights with grouping.

1.3.

2 Probability, random variable, expected value and variance

2.1 Introduction

The basic principles of probability are essential for the development of statistical theory, inference and applications. Probabilities of mutually exclusive, independent and dependent events are described in the following sections. Bayes' theorem is illustrated through an example. General definitions of a probability distribution, expected value, variance and moments of a random variable are presented.

2.2 Events and probabilities

Clinically examining the difference between the effects of two or more medical treatments and evaluating the benefits of different diets for weight reduction or hypertension control are two illustrations of experiments. The outcome of an experiment can be a success or failure, Event A and its complement Event B. For instance, an exercise program may increase the HDL of a person by less than 10, or by more than 10 mg/dL. These two events can be denoted by A and its complement B. In a random sample of 100 persons participating in the exercise program, HDL may increase by less than 10 mg/dL for 40 persons and more than 10 mg/dL for the remaining. Thus (4/10)th or 40 percent of the outcomes are in favor of the event A and (6/10