Applied Linear Regression - Sanford Weisberg - ebook

Praise for the Third Edition: "...this is an excellent book which could easily be used as a course text..." (International Statistical Institute)

The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illustrates how to develop estimation, confidence, and testing procedures primarily through the use of least squares regression. While maintaining the accessible appeal of each previous edition, Applied Linear Regression, Fourth Edition features:

* Graphical methods stressed in the initial exploratory phase, analysis phase, and summarization phase of an analysis
* In-depth coverage of parameter estimates in both simple and complex models, transformations, and regression diagnostics
* Newly added material on topics including testing, ANOVA, and variance assumptions
* Updated methodology, such as bootstrapping, cross-validation, binomial and Poisson regression, and modern model selection methods

Applied Linear Regression, Fourth Edition is an excellent textbook for upper-undergraduate and graduate-level students, as well as an appropriate reference guide for practitioners and applied statisticians in engineering, business administration, economics, and the social sciences.


Table of Contents

Wiley Series in Probability and Statistics

Title page

Copyright page


Preface to the Fourth Edition


CHAPTER 1: Scatterplots and Regression

1.1 Scatterplots

1.2 Mean Functions

1.3 Variance Functions

1.4 Summary Graph

1.5 Tools for Looking at Scatterplots

1.6 Scatterplot Matrices

CHAPTER 2: Simple Linear Regression

2.1 Ordinary Least Squares Estimation

2.2 Least Squares Criterion

2.3 Estimating the Variance σ²

2.4 Properties of Least Squares Estimates

2.5 Estimated Variances

2.6 Confidence Intervals and t-Tests

2.7 The Coefficient of Determination, R²

2.8 The Residuals

CHAPTER 3: Multiple Regression

3.1 Adding a Regressor to a Simple Linear Regression Model

3.2 The Multiple Linear Regression Model

3.3 Predictors and Regressors

3.4 Ordinary Least Squares

3.5 Predictions, Fitted Values, and Linear Combinations

CHAPTER 4: Interpretation of Main Effects

4.1 Understanding Parameter Estimates

4.2 Dropping Regressors

4.3 Experimentation versus Observation

4.4 Sampling from a Normal Population

4.5 More on R²

CHAPTER 5: Complex Regressors

5.1 Factors

5.2 Many Factors

5.3 Polynomial Regression

5.4 Splines

5.5 Principal Components

5.6 Missing Data

CHAPTER 6: Testing and Analysis of Variance

6.1 F-Tests

6.2 The Analysis of Variance

6.3 Comparisons of Means

6.4 Power and Non-Null Distributions

6.5 Wald Tests

6.6 Interpreting Tests

CHAPTER 7: Variances

7.1 Weighted Least Squares

7.2 Misspecified Variances

7.3 General Correlation Structures

7.4 Mixed Models

7.5 Variance Stabilizing Transformations

7.6 The Delta Method

7.7 The Bootstrap

CHAPTER 8: Transformations

8.1 Transformation Basics

8.2 A General Approach to Transformations

8.3 Transforming the Response

8.4 Transformations of Nonpositive Variables

8.5 Additive Models

CHAPTER 9: Regression Diagnostics

9.1 The Residuals

9.2 Testing for Curvature

9.3 Nonconstant Variance

9.4 Outliers

9.5 Influence of Cases

9.6 Normality Assumption

CHAPTER 10: Variable Selection

10.1 Variable Selection and Parameter Assessment

10.2 Variable Selection for Discovery

10.3 Model Selection for Prediction

CHAPTER 11: Nonlinear Regression

11.1 Estimation for Nonlinear Mean Functions

11.2 Inference Assuming Large Samples

11.3 Starting Values

11.4 Bootstrap Inference

11.5 Further Reading

CHAPTER 12: Binomial and Poisson Regression

12.1 Distributions for Counted Data

12.2 Regression Models For Counts

12.3 Poisson Regression

12.4 Transferring What You Know about Linear Models

12.5 Generalized Linear Models


A.1 Website

A.2 Means, Variances, Covariances, and Correlations

A.3 Least Squares for Simple Regression

A.4 Means and Variances of Least Squares Estimates

A.5 Estimating E(Y|X) Using a Smoother

A.6 A Brief Introduction to Matrices and Vectors

A.7 Random Vectors

A.8 Least Squares Using Matrices

A.9 The QR Factorization

A.10 Spectral Decomposition

A.11 Maximum Likelihood Estimates

A.12 The Box–Cox Method for Transformations

A.13 Case Deletion in Linear Regression


Author Index

Subject Index

Copyright © 2014 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site.

Library of Congress Cataloging-in-Publication Data:

Weisberg, Sanford, 1947–

Applied linear regression / Sanford Weisberg, School of Statistics, University of Minnesota, Minneapolis, MN.—Fourth edition.

pages cm

Includes bibliographical references and index.

ISBN 978-1-118-38608-8 (hardback)

1. Regression analysis. I. Title.

QA278.2.W44 2014



To Carol, Stephanie,

and the memory of my parents

Preface to the Fourth Edition

This is a textbook to help you learn about applied linear regression. The book has been in print for more than 30 years, in a period of rapid change in statistical methodology and particularly in statistical computing. This fourth edition is a thorough rewriting of the book to reflect the needs of current students. As in previous editions, the overriding theme of the book is to help you learn to do data analysis using linear regression. Linear regression is an excellent model for learning about data analysis, both because it is important in its own right and because it provides a framework for understanding other methods of analysis.

This edition of the book includes the majority of the topics in previous editions, although much of the material has been rearranged. New methodology and examples have been added throughout.

* Even more emphasis is placed on graphics. The first two editions stressed graphics for diagnostic methods (Chapter 9) and the third edition added graphics for understanding data before any analysis is done (Chapter 1). In this edition, effects plots are stressed to summarize the fit of a model.
* Many applied analyses are based on understanding and interpreting parameters. This edition puts much greater emphasis on parameters, with part of Chapters 2–3 and all of Chapters 4–5 devoted to this important topic.
* Chapter 6 contains a greatly expanded treatment of testing and model comparison using both likelihood ratio and Wald tests. The usefulness and limitations of testing are stressed.
* Chapter 7 is about the variance assumption in linear models. The discussion of weighted least squares has been expanded to cover problems of ecological regressions, sample surveys, and other cases. Alternatives such as the bootstrap and heteroskedasticity corrections have been added or expanded.
* Diagnostic methods using transformations (Chapter 8) and residuals and related quantities (Chapter 9) that were the heart of the earlier editions have been maintained in this new edition.
* The discussion of variable selection in Chapter 10 has been updated from the third edition. It is designed to help you understand the key problems in variable selection. In recent years, this topic has morphed into the area of machine learning, and the goal of this chapter is to show connections and provide references.
* As in the third edition, brief introductions to nonlinear regression (Chapter 11) and to logistic regression (Chapter 12) are included, with Poisson regression added in Chapter 12.

Using This Book

The website for this book is

As with previous editions, this book is not tied to any particular computer program. A primer for using the free R package (R Core Team, 2013) for the material covered in the book is available from the website. The primer can also be accessed directly from within R as you are working. An optional published companion book about R is Fox and Weisberg (2011).

All the data files used are available from the website and in an R package called alr4 that you can download for free. Solutions for odd-numbered problems, all using R, are available on the website for the book.1 You cannot learn to do data analysis without working problems.

Some advanced topics are introduced to help you recognize when a problem that looks like linear regression is actually a little different. Detailed methodology is not always presented, but references at the same level as this book are presented. The bibliography, also available with clickable links on the book's website, has been greatly expanded and updated.

Mathematical Level

The mathematical level of this book is roughly the same as the level of previous editions. Matrix representation of data is used, particularly in the derivation of the methodology in Chapters 3–4. Derivations are less frequent in later chapters, and so the necessary mathematics is less. Calculus is generally not required, except for an occasional use of a derivative. The discussions requiring calculus can be skipped without much loss.


Thanks are due to Jeff Witmer, Yuhong Yang, Brad Price, and Brad's Stat 5302 students at the University of Minnesota. New examples were provided by April Bleske-Rechek, Tom Burk, and Steve Taff. Work with John Fox over the last few years has greatly influenced my writing.

For help with previous editions, thanks are due to Charles Anderson, Don Pereira, Christopher Bingham, Morton Brown, Cathy Campbell, Dennis Cook, Stephen Fienberg, James Frane, Seymour Geisser, John Hartigan, David Hinkley, Alan Izenman, Soren Johansen, Kenneth Koehler, David Lane, Michael Lavine, Kinley Larntz, Gary Oehlert, Katherine St. Clair, Keija Shan, John Rice, Donald Rubin, Joe Shih, Pete Stewart, Stephen Stigler, Douglas Tiffany, Carol Weisberg, and Howard Weisberg.

Finally, I am grateful to Stephen Quigley at Wiley for asking me to do a new edition. I have been working on versions of this book since 1976, and each new edition has pleased me more than the one before it. I hope it pleases you, too.

Sanford Weisberg

St. Paul, Minnesota

September 2013


1 All solutions are available to instructors using the book in a course; see the website for details.


CHAPTER 1: Scatterplots and Regression

Regression is the study of dependence. It is used to answer interesting questions about how one or more predictors influence a response. Here are a few typical questions that may be answered using regression:

* Are daughters taller than their mothers?
* Does changing class size affect success of students?
* Can we predict the time of the next eruption of Old Faithful Geyser from the length of the most recent eruption?
* Do changes in diet result in changes in cholesterol level, and if so, do the results depend on other characteristics such as age, sex, and amount of exercise?
* Do countries with higher per person income have lower birth rates than countries with lower income?
* Are highway design characteristics associated with highway accident rates? Can accident rates be lowered by changing design characteristics?
* Is water usage increasing over time?
* Do conservation easements on agricultural property lower land value?

In most of this book, we study the important instance of regression methodology called linear regression. This method is the most commonly used in regression, and virtually all other regression methods build upon an understanding of how linear regression works.

As with most statistical analyses, the goal of regression is to summarize observed data as simply, usefully, and elegantly as possible. A theory may be available in some problems that specifies how the response varies as the values of the predictors change. If theory is lacking, we may need to use the data to help us decide on how to proceed. In either case, an essential first step in regression analysis is to draw appropriate graphs of the data.

We begin in this chapter with the fundamental graphical tools for studying dependence. In regression problems with one predictor and one response, the scatterplot of the response versus the predictor is the starting point for regression analysis. In problems with many predictors, several simple graphs will be required at the beginning of an analysis. A scatterplot matrix is a convenient way to organize looking at many scatterplots at once. We will look at several examples to introduce the main tools for looking at scatterplots and scatterplot matrices and extracting information from them. We will also introduce notation that will be used throughout the book.

1.1 Scatterplots

We begin with a regression problem with one predictor, which we will generically call X, and one response variable, which we will call Y.1 Data consist of values (xi, yi), i = 1, … , n, of (X, Y) observed on each of n units or cases. In any particular problem, both X and Y will have other names that will be displayed in this book using typewriter font, such as temperature or concentration, that are more descriptive of the data that are to be analyzed. The goal of regression is to understand how the values of Y change as X is varied over its range of possible values. A first look at how Y changes as X is varied is available from a scatterplot.

Inheritance of Height

One of the first uses of regression was to study inheritance of traits from generation to generation. During the period 1893–1898, Karl Pearson (1857–1936) organized the collection of n = 1375 heights of mothers in the United Kingdom under the age of 65 and one of their adult daughters over the age of 18. Pearson and Lee (1903) published the data, and we shall use these data to examine inheritance. The data are given in the data file Heights.2

Our interest is in inheritance from the mother to the daughter, so we view the mother's height, called mheight, as the predictor variable and the daughter's height, dheight, as the response variable. Do taller mothers tend to have taller daughters? Do shorter mothers tend to have shorter daughters?

A scatterplot of dheight versus mheight helps us answer these questions. The scatterplot is a graph of each of the n points with the response dheight on the vertical axis and predictor mheight on the horizontal axis. This plot is shown in Figure 1.1a. For regression problems with one predictor X and a response Y, we call the scatterplot of Y versus X a summary graph.
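A summary graph of this kind can be sketched as follows. This is a minimal illustration, not the book's own code: the variable names mheight and dheight follow the text, but the data are simulated, and the intercept, slope, and spread used to generate them are assumptions rather than estimates from the Pearson and Lee data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-in for the Heights data: n mother-daughter pairs.
# The constants below are illustrative assumptions only.
n = 1375
mheight = rng.normal(62.5, 2.3, n)                       # mothers' heights (inches)
dheight = 30.0 + 0.55 * mheight + rng.normal(0, 2.3, n)  # daughters' heights

fig, ax = plt.subplots(figsize=(5, 5))   # square figure: equal axis lengths
ax.scatter(mheight, dheight, s=4)
ax.set_xlabel("mheight")                 # predictor on the horizontal axis
ax.set_ylabel("dheight")                 # response on the vertical axis

# Use the same scale on both axes, as the text recommends for
# variables measured in the same units.
low = min(mheight.min(), dheight.min())
high = max(mheight.max(), dheight.max())
ax.set_xlim(low, high)
ax.set_ylim(low, high)
fig.savefig("heights_scatter.png")
```

Forcing equal axis limits is what makes the 45°-line comparison discussed below meaningful.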

Figure 1.1 Scatterplot of mothers' and daughters' heights in the Pearson and Lee data. The original data have been jittered to avoid overplotting in (a). Plot (b) shows the original data, so each point in the plot refers to one or more mother–daughter pairs.

Here are some important characteristics of this scatterplot:

1. The range of heights appears to be about the same for mothers and for daughters. Because of this, we draw the plot so that the lengths of the horizontal and vertical axes are the same, and the scales are the same. If all mother–daughter pairs had exactly the same height, then all the points would fall exactly on a 45°-line. Some computer programs for drawing a scatterplot are not smart enough to figure out that the lengths of the axes should be the same, so you might need to resize the plot or to draw it several times.
2. The original data that went into this scatterplot were rounded so each of the heights was given to the nearest inch. The original data are plotted in Figure 1.1b. This plot exhibits substantial overplotting with many points at exactly the same location. This is undesirable because one point on the plot can correspond to many cases. The easiest solution is to use jittering, in which a small uniform random number is added to each value. In Figure 1.1a, we used a uniform random number on the range from −0.5 to +0.5, so the jittered values would round to the numbers given in the original source.
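The jittering step described in point 2 can be sketched as follows; this is a minimal illustration (not the book's own code), using a short made-up vector of heights rounded to the nearest inch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Heights recorded to the nearest inch, so many values repeat exactly
# and would overplot at identical locations.
heights = np.array([62.0, 63.0, 63.0, 64.0, 65.0, 65.0, 66.0])

# Jitter: add a uniform random number on (-0.5, +0.5) to each value.
# The jittered values spread out on the plot, yet each still rounds
# back to the value given in the original source.
jittered = heights + rng.uniform(-0.5, 0.5, size=heights.shape)

print(np.round(jittered) == heights)  # all True
```

Choosing the jitter width to match the rounding interval is the key design point: any wider and jittered points could round to a different inch than the recorded one.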
