Applied Linear Regression - Sanford Weisberg - ebook

Praise for the Third Edition: "...this is an excellent book which could easily be used as a course text..." (International Statistical Institute)

The Fourth Edition of Applied Linear Regression provides a thorough update of the basic theory and methodology of linear regression modeling. Demonstrating the practical applications of linear regression analysis techniques, the Fourth Edition uses interesting, real-world exercises and examples. Stressing central concepts such as model building, understanding parameters, assessing fit and reliability, and drawing conclusions, the new edition illustrates how to develop estimation, confidence, and testing procedures primarily through the use of least squares regression. While maintaining the accessible appeal of each previous edition, Applied Linear Regression, Fourth Edition features:

* Graphical methods stressed in the initial exploratory phase, analysis phase, and summarization phase of an analysis
* In-depth coverage of parameter estimates in both simple and complex models, transformations, and regression diagnostics
* Newly added material on topics including testing, ANOVA, and variance assumptions
* Updated methodology, such as bootstrapping, cross-validation, binomial and Poisson regression, and modern model selection methods

Applied Linear Regression, Fourth Edition is an excellent textbook for upper-undergraduate and graduate-level students, as well as an appropriate reference guide for practitioners and applied statisticians in engineering, business administration, economics, and the social sciences.


Table of Contents

Wiley Series in Probability and Statistics

Title page

Copyright page


Preface to the Fourth Edition


CHAPTER 1: Scatterplots and Regression

1.1 Scatterplots

1.2 Mean Functions

1.3 Variance Functions

1.4 Summary Graph

1.5 Tools for Looking at Scatterplots

1.6 Scatterplot Matrices

CHAPTER 2: Simple Linear Regression

2.1 Ordinary Least Squares Estimation

2.2 Least Squares Criterion

2.3 Estimating the Variance σ²

2.4 Properties of Least Squares Estimates

2.5 Estimated Variances

2.6 Confidence Intervals and t-Tests

2.7 The Coefficient of Determination, R²

2.8 The Residuals

CHAPTER 3: Multiple Regression

3.1 Adding a Regressor to a Simple Linear Regression Model

3.2 The Multiple Linear Regression Model

3.3 Predictors and Regressors

3.4 Ordinary Least Squares

3.5 Predictions, Fitted Values, and Linear Combinations

CHAPTER 4: Interpretation of Main Effects

4.1 Understanding Parameter Estimates

4.2 Dropping Regressors

4.3 Experimentation versus Observation

4.4 Sampling from a Normal Population

4.5 More on R²

CHAPTER 5: Complex Regressors

5.1 Factors

5.2 Many Factors

5.3 Polynomial Regression

5.4 Splines

5.5 Principal Components

5.6 Missing Data

CHAPTER 6: Testing and Analysis of Variance

6.1 F-Tests

6.2 The Analysis of Variance

6.3 Comparisons of Means

6.4 Power and Non-Null Distributions

6.5 Wald Tests

6.6 Interpreting Tests

CHAPTER 7: Variances

7.1 Weighted Least Squares

7.2 Misspecified Variances

7.3 General Correlation Structures

7.4 Mixed Models

7.5 Variance Stabilizing Transformations

7.6 The Delta Method

7.7 The Bootstrap

CHAPTER 8: Transformations

8.1 Transformation Basics

8.2 A General Approach to Transformations

8.3 Transforming the Response

8.4 Transformations of Nonpositive Variables

8.5 Additive Models

CHAPTER 9: Regression Diagnostics

9.1 The Residuals

9.2 Testing for Curvature

9.3 Nonconstant Variance

9.4 Outliers

9.5 Influence of Cases

9.6 Normality Assumption

CHAPTER 10: Variable Selection

10.1 Variable Selection and Parameter Assessment

10.2 Variable Selection for Discovery

10.3 Model Selection for Prediction

CHAPTER 11: Nonlinear Regression

11.1 Estimation for Nonlinear Mean Functions

11.2 Inference Assuming Large Samples

11.3 Starting Values

11.4 Bootstrap Inference

11.5 Further Reading

CHAPTER 12: Binomial and Poisson Regression

12.1 Distributions for Counted Data

12.2 Regression Models For Counts

12.3 Poisson Regression

12.4 Transferring What You Know about Linear Models

12.5 Generalized Linear Models


A.1 Website

A.2 Means, Variances, Covariances, and Correlations

A.3 Least Squares for Simple Regression

A.4 Means and Variances of Least Squares Estimates

A.5 Estimating E(Y|X) Using a Smoother

A.6 A Brief Introduction to Matrices and Vectors

A.7 Random Vectors

A.8 Least Squares Using Matrices

A.9 The QR Factorization

A.10 Spectral Decomposition

A.11 Maximum Likelihood Estimates

A.12 The Box–Cox Method for Transformations

A.13 Case Deletion in Linear Regression


Author Index

Subject Index

Copyright © 2014 by John Wiley & Sons, Inc. All rights reserved

Published by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site.

Library of Congress Cataloging-in-Publication Data:

Weisberg, Sanford, 1947–

Applied linear regression / Sanford Weisberg, School of Statistics, University of Minnesota, Minneapolis, MN.—Fourth edition.

pages cm

Includes bibliographical references and index.

ISBN 978-1-118-38608-8 (hardback)

1. Regression analysis. I. Title.

QA278.2.W44 2014



To Carol, Stephanie,

and the memory of my parents

Preface to the Fourth Edition

This is a textbook to help you learn about applied linear regression. The book has been in print for more than 30 years, in a period of rapid change in statistical methodology and particularly in statistical computing. This fourth edition is a thorough rewriting of the book to reflect the needs of current students. As in previous editions, the overriding theme of the book is to help you learn to do data analysis using linear regression. Linear regression is an excellent model for learning about data analysis, both because it is important in its own right and because it provides a framework for understanding other methods of analysis.

This edition of the book includes the majority of the topics in previous editions, although much of the material has been rearranged. New methodology and examples have been added throughout.

* Even more emphasis is placed on graphics. The first two editions stressed graphics for diagnostic methods (Chapter 9) and the third edition added graphics for understanding data before any analysis is done (Chapter 1). In this edition, effects plots are stressed to summarize the fit of a model.
* Many applied analyses are based on understanding and interpreting parameters. This edition puts much greater emphasis on parameters, with part of Chapters 2–3 and all of Chapters 4–5 devoted to this important topic.
* Chapter 6 contains a greatly expanded treatment of testing and model comparison using both likelihood ratio and Wald tests. The usefulness and limitations of testing are stressed.
* Chapter 7 is about the variance assumption in linear models. The discussion of weighted least squares has been expanded to cover problems of ecological regressions, sample surveys, and other cases. Alternatives such as the bootstrap and heteroskedasticity corrections have been added or expanded.
* Diagnostic methods using transformations (Chapter 8) and residuals and related quantities (Chapter 9) that were the heart of the earlier editions have been maintained in this new edition.
* The discussion of variable selection in Chapter 10 has been updated from the third edition. It is designed to help you understand the key problems in variable selection. In recent years, this topic has morphed into the area of machine learning, and the goal of this chapter is to show connections and provide references.
* As in the third edition, brief introductions to nonlinear regression (Chapter 11) and to logistic regression (Chapter 12) are included, with Poisson regression added in Chapter 12.

Using This Book

The website for this book is

As with previous editions, this book is not tied to any particular computer program. A primer for using the free R package (R Core Team, 2013) for the material covered in the book is available from the website. The primer can also be accessed directly from within R as you are working. An optional published companion book about R is Fox and Weisberg (2011).

All the data files used are available from the website and in an R package called alr4 that you can download for free. Solutions for odd-numbered problems, all using R, are available on the website for the book.1 You cannot learn to do data analysis without working problems.

Some advanced topics are introduced to help you recognize when a problem that looks like linear regression is actually a little different. Detailed methodology is not always presented, but references at the same level as this book are presented. The bibliography, also available with clickable links on the book's website, has been greatly expanded and updated.

Mathematical Level

The mathematical level of this book is roughly the same as the level of previous editions. Matrix representation of data is used, particularly in the derivation of the methodology in Chapters 3–4. Derivations are less frequent in later chapters, and so the necessary mathematics is less. Calculus is generally not required, except for an occasional use of a derivative. The discussions requiring calculus can be skipped without much loss.


Thanks are due to Jeff Witmer, Yuhong Yang, Brad Price, and Brad's Stat 5302 students at the University of Minnesota. New examples were provided by April Bleske-Rechek, Tom Burk, and Steve Taff. Work with John Fox over the last few years has greatly influenced my writing.

For help with previous editions, thanks are due to Charles Anderson, Don Pereira, Christopher Bingham, Morton Brown, Cathy Campbell, Dennis Cook, Stephen Fienberg, James Frane, Seymour Geisser, John Hartigan, David Hinkley, Alan Izenman, Soren Johansen, Kenneth Koehler, David Lane, Michael Lavine, Kinley Larntz, Gary Oehlert, Katherine St. Clair, Keija Shan, John Rice, Donald Rubin, Joe Shih, Pete Stewart, Stephen Stigler, Douglas Tiffany, Carol Weisberg, and Howard Weisberg.

Finally, I am grateful to Stephen Quigley at Wiley for asking me to do a new edition. I have been working on versions of this book since 1976, and each new edition has pleased me more than the one before it. I hope it pleases you, too.

Sanford Weisberg

St. Paul, Minnesota

September 2013


1 All solutions are available to instructors using the book in a course; see the website for details.


CHAPTER 1: Scatterplots and Regression

Regression is the study of dependence. It is used to answer interesting questions about how one or more predictors influence a response. Here are a few typical questions that may be answered using regression:

* Are daughters taller than their mothers?
* Does changing class size affect success of students?
* Can we predict the time of the next eruption of Old Faithful Geyser from the length of the most recent eruption?
* Do changes in diet result in changes in cholesterol level, and if so, do the results depend on other characteristics such as age, sex, and amount of exercise?
* Do countries with higher per person income have lower birth rates than countries with lower income?
* Are highway design characteristics associated with highway accident rates? Can accident rates be lowered by changing design characteristics?
* Is water usage increasing over time?
* Do conservation easements on agricultural property lower land value?

In most of this book, we study the important instance of regression methodology called linear regression. This method is the most commonly used in regression, and virtually all other regression methods build upon an understanding of how linear regression works.

As with most statistical analyses, the goal of regression is to summarize observed data as simply, usefully, and elegantly as possible. A theory may be available in some problems that specifies how the response varies as the values of the predictors change. If theory is lacking, we may need to use the data to help us decide on how to proceed. In either case, an essential first step in regression analysis is to draw appropriate graphs of the data.

We begin in this chapter with the fundamental graphical tools for studying dependence. In regression problems with one predictor and one response, the scatterplot of the response versus the predictor is the starting point for regression analysis. In problems with many predictors, several simple graphs will be required at the beginning of an analysis. A scatterplot matrix is a convenient way to organize looking at many scatterplots at once. We will look at several examples to introduce the main tools for looking at scatterplots and scatterplot matrices and extracting information from them. We will also introduce notation that will be used throughout the book.

1.1 Scatterplots

We begin with a regression problem with one predictor, which we will generically call X, and one response variable, which we will call Y.1 Data consist of values (xi, yi), i = 1, … , n, of (X, Y) observed on each of n units or cases. In any particular problem, both X and Y will have other names that will be displayed in this book using typewriter font, such as temperature or concentration, that are more descriptive of the data that are to be analyzed. The goal of regression is to understand how the values of Y change as X is varied over its range of possible values. A first look at how Y changes as X is varied is available from a scatterplot.

Inheritance of Height

One of the first uses of regression was to study inheritance of traits from generation to generation. During the period 1893–1898, Karl Pearson (1857–1936) organized the collection of n = 1375 heights of mothers in the United Kingdom under the age of 65 and one of their adult daughters over the age of 18. Pearson and Lee (1903) published the data, and we shall use these data to examine inheritance. The data are given in the data file Heights.2

Our interest is in inheritance from the mother to the daughter, so we view the mother's height, called mheight, as the predictor variable and the daughter's height, dheight, as the response variable. Do taller mothers tend to have taller daughters? Do shorter mothers tend to have shorter daughters?

A scatterplot of dheight versus mheight helps us answer these questions. The scatterplot is a graph of each of the n points with the response dheight on the vertical axis and predictor mheight on the horizontal axis. This plot is shown in Figure 1.1a. For regression problems with one predictor X and a response Y, we call the scatterplot of Y versus X a summary graph.
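A summary graph of this kind can be sketched as follows. This is a minimal illustration, not the book's own code: the variable names mheight and dheight follow the text, but the data are simulated, and the intercept, slope, and spread used to generate them are assumptions rather than estimates from the Pearson and Lee data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Synthetic stand-in for the Heights data: n mother-daughter pairs.
# The constants below are illustrative assumptions only.
n = 1375
mheight = rng.normal(62.5, 2.3, n)                       # mothers' heights (inches)
dheight = 30.0 + 0.55 * mheight + rng.normal(0, 2.3, n)  # daughters' heights

fig, ax = plt.subplots(figsize=(5, 5))   # square figure: equal axis lengths
ax.scatter(mheight, dheight, s=4)
ax.set_xlabel("mheight")                 # predictor on the horizontal axis
ax.set_ylabel("dheight")                 # response on the vertical axis

# Use the same scale on both axes, as the text recommends for
# variables measured in the same units.
low = min(mheight.min(), dheight.min())
high = max(mheight.max(), dheight.max())
ax.set_xlim(low, high)
ax.set_ylim(low, high)
fig.savefig("heights_scatter.png")
```

Forcing equal axis limits is what makes the 45°-line comparison discussed below meaningful.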

Figure 1.1 Scatterplot of mothers' and daughters' heights in the Pearson and Lee data. The original data have been jittered to avoid overplotting in (a). Plot (b) shows the original data, so each point in the plot refers to one or more mother–daughter pairs.

Here are some important characteristics of this scatterplot:

1. The range of heights appears to be about the same for mothers and for daughters. Because of this, we draw the plot so that the lengths of the horizontal and vertical axes are the same, and the scales are the same. If all mother–daughter pairs had exactly the same height, then all the points would fall exactly on a 45°-line. Some computer programs for drawing a scatterplot are not smart enough to figure out that the lengths of the axes should be the same, so you might need to resize the plot or to draw it several times.
2. The original data that went into this scatterplot were rounded so each of the heights was given to the nearest inch. The original data are plotted in Figure 1.1b. This plot exhibits substantial overplotting with many points at exactly the same location. This is undesirable because one point on the plot can correspond to many cases. The easiest solution is to use jittering, in which a small uniform random number is added to each value. In Figure 1.1a, we used a uniform random number on the range from −0.5 to +0.5, so the jittered values would round to the numbers given in the original source.
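The jittering step described in point 2 can be sketched as follows; this is a minimal illustration (not the book's own code), using a short made-up vector of heights rounded to the nearest inch:

```python
import numpy as np

rng = np.random.default_rng(1)

# Heights recorded to the nearest inch, so many values repeat exactly
# and would overplot at identical locations.
heights = np.array([62.0, 63.0, 63.0, 64.0, 65.0, 65.0, 66.0])

# Jitter: add a uniform random number on (-0.5, +0.5) to each value.
# The jittered values spread out on the plot, yet each still rounds
# back to the value given in the original source.
jittered = heights + rng.uniform(-0.5, 0.5, size=heights.shape)

print(np.round(jittered) == heights)  # all True
```

Choosing the jitter width to match the rounding interval is the key design point: any wider and jittered points could round to a different inch than the recorded one.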
