Medical Statistics at a Glance is a concise and accessible introduction and revision aid for this complex subject. The self-contained chapters explain the underlying concepts of medical statistics and provide a guide to the most commonly used statistical procedures.
This new edition of Medical Statistics at a Glance:
* Presents key facts accompanied by clear and informative tables and diagrams
* Focuses on illustrative examples which show statistics in action, with an emphasis on the interpretation of computer data analysis rather than complex hand calculations
* Includes extensive cross-referencing, a comprehensive glossary of terms and flow-charts to make it easier to choose appropriate tests
* Now provides the learning objectives for each chapter
* Includes a new chapter on Developing Prognostic Scores
* Includes new or expanded material on study management, multi-centre studies, sequential trials, bias and different methods to remove confounding in observational studies, multiple comparisons, ROC curves and checking assumptions in a logistic regression analysis
* The companion website at www.medstatsaag.com contains supplementary material including an extensive reference list and multiple choice questions (MCQs) with interactive answers for self-assessment.
Medical Statistics at a Glance will appeal to all medical students, junior doctors and researchers in biomedical and pharmaceutical disciplines.
Reviews of the previous editions
"The more familiar I have become with this book, the more I appreciate the clear presentation and unthreatening prose. It is now a valuable companion to my formal statistics course." - International Journal of Epidemiology
"I heartily recommend it, especially to first years, but it's equally appropriate for an intercalated BSc or Postgraduate research. If statistics give you headaches - buy it. If statistics are all you think about - buy it." - GKT Gazette
"...I unreservedly recommend this book to all medical students, especially those that dislike reading reams of text. This is one book that will not sit on your shelf collecting dust once you have graduated and will also function as a reference book." - 4th Year Medical Student, Barts and the London Chronicle, Spring 2003
Number of pages: 533
Table of Contents
1 Types of data
Data and statistics
Categorical (qualitative) data
Numerical (quantitative) data
Distinguishing between data types
2 Data entry
Formats for data entry
Planning data entry
Multiple forms per patient
Problems with dates and times
Coding missing values
3 Error checking and outliers
Handling missing data
4 Displaying data diagrammatically
Identifying outliers using graphical methods
The use of connecting lines in diagrams
5 Describing data: the ‘average’
The arithmetic mean
The geometric mean
The weighted mean
6 Describing data: the ‘spread’
Ranges derived from percentiles
The standard deviation
Variation within- and between-subjects
7 Theoretical distributions: the Normal distribution
The rules of probability
Probability distributions: the theory
The Normal (Gaussian) distribution
The Standard Normal distribution
8 Theoretical distributions: other distributions
Some words of comfort
More continuous probability distributions
Discrete probability distributions
9 Transformations
How do we transform?
Sampling and estimation
10 Sampling and sampling distributions
Why do we sample?
Obtaining a representative sample
Sampling distribution of the mean
Interpreting standard errors
SD or SEM?
Sampling distribution of the proportion
11 Confidence intervals
Confidence interval for the mean
Confidence interval for the proportion
Interpretation of confidence intervals
Degrees of freedom
Bootstrapping and jackknifing
12 Study design I
Experimental or observational studies
Defining the unit of observation
Cross-sectional or longitudinal studies
13 Study design II
Particular study designs
Choosing an appropriate study endpoint
14 Clinical trials
Primary and secondary endpoints
Blinding or masking
15 Cohort studies
Selection of cohorts
Follow-up of individuals
Information on outcomes and exposures
Analysis of cohort studies
Advantages of cohort studies
Disadvantages of cohort studies
16 Case–control studies
Selection of cases
Selection of controls
Identification of risk factors
Analysis of unmatched or group-matched case–control studies
Analysis of individually matched case–control studies
Advantages of case–control studies
Disadvantages of case–control studies
17 Hypothesis testing
Defining the null and alternative hypotheses
Obtaining the test statistic
Obtaining the P-value
Using the P-value
Hypothesis tests versus confidence intervals
Equivalence and non-inferiority trials
18 Errors in hypothesis testing
Making a decision
Making the wrong decision
Power and related factors
Multiple hypothesis testing
Basic techniques for analysing data
19 Numerical data: a single group
The one-sample t-test
The sign test
20 Numerical data: two related groups
The paired t-test
The Wilcoxon signed ranks test
21 Numerical data: two unrelated groups
The unpaired (two-sample) t-test
The Wilcoxon rank sum (two-sample) test
22 Numerical data: more than two groups
One-way analysis of variance
The Kruskal–Wallis test
23 Categorical data: a single proportion
The test of a single proportion
The sign test applied to a proportion
24 Categorical data: two proportions
Independent groups: the Chi-squared test
Related groups: McNemar’s test
25 Categorical data: more than two categories
Chi-squared test: large contingency tables
Chi-squared test for trend
Regression and correlation
26 Correlation
Pearson correlation coefficient
Spearman’s rank correlation coefficient
27 The theory of linear regression
What is linear regression?
The regression line
Method of least squares
Analysis of variance table
Regression to the mean
28 Performing a linear regression analysis
The linear regression line
Drawing the line
Checking the assumptions
Failure to satisfy the assumptions
Outliers and influential points
Assessing goodness of fit
Investigating the slope
Using the line for prediction
Improving the interpretation of the model
29 Multiple linear regression
What is it?
Why do it?
Categorical explanatory variables
Analysis of covariance
Choice of explanatory variables
Outliers and influential points
30 Binary outcomes and logistic regression
The logistic regression equation
The explanatory variables
Assessing the adequacy of the model
Comparing the odds ratio and the relative risk
Multinomial and ordinal logistic regression
Conditional logistic regression
31 Rates and Poisson regression
32 Generalized linear models
Which type of model do we choose?
Likelihood and maximum likelihood estimation
Assessing adequacy of fit
33 Explanatory variables in statistical models
Nominal explanatory variables
Ordinal explanatory variables
Numerical explanatory variables
Selecting explanatory variables
34 Bias and confounding
35 Checking assumptions
Are the data Normally distributed?
Are two or more variances equal?
Are variables linearly related?
What if the assumptions are not satisfied?
36 Sample size calculations
The importance of sample size
Increasing the power for a fixed sample size
37 Presenting results
Presenting results in a paper
38 Diagnostic tools
39 Assessing agreement
Measurement variability and error
40 Evidence-based medicine
1 Formulate the problem
2 Locate the relevant information (e.g. on diagnosis, prognosis or therapy)
3 Critically appraise the methods in order to assess the validity (closeness to the truth) of the evidence
4 Extract the most useful results and determine whether they are important
5 Apply the results in clinical practice
6 Evaluate your performance
41 Methods for clustered data
Displaying the data
Comparing groups: inappropriate analyses
Comparing groups: appropriate analyses
42 Regression methods for clustered data
Aggregate level analysis
Robust standard errors
Random effects models
Generalized estimating equations (GEE)
43 Systematic reviews and meta-analysis
The systematic review
44 Survival analysis
Displaying survival data
Problems encountered in survival analysis
45 Bayesian methods
The frequentist approach
The Bayesian approach
Diagnostic tests in a Bayesian framework
Disadvantages of Bayesian methods
46 Developing prognostic scores
Why do we do it?
Assessing the performance of a prognostic score
Developing prognostic indices and risk scores for other types of data
Appendix A: Statistical tables
Appendix B: Altman’s nomogram for sample size calculations (Chapter 36)
Appendix C: Typical computer output
Appendix D: Glossary of terms
Appendix E: Chapter numbers with relevant multiple-choice questions and structured questions from Medical Statistics at a Glance Workbook
This edition first published 2009 © 2000, 2005, 2009 by Aviva Petrie and Caroline Sabin
Registered office: John Wiley & Sons, Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
Editorial offices: 9600 Garsington Road, Oxford, OX4 2DQ, UK
The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK
111 River Street, Hoboken, NJ 07030-5774, USA
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com/wiley-blackwell
The right of the author to be identified as the author of this work has been asserted in accordance with the UK Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
The contents of this work are intended to further general scientific research, understanding, and discussion only and are not intended and should not be relied upon as recommending or promoting a specific method, diagnosis, or treatment by health science practitioners for any particular patient. The publisher and the author make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of medicines, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each medicine, equipment, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. Readers should consult with a specialist where appropriate. The fact that an organization or Website is referred to in this work as a citation and/or a potential source of further information does not mean that the author or the publisher endorses the information the organization or Website may provide or recommendations it may make. Further, readers should be aware that Internet Websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.
Library of Congress Cataloging-in-Publication Data
Medical statistics at a glance / Aviva Petrie, Caroline Sabin. – 3rd ed.
p.; cm. – (At a glance series)
Includes bibliographical references and index.
ISBN 978-1-4051-8051-1 (alk. paper)
1. Medical statistics. I. Sabin, Caroline. II. Title. III. Series: At a glance series (Oxford, England)
[DNLM: 1. Statistics as Topic. 2. Research Design. WA 950 P495m 2009]
A catalogue record for this book is available from the British Library.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Medical Statistics at a Glance is directed at undergraduate medical students, medical researchers, postgraduates in the biomedical disciplines and at pharmaceutical industry personnel. All of these individuals will, at some time in their professional lives, be faced with quantitative results (their own or those of others) which will need to be critically evaluated and interpreted, and some, of course, will have to pass that dreaded statistics exam! A proper understanding of statistical concepts and methodology is invaluable for these needs. Much as we should like to fire the reader with an enthusiasm for the subject of statistics, we are pragmatic. Our aim in this new edition, as it was in the earlier editions, is to provide the student and the researcher, as well as the clinician encountering statistical concepts in the medical literature, with a book which is sound, easy to read, comprehensive, relevant, and of useful practical application.
We believe Medical Statistics at a Glance will be particularly helpful as an adjunct to statistics lectures and as a reference guide. The structure of this third edition is the same as that of the first two editions. In line with other books in the At a Glance series, we lead the reader through a number of self-contained two-, three- or occasionally four-page chapters, each covering a different aspect of medical statistics. We have learned from our own teaching experiences and have taken account of the difficulties that our students have encountered when studying medical statistics. For this reason, we have chosen to limit the theoretical content of the book to a level that is sufficient for understanding the procedures involved, yet which does not overshadow the practicalities of their execution.
Medical statistics is a wide-ranging subject covering a large number of topics. We have provided a basic introduction to the underlying concepts of medical statistics and a guide to the most commonly used statistical procedures. Epidemiology is closely allied to medical statistics. Hence some of the main issues in epidemiology, relating to study design and interpretation, are discussed. Also included are chapters which the reader may find useful only occasionally, but which are, nevertheless, fundamental to many areas of medical research; for example, evidence-based medicine, systematic reviews and meta-analysis, survival analysis, Bayesian methods and the development of prognostic scores. We have explained the principles underlying these topics so that the reader will be able to understand and interpret the results from them when they are presented in the literature.
The chapter titles of this third edition are identical to those of the second edition, apart from Chapter 34 (now called ‘Bias and confounding’ instead of ‘Issues in statistical modelling’); in addition, we have added a new chapter (Chapter 46 – ‘Developing prognostic scores’). Some of the first 45 chapters remain unaltered in this new edition and some have relatively minor changes which accommodate recent advances, cross-referencing or re-organization of the new material. We have expanded many chapters; for example, we have included a section on multiple comparisons (Chapter 12), provided more information on different study designs, including multicentre studies (Chapter 12) and sequential trials (Chapter 14), emphasized the importance of study management (Chapters 15 and 16), devoted greater space to receiver operating characteristic (ROC) curves (Chapters 30, 38 and 46), supplied more details of how to check the assumptions underlying a logistic regression analysis (Chapter 30) and explored further some of the different methods to remove confounding in observational studies (Chapter 34). We have also reorganized some of the material. The brief introduction to bias in Chapter 12 in the second edition has been omitted from that chapter in the third edition and moved to Chapter 34, which covers this topic in greater depth. A discussion of ‘interaction’ is currently in Chapter 33 and the section on prognostic indices is now much expanded and contained in the new Chapter 46.
New to this third edition is a set of learning objectives for each chapter, all of which are displayed together at the beginning of the book. Each set provides a framework for evaluating understanding and progress. If you are able to complete all the bulleted tasks in a chapter satisfactorily, you will have mastered the concepts in that chapter.
As in previous editions, the description of most of the statistical techniques is accompanied by an example illustrating its use. We have generally obtained the data for these examples from collaborative studies in which we or colleagues have been involved; in some instances, we have used real data from published papers. Where possible, we have used the same data set in more than one chapter to reflect the reality of data analysis, which is rarely restricted to a single technique or approach. Although we believe that formulae should be provided and the logic of the approach explained as an aid to understanding, we have avoided showing the details of complex calculations – most readers will have access to computers and are unlikely to perform any but the simplest calculations by hand.
We consider that it is particularly important for the reader to be able to interpret output from a computer package. We have therefore chosen, where applicable, to show results using extracts from computer output. In some instances, where we believe individuals may have difficulty with its interpretation, we have included (Appendix C) and annotated the complete computer output from an analysis of a data set. There are many statistical packages in common use; to give the reader an indication of how output can vary, we have not restricted the output to a particular package and have, instead, used three well-known ones – SAS, SPSS and Stata.
There is extensive cross-referencing throughout the text to help the reader link the various procedures. A basic set of statistical tables is contained in Appendix A. Neave, H.R. (1995) Elementary Statistical Tables, Routledge: London, and Diem, K. (1970) Documenta Geigy Scientific Tables, 7th edition, Blackwell Publishing: Oxford, amongst others, provide fuller versions if the reader requires more precise results for hand calculations. The glossary of terms in Appendix D provides readily accessible explanations of commonly used terminology.
We know that one of the greatest difficulties facing non-statisticians is choosing the appropriate technique. We have therefore produced two flow charts which can be used both to aid the decision as to what method to use in a given situation and to locate a particular technique in the book easily. These flow charts are displayed prominently on the inside back cover for easy access.
The reader may find it helpful to assess his/her progress in self-directed learning by attempting the interactive exercises on our website (www.medstatsaag.com). This website also contains a full set of references (some of which are linked directly to Medline) to supplement the references quoted in the text and provide useful background information for the examples. For those readers who wish to gain a greater insight into particular areas of medical statistics, we can recommend the following books:
Altman, D.G. (1991) Practical Statistics for Medical Research. London: Chapman and Hall/CRC.
Armitage, P., Berry, G. and Matthews, J.F.N. (2001) Statistical Methods in Medical Research. 4th edition. Oxford: Blackwell Science.
Kirkwood, B.R. and Sterne, J.A.C. (2003) Essential Medical Statistics. 2nd edition. Oxford: Blackwell Publishing.
Pocock, S.J. (1983) Clinical Trials: A Practical Approach. Chichester: Wiley.
We are extremely grateful to Mark Gilthorpe and Jonathan Sterne who made invaluable comments and suggestions on aspects of the second edition, and to Richard Morris, Fiona Lampe, Shak Hajat and Abul Basar for their counsel on the first edition. We wish to thank everyone who has helped us by providing data for the examples. Naturally, we take full responsibility for any errors that remain in the text or examples. We should also like to thank Mike, Gerald, Nina, Andrew and Karen who tolerated, with equanimity, our preoccupation with the first two editions and lived with us through the trials and tribulations of this third edition.
Aviva Petrie
Caroline Sabin
London
Also available to buy now!
Medical Statistics at a Glance Workbook
A brand new comprehensive workbook containing a variety of examples and exercises, complete with model answers, designed to support your learning and revision.
Fully cross-referenced to Medical Statistics at a Glance, this new workbook includes:
Over 80 MCQs, each testing knowledge of a single statistical concept or aspect of study interpretation
29 structured questions to explore in greater depth several statistical techniques or principles
Templates for the appraisal of clinical trials and observational studies, plus full appraisals of two published papers to demonstrate the use of these templates in practice
Detailed step-by-step analyses of two substantial data sets (also available at
) to demonstrate the application of statistical procedures to real-life research
Medical Statistics at a Glance Workbook is the ideal resource to improve statistical knowledge together with your analytical and interpretational skills.
By the end of the relevant chapter you should be able to:
Types of Data
The purpose of most studies is to collect data to obtain information about a particular area of research. Our data comprise observations on one or more variables; any quantity that varies is termed a variable. For example, we may collect basic clinical and demographic information on patients with a particular illness. The variables of interest may include the sex, age and height of the patients.
Our data are usually obtained from a sample of individuals which represents the population of interest. Our aim is to condense these data in a meaningful way and extract useful information from them. Statistics encompasses the methods of collecting, summarizing, analysing and drawing conclusions from the data: we use statistical techniques to achieve our aim.
Data may take many different forms. We need to know what form every variable takes before we can make a decision regarding the most appropriate statistical methods to use. Each variable and the resulting data will be one of two types: categorical or numerical (Fig. 1.1).
Figure 1.1 Diagram showing the different types of variable.
Categorical (qualitative) data occur when each individual can only belong to one of a number of distinct categories of the variable.
Nominal data – the categories are not ordered but simply have names. Examples include blood group (A, B, AB and O) and marital status (married/widowed/single, etc.). In this case, there is no reason to suspect that being married is any better (or worse) than being single!
Ordinal data – the categories are ordered in some way. Examples include disease staging systems (advanced, moderate, mild, none) and degree of pain (severe, moderate, mild, none).
A categorical variable is binary or dichotomous when there are only two possible categories. Examples include ‘Yes/No’, ‘Dead/Alive’ or ‘Patient has disease/Patient does not have disease’.
Numerical (quantitative) data occur when the variable takes some numerical value. We can subdivide numerical data into two types.
Discrete data – occur when the variable can only take certain whole numerical values. These are often counts of numbers of events, such as the number of visits to a GP in a particular year or the number of episodes of illness in an individual over the last five years.
Continuous data – occur when there is no limitation on the values that the variable can take, e.g. weight or height, other than that which restricts us when we make the measurement.
We often use very different statistical methods depending on whether the data are categorical or numerical. Although the distinction between categorical and numerical data is usually clear, in some situations it may become blurred. For example, when we have a variable with a large number of ordered categories (e.g. a pain scale with seven categories), it may be difficult to distinguish it from a discrete numerical variable. The distinction between discrete and continuous numerical data may be even less clear, although in general this will have little impact on the results of most analyses. Age is an example of a variable that is often treated as discrete even though it is truly continuous. We usually refer to ‘age at last birthday’ rather than ‘age’, and therefore, a woman who reports being 30 may have just had her 30th birthday, or may be just about to have her 31st birthday.
Do not be tempted to record numerical data as categorical at the outset (e.g. by recording only the range within which each patient’s age falls rather than his/her actual age) as important information is often lost. It is simple to convert numerical data to categorical data once they have been collected.
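The conversion from numerical to categorical data mentioned above can be sketched in a few lines of Python. This is only an illustration: the band boundaries and labels below are hypothetical choices, not ones recommended in the text, and the function name `age_band` is our own.

```python
from bisect import bisect_right

# Hypothetical lower bounds of the age bands, in years (an assumption for illustration).
AGE_BAND_EDGES = [0, 18, 40, 65]
AGE_BAND_LABELS = ["0-17", "18-39", "40-64", "65+"]


def age_band(age_years):
    """Return the label of the band that contains age_years."""
    if age_years < 0:
        raise ValueError("age cannot be negative")
    # bisect_right counts how many lower bounds are <= age_years,
    # so subtracting 1 gives the index of the band.
    return AGE_BAND_LABELS[bisect_right(AGE_BAND_EDGES, age_years) - 1]


# The actual ages are kept; the bands are derived from them afterwards.
ages = [3, 17, 18, 30, 64, 65, 91]
bands = [age_band(a) for a in ages]
```

Because the original ages are stored, the bands can be redefined later without returning to the patients; the reverse is not possible if only the bands had been recorded.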
We may encounter a number of other types of data in the medical field. These include:
Percentages – These may arise when considering improvements in patients following treatment, e.g. a patient’s lung function (forced expiratory volume in 1 second, FEV1) may increase by 24% following treatment with a new drug. In this case, it is the level of improvement, rather than the absolute value, which is of interest.
Ratios or quotients – Occasionally you may encounter the ratio or quotient of two variables. For example, body mass index (BMI), calculated as an individual’s weight (kg) divided by her/his height squared (m²), is often used to assess whether s/he is over- or underweight.
Rates – Disease rates, in which the number of disease events occurring among individuals in a study is divided by the total number of years of follow-up of all individuals in that study (Chapter 31), are common in epidemiological studies (Chapter 12).
Scores – We sometimes use an arbitrary value, such as a score, when we cannot measure a quantity. For example, a series of responses to questions on quality of life may be summed to give some overall quality of life score on each individual.
All these variables can be treated as numerical variables for most analyses. Where the variable is derived using more than one value (e.g. the numerator and denominator of a percentage), it is important to record all of the values used. For example, a 10% improvement in a marker following treatment may have different clinical relevance depending on the level of the marker before treatment.
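As a rough illustration of the derived variables described above, the following Python sketch computes a BMI, a percentage change and a disease rate. The function names and the numbers are hypothetical; the only point taken from the text is that the component values are stored alongside the derived one.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by height squared (m^2)."""
    return weight_kg / height_m ** 2


def percentage_change(before, after):
    """Change from before to after, expressed as a percentage of before."""
    return 100.0 * (after - before) / before


def rate_per_person_year(n_events, person_years):
    """Number of events divided by the total years of follow-up."""
    return n_events / person_years


# Hypothetical patient record: both FEV1 measurements are kept, and the
# percentage is derived from them rather than recorded on its own.
record = {"fev1_before": 2.5, "fev1_after": 3.1}
record["fev1_pct_change"] = percentage_change(
    record["fev1_before"], record["fev1_after"]
)
```

With the numerator and denominator stored, a reader of the data can see that this 24% improvement started from a baseline of 2.5 litres, which a bare percentage would hide.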
We may come across censored data in situations illustrated by the following examples.
If we measure laboratory values using a tool that can only detect levels above a certain cut-off value, then any values below this cut-off will not be detected, i.e. they are censored. For example, when measuring virus levels, those below the limit of detectability will often be reported as ‘undetectable’ or ‘unquantifiable’ even though there may be some virus in the sample. In this situation, if the lower cut-off of a tool is x, say, the results may be reported as ‘<x’. Similarly, some tools may only be able to reliably quantify levels below a certain cut-off value, say y; any measurements above that value will also be censored and the test result may be reported as ‘>y’.
We may encounter censored data when following patients in a trial in which, for example, some patients withdraw from the trial before the trial has ended. This type of data is discussed in more detail in Chapter 44.
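For laboratory results that are reported with a ‘<’ or ‘>’ prefix, one possible way of storing them is to separate the cut-off value from a flag that records the censoring. The Python sketch below assumes results arrive as text strings; the function name, the flag labels and the example values are our own illustrative choices.

```python
def parse_lab_value(text):
    """Split a reported result into (value, censoring).

    censoring is 'left' for a result below the lower cut-off ('<...'),
    'right' for a result above the upper cut-off ('>...'),
    and 'none' for a fully observed value.
    """
    text = text.strip()
    if text.startswith("<"):
        return float(text[1:]), "left"
    if text.startswith(">"):
        return float(text[1:]), "right"
    return float(text), "none"


# Hypothetical reported results from a single assay.
reported = ["<50", "1200", ">100000", "75"]
parsed = [parse_lab_value(r) for r in reported]
```

Keeping the flag separate from the number avoids the censored value being mistaken for an exact measurement in later analyses.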
Data Entry
When you carry out any study you will almost always need to enter the data into a computer package. Computers are invaluable for improving the accuracy and speed of data collection and analysis, making it easy to check for errors, produce graphical summaries of the data and generate new variables. It is worth spending some time planning data entry – this may save considerable effort at later stages.
There are a number of ways in which data can be entered and stored on a computer. Most statistical packages allow you to enter data directly. However, the limitation of this approach is that often you cannot move the data to another package. A simple alternative is to store the data in either a spreadsheet or database package. Unfortunately, their statistical procedures are often limited, and it will usually be necessary to output the data into a specialist statistical package to carry out analyses.
A more flexible approach is to have your data available as an ASCII or text file. Once in an ASCII format, the data can be read by most packages. ASCII format simply consists of rows of text that you can view on a computer screen. Usually, each variable in the file is separated from the next by some delimiter, often a space or a comma. This is known as free format.
The simplest way of entering data in ASCII format is to type the data directly in this format using either a word processing or editing package. Alternatively, data stored in spreadsheet packages can be saved in ASCII format. Using either approach, it is customary for each row of data to correspond to a different individual in the study, and each column to correspond to a different variable, although it may be necessary to go on to subsequent rows if data from a large number of variables are collected on each individual.
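To make the free-format layout concrete, here is a minimal Python sketch that reads comma-delimited text with one row per individual and one column per variable. The variable names and values are invented, and the header row naming the variables is an assumption of this example rather than something the text requires.

```python
import csv
import io

# Hypothetical free-format data: one row per individual, one column per variable.
ASCII_DATA = """id,sex,age,height_cm
1,F,34,162.5
2,M,51,178.0
3,F,29,170.2
"""


def read_free_format(text, delimiter=","):
    """Return one dictionary per individual, keyed by variable name."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


rows = read_free_format(ASCII_DATA)
```

Every value is read as text, so numerical variables such as age still need to be converted before analysis; a space-delimited file could be read by passing `delimiter=" "`.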
When collecting data in a study you will often need to use a form or questionnaire for recording the data. If these forms are designed carefully, they can reduce the amount of work that has to be done when entering the data. Generally, these forms/questionnaires include a series of boxes in which the data are recorded – it is usual to have a separate box for each possible digit of the response.
Some statistical packages have problems dealing with non-numerical data. Therefore, you may need to assign numerical codes to categorical data before entering the data into the computer. For example, you may choose to assign the codes of 1, 2, 3 and 4 to categories of ‘no pain’, ‘mild pain’, ‘moderate pain’ and ‘severe pain’, respectively. These codes can be added to the forms when collecting the data. For binary data, e.g. yes/no answers, it is often convenient to assign the codes 1 (e.g. for ‘yes’) and 0 (for ‘no’).
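The coding scheme just described can be written down as a small codebook. The Python sketch below uses the pain and yes/no codes from the text; the `encode` helper and its handling of upper and lower case are our own assumptions.

```python
# Codes taken from the example in the text.
PAIN_CODES = {"no pain": 1, "mild pain": 2, "moderate pain": 3, "severe pain": 4}
YES_NO_CODES = {"yes": 1, "no": 0}


def encode(response, codebook):
    """Translate a recorded response into its numerical code."""
    key = response.strip().lower()
    if key not in codebook:
        # Refuse unknown responses rather than guessing a code.
        raise ValueError(f"no code defined for response: {response!r}")
    return codebook[key]


pain_code = encode("Moderate pain", PAIN_CODES)
answer_code = encode("yes", YES_NO_CODES)
```

Rejecting responses that are not in the codebook is a deliberate choice here: it brings typing errors to light at data entry rather than during analysis.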
Single-coded variables – there is only one possible answer to a question, e.g. ‘is the patient dead?’. It is not possible to answer both ‘yes’ and ‘no’ to this question.
Multi-coded variables – more than one answer is possible for each respondent. For example, ‘what symptoms has this patient experienced?’. In this case, an individual may have experienced any of a number of symptoms. There are two ways to deal with this type of data depending upon which of the two following situations applies.
There are only a few possible symptoms, and individuals may have experienced many of them.
A number of different binary variables can be created which correspond to whether the patient has answered yes or no to the presence of each possible symptom. For example, ‘did the patient have a cough?’, ‘did the patient have a sore throat?’
There are a very large number of possible symptoms but each patient is expected to suffer from only a few of them.
A number of different nominal variables can be created; each successive variable allows you to name a symptom suffered by the patient. For example, ‘what was the first symptom the patient suffered?’, ‘what was the second symptom?’. You will need to decide in advance the maximum number of symptoms you think a patient is likely to have suffered.
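The two approaches above might look as follows in Python. The symptom list, the variable names and the maximum of three symptoms are all hypothetical choices made for this sketch.

```python
# Hypothetical short list of possible symptoms (first situation).
SYMPTOMS = ["cough", "sore throat", "fever"]


def to_binary_variables(reported, symptoms=SYMPTOMS):
    """One binary variable per possible symptom: 1 = present, 0 = absent."""
    reported_set = {s.strip().lower() for s in reported}
    return {symptom: int(symptom in reported_set) for symptom in symptoms}


def to_nominal_variables(reported, max_symptoms=3):
    """One nominal variable per reported symptom, up to a maximum fixed in advance."""
    if len(reported) > max_symptoms:
        raise ValueError("more symptoms reported than variables allowed for")
    # Unused variables are left empty (None).
    padded = list(reported) + [None] * (max_symptoms - len(reported))
    return {f"symptom_{i + 1}": name for i, name in enumerate(padded)}


binary_row = to_binary_variables(["Cough", "fever"])
nominal_row = to_nominal_variables(["rash", "dizziness"])
```

The binary layout suits a short, fixed list of symptoms, while the nominal layout keeps the number of columns small when the list of possible symptoms is very long.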
Numerical data should be entered with the same precision as they are measured, and the unit of measurement should be consistent for all observations on a variable. For example, weight should be recorded in kilograms or in pounds, but not both interchangeably.
Sometimes, information is collected on the same patient on more than one occasion. It is important that there is some unique identifier (e.g. a serial number) relating to the individual that will enable you to link all of the data from an individual in the study.
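A unique identifier makes it straightforward to bring together all the forms belonging to one patient. The following Python sketch assumes each form is stored as a dictionary with a `patient_id` entry; that layout, and the example values, are illustrative assumptions only.

```python
from collections import defaultdict


def link_forms(forms):
    """Group forms by the patient's unique identifier."""
    by_patient = defaultdict(list)
    for form in forms:
        by_patient[form["patient_id"]].append(form)
    return dict(by_patient)


# Hypothetical forms collected at different visits.
forms = [
    {"patient_id": 101, "visit": 1, "weight_kg": 62.0},
    {"patient_id": 102, "visit": 1, "weight_kg": 80.5},
    {"patient_id": 101, "visit": 2, "weight_kg": 61.2},
]
linked = link_forms(forms)
```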
Dates and times should be entered in a consistent manner, e.g. either as day/month/year or month/day/year, but not interchangeably. It is important to find out what format the statistical package can read.
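The need for a single, agreed date format can be illustrated with Python's standard `datetime` module. In this sketch the default of day/month/year is simply an assumption; what matters is that one format is chosen and stated explicitly.

```python
from datetime import datetime


def parse_date(text, fmt="%d/%m/%Y"):
    """Read a date written in one agreed format (day/month/year by default)."""
    return datetime.strptime(text, fmt).date()


# The same text gives two different dates under the two conventions.
as_day_month = parse_date("03/04/2009")               # 3 April 2009
as_month_day = parse_date("03/04/2009", "%m/%d/%Y")   # 4 March 2009
```

An impossible date such as `31/02/2009` raises a `ValueError`, which also provides a simple check on entries typed in the wrong order.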
You should consider what you will do with missing values before you enter the data. In most cases you will need to use some symbol to represent a missing value. Statistical packages deal with missing values in different ways. Some use special characters (e.g. a full stop or asterisk) to indicate missing values, whereas others require you to define your own code for a missing value (commonly used values are 9, 999 or −99). The value that is chosen should be one that is not possible for that variable. For example, when entering a categorical variable with four categories (coded 1, 2, 3 and 4), you may choose the value 9 to represent missing values. However, if the variable is ‘age of child’ then a different code should be chosen. Missing data are discussed in more detail in Chapter 3.
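As a sketch of the point about choosing an impossible value, the Python fragment below keeps a separate missing-value code for each variable and converts it to `None` when the data are read. The variable names and the specific codes are hypothetical.

```python
# Hypothetical missing-value codes, one per variable. The value 9 is safe for
# a four-category variable, but it is a perfectly possible age for a child.
MISSING_CODES = {"pain": 9, "age_of_child": -99}


def decode_missing(value, variable):
    """Return None if value is the missing-value code for this variable."""
    return None if value == MISSING_CODES[variable] else value


pain = decode_missing(9, "pain")                # treated as missing
age = decode_missing(9, "age_of_child")         # a genuine age of 9 years
unknown_age = decode_missing(-99, "age_of_child")
```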
Figure 2.1 Portion of a spreadsheet showing data collected on a sample of 64 women with inherited bleeding disorders.