Data analysis and graphics using R : an example-based approach
Author / Creator: Maindonald, J. H. (John Hilary), 1937-
Imprint: Cambridge, UK ; New York : Cambridge University Press, 2003.
Description: xxiii, 362 p. : ill. ; 26 cm.
Language: English
Series: Cambridge series in statistical and probabilistic mathematics
Format: Print Book
URL for this record: http://pi.lib.uchicago.edu/1001/cat/bib/4915270
Table of Contents:
- Preface
- A Chapter by Chapter Summary
- 1. A Brief Introduction to R
- 1.1. A Short R Session
- 1.1.1. R must be installed!
- 1.1.2. Using the console (or command line) window
- 1.1.3. Reading data from a file
- 1.1.4. Entry of data at the command line
- 1.1.5. Online help
- 1.1.6. Quitting R
- 1.2. The Uses of R
- 1.3. The R Language
- 1.3.1. R objects
- 1.3.2. Retaining objects between sessions
- 1.4. Vectors in R
- 1.4.1. Concatenation--joining vector objects
- 1.4.2. Subsets of vectors
- 1.4.3. Patterned data
- 1.4.4. Missing values
- 1.4.5. Factors
- 1.5. Data Frames
- 1.5.1. Variable names
- 1.5.2. Applying a function to the columns of a data frame
- 1.5.3. Data frames and matrices
- 1.5.4. Identification of rows that include missing values
- 1.6. R Packages
- 1.6.1. Data sets that accompany R packages
- 1.7. Looping
- 1.8. R Graphics
- 1.8.1. The function plot() and allied functions
- 1.8.2. Identification and location on the figure region
- 1.8.3. Plotting mathematical symbols
- 1.8.4. Row by column layouts of plots
- 1.8.5. Graphs--additional notes
- 1.9. Additional Points on the Use of R in This Book
- 1.10. Further Reading
- 1.11. Exercises
- 2. Styles of Data Analysis
- 2.1. Revealing Views of the Data
- 2.1.1. Views of a single sample
- 2.1.2. Patterns in grouped data
- 2.1.3. Patterns in bivariate data--the scatterplot
- 2.1.4. Multiple variables and times
- 2.1.5. Lattice (trellis style) graphics
- 2.1.6. What to look for in plots
- 2.2. Data Summary
- 2.2.1. Mean and median
- 2.2.2. Standard deviation and inter-quartile range
- 2.2.3. Correlation
- 2.3. Statistical Analysis Strategies
- 2.3.1. Helpful and unhelpful questions
- 2.3.2. Planning the formal analysis
- 2.3.3. Changes to the intended plan of analysis
- 2.4. Recap
- 2.5. Further Reading
- 2.6. Exercises
- 3. Statistical Models
- 3.1. Regularities
- 3.1.1. Mathematical models
- 3.1.2. Models that include a random component
- 3.1.3. Smooth and rough
- 3.1.4. The construction and use of models
- 3.1.5. Model formulae
- 3.2. Distributions: Models for the Random Component
- 3.2.1. Discrete distributions
- 3.2.2. Continuous distributions
- 3.3. The Uses of Random Numbers
- 3.3.1. Simulation
- 3.3.2. Sampling from populations
- 3.4. Model Assumptions
- 3.4.1. Random sampling assumptions--independence
- 3.4.2. Checks for normality
- 3.4.3. Checking other model assumptions
- 3.4.4. Are non-parametric methods the answer?
- 3.4.5. Why models matter--adding across contingency tables
- 3.5. Recap
- 3.6. Further Reading
- 3.7. Exercises
- 4. An Introduction to Formal Inference
- 4.1. Standard Errors
- 4.1.1. Population parameters and sample statistics
- 4.1.2. Assessing accuracy--the standard error
- 4.1.3. Standard errors for differences of means
- 4.1.4. The standard error of the median
- 4.1.5. Resampling to estimate standard errors: bootstrapping
- 4.2. Calculations Involving Standard Errors: the t-Distribution
- 4.3. Confidence Intervals and Hypothesis Tests
- 4.3.1. One- and two-sample intervals and tests for means
- 4.3.2. Confidence intervals and tests for proportions
- 4.3.3. Confidence intervals for the correlation
- 4.4. Contingency Tables
- 4.4.1. Rare and endangered plant species
- 4.4.2. Additional notes
- 4.5. One-Way Unstructured Comparisons
- 4.5.1. Displaying means for the one-way layout
- 4.5.2. Multiple comparisons
- 4.5.3. Data with a two-way structure
- 4.5.4. Presentation issues
- 4.6. Response Curves
- 4.7. Data with a Nested Variation Structure
- 4.7.1. Degrees of freedom considerations
- 4.7.2. General multi-way analysis of variance designs
- 4.8. Resampling Methods for Tests and Confidence Intervals
- 4.8.1. The one-sample permutation test
- 4.8.2. The two-sample permutation test
- 4.8.3. Bootstrap estimates of confidence intervals
- 4.9. Further Comments on Formal Inference
- 4.9.1. Confidence intervals versus hypothesis tests
- 4.9.2. If there is strong prior information, use it!
- 4.10. Recap
- 4.11. Further Reading
- 4.12. Exercises
- 5. Regression with a Single Predictor
- 5.1. Fitting a Line to Data
- 5.1.1. Lawn roller example
- 5.1.2. Calculating fitted values and residuals
- 5.1.3. Residual plots
- 5.1.4. The analysis of variance table
- 5.2. Outliers, Influence and Robust Regression
- 5.3. Standard Errors and Confidence Intervals
- 5.3.1. Confidence intervals and tests for the slope
- 5.3.2. SEs and confidence intervals for predicted values
- 5.3.3. Implications for design
- 5.4. Regression versus Qualitative ANOVA Comparisons
- 5.5. Assessing Predictive Accuracy
- 5.5.1. Training/test sets, and cross-validation
- 5.5.2. Cross-validation--an example
- 5.5.3. Bootstrapping
- 5.6. A Note on Power Transformations
- 5.7. Size and Shape Data
- 5.7.1. Allometric growth
- 5.7.2. There are two regression lines!
- 5.8. The Model Matrix in Regression
- 5.9. Recap
- 5.10. Methodological References
- 5.11. Exercises
- 6. Multiple Linear Regression
- 6.1. Basic Ideas: Book Weight and Brain Weight Examples
- 6.1.1. Omission of the intercept term
- 6.1.2. Diagnostic plots
- 6.1.3. Further investigation of influential points
- 6.1.4. Example: brain weight
- 6.2. Multiple Regression Assumptions and Diagnostics
- 6.2.1. Influential outliers and Cook's distance
- 6.2.2. Component plus residual plots
- 6.2.3. Further types of diagnostic plot
- 6.2.4. Robust and resistant methods
- 6.3. A Strategy for Fitting Multiple Regression Models
- 6.3.1. Preliminaries
- 6.3.2. Model fitting
- 6.3.3. An example--the Scottish hill race data
- 6.4. Measures for the Comparison of Regression Models
- 6.4.1. R² and adjusted R²
- 6.4.2. AIC and related statistics
- 6.4.3. How accurately does the equation predict?
- 6.4.4. An external assessment of predictive accuracy
- 6.5. Interpreting Regression Coefficients--the Labor Training Data
- 6.6. Problems with Many Explanatory Variables
- 6.6.1. Variable selection issues
- 6.6.2. Principal components summaries
- 6.7. Multicollinearity
- 6.7.1. A contrived example
- 6.7.2. The variance inflation factor (VIF)
- 6.7.3. Remedying multicollinearity
- 6.8. Multiple Regression Models--Additional Points
- 6.8.1. Confusion between explanatory and dependent variables
- 6.8.2. Missing explanatory variables
- 6.8.3. The use of transformations
- 6.8.4. Non-linear methods--an alternative to transformation?
- 6.9. Further Reading
- 6.10. Exercises
- 7. Exploiting the Linear Model Framework
- 7.1. Levels of a Factor--Using Indicator Variables
- 7.1.1. Example--sugar weight
- 7.1.2. Different choices for the model matrix when there are factors
- 7.2. Polynomial Regression
- 7.2.1. Issues in the choice of model
- 7.3. Fitting Multiple Lines
- 7.4. Methods for Passing Smooth Curves through Data
- 7.4.1. Scatterplot smoothing--regression splines
- 7.4.2. Other smoothing methods
- 7.4.3. Generalized additive models
- 7.5. Smoothing Terms in Multiple Linear Models
- 7.6. Further Reading
- 7.7. Exercises
- 8. Logistic Regression and Other Generalized Linear Models
- 8.1. Generalized Linear Models
- 8.1.1. Transformation of the expected value on the left
- 8.1.2. Noise terms need not be normal
- 8.1.3. Log odds in contingency tables
- 8.1.4. Logistic regression with a continuous explanatory variable
- 8.2. Logistic Multiple Regression
- 8.2.1. A plot of contributions of explanatory variables
- 8.2.2. Cross-validation estimates of predictive accuracy
- 8.3. Logistic Models for Categorical Data--an Example
- 8.4. Poisson and Quasi-Poisson Regression
- 8.4.1. Data on aberrant crypt foci
- 8.4.2. Moth habitat example
- 8.4.3. Residuals, and estimating the dispersion
- 8.5. Ordinal Regression Models
- 8.5.1. Exploratory analysis
- 8.5.2. Proportional odds logistic regression
- 8.6. Other Related Models
- 8.6.1. Loglinear models
- 8.6.2. Survival analysis
- 8.7. Transformations for Count Data
- 8.8. Further Reading
- 8.9. Exercises
- 9. Multi-level Models, Time Series and Repeated Measures
- 9.1. Introduction
- 9.2. Example--Survey Data, with Clustering
- 9.2.1. Alternative models
- 9.2.2. Instructive, though faulty, analyses
- 9.2.3. Predictive accuracy
- 9.3. A Multi-level Experimental Design
- 9.3.1. The ANOVA table
- 9.3.2. Expected values of mean squares
- 9.3.3. The sums of squares breakdown
- 9.3.4. The variance components
- 9.3.5. The mixed model analysis
- 9.3.6. Predictive accuracy
- 9.3.7. Different sources of variance--complication or focus of interest?
- 9.4. Within and between Subject Effects--an Example
- 9.5. Time Series--Some Basic Ideas
- 9.5.1. Preliminary graphical explorations
- 9.5.2. The autocorrelation function
- 9.5.3. Autoregressive (AR) models
- 9.5.4. Autoregressive moving average (ARMA) models--theory
- 9.6. Regression Modeling with Moving Average Errors--an Example
- 9.7. Repeated Measures in Time--Notes on the Methodology
- 9.7.1. The theory of repeated measures modeling
- 9.7.2. Correlation structure
- 9.7.3. Different approaches to repeated measures analysis
- 9.8. Further Notes on Multi-level Modeling
- 9.8.1. An historical perspective on multi-level models
- 9.8.2. Meta-analysis
- 9.9. Further Reading
- 9.10. Exercises
- 10. Tree-based Classification and Regression
- 10.1. The Uses of Tree-based Methods
- 10.1.1. Problems for which tree-based regression may be used
- 10.1.2. Tree-based regression versus parametric approaches
- 10.1.3. Summary of pluses and minuses
- 10.2. Detecting Email Spam--an Example
- 10.2.1. Choosing the number of splits
- 10.3. Terminology and Methodology
- 10.3.1. Choosing the split--regression trees
- 10.3.2. Within and between sums of squares
- 10.3.3. Choosing the split--classification trees
- 10.3.4. The mechanics of tree-based regression--a trivial example
- 10.4. Assessments of Predictive Accuracy
- 10.4.1. Cross-validation
- 10.4.2. The training/test set methodology
- 10.4.3. Predicting the future
- 10.5. A Strategy for Choosing the Optimal Tree
- 10.5.1. Cost-complexity pruning
- 10.5.2. Prediction error versus tree size
- 10.6. Detecting Email Spam--the Optimal Tree
- 10.6.1. The one-standard-deviation rule
- 10.7. Interpretation and Presentation of the rpart Output
- 10.7.1. Data for female heart attack patients
- 10.7.2. Printed information on each split
- 10.8. Additional Notes
- 10.9. Further Reading
- 10.10. Exercises
- 11. Multivariate Data Exploration and Discrimination
- 11.1. Multivariate Exploratory Data Analysis
- 11.1.1. Scatterplot matrices
- 11.1.2. Principal components analysis
- 11.2. Discriminant Analysis
- 11.2.1. Example--plant architecture
- 11.2.2. Classical Fisherian discriminant analysis
- 11.2.3. Logistic discriminant analysis
- 11.2.4. An example with more than two groups
- 11.3. Principal Component Scores in Regression
- 11.4. Propensity Scores in Regression Comparisons--Labor Training Data
- 11.5. Further Reading
- 11.6. Exercises
- 12. The R System--Additional Topics
- 12.1. Graphs in R
- 12.2. Functions--Some Further Details
- 12.2.1. Common useful functions
- 12.2.2. User-written R functions
- 12.2.3. Functions for working with dates
- 12.3. Data Input and Output
- 12.3.1. Input
- 12.3.2. Data output
- 12.4. Factors--Additional Comments
- 12.5. Missing Values
- 12.6. Lists and Data Frames
- 12.6.1. Data frames as lists
- 12.6.2. Reshaping data frames--reshape()
- 12.6.3. Joining data frames and vectors--cbind()
- 12.6.4. Conversion of tables and arrays into data frames
- 12.6.5. Merging data frames--merge()
- 12.6.6. The function sapply() and related functions
- 12.6.7. Splitting vectors and data frames into lists--split()
- 12.7. Matrices and Arrays
- 12.7.1. Outer products
- 12.7.2. Arrays
- 12.8. Classes and Methods
- 12.8.1. Printing and summarizing model objects
- 12.8.2. Extracting information from model objects
- 12.9. Data-bases and Environments
- 12.9.1. Workspace management
- 12.9.2. Function environments, and lazy evaluation
- 12.10. Manipulation of Language Constructs
- 12.11. Further Reading
- 12.12. Exercises
- Epilogue--Models
- Appendix. S-PLUS Differences
- References
- Index of R Symbols and Functions
- Index of Terms
- Index of Names