Data analysis and graphics using R : an example-based approach

Bibliographic Details
Author / Creator: Maindonald, J. H. (John Hilary), 1937-
Imprint: Cambridge, UK ; New York : Cambridge University Press, 2003.
Description: xxiii, 362 p. : ill. ; 26 cm.
Language: English
Series: Cambridge series in statistical and probabilistic mathematics
Format: Print Book
URL for this record: http://pi.lib.uchicago.edu/1001/cat/bib/4915270
Other authors / contributors: Braun, John, 1963-
ISBN: 0521813360
Notes: Includes bibliographical references (p. [346]-351) and indexes.
Table of Contents:
  • Preface
  • A Chapter by Chapter Summary
  • 1. A Brief Introduction to R
  • 1.1. A Short R Session
  • 1.1.1. R must be installed!
  • 1.1.2. Using the console (or command line) window
  • 1.1.3. Reading data from a file
  • 1.1.4. Entry of data at the command line
  • 1.1.5. Online help
  • 1.1.6. Quitting R
  • 1.2. The Uses of R
  • 1.3. The R Language
  • 1.3.1. R objects
  • 1.3.2. Retaining objects between sessions
  • 1.4. Vectors in R
  • 1.4.1. Concatenation--joining vector objects
  • 1.4.2. Subsets of vectors
  • 1.4.3. Patterned data
  • 1.4.4. Missing values
  • 1.4.5. Factors
  • 1.5. Data Frames
  • 1.5.1. Variable names
  • 1.5.2. Applying a function to the columns of a data frame
  • 1.5.3. Data frames and matrices
  • 1.5.4. Identification of rows that include missing values
  • 1.6. R Packages
  • 1.6.1. Data sets that accompany R packages
  • 1.7. Looping
  • 1.8. R Graphics
  • 1.8.1. The function plot() and allied functions
  • 1.8.2. Identification and location on the figure region
  • 1.8.3. Plotting mathematical symbols
  • 1.8.4. Row by column layouts of plots
  • 1.8.5. Graphs--additional notes
  • 1.9. Additional Points on the Use of R in This Book
  • 1.10. Further Reading
  • 1.11. Exercises
  • 2. Styles of Data Analysis
  • 2.1. Revealing Views of the Data
  • 2.1.1. Views of a single sample
  • 2.1.2. Patterns in grouped data
  • 2.1.3. Patterns in bivariate data--the scatterplot
  • 2.1.4. Multiple variables and times
  • 2.1.5. Lattice (trellis style) graphics
  • 2.1.6. What to look for in plots
  • 2.2. Data Summary
  • 2.2.1. Mean and median
  • 2.2.2. Standard deviation and inter-quartile range
  • 2.2.3. Correlation
  • 2.3. Statistical Analysis Strategies
  • 2.3.1. Helpful and unhelpful questions
  • 2.3.2. Planning the formal analysis
  • 2.3.3. Changes to the intended plan of analysis
  • 2.4. Recap
  • 2.5. Further Reading
  • 2.6. Exercises
  • 3. Statistical Models
  • 3.1. Regularities
  • 3.1.1. Mathematical models
  • 3.1.2. Models that include a random component
  • 3.1.3. Smooth and rough
  • 3.1.4. The construction and use of models
  • 3.1.5. Model formulae
  • 3.2. Distributions: Models for the Random Component
  • 3.2.1. Discrete distributions
  • 3.2.2. Continuous distributions
  • 3.3. The Uses of Random Numbers
  • 3.3.1. Simulation
  • 3.3.2. Sampling from populations
  • 3.4. Model Assumptions
  • 3.4.1. Random sampling assumptions--independence
  • 3.4.2. Checks for normality
  • 3.4.3. Checking other model assumptions
  • 3.4.4. Are non-parametric methods the answer?
  • 3.4.5. Why models matter--adding across contingency tables
  • 3.5. Recap
  • 3.6. Further Reading
  • 3.7. Exercises
  • 4. An Introduction to Formal Inference
  • 4.1. Standard Errors
  • 4.1.1. Population parameters and sample statistics
  • 4.1.2. Assessing accuracy--the standard error
  • 4.1.3. Standard errors for differences of means
  • 4.1.4. The standard error of the median
  • 4.1.5. Resampling to estimate standard errors: bootstrapping
  • 4.2. Calculations Involving Standard Errors: the t-Distribution
  • 4.3. Confidence Intervals and Hypothesis Tests
  • 4.3.1. One- and two-sample intervals and tests for means
  • 4.3.2. Confidence intervals and tests for proportions
  • 4.3.3. Confidence intervals for the correlation
  • 4.4. Contingency Tables
  • 4.4.1. Rare and endangered plant species
  • 4.4.2. Additional notes
  • 4.5. One-Way Unstructured Comparisons
  • 4.5.1. Displaying means for the one-way layout
  • 4.5.2. Multiple comparisons
  • 4.5.3. Data with a two-way structure
  • 4.5.4. Presentation issues
  • 4.6. Response Curves
  • 4.7. Data with a Nested Variation Structure
  • 4.7.1. Degrees of freedom considerations
  • 4.7.2. General multi-way analysis of variance designs
  • 4.8. Resampling Methods for Tests and Confidence Intervals
  • 4.8.1. The one-sample permutation test
  • 4.8.2. The two-sample permutation test
  • 4.8.3. Bootstrap estimates of confidence intervals
  • 4.9. Further Comments on Formal Inference
  • 4.9.1. Confidence intervals versus hypothesis tests
  • 4.9.2. If there is strong prior information, use it!
  • 4.10. Recap
  • 4.11. Further Reading
  • 4.12. Exercises
  • 5. Regression with a Single Predictor
  • 5.1. Fitting a Line to Data
  • 5.1.1. Lawn roller example
  • 5.1.2. Calculating fitted values and residuals
  • 5.1.3. Residual plots
  • 5.1.4. The analysis of variance table
  • 5.2. Outliers, Influence and Robust Regression
  • 5.3. Standard Errors and Confidence Intervals
  • 5.3.1. Confidence intervals and tests for the slope
  • 5.3.2. SEs and confidence intervals for predicted values
  • 5.3.3. Implications for design
  • 5.4. Regression versus Qualitative ANOVA Comparisons
  • 5.5. Assessing Predictive Accuracy
  • 5.5.1. Training/test sets, and cross-validation
  • 5.5.2. Cross-validation--an example
  • 5.5.3. Bootstrapping
  • 5.6. A Note on Power Transformations
  • 5.7. Size and Shape Data
  • 5.7.1. Allometric growth
  • 5.7.2. There are two regression lines!
  • 5.8. The Model Matrix in Regression
  • 5.9. Recap
  • 5.10. Methodological References
  • 5.11. Exercises
  • 6. Multiple Linear Regression
  • 6.1. Basic Ideas: Book Weight and Brain Weight Examples
  • 6.1.1. Omission of the intercept term
  • 6.1.2. Diagnostic plots
  • 6.1.3. Further investigation of influential points
  • 6.1.4. Example: brain weight
  • 6.2. Multiple Regression Assumptions and Diagnostics
  • 6.2.1. Influential outliers and Cook's distance
  • 6.2.2. Component plus residual plots
  • 6.2.3. Further types of diagnostic plot
  • 6.2.4. Robust and resistant methods
  • 6.3. A Strategy for Fitting Multiple Regression Models
  • 6.3.1. Preliminaries
  • 6.3.2. Model fitting
  • 6.3.3. An example--the Scottish hill race data
  • 6.4. Measures for the Comparison of Regression Models
  • 6.4.1. R² and adjusted R²
  • 6.4.2. AIC and related statistics
  • 6.4.3. How accurately does the equation predict?
  • 6.4.4. An external assessment of predictive accuracy
  • 6.5. Interpreting Regression Coefficients--the Labor Training Data
  • 6.6. Problems with Many Explanatory Variables
  • 6.6.1. Variable selection issues
  • 6.6.2. Principal components summaries
  • 6.7. Multicollinearity
  • 6.7.1. A contrived example
  • 6.7.2. The variance inflation factor (VIF)
  • 6.7.3. Remedying multicollinearity
  • 6.8. Multiple Regression Models--Additional Points
  • 6.8.1. Confusion between explanatory and dependent variables
  • 6.8.2. Missing explanatory variables
  • 6.8.3. The use of transformations
  • 6.8.4. Non-linear methods--an alternative to transformation?
  • 6.9. Further Reading
  • 6.10. Exercises
  • 7. Exploiting the Linear Model Framework
  • 7.1. Levels of a Factor--Using Indicator Variables
  • 7.1.1. Example--sugar weight
  • 7.1.2. Different choices for the model matrix when there are factors
  • 7.2. Polynomial Regression
  • 7.2.1. Issues in the choice of model
  • 7.3. Fitting Multiple Lines
  • 7.4. Methods for Passing Smooth Curves through Data
  • 7.4.1. Scatterplot smoothing--regression splines
  • 7.4.2. Other smoothing methods
  • 7.4.3. Generalized additive models
  • 7.5. Smoothing Terms in Multiple Linear Models
  • 7.6. Further Reading
  • 7.7. Exercises
  • 8. Logistic Regression and Other Generalized Linear Models
  • 8.1. Generalized Linear Models
  • 8.1.1. Transformation of the expected value on the left
  • 8.1.2. Noise terms need not be normal
  • 8.1.3. Log odds in contingency tables
  • 8.1.4. Logistic regression with a continuous explanatory variable
  • 8.2. Logistic Multiple Regression
  • 8.2.1. A plot of contributions of explanatory variables
  • 8.2.2. Cross-validation estimates of predictive accuracy
  • 8.3. Logistic Models for Categorical Data--an Example
  • 8.4. Poisson and Quasi-Poisson Regression
  • 8.4.1. Data on aberrant crypt foci
  • 8.4.2. Moth habitat example
  • 8.4.3. Residuals, and estimating the dispersion
  • 8.5. Ordinal Regression Models
  • 8.5.1. Exploratory analysis
  • 8.5.2. Proportional odds logistic regression
  • 8.6. Other Related Models
  • 8.6.1. Loglinear models
  • 8.6.2. Survival analysis
  • 8.7. Transformations for Count Data
  • 8.8. Further Reading
  • 8.9. Exercises
  • 9. Multi-level Models, Time Series and Repeated Measures
  • 9.1. Introduction
  • 9.2. Example--Survey Data, with Clustering
  • 9.2.1. Alternative models
  • 9.2.2. Instructive, though faulty, analyses
  • 9.2.3. Predictive accuracy
  • 9.3. A Multi-level Experimental Design
  • 9.3.1. The ANOVA table
  • 9.3.2. Expected values of mean squares
  • 9.3.3. The sums of squares breakdown
  • 9.3.4. The variance components
  • 9.3.5. The mixed model analysis
  • 9.3.6. Predictive accuracy
  • 9.3.7. Different sources of variance--complication or focus of interest?
  • 9.4. Within and between Subject Effects--an Example
  • 9.5. Time Series--Some Basic Ideas
  • 9.5.1. Preliminary graphical explorations
  • 9.5.2. The autocorrelation function
  • 9.5.3. Autoregressive (AR) models
  • 9.5.4. Autoregressive moving average (ARMA) models--theory
  • 9.6. Regression Modeling with Moving Average Errors--an Example
  • 9.7. Repeated Measures in Time--Notes on the Methodology
  • 9.7.1. The theory of repeated measures modeling
  • 9.7.2. Correlation structure
  • 9.7.3. Different approaches to repeated measures analysis
  • 9.8. Further Notes on Multi-level Modeling
  • 9.8.1. An historical perspective on multi-level models
  • 9.8.2. Meta-analysis
  • 9.9. Further Reading
  • 9.10. Exercises
  • 10. Tree-based Classification and Regression
  • 10.1. The Uses of Tree-based Methods
  • 10.1.1. Problems for which tree-based regression may be used
  • 10.1.2. Tree-based regression versus parametric approaches
  • 10.1.3. Summary of pluses and minuses
  • 10.2. Detecting Email Spam--an Example
  • 10.2.1. Choosing the number of splits
  • 10.3. Terminology and Methodology
  • 10.3.1. Choosing the split--regression trees
  • 10.3.2. Within and between sums of squares
  • 10.3.3. Choosing the split--classification trees
  • 10.3.4. The mechanics of tree-based regression--a trivial example
  • 10.4. Assessments of Predictive Accuracy
  • 10.4.1. Cross-validation
  • 10.4.2. The training/test set methodology
  • 10.4.3. Predicting the future
  • 10.5. A Strategy for Choosing the Optimal Tree
  • 10.5.1. Cost-complexity pruning
  • 10.5.2. Prediction error versus tree size
  • 10.6. Detecting Email Spam--the Optimal Tree
  • 10.6.1. The one-standard-deviation rule
  • 10.7. Interpretation and Presentation of the rpart Output
  • 10.7.1. Data for female heart attack patients
  • 10.7.2. Printed information on each split
  • 10.8. Additional Notes
  • 10.9. Further Reading
  • 10.10. Exercises
  • 11. Multivariate Data Exploration and Discrimination
  • 11.1. Multivariate Exploratory Data Analysis
  • 11.1.1. Scatterplot matrices
  • 11.1.2. Principal components analysis
  • 11.2. Discriminant Analysis
  • 11.2.1. Example--plant architecture
  • 11.2.2. Classical Fisherian discriminant analysis
  • 11.2.3. Logistic discriminant analysis
  • 11.2.4. An example with more than two groups
  • 11.3. Principal Component Scores in Regression
  • 11.4. Propensity Scores in Regression Comparisons--Labor Training Data
  • 11.5. Further Reading
  • 11.6. Exercises
  • 12. The R System--Additional Topics
  • 12.1. Graphs in R
  • 12.2. Functions--Some Further Details
  • 12.2.1. Common useful functions
  • 12.2.2. User-written R functions
  • 12.2.3. Functions for working with dates
  • 12.3. Data Input and Output
  • 12.3.1. Input
  • 12.3.2. Data output
  • 12.4. Factors--Additional Comments
  • 12.5. Missing Values
  • 12.6. Lists and Data Frames
  • 12.6.1. Data frames as lists
  • 12.6.2. Reshaping data frames; reshape()
  • 12.6.3. Joining data frames and vectors--cbind()
  • 12.6.4. Conversion of tables and arrays into data frames
  • 12.6.5. Merging data frames--merge()
  • 12.6.6. The function sapply() and related functions
  • 12.6.7. Splitting vectors and data frames into lists--split()
  • 12.7. Matrices and Arrays
  • 12.7.1. Outer products
  • 12.7.2. Arrays
  • 12.8. Classes and Methods
  • 12.8.1. Printing and summarizing model objects
  • 12.8.2. Extracting information from model objects
  • 12.9. Databases and Environments
  • 12.9.1. Workspace management
  • 12.9.2. Function environments, and lazy evaluation
  • 12.10. Manipulation of Language Constructs
  • 12.11. Further Reading
  • 12.12. Exercises
  • Epilogue--Models
  • Appendix. S-PLUS Differences
  • References
  • Index of R Symbols and Functions
  • Index of Terms
  • Index of Names