Data analysis and graphics using R : an example-based approach

Bibliographic Details
Author / Creator: Maindonald, J. H. (John Hilary), 1937-
Imprint: Cambridge, UK ; New York : Cambridge University Press, 2003.
Description: xxiii, 362 p. : ill. ; 26 cm.
Language: English
Series: Cambridge series in statistical and probabilistic mathematics
Format: Print Book
URL for this record: http://pi.lib.uchicago.edu/1001/cat/bib/4915270
Other authors / contributors: Braun, John, 1963-
ISBN: 0521813360
Notes: Includes bibliographical references (p. [346]-351) and indexes.
Table of Contents:
  • Preface
  • A Chapter by Chapter Summary
  • 1. A Brief Introduction to R
  • 1.1. A Short R Session
  • 1.1.1. R must be installed!
  • 1.1.2. Using the console (or command line) window
  • 1.1.3. Reading data from a file
  • 1.1.4. Entry of data at the command line
  • 1.1.5. Online help
  • 1.1.6. Quitting R
  • 1.2. The Uses of R
  • 1.3. The R Language
  • 1.3.1. R objects
  • 1.3.2. Retaining objects between sessions
  • 1.4. Vectors in R
  • 1.4.1. Concatenation--joining vector objects
  • 1.4.2. Subsets of vectors
  • 1.4.3. Patterned data
  • 1.4.4. Missing values
  • 1.4.5. Factors
  • 1.5. Data Frames
  • 1.5.1. Variable names
  • 1.5.2. Applying a function to the columns of a data frame
  • 1.5.3. Data frames and matrices
  • 1.5.4. Identification of rows that include missing values
  • 1.6. R Packages
  • 1.6.1. Data sets that accompany R packages
  • 1.7. Looping
  • 1.8. R Graphics
  • 1.8.1. The function plot() and allied functions
  • 1.8.2. Identification and location on the figure region
  • 1.8.3. Plotting mathematical symbols
  • 1.8.4. Row by column layouts of plots
  • 1.8.5. Graphs--additional notes
  • 1.9. Additional Points on the Use of R in This Book
  • 1.10. Further Reading
  • 1.11. Exercises
  • 2. Styles of Data Analysis
  • 2.1. Revealing Views of the Data
  • 2.1.1. Views of a single sample
  • 2.1.2. Patterns in grouped data
  • 2.1.3. Patterns in bivariate data--the scatterplot
  • 2.1.4. Multiple variables and times
  • 2.1.5. Lattice (trellis style) graphics
  • 2.1.6. What to look for in plots
  • 2.2. Data Summary
  • 2.2.1. Mean and median
  • 2.2.2. Standard deviation and inter-quartile range
  • 2.2.3. Correlation
  • 2.3. Statistical Analysis Strategies
  • 2.3.1. Helpful and unhelpful questions
  • 2.3.2. Planning the formal analysis
  • 2.3.3. Changes to the intended plan of analysis
  • 2.4. Recap
  • 2.5. Further Reading
  • 2.6. Exercises
  • 3. Statistical Models
  • 3.1. Regularities
  • 3.1.1. Mathematical models
  • 3.1.2. Models that include a random component
  • 3.1.3. Smooth and rough
  • 3.1.4. The construction and use of models
  • 3.1.5. Model formulae
  • 3.2. Distributions: Models for the Random Component
  • 3.2.1. Discrete distributions
  • 3.2.2. Continuous distributions
  • 3.3. The Uses of Random Numbers
  • 3.3.1. Simulation
  • 3.3.2. Sampling from populations
  • 3.4. Model Assumptions
  • 3.4.1. Random sampling assumptions--independence
  • 3.4.2. Checks for normality
  • 3.4.3. Checking other model assumptions
  • 3.4.4. Are non-parametric methods the answer?
  • 3.4.5. Why models matter--adding across contingency tables
  • 3.5. Recap
  • 3.6. Further Reading
  • 3.7. Exercises
  • 4. An Introduction to Formal Inference
  • 4.1. Standard Errors
  • 4.1.1. Population parameters and sample statistics
  • 4.1.2. Assessing accuracy--the standard error
  • 4.1.3. Standard errors for differences of means
  • 4.1.4. The standard error of the median
  • 4.1.5. Resampling to estimate standard errors: bootstrapping
  • 4.2. Calculations Involving Standard Errors: the t-Distribution
  • 4.3. Confidence Intervals and Hypothesis Tests
  • 4.3.1. One- and two-sample intervals and tests for means
  • 4.3.2. Confidence intervals and tests for proportions
  • 4.3.3. Confidence intervals for the correlation
  • 4.4. Contingency Tables
  • 4.4.1. Rare and endangered plant species
  • 4.4.2. Additional notes
  • 4.5. One-Way Unstructured Comparisons
  • 4.5.1. Displaying means for the one-way layout
  • 4.5.2. Multiple comparisons
  • 4.5.3. Data with a two-way structure
  • 4.5.4. Presentation issues
  • 4.6. Response Curves
  • 4.7. Data with a Nested Variation Structure
  • 4.7.1. Degrees of freedom considerations
  • 4.7.2. General multi-way analysis of variance designs
  • 4.8. Resampling Methods for Tests and Confidence Intervals
  • 4.8.1. The one-sample permutation test
  • 4.8.2. The two-sample permutation test
  • 4.8.3. Bootstrap estimates of confidence intervals
  • 4.9. Further Comments on Formal Inference
  • 4.9.1. Confidence intervals versus hypothesis tests
  • 4.9.2. If there is strong prior information, use it!
  • 4.10. Recap
  • 4.11. Further Reading
  • 4.12. Exercises
  • 5. Regression with a Single Predictor
  • 5.1. Fitting a Line to Data
  • 5.1.1. Lawn roller example
  • 5.1.2. Calculating fitted values and residuals
  • 5.1.3. Residual plots
  • 5.1.4. The analysis of variance table
  • 5.2. Outliers, Influence and Robust Regression
  • 5.3. Standard Errors and Confidence Intervals
  • 5.3.1. Confidence intervals and tests for the slope
  • 5.3.2. SEs and confidence intervals for predicted values
  • 5.3.3. Implications for design
  • 5.4. Regression versus Qualitative ANOVA Comparisons
  • 5.5. Assessing Predictive Accuracy
  • 5.5.1. Training/test sets, and cross-validation
  • 5.5.2. Cross-validation--an example
  • 5.5.3. Bootstrapping
  • 5.6. A Note on Power Transformations
  • 5.7. Size and Shape Data
  • 5.7.1. Allometric growth
  • 5.7.2. There are two regression lines!
  • 5.8. The Model Matrix in Regression
  • 5.9. Recap
  • 5.10. Methodological References
  • 5.11. Exercises
  • 6. Multiple Linear Regression
  • 6.1. Basic Ideas: Book Weight and Brain Weight Examples
  • 6.1.1. Omission of the intercept term
  • 6.1.2. Diagnostic plots
  • 6.1.3. Further investigation of influential points
  • 6.1.4. Example: brain weight
  • 6.2. Multiple Regression Assumptions and Diagnostics
  • 6.2.1. Influential outliers and Cook's distance
  • 6.2.2. Component plus residual plots
  • 6.2.3. Further types of diagnostic plot
  • 6.2.4. Robust and resistant methods
  • 6.3. A Strategy for Fitting Multiple Regression Models
  • 6.3.1. Preliminaries
  • 6.3.2. Model fitting
  • 6.3.3. An example--the Scottish hill race data
  • 6.4. Measures for the Comparison of Regression Models
  • 6.4.1. R² and adjusted R²
  • 6.4.2. AIC and related statistics
  • 6.4.3. How accurately does the equation predict?
  • 6.4.4. An external assessment of predictive accuracy
  • 6.5. Interpreting Regression Coefficients--the Labor Training Data
  • 6.6. Problems with Many Explanatory Variables
  • 6.6.1. Variable selection issues
  • 6.6.2. Principal components summaries
  • 6.7. Multicollinearity
  • 6.7.1. A contrived example
  • 6.7.2. The variance inflation factor (VIF)
  • 6.7.3. Remedying multicollinearity
  • 6.8. Multiple Regression Models--Additional Points
  • 6.8.1. Confusion between explanatory and dependent variables
  • 6.8.2. Missing explanatory variables
  • 6.8.3. The use of transformations
  • 6.8.4. Non-linear methods--an alternative to transformation?
  • 6.9. Further Reading
  • 6.10. Exercises
  • 7. Exploiting the Linear Model Framework
  • 7.1. Levels of a Factor--Using Indicator Variables
  • 7.1.1. Example--sugar weight
  • 7.1.2. Different choices for the model matrix when there are factors
  • 7.2. Polynomial Regression
  • 7.2.1. Issues in the choice of model
  • 7.3. Fitting Multiple Lines
  • 7.4. Methods for Passing Smooth Curves through Data
  • 7.4.1. Scatterplot smoothing--regression splines
  • 7.4.2. Other smoothing methods
  • 7.4.3. Generalized additive models
  • 7.5. Smoothing Terms in Multiple Linear Models
  • 7.6. Further Reading
  • 7.7. Exercises
  • 8. Logistic Regression and Other Generalized Linear Models
  • 8.1. Generalized Linear Models
  • 8.1.1. Transformation of the expected value on the left
  • 8.1.2. Noise terms need not be normal
  • 8.1.3. Log odds in contingency tables
  • 8.1.4. Logistic regression with a continuous explanatory variable
  • 8.2. Logistic Multiple Regression
  • 8.2.1. A plot of contributions of explanatory variables
  • 8.2.2. Cross-validation estimates of predictive accuracy
  • 8.3. Logistic Models for Categorical Data--an Example
  • 8.4. Poisson and Quasi-Poisson Regression
  • 8.4.1. Data on aberrant crypt foci
  • 8.4.2. Moth habitat example
  • 8.4.3. Residuals, and estimating the dispersion
  • 8.5. Ordinal Regression Models
  • 8.5.1. Exploratory analysis
  • 8.5.2. Proportional odds logistic regression
  • 8.6. Other Related Models
  • 8.6.1. Loglinear models
  • 8.6.2. Survival analysis
  • 8.7. Transformations for Count Data
  • 8.8. Further Reading
  • 8.9. Exercises
  • 9. Multi-level Models, Time Series and Repeated Measures
  • 9.1. Introduction
  • 9.2. Example--Survey Data, with Clustering
  • 9.2.1. Alternative models
  • 9.2.2. Instructive, though faulty, analyses
  • 9.2.3. Predictive accuracy
  • 9.3. A Multi-level Experimental Design
  • 9.3.1. The ANOVA table
  • 9.3.2. Expected values of mean squares
  • 9.3.3. The sums of squares breakdown
  • 9.3.4. The variance components
  • 9.3.5. The mixed model analysis
  • 9.3.6. Predictive accuracy
  • 9.3.7. Different sources of variance--complication or focus of interest?
  • 9.4. Within and between Subject Effects--an Example
  • 9.5. Time Series--Some Basic Ideas
  • 9.5.1. Preliminary graphical explorations
  • 9.5.2. The autocorrelation function
  • 9.5.3. Autoregressive (AR) models
  • 9.5.4. Autoregressive moving average (ARMA) models--theory
  • 9.6. Regression Modeling with Moving Average Errors--an Example
  • 9.7. Repeated Measures in Time--Notes on the Methodology
  • 9.7.1. The theory of repeated measures modeling
  • 9.7.2. Correlation structure
  • 9.7.3. Different approaches to repeated measures analysis
  • 9.8. Further Notes on Multi-level Modeling
  • 9.8.1. An historical perspective on multi-level models
  • 9.8.2. Meta-analysis
  • 9.9. Further Reading
  • 9.10. Exercises
  • 10. Tree-based Classification and Regression
  • 10.1. The Uses of Tree-based Methods
  • 10.1.1. Problems for which tree-based regression may be used
  • 10.1.2. Tree-based regression versus parametric approaches
  • 10.1.3. Summary of pluses and minuses
  • 10.2. Detecting Email Spam--an Example
  • 10.2.1. Choosing the number of splits
  • 10.3. Terminology and Methodology
  • 10.3.1. Choosing the split--regression trees
  • 10.3.2. Within and between sums of squares
  • 10.3.3. Choosing the split--classification trees
  • 10.3.4. The mechanics of tree-based regression--a trivial example
  • 10.4. Assessments of Predictive Accuracy
  • 10.4.1. Cross-validation
  • 10.4.2. The training/test set methodology
  • 10.4.3. Predicting the future
  • 10.5. A Strategy for Choosing the Optimal Tree
  • 10.5.1. Cost-complexity pruning
  • 10.5.2. Prediction error versus tree size
  • 10.6. Detecting Email Spam--the Optimal Tree
  • 10.6.1. The one-standard-deviation rule
  • 10.7. Interpretation and Presentation of the rpart Output
  • 10.7.1. Data for female heart attack patients
  • 10.7.2. Printed information on each split
  • 10.8. Additional Notes
  • 10.9. Further Reading
  • 10.10. Exercises
  • 11. Multivariate Data Exploration and Discrimination
  • 11.1. Multivariate Exploratory Data Analysis
  • 11.1.1. Scatterplot matrices
  • 11.1.2. Principal components analysis
  • 11.2. Discriminant Analysis
  • 11.2.1. Example--plant architecture
  • 11.2.2. Classical Fisherian discriminant analysis
  • 11.2.3. Logistic discriminant analysis
  • 11.2.4. An example with more than two groups
  • 11.3. Principal Component Scores in Regression
  • 11.4. Propensity Scores in Regression Comparisons--Labor Training Data
  • 11.5. Further Reading
  • 11.6. Exercises
  • 12. The R System--Additional Topics
  • 12.1. Graphs in R
  • 12.2. Functions--Some Further Details
  • 12.2.1. Common useful functions
  • 12.2.2. User-written R functions
  • 12.2.3. Functions for working with dates
  • 12.3. Data Input and Output
  • 12.3.1. Input
  • 12.3.2. Data output
  • 12.4. Factors--Additional Comments
  • 12.5. Missing Values
  • 12.6. Lists and Data Frames
  • 12.6.1. Data frames as lists
  • 12.6.2. Reshaping data frames; reshape()
  • 12.6.3. Joining data frames and vectors--cbind()
  • 12.6.4. Conversion of tables and arrays into data frames
  • 12.6.5. Merging data frames--merge()
  • 12.6.6. The function sapply() and related functions
  • 12.6.7. Splitting vectors and data frames into lists--split()
  • 12.7. Matrices and Arrays
  • 12.7.1. Outer products
  • 12.7.2. Arrays
  • 12.8. Classes and Methods
  • 12.8.1. Printing and summarizing model objects
  • 12.8.2. Extracting information from model objects
  • 12.9. Databases and Environments
  • 12.9.1. Workspace management
  • 12.9.2. Function environments, and lazy evaluation
  • 12.10. Manipulation of Language Constructs
  • 12.11. Further Reading
  • 12.12. Exercises
  • Epilogue--Models
  • Appendix. S-PLUS Differences
  • References
  • Index of R Symbols and Functions
  • Index of Terms
  • Index of Names