## Grafiti LLC

## SYSTAT 13.2 Statistics

## Statistics

## Descriptive Statistics

## Column

- Arithmetic mean, median, sum and number of cases
- Min, max, range and variance
- Coefficient of variation, std err of mean
- Adjustable confidence intervals of mean
- Skewness, kurtosis, including standard errors
- Shapiro-Wilk normality test
- Anderson-Darling normality test
- Multivariate skewness and kurtosis, testing for significance of these
- Henze-Zirkler test for multivariate normality
- N- & P- Tiles: Cleveland, Weighted average 1, Weighted average 2, Weighted average 3, Closest, Empirical CDF, Empirical CDF (average)
- Trimmed, Geometric, and Harmonic means
- Stem-and-Leaf display
- Bootstrap estimates, bias, standard error and confidence intervals, histograms of estimates
- Resampling – Bootstrap, without replacement, Jackknife
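
As a rough illustration of the bootstrap and jackknife options listed above, the Python sketch below computes a jackknife standard error and a bootstrap estimate, bias and standard error for the mean. It is a generic textbook version, not SYSTAT's implementation or command syntax; the function names, the replicate count and the seed are assumptions made for the example.

```python
import math
import random
import statistics


def jackknife_se(data, stat=statistics.fmean):
    """Leave-one-out (jackknife) standard error of a statistic."""
    n = len(data)
    # Recompute the statistic n times, each time dropping one case.
    loo = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    center = statistics.fmean(loo)
    return math.sqrt((n - 1) / n * sum((v - center) ** 2 for v in loo))


def bootstrap(data, stat=statistics.fmean, reps=2000, seed=0):
    """Bootstrap estimate, bias and standard error of a statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Resample n cases with replacement and recompute the statistic.
    estimates = [stat(rng.choices(data, k=n)) for _ in range(reps)]
    boot_mean = statistics.fmean(estimates)
    return {
        "estimate": boot_mean,
        "bias": boot_mean - stat(data),
        "se": statistics.stdev(estimates),
    }


sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9]  # illustrative data
jk = jackknife_se(sample)
bs = bootstrap(sample)
```

For the mean, the jackknife standard error coincides with the usual standard error of the mean, which makes a convenient sanity check.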

## Row

- Arithmetic mean, median, sum and number of cases
- Min, max, range and variance
- Coefficient of variation, std err of mean
- Adjustable confidence intervals of mean
- Skewness, kurtosis, including standard errors
- Shapiro-Wilk normality test
- Anderson-Darling normality test
- Multivariate skewness and kurtosis, testing for significance of these
- Henze-Zirkler test for multivariate normality
- N- & P- Tiles: Cleveland, Weighted average 1, Weighted average 2, Weighted average 3, Empirical CDF, Empirical CDF (average), Closest
- Trimmed, Geometric, and Harmonic means
- Stem-and-Leaf display
- Resampling – Bootstrap, without replacement, Jackknife
- Bootstrap estimates, bias, standard error and confidence intervals, histograms of estimates

## MANOVA

- Handles wide variety of designs
- Performs repeated measures analysis
- Means model for missing cells designs
- Within-group and between-group testing
- MANCOVA
- AIC, AICc, BIC computation
- Resampling – Bootstrap, without replacement, Jackknife

## General Linear Model

- Any general linear model Y = XB+e
- Any general linear hypothesis ABC’ = D
- Mixed categorical and continuous variables
- Stepwise model building
- AIC, AICc, BIC computation
- Post-hoc tests
- Resampling – Bootstrap, without replacement, Jackknife
- See also linear regression and ANOVA
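
The AIC, AICc and BIC values mentioned in several sections follow standard definitions. The sketch below computes them from a log-likelihood, assuming `k` counts all estimated parameters and `n` is the number of cases; how SYSTAT counts parameters for a given model is not stated here, so treat the function as illustrative only.

```python
import math


def information_criteria(log_likelihood, k, n):
    """Return (AIC, AICc, BIC) for k estimated parameters and n cases."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    # Small-sample correction added to AIC.
    aicc = aic + (2.0 * k * (k + 1)) / (n - k - 1)
    bic = -2.0 * log_likelihood + k * math.log(n)
    return aic, aicc, bic


# Hypothetical fit: log-likelihood -120.5, 3 parameters, 50 cases.
aic, aicc, bic = information_criteria(log_likelihood=-120.5, k=3, n=50)
```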

## Mixed Model Analysis

- Variance components and linear mixed model structures
- Estimates of parameters by:
  - Maximum likelihood (ML)
  - Restricted maximum likelihood (REML)
  - MIVQUE(0) in the case of variance components
  - ANOVA in the case of variance components
  - Confidence intervals and hypothesis tests based on these estimates
- Structures of covariance matrix of random effects:
  - Variance components
  - Diagonal
  - Compound symmetry
  - Unstructured
- Structures for error matrix:
  - Variance components
  - Compound symmetry
- AIC, AICc, BIC computation

## Discriminant Analysis

- Classical Discriminant Analysis (Linear or quadratic)
  - Prior probabilities, contrasts
  - Output: F statistics, F matrix, eigenvalues, canonical correlations, canonical scores, classification matrix, Wilks’ lambda, Lawley-Hotelling, Pillai and Wilks’ trace, classification tables, including jackknifed, canonical variables, covariance and correlation matrix, posterior probabilities and Mahalanobis distances
  - Stepwise modeling: automatic, forward, backward and interactive stepping
  - Resampling – Bootstrap, without replacement, Jackknife
- Robust Discriminant Analysis
  - Useful when the data sets are suspected to contain outliers
  - Linear or quadratic analysis
  - Save the robust Mahalanobis distance, weights, and predicted group membership

## Cluster Analysis

- Hierarchical
  - Distance measures: Euclidean, percent, gamma, Pearson, R-squared, Minkowski, chi-square, phi-square, absolute, Anderberg, Jaccard, Mahalanobis, RT, Russel, SS
  - Additional options to specify the covariance matrix for computing the Mahalanobis distance
  - Linkage methods: single, complete, centroid, average, median, Ward, flexible beta, k-neighborhood, uniform, weighted
  - Cutting cluster tree based on specified nodes and tree height
  - Five indices for cluster validity: RMSSTD, Dunn, Davies-Bouldin, Pseudo F, Pseudo T2
  - Quick Graphs: dendrogram, matrix and polar
  - Resampling – Bootstrap, without replacement, Jackknife
- K-means and K-medians
  - Distance measures: Euclidean, MWSS, gamma, Pearson, R-squared, Minkowski, chi-square, phi-square, absolute, Mahalanobis
  - Additional options to specify the covariance matrix for computing the Mahalanobis distance
  - Initial seeds can be specified from: None, first, last or random k, random or hierarchical segmentation, principal component, partition variable, from file
  - Quick Graphs: parallel coordinate and mean/std deviation profile plots
- Additive trees
  - Input: similarity, dissimilarity matrices
  - Quick Graph: dendrogram

## Factor Analysis

- Principal components, iterated principal axis, maximum likelihood
- Rotation: varimax, quartimax, equimax, orthomax, oblimin
- Resampling – Bootstrap, without replacement, Jackknife

## Time Series

- Smoothing: LOWESS, moving average, running median, and exponential
- Seasonal adjustment
- Fourier and inverse Fourier transforms
- Box-Jenkins ARIMA model
- Specify autoregressive, difference and moving average parameters
- Forecast and standard errors
- Polynomially distributed lags
- Trend Analysis: Mann-Kendall test for nonseasonal data, and seasonal Kendall and Homogeneity tests with Sen slope estimator
- Quick Graphs: series plot, autocorrelation, partial autocorrelation, cross correlation, periodogram
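
To make the trend-analysis item above more concrete, here is a minimal Python sketch of the Mann-Kendall S statistic and the Sen slope estimator for a nonseasonal series. It assumes equally spaced observations and omits the variance and tie corrections needed for a full significance test; the function names are chosen for this example and do not reflect SYSTAT's interface.

```python
import statistics
from itertools import combinations


def mann_kendall_s(series):
    """Mann-Kendall S: increasing pairs minus decreasing pairs, in time order."""
    s = 0
    for earlier, later in combinations(series, 2):
        s += (later > earlier) - (later < earlier)
    return s


def sen_slope(series):
    """Sen slope: median of all pairwise slopes, assuming unit time spacing."""
    slopes = [
        (series[j] - series[i]) / (j - i)
        for i, j in combinations(range(len(series)), 2)
    ]
    return statistics.median(slopes)


rising = [2.0, 4.0, 6.0, 8.0]  # illustrative series
s_stat = mann_kendall_s(rising)
slope = sen_slope(rising)
```

A strongly positive S together with a positive Sen slope points to an upward monotonic trend; a series with no change gives S equal to zero.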

## Missing Value Analysis

- EM Algorithm
- Regression imputation
- Save estimates, correlation, covariance, SSCP matrices
- Resampling – Bootstrap, without replacement, Jackknife

## Quality Analysis

- Histogram, Pareto Chart, Box-and-Whisker Plot
- Control Charts: Run Chart, Shewhart Control Chart, Average Run Length, Operating Characteristic Curve, Cumulative Sum Chart, Moving Average, Exponentially Weighted Moving Average, X-MR Chart, Regression Chart, TSQ
- Process Capability Analysis

## Survival Analysis

- Nonparametric: Kaplan-Meier, Nelson-Aalen and actuarial life tables with confidence intervals
- Turnbull KM estimation (EM)
- Cumulative hazards and log cumulative hazards
- Cox regression, parametric models: exponential, accelerated exponential, Weibull, accelerated Weibull, lognormal, log-logistic
- Type I, II and III censoring
- Stratification, time dependent covariates
- Forward, backward, automatic and interactive stepwise regression
- AIC, AICc, BIC computation
- Quick Graphs: survival function, quantile, reliability and hazard plots, Cox-Snell residual plot
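
The Kaplan-Meier estimator named above can be sketched in a few lines of Python. This version handles right-censored data only and returns the survival estimate at each distinct event time; confidence intervals, stratification and the other censoring types in the list are left out, and the function name and input layout are assumptions made for illustration.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    times  -- observed follow-up times
    events -- 1 if the event occurred at that time, 0 if right-censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored cases at t both leave the risk set.
        at_risk -= leaving
    return curve


# Illustrative data: five subjects, two of them censored.
km_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
```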

## Response Surface Methods

- Fits a second degree polynomial to one or more responses on several factors
- Output: regression coefficients, analysis of variance, tests of significance
- Optimum factor settings using canonical (for each response) or desirability (for all responses jointly) analysis
- Quick Graphs: Desirability plots
- Contour and surface plots with fixed settings for one or more factors

## Path Analysis (RAMONA)

- Analyze covariance or correlation matrices
- MWL (maximum Wishart likelihood)
- GLS (generalized least-squares)
- OLS (ordinary least-squares)
- ADFG (asymptotically distribution free estimate biased, Gramian)
- ADFU (unbiased)

## Conjoint Analysis

- Monotonic, linear, log and power
- Stress and tau loss functions
- Quick Graph: utility function plot
- Resampling – Bootstrap, without replacement, Jackknife

## Multidimensional Scaling

- Two-way scaling: Kruskal, Guttman, Young
- Three-way scaling: INDSCAL
- Non-metric unfolding
- EM estimation
- Power scaling for ratio data
- Quick Graphs: MDS plot, Shepard diagram

## Perceptual Mapping

- MDPREF
- Preference mapping (vector, circle, ellipse)
- Procrustes and canonical rotations
- Quick Graph: biplots

## Partially Ordered Scalogram Analysis with Coordinates (POSAC)

- Guttman-Shye algorithm; automatic serialization
- Quick Graph: item plot
- Resampling – Bootstrap, without replacement, Jackknife

## Test Item Analysis

- Classical analysis
- One- and two-parameter logistic model
- Quick Graph: item plot

## Signal Detection Analysis

- Models: normal, Chi-square, exponential
- Quick Graph: receiver operating characteristic curve

## Spatial Statistics

- 2D & 3D variogram, Kriging and simulation
- Variogram types: semi, covariance, correlogram, general relative, pairwise relative, semi-log, semimadogram
- Semivariogram models: spherical, exponential, Gaussian, power and hole effect
- Kriging types: simple, ordinary, nonstationary and drift
- Quick Graphs: variogram and contour plot
- Resampling – Bootstrap, without replacement, Jackknife
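
For readers unfamiliar with the semivariogram models listed above, the spherical model is sketched below in Python using one common parameterization (nugget, partial sill, range). Parameter names and the convention that the value at zero lag is zero are assumptions of this example, not a description of SYSTAT's options.

```python
def spherical_semivariogram(h, nugget, partial_sill, range_):
    """Spherical semivariogram value at lag distance h."""
    if h < 0:
        raise ValueError("lag distance must be non-negative")
    if h == 0:
        return 0.0  # by convention the semivariogram is zero at zero lag
    if h >= range_:
        return nugget + partial_sill  # plateau beyond the range
    r = h / range_
    return nugget + partial_sill * (1.5 * r - 0.5 * r ** 3)


# Illustrative values: no nugget, unit sill, range of 10 distance units.
gamma_half = spherical_semivariogram(5.0, nugget=0.0, partial_sill=1.0, range_=10.0)
```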

## Classification and Regression Trees

- Loss functions: least-squares, trimmed mean, LAD, phi coefficient, Gini index, twoing
- Quick Graph: unique tree mobile including split statistics and color coded subgroup densities (box, dot, dit, jitter, stripe)
- Resampling – Bootstrap, without replacement, Jackknife

## Monte Carlo (Add-on)

- Mersenne-Twister random number generator
- Multivariate random sampling: multinomial, bivariate exponential, Dirichlet, multivariate normal, and Wishart distributions
- IID Monte Carlo: Two generic algorithms – rejection sampling and adaptive rejection sampling (ARS)
- Markov Chain Monte Carlo (MCMC): Metropolis-Hastings (M-H) and Gibbs sampling algorithms
- Monte Carlo integration
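
Rejection sampling, one of the IID algorithms listed above, can be illustrated with a short Python sketch. It draws from a density on [0, 1] using a uniform proposal; the target density, the bound and the seed are choices made for this example. Python's `random` module happens to use the Mersenne Twister generator as well, but nothing here reproduces SYSTAT's own routines.

```python
import random


def rejection_sample(target_pdf, bound, n, seed=0):
    """Draw n values from target_pdf on [0, 1] with a Uniform(0, 1) proposal.

    bound must satisfy target_pdf(x) <= bound for all x in [0, 1].
    """
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        x = rng.random()  # candidate from the proposal
        u = rng.random()  # uniform height under the envelope
        if u * bound <= target_pdf(x):
            draws.append(x)  # accepted with probability target_pdf(x) / bound
    return draws


def beta22_pdf(x):
    """Density of the Beta(2, 2) distribution, whose maximum is 1.5."""
    return 6.0 * x * (1.0 - x)


draws = rejection_sample(beta22_pdf, bound=1.5, n=5000)
mc_mean = sum(draws) / len(draws)  # Monte Carlo estimate of the mean
```

The sample mean doubles as a simple Monte Carlo integration of x times the density, which should be close to 0.5 for this target.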

## Quality Analysis (Add-on)

- Gauge R & R studies
- Sigma measurements
- Taguchi’s loss function
- Taguchi’s online control – beta correction, Taguchi’s loss/savings

## Probability Calculator

- Computes probability density function, cumulative distribution function, inverse cumulative distribution function, and upper-tail probabilities for 9 univariate discrete and 28 continuous probability distributions
- Quick Graphs: graphs of the probability density function and the cumulative distribution function for continuous distributions

## Design of Experiments

- Choose between Classic and Advanced DOE with dynamic wizard
- Optimal Designs
- Complete and incomplete factorial designs
- Latin square designs, 3-12 levels per factor
- Box and Hunter 2-level incomplete designs
- Taguchi designs
- Plackett and Burman designs
- Mixture: lattice, centroid, axial, and screening
- Response surface designs: Box-Behnken and central composite designs

## Random Sampling

- Mersenne-Twister random number generator
- Random Sampling from a list of 9 univariate discrete and 28 univariate continuous distributions with given parameters

## Power Analysis

- Determine sample size to achieve a specified power
- Determine power for a single sample size or a range of sample sizes
- Proportions, correlations, t-tests, z-tests, ANOVA (one-way and two-way), and generic designs
- Options conform to those of the hypothesis tests on means
- One-sided and two-sided alternatives
- Quick Graph: power curve
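
The two calculations described above (power for a given sample size, and sample size for a target power) can be sketched for the simplest case, a two-sided one-sample z-test. The Python below is a generic illustration under that assumption; the function names and the direct-search approach are choices made for this example.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test for a true mean shift `effect`."""
    z_crit = _Z.inv_cdf(1.0 - alpha / 2.0)
    shift = effect * math.sqrt(n) / sigma
    # Probability of landing in either rejection region under the alternative.
    return _Z.cdf(-z_crit + shift) + _Z.cdf(-z_crit - shift)


def sample_size_for_power(effect, sigma, power=0.80, alpha=0.05, n_max=100_000):
    """Smallest n whose power reaches the target, found by direct search."""
    for n in range(1, n_max + 1):
        if z_test_power(effect, sigma, n, alpha) >= power:
            return n
    raise ValueError("target power not reached within n_max")


n_needed = sample_size_for_power(effect=0.5, sigma=1.0)
# Points for a power curve over a range of sample sizes.
power_curve = [(n, z_test_power(0.5, 1.0, n)) for n in range(5, 55, 5)]
```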

## Fitting Distributions

- 9 discrete and 21 continuous univariate distributions with given or estimated parameters
- Quick Graphs: graph of the respective observed and expected frequencies while fitting
- Chi-squared and Kolmogorov-Smirnov goodness-of-fit tests; Shapiro-Wilk normality test for normal, lognormal and logit normal

## ANOVA

- Designs: unbalanced, randomized block, complete block, fractional factorial, mixed model, nested, split plot, Latin square, crossover and change over, Hotelling’s T²
- ANCOVA
- Means model for missing cells designs
- Repeated measures: one-way, two or more factors, three or more factors
- Options to test normality and homoscedasticity assumptions
- Type I, II and III sums of squares
- Automatic outlier and influential point detection
- AIC, AICc, BIC computation
- Multiple comparison tests – Tukey-Kramer HSD, Bonferroni, Fisher’s LSD, Scheffe, Dunnett, Sidak, Tukey’s b, Duncan, R-E-G-W-Q, Hochberg GT2, Gabriel, Student-Newman-Keuls, Tamhane T2, Games-Howell, Dunnett’s T3
- Confidence intervals and hypothesis tests for adjacent difference, polynomial of specified order and metric, sum, custom, Helmert, reverse Helmert, deviation and simple contrasts
- Quick Graph: least-squares means
- Resampling – Bootstrap, without replacement, Jackknife

## Crosstabulation and Measures of Association

- One-, two-, and multiway tables
- Row and column frequencies, percents, expected values and deviates
- List layouts, order categories, define intervals, including missing intervals
- 2 x 2 tables: likelihood ratio chi-square, Yates’, Fisher’s exact test, odds ratio, Yule’s Q
- 2 x k tables: Cochran test
- r x r tables: McNemar’s test, Cohen’s kappa
- r x c tables, unordered levels: phi, Cramer’s V, contingency, Goodman-Kruskal’s lambda, and uncertainty coefficients
- r x c ordered levels: Spearman’s rho, Goodman-Kruskal’s gamma, Kendall’s tau-b, Stuart’s tau-c, Somers’ D
- Multiway tables: Mantel-Haenszel test
- Table of counts and percents
- Row-dependent and symmetric statistics
- Cell statistics
- Association measures for two-way tables along with confidence intervals; specified confidence level
- Standardized tables (two-way tables after controlling the effect of a third variable)
- Resampling – Bootstrap, without replacement, Jackknife
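
Several of the 2 x 2 measures above have simple closed forms. The Python sketch below computes the odds ratio, Yule's Q and Cohen's kappa for a table with cells a, b (first row) and c, d (second row); it assumes non-zero off-diagonal cells and omits the confidence intervals mentioned in the list. The function name and cell layout are assumptions of this example.

```python
def two_by_two_measures(a, b, c, d):
    """Odds ratio, Yule's Q and Cohen's kappa for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    yules_q = (a * d - b * c) / (a * d + b * c)
    # Kappa compares observed agreement on the diagonal with chance agreement.
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return odds_ratio, yules_q, kappa


# Illustrative counts only.
odds_ratio, yules_q, kappa = two_by_two_measures(20, 5, 10, 15)
```

Yule's Q is a rescaling of the odds ratio to the interval from -1 to 1, so the two always agree in sign relative to independence.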

## Loglinear Models

- Full maximum likelihood
- Pearson and likelihood ratio chi-square
- Expected values, lambda, SE lambda
- Covariance matrix, correlation matrix
- Deviates, Pearson deviates, likelihood deviates, Freeman-Tukey deviates, log-likelihood
- Resampling – Bootstrap, without replacement, Jackknife
- Dialog box with facility to type the desired model directly

## Multinormal Tests

- Shapiro-Wilk (marginal) normality test
- Multivariate skewness and kurtosis, testing for significance of these
- Henze-Zirkler test for multivariate normality
- Save Mahalanobis distances
- Quick Graph: beta Q-Q plot

## Correspondence Analysis

- Simple and multiple – raw data or data in tabular form
- Quick Graphs: vector and casewise plots
- Resampling – Bootstrap, without replacement, Jackknife

## Correlations, Distances and Similarities

- Continuous data: Pearson correlations, covariance, SSCP
- Distance measures: Euclidean, city-block, Bray-Curtis, QSK
- Rank order data: Spearman, gamma, mu2, tau-b, tau-c
- Unordered data: phi, Cramer’s V, contingency, Goodman-Kruskal’s lambda, uncertainty coefficients
- Binomial data: S2, S3, S4, S5, S6, Tetrachoric, Anderberg (S7), Yule’s Q, Hamman, Dice, Sneath, Ochiai, Kulczynski, Gower2
- Missing data: pairwise, listwise deletion, EM
- Hadi outlier detection and estimation
- Probabilities: Bonferroni, Dunn-Sidak
- Quick Graph: scatterplot matrix
- Resampling – Bootstrap, without replacement, Jackknife
- Bootstrap estimates, bias, standard error and confidence intervals, histograms of estimates in the case of Pearson correlations and rank-ordered data

## Hypothesis Testing

- Mean: One-Sample z-test, Two-sample z-test, One-Sample t-test, Two-Sample t-test, Paired t-test, Poisson test with Bonferroni, Dunn-Sidak adjustments
- Variance: Single Variance, Equality of Two Variances, Equality of Several Variances
- Correlation: Zero Correlation, Specific Correlation, Equality of Two Correlations
- Proportion: Single Proportion, Equality of Two Proportions
- Appropriate Quick Graphs
- Resampling – Bootstrap, without replacement, Jackknife
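
The Bonferroni and Dunn-Sidak adjustments mentioned above correct a p-value for the number of tests performed. A minimal Python sketch, with function names chosen for this example:

```python
def bonferroni(p, m):
    """Bonferroni-adjusted p-value for one of m tests, capped at 1."""
    return min(1.0, m * p)


def dunn_sidak(p, m):
    """Dunn-Sidak-adjusted p-value for one of m tests."""
    return 1.0 - (1.0 - p) ** m


# Illustrative case: an unadjusted p-value of 0.01 among 5 tests.
p_bonf = bonferroni(0.01, 5)
p_sidak = dunn_sidak(0.01, 5)
```

The Dunn-Sidak value is never larger than the Bonferroni value, so it is the slightly less conservative of the two.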

## Nonparametric Tests

- Independent samples: Kruskal-Wallis, two-sample Kolmogorov-Smirnov, Mann-Whitney
- Related variables: sign test, Wilcoxon signed rank test, Friedman test, Quade test
- One-sample: Wald-Wolfowitz runs test
- One-sample: Kolmogorov-Smirnov test providing 9 discrete and 28 continuous univariate distributions, also Lilliefors test
- One-sample: Anderson-Darling test providing 29 continuous univariate distributions
- Resampling – Bootstrap, without replacement, Jackknife

## Set and Canonical Correlation

- Whole, semi and bi-partial set correlations
- Rao F, R-square, shrunk R-square, T-square, shrunk T-square, P-square, shrunk P-square, within, between and inter-set correlations
- Row/Column betas, standard errors, T-statistics and probabilities
- Stewart-Love canonical redundancy index
- Canonical coefficients, loadings and redundancies
- Varimax rotation
- Resampling – Bootstrap, without replacement, Jackknife

## Robust Regression

- Least Absolute Deviation (LAD) regression
- M regression
- Least Median of Squares (LMS) regression
- Least Trimmed Squares (LTS) regression
- Scale (S) regression
- Rank Regression

## Cronbach’s Alpha

- Cronbach’s alpha value for two or more variables
- Resampling – Bootstrap, without replacement, Jackknife
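
Cronbach's alpha has a compact textbook formula based on item variances and the variance of the total score. The Python sketch below assumes complete data with one list of scores per variable; the function name and input layout are assumptions made for illustration.

```python
import statistics


def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns measured on the same cases."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two variables are required")
    # Total score per case is the sum across items.
    totals = [sum(case) for case in zip(*items)]
    item_variance = sum(statistics.variance(column) for column in items)
    return k / (k - 1) * (1.0 - item_variance / statistics.variance(totals))


# Illustrative scores: three items that differ only by a constant shift.
alpha = cronbach_alpha([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
```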

## Smooth & Plot

- 126 non-parametric smoothers including LOESS
- Windows: fixed width or nearest neighbors
- Kernels: uniform, Epanechnikov, biweight, triweight, tricube, Gaussian, Cauchy
- Method: median, mean, polynomial, robust, trimmed mean
- Save predicted values and residuals
- Resampling – Bootstrap, without replacement, Jackknife
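
One combination from the options above, a fixed-width window with an Epanechnikov kernel and the mean method, amounts to a kernel-weighted average. The Python sketch below shows that single case; the function names and the bandwidth parameter are assumptions of this example, and the other kernels and methods in the list are not covered.

```python
def epanechnikov(u):
    """Epanechnikov kernel weight for a scaled distance u."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kernel_smooth(x, y, x0, bandwidth):
    """Kernel-weighted mean of y at x0 using a fixed-width window."""
    weights = [epanechnikov((xi - x0) / bandwidth) for xi in x]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations inside the window")
    return sum(w * yi for w, yi in zip(weights, y)) / total


# Illustrative data on a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, 4.0, 6.0, 8.0]
smoothed_mid = kernel_smooth(xs, ys, x0=2.0, bandwidth=2.0)
```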

## Linear Regression

- Least-squares
  - Crossvalidation, saving residuals and diagnostics, Durbin-Watson statistic
  - Multiple linear regression
  - Prediction for new observations
  - Stepwise regression: automatic, customized and interactive stepping, partial correlations
  - AIC, AICc, BIC computation
  - Hypothesis testing, mixture models
  - Automatic outlier and influential point detection
  - Quick Graph: residuals vs. predicted values, fitted model plot in the case of one or two predictors (confidence and prediction intervals in the case of one predictor)
  - Resampling – Bootstrap, without replacement, Jackknife
  - Bootstrap estimates, bias, standard error and confidence intervals, histograms of estimates
- Bayesian
  - Prior distribution: diffuse or (multivariate) normal-gamma distribution
  - Bayes estimates and credible intervals for regression coefficients computed
  - Parameters of the posterior distribution provided
  - Quick Graphs: plots of prior and posterior densities of regression coefficients
- Ridge
  - Two types of ridge coefficients: standardized and unstandardized
  - Quick Graph: plot of the ridge factor against the ridge coefficients

## Logistic Regression

- Binary, multinomial, discrete choice and conditional
- AIC, AICc, BIC computation
- Robust standard errors, prediction success table, derivatives table
- Classification table with specified cutoff point
- Dummy variables and interactions
- Forward, backward, automatic and interactive stepwise regression
- Deciles of risk, quantiles and simulation
- Hypothesis tests
- Quick Graph: ROC curve for binary logistic regression

## Probit Regression

- Dummy variables and interactions
- AIC, AICc, BIC computation

## Partial Least-Squares Regression

- Useful in situations where the number of variables is large relative to the number of cases or there is likely to be multicollinearity among the predictor variables
- NIPALS and SIMPLS algorithms
- Cross validation

## Two-Stage Least-Squares

- Model with independent and/or instrumental variables, with lags
- Diagnostic tests for heteroskedasticity and nonlinearity
- Polynomially distributed lags
- Hypothesis tests

## Mixed Regression

- Hierarchical Linear Models (HLM)
- Specify effects as fixed or random
- Autocorrelated error structures
- Nested Models (2-Level): Repeated Measures, Clustered Data
- Unbalanced or balanced data
- Quick Graph: scatterplot, histogram or scatterplot matrix of empirical Bayes estimates

## Nonlinear Regression

- Gauss-Newton, Quasi-Newton, Simplex
- Output: predicted values, residuals, asymptotic standard errors and correlations, confidence curves and regions
- Special features: Cook-Weisberg confidence intervals, Wald intervals, Marquardting
- Robust estimation: absolute, power, trim, Huber, Hampel, t, bisquare, Ramsay, Andrews, Tukey
- Maximum likelihood estimation
- Piecewise regression, kinetic models, logistic model for quantal response data
- Exact derivatives
- Quick Graph: scatterplot with fitted curve
- Resampling – Bootstrap, without replacement, Jackknife
