Robust PCA in high dimensions with applications to PCR and PLS
Mia Hubert
Department of Mathematics
Katholieke Universiteit Leuven
Celestijnenlaan 200B, B-3001 Heverlee
tel.: +32 16 32 70 48; fax: +32 16 32 79 98
Email: mia.hubert@wis.kuleuven.ac.be
In chemometrics, multivariate calibration is often performed using Principal Component Regression (PCR) and Partial Least Squares (PLS) regression, because these methods can cope with data sets that have more variables than observations. They are, however, very sensitive to outliers, mainly because they are based on the empirical covariance matrix of the data.
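To make the classical starting point concrete, here is a minimal sketch of ordinary (non-robust) PCR in NumPy. It is an illustration only, not code from the talk: the function name `pcr_fit` and its arguments are hypothetical, and the principal components are taken from an SVD of the centred data, which is equivalent to an eigendecomposition of the empirical covariance matrix and works when there are more variables than observations.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Classical PCR sketch: regress y on the first k principal component scores of X.

    Hypothetical helper for illustration; returns (intercept, beta) in the
    original variable space.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Right singular vectors of the centred data are the eigenvectors of the
    # empirical covariance matrix, so the p x p matrix is never formed.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                      # p x k loadings
    T = Xc @ P                        # n x k scores
    gamma = np.linalg.lstsq(T, y - y_mean, rcond=None)[0]
    beta = P @ gamma                  # coefficients for the original variables
    intercept = y_mean - x_mean @ beta
    return intercept, beta
```

Both the centring step (sample means) and the loadings (empirical covariance) are non-robust here, which is the source of the outlier sensitivity noted above.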
In this talk I will first present a robust covariance estimator for high-dimensional data. It combines projection pursuit ideas with robust covariance estimation in low dimensions. Then I will explain how we obtain a robust PCR and a robust PLS method by combining this estimator with a robust multivariate regression method. Simulation results demonstrate the good behaviour of these methods on data sets contaminated with different types of outliers. Monte Carlo simulations are also performed on uncontaminated data sets to compare the finite-sample efficiencies of the robust methods with those of the classical estimators. A robust R-squared measure is proposed to select the number of latent variables. Finally, several two- and three-dimensional diagnostic plots are introduced, which help to visualize the observations and to classify them into regular observations and different types of outliers. Striking differences between the classical and the robust analysis are illustrated on a real data set.
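The abstract does not spell out the estimator, so the following is only a rough sketch of how projection pursuit can be used to obtain robust principal components when there are more variables than observations. Everything here is an assumption made for illustration: the names `pp_outlyingness` and `robust_pca_sketch`, the use of directions through pairs of observations, the median and MAD as univariate location and scale, and the 75% subset size. The low-dimensional robust covariance step and the robust regression step mentioned above are not included.

```python
import numpy as np

def pp_outlyingness(X, n_directions=500, seed=0):
    """For each observation, the largest robustly standardised distance
    |x'v - median| / MAD over a set of projection directions v (assumed choice)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Assumption: directions pass through randomly chosen pairs of observations.
    i = rng.integers(0, n, size=n_directions)
    j = rng.integers(0, n, size=n_directions)
    V = X[i[i != j]] - X[j[i != j]]
    norms = np.linalg.norm(V, axis=1)
    V = V[norms > 0] / norms[norms > 0, None]
    proj = X @ V.T                                   # n x (number of directions)
    med = np.median(proj, axis=0)
    mad = 1.4826 * np.median(np.abs(proj - med), axis=0)
    ok = mad > 0                                     # skip degenerate directions
    return np.max(np.abs(proj[:, ok] - med[ok]) / mad[ok], axis=1)

def robust_pca_sketch(X, k=2, h_frac=0.75, n_directions=500, seed=0):
    """Principal components of the h least outlying observations (illustrative)."""
    out = pp_outlyingness(X, n_directions=n_directions, seed=seed)
    h = int(np.ceil(h_frac * X.shape[0]))
    subset = np.argsort(out)[:h]                     # h least outlying points
    center = X[subset].mean(axis=0)
    # SVD of the centred subset avoids forming the p x p covariance matrix.
    _, s, Vt = np.linalg.svd(X[subset] - center, full_matrices=False)
    eigvals = s[:k] ** 2 / (h - 1)
    return center, Vt[:k].T, eigvals, out
```

The returned outlyingness values could also feed a simple diagnostic display, for example by plotting them against the observation index, though the diagnostic plots referred to in the abstract may be constructed differently.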