Comparison of variable selection methods: Prediction of
chromatographic retention indices of saturated alcohols
Orsolya Farkas, Károly Héberger
Institute of Chemistry, Chemical Research Center, Hungarian Academy of Sciences,
H-1525 Budapest, P. O. Box 17, Hungary
Fax No: +36 1 325 75 54; Phone: +36 1 438 04 90
E-mail: ofarkas@chemres.hu
A quantitative structure-(chromatographic) retention relationship (QSRR) was
sought between Kováts retention indices and various descriptors characterizing
molecular structure. The following alcohols were chosen as model compounds:
linear and branched alcohols with their functional groups on a primary,
secondary, tertiary or quaternary carbon atom. Our principal aim was to better
understand the retention mechanism and to predict the retention indices using
descriptors pertaining to molecular structure.
Constitutional and WHIM descriptors were calculated using the Dragon program package [1]. The number of descriptors was reduced by various variable selection methods, such as principal component analysis, ridge regression, the pair correlation method [2], and the correlation matrix of the descriptors. The stability and validity of the models were tested by cross-validation, and plots of predicted versus observed data for the prediction sets were evaluated visually. The statistical analysis shows that the pair correlation method is a useful technique for selecting suitable variables and that some of the WHIM descriptors are suitable for characterizing the retention properties of alcohols.
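As an illustration only, two of the steps named above (screening descriptors with their correlation matrix, then checking a linear model by cross-validation) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the procedure used in this work: the data are synthetic, the function names `drop_correlated` and `loo_q2`, the 0.95 threshold, and the choice of leave-one-out cross-validation are all hypothetical, and the pair correlation method, ridge regression, and principal component analysis are not shown.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Greedy filter: keep a column only if its absolute correlation with
    every column already kept is below `threshold` (assumed value)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep

def loo_q2(X, y):
    """Leave-one-out cross-validated Q^2 = 1 - PRESS / SS_total
    for an ordinary least-squares model with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i          # leave sample i out
        coef, *_ = np.linalg.lstsq(A[mask], y[mask], rcond=None)
        press += (y[i] - A[i] @ coef) ** 2
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic stand-in data (NOT measured retention indices or real descriptors).
rng = np.random.default_rng(0)
n = 40
x0 = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = x0 + 1e-3 * rng.normal(size=n)       # near-duplicate of x0
X = np.column_stack([x0, x1, x2])
y = 800.0 + 50.0 * x0 - 20.0 * x1 + rng.normal(scale=1.0, size=n)

kept = drop_correlated(X)                 # indices of retained descriptors
q2 = loo_q2(X[:, kept], y)                # cross-validated fit quality
print("kept descriptor columns:", kept)
print("leave-one-out Q^2:", round(q2, 4))
```

In this sketch the near-duplicate column is discarded before fitting, and Q^2 is computed only from predictions for samples left out of the fit, which is the sense in which cross-validation tests model stability.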
[1] R. Todeschini, V. Consonni and M. Pavan, Dragon Software, Version 2.1 (2002).
[2] K. Héberger and R. Rajkó, Generalization of Pair-Correlation Method
(PCM) for nonparametric variable selection, J. Chemometrics 16 (2002) 436-443.