Assessing and improving the stability of chemometric models in small sample size situations

Beleites, Claudia; Salzer, Reiner
March 2008
Analytical & Bioanalytical Chemistry; Mar 2008, Vol. 390, Issue 5, p. 1261
Academic Journal
Small sample sizes are very common in multivariate analysis. With only 10–100 statistically independent objects (for example, rejects from processes or loading dock analysis, or patients with a rare disease), each described by hundreds of data points, models tend to be unstable and to have poor predictive quality. Model stability is assessed by comparing models built from slightly varying training data; iterated k-fold cross-validation is used for this purpose. Aggregation stabilizes models, and the quality of the aggregated model can be assessed without calculating further models. The validation and aggregation methods investigated in this study apply to regression as well as to classification. These techniques are useful for analyzing data with large numbers of variates, e.g., spectral data such as FT-IR, Raman, UV/VIS, fluorescence, AAS, and MS. FT-IR images of tumor tissue were used in this study; some tissue types occur frequently, while others are very rare. The tissue types were classified using linear discriminant analysis (LDA). The initial models were severely unstable. Aggregation stabilized the predictions, and the hit rate increased from 67% to 82%.
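The procedure outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: a nearest-centroid classifier stands in for LDA so the example needs only the standard library, majority voting is assumed as one simple form of aggregation, and all function names (fit_centroids, predict, iterated_kfold_votes, summarize) and the toy data are hypothetical. In each iteration the objects are reshuffled and split into k folds, so every object receives one prediction per iteration from a model that never saw it. The spread of those predictions indicates stability, and their majority vote gives an aggregated prediction whose hit rate is computed from the models already built.

```python
import random
from collections import Counter


def fit_centroids(X, y):
    """Return {label: mean vector} for the training objects."""
    grouped = {}
    for x, label in zip(X, y):
        grouped.setdefault(label, []).append(x)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in grouped.items()
    }


def predict(centroids, x):
    """Assign x to the label of the closest centroid (squared Euclidean)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(x, centroids[label]))
    return min(sorted(centroids), key=dist)


def iterated_kfold_votes(X, y, k=5, iterations=20, seed=0):
    """Collect one held-out prediction per object and iteration."""
    rng = random.Random(seed)
    votes = [[] for _ in X]
    for _ in range(iterations):
        order = list(range(len(X)))
        rng.shuffle(order)                  # new split in every iteration
        for f in range(k):
            test_idx = order[f::k]          # fold f is held out
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = fit_centroids([X[i] for i in train_idx],
                                  [y[i] for i in train_idx])
            for i in test_idx:
                votes[i].append(predict(model, X[i]))
    return votes


def summarize(votes, y):
    """Hit rate of single models, hit rate of the vote, per-object agreement."""
    total = sum(len(vs) for vs in votes)
    single = sum(v == t for vs, t in zip(votes, y) for v in vs) / total
    majority = [Counter(vs).most_common(1)[0] for vs in votes]
    aggregated = sum(lab == t for (lab, _), t in zip(majority, y)) / len(y)
    stability = [count / len(vs) for (_, count), vs in zip(majority, votes)]
    return single, aggregated, stability


if __name__ == "__main__":
    # Toy data (assumption): 10 objects per class, 20 noisy variates each.
    rng = random.Random(1)
    X = [[rng.gauss(mu, 1.0) for _ in range(20)]
         for mu in (0.0, 0.5) for _ in range(10)]
    y = ["a"] * 10 + ["b"] * 10
    single, aggregated, stability = summarize(iterated_kfold_votes(X, y), y)
    print("single-model hit rate:", round(single, 2))
    print("aggregated hit rate:  ", round(aggregated, 2))
    print("least stable object agreement:", round(min(stability), 2))
```

An agreement value near 1.0 means that the models built on slightly different training sets predict the same label for that object; lower values flag objects whose prediction depends on the particular split.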


