The accumulation of high-throughput genomic data has spawned a host of proposed gene-expression classifiers to discriminate between phenotypes, in particular, different types, stages, and prognoses of disease. Classical pattern recognition typically involved features possessing contextual meaning, such as geometric features in machine vision and character recognition, and sample sizes that were large in comparison to the number of features. Genomic features, on the other hand, have generally not depended on biological understanding, and the number of features has been extraordinarily large in comparison to the sample size. This situation clearly calls for the development of the relevant small-sample theory; however, there has been little effort to understand and address the epistemological issues created by the reversal of the classical paradigm. The consequence is a large number of published papers demonstrably lacking scientific validity and no rigorous scientific road ahead toward realizing the potential of molecular-based diagnosis and prognosis. This talk discusses the issue of validity in classification, reviews the extensive epistemological failings of the last decade, and proposes an epistemologically sound path forward based on extending the methods of classical mathematical statistics into the current high-throughput environment.

Host: Garrett Kenyon, gkenyon@lanl.gov
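
The validity problem the abstract alludes to can be made concrete with a small simulation; the following is a hedged sketch, not material from the talk, and the libraries (NumPy, scikit-learn), the choice of linear discriminant analysis, and all sample and feature counts are illustrative assumptions. It shows how, with many features and few samples, selecting features on the full data set before cross-validation yields an optimistically biased error estimate even when the labels are pure noise.

import numpy as np
from sklearn.model_selection import cross_val_score, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_samples, n_features = 40, 5000           # small sample, huge feature set
X = rng.normal(size=(n_samples, n_features))
y = np.repeat([0, 1], n_samples // 2)      # labels carry no real signal

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Flawed protocol: select features on all data, then cross-validate the classifier.
X_sel = SelectKBest(f_classif, k=10).fit_transform(X, y)
biased = cross_val_score(LinearDiscriminantAnalysis(), X_sel, y, cv=cv)

# Sounder protocol: selection performed inside each training fold via a pipeline.
pipe = make_pipeline(SelectKBest(f_classif, k=10), LinearDiscriminantAnalysis())
honest = cross_val_score(pipe, X, y, cv=cv)

print(f"biased accuracy estimate: {biased.mean():.2f}")   # often well above chance
print(f"honest accuracy estimate: {honest.mean():.2f}")   # near chance, ~0.5

Keeping feature selection inside each training fold removes the optimistic bias, illustrating in miniature the kind of small-sample rigor the talk argues is needed before molecular-based diagnosis and prognosis can rest on scientifically valid classifiers.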