Informatics-based discovery can be greatly accelerated by integrating existing theory with big data, particularly when sample points are expensive or difficult to acquire. This is because the predictive capacity of a classifier is quantified by its error rate, and a purely data-driven approach requires a large number of examples to guarantee accurate error estimation. Recognizing the need for a general model-based framework that integrates scientific knowledge of the mechanisms governing a system's behavior with observable data, in this talk I will present a Bayesian approach to optimal and predictive classification and classifier error estimation.
The basis of the method is to construct a prior distribution over an uncertainty class of probabilistic models, effectively constraining the relationship between the observations and the decision to be made, with higher weight placed on models most consistent with available scientific knowledge. Using Bayesian estimation principles, this prior is combined with observed data to produce a posterior distribution. We then formulate classification and error estimation as optimization problems in this Bayesian framework, leading to (1) optimal classifiers, (2) optimal minimum-mean-square-error (MMSE) estimators of classifier error, and (3) a sample-conditioned mean-square error (MSE) quantifying the accuracy of error estimation. In essence, this work puts forth a rigorous methodology for deriving these optimized tools, all of which account for both the theoretical and empirical knowledge available.
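The pipeline described above (prior over an uncertainty class, posterior update, optimal classifier, MMSE error estimate, and sample-conditioned MSE) can be sketched on a toy problem. The models, prior weights, and data below are hypothetical illustrations, not the actual models from the talk: a binary feature X, a binary label Y, and a two-model uncertainty class where each model specifies P(X=1 | Y=y).

```python
# Hypothetical two-model uncertainty class for a binary feature X and
# binary label Y; each model specifies P(X=1 | Y=y).
models = [
    {"p_x1_y0": 0.2, "p_x1_y1": 0.8},
    {"p_x1_y0": 0.4, "p_x1_y1": 0.6},
]
prior = [0.5, 0.5]   # prior over the uncertainty class
p_y1 = 0.5           # class prior, assumed known here for simplicity

def likelihood(model, data):
    """Probability of the labeled sample under one fixed model."""
    L = 1.0
    for x, y in data:
        p = model["p_x1_y1"] if y == 1 else model["p_x1_y0"]
        L *= p if x == 1 else (1.0 - p)
    return L

def posterior(models, prior, data):
    """Bayes update: posterior over the uncertainty class given the sample."""
    w = [pi * likelihood(m, data) for pi, m in zip(prior, models)]
    z = sum(w)
    return [wi / z for wi in w]

def true_error(model, classify):
    """Exact error of a classifier x -> label under a fully known model."""
    err = 0.0
    for x in (0, 1):
        p_x_y1 = model["p_x1_y1"] if x == 1 else 1.0 - model["p_x1_y1"]
        p_x_y0 = model["p_x1_y0"] if x == 1 else 1.0 - model["p_x1_y0"]
        if classify(x) != 1:
            err += p_y1 * p_x_y1          # missed a true class-1 point
        if classify(x) != 0:
            err += (1.0 - p_y1) * p_x_y0  # missed a true class-0 point
    return err

data = [(1, 1), (1, 1), (0, 0), (1, 0)]   # illustrative (x, y) sample
post = posterior(models, prior, data)

def obc(x):
    """Optimal Bayesian classifier: decide with the posterior-mixed
    ("effective") class-conditional distributions."""
    p1 = p_y1 * sum(w * (m["p_x1_y1"] if x == 1 else 1 - m["p_x1_y1"])
                    for w, m in zip(post, models))
    p0 = (1 - p_y1) * sum(w * (m["p_x1_y0"] if x == 1 else 1 - m["p_x1_y0"])
                          for w, m in zip(post, models))
    return 1 if p1 >= p0 else 0

# (2) Bayesian MMSE error estimate: posterior expectation of the true error.
mmse_error = sum(w * true_error(m, obc) for w, m in zip(post, models))

# (3) Sample-conditioned MSE: posterior variance of the true error around it.
cond_mse = sum(w * (true_error(m, obc) - mmse_error) ** 2
               for w, m in zip(post, models))
```

With a finite uncertainty class the posterior expectations reduce to weighted sums; in the continuous-parameter setting the same expectations become integrals over the posterior, but the structure of the computation is unchanged.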