Most supervised machine learning algorithms assume that each training data point is paired with an accurate training label (for classification) or value (for regression). However, obtaining accurate label information is often time-consuming, expensive, or infeasible for large data sets. Furthermore, human annotators may label a data set inconsistently, so the label information they provide is inherently imprecise. As a result, many applications have access only to inaccurately labeled training data. For example, consider single-pixel or sub-pixel target detection in remotely sensed imagery. Often only GPS coordinates for the targets of interest are available, with an accuracy ranging across several pixels. Thus, the specific pixels that correspond to a target are unknown, even with the GPS ground-truth information. Training an accurate classifier or learning a representative target signature from this sort of uncertainly labeled training data is extremely difficult in practice. In this example, accurately labeled training data is unavailable, and an approach that can learn from uncertain training labels, such as Multiple Instance Learning (MIL), is required. The need to learn from weakly labeled data or uncertain training labels arises in many applications.

Host: James Theiler