HMAX/Neocognitron models of visual cortex use learned hierarchical (sparse) representations to describe visual scenes. These models have reported state-of-the-art accuracy on whole-image labeling tasks using natural still imagery (Serre et al. [4]). Generalizations of these models (e.g., Brumby et al., AIPR 2009) allow localized detection of objects within a scene. Itti and Koch [21] have proposed non-task-specific models of visual attention ("saliency maps"), which have been compared to human and animal data using eye-tracking systems. Chikkerur et al. [25] used eye-tracking to compare human fixations during object detection in still images (finding pedestrians and vehicles in urban scenes) with the predictions of an HMAX model extended with a model of attention in parietal cortex.

Here, we describe new work comparing human eye-tracking data for object detection in natural video sequences to task-specific saliency maps generated by a sparse, hierarchical model of the ventral pathway of visual cortex called PANN (Petascale Artificial Neural Network), our high-performance implementation of an HMAX/Neocognitron-type model. We explore specific object detection tasks, including vehicle detection in aerial video from a low-flying aircraft, for which we collect eye-tracking data from several human subjects. We train our model using hand-marked training data on a few frames, and compare our results to eye-tracking data over an independent set of test video sequences. We also compare our task-specific saliency maps to non-task-specific saliency maps (Itti et al., PAMI 1998 [22]; Harel et al., NIPS 2006 [23]).

Host: Peter Loxley, loxley@lanl.gov
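To make the model class concrete, the following is a minimal sketch (in Python/NumPy, not the PANN implementation itself) of the first two HMAX-style stages: an S1 layer of oriented Gabor filters followed by a C1 layer that max-pools locally, giving tolerance to small shifts. All parameter values are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import convolve, maximum_filter

    def gabor(size=11, wavelength=5.0, theta=0.0, sigma=3.0, gamma=0.5):
        """Oriented Gabor filter, zero-mean and unit-norm (illustrative parameters)."""
        half = size // 2
        y, x = np.mgrid[-half:half + 1, -half:half + 1]
        xr = x * np.cos(theta) + y * np.sin(theta)
        yr = -x * np.sin(theta) + y * np.cos(theta)
        g = np.exp(-(xr**2 + gamma**2 * yr**2) / (2 * sigma**2)) \
            * np.cos(2 * np.pi * xr / wavelength)
        g -= g.mean()
        return g / np.linalg.norm(g)

    def s1_c1(image, n_orientations=4, pool=8):
        """S1: rectified Gabor responses per orientation; C1: local max pooling."""
        c1 = []
        for i in range(n_orientations):
            theta = i * np.pi / n_orientations
            s1 = np.abs(convolve(image, gabor(theta=theta), mode='nearest'))
            # C1: max over a pool x pool neighborhood, then subsample
            c1.append(maximum_filter(s1, size=pool)[::pool, ::pool])
        return np.stack(c1)        # shape: (n_orientations, H/pool, W/pool)

    frame = np.random.rand(128, 128)   # stand-in for a grayscale video frame
    print(s1_c1(frame).shape)          # (4, 16, 16)

A full HMAX stack alternates further template-matching (S) and pooling (C) layers on top of these features; the sparse dictionaries in models like PANN are learned rather than fixed Gabors.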
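For contrast, a non-task-specific saliency map in the spirit of Itti et al. [22] can be approximated by center-surround differences across scales of a Gaussian-blurred intensity channel. The sketch below is a deliberate simplification (single channel, fixed scale pairs), not the reference implementation.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def saliency_map(intensity, center_scales=(1, 2), deltas=(3, 4)):
        """Sum of |center - surround| maps across assumed scale pairs."""
        sal = np.zeros_like(intensity)
        for c in center_scales:
            center = gaussian_filter(intensity, sigma=2**c)
            for d in deltas:
                surround = gaussian_filter(intensity, sigma=2**(c + d))
                sal += np.abs(center - surround)
        return sal / (sal.max() + 1e-12)   # normalize to [0, 1]

    frame = np.random.rand(480, 640)       # stand-in for a video frame
    sal = saliency_map(frame)

A task-specific saliency map replaces this bottom-up conspicuity signal with the detector's own per-pixel evidence for the target class (here, vehicles).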
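One common way to score either kind of saliency map against eye-tracking data is a fixation ROC analysis: saliency values at human fixation points are treated as positives, values at random pixels as negatives, and the area under the ROC curve is reported (0.5 = chance). This is a standard metric offered here for illustration; the evaluation protocol used in the work described above may differ.

    import numpy as np

    def fixation_auc(sal, fixations, n_random=1000, seed=0):
        """AUC of saliency at fixated (row, col) points vs. random pixels."""
        rng = np.random.default_rng(seed)
        pos = np.array([sal[r, c] for r, c in fixations])
        neg = sal[rng.integers(0, sal.shape[0], n_random),
                  rng.integers(0, sal.shape[1], n_random)]
        # Mann-Whitney estimate of P(positive > negative), ties count half
        diff = pos[:, None] - neg[None, :]
        return (diff > 0).mean() + 0.5 * (diff == 0).mean()

    sal = np.random.rand(480, 640)               # stand-in saliency map
    fix = [(100, 200), (240, 320), (400, 500)]   # example fixation points
    print(f"AUC = {fixation_auc(sal, fix):.3f}")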