We consider learning permutations in the on-line setting. In this setting the algorithm's goal is to have small regret, i.e., the cumulative loss of the algorithm on any sequence of trials should not be much greater than the loss of the best permutation in hindsight. We present a new algorithm that maintains a doubly stochastic weight matrix representing its uncertainty about the best permutation and makes predictions by decomposing this weight matrix into a convex combination of permutations. After receiving feedback, the algorithm updates its weight matrix for the next trial. A new insight allows us to prove an optimal (up to small constant factors) bound on the regret of our algorithm despite the fact that there is no closed form for the re-balanced weight matrix. This regret bound is significantly better than the bounds for either Kalai and Vempala's more computationally efficient "Follow the Perturbed Leader" algorithm or the straightforward but expensive method that explicitly tracks each permutation.

Host: James Theiler
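
The abstract describes the algorithm only at a high level. As a rough illustration (a minimal sketch of one standard way to realize each step, not the speaker's actual implementation), the code below assumes a multiplicative update on per-entry losses, Sinkhorn iteration for the re-balancing (which, as noted above, has no closed form), and a Birkhoff-von Neumann-style decomposition computed with SciPy's linear_sum_assignment. The learning rate eta and the per-entry loss matrix are illustrative assumptions, not details from the talk.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def sinkhorn(W, iters=200):
        # Alternately normalize rows and columns. This converges toward
        # a doubly stochastic matrix, but the balanced matrix has no
        # closed form, so we iterate a fixed number of times.
        for _ in range(iters):
            W = W / W.sum(axis=1, keepdims=True)
            W = W / W.sum(axis=0, keepdims=True)
        return W

    def decompose(W, tol=1e-9):
        # Birkhoff-von Neumann-style decomposition: repeatedly peel off
        # a permutation supported on the positive entries of W, each
        # with the largest coefficient keeping the remainder nonnegative.
        W = W.copy()
        perms, coeffs = [], []
        while W.max() > tol:
            rows, cols = linear_sum_assignment(np.where(W > tol, -W, 1e9))
            if (W[rows, cols] <= tol).any():
                break  # only numerical residue is left
            alpha = W[rows, cols].min()
            perms.append(cols.copy())
            coeffs.append(alpha)
            W[rows, cols] -= alpha
        return perms, np.array(coeffs)

    def predict(W, rng):
        # Predict by sampling from the convex combination of permutations.
        perms, coeffs = decompose(W)
        return perms[rng.choice(len(perms), p=coeffs / coeffs.sum())]

    def update(W, loss, eta=0.5):
        # Multiplicative update on per-entry losses, then re-balance.
        return sinkhorn(W * np.exp(-eta * loss))

    # One trial: start from the uniform doubly stochastic matrix.
    rng = np.random.default_rng(0)
    n = 4
    W = np.full((n, n), 1.0 / n)
    loss = rng.random((n, n))   # hypothetical per-entry loss matrix
    print(predict(W, rng))      # permutation played this trial
    W = update(W, loss)         # weight matrix for the next trial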