Decision-making processes for power networks can be roughly categorized into four time-scales: real-time, short-term, mid-term, and long-term. Each category of problems comes with its own set of visible information and considerations; examples include day-ahead generator scheduling, months-ahead asset management, and years-ahead system development. I will open this talk with our formulation of the mid-term asset management task, crafted with feedback from industry experts. It is a chance-constrained stochastic program that proves highly challenging due to high-dimensional decision variables in large networks; non-convex and even non-analytic mathematical forms; interdependent sub-problems encapsulating the decision-making processes of shorter time-scales; and high uncertainty due to renewable energy sources.

I will then present our methodology for solving the resulting hierarchical formulation, which relies on distributed simulation-based optimization with efficient scenario generation. To tackle the computational burden of hierarchically simulating lower-level decision making, I introduce a novel concept: using machine learning to design 'proxies' that quickly approximate the outcomes of short-term decisions. These proxies are trained to predict, e.g., unit commitment and ACOPF solutions.

An additional, natural approach for solving such dynamic control problems is reinforcement learning (RL). To facilitate joint decision making among several stakeholders, I propose a new hierarchical RL model that serves as a proxy for real-time power grid reliability. The accompanying algorithm alternates between slow time-scale policy improvement and fast time-scale value function evaluation. Lastly, pursuing the goal of enabling analysis of such hierarchical RL algorithms, I will present our recent convergence-rate results for several well-known and commonly used RL algorithms; these are obtained using tailor-made stochastic approximation tools.
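To give a flavor of the two-timescale structure mentioned above, here is a minimal, self-contained sketch of GTD2, a standard member of the gradient-TD family of two-timescale algorithms. This is an illustrative toy, not the hierarchical algorithm from the talk: the 3-state Markov chain, rewards, and step sizes are all invented for the example. An auxiliary vector `w` is updated on a fast timescale (step size `beta`) while the value-function weights `theta` move on a slow timescale (step size `alpha`); with one-hot (tabular) features, the on-policy TD fixed point equals the true value function, so `theta` should approach `V`.

```python
import random

# Hypothetical 3-state ergodic Markov chain (illustrative only).
P = [[0.5, 0.5, 0.0],
     [0.1, 0.6, 0.3],
     [0.2, 0.3, 0.5]]
r = [1.0, 0.0, -1.0]   # reward collected on leaving each state
gamma = 0.5

# True values via Bellman iteration: V <- r + gamma * P V.
V = [0.0] * 3
for _ in range(200):
    V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(3))
         for s in range(3)]

# GTD2 with one-hot features phi(s) = e_s: two coupled iterates on
# separated timescales (beta >> alpha).
theta = [0.0] * 3   # slow timescale: value-function weights
w = [0.0] * 3       # fast timescale: auxiliary estimate of expected TD error
alpha, beta = 0.01, 0.1
random.seed(0)
s = 0
for _ in range(200_000):
    s_next = random.choices(range(3), weights=P[s])[0]
    delta = r[s] + gamma * theta[s_next] - theta[s]   # TD error
    # Slow update: theta += alpha * (phi - gamma * phi') * (phi . w)
    theta[s] += alpha * w[s]
    theta[s_next] -= alpha * gamma * w[s]
    # Fast update: w += beta * (delta - phi . w) * phi
    w[s] += beta * (delta - w[s])
    s = s_next

print("true V  :", [round(v, 2) for v in V])
print("learned :", [round(t, 2) for t in theta])
```

The point of the construction is visible in the step sizes: because `beta` is an order of magnitude larger than `alpha`, `w` equilibrates quickly relative to `theta`, which is exactly the separation that two-timescale stochastic-approximation analyses exploit.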
Our results include: i) the first reported concentration bound for an unaltered online Temporal Difference (TD) algorithm with function approximation; and ii) the first reported concentration bound for two-timescale stochastic approximation algorithms. In particular, we apply the latter result to the "gradient TD" family of two-timescale RL algorithms.

Host: Michael Chertkov