Dynamic Charging and Discharging Scheduling of Electric Vehicles Based on Deep Reinforcement Learning Algorithm

Yang Jingru (杨静如); Ma Yue (马悦); Yang Ziwen (杨子文); Zhou Liping (周利平); Geng Na (耿娜)

doi:10.3969/j.issn.1007-7375.250130

Abstract

With China's energy structure transformation accelerating, new energy vehicles have been widely adopted. By the end of 2024, the number of new energy vehicles in China had exceeded 30 million. However, the charging load of electric vehicles (EVs) exhibits a double-peak characteristic, which further increases uncertainty on both the generation and load sides of the power system and poses new challenges to the stable operation of the power grid. Vehicle-to-grid (V2G) technology treats EVs as mobile energy storage units that provide critical flexibility resources to the grid through bidirectional charging and discharging, making them an important means of balancing power supply and demand. Aggregators serve as the hub connecting EVs and the grid. When scheduling charging and discharging across different EVs, they face two main difficulties: highly random EV arrival times and energy demands, and base electricity prices that vary under peak-valley time-of-use tariffs. Maximizing aggregator revenue while meeting user demand is therefore a key issue for the sustainable operation of the V2G model. To address this problem, this paper proposes an EV charging and discharging scheduling algorithm based on proximal policy optimization (PPO), a deep reinforcement learning method. An event-driven reinforcement learning framework is designed for the dynamic scheduling environment, and a tailored policy network architecture is developed within the actor-critic framework. Numerical experiments show that the proposed algorithm can effectively solve charging and discharging scheduling problems for charging stations of different scales and outperforms several benchmark algorithms across scenarios, improving aggregator revenue.
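For readers unfamiliar with PPO, the clipped surrogate objective at its core can be sketched as follows. This is the standard PPO formulation, not the paper's own implementation, which the abstract does not give. The function name `ppo_clipped_objective`, the clip range of 0.2, and the plain-list inputs are illustrative assumptions.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective of standard PPO (to be maximized).

    All names and the default clip_eps are illustrative assumptions,
    not values taken from the paper.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the old policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

Taking the minimum of the two terms removes the incentive to move the policy far from the one that collected the data, which is what keeps each update small.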
