Industrial Engineering Journal (工业工程) ›› 2024, Vol. 27 ›› Issue (4): 132-140, 149. doi: 10.3969/j.issn.1007-7375.230095

• System Modeling and Optimization •

Charging and Discharging Coordinated Routing for Autonomous Electric Taxis Considering Future Operating Value

ZENG Weiliang1, HAN Yu1, FU Hui2

  1. School of Automation, Guangdong University of Technology;
  2. School of Electromechanical Engineering, Guangdong University of Technology, Guangzhou 510006, China
  • Received: 2023-05-11 Published: 2024-09-07
  • About the author: ZENG Weiliang (born 1986), from Guangdong Province, associate professor, Ph.D.; main research interest: intelligent transportation. Email: weiliangzeng49@163.com
  • Funding:
    National Natural Science Foundation of China (62273102); Guangdong Basic and Applied Basic Research Foundation (2024A1515010629)

Abstract: Existing taxi scheduling models typically optimize only real-time cost and ignore the impact of current route planning on future operating value, which hinders continuous scheduling in an autonomous driving environment. To address this, this paper proposes a route planning model focused on long-term benefit, which uses reinforcement learning to incorporate the estimated future operating value into the real-time scheduling problem. Specifically, a neural network is first used to fit the state-value function over the different spatiotemporal states of the vehicles, and a double neural network with experience replay is then used to accelerate the convergence of the algorithm. Simulation experiments on the Shenzhen road network show that the proposed model can accurately schedule the fleet in advance, serving more passengers and achieving greater operating profit. In addition, the model exploits the peak and off-peak characteristics of time-of-use electricity pricing together with vehicle-to-grid (V2G) technology for charging and discharging, thereby reducing the fleet's energy cost. Compared with other scheduling models, the proposed model in long-term operation increases the passenger matching service rate by 4% and total profit by 25%, cuts energy cost by 50%, and reduces passenger waiting time by 20%.

Key words: future operating value, reinforcement learning, time-of-use electricity pricing, vehicle-to-grid technology, state-value function
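
The solution idea summarized in the abstract (fit a state-value function, then stabilize learning with a second network and an experience pool) can be illustrated with a minimal sketch. This is not the authors' implementation: a linear model over one-hot features stands in for the neural network, the five-state chain and its per-step rewards are hypothetical, and the names `train`, `features`, `REWARDS`, and `GAMMA` are assumptions made for illustration only.

```python
import random

import numpy as np

GAMMA = 0.9                          # discount factor (assumed value)
REWARDS = [1.0, 0.0, 2.0, 0.0, 3.0]  # hypothetical profit earned on leaving each state
N_STATES = len(REWARDS)


def features(state):
    """One-hot encoding of a toy spatiotemporal state."""
    x = np.zeros(N_STATES)
    x[state] = 1.0
    return x


def train(steps=5000, batch_size=16, lr=0.1, sync_every=50, capacity=1000, seed=0):
    """TD(0) state-value learning with a replay buffer and a target copy."""
    rng = random.Random(seed)
    online = np.zeros(N_STATES)      # weights updated at every step
    target = online.copy()           # frozen copy used for bootstrapping
    replay = []                      # experience pool of past transitions
    state = 0
    for step in range(1, steps + 1):
        # Toy dynamics: always move one state forward; leaving the last state ends the episode.
        next_state = state + 1
        done = next_state == N_STATES
        replay.append((state, REWARDS[state], next_state, done))
        if len(replay) > capacity:
            replay.pop(0)
        state = 0 if done else next_state

        # Sample stored transitions instead of learning only from the latest one.
        batch = rng.sample(replay, min(batch_size, len(replay)))
        for s, r, s2, d in batch:
            bootstrap = 0.0 if d else GAMMA * (target @ features(s2))
            td_error = r + bootstrap - online @ features(s)
            online += lr * td_error * features(s)

        # Periodically copy the online weights into the target copy.
        if step % sync_every == 0:
            target = online.copy()
    return online


if __name__ == "__main__":
    print(train())
```

Bootstrapping from the frozen `target` copy keeps the regression target fixed between synchronizations, while sampling from `replay` breaks the correlation between consecutive transitions; both are common ways to stabilize value learning and are used here only to mirror the ideas named in the abstract.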

CLC number: