Industrial Engineering Journal ›› 2021, Vol. 24 ›› Issue (4): 93-99. doi: 10.3969/j.issn.1007-7375.2021.04.011

• Special Topic •

Optimizing the Location of Automated Power Warehouse Based on Multi-agent Reinforcement Learning

WANG Tiezheng, HU Ya'nan, PAN Kun, YU Xiao

  1. Material Supply Branch, Beijing Electric Power Corporation, Beijing 100053, China
  • Received: 2020-08-10 Published: 2021-09-02
  • About the author: WANG Tiezheng (1988-), male, born in Beijing, senior engineer, master's degree; main research interest: electric power materials management

Abstract: Optimizing cargo locations in an automated warehouse is one of the important ways to improve warehouse efficiency. To address cargo location optimization in power warehouses, a method based on multi-agent reinforcement learning is adopted to improve the optimization results. First, the deficiencies of the DDPG and MADDPG algorithms are analyzed; on this basis, an improved algorithm, ECS-MADDPG, and its model are proposed. This algorithm considers both the immediate reward at the current time point and future rewards. Finally, historical inbound and outbound data of electric power materials are used to train the cargo location optimization model with the reinforcement learning algorithm. Experiments show that ECS-MADDPG achieves higher stability and higher returns than algorithms such as MADDPG and DDPG.

Key words: power warehouse, cargo location optimization, multi-agent reinforcement learning

CLC number: