Industrial Engineering Journal ›› 2024, Vol. 27 ›› Issue (1): 78-85, 103. doi: 10.3969/j.issn.1007-7375.230101

• Systems Modeling and Optimization •

  • Corresponding author: WU Zhaoyun (b. 1981), male, from Liaoning Province; professor, Ph.D. His research interests include manufacturing informatization and intelligent manufacturing. E-mail: wuzhaoyun@haut.edu.cn
  • About the first author: ZHANG Zhongwei (b. 1987), male, from Henan Province; associate professor, Ph.D. His research interests include low-carbon manufacturing and intelligent manufacturing.
  • Funding:
    Supported by the National Natural Science Foundation of China (U1704156), the Science and Technology Research Program of Henan Province (212102210357), the Key Scientific Research Project of Colleges and Universities in Henan Province (23A460003), and the Open Fund of the Henan Key Laboratory of Superhard Abrasives and Grinding Equipment (JDKFJJ2022012)

Energy-efficient Flexible Job-shop Scheduling Based on Deep Reinforcement Learning

ZHANG Zhongwei, LI Yi, GAO Zengen, WU Zhaoyun   

  1. Henan Key Laboratory of Superhard Abrasives and Grinding Equipment, School of Mechanical & Electrical Engineering, Henan University of Technology, Zhengzhou 450001, China
  • Received:2023-05-18 Published:2024-03-05


Abstract: Current research on energy-efficient flexible job-shop scheduling problems (EFJSPs) cannot make full use of historical production data and adapts poorly to complex, dynamic, and variable job-shop production environments. In view of this, deep reinforcement learning is introduced to solve EFJSPs, using the representative deep Q-network (DQN) method. First, the EFJSP is transformed into the Markov decision process that underlies reinforcement learning. State values characterizing the job-shop production state are then extracted as inputs to a neural network; by fitting the state value function, the network outputs compound scheduling action rules that select both jobs and processing machines. The scheduling action rules and the reward function are used jointly to optimize total production energy consumption. Finally, the solutions of the proposed method are compared with those of typical intelligent optimization algorithms, namely the non-dominated sorting genetic algorithm, a hyper-heuristic genetic algorithm, and a multi-objective wolf pack algorithm, on three cases of different scales. The results demonstrate the strong search capability of the DQN algorithm, and the distribution of the optimal solutions is consistent with the energy-consumption objective of the proposed EFJSP model, verifying the effectiveness of the DQN method.
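As a rough illustration of the pipeline the abstract describes (state features in, compound dispatching rules out, energy-driven rewards), the sketch below trains a Q-learning agent on a tiny invented flexible job-shop instance. A linear Q-function stands in for the paper's deep Q-network, and the two compound rules, the state features, and all job/machine data are hypothetical simplifications rather than the paper's actual design.

```python
import random

# Toy EFJSP instance (all data invented): 3 jobs x 2 operations; each
# operation can run on either of 2 machines, with a machine-dependent
# (processing_time, power_kW) pair. Machine 0 is fast but power-hungry.
PROC = {
    (0, 0): {0: (4, 3.0), 1: (6, 1.5)},
    (0, 1): {0: (3, 3.0), 1: (5, 1.5)},
    (1, 0): {0: (2, 3.0), 1: (3, 1.5)},
    (1, 1): {0: (4, 3.0), 1: (6, 1.5)},
    (2, 0): {0: (5, 3.0), 1: (7, 1.5)},
    (2, 1): {0: (2, 3.0), 1: (4, 1.5)},
}
N_JOBS, N_OPS = 3, 2

def reset():
    # next_op[j]: index of the next unscheduled operation of job j
    return {"next_op": [0] * N_JOBS, "mach_free": [0.0, 0.0], "energy": 0.0}

def ready_jobs(state):
    return [j for j in range(N_JOBS) if state["next_op"][j] < N_OPS]

def features(state):
    # State values characterizing the production state:
    # completion ratio and machine-load imbalance, plus a bias term.
    done = sum(state["next_op"]) / (N_JOBS * N_OPS)
    hi, lo = max(state["mach_free"]), min(state["mach_free"])
    return [1.0, done, (hi - lo) / (1.0 + hi)]

# Compound dispatching rules: each action picks a job AND a machine.
def rule_spt_min_power(state):  # shortest-processing-time job, lowest-power machine
    j = min(ready_jobs(state),
            key=lambda j: min(t for t, _ in PROC[(j, state["next_op"][j])].values()))
    alts = PROC[(j, state["next_op"][j])]
    return j, min(alts, key=lambda m: alts[m][1])

def rule_mwr_fastest(state):    # most-work-remaining job, fastest machine
    j = max(ready_jobs(state), key=lambda j: N_OPS - state["next_op"][j])
    alts = PROC[(j, state["next_op"][j])]
    return j, min(alts, key=lambda m: alts[m][0])

RULES = [rule_spt_min_power, rule_mwr_fastest]

def step(state, action):
    j, m = RULES[action](state)
    t, p = PROC[(j, state["next_op"][j])][m]
    state["next_op"][j] += 1
    state["mach_free"][m] += t
    state["energy"] += t * p
    return -t * p, not ready_jobs(state)  # reward: negative energy increment

# Linear Q(s, a) = W[a] . features(s) stands in for the deep Q-network.
W = [[0.0] * 3 for _ in RULES]

def q(state, a):
    return sum(w * f for w, f in zip(W[a], features(state)))

def greedy(state):
    return max(range(len(RULES)), key=lambda a: q(state, a))

def train(episodes=200, eps=0.2, alpha=0.01, gamma=0.95):
    for _ in range(episodes):
        state, done = reset(), False
        while not done:
            a = random.randrange(len(RULES)) if random.random() < eps else greedy(state)
            f = features(state)
            q_old = sum(w * x for w, x in zip(W[a], f))
            reward, done = step(state, a)
            target = reward + (0.0 if done else
                               gamma * max(q(state, b) for b in range(len(RULES))))
            for i in range(len(f)):  # TD(0) update of the linear weights
                W[a][i] += alpha * (target - q_old) * f[i]

random.seed(1)
train()
state, done = reset(), False
while not done:
    _, done = step(state, greedy(state))
print("greedy schedule energy:", state["energy"])
```

In the paper's setting, the linear weights would be replaced by a deep network fitting the state value function, and the action set would contain a richer family of compound rules; the reward shaping above (negative energy increment per dispatch) mirrors the idea of co-optimizing energy consumption through the action rules and the reward function.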

Key words: energy-efficient flexible job-shop scheduling, deep reinforcement learning, deep Q-network, Markov decision process
