Scheduling Optimization for Mixed-flow Assembly Lines of Machine Tools Based on Deep Multi-agent Reinforcement Learning

    Abstract: To ensure the on-time delivery of machine tools produced in mixed-flow assembly shops, a scheduling optimization method based on improved deep multi-agent reinforcement learning is proposed to address the low solution quality and slow training speed encountered when solving the minimum-delay production scheduling optimization model. A scheduling optimization model for mixed-flow assembly lines is constructed with the objective of minimizing delay time, and double deep Q network (DDQN) agents with decentralized execution are applied to learn the relationship between production information and scheduling objectives. The framework adopts a centralized-training, decentralized-execution strategy and uses parameter sharing to handle the non-stationarity problem in multi-agent reinforcement learning. On this basis, a recurrent neural network is used to manage variable-length state and action representations, enabling the agents to handle problems of arbitrary scale. A global/local reward function is also introduced to alleviate reward sparsity during training. The optimal parameter combination is identified through ablation experiments. Numerical results show that, compared with standard benchmarks, the proposed algorithm reduces the average total number of tardy workpieces by 24.1% to 32.3% and improves the training speed by 8.3%.
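The core learning update mentioned in the abstract is the double deep Q network (DDQN) target, in which the online network selects the next action and the target network evaluates it, reducing the overestimation bias of standard Q-learning. The sketch below illustrates only this generic DDQN target computation with hypothetical variable names (`q_online_next`, `q_target_next`); it is not the authors' implementation and omits the multi-agent, parameter-sharing, and recurrent components described above.

```python
import numpy as np

def ddqn_target(q_online_next, q_target_next, reward, gamma=0.99, done=False):
    """Double-DQN bootstrap target for one transition.

    q_online_next: Q-values of the ONLINE network at the next state
                   (used only to select the greedy action).
    q_target_next: Q-values of the TARGET network at the next state
                   (used only to evaluate the selected action).
    """
    if done:
        # Terminal transition: no bootstrapping beyond the final reward.
        return reward
    a_star = int(np.argmax(q_online_next))          # action selection (online net)
    return reward + gamma * q_target_next[a_star]   # action evaluation (target net)
```

Decoupling selection from evaluation is what distinguishes DDQN from vanilla DQN, where the same (target) network both selects and evaluates the next action and therefore tends to overestimate Q-values.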

       
