Online Scheduling Method for Hybrid Flow Shop with Batch Machines Based on Proximal Policy Optimization Algorithm

    • Abstract: Batch processing machines enable continuous, overlapping operations, which helps shorten production cycles, reduce unnecessary waiting time, and increase production capacity. When dynamic shop-floor events occur, however, the choice of which job type a batch processing machine processes inevitably changes the completion times of individual jobs. The central question is therefore how to adaptively select a suitable job type for the batch processing machine, based on real-time production features of the shop floor, so as to minimize the total tardiness cost of all jobs. This paper studies a hybrid flow shop scheduling problem with batch processing machines and models it as a Markov decision process. Multiple real-time job-resource features are designed that combine job processing information with shop resource information, and job selection rules and batch selection rules for the batch processing machines are formulated. At each decision point, the agent uses these real-time features to decide, through composite scheduling rules, which jobs each machine processes and which job type is batch processed. The agent's reward function is built on job tardiness cost, and the agent's network is trained with the proximal policy optimization (PPO) algorithm. Numerical experiments were conducted on a large number of instances with different production configurations. The results confirm the superiority and generality of the proposed algorithm compared with heuristic methods.
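      To make the tardiness-based reward and the PPO training signal concrete, the following is a minimal sketch and not the paper's implementation. The `Job` fields, the per-unit cost weight, and the reward shape (negative increase in tardiness cost between decision points) are assumptions made for illustration, since the abstract does not give the exact reward. Only the clipped surrogate term is the standard PPO objective.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # All fields are hypothetical names chosen for this sketch.
    due_date: float                   # d_j
    completion_time: float            # C_j (actual or currently estimated)
    unit_tardiness_cost: float = 1.0  # w_j, cost per time unit of lateness


def tardiness_cost(jobs):
    """Total weighted tardiness: sum over jobs of w_j * max(0, C_j - d_j)."""
    return sum(
        j.unit_tardiness_cost * max(0.0, j.completion_time - j.due_date)
        for j in jobs
    )


def step_reward(cost_before, cost_after):
    """Assumed reward shape: penalize the increase in tardiness cost
    between two consecutive decision points."""
    return -(cost_after - cost_before)


def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for a single sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)
```

      Under this assumed shape, the per-step rewards of an episode sum to the negative of the total increase in tardiness cost, so maximizing return lines up with the minimization target stated in the abstract.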

       
