Industrial Engineering Journal ›› 2024, Vol. 27 ›› Issue (3): 12-21,30. doi: 10.3969/j.issn.1007-7375.240100

• Wafer Manufacturing Process Optimization and Testing •

Real-time Scheduling of Wafer Photolithography Area Based on Reinforcement Learning with Gated Recurrent Unit

WU Lihui1, SHI Jinming1, JIN Keshan1, ZHANG Jie2

  1. School of Mechanical Engineering, Shanghai Institute of Technology, Shanghai 201400, China;
  2. Institute of Artificial Intelligence, Donghua University, Shanghai 201620, China
  • Received: 2024-03-19 Published: 2024-07-12
  • About the author: WU Lihui (born 1981), male, from Hunan Province, associate professor, Ph.D.; main research interests: intelligent manufacturing and scheduling of complex manufacturing systems
  • Funding:
    National Key Research and Development Program of China (2022YFB3305003); Shanghai Institute of Technology Talent Introduction Research Start-up Project (YJ2022-33)

Abstract: To solve the scheduling problem of the wafer photolithography area, which is dynamic, real-time, multi-constraint, and multi-objective, a real-time scheduling method based on reinforcement learning with a gated recurrent unit (GRU) is proposed. A GRU is introduced to learn the temporal information in the historical scheduling decisions and states of the photolithography area, providing auxiliary decision information to the double deep reinforcement learning (DDRL) model. The input state space and output action set of the DDRL model are designed, and a multi-objective reward function targeting the minimum makespan (maximum wafer completion time) and the on-time delivery rate of wafers is designed so that the agent optimizes its scheduling output. Constraint-handling rules for the equipment-specific constraints and the mask constraints are designed and combined with the scheduling method, making the resulting schedules more practical to implement. In a real case from a wafer manufacturing enterprise, the method is compared with conventional double deep reinforcement learning and with heuristic rules for the photolithography area; it performs best in every comparison, which demonstrates its effectiveness for this problem.

Key words: wafer fabrication system, scheduling of photolithography area, deep reinforcement learning, gated recurrent unit (GRU), multi-objective
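
The abstract describes two mechanisms that a short sketch may make more concrete: a GRU that condenses the history of scheduling states and decisions into a hidden vector used as auxiliary input for the agent, and a reward that combines makespan and on-time delivery. The Python below is only an illustration under stated assumptions, not the authors' implementation. The feature layout, the hidden size, the random untrained weights, the omission of bias terms, the weighted-sum form of the reward, and the weights `w_makespan` and `w_delivery` are all hypothetical, since the abstract does not specify them.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_params(input_size, hidden_size, seed=0):
    """Random, untrained GRU weights (hypothetical; biases omitted for brevity)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    # Order: Wz, Uz, Wr, Ur, Wh, Uh
    return [mat(hidden_size, input_size if i % 2 == 0 else hidden_size)
            for i in range(6)]


def gru_step(params, x, h):
    """One standard GRU update; returns the new hidden state."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h))]  # update gate
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h))]  # reset gate
    rh = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, rh))]
    # Convex combination of the previous state and the candidate state.
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def encode_history(params, history, hidden_size):
    """Run the GRU over past (state, decision) feature vectors, oldest first."""
    h = [0.0] * hidden_size
    for x in history:
        h = gru_step(params, x, h)
    return h


def reward(makespan_before, makespan_after, on_time, completed,
           w_makespan=0.5, w_delivery=0.5):
    """Assumed weighted-sum reward: relative makespan reduction plus on-time rate."""
    makespan_term = ((makespan_before - makespan_after) / makespan_before
                     if makespan_before > 0 else 0.0)
    delivery_term = on_time / completed if completed > 0 else 0.0
    return w_makespan * makespan_term + w_delivery * delivery_term


# Hypothetical usage: three past steps, each with 3 made-up features
# (e.g. queue length, machine utilisation, last action index, all scaled to [0, 1]).
params = init_params(input_size=3, hidden_size=4, seed=42)
history = [[0.2, 0.7, 0.0], [0.4, 0.6, 0.5], [0.1, 0.9, 1.0]]
h = encode_history(params, history, hidden_size=4)

current_state = [0.3, 0.8, 0.5]
agent_input = current_state + h  # current state plus GRU summary of the history
```

In this sketch the agent's input is simply the current state concatenated with the GRU summary of the history; how the paper actually feeds the GRU output into its double deep reinforcement learning model is not stated in the abstract.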
