Optimization of Condition-Based Maintenance Strategies via Multi-Agent Reinforcement Learning

Abstract: Multi-component systems are widely used in the aerospace, energy, and manufacturing industries, where condition-based maintenance (CBM) faces the challenges of multi-objective trade-offs and coordination among agents. To address these challenges, this paper proposes a reinforcement learning method for multi-agent environments to optimize condition-based maintenance strategies. The approach treats system maintenance cost, component health status, overall reliability, and cooperative behavior as optimization objectives, and designs a fine-grained, multi-dimensional reward mechanism to guide agent policy learning. A shared policy network and an ε-greedy exploration mechanism are introduced to enhance the stability of learning and the diversity of policy exploration. On this basis, a multi-agent double deep Q-network (MA-DDQN) framework is constructed to enable information sharing and collaborative policy updating among agents. To validate the proposed method, simulation experiments are conducted in a multi-component system environment whose degradation is modeled by a homogeneous gamma process, and the results are compared with those of a traditional rule-based strategy and independent DQN, with evaluation focused on system return, computational overhead, and agent policy behavior. The experimental results show that the proposed method improves the final cumulative reward by about 10.6% over independent DQN and reduces overall training time by about 66%, demonstrating good multi-objective adaptability and strong potential for engineering deployment.
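The three core mechanisms named in the abstract can be sketched briefly. This is a minimal illustration, not the paper's implementation: all function names, parameter values, and array shapes are assumptions introduced here; the homogeneous gamma process draws nonnegative increments so degradation only accumulates, ε-greedy trades exploration against exploitation, and the double-DQN target lets the online network select the next action while the target network evaluates it.

```python
import numpy as np

def gamma_degradation_step(level, shape_rate, scale, dt, rng):
    """One step of a homogeneous gamma degradation process.
    The increment is Gamma(shape_rate * dt, scale), hence nonnegative:
    component damage can only accumulate over time."""
    return level + rng.gamma(shape_rate * dt, scale)

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action (explore),
    otherwise pick the greedy action (exploit)."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Double-DQN update targets: the online network selects the next action
    (argmax), the target network evaluates it, which reduces the Q-value
    overestimation of vanilla DQN. Terminal transitions bootstrap nothing."""
    best = np.argmax(next_q_online, axis=1)                     # action selection
    next_v = next_q_target[np.arange(len(best)), best]          # action evaluation
    return rewards + gamma * (1.0 - dones) * next_v
```

In the multi-agent setting described above, all agents would share one set of network parameters, so each function is called per agent on that shared network's outputs.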
