GUO Hongjiang, Zhu Zhengze, Gong Jiayuan, Yin Zheng, Lyu Chengzhi. A Two-Stage GAIL-PPO Optimization Framework for High-Speed Vehicle Lane-changing Decision-Making[J]. Industrial Engineering Journal. DOI: 10.3969/j.issn.1007-7375.250150

    A Two-Stage GAIL-PPO Optimization Framework for High-Speed Vehicle Lane-changing Decision-Making

    • This paper addresses two sets of problems in highway lane-changing decisions for intelligent vehicles: imitation learning depends heavily on data quality and generalizes poorly, while reinforcement learning trains inefficiently and struggles to balance multiple objectives. It proposes a two-stage collaborative optimization framework based on generative adversarial imitation learning (GAIL) and proximal policy optimization (PPO). In the first stage, an adversarial mechanism based on the Wasserstein distance with a gradient penalty is introduced into the GAIL discriminator to improve training stability, and PPO is integrated into the GAIL generator update, using an Actor-Critic architecture to make policy learning more robust. In the second stage, PPO fine-tunes the pre-trained policy with a multi-objective reward function that balances traffic efficiency against safety constraints. The policy thus progresses from expert imitation to optimization under complex scenario constraints. Experiments in the highway-env simulation environment show that, compared with the PPO baseline and DQN, the proposed approach improves average travel speed by approximately 4% and 8%, respectively, while reducing unnecessary lane changes. Longitudinal acceleration time-series analysis and robustness tests further support the method's stability and generalization under varying driving durations, traffic flow densities, lane numbers, and vehicle dynamics constraints.
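
    Both stages described above rely on PPO updates, so a minimal sketch of the standard PPO clipped surrogate objective may help make the mechanism concrete. This is not the authors' implementation: the function name, the clip range of 0.2, and the plain-Python batch format are assumptions chosen for illustration. In the first stage the advantages would presumably be computed from discriminator-derived rewards, and in the second stage from the multi-objective environment reward; the abstract does not specify either computation.

```python
import math


def ppo_clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    logp_new / logp_old: log-probabilities of the taken actions under the
    current and the data-collecting policy. clip_eps=0.2 is an assumed,
    commonly used default, not a value taken from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio between the new and old policy for this action.
        ratio = math.exp(lp_new - lp_old)
        # Restrict the ratio to [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic bound: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

    With a positive advantage, the clipped term caps the gain once the ratio exceeds 1 + clip_eps, which limits how far a single update can move the policy away from the one that collected the data.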