
Research on T-DQN Intelligent Obstacle Avoidance Algorithm of Unmanned Surface Vehicle

Zhou Zhi-Guo, Yu Si-Yu, Yu Jia-Bao, Duan Jun-Wei, Chen Long, Chen Jun-Long

Citation: Zhou Zhi-Guo, Yu Si-Yu, Yu Jia-Bao, Duan Jun-Wei, Chen Long, Chen Jun-Long. Research on T-DQN intelligent obstacle avoidance algorithm of unmanned surface vehicle. Acta Automatica Sinica, 2021, x(x): 1001–1011. doi: 10.16383/j.aas.c210080


                    doi: 10.16383/j.aas.c210080

                    Research on T-DQN Intelligent Obstacle Avoidance Algorithm of Unmanned Surface Vehicle

Funds: Supported by the Equipment Pre-Research Field Fund of the 13th Five-Year Plan (61403120109) and the Fundamental Research Funds for the Central Universities of Jinan University (21619412)
                      Author Bio:

ZHOU Zhi-Guo Associate Professor at the School of Information and Electronics, Beijing Institute of Technology. His research interests include intelligent unmanned systems, information perception and navigation, and machine learning. Corresponding author of this paper. E-mail: zhiguozhou@bit.edu.cn

YU Si-Yu Postgraduate student at the School of Information and Electronics, Beijing Institute of Technology. Her research interest is information perception and navigation of intelligent unmanned systems. E-mail: yusiyu3408@163.com

YU Jia-Bao Postgraduate student at the School of Information and Electronics, Beijing Institute of Technology. Her research interest is information perception and navigation of intelligent unmanned vehicles. E-mail: 3120200722@bit.edu.cn

DUAN Jun-Wei Assistant Professor at the College of Information Science and Technology, Jinan University, Guangzhou, China. His research interests include image fusion, machine learning, and computational intelligence. E-mail: jwduan@jnu.edu.cn

CHEN Long Associate Professor at the Department of Computer and Information Science, University of Macau. His research interests include computational intelligence, Bayesian methods, and machine learning. E-mail: longchen@um.edu.mo

C. L. PHILIP CHEN Professor and Dean of the College of Computer Science and Engineering, South China University of Technology. His research interests include cybernetics, intelligent systems, and computational intelligence. E-mail: philipchen@scut.edu.cn

Abstract: As an unmanned system with broad application prospects, the unmanned surface vehicle (USV) depends critically on its autonomous decision-making capability. Because the water-surface operating environment is relatively open, traditional obstacle-avoidance algorithms struggle to plan optimal routes autonomously under quantified rules, while general reinforcement learning methods converge slowly in large, complex environments. To address these problems, this paper proposes a threshold-based deep Q-network (T-DQN) obstacle avoidance algorithm, which adds a long short-term memory (LSTM) network to the deep Q-network (DQN) to retain training information and sets a threshold on the experience replay pool to accelerate convergence. Simulation experiments on grid environments of different scales show that the proposed T-DQN algorithm converges quickly to the optimal path, reducing the overall convergence steps by 69.1% and 24.8% compared with Q-Learning and DQN, respectively; the introduced threshold screening mechanism alone reduces the overall convergence steps by 41.1%. Obstacle-avoidance tasks in complex map scenes were verified on the Unity 3D reinforcement learning simulation platform, and the results show that the algorithm achieves fine-grained obstacle avoidance and intelligent, safe navigation for USVs.
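One of the two mechanisms the abstract describes can be sketched compactly: an experience replay pool that withholds training batches until its size passes a threshold, so that early, sparse experience cannot dominate the first gradient updates. The sketch below is illustrative only (not the authors' code); the class name, capacity, and threshold values are assumptions, and in the full T-DQN the sampled batches would feed an LSTM-augmented Q-network as shown in Figs. 2 and 3.

```python
import random
from collections import deque

class ThresholdReplayPool:
    """Replay pool with a warm-up threshold: sampling returns None until
    enough transitions have been stored, gating training until the pool
    passes the threshold (the thresholding idea described in the abstract)."""

    def __init__(self, capacity=10_000, threshold=500):
        self.buffer = deque(maxlen=capacity)  # oldest transitions drop out first
        self.threshold = threshold

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Skip training until the pool is past the threshold and large
        # enough to fill a batch.
        if len(self.buffer) < max(self.threshold, batch_size):
            return None
        return random.sample(self.buffer, batch_size)
```

In a full training loop, each step would call `sample()` and run a gradient update on the recurrent Q-network only when a batch is actually returned.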
Fig. 1 T-DQN algorithm architecture

Fig. 2 LSTM network structure

Fig. 3 Network layer structure with the LSTM added

Fig. 4 USV path planning flow chart

Fig. 5 Actual parameters of the USV

Fig. 6 Path results after T-DQN training on a 10×10 grid map

Fig. 7 Path results after T-DQN training on a 20×20 grid map

Fig. 8 Path results after T-DQN training on a 30×30 grid map

Fig. 9 Comparison of the average return values of the four algorithms on 10×10, 20×20, and 30×30 grid maps

Fig. 10 Spaitlab-Unity simulation experiment platform

Fig. 11 Global path planning simulation trajectory of the USV

Fig. 12 Global path in the rasterized water space

Fig. 13 Comparison of global and local simulation trajectories of the USV

Table 1 Convergence steps comparison of Q-Learning, DQN, LSTM+DQN and T-DQN

Algorithm     10×10 map    20×20 map    30×30 map
Q-Learning    888          >2000        >2000
DQN           317          600          >2000
LSTM+DQN      750          705          850
T-DQN         400          442          517
Publication history
• Received: January 25, 2021
• Revised: June 25, 2021
• Published online: July 31, 2021
