
A Gait Control Method for Biped Robot on Slope Based on Deep Reinforcement Learning

Wu Xiao-Guang, Liu Shao-Wei, Yang Lei, Deng Wen-Qiang, Jia Zhe-Heng

Citation: Wu Xiao-Guang, Liu Shao-Wei, Yang Lei, Deng Wen-Qiang, Jia Zhe-Heng. A Gait Control Method for Biped Robot on Slope Based on Deep Reinforcement Learning. Acta Automatica Sinica, 2020, 46(x): 1−12. doi: 10.16383/j.aas.c190547


                    doi: 10.16383/j.aas.c190547
Funds: Supported by National Natural Science Foundation of China (61503325) and China Postdoctoral Science Foundation (2015M581316)
Article information
  Author biographies:

  Wu Xiao-Guang: Associate professor at Yanshan University. He received his Ph.D. degree from Harbin Institute of Technology in 2012. His research interests include biped robots and 3D virtual vision reconstruction. E-mail: wuxiaoguang@ysu.edu.cn

  Liu Shao-Wei: Master student at the School of Electrical Engineering, Yanshan University. His research interests include deep reinforcement learning and biped robots. Corresponding author of this paper. E-mail: lwsalpha@outlook.com

  Yang Lei: Master student at the School of Electrical Engineering, Yanshan University. His main research interest is stability analysis of biped robots. E-mail: 15733513567@163.com

  Deng Wen-Qiang: Master student at the School of Electrical Engineering, Yanshan University. His research interests include generative adversarial networks and human motion coordination analysis. E-mail: dengwq24@163.com

  Jia Zhe-Heng: Master student at the School of Electrical Engineering, Yanshan University. His research interests include human pose estimation, object recognition, and deep learning. E-mail: jiazheheng@163.com

Abstract: To improve the slope-walking stability of a quasi-passive biped robot, this paper proposes a gait control method for quasi-passive biped robots based on deep reinforcement learning. By analyzing the hybrid dynamics model and the stable walking process of the quasi-passive biped robot, the state space, action space, episode procedure, and reward function are designed. After continued learning with Ape-X DPG, an algorithm improved from DDPG, the quasi-passive biped robot achieves stable walking over a wide range of slope angles. Simulation experiments show that Ape-X DPG outperforms PER-based DDPG in both learning ability and convergence speed. Moreover, compared with energy shaping control, the gait of the quasi-passive biped robot under Ape-X DPG converges faster and has a larger basin of attraction, demonstrating that Ape-X DPG can effectively improve the walking stability of quasi-passive biped robots.
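As a reading aid only, the sketch below illustrates the actor/learner data flow that the abstract describes for Ape-X DPG: several interaction units explore with their own noise, push transitions into one shared prioritized replay buffer, and a single learner samples by priority and refreshes priorities from TD errors. The placeholder dynamics, policy, noise scales, and the fake TD errors are assumptions made for this example, not the authors' implementation.

```python
# Minimal single-process sketch of the Ape-X DPG data flow (illustrative, assumed).
import numpy as np

class PrioritizedReplay:
    """Shared replay buffer; transitions are sampled in proportion to priority."""
    def __init__(self, capacity=50000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prio = [], []

    def add(self, transition, priority=1.0):
        if len(self.data) >= self.capacity:          # drop the oldest transition
            self.data.pop(0)
            self.prio.pop(0)
        self.data.append(transition)
        self.prio.append((abs(priority) + 1e-6) ** self.alpha)

    def sample(self, batch_size):
        p = np.asarray(self.prio)
        p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        return idx, [self.data[i] for i in idx]

    def update(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.prio[i] = (abs(e) + 1e-6) ** self.alpha

def actor_step(policy, state, noise_fn):
    """One interaction step of a single unit: noisy action, placeholder dynamics and reward."""
    action = noise_fn(policy(state))
    next_state = state + 0.01 * action               # placeholder dynamics
    reward = -float(np.linalg.norm(next_state))      # placeholder reward
    return (state, action, reward, next_state)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    policy = lambda s: -0.5 * s                      # placeholder deterministic policy
    noises = [lambda a: a + rng.normal(0.0, 0.1, a.shape),   # one Gaussian-noise unit
              lambda a: a + rng.normal(0.0, 0.3, a.shape)]   # a second, wider-noise unit
    buffer = PrioritizedReplay()

    for step in range(200):                          # interaction units fill the shared buffer
        for noise_fn in noises:
            state = rng.normal(size=4)               # 4-dim state stand-in
            buffer.add(actor_step(policy, state, noise_fn))

    idx, batch = buffer.sample(32)                   # learner: priority-based sampling
    fake_td_errors = rng.normal(size=len(idx))       # stands in for the DDPG critic's TD errors
    buffer.update(idx, fake_td_errors)               # refresh priorities after the update
    print("buffer size:", len(buffer.data))
```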
Fig. 1  Sketch of the biped model

Fig. 2  Passive dynamic walking process

Fig. 3  The neural network training process in DDPG

Fig. 4  The structure of Ape-X DPG

Fig. 5  Episode process in interaction unit n

Fig. 6  Landscape of the reward function when falls = 0

Fig. 7  The curve of the average reward

Fig. 8  Stable walking times on the test set

Fig. 9  The phase plane of the robot's left leg

Fig. 10  Biped walking states from initial state b

Fig. 11  Stick diagram of the biped walking process

Fig. 12  Sketch of the biped physical model

Fig. 13  Robot physics simulation

Fig. 14  The number of stable walking cells

Fig. 15  The basin of attraction of the biped gait when $ \phi = 0.1 $

Table 1  Symbols and dimensionless default values of the biped parameters

Parameter                                           Symbol   Value
Leg length                                          l        1
Leg mass                                            m1       1
Hip mass                                            m2       2
Foot radius                                         r        0.3
Distance from leg mass center to arc-foot center    l1       0.55
Distance from hip joint to arc-foot center          l2       0.7
Distance from hip joint to leg mass center          c        0.15
Leg moment of inertia                               J1       0.01
Gravitational acceleration                          g        9.8
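For readers who want to reproduce the dimensionless model, the snippet below simply collects Table 1 in one configuration object; the field names are assumptions chosen for readability, not identifiers from the authors' code.

```python
# Dimensionless biped parameters from Table 1, gathered into one config object (illustrative).
from dataclasses import dataclass

@dataclass(frozen=True)
class BipedParams:
    l: float = 1.0     # leg length
    m1: float = 1.0    # leg mass
    m2: float = 2.0    # hip mass
    r: float = 0.3     # foot radius
    l1: float = 0.55   # leg mass center to arc-foot center
    l2: float = 0.7    # hip joint to arc-foot center
    c: float = 0.15    # hip joint to leg mass center
    J1: float = 0.01   # leg moment of inertia
    g: float = 9.8     # gravitational acceleration

params = BipedParams()
print(params.l, params.g)
```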

Table 2  Assignment of the noise function N and learning time

Algorithm             Gaussian noise   O-U noise   Network parameter noise [39]   Time
DDPG                  0                1           0                              6.4 h
2 interaction units   1                1           0                              4.2 h
4 interaction units   2                1           1                              4.2 h
6 interaction units   2                2           2                              4.3 h
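To make Table 2 concrete, the fragment below shows one way the exploration noise could be handed out across interaction units, following the 4-unit row (two Gaussian, one Ornstein-Uhlenbeck, one parameter-space); the class names and noise scales are assumptions for illustration, and the parameter-space variant here only perturbs a weight vector rather than an adaptive full-network perturbation as in [39].

```python
# Illustrative assignment of exploration noise to interaction units (Table 2, 4-unit row).
import numpy as np

class GaussianNoise:
    def __init__(self, sigma=0.1):
        self.sigma = sigma
    def __call__(self, action, weights):
        return action + np.random.normal(0.0, self.sigma, action.shape), weights

class OUNoise:
    """Ornstein-Uhlenbeck process: temporally correlated action noise."""
    def __init__(self, dim, theta=0.15, sigma=0.2):
        self.theta, self.sigma = theta, sigma
        self.x = np.zeros(dim)
    def __call__(self, action, weights):
        self.x += self.theta * (-self.x) + self.sigma * np.random.randn(*self.x.shape)
        return action + self.x, weights

class ParamNoise:
    """Perturbs the policy parameters instead of the action (cf. [39], simplified)."""
    def __init__(self, sigma=0.05):
        self.sigma = sigma
    def __call__(self, action, weights):
        return action, weights + np.random.normal(0.0, self.sigma, weights.shape)

action_dim = 1                                  # e.g. a single hip torque
units = [GaussianNoise(), GaussianNoise(),      # 2 Gaussian units
         OUNoise(action_dim),                   # 1 O-U unit
         ParamNoise()]                          # 1 parameter-space unit

action = np.zeros(action_dim)
weights = np.ones(8)                            # placeholder policy parameters
for k, noise in enumerate(units):
    a, w = noise(action.copy(), weights.copy())
    print(f"unit {k}: action={a}, |dw|={np.abs(w - weights).sum():.3f}")
```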

Table 3  Initial states of the biped

State   $\theta_1$   $\dot\theta_1$   $\dot\theta_2$   $\phi$
a       0.37149      −1.24226         2.97253          0.078
b       0.24678      −1.20521         0.15476          0.121
[1] Tian Yan-Tao, Sun Zhong-Bo, Li Hong-Yang, Wang Jing. A review of optimal and control strategies for dynamic walking bipedal robots. Acta Automatica Sinica, 2016, 42(8): 1142−1157 (in Chinese)
[2] Chin C S, Lin W P. Robust genetic algorithm and fuzzy inference mechanism embedded in a sliding-mode controller for an uncertain underwater robot. IEEE/ASME Transactions on Mechatronics, 2018, 23(2): 655−666 doi: 10.1109/TMECH.2018.2806389
[3] Wang Y, Wang S, Wei Q, et al. Development of an underwater manipulator and its free-floating autonomous operation. IEEE/ASME Transactions on Mechatronics, 2016, 21(2): 815−824 doi: 10.1109/TMECH.2015.2494068
[4] Wang Y, Wang S, Tan M, et al. Real-time dynamic Dubins-Helix method for 3-D trajectory smoothing. IEEE Transactions on Control Systems Technology, 2015, 23(2): 730−736 doi: 10.1109/TCST.2014.2325904
[5] Wang Y, Wang S, Tan M. Path generation of autonomous approach to a moving ship for unmanned vehicles. IEEE Transactions on Industrial Electronics, 2015, 62(9): 5619−5629 doi: 10.1109/TIE.2015.2405904
[6] Ma K Y, Chirarattananon P, Wood R J. Design and fabrication of an insect-scale flying robot for control autonomy. In: Proceedings of the 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. Hamburg, Germany: IEEE, 2015. 1558−1564
[7] McGeer T. Passive dynamic walking. The International Journal of Robotics Research, 1990, 9(2): 62−82 doi: 10.1177/027836499000900206
[8] Bhounsule P A, Cortell J, Ruina A. Design and control of Ranger: an energy-efficient, dynamic walking robot. In: Proceedings of the 15th International Conference on Climbing and Walking Robots and the Support Technologies for Mobile Machines. Baltimore, USA, 2012. 441−448
[9] Kurz M J, Stergiou N. An artificial neural network that utilizes hip joint actuations to control bifurcations and chaos in a passive dynamic bipedal walking model. Biological Cybernetics, 2005, 93(3): 213−221 doi: 10.1007/s00422-005-0579-6
[10] Sun Chang-Yin, He Wei, Ge Wei-Liang, Chang Cheng. Adaptive neural network control of biped robots. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2016, 47(2): 315−326
[11] Sugimoto Y, Osuka K. Walking control of quasi passive dynamic walking robot "Quartet III" based on continuous delayed feedback control. In: Proceedings of the 2004 IEEE International Conference on Robotics and Biomimetics. Shenyang, China: IEEE, 2004. 606−611
[12] Liu De-Jun, Tian Yan-Tao, Zhang Lei. Energy shaping control of under-actuated biped robot. Chinese Journal of Mechanical Engineering, 2012, 48(23): 16−22 (in Chinese) doi: 10.3901/JME.2012.23.016
[13] Spong M W, Holm J K, Lee D. Passivity-based control of bipedal locomotion. IEEE Robotics & Automation Magazine, 2007, 14(2): 30−40
[14] Liu Nai-Jun, Lu Tao, Cai Ying-Hao, Wang Shuo. A review of robot manipulation skills learning methods. Acta Automatica Sinica, 2019, 45(3): 458−470 (in Chinese)
[15] Tedrake R, Zhang T W, Seung H S. Stochastic policy gradient reinforcement learning on a simple 3D biped. In: Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems. Sendai, Japan: IEEE, 2004. 2849−2854
[16] Hitomi K, Shibata T, Nakamura Y, Ishii S. Reinforcement learning for quasi-passive dynamic walking of an unstable biped robot. Robotics and Autonomous Systems, 2006, 54(12): 982−988 doi: 10.1016/j.robot.2006.05.014
[17] Ueno T, Nakamura Y, Takuma T, Shibata T, Hosoda K, Ishii S. Fast and stable learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy natural actor-critic. In: Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems. Beijing, China: IEEE, 2006. 5226−5231
[18] Liu Quan, Zhai Jian-Wei, Zhang Zong-Zhang, Zhong Shan, Zhou Qian, Zhang Peng, et al. A survey on deep reinforcement learning. Chinese Journal of Computers, 2018, 41(1): 1−27 (in Chinese)
[19] Kendall A, Hawke J, Janz D, Mazur P, Reda D, Allen J K, et al. Learning to drive in a day [Online], available: https://arxiv.org/abs/1807.00412, July 1, 2018
[20] Wang Yun-Peng, Guo Ge. Signal priority control for trams using deep reinforcement learning. Acta Automatica Sinica, 2019, 45(12): 2366−2377 (in Chinese)
[21] Zhang Yi-Ke, Zhang Peng-Yuan, Yan Yong-Hong. Data augmentation for language models via adversarial training. Acta Automatica Sinica, 2018, 44(5): 891−900 (in Chinese)
[22] Andreas J, Rohrbach M, Darrell T, Klein D. Learning to compose neural networks for question answering. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. San Diego, USA: Association for Computational Linguistics, 2016. 1545−1554
[23] Zhang X, Lapata M. Sentence simplification with deep reinforcement learning. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. Copenhagen, Denmark: Association for Computational Linguistics, 2017. 584−594
[24] Zhao Yu-Ting, Han Bao-Ling, Luo Qing-Sheng. Walking stability control method for biped robot on uneven ground based on deep Q-network. Journal of Computer Applications, 2018, 38(9): 2459−2463 (in Chinese)
[25] Mnih V, Kavukcuoglu K, Silver D, Rusu A A, Veness J, Bellemare M G, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518(7540): 529−533 doi: 10.1038/nature14236
[26] Kumar A, Paul N, Omkar S N. Bipedal walking robot using deep deterministic policy gradient. In: Proceedings of the 2018 IEEE Symposium Series on Computational Intelligence. Bengaluru, India: IEEE, 2018.
[27] Lillicrap T P, Hunt J J, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning [Online], available: https://arxiv.org/abs/1509.02971, September 9, 2015
[28] Song D R, Yang Chuan-Yu, McGreavy C, Li Zhi-Bin. Recurrent deterministic policy gradient method for bipedal locomotion on rough terrain challenge. In: Proceedings of the 2018 15th International Conference on Control, Automation, Robotics and Vision. Singapore: IEEE, 2018. 311−318
[29] Todorov E, Erez T, Tassa Y. MuJoCo: A physics engine for model-based control. In: Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems. Algarve, Portugal: IEEE, 2012. 5026−5033
[30] Palanisamy P. Hands-On Intelligent Agents with OpenAI Gym: Your Guide to Developing AI Agents Using Deep Reinforcement Learning. Packt Publishing Ltd, 2018.
[31] Schaul T, Quan J, Antonoglou I, Silver D. Prioritized experience replay. In: Proceedings of the International Conference on Learning Representations 2016. San Juan, Puerto Rico, 2016. 322−355
[32] Horgan D, Quan J, Budden D, Barth-Maron G, Hessel M, van Hasselt H, et al. Distributed prioritized experience replay. In: Proceedings of the International Conference on Learning Representations 2018. Vancouver, Canada, 2018.
[33] Zhao Jie, Wu Xiao-Guang, Zang X Z, Yang Ji-Hong. Analysis of period doubling bifurcation and chaos mirror of biped passive dynamic robot gait. Chinese Science Bulletin, 2012, 57(14): 1743−1750 doi: 10.1007/s11434-012-5113-3
[34] Silver D, Lever G, Heess N, Degris T, Wierstra D, Riedmiller M. Deterministic policy gradient algorithms. In: Proceedings of the International Conference on Machine Learning. Beijing, China, 2014.
[35] Sutton R S, Barto A G. Reinforcement Learning: An Introduction. Cambridge: MIT Press, 1998.
[36] Zhao Jie, Wu Xiao-Guang, Zhu Yan-He, Li Ge. The improved passive dynamic model with high stability. In: Proceedings of the 2009 International Conference on Mechatronics and Automation. Changchun, China: IEEE, 2009. 4687−4692
[37] Abadi M, Barham P, Chen Jian-Min, Chen Zhi-Feng, Davis A, Dean J, et al. TensorFlow: A system for large-scale machine learning. In: Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation. Savannah, USA, 2016. 265−283
[38] Kingma D P, Ba J. Adam: A method for stochastic optimization. In: Proceedings of the 3rd International Conference on Learning Representations. San Diego, USA, 2015.
[39] Plappert M, Houthooft R, Dhariwal P, Sidor S, Chen R Y, Chen Xi, et al. Parameter space noise for exploration [Online], available: https://arxiv.org/abs/1706.01905, June 6, 2017
[40] Schwab A L, Wisse M. Basin of attraction of the simplest walking model. In: Proceedings of the ASME Design Engineering Technical Conferences. Pittsburgh, USA: ASME, 2001. 531−539
Metrics
• Article views: 2554
• Full-text HTML views: 1731
• Citations: 0
Publication history
• Received: 2019-07-23
• Accepted: 2020-01-09
