Knowledge-Based and Data-Driven Integrating Methodologies for Collective Intelligence Decision Making: A Survey

Pu Zhi-Qiang, Yi Jian-Qiang, Liu Zhen, Qiu Teng-Hai, Sun Jin-Lin, Li Fei-Mo

Citation: Pu Zhi-Qiang, Yi Jian-Qiang, Liu Zhen, Qiu Teng-Hai, Sun Jin-Lin, Li Fei-Mo. Knowledge-based and data-driven integrating methodologies for collective intelligence decision making: A survey. Acta Automatica Sinica, 2021, x(x): 1−17. doi: 10.16383/j.aas.c210118

doi: 10.16383/j.aas.c210118
Funds: Supported by the National Key Research and Development Program of China (2020AAA0103404), the National Natural Science Foundation of China (62073323, 61806199), the Strategic Priority Research Program of the Chinese Academy of Sciences (XDA27030403), and the External Cooperation Key Project of the Chinese Academy of Sciences (173211KYSB20200002)
More Information
  Author Bios:

  PU Zhi-Qiang  Associate professor at the Integrated Information System Research Center, Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. degree in control theory and control engineering from the University of Chinese Academy of Sciences, Beijing, China, in 2014. His research interests include collective intelligence, multi-agent reinforcement learning, and robust adaptive control of unmanned systems. Corresponding author of this paper. E-mail: zhiqiang.pu@ia.ac.cn

  YI Jian-Qiang  Professor at the Integrated Information System Research Center, Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. degree in automatic control from the Kyushu Institute of Technology, Kitakyushu, Japan, in 1992. His research interests include intelligent control, intelligent robotics, and autonomous unmanned systems. E-mail: jianqiang.yi@ia.ac.cn

  LIU Zhen  Associate professor at the Integrated Information System Research Center, Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. degree in control theory and control engineering from the University of Chinese Academy of Sciences, Beijing, China, in 2015. His research interests include flight control, robust adaptive control, and multi-agent reinforcement learning. E-mail: liuzhen@ia.ac.cn

  QIU Teng-Hai  Research assistant at the Integrated Information System Research Center, Institute of Automation, Chinese Academy of Sciences. He received his M.Eng. degree in control theory and control engineering from Beihang University, Beijing, China, in 2016. His research interests include intelligent decision making, multi-agent systems, and applications of autonomous unmanned systems. E-mail: tenghai.qiu@ia.ac.cn

  SUN Jin-Lin  Assistant professor at the School of Electrical and Information Engineering, Jiangsu University. His research interests include robust and adaptive control, computational intelligence, and anti-disturbance control. E-mail: jinlinsun@outlook.com

  LI Fei-Mo  Research assistant at the Integrated Information System Research Center, Institute of Automation, Chinese Academy of Sciences. He received his Ph.D. degree in computer applied technology from the University of Chinese Academy of Sciences, Beijing, China, in 2017. His research interests include remote sensing image processing, computer vision, and intelligent perception. E-mail: lifeimo2012@ia.ac.cn

• Abstract: Collective intelligence systems have broad application prospects. Current collective intelligence decision-making methods fall into two main categories, knowledge-driven and data-driven, each with its own advantages and drawbacks. This survey argues that the coordinated use of knowledge and data offers a new way to approach collective intelligence decision making. It systematically reviews the possible routes for such integration and classifies typical methods at two levels, framework-level and algorithm-level integration of knowledge and data. Algorithm-level integration is further divided into hierarchical integration and component-level integration: the former covers hierarchical methods such as neural network trees, genetic fuzzy trees, and hierarchical reinforcement learning, while the latter is summarized as knowledge-enhanced data-driven methods, data-tuned knowledge-driven methods, and complementary combinations of knowledge and data. Finally, starting from the needs of theoretical development and practical applications, several important future research directions for knowledge- and data-driven collective intelligence decision making are pointed out. (A minimal code sketch illustrating the knowledge-enhanced data-driven route appears after the figure list below.)
• Fig. 1  Advantages and disadvantages of knowledge-based and data-driven methodologies

  Fig. 2  Overall framework of knowledge-based and data-driven methods integration

  Fig. 3  Conceptual model for framework-level integration of knowledge-based and data-driven methods

  Fig. 4  Comparison between MDP and SMDP

  Fig. 5  Knowledge-enhanced data-driven methods

  Fig. 6  Conceptual networking expansion of knowledge

  Fig. 7  Control diagrams of complementary knowledge-driven and neural network methods
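As a concrete (and purely illustrative) instance of the knowledge-enhanced data-driven route mentioned in the abstract, the sketch below combines a hand-crafted potential function (the "knowledge" term) with standard tabular Q-learning (the "data-driven" part) via potential-based reward shaping. This example is not taken from the paper; the toy 1-D grid world, the distance-based potential, and all hyper-parameters are assumptions made only for illustration.

  # Minimal sketch: knowledge-enhanced data-driven decision making via
  # potential-based reward shaping on a toy goal-reaching task (assumed setup).
  import random

  SIZE, GOAL = 8, 7                 # toy 1-D grid world; the agent starts in cell 0
  ACTIONS = (-1, +1)                # move left / move right
  GAMMA, ALPHA, EPS = 0.95, 0.5, 0.1

  def potential(s):
      # Knowledge term: heuristic potential (negative distance to the goal).
      return -abs(GOAL - s)

  def step(s, a):
      # One environment step with potential-based reward shaping.
      s2 = min(max(s + a, 0), SIZE - 1)
      r = 1.0 if s2 == GOAL else 0.0                 # sparse task reward (data-driven part)
      r += GAMMA * potential(s2) - potential(s)      # shaping term (knowledge part)
      return s2, r, s2 == GOAL

  Q = {(s, a): 0.0 for s in range(SIZE) for a in ACTIONS}
  for _ in range(200):                               # standard tabular Q-learning
      s, done = 0, False
      while not done:
          a = random.choice(ACTIONS) if random.random() < EPS \
              else max(ACTIONS, key=lambda a: Q[(s, a)])
          s2, r, done = step(s, a)
          best_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
          Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
          s = s2

  # Greedy action learned in each cell (+1 expected everywhere left of the goal).
  print([max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(SIZE)])

Because the shaping term is potential-based, it densifies the otherwise sparse reward and thereby speeds up learning without changing the optimal policy of the original task.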

Publication history
• Received: 2021-02-04
• Accepted: 2021-06-18
• Published online: 2021-07-16
