2018 Impact Factor (CJCR): 2.793

                    Indexed in:

                    • Chinese Core Journals
                    • EI
                    • China Science and Technology Core Journals
                    • Scopus
                    • CSCD
                    • INSPEC (Science Abstracts)


Online Reinforcement Learning Control Algorithm for Concentration of Thickener Underflow

                    Yuan Zhao-Lin, He Run-Zi, Yao Chao, Li Jia, Ban Xiao-Juan, Li Xiao-Rui

                    Citation: Yuan Zhao-Lin, He Run-Zi, Yao Chao, Li Jia, Ban Xiao-Juan, Li Xiao-Rui. Online reinforcement learning control algorithm for concentration of thickener underflow. Acta Automatica Sinica, 2019, 45(x): 1–14 doi: 10.16383/j.aas.c190348


                    doi: 10.16383/j.aas.c190348

                    Online Reinforcement Learning Control Algorithm for Concentration of Thickener Underflow

Funds: Supported by the Key Research and Development Program of Hainan Province (No. ZDYF2019009), the National Key Research and Development Program of China (No. 2019YFC0605300, No. 2016YFB0700500), and the National Natural Science Foundation of China (No. 61572075, No. 61702036, No. 61873299)
                    More Information
                      Author Bio:

YUAN Zhao-Lin Ph.D. candidate at the School of Computer and Communication Engineering, University of Science and Technology Beijing. He received his bachelor's degree in computer science from University of Science and Technology Beijing in 2017. His research interests cover adaptive dynamic programming and reinforcement learning

                      HE Run-Zi Master's student at the School of Computer and Communication Engineering, University of Science and Technology Beijing. She received her bachelor's degree in computer science from Beijing Information Science and Technology University in 2017. Her research interests cover fluid simulation and reinforcement learning

                      YAO Chao Received the B.S. degree in computer science from Beijing Jiaotong University (BJTU), Beijing, China, in 2009, and the Ph.D. degree from the Institute of Information Science, BJTU, in 2016. From 2014 to 2015, he was a visiting Ph.D. student at the Ecole Polytechnique Federale de Lausanne, Switzerland. From 2016 to 2018, he was a Post-Doctoral Fellow with the Institute of Sensing Technology and Business, Beijing University of Posts and Telecommunications, Beijing. Since 2018, he has been an Assistant Professor at University of Science and Technology Beijing (USTB), China. His current research interests include image and video processing and computer vision

                      LI Jia Master's student at the School of Computer and Communication Engineering, University of Science and Technology Beijing. Main research interests: adaptive dynamic programming, adaptive control, and reinforcement learning

                      BAN Xiao-Juan Professor at University of Science and Technology Beijing and an executive council member of the Chinese Association for Artificial Intelligence (CAAI). Her current research interests include artificial intelligence, natural human-computer interaction, and 3D visualization. Corresponding author of this paper

                      LI Xiao-Rui Currently pursuing a bachelor's degree in Communication Engineering at University of Science and Technology Beijing, China. His research interests include reinforcement learning and game AI; he is investigating the application of Monte Carlo tree search methods to games

• Abstract: Control of complex industrial processes has long been a frontier problem in the field of control applications. As a type of large, complex industrial equipment, thickeners are widely used in metallurgy, mining, and related fields. Because a thickener in operation exhibits multivariable, nonlinear, and high-time-delay characteristics, control of its underflow concentration has long been a difficult and active research topic in both academia and industry. This paper proposes an online control algorithm for thickeners based on reinforcement learning. Building on the traditional heuristic dynamic programming (HDP) algorithm, the proposed method designs a dual-network structure that fuses the critic network and the model network, and introduces a short-term experience replay method to improve the training accuracy of the critic network. The algorithm achieves stable control of the thickener underflow concentration while keeping the control input within the specified range. Finally, the effectiveness of the algorithm is verified through thickener simulation experiments; the results show that the proposed method outperforms the other algorithms in time consumption and control accuracy.
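The abstract describes an HDP-style learner in which a critic network is paired with a model network and a short-term experience replay buffer stabilizes the critic's training. The sketch below illustrates only that general pattern, not the paper's HCNVI algorithm: the toy plant, the placeholder policy, the linear approximators, the quadratic stage cost, and all learning rates are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

class Linear:
    """Minimal linear function approximator trained by gradient descent."""
    def __init__(self, n_in, n_out, lr=1e-3):
        self.W = rng.normal(scale=0.1, size=(n_out, n_in))
        self.lr = lr
    def __call__(self, x):
        return self.W @ x
    def update(self, x, err):
        # gradient step on 0.5*||err||^2 with respect to W
        self.W -= self.lr * np.outer(err, x)

critic = Linear(2, 1)   # critic network: V(x), the cost-to-go estimate
model  = Linear(3, 2)   # model network: predicts x_{t+1} from (x_t, u_t)
gamma = 0.95
BUF_LEN = 20
buffer = []             # short-term replay: only the most recent transitions

def utility(x, u):
    # quadratic stage cost (an assumption for this sketch)
    return float(x @ x + 0.1 * u * u)

x = np.array([1.0, -0.5])
for t in range(200):
    # placeholder policy: clipped linear feedback; unlike HCNVI, it does
    # not derive the action from the critic/model networks
    u = float(np.clip(-(x[0] + x[1]), -1.0, 1.0))
    # toy stable linear plant standing in for the thickener dynamics
    x_next = np.array([0.9 * x[0] + 0.1 * u, 0.8 * x[1] + 0.05 * u])
    buffer.append((x.copy(), u, x_next.copy()))
    buffer = buffer[-BUF_LEN:]          # discard older experience
    for xs, us, xn in buffer:           # replay the short-term history
        # model network: reduce the one-step prediction error
        pred = model(np.append(xs, us))
        model.update(np.append(xs, us), pred - xn)
        # critic network: reduce TD error V(x) - [U(x,u) + gamma*V(x')]
        td = critic(xs)[0] - (utility(xs, us) + gamma * critic(xn)[0])
        critic.update(xs, np.array([td]))
    x = x_next
```

Capping the buffer at the last 20 transitions is what makes the replay "short-term": each update pass reuses only recent experience, which is the stabilizing idea the abstract attributes to the method.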
• Fig. 1  Illustration of thickening process

                      Fig. 2  Structure diagram of algorithm HCNVI

                      Fig. 3  Structure diagram of artificial neural network

                      Fig. 4  Visualization of the iterative gradient-descent process

                      Fig. 5  The effect of short-term experience replay on the critic network output

                      Fig. 6  Noise input in the simulation experiment

                      Fig. 7  HCNVI versus other ADP algorithms under constant noisy input

                      Fig. 8  The influence of short-term experience replay on HDP and HCNVI

                      Fig. 9  Comparison of time consumption between HDP and HCNVI in experiment 1

                      Fig. 10  The fluctuation of the noisy input

                      Fig. 11  HCNVI versus other ADP algorithms under fluctuating noisy input

                      Fig. 12  The influence of short-term experience replay on HCNVI under continuously varying noise

                      Fig. 13  Comparison of time consumption between HDP and HCNVI in experiment 2

Table 1  Variable definitions

                      | Variable | Meaning | Unit | Initial value | Role |
                      | $f_{i}(t)$ | feed pump frequency | $Hz$ | 40 | disturbance |
                      | $f_{u}(t)$ | underflow pump frequency | $Hz$ | 85 | control input |
                      | $f_{f}(t)$ | flocculant pump frequency | $Hz$ | 40 | control input |
                      | $c_{i}(t)$ | feed concentration | $kg/m^3$ | 73 | disturbance |
                      | $h(t)$ | mud layer height | $m$ | 1.48 | state |
                      | $c_u(t)$ | underflow concentration | $kg/m^3$ | 680 | target |

Table 2  Definitions of constants

                      | Variable | Meaning | Unit | Reference value |
                      | $\rho_s$ | dry sand density | $kg/m^3$ | 4 150 |
                      | $\rho_e$ | apparent density of the medium | $kg/m^3$ | 1 803 |
                      | $\mu_{e}$ | apparent viscosity of the suspension | $Pa \cdot s$ | 1 |
                      | $d_0$ | feed particle diameter | $m$ | 0.00008 |
                      | $p$ | average concentration coefficient | — | 0.5 |
                      | $A$ | thickener cross-sectional area | $m^2$ | 300.5 |
                      | $k_s$ | flocculant effect coefficient | $s/m^2$ | 0.157 |
                      | $k_i$ | compression-layer concentration coefficient | $m^3/s$ | 0.0005 × 3 600 |
                      | $K_i$ | coefficient relating feed flow to feed pump frequency | $m^3/r$ | 50/3 600 |
                      | $K_u$ | coefficient relating underflow flow to underflow pump frequency | $m^3/r$ | 2/3 600 |
                      | $K_f$ | coefficient relating flocculant flow to flocculant pump frequency | $m^3/r$ | 0.75/3 600 |
                      | $\theta$ | compression time | $s$ | 2 300 |

Table 3  Definitions of intermediate variables

                      | Variable | Meaning | Formula |
                      | $q_i(t)$ | feed flow rate | $q_i(t) = K_i f_i(t)$ |
                      | $q_u(t)$ | underflow rate | $q_u(t) = K_u f_u(t)$ |
                      | $q_f(t)$ | flocculant dosing rate | $q_f(t) = K_f f_f(t)$ |
                      | $d(t)$ | particle diameter after flocculation | $d(t) = k_s q_f(t) + d_0$ |
                      | $u_t(t)$ | hindered settling velocity of particles | $u_t(t) = \dfrac{d^2(t)\left(\rho_s - \rho_e\right) g}{18 \mu_e}$ |
                      | $u_r(t)$ | particle sinking velocity caused by underflow | $u_r(t) = \dfrac{q_u(t)}{A}$ |
                      | $c_l(t)$ | solid content per unit volume at the mud-layer height | $c_l(t) = k_i q_i(t) c_i(t)$ |
                      | $c_a(t)$ | solid content per unit volume within the mud-layer interface | $c_a(t) = p\left[c_l(t) + c_u(t)\right]$ |
                      | $r(t)$ | liquid-solid mass ratio in the mud layer | $r(t) = \rho_l\left(\dfrac{1}{c_a(t)} - \dfrac{1}{\rho_s}\right)$ |
                      | $W(t)$ | solid mass entering the thickener per unit time | $W(t) = c_i(t) q_i(t)$ |
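The intermediate quantities of Table 3 can be evaluated directly from the constants of Table 2 and the initial values of Table 1. The snippet below is a plain transcription of that algebra, not the full thickener simulator; the gravitational acceleration g = 9.81 m/s² is an assumption (it is not tabulated), and r(t) is omitted because the liquid density ρ_l does not appear in Table 2.

```python
# Constants (Table 2)
rho_s, rho_e = 4150.0, 1803.0          # dry sand / medium density, kg/m^3
mu_e = 1.0                             # apparent viscosity, Pa*s
d0 = 0.00008                           # feed particle diameter, m
p = 0.5                                # average concentration coefficient
A = 300.5                              # cross-sectional area, m^2
k_s = 0.157                            # flocculant effect coefficient, s/m^2
k_i = 0.0005 * 3600                    # compression-layer coefficient, m^3/s
K_i, K_u, K_f = 50/3600, 2/3600, 0.75/3600   # flow/frequency coefficients, m^3/r
g = 9.81                               # gravitational acceleration (assumed), m/s^2

# Inputs at their initial values (Table 1)
f_i, f_u, f_f = 40.0, 85.0, 40.0       # pump frequencies, Hz
c_i, c_u = 73.0, 680.0                 # feed / underflow concentration, kg/m^3

# Formulas of Table 3
q_i = K_i * f_i                        # feed flow
q_u = K_u * f_u                        # underflow flow
q_f = K_f * f_f                        # flocculant flow
d = k_s * q_f + d0                     # particle diameter after flocculation
u_t = d**2 * (rho_s - rho_e) * g / (18 * mu_e)   # hindered settling velocity
u_r = q_u / A                          # sinking velocity due to underflow
c_l = k_i * q_i * c_i                  # solids per unit volume at mud-layer height
c_a = p * (c_l + c_u)                  # solids per unit volume in mud-layer interface
W = c_i * q_i                          # solid mass entering per unit time

print(f"q_i = {q_i:.4f} m^3/s, u_t = {u_t:.6f} m/s, c_a = {c_a:.1f} kg/m^3")
```

With these reference values, k_i·q_i works out to exactly 1, so c_l equals the feed concentration c_i and c_a = 0.5·(73 + 680) = 376.5 kg/m³.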

Table 4  Performance analysis of different algorithms

                      |           | Experiment 1 |         |       | Experiment 2 |         |        |
                      | Algorithm | MSE¹         | MAE²    | IAE³  | MSE          | MAE     | IAE    |
                      | HDP       | 414.182      | 141.854 | 7.246 | 6 105.619    | 275.075 | 54.952 |
                      | DHP       | 290.886      | 109.312 | 5.392 | 732.814      | 96.145  | 16.560 |
                      | ILPL      | 364.397      | 135.474 | 8.289 | 2 473.661    | 211.615 | 35.222 |
                      | HCNVI     | 44.445       | 66.604  | 3.867 | 307.618      | 76.176  | 12.998 |
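For reference, a conventional way to compute the three comparison metrics of Table 4 for a tracking-error sequence is sketched below. The error data are synthetic, and the exact definitions used in the paper (for example, whether MAE denotes the mean or the maximum absolute error, or which sampling period IAE integrates over) are not given in this excerpt.

```python
import numpy as np

def metrics(y, y_ref, dt=1.0):
    """Return (MSE, MAE, IAE) for a tracking task, under the conventional
    definitions: mean squared error, mean absolute error, and the integral
    (discrete sum times the sampling period dt) of absolute error."""
    e = np.asarray(y, dtype=float) - np.asarray(y_ref, dtype=float)
    mse = float(np.mean(e ** 2))
    mae = float(np.mean(np.abs(e)))
    iae = float(np.sum(np.abs(e)) * dt)
    return mse, mae, iae

# Synthetic example (not the paper's data): errors of 1, 2, 3 around a
# constant reference.
mse, mae, iae = metrics([1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(mse, mae, iae)
```

For this toy sequence the results are MSE = 14/3 ≈ 4.667, MAE = 2.0, and IAE = 6.0.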
• [1] Shen Y, Hao L, Ding S X. Real-time implementation of fault tolerant control systems with performance optimization. IEEE Transactions on Industrial Electronics, 2014, 61(5): 2402–2411 doi: 10.1109/TIE.2013.2273477
                      [2] Kouro S, Cortes P, Vargas R, Ammann U, Rodriguez J. Model predictive control: a simple and powerful method to control power converters. IEEE Transactions on Industrial Electronics, 2009, 56(6): 1826–1838 doi: 10.1109/TIE.2008.2008349
                      [3] Dai W, Chai T, Yang S X. Data-driven optimization control for safety operation of hematite grinding process. IEEE Transactions on Industrial Electronics, 2015, 62(5): 2930–2941 doi: 10.1109/TIE.2014.2362093
                      [4] Wang D, Liu D, Zhang Q, Zhao D. Data-based adaptive critic designs for nonlinear robust optimal control with uncertain dynamics. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2016, 46(11): 1544–1555 doi: 10.1109/TSMC.2015.2492941
                      [5] Sutton R S, Barto A G. Reinforcement Learning: An Introduction (2nd edition). Cambridge: MIT Press, 2018
                      [6] Lewis F L, Vrabie D, Syrmos V L. Optimal Control (3rd edition). Hoboken, USA: John Wiley & Sons, 2012
                      [7] Prokhorov D V, Wunsch D C. Adaptive critic designs. IEEE Transactions on Neural Networks, 1997, 8(5): 997–1007 doi: 10.1109/72.623201
                      [8] Werbos P J. Foreword - ADP: the key direction for future research in intelligent control and understanding brain intelligence. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 2008, 38(4): 898–900 doi: 10.1109/TSMCB.2008.924139
                      [9] Duan Yan-Jie, Lv Yi-Sheng, Zhang Jie, Zhao Xue-Liang, Wang Fei-Yue. Deep learning for control: the state of the art and prospects. Acta Automatica Sinica, 2016, 42(5): 643–654
                      [10] Liu Y-J, Tang L, Tong S-C, Chen C L P, Li D-J. Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(1): 165–176 doi: 10.1109/TNNLS.2014.2360724
                      [11] Liu L, Wang Z, Zhang H. Adaptive fault-tolerant tracking control for MIMO discrete-time systems via reinforcement learning algorithm with less learning parameters. IEEE Transactions on Automation Science and Engineering, 2017, 14(1): 299–313 doi: 10.1109/TASE.2016.2517155
                      [12] Xu X, Yang H, Lian C, Liu J. Self-learning control using dual heuristic programming with global Laplacian eigenmaps. IEEE Transactions on Industrial Electronics, 2017, 64(12): 9517–9526 doi: 10.1109/TIE.2017.2708002
                      [13] Wei Q-L, Liu D-R. Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification. IEEE Transactions on Automation Science and Engineering, 2014, 11(4): 1020–1036 doi: 10.1109/TASE.2013.2284545
                      [14] Jiang Y, Fan J-L, Chai T-Y, Li J-N, Lewis F L. Data-driven flotation industrial process operational optimal control based on reinforcement learning. IEEE Transactions on Industrial Informatics, 2017, 14(5): 1974–1989
                      [15] Jiang Y, Fan J-L, Chai T-Y, Lewis F L. Dual-rate operational optimal control for flotation industrial process with unknown operational model. IEEE Transactions on Industrial Electronics, 2019, 66(6): 4587–4599 doi: 10.1109/TIE.2018.2856198
                      [16] Modares H, Lewis F L. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input systems. Automatica, 2014, 50(1): 193–202 doi: 10.1016/j.automatica.2013.09.043
                      [17] Mnih V, Silver D, Riedmiller M. Playing Atari with deep reinforcement learning. In: NIPS Deep Learning Workshop 2013, Lake Tahoe, USA, 2013. 1–9
                      [18] Wang D, Liu D-R, Wei Q-L, Zhao D-B, Jin N. Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming. Automatica, 2012, 48(8): 1825–1832 doi: 10.1016/j.automatica.2012.05.049
                      [19] Chai T-Y, Jia Y, Li H-B, Wang H. An intelligent switching control for a mixed separation thickener process. Control Engineering Practice, 2016, 57: 61–71 doi: 10.1016/j.conengprac.2016.07.007
                      [20] Kim B H, Klima M S. Development and application of a dynamic model for hindered-settling column separations. Minerals Engineering, 2004, 17(3): 403–410 doi: 10.1016/j.mineng.2003.11.013
                      [21] Wang L-Y, Jia Y, Chai T-Y, Xie W-F. Dual-rate adaptive control for mixed separation thickening process using a compensation signal based approach. IEEE Transactions on Industrial Electronics, 2017 (early access)
                      [22] Wang Meng. Design and development of model software of processes of slurry neutralization, sedimentation and separation. Northeastern University, 2011
                      [23] Tang Mo-Tang. Hydrometallurgical Equipment. Central South University Press, 2009
                      [24] Wang Lin-Yan, Li Jian, Jia Yao, Chai Tian-You. Dual-rate intelligent switching control for mixed separation thickening process. Acta Automatica Sinica, 2018, 44(2): 330–343
                      [25] Luo B, Liu D-R, Huang T-W, Wang D. Model-free optimal tracking control via critic-only Q-learning. IEEE Transactions on Neural Networks and Learning Systems, 2016, 27(10): 2134–2144 doi: 10.1109/TNNLS.2016.2585520
                      [26] Padhi R, Unnikrishnan N, Wang X-H, Balakrishnan S N. A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems. Neural Networks, 2006, 19(10): 1648–1660 doi: 10.1016/j.neunet.2006.08.010
Publication history
                    • Received: 2019-05-10
                    • Accepted: 2019-08-15
                    • Revised: 2019-07-02
                    • Published online: 2019-12-25
