GBDT-Based Railway Accident Type Prediction and Cause Analysis

Zhong Min-Hui, Zhang Wan-Lu, Li You-Ru, Zhu Zhen-Feng, Zhao Yao

Citation: Zhong Min-Hui, Zhang Wan-Lu, Li You-Ru, Zhu Zhen-Feng, Zhao Yao. GBDT based railway accident type prediction and cause analysis. Acta Automatica Sinica, 2020, 45(x): 1-9. doi: 10.16383/j.aas.c190630

doi: 10.16383/j.aas.c190630
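Author biographies:

Zhong Min-Hui: Master's student at the Institute of Information Science, Beijing Jiaotong University. Research interests include computer vision and machine learning. Corresponding author of this paper. E-mail: mhzhong@bjtu.edu.cn

Zhang Wan-Lu: Master's student at the Institute of Information Science, Beijing Jiaotong University. Research interests include computer vision and deep learning. E-mail: wlzhang@bjtu.edu.cn

Li You-Ru: Master's student at the Institute of Information Science, Beijing Jiaotong University. Research interests include data mining and machine learning. E-mail: liyouru@bjtu.edu.cn

Zhu Zhen-Feng: Professor at the Institute of Information Science, Beijing Jiaotong University. Received a Ph.D. degree in engineering from the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, in 2005. Research interests include image and video analysis and understanding, computer vision, and machine learning. E-mail: zhfzhu@bjtu.edu.cn

Zhao Yao: Professor and director of the Institute of Information Science, Beijing Jiaotong University. Received a Ph.D. degree in engineering from Beijing Jiaotong University in 1996. Research interests include image and video coding, digital watermarking and forensics, video analysis and understanding, and artificial intelligence. E-mail: yzhao@bjtu.edu.cn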

Funds: Supported by the Science and Technology Innovation 2030 Major Program "New Generation Artificial Intelligence" (2018AAA0102101), the Fundamental Research Funds for the Central Universities (2018JBZ001), and the National Natural Science Foundation of China (61976018 and 61532005)
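• Abstract: Applying data mining techniques to predict railway accident types and analyze their causes is important for building a railway accident early-warning mechanism. To this end, this paper proposes an algorithm for railway accident type prediction and cause analysis based on the gradient boosting decision tree (GBDT). To address missing values in railway accident records, a completion algorithm based on attribute distribution probability is proposed; it preserves the original data distribution as far as possible and thereby reduces the effect of missing data on accident type prediction. To address class imbalance in railway accident records, an ensemble GBDT model is proposed for robust prediction of the accident type. On this basis, accident causes are analyzed according to the feature importance ranking of the GBDT prediction model. Experiments on an open database verify the effectiveness of the proposed model.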
• Fig. 1  The framework of GBDT-based railroad accident type prediction and cause analysis

Fig. 2  Comparison of the results of the three completion methods

Fig. 3  Classification accuracy for different numbers of GBDTs in the ensemble

Fig. 4  Confusion matrix

Fig. 5  Prediction results for different numbers of features

Fig. 6  Proportion of different factors in the causes of two types of railroad accident

Table 1  Description of original data

| | Record | Accident type | Attribute |
|---|---|---|---|
| Number | 5 434 | 11 | 144 |

Table 2  Description of accident types

| Type | Description |
|---|---|
| 1 | Derailment |
| 2 | Head-on collision |
| 3 | Rear-end collision |
| 4 | Side collision |
| 5 | Raking collision |
| 6 | Broken train collision |
| 7 | Hwy-rail crossing |
| 8 | RR grade crossing |
| 9 | Obstruction |
| 10 | Fire |
| 11 | Other impacts |

Table 3  Examples of the dataset

| Name | Description | Num. of values | Type |
|---|---|---|---|
| RAILROAD | Railroad code | 5 434 | Object |
| CARS | Num. of cars carrying hazmat | 5 434 | Int64 |
| TYPSPD | Train speed type | 5 086 | Object |
| TRNDIR | Train direction | 5 161 | Float64 |
| TONS | Gross tonnage, excluding power units | 5 434 | Int64 |
| TYPEQ | Type of consist | 5 081 | Object |
| EQATT | Equipment attended | 5 074 | Object |
| CDTRHR | Num. of hours conductors on duty | 3 628 | Int64 |
| ENGHR | Num. of hours engineers on duty | 4 201 | Int64 |
| TRKNAME | Track identification | 5 434 | Object |

Table 4  Description of preprocessed data

| | Record | Accident type | Attribute |
|---|---|---|---|
| Number | 5 434 | 11 | 119 |

Table 5  Distribution of the values of attribute TRNDIR before and after completion by the three methods

| Algorithm | $a_j=1$ | $a_j=2$ | $a_j=3$ | $a_j=4$ |
|---|---|---|---|---|
| Before completion | 0.22 | 0.20 | 0.31 | 0.27 |
| Interpolation | 0.21 | 0.19 | 0.30 | 0.30 |
| Mode | 0.21 | 0.19 | 0.34 | 0.26 |
| Our algorithm | 0.22 | 0.20 | 0.31 | 0.27 |
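
The abstract describes a completion algorithm "based on attribute distribution probability" that keeps the original data distribution, but this page does not give its details. The following is a minimal sketch of one plausible reading, not the authors' algorithm: each missing value is drawn at random from the empirical distribution of the observed values, and mode filling is shown as a baseline. The function names and the synthetic column are assumptions made for illustration; the class counts are chosen only to mimic the rounded "before completion" shares of TRNDIR in Table 5 and the 5 161 observed values out of 5 434 records in Table 3.

```python
import random
from collections import Counter


def impute_by_distribution(values, rng):
    """Replace None entries with draws from the empirical distribution
    of the observed (non-None) entries; observed entries are untouched."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot estimate a distribution from an empty column")
    counts = Counter(observed)
    categories = sorted(counts)
    weights = [counts[c] for c in categories]
    return [
        v if v is not None else rng.choices(categories, weights=weights)[0]
        for v in values
    ]


def impute_by_mode(values):
    """Baseline: replace None entries with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot take the mode of an empty column")
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


def proportions(values):
    """Share of each category among the non-None entries."""
    observed = [v for v in values if v is not None]
    counts = Counter(observed)
    return {c: counts[c] / len(observed) for c in sorted(counts)}


def max_shift(before, after):
    """Largest absolute change in any category's share."""
    return max(abs(after.get(c, 0.0) - before[c]) for c in before)


# Synthetic stand-in for TRNDIR (hypothetical counts, not the real records):
# 5 161 observed values with shares of about 0.22/0.20/0.31/0.27, 273 missing.
column = [1] * 1135 + [2] * 1032 + [3] * 1600 + [4] * 1394 + [None] * 273

before = proportions(column)
sampled = impute_by_distribution(column, random.Random(0))
moded = impute_by_mode(column)

shift_sampled = max_shift(before, proportions(sampled))
shift_mode = max_shift(before, proportions(moded))
```

On this synthetic column, mode filling moves every missing entry into category 3 and so inflates its share, the same direction of change as the "Mode" row of Table 5, whereas sampling from the observed distribution leaves the shares nearly unchanged.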

Table 6  Accuracy of the ensemble GBDT classifier at different sampling rates

| $\alpha$ | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 |
|---|---|---|---|---|---|
| Accuracy | 0.841 | 0.846 | 0.845 | 0.852 | 0.848 |
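
This page does not describe how the ensemble uses the sampling rate $\alpha$. A minimal sketch of one common construction, assumed here and not taken from the paper, is to train each member on a random fraction $\alpha$ of the training records and combine the members by majority vote. To keep the example self-contained, the base learner is a nearest-centroid classifier used purely as a stand-in; in the setting of the paper a GBDT would take its place. All names and the toy data below are hypothetical.

```python
import random
from collections import Counter


def fit_centroid(X, y):
    """Stand-in base learner (NOT a GBDT): predict the class whose
    mean feature vector is closest to the query point."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def predict(row):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    return predict


def fit_subsample_ensemble(X, y, n_models, alpha, fit_base=fit_centroid, seed=0):
    """Train n_models base learners, each on a random fraction alpha of the
    records (drawn without replacement), and combine them by majority vote."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    rng = random.Random(seed)
    n = len(X)
    m = max(1, int(alpha * n))
    models = []
    for _ in range(n_models):
        idx = rng.sample(range(n), m)
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy, imbalanced two-class data (hypothetical features; labels reuse the
# type codes 1 and 7 from Table 2 only for readability).
X = [(0.1 * i, 0.1 * i) for i in range(20)] + [(5.0 + 0.1 * i, 5.0) for i in range(5)]
y = [1] * 20 + [7] * 5

ensemble = fit_subsample_ensemble(X, y, n_models=5, alpha=0.9)
```

With a deterministic base learner, $\alpha = 1.0$ makes all members identical under this scheme, so the subsampling only adds diversity for $\alpha < 1$ or when the base learner is itself randomized.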

Table 7  Performance comparison of classifiers

| Classifier | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| DT | 0.728 | 0.73 | 0.73 | 0.73 |
| RF | 0.773 | 0.74 | 0.77 | 0.75 |
| ET | 0.734 | 0.70 | 0.73 | 0.71 |
| GBDT | 0.841 | 0.84 | 0.84 | 0.84 |
| ensemble GBDT | 0.852 | 0.85 | 0.85 | 0.85 |
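
The page does not state how the multiclass precision, recall and F1 in Table 7 are averaged. In every row the recall equals the accuracy to two decimals, which is what support-weighted averaging produces, since the weighted recall $\sum_c \frac{n_c}{N}\cdot\frac{TP_c}{n_c}$ reduces to $\frac{1}{N}\sum_c TP_c$. The sketch below computes the metrics under that assumption; the function name and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)        # true records per class
    predicted = Counter(y_pred)      # predicted records per class
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    accuracy = sum(true_pos.values()) / n
    precision = recall = f1 = 0.0
    for c, n_c in support.items():
        p_c = true_pos[c] / predicted[c] if predicted[c] else 0.0
        r_c = true_pos[c] / n_c
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = n_c / n             # class share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (hypothetical) using accident type codes from Table 2.
y_true = [1, 1, 1, 1, 7, 7, 9, 9, 9, 10]
y_pred = [1, 1, 1, 7, 7, 7, 9, 9, 1, 10]
scores = weighted_metrics(y_true, y_pred)
```

On these toy labels eight of ten predictions are correct, so the accuracy and the weighted recall are both 0.8, while the weighted precision differs from them.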

Table 8  Top 15 features by importance

| No. | Name | Description |
|---|---|---|
| 1 | Latitude | Latitude in decimal degrees |
| 2 | Longitude | Longitude in decimal degrees |
| 3 | CNTYCD | FIPS county code |
| 4 | HIGHSPD | Maximum speed |
| 5 | TRKNAME | Track identification |
| 6 | RRCAR1 | Car initials (first involved) |
| 7 | TEMP | Temperature in degrees Fahrenheit |
| 8 | MILEPOST | Milepost |
| 9 | STATION | Nearest city and town |
| 10 | TRNSPD | Speed of train in miles per hour |
| 11 | RRCAR2 | Car initials (causing) |
| 12 | SUBDIV | Railroad subdivision |
| 13 | ENGHR | Num. of hours engineers on duty |
| 14 | CDTRHR | Num. of hours conductors on duty |
| 15 | TONS | Gross tonnage |
Publication history
• Received: 2019-09-11
• Accepted: 2020-01-17
