                    2018 Impact Factor (CJCR): 2.793

                    Indexed in:

                    • Chinese Core Journals
                    • EI
                    • China Science and Technology Core Journals
                    • Scopus
                    • CSCD
                    • Science Abstracts (UK)


                    Towards accurate price tag recognition algorithm with multi-task RNN

                    Mou Yong-Qiang, Fan Bao-Jie, Sun Chao, Yan Rui, Guo Yi-Shi

                    Citation: Mou Yong-Qiang, Fan Bao-Jie, Sun Chao, Yan Rui, Guo Yi-Shi. Towards accurate price tag recognition algorithm with multi-task RNN. Acta Automatica Sinica, 2020, 45(x): 1−7. doi: 10.16383/j.aas.c190633


                    doi: 10.16383/j.aas.c190633
                    Article information
                      About the authors:

                      Mou Yong-Qiang: Chief AI Architect at Guangzhou ImageDT Data Technology Co., Ltd. Previously a senior machine learning researcher at HP Labs. He received his M.S. degree in signal and information processing from Xi'an University of Technology in 2012. His research interests include machine vision, pattern recognition, and deep learning. Corresponding author of this paper. E-mail: yongqiang.mou@gmail.com

                      Fan Bao-Jie: Master's student at Guangdong University of Technology. His research interests include deep learning and computer vision. E-mail: 735678367@qq.com

                      Sun Chao: Graduate student at South China Agricultural University. His research interests include deep learning and computer vision. E-mail: ice_moyan@163.com

                      Yan Rui: Senior researcher at Guangzhou ImageDT Data Technology Co., Ltd. Her research interests include deep learning and computer vision. E-mail: reeyree@163.com

                      Guo Yi-Shi: Chief Executive Officer of Guangzhou ImageDT Data Technology Co., Ltd. His research interests include deep learning and computer vision. E-mail: yi.shi@imagedt.com

                    Towards accurate price tag recognition algorithm with multi-task RNN

                    • Abstract: To promote the development of intelligent new retail in offline business scenarios, it is essential to improve the recognition accuracy of price tags, which carry key sales information. This paper studies the price tag recognition problem, effectively improving recognition accuracy and resolving the difficult problem of inaccurate decimal point localization. A deep convolutional neural network first extracts deep semantic features from the price tag image; the resulting feature maps are fed into a multi-task recurrent network layer for encoding; an attention-based decoder network then decodes the price digits; finally, the outputs of the multiple branches are merged to produce the complete price. The proposed method substantially improves price tag recognition accuracy in offline retail scenes and solves domain-specific difficulties such as decimal point localization. In addition, to verify the generality of the method, comparative experiments were conducted on datasets from other scenarios, and the results also confirm its effectiveness.
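The pipeline described in the abstract ends with a purely mechanical step: merging the per-branch decoder outputs into one price string. A minimal sketch of that merging logic is given below; the function name and the three-branch split (integer digits, decimal digits, and a decimal-point flag) are assumptions inferred from the NDPB/IB/DB abbreviations in Table 2, not the authors' actual code.

```python
def merge_price_branches(integer_part: str, decimal_part: str, has_decimal: bool) -> str:
    """Combine per-branch decoder outputs into a full price string.

    integer_part / decimal_part: digit strings decoded by the two
    attention branches; has_decimal: output of a (hypothetical)
    decimal-point branch indicating whether the tag shows a fraction.
    """
    # Normalize leading zeros so "08" and "8" merge identically
    integer_part = integer_part.lstrip("0") or "0"
    if has_decimal and decimal_part:
        return f"{integer_part}.{decimal_part}"
    return integer_part

# e.g. the branches decode "12" and "99", and the flag branch predicts a decimal point
print(merge_price_branches("12", "99", True))   # → 12.99
print(merge_price_branches("08", "", False))    # → 8
```

Keeping this merge outside the network lets each branch stay a plain digit-sequence decoder, which is one plausible reason the multi-branch variants in Table 2 avoid the decimal point localization errors of the single-branch baseline.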
                    • Fig.  1  The structure of convolutional recurrent neural network

                      Fig.  2  Images of some price tag samples

                      Fig.  3  Baseline method compared with multi-branch method

                      Fig.  4  The structure of our basic single recognition network

                      Fig.  5  The structure of multi-task RNN

                      Fig.  6  Flowchart of decoder network based on attention

                      Fig.  7  Compared with the single-branch method

                      Table  1  Study of modules

                      Model                 General-data    Hard-data
                      VGG-BiLSTM-CTC        50.20%          20.20%
                      VGG-BiLSTM-Attn       61.20%          38.60%
                      ResNet-BiLSTM-CTC     55.60%          28.80%
                      ResNet-BiLSTM-Attn    68.10%          41.40%

                      Table  2  Results of multi-task model

                      Model           General-data    Hard-data
                      Baseline[13]    68.10%          41.40%
                      NDPB&IB         90.10%          72.90%
                      NDPB&DB         91.70%          74.30%
                      IB&DB           92.20%          73.20%
                      NDPB&IB&DB      93.20%          75.20%

                      Table  3  Experimental results on license plate dataset

                      Method        DB        FN        Rotate    Tilt      Weather   Challenge
                      TE2E[17]      96.90%    94.30%    90.80%    92.50%    87.90%    85.10%
                      CCPD[16]      96.90%    94.30%    90.80%    92.50%    87.90%    85.10%
                      Our method    98.24%    98.81%    98.12%    98.79%    98.19%    91.92%
                    • [1] Shi B, Bai X, Yao C. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 39(11): 2298−2304
                      [2] Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez A N, Polosukhin I. Attention is all you need//Advances in Neural Information Processing Systems. 2017: 5998−6008
                      [3] Luong M T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025, 2015
                      [4] Li H, Wang P, Shen C. Towards end-to-end text spotting with convolutional recurrent neural networks//Proceedings of the IEEE International Conference on Computer Vision. IEEE, 2017: 5238−5246
                      [5] Yuan X, He P, Li X A. Adaptive adversarial attack on scene text recognition. arXiv preprint arXiv:1807.03326, 2018
                      [6] Graves A, Fernández S, Gomez F, Schmidhuber J. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks//Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006: 369−376
                      [7] Sutskever I, Vinyals O, Le Q V. Sequence to sequence learning with neural networks//Advances in Neural Information Processing Systems. 2014: 3104−3112
                      [8] Lei Z, Zhao S, Song H, Shen J. Scene text recognition using residual convolutional recurrent neural network. Machine Vision and Applications, 2018, 29(5): 861−871. doi: 10.1007/s00138-018-0942-y
                      [9] Shi B, Yang M, Wang X, Lyu P, Yao C, Bai X. ASTER: An attentional scene text recognizer with flexible rectification. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018
                      [10] Long M, Wang J. Learning multiple tasks with deep relationship networks. arXiv preprint arXiv:1506.02117, 2015
                      [11] Veit A, Matera T, Neumann L, Matas J, Belongie S. COCO-Text: Dataset and benchmark for text detection and recognition in natural images. arXiv preprint arXiv:1601.07140, 2016
                      [12] Karatzas D, Gomez-Bigorda L, Nicolaou A, Ghosh S, Bagdanov A, Iwamura M, Shafait F. ICDAR 2015 competition on robust reading//2015 13th International Conference on Document Analysis and Recognition (ICDAR). IEEE, 2015: 1156−1160
                      [13] Baek J, Kim G, Lee J, Park S, Han D. What is wrong with scene text recognition model comparisons? Dataset and model analysis. arXiv preprint arXiv:1904.01906, 2019
                      [14] Bingel J, Søgaard A. Identifying beneficial task relations for multi-task learning in deep neural networks. arXiv preprint arXiv:1702.08303, 2017
                      [15] Xie Z, Huang Y, Zhu Y, Jin L, Liu Y, Xie L. Aggregation cross-entropy for sequence recognition//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2019: 6538−6547
                      [16] Li H, Wang P, Shen C. Toward end-to-end car license plate detection and recognition with deep neural networks. IEEE Transactions on Intelligent Transportation Systems, 2018, 20(3): 1126−1136
                      [17] Xu Z, Yang W, Meng A, Lu N, Huang H, Ying C, Huang L. Towards end-to-end license plate detection and recognition: A large dataset and baseline//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 255−271
                    Publication history
                    • Received:  2019-09-06
                    • Accepted:  2020-02-23
