A Review of Disentangled Representation Learning

Wen Zai-Dao, Wang Jia-Rui, Wang Xiao-Xu, Pan Quan

Citation: Wen Zai-Dao, Wang Jia-Rui, Wang Xiao-Xu, Pan Quan. A Review of Disentangled Representation Learning. Acta Automatica Sinica, 2021, 47(x): 1−24 doi: 10.16383/j.aas.c210096

doi: 10.16383/j.aas.c210096
Funds: Supported by the National Natural Science Foundation of China (61806165, 61790552, 61801020) and the Natural Science Basic Research Plan in Shaanxi Province of China (2020JQ-196)

Author Bio:

WEN Zai-Dao Associate professor at the School of Automation, Northwestern Polytechnical University. His research interests cover compressed sensing and sparse models, cognitive machine learning, synthetic aperture radar (SAR) image interpretation, and multi-source automatic target recognition. E-mail: wenzaidao@nwpu.edu.cn

WANG Jia-Rui Ph.D. candidate at the School of Automation, Northwestern Polytechnical University. Her research interests cover disentangled representation learning, SAR image processing, and causal reasoning. E-mail: wangjiarui_wyy163@163.com

WANG Xiao-Xu Professor at the School of Automation, Northwestern Polytechnical University. His research interests cover inertial devices and inertial navigation, SAR image interpretation, and cooperative sensing. Corresponding author of this paper. E-mail: woyaofly1982@163.com

PAN Quan Professor at the School of Automation, Northwestern Polytechnical University, and director of the Key Laboratory of Information Fusion Technology, Ministry of Education. His research interests cover information fusion theory and applications, target tracking and recognition, and spectral imaging and image processing. E-mail: quanpan@nwpu.edu.cn

• Abstract: In the era of big data, deep learning, known for its efficient and autonomous implicit feature extraction, has set off the new wave of artificial intelligence. Yet the uninterpretable, black-box "shortcut learning" behavior behind it has become the key bottleneck restricting its further development. By exploring the complexity of the physical mechanisms and logical relations embedded in big data, disentangled representation learning disentangles, from the perspective of data generation, the multi-level, multi-scale latent generative factors inside the data, driving deep network models to perceive data autonomously and intelligently the way humans do. It has gradually become an important research direction in the new generation of complexity-based interpretable deep learning, with major theoretical significance and application value. This paper systematically reviews the progress of disentangled representation learning, categorizes and explains its key techniques and representative methods, analyzes and summarizes the scenarios to which existing algorithms apply together with visual demonstrations of their experimental performance, and finally points out future trends and directions worth studying.
• Fig. 1 Human's hierarchical intelligent perception of a traffic scene

Fig. 2 Examples of "shortcut learning" in deep neural networks[21]

Fig. 3 Taxonomy of decision rules[21]

Fig. 4 Illustration of the retinal transformation[60]

Fig. 5 The AIR framework[64]

Fig. 6 Deep ladder network models

Fig. 7 Latent tree variational autoencoder[74]

Fig. 8 The RCN model[73]

Fig. 9 Samples from the remote sensing ship image group dataset

Fig. 10 Network architecture of the GSL model[78] applied to the remote sensing ship image group dataset

Fig. 11 An example of human imagination and generalization ability[97]

Fig. 12 Stacked Capsule Autoencoder (SCAE) architecture[91]

Fig. 13 Framework of ordering-grounded amodal completion for multi-object scenes[97]

Fig. 14 Disentanglement results of AAE[48] on the MNIST[100] and SVHN[101] digit datasets. Each row shows how the reconstruction changes as the category latent varies with the style latent fixed; each column shows how it changes as the style latent varies with the category latent fixed

Fig. 15 Disentanglement performance of Factor-VAE[51] on the 3D Chairs[104] and 3D Faces[105] datasets. Each row shows how the reconstruction changes when only the latent labeled on the left varies

Fig. 16 Video object detection and tracking results of SQAIR[66]; bounding boxes of different colors mark the different objects detected and tracked during the network's recursion

Fig. 17 Scene-text parsing results of RCN[73]. The yellow outlines in the left image show the character segmentations; on the right, the first column is the occluded input digit and the second column the predicted occlusion mask

Fig. 18 Image attribute-transfer results of GSL[78]

Fig. 19 Clustering results of the algorithm proposed in [74]

Fig. 20 Human joint-action recognition and part-appearance swapping results of the algorithm proposed in [83]

Fig. 21 Generation results of the algorithm proposed in [97] after reorganizing object positions and occlusion order in a natural scene according to human preference

Fig. 22 VQA results of the method proposed in [98] on the CLEVR[131] dataset

Table 1 Comparison of unstructured-representation prior inductive-bias methods

Work | Regularizer | Advantages | Disadvantages
β-VAE[46] | $-\beta D_{\mathrm{KL}}\left( q_\phi(\boldsymbol{z}|\boldsymbol{x})\,\|\,p(\boldsymbol{z}) \right)$ | A large $\beta$ pushes the learned posterior toward the independent statistics of the prior, improving disentanglement. | A large $\beta$ also limits the network's representational capacity, seen directly as degraded reconstruction; the two are hard to balance.
Understanding disentangling in β-VAE[47] | $-\gamma\left| D_{\mathrm{KL}}\left( q(\boldsymbol{z}|\boldsymbol{x})\,\|\,p(\boldsymbol{z}) \right) - C \right|$ | Analyzes β-VAE from the information-bottleneck perspective; gradually increasing the latent information capacity $C$ during training partially improves the trade-off between representation and disentanglement. | The latents still lack explicit physical semantics, and $C$ is an added hyperparameter whose growth schedule must be designed by hand.
Joint-VAE[53] | $-\gamma\left| D_{\mathrm{KL}}\left( q_\phi(\boldsymbol{z}|\boldsymbol{x})\,\|\,p(\boldsymbol{z}) \right) - C_{\boldsymbol{z}} \right| - \gamma\left| D_{\mathrm{KL}}\left( q_\phi(\boldsymbol{c}|\boldsymbol{x})\,\|\,p(\boldsymbol{c}) \right) - C_{\boldsymbol{c}} \right|$ | Uses the Concrete distribution[54] to disentangle discrete latent variables. | The latents lack explicit physical semantics.
AAE[48] | $D_{\mathrm{JS}}\left[ E_\phi(\boldsymbol{z})\,\|\,p(\boldsymbol{z}) \right]$ | An adversarial network measures the similarity between the aggregated posterior and the prior, giving the latents a larger and more expressive space. | Inherits adversarial-training problems such as saddle points[50].
DIP-VAE[49] | $-\lambda_{od}\sum\nolimits_{i \ne j}\left[ \mathrm{Cov}_{q_\phi(\boldsymbol{z})}[\boldsymbol{z}] \right]_{ij}^2 - \lambda_d\sum\nolimits_i\left( \left[ \mathrm{Cov}_{q_\phi(\boldsymbol{z})}[\boldsymbol{z}] \right]_{ii} - 1 \right)^2$ | Replaces the adversarial network of AAE[48] with a simpler moment-matching penalty that is cheaper to compute. | Only applies to Gaussian latents, and constrains neither the mean nor higher-order moments, so its scope is limited.
Factor-VAE[51] | $D_{\mathrm{JS}}(q(\boldsymbol{z})\,\|\,\prod\nolimits_{i=1}^d q(z_i))$ | An adversarial network directly encourages the aggregated posterior $q(\boldsymbol{z})$ to factorize, further improving the trade-off between strong representation and strong disentanglement. | Inherits adversarial-training problems such as saddle points[50].
RF-VAE[56] | $D_{\mathrm{JS}}(q(\boldsymbol{r} \circ \boldsymbol{z})\,\|\,\prod\nolimits_{i=1}^d q(r_i z_i))$ | Introduces relevance indicators $\boldsymbol{r}$ so that disentanglement is not enforced among irrelevant latents. | $\boldsymbol{r}$ must itself be learned, adding training complexity.
β-TCVAE[52] | $-\alpha I_q(\boldsymbol{x};\boldsymbol{z}) - \beta D_{\mathrm{KL}}\left( q(\boldsymbol{z})\,\|\,\prod\nolimits_{i=1}^d q(z_i) \right) - \gamma\sum\nolimits_j D_{\mathrm{KL}}(q(z_j)\,\|\,p(z_j))$ | Proves the importance of the total-correlation term $D_{\mathrm{KL}}(q(\boldsymbol{z})\,\|\,\prod\nolimits_{i=1}^d q(z_i))$ and assigns each regularizer its own weight, giving a more expressive objective. | Introduces more hyperparameters that must be tuned by hand.
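The β-VAE row of Table 1 has a simple closed form when the posterior is a diagonal Gaussian and the prior is standard normal. A minimal numerical sketch (the function name, shapes, and the default β = 4 are illustrative choices, not from the paper):

```python
import numpy as np

def beta_vae_kl(mu, log_var, beta=4.0):
    """beta-weighted KL(q(z|x) || N(0, I)) for a diagonal Gaussian
    posterior q(z|x) = N(mu, diag(exp(log_var))), in closed form."""
    kl_per_dim = 0.5 * (np.exp(log_var) + mu**2 - 1.0 - log_var)
    return beta * kl_per_dim.sum(axis=-1)

# A posterior equal to the prior incurs zero penalty:
mu = np.zeros((1, 10)); log_var = np.zeros((1, 10))
print(beta_vae_kl(mu, log_var))  # → [0.]
```

Raising `beta` above 1 strengthens this pull toward the factorized prior, which is exactly the disentanglement/reconstruction trade-off the table describes.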

Table 2 Comparison of methods based on different inductive biases

Inductive bias | Model | Brief description | Scope | Datasets
Unstructured representation priors | β-VAE[46]; InfoGAN[55]; [47]; Joint-VAE[53]; AAE[48]; DIP-VAE[49]; Factor-VAE[51]; RF-VAE[56]; β-TCVAE[52] | Adding the prior regularizers of Table 1 to the optimization gives the learned latents some degree of disentanglement, but without explicit physical-semantic constraints the network need not disentangle the way humans do, so these methods are generally used on simple, highly regular data. | Simple datasets whose disentangled factors are clearly separable, e.g. face and digit datasets. | MNIST[100]; SVHN[101]; CelebA[102]; 2D Shapes[103]; 3D Chairs[104]; dSprites[103]; 3D Faces[105]
Structured model priors: sequential deep recurrent networks | DRAW[62]; AIR[64]; SQAIR[66] | A sequential recurrent architecture repeatedly folds historical state features into each decision, enabling e.g. detection and tracking in simple scenes. | Multi-step decision tasks that need associative memory. | 3D Scenes[64]; Multi-MNIST[64]; dSprites[103]; Moving-MNIST[66]; Omniglot[106]; pedestrian CCTV data[107]
Structured model priors: hierarchical deep ladder networks | VLAE[70]; [71]; HFVAE[72] | Ladder networks mimic humans' shallow-to-deep hierarchical cognition, so the latents at each layer carry different meanings; usable for tasks such as clustering. | Shallow-to-deep attribute mining on simple datasets. | MNIST[100]; CelebA[102]; SVHN[101]; dSprites[103]
Structured model priors: tree networks | RCN[73]; LTVAE[74] | Tree networks mimic the lateral interaction among high-level neurons, disentangling low-level features while letting high-level semantics interact; usable for clustering and scene-text recognition. | Tasks where low-level features are disentangled and shared while high-level features couple and interact. | CAPTCHA[108]; ICDAR-13 Robust Reading[108]; MNIST[100]; HHAR[74]; Reuters[109]; STL-10[74]
Physical knowledge priors: correlation within grouped data | MLVAE[75]; [77]; GSL[78]; [81]; [82]; [83]; [85]; [86] | Disentangles the shared factors of grouped data by swapping or sharing latents, constraining mutual-information relevance, cycle regression, and so on; the resulting factor representations can then be used alone for classification, segmentation, and attribute-transfer data generation. | Mining the shared, effective attributes of grouped data. | MNIST[100]; RaFD[110]; Fonts[78]; CelebA[102]; Colored-MNIST[?]; dSprites[103]; MS-Celeb-1M[111]; CUB birds[112]; ShapeNet[113]; iLab-20M[114]; 3D Shapes[81]; IAM[115]; PKU Vehicle ID[116]; Sentinel-2[117]; Norb[118]; BBC Pose dataset[119]; NTU[120]; KTH[121]; Deep Fashion[122]; Cat Head[123]; Human3.6M[124]; Penn Action[125]; 3D cars[126]
Physical knowledge priors: object-based spatial composition | MixNMatch[88] | Combines component-wise, hierarchical generation to disentangle background, pose, texture, and shape in single-object scenes. | Attribute-transfer data generation for single-object scenes. | CUB birds[112]; Stanford Dogs[127]; Stanford Cars[126]
Physical knowledge priors: object-based spatial composition | [83] | Models the compositional relations among the parts of a single object. | Data generation such as swapping specific human body parts or facial expressions. | Cat Head[123]; Human3.6M[124]; Penn Action[128]
Physical knowledge priors: object-based spatial composition | SCAE[91] | Proposes the capsule-network idea, modeling the compositional relations among multiple objects and their parts. | Object and part mining on simple datasets. | MNIST[100]; SVHN[101]; CIFAR10
Physical knowledge priors: object-based spatial composition | TAGGER[87]; IODINE[94]; MONET[95] | Disentangles multi-object scenes one object at a time. | Autonomous object interpretation in simple multi-object scenes. | Shapes[129]; Textured MNIST[87]; CLEVR[131]; dSprites[103]; Tetris[94]; Objects Room[95]
Physical knowledge priors: object-based spatial composition | [97] | Introduces an object spatial-logic tree to disentangle the occlusion relations of complex multi-object scenes; usable for de-occlusion. | De-occlusion of a few objects in complex natural scenes. | KINS[130]; COCOA[113]
Physical knowledge priors: object-based spatial composition | [98] | Mines an object's 3D intrinsic features as invariant attributes to handle large viewpoint and scale differences; promising for detection, recognition, VQA, and other high-level scene understanding. | High-level scene understanding on simple datasets. | CLEVR[131]
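Among the regularizers cataloged in Table 1, the DIP-VAE moment-matching penalty is the easiest to reproduce without an adversarial network. A minimal sketch of that idea on a batch of latent samples (the function name and default weights are illustrative assumptions, not values from the paper):

```python
import numpy as np

def dip_vae_penalty(z, lambda_od=10.0, lambda_d=5.0):
    """DIP-VAE-style moment penalty on latent samples z of shape (n, d):
    drive off-diagonal entries of Cov[z] to 0 and diagonal entries to 1,
    i.e. push the aggregated posterior toward uncorrelated unit variance."""
    cov = np.cov(z, rowvar=False)           # (d, d) sample covariance
    off = cov - np.diag(np.diag(cov))       # off-diagonal part only
    return (lambda_od * np.sum(off ** 2)
            + lambda_d * np.sum((np.diag(cov) - 1.0) ** 2))
```

A batch whose sample covariance is already the identity incurs (numerically) zero penalty, while correlated or badly scaled latents are penalized, which is the decorrelation behavior the DIP-VAE row describes.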
• [1] Duan Yan-Jie, Lv Yi-Sheng, Zhang Jie, Zhao Xue-Liang, Wang Fei-Yue. Deep learning for control: the state of the art and prospects. Acta Automatica Sinica, 2016, 42(5): 643−654
[2] Wang Xiao-Feng, Yang Ya-Dong. Research on structure model of general intelligent system based on ecological evolution. Acta Automatica Sinica, 2020, 46(5): 1017−1030
[3] Amizadeh S, Palangi H, Polozov A, Huang Y, Koishida K. Neuro-symbolic visual reasoning: disentangling "visual" from "reasoning". In: Proceedings of the 37th International Conference on Machine Learning. Online: ICML, 2020. 279−290
[4] Adel T, Zhao H, Turner R E. Continual learning with adaptive weights (CLAW). In: Proceedings of the 8th International Conference on Learning Representations. Online: ICLR, 2020
[5] Hinton G E, Salakhutdinov R R. Reducing the dimensionality of data with neural networks. Science, 2006, 313(5786): 504−507 doi: 10.1126/science.1127647
[6] Lee G, Li H. Modeling code-switch languages using bilingual parallel corpus. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, 2020. 860−870
[7] Chen X. Simulation of English speech emotion recognition based on transfer learning and CNN neural network. Journal of Intelligent and Fuzzy Systems, 2021, 40(2): 2349−2360 doi: 10.3233/JIFS-189231
[8] Lü Y, Lin H, Wu P, Chen Y. Feature compensation based on independent noise estimation for robust speech recognition. EURASIP Journal on Audio, Speech, and Music Processing, 2021, 2021(1): 1−9 doi: 10.1186/s13636-020-00191-3
[9] Torfi A, Shirvani R A, Keneshloo Y, Tavaf N, Fox E A. Natural language processing advancements by deep learning: a survey. arXiv: 2003.01200, 2020
[10] Stoll S, Camgoz N C, Hadfield S, Bowden R. Text2Sign: towards sign language production using neural machine translation and generative adversarial networks. International Journal of Computer Vision, 2020, 128(4): 891−908 doi: 10.1007/s11263-019-01281-2
[11] He P, Liu X, Gao J, Chen W. DeBERTa: decoding-enhanced BERT with disentangled attention. In: Proceedings of the 9th International Conference on Learning Representations. Online: ICLR, 2021
[12] Shi Y, Yu X, Sohn K, Chandraker M, Jain A K. Towards universal representation learning for deep face recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 6816−6825
[13] Ni T, Gu X, Zhang C, Wang W, Fan Y. Multi-task deep metric learning with boundary discriminative information for cross-age face verification. Journal of Grid Computing, 2020, 18(2): 197−210 doi: 10.1007/s10723-019-09495-x
[14] Shi X, Yang C, Xia X, Chai X. Deep cross-species feature learning for animal face recognition via residual interspecies equivariant network. In: Proceedings of the 16th European Conference on Computer Vision. Online: ECCV, 2020. 667−682
[15] Chen J, Lei B, Song Q, Ying H, Wu J. A hierarchical graph network for 3D object detection on point clouds. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 389−398
[16] Jiang Hong-Yi, Wang Yong-Juan, Kang Jin-Yu. A survey of object detection models and its optimization methods. Acta Automatica Sinica, 2021, 47(6): 1232−1255
[17] Xu Z, Hrustic E, Vivet D. CenterNet heatmap propagation for real-time video object detection. In: Proceedings of the 16th European Conference on Computer Vision. Online: ECCV, 2020. 220−234
[18] Zhang D, Tian H, Han J. Few-cost salient object detection with adversarial-paced learning. arXiv: 2104.01928, 2021
[19] Zhang Hui, Wang Kun-Feng, Wang Fei-Yue. Advances and perspectives on applications of deep learning in visual object detection. Acta Automatica Sinica, 2017, 43(8): 1289−1305
[20] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature, 2015, 521(7553): 436−444 doi: 10.1038/nature14539
[21] Geirhos R, Jacobsen J H, Michaelis C, Zemel R, Brendel W, Bethge M, et al. Shortcut learning in deep neural networks. Nature Machine Intelligence, 2020, 2(11): 665−673 doi: 10.1038/s42256-020-00257-z
[22] Minderer M, Bachem O, Houlsby N, Tschannen M. Automatic shortcut removal for self-supervised representation learning. In: Proceedings of the 37th International Conference on Machine Learning. Online: ICML, 2020. 6927−6937
[23] Ran X, Xu M, Mei L, Xu Q, Liu Q. Detecting out-of-distribution samples via variational auto-encoder with reliable uncertainty estimation. arXiv: 2007.08128v1, 2020
[24] Charakorn R, Thawornwattana Y, Itthipuripat S, Pawlowski N, Manoonpong P, Dilokthanakul N. An explicit local and global representation disentanglement framework with applications in deep clustering and unsupervised object detection. arXiv: 2001.08957, 2020
[25] Zhang B, Zhu J, Su H. Toward the third generation of artificial intelligence. Scientia Sinica Informationis, 2020, 50: 1281−1302 doi: 10.1360/SSI-2020-0204
[26] Lake B M, Ullman T D, Tenenbaum J B, Gershman S J. Building machines that learn and think like people. Behavioral and Brain Sciences, 2017, 40: e253 doi: 10.1017/S0140525X16001837
[27] Geirhos R, Meding K, Wichmann F A. Beyond accuracy: quantifying trial-by-trial behaviour of CNNs and humans by measuring error consistency. arXiv: 2006.16736v1, 2020
[28] Regazzoni C S, Marcenaro L, Campo D, Rinner B. Multisensorial generative and descriptive self-awareness models for autonomous systems. Proceedings of the IEEE, 2020, 108(7): 987−1010 doi: 10.1109/JPROC.2020.2986602
[29] Wang T, Huang J, Zhang H, Sun Q. Visual commonsense R-CNN. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 10760−10770
[30] Wang T, Huang J, Zhang H, Sun Q. Visual commonsense representation learning via causal inference. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 378−379
[31] Schölkopf B, Locatello F, Bauer S, Ke N R, Kalchbrenner N, Goyal A, et al. Toward causal representation learning. Proceedings of the IEEE, 2021, 109(5): 612−634 doi: 10.1109/JPROC.2021.3058954
[32] Locatello F, Tschannen M, Bauer S, Rätsch G, Schölkopf B, Bachem O. Disentangling factors of variation using few labels. In: Proceedings of the 8th International Conference on Learning Representations. Online: ICLR, 2020
[33] Dittadi A, Träuble F, Locatello F, Wüthrich M, Agrawal V, Winther O, et al. On the transfer of disentangled representations in realistic settings. In: Proceedings of the 9th International Conference on Learning Representations. Online: ICLR, 2021
[34] Tschannen M, Bachem O, Lucic M. Recent advances in autoencoder-based representation learning. arXiv: 1812.05069, 2018
[35] Shu R, Chen Y, Kumar A, Ermon S, Poole B. Weakly supervised disentanglement with guarantees. In: Proceedings of the 8th International Conference on Learning Representations. Online: ICLR, 2020
[36] Kim H, Shin S, Jang J H, Song K, Moon I C. Counterfactual fairness with disentangled causal effect variational autoencoder. In: Proceedings of the 35th AAAI Conference on Artificial Intelligence. Online: AAAI, 2021. 8128−8136
[37] Locatello F, Bauer S, Lucic M, Gelly S, Schölkopf B, Bachem O. Challenging common assumptions in the unsupervised learning of disentangled representations. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, California, USA: ICML, 2019. 4114−4124
[38] Bengio Y, Courville A, Vincent P. Representation learning: a review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 35(8): 1798−1828 doi: 10.1109/TPAMI.2013.50
[39] Sikka H. A deeper look at the unsupervised learning of disentangled representations in β-VAE from the perspective of core object recognition. arXiv: 2005.07114, 2020
[40] Locatello F, Poole B, Rätsch G, Schölkopf B, Bachem O, Tschannen M. Weakly-supervised disentanglement without compromises. In: Proceedings of the 37th International Conference on Machine Learning. Online: ICML, 2020. 6348−6359
[41] Zhai Zheng-Li, Liang Zhen-Ming, Zhou Wei, et al. Research overview of variational auto-encoders models. Computer Engineering and Applications, 2019, 55(3): 1−9 doi: 10.3778/j.issn.1002-8331.1810-0284
[42] Schmidhuber J. Learning factorial codes by predictability minimization. Neural Computation, 1992, 4(6): 863−879 doi: 10.1162/neco.1992.4.6.863
[43] Kingma D P, Welling M. Auto-encoding variational Bayes. arXiv: 1312.6114, 2013
[44] Goodfellow I J, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, et al. Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems. Montreal, Canada: NIPS, 2014. 2672−2680
[45] Lin Yi-Lun, Dai Xing-Yuan, Li Li, Wang Xiao, Wang Fei-Yue. The new frontier of AI research: generative adversarial networks. Acta Automatica Sinica, 2018, 44(5): 775−792
[46] Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, et al. β-VAE: learning basic visual concepts with a constrained variational framework. In: Proceedings of the 5th International Conference on Learning Representations. Toulon, France: ICLR, 2017
[47] Burgess C P, Higgins I, Pal A, Matthey L, Lerchner A. Understanding disentangling in β-VAE. arXiv: 1804.03599, 2018
[48] Makhzani A, Shlens J, Jaitly N, Goodfellow I. Adversarial autoencoders. arXiv: 1511.05644, 2015
[49] Kumar A, Sattigeri P, Balakrishnan A. Variational inference of disentangled latent concepts from unlabeled observations. In: Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: ICLR, 2018
[50] Arjovsky M, Bottou L. Towards principled methods for training generative adversarial networks. arXiv: 1701.04862, 2017
[51] Kim H, Mnih A. Disentangling by factorising. In: Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: ICML, 2018. 2649−2658
[52] Chen R T Q, Li X, Grosse R, Duvenaud D. Isolating sources of disentanglement in variational autoencoders. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: NIPS, 2018. 2615−2625
[53] Dupont E. Learning disentangled joint continuous and discrete representations. arXiv: 1804.00104, 2018
[54] Maddison C J, Mnih A, Teh Y W. The Concrete distribution: a continuous relaxation of discrete random variables. arXiv: 1611.00712, 2016
[55] Chen X, Duan Y, Houthooft R, et al. InfoGAN: interpretable representation learning by information maximizing generative adversarial nets. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: NIPS, 2016. 2172−2180
[56] Kim M, Wang Y, Sahu P, Pavlovic V. Relevance factor VAE: learning and identifying disentangled factors. arXiv: 1902.01568, 2019
[57] Grathwohl W, Wilson A. Disentangling space and time in video with hierarchical variational auto-encoders. arXiv: 1612.04440, 2016
[58] Kim M, Wang Y, Sahu P, Pavlovic V. Bayes-Factor-VAE: hierarchical Bayesian deep auto-encoder models for factor disentanglement. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul, Korea: IEEE, 2019. 2979−2987
[59] Montero M L, Ludwig C J H, Costa R P, Malhotra G, Bowers J S. The role of disentanglement in generalisation. In: Proceedings of the 9th International Conference on Learning Representations. Online: ICLR, 2021
[60] Larochelle H, Hinton G E. Learning to combine foveal glimpses with a third-order Boltzmann machine. Advances in Neural Information Processing Systems, 2010, 23: 1243−1251
[61] Mnih V, Heess N, Graves A, Kavukcuoglu K. Recurrent models of visual attention. In: Proceedings of the 28th International Conference on Neural Information Processing Systems. Montréal, Canada: NIPS, 2014. 2204−2212
[62] Gregor K, Danihelka I, Graves A, Rezende D J, Wierstra D. DRAW: a recurrent neural network for image generation. In: Proceedings of the 32nd International Conference on Machine Learning. Lille, France: ICML, 2015. 1462−1471
[63] Henderson J M, Hollingworth A. High-level scene perception. Annual Review of Psychology, 1999, 50(1): 243−271 doi: 10.1146/annurev.psych.50.1.243
[64] Eslami S M A, Heess N, Weber T, Tassa Y, Szepesvari D, Kavukcuoglu K, et al. Attend, infer, repeat: fast scene understanding with generative models. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: NIPS, 2016. 3233−3241
[65] Crawford E, Pineau J. Spatially invariant unsupervised object detection with convolutional neural networks. In: Proceedings of the 33rd AAAI Conference on Artificial Intelligence. Hawaii, USA: AAAI, 2019. 3412−3420
[66] Kosiorek A R, Kim H, Posner I, Teh Y W. Sequential attend, infer, repeat: generative modelling of moving objects. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montréal, Canada: NIPS, 2018. 8615−8625
[67] Santoro A, Raposo D, Barrett D G T, et al. A simple neural network module for relational reasoning. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA: NIPS, 2017. 4967−4976
[68] Massague A C, Zhang C, Feric Z, Camps O, Yu R. Learning disentangled representations of video with missing data. In: Proceedings of the 34th Conference on Neural Information Processing Systems. Vancouver, Canada: NIPS, 2020. 3625−3635
                      [69] S?nderby C K, Raiko T, Maal?e L, Snderby S K, Winther O. Ladder variational autoencoders. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, SPAIN: NIPS, 2016. 3745?3753
                      [70] Zhao S, Song J, Ermon S. Learning hierarchical features from generative models. In: Proceedings of the 34th International Conference on Machine Learning. Sydney, Australia: ICML, 2017. 4091–4099
                      [71] Willetts M, Roberts S, Holmes C. Disentangling to cluster: Gaussian mixture variational ladder autoencoders. arXiv: 1909.11501, 2019
                      [72] Esmaeili B, Wu H, Jain S, Bozkurt A, Siddharth N, Paige B, et al. Structured disentangled representations. In: Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics. Okinawa, Japan: AISTATS, 2019. 2525–2534
                      [73] George D, Lehrach W, Kansky K, Lázaro-Gredilla M, Laan C, Marthi B, et al. A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs. Science, 2017, 358(6368): eaag2612 doi: 10.1126/science.aag2612
                      [74] Li X, Chen Z, Poon L K M, Zhang N L. Learning latent superstructures in variational autoencoders for deep multidimensional clustering. In: Proceedings of the 7th International Conference on Learning Representations. New Orleans, USA: ICLR, 2019
                      [75] Bouchacourt D, Tomioka R, Nowozin S. Multi-level variational autoencoder: Learning disentangled representations from grouped observations. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence. New Orleans, USA: AAAI, 2018. 2095–2102
                      [76] Hwang H J, Kim G H, Hong S, Kim K E. Variational interaction information maximization for cross-domain disentanglement. In: Proceedings of the 34th Conference on Neural Information Processing Systems. Vancouver, Canada: NIPS, 2020. 22479–22491
                      [77] Szabo A, Hu Q, Portenier T, Zwicker M, Favaro P. Understanding degeneracies and ambiguities in attribute transfer. In: Proceedings of the 15th European Conference on Computer Vision. Munich, Germany: ECCV, 2018. 700–714
                      [78] Ge Y, Abu-El-Haija S, Xin G, Itti L. Zero-shot synthesis with group-supervised learning. In: Proceedings of the 9th International Conference on Learning Representations. Online: ICLR, 2021
                      [79] Lee S, Cho S, Im S. DRANet: Disentangling representation and adaptation networks for unsupervised cross-domain adaptation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online: IEEE, 2021. 15252–15261
                      [80] Zhu J Y, Park T, Isola P, Efros A A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the 2017 IEEE International Conference on Computer Vision. Venice, Italy: IEEE, 2017. 2223–2232
                      [81] Sanchez E H, Serrurier M, Ortner M. Learning disentangled representations via mutual information estimation. In: Proceedings of the European Conference on Computer Vision. Glasgow, UK: Springer, 2020. 205–221
                      [82] Esser P, Haux J, Ommer B. Unsupervised robust disentangling of latent characteristics for image synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. California, USA: IEEE, 2019. 2699–2709
                      [83] Lorenz D, Bereska L, Milbich T, Ommer B. Unsupervised part-based disentangling of object shape and appearance. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. California, USA: IEEE, 2019. 10955–10964
                      [84] Liu S, Zhang L, Yang X, Su H, Zhu J. Unsupervised part segmentation through disentangling appearance and shape. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online: IEEE, 2021. 8355–8364
                      [85] Dundar A, Shih K, Garg A, Pottorff R, Catanzaro B. Unsupervised disentanglement of pose, appearance and background from images and videos. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, PP(99): 1–1
                      [86] Vowels M J, Camgoz N C, Bowden R. Gated variational autoencoders: Incorporating weak supervision to encourage disentanglement. In: Proceedings of the 15th IEEE International Conference on Automatic Face and Gesture Recognition. Buenos Aires, Argentina: IEEE, 2020. 125–132
                      [87] Greff K, Rasmus A, Berglund M, Hao T H, Schmidhuber J, Valpola H. Tagger: Deep unsupervised perceptual grouping. In: Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona, Spain: NIPS, 2016. 4491–4499
                      [88] Li Y, Singh K K, Ojha U, Lee Y J. MixNMatch: Multifactor disentanglement and encoding for conditional image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle, WA, USA: IEEE, 2020. 8039–8048
                      [89] Singh K K, Ojha U, Lee Y J. FineGAN: Unsupervised hierarchical disentanglement for fine-grained object generation and discovery. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. California, USA: IEEE, 2019. 6490–6499
                      [90] Ojha U, Singh K K, Lee Y J. Generating furry cars: Disentangling object shape & appearance across multiple domains. In: Proceedings of the 9th International Conference on Learning Representations. Online: ICLR, 2021
                      [91] Kosiorek A R, Sabour S, Teh Y W, Ommer B. Stacked capsule autoencoders. In: Proceedings of the 33rd International Conference on Neural Information Processing Systems. Vancouver, Canada: NIPS, 2019. 15486–15496
                      [92] Lee J, Lee Y, Kim J, Kosiorek A, Teh Y W. Set transformer: A framework for attention-based permutation-invariant neural networks. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: ICML, 2019. 3744–3753
                      [93] Yang M, Liu F, Chen Z, Shen X, Hao J, Wang J. CausalVAE: Disentangled representation learning via neural structural causal models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online: IEEE, 2021. 9593–9602
                      [94] Greff K, Kaufman R L, Kabra R, Watters N, Burgess C, Zoran D, et al. Multi-object representation learning with iterative variational inference. In: Proceedings of the 36th International Conference on Machine Learning. Long Beach, USA: ICML, 2019. 2424–2433
                      [95] Burgess C P, Matthey L, Watters N, Kabra R, Higgins I, Botvinick M, et al. MONet: Unsupervised scene decomposition and representation. arXiv: 1901.11390, 2019
                      [96] Marino J, Yue Y, Mandt S. Iterative amortized inference. In: Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: ICML, 2018. 3403–3412
                      [97] Zhan X, Pan X, Dai B, Liu Z, Chen C L. Self-supervised scene de-occlusion. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. WA, USA: IEEE, 2020. 3784–3792
                      [98] Prabhudesai M, Lal S, Patil D, Tung H Y, Harley A W, Fragkiadaki K. Disentangling 3D prototypical networks for few-shot concept learning. arXiv: 2011.03367, 2020
                      [99] Eastwood C, Williams C K I. A framework for the quantitative evaluation of disentangled representations. In: Proceedings of the 6th International Conference on Learning Representations. Vancouver, Canada: ICLR, 2018
                      [100] LeCun Y, Bottou L, Bengio Y, Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998, 86(11): 2278–2324 doi: 10.1109/5.726791
                      [101] Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng A Y. Reading digits in natural images with unsupervised feature learning. In: Proceedings of Advances in Neural Information Processing Systems. Granada, Spain: NIPS, 2011. 1–9
                      [102] Liu Z, Luo P, Wang X, Tang X. Deep learning face attributes in the wild. In: Proceedings of the IEEE International Conference on Computer Vision. Santiago, Chile: ICCV, 2015. 3730–3738
                      [103] Matthey L, Higgins I, Hassabis D, Lerchner A. dSprites: Disentanglement testing sprites dataset [Online], available: https://github.com/deepmind/dsprites-dataset, 2017
                      [104] Aubry M, Maturana D, Efros A A, Russell B C, Sivic J. Seeing 3D chairs: Exemplar part-based 2D-3D alignment using a large dataset of CAD models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Columbus, OH, USA: IEEE, 2014. 3762–3769
                      [105] Paysan P, Knothe R, Amberg B, Romdhani S, Vetter T. A 3D face model for pose and illumination invariant face recognition. In: Proceedings of the 6th IEEE International Conference on Advanced Video and Signal Based Surveillance. Genova, Italy: IEEE, 2009. 296–301
                      [106] Lake B M, Salakhutdinov R, Tenenbaum J B. Human-level concept learning through probabilistic program induction. Science, 2015, 350(6266): 1332–1338 doi: 10.1126/science.aab3050
                      [107] Ristani E, Solera F, Zou R, Cucchiara R, Tomasi C. Performance measures and a data set for multi-target, multi-camera tracking. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: ECCV, 2016. 17–35
                      [108] Karatzas D, Shafait F, Uchida S, Tang A, Young R. ICDAR 2013 robust reading competition. In: Proceedings of the 12th International Conference on Document Analysis and Recognition. Washington, DC, USA: IEEE, 2013. 1484–1493
                      [109] Xie J, Girshick R, Farhadi A. Unsupervised deep embedding for clustering analysis. In: Proceedings of the 33rd International Conference on Machine Learning. New York City, NY, USA: ICML, 2016. 478–487
                      [110] Langner O, Dotsch R, Bijlstra G, Wigboldus D H J, Hawk S T, Knippenberg A V. Presentation and validation of the Radboud Faces Database. Cognition and Emotion, 2010, 24(8): 1377–1388 doi: 10.1080/02699930903485076
                      [111] Guo Y, Zhang L, Hu Y, He X, Gao J. MS-Celeb-1M: A dataset and benchmark for large-scale face recognition. In: Proceedings of the 14th European Conference on Computer Vision. Amsterdam, The Netherlands: ECCV, 2016. 87–102
                      [112] Wah C, Branson S, Welinder P, Perona P, Belongie S. The Caltech-UCSD Birds-200-2011 dataset [Online], available: http://www.vision.caltech.edu/visipedia/CUB-200-2011.html, 2011
                      [113] Zhu Y, Tian Y, Metaxas D, Dollár P. Semantic amodal segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. 1464–1472
                      [114] Borji A, Izadi S, Itti L. iLab-20M: A large-scale controlled object dataset to investigate deep learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 2221–2230
                      [115] Marti U V, Bunke H. The IAM-database: An English sentence database for offline handwriting recognition. International Journal on Document Analysis and Recognition, 2002, 5(1): 39–46 doi: 10.1007/s100320200071
                      [116] Liu H, Tian Y, Yang Y, Lu P, Huang T. Deep relative distance learning: Tell the difference between similar vehicles. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 2167–2175
                      [117] Drusch M, Bello U D, Carlier S, Colin O, Fernandez V, Gascon F, et al. Sentinel-2 optical high resolution mission for GMES land operational services. Remote Sensing of Environment, 2012, 120: 25–36 doi: 10.1016/j.rse.2011.11.026
                      [118] LeCun Y, Huang F J, Bottou L. Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Washington, DC, USA: IEEE, 2004. 97–104
                      [119] Charles J, Pfister T, Everingham M, Zisserman A. Automatic and efficient human pose estimation for sign language videos. International Journal of Computer Vision, 2014, 110(1): 70–90 doi: 10.1007/s11263-013-0672-6
                      [120] Shahroudy A, Liu J, Ng T T, Wang G. NTU RGB+D: A large scale dataset for 3D human activity analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 1010–1019
                      [121] Schuldt C, Laptev I, Caputo B. Recognizing human actions: A local SVM approach. In: Proceedings of the 17th International Conference on Pattern Recognition. Cambridge, UK: IEEE, 2004. 32–36
                      [122] Liu Z, Luo P, Qiu S, Wang X, Tang X. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Las Vegas, USA: IEEE, 2016. 1096–1104
                      [123] Zhang W, Sun J, Tang X. Cat head detection – How to effectively exploit shape and texture features. In: Proceedings of the 10th European Conference on Computer Vision. Marseille, France: ECCV, 2008. 802–816
                      [124] Ionescu C, Papava D, Olaru V, Sminchisescu C. Human3.6M: Large scale datasets and predictive methods for 3D human sensing in natural environments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2013, 36(7): 1325–1339
                      [125] Zhang W, Zhu M, Derpanis K G. From actemes to action: A strongly-supervised representation for detailed action understanding. In: Proceedings of the IEEE International Conference on Computer Vision. Sydney, Australia: ICCV, 2013. 2248–2255
                      [126] Krause J, Stark M, Deng J, Li F F. 3D object representations for fine-grained categorization. In: Proceedings of the IEEE International Conference on Computer Vision. Sydney, Australia: ICCV, 2013. 554–561
                      [127] Khosla A, Jayadevaprakash N, Yao B, Li F F. Novel dataset for fine-grained image categorization: Stanford dogs. In: Proceedings of the 1st Workshop on Fine-Grained Visual Categorization. Colorado Springs, USA: IEEE, 2011. 1–2
                      [128] Zhang W, Zhu M, Derpanis K G. From actemes to action: A strongly-supervised representation for detailed action understanding. In: Proceedings of the IEEE International Conference on Computer Vision. Sydney, Australia: ICCV, 2013. 2248–2255
                      [129] Reichert D P, Series P, Storkey A J. A hierarchical generative model of recurrent object-based attention in the visual cortex. In: Proceedings of the International Conference on Artificial Neural Networks. Espoo, Finland: ICANN, 2011. 18–25
                      [130] Qi L, Jiang L, Liu S, Shen X, Jia J. Amodal instance segmentation with KINS dataset. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach, CA, USA: IEEE, 2019. 3014–3023
                      [131] Johnson J, Hariharan B, Van Der Maaten L, Li F F, Girshick R. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Honolulu, HI, USA: IEEE, 2017. 2901–2910
                      [132] Wu Z, Lischinski D, Shechtman E. StyleSpace analysis: Disentangled controls for StyleGAN image generation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Online: IEEE, 2021. 12863–12872
                    Publication history
                    • Received: 2021-01-28
                    • Accepted: 2021-06-18
                    • Published online: 2021-07-26
