ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Original Paper

Cross-media semantic retrieval with deep canonical correlation analysis

DOI: https://doi.org/10.3969/j.issn.0253-2778.2018.04.008
  • Received Date: 01 June 2017
  • Rev Recd Date: 14 July 2017
  • Publish Date: 30 April 2018
  • Cross-media retrieval with canonical correlation analysis (CCA) maps features from different media into a common (isomorphic) subspace in which their correlation is maximized, and then compares the similarity between cross-media data within that subspace. However, CCA is a linear model and cannot adequately exploit the complex correlations between cross-media data. In this paper, the structure of traditional deep canonical correlation analysis (DCCA) is improved, and latent Dirichlet allocation (LDA) is used to discover semantic information in the text data and to learn a semantic mapping. On this basis, cross-media correlation learning with deep canonical correlation analysis (CMC-DCCA) and cross-media semantic correlation retrieval (CMSCR) are proposed. Experiments on the Wikipedia text-image dataset show that the CMC-DCCA model mines the complex correlations between cross-media data more effectively, and that CMSCR achieves better cross-media retrieval performance.
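The following minimal NumPy sketch makes the linear CCA baseline described in the abstract concrete. It assumes paired image and text feature matrices whose rows correspond.

The sketch works in three steps:
- It fits linear CCA by whitening each view and taking the SVD of the whitened cross-covariance.
- It projects both media into the shared subspace.
- It ranks database texts for an image query by cosine similarity in that subspace.

The function names (`fit_cca`, `retrieve`, `inv_sqrt`), the ridge term `reg`, and the synthetic data are illustrative assumptions, not the authors' code. The sum of canonical correlations (`total_corr`) is the quantity that DCCA (Andrew et al., 2013) maximizes on the outputs of two neural networks. The paper's CMC-DCCA modifies that deep architecture and adds an LDA-based semantic mapping for text; neither is reproduced here.

```python
import numpy as np

def inv_sqrt(cov):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return (vecs * vals ** -0.5) @ vecs.T

def fit_cca(X, Y, k, reg=1e-4):
    """Linear CCA on paired rows of X (n, dx) and Y (n, dy).

    Returns projections Wx (dx, k), Wy (dy, k), the top-k canonical
    correlations, and the feature means used for centering.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    # Covariance estimates; `reg` is an assumed small ridge that keeps them invertible.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    Sxx_is, Syy_is = inv_sqrt(Sxx), inv_sqrt(Syy)
    # The singular values of the whitened cross-covariance T are the canonical correlations.
    T = Sxx_is @ Sxy @ Syy_is
    U, s, Vt = np.linalg.svd(T)
    Wx = Sxx_is @ U[:, :k]
    Wy = Syy_is @ Vt[:k].T
    return Wx, Wy, s[:k], (mx, my)

def retrieve(query_x, Y_db, Wx, Wy, means):
    """Rank database texts for one image query by cosine similarity
    in the shared subspace; returns indices, most similar first."""
    mx, my = means
    q = (query_x - mx) @ Wx
    D = (Y_db - my) @ Wy
    sims = D @ q / (np.linalg.norm(D, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)

# Hypothetical synthetic data: both "media" are noisy linear views of a shared latent Z.
rng = np.random.default_rng(0)
n, latent = 200, 5
Z = rng.normal(size=(n, latent))
X = Z @ rng.normal(size=(latent, 12)) + 0.01 * rng.normal(size=(n, 12))  # "image" features
Y = Z @ rng.normal(size=(latent, 10)) + 0.01 * rng.normal(size=(n, 10))  # "text" features

Wx, Wy, corrs, means = fit_cca(X, Y, k=latent)
total_corr = corrs.sum()  # sum of the top-k canonical correlations (the quantity DCCA maximizes)
ranking = retrieve(X[0], Y, Wx, Wy, means)
print(np.round(corrs, 3), ranking[:5])
```

The two synthetic views share a latent factor, so the canonical correlations should come out close to 1. The paired text (index 0) should also rank first for the query `X[0]`. The linear projections `Wx` and `Wy` are the part that a nonlinear model such as DCCA replaces with learned networks.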
