Cross-media semantic retrieval with deep canonical correlation analysis

WANG Shu; SHI Zhongzhi

doi:10.3969/j.issn.0253-2778.2018.04.008


Open Access JUSTC Original Paper


1. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190


Received Date: 01 June 2017
Revision Received Date: 14 July 2017
Publish Date: 30 April 2018


Abstract


Cross-media retrieval with canonical correlation analysis (CCA) maps the features of different media into a maximally correlated isomorphic subspace and compares the similarity of cross-media data in that subspace. However, CCA is a linear model and cannot adequately exploit the complex correlations between cross-media data. This paper improves the structure of traditional deep canonical correlation analysis (DCCA) and uses latent Dirichlet allocation (LDA) to discover the semantic information in the text data and learn a semantic mapping. Two methods are proposed: cross-media correlation learning with deep canonical correlation analysis (CMC-DCCA) and cross-media semantic correlation retrieval (CMSCR). Experiments on the Wikipedia text-image dataset show that the CMC-DCCA model mines the complex correlation between cross-media data better, and that CMSCR achieves better performance in cross-media retrieval.
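The retrieval step described in the abstract (project each medium into a shared subspace, then compare similarity there) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the projection matrices `W_TEXT` and `W_IMAGE` are hand-written stand-ins for the linear maps that CCA would learn, the feature vectors are toy values, cosine similarity is one common choice of similarity measure, and the helper names `project`, `cosine`, and `retrieve` are hypothetical.

```python
import math


def project(matrix, vec):
    """Apply a linear map (given as a list of rows) to a feature vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_text, images, w_text, w_image):
    """Rank image names by similarity to a text query in the shared subspace."""
    q = project(w_text, query_text)
    scored = [
        (cosine(q, project(w_image, feat)), name)
        for name, feat in images.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical projections standing in for learned CCA directions.
W_TEXT = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]  # 3-d text features -> 2-d subspace
W_IMAGE = [[0.5, 0.5], [0.5, -0.5]]          # 2-d image features -> 2-d subspace

# Toy image features, invented for this example.
IMAGES = {
    "img_a": [2.0, 0.0],
    "img_b": [1.0, -1.0],
    "img_c": [-1.0, 1.0],
}

ranking = retrieve([1.0, 0.2, 0.0], IMAGES, W_TEXT, W_IMAGE)
print(ranking)
```

In a deep variant such as DCCA, the two linear maps would be replaced by nonlinear networks, one per medium, while the ranking step in the shared subspace could stay the same.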




Volume 48, Issue 4, pages 322-330
