ISSN 0253-2778

CN 34-1054/N

Open Access JUSTC Research Article

MOVIE: Mesh oriented video inpainting network

Funds: National Natural Science Foundation of China (61571413, 61632001).
Cite this:
https://doi.org/10.52396/JUST-2020-0022
  • Author Bio:

    Sen Liu received the B.S. degree in computer science from the Beijing University of Posts and Telecommunications, Beijing, China, in 2013. He is currently working towards the PhD degree at the School of Information Science and Technology, University of Science and Technology of China. His research interests include artificial intelligence, deep learning, video coding, computer vision and pattern recognition, and reinforcement learning.

    Zhizheng Zhang (S'19) received the B.S. degree in electronic information engineering from the University of Electronic Science and Technology of China, Chengdu, China, in 2016. He is currently pursuing the PhD degree at the University of Science and Technology of China, Hefei, China. His current research interests include reinforcement learning, few-shot learning, and intelligent media computing.

    Tao Yu received the B.S. degree in electronics and information engineering from Anhui University in 2018. He is currently pursuing the PhD degree with the Department of Electronic Engineering and Information Science, University of Science and Technology of China. His research interests include computer vision, image processing, and reinforcement learning.

  • Corresponding author: Zhibo Chen (M'01-SM'11) received the B.S. and PhD degrees from the Department of Electrical Engineering, Tsinghua University, in 1998 and 2003, respectively. He is now a professor at the University of Science and Technology of China. His research interests include image and video compression, visual quality of experience assessment, immersive media computing, and intelligent media computing. He has more than 100 publications and more than 50 granted EU and US patent applications. He is an IEEE senior member and Secretary (Chair-Elect) of the IEEE Visual Signal Processing and Communications Committee. He was TPC chair of IEEE PCS 2019 and an organization committee member of ICIP 2017 and ICME 2013, and has served as Track chair at IEEE ISCAS and Area chair at IEEE VCIP. E-mail: chenzhibo@ustc.edu.cn
  • Publish Date: 31 January 2021
  • Abstract: Video inpainting aims to fill holes across different frames using limited spatio-temporal context. Existing schemes still struggle to achieve precise spatio-temporal coherence, especially in hole areas, because they model motion trajectories inaccurately. In this paper, we introduce a flexible shape-adaptive mesh as the basic processing unit and mesh flow as the motion representation, which together describe complex motions in hole areas more precisely and efficiently. We propose a Mesh Oriented Video Inpainting nEtwork, dubbed MOVIE, which estimates mesh flows and then completes the hole region in the video. Specifically, we first design a mesh flow estimation module and a mesh flow completion module that estimate the mesh flow for visible content and for holes sequentially; this decouples flow estimation for visible and corrupted content and makes optimization easier. A hybrid loss function is further introduced to optimize flow estimation for the visible regions, the entire frames, and the inpainted regions, respectively. We then design a polishing network to correct the distortion that the mesh flow transformation introduces into the inpainted results. Extensive experiments show that MOVIE not only achieves a more than fourfold speed-up in completing the missing area, but also yields better inpainting quality on both quantitative and perceptual metrics.
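To make the idea of mesh flow as a motion representation concrete, the following is a minimal sketch, not the authors' implementation. It assumes a regular grid of mesh vertices (a simplification of the shape-adaptive mesh described in the abstract) and recovers the motion of any pixel by bilinear interpolation of the four surrounding vertex motions. The function name `mesh_flow_at` and its parameters are hypothetical.

```python
def mesh_flow_at(vertex_flow, x, y, cell_w, cell_h):
    """Bilinearly interpolate a motion vector at pixel (x, y).

    Assumption: vertex_flow[r][c] is the (dx, dy) motion of the mesh vertex
    located at pixel position (c * cell_w, r * cell_h) on a regular grid,
    and (x, y) lies inside the area covered by the grid.
    """
    rows, cols = len(vertex_flow), len(vertex_flow[0])
    # Index of the cell containing the pixel, clamped so that pixels on the
    # far border fall into the last cell.
    c = min(int(x // cell_w), cols - 2)
    r = min(int(y // cell_h), rows - 2)
    # Fractional position of the pixel inside its cell, in [0, 1].
    wx = x / cell_w - c
    wy = y / cell_h - r
    # Motion vectors of the four vertices that bound the cell.
    tl, tr = vertex_flow[r][c], vertex_flow[r][c + 1]
    bl, br = vertex_flow[r + 1][c], vertex_flow[r + 1][c + 1]
    # Blend along x on the top and bottom edges, then blend along y.
    return tuple(
        (1 - wy) * ((1 - wx) * tl[k] + wx * tr[k])
        + wy * ((1 - wx) * bl[k] + wx * br[k])
        for k in range(2)
    )


# A single 4x4-pixel cell whose four vertices move by different amounts.
vertex_flow = [
    [(0.0, 0.0), (4.0, 0.0)],
    [(0.0, 4.0), (4.0, 4.0)],
]
flow = mesh_flow_at(vertex_flow, 3, 2, 4, 4)
```

In this sketch, only the vertex motions are stored, so a frame is described by far fewer motion vectors than a per-pixel optical flow field; this is one plausible reading of the efficiency the abstract attributes to mesh flow.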