The JETCAS (IEEE Journal on Emerging and Selected Topics in Circuits and Systems) special issue on immersive video coding and transmission presents the latest developments in immersive video research. This post summarizes the papers in the issue that deal with coding and transmission of 360-degree video, one of the most popular types of immersive media.
360-degree Video Coding
To provide an excellent immersive experience, 360-degree videos require extremely high resolution and high frame rates (4K/8K at 60/90 fps). As a result, 360-degree video requires much higher bandwidth than conventional 2D video, and efficient compression technology is highly desirable for its storage and transmission.
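To put the bandwidth claim in numbers, the short sketch below computes the raw (uncompressed) bitrate of a video from its resolution and frame rate. It assumes 8-bit samples and 4:2:0 chroma subsampling (1.5 samples per pixel); the 7680x3840 equirectangular frame size is only an illustrative choice for "8K".

```python
def raw_bitrate_bps(width: int, height: int, fps: int,
                    bit_depth: int = 8, samples_per_pixel: float = 1.5) -> float:
    """Uncompressed bitrate in bits per second.

    samples_per_pixel = 1.5 corresponds to 4:2:0 chroma subsampling
    (one luma sample plus two quarter-resolution chroma samples per pixel).
    """
    return width * height * samples_per_pixel * bit_depth * fps


if __name__ == "__main__":
    # Illustrative 8K equirectangular frame at 60 fps (assumed dimensions).
    bps = raw_bitrate_bps(7680, 3840, 60)
    print(f"{bps / 1e9:.1f} Gbit/s uncompressed")
```

With these assumptions the result is roughly 21 Gbit/s before any compression, which is why coding efficiency matters so much for 360-degree content.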
The paper [1] proposes a hybrid equirectangular-cubemap projection that achieves more uniform sampling and reduces boundary artifacts across different faces. In addition, it proposes a set of coding tools that exploit the spherical continuity of 360-degree video. The proposed approach effectively reduces the BD-rate as well as the seam artifacts caused by discontinuous face edges and frame boundaries.
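Why equirectangular projection (ERP) samples the sphere non-uniformly can be checked numerically. The sketch below is a generic illustration, not code from [1]: it computes the solid angle covered by one ERP pixel in each row. Rows near the poles cover far less of the sphere than rows at the equator, so ERP oversamples the polar regions. The frame size is an arbitrary assumption.

```python
import math


def erp_pixel_solid_angle(row: int, width: int, height: int) -> float:
    """Solid angle (steradians) of one pixel in the given ERP row.

    Rows span latitude from +pi/2 (top edge of row 0) down to -pi/2.
    """
    lat_top = math.pi / 2 - row * math.pi / height
    lat_bottom = math.pi / 2 - (row + 1) * math.pi / height
    d_lon = 2 * math.pi / width
    # Area element on the unit sphere: d_lon * d(sin(latitude)).
    return d_lon * (math.sin(lat_top) - math.sin(lat_bottom))


def total_solid_angle(width: int, height: int) -> float:
    """Sum over all pixels; should equal 4*pi for the full sphere."""
    return sum(width * erp_pixel_solid_angle(r, width, height) for r in range(height))


if __name__ == "__main__":
    w, h = 3840, 1920  # assumed frame size
    equator = erp_pixel_solid_angle(h // 2, w, h)
    pole = erp_pixel_solid_angle(0, w, h)
    print(f"full sphere: {total_solid_angle(w, h):.6f} sr (4*pi = {4 * math.pi:.6f})")
    print(f"an equator pixel covers {equator / pole:.0f}x the area of a top-row pixel")
```

Weighted quality metrics such as WS-PSNR apply a cos(latitude) weight to ERP pixels for the same reason.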
In [2], the projection format is customized to the input video content. In particular, the hybrid angular cubemap (HAC) projection is used to adapt the sampling within each face, and an adaptive frame packing technique selects the arrangement of faces in a frame. To alleviate "face seam" artifacts in the rendered viewports, the neighboring relationship of samples and blocks in the spherical geometry is taken into account to improve intra/inter prediction and in-loop filtering.
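Cubemap-family projections map each viewing direction onto one of six cube faces; angular variants then warp the coordinates within a face to even out sampling. The sketch below is not the HAC mapping of [2], whose sampling is adapted per face to the content. Instead it uses the fixed equi-angular cubemap (EAC) curve, c' = (4/π)·atan(c), to show the general effect. The face labels and the (u, v) orientation are an arbitrary convention chosen for illustration.

```python
import math


def cube_face_coords(x: float, y: float, z: float):
    """Project a nonzero 3D direction onto the unit cube.

    Returns (face, u, v) with u, v in [-1, 1]. Face names and the
    choice of u/v axes are an illustrative convention only.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return ("+X" if x > 0 else "-X"), y / ax, z / ax
    if ay >= az:
        return ("+Y" if y > 0 else "-Y"), x / ay, z / ay
    return ("+Z" if z > 0 else "-Z"), x / az, y / az


def eac_forward(c: float) -> float:
    """Standard cube coordinate -> equi-angular coordinate (both in [-1, 1])."""
    return 4.0 / math.pi * math.atan(c)


def eac_inverse(c_eac: float) -> float:
    """Equi-angular coordinate -> standard cube coordinate."""
    return math.tan(math.pi / 4.0 * c_eac)


def angular_steps(coords):
    """Angle (radians) subtended between consecutive face coordinates."""
    angles = [math.atan(c) for c in coords]
    return [b - a for a, b in zip(angles, angles[1:])]


if __name__ == "__main__":
    n = 8
    grid = [-1 + 2 * k / n for k in range(n + 1)]  # uniform samples across one face
    cmp_steps = angular_steps(grid)                              # plain cubemap
    eac_steps = angular_steps([eac_inverse(c) for c in grid])    # equi-angular
    print("CMP step range (deg):", math.degrees(min(cmp_steps)), math.degrees(max(cmp_steps)))
    print("EAC step range (deg):", math.degrees(min(eac_steps)), math.degrees(max(eac_steps)))
```

Uniform samples on a plain cubemap face subtend larger angles at the face centre than at its edges, while the EAC-warped samples subtend equal angles; content-adaptive schemes such as HAC go a step further and tune this warping.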
To address the deformation of video content caused by mapping from the sphere to a 2D plane, the paper [3] proposes a new motion model based on a spherical coordinates transform. The proposed model is shown to improve motion estimation and compensation in panoramic video coding.
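A simplified illustration of the general idea behind a sphere-aware motion model: instead of shifting every pixel of a block by the same 2D vector in the ERP frame, the block-centre motion vector is reinterpreted as a rotation on the sphere, and that rotation is applied to each pixel. This is a sketch under stated assumptions (ERP input, rotation about the axis through the two block-centre positions), not the exact formulation of [3].

```python
import math


def erp_to_sphere(u, v, w, h):
    """ERP pixel position -> unit vector (longitude from u, latitude from v)."""
    lon = (u / w - 0.5) * 2 * math.pi
    lat = (0.5 - v / h) * math.pi
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))


def sphere_to_erp(p, w, h):
    """Unit vector -> ERP pixel position."""
    x, y, z = p
    lon = math.atan2(y, x)
    lat = math.asin(max(-1.0, min(1.0, z)))
    return ((lon / (2 * math.pi) + 0.5) * w, (0.5 - lat / math.pi) * h)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rotate(p, axis, angle):
    """Rodrigues' rotation of p about a unit-length axis."""
    c, s = math.cos(angle), math.sin(angle)
    kxp, kdp = cross(axis, p), dot(axis, p)
    return tuple(p[i] * c + kxp[i] * s + axis[i] * kdp * (1 - c) for i in range(3))


def spherical_mc_position(u, v, center, mv, w, h):
    """Reference position of pixel (u, v) when the block centred at `center` moves by `mv`.

    The motion vector becomes the rotation that carries the block centre
    to center + mv on the sphere; every pixel of the block gets that rotation.
    """
    s1 = erp_to_sphere(center[0], center[1], w, h)
    s2 = erp_to_sphere(center[0] + mv[0], center[1] + mv[1], w, h)
    axis = cross(s1, s2)
    norm = math.sqrt(dot(axis, axis))
    if norm < 1e-12:  # zero motion (antipodal centres are not handled here)
        return (u + mv[0], v + mv[1])
    axis = tuple(a / norm for a in axis)
    angle = math.atan2(norm, dot(s1, s2))
    return sphere_to_erp(rotate(erp_to_sphere(u, v, w, h), axis, angle), w, h)


if __name__ == "__main__":
    W, H = 3840, 1920  # assumed ERP frame size
    # The same horizontal motion vector applied at the equator and near the north pole.
    for center in [(1920, 960), (1920, 100)]:
        pixel = (center[0] + 16, center[1] + 16)
        ref = spherical_mc_position(*pixel, center, (20, 0), W, H)
        print(center, "->", tuple(round(c, 2) for c in ref),
              "translation would give", (pixel[0] + 20, pixel[1]))
```

At the equator the rotation reduces to a plain horizontal shift, whereas near the pole the same motion vector moves neighbouring pixels by different amounts, which a purely translational model cannot express.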
360-degree Video Transmission
Although advanced coding technologies can significantly reduce the bitrate of 360-degree video, delivering it is still challenging due to limited network resources and the constraints of end-user devices. Cost-effective delivery technology is therefore necessary for the wide adoption of VR/AR applications.
In [4], we propose a server-based adaptation framework for streaming 360-degree video over networks. The proposed method uses tiling-based viewport-adaptive streaming to reduce the network bandwidth required for 360-degree video, and the proposed tile selection algorithm effectively handles user head movements within each video segment.
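The paper [4] formulates tile selection as an optimization problem; the sketch below is not that algorithm but a common greedy heuristic that conveys the same trade-off. Each tile has a probability of being viewed during the segment (averaging over the segment is one simple, assumed way to account for head movement within it), every tile is offered at the same ladder of versions with bitrates and quality utilities, and the function repeatedly upgrades the tile with the largest expected-utility gain per extra bit until the bandwidth budget is used. All names and numbers are hypothetical.

```python
from typing import List, Sequence


def select_tile_versions(view_prob: Sequence[float], bitrates: Sequence[float],
                         utilities: Sequence[float], budget: float) -> List[int]:
    """Greedy per-tile version selection under a bandwidth budget.

    view_prob[i]: probability that tile i is in the viewport during the segment.
    bitrates[k], utilities[k]: bitrate and quality of version k (ascending).
    Returns the chosen version index for every tile.
    """
    levels = [0] * len(view_prob)
    spent = len(view_prob) * bitrates[0]  # every tile starts at the lowest version
    if spent > budget:
        raise ValueError("budget cannot cover the lowest version of every tile")
    while True:
        best, best_ratio = None, 0.0
        for i, k in enumerate(levels):
            if k + 1 >= len(bitrates):
                continue  # already at the top version
            extra = bitrates[k + 1] - bitrates[k]
            if spent + extra > budget:
                continue  # this upgrade does not fit
            ratio = view_prob[i] * (utilities[k + 1] - utilities[k]) / extra
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            return levels
        spent += bitrates[levels[best] + 1] - bitrates[levels[best]]
        levels[best] += 1


if __name__ == "__main__":
    probs = [0.6, 0.3, 0.1, 0.0]  # hypothetical viewing probabilities
    print(select_tile_versions(probs, bitrates=[1, 2, 4], utilities=[1, 3, 4], budget=8))
    # prints [2, 1, 0, 0]
```

Tiles that are unlikely to be viewed stay at the lowest version, so the whole sphere remains covered if the user turns unexpectedly.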
In tiling-based viewport-adaptive streaming, it is important to tile the video effectively. Conventionally, the video is divided into equal-sized tiles. The paper [5] addresses this issue by considering the visual attention map: the video is divided into non-overlapping tiles of variable size, taking the visual attention map into account.
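The exact tiling optimization of [5] is described in the paper. As a much simpler, hypothetical illustration of variable-size, non-overlapping tiling driven by attention, the sketch below recursively splits a region of a coarse attention grid into four until each tile's attention mass falls below a threshold, so high-attention areas end up covered by smaller tiles. The grid, the threshold, and the quadtree rule are all assumptions.

```python
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (row, col, height, width) in grid cells


def attention_tiles(att: List[List[float]], row: int, col: int, height: int, width: int,
                    threshold: float, min_size: int = 1) -> List[Tile]:
    """Quadtree split of a region of the attention grid into non-overlapping tiles."""
    mass = sum(att[r][c] for r in range(row, row + height) for c in range(col, col + width))
    # Stop when attention is low or the region cannot be halved any further.
    if mass <= threshold or height < 2 * min_size or width < 2 * min_size:
        return [(row, col, height, width)]
    h2, w2 = height // 2, width // 2
    tiles: List[Tile] = []
    for r0, hh in ((row, h2), (row + h2, height - h2)):
        for c0, ww in ((col, w2), (col + w2, width - w2)):
            tiles += attention_tiles(att, r0, c0, hh, ww, threshold, min_size)
    return tiles


if __name__ == "__main__":
    # Hypothetical 4x8 attention grid (ERP aspect ratio) with one salient cell.
    grid = [[0.0] * 8 for _ in range(4)]
    grid[0][0] = 1.0
    for tile in attention_tiles(grid, 0, 0, 4, 8, threshold=0.5):
        print(tile)
```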
Effective viewport adaptation requires accurate estimation of viewport positions. However, the large buffer size in HTTP Adaptive Streaming can severely degrade the accuracy of viewport position estimation. Borrowing the idea of scalable video coding, the two-tier system in [6] reduces the client buffer size to one segment duration. To achieve this, the whole video is encoded into a base tier, which is always delivered to the client, and multiple enhancement layers, each corresponding to a viewport position.
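The client-side logic of a two-tier scheme like [6] can be sketched in a few lines. The rules here are hypothetical: the base tier is prefetched up to a long buffer target, the enhancement tier is fetched at most about one segment ahead (so it uses the freshest viewport estimate), and each enhancement version is identified by the yaw angle of the viewport it covers. If an enhancement segment is missing, the base tier alone still covers the whole video.

```python
from typing import Sequence


def next_request(base_buffer_s: float, enh_buffer_s: float,
                 base_target_s: float, segment_s: float) -> str:
    """Decide what to download next (hypothetical scheduling rule)."""
    if base_buffer_s < base_target_s:
        return "base"          # keep the always-delivered base tier well buffered
    if enh_buffer_s < segment_s:
        return "enhancement"   # keep only about one segment of enhancement buffered
    return "idle"


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two yaw angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def pick_enhancement_version(predicted_yaw_deg: float,
                             version_yaws_deg: Sequence[float]) -> int:
    """Index of the enhancement version whose viewport is closest to the prediction."""
    return min(range(len(version_yaws_deg)),
               key=lambda i: angular_distance(predicted_yaw_deg, version_yaws_deg[i]))


if __name__ == "__main__":
    versions = [0, 90, 180, 270]  # hypothetical viewport centres (yaw, degrees)
    print(next_request(base_buffer_s=12, enh_buffer_s=0.4, base_target_s=10, segment_s=1))
    print(pick_enhancement_version(350, versions))
```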
In [7], the authors analyze the impact of end-to-end delay on tile-based viewport-adaptive streaming. They find that the gain over a viewport-independent approach drops to 8% for a delay of 1 second. To address this issue, the authors propose combining viewport prediction with a velocity-based QP distribution.
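The details of the velocity-based QP distribution are given in [7]; the sketch below is only a hypothetical rule in the same spirit. Tiles are described by the yaw of their centres, the viewport centre is extrapolated over the delay using the current head velocity, a full-quality margin around it widens with speed times delay to absorb prediction error, and the QP rises one step for every 45 degrees beyond that margin. Every constant is an assumption.

```python
import math
from typing import List, Sequence


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two yaw angles, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def tile_qps(tile_yaws: Sequence[float], yaw: float, velocity: float, delay: float,
             fov: float = 90.0, qp_base: int = 22, qp_step: int = 4,
             qp_max: int = 42, step_deg: float = 45.0) -> List[int]:
    """Assign a QP to each tile from the predicted viewport (hypothetical rule).

    yaw in degrees, velocity in degrees per second, delay in seconds.
    """
    predicted = yaw + velocity * delay                 # constant-velocity extrapolation
    margin = fov / 2 + 0.5 * abs(velocity) * delay     # assumed uncertainty margin
    qps = []
    for t in tile_yaws:
        excess = angular_distance(t, predicted) - margin
        steps = math.ceil(excess / step_deg) if excess > 0 else 0
        qps.append(min(qp_max, qp_base + qp_step * steps))
    return qps


if __name__ == "__main__":
    tiles = [45 * k for k in range(8)]  # 8 tiles around the horizon
    print(tile_qps(tiles, yaw=0, velocity=0, delay=1))   # [22, 22, 26, 30, 34, 30, 26, 22]
    print(tile_qps(tiles, yaw=0, velocity=60, delay=1))  # [22, 22, 22, 22, 26, 30, 30, 26]
```

When the head is turning, low QPs (higher quality) shift toward the direction of motion and spread over a wider area, which is the intuition behind tying the QP distribution to head velocity.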
To facilitate the delivery of 360-degree video over wireless networks, the paper [8] proposes a pseudo-analog transmission framework called OmniCast. The framework features spherical-domain power-distortion optimization and two adaptive block partition algorithms. Experimental results show that it outperforms a JPEG2000-based solution and conventional SoftCast.
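Pseudo-analog schemes such as SoftCast, the baseline named above, scale each chunk of transform coefficients by a gain before analog modulation. In classic SoftCast, minimizing the high-SNR reconstruction MSE under a total power budget P gives g_i = λ_i^(-1/4) · sqrt(P / Σ_j sqrt(λ_j)), where λ_i is the variance of chunk i. The sketch below adds optional per-chunk distortion weights w_i, which lead to g_i ∝ (w_i/λ_i)^(1/4). The weights are a hypothetical stand-in for a spherical-domain distortion measure; this is not the exact optimization of [8].

```python
import math
from typing import List, Optional, Sequence


def chunk_gains(variances: Sequence[float], power: float,
                weights: Optional[Sequence[float]] = None) -> List[float]:
    """Per-chunk scaling gains minimising weighted high-SNR MSE under a power budget.

    Minimises sum_i w_i * sigma^2 / g_i^2 subject to sum_i g_i^2 * lambda_i = power.
    With all weights equal to 1 this is the classic SoftCast allocation.
    Assumes strictly positive variances and weights.
    """
    if weights is None:
        weights = [1.0] * len(variances)
    denom = sum(math.sqrt(w * lam) for w, lam in zip(weights, variances))
    scale = math.sqrt(power / denom)
    return [scale * (w / lam) ** 0.25 for w, lam in zip(weights, variances)]


if __name__ == "__main__":
    lambdas = [16.0, 1.0]
    gains = chunk_gains(lambdas, power=5.0)
    print(gains)  # the low-variance chunk gets the larger gain
    print(sum(g * g * lam for g, lam in zip(gains, lambdas)))  # matches the power budget
```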
The last paper, [9], presents a real-time 3D 360-degree telepresence system. To deal with the mismatch between the estimated and actual viewports caused by system delay, the proposed system uses cameras with a larger field of view than the user's visual field. Delay compensation is further improved by a head-motion prediction method based on Gated Recurrent Units (GRUs).
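A back-of-the-envelope view of the trade-off in [9]: if the head can turn at up to ω degrees per second and the system delay is d seconds, a horizontal capture field of view of FoV + 2ωd (capped at 360 degrees) covers any viewport the user can reach before the frame is displayed. With head-motion prediction, the capture window can instead be centred on the predicted direction, so the margin only needs to cover the residual prediction error. The functions and numbers below are illustrative assumptions, not values from the paper.

```python
def worst_case_capture_fov(viewer_fov_deg: float, max_speed_deg_s: float,
                           delay_s: float) -> float:
    """Horizontal capture FoV covering any head turn during the delay (no prediction)."""
    return min(360.0, viewer_fov_deg + 2.0 * max_speed_deg_s * delay_s)


def predicted_capture_fov(viewer_fov_deg: float, max_prediction_error_deg: float) -> float:
    """Capture FoV when the window is centred on a predicted head direction."""
    return min(360.0, viewer_fov_deg + 2.0 * max_prediction_error_deg)


if __name__ == "__main__":
    # Hypothetical numbers: 100 deg HMD FoV, 200 deg/s peak head speed, 100 ms delay,
    # and a predictor whose error stays within 5 deg.
    print(worst_case_capture_fov(100, 200, 0.1))
    print(predicted_capture_fov(100, 5))
```

Under these assumed numbers, prediction shrinks the required capture window from about 140 to 110 degrees, which is the kind of saving that better head-motion prediction buys.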
References
[1] J. Lin et al., "Efficient Projection and Coding Tools for 360° Video," doi: 10.1109/JETCAS.2019.2899660
[2] P. Hanhart, X. Xiu, Y. He and Y. Ye, "360-degree Video Coding based on Projection Format Adaptation and Spherical Neighboring Relationship," doi: 10.1109/JETCAS.2018.2888960
[3] Y. Wang, D. Liu, S. Ma, F. Wu and W. Gao, "Spherical Coordinates Transform-Based Motion Model for Panoramic Video Coding," doi: 10.1109/JETCAS.2019.2896265
[4] D. V. Nguyen, H. T. T. Tran, A. T. Pham and T. C. Thang, "An Optimal Tile-based Approach for Viewport-adaptive 360-degree Video Streaming," doi: 10.1109/JETCAS.2019.2899488
[5] C. Ozcinar, J. Cabrera and A. Smolic, "Visual Attention-Aware Omnidirectional Video Streaming Using Optimal Tiles for Virtual Reality," doi: 10.1109/JETCAS.2019.2895096
[6] L. Sun et al., "A Two-Tier System for On-Demand Streaming of 360 Degree Video over Dynamic Networks," doi: 10.1109/JETCAS.2019.2898877
[7] Y. Sanchez, G. S. Bhullar, R. Skupin, C. Hellge and T. Schierl, "Delay Impact on MPEG OMAF’s tile-based viewport-dependent 360° video streaming," doi: 10.1109/JETCAS.2019.2899516
[8] J. Zhao, R. Xiong and J. Xu, "OmniCast: Wireless Pseudo-Analog Transmission for Omnidirectional Video," doi: 10.1109/JETCAS.2019.2898750
[9] T. Aykut, J. Xu and E. Steinbach, "Realtime 3D 360-degree Telepresence with Deep-learning-based Head-motion Prediction," doi: 10.1109/JETCAS.2019.2897220