Nguyễn Văn Đức: Overview of IEEE ISM 2017

The 19th IEEE International Symposium on Multimedia (ISM 2017) was held at the Splendor Hotel in Taichung City, Taiwan, from Dec. 11 to Dec. 13, 2017. This year, the conference covered a broad and diverse range of multimedia computing topics, including the following main ones:

360-degree video and image

Immersive media such as 360-degree video is becoming more and more important and is now supported by YouTube, Facebook, and other streaming platforms. The first keynote speech, by Prof. Girod from Stanford University, gave a very nice overview of immersive video for Head-Mounted Displays (HMDs). His research group is now focusing on the generation of stereoscopic, 6 Degrees of Freedom (6DoF) immersive video content.

Four papers addressed different problems of 360-degree video delivery. Our paper proposes a novel adaptation approach for viewport-adaptive streaming of 360-degree video. The second paper studies optimal encoding ladders for tiled 360-degree video. The task is formulated as an optimization problem that considers not only distortion but also system resources such as storage cost. However, the solution is not specific to 360-degree video. The third paper performs a perceptual analysis of perspective projection for viewport rendering in 360-degree images. The analysis focuses on two projection-related parameters: 1) the distance between the projection center and the video center and 2) the Field of View. Yet the way the subjective test was carried out is not appropriate, as the viewers watched the content on a flat screen instead of an HMD. The fourth paper studies three QoE aspects, namely immersion, interaction, and visual quality, in interactive 3D tele-immersion applications. For that purpose, the authors designed a penalty shootout game and carried out subjective tests. The results show that using an HMD such as the Oculus Rift gives a better user experience than a third-person view on a 3D TV. Immersion and interaction also turn out to be very important to the user experience.
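To make the idea of viewport-adaptive streaming concrete, here is a minimal sketch of the general principle: tiles that overlap the current viewport get a high quality level, and the rest get a low one. It is not the adaptation method of any of the papers above. The function names, the yaw-only tiling of an equirectangular frame, and the quality levels are all assumptions made for illustration.

```python
def angular_diff(a, b):
    """Smallest absolute difference between two yaw angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_tile_qualities(view_yaw, fov_deg, num_tiles, high=3, low=1):
    """Assign a quality level to each vertical tile strip of a 360-degree frame.

    Hypothetical setup: the frame is cut into `num_tiles` equal yaw strips,
    and `high` / `low` are indices into some encoding ladder.
    """
    tile_width = 360.0 / num_tiles
    qualities = []
    for t in range(num_tiles):
        center = (t + 0.5) * tile_width
        # A strip overlaps the viewport when the two centers are closer
        # than the sum of their half-widths.
        visible = angular_diff(center, view_yaw) < (fov_deg + tile_width) / 2
        qualities.append(high if visible else low)
    return qualities


# Viewer looks at yaw 10 degrees with a 90-degree horizontal field of view.
qualities = select_tile_qualities(view_yaw=10.0, fov_deg=90.0, num_tiles=8)
print(qualities)
```

A real system would also have to handle pitch, head-movement prediction, and the available bandwidth, which this sketch leaves out.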

Learning

Understanding multimedia content is a crucial task in many applications such as camera surveillance. Many papers apply deep learning to tasks such as crowd scene understanding, visual relationship recognition (e.g., text-to-image translation for robots), human action classification, automatic classification of microstructures in thermal barrier coating images, real-time annotation of motion data streams, and 3D action recognition. The use of a convolutional neural network (CNN) for no-reference Image Quality Assessment (blind IQA) is also proposed. There is a very interesting paper that proposes a compression framework for deep learning models. Its results show that simple quantization combined with arithmetic coding can reduce the bitrate by 92% with minimal impact on accuracy.
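The quantization-plus-entropy-coding idea can be illustrated with a small sketch. This is not the paper's framework. It assumes uniform scalar quantization of synthetic Gaussian weights, and instead of running a real arithmetic coder it computes the empirical entropy of the quantized symbols, which is roughly the bits per symbol an ideal arithmetic coder could approach. All names and parameters below are hypothetical.

```python
import math
import random
from collections import Counter


def quantize(weights, num_levels):
    """Uniformly quantize floats to integer indices in [0, num_levels - 1]."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (num_levels - 1) or 1.0  # guard against constant input
    indices = [round((w - lo) / step) for w in weights]
    return indices, lo, step


def dequantize(indices, lo, step):
    """Map quantization indices back to approximate weight values."""
    return [lo + i * step for i in indices]


def entropy_bits_per_symbol(symbols):
    """Empirical Shannon entropy of the symbol stream, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


random.seed(0)
# Synthetic stand-in for one layer's weights (assumption, not real model data).
weights = [random.gauss(0.0, 0.05) for _ in range(10000)]

indices, lo, step = quantize(weights, num_levels=16)
restored = dequantize(indices, lo, step)

bits = entropy_bits_per_symbol(indices)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(f"{bits:.2f} bits/weight (float32 uses 32), max error {max_error:.4f}")
```

Because the weights cluster around zero, the central quantization levels are far more frequent than the outer ones, so the entropy falls below the 4 bits a fixed-length code for 16 levels would need. That skew is what an entropy coder exploits.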

Retrieval, recommendation, and summarization

There are two papers on personalized video recommendation. The first paper applies machine learning to 1) identify the users behind a shared account and 2) predict each user’s preference based on ‘contextual information’. The second paper proposes a framework to automatically select features for Factorization Machine-based context-aware recommendation systems. As for summarization, one paper proposes a new summarization method for blog articles using image-text alignment techniques. Another paper proposes a new approach for automatic summarization of video collections that leverages a structured minimum-risk classifier and efficient submodular inference.
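For readers unfamiliar with Factorization Machines, the sketch below shows how a standard second-order FM scores one feature vector, including the usual linear-time rewrite of the pairwise term. It illustrates only the underlying model, not the paper's feature-selection framework. The feature layout and parameter values are made up for the example.

```python
def fm_predict(x, w0, w, V):
    """Second-order FM score, computed in O(k * n).

    x  : feature values
    w0 : global bias
    w  : per-feature linear weights
    V  : V[i] is the k-dimensional latent vector of feature i
    """
    n, k = len(x), len(V[0])
    linear = sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same score with the explicit sum over all feature pairs, for checking."""
    n = len(x)
    score = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            score += dot * x[i] * x[j]
    return score


# Hypothetical one-hot layout: [user A, user B, item X, item Y, context: evening]
x = [1.0, 0.0, 0.0, 1.0, 1.0]
w0 = 0.1
w = [0.2, -0.1, 0.05, 0.3, -0.2]
V = [[0.1, 0.2], [0.0, -0.1], [0.3, 0.1], [-0.2, 0.4], [0.5, 0.1]]

score = fm_predict(x, w0, w, V)
print(score)
```

Context enters the model simply as extra features, so choosing which contextual features to include directly changes which pairwise interactions the model can learn.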

Visual Aspects

The first paper presents a method for automatically detecting a good surface in everyday living and working spaces, to support improvisatory projection without a pre-installed projection surface. The second paper proposes a new framework to convert textual instructions into coherent visual descriptions (text instructions annotated with images).

Video Streaming

The first paper proposes modifying the Peer-to-Peer Streaming Peer Protocol (PPSPP) to support streaming over Wi-Fi P2P connections. The second paper proposes an SDN-enabled, optimization-based scheme for optimally sharing bandwidth among network flows within a residential gateway. The scheme targets online game flows and tries to provide them with a higher QoE while not starving other traffic flows. The third paper presents a new scheme that limits energy consumption in a transcoding system. The fourth paper introduces a Dynamic Rate Controller (DRC) for conversational video streaming applications, especially for HDVC. DRC uses the novel concept of a future budget, plus a window-based bitrate history, to adjust to bandwidth changes faster and with higher quality than other rate controllers.

The best paper award went to a paper investigating the quantitative determinants of film mood across different types of scenes. The film scenes are classified by their location, time of day, and use of dialogue and music. The paper finds that the mood ratings and their quantitative determinants differ across scene types. There were also several demos, on deep learning-based throughput estimation and real-time pattern recognition.


Sunday, December 17, 2017
