The 19th International Symposium on Multimedia was held at the Splendor Hotel, Taichung
City, Taiwan, from Dec. 11 to Dec. 13, 2017. This year, the conference covered a
broad and diverse range of topics in multimedia computing, including the following
main topics:
- 360-degree video and image
Immersive
media such as 360-degree video is becoming more and more important and is
supported by YouTube, Facebook, and other streaming platforms. The first keynote
speech, by Prof. Girod from Stanford University, gave a very nice overview of
immersive video for Head-Mounted Displays (HMDs). His research group is now focusing
on the generation of stereoscopic, 6 Degrees of Freedom (6DoF) immersive video
content. There are four papers addressing different problems of 360-degree
video delivery. Our paper proposes a novel adaptation approach for viewport-adaptive
streaming of 360-degree video. The second paper studies the optimal encoding
ladders for tiled 360-degree video. The problem is formulated as an
optimization problem that considers not only the distortion but also the system
resources such as storage cost. However, their solution is not specific to
360-degree video. The third paper performs a perceptual analysis of perspective
projection for viewport rendering of 360-degree images. The analysis focuses on two
projection-related parameters: 1) the distance between the projection center and
the video center and 2) the Field of View. However, the way the subjective test
was carried out is not appropriate, as the viewers watched the content on a flat
screen instead of an HMD. The fourth paper studies three QoE aspects, namely
immersion, interaction, and visual quality, in interactive 3D tele-immersion
applications. For that purpose, the authors designed a penalty shootout
game and carried out subjective tests. The results show that using an HMD such as
the Oculus Rift yields a better user experience than a third-person view
on a 3D TV. Immersion and interaction are also very important to the user experience.
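To make the two projection parameters from the third paper more concrete, here is a minimal sketch of how a viewport pixel could be mapped back onto the 360-degree sphere. It is not the authors' implementation. It assumes a unit sphere, a viewing direction along +z, a square viewport, and that the "distance" parameter d is the offset of the projection center behind the sphere center. The function name `viewport_to_sphere` and its arguments are hypothetical.

```python
import math

def viewport_to_sphere(u, v, fov_deg, d):
    """Map normalized viewport coordinates (u, v in [-1, 1]) to a
    (longitude, latitude) pair in radians on the unit sphere.

    Assumption: the projection center sits at (0, 0, -d), i.e. at
    distance d behind the sphere center, and the image plane is
    tangent to the sphere at z = 1.
    """
    if not 0.0 <= d <= 1.0:
        raise ValueError("this sketch only handles 0 <= d <= 1")
    half = math.radians(fov_deg) / 2.0
    if not 0.0 < half < math.pi or d + math.cos(half) <= 0.0:
        raise ValueError("Field of View is too wide for this d")

    # Half-width of the image plane so that u = 1 lands on the sphere
    # point that is fov/2 away from the viewing axis.
    half_width = math.sin(half) * (d + 1.0) / (d + math.cos(half))
    px, py = u * half_width, v * half_width

    # Intersect the ray from the projection center through (px, py, 1)
    # with the unit sphere: solve a*t^2 + b*t + c = 0 for the far root.
    a = px * px + py * py + (1.0 + d) ** 2
    b = -2.0 * d * (1.0 + d)
    c = d * d - 1.0
    t = (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)

    x, y, z = t * px, t * py, -d + t * (1.0 + d)
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y)))
    return lon, lat
```

With d = 0 this reduces to the usual rectilinear projection, and with d = 1 it becomes a stereographic projection. Intermediate values trade off stretching at the viewport edges against bending of straight lines, which is the kind of trade-off a perceptual study can evaluate.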
- Learning
Understanding
multimedia content is a crucial task in many applications such as camera surveillance.
Many papers apply deep learning to tasks such as crowd scene understanding, visual
relationship recognition (e.g., text-to-image translation for robots), human action
classification, automatic classification of microstructures in thermal
barrier coating images, real-time annotation of motion data streams, and 3D action
recognition. The use of a convolutional neural network (CNN) for no-reference
Image Quality Assessment (blind IQA) is also proposed. There is a very interesting
paper that proposes a compression framework for deep learning models. Their
results show that simple quantization combined with arithmetic coding can
reduce the bitrate by 92% with minimal impact on accuracy.
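The idea behind the model compression paper can be illustrated with a rough sketch. This is not the framework from the paper: it assumes plain uniform scalar quantization of a flat list of weights and uses the empirical entropy of the quantization indices as a stand-in for the size an ideal arithmetic coder would approach. All function names are hypothetical.

```python
import math
from collections import Counter

def quantize(weights, num_levels):
    """Uniformly quantize weights to num_levels indices.
    Returns (indices, offset, step) so the weights can be reconstructed."""
    lo, hi = min(weights), max(weights)
    # Guard against a constant weight vector (step would be zero).
    step = (hi - lo) / (num_levels - 1) if hi > lo else 1.0
    indices = [round((w - lo) / step) for w in weights]
    return indices, lo, step

def dequantize(indices, lo, step):
    """Reconstruct approximate weights from quantization indices."""
    return [lo + i * step for i in indices]

def entropy_bits_per_symbol(symbols):
    """Empirical entropy in bits per symbol: a lower bound that an
    arithmetic coder with this symbol model would approach."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def estimated_saving(weights, num_levels, original_bits=32):
    """Estimated fraction of bits saved versus storing each weight
    with original_bits bits (ignores side information)."""
    indices, _, _ = quantize(weights, num_levels)
    return 1.0 - entropy_bits_per_symbol(indices) / original_bits
```

Because trained weights tend to cluster around a few values, the entropy of the indices is often well below log2(num_levels), which is where entropy coding adds savings on top of quantization alone. The actual saving and accuracy impact depend on the model and are not reproduced here.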
- Retrieval, recommendation, and summarization
There
are two papers regarding personalized video recommendation. The first paper
applies machine learning to 1) identify users behind a shared account and 2)
predict each user’s preference based on ‘contextual information’. The second
paper proposes a framework to automatically select features in Factorization
Machine-based context-aware recommendation systems. As for summarization, a new
summarization method for blog articles using image-text alignment techniques
has been proposed. Another paper proposes a new approach for automatic
summarization of video collections that leverages a structured minimum-risk
classifier and efficient submodular inference.
- Visual Aspects
The
first paper presents a method for automatically detecting a suitable surface in
everyday living and working spaces to support improvisatory projection without a
pre-installed projection surface. The second paper proposes a new framework to
convert textual instructions into coherent visual descriptions (text
instructions annotated with images).
- Video Streaming
The
first paper proposes modifying the Peer-to-Peer Streaming Peer Protocol (PPSPP) to
support streaming over Wi-Fi P2P connections. The second paper proposes an
SDN-enabled, optimization-based scheme for optimally sharing bandwidth among
network flows within a residential gateway. The scheme targets online game
flows and tries to provide them with a higher QoE while not starving other traffic
flows. The third paper presents a new scheme that limits energy consumption in
a transcoding system. The fourth paper introduces a Dynamic Rate Controller (DRC)
for conversational video streaming applications, especially for HDVC. DRC uses
the novel concept of a future budget, plus a window-based bitrate history, to
adjust to bandwidth changes faster and with higher quality than other rate
controllers.
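As a rough illustration of the goal of the second paper above (favoring game flows without starving the rest), the sketch below uses a much simpler rule than the optimization in the paper. It assumes every flow is first guaranteed a small minimum rate, game flows are then served up to their demand, and whatever is left is split max-min fairly among the other flows. The function names, the `min_rate` parameter, and the flow format are all hypothetical.

```python
def water_fill(capacity, demands):
    """Max-min fair split of capacity among flows with given demands.
    Returns (allocation dict, leftover capacity)."""
    alloc = {name: 0.0 for name in demands}
    active = {name for name, dem in demands.items() if dem > 0}
    while active and capacity > 1e-12:
        share = capacity / len(active)
        satisfied = {n for n in active if demands[n] <= share}
        if not satisfied:
            # Nobody can be fully served: split evenly and stop.
            for n in active:
                alloc[n] = share
            return alloc, 0.0
        for n in satisfied:
            alloc[n] = demands[n]
            capacity -= demands[n]
        active -= satisfied
    return alloc, capacity

def share_bandwidth(capacity, flows, min_rate):
    """flows maps a flow name to (demand, is_game).
    Assumed policy: guaranteed minimum, then game flows, then the rest."""
    alloc = {n: min(dem, min_rate) for n, (dem, _) in flows.items()}
    remaining = capacity - sum(alloc.values())
    if remaining < 0:
        raise ValueError("capacity cannot cover the guaranteed minimum rates")
    for want_game in (True, False):
        residual = {n: dem - alloc[n]
                    for n, (dem, is_game) in flows.items()
                    if is_game == want_game}
        extra, remaining = water_fill(remaining, residual)
        for n, rate in extra.items():
            alloc[n] += rate
    return alloc
```

For example, with a capacity of 10, a minimum rate of 1, and demands of 6 (game), 8 (video), and 4 (web), the game flow is fully served and the other two flows share what is left instead of dropping to zero.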
The best paper
award went to a paper investigating the quantitative determinants of film
mood across different types of scenes. The film scenes are classified by their location,
time of day, and use of dialogue and music. It is found that the mood
ratings and their quantitative determinants differ across the scene types.
There were also several demos, including deep learning-based throughput estimation and
real-time pattern recognition.