The paper I'd like to highlight is:
Kiran Misra, Andrew Segall, Michael Horowitz, Shilin Xu, Arild Fuldseth, and Minhua Zhou, “An Overview of Tiles in HEVC”, IEEE Journal of Selected Topics in Signal Processing, Vol. 7, No. 6, December 2013.
The High Efficiency Video Coding
(HEVC) standard significantly improves coding efficiency (reported gains are
about 50% compared to its predecessor, H.264/MPEG-4 AVC), and is thus expected to
become popular despite the increase in computational complexity. HEVC also
provides various new features, which can be exploited to improve multimedia
delivery systems. Among them, the concept of tiles is, in my opinion, a promising
novelty that is worth attention. The paper "An Overview of Tiles in
HEVC" provides an excellent introduction to this concept.
The goal of a video decoder
(respectively encoder) is to convert a video bit-stream (respectively the
original sequence of arrays of pixel values) into a sequence of arrays of pixel
values (respectively a bit-stream). The main idea now adopted in video
compression is to structure the video stream data hierarchically. The bit-stream
is cut into independent Groups of Pictures (GOPs), each GOP being cut into
frames, which have temporal dependencies that depend on their types: Intra
(I), Predicted (P) or Bidirectional (B) pictures. Finally, each frame is cut
into independent sets of macroblocks, called slices in previous standards.
The novelty brought by HEVC is
the concept of tiles, which sit at the
same "level" as slices in the hierarchical structure of video stream
data.
The motivations for both slices
and tiles are, at least, twofold: error concealment and parallel computing.
First, having an independently parsable unit within a frame can break the
propagation of errors. Indeed, due to the causal dependency between frames, an
error in a frame can make the decoder unable to process a significant portion
of the frames occurring after the loss event. Slices and tiles limit, at least
from a spatial perspective, the propagation of an error across the whole frame.
Second, the complexity of recent video formats and the resulting need for high
CPU clock speeds (which unfortunately cost power and generate heat) can be partially
addressed by parallelizing the decoding task across multiple
computing units, regardless of whether these are cores in many-core
architectures or computing units in Graphics Processing Units (GPUs). The
independence of slices and tiles is expected to facilitate the implementation
of video decoders on parallel architectures.
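To make the parallelization argument concrete, here is a minimal Python sketch, not taken from the paper. It assumes each tile (or slice) payload can be decoded without looking at its neighbours. The function decode_tile is a hypothetical stand-in for the real entropy decoding and reconstruction work, and a thread pool is used only for illustration; a production decoder would rely on native threads or GPU kernels.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_tile(tile_payload: bytes) -> bytes:
    """Hypothetical stand-in for decoding one independent tile.

    A real decoder would perform entropy decoding, prediction and
    reconstruction here; this placeholder simply inverts every byte.
    """
    return bytes(b ^ 0xFF for b in tile_payload)


def decode_frame_parallel(tile_payloads, max_workers=4):
    """Decode all tiles of one frame concurrently.

    This is only valid because the tiles are assumed to be independently
    decodable: no worker needs the output of another worker.
    Executor.map returns results in input order, so the caller can place
    each decoded tile back at its position in the frame.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(decode_tile, tile_payloads))


if __name__ == "__main__":
    payloads = [b"\x00\x01", b"\xff", b"\x10\x20\x30"]
    decoded = decode_frame_parallel(payloads)
    print([d.hex() for d in decoded])
```

The point of the sketch is the absence of any synchronization between workers: that is exactly what an independently decodable unit buys.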
Unfortunately, the concept of
slices suffers in practice from serious weaknesses, which tiles are expected to
fix.
In the paper, the authors
introduce the main differences between tiles and slices, two concepts
that, at first glance, can be confused. They focus on the motivation of
parallel computation.
The first part of the paper
explains in detail the main principles behind both approaches, in particular
the fact that tiles are aligned with the boundaries of Coding Tree Blocks (CTBs),
which provides more flexibility to the partitioning. This brings several benefits:
a tile is more compact, which leads to a better correlation between pixels
within a tile when compared to the correlation between pixels in a slice. Tiles
also require fewer headers, among other advantages.
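The alignment of tiles with CTB boundaries can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the paper's method: the picture is split into a uniformly spaced grid of rectangular tiles, boundaries are computed with integer division in CTB units, and the function names and the 64x64 CTB size in the example are chosen here for illustration.

```python
def uniform_boundaries(size_in_ctbs, num_tiles):
    """Return num_tiles + 1 boundary positions, in CTB units.

    Integer division spreads the CTBs as evenly as possible, so tile
    sizes along one axis differ by at most one CTB.
    """
    return [(i * size_in_ctbs) // num_tiles for i in range(num_tiles + 1)]


def tile_grid(pic_width, pic_height, ctb_size, num_cols, num_rows):
    """Partition a picture into a num_cols x num_rows grid of tiles.

    Each tile is returned as (x, y, width, height) in CTB units, listed
    row by row. Every boundary falls on a CTB boundary by construction.
    """
    # Ceiling division: a partially covered CTB still counts as one CTB.
    width_in_ctbs = -(-pic_width // ctb_size)
    height_in_ctbs = -(-pic_height // ctb_size)

    col_bd = uniform_boundaries(width_in_ctbs, num_cols)
    row_bd = uniform_boundaries(height_in_ctbs, num_rows)

    tiles = []
    for r in range(num_rows):
        for c in range(num_cols):
            tiles.append((col_bd[c], row_bd[r],
                          col_bd[c + 1] - col_bd[c],
                          row_bd[r + 1] - row_bd[r]))
    return tiles


if __name__ == "__main__":
    # 1920x1080 picture, 64x64 CTBs, 3 tile columns and 2 tile rows.
    for tile in tile_grid(1920, 1080, 64, 3, 2):
        print(tile)
```

With these example values the picture is 30 x 17 CTBs, and the six tiles are 10 CTBs wide and 8 or 9 CTBs high. Each tile is a compact rectangle, which is the property the paragraph above credits for the better pixel correlation within a tile.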
The authors also introduce the
known constraints to be taken into account when one wants to use tiles today.
All of Section 3 is about the tile proposal in HEVC, and the main challenges
to be addressed for wide adoption. Next, the authors present some examples
where tiles are useful. Both parts are written so that a reader who is merely
familiar with the concepts can understand both the limitations of the
concept of tiles and how these weaknesses have been addressed in practice.
The last part of the paper, in
Section 5, deals with some experiments, which demonstrate the efficacy of HEVC
for lightweight bit-streams and parallel architectures. First, the authors assess
parallelization and the sensitivity of the performance of slices versus tiles
to network parameters, including the Maximum Transmission Unit (MTU). They
finally measure the performance of stream rewriting for both approaches.
In short, the paper shows that
tiles appear to be more efficient than slices in a number of respects. The paper
offers a rigorous, in-depth introduction to the main advantages of tiles.
This can foster research on the integration of tiles into next-generation
multimedia delivery systems.