May 9, 2017

Reproducibility in ACM MMSys Conference

Science is a collective endeavor. A researcher aims at writing papers that inspire other researchers and eventually help them make progress as well. We are all part of a large collaborative movement toward a better understanding of how things work. In the case of the Multimedia Systems community, we deal with animated images, more generally objects that activate our senses, and more specifically with how to encode, transport and process them.

Despite this, the scientific community is driven by competitive processes, which sometimes lead to secrecy and an unwillingness to freely discuss future work. In particular, since exploiting a dataset is a key asset for getting papers accepted, the competitive process may lead researchers to keep a valuable dataset (or a valuable piece of software) to themselves, for fear that others may exploit it better and faster. This (natural) behavior makes science progress more slowly than if a collaborative process were in place.

The “open dataset and open software track” is an attempt to fix this issue in the ACM MMSys conference. The track aims at favoring and rewarding researchers who are willing to share. It aims at making science progress faster, still within a competitive process (we accepted only a subset of the submitted datasets for presentation), but with collaboration in mind.

The movement for the promotion of reproducible research is ongoing and we are very glad to see that the number of submitted open artifacts has increased since 2011 (the first open dataset track in MMSys history). Previous MMSys datasets can be found here. This year, we accepted ten papers, which describe datasets and software.

To go one step further, we have embraced the new initiative launched by the ACM Digital Library related to reproducibility badges. In short, the authors of an accepted paper who let other researchers check the artifacts they used can be rewarded with a badge on their paper. We have implemented badges in two tracks of the 2017 ACM MMSys.

Badges for Dataset Track

In the Open Dataset Track, we have selected the badges "Artifacts Evaluated – Functional", which means that the dataset (and the code) has been tested by reviewers, who had no problem executing, testing, and playing with it, and "Artifacts Available", which means that the authors decided to publicly release their dataset and their code.

During the selection process, we acted as is usual in academic conferences. We invited a dozen researchers (whom I know to be committed to more reproducible research) to join the committee. Then, we assigned three reviewers to each paper, a paper being the description of a dataset that is available at a public URL. Reviewing an artifact is not the same experience as reviewing an academic paper. To better capture the experience of engaging with the artifact, we added some unusual questions to the reviewing form, typically:
Relevance and reusability (from 1. Artifact on a niche subject to 4. A key enabler on a hot topic)
Quality of the documentation (from 1. The documentation is below expectations to 3. Crystal-clear)
Artifact behavior (from 1. Bad experience to 3. Everything works well)

Then, as is usual in academic conferences, we selected the dataset papers that got the best appreciation from the reviewers. This year, four of them are related to 360° images and video, currently the hottest topic in the multimedia community. Such datasets have been sorely missing so far, so we are very happy to fill this gap. Two artifacts are related to health, two to transport systems, and two to increasingly popular human activities.

Badges for Papers in Research Track

In parallel, the organizers of the MMSys conference agreed to badge some of the papers accepted in the main "Research Track" of the conference. In this case, the process was different. First, we waited to know which papers had been accepted. Then, and only then, we contacted the authors of these accepted papers and offered them a deal: if you want a badge, you have to first release the artifact on a public website and write more detailed documentation on how to use it. But since we knew that this latter requirement could deter authors from applying for the badge, we authorized those who applied to add extra pages as an appendix to their papers.

The authors gave us access to a preliminary version of the camera-ready version of their papers; then I contacted another member of the program committee and we both tested the artifact. In this case, we did not have to consider whether the dataset matters to the community or whether it is an enabler. Since the paper had already been accepted, our only mission was to test the dataset and to check whether the documentation is sufficient for any scientist to play with it.

Three papers followed the process until the end, and we are proud to award them the badges.

January 27, 2017

Attending an MPEG meeting as an academic researcher

I recently attended an MPEG meeting for the first time. I am now used to attending academic conferences (for the best and the worst) but I had never attended a standardization group meeting before. Overall, my feedback is very positive and I will probably embrace the standardization circus a bit more in the future (hopefully I will not wait another forty years before attending another standards meeting).

I especially appreciated the commitment of researchers during the MPEG sessions. The attendees are engaged in a "technical/scientific conversation" with the researcher who presents his contribution. It is in no way comparable to the experience of most academic talks. I identified some key differences between a group meeting at MPEG and a typical session at an academic event:
  • The scope of a group meeting is very narrow. For example, I attended the meeting of the ad-hoc group in charge of discussing projections of 360° videos onto 2D maps. Every attendee had good reasons to attend this meeting in particular, so free riders were in the minority. In an academic conference, the Program Chairs try to schedule the presentations so that papers sharing a similar topic are gathered, but the objectives of these academic papers are often quite different. Instead, the contributions during a standardization session share the same objectives, which inevitably invites researchers who are experts in the domain to argue about the pros and cons of every contribution, including their own.
  • The presentation is not the end; it is the beginning of something wider, which is to eventually contribute to a common (unauthored) document. The chairman is in charge of writing a consensus document after the meeting, and a presenter aims at convincing attendees that his contribution is worth being included in this document without reservation. In an academic conference, the motivation of the presenter is simply to be present so that an accepted paper is not withdrawn from the digital library due to a no-show.
  • When a presenter is invited to introduce his contributions, it is not showtime. He usually stays in his seat and scrolls through the document that every attendee has previously opened (most people have had a look at the contributions beforehand). There is no talk, no slides, no formalism. Only the presenter, his contribution, and engaged attendees. The debate related to a contribution can be two minutes or one hour long. I found it much more lively than well-formatted slide-based talks.
I also appreciated this feeling of being useful as a "public scientist" in a population that is mostly comprised of private researchers. A scientist has various ways to disseminate the knowledge he is supposed to produce in return for his public funding and salary. Academic conferences are the most common way. Some scientists create start-ups. Some develop strong ties with companies and spend most of their energy collaborating on projects. Good reasons to disseminate in standards meetings include:
  • The contribution from a public academic researcher is (usually) not driven by mercantile private interests. We are supposed to provide something that is closer to The Scientific Truth than what other researchers from competing companies can claim. I understand that one of the missions of a public researcher attending a standard meeting is to ensure that what will eventually become a widely used standard is not an aggregation of patented technologies but rather a scientifically solid and open solution.
  • Every scientist hopes that the fruits of his research will eventually be exploited, whether indirectly, by contributing to better knowledge, or directly, by integration into an object that is useful to society. In applied research topics such as computer science, academic conferences are not necessarily the best way to convey ideas to the companies that are in a position to exploit a scientific result. The academic world is mostly fuzzy and closed. A standardization group appears as a direct way to enable the exploitation of scientific ideas, without restriction.
Of course, the experience of attending an MPEG meeting also includes annoyances: a lot of time is spent orchestrating the various standardization sub-groups, some attendees can ruin a whole meeting by interfering with every presenter, the circus is full of jargon and bizarre customs that prevent a newcomer from joining, political and business games exist... but the advantages are also numerous (including but not restricted to the above). Overall, the balance is, in my opinion, positive.

October 2, 2015

Can Multipath Boost the Network Performance of Real-time Media?

I would like to highlight a paper that deals with multipath networking for video streaming. This paper is:
Varun Singh, Saba Ahsan, and Jörg Ott, “MPRTP: Multipath Considerations for Real-time Media”, in Proc. 4th ACM Multimedia Systems Conference (MMSys '13), Oslo, Norway, Feb. 2013
and it has led to multiple actions in IETF standardization group.

There are multiple routes between two hosts in the current Internet. This statement tends to be even truer when considering the flattening Internet topology, where Internet Service Providers (ISPs) have multiple options to reach a distant host. It is also truer with the multiple network interfaces available in modern mobile devices and the multiple wireless network accesses that co-exist in urban environments. The question now is how to exploit these multiple routes. The network protocols that are in use today stick to the traditional single-path paradigm. Yet, scientists have shown that leveraging multipath can bring many advantages, including better traffic load balancing, higher throughput and more robustness.

This paper, which is already two years old, studies multipath opportunities for the specific case of conversational and interactive communication systems between mobile devices (e.g. Skype). These applications are especially challenging because the traffic between communicating hosts must meet tight real-time bounds. The idea of this paper is to study whether the most widely used network protocol for these applications, namely the Real-time Transport Protocol (RTP), can be turned into a multipath protocol. The authors thus propose a backwards-compatible extension to RTP called Multipath RTP (MPRTP).

In short, this paper presents the MPRTP extension and evaluates its performance in several scenarios. First, the authors discuss the main challenges that an extension of the RTP protocol must face in order to split a single RTP stream into multiple subflows. Second, they present the protocol details as well as the algorithms considered to solve these challenges. Third, simulations are conducted to evaluate the performance of the proposal.

The authors point out that an MPRTP protocol should be able to adapt to bandwidth changes on the paths by redistributing the traffic load among them smoothly, to avoid oscillations. This is especially important in the case of mobile communications, where quick capacity changes are common. To guarantee fast adaptation, the authors propose packet-scheduling mechanisms that do not abruptly reallocate traffic between congested and non-congested paths when a path suddenly becomes congested.
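To make the idea of smooth reallocation concrete, here is a minimal Python sketch of one scheduler step that moves only a fraction of a congested path's load at a time. The `alpha` damping factor and the whole function are illustrative assumptions of mine, not the actual MPRTP scheduling algorithm:

```python
def reallocate(weights, congested, alpha=0.25):
    """One smooth reallocation step for a multipath scheduler.

    weights:   current fraction of traffic sent on each path (sums to 1).
    congested: set of path indices reported as congested.
    alpha:     fraction of a congested path's load moved per step; a small
               value damps the oscillations that abrupt moves would cause.
    """
    good = [i for i in range(len(weights)) if i not in congested]
    if not good:                      # nowhere to move traffic
        return list(weights)
    new = list(weights)
    moved = 0.0
    for i in congested:
        delta = alpha * new[i]        # move only a fraction at once
        new[i] -= delta
        moved += delta
    share = moved / len(good)         # spread the load over healthy paths
    for i in good:
        new[i] += share
    return new
```

Calling the step repeatedly drains a congested path gradually instead of instantly, which is the behavior the paper argues for.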

Another important issue is the variation in packet inter-arrival time (packet skew) among the different paths. Having multiple diverse paths makes it harder to estimate the right buffer size to absorb this variation. To overcome this problem, the authors propose an adaptive playout buffer, which considers the skew of each path individually. They also privilege the selection of paths with similar latencies.
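The following sketch illustrates why the buffer must be sized per path rather than globally: the playout delay is driven by the most skewed path. The mean-plus-jitter sizing rule and the `safety` margin are my own simplifying assumptions, not the paper's exact estimator:

```python
def playout_delay(arrivals_by_path, safety=2.0):
    """Estimate a playout buffer delay (ms) from per-path arrival times.

    arrivals_by_path: {path_id: [arrival timestamps in ms]}.
    Each path gets its own skew estimate (mean gap plus a jitter margin);
    the buffer must absorb the worst path, so we take the maximum.
    """
    delays = []
    for ts in arrivals_by_path.values():
        gaps = [b - a for a, b in zip(ts, ts[1:])]
        mean = sum(gaps) / len(gaps)
        std = (sum((g - mean) ** 2 for g in gaps) / len(gaps)) ** 0.5
        delays.append(mean + safety * std)   # per-path skew estimate
    return max(delays)                       # sized for the worst path
```

A path with perfectly regular arrivals contributes only its mean gap, while a jittery path inflates the required delay, which also explains why the authors privilege paths with similar latencies.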

The choice of suitable transmission paths should consider the path characteristics in terms of QoS metrics such as losses, latency or capacity. The authors propose several extensions to the RTP protocol, including a new RTP report message (where the receiver provides QoS data per subflow) and a scheduling algorithm (where the sender uses these reports to decide on a traffic distribution among the available paths).
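As a toy illustration of how such per-subflow reports could drive a traffic split, the sketch below scores each path by its loss-discounted capacity with a light RTT penalty and normalizes the scores. This scoring rule is an assumption for illustration only, not the scheduler specified in MPRTP:

```python
def path_weights(reports):
    """Turn per-subflow receiver reports into a traffic distribution.

    reports: {path_id: {"loss": fraction, "rtt_ms": float,
                        "capacity_kbps": float}}
    Score = loss-discounted capacity, lightly penalized by RTT.
    """
    scores = {}
    for path, r in reports.items():
        score = r["capacity_kbps"] * (1.0 - r["loss"])
        score /= 1.0 + r["rtt_ms"] / 100.0     # mild latency penalty
        scores[path] = max(score, 0.0)
    total = sum(scores.values())
    return {path: s / total for path, s in scores.items()}
```

Two identical paths get an even split; a lossy or slow path sees its share shrink proportionally.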

All the aforementioned extensions are designed to be backwards compatible, i.e. traditional RTP hosts can interoperate with hosts equipped with the MPRTP extensions in single-path scenarios.

An exhaustive battery of simulations is conducted to evaluate the MPRTP performance in a broad range of scenarios: (i) path properties (losses, delays, and capacities) vary over time; (ii) paths share a common bottleneck; and (iii) MPRTP is deployed on mobile terminals using WLAN and/or 3G paths. These evaluations show that (1) the dynamic MPRTP performance is not far from the static performance in both single-path and multipath cases, (2) MPRTP successfully offloads traffic from congested paths to the other ones while keeping some proportional fairness among them, and (3) on lossy links multipath is more robust and produces fewer losses than single path.

Overall, this paper addresses a significant problem (how to make a real-time UDP-based protocol multipath) with a comprehensive study. It is one of the first attempts to exploit multipath functionalities in the framework of multimedia communications, especially under tight real-time constraints. This paper thus perfectly complements the work that has been done by the networking community on multipath TCP. That being said, many problems related to multipath multimedia protocols remain open. Among others, let us cite rate-adaptive streaming and multiview video in the context of multipath.

September 19, 2015

Understanding an Exciting New Feature of HEVC: Tiles

As an editor of the IEEE R-letter, I write every now and then short "letters" (one-page, easy-going texts) about a recent research article that I found especially interesting. I think it is appropriate to also publish these letters on this blog, so I will.

The paper I'd like to emphasize is:
Kiran Misra, Andrew Segall, Michael Horowitz, Shilin Xu, Arild Fuldseth, and Minhua Zhou, “An Overview of Tiles in HEVC”, IEEE Journal of Selected Topics in Signal Processing, Vol. 7, No 6, December 2013

The High Efficiency Video Coding (HEVC) standard significantly improves coding efficiency (gains reported as 50% when compared to the state-of-the-art MPEG-4 AVC/H.264), and thus is expected to become popular despite the increase in computational complexity. HEVC also provides various new features, which can be exploited to improve the delivery of multimedia systems. Among them, the concept of tiles is in my opinion a promising novelty that is worth attention. The paper "An Overview of Tiles in HEVC" provides an excellent introduction to this concept.

The goal of a video decoder (respectively, encoder) is to convert a video bit-stream (respectively, the original sequence of arrays of pixel values) into a sequence of arrays of pixel values (respectively, a bit-stream). The main idea now adopted in video compression is the hierarchical structure of video stream data. The bit-stream is cut into independent Groups of Pictures (GOPs), each GOP being cut into frames, which have temporal dependencies according to their types: Intra (I), Predicted (P) or Bidirectional (B) pictures. Finally, each frame is cut into independent sets of macroblocks, called slices in previous encoders.

The novelty brought by HEVC is the concept of tile, which is at the same "level" as slice in the hierarchical structure of video stream data.

The motivations for both slices and tiles are at least twofold: error concealment and parallel computing. First, having an independently parsable unit within a frame can break the propagation of errors. Indeed, due to the causal dependency between frames, an error in a frame can make the decoder unable to process a significant portion of the frames occurring after the loss event. Slices and tiles limit, at least from a spatial perspective, the propagation of an error to the whole frame. Second, the complexity of recent video and the requirement for high CPU speed (which unfortunately requires power and generates heat) can be partially addressed by parallelizing the decoding task across multiple computing units, regardless of whether these are cores in many-core architectures or computing units in Graphics Processing Units (GPUs). The independence of slices and tiles is expected to facilitate the implementation of video decoders on parallel architectures.
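The parallelization argument can be sketched in a few lines: because each tile is independently parsable, the per-tile decoding work can simply be mapped over a worker pool. The `decode_tile` body below is a placeholder transform standing in for real entropy decoding and reconstruction; only the structure (independent units, one task per tile) reflects the idea in the paper:

```python
from concurrent.futures import ThreadPoolExecutor

def decode_tile(tile):
    """Stand-in for decoding one tile (entropy decode + reconstruction).

    A tile is independently parsable, so no state from any other tile
    is needed here; the placeholder just inverts the bytes.
    """
    return [b ^ 0xFF for b in tile]

def decode_frame(tiles, workers=4):
    """Decode the independent tiles of one frame in parallel."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_tile, tiles))
```

With dependent units (e.g. slices that cross in-frame prediction boundaries), such a clean map over workers would not be possible, which is exactly the advantage the authors attribute to tiles.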

Unfortunately, the concept of slices suffers in practice from serious weaknesses, which tiles are expected to fix.

In the paper, the authors introduce the main differences between tiles and slices, two concepts that, at first glance, can be confused. They focus on the motivation for parallel computation.

The first part of the paper explains in detail the main principles behind both approaches, in particular the fact that tiles are aligned with the boundaries of Coding Tree Blocks (CTBs), which provides more flexibility in the partitioning. This brings several benefits: a tile is more compact, which leads to a better correlation between pixels within a tile than between pixels in a slice. Tiles also require fewer headers, among other advantages.
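The CTB alignment can be made concrete with a small sketch that splits a frame into a uniform tile grid whose edges fall on CTB boundaries, in the spirit of HEVC's uniform-spacing mode. This is an illustrative computation, not the HEVC reference implementation, and the 64x64 CTB size is just the common default:

```python
import math

def tile_grid(frame_w, frame_h, ctb=64, cols=3, rows=2):
    """Compute pixel edges of a cols x rows tile grid aligned on CTBs.

    Tile edges must land on multiples of the CTB size, so we count CTBs
    per dimension first and split that integer count, then convert back
    to pixel coordinates. Returns (column_edges, row_edges).
    """
    ctbs_w = math.ceil(frame_w / ctb)   # CTB columns in the frame
    ctbs_h = math.ceil(frame_h / ctb)   # CTB rows in the frame
    col_edges = [i * ctbs_w // cols * ctb for i in range(cols + 1)]
    row_edges = [j * ctbs_h // rows * ctb for j in range(rows + 1)]
    return col_edges, row_edges
```

For a 1920x1088 frame, every resulting edge is a multiple of 64, so each tile covers a whole number of CTBs; this is what makes a tile a compact rectangle rather than the raster-scan run of blocks a slice covers.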

The authors also introduce the known constraints to be taken into account when one wants to use tiles today. The whole of Section 3 is about the tile proposal in HEVC and the main challenges to be addressed for wide adoption. Next, the authors present some examples where tiles are useful. Both parts are written so that somebody who is merely familiar with the concepts can understand both the limitations behind the concept of tiles and how these weaknesses have been addressed in practice.

The last part of the paper, in Section 5, deals with experiments that demonstrate the efficacy of HEVC for lightweight bit-streams and parallel architectures. First, the authors assess the parallelization gains and the sensitivity of the performance of slices versus tiles to network parameters, including the Maximum Transmission Unit (MTU). They finally measure the performance of stream rewriting for both approaches.

In short, the paper shows that tiles appear to be more efficient than slices in a number of respects. The paper offers a rigorous, in-depth introduction to the main advantages of tiles, which can foster research on the integration of tiles into next-generation multimedia delivery systems.

September 2, 2015

Uploading innovative engineers: 15 years remaining

Four years ago, I wrote an outrageous post about how "un-geek" French engineers are on average. Since 2011, many things have changed in France: code is expected to be taught (soon) in elementary schools, successful geek entrepreneurs are in the spotlight, geek-ish schools and co-working hacker spaces flourish... It will take time, but, hopefully, France in 2025 will be geek-friendly. Now, what about innovation?

Entrepreneurship has become a cause nationale in France, with a lot of initiatives and announcements. Analysts try to decipher the structural problems regarding innovation in France; in particular, an excellent study (in French) about innovation "ecosystems" was released yesterday. Everything said in this article is 100% true... but it misses a point: how "un-innovative" French higher-educated people are on average.

As a teacher in a higher-education engineering school, I have headed an "Innovation & Entrepreneurship" course for 8 years (with some success stories here and there). Every student must follow this course. From my experience of teaching it to around 180 students every year, I observe that the average higher-educated student (usually coming from Classes Préparatoires) struggles to:
  • Deal with uncertainty. The most brilliant scientific students are those who excel at finding solutions to problems. But what about when there is no clearly identified problem? And what about when every solution to a problem has its pros and cons? Most of the students who would not have enrolled in an Entrepreneurship program if they had had the choice are very uncomfortable with uncertainty. They are the right targets for innovation mindset re-formatting.
  • Convince. The French education system does not include any training in talking, debating, arguing, or more generally communication skills. While every US kid is expected to defend a point at a science fair, French kids of the same age are taught how to raise their hand before talking, the quieter the better. Oral debates barely exist in French schools. As a matter of fact, it is frequent for students to give their very first "public" talk when they are 20 years old. Teaching the art of pitching is necessary for every student.
  • Accept being a failure and a rebel. This is especially true during brainstorming and creativity sessions, where it is common that somebody, say Jo, suggests a high-risk or out-of-the-box idea but, almost immediately, the fear of being judged makes Jo himself overturn his own damned idea. I'd love to put Jo in more creativity training sessions so that he becomes self-confident enough.
The percentage of engineers who have these three core competencies (an innovation-friendly mindset) in 2015 is as low as the percentage of engineers who had a geek-friendly mindset in 2011. Solutions like super-hyped incubators or state-owned VCs are sound, but they are similar to providing xDSL broadband connections to geeks in the 2000s: cool for the happy few, but it does not change the mindset of the others.

In my opinion, a successful innovation ecosystem is one in which everybody in society (especially every higher-educated worker) has an innovation-friendly mindset. Everybody here includes people who do not aim to become entrepreneurs, and even those who are not directly related to innovation. No society can afford for a majority of its higher-educated people not to have developed these three key competencies at school. The structural reasons behind this failure for the average higher-educated worker are in my opinion more critical than an imperfect innovation ecosystem for a tiny fraction of innovators. Indeed, the lack of inclination toward uncertainty, communication skills and rebel attitude is a transmissible disease for any innovation ecosystem.

It is the mission of teachers in higher-education institutions to fight the stigma of twenty years of un-innovative mindset formatting. The special "Entrepreneurship" programs commonly offered in other higher-education institutions (or in online courses) do not contribute to this mission, because these programs enroll volunteer students who have already overcome their innovation-related mindset limitations. These students are not the right target. To set up a profoundly innovation-friendly ecosystem in 2030, we have to train all higher-educated students now, so that innovation will be pervasive in society, especially in schools, in community groups and in traditional companies. Hopefully, the ecosystem will then be friendly to entrepreneurs...

March 24, 2015

Ten years as an academic scientist: preamble of my HdR

Here is the preamble of my HdR, which I will defend on April 7th, 2015, in Rennes.

I defended my PhD thesis ten years ago. At that time, my research domains included peer-to-peer systems, mobile ad-hoc networks and large-scale virtual worlds. Today, these topics hardly get any attention from the academic world. Although most papers published in the early 2000s advocated that centralized systems would never scale, today's most popular services, which are used by billions of users, rely on a centralized architecture powered by data-centers. In the meantime, the open virtual worlds based on 3D graphical representation (e.g. Second Life) fell short of users while social networks based on static text-based web pages (e.g. Twitter and Facebook) have exploded. I do not want to blame myself for having worked in areas that have not proved to be as critical as they were supposed to be. Instead, I would like to emphasize that I work in an ever-changing area, which is highly sensitive to the development of new technologies (e.g. big data middleware), of new hardware (e.g. smartphone), and of new social trends (e.g. user-generated content).

I envy the scientists who are able to precisely describe a multi-year research plan, and to stick to it. I am not one of them. But I am not ashamed to admit that my research activity is mostly driven by short-term intuition and opportunities, and that the process of academic funding directly impacts my work. Indeed, despite all of the above, I have built a body of research work that I retrospectively find consistent. And more importantly, I have been relatively successful in advising PhD students and managing post-docs, all of whom have become better scientists to some extent.

In short, I have developed over the past ten years a more solid expertise in (i) theoretical aspects of optimization algorithms, (ii) multimedia streaming, and (iii) Internet architecture. I have applied this triple expertise to a specific set of applications: massive multimedia interactive services. I provide in this manuscript an overview of the activities that have been developed under my lead since 2006. It is a subset of selected studies, which are in my opinion the most representative of my core activity.

I hope you will have as much fun reading this document as I had writing it.

November 7, 2014

A Dataset for Cloud Live Rate-Adaptive Video

There is an audience for non-professional video "broadcasters", like gamers, online course teachers and witnesses of public events. To meet this demand, live streaming service providers such as ustream, livestream, twitch or dailymotion have to find a solution for delivering thousands of good-quality live streams to millions of viewers who consume video on a wide range of devices (from smartphones to HDTVs). Yet, in current live streaming services, the video is encoded on the computer of the broadcaster and streamed to the data-center of the service provider, which in most cases simply forwards the video it gets from the broadcaster. The problem is that many viewers cannot properly watch the streams due to mismatches between the encoding parameters (i.e. video rate and resolution) and the features of viewers' connections and devices (i.e. connection bandwidth and device display).

To address this issue, adaptive streaming combined with cloud computing could be the answer. Whereas adaptive streaming manages the diversity of end-viewer requirements by encoding several video representations at different rates and resolutions, cloud computing provides the CPU resources to live-transcode all these alternate representations from the broadcaster-prepared raw video.

It is well known that the QoE of an end-viewer watching a stream depends on the encoded video and on the parameter values used in the transcoding. But, in this new cloud scenario, we also need to consider the CPU requirements of transcoding. In the "cloud video" era, the selection of video encoding parameters should take into account not only the client (for the QoE), but also the data-center (for the allocated CPU). To set the video transcoding parameters, the cloud video service provider should know the relations among transcoding parameters, CPU resources and end-viewer QoE, ideally for any kind of video encoded on the broadcaster side.

We would like to announce the publication of a dataset containing CPU and QoE measurements corresponding to an extensive battery of transcoding operations, with the purpose of contributing to research on this topic. Most of the credit for this work (and so for this post) goes to Ramon Aparicio-Pardo.

To build the dataset, we used four types of video content, four resolutions (from 224p up to 1080p) and bit-rate values ranging from 100 kbps up to 3000 kbps. Initially, we encoded each of the four video streams into 78 different combinations of rates and resolutions, emulating the encoding operations at the broadcaster side. Then, we transcoded each of these broadcaster-prepared videos into all the representations with lower resolutions and bit-rate values than the original one. The overall number of these operations, representing the cloud transcoding, was 12168. For each of these operations, we measured the CPU cycles required to generate the transcoded representation and we estimated the end-viewers' satisfaction using the Peak Signal-to-Noise Ratio (PSNR) score. We depict a basic sketch of these operations for one specific case, where the broadcaster encoded its raw video at 720p resolution and 2.25 Mbps and we transcode it into a 360p video at 1.6 Mbps.
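For readers less familiar with the satisfaction metric, PSNR compares a transcoded frame against the reference frame through the mean squared error. The sketch below computes the standard per-frame score on flat lists of pixel samples; it illustrates the metric used in the dataset, not the exact measurement pipeline we ran:

```python
import math

def psnr(original, transcoded, peak=255):
    """Peak Signal-to-Noise Ratio between two equal-size frames, in dB.

    original, transcoded: flat lists of pixel values (e.g. luma samples).
    Higher is better; identical frames yield an infinite score.
    """
    mse = sum((a - b) ** 2 for a, b in zip(original, transcoded)) / len(original)
    if mse == 0:
        return float("inf")           # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

A sequence-level satisfaction estimate then averages such per-frame scores over the transcoded video.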

We give below an appetizer of how these CPU cycles and satisfaction decibels vary with the transcoding parameters. They show some examples of the kind of results that you will find in the dataset, here for a broadcaster-prepared video of type "movie," at 1080p resolution and encoded at 2750 kbps. If you wonder what the rest of the figures look like, 558 curves and their corresponding 12168 measurements of cycles of hard CPU work and decibels of viewers' satisfaction are waiting for you in