November 28, 2011

Incremental improvements for CS conferences

Scientists like to debate about the general organization of academic life. Lately, some have called for a clean-slate revolution based on open archives. Yet, as for the majority of clean-slate proposals on well-established processes, I am doubtful that such a shift can occur. But in the meantime, nothing is done to actually fix the issues of the current process. In particular, I have the feeling that academic conferences in computer science (at least in my communities, which span networking, multimedia and distributed systems) are getting worse, and it seems that nobody cares because the most active researchers in this area are too busy preparing their utopian clean-slate revolution.

So, let me try to give below four incremental improvements that every serious conference should implement, for the sake of a better academic life. Two are quite easy:
  • no more deadline extensions. A deadline extension is the irrefutable proof that a conference is crappy. Indeed, a deadline extension means that either the conference does not attract enough solid submissions or the scientists who submit to this conference are unable to finish their work on time. In both cases, it would be a shame to be associated with such a conference. Furthermore, deadline extensions bring at least three very negative effects.
    • it creates an unfair gap between the happy few who are in the know and the others. A scientist who knows in July 2011 that the ICC deadline will be Sep 28 has a different schedule than the scientist who naively thinks the deadline is Sep 6.
    • it is now folklore to announce an extension a few hours before the deadline. This is highly disrespectful to the (unaware) authors. Weekends can be ruined to meet a deadline, which you discover on Monday has been extended by two weeks.
    • the day before a submission is stressful. A (belatedly announced) deadline extension multiplies the number of deadline-stressful days by two. Deadline extensions are killing me.
  • a list of accepted papers on the conference webpage the day of the notification. Why is it so hard? An ugly txt-formatted list of accepted papers is just what most scientists want. From such a list, it is possible to find a link to an arXiv preprint or a technical report on the webpages of the authors of accepted papers. Moreover, titles are inspiring: the sooner every scientist can read the titles, the more inspiring it is. And don't forget curiosity of course. Who made the cut this year?
Two other improvements are less incremental, but I think their impact would be worth it.
  • no blindness at all. The debate about single vs. double blind is a classic. But very few scientists discuss the blindness of reviewers. There is however a rise in complaints about reviews that are too harsh, scientifically wrong and impolite. It is not hard to believe that if the reviews were signed by their authors, they would be written more carefully. Some argue that this would stir up desires for revenge among scientists. This ridiculous argument assumes that scientists are no better than kids, unable to recognize well-argued criticism and unable to restrain their negative thoughts. If you are not optimistic about human nature, you should notice that research communities are growing. So, the revenge desires of a few bad scientists have very little chance of affecting you, because the probability that these bad guys represent a majority of reviewers for one of your papers is actually very low. Not to mention that academic revenge-seekers, being stupid people, are probably not in the committees of top conferences, so you have nothing to lose. And if you face a majority of reviewers who want to unfairly reject your papers because of your previous bad reviews, well it may be time to consider writing better reviews.
  • open access to papers. I have already signed this pledge about open access. I know that academic professional societies (ACM, IEEE and so on) have to re-invent themselves, but we will not wait for them to do it. We cannot degrade the quality of the scientific activity just because a few jobs are at stake.
I think it is the role of the program committee members to alert their chairmen that academic life would be far better if conferences stuck to these simple rules.

November 9, 2011

What's up in networks (3/3): dash

The last post in this mini-series. After openFlow and hetnets, here is dash.

DASH or Dynamic Adaptive Streaming over HTTP
Although it is not exactly what the MPEG scientists have promoted for a decade, most of today's video traffic is based on HTTP and TCP (Netflix player, Microsoft Smooth Streaming and Adobe OSMF). And it works. The video traffic is exploding: adaptive streaming already represents more than one third of the Internet traffic at peak time, and it is expected to prevail, even on mobiles. Facing this plebiscite, the MPEG consortium has launched the process of standardizing DASH into MPEG.

In short, for a given movie, the video server publishes a manifest file in which it declares several video formats. Each format corresponds to a certain encoding, hence a certain quality and a certain bit-rate. All these different videos of the same movie are cut into chunks. A client requesting a movie selects a given video format and then starts downloading the chunks. Periodically, the client tries to estimate whether this video encoding fits the capacity of the network link between her and the server. If she is not satisfied, she considers switching to another encoding for the next chunks. What is the best chunk size, how to estimate the link capacity, what is the best delay between consecutive estimations, how to react to short-term bandwidth changes, how to switch to another encoding… are among the questions that have not received the attention of the scientific community, so every DASH client implements some magic parameters without any concern for potential impacts on the network.
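To make the adaptation loop concrete, here is a minimal sketch of what such a client might do. Everything in it is an assumption for illustration: the manifest bit-rates, the smoothing factor `alpha` and the `safety` margin are exactly the kind of "magic parameters" mentioned above, and no real DASH player is claimed to work this way.

```python
# Hypothetical manifest: bit-rates (kbps) of the encodings declared by the server.
MANIFEST_KBPS = [400, 800, 1500, 3000]


def estimate_throughput(samples_kbps, alpha=0.3):
    """Exponentially weighted moving average of per-chunk throughput samples.

    alpha is an arbitrary smoothing factor (an assumption, not a standard value).
    """
    estimate = samples_kbps[0]
    for sample in samples_kbps[1:]:
        estimate = alpha * sample + (1 - alpha) * estimate
    return estimate


def choose_encoding(estimate_kbps, manifest=MANIFEST_KBPS, safety=0.8):
    """Pick the highest bit-rate that fits within a safety margin of the estimate.

    Falls back to the lowest encoding when nothing fits.
    """
    budget = safety * estimate_kbps
    candidates = [rate for rate in manifest if rate <= budget]
    return max(candidates) if candidates else min(manifest)


def play(measured_kbps):
    """Return the encoding used for each chunk, given one throughput sample per chunk."""
    choices = []
    history = []
    current = min(MANIFEST_KBPS)  # start conservatively with the lowest quality
    for sample in measured_kbps:
        choices.append(current)      # download this chunk in the current encoding
        history.append(sample)       # throughput observed while downloading it
        current = choose_encoding(estimate_throughput(history))
    return choices


if __name__ == "__main__":
    print(play([2000, 2000, 2000, 600, 600, 600]))
```

Changing `alpha`, `safety` or the chunk duration changes how aggressively the client reacts to short-term bandwidth variations, which is precisely the under-studied design space.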

Although the multimedia scientific community and the video standardization group are large, lively communities, many research issues related to DASH have not been anticipated and sufficiently addressed. Among them, I highlight:
  • When several concurrent DASH connections share the same bottleneck, the congestion control mechanism of TCP may be compromised. In fact, a DASH connection is based on TCP, which implements an adaptive congestion control with proven convergence toward a fair sharing of the bottleneck among concurrent connections. By incessantly adapting the flow bit-rate, DASH may prevent the convergence of TCP. If network bottlenecks are located on links that are shared by hundreds of concurrent DASH flows, the lack of convergence of the congestion control mechanism is a risk. I may overestimate the impact, but at least understanding the impact of the DASH adaptive policy (which seems to use a lot of random parameter settings) on the eventual convergence of a congestion control policy is an exciting scientific topic.
  • When multiple servers store different video encodings of the same movie, the client may incessantly switch from one video encoding to another. A DASH connection works especially well when the bottleneck is always the same, whatever the chosen video encoding. In this case, the adaptive mechanism converges toward the video encoding that fits the bottleneck capacity. But in today's Internet, the content can be located in various distinct locations: CDN servers, Internet proxies, and content routers with caching capabilities. If the links toward the different encodings have different congestion levels, the DASH adaptive algorithm may go crazy.
  • A DASH connection does not support swarming. Swarm downloading (one client fetching a large video content from multiple servers) was expected to be enabled by both the multiple copies of the same content and the chunk-based video format. If every chunk comes from a different server, the congestion cannot be accurately measured. In fact, DASH cannot implement a consistent behavior when multiple paths are used to retrieve the video chunks. 
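The multi-server issue in the list above can be illustrated with a toy simulation. The two encodings, their bit-rates and the per-path bandwidths are invented numbers, and the selection rule is a deliberately naive assumption: the client measures the throughput of the path it is currently using and picks the best encoding that would fit.

```python
# Hypothetical setup: each encoding is stored on a different server, and each
# server is reached through a path with its own available bandwidth (kbps).
ENCODINGS_KBPS = {"low": 800, "high": 2500}
PATH_KBPS = {"low": 3000, "high": 1000}  # the "high" copy sits behind a congested link


def next_encoding(current):
    """Measure throughput on the current path, then pick the best encoding that fits."""
    measured = PATH_KBPS[current]
    fitting = [name for name, rate in ENCODINGS_KBPS.items() if rate <= measured]
    if not fitting:
        # Nothing fits: fall back to the lowest bit-rate.
        return min(ENCODINGS_KBPS, key=ENCODINGS_KBPS.get)
    return max(fitting, key=ENCODINGS_KBPS.get)


def simulate(start, chunks):
    """Return the sequence of encodings selected over the given number of chunks."""
    sequence = [start]
    for _ in range(chunks - 1):
        sequence.append(next_encoding(sequence[-1]))
    return sequence


if __name__ == "__main__":
    print(simulate("low", 6))
```

Because the measurement taken on one path says nothing about the other, the client in this sketch never settles: each switch invalidates the estimate that motivated it.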
By the way, DASH is yet another point in favor of HTTP, which is becoming the de facto narrow waist of the Internet. The motivations for using HTTP include its capacity to traverse firewalls and NATs, its nice human-readable names and its capacity to leverage Internet proxies and CDNs. Somehow, DASH adds congestion control and adaptive content, making the HTTP protocol even more powerful. But the gap between its huge utilization over the Internet and the lack of understanding of its behavior at large scale has the potential to scare network operators. I guess it is the way the Internet has always evolved.

November 2, 2011

What's up in networks (2/3): hetnet

Here is the second chapter of the mini-series about some (not-so-fresh) topics in the networking area. After openFlow, hetnet.

Hetnet, or the Heterogeneous Cellular Networks:
I am probably not the only one to get bored by GSM cellular networks: they have been created by phone engineers who disliked the Internet, they are full of acronyms, they are controlled by an operator, they just work. But cellular networks are now the most common way to access the Internet. Moreover, the devices using these networks are full-featured computers, which are managed by owners who install a lot of applications. The number of devices connected to cellular networks is expected to grow dramatically.

Next-generation cellular networks are likely to differ from our plain old GSM networks. Here are two technologies that may change the game:
  • femto base stations are small and cheap base stations that anybody can buy and install on their own wired Internet connection (for example here). It means that the clients of a wireless service provider pay (base stations + landline Internet communication + consumed electric power) to improve the infrastructure of the carrier and to have an excellent quality of service at home. Carriers are all jumping on this idea. I still don't understand why a user would prefer to buy a base station and connect to the Internet through 4G when she can use wifi. The main argument is that, the wifi wireless spectrum being free and badly managed, a local network can have poor performance because too many wifi access points compete or because too many devices share the pool of wireless channels. The 4G spectrum is licensed and managed by the operator, so some wireless channels can be "reserved" for a user. But if everybody has their own femtocell at home, licensed channels will become scarce too, and nobody will tolerate paying for a femtocell that interferes with the neighbors' ones. In order to tackle this issue, nearby femto base stations should collaborate to share the wireless spectrum and react to changes in the radio environment (especially when neighbors decide to turn on/off their femto base stations). All scientists interested in peer-to-peer and ad-hoc networks will have fun with the problem of channel allocation: end-users form the infrastructure, ensuring a fair sharing of scarce resources is a challenging objective, clever distributed algorithms should solve the problem, incentives to turn on/off the femtocells should be taken into account. As shown in this article, both deployment and management of femto hetnets are still unclear. Those who are not afraid of acronym orgies can look at these slides for a summary of the 3GPP standard and a nice telco-oriented overview of the research problems.
  • direct device-to-device wifi communication is a long-awaited feature. Hurrah, WiFi Direct, which is the official name of this feature in the WiFi alliance, is included in the new Android OS version. At last, wifi direct transmission between devices is becoming a reality, which means that the thousands of academic papers about mesh networks and hybrid ad-hoc cellular networks are suddenly worth reading. However, things have changed. Extending the coverage area of base stations, which has been the most frequent motivation in previous works related to mesh networks, is no longer the main concern of mobile carriers. It is now all about mobile data offloading, that is, avoiding communication via the macro base station. In this context, network operators may combine wifi direct and data caching in devices to reduce the number of requests sent to the Internet. In other words, strategies related to information-centric networks may turn out to be useful in the wireless world.
In a broader perspective, the over-utilization of wireless networks for accessing the Internet highlights an interesting paradox: wireless transmission is inherently broadcast (all devices near the wireless router may hear all messages) whereas Internet applications are usually designed for unicast communication (a message has only one destination). The capacity of mobile carriers to leverage the broadcasting feature of base stations in their cellular networks may become a key asset.
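To give a flavor of the femtocell channel-allocation problem raised in the first bullet above, here is a small sketch. It is a centralized greedy graph coloring, used only as a stand-in for the distributed algorithms the problem really calls for; the interference graph, the cell names and the number of channels are all made up.

```python
def allocate_channels(interference, num_channels):
    """Greedy coloring of an interference graph.

    interference maps each femtocell to the set of neighbors it interferes with
    (assumed symmetric). Each cell takes the lowest channel not used by an
    already-served neighbor. A cell gets None when no channel is left nearby.
    """
    allocation = {}
    # Serve the most constrained cells (most neighbors) first; ties broken by name.
    for cell in sorted(interference, key=lambda c: (-len(interference[c]), c)):
        used = {allocation[n] for n in interference[cell] if n in allocation}
        free = [ch for ch in range(num_channels) if ch not in used]
        allocation[cell] = free[0] if free else None
    return allocation


if __name__ == "__main__":
    # Hypothetical neighborhood: a, b and c all interfere with each other; d only with a.
    graph = {
        "a": {"b", "c", "d"},
        "b": {"a", "c"},
        "c": {"a", "b"},
        "d": {"a"},
    }
    print(allocate_channels(graph, num_channels=3))
    print(allocate_channels(graph, num_channels=2))  # one cell is left without a channel
```

With only two channels, one of the three mutually interfering cells cannot be served, which is the scarcity argument in a nutshell. A realistic solution would also have to rerun when neighbors switch their femtocells on or off, without any central coordinator.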

October 31, 2011

What's up in networks (1/3): openflow

I found time to go a bit deeper into several (not-that-fresh) topics. I hope this quick summary will be of interest for those who did not. First of this mini-series: OpenFlow

OpenFlow, or the Software-Defined Networks:
Thanks to OpenFlow, I now understand the "control plane vs. data plane" idea, which I thought was just a set of mysterious magic words allowing telco engineers to recognize one another. In the OpenFlow world, there are some dumb switches that forward packets according to a table of rules, and there is a clever controller, which orchestrates these switches. Switch-Controller communication uses the OpenFlow protocol.
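The split can be sketched in a few lines. This is a toy model of the idea only, not the OpenFlow protocol or any real controller API: the class names, the `packet_in` callback and the destination-to-port policy are assumptions made for illustration.

```python
class Controller:
    """Control plane: decides where traffic for each destination should go."""

    def __init__(self, routes):
        self.routes = routes  # hypothetical policy: destination -> output port

    def packet_in(self, switch, dst):
        """Called by a switch on a table miss; installs a rule and returns the port."""
        port = self.routes.get(dst)  # None stands for "drop"
        switch.install_rule(dst, port)
        return port


class Switch:
    """Data plane: only looks up its table and asks the controller on a miss."""

    def __init__(self, controller):
        self.controller = controller
        self.table = {}   # destination -> output port
        self.misses = 0   # number of times the controller had to be consulted

    def install_rule(self, dst, port):
        self.table[dst] = port

    def forward(self, dst):
        """Return the output port for a packet addressed to dst (None means drop)."""
        if dst not in self.table:
            self.misses += 1
            return self.controller.packet_in(self, dst)
        return self.table[dst]


if __name__ == "__main__":
    controller = Controller({"10.0.0.2": 1, "10.0.0.3": 2})
    switch = Switch(controller)
    print(switch.forward("10.0.0.2"), switch.misses)  # first packet goes through the controller
    print(switch.forward("10.0.0.2"), switch.misses)  # later packets hit the installed rule
```

The point of the design is that all the policy lives in `Controller`: changing how the network behaves means changing software on one machine, not reconfiguring every switch.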

The first novelty is that the OpenFlow protocol has been designed at Stanford, therefore (i) it is cool, (ii) software engineers have heard about it, and (iii) it is endorsed by a buzz concept, namely software-defined networking. The second novelty, and a noteworthy one, is that the main network equipment vendors integrate the OpenFlow API in their switches (at least Juniper and Cisco). So, it is becoming real: software developers will really be able to control a network remotely.

OpenFlow is both networks and software:
  • In the network area, there is only one truth: every new concept is something already done twenty years ago. Good news for OpenFlow: it looks like MPLS. Therefore OpenFlow is a networking concept. \qed
  • Computer scientists are driven by vaporous concepts like model abstraction, composition and semantics. Guess what? OpenFlow designers dangerously embrace them. Even worse, network scientists have started publishing in POPL and ICFP.
More seriously, OpenFlow meets a demand. More and more "independent" networks have specific needs that cannot be addressed by router vendors. For example the network in a data-center. Private enterprise networks and even next-generation home networks are also complex networks, which would work better if they could be managed according to the wishes of their owner. OpenFlow provides the friendly interface that allows anybody (provided (s)he knows programming) to become the network operator for any such network. Needless to say, this perspective brings a lot of excitement and uncertainty (see for example here and here).

October 21, 2011

Was P2P live streaming an academic bubble?

Or is the academic community just disconnected from the reality?

In brief, the motivation for peer-to-peer live streaming is that servers are unable to deliver a live video at large scale. I know, it sounds crazy in a YouTube world. In a peer-to-peer system, clients should help the poor video provider broadcast its video, without much delay or quality degradation. To have more fun, no server at all is authorized.

Believe it or not, but Google finds more than 50,000 scientific documents dealing with this issue or one of its variants. Today, only a handful of systems based on a peer-to-peer architecture are used, mostly to illegally broadcast sport events. As far as I know, these systems (released before the crazy scientific boom on the topic) do not implement one thousandth of the scientific proposals described in these 50,000 articles. It seems that the small teams of developers behind these programs haven't found the time to download/read/understand any of these articles.

Was this abundant scientific production useless? Probably not. First, scientists made some practical achievements. For example, the P2P-Next project has released under L-GPL tons of code implementing state-of-the-art solutions, including the multiparty swift protocol. A protocol is also in the standardization process at IETF. Consequently, the next generation of peer-to-peer programs should be able to cut down the TV media industry as it did the music industry. Second, these studies have produced interesting scientific results beyond the P2P streaming applications, for example the robustness of randomized processes for broadcasting data flows in networks. It reminds me of the golden era of ad-hoc networks (2000-2005), when scientists had a lot of fun playing with graphs and information, even if only the military found these protocols useful. We do understand networks better now!

But, did it deserve 50,000 articles? Of course not. Under-the-spotlights start-ups (Joost) and publicly-funded pioneering companies (BBC) switched back to centralized architectures four years ago although they had a decisive technological lead. It looks like there is no bankable application out there. Maybe it was for the beauty of science, but whoever has funded these research works can only hope that randomized processes in networks will eventually find a way to improve the human condition. Or maybe it was just a good idea to occupy people?

So, yes, P2P live streaming was a bubble. Here are three quick observations, which would deserve a more accurate analysis:
  • An academic bubble starts like a financial bubble. In the latter, no company can take the risk of not investing in an area if all competitors do. In an academic bubble, neither funding agencies nor program committees can challenge an abrupt growth in the number of papers in a given area. Therefore scientists obtain quick funding, publications, and citations, which fuel the bubble. However the academic bubble differs from the financial one because there is no critical damage when the number of papers abruptly drops. The bubble does not hurt when it bursts. So, nobody tries to understand what went wrong. In other words, this bubbling trend can only grow, and the next bubble (content-centric networking?) has a good chance of being even bigger.
  • Tracking the next bubble is attractive. Scientists are rewarded for their impact on the community. In this context, the authors of seminal works in this area, for example Chord (nearly 9,000 citations although distributed hash tables have found little practical use) or SplitStream (more than 1,000 citations for a system relying on a video encoding that has only been used by academics), are rock-stars. Anticipating the sheepish behavior of scientists has become a key academic skill.
  • Scientists are still incapable of focusing their energy on the right clients, who were the aforementioned small teams of hackers in this case. This is yet another motivation for revamping the way scientific results are delivered in computer science. Giving free access to papers, releasing the code that has been used in the paper, participating in non-academic events or finding echoes in other communities are among the solutions. Not only to be meaningful, but also to prevent bubbles.
Just an idea: when the bubble is officially there, would it be possible to officially forbid the bullshit motivation paragraphs in the paper? I wish authors would admit that they just want to have fun developing a new model in a useless bubbling scenario.

August 21, 2011

A warm feedback from Sigcomm

The SIGCOMM conference just finished two days ago. Papers, slides, and the video of the talks are online for free. As could be expected, there is no comparison to my experience at ICC. Although video recording prevented presenters from moving on the stage, the talks were excellent: long enough, well prepared, and in perfect English. For every talk, many questions were immediately raised and people actually debated during the coffee breaks and social events. In brief, Sigcomm is a conference that is worth the price (registration and travel). A series of remarks below:
  • a Sigcomm paper should present "novel results firmly substantiated by experimentation, simulation or analysis." My understanding is that "substantiating ideas" now prevails, and that the novelty has become debatable. Some ideas, which are remarkably substantiated, do not open enough perspectives. For example, deploying wireless antennas on top of data-center racks is a cool idea, but I would not include it in my list of major scientific breakthroughs. Sigcomm program committees are expected to prefer papers that are "exciting but flawed" to the "correct but boring" ones, yet exciting is not always synonymous with inspiring. In this vein, the program includes three papers related to BitTorrent. Come on, we are in 2011! How many scientists are still interested in such an overwhelmingly addressed research area?
  • Europe is back, with six papers. I already mentioned that EU-funded FP7 STREP projects match the characteristics of a competitive Sigcomm paper. This year's program demonstrates the benefits of writing Sigcomm-compatible FP7 project deliverables, as all accepted European papers are (sometimes partially) funded by the FP7 framework. Such funding gives the opportunity to evaluate a well-identified idea through large-scale deployment. The twenty-six other papers come from prestigious American institutions, which are probably the only places that combine a unique skill in the Art of Writing academic papers and the capacity to substantiate any idea with a bunch of outstanding experiments.
  • I am not really into measurements, and I undoubtedly never will be. That's probably why I struggle to identify the scientific point behind the six papers that deal with measurement in the program. Indeed, it seems that the main contribution is the result of the measurement, not the way these measurements have been obtained. They do not present a novel super-approach to make a brand new measurement set. Rather, the idea is that these measurements provide key insights into the behavior of a particular application. I agree, but does it deserve a 14-page LaTeX-written paper? Measurement papers would probably better fit an infographic (like this one), wouldn't they?
  • I enjoyed some presentations, especially the controversial model that explains the evolution of protocol adoptions, the scheduling of network flows in data-centers, the synchronization of multiple distant data-centers, and the reduction of redundant data transfer.


    July 8, 2011

    Leveraging on collaborative projects to produce better academic research

    Opposing industrial and academic research worlds is a classic discussion. Academics have recently been suspected of addressing unmotivated problems because they do not manipulate the technologies that are at the core of their research activities. The importance of having an "industrial motivation" behind academic research is reflected by a statistic: papers authored by at least one industrial researcher represent approximately half of accepted papers in the best conferences in operating systems (15 out of 32 for OSDI'11) and networking (16 out of 32 for Sigcomm'11). These papers monopolize the technical sessions related to new trends, especially datacenter and production network for OSDI, cloud computing and user measurement for Sigcomm.

    In these applied science areas, the best conferences accept papers addressing industry-relevant problems if and only if (i) authors demonstrate the timeliness and relevance of the problem, and (ii) authors carefully evaluate their proposed solutions.
    • problem motivation: a scientist who is only reading papers about a technology can hardly formulate a relevant important problem related to this technology. In order to have an accurate view of the problems faced by companies, a first idea is to spend time there as a visiting researcher, as is promoted at Google. Another idea is to work with industrial partners in projects like FP7 STREP projects. I mean, actually work together, and not pretend to work together.
    • solution evaluation: an NS2 simulation is no longer enough for a Sigcomm paper. Nowadays, some large-scale infrastructures give free access to scientists (for example Open Cirrus for a large data-center, Planet Lab for an Internet-scale network, Grid 5000 for a grid, Imagin'Lab for a 4G/LTE cellular network). There is no excuse not to test solutions over real infrastructures. However, access to infrastructure is not sufficient: evaluations should also be based on realistic user patterns. The author of the excellent Hints and tips for Sigcomm authors insists: "use realistic traffic models"! Besides using available real traces (for example the amazing network traces from Caida), the idea is again to leverage a project collaboration with industrial partners that are able either to deploy a prototype on real clients, or to provide exclusive traces of their real clients.
    Hence, short-term focused collaborative projects are ideal if one wants to write well-motivated, well-evaluated, industry-relevant papers. But, in this case, why have I never been in a position to submit a competitive paper to Sigcomm although I participated in many collaborative projects? Probably because:
    • some of my industrial partners were not really industrial. In large companies, R&D labs are frequently disconnected from the real operational teams, so researchers in these labs are unable to provide substantiated arguments about the criticality of the project, to successfully deploy a prototype, and even to obtain traces from their real clients.
    • in a consortium, every partner has its own agenda. Receiving funding while minimizing efforts may be the only point all partners agree on. I rarely feel that all partners share a strong commitment to make the project actually work. More frequently, the funding acceptance is considered as the final positive outcome, the project itself being only a pain.
    • the project work-plan does not include the writing of a scientific paper. Scientific production is usually seen as a dissemination activity, under the responsibility of an academic partner, although writing a top-class paper requires a precise planning of the contributions of every partner (including milestones and deliverables).
    Now that I understand why successful collaborative projects are critical and why my recent projects have (relatively) failed, I hope I will be able to leverage collaboration with industrial partners to do better research (a.k.a. write better papers).