Peerdal: 2010

November 14, 2010

Large-Scale Delivery of Time-shifted Streams

Time-shifted streaming is a general term that covers two main services: catch-up TV and life-streaming.

In catch-up TV, a program normally broadcast at time t can be viewed at any time after t (from a few seconds to many days later). Catch-up TV is provided through network digital video recorders or personal video recorders. Catch-up TV is gaining in popularity: it now accounts for 14% of the overall TV consumption in UK households equipped with DVRs. It is also the TV usage that grew at the highest rate in 2009 in the US. However, despite the efforts of many companies, including the French SME Anevia, catch-up streaming services are still expensive to deploy because conventional disk-based VoD servers can neither ingest content massively nor keep pace with the changing viewing habits of subscribers. Moreover, clients require distinct portions of a stream, so no group communication techniques such as peer-to-peer and multicast protocols can be used. Therefore, only big media actors and TV incumbents can offer these services at large scale, which is bad for innovation. Here, a peer-assisted architecture could help start-ups and non-profit associations to also offer time-shifted streams to their users.

In parallel, a new form of streams is emerging: life-streams. The concept originally coined by Vannevar Bush is currently being revisited by popular social network tools like Twitter: every user is a producer of a life-stream, a stream of personal data that is inherently made public in order to be consumed by friends. Should life-streams be joined with multimedia data generated by passive life experience capturing systems, the traffic related to these life-stream applications will become huge. At the same time, the proliferation of sensors and the rise of the Internet of Things are also expected to generate a large amount of data streams. Both life-streams and sensor-generated streams require time-shifted navigation. These services reveal another critical issue of time-shifted streaming systems: privacy protection. Sensitive life-streams or personal sensor-generated streams highlight the ethical limitations of any architecture having a potential point of control: weaker privacy protection, data lock-in or third-party control. We need a fully distributed system guaranteeing that the whole stream is actually available, including the most unpopular past portions, and that any past portion can be fetched.

Preserving the privacy of users and lowering the infrastructure cost for innovative newcomers: these are the two main motivations for decentralized peer-to-peer systems. It is also a topic that I want to explore further, following two recent papers: here and here. Anybody on board?

October 8, 2010

Delivering User-Generated Content (UGC) in Massively Multiplayer Online Games (MMOG)

A recent series of interviews with gamers conducted by i2media has revealed that the motivations for playing MMOGs are a mix of boredom, challenge, relaxation and socializing. This result confirms the results of previous studies on the same topic. The last point -- gaming for socializing -- has motivated the CNG project (an EU-funded project I am involved in). Indeed, the tools that enable socializing in a game without "alt-tabbing" are rare (Playxpert and Xfire). In the CNG project, we aim at developing tools that allow gamers to share User-Generated Content (UGC). The interviews have revealed that three scenarios are especially expected by MMOG players:

a player broadcasts live screen-captured video of their game to any other player (in order to show off)
a player streams live screen-captured video of their game to a restricted group (typically a guild)
a player streams animated virtual 3D objects. The “clients” are players whose virtual position is close to the virtual position of the object.

The owner of the game is in charge of managing the "game server", which ensures the consistency of the game and delivers the content on time. The management of the UGC is definitely not its concern. This is where I come in: what if the in-game UGC were delivered in a peer-to-peer fashion? With neither cost nor responsibility for the game owner, the peer-to-peer architecture is quite attractive here.

Basically, the gamers' demand can be met by implementing peer-to-peer live streaming systems in the game, which is a topic that has already been extensively studied. However, there are some specificities:

the diffusion of one live video stream requires that every peer receiving the video contributes some physical resources. A peer that receives several videos will eventually reach its limits. In scenarios 1 and 2, avoiding congestion at a peer is not difficult: it is enough not to authorize an almost-congested peer to receive another video. For scenario 3, congestion management is trickier: a player who is unfortunately too close to several UGC objects may experience congestion. Our idea is to revisit the concept of area of interest (AoI).
the game is the main motivation for the gamer; socialization is an option. This is also true for the computing resources: the bandwidth, the CPU and the memory must always be reserved for the game first. In other words, the peer-to-peer video streaming system should be friendly to the other applications running on the end-user's computer. Many classic solutions, including the ones based on Random Linear Coding, are immediately rejected because of their resource consumption. Our idea is to use rateless coding.
the live video should be delivered at about the same time to all players, especially in scenario 3, where two players in the same region should see the same object in about the same shape. Our ideas here are still unclear.
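
To make the area-of-interest idea for scenario 3 a bit more concrete, here is a minimal sketch. It is only an illustration under my own assumptions, not the CNG design: the `UgcObject` type, the `select_streams` function and the "stop at the first stream that does not fit" rule are all hypothetical. The player subscribes to the nearest UGC objects first and stops as soon as the bandwidth budget left over by the game is exhausted, which amounts to shrinking the AoI.

```python
import math
from dataclasses import dataclass


@dataclass
class UgcObject:
    """Hypothetical description of a streamed in-game object."""
    name: str
    x: float
    y: float
    bitrate_kbps: int  # bandwidth needed to receive this object's stream


def select_streams(player_pos, objects, aoi_radius, budget_kbps):
    """Return the names of the UGC objects the player subscribes to.

    Objects outside the nominal AoI radius are ignored. Inside the AoI,
    the nearest objects are served first; the first object that does not
    fit in the budget, and every object farther away, is dropped.
    """
    px, py = player_pos
    in_range = []
    for obj in objects:
        dist = math.hypot(obj.x - px, obj.y - py)
        if dist <= aoi_radius:
            in_range.append((dist, obj))
    in_range.sort(key=lambda pair: pair[0])  # nearest first

    selected, used = [], 0
    for _dist, obj in in_range:
        if used + obj.bitrate_kbps > budget_kbps:
            break  # the effective AoI ends here
        selected.append(obj.name)
        used += obj.bitrate_kbps
    return selected


objects = [
    UgcObject("a", 1, 0, 300),
    UgcObject("b", 2, 0, 300),
    UgcObject("c", 3, 0, 300),
    UgcObject("d", 50, 0, 100),
]
print(select_streams((0, 0), objects, aoi_radius=10, budget_kbps=700))
```

Stopping at the first stream that does not fit (rather than skipping it and trying a farther, cheaper one) keeps the set of visible objects a clean disc around the player, which is one possible reading of "revisiting the AoI".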

All in all, the CNG project has a good chance of being exciting. At the very least, it represents a good opportunity to tackle not-so-artificial challenges and to implement our solutions in real systems.

September 29, 2010

Open-source software in the Internet of Things: why we need a repository-less package management system

Software has become one of the most critical forms of User-Generated Content (UGC). The number of software projects that are created or updated daily is overwhelming: the SourceForge community aggregates more than 2 million software producers, contributing to 240,000 software projects. The increasing popularity of application stores (e.g. more than 180,000 applications in Apple's App Store) confirms several trends in the software industry:

crowdsourced software has become a key economic argument. Apple typically takes advantage of the number of third-party applications that are available exclusively on its devices. The capacity to offer, in a short time, the largest and most diverse amount of software and services is a challenge. In this context, most large actors of the communication industry, including phone manufacturers and network operators, propose incentives for developers (from monetary compensation to open access to data and APIs), which tend to reinforce the proliferation of new software.
pervasive environments need crowdsourced software. The explosion of the number of devices, as well as commercial issues (especially the time-to-market), induces a gigantic demand for software development. Actually, this demand exceeds by far the capacity of classic software producers. For example, the strength and dynamism of the Linux community is a key factor explaining the rising popularity of the Linux OS for small devices.

In comparison to classic UGC aggregation, the management of user-generated software is a challenging task. Indeed, modern software often consists of a huge number of small packages. These packages have inter-dependent relationships that may easily be broken during the deployment life-cycle. Thus, finding an efficient and reliable way to maintain, distribute and install these software packages over billions of machines is definitely an issue. In the current approach, software distributors rely on a set of repositories, which are centralized servers collecting all the packages that have been certified. We distinguish two major drawbacks in this architecture:

the certification of packages. The software distributor plays the role of a certification authority. Users must deposit their packages if they want them to be integrated into the repositories. The distributor verifies the integrity of the submitted packages and makes the valid ones available for other users to download. As addressed in the EDOS project, there exist various approaches and tools facilitating the management of large repositories of packages. However, the centralized structure requires expensive infrastructure and extra human management. The process of certifying third-party packages is slow and complex. Typically, developers complain about the increasing delay before software becomes available in Apple's App Store. Clearly, a centralized certification of packages does not scale. It is also a severe threat to the privacy of users.
the delivery of packages. It has been emphasized by Microsoft researchers that a set of repositories cannot ensure a fast, planet-scale delivery of packages. However, massive delivery of software patches is a key security requirement. If the number of devices grows as is commonly assumed in the Internet of Things vision, a centralized repository-based architecture will soon reach its limits. Moreover, devices in pervasive environments are not necessarily always connected to the Internet. We also need to rely on intermediate devices and opportunistic ad-hoc communications if we want to upgrade all devices, including the tiniest ones.

We need to revisit the package management system in a clean-slate approach: a fully distributed (repository-less) system, which presupposes a modification of the usual inter-dependent relationships between packages. We propose an internship, which is expected to be a small first step in that direction.
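
For readers who have never looked inside a package manager, the inter-dependent relationships discussed above can be pictured with the small sketch below. It is a toy illustration under my own assumptions (the `install_order` function and the package names are hypothetical, and real package managers also handle versions and conflicts): packages are installed after their dependencies, and a single package missing from the known set breaks the whole installation.

```python
def install_order(packages, target):
    """Return an installation order for `target`, dependencies first.

    `packages` maps a package name to the list of names it depends on.
    Raises LookupError if a dependency is unknown, and ValueError if the
    dependencies form a cycle.
    """
    order = []
    visiting, done = set(), set()

    def visit(name):
        if name in done:
            return
        if name not in packages:
            raise LookupError(f"missing package: {name}")
        if name in visiting:
            raise ValueError(f"dependency cycle through: {name}")
        visiting.add(name)
        for dep in packages[name]:
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)  # appended only after all its dependencies

    visit(target)
    return order


packages = {
    "app": ["libui", "libnet"],
    "libui": ["libc"],
    "libnet": ["libc"],
    "libc": [],
}
print(install_order(packages, "app"))
```

In a repository-based system, the `packages` table is what the central servers guarantee to be complete and consistent; in a repository-less system, each device would only hold a partial view of it, which is exactly where the difficulty lies.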

September 15, 2010

One academic world, two divergent ways to live it

The academic world is like the media industry. Some actors understand the opportunities offered by the digital world; the others are still unable to reinvent themselves in order to fit into our century.

On the one hand, you have the unexpected success of a Q&A website devoted to Theoretical Computer Science. Anybody can post a question, anybody can suggest answers, anybody can vote on the relevance of these answers. A reputation score is given according to the number of received votes, and this reputation score allows you to slowly become a kind of administrator. Such a website is often associated with chatting teenagers. In this case, more than one thousand serious academics subscribed (a third of the whole community?), and now these serious people chat every day about problems related to theoretical computer science. Bootstrapping was not easy, but the success is here. For example, quantum computing was a hot topic today with two threads. Active participants include PhD students, unknown people, distinguished professors...

Wait, these guys who are expected to review the crappy papers I submitted to prestigious journals are wasting their time chatting with friends instead of doing their job? Well, it seems that the emerging conversation between scientists is worth spending a significant amount of time on the website.

On the other hand, you have the editor of a journal in the networking community. I reviewed a bad-but-not-so-bad paper two months ago. The editor sent me a kind email yesterday to inform me that, based on the different reviews (two or three reviews, I suppose), the paper has been rejected. I kindly requested the other reviews. I just want to know what other scientists who read the same paper as me thought about it. Were they as harsh as me? Were they annoyed by the same weaknesses as me? Did I miss important flaws? Did I misunderstand some points? I received a kind reply: "Sorry we don't do that". The reviews exist, but the reviewers cannot access them because the editor has decided so. In parallel, I am on the Program Committee (PC) of a workshop. The reviewing platform does not authorize me to look at the papers that have not been assigned to me. I complained to the PC chair, but his reply was "The main task of TPC members is to give their technical opinion about the papers assigned to them. It would not be of any use if you could access the other papers, since those papers will have their own TPC members." And what if I just want to do something that is of no use to you, but is of interest to me? What if I want to review other papers just for fun? What if I find that a paper that has not been assigned to me deals with a topic I find interesting? What if I want to contribute to the discussions about an exciting paper?...

I am not surprised that it is more and more difficult to find motivated reviewers able to write their reviews on time. What is the incentive to write a review if it is not part of a conversation? The collaborative work on the P vs NP story has demonstrated that collaborative reviewing is far better than just a sum of blind reports.

We could obviously go further. Here is a list of small changes, ranked from the easiest to the most difficult to accept:

all TPC members access all papers,
all TPC members access all reviews,
all TPC members write reviews for any paper,
all authors access all papers,
all authors access all reviews,
all authors write reviews for any paper.

Wouldn't it be a perfect way to prepare a workshop where participants actually discuss?

September 6, 2010

Research in decentralized peer-to-peer: death and need

Gnutella and Kazaa appeared at the end of the last century. The promise of these systems fostered an intense research activity in the area of peer-to-peer networks. The two most cited papers in Computer Science between 2000 and 2010 are both related to peer-to-peer systems. At that time, the motivations that researchers were authorized to admit were scalability and dependability. The design of free systems (i.e. without any central authority) has never been a convincing argument, either for reviewers or for funding agencies. For example, two classic papers in the peer-to-peer literature -- BitTorrent and Freenet -- were published in minor crappy conferences.

So far, data-centers have proven to be scalable and dependable. In this context, the interest in peer-to-peer systems has declined. Immediately, the main conferences dealing with peer-to-peer have claimed to be open to submissions about systems that are not totally distributed: it is the time of peer-assisted architectures, and of overlays of devices controlled by a central authority (e.g. set-top-boxes). See for example this paragraph in the Call for Papers of the ninth workshop on Peer-to-Peer Systems (IPTPS 2010):

"This year, the workshop's charter will be expanded to include topics relating to self-organizing and self-managing distributed systems. This is in response to recent trends where self-organizing techniques proposed in early peer-to-peer systems have found their way into more managed settings such as datacenters, enterprises, and ISPs to help deal with growing scale, complexity, and heterogeneity. In the context of this year's workshop, peer-to-peer systems are defined to be large-scale distributed systems that are mostly decentralized, are self-organizing, and might or might not include resources from multiple administrative domains."

Another consequence is that the only area where peer-to-peer experts can reasonably argue that pure peer-to-peer systems make sense -- live streaming systems -- has received dramatic attention fueled by tons of grants: more than seven thousand papers containing the words peer-to-peer, live and streaming have been published since 2009. From an algorithmic perspective, the similarities between pure peer-to-peer and Content-Centric Networking are turning the latter into a hot topic among peer-to-peer experts. In my opinion, the gap between this sudden peak of scientific work and the actual need for research in these areas is huge.

But what about research on fully decentralized peer-to-peer architectures for free systems? The troubles around Wikileaks, the recurrent funding issues faced by free services like Wikipedia or arXiv, and the terrible privacy problems of current social platforms should invite every reviewer (not only in conferences but also in funding agencies) to consider the "free systems" motivation as critical.

September 1, 2010

Content-Centric Networking and the Revolution of Content Delivery

Many scientists in the networking field are excited by what was initially called Content-Centric Networking. Recently, two national funding agencies have announced large projects in this area: Named Data Networking by the US NSF, and Réseaux Orientés au Contenu by the French ANR. Here is my take on this topic.

It seems that new generations of Internet routers will have the capacity to cache content. Their future deployment represents an opportunity to revisit the techniques that are currently used in the Internet to deliver content. So far, the flaws of the Internet and the drawbacks of IP-layer multicast have been overcome by Content Delivery Networks (CDNs) such as the Akamai network. In brief, a CDN comprises around a hundred thousand servers, which are located as close as possible to the end-users' networks. These servers are in charge of storing and delivering the content of their clients (here, some service providers) to the end users. In a way, the predominance of CDNs is part of the network neutrality debate, because small service providers cannot get the same quality of service as Akamai-powered incumbents.

The seminal works done at the Palo Alto Research Center (PARC) addressed the fundamental issue of routing queries and data based on content name. These works enable the exploitation of the caching feature of new Caching Routers. However, the management of thousands of in-network Caching Routers is still an open question, which has to take into account:

the distributed nature of this caching system. In contrast to the centralized management of a CDN, the envisioned network of Caching Routers is by nature distributed: every Caching Router is expected to decide by itself whether a piece of content that it routes should be cached or not. Moreover, a claimed objective is to retain the simplicity and scalability of current Internet protocols. The Internet works because it is simple; let's stick to this approach.
the complexity of the peering relationships between autonomous networks in the Internet. The Internet is a loosely-coordinated aggregation of networks. The equilibrium of the whole Internet depends on the selfish actions of every network. The deployment of Caching Routers is among the few events that have the potential to significantly impact the behavior of inter-network relationships, and affect the global Internet.
the evolution of content. Cisco claims that video traffic will represent 90% of the overall Internet traffic in 2014. While video clips à-la-YouTube can be treated as classic cacheable content objects, many other forms of video services are emerging. In particular, time-shifted streaming is becoming a major trend, for TV of course, but also for potential life-streaming systems (lifecasting). As we have recently shown, these new forms of video consumption represent a challenge for network management.
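
To illustrate what "deciding by itself" could look like for a single Caching Router, here is a minimal sketch. It is my own simplification, not the PARC design: the `CachingRouter` class, the cache-everything rule and the least-recently-used (LRU) eviction are assumptions chosen only because they need no coordination with any other router.

```python
from collections import OrderedDict


class CachingRouter:
    """Hypothetical router that caches content objects by name (LRU)."""

    def __init__(self, capacity):
        self.capacity = capacity    # maximum number of cached objects
        self.store = OrderedDict()  # content name -> data, oldest first

    def request(self, name, fetch_upstream):
        """Serve `name`; return (data, hit) where hit tells if it was cached."""
        if name in self.store:
            self.store.move_to_end(name)  # mark as most recently used
            return self.store[name], True
        data = fetch_upstream(name)       # forward the query upstream
        self.store[name] = data           # local decision: cache everything
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict the least recently used
        return data, False


def origin(name):
    """Stand-in for the rest of the network."""
    return f"content of {name}"


router = CachingRouter(capacity=2)
hits = [router.request(n, origin)[1] for n in ["/a", "/b", "/a", "/c", "/b"]]
print(hits, list(router.store))
```

A time-shifted stream stresses such a policy because each past portion is a distinct, rarely re-requested object, so a purely local recency rule may keep evicting exactly what the next viewer needs.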

These challenges are genuinely exciting. Yet, as usual in the networking community, scientists work in closed projects, and prepare papers, which are submitted to prestigious conferences like Infocom or NSDI, but are too rarely released in an open library like arXiv.

August 17, 2010

The emerging web-based science era

I have been looking for incentives to start blogging for almost five years. I think it is the right time now. The recent events related to the (not so convincing) P vs. NP proof were the ultimate trigger, which made me think that academic science has entered a new era.

During the summer, I had the feeling that the everlasting flow of criticism against the flawed academic processes was becoming stronger. Some scientists still try to address classic problems of peer reviewing, or to revolutionize the habits of their community. The most cynical ones emphasize that the reality of academic science is that it is all about the Art of writing papers. In this depressing context, the "P is not equal to NP" affair occurred. Here is the timeline of this event.

As mentioned in this NYTimes article, one of the greatest outcomes of this "bomb" is the spontaneous collaboration that has emerged between scientists. In my opinion, it is not so surprising. Indeed, the community of "Theory of Computer Science" has already embraced many tools of the web. Many scientific leaders blog and tweet with high frequency. Look for instance at this aggregator. They also make efforts to put online the articles accepted at the flagship conferences.

In comparison, the communities related to Computer Science that I know are not so well organized. Typically, I observed that no scientist tweets during the PODC conference! I am also not aware of many bloggers in the networking community. Therefore, when a major new idea is published, the debates are not online. Every team silently works in order to produce the most artistic paper, which will be accepted at the next must-be conference. For example, look at the (lack of) reactions to this great post from one of the very few blogging scientists in the networking community.

In this context, I see my tiny blogging contribution as an experiment in new scientific collaborations beyond conferences, journals, h-index, etc. Besides, I want to be ready when my communities switch to the new web-based science era!