It has been a while since I read a paper with so much hope, joy, and intrigue. I am talking about the latest work by Van Jacobson and his team at PARC on Content Centric Networking (CCN). The paper was presented at CoNEXT in Rome last month, which, by the way, is becoming a much stronger venue with more and more interesting pieces of innovative work. CCN is one of the best proposals I have seen in the Content Distribution space in the last decade; the last time I felt this excited about a piece of work in this area was when I read the BitTorrent paper. CCN basically tries to democratize Content Distribution and redesign the Internet by placing content, not machines, at its core.
Since its beginning, the Internet has been designed around communication between machines, with IP at its center. Most of the Internet traffic today, however, comes from users retrieving content, and the Internet was not optimized for that (e.g. it lacks multicast support). This has often wasted Internet resources, with routers copying the same data packets over and over again, or with swamped servers. Over the years, we have overcome such problems with imaginative overlay solutions such as Web caching, Content Distribution Networks, and P2P networks, which have worked marvels but which have also suffered from being afterthoughts bolted onto the original Internet design.
It was about time that someone took a stand and redesigned the Internet protocol stack, placing content at the Internet’s core. In this regard, Van Jacobson’s effort makes a lot of sense, and it is one of the most interesting proposals for a “Future Internet” design that I have seen (whatever that term means anymore…). Van Jacobson calls CCN a Copernican revolution because it places content at the center, and he hopes it will have the same impact as when the sun, and not the earth, became the reference point for explaining the universe.
CCN provides benefits on various fronts, including better use of Internet resources, location-independent content routing, and content security and control. All this is great and could spark a number of innovations, research ideas, and new designs that can catapult this concept to the next level. As I write this post I am trying to clear my thoughts and identify which pieces of this work will have the most impact, so that it does not become an exercise in what could have been, or drag on forever in the standardization process as a “solution in search of a problem”, as has been the case with IPv6.
The first interesting part of the work is that it democratizes Content Distribution and ensures that anyone, not just those in a position to pay a CDN, can enjoy the benefits of an Internet broadcast service that amplifies their data whenever and wherever it is needed. With CCN, Content Distribution becomes a “public” service (in the European sense) of the Internet. In a way, P2P has done much of that, providing a public service that publishers can use to propagate their content to many users at very low cost. However, P2P has done this without taking ISPs into account, which causes various inefficiencies. CCN, instead, happens at the core of the ISP and thus has a better chance of succeeding. Nevertheless, there are already solutions out there that let ISPs deploy cache overlays and ISP-run CDNs, making content distribution more efficient for all. So far, whether an ISP deploys content-aware storage infrastructure has been an economic problem, not so much a protocol problem. The decision of whether to deploy storage in the network has been a function of the ISP’s topology, workload, and various economic trade-offs (e.g. the cost of bandwidth vs. the cost of deploying and operating storage nodes), not of a lack of technical means to do so. It would be great if CCN could lower the cost for ISPs of deploying storage in their networks; otherwise, HTTP-based CDNs are likely to remain the way to go for many years to come, since the investment in them and the know-how around them are already substantial.
The second interesting part of CCN is that it decouples content from location, and the mapping between content and location is done via routing. This is very important. Currently Google (or a similar search engine) maps keywords to content URLs, and DNS then maps the URL’s host name to a machine location (e.g. its IP address). DNS has been one of the weakest points of the Internet in recent years: it is a constant target of DoS attacks and has caused major Internet service disruptions, so any solution in this regard is welcome. With CCN, the DNS functionality is effectively embedded and distributed in each routing node, making it more resilient and scalable. Rather than trusting DNS to map host names to IP addresses, CCN avoids DNS altogether and trusts the content itself, which can sit on any machine along the path to the data and be retrieved from any cached copy on that path. This also provides very nice support for DTN-like (delay-tolerant networking) communications, where connectivity comes and goes and arbitrary nodes can appear and disappear at any moment. The drawback is that each router now has to do more work to verify data and has to keep more state in its routing entries, since it routes on names rather than on IP addresses. This may raise scalability issues and open the door to DoS attacks; however, I am confident that these are solvable with various optimizations.
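To make the “cached copy along the path” idea concrete, here is a minimal Python sketch under simplifying assumptions of my own: a single-path topology, exact-name matching, and no Pending Interest Table or signature checks. The class Router, its on_interest method, and the content name are hypothetical and not taken from the CCN code base. An Interest travels toward the publisher until some node holds the named data, and the data is cached at every hop on the way back.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Router:
    """Toy CCN-style node (hypothetical): a content store plus one upstream next hop."""
    name: str
    upstream: Optional[Router] = None
    store: dict[str, bytes] = field(default_factory=dict)  # content name -> data

    def on_interest(self, content_name: str, trace: list[str]) -> Optional[bytes]:
        trace.append(self.name)              # record which nodes the Interest visits
        if content_name in self.store:       # any node holding a copy may answer
            return self.store[content_name]
        if self.upstream is None:            # nowhere left to forward the Interest
            return None
        data = self.upstream.on_interest(content_name, trace)
        if data is not None:
            self.store[content_name] = data  # cache the data on its way back
        return data


SEGMENT = "/parc.com/videos/WidgetA.mpg/_s0"  # hypothetical content name
publisher = Router("publisher", store={SEGMENT: b"segment-0"})
core = Router("core", upstream=publisher)
edge_a = Router("edge-a", upstream=core)
edge_b = Router("edge-b", upstream=core)

first: list[str] = []
edge_a.on_interest(SEGMENT, first)
second: list[str] = []
edge_b.on_interest(SEGMENT, second)
print(first)   # ['edge-a', 'core', 'publisher']
print(second)  # ['edge-b', 'core']
```

The second request never reaches the publisher because the core router kept a copy while forwarding the first one. Each of those cached entries is also the extra per-router state that the scalability concern is about.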
Another limitation of the current DNS service that CCN solves is that DNS only resolves host names (e.g. www.foo.com); it cannot resolve different pieces of data under the same host name to different IP addresses (e.g. www.foo.com/file1.html and www.foo.com/file2.html). This limits the possibilities to download parts of the content from nearby machines and to run multiple parallel downloads. Workarounds exist, such as using a different domain name for each file (e.g. www.foo1.com/file1.html and www.foo2.com/file2.html) or using intercepting proxies with L7 switches, but neither is very convenient: the first requires rewriting the content, and the second requires deploying expensive hardware. To me, fixing the DNS limitations is likely to be one of CCN’s strongest selling points (as long as the extra cost at each router is low).
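The difference between host-level and name-level resolution can be sketched with a longest-prefix match over name components. This is a hypothetical illustration: the tables, the face labels, and the function names are made up, and real CCN forwarding tables can hold several faces per prefix. The DNS-style lookup can only answer per host, while the name-based lookup can send file2.html somewhere other than file1.html.

```python
from typing import Dict, Optional, Tuple

# Hypothetical DNS-style table: exactly one answer per host name.
DNS_TABLE: Dict[str, str] = {"www.foo.com": "origin-server"}

# Hypothetical name-based table: prefixes are tuples of name components,
# so different files under the same host can map to different places.
NAME_TABLE: Dict[Tuple[str, ...], str] = {
    ("www.foo.com",): "origin-server",
    ("www.foo.com", "file2.html"): "nearby-cache",
}


def components(name: str) -> Tuple[str, ...]:
    """Split '/www.foo.com/file1.html' into ('www.foo.com', 'file1.html')."""
    return tuple(part for part in name.split("/") if part)


def dns_style_resolve(name: str) -> Optional[str]:
    """Resolve using only the host component, as DNS does."""
    parts = components(name)
    return DNS_TABLE.get(parts[0]) if parts else None


def name_based_resolve(name: str) -> Optional[str]:
    """Longest-prefix match over whole name components."""
    parts = components(name)
    for length in range(len(parts), 0, -1):
        face = NAME_TABLE.get(parts[:length])
        if face is not None:
            return face
    return None


for name in ["/www.foo.com/file1.html", "/www.foo.com/file2.html"]:
    print(name, dns_style_resolve(name), name_based_resolve(name))
```

Matching whole components rather than raw string prefixes means that /www.foo.com/file2.html.bak falls back to the host-level entry instead of accidentally matching the file2.html entry.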
The third interesting part of CCN is content security and control. Control and trust are properties of the content itself, not of the IP connections it traverses. Given that any intermediate machine along the path can reply with a cached copy, content needs to be signed with the publisher’s key, and content routers need to verify that the content has been produced by its owner. This permits opening the network to wider participation, determining provenance, tracking where content has been in the network, and evidence-based security, where it becomes hard for an attacker to impersonate a publisher by forging content under the publisher’s key. Similar mechanisms have been implemented in secure P2P systems such as Microsoft’s Avalanche, and they can be key to CCN’s success. Revocation is also one of the major headaches of CDNs and secure P2P systems, and the current CCN proposal leaves it for future work. CCN should also allow intermediate network nodes to become trusted sources, so that they can modify content as needed (e.g. re-encoding images to fit mobile phones). Both revocation and on-the-fly content modification may complicate the current CCN design, but both seem doable. The bigger question around CCN security is: what can one do with CCN in terms of content protection and security that one cannot already do by protecting content at the application layer (e.g. DRM)? My guess is that provenance and traceability of the content are likely to be part of the answer.
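The sign-once, verify-anywhere idea can be illustrated with textbook RSA over a SHA-256 digest. This is a deliberately insecure toy: the key is tiny, there is no padding, and the names digest, sign, and verify are my own, not CCN’s API; a real deployment would use a vetted signature scheme from a cryptography library. The point is that any router or client holding the publisher’s public key (N, E) can check a copy it received from an arbitrary cache. Signing the name together with the bytes is an assumption of this sketch, chosen so a valid signature cannot be replayed under another name.

```python
import hashlib

# Toy publisher key from two Mersenne primes (2**61 - 1 and 2**89 - 1).
# About 150 bits and unpadded: NOT secure, illustration only.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                          # public modulus
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse, Python 3.8+)


def digest(name: str, content: bytes) -> int:
    """Hash the name and the content together, reduced modulo N."""
    h = hashlib.sha256(name.encode("utf-8") + b"\x00" + content).digest()
    return int.from_bytes(h, "big") % N


def sign(name: str, content: bytes) -> int:
    """Publisher side: done once, with the private exponent D."""
    return pow(digest(name, content), D, N)


def verify(name: str, content: bytes, signature: int) -> bool:
    """Any node holding (N, E) can check a copy, wherever it came from."""
    return pow(signature, E, N) == digest(name, content)


name = "/parc.com/videos/WidgetA.mpg/_s0"  # hypothetical content name
sig = sign(name, b"segment-0")
print(verify(name, b"segment-0", sig))         # True
print(verify(name, b"tampered segment", sig))  # False
```

Verification needs only public values, which is what lets a cached copy from an untrusted intermediate node be accepted or rejected on its own merits.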
As you can see, lots of questions, but lots of excitement too. One final comment: I hope it is not too late to see such a clean content networking solution move forward, given the plethora of alternative solutions already out there (e.g. CDNs and P2P). The inertia could also be such that by the time something similar to CCN gets deployed on the Internet, the Internet has already changed focus again, say from content networking to video conferencing. Then it would really feel like we are chasing an elusive ghost: we design for machine communication and along comes content, we design for content and along comes conferencing, and so on. Ah, one last thing: while reading the CCN paper it occurred to me that it is about time Google started doing PageRank using content signatures (e.g. Rabin fingerprints) to solve the content aliasing problem; using links is so broken!!
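Content fingerprinting of the kind suggested above can be sketched with a Rabin-Karp style rolling hash. Strictly speaking, Rabin’s fingerprints use polynomial arithmetic over GF(2); the version below swaps in the simpler integer variant modulo a prime, and the base, window width, and the resemblance function are illustrative choices of mine, not how any search engine actually does it. Every 8-byte window gets a fingerprint in constant time from the previous one, and two URLs serving the same bytes come out as aliases (resemblance 1.0) without looking at links at all.

```python
BASE = 257
MOD = (1 << 61) - 1  # Mersenne prime modulus


def window_fingerprints(data: bytes, width: int = 8) -> set:
    """Polynomial hash of every `width`-byte window, updated by rolling."""
    if len(data) < width:
        return set()
    top = pow(BASE, width - 1, MOD)  # weight of the byte leaving the window
    h = 0
    for b in data[:width]:           # hash of the first window (Horner's rule)
        h = (h * BASE + b) % MOD
    prints = {h}
    for i in range(width, len(data)):
        # drop data[i - width], shift everything up one power, add data[i]
        h = ((h - data[i - width] * top) * BASE + data[i]) % MOD
        prints.add(h)
    return prints


def resemblance(a: bytes, b: bytes, width: int = 8) -> float:
    """Jaccard similarity of the two window-fingerprint sets."""
    fa, fb = window_fingerprints(a, width), window_fingerprints(b, width)
    union = fa | fb
    if not union:  # both inputs shorter than one window
        return 1.0 if a == b else 0.0
    return len(fa & fb) / len(union)


page = b"<html><body>Content-centric networking names data, not hosts.</body></html>"
other = b"<html><body>Host-centric networking names hosts, not data.</body></html>"
print(resemblance(page, page))   # 1.0: same bytes behind two URLs
print(resemblance(page, other))  # below 1.0: related but different pages
```

Comparing sets of window fingerprints, rather than one hash of the whole page, also lets near-duplicates (a page with a different ad banner, say) score close to 1.0 instead of looking completely unrelated.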