The Next Few Decades of Computing

Linas Vepstas [email protected]
Draft of July 2000-July 2001


Abstract

The future of computing (the next 10-20 years) will be radically different from what we know now, and it is sneaking up on us quickly. Below, we name the technologies and concepts that will evolve into this revolution. Alongside the revolution, there will also be an evolution, one with equally widespread and shocking effects, and there will be shades of grey between the evolutionary and the revolutionary. By 'evolutionary', I am trying to tip my hat to efforts such as enhancements to the web and strategies such as Microsoft's .NET, which will be pervasive, but remain firmly rooted in a traditional view of the Internet as a medium for client-server communications.

The next incarnation of the Internet will liberate both the content and the CPU cycles from the actual hardware that performs storage and computation. Services such as Publius, Freenet and Napster already hint at what true distributed storage might be like, while distributed.net and SETI@home hint at what distributed cycle-serving might be like. Combine these services with a good content-description system, a finder, a dose of e-rights, and distributed locks in a trivial-to-administer, secure chroot jail, and the future will be upon us.

Computing Eternity

David Gelernter makes a number of marvelous predictions about the future of computing in his manifesto THE SECOND COMING. The ideas he presents are pure and ideal (save the plug for the Lifestreams technology, which comes off as a crass attempt at commercialization). I suppose David's purity of presentation is all the better to get the concepts across, and they do, indeed, come across marvelously. However, I'm an engineer, and the question immediately flies to mind: how do you build such a thing? When he touches on certain points, I can't help but say to myself, "but I already know a prototype of this".

To summarize Gelernter's points in terms of present-day technologies, let's examine the following influences/confluences:

Eternity Service
The Eternity Service (prototype) and related concepts, such as Freenet, Napster, GriPhiN and Publius, all provide ways of publishing information on distributed networks. Each technology enables a user's home computer to participate in a broader network that supplies distributed storage. If you think about it, this is very, very different from the de facto Internet today, where web pages are firmly rooted to the web servers that serve them up.

If you are reading this web page near the turn of the century, chances are good that your browser fetched it off of the web server I run at home. Chances are also good that you got it off some caching proxy; I know my ISP runs one. But I know that my life would be a lot better if I didn't actually have to be the sysadmin for the server I run at home. I would like it much better if I could just publish this page, period: not worry about maintaining the server, or about doing backups. Just publish it on Freenet or Publius. If everyone's home computer were automatically a node/server on Publius, and if Publius required zero system administration, then I, as a writer/publisher, would be very happy. I could just write these thoughts, and not worry about the computing infrastructure needed to make sure that you can read this. We conclude that the eternity service is an important component of Gelernter's Manifesto, which he sadly fails to name as an important, contributing technology.

A crucial component of this idea is 'zero administration': the ultimate system must be so simple that any PC connected to the net could become a node, a part of the distributed storage infrastructure. The owner of a PC (e.g. my mom) should not have to give it much thought: if it's hooked up to the Internet, it's a part of the system.


Search Engines
Gelernter goes on at length about content-addressable memory: how it should be possible to retrieve information from the Eternity Service based on its content, and not based on its file name. Well, we already have an existing analogue of this: the search engine. Search engines are still not very good: Google is among the best, but can still return garbage if you don't formulate a good query. Imagine layering a distributed search engine on top of the eternity service. Assuming some Google/Ask-Jeeves-type smarts are built into it, we get close to having the kind of content-addressable storage that Gelernter daydreams about.

Unfortunately, search engines do *not* relieve me of the duty of adding hyperlinks to my writing. This is a bit tedious, a bit odious. It would be nice if any phrase in this hypertext were in fact a link that more or less pointed at the thing I intended it to point at. If that were possible, then we really would have content-addressable memory. Furthermore, search engines are limited to ASCII text: they are essentially useless for binary content. To find binary content, one must now visit specialized sites, such as rufus.w3.org to locate RPMs, Tucows to locate shareware, or mp3.com or scour.net to find audiovisual content. Each of these systems is appallingly poor at what it does: the RPM spec file is used to build the Rufus directories, but doesn't really contain adequate information. The mp3 and shareware sites are essentially built by hand: that part of the world doesn't even have the concept of an LSM to classify and describe content! (LSM is a machine-readable format used by metalab.unc.edu to classify the content of packages in its software repository.)
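
For the curious, an LSM entry is just a block of tagged plain text. A rough sample, reconstructed from memory (the package and the field values here are invented for illustration):

    Begin3
    Title:          frobnicate
    Version:        0.9
    Entered-date:   2000-07-15
    Description:    A small tool that frobnicates its input.
    Keywords:       frobnication, example
    Author:         [email protected] (Jane Hacker)
    Primary-site:   metalab.unc.edu /pub/Linux/utils
    Copying-policy: GPL
    End

Even something this crude, applied uniformly, would give a search engine far more to work with than a bare file name.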

LSM's, Name Spaces and Self-Describing Objects
There is another way to look at the problem of searching for and finding an object based on its content, rather than its 'unique identifier'. Filenames/filepaths/URLs are essentially unique identifiers that locate an object. Unfortunately, they only reference it, and provide only the slimmest of additional data. For example, in Unix, the file system only provides the filename, owner, read/write privileges, and modification/access times. By looking at the file suffix one can guess the mime-type, maybe: .txt .ps .doc .texi .html .exe and so on. File 'magic' can also help guess at the content. URLs don't even provide that much, although the HTTP/1.1 specification describes a number of optional header fields that provide similar information. See, for example, Towards the Anti-Mac or The Anti-Mac Interface for some discussion of this problem.
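
To see just how little that is, here is a minimal sketch in Python; it prints essentially everything Unix will ever tell you about a file:

    import os, stat

    st = os.stat(".")                  # stat any file; "." works everywhere
    print(stat.filemode(st.st_mode))   # permissions, e.g. drwxr-xr-x
    print(st.st_uid, st.st_gid)        # numeric owner and group
    print(st.st_mtime, st.st_atime)    # modification / access times
    # ...and that's the lot; nothing here describes what the
    # file actually contains.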

What is really needed is an infrastructure for more closely defining the content of a 'file' in both machine-readable and human-understandable terms. At the very least, there is the concept of mime-types. Web-page designers can use <meta> tags to define some additional info about an object. With the growth in popularity of XML, there is some hope that XML DTDs can be used to understand the type of an object. There is the semi-forgotten, semi-ignored concept of 'object naming' and 'object trading brokers' as defined by CORBA, which attempt to match object requests to any object that might fill the request, rather than to an individually named object. Finally, there are sporadic attempts to classify content: LSMs used by metalab.unc.edu, RPM spec files used by rufus.w3.org, debs used by the Debian distribution. MP3s have an extremely poor content-description mechanism: one can store the name of the artist, the title, the year and the genre. But these are isolated examples with no unifying structure.

Unfortunately, Gelernter is right: there is no all-encompassing object-description framework or proposal in existence that can fill these needs. We need something more than a mime-type, and something less than a free-text search engine, to help describe and locate an object. The system must be simple enough to use everywhere: one might desire to build it into the filesystem, in the same way that 'owner' and 'modification date' are file attributes. It will have to become a part of the 'finder', such as the Apple Macintosh Finder or Nautilus, the Eazel finder. It must be general enough to describe non-ASCII files, so that search engines (such as Google) could perform intelligent searches for binary content. Today, Google can neither classify nor return content based on LSMs, RPMs, debs, or the MP3 artist/title/genre fields.
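
To make the idea concrete, here is one possible shape such a descriptor might take, sketched in Python. The field names are hypothetical, my own invention rather than any existing standard:

    # A hypothetical self-describing object: metadata that travels with
    # the file, in a form a filesystem, finder or search engine could index.
    descriptor = {
        "mime-type":   "audio/mpeg",
        "title":       "Some Song",
        "author":      "Some Band",
        "year":        2000,
        "keywords":    ["rock", "live recording"],
        "description": "A short, free-text summary of the content.",
    }

    def matches(descriptor, **query):
        """Match a query against descriptor fields, so a 'finder' can
        locate objects by what they are, not by what they are named."""
        return all(descriptor.get(k) == v for k, v in query.items())

    print(matches(descriptor, author="Some Band"))   # True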

distributed.net and SETI@home
distributed.net runs a distributed RC5-64 cracking / Golomb ruler effort. SETI@home runs a distributed search of radio-telescope data for interesting sources of extraterrestrial electromagnetic signals. Both of these efforts are quite popular with the general public: they have built specialized clients/screen-savers that have chewed through a quadrillion trillion cpu cycles. Anyone who is happy running a distributed.net client or a SETI@home client might be happy running a generic client for performing massively parallel computations. Why limit ourselves to SETI and cipher cracking? Any problem that requires lots of cpu cycles to solve could, in theory, benefit from this kind of distributed computing. These high-cpu-usage problems need not be scientific in nature. A good example of a non-science high-cpu-cycle application is the animation/special-effects rendering needed for Hollywood movies. The problem may not even be commercial or require that many cpu cycles: distributed gaming servers, whether for role-playing games, shoot-em-ups, or civilization/war games, currently require dedicated servers with good bandwidth connections, administered by knowledgeable sysadmins.

The gotcha is that there is currently no distributed computing client that is 'foolproof': providing generic services, easy to install and operate, hard for a cracker/hacker to subvert. There are no easy programming APIs. The commercial startups Popular Power and Process Tree Network offer money for distributed cpu cycles. A criticism of these projects might be that they are centered on large, paying projects: thus, there is no obvious way for smaller projects or individuals to participate. In particular, I might have some application that needs only hundreds of computers for a week, not tens of thousands for a year. Can I, as a small individual, get access to the system? This is important: the massive surge in the popularity of the Internet/WWW came precisely because it gave "power to the people": individual webmasters could publish whatever they wanted. There was no centralized authority; there was rather a loose confederation. It seems to me that the success of distributed computing also depends on a means of not just delegating rights and authorities, but bringing them to the community for general use and misuse.
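
What might a generic client look like? A minimal sketch in Python; the coordinating server and its protocol are entirely hypothetical, and real clients such as dnetc hard-wire the computation rather than accepting a pluggable one:

    import hashlib

    def fetch_work_unit():
        """Ask a coordinating server for the next unit of work.
        Here we simply fabricate one locally, for illustration."""
        return {"id": 42, "payload": b"block of data to grind through"}

    def compute(unit):
        """The pluggable computation: cipher search, rendering, signal
        analysis... here, a stand-in hash calculation."""
        return hashlib.sha1(unit["payload"]).hexdigest()

    def submit_result(unit, result):
        """Report the answer back; a real client would sign and upload it."""
        print("unit", unit["id"], "done:", result)

    # The whole client is just this loop, run at idle priority:
    unit = fetch_work_unit()
    submit_result(unit, compute(unit))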

Cosm provides a programming API that aims to meet the requirements of distributed computing. It is currently hamstrung by licensing issues. The current license makes commercial and non-commercial use difficult, if not impossible, by requiring that the 'data' and 'results' be published, as well as the 'source code' used in the project. Many users will find these terms impractical to live up to. My recommendation? GPL it!


ERights and Sandbox Applets
Java still seems to be a technology waiting to fulfill its promise. However, it (and a number of other interpreters) does have one tantalizing concept built in: the sandbox, the chroot jail, the honeypot. Run an unsafe program in the chrooted jail, and we pretty much don't care what the program does, as long as we bothered to put some caps on its CPU and disk usage. Let it go berserk. But unfortunately, the chroot jail is a sysadmin concept that takes brains and effort to set up. It's not something that your average Red Hat or Debian install script sets up. Hell, we have to chroot named and httpd and dnetc and so on by hand. We are still a long way off from being able to publish a storage and cpu-cycle playground on our personal computers that others could make use of as they wished. It is not until these sorts of trust and erights systems are set up that the kind of computing Gelernter talks about becomes possible.
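
For the record, here is roughly what that by-hand sysadmin work amounts to, sketched in Python. This is a minimal sketch, assuming a Unix system, root privileges, and a jail directory already populated with whatever binaries the guest program needs:

    import os, resource

    def run_jailed(jail_dir, argv, cpu_secs=60, max_file_bytes=50 << 20):
        """Run an untrusted program in a chroot jail with CPU/disk caps.
        Error handling and network restrictions are omitted."""
        pid = os.fork()
        if pid == 0:
            os.chroot(jail_dir)        # confine its view of the filesystem
            os.chdir("/")
            # cap CPU seconds, and the size of any file it may create
            resource.setrlimit(resource.RLIMIT_CPU, (cpu_secs, cpu_secs))
            resource.setrlimit(resource.RLIMIT_FSIZE,
                               (max_file_bytes, max_file_bytes))
            os.setgid(65534)           # drop privileges to nobody/nogroup
            os.setuid(65534)
            os.execv(argv[0], argv)    # replace this process with the guest
        return os.waitpid(pid, 0)

Until this much is packaged so that my mom's machine does it automatically, the cycle-sharing playground stays out of reach.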


Streaming Media & Broadcast: Bandwidth Matters
The most naive promise of 'digital convergence' is that soon, you'll watch TV on your computer. Or something like that. There is a multitude of blockers to the roll-out of these kinds of services, and one of them is the bandwidth strain put on the broadcaster and the intervening Internet backbone. The traditional proposed solution to this problem is MBONE, but MBONE has yet to see widespread deployment.

Alternative systems, such as Swarmcast, are being developed to solve this type of problem with a peer-to-peer infrastructure. The basic idea is that if some local client is receiving the same data, then it can rebroadcast the data to another nearby peer. Note, however, that the benefits of Swarmcast would be quickly diminished if e.g. Freenet nodes were widely deployed, and the publication of a file was made through Freenet. Essentially all of the interesting properties of Swarmcast are already embodied in distributed file systems. The short-term commercial advantage may be that Swarmcast gets more widely deployed than Freenet. However, this advantage, if it indeed exists, is short-term, and might be quickly erased.
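
The kernel of the swarm idea is small enough to sketch. Assuming, hypothetically, that peers advertise which chunks they hold, a downloader can fetch any chunk from any peer and verify it against a published manifest:

    import hashlib

    CHUNK = 64 * 1024   # split files into 64 kB pieces

    def make_manifest(data):
        """List the hash of each chunk. Any peer holding a chunk whose
        hash matches the manifest can serve it; the origin server need
        not carry the whole load itself."""
        return [hashlib.sha1(data[i:i + CHUNK]).hexdigest()
                for i in range(0, len(data), CHUNK)]

    def verify_chunk(manifest, index, chunk):
        """Accept a chunk from an untrusted peer only if its hash matches."""
        return hashlib.sha1(chunk).hexdigest() == manifest[index]

This is exactly the sort of machinery a distributed file system already has to have, which is why the two overlap so heavily.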

I am not yet aware of any generally available streaming-media reflectors, other than those based on MBONE.

The Internet for the Rest of Us
To understand the future, it is sometimes useful to look at the past. Remember UUCP? It used to tie the Unix world together, as did BITNET for the VAXes and Crays, or the VM network for mainframes. They were all obsoleted by the IP protocols of the Internet. But for a long time, they lived side-by-side, even attached to the Internet through gateways. The ideas that powered these networks were subsumed into, became a part of, the Internet: The King is Dead, Long Live the King! The spread of the types of technologies that Gelernter talks about will be evolutionary, not revolutionary.

Similarly, remember 'The Computer for the Rest of Us'? Well, before the web exploded, Marc Andreessen used to talk about 'The Internet for the Rest of Us'. Clearly, some GUI slapped on the Internet would make it far more palatable than the 'command line' of telnet and ftp. But a web browser is not just a pretty GUI slapped on telnet or ftp, and if it had been, the WWW still wouldn't exist (what happened to 'gopher'? Simple: no pictures, no 'home pages'). The success of the WWW needed a new, simple, easy technology, HTTP and hyperlinks, to make it go. The original HTTP and HTML were dirt-simple, and that was half the power of the early Internet. Without this simplicity and ease of use, the net wouldn't have happened.

What about 'the rest of us'? It wasn't just technology that made the Internet explode, it was what the technology could do. It allowed (almost) anyone to publish anything at a tiny fraction of the cost of traditional print/radio/TV publishing. It gave power to the people. It was a fundamentally democratic movement that was inclusive, that allowed anyone to participate, not just the rich, or the members of media empires. In a bizarrely different way, it is these same forces that power Napster: even if the music-publishing industry hadn't fallen asleep at the wheel, it is democratization that drives Napster. Rather than listening to what the music industry wants me to listen to, I can finally listen to what I want to listen to. At long last, I am able to match the artist to the artist's work, rather than listening to the radio and scratching my head: 'gee, I liked that song, but what the hell was the name of the artist?' Before Napster, I didn't know which music CD to buy, even when I wanted to buy one; I wasn't hip enough to have friends who knew the names of the cool bands, the CDs that were worth buying. Now, finally, I know the names of the bands that I like. Napster gives control back to the man in the street.

Similarly, the final distributed storage/computation infrastructure will have to address similar populist goals: it must be inclusive, not exclusive. Everyone must be able to participate. It must be for 'the rest of us'.

Commercialization
As in the early days of the net, the work of volunteers is driving the phenomenon; only later will it become commercialized. Unlike then, we currently have a Free Software community that is quite conscious of its own existence. It's a more powerful force. Once the basic infrastructure gets built, large companies will come to use and control that infrastructure. But meanwhile, we, as engineers, can build it.

I guess the upshot of this little diatribe is that Gelernter talks about his changes in a revolutionary manner, leading us to believe that the very concept of an operating system will have to be re-invented. He is wrong. The very concept of an operating system *will* be reinvented, someday. In the meantime, we have a perfectly evolutionary path from here to there, based not only on present technologies and concepts, but also on the principles of free software.

Critique

A discussion of some of the problems and Achilles' heels of the current software.
Revision Control
As an author, I would like to publish articles on Freenet, but, due to its very design, Freenet has trouble with revision control. That is, if I wrote an article yesterday, and I want to change it today, I can't. For if I could, then yesterday's article could be censored: the thought police could be standing behind my back, making sure that I undo yesterday's crime. Nonetheless, there needs to be some sort of system for indicating that 'this document obsoletes the previous one'.

(N.B. These remarks are a bit off-base. Freenet now includes a date-based versioning scheme.)
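
One way to get 'obsoletes' without permitting censorship is to leave the old version in place and publish a new record that names it. A toy sketch in Python; the record format is my own invention, not Freenet's:

    import hashlib, json, time

    def revision_record(text, obsoletes=None):
        """Each revision names the hash of the version it replaces.
        Old versions stay published, so nothing can be silently censored,
        but readers can follow the chain forward to the latest revision."""
        body = {"text": text, "obsoletes": obsoletes, "date": time.time()}
        blob = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest(), body

    h1, v1 = revision_record("first draft")
    h2, v2 = revision_record("corrected draft", obsoletes=h1)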

Lack of True Eternity
Most of these file systems lack true eternity: if an unpopular document is posted, there is a possibility that it might vanish or get erased because no one has accessed it for a while. This is a problem, especially when the published document was meant to be archival. It is particularly troublesome for library systems, where the archival storage really needs to be permanent, no matter how unpopular or uninteresting the content is to the current generation.

Lack of a Means to Verify Authenticity and Integrity
Many distributed file systems focus on anonymity and repudiability. But in fact, many publishing needs require the opposite: the ability to authenticate the document, to know that its author is who she says she is, and to know that the document hasn't been tampered with. This sort of security is needed not only for legal documents, but also for the publication of laws by governments, for medical records and texts, insurance records, bank records, nuclear-power-station and aircraft design and maintenance records, and so on.
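
The integrity half of this is easy even today: publish a cryptographic digest of the document through a trusted channel, and any reader can check for tampering. Authenticity, knowing who the author is, additionally requires a public-key signature over that same digest, e.g. a detached GnuPG signature. A minimal sketch of the digest half:

    import hashlib

    def fingerprint(path):
        """SHA-256 digest of a document. A match against the published
        digest shows the document is untampered; it says nothing, by
        itself, about who wrote it."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(65536), b""):
                h.update(block)
        return h.hexdigest()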

Private Eternity, aka 'Data Backup'
If you've ever bought a tape drive to back up your data, my sympathies. One immediate and highly practical, non-political application of eternity is to run it on a private LAN, as a means of avoiding backups while also providing distributed file service. Most small offices have a number of machines with a fair amount of unused disk space. Instead of backing up to tape, it could be considerably more convenient to back up to the pool of storage on the local net. Think of it as LAN-based RAID: if any one disk fails, no problem: the service can reconstruct the files. The CEO's hard drive fails? No problem: the CEO's data was encrypted anyway, and the sysadmin can reconstruct it without needing the password.
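
The RAID analogy can be taken literally. The simplest scheme stores, on one extra machine, the XOR of the blocks held by the others; any single lost block can then be rebuilt from the survivors. A sketch (a real system would use stronger erasure codes):

    def xor_blocks(blocks):
        """XOR equal-sized blocks together to form, or recover, parity."""
        out = bytes(len(blocks[0]))
        for b in blocks:
            out = bytes(x ^ y for x, y in zip(out, b))
        return out

    b1, b2, b3 = b"disk one", b"disk two", b"disk tri"
    parity = xor_blocks([b1, b2, b3])            # kept on a fourth machine
    assert xor_blocks([b1, b3, parity]) == b2    # machine two's disk died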

Spam
Any free service that defends against censorship will also have a problem with spam: malicious users could flood the system with garbage bytes. Freenet can somewhat defend against this: old, unwanted files eventually get deleted. Also, anyone who tries to make their spam 'popular' by accessing it with a robot will only succeed in getting it to migrate to servers near them (although this could be a kind of DoS; a DDoS might be harder to defend against). The price that Freenet pays for this is a lack of eternity. A true, free eternity service may be far more vulnerable to spam.

Resources

Misc. links, in alphabetical order.
BlueSky
A collection of references and a mailing list for Global-Scale Distributed Storage Systems. This is where a lot of important people hang out.

DLM Distributed Lock Manager
DLM implements a set of VAX-cluster-style locks for a Linux cluster. Basically, this is a traditional cluster-computing lock manager.

Eternal Resource Locator
Eternal Resource Locator. A journal article describing research into a system that can verify the authenticity and integrity of documents, based on a trust network of publishers, editors and writers. The existing system is deployed to publish medical records and texts using standard Internet technologies.

The Eternity Service
The Eternity Service. The original paper, one of the earliest on the topic. See also Anderson's notes on the topic.

FreeHaven
FreeHaven. Distributed file-sharing system. Characteristics: anonymity for publishers and readers. Published documents are persistent, with the length of persistence determined by the publisher. To avoid abuse, the system hopes to add accountability and reputation features. No code available yet.

Freenet
Freenet. Distributed file-sharing system. Characteristics: Efficient: distributed storage; retrieval time scales as log(n) of network size. Robust: hard to attack or destroy, due to global decentralization and encrypted, anonymized contents on local nodes. Secure: malicious tampering and counterfeiting are protected against by cryptography. Uncensorable. Private: the identities of readers and publishers are cryptographically anonymous and can't be forcibly discovered (e.g. via court order). Lossy: files that are infrequently accessed become 'lost' from the system; therefore Freenet is not a good way to archive rare/obscure materials. On the obverse, spam inserted into Freenet should wither away into oblivion. Lacks the ability to get a global count of the number of file downloads. Java, Unix and Windows; also an alpha C++ server.

Genny
GnutellaDev hosts the GPL'ed Python implementation of the Gnutella 'Genny' protocol. It's a broadcast/reply-based network built on UDP.

InterMezzo
Distributed file system, more along traditional lines.

Mojo Nation
Mojo Nation. Includes support for authentication and encryption. Level of access control uncertain. Has basic transaction support; it is not clear if the transaction support is general, or what sort of distributed locks are available. Technical documentation is available.

OceanStore
OceanStore. Distributed file-sharing system. Aims at scalability and durability through promiscuous caching. Uses cryptographic protections to limit damage (e.g. censorship) by local node operators. Includes a fault-tolerant commit protocol to provide strong consistency across replicas. Includes a versioning system to allow version control. Provides an adaptive mechanism to protect against denial-of-service attacks and regional outages. Some components implemented, a prototype in progress. No source code publicly available.

OpenNap
OpenNap. Open Source Napster protocol implementation. Server only. This web page lists a large number of free Napster clients. Also lists a large number (twenty-eight) of free and proprietary file-sharing protocols.

Prague Eternity Service
Prague Eternity Service. Distributed file-sharing system from Charles University in Prague. Does not seem to be an active, ongoing project: last activity dates to May 1999. Of historical interest. Source code available, C++. License unclear.

Publius
Publius. Distributed file-sharing system. Meant to be censorship-resistant, and to provide anonymity for publishers. Limited to short documents (in an effort to limit mp3 abuse). The theory is interesting; the implementation leaves something to be desired. There are no clients at the moment, so all data access must happen through proxies (thus, no anonymity is provided to readers). Using a proxy requires a web browser to be reconfigured, and this interferes with normal web browsing. Servers & proxies are Unix only, Perl. GPL'ed.

Rewebber
Rewebber is an eternity service.

SFS
SAN File System, a file system for storage area networks.

SFS
Secure File System. A cryptographic file system built on top of UFO, a VFS-like user-space file system that supports ftp, http, etc. Smart-card authentication.

SFS
Self-certifying File System. A secure, global file system with completely decentralized control. DARPA-funded research. GPL'ed. Works and is usable, but subject to incompatible changes in protocol.

SFS
Symptomatic File System a proposal to build a robust distributed file system. Of historical interest.

Swarmcast
Swarmcast. Broadcast reflector. Meant only for improving file-download performance, by locally caching the downloaded document and redistributing it to other clients interested in downloading the same. It has an immediate economic advantage for operators of popular high-bandwidth websites that pay large monthly Internet bills. Not meant for generic distributed file storage. Does not currently have data-streaming capabilities. Java client. Open Source.

TAZ
TAZ is an eternity service.

USENET Eternity
USENET Eternity. Distributed file system that makes use of USENET for document distribution. Includes a mechanism for indexing. The system seems crude and primitive, a quick hack that is up and running, as opposed to a belabored, from-scratch implementation. It doesn't seem to be developing 'critical mass'. It is built on a pair of interesting observations: that the USENET document-distribution service is hard to censor, and that USENET itself is an old and widely deployed system run by most Internet service providers.

Xenoservers
Xenoservers are servers that can safely execute untrusted applications from guests, in exchange for money. Journal article. Implementation being created, not publicly available.

Characteristics Cross-Matrix

A crude attempt to cross-index system characteristics against implementations.
Characteristic                              Freenet   OceanStore   Publius   FreeHaven
General-purpose distributed file system     yes       yes          -         -
Anonymity protections for publishers        yes       -            yes       yes
Censorship protections                      yes       yes          yes       -
True permanent storage repository           no        -            -         yes
Allows execution of binaries                no        -            -         -
Versioning/version control                  no (*)    yes          -         -

(*) Sort of: Freenet allows publication dates, but not true version control.
A '-' means no answer is recorded here; see the individual entries above.

July 2000, Linas Vepstas [email protected]
Updated June 2001
Copyleft (c) 2000,2001 Linas Vepstas
All Rights Reserved