0. Introduction
The key to the peer production of a product or service resides in its modularity - can it be broken down into components? - and its granularity - can it benefit incrementally from even marginal contributions? Data-packet distribution epitomises the modular form: it renders the distribution of data amenable to peer production and requires only minimal human input. Flat-rate broadband connections, and the increasing amounts of storage packaged with commodity workstations, have provided users with an unused surplus of both bandwidth and disk space.
Users have capitalised on these conditions and on the emergence of distributed file-sharing software to exchange media files, and in so doing have challenged the structure of an industry founded on selling cultural commodities in containers such as CDs, video cassettes and records. Since the fight of the film industry against the video recorder in the 1970s, media companies have been alert to the potentially destabilising effects of new technologies on their industries, and through lobbying and litigation have fought to retain control over the reproduction and dissemination of their products. The landmark Napster litigation of 2000 underscored this determination in the context of the internet. As explained below, these attempts have been in vain but not without effect. In short, this chapter describes the way in which legal rules have influenced technological protocols.
The allocation to Napster of liability for its users' actions foreclosed the development of peer to peer systems in a centrally-coordinated direction and imposed on developers the need to use methods that would leave them space for plausible deniability. The results have been contradictory. In the first place, the media organisations have not achieved their aim; in fact the file-swapping community is flourishing and counts five million participants at any given time on the best known and most public networks alone. These users are in possession of an enormous amount of music, movies and games that cannot be recaptured irrespective of the perfection of any speculative 'blue-sky' copyright protection technology. Secondly, it has imposed a cost on users (and developers) by forcing them to use search and retrieval software which makes suboptimal use of bandwidth. The legal code has been exploited to reduce liability, but this reduction comes at the price of operational inefficiency. Paradoxically, this threat is now the innovative imperative behind a generation of peer to peer technologies that are more truly distributed in design than would likely have been the case had Napster been able to continue without legal impediment (0).
Having failed in their direct attacks against the technology, content owners have sought to bolt the doors at other points along the delivery conduit, pressuring internet service providers and the corporate and university administrators of wide area networks in the hope of enlisting them to police their users. These same parties suffer in a tangible way from the inefficient techniques employed to circumvent legal liability, as they increase the bandwidth requirements of their users and place strain upon their resources. As a result ISPs are imposing transfer caps, or reducing transfer speeds (to achieve the same end), whilst universities and businesses are banning use of the software. Such prohibitions in turn breed the development of tools designed to outfox network administrators' attempts to impose control, and so on. As much of the bandwidth consumed consists of network 'chatter', application developers need to alter their protocols so as to be more sparing and to minimise unnecessary wide area network traffic.
File-sharers have assembled a distributed storage and delivery network rather than a mechanism for one-off exchanges. The attention focussed on the contested legality of file-sharing has obscured other applications, such as the virtualisation of storage space across networks (already a significant industry) and bandwidth pooling for the dissemination of content outside the ownership and control of the communications conglomerates.
Content Distribution Networks (CDNs) facilitate the storage, retrieval and dissemination of information. Companies such as Akamai and Digital Harbour have substantial markets based on proprietary models at a global network level. But similar functionality can be delivered by networks of users. Napster represented the first exploitation of this potential and subsequent generations of file-sharing technology have made important progress in increasing the robustness and efficiency of these networks.
(1) Content Distribution Networks - Unused Bandwidth
Network Congestion
The slow roll-out of broadband connections to home users has concentrated much attention on the connectivity problem of the so-called 'last mile'. The connection between the user and their ISP, however, is but one of four variables deciding the rate at which we access data. Problems of capacity exist at multiple other points in the network, and as high-speed lines permeate the 'consumer' population these other bottlenecks will become more apparent (30).
If the desired information is stored at a central server, the first shackle on speed is the nature of the connection between that server and the internet backbone. Inadequate bandwidth, or access attempts by an unexpectedly large number of clients making simultaneous requests, will handicap transfer rates (the so-called 'slash-dot effect'). This factor is known as the 'first mile' problem.
In order to reach its destination the data must flow across several networks, which are connected on the basis of 'peering' arrangements between networks, implemented at router interfaces (37). Link capacity tends to be underprovided relative to traffic, leading to router queuing delays. ISPs make decisions to peer based upon the economic cost of handling and resolving traffic, avoiding arrangements that would be disadvantageous.
The third locus of congestion is the internet backbone, through which almost all traffic currently passes at some point, and whose capacity is a function of its cables and routers; a large divergence between the rate of traffic growth and advances in router hardware and software exacerbates the problem. As more data-intensive transfers proliferate, this discrepancy between demand and capacity is further accentuated, leading to delays. Only after negotiating these three congestion points do we arrive at the delay imposed at the last mile.
Current Solutions
Centralised systems such as Akamai employ sophisticated load-balancing and routing algorithms in order to manage traffic efficiently, spreading the response to requests over multiple servers. Load-balancing tools come as both hardware (intelligent switches, traffic distributors) and software. 'Akamai FreeFlow', for example, uses a mix of hardware and algorithms plus a mapping server (which checks latency and chooses the connection requiring the smallest number of hops) and a content server (31).
Other techniques employed are network address translation (NAT) and caching. Destination NAT can be used to redirect connections aimed at one server to randomly chosen servers in order to balance load. NAT can also be used for transparent proxying, redirecting HTTP connections bound for the wider internet to a special HTTP proxy able to cache content and filter requests; some ISPs use this technique to reduce bandwidth usage without requiring their clients to configure their browsers for proxy support. Caching servers intercept requests for data and check whether the data is present locally. If it is not, the caching server forwards the request to the source and passes the response back to the requester, keeping a copy so as to serve the next query more quickly. Local caching is less expensive than network retransmission and is becoming increasingly attractive as the price differential widens, especially where remotely located international data is concerned (24, at 83). Data providers are increasingly opting to use specialist services such as Akamai to overcome these problems and optimise delivery (32).
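The caching step can be sketched in a few lines of Python (an illustration only, with simplified behaviour; real proxies also handle expiry, validation and HTTP cache-control semantics):

    # Sketch of a caching server: serve a request locally if the data is
    # already held, otherwise fetch it from the origin, keep a copy, and
    # answer the next query for the same URL from the local store.
    import urllib.request

    class CachingProxy:
        def __init__(self):
            self.cache = {}  # url -> bytes already fetched

        def get(self, url):
            if url in self.cache:                      # local hit: no upstream traffic
                return self.cache[url]
            with urllib.request.urlopen(url) as resp:  # miss: forward to the source
                body = resp.read()
            self.cache[url] = body                     # keep a copy for the next query
            return body

    proxy = CachingProxy()
    first = proxy.get("http://example.com/")    # first request goes upstream
    second = proxy.get("http://example.com/")   # second is served from the cache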
Virtualisation
Peer systems are typically established at the application level, employ their own routing mechanisms and are either independent of, or only ephemerally dependent on, dedicated servers. Unable to guarantee the presence of any specific peer at a given time, virtual CDNs function by aggregating enough nodes to enjoy redundancy. Thus the absence or departure of any given peer does not disable the functioning of the network as a whole. Individual hosts are treated as unreliable and are made subject to easy substitution. Technically the challenge is how to finesse the transfer to replacement nodes (sometimes referred to as the problem of the 'transient web'). The volume of communication required to enable such transfers must not be allowed to consume the majority of the bandwidth available.
Secondly, the quality of service achieved will depend on an efficient allocation of the underlying resources available. This means distributing the data load across the available machines and, where possible, carrying out transfers within local networks. The latter serves both to increase the speed of the transfer and to save the host ISP money. In short, the effectiveness of any given peer network will be determined by: a) its connectivity structure, and b) the efficiency with which it utilises the physical topology of the network, whose resources are limited.
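The locality preference can be illustrated with a short Python sketch, ranking candidate peers by the number of leading address bits they share with the requester; this is a crude, hypothetical stand-in for the latency probes and ISP information that deployed clients would actually use:

    # Sketch: rank candidate peers by shared leading bits of the IPv4 address,
    # a rough proxy for "same ISP / same local network", so that transfers are
    # directed to nearby peers where possible. Illustrative only.
    import ipaddress

    def shared_prefix_bits(a, b):
        diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
        return 32 - diff.bit_length()      # identical addresses share all 32 bits

    def rank_peers(requester, candidates):
        # Closest-looking peers first, keeping traffic local where possible.
        return sorted(candidates,
                      key=lambda c: shared_prefix_bits(requester, c),
                      reverse=True)

    print(rank_peers("82.10.4.7", ["194.7.3.3", "82.10.4.200", "82.10.9.1"]))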
(2) Content Pooling
Popular file-sharing utilities arose to satisfy a more worldly demand than the need to ameliorate infrastructural shortfalls: the desire to share and acquire free music. Music industry reluctance to make its content available, and the complex licensing issues implicit in building an attractive inventory, created the necessary conditions for an illicit network to emerge (33). In the week to September 29, there were 5,549,154 downloads of Kazaa alone. In order to understand the explosive growth of the file-sharing phenomenon, and the technical developments that have allowed it to maintain traction whilst improving performance, we must review a little history and examine some individual technical innovations. Meanwhile content owners have employed an array of strategies to deter users from participation, strategies that have required both social and technical responses.
2.1 Napster
Shawn Fanning released his Napster client with the intention of allowing end-users to share MP3 files, by providing a centralised index of all songs available on the network at a given moment and the ability for users to connect directly to one another to transfer the desired file. Thus Napster controlled the gate to the inventory but was not burdened with execution of the actual file delivery, which occurred over HTTP (insert note on the speculative valuation of the system provided by financial analysts, with qualification). Popular file-sharing utilities of this kind enable content pooling. Centralised directory look-up made Napster the subject of legal action, injunction and ultimately decline. Other successful clients have followed in Napster's wake and met a similar fate, such as Audiogalaxy.
Nonetheless, Napster's legal woes generated such publicity as to encourage user adoption and to attract imitators and competitors to the market, bringing with them another wave of innovation.
Gnutella
Justin Frankel and Tom Pepper developed Gnutella while working for Nullsoft, which was taken over by AOL in 2000. That March the program was made available for download on Nullsoft's servers with the promise that the code would soon be available under the GPL. AOL quickly removed the program, but it was too late: Gnutella had been born. (34)
The network consists of scores of clients - such as Gnucleus, Bearshare and Limewire - sharing a common open protocol, and was initially entirely decentralised, each node performing as a 'servent' (that is, both server and client). Thus connectivity is not dependent on the legal resistance of any single operator. This robustness, however, is traded off against inefficiency in locating files.
Despite the demise of Audiogalaxy and the arrival of Morpheus on the Gnutella network, the number of users has been falling quite consistently. Recent snapshots show a total of 120,000-140,000 simultaneous users. A userbase of this size obviously contains a huge amount of desirable cultural works, but is small in comparison with the FastTrack network of 2,900,000 users. Apparent incompatibilities between Gnutella supernodes/ultrapeers have not made things any easier.
FastTrack/Kazaa
The FastTrack protocol resembles Gnutella with the addition of supernodes, which index the files available in a network cluster, and super-superpeers. The result is a more comprehensive search with less traffic overhead. In supernode mode, a list of files is automatically uploaded from others in your network neighbourhood, where possible using the same internet service provider. The FastTrack clients Grokster and Kazaa now constitute the biggest single network, with an average of almost three million simultaneous users online at any given time. Given the impossibility of searching the entire Gnutella rhizome, and the discrepancy in users, it is little surprise that FastTrack should be the preferred choice.
eDonkey/Overnet
Overnet functions on the basis of the Kademlia algorithm (2) to identify and organise the reassembly of files using a distributed hash table. The hash indicates which nodes are likely to possess the desired data, as opposed to the randomised searches of Gnutella, reducing search traffic overhead. Hashing also provides a method of checking content integrity. In addition eDonkey employs multi-source ('swarmed') downloading (as do most clients now), so that when a file is requested its segments are simultaneously downloaded from as many peers as possible; eDonkey also allows partial downloads held by other peers to form part of the pool from which the download is sourced, dramatically improving availability. This innovation is now being implemented by other clients as well. In 2002 an open source eDonkey client named eMule appeared with a more intuitive Graphical User Interface (GUI) and has quickly become extremely popular.
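The XOR metric described in the Kademlia paper (2) can be illustrated with a toy Python sketch; the names and the use of SHA1 for identifiers here are illustrative assumptions, not the actual Overnet implementation:

    # Toy illustration of Kademlia's XOR metric: node IDs and content keys
    # share one identifier space, and a lookup is steered towards the known
    # nodes whose IDs are XOR-closest to the key being sought.
    import hashlib

    def make_id(name):
        return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

    def xor_distance(a, b):
        return a ^ b

    def closest_nodes(key, known_nodes, k=3):
        # Return the k known nodes most likely to hold, or know about, the key.
        return sorted(known_nodes,
                      key=lambda n: xor_distance(known_nodes[n], key))[:k]

    nodes = {name: make_id(name) for name in ["peerA", "peerB", "peerC", "peerD"]}
    file_key = make_id("some-shared-file.avi")   # content addressed by its hash
    print(closest_nodes(file_key, nodes))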
Freenet
Freenet nodes retain information about their neighbours, allowing them to route queries in a more targeted manner than the Gnutella network. Given the presence of more information at the query-routing level, less bandwidth is spent on redundant simultaneous searches. In addition, copies of requested documents are deposited at each hop on the return route to the requestor. "Each node maintains its own local datastore which it makes available to the network for reading and writing, as well as a dynamic routing table containing addresses of other nodes and the keys that they are thought to hold."(1) Freenet effectively reproduces the caching mechanism of the web at a peer to peer level so as to respond to the actual demand on the network. If there are no further requests for a document, it will eventually be replaced by other transient data. All locally stored data is encrypted and referenced through hash keys, and each node maintains knowledge of its own keys and of the keys thought to be held by several other nodes.
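The routing-and-caching behaviour can be sketched as follows; this is a simplification under stated assumptions (real Freenet encrypts its stores, selects the next hop by key closeness and enforces a hops-to-live limit), intended only to show how a document comes to be cached along the return route:

    # Simplified sketch of Freenet-style retrieval: a node answers from its own
    # datastore if it can, otherwise passes the query on one neighbour at a
    # time; when a document comes back, every node on the return path keeps a
    # copy, so popular content migrates towards the demand for it.
    class Node:
        def __init__(self, name):
            self.name = name
            self.datastore = {}    # key -> document
            self.neighbours = []   # other Node objects

        def retrieve(self, key, visited=None):
            visited = visited or set()
            visited.add(self.name)
            if key in self.datastore:                # local hit
                return self.datastore[key]
            for neighbour in self.neighbours:        # forward one hop at a time
                if neighbour.name in visited:
                    continue
                doc = neighbour.retrieve(key, visited)
                if doc is not None:
                    self.datastore[key] = doc        # cache on the return route
                    return doc
            return None

    a, b, c = Node("A"), Node("B"), Node("C")
    a.neighbours, b.neighbours = [b], [c]
    c.datastore["key1"] = "document body"
    print(a.retrieve("key1"))   # found via B and C; now also cached at A and B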
Having overcome the need for scattershot searches, Freenet theoretically manages bandwidth resources in a much more efficient manner than Gnutella. On the other hand, the importance attached to maintaining anonymity through encryption detracts from its potential to become a mass-installation file-sharing program. Specifically, in order to cloak the identity of the requestor, the file is conveyed back through the same nodes that resolved the query, consuming bandwidth unnecessarily in transit. In April 2001 Freenet's inventor, Ian Clarke, founded a company called Uprizer that is porting the Freenet design concept into the commercial arena, whilst jettisoning the privacy/anonymity aspect.(29)
2.2 Connectivity Structure and Searching
In everyday life our knowledge of what is available and how to acquire it is a function of information which we receive from friends, from advertisements on media from billboards to televisions, and from our familiarity with the retail spaces where these things are sold. Cultural commodities are produced in limited runs which, once exhausted, are repeated only if the owner believes there is adequate demand to make it profitable. Thus for rare items we know that we must go to second-hand shops, specialised dealers or catalogue services that can help us track down the item.
In the digital universe things are rather different. The universe of choice is incommensurably larger - nothing digital is ever sold out or really unavailable. Advice from friends and trusted acquaintances (the music guru in school, the film buff) assumes even greater importance. Advertising channels remain influential and to some extent informative, but reveal themselves more transparently as attempts to build trends and popularity. Forums, chat channels and dedicated sites become hubs of both information and communities of interest based on the interactive exchange of views and tastes. This dynamic of personal recommendation has great reach due to the immediately accessible nature of the referents on the web.
Beyond the interesting new social dynamics introduced by the ease of digital communication, the software itself also endows the user with the necessary tools, most obviously the search engines built into the GUI of the client. Searching for a song over the network can be a truly revealing process, as one may discover that the artist associated with it is only one of several to have recorded the song - the results often underline the extent to which versioning is a core characteristic of music culture, exposing one to new artists. Once the desired song has been located, one can also opt to search for songs by the same artist, from the same album, or from the same user. This last option is particularly striking, as users literally bare to one another an identikit of their musical and cinematic tastes. Concretely this may mean that, having found a track by an obscure Hungarian gypsy band, I see that the user from whom I am about to download it also has a wide selection of bands from the Balkans whom I have never heard of but might want to try. In this way, beyond being a means of acquiring cultural works at little or no expense, the p2p process functions to produce preferences in a new way.
Milgram
A 1967 experiment by Stanley Milgram on the structure of social networks yielded surprising results. A random sample of 160 people in the US Mid-West were asked to convey a letter to a stockbroker in Boston using only intermediaries known on a first-name basis. 42 of the letters arrived, in a median of 5.5 hops, and fully one third of the successfully delivered letters passed through the same shopkeeper. The evidence drawn from this experiment was that whilst most people's social networks are narrow and incestuous, each group contains individuals who act as spokes to other groups. The conclusions drawn were dubbed the 'small world effect' for obvious reasons.
Centralised look-up.
The most efficient way to search decentralised and transient content is through a centralised directory. Napster functioned in this way and economised on bandwidth whilst providing a comprehensive directory of the works available. Alas it also left the company vulnerable to litigation, and it is safe to say that any p2p company providing such a service will meet the same fate.
Broadcast.
In this case queries are sent to all nodes connected to the requestor. The queries are then forwarded to the nodes connected to those nodes, and so on. This leads to massive traffic volume, often sufficient to saturate a dial-up connection. It is also extremely inefficient, as the search continues even after a successful result has been achieved. Most searches have a 'time to live' to limit the extent of the search, and where there are many weaker links the search can die without ever reaching large parts, or even the majority, of the network. This is the search method initially used by Gnutella. The consumption of bandwidth this entailed effectively obliged the creation of super-peers to centralise knowledge about their local networks so as to be able to scale. Such a step deviates from the pure p2p model, offering potentially attractive litigation targets. Traffic volume derived from search requests has become a significant problem (18). Searching based on this model will never be comprehensive, being limited to the neighbourhood within the hops-to-live radius.
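A minimal Python sketch of the flooding model shows why the traffic mounts so quickly; the message counter is illustrative, and real Gnutella deduplicates by message identifiers rather than a shared 'seen' set:

    # Sketch of Gnutella-style broadcast search: a query fans out to every
    # neighbour and each hop decrements a time-to-live. Counting forwarded
    # messages illustrates the bandwidth cost of the model.
    class Servent:
        def __init__(self, name, files=None):
            self.name = name
            self.files = set(files or [])
            self.neighbours = []

    messages_sent = 0

    def broadcast_search(node, query, ttl, seen=None):
        global messages_sent
        seen = seen or set()
        seen.add(node.name)
        hits = [node.name] if query in node.files else []
        if ttl == 0:
            return hits                  # search horizon reached: stop forwarding
        for n in node.neighbours:
            if n.name in seen:
                continue
            messages_sent += 1           # every forwarded query costs bandwidth
            hits += broadcast_search(n, query, ttl - 1, seen)
        return hits

    a, b, c, d = Servent("A"), Servent("B"), Servent("C"), Servent("D")
    a.neighbours = [b, c]
    b.neighbours = [d]
    d.files.add("song.mp3")
    print(broadcast_search(a, "song.mp3", ttl=5), messages_sent)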
Small World/Power-law
Milgram's experiment contains important indicators for designing efficient search algorithms, as it points to the benefits of exploiting the small number of hubs with a high number of links, as opposed to the simple broadcast search model. In the context of file-sharing applications this means that each query proceeds to one node at a time, each of which holds information about other nodes in its locality, rather than fanning out into multiple simultaneous searches, resulting in significant savings in traffic. This is the form of search algorithm integrated into Freenet.
2.3 Physical Infrastructure
P2P providers are precluded from distributing network traffic through the use of mapping servers and real-time data, never mind providing cached copies of frequently requested files, due to fears of litigation. Denied the ability to co-ordinate centrally, they have had to develop other means to manage the load.
Supernodes
Searches on the Gnutella network are carried out on what is called a broadcast model. Practically this means that the request is passed by the node to all the nodes to which it is connected, which in turn forward it to other nodes, and so on. The responses of each node consume bandwidth and thus must be minimised, particularly where many nodes are (a) operating on a low-bandwidth connection and of limited utility for provisioning, and (b) not sharing significant amounts of data. Limiting searches to 'superpeers' reduces the amount of 'chatter' and ensures that queries go to nodes that have enough resources to function efficiently and act as ephemeral directories for the smaller nodes in their vicinity.
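A brief sketch of the kind of eligibility test this implies follows; the thresholds are invented for illustration and are not taken from any particular client:

    # Sketch: a node takes on superpeer duties only if it has the resources to
    # index and answer queries for its neighbourhood. Thresholds illustrative.
    def eligible_for_superpeer(uplink_kbps, uptime_hours, firewalled, shared_files):
        return (uplink_kbps >= 128       # enough bandwidth to absorb the chatter
                and uptime_hours >= 2    # stable enough to act as a directory
                and not firewalled       # reachable for incoming connections
                and shared_files >= 20)  # actually contributing content

    print(eligible_for_superpeer(256, 5, False, 120))   # True
    print(eligible_for_superpeer(56, 10, False, 0))     # False: modem, nothing shared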
Multicast/Swarmed Downloads (Bearshare, Limewire, Shareaza)
File transfer between two peers fails to maximise bandwidth efficiency due to the congestion problems outlined at the beginning of the chapter. Thus where a file is available from multiple sources, different components of it will be downloaded simultaneously so as to minimise the total time to completion.
Granularity: Partial Download Sharing
Where large files are being shared amongst a large number of users, transfer capacity is limited to those peers already holding the entire file in their shared folder. As completing such a file can take over a week, this injects a high quotient of unfulfilled demand. Under the MFTP protocol which forms the basis of eDonkey/Overnet, other clients can initiate downloading from a partial download on the disk of another peer. The eDonkey network thus allows users to transfer partial files from other peers, with the consequence that even where no peer has the whole file at a given moment, all the constituent parts can be available and allow a successful transfer to take place. (Ares, Shareaza and Gnucleus enable partial-file sharing, PSF, as well.)
In terms of the granularity of the distribution function, partial downloads represent a breakthrough, facilitating the participation of even low-bandwidth users who may nonetheless be in possession of part or all of a particularly rare work. PSF effectively constitutes a particularly useful form of multi-source distribution. This is especially so where very large files are concerned, as the inconvenience of absorbing large amounts of storage space incites many users to transfer them to CDs, liberating space on their hard disks.
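The combination of swarming and partial-file sharing can be sketched in Python; the chunk-scheduling logic below is purely illustrative and assumes each peer advertises which chunks of the file it currently holds:

    # Sketch of swarmed downloading with partial-file sharing: chunk requests
    # are spread across every peer that holds something useful, including
    # peers that are still downloading the file themselves.
    def plan_downloads(total_chunks, peer_chunks):
        # peer_chunks maps each peer to the set of chunk indices it holds.
        plan = {}                                  # chunk index -> chosen peer
        load = {peer: 0 for peer in peer_chunks}   # keep per-peer load balanced
        for chunk in range(total_chunks):
            holders = [p for p, owned in peer_chunks.items() if chunk in owned]
            if not holders:
                plan[chunk] = None                 # nobody has this piece yet
                continue
            peer = min(holders, key=lambda p: load[p])
            plan[chunk] = peer
            load[peer] += 1
        return plan

    # Neither peer has the whole file, yet together they cover all four chunks.
    print(plan_downloads(4, {"peerA": {0, 1}, "peerB": {1, 2, 3}}))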
3. Defection/threats to the networks
3.1 Free Riding and Collective Action Mechanisms
A September 2000 paper by scientists at Xerox PARC (3) examined the nature of participation in the Gnutella network to see whether it was falling prey to a 'tragedy of the commons'. This thesis, originally developed by Garrett Hardin in the 1960s, postulates that in the absence of a central coordinating mechanism, common resources will be spoilt by the self-serving behaviour of participants. In the context of file-sharing such behaviour would be revealed by a refusal to share content or to provide bandwidth.
"Since files on Gnutella are treated like a public good and the users are not charged in proportion to their use, it appears rational for people to download music files without contributing by making their own files accessible to other users. Because every individual can reason this way and free ride on the efforts of others, the whole system's performance can degrade considerably, which makes everyone worse off - the tragedy of the digital commons."
They found that 73% of users shared ten files or fewer, whilst 10% provided 87% of all files available. Furthermore, the top 25% of nodes responded to 98% of queries, with the top 1% accounting for 47% of them. An analysis of the queries made revealed that:
"The top 1 percent of those queries accounted for 37% of the total queries on the Gnutella network. The top 25 percent account for over 75% of the total queries. In reality these values are even higher due to the equivalence of queries ("britney spears" vs. "spears britney")."
The default firewall in Windows XP renders large numbers of previously available files inaccessible. Push commands allow a transfer to occur provided that one of the parties is not behind a firewall, but cannot help when both are. Secondary connections made across network address translation suffer from the same problem. In each of these cases the free riding is not deliberate and can be partially addressed by raising awareness of the problem.
As each provider can only satisfy a limited number of connections, the refusal of other participants to handle part of the load means that some requests will go unsatisfied. Non-participation by some also increases the strain on the resources of those who do contribute. Moreover, the addition of unproductive nodes to the network absorbs broadcast-search traffic. On Gnutella each search is given a time to live of five hops, so that any nodes beyond that disappear over the search horizon and will not respond.
Responses
The threat suggested by this analysis of Gnutella is equally valid for other file-sharing systems, and several have introduced mechanisms to attenuate the level of 'leeching' (free riding) (9). In the case of LimeWire these efforts have meant download limits based upon the number of files shared by the user, and an organisation of the network into tiers related to the amount of resources brought to the system, determined by number of shared files, bandwidth etc. (8). eDonkey/Overnet fixes download speed as a ratio proportional to the amount of upload bandwidth made available. Freenet forces the caching of downloaded files on various hosts. This allows for the replication of data in the network, obliging those who are on the network to help provide shared files that are in demand.
Projects such as Mojo Nation (10) have attempted to deal with the resource allocation issue by incorporating systems of virtual credits for services provided by other nodes, creating a market in computing resources. Essentially this transforms a public good into a private one by installing a market architecture. Others, such as eMule, simply retain information about other hosts and award them reputational points (20). A better reputation entitles the user to prioritisation in terms of resource allocation (6). Applications also allow users to set a limit on the amount of bandwidth to be dedicated to sharing so as not to obstruct other network activity (7).
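Two of the mechanisms just described can be sketched in a few lines of Python; the constants and field names are illustrative, not the actual values used by eDonkey or eMule:

    # Sketch of two anti-leeching mechanisms: a download cap tied to the upload
    # bandwidth offered (eDonkey-style), and an upload queue ordered by how
    # much each peer has previously sent to us (eMule-style reputation).
    def download_cap_kbps(upload_cap_kbps, ratio=4):
        # Download speed scales with the upload bandwidth made available.
        return upload_cap_kbps * ratio

    def queue_order(peers):
        # Peers that have uploaded to us in the past advance faster in the queue.
        return sorted(peers, key=lambda p: p["bytes_sent_to_us"], reverse=True)

    print(download_cap_kbps(16))     # a modest uploader gets a modest allowance
    print(queue_order([{"name": "p1", "bytes_sent_to_us": 0},
                       {"name": "p2", "bytes_sent_to_us": 5_000_000}]))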
From late 2000 there appeared web-based browsers capable of searching the Gnutella network, whose users were by definition contributing neither bandwidth nor content (11). Bearshare and Limewire therefore implemented a browser blocker. Problems with the selfish behaviour of other clients have now led to a proposal to limit access to the user base of the main clients to those in possession of a certificate of good behaviour.(36)
3.2 Appropriation by proprietary technologies
Napster was a copyrighted work, and once it became subject to legal action no further conduits could replace it and capitalise on its installed base. Gnutella is an open protocol shared by multiple applications, some of which have opted for GPL development (such as Limewire, out of enlightened self-interest, and Gnucleus, out of far-sightedness and commitment to the free software model), whereas others have remained proprietary. By and large, however, Gnutella developers have tended towards co-operation, as evidenced by the Gnutella Developers Forum (21). Their coherence is likely galvanised by the fact that they are effectively in competition with the FastTrack network (Kazaa, Grokster), which operates on a strictly proprietary basis, more than with one another.
The hazards entailed in reliance on a proprietary technology were manifested in March 2002, when changes to the FastTrack protocol were made and the owners refused to provide an updated version to the most popular client, Morpheus, whose users were consequently excluded from the network. (27)
Ironically, Morpheus was able to relaunch within three days by having recourse to the Gnutella protocol and network, appropriating the code behind the Gnucleus client with only minor cosmetic alterations. Nonetheless, the incident highlights the potential threat where one player (a licensor) can sabotage another and lock its users (along with their shared content) out of the network. The Gnucleus codebase has now generated twelve clones. Elsewhere, Overnet/eDonkey is also proprietary in character, but recent months have seen the appearance of an open source eDonkey client called eMule (17).
3.3 Sabotage: Content Integrity
Commercial operations such as Akamai can guarantee the integrity of the content that they deliver through control and ownership of their distributed network of caching servers. P2P networks have no control over what files are introduced onto users' machines, and other means have been developed to reduce the threat of useless or dangerous files flooding the network.
Hashing
In June it was announced that a company named Overpeer (26) had been employed by record labels to introduce fake and/or corrupted files into shared networks in the hope of frustrating users and driving them back inside the licit market. This had been suspected by many users and observers for some time, and in the aftermath of this confirmation came news that at least two other entities - the French company 'Retpan' and 'p2poverflow' - were engaged in similar activity (25). Prior to this, however, a system of authentication had already been introduced by eDonkey users: unique cryptographic SHA1 hashes, combined with the file size, ultimately constituted a reliable identifier. eDonkey users had already established portals for what became known as 'P2P web links', where independent parties would authenticate files and then make their description and hash available through a site dedicated to advertising new releases combined with a searchable database. These sites do not store files themselves, but provide a description of the 'release' together with the necessary hash identifier (15). Search results in Limewire, Bearshare and Shareaza are now displayed with a 'Bitzi' look-up, so that the hash can be checked against a database to ensure that the file is not a fake or of poor quality.(4)
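The verification step that such hash links and the Bitzi look-up make possible amounts to comparing the hash of the bytes actually received with the hash published by a trusted source; a minimal Python sketch follows (the file name and hash value are hypothetical):

    # Sketch of hash-based content verification: a corrupted or decoy file will
    # not reproduce the hash published by the release portal or hash database.
    import hashlib

    def sha1_of_file(path, chunk_size=1 << 20):
        digest = hashlib.sha1()
        with open(path, "rb") as f:
            for block in iter(lambda: f.read(chunk_size), b""):
                digest.update(block)
        return digest.hexdigest()

    def verify(path, published_hash):
        return sha1_of_file(path) == published_hash.lower()

    # Hypothetical usage: the published hash would come from the release site.
    # print(verify("movie.avi", "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12"))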
Fundamentally this threat reintroduces the problem of trust into network communications in a practical way. Whilst the likelihood of virus proliferation may be low, other attempts to sabotage the network are to be expected. Where relatively small files are concerned - and the 3.5 to 5.0 megabytes typical of a music track encoded at 128 kbps is small by today's standards - such antics, whilst inconvenient, are unlikely to prove an effective deterrent. Given that most files have been made available by multiple users, there will always be plenty of authentic copies in circulation. The situation is quite different, however, with regard to the sharing of large audiovisual works, whose exchange has grown rapidly in the last years, principally due to the penetration of broadband and the emergence of the DivX compression format. These downloads can nonetheless take a long time. Obviously, having waited a week, one would be rather irritated to discover that instead of Operation Takedown the 600 megabyte file in fact contained a lengthy denunciation of movie piracy courtesy of the MPAA.
Metadata
Given the enormous and constantly expanding volume of information, it is plain that in order to access and manage it efficiently something broadly equivalent to the Dewey system of library organisation is required. Where metadata protocols are collectively accepted they can significantly increase the efficiency of searches by removing ambiguities about the data's nature. The absence of standardised metadata has meant that search engines are incapable of reflecting the depth of the web's contents and cover it only partially. Fruitful searches require a semantically rich metadata structure, producing descriptive conventions that point to unique resource identifiers (e.g. URLs). Projects such as Bitzi, mentioned above, attempt to address this problem in a more systematic way than the portals dealing in 'hot goods' such as Sharereactor.
4. Economic Aspects: Costs to Users
4.1 Trust/security: Viruses, Trojans, Spyware, Stealware etc
Security and privacy threats constitute further deterrents to participation, both on account of users' normative beliefs opposed to surveillance and through fear of system penetration by untrustworthy daemons. The revelation that the popular application Kazaa had been packaging a utility for distributed processing known as Brilliant Digital in its installer re-ignited concerns about the use of file-sharing applications to surreptitiously install resource-consuming programs extraneous to those required for file sharing. Although unused thus far, it emerged that there was the potential for it to be activated in the future without the knowledge of the end-user. In addition, numerous applications are packaged with intrusive spyware that monitors users' activity and transmits data on their habits to advertisers and others. Bundling has also been used to install CPU- and bandwidth-consuming programs such as Gator, and in other cases - such as that involving Limewire this year - trojans. (5)
In September 2002 it was revealed that some of the best-known clients included bundled software that interfered with affiliate referral programs, allowing them to amass commission-based payments on users' shopping activity, even where the referral came through another party. (35)
4.2 Cost structure of broadband
Whilst it is obvious why users utilise these tools to extract material, it is not so plain why they should also use them to provide material in turn to others and so avoid a tragedy of the commons. Key to the willingness to provide bandwidth has been the availability of cable and DSL lines which provide capacity in excess of most individuals' needs at a flat-rate cost. There is thus no correlation between the amount of bandwidth used and the price paid; in brief, there is no obvious financial cost to the provider. In areas where there are total transfer caps, or where use is on a strictly metered basis, participation is lower for the same reason.
"From an ISP point of view traffic crossing AS borders is more expensive than local traffic. We found that only 2-5% of Gnutella connections link nodes located within the same AS, although more than 40% of these nodes are located within the top ten ASs. This result indicates that most Gnutella-generated traffic crosses AS borders, thus increasing costs, unnecessarily". (13)
Large amounts of extra-network traffic are expensive for ISPs, and consequently an increasing number have been introducing bandwidth caps. One September 2002 report claimed that up to 60% of all network traffic was being consumed by P2P usage. Wide implementation of IP multicast has been suggested as a potential remedy to these problems: once a piece of content is brought within an ISP's network, it would then be served from within to other clients, reducing unnecessary extra-network traffic. Interestingly, the same report argues that much of that 60% derives from search queries and advertising; the former could probably be much reduced by streamlining the search methods.
4.3 Lost CPU cycles
File-sharing clients consume CPU cycles and can retard the performance of the machine overall. This will particularly be the case if the machine has been assigned supernode status (14). Users can opt out of the superpeer selection process. On some clients the built-in MP3 player can be as cycle-consuming as the application itself.
Throttling bandwidth leakage
Most file-sharing clients allow the user ultimate control over the amount of bandwidth to be dedicated to file transfer, but they diverge in terms of the consequences for the user's own capacity. Thus eDonkey limits download speed by a ratio related to one's maximum upload. Limewire, on the other hand, has a default of 50% bandwidth usage, but the user can alter this without any significant effects (so long as the number of transfer slots is modulated accordingly). Gnucleus offers an alternative method in its scheduling option, facilitating connection to the network during defined periods of the day, so that bandwidth is dedicated to file-sharing outside the hours in which it is required for other tasks.
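The scheduling idea can be sketched in a few lines of Python; the window chosen here is an assumption for illustration, not a Gnucleus default:

    # Sketch of a scheduling rule: only devote bandwidth to sharing inside a
    # defined window (here 01:00-07:00), leaving the connection free during
    # the hours it is needed for other tasks.
    from datetime import datetime, time

    def sharing_allowed(now=None, start=time(1, 0), end=time(7, 0)):
        now = (now or datetime.now()).time()
        if start <= end:
            return start <= now < end
        return now >= start or now < end   # handle windows that wrap past midnight

    print(sharing_allowed(datetime(2002, 9, 29, 3, 30)))   # True: inside the window
    print(sharing_allowed(datetime(2002, 9, 29, 15, 0)))   # False: daytime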
5. Threats against Users
5.1 Prosecution/ISP Account Termination and other Woes
The No Electronic Theft Act was introduced in 1997 at the prompting of the music industry. The law made the copying of more than ten copies of a work or works having a value in excess of a thousand dollars a federal crime, even in the absence of a motivation of 'financial gain'. In August of 1999 a 22 year old student from Oregon, Jeffrey Gerard Levy, became the first person indicted under the act. Subsequently there have been no prosecutions under that title. In July and August 2002, however, the Recording Industry Association of America publicly enlisted the support of other copyright owners and allied elected representatives in calling on John Ashcroft to commence prosecutions. As mentioned above in relation to free riding on the Gnutella network, the small number of nodes serving a high percentage of files means that such users could be attractive targets for individual prosecution.
In addition, at least two companies (Retpan and 'Ranger') have boasted that they are currently engaged in identifying and tracing the IP numbers of file sharers so as to individualise the culprits. Such a draconian option is not without risks for the plaintiff music companies, and will certainly produce a user backlash. Currently, however, this IP data is being used to pressure a more realistic and less sympathetic target, namely the user's internet service provider. ISPs, having financial resources, are more sensitive to the threat of litigation and are positioned to take immediate unilateral action against users they feel place them in jeopardy. This has already led to the closure of many accounts; indeed this is not a novel phenomenon, having commenced in the aftermath of the Napster closure with moves against those running 'OpenNap'.
5.2 Hacking
More recently, and with great public brouhaha, the RIAA and their allies have begun pushing for legislation to allow copyright owners to hack the machines of those they have a reasonable belief are sharing files. Copyright owners argue that this will 'even the playing field' in their battle against music 'pirates', and legislation to this effect was introduced by Representative Howard Berman (California) at the end of July 2002 (39). As of this writing the function of this initiative is unclear, as any real attempt to pursue this course to its logical conclusion would involve the protagonists in a level of conflict with users that would certainly backfire. The likelihood is that this is another salvo in the content industry's drive to force the universal adoption of DRM technology on hardware manufacturers.
6. Strictly Legal Applications of Current File Sharing Applications
Although litigation constantly focuses attention on the alleged copyright-infringing uses of these programs, large amounts of material of a public domain or GPL character are also shared. In addition, I believe that we are now witnessing a wider implementation of these networks for the purpose of bypassing the gate-keeping functions of the existing communications industry. One of the most interesting examples in this regard is provided by Transmission Films, a Canadian company partnered with Overnet (the most advanced iteration of the network that began with eDonkey), which launched in August 2002. TF offers independent films for viewing either by streaming or by download, with an option also to purchase the film outright. Digital Rights Management is otherwise used to limit access to a five-day period from user activation. Customers pay a set fee in advance and then spend the monies in their account selecting the options that they prefer.(28)
In a similar vein, Altnet/Brilliant Digital - owners of Kazaa - have announced the integration of a micro-payments facility into their client to enable the acquisition of DRM-protected material on their network; to this end they have made agreements with several independent music labels (16).
7. Conclusion
Whilst the ability to capitalise upon diffuse resources is key to distribution and storage networks, decentralisation also undermines the potential efficiency of these mechanisms. Whereas Akamai can employ mapping servers combined with real-time traffic analysis so as to enable optimal connections, file-sharing systems are precluded from doing so by fear of legal retaliation from irate copyright owners. As increasing numbers of ISPs introduce bandwidth caps, it is fundamental that search methods seek to identify the user nearest in network terms to the source of the query, so as to minimise external traffic.
"Note that the decentralized nature of pure P2P systems means that these properties are emergent properties, determined by entirely local decisions made by individual resources, based only on local information: we are dealing with a self-organized network of independent entities." (19)
Current-generation applications lack the information necessary to make efficient routing decisions, and too many continue to use search methods that generate huge traffic overhead, or alternatively opt for a superpeer-type tiering system that increases the danger of individualisation for prolific users and contributors.
Notwithstanding these obstacles, the sharing of files and bandwidth is here to stay. Although free riding leads to sub-optimal performance, the amount of resources gathered is sufficient to benefit from redundancy. Given such a huge number of users, the incentives problem appears trivial and is taken care of by the massive library of content available. Overnet demonstrates that the potential distribution these systems could offer users is significant, and one would anticipate the emergence of networks dedicated to serving minority-interest content around an existing community, or as the pretext for its evolution, as has been the case with Seti@home and other distributed processing projects.
Notes
(0) As a thought experiment we might consider how different history could have been if Shawn Fanning had contacted one of the music 'majors', taken a patent on his invention (which given the current standards in the USPTO could have been extremely broad) and granted them an exclusive license to its use. Patent infringement being a matter of strict liability, the use of file-sharing tools could have been stymied at its source amongst the developer community.
(1) Hong, Theodore (et al.) (2001). Freenet: A Distributed Anonymous Information Storage and Retrieval System. In Federrath, H. (ed.) Designing Privacy Enhancing Technologies: International Workshop on Design Issues in Anonymity and Unobservability, LNCS 2009. New York: Springer.
(2) P. Maymounkov and D. Mazieres. Kademlia: A peer-to-peer information system based on the XOR metric. In Proceedings of IPTPS02, Cambridge, USA, March 2002. http://www.cs.rice.edu/Conferences/IPTPS02/.
(3) Freeriding and Gnutella, Eytan Adar & Bernardo Huberman (2000) http://www.firstmonday.dk/issues/issue5_10/adar/index.html
(4) "CAW lets you assemble an ad-hoc network of 'proxies' that you need not trust to behave properly, because you can neutralize any attempts to misbehave. Make it so he can search out the media by the hash and you reduce the trust requirements necessary -- all you need to trust is the hash source, which can come easily over a slower link." [Gordon Mohr, ocn-dev@open-content.net, Tue, 18 Jun 2002 11:11:28 -0700]
If you right-click a search result, you'll notice a nice option called "Bitzi lookup". Bitzi is a centralized database for hashes from all major file sharing networks. If you look up a search result, you might find out that someone else has provided information about whether the file is real, what quality it is etc. This is obviously very valuable and will save you from downloading hoaxes or low quality files. The only annoying part is that the Bitzi pages are cluttered with banners.
(5) What They Know Could Hurt You, Michelle Delio http://www.wired.com/news/privacy/0,1848,49430,00.html
(6) "eMule is one of the first file sharing clients to compress all packets in real time, thereby increasing potential transfer speed (I'm not sure whether this works only with other eMule users). It further extends the eDonkey protocol by introducing a very basic reputation system: eMule remembers the other nodes it deals with and rewards them with quicker queue advancement if they have sent you files in the past. So far, the eDonkey network has relied on its proprietary nature to enforce uploading: if you change the upload speed in the original client, the download speed is scaled down as well. eMule's reputation feature may make this kind of security by obscurity (that has already been undermined by hacks) unnecessary."
Morpheus 2.0 will include a rating mechanism that will allow users to identify fake files and a method of certifying users as legitimate users, presumably this will be used also to isolate leechers.
(7) Limewire for example sets a default allocation of 50% of bandwidth.
(8) "Thus, not only are low-bandwidth and other clogged hosts pushed toward the outside of the network, but non-sharing users are as well. " http://www.limewire.com/index.jsp/net_improvements
(9) Upload requirements were commonplace in the days of Bulletin Board Systems and remain the rule for warez sites and the communities organised around Hotlines.
(10) Mojo Nation FAQ http://web.archive.org/web/20011020213658/http://mojonation.net/docs/faq...
(11) See for example www.asiayeah.com, www.gnute.com
(12) New threat for the millions of Kazaa users http://www.bitdefender.com/press/ref2706.php
(13) Source: The Effects of P2P on Service Provider Networks, Sandvine, September 2002. The methodology employed by Sandvine in assembling their statistics has been criticised as conflating traditional client-server downloads with peer transfers. It has been suggested that were ISPs to charge a flat rate for internal network traffic, network protocols would be altered to take the price differential into account.
(14) For example, a Kazaa supernode will use a maximum of 10% of total CPU resources. Mldonkey has been known to use nearly 20% of the CPU resources available.
(15) This phenomenon is extending rapidly but the classical examples remain www.sharereactor.com, www.filenexus.com and www.filedonkey.
(16) Altnet Begins Micro Payments, Wednesday September 18, 2002 http://www.slyck.com/newssep2002/091802c.html
(17) http://www.emule-project.net/ Mldonkey does something similar http://www.infoanarchy.org/story/2002/8/7/45415/23698
(18) The Gnutella client Xolox for example, introducing a re-query option to their search, produced what Wired described as something akin to a low-level denial of service attack .(http://www.salon.com/tech/feature/2002/08/08/gnutella_developers/index.html)
(19) Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System (2002), Matei Ripeanu, Ian Foster, Adriana Iamnitchi http://www.cs.rice.edu/Conferences/IPTPS02/128.pdf
(20) Accountability, in Peer-To-Peer: Harnessing the Power of Disruptive Technologies, Roger Dingledine, Michael Freedman and David Molnar. http://freehaven.net/doc/oreilly/accounability-ch16.html
(21) Gnutella Developers Forum http://groups.yahoo.com/group/the_gdf/
(22) Incentives for Sharing in Peer-to-Peer Networks by P. Golle, K. Leyton-Brown and I. Mironov Submitted to the 2001 ACM Conference on Electronic Commerce http://crypto.stanford.edu/~pgolle/
(23) Theodore Hong, "Performance," in Peer-to-Peer: Harnessing the Power of Disruptive Technologies, edited by Andy Oram. http://www.doc.ic.ac.uk/~twh1/longitude/papers/oreilly.pdf
(24) "Internet Infrastructure & Services Internet 3.0: Distribution and Decentralization Change Everything" http://wpinter2.bearstearns.com/supplychain/infrastructure.htm
(25) A Modified Depensation Model of Peer to Peer Networks: Systemic Catastrophes and other Potential Weaknesses, Andrew H. Chen & Andrew M. Schroeder http://students.washington.edu/achen/papers/p2p-paper.pdf
(26) Music industry swamps swap networks with phoney files, Dawn C. Chmielewski http://www.siliconvalley.com/mld/siliconvalley/news/local/3560365.htm
(27) One reason suggested at the time was that Morpheus was eliminated because it had become the most popular client, largely because it did not integrate spyware monitoring users' activity; its elimination effectively provided the opportunity for its two rivals to divide up its users between them.
(28) http://www.overnet.com/ There are currently no DVD players compatible with this form of DRM.
(29) Other players seeking to apply file-sharing concepts in the commercial environment include Swarmcast and Cloudcast, see also http://www.badblue.com/w020408.htm & http://www.gnumarkets.com/
(30) "Typically, QoS is characterized by packet loss, packet delay, time to first packet (time elapsed between a subscribe request send and the start of stream), and jitter. Jitter is effectively eliminated by a huge client side buffer [SJ95]." Deshpande, Hrishikesh; Bawa, Mayank; Garcia-Molina, Hector, Streaming Live Media over a Peer-to-Peer Network
(31) Cisco (DistributedDirector), GTE Internetworking (which acquired BBN and with it Genuity's Hopscotch), and Resonate (Central Dispatch) have been selling such solutions as installable software or hardware. Digex and GTE Internetworking (Web Advantage) offer hosting that uses intelligent load balancing and routing within a single ISP. These work like Akamai's and Sandpiper's services, but with a narrower focus. http://www.wired.com/wired/archive/7.08/akamai_pr.html
(32) Akamai delivers content faster through a combination of proprietary load-balancing and distribution algorithms and a network of machines installed across hundreds of networks on which frequently requested data is cached (11,689 servers across 821 networks in 62 countries). This spread of servers avoids much congestion, as the data is provided from a server cache either on the requesting network itself (bypassing the peering and backbone router problems and mitigating that of the first mile) or on the most efficient available network given load-balancing requirements.
(33) Of course pirate networks had already been in existence at least since the days of the BBS. In more recent years Warez groups and Hotlines cultures did much to keep the unauthorised sharing culture vibrant.
(34) The event was announced on Slashdot, and thousands downloaded the program that day. The next day, AOL stopped the availability of the program over legal concerns and restrained the Nullsoft division from doing any further work on the project. This did not stop Gnutella; after a few days the protocol had been reverse engineered and compatible open source clones started showing up.(Wikipedia)
(35) New Software Quietly Diverts Sales Commissions, by John Schwartz and Bob Tedeschi, 27 September 2002, http://www.nytimes.com/2002/09/27/technology/27FREE.html
(36) Gnutella bandwidth bandits, by Farhad Manjoo http://www.salon.com/tech/feature/2002/08/08/gnutella_developers/print.html For the GDF proposal see http://groups.yahoo.com/group/the_gdf/message/9123
(37) Peering is the agreement to interconnect and exchange routing information.
(38) Analysis of the Traffic on the Gnutella Network, Kelsey Anderson, http://www.cs.ucsd.edu/classes/wi01/cse222/projects/reports/p2p-2.pdf
(39) Congress to turn hacks into hackers, by Thomas C Greene http://www.theregister.co.uk/content/6/26357.html