(iii)
Key Terms
- Supernodes
In the Gnutella network, searches are carried out on what is called a broadcast model. In practice this means that a node passes each request to all the nodes to which it is connected, which in turn forward it to other nodes, and so on. The responses of each node consume bandwidth and thus must be minimised, particularly where many nodes are (a) operating on a low-bandwidth connection and of limited utility for provisioning and (b) not sharing significant amounts of data. To overcome this problem, Gnutella clients now limit their requests to 'superpeers' that have enough network resources to function efficiently and act as ephemeral archives for smaller nodes in their vicinity.
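The bandwidth cost of the broadcast model can be estimated with a rough upper bound. The sketch below assumes, purely for illustration, that every node keeps the same number of connections and that no request reaches the same node twice; real Gnutella topologies are less regular, and the function name and parameters are hypothetical rather than taken from any client.

```python
def flood_message_upper_bound(connections: int, ttl: int) -> int:
    """Upper bound on the messages generated by one flooded query.

    Assumes a tree-like overlay: the originator forwards the request to
    `connections` neighbours, and each later node forwards it to its
    other `connections - 1` neighbours until the time-to-live runs out.
    """
    if connections < 1 or ttl < 1:
        return 0
    total = 0
    frontier = connections  # messages sent on the first hop
    for _ in range(ttl):
        total += frontier
        frontier *= connections - 1  # each recipient forwards to its other neighbours
    return total


if __name__ == "__main__":
    for ttl in (1, 3, 5, 7):
        print(ttl, flood_message_upper_bound(connections=4, ttl=ttl))
```

Even with only four connections per node the count grows geometrically with the time-to-live, which is why restricting requests to a smaller set of well-provisioned nodes is attractive.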
- Multicast/Swarmed Downloads (Bearshare, Limewire, Shareaza)
File transfer between two peers fails to maximise bandwidth efficiency due to the congestion problems outlined at the beginning of the chapter. Thus, where a file is available from multiple sources, different components of it will be downloaded simultaneously so as to minimise the total time to completion. Under the MFTP protocol, which forms the basis for Edonkey/Overnet, this also allows other clients to initiate downloading from a partial download on the disk of another peer. [Check whether this is the case for the others too].
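A swarmed download can be pictured as a plan that splits the file into byte ranges and spreads them over the available sources. The following is a minimal sketch under stated assumptions: the round-robin assignment, the chunk size and the function name are illustrative and are not the scheduling used by MFTP or any of the clients named above.

```python
from __future__ import annotations


def assign_ranges(
    file_size: int, sources: list[str], chunk_size: int
) -> dict[str, list[tuple[int, int]]]:
    """Assign fixed-size byte ranges to sources in round-robin order.

    Each range is a (start, end) pair with `end` exclusive.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    plan: dict[str, list[tuple[int, int]]] = {source: [] for source in sources}
    if not sources:
        return plan
    for index, start in enumerate(range(0, file_size, chunk_size)):
        end = min(start + chunk_size, file_size)
        plan[sources[index % len(sources)]].append((start, end))
    return plan


if __name__ == "__main__":
    print(assign_ranges(10, ["peer-a", "peer-b"], 4))
```

A real client would also reassign ranges when a source stalls or disappears; that is left out here to keep the idea visible.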
- Partial Download Sharing
Where large files are being shared amongst a large number of users, capacity is rigidly limited to those who already have a copy of the entire file in their shared folder. As this can take over a week to accomplish, it creates a high level of unfulfilled demand. The Edonkey network allows users to transfer partial files from other peers, with the consequence that even where no peer has the whole file at a given moment, all the constituent parts can be available and allow a successful transfer to take place. (Ares, Shareaza and Gnucleus apparently enable partial file sharing as well.)
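The point that a file can be complete across the network while no single peer holds all of it can be checked mechanically. The sketch below assumes that a file is divided into numbered chunks and that each peer advertises the chunk numbers it holds; both the representation and the names are hypothetical.

```python
from __future__ import annotations


def file_is_recoverable(total_chunks: int, peers: dict[str, set[int]]) -> bool:
    """Return True if every chunk is held by at least one peer."""
    available: set[int] = set().union(*peers.values())
    return all(index in available for index in range(total_chunks))


if __name__ == "__main__":
    holdings = {"peer-a": {0, 1}, "peer-b": {2}, "peer-c": {1, 3}}
    # No peer has all four chunks, yet together they cover the file.
    print(file_is_recoverable(4, holdings))
```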
- Hashing
In June 2002 the media reported that music companies were now employing a company called 'Overpeer' to introduce fake files into the file sharing webs, something which many users had suspected for some time. Fortunately a solution lay close at hand, and in the case of one network had already been implemented: unique cryptographic hashes computed from the content of the file which, together with its size, constituted a reliable identifier. Edonkey users had already established portals for what became known as 'P2P web links', where independent parties would verify the authenticity of the files and then make their description and hash available through a site dedicated to highlighting new releases combined with a searchable database. These sites (sharereactor, filenexus, filedonkey) did not store any of the content files themselves, merely serving as a clearing house for metadata. The need for content verification arose first on Edonkey due to the proclivity of its users to share very large files - often in excess of 600 MB - whose transfer could often require several days, and hence implied a significant waste of machine resources and human effort should the data turn out to be corrupted in any way.
- Metadata
Given the enormous and constantly expanding volume of information, it is plain that in order to access and manage it efficiently something broadly equivalent to the Dewey system for library organisation is required. Where metadata protocols are collectively accepted they can significantly increase the efficiency of searches through the removal of ambiguities about the data's nature. Absence of standardised metadata has meant that search engines are incapable of reflecting the depth of the web's contents and cover it only in a partial manner. Fruitful searches require a semantically rich metadata structure producing descriptive conventions but pointing to unique resource identifiers (e.g. URLs).
Apart from the failure to agree collective standards, however, the constant threat of litigation also discourages the use of accurate metadata, so that content is secreted and made available only to those privy to a certain naming protocol, a tendency reaching its acme in programs such as 'pig latin'.
If you right-click a search result, you'll notice a nice option called "Bitzi lookup". Bitzi is a centralized database for hashes from all major file sharing networks. If you look up a search result, you might find out that someone else has provided information about whether the file is real, what quality it is etc. This is obviously very valuable and will save you from downloading hoaxes or low quality files. The only annoying part is that the Bitzi pages are cluttered with banners.
(d)
Comparison of Akamai with software based alternative.
(e)
Deviations from pure p2p model
Fasttrack clients have certain centralised features, not only supernodes. The reverse-engineered open source giFT client was shut out through the distribution of an update that required clients to contact a Fasttrack computer for authentication, only following which could the list of supernodes be retrieved.
(f)
Problems
- Appropriation by proprietary technologies
Napster was a copyrighted work, so that once it became subject to legal action no further conduits to the music-pool were available. Gnutella is an open network shared by multiple applications, some of which have opted for GPL development (such as Limewire, out of enlightened self-interest, and Gnucleus, out of far-sightedness and commitment to the free software model) whereas others have remained proprietary. By and large, however, Gnutella developers appear to have tended towards co-operation, as evidenced by the Gnutella developers list. Their coherence is likely galvanised by the fact that they are effectively in competition more with the Fasttrack network (Kazaa, Grokster), which operates on a strictly proprietary basis.
The hazards entailed by reliance on a proprietary technology, even in the context of a decentralised network, were manifested in March 2002 when changes to the protocol were made and the owners refused to provide an updated version to the most popular client, Morpheus, whose users were consequently excluded from the network. One reason suggested at the time was that Morpheus was eliminated because it was the most popular client, largely owing to the fact that it did not integrate spyware monitoring users' activity; its elimination effectively gave its two rivals the opportunity to divide up its users between them.
Ironically, Morpheus was able to relaunch within three days by taking recourse to the Gnutella network, appropriating the code behind the Gnucleus client with only minor, largely cosmetic, alterations. Nonetheless, the incident highlights the weaknesses introduced into networks where one player has the capacity to sabotage the other and lock their users (along with their shared content) out of the network. The Gnucleus codebase has now generated twelve clones.
- Free riding
Freeriding and Gnutella: The Return of the Tragedy of the Commons: Bandwidth, crisis of P2P, tragedy of the commons, Napster's coming difficulty with a business plan and Mojo Karma. Doing things the freenet way. Eytan Adar & Bernardo Huberman (2000)
Hypothesis 1: A significant portion of Gnutella peers are free riders.
Hypothesis 2: Free riders are distributed evenly across different domains (and by speed of their network connections).
Hypothesis 3: Peers that provide files for download are not necessarily those from which files are downloaded.
"In a general social dilemma, a group of people attempts to utilize a common good in the absence of central authority. In the case of a system like Gnutella, one common good is the provision of a very large library of files, music and other documents to the user community. Another might be the shared bandwidth in the system. The dilemma for each individual is then to either contribute to the common good, or to shirk and free ride on the work of others.
Since files on Gnutella are treated like a public good and the users are not charged in proportion to their use, it appears rational for people to download music files without contributing by making their own files accessible to other users. Because every individual can reason this way and free ride on the efforts of others, the whole system's performance can degrade considerably, which makes everyone worse off - the tragedy of the digital commons."
Figure 1 illustrates the number of files shared by each of the 33,335 peers we counted in our measurement. The sites are rank ordered (i.e. sorted by the number of files they offer) from left to right. These results indicate that 22,084, or approximately 66%, of the peers share no files, and that 24,347 or 73% share ten or less files.
The top              Files shared    As percent of the whole
333 hosts (1%)       1,142,645       37%
1,667 hosts (5%)     2,182,087       70%
3,334 hosts (10%)    2,692,082       87%
5,000 hosts (15%)    2,928,905       94%
6,667 hosts (20%)    3,037,232       98%
8,333 hosts (25%)    3,082,572       99%
Table 1
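The concentration figures in Table 1 come from ranking hosts by the number of files they share and summing from the top. A minimal sketch of that calculation follows; the sample data is invented for illustration and does not reproduce the measured values, and the rounding of the host count is an assumption.

```python
from __future__ import annotations


def top_share(files_per_peer: list[int], fraction: float) -> float:
    """Fraction of all shared files held by the top `fraction` of peers."""
    ranked = sorted(files_per_peer, reverse=True)
    total = sum(ranked)
    if total == 0:
        return 0.0
    top_n = max(1, round(len(ranked) * fraction))  # at least one host
    return sum(ranked[:top_n]) / total


if __name__ == "__main__":
    sample = [100, 50, 30, 10, 5, 5, 0, 0, 0, 0]  # invented figures
    for fraction in (0.1, 0.2, 0.5):
        print(fraction, top_share(sample, fraction))
```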
And providing files actually downloaded?
Again, we measured a considerable amount of free riding on the Gnutella network. Out of the sample set, 7,349 peers, or approximately 63%, never provided a query response. These were hosts that in theory had files to share but never responded to queries (most likely because they didn't provide "desirable" files).
Figure 2 illustrates the data by depicting the rank ordering of these sites versus the number of query responses each host provided. We again see a rapid decline in the responses as a function of the rank, indicating that very few sites do the bulk of the work. Of the 11,585 sharing hosts the top 1 percent of sites provides nearly 47% of all answers, and the top 25 percent provide 98%.
Quality?
We found the degree to which queries are concentrated through a separate set of experiments in which we recorded a set of 202,509 Gnutella queries. The top 1 percent of those queries accounted for 37% of the total queries on the Gnutella network. The top 25 percent account for over 75% of the total queries. In reality these values are even higher due to the equivalence of queries ("britney spears" vs. "spears britney").
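The remark about equivalent queries suggests normalising each query before counting it. The sketch below assumes that two queries are equivalent when they contain the same words regardless of order and letter case; that rule and the function names are illustrative assumptions, not the method used in the study.

```python
from collections import Counter
from typing import Iterable


def normalise(query: str) -> str:
    """Lower-case the query and sort its words so word order is ignored."""
    return " ".join(sorted(query.lower().split()))


def query_counts(queries: Iterable[str]) -> Counter:
    """Count queries after normalisation."""
    return Counter(normalise(query) for query in queries)


if __name__ == "__main__":
    print(query_counts(["britney spears", "spears britney", "madonna"]))
```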
Tragedy?
First, peers that provide files are set to only handle some limited number of connections for file download. This limit can essentially be considered a bandwidth limitation of the hosts. Now imagine that there are only a few hosts that provide responses to most file requests (as was illustrated in the results section). As the connections to these peers are limited, they will rapidly become saturated and remain so, thus preventing the bulk of the population from retrieving content from them.
A second way in which quality of service degrades is through the impact of additional hosts on the search horizon. The search horizon is the farthest set of hosts reachable by a search request. For example, with a time-to-live of five, search messages will reach at most peers that are five hops away. Any host that is six hops away is unreachable and therefore outside the horizon. As the number of peers in Gnutella increases more and more hosts are pushed outside the search horizon and files held by those hosts become beyond reach.
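The search horizon can be made concrete with a breadth-first traversal that stops once the time-to-live is used up. This is a sketch under stated assumptions: the overlay is given as a plain adjacency mapping, and the function name is hypothetical rather than part of any Gnutella client.

```python
from collections import deque
from typing import Dict, Iterable, Set


def search_horizon(graph: Dict[str, Iterable[str]], origin: str, ttl: int) -> Set[str]:
    """Return the peers a flooded query from `origin` can reach within `ttl` hops."""
    reached = {origin}
    queue = deque([(origin, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == ttl:
            continue  # time-to-live exhausted, do not forward further
        for neighbour in graph.get(node, ()):
            if neighbour not in reached:
                reached.add(neighbour)
                queue.append((neighbour, hops + 1))
    reached.discard(origin)
    return reached


if __name__ == "__main__":
    # A chain of seven peers: "g" is six hops from "a".
    names = "abcdefg"
    chain = {name: [] for name in names}
    for left, right in zip(names, names[1:]):
        chain[left].append(right)
        chain[right].append(left)
    print(sorted(search_horizon(chain, "a", ttl=5)))
```

With a time-to-live of five, the peer six hops away never sees the query, so any file held only there is beyond reach.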
eMule is one of the first file sharing clients to compress all packets in real time, thereby increasing potential transfer speed (I'm not sure whether this works only with other eMule users). It further extends the eDonkey protocol by introducing a very basic reputation system: eMule remembers the other nodes it deals with and rewards them with quicker queue advancement if they have sent you files in the past.
So far, the eDonkey network has relied on its proprietary nature to enforce uploading: If you change the upload speed in the original client, the download speed is scaled down as well. eMule's reputation feature may make this kind of security by obscurity (that has already been undermined by hacks) unnecessary.
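A reputation-based queue of the kind described can be sketched as a waiting time weighted by past contributions. The scoring rule below, including the bonus per megabyte and the cap on the multiplier, is entirely hypothetical and is not eMule's actual formula; it only illustrates how a peer that has uploaded to us can overtake one that has waited longer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedPeer:
    name: str
    waiting_seconds: float
    bytes_received_from_peer: int  # what this peer has sent us in the past


def queue_score(
    peer: QueuedPeer, bonus_per_megabyte: float = 0.1, max_multiplier: float = 10.0
) -> float:
    """Waiting time scaled by a capped bonus for past uploads (hypothetical rule)."""
    megabytes = peer.bytes_received_from_peer / 1_000_000
    multiplier = min(max_multiplier, 1.0 + bonus_per_megabyte * megabytes)
    return peer.waiting_seconds * multiplier


def next_upload_slot(queue: List[QueuedPeer]) -> QueuedPeer:
    """Pick the queued peer with the highest score."""
    return max(queue, key=queue_score)


if __name__ == "__main__":
    queue = [
        QueuedPeer("long-waiting free rider", 600, 0),
        QueuedPeer("past contributor", 300, 50_000_000),
    ]
    print(next_upload_slot(queue).name)
```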
More efforts to enforce sharing
http://www.infoanarchy.org/story/2002/6/20/123110/395 and again
Applejuice
http://www.infoanarchy.org/story/2002/6/20/123110/395
Problems of Defaults: Firewalls and NAT
The default firewall on Windows XP has resulted in the inaccessibility of large numbers of files formerly available. In addition, secondary connections made across NATs can make files unreachable from the exterior, a problem addressed through the introduction of the push command.
Primary connections typically consume more CPU cycles than secondary connections.
- Trust/security
Security and privacy threats constitute other elements deterring participation, both for reasons relating to users' normative beliefs opposed to surveillance and from fear of system penetration by untrustworthy daemons.
The security question has recently been scrutinised in light of the revelation that the popular application Kazaa had been packaging a utility for distributed processing known as Brilliant Digital in its installer package. Although unused thus far, it emerged that there was the potential for it to be activated in the future without the knowledge of the end-user.
- Viruses, Trojans, Spyware and other nuisances
.vbs and .exe files can be excluded from searches.
MP3s etc are data not executables.
Virus spreads via Kazaa (but the article wrongly identifies it as a worm): http://www.bitdefender.com/press/ref2706.php
Audio Galaxy: Contains really ugly webHancer spyware that may make your Internet connection unusable. Bundled software delivered with file sharing applications frequently includes spyware that monitors users' activity. Bundling has also been used to install CPU and bandwidth consuming programs such as Gator, and in other cases - such as that involving Limewire this year - trojans (http://www.wired.com/news/privacy/0,1848,49430,00.html).
- Content Integrity
Commercial operations such as Akamai can guarantee the integrity of the content that they deliver through their control and ownership of their distributed network of caching servers. Peer to Peer networks on the other hand cannot guarantee the security of the machines they incorporate and must take recourse to means of integrity verification inherent in the data being transported, as is the case with hash sums derived from the content of the file (so-called 'self-verifiable URIs').
[http://open-content.net/specs/draft-jchapweske-caw-03.html]
CAW lets you assemble an ad-hoc network of "proxies" that you need not trust to behave properly, because you can neutralize any attempts to misbehave. [Gordon Mohr ocn-dev@open-content.net Tue, 18 Jun 2002 11:11:28 -0700 ]
Make it so he can search out the media by the hash and you reduce the trust requirements necessary -- all you need to trust is the hash source, which can come easily over a slower link.
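The idea that only the hash source needs to be trusted can be sketched as follows: a short list of digests is obtained from a trusted party, and every chunk received from an untrusted peer is checked against it before being accepted. The use of SHA-1 and of a flat list of per-chunk digests is an assumption made for brevity; it is not the tree-hash construction of the CAW draft.

```python
import hashlib
from typing import List


def chunk_digests(data: bytes, chunk_size: int) -> List[str]:
    """Digests a trusted publisher would distribute, one per chunk."""
    return [
        hashlib.sha1(data[start:start + chunk_size]).hexdigest()
        for start in range(0, len(data), chunk_size)
    ]


def verify_chunk(index: int, chunk: bytes, trusted_digests: List[str]) -> bool:
    """Accept a chunk from an untrusted source only if its digest matches."""
    if not 0 <= index < len(trusted_digests):
        return False
    return hashlib.sha1(chunk).hexdigest() == trusted_digests[index]


if __name__ == "__main__":
    digests = chunk_digests(b"abcdefghij", chunk_size=4)
    print(verify_chunk(1, b"efgh", digests))  # genuine chunk
    print(verify_chunk(1, b"efgX", digests))  # tampered chunk
```

Because each chunk is checked on arrival, a corrupt or fake piece costs one chunk of bandwidth rather than the whole transfer.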
Fundamentally this factor reintroduces the problem of trust into network communications in a practical way. Whilst the threat of virus proliferation may be low, other nuisances or threats are much more realistic. In June it was confirmed that a company named Overpeer had been employed by record labels to introduce fake and/or corrupted files into shared networks in the hope of frustrating users and driving them back inside the licit market. This had been suspected by many users and observers for some time, and in the aftermath of the confirmation arose the news that at least two other entities - the French company 'Retpan' and 'p2poverflow' - were engaged in the same activity.
Where relatively small files are concerned - and the 3.5 to 5.0 megabyte size typical of a music track encoded at a bitrate of 128 kbps constitutes small by today's standards - such antics, whilst inconvenient, are unlikely to prove an efficient deterrent. Given that most files have been made available by multiple users there will always be plenty of authentic copies in circulation.
The situation is quite different, however, in relation to the sharing of cinematographic works and television programs, whose exchange has grown rapidly in the last years, principally due to the penetration of broadband and the emergence of the DivX compression format, which has made it simple to burn downloads onto single CDRs, thus obviating limited hard disk space as an impediment to the assembling of a collection. A typical studio release takes up in excess of 600 megabytes when compressed into DivX and can take anything from a day to a week to download in its entirety, depending on the transfer mechanism used, speed of connection, number of nodes serving the file etc. Obviously, having waited a week one would be rather irritated to discover that instead of Operation Takedown the 600 megabyte file in fact contained a lengthy denunciation of movie piracy courtesy of the MPAA. Precisely in order to counter that possibility, portals have emerged on the Edonkey network (the principal filesharing network for files of this size) whose function is to authenticate the content of hash-identified files that are brought to their attention. They initiate a download, ensure the integrity of the content, and verify that the file is available on an adequate number of nodes so as to be feasibly downloaded. Provided that the aforesaid criteria are satisfied, they then publish a description of the 'release' together with the necessary hash identifier on their site. This phenomenon is accelerating rapidly, but the classical examples remain www.sharereactor.com, www.filenexus.com and www.filedonkey. Similar functionality can be derived from the efforts underway as part of the Bitzi metadata project mentioned above, and these initiatives could stymie the efforts by the music companies to render the network circuits useless by increasing the dead noise ratio.
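The 'day to a week' range for a 600 megabyte file follows from simple arithmetic on the sustained transfer rate. The sketch below ignores protocol overhead, queueing and interruptions, treats a megabyte as one million bytes, and uses two illustrative rates that are assumptions rather than measurements.

```python
def download_hours(size_megabytes: float, rate_kilobits_per_second: float) -> float:
    """Hours needed to move the file at a constant sustained rate."""
    bits = size_megabytes * 8_000_000  # decimal megabytes to bits
    seconds = bits / (rate_kilobits_per_second * 1000)
    return seconds / 3600


if __name__ == "__main__":
    for rate in (56, 8):  # kilobits per second, illustrative only
        hours = download_hours(600, rate)
        print(f"{rate} kbit/s: {hours:.1f} hours ({hours / 24:.1f} days)")
```

At a sustained 56 kbit/s the transfer takes roughly a day; at 8 kbit/s, which is about one kilobyte per second, it takes roughly a week.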
- Prosecution/ISP Account Termination and other Woes
At the prompting of the music industry the No Electronic Theft Act was introduced in 1997, making the reproduction of more than ten copies of a work or works having a value in excess of a thousand dollars a federal crime even in the absence of a motivation of 'financial gain'. In August of 1999 a 22 year old student from Oregon, Jeffrey Gerard Levy, became the first person indicted under the act. Subsequently there have been no prosecutions under that title. In July and August 2002, however, the Recording Industry Association of America publicly enlisted the support of other copyright owners and allied elected representatives in calling on John Ashcroft to commence prosecutions. As mentioned above in relation to free riding on the Gnutella network, the small number of nodes serving a high percentage of files means that such users could be attractive targets for individual prosecution.
In addition, at least two companies have boasted that they are currently engaged in identifying and tracing the IP numbers of file sharers (Retpan (again) and a company called 'Ranger') so as to individualise the culprits. Such a draconian option is not a wager without risks for the plaintiff music companies; indeed, arguably this is why they have forborne from such a course up until now. Currently, however, this IP data is being used to pressure a more realistic and less sympathetic target, namely the user's Internet Service Provider. ISPs, having financial resources, are more sensitive to the threat of litigation and positioned to take immediate unilateral action against users they feel place them in jeopardy. This has already led to the closure of many accounts, and indeed this is not a novel phenomenon, having commenced in the aftermath of the Napster closure with moves against those running 'OpenNap'.
Hacking
More recently, and with great public brouhaha, the RIAA and their allies have begun pushing for legislation to allow copyright owners to hack the machines of those they have a reasonable belief are sharing files. Copyright owners argue that this will 'even the playing field' in their battle against music 'pirates', and legislation to this effect was introduced by Representative Howard Berman (California) at the end of July 2002. As of this writing the function of this initiative is unclear, as a real attempt to pursue this course to its logical conclusion will involve the protagonists in a level of conflict with users which would certainly backfire. The likelihood is that this is another salvo in the content industry's drive to force the universal adoption of a DRM technology on hardware manufacturers.
(g) Economic Aspects
- Cost structure of broadband
Whilst it is obvious why users utilise these tools to extract material, it is not so plain why they should also use them to provide material in turn to others and avoid a tragedy of the commons. Key to the willingness to provide bandwidth has been the availability of cable and DSL lines which provide capacity in excess of most individuals' needs at a flat rate cost. There is thus no correlation between the amount of bandwidth used and the price paid, so in brief there is no obvious financial cost to the provider. In areas where there are total transfer caps or use is on a strictly metered basis, participation is lower for the same reason.
In this case, search bandwidth consumption serves as a tax on the members of the network, which ensures that those who bring the most of that resource to the network, are those that bear the burden of running the network. infoa
From an ISP point of view traffic crossing AS borders is more expensive than local traffic. We found that only 2-5% of Gnutella connections link nodes located within the same AS, although more than 40% of these nodes are located within the top ten ASs. This result indicates that most Gnutella-generated traffic crosses AS borders, thus increasing costs, unnecessarily.
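The share of connections that stay within one autonomous system can be computed once each node has been mapped to its AS. The sketch below assumes such a mapping is already available; the node names and AS numbers are invented, and the function is not the measurement code behind the figures quoted.

```python
from typing import Dict, Iterable, Tuple


def intra_as_fraction(
    connections: Iterable[Tuple[str, str]], as_of: Dict[str, int]
) -> float:
    """Fraction of connections whose two endpoints share an AS number.

    Connections with an endpoint of unknown AS are left out of the count.
    """
    known = [(a, b) for a, b in connections if a in as_of and b in as_of]
    if not known:
        return 0.0
    same = sum(1 for a, b in known if as_of[a] == as_of[b])
    return same / len(known)


if __name__ == "__main__":
    as_of = {"n1": 100, "n2": 100, "n3": 200, "n4": 300}  # invented mapping
    links = [("n1", "n2"), ("n1", "n3"), ("n2", "n4"), ("n3", "n4")]
    print(intra_as_fraction(links, as_of))
```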
Large amounts of extra-network traffic are expensive for ISPs and consequently an increasing number have been introducing bandwidth caps. One September 2002 report claimed that up to 60% of all network traffic was being consumed by P2P usage. Wide implementation of IP multicast has been suggested as a potential remedy to these problems, such that once a piece of content was brought within an ISP's network, it would then be served from within the network to other clients, and thus reduce unnecessary extra-network traffic. Interestingly, the same report argues that much of the 60% derives from search queries and advertising; the former could probably be much reduced by a shift to the power-law search method described above. (Source: The effects of P2P on service provider networks, Sandvine, September 2002. The methodology employed by Sandvine in assembling their statistics has been criticised as conflating traditional client server downloads with peer transfers.)
1) Service providers should silently remove caps from any transfers among users of the same ISP. Eventually networks would arise that make use of this bandwidth advantage.
- Lost CPU cycles/Throttling bandwidth leakage
A Kazaa supernode will use a maximum of 10% of total CPU resources, and the client allows an opt-out. All file sharing clients allow the user ultimate control over the amount of bandwidth to be dedicated to file transfer, but they diverge in terms of the consequences for the user's own capacity. Thus Edonkey limits download speed by a ratio related to one's maximum upload. Limewire on the other hand has a default of 50% bandwidth usage, but the user can alter this without any significant effects (so long as the number of transfer slots is modulated accordingly). Gnucleus offers an alternative method in its scheduling option, facilitating connection to the network during defined periods of the day, so that bandwidth is dedicated to file-sharing outside of the hours in which it is required for other tasks.
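Tying download speed to upload speed can be sketched as a cap computed from the user's chosen upload limit. The ratio and the threshold above which no cap applies are hypothetical values chosen for illustration; the notes above say only that Edonkey applies a ratio related to the maximum upload.

```python
from typing import Optional


def download_cap(
    upload_limit: float, ratio: float = 4.0, unlimited_from: float = 10.0
) -> Optional[float]:
    """Download ceiling for a given upload limit, in the same unit.

    Returns None when the upload limit is generous enough that no cap
    applies. Both `ratio` and `unlimited_from` are invented defaults.
    """
    if upload_limit >= unlimited_from:
        return None
    return max(0.0, upload_limit) * ratio


if __name__ == "__main__":
    for limit in (0, 3, 10):
        print(limit, download_cap(limit))
```

A user who throttles uploads to nothing therefore gets nothing back, which is the enforcement effect the ratio is meant to have.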
On some clients the built-in MP3 player can be as cycle-consuming as the application itself, as is the case with Limewire.
Mldonkey has been known to use nearly 20% of the CPU resources available.
- Access to goods
The motivation attracting participation in these networks remains that which inspired Napster's inventor: the opportunity to acquire practically unlimited content. Early in the growth of Napster's popularity users realised that other types of files could be exchanged apart from music, as all that was required was a straightforward alteration of the naming protocol such that the file appeared to be an MP3 (Unwrapper). Later applications were explicitly intended to facilitate the sharing of other media, such that today huge numbers of films, television programs, books, animations, pornography of every description, games and software are available. The promise of such goodies is obviously an adequate incentive for users to search, select and install a client server application and to acquire the knowledge necessary to its operation. Intuitive Graphical User Interfaces enable a fairly rapid learning curve, in addition to which a myriad of users' discussion forums, weblogs and news groups provide all that the curious or perplexed could demand.
- Collective Action Mechanisms
Solutions?
i. In the "old days" of the modem-based bulletin board services (BBS), users were required to upload files to the bulletin board before they were able to download.
ii. FreeNet, for example, forces caching of downloaded files in various hosts. This allows for replication of data in the network forcing those who are on the network to provide shared files.
iii. Another possible solution to this problem is the transformation of what is effectively a public good into a private one. This can be accomplished by setting up a market-based architecture that allows peers to buy and sell computer processing resources, very much in the spirit in which Spawn was created
Strictly Legal Applications of Current File Sharing Applications
Although litigation constantly focuses attention on the alleged copyright infringing uses of these programs, large amounts of material of a public domain or GPL character are also shared. In addition, I believe that we are now witnessing a wider implementation of these networks for the purpose of bypassing the gatekeeping functions of the existing communications industry. One of the most interesting examples in this regard is that provided by Transmission Films, a Canadian company that launched in August 2002 in partnership with Overnet, the most advanced iteration of the network that began with Edonkey. TF offer independent films for viewing either by streaming or download, with options also to purchase the films permanently. Digital Rights Management is used otherwise to limit access to a five day period from user activation. Customers pay a set fee in advance and then spend the monies in their account selecting the options that they prefer.
In a similar vein, Altnet/Brilliant Digital - owners of Kazaa - have announced the integration of a micropayments facility into their client in order to facilitate acquisition of DRM protected material on their network. To this end they have made agreements with several independent music labels.
http://www.slyck.com/newssep2002/091802c.html
Q: How is Overnet different from gnutella?
Gnutella and Overnet are both distributed networks. But Overnet uses what is called a distributed hashtable to organize the data that is searched for. This means that nodes know what other nodes to send a search to. In Gnutella the searches and publishes are sent more or less randomly so the network is far less efficient. It is like the difference between looking something up in a large pile of papers or in a filing cabinet.
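The 'filing cabinet' behaviour of a distributed hashtable can be sketched by hashing both keys and node names into one identifier space and sending each search to the nodes whose identifiers are nearest the key. The XOR distance used below is an assumption borrowed from Kademlia-style designs; the answer above does not say which metric Overnet uses, and the names are hypothetical.

```python
import hashlib
from typing import List


def identifier(name: str) -> int:
    """Map a key or node name into a shared 160-bit identifier space."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


def closest_nodes(key: str, nodes: List[str], count: int = 1) -> List[str]:
    """Nodes responsible for `key`: those nearest to it by XOR distance."""
    target = identifier(key)
    return sorted(nodes, key=lambda node: identifier(node) ^ target)[:count]


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c", "node-d"]
    # Every node computes the same answer, so a search goes straight to
    # the responsible nodes instead of being flooded to all neighbours.
    print(closest_nodes("some file name", nodes, count=2))
```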
Conclusion
(h) Commercial operations at work in the area.
interesting comparison of acquisition times in TCC at p. 28
http://www.badblue.com/w020408.htm
http://www.gnumarkets.com/ commercial implementations
swarmcast, cloudcast, upriser
mojo nation's market in distributed CDN. Mojo Nation has now morphed into MNet, but without the Mojo, which is to say without the idea of how to provide users with a numerical guide as to how to organise scant resources. Without this feature MNet seems little more than a file sharing application without a significant userbase.
II
Content Storage Systems
(a)
Commodity business
Describe current market.
Analogise process.
Assess scale of resources available.
Costs of memory versus cost of bandwidth.
III
Wireless Community Networks
(a)
Basic description
The physical layer
i) 802.11b
ii) Bluetooth
iii) Nokia Mesh Networking
Breaking the 250 feet footstep
DIY
www.consume.net
b)Public provisioning
c) Security Issues
d) Economics
Mapping the Gnutella Network: Properties of Large-Scale Peer-to-Peer Systems and Implications for System (2002),
Matei Ripeanu, Ian Foster, Adriana Iamnitchi,
http://people.cs.uchicago.edu/~matei/PAPERS/ic.pdf
http://www.analysphere.com/09Sep02/contents.htm
http://lists.infoanarchy.org/mailman/listinfo.cgi/p2pj
http://www.law.wayne.edu/litman/classes/cyber
http://www.noosphere.cc/peerToPeer.html
http://citeseer.nj.nec.com/ripeanu02mapping.html
http://www.infoanarchy.org/story/2002/8/10/33623/3436
http://www.kuro5hin.org/story/2002/1/23/211455/047
Neat, I hadn't heard of GNUnet before.
The keys look like they might be SHA1 hashes -- 20 hex bytes.
Are they hashes over the full content, the content's assigned name, or some sort of composited partial hash -- like the progressive hashing (used by Freenet) or tree hashes (used by Bitzi and OnionNetworks/CAW)?
They might want to consider optionally accepting Base32-specified keys as well... that's the default representation being used for human/text-protocol display by Gnutella, OnionNetworks/CAW, and Bitzi.
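The difference between the hexadecimal and Base32 representations of a key can be seen with the standard library. The sketch assumes, as the note above guesses, that the keys are SHA-1 digests; a SHA-1 digest is 20 bytes long, which prints as 40 hexadecimal characters or 32 Base32 characters.

```python
from __future__ import annotations

import base64
import hashlib


def sha1_representations(data: bytes) -> tuple[bytes, str, str]:
    """Return the raw SHA-1 digest with its hex and Base32 text forms."""
    digest = hashlib.sha1(data).digest()
    hex_form = digest.hex()
    base32_form = base64.b32encode(digest).decode("ascii")
    return digest, hex_form, base32_form


if __name__ == "__main__":
    digest, hex_form, base32_form = sha1_representations(b"abc")
    print(len(digest), "bytes")
    print(len(hex_form), "hex characters:", hex_form)
    print(len(base32_form), "Base32 characters:", base32_form)
```

Since 160 bits divide evenly into 5-bit groups, the Base32 form needs no padding characters.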
http://ova.zkm.de/perl/ova-raplayer?id=1004561024&base=ova.zkm.de