
File Sharing Made Easy & Fast – II

Common Sense defies Economics!


(Part 2 – Protocol and Algorithms)

Published: Jan 2006


Publication: Information Technology
In the first part of this article, I introduced readers to Bit Torrent, a shrewd, extremely scalable
file distribution protocol that offloads the content distribution costs onto the consumers themselves.
In this concluding part, I shall explore the technicalities of the protocol in slightly greater detail. I will
concentrate on the qualitative aspects of the protocol and involved algorithms. Exact quantitative details
can be found in the actual protocol specification. It is suggested that the uninitiated reader go through the
first part of the article in order to fully comprehend the discussions here.
The explanations in this article are based on the official Bit Torrent protocol specification (BTP/1.0).
I have also referred to the paper titled “Incentives build robustness in Bit Torrent” (May 2003, Author:
Bram Cohen).

Creating and Publishing Torrents


Pieces and Blocks
A torrent (the term used for the resource to be shared or downloaded) may consist of one or more files. Since
the essence of BT lies in downloading (and uploading) different parts of the torrent from (to) different
peers simultaneously, the publisher must divide the torrent into a chosen number of parts. In BT
terminology these parts are called pieces. Each piece is verifiable by a SHA-1 hash. For the
purpose of division into pieces, the torrent is logically viewed as a continuous byte stream. A multi-file
torrent is viewed as a concatenation of all the component files. The order of this logical concatenation is
recorded in the corresponding .torrent Meta information file. When all the pieces have been downloaded,
they are assembled into the actual file(s). In fact, a client may combine the pieces into files on the fly, as
they are downloaded, to enable previewing of certain file types.
At the time of sharing the torrent, a peer further divides each piece into a number of blocks (generally
16 KB in size). Note that while the number and size of the pieces in a torrent are static, decided
by the publisher of the torrent, the block size is specific to each peer implementation. Thus, while a
particular torrent always has the same number and size of pieces, different peers might divide the
same piece of a torrent into a different number of blocks.
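The division into pieces and blocks can be illustrated with a minimal sketch. The function names, the 256 KB piece length and the sample byte stream are assumptions made for illustration only; they are not taken from the protocol specification.

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split a byte stream into fixed-size pieces and return each piece's SHA-1 digest."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]

def blocks_in_piece(piece_size: int, block_size: int = 16 * 1024) -> int:
    """Number of blocks one peer would use for a piece (the last block may be short)."""
    return -(-piece_size // block_size)  # ceiling division

# A multi-file torrent is treated as one concatenated byte stream.
# These two "files" are hypothetical sample data.
stream = b"a" * 300_000 + b"b" * 100_000
hashes = piece_hashes(stream, piece_length=256 * 1024)
```

The piece length is fixed by the publisher, whereas `block_size` is only a default here: another peer could pass a different value and still exchange the same pieces.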

Bencoding and the .torrent Meta Information File


In order to share a torrent, its publisher must create and make available a static Meta information file with
the extension ‘.torrent’, containing encoded information about the actual file(s). The usual approach
followed is to serve the .torrent files via a web server.
The MIME type “application/x-bittorrent” is associated with these files, which enables automatic launching
of the default Bit Torrent application from the web browser when the user requests a resource with this
MIME type.
The encoding scheme used for the .torrent file as well as for the responses from the tracker is called
‘Bencoding’. It is a platform-independent format that defines encodings for Integers and
Strings along with two compound data types: Lists and Dictionaries. A Dictionary consists of a number
of key/value pairs where each “key” is a bencoded string and the corresponding “value” may be any
bencoded element. A List is a collection of any number of bencoded elements (i.e. strings, integers,
dictionaries or even other lists).
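As a rough illustration of the four bencoded types, here is a small encoder sketch. It assumes the usual wire forms (integers as i...e, strings as length:bytes, lists as l...e, dictionaries as d...e with keys in sorted order); the function name is my own, and the protocol specification remains the reference for exact details.

```python
def bencode(value) -> bytes:
    """Encode an int, str/bytes, list or dict into bencoded bytes (illustrative sketch)."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoded form")
    if isinstance(value, int):
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        # Strings are prefixed with their length in bytes.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are strings; emit them in sorted (raw byte) order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")
```

For example, `bencode(["spam", 42])` yields `b"l4:spami42ee"`, a list holding one string and one integer.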
The .torrent Meta information file is a bencoded dictionary. Its key/value pairs provide information
such as the URL of the tracker and details about the torrent (the number and size of pieces, the files
that will result from the torrent, etc.). Importantly, the SHA-1 hash of each piece of
the torrent is included so that the BT client can verify the pieces as and when they are downloaded.
The exact structure of the Meta information file also depends on whether the torrent consists of a single
file or a number of files. In case of multi file torrents, information about hierarchical organization of files is
also included.
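The difference between the two layouts can be sketched as plain Python dictionaries, before bencoding. The key names follow the commonly used layout ('announce', 'info', 'piece length', 'pieces', 'length', 'files', 'path'); the tracker URL, file names and sizes are hypothetical.

```python
import hashlib

piece_length = 256 * 1024
content = b"x" * 600_000  # hypothetical payload

# 'pieces' is the concatenation of the 20-byte SHA-1 digests of all pieces.
pieces = b"".join(
    hashlib.sha1(content[i:i + piece_length]).digest()
    for i in range(0, len(content), piece_length)
)

single_file = {
    "announce": "http://tracker.example/announce",  # hypothetical tracker URL
    "info": {
        "name": "example.bin",
        "length": len(content),
        "piece length": piece_length,
        "pieces": pieces,
    },
}

multi_file = {
    "announce": "http://tracker.example/announce",
    "info": {
        "name": "example-dir",  # top-level directory name
        "files": [              # listed in concatenation order
            {"length": 400_000, "path": ["docs", "a.txt"]},
            {"length": 200_000, "path": ["b.bin"]},
        ],
        "piece length": piece_length,
        "pieces": pieces,
    },
}
```

Note that piece boundaries ignore file boundaries: in the multi-file case a single piece may span the end of one file and the start of the next.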
Figure 1 summarizes the common steps for creating and publishing a torrent.
(Figure 1)

Sharing and Caring with Bit Torrent! (Downloading and Uploading)


After retrieving the Meta information file, sharing the actual file(s) via Bit Torrent essentially involves two
logically distinct phases: first, locating a swarm with the help of a tracker and setting up connections
with peers; second, sharing the actual file with the peers. Consequently, two different protocols are
involved.
The Tracker HTTP Protocol (THP) defines the procedure for contacting a tracker for locating a swarm of
peers hooked to the required torrent and periodically reporting progress and status to the tracker.
The Peer Wire Protocol (PWP) formulates the procedure for communication between the peers in order
to share the file.
Of course, the entire process of contacting the tracker and then connecting to the peers is transparent to
the user, unless there is some problem while contacting the tracker or the peers.

Know Your Neighbors – The Tracker HTTP / HTTPS Protocol (THP)


THP is a simple protocol layered on top of HTTP or HTTPS.
A tracker forms the sole centralized component in any Bit Torrent swarm. Though it has no role in sharing
of the actual torrent, it facilitates and controls entry into a swarm and maintains certain statistics.
The tracker is basically an HTTP daemon. The .torrent file lists the URL of the tracker to be contacted in
its ‘announce’ key. In order to contact the tracker, a peer sends a standard parameterized HTTP GET
request to the specified URL. The parameters provide information about the requested torrent (its unique
info-hash) and the peer (e.g. a 20 byte Peer-ID, the IP address of the peer, the port number on which the peer is
listening for incoming connections from other peers, etc.).
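A sketch of how such a request URL might be assembled is shown below. The parameter names (info_hash, peer_id, port, uploaded, downloaded, left) are the commonly used ones; the helper name, the tracker URL, the sample Peer-ID and the sample info dictionary are hypothetical. No network request is made.

```python
import hashlib
from urllib.parse import urlencode

def announce_url(announce: str, bencoded_info: bytes, peer_id: bytes,
                 port: int, left: int) -> str:
    """Build the GET URL for a first announce to the tracker (illustrative sketch)."""
    if len(peer_id) != 20:
        raise ValueError("Peer-ID must be exactly 20 bytes")
    params = {
        # The info-hash is the SHA-1 of the bencoded 'info' dictionary.
        "info_hash": hashlib.sha1(bencoded_info).digest(),
        "peer_id": peer_id,
        "port": port,        # where this peer accepts incoming connections
        "uploaded": 0,
        "downloaded": 0,
        "left": left,        # bytes still to be downloaded
    }
    # urlencode percent-escapes the raw bytes of the hash and the Peer-ID.
    return announce + "?" + urlencode(params)

url = announce_url(
    "http://tracker.example/announce",   # hypothetical tracker
    b"d4:name3:fooe",                    # hypothetical bencoded info dictionary
    b"-XX0001-000000000000",             # hypothetical 20 byte Peer-ID
    port=6881,
    left=1000,
)
```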
Even after the peer has joined the swarm, it must continue sending requests to the tracker periodically in
order to obtain the updated list of peers and to report its status, lest the tracker assume that the peer is
dead.
In response to the GET request from a peer, the tracker serves a plain text document (MIME type text/plain)
which is actually a bencoded dictionary. It mainly contains either the reason for failure
or a list of peers (specifying the Peer-ID, IP address / DNS name and port number of each peer)
downloading the same torrent. Figure 2 is a sequence diagram showing the interactions between a local
peer and a tracker. Standard UML conventions have been used.

(Figure 2)

Give and Take – The Peer Wire Protocol (PWP)


Peer Wire Protocol (PWP) defines the rules for contacting the peers and communicating with the
neighboring peers for trading the torrent pieces once connections have been established with them. PWP
is layered on top of TCP and utilizes asynchronous messages for all communication.
Though the choice of algorithms to be used at various stages during the sharing of torrent has been left
to the implementors, the protocol specification does lay down a number of guidelines or design
requirements to be followed. For example, though the protocol suggests a tit-for-tat approach in general,
it mandates special treatment for new entrants into the swarm who might not have any pieces to offer to
the other peers. Further, the algorithms must make efficient use of both the upload and download
bandwidths. Also, each implementation should be compatible with other implementations that possibly
use a different algorithm.

Handshaking
Each peer listens on a TCP port for incoming requests from other peers and reports this port number to
the tracker, as already mentioned. After obtaining the list of peers and their port numbers from the
tracker, the local peer may connect to the remote peers on specified ports. Similarly the remote peers
may open a TCP connection with the local peer at the port reported by it to the tracker. A pair of peers
with an active TCP link between them are known as neighbors. After the connections have been
established, the PWP requires that a two-way “handshaking” operation be performed over each
connection before it is used to exchange any data. A handshake message (a 68 byte string) is sent and
validated at each end of the connection. If the validation fails at either end, the connection is
dropped. Thus, handshaking serves as a basic security measure. As per the current specification, the
handshake message includes the 20 byte Peer ID (which uniquely identifies a peer in
a swarm) and the info-hash (the SHA-1 hash of the info key in the meta information file, which uniquely identifies
the torrent).
Each peer can verify the Peer ID of its neighbor by cross-checking against the list obtained from the
tracker, and the info-hash against its own copy of the meta information file.
There is provision for incorporating stronger security checks in the handshake message in future.
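The 68 byte handshake can be sketched as follows. The layout assumed here is the commonly documented one: a length byte (19), the string "BitTorrent protocol", 8 reserved bytes, the 20 byte info-hash and the 20 byte Peer ID. The function names are my own and the sample values are hypothetical.

```python
import struct

PSTR = b"BitTorrent protocol"
LAYOUT = ">B19s8s20s20s"   # 1 + 19 + 8 + 20 + 20 = 68 bytes

def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Pack a handshake message for the given torrent and local peer."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info-hash and Peer ID must each be 20 bytes")
    return struct.pack(LAYOUT, len(PSTR), PSTR, bytes(8), info_hash, peer_id)

def parse_handshake(message: bytes, expected_info_hash: bytes) -> bytes:
    """Validate a received handshake and return the sender's Peer ID."""
    if len(message) != struct.calcsize(LAYOUT):
        raise ValueError("handshake must be 68 bytes")
    pstrlen, pstr, _reserved, info_hash, peer_id = struct.unpack(LAYOUT, message)
    if pstrlen != len(PSTR) or pstr != PSTR:
        raise ValueError("unknown protocol string")
    if info_hash != expected_info_hash:
        # The neighbor is talking about a different torrent: drop the connection.
        raise ValueError("info-hash mismatch")
    return peer_id

handshake = build_handshake(b"\x11" * 20, b"-XX0001-000000000000")
```

A caller would additionally compare the returned Peer ID with the list obtained from the tracker, as described above.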

PWP Messages and Peer States


Following a successful PWP handshake, either end of a connection may send PWP Messages to the
other end asynchronously. Immediately after the handshake, both ends generally send a “Bitfield”
message, which is a representation of the torrent with each piece denoted by a single bit position. A
1 in any position indicates that the sender has the corresponding piece.
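The payload of such a message can be sketched as below. The sketch assumes that piece 0 maps to the most significant bit of the first byte and that unused trailing bits stay 0; the function names are my own.

```python
def make_bitfield(have: set[int], num_pieces: int) -> bytes:
    """Build a bitfield with one bit per piece; a 1 bit means 'I have this piece'."""
    field = bytearray((num_pieces + 7) // 8)   # round up to whole bytes
    for index in have:
        field[index // 8] |= 0x80 >> (index % 8)
    return bytes(field)

def has_piece(bitfield: bytes, index: int) -> bool:
    """Check whether the sender of this bitfield has the given piece."""
    return bool(bitfield[index // 8] & (0x80 >> (index % 8)))

# A peer holding pieces 0 and 9 of a hypothetical 10-piece torrent.
field = make_bitfield({0, 9}, num_pieces=10)
```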
The PWP messages may be either state-oriented or data-oriented.
Data-oriented messages handle the exchange of torrent pieces between the peers (request piece, send
piece, cancel request, etc.).
State-oriented messages inform the receiver about a change in the sender’s state.
PWP identifies the following peer states:
• Choked / Unchoked
Choking and unchoking form the basis of the tit-for-tat strategy wired into BT. A peer may choose to
“choke” one or more of its neighbors for some reason, for instance because the neighbor is
behaving selfishly, that is, uploads from the neighbor are unsatisfactory.
A choked peer must not send any data-oriented messages (generally, requests for data) to the peer
that choked it. Protocol implementations are free to devise their own choking and unchoking criteria
and algorithms. This allows much freedom in using various heuristics for optimum peer selection.
• Interested / Uninterested
If a peer assumes the “interested” state with respect to a neighbor, the neighbor must expect to receive
data-oriented messages from the peer as soon as it unchokes the peer.
It should be obvious that the peer states are connection-specific. The same peer may assume different
states for different neighbors simultaneously.

Strategies and Algorithms


The protocol offers much freedom of choice with regard to the algorithms and strategies to be used for
various purposes. However, some approaches have emerged clear winners and most implementations
use these or their close variants. I will briefly outline the most popular ones here.

Piece Selection
‘Rarest Piece First’ dictates that a peer first download the pieces that the fewest of its neighbors
have. It proves to be a good approach in most cases, not only for the local peer but for the swarm as a
whole. It helps to increase the availability of rare pieces and ensures that new pieces are quickly
requested in case there is a lone seeder in the swarm (e.g. the publisher). It thus helps in quickly
distributing a concentrated torrent across the swarm. However, at the start of a download, a peer may opt
for a ‘Random Piece First’ strategy, since its first priority at that point is to obtain a complete piece so that it can start uploading to others. During
the final stages of its download, a peer may send requests to all its neighbors for all the remaining
blocks. It then sends out ‘cancels’ for the blocks that arrive. This helps it to quickly complete its
download.
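The rarest-first rule can be sketched in a few lines. This is only one possible reading of the strategy: the function name, the data structures and the random tie-breaking are my own assumptions, and real clients differ in the details.

```python
import random
from collections import Counter

def rarest_first(missing: set[int], neighbor_pieces: dict[str, set[int]],
                 rng=random):
    """Pick a missing piece held by the fewest neighbors, or None if none is offered."""
    counts = Counter()
    for pieces in neighbor_pieces.values():
        # Count only pieces that we still need.
        counts.update(pieces & missing)
    if not counts:
        return None
    fewest = min(counts.values())
    candidates = sorted(i for i, n in counts.items() if n == fewest)
    return rng.choice(candidates)   # break ties randomly

# Hypothetical neighbors and the pieces advertised in their bitfields.
neighbors = {"A": {0, 1}, "B": {0, 2}, "C": {0, 2}}
choice = rarest_first({0, 1, 2}, neighbors)
```

Here piece 1 is held by only one neighbor, so it is requested before pieces 0 and 2.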

Choking Algorithms – Tit for Tat


Each BT peer ‘unchokes’ (i.e. uploads to) a fixed number (default 4) of neighbors at any moment.
Deciding whom to unchoke when is the task of the choking algorithm. This is a dynamic decision which
invariably depends upon the current download rate. Basically, the approach adopted by the peers is to
reciprocate positively to the neighbors which upload to them and negatively to others – Tit for Tat!
However, in order to ensure that potentially better avenues are not overlooked, the peers ‘optimistically
unchoke’ a randomly selected neighbor periodically.
An effective choking algorithm is critical for optimum performance.
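One round of such a decision might look like the sketch below. It assumes that one of the four slots is reserved for the optimistic unchoke and that neighbors are ranked purely by their current upload rate to us; both choices, and the function name, are illustrative assumptions rather than requirements of the protocol.

```python
import random

def choose_unchoked(download_rates: dict[str, float], slots: int = 4,
                    rng=random) -> set[str]:
    """Return the neighbors to unchoke in this round (illustrative sketch).

    download_rates maps each neighbor to the rate at which it currently
    uploads to us.
    """
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    regular = ranked[:slots - 1]     # tit for tat: reward the best uploaders
    others = ranked[slots - 1:]
    # Optimistic unchoke: give one randomly chosen other neighbor a chance.
    optimistic = [rng.choice(others)] if others else []
    return set(regular + optimistic)

# Hypothetical rates in KB/s.
rates = {"A": 50.0, "B": 40.0, "C": 30.0, "D": 5.0, "E": 0.0}
unchoked = choose_unchoked(rates)
```

A peer would rerun this periodically, so a neighbor that starts uploading faster eventually displaces a slower one.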

A Buzzing Swarm …
A swarm of BT peers, hooked onto a torrent bustles with activity.
There are peers joining and leaving the swarm, friendships (read TCP connections) being forged and
broken, neighbors choking each other – it doesn’t quite look like a nice place to be in! And yet, in the end
all are happy!
Let’s look at the swarm from a peer’s point of view. After registering itself with the tracker and obtaining information
about the swarm from it, the peer, say ‘P’, starts sending out connection requests to the peers
in the swarm that it finds attractive. (This depends on the particular peer selection strategy that P
implements.) P successfully establishes connections with several peers and starts downloading different
pieces of the torrent from them. As it finishes downloading some of the pieces, P starts receiving
requests from other peers: it’s now P’s turn to share! If P now tries to act smart by not uploading what it
has, it may soon be penalised by the other peers in the swarm, who might choke it. Similarly, P can also
choke some of its neighbors, depending on its choking algorithm. There might also be mutual
expressions of ‘interest’ between P and its peers.
And while all this is going on, P must not forget to keep forging new alliances (sending requests to other
peers). It must also keep itself abreast of the current affairs of the swarm (by periodically obtaining
statistical information from the tracker).
Finally, after P has completed its download, social etiquette demands that it continue to offer
some community service by becoming a seeder for some time.

Trackerless Bit Torrent


Bit Torrent is a relatively new protocol and is under continuous development. The beta of a “trackerless”
version of Bit Torrent has been released. The technique entails the use of distributed hash tables (DHT)
for efficiently storing and retrieving contact information for peers in a torrent. In effect, every peer
becomes a lightweight tracker. The decision about which version to use is left to the publisher of the
torrent. Though there may be genuine reliability and control issues, the idea is to enable anyone with an
internet connection and a website to host BT downloads.

A Parting Note
Sometimes sheer common sense and imagination can produce magnificent masterpieces!
Bit Torrent is a clear case in point. Kudos to Bram Cohen for his wonderful gift to the internet community!
