
Bit Torrent Protocol

Seminar Report 2011

Introduction

1. Overview

BitTorrent is a peer-to-peer file sharing protocol used to distribute large
amounts of data, and it is one of the most common protocols for transferring
large files. It makes such transfers easier by taking a different approach from
conventional file transfer methods: a user does not just obtain a file but also
shares it with others. A user can also obtain multiple files simultaneously
without any considerable loss of transfer rate. Each user shares the parts of
the file he is obtaining, so that other users trying to obtain the same file
find it easier to get, and they in turn become involved in the sharing process.
Thus, the more users want a file, the more sources there are for it and the
more easily it can be transferred between them.
The BitTorrent protocol is built on a technology that makes it possible to
distribute large amounts of data without the need for a high-capacity server
and expensive bandwidth. This is the most striking feature of the protocol. A
transfer never depends on a single source holding the original copy of the
file; instead, the load is distributed across a number of sources. Not only the
sources but also the clients who want to obtain the file take part in the
transfer. This spreads the load across the users and partially frees the main
source from the process, which reduces the network traffic imposed on it.
Because of this, BitTorrent has become one of the most popular file transfer
mechanisms in today's world. Though the mechanism is not as simple as an
ordinary file transfer protocol, it has gained popularity because of the
sharing policy it imposes on its users. Recent surveys by various organizations
attribute 35% of overall Internet traffic to BitTorrent, which shows that a
very large volume of files is transferred and shared through it.

1.1 History

BitTorrent was created by a programmer named Bram Cohen. After inventing this
new technology he said, "I decided I finally wanted to work on a project that
people would actually use, would actually work and would actually be fun".
Before it was invented, there were other techniques for file sharing, but they
did not utilize bandwidth effectively, and bandwidth became a bottleneck. Other
peer-to-peer file sharing systems such as Napster and Kazaa did involve users
in the sharing process, but they required only a subset of users to share
files, not all of them. Most users could simply download files without needing
to upload. This put a heavy network load on the original sources and on a small
number of users, and left the bandwidth of the remaining users unused. Cohen's
main intention was to make maximum use of the bandwidth of all the users
involved in sharing a file, so that every person who wants to download a file
also has to contribute to uploading it. This concept gave birth to a new
peer-to-peer file sharing protocol called BitTorrent. Cohen invented the
protocol in April 2001.


The first usable version of BitTorrent appeared in October 2002, but the system
needed a lot of fine-tuning. BitTorrent really started to take off in early 2003
when it was used to distribute a new version of Linux and fans of Japanese
anime started relying on it to share cartoons. The most important property of this
protocol is that it makes it possible for people with
limited bandwidth to supply very popular files. This means that if you are a
small software developer you can put up a package, and if it turns out that
millions of people want it, they can get it from each other in an automated way.
Thus the bandwidth which used to be a bottleneck in previous systems no
longer poses a problem.

3
Bit Torrent Protocol
Seminar Report 2011

2. Bit Torrent and Other approaches

2.1 Other P2P Methods

The most common method by which files are transferred on the Internet
is the client-server model. A central server sends the entire file to each client
that requests it; this is how both HTTP and FTP work. The clients only speak to the
server, and never to each other. The main advantages of this method are that it's
simple to set up, and the files are usually available since the servers tend
to be dedicated to the task of serving, and are always on and connected to the
Internet. However, this model has a significant problem with files that are large
or very popular, or both. Namely, it takes a great deal of bandwidth and server
resources to distribute such a file, since the server must transmit the entire file
to each client. Perhaps you may have tried to download a demo of a new game
just released, or CD images of a new Linux distribution, and found that all the
servers report "too many users," or there is a long queue that you have to wait
through. The concept of mirrors partially addresses this shortcoming by
distributing the load across multiple servers. But it requires a lot of
coordination and effort to set up an efficient network of mirrors, and it's usually
only feasible for the busiest of sites.
Another method of transferring files has become popular recently: the
peer-to-peer network, in systems such as Kazaa, eDonkey, Gnutella, Direct
Connect, etc. In most of these networks, ordinary Internet users trade files by
directly connecting one-to-one. The advantage here is that files can be shared
without having access to a proper server, and because of this there is little
accountability for the contents of the files. Hence, these networks tend to be
very popular for illicit files such as music, movies, pirated software, etc.
Typically, a downloader receives a file from a single source; however, the
newest versions of some clients allow downloading a single file from multiple
sources for higher speeds. The problem discussed above of popular downloads
is somewhat mitigated, because there's a greater chance that a popular file will
be offered by a number of peers. The breadth of files available tends to be fairly
good, though download speeds for obscure files tend to be low. Another
common problem sometimes associated with these systems is the significant
protocol overhead for passing search queries amongst the peers, and the
number of peers that one can reach is often limited as a result. Partially
downloaded files are usually not available to other peers, although some newer
clients may offer this functionality. Availability is generally dependent on the
goodwill of the users, to the extent that some of these networks have tried to
enforce rules or restrictions regarding send/receive ratios.
Use of the Usenet binary newsgroups is yet another method of file
distribution, one that is substantially different from the other methods. Files
transferred over Usenet are often subject to minuscule windows of opportunity.
The typical retention time of binary news servers is often as low as 24 hours, and
having a posted file available for a week is considered a long time. However,
the Usenet model is relatively efficient, in that the messages are passed around
a large web of peers from one news server to another, and finally fanned out to
the end user from there. Often the end user connects to a server provided by his
or her ISP, resulting in further bandwidth savings. Usenet is also one of the
more anonymous forms of file sharing, and it too is often used for illicit files of
almost any nature. Due to the nature of NNTP, a file's popularity has little to do
with its availability, and hence downloads from Usenet tend to be quite fast
regardless of content. The downsides are that this method involves a set of
rules and procedures and requires a certain amount of effort and understanding
from the user. Patience is often required to get a complete file, because big
files are split into a huge number of smaller posts. Finally, access to Usenet
often must be purchased due to the extremely high volume of messages in the
binary groups.
BitTorrent is closest to Usenet. It is best suited to newer files in which a
number of people are interested. Obscure or older files tend not to be
available. Perhaps as the software matures a more suitable means of keeping
torrents seeded will emerge, but currently the client is quite resource-intensive,
making it cumbersome to share a number of files. BitTorrent also deals well
with files that are in high demand, especially compared to the other methods.

2.2 A Typical HTTP File Transfer

The most common type of file transfer is through an HTTP server. In this
method, an HTTP server listens for client requests and serves them. The client
depends entirely on the lone server providing the file, so the download is
bounded by the limitations of that server. This kind of transfer also has a
single point of failure: if the server crashes, the whole download ceases. A
single server can handle many clients and serve the requested file to all of
them simultaneously. The file is served as one single piece, which means that
if the download stops abruptly in the middle, the whole file may have to be
downloaded again. The BitTorrent protocol overcomes these shortcomings and is
therefore more robust, which is why many people choose it over this traditional
method of file transfer.


Fig 2.1 : HTTP/FTP File Transfer

2.3 The DAP method

Download Accelerator Plus (DAP) is a widely used download accelerator. DAP's
key features include the ability to accelerate downloading of files over the
FTP and HTTP protocols, to pause and resume downloads, and to recover from
dropped Internet connections.
On the Internet the same file is often hosted on numerous mirror sites,
such as at universities and on ISP servers. DAP immediately senses when a user
begins downloading a file and identifies available mirror sites that host the
requested file. As soon as it is triggered, DAP's client-side optimization
begins to determine, in real time, which mirror sites offer the fastest
response for the specific user's location. The file is downloaded in several
segments simultaneously through multiple connections from the most responsive
server(s) and reassembled at the user's PC. This results in better utilization
of the user's available bandwidth and ensures that each available mirror server
serves the users who benefit most from it. This in turn balances the load among
available servers across the entire World Wide Web, and reduces download times
for users while allowing them to receive maximum benefit from their available
bandwidth. DAP's Resume functionality and the ability to continue downloading
even when one of the participating connections has dropped also provide users
with a more reliable download experience.
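
The segmented download described above can be illustrated with a short sketch.
It is not DAP's actual implementation, which is not documented here; it only
shows one plausible way to cut a file of known size into byte ranges that
could each be requested over a separate connection using the standard HTTP
"Range" header. The function names are hypothetical.

```python
def split_into_ranges(file_size, segments):
    """Return inclusive (start, end) byte ranges that together cover the file."""
    if file_size <= 0 or segments <= 0:
        raise ValueError("file_size and segments must be positive")
    segments = min(segments, file_size)      # never create empty segments
    base, extra = divmod(file_size, segments)
    ranges = []
    start = 0
    for i in range(segments):
        # The first `extra` segments carry one additional byte each.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_header(start, end):
    """Format the value of an HTTP Range request header for one segment."""
    return f"bytes={start}-{end}"
```

Each range would be fetched from whichever mirror responds fastest, and the
segments written back at their offsets to reassemble the file.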

2.4 The Bit Torrent Approach

In BitTorrent, the data to be shared is divided into many equal-sized
portions called pieces. Each piece is further subdivided into equal-sized
sub-pieces called blocks. All clients interested in sharing this data are
grouped into a swarm, and each swarm is managed by a central entity called the
tracker. BitTorrent has revolutionized the way files are shared between people.
It does not require a user to download a file completely from a single server.
Instead, a file can be downloaded from many users who are themselves
downloading the same file. A user who has the complete file, called the seed,
initiates the download by transferring pieces of the file to other users. Once
a user has some of the pieces, he too can start sharing them with other users
who are yet to receive them. This lets a client avoid depending completely on a
server and also reduces the overall load on the server.


Fig 2.2 : BitTorrent File Transfer


Each client independently obtains a file, called a torrent, that contains the
location of the tracker along with a hash of each piece. Clients keep each other
updated on the status of their download. Clients download blocks from other
(randomly chosen) clients who claim they have the corresponding data.
Accordingly, clients also send data that they have previously downloaded to
other clients. Once a client receives all the blocks for a given piece, he can
verify the hash of that piece against the provided hash in the torrent. Thus once
a client has downloaded and verified all pieces, he can be confident that he has
the complete data.
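
The verification step can be sketched as follows. This is a minimal
illustration, assuming that the hashes are SHA-1 digests of 20 bytes each,
concatenated in piece order into a single byte string (called pieces_field
here); the function names are hypothetical and not taken from any particular
client.

```python
import hashlib

HASH_SIZE = 20  # length of one SHA-1 digest in bytes


def expected_hash(pieces_field, index):
    """Slice the digest for piece `index` out of the concatenated hashes."""
    start = index * HASH_SIZE
    digest = pieces_field[start:start + HASH_SIZE]
    if index < 0 or len(digest) != HASH_SIZE:
        raise IndexError("no hash stored for piece %d" % index)
    return digest


def verify_piece(piece_data, pieces_field, index):
    """Return True if the downloaded piece matches the hash in the torrent."""
    return hashlib.sha1(piece_data).digest() == expected_hash(pieces_field, index)
```

A client would discard a piece that fails this check and request its blocks
again, possibly from different peers.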
Both BitTorrent and DAP download files from multiple sources, and both
divide files into pieces. But BitTorrent has many features that DAP lacks,
which has made it the more popular of the two. In BitTorrent the users
participate actively in sharing files along with servers; this is what makes
the protocol unique. It also needs a dedicated server, called the tracker, to
handle the peers connected in the network. File transfer in DAP takes place
through the traditional HTTP or FTP protocol, which means that the transfer
rate is always limited by the servers' bandwidth. If these servers are flooded
with requests, they may break down and the transfer will terminate. This is not
the case in BitTorrent, since the process does not depend on servers alone: the
load is distributed across the network between peers and servers. This makes
BitTorrent far better than competing approaches such as DAP.

3. Working of BitTorrent

As previously explained, BitTorrent's design makes it extremely efficient
in the sharing of large data files among interested peers. Looking under the
hood, BitTorrent is a protocol with some complexity, where modeling is useful
to gain a better understanding of its performance. BitTorrent scales well and
is a superior method for transferring and disseminating files between
interested peers while limiting free riding (peers who download but do not
upload) between those same peers. BitTorrent is based on a "tit for tat"
reciprocity agreement between users that ultimately results in Pareto
efficiency. Pareto efficiency is an economic concept describing an allocation
of resources in which no participant can be made better off without making
another worse off. It is the crown jewel of BitTorrent and the driving force
behind the protocol's popularity and success. Cohen's vision of peers
simultaneously helping each other by uploading and downloading has been
realized by the BitTorrent system.


Fig 3.1 : A Typical BitTorrent System


The protocol shares data through what are known as torrents. For a
torrent to be alive or active it must have several key components to function.
These components include a tracker server, a .torrent file, a web server where
the .torrent file is stored and a complete copy of the file being exchanged. Each
of these components is described in the following paragraphs.
The file being exchanged is the essence of the torrent and a complete
copy is referred to as a seed. A seed is a peer in the BitTorrent network willing
to share a file with other peers in the network. Why seed owners choose to
share their files is debatable, as the BitTorrent protocol does not reward seed
behavior. In fact, some researchers believe the protocol lacks any incentive
mechanism for encouraging seeds to remain in torrents. Some argue that the
lack of incentive in the protocol is a fundamental design flaw that leads to the
punishment of seeds.
Peers lacking the file and seeking it from seeds are called leechers.
While seeds only upload to leechers, leechers may both download from seeds
and upload to other leechers. BitTorrent’s protocol is designed so leeching
peers seek each other out for data transfer in a process known as “optimistic
unchoking”. Together seeds and leechers engaged in file transfer are referred to
as a swarm. A swarm is coordinated by a tracker server serving the particular
torrent, and interested peers find the tracker via metadata known as a .torrent
file. Since BitTorrent has no built in search functionality, .torrent files are
usually located via HTTP through search engines or trackers.
The first step in the BitTorrent exchange occurs when a peer downloads
a .torrent file from a server. The role of .torrent files is to provide the metadata
that allows the protocol to function; .torrent files can be viewed as surrogates
for the files being shared. These .torrent files contain key pieces of data to
function correctly including file length, assigned name, hashing information
about the file and the URL of the tracker coordinating the torrent activity.
Torrent files can be created using a program such as MakeTorrent, another
open source tool available under the free software model.
When a .torrent file is opened by the peer’s client software, the peer then
connects to the tracker server responsible for coordinating activity for that
specific torrent. The tracker and client communicate by a protocol layered on
top of HTTP, and the tracker's key role is to coordinate peers seeking the same
file. As Cohen envisioned, "The tracker's responsibilities are strictly limited
to helping peers find each other". In reality the tracker's role is a bit more
complex, as many trackers collect data about peers engaged in a swarm.
Additionally, some of the newer tracker software being released has integrated
the functions of the tracker and .torrent server.
Leechers and seeds are coordinated by the tracker server and the peers
periodically update the tracker on their status allowing the tracker to have a
global view of the system.
The data monitored by the tracker can include peer IP addresses, amount
of data uploaded/downloaded for specific peers, data transfer rates among
peers, the percentage of the total file downloaded, length of time connected to
the tracker, and the ratio of sharing among peers. Usually a tracker coordinates
multiple torrents and the most popular trackers are busy coordinating thousands
of swarms simultaneously.
It should be noted that .torrent files are not the actual file being shared;
rather, .torrent files are the metadata which allows trackers
and peers to coordinate their activities. As previously mentioned, the complete
file is actually stored on seed peers and not on the tracker server. Since
.torrent files are small and require little space to store, one server can easily
host thousands of .torrent files without prohibitive server or bandwidth
requirements. There is some issue with bandwidth usage to host a tracker,
however, especially if the tracker becomes popular and begins to see heavy
usage. Regardless, the tracker’s bandwidth requirements are much less than
hosting the complete files in a traditional client-server model such as one would
encounter with an FTP site.
While trackers and .torrent files serve as mechanisms to assist the
BitTorrent protocol, the process of actually transferring data is handled by the
peers engaged in the swarm. Cohen’s vision of “tit for tat” is the sole incentive
measure he saw necessary for the protocol’s success. Peers seek tit for tat
behavior from others and discourage free riding by a “choke/unchoke” policy.
This choke policy uses a process known as “optimistic unchoking” to
constantly seek other swarm peers who may have more beneficial connections
to offer an interested peer.
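
The choke/unchoke policy can be sketched in a few lines. This is only an
illustration under stated assumptions: the number of regular upload slots
(three here) and the use of measured download rates as the ranking criterion
are choices made for the example, and the function name is hypothetical. Real
clients also re-run this decision on a timer, which is omitted.

```python
import random


def choose_unchoked(download_rates, regular_slots=3, rng=None):
    """Pick the peers to unchoke for the next round.

    download_rates maps a peer identifier to the rate at which that peer has
    recently been sending data to us. The fastest peers are rewarded
    (tit for tat), and one additional peer is chosen at random from the rest
    (the optimistic unchoke) to probe for better connections.
    """
    rng = rng or random.Random()
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = ranked[:regular_slots]
    remaining = ranked[regular_slots:]
    if remaining:
        unchoked.append(rng.choice(remaining))  # optimistic unchoke
    return unchoked
```

The random slot is what gives a newly joined peer, which has nothing to offer
yet, a chance to receive its first pieces.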
There has been some research of the tit for tat algorithm by modeling
rational users whose behavior is then studied. This work defined rational users
as those peer nodes manipulating their client software beyond default settings.
The fact that many newer BitTorrent clients allow for custom tweaking of
specific upload or download speed indicates that perhaps the original tit for tat
coding was too good, and thus detrimental to other peer node functions such as
normal HTTP traffic. Some BitTorrent FAQs recommend limiting uploads to
approximately 80% of known capacity and personal tests indicate this strategy
does benefit download speeds.
The final important aspect of the BitTorrent protocol's architecture is its
use of a "rarest piece first" algorithm when a peer selects pieces to download.
The goal of the rarest first algorithm is the uniform distribution of data
across peers. (It should not be confused with "endgame mode", a separate
strategy that applies only to the last few blocks of a download.) Under a
rarest first policy, a peer requests first the pieces held by the fewest peers
in the swarm, so file chunks that a seed has not yet uploaded to the swarm are
the first to be fetched from it. This policy encourages distribution of the
file further across peer nodes. The rarest first algorithm is an interesting
aspect of BitTorrent that, when combined with optimistic unchoking, may explain
why the protocol has achieved such success.
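
A rarest first choice can be sketched as follows. The data structures are
assumptions made for the example: each peer's holdings are represented as a
set of piece indices, and ties are broken by the lowest index so that the
result is deterministic (real clients usually break ties at random). The
function name is hypothetical.

```python
from collections import Counter


def rarest_first(needed, peer_pieces):
    """Return the index of the needed piece held by the fewest peers.

    needed      -- set of piece indices this client still lacks
    peer_pieces -- dict mapping a peer identifier to the set of piece
                   indices that peer claims to have
    Returns None when no connected peer has any needed piece.
    """
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces & needed)   # count only pieces we still need
    available = [piece for piece in needed if counts[piece] > 0]
    if not available:
        return None
    return min(available, key=lambda piece: (counts[piece], piece))
```

For instance, if piece 2 is held by one peer while pieces 0 and 1 are held by
several, piece 2 is requested first, which raises its number of copies in the
swarm.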

4. Terminology

These are the common terms that one would come across while making
a typical BitTorrent file transfer.

• Torrent: The small metadata file you receive from the web server (the one
that ends in .torrent). Metadata here means that the file contains information
about the data you want to download, not the data itself.

• Peer: Another computer on the Internet that you connect to and transfer data
with. Generally a peer does not have the complete file.
• Leech: Similar to a peer in that it does not have the complete file. The
main difference between the two is that a leech will not upload once the file
is downloaded.
• Seed: A computer that has a complete copy of a certain torrent. Once a
client downloads a file completely, he can continue to upload the file, which
is called seeding. This is a good practice in the BitTorrent world since it
allows other users to obtain the file easily.
• Reseed: When there are zero seeds for a given torrent, the peers may
eventually all get stuck with an incomplete file, if no one in the swarm has
the missing pieces. When this happens, a seed must connect to the swarm so that
those missing pieces can be transferred. This is called reseeding.
• Swarm: The group of machines that are collectively connected for a
particular file.
• Tracker: A server on the Internet that coordinates the actions of BitTorrent
clients. The clients are in regular contact with this server to learn about the
peers in the swarm.
• Share ratio: The ratio of the amount of data uploaded to the amount
downloaded. A ratio of 1 means that one has uploaded the same amount of a file
as has been downloaded.
• Distributed copies: Sometimes the peers in a swarm will collectively have a
complete file even though no single peer does. Such copies are called
distributed copies.

• Choked: The state of an uploader that does not want to send anything on a
link. In such cases, the connection is said to be choked.
• Interested: The state of a downloader indicating that the other end has some
pieces that the downloader wants. The downloader is then said to be interested
in the other end.
• Snubbed: If the client has not received anything after a certain period, it
marks the connection as snubbed, meaning that the peer on the other end has
chosen not to send for a while.
• Optimistic unchoking: Periodically, the client shakes up the list of
uploaders and tries sending on different connections that were previously
choked, choking the connections it was just using. This is called optimistic
unchoking.

5. Architecture of BitTorrent

The BitTorrent protocol can be split into the following five main
components:
• Metainfo File - A file which contains all details necessary for the
protocol to operate.
• Tracker - A server which helps manage the BitTorrent protocol.
• Peers - Users exchanging data via the BitTorrent protocol.
• Data - The files being transferred across the protocol.
• Client - The program which sits on a peer's computer and implements the
protocol.
Peers use TCP (Transmission Control Protocol) to communicate and send data.
This protocol is preferable over other protocols such as UDP (User Datagram
Protocol) because TCP guarantees reliable and in-order delivery of data from
sender to receiver. UDP cannot give such guarantees, and data can arrive out of
order or be lost altogether. The tracker allows peers to find out which other
peers are in the swarm, so that they can begin communication. Peers communicate
with the tracker in plain text via HTTP (Hypertext Transfer Protocol). The
following diagram illustrates how peers interact with each other, and also
communicate with a central tracker.

Fig 5.1 : Architecture of a BitTorrent System

5.1 Metainfo File


When someone wants to publish data using the BitTorrent protocol, they
must create a metainfo file. This file is specific to the data they are
publishing, and contains all the information about a torrent, such as the data
to be included and the URL of the tracker to connect to. A tracker is a server
which 'manages' a torrent, and is discussed in the next section. The file is
given a '.torrent' extension, and the data is extracted from the file by a
BitTorrent client. This is a program which runs on the user's computer and
implements the BitTorrent protocol. Every metainfo file must contain the
following information (or 'keys'):
• info: A dictionary which describes the file(s) of the torrent: either a
single file, or the directory structure for multiple files. The SHA-1 hashes of
every data piece are stored here.
• announce: The announce URL of the tracker, as a string.
The following are optional keys which can also be used:
• announce-list: Used to list backup trackers
• creation date: The creation time of the torrent by way of UNIX time
stamp (integer seconds since 1-Jan-1970 00:00:00 UTC)
• comment: Any comments by the author
• created by: Name and Version of programme used to create the
metainfo file
These keys are structured in the metainfo file as follows:

{'info': {'piece length': 131072,
          'length': 38190848,
          'name': 'Cory_Doctorow_Microsoft_Research_DRM_talk.mp3',
          'pieces': '\xcb\xfaz\r\x9b\xe1\x9a\xe1\x83\x91~\xed@\.....'},
 'announce': 'http://tracker.var.cc:6969/announce',
 'creation date': 1089749086}
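
The figures in this example can be checked with a little arithmetic. The
sketch below derives the number of pieces, the size of the final (shorter)
piece and the expected length of the 'pieces' string from 'length' and
'piece length'. The block size of 16384 bytes is an assumption made for the
example and is not stated in the metainfo file; the function name is
hypothetical.

```python
def piece_layout(total_length, piece_length, block_length=16384):
    """Derive piece and block counts from the metainfo 'length' fields."""
    if total_length <= 0 or piece_length <= 0 or block_length <= 0:
        raise ValueError("all lengths must be positive")
    # Ceiling division: the last piece may be shorter than the others.
    num_pieces = (total_length + piece_length - 1) // piece_length
    last_piece_length = total_length - (num_pieces - 1) * piece_length
    blocks_per_full_piece = (piece_length + block_length - 1) // block_length
    return {
        "num_pieces": num_pieces,
        "last_piece_length": last_piece_length,
        "blocks_per_full_piece": blocks_per_full_piece,
        # One 20-byte SHA-1 digest is stored per piece.
        "pieces_field_bytes": num_pieces * 20,
    }
```

For the file above (38190848 bytes in pieces of 131072 bytes) this gives 292
pieces, the last of which is 48896 bytes long, and a 'pieces' string of 5840
bytes.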


Instead of transmitting the keys in plain text format, the keys contained in
the metainfo file are encoded before they are sent. Encoding is done using a
BitTorrent-specific method known as 'bencoding'.

5.1.1 Bencoding :
Bencoding is used by BitTorrent to send loosely structured data between
the BitTorrent client and a tracker. Bencoding supports byte strings, integers,
lists and dictionaries. Bencoding uses the beginning delimiters 'i' / 'l' / 'd'
for integers, lists and dictionaries respectively. Ending delimiters are always
'e'. Delimiters are not used for byte strings; their length prefix is
sufficient.
Bencoding Structure:
• Byte strings: <string length in base ten ASCII>:<string data>
• Integers: i<integer in base ten ASCII>e
• Lists: l<bencoded values>e
• Dictionaries: d<bencoded string><bencoded element>e, with the key/value
pair repeated for each entry
Negative integers are allowed, but prefixing the number with a zero is not
permitted. However, '0' itself is allowed.
Examples of bencoding:

4:spam             // represents the string "spam"
i3e                // represents the integer 3
l4:spam4:eggse     // represents the list of two strings: ["spam","eggs"]
d4:spaml1:a1:bee   // represents the dictionary {"spam" => ["a" , "b"] }
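
The rules above are simple enough to implement in a few dozen lines. The
following is a minimal sketch rather than a reference implementation: the
function names are hypothetical, text strings are assumed to be encoded as
UTF-8, dictionary keys are written in sorted order, and the decoder does not
reject malformed input such as integers with leading zeros.

```python
def bencode(value):
    """Encode ints, strings, bytes, lists and dicts into bencoded bytes."""
    if isinstance(value, bool):
        raise TypeError("booleans have no bencoded form")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v)
                 for k, v in value.items()]
        items.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def bdecode(data):
    """Decode one complete bencoded value; strings come back as bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data, i):
    lead = data[i:i + 1]
    if lead == b"i":
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if lead == b"l":
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if lead == b"d":
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i + 1
    if lead.isdigit():
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError("invalid bencoded data at offset %d" % i)
```

Encoding the four examples above with bencode reproduces the strings shown,
and bdecode turns them back into Python values.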

5.1.2 Metainfo File Distribution :


Because all information which is needed for the torrent is included in a
single file, this file can easily be distributed via other protocols, and as the file
is replicated, the number of peers can increase very quickly. The most popular
method of distribution is using a public indexing site which hosts the metainfo
files. A seed will upload the file, and then others can download a copy of the
file over the HTTP protocol and participate in the torrent.

5.2 Tracker
A tracker is used to manage users participating in a torrent (known as
peers). It stores statistics about the torrent, but its main role is to allow
peers to 'find each other' and start communication, i.e. to find peers with the
data they require. Peers know nothing of each other until a response is
received from the tracker. Whenever a peer contacts the tracker, it reports how
much it has uploaded and downloaded and how much it still has left to download.
When another peer queries the tracker, the tracker can provide a random list of
peers who are participating in the torrent.

Fig 5.2 : Tracker


A tracker is an HTTP/HTTPS service and typically works on port 6969. The
address of the tracker managing a torrent is specified in the metainfo file; a
single tracker can manage multiple torrents. Multiple trackers can also be
specified as backups, which are handled by the BitTorrent client running on
the user's computer. BitTorrent clients communicate with the tracker using
HTTP GET requests, the standard CGI method. This consists of appending a "?"
to the URL, followed by the parameters separated by "&".

The parameters accepted by the tracker are:
• info_hash: 20-byte SHA-1 hash of the value of the info key from the
metainfo file.
• peer_id: 20-byte string used as a unique ID for the client.
• port: The port number the client is listening on.
• uploaded: The total amount uploaded since the client sent the 'started'
event to the tracker, in base ten ASCII.
• downloaded: The total amount downloaded since the client sent the
'started' event to the tracker, in base ten ASCII.
• left: The number of bytes the client still has to download, in base ten
ASCII.
• compact: Indicates that the client accepts compact responses. The peer
list can then be replaced by a string with 6 bytes per peer: the first 4 bytes
are the host, and the last 2 bytes are the port.
• event: If specified, must be one of the following: started, stopped,
completed.
• ip: (optional) The IP address of the client machine, in dotted format.
• numwant: (optional) The number of peers the client wishes to receive
from the tracker.
• key: (optional) Allows a client to identify itself if its IP address
changes.
• trackerid: (optional) If a previous announce contained a tracker id, it
should be set here.
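
An announce request built from these parameters can be sketched with
Python's standard urllib. The function name, the example peer ID and the
choice of which optional parameters to send are assumptions made for the
example; only the parameter names come from the list above.

```python
from urllib.parse import urlencode

VALID_EVENTS = ("started", "stopped", "completed")


def build_announce_url(announce, info_hash, peer_id, port,
                       uploaded, downloaded, left, event=None, compact=True):
    """Return the full HTTP GET URL for one announce request."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be exactly 20 bytes")
    params = {
        "info_hash": info_hash,   # raw bytes; urlencode percent-escapes them
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
        "compact": 1 if compact else 0,
    }
    if event is not None:
        if event not in VALID_EVENTS:
            raise ValueError("event must be one of %s" % (VALID_EVENTS,))
        params["event"] = event
    # Append with "?" unless the announce URL already carries a query string.
    separator = "&" if "?" in announce else "?"
    return announce + separator + urlencode(params)
```

A first announce for a download that has not started yet would pass
uploaded=0, downloaded=0, left equal to the total length, and
event="started".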
The tracker then responds with a "text/plain" document containing a bencoded
dictionary with the following keys:
• failure message: If present, then no other keys are included. The value
is a human readable error message as to why the request failed.
• warning message: Similar to failure message, but response still gets
processed.
• interval: The number of seconds a client should wait between sending
regular requests to the tracker.
• min interval: Minimum announce interval.
• tracker id: A string that the client should send back with its next
announce.
• complete: Number of peers with the complete file.
• incomplete: number of non-seeding peers (leechers)
• peers: A list of dictionaries including: peer id, IP and ports of all the
peers.
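
When the client has sent the compact parameter described above, the peers value
arrives as a packed byte string instead of a list of dictionaries. The
following is a minimal Python sketch of how such a string could be unpacked.
The function name decode_compact_peers is chosen here for illustration only.

```python
import socket
import struct
from typing import List, Tuple

def decode_compact_peers(blob: bytes) -> List[Tuple[str, int]]:
    """Split a compact peer string into (ip, port) pairs, 6 bytes per peer."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        # First 4 bytes: IPv4 host address.
        ip = socket.inet_ntoa(blob[offset:offset + 4])
        # Last 2 bytes: port as an unsigned 16-bit big-endian integer.
        (port,) = struct.unpack("!H", blob[offset + 4:offset + 6])
        peers.append((ip, port))
    return peers
```

Each 6-byte group is independent, so a malformed length is the only error the
sketch checks for.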

5.2.1 Scraping
Scraping is the process of querying the state of a given torrent (or all
torrents) that the tracker is managing. The result is known as a "scrape page".
The scrape URL is derived from the announce URL: find the last '/', and if
the text immediately following it begins with 'announce', substitute 'scrape'
for 'announce' to find the scrape page.
Examples:

Announce URL                        Scrape URL

http://example.com/announce         http://example.com/scrape
http://example.com/a/announce       http://example.com/a/scrape
http://example.com/announce.php     http://example.com/scrape.php
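
The substitution rule can be sketched in Python as follows. The function name
scrape_url is hypothetical, and returning None when the rule does not apply is
an assumption made for this sketch rather than something the protocol defines.

```python
from typing import Optional

def scrape_url(announce_url: str) -> Optional[str]:
    """Derive the scrape URL from an announce URL, or None if not derivable."""
    slash = announce_url.rfind("/")          # position of the last '/'
    tail = announce_url[slash + 1:]          # text immediately following it
    if not tail.startswith("announce"):
        return None                          # assumption: rule does not apply
    # Replace only the leading 'announce', keeping any suffix such as '.php'.
    return announce_url[:slash + 1] + "scrape" + tail[len("announce"):]
```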

The tracker then responds with a "text/plain" document containing a bencoded
dictionary with the following key:
• files: A dictionary containing one key/value pair for each torrent. Each
key is a 20-byte binary hash value. The value of that key is then a
nested dictionary with the following keys:
    • complete: The number of peers with the entire file (seeds).
    • downloaded: The total number of times the entire file has been downloaded.
    • incomplete: The number of active downloaders (leechers).
    • name: (optional) The torrent name.
5.3 Peers
Peers are other users participating in a torrent who have either part of the
file or the complete file (a peer with the complete file is known as a seed).
Pieces are requested from peers, but are not guaranteed to be sent, depending
on the status of the peer. BitTorrent uses TCP (Transmission Control Protocol)
ports 6881-6889 to send messages and data between peers and, unlike some
other protocols, the protocol described here does not use UDP (User Datagram
Protocol).

5.3.1 Piece Selection


Peers continuously queue up the pieces which they still require for download.
The tracker only supplies a list of peers in the swarm; which piece is
requested next is decided by the BitTorrent client. There are three stages of
piece selection, and the one in use depends on which stage of completion a
peer is at.
5.3.2 Random First Piece
When downloading first begins, as the peer has nothing to upload, a
piece is selected at random to get the download started. Random pieces are then
chosen until the first piece is completed and checked. Once this happens, the
'rarest first' strategy begins.
5.3.3 Rarest First
When a peer selects which piece to download next, the rarest piece will
be chosen from the current swarm, i.e. the piece held by the lowest number of
peers. This means that the most common pieces are left until later, and focus
goes to replication of rarer pieces.
At the beginning of a torrent, there will be only one seed with the
complete file. There would be a possible bottleneck if multiple downloaders
were trying to access the same piece. Rarest first avoids this because different
peers have different pieces. As more peers connect, rarest first will take some
load off the seed, as peers begin to download from one another.
Eventually the original seed will disappear from a torrent. This could be
because of cost reasons, or most commonly because of bandwidth issues.
Losing a seed runs the risk of pieces being lost if no current downloaders have
them. Rarest first works to prevent the loss of pieces by replicating the pieces
most at risk as quickly as possible. If the original seed leaves before at least
one other peer has the complete file, then pieces may be lost and no one will
reach completion unless a seed re-connects.
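
The rarest first rule can be illustrated with a short Python sketch. It assumes
the client already knows which pieces each connected peer holds; the function
name pick_rarest and the random tie-break are illustrative choices, not part
of the protocol.

```python
import random
from collections import Counter
from typing import Dict, Optional, Set

def pick_rarest(needed: Set[int], peer_pieces: Dict[str, Set[int]]) -> Optional[int]:
    """Return the needed piece held by the fewest peers, or None if unavailable."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces & needed)       # count only pieces we still need
    if not counts:
        return None                          # no connected peer has a needed piece
    fewest = min(counts.values())
    rarest = [piece for piece, n in counts.items() if n == fewest]
    return random.choice(rarest)             # assumption: break ties at random
```

Pieces that no connected peer holds are never returned, since they cannot be
requested from anyone.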

5.3.4 Endgame Mode


When a download nears completion, it may be delayed by waiting for a piece
from a peer with a slow transfer rate. To prevent this, the remaining
sub-pieces are requested from all peers in the current swarm.
5.3.5 Peer Distribution
The role of the tracker ends once peers have 'found each other'. From
then on, communication is done directly between peers, and the tracker is not
involved. The set of peers a BitTorrent client is in communication with is
known as a swarm.
To maintain the integrity of the data which has been downloaded, a peer
does not report that it has a piece until it has checked the hash of that piece
against the one contained in the metainfo file.
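
A minimal Python sketch of this check is given below. It assumes that the
metainfo file stores the piece hashes as one byte string of concatenated
20-byte SHA1 digests; the function and parameter names are illustrative.

```python
import hashlib

def piece_is_valid(piece_data: bytes, piece_index: int, piece_hashes: bytes) -> bool:
    """Compare a piece's SHA1 digest with its 20-byte entry from the metainfo."""
    # Assumption: piece_hashes holds one 20-byte SHA1 digest per piece, in order.
    expected = piece_hashes[piece_index * 20:(piece_index + 1) * 20]
    return hashlib.sha1(piece_data).digest() == expected
```

Only when this returns True would the client announce the piece to other peers.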
Peers will continue to download data from all available peers that they
can, i.e. peers that possess the required pieces. Peers can block others from
downloading data if necessary. This is known as choking.
5.3.6 Choking
When a peer receives a request for a piece from another peer, it can opt
to refuse to transmit that piece. If this happens, the requesting peer is said
to be choked. This can be done for different reasons, but the most common is
that a client will only maintain a limited number of simultaneous uploads
(max_uploads). All further requests to the client will be marked as choked.
Usually the default for max_uploads is 4.

Fig 5.3 : Choking by a peer

The peer will then remain choked until an unchoke message is sent.
Another example of when a peer is choked would be when downloading from a
seed, as the seed requires no pieces. To ensure fairness between peers, there is
a system in place which rotates which peers are downloading. This is known as
optimistic unchoking.
5.3.7 Optimistic Unchoking
To ensure that connections with the best data transfer rates are not the
only ones favoured, each peer has a reserved 'optimistic unchoke' which is left
unchoked regardless of the current transfer rate. The peer which is assigned to
this is rotated every 30 seconds. This is enough time for the upload / download
rates to reach maximum capacity.
The peers then cooperate using the tit for tat strategy, where the
downloader responds in one period with the same action the uploader used in
the last period.
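
One way to picture a single choking round is the Python sketch below. It
assumes the client tracks a download rate per peer and has already picked the
current optimistic peer (rotating it every 30 seconds is left to the caller).
The function name choose_unchoked is hypothetical, and real clients differ in
the details.

```python
from typing import Dict, List, Optional

def choose_unchoked(download_rates: Dict[str, float],
                    optimistic: Optional[str],
                    max_uploads: int = 4) -> List[str]:
    """Pick the peers to unchoke: one optimistic slot plus the fastest peers."""
    unchoked: List[str] = []
    if max_uploads > 0 and optimistic in download_rates:
        unchoked.append(optimistic)          # kept regardless of its transfer rate
    # Tit for tat: favour the peers that have been sending us data fastest.
    by_rate = sorted(download_rates, key=download_rates.get, reverse=True)
    for peer in by_rate:
        if len(unchoked) >= max_uploads:
            break
        if peer not in unchoked:
            unchoked.append(peer)
    return unchoked
```

With the default of four uploads, this leaves three slots for the fastest peers
and one for the optimistic unchoke.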
5.3.8 Communication Between Peers
Peers which are exchanging data are in constant communication.
Connections are symmetrical, and therefore messages can be exchanged in both
directions. These messages are made up of a handshake, followed by a
never-ending stream of length-prefixed messages.

5.3.9 Handshaking
Handshaking is performed as follows:
1. The handshake starts with character 19 (base 10) followed by the string
'BitTorrent protocol'.
2. Eight reserved bytes follow, which are all zero in current implementations.
3. A 20 byte SHA1 hash of the bencoded info value from the metainfo is
then sent. If this does not match between peers the connection is closed.
4. A 20 byte peer id is sent, which is the same id used in tracker requests.
If the peer id does not match the one expected, the connection is closed.

5.3.10 Message Stream


This constant stream of messages allows all peers in the swarm to send
data, and control interactions with other peers.

Prefix  Message         Structure                                  Additional Information

0       choke           <len=0001><id=0>                           Fixed length, no payload. Enables a
                                                                   peer to block another peer's
                                                                   requests for data.

1       unchoke         <len=0001><id=1>                           Fixed length, no payload. Unblocks
                                                                   the peer; if it is still interested
                                                                   in the data, upload will begin.

2       interested      <len=0001><id=2>                           Fixed length, no payload. A peer is
                                                                   interested if another peer has the
                                                                   data it requires.

3       not interested  <len=0001><id=3>                           Fixed length, no payload. The other
                                                                   peer does not have any data
                                                                   required.

4       have            <len=0005><id=4><piece index>              Fixed length. Payload is the
                                                                   zero-based index of a piece that
                                                                   the peer currently has.

5       bitfield        <len=0001+X><id=5><bitfield>               Variable length, X is the length of
                                                                   the bitfield. Sent immediately
                                                                   after handshaking. Optional, and
                                                                   only sent if the client has pieces.
                                                                   Payload represents the pieces that
                                                                   have been successfully downloaded.

6       request         <len=0013><id=6><index><begin><length>     Fixed length, used to request a
                                                                   block of a piece. The payload
                                                                   contains integer values specifying
                                                                   the index, begin location and
                                                                   length.

7       piece           <len=0009+X><id=7><index><begin><block>    Variable length, X is the length of
                                                                   the block. Sent in response to
                                                                   request messages. The payload
                                                                   contains integer values specifying
                                                                   the index and begin location,
                                                                   followed by the block of data.

8       cancel          <len=0013><id=8><index><begin><length>     Fixed length, used to cancel block
                                                                   requests. Payload is the same as
                                                                   'request'. Typically used during
                                                                   'end game' mode.

A peer will be 'interested' in another peer if that peer has pieces it
requires. If the peer which has this data is not choking the interested peer,
then data will be transferred. After handshaking, by default, connections start
out as choked and not interested.
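
The message structures listed above could be produced with a sketch like the
following in Python. It assumes the length prefix is a four byte big-endian
integer that counts the id byte plus the payload; the function names are
illustrative.

```python
import struct

def encode_message(msg_id: int, payload: bytes = b"") -> bytes:
    """Frame one message as <4-byte big-endian length><1-byte id><payload>."""
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload

def encode_request(index: int, begin: int, length: int) -> bytes:
    """Build a 'request' message (id 6) carrying three 4-byte integers."""
    return encode_message(6, struct.pack(">III", index, begin, length))
```

For example, encode_message(1) yields the five bytes of an unchoke message,
and encode_request produces a message whose length field is 13.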
5.4 Data

BitTorrent is very versatile, and can be used to transfer a single file, or
multiple files of any type, contained within any number of directories. File
sizes can vary hugely, from kilobytes to hundreds of gigabytes.
5.4.1 Piece Size
Data is split into smaller pieces which are sent between peers using the
BitTorrent protocol. These pieces are of a fixed size, which enables peers to
keep track of who has which pieces of data. This also breaks the file into
verifiable pieces: each piece is assigned a hash code, which can be
checked by the downloader for data integrity. These hashes are stored as part of
the 'metainfo file'.
The size of the pieces remains constant throughout all files in the torrent
except for the final piece, which is irregular. The piece size a torrent is
allocated depends on the amount of data. Piece sizes which are too large will
cause inefficiency when downloading (larger risk of data corruption in larger
pieces due to fewer integrity checks), whereas if the piece sizes are too small,
more hash checks will need to be run.
As the number of pieces increases, more hash codes need to be stored in
the metainfo file. Therefore, as a rule of thumb, the piece size should be
selected so that the metainfo file is no larger than 50-75 KB. The main reason
for this is to limit the amount of hosting storage and bandwidth needed by
indexing servers. The most common piece sizes are 256 KB, 512 KB and 1 MB. The
number of pieces is therefore total length / piece size, rounded up. Pieces may
overlap file boundaries.
For example, a 1.4 MB file could be split into the following pieces:
5 * 256 KB pieces, and a final piece of 120 KB.

Fig 5.4 : Pieces of a file
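
The arithmetic of this example can be checked with a small Python sketch. It
assumes 1 KB = 1024 bytes and treats the 1.4 MB file as 1400 KB, which is what
the piece sizes in the example add up to; the function name piece_layout is
illustrative.

```python
from typing import Tuple

KB = 1024  # assumption: 1 KB = 1024 bytes

def piece_layout(total_length: int, piece_size: int) -> Tuple[int, int]:
    """Return (number of pieces, size of the final piece), both in bytes."""
    if total_length <= 0 or piece_size <= 0:
        raise ValueError("lengths must be positive")
    count = (total_length + piece_size - 1) // piece_size   # round up
    last = total_length - (count - 1) * piece_size          # irregular final piece
    return count, last
```

Calling piece_layout(1400 * KB, 256 * KB) gives six pieces, the last of which
is 120 KB long.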

5.5 BitTorrent Clients


A BitTorrent client is an executable program which implements the
BitTorrent protocol. It runs on top of the operating system on a user's
machine, and handles interactions with the tracker and peers. The client is
responsible for controlling the reading / writing of files, opening sockets
etc.
A metainfo file must be opened by the client to start partaking in a
torrent. Once the file is read, the necessary data is extracted, and a socket must
be opened to contact the tracker. BitTorrent clients use TCP ports 6881-6999.
To find an available port, the client will start at the lowest port, and work
upwards until it finds one it can use. This means the client will only use one
port, and opening another BitTorrent client will use another port. A client can
handle multiple torrents running concurrently.
Clients come in many flavours, and can range from basic applications
with few features to very advanced, customisable ones. For example, some
advanced features are metainfo file wizards and inbuilt trackers. These
additional features mean that different clients behave very differently, and may
use multiple ports, depending on the number of processes they are running. As
all applications implement the same protocol, there are no incompatibility
issues; however, because of various tweaks and improvements between clients, a
peer may experience better performance from peers running the same client.

5.6 Sub Protocols :

BitTorrent can be described in terms of two sub-protocols: one which
describes interactions between the tracker and all clients, and one which
describes all client-to-client interactions.

5.6.1 THP: Tracker HTTP Protocol


The tracker protocol is implemented on top of HTTP/HTTPS. This
means that the machine running the tracker runs an HTTP or HTTPS server, and
has the behaviour described below:

1. The client sends a GET request to the tracker URL, with certain CGI
variables and values added to the URL. This is done in the standard way, i.e.,
if the base URL is “http://some.url.com/announce”, the full URL would be of
this form:
“http://some.url.com/announce?var1=value1&var2=value2&var3=value3”.
2. The tracker responds with a “text/plain” document, containing a bencoded
dictionary. This dictionary has all the information required for the client.
3. The client then sends re-requests, either at regular intervals or when an
event occurs, and the tracker responds.

The CGI variables and values added to the base URL by the client
sending a GET request are:

• info_hash: The 20 byte SHA1 hash calculated from whatever value the
info key maps to in the metainfo file.
• peer_id: A 20 character long id of the downloading client, randomly
generated at the start of every download. There is no formal definition of
how to generate this id, but some client applications have adopted
semiformal standards for generating it.
• ip: This is an optional variable, giving the IP address of the client. This
can usually be extracted from the TCP connection, but this field is useful if
the client and tracker are on the same machine, or behind the same NAT
gateway. In both cases, the tracker might otherwise publish an unroutable IP
address to other clients.
• port: The port number that the client is listening on. This is usually in
the range 6881-6889.
• uploaded: The amount of data uploaded so far by the client. There is no
official definition of the unit, but generally bytes are used.
• downloaded: The amount of data downloaded so far by the client, in the
same unit as uploaded.
• left: How much the user has left for the download to be complete, in
bytes.
• event: An optional variable, corresponding to one of four possibilities:
    • started: Sent when the client starts the download.
    • stopped: Sent when the client stops downloading.
    • completed: Sent when the download is complete. If the download
    is complete at start up, this message should not be sent.
    • empty: Has the same effect as if the event key is nonexistent. In
    either case, the message in question is one of the messages sent
    at regular intervals.

There are some optional variables that can be sent along with the GET
request that are not specified in the official description of the protocol, but are
implemented by some tracker servers:
• numwant: The number of peers the client wants in the response.
• key: An identification key that is not published to other peers. peer_id is
public, and is thus useless as authorization. key is used if the peer changes
IP address, to prove its identity to the tracker.
• trackerid: If a tracker previously gave its trackerid, this should be given
here.
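
Putting the variables together, an announce URL could be assembled as in the
Python sketch below. The function name build_announce_url is hypothetical,
only the commonly sent variables are included, and the base URL is assumed
not to contain a query string already.

```python
from urllib.parse import urlencode

def build_announce_url(base_url: str, info_hash: bytes, peer_id: bytes,
                       port: int, uploaded: int, downloaded: int, left: int,
                       event: str = "") -> str:
    """Append the tracker's CGI variables to the announce URL."""
    params = {
        "info_hash": info_hash,   # raw 20 bytes; urlencode percent-escapes them
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,
        "downloaded": downloaded,
        "left": left,
    }
    if event:                     # left out for the regular interval requests
        params["event"] = event
    return base_url + "?" + urlencode(params)
```

The binary info_hash must be escaped byte by byte, which is why it is passed
as bytes rather than as a hexadecimal string.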

As mentioned earlier, the response is a “text/plain” response with a
bencoded dictionary. This dictionary contains the following keys:
• failure reason: If this key is present, no other keys are included. The
value mapped to this key is a human readable string with the reason why the
request failed.
• interval: The number of seconds that the client should wait between
regular rerequests.
• peers: Maps to a list of dictionaries that each represent a peer, where
each dictionary has the keys:
    • peer_id: The id of the peer in question. The tracker obtained this
    from the peer_id variable in the GET request sent to the tracker.
    • ip: The address of the peer, either the IP address or the DNS
    domain name.
    • port: The port number that the peer listens on.
These are the keys required by the official protocol specification, but
here as well there are optional extensions:
• min interval: If present, the client must not send rerequests more often
than this.
• warning message: Has the same information as failure reason, but the
other keys in the dictionary are present.
• tracker id: A string identifying the tracker. A client should resend it in
the trackerid variable to the tracker.
• complete: This is the number of peers that have the complete file
available for upload.
• incomplete: The number of peers that do not have the complete file yet.
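
To read such a response, a client needs a bencode decoder. The Python sketch
below is a minimal one, written under the usual bencoding rules (integers as
i...e, strings as length:bytes, lists as l...e and dictionaries as d...e). The
function name bdecode is illustrative, and error handling is kept to a minimum.

```python
def bdecode(data: bytes, pos: int = 0):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":                          # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":                          # list: l<items>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = bdecode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":                          # dictionary: d<key><value>...e
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = bdecode(data, pos)
            value, pos = bdecode(data, pos)
            result[key] = value
        return result, pos + 1
    if lead.isdigit():                        # string: <length>:<bytes>
        colon = data.index(b":", pos)
        start = colon + 1
        length = int(data[pos:colon])
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencoded data at offset {pos}")
```

Keys and strings come back as bytes, so the interval of a decoded response
would be read as response[b"interval"].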

5.6.2 PWP: Peer Wire Protocol


The peer wire (peer to peer) protocol runs over TCP. Message passing is
symmetric, i.e. the same messages are sent in both directions. When a client
wants to initiate a connection, it sets up the TCP connection and sends a
handshake message to the other peer. If the message is acceptable, the receiving
side sends a handshake message back. If the initiator accepts this handshake,
message passing can begin, and continues indefinitely. All integers are
encoded as four byte big-endian, except the first length prefix in the handshake.

Handshake message
The handshake message consists of five parts:
• A single byte, containing the decimal value 19. This is the length of the
character string following this byte.
• A character string “BitTorrent protocol”, which describes the protocol.
Newer protocols should follow this convention to facilitate easy
identification of protocols.
• Eight reserved bytes for further extension of the protocol. All bytes are
zero in current implementations.
• A 20 byte SHA1 hash of the value mapping to the info key in the torrent
file. This is the same hash sent to the tracker in the info_hash variable.
• The 20 byte character string representing the peer id. This is the same
value sent to the tracker.
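
The five parts add up to 68 bytes (1 + 19 + 8 + 20 + 20). A minimal Python
sketch of building and checking such a message follows; the function names
are illustrative, and the reserved bytes are simply ignored when parsing.

```python
from typing import Tuple

PSTR = b"BitTorrent protocol"   # the 19-character protocol string

def build_handshake(info_hash: bytes, peer_id: bytes) -> bytes:
    """Assemble the 68-byte handshake from its five parts."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes")
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id

def parse_handshake(data: bytes) -> Tuple[bytes, bytes]:
    """Return (info_hash, peer_id), rejecting any other protocol string."""
    if len(data) != 68 or data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return data[28:48], data[48:68]
```

The receiver would then compare the returned info_hash against the torrents
it is serving before replying.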


If a peer receives a handshake and the info_hash doesn’t match any
torrent it is serving, it should break the connection. If the initiator of
the connection receives a handshake where the peer id doesn’t match the
id received from the tracker, the connection should be dropped. Each peer
needs to keep the state of each connection. The state consists of two values,
interested and choking. A peer can be either interested or not in another peer,
and either choke or not choke the other peer. Choking means that no requests
will be answered, and interested means that the peer is interested in
downloading pieces of the file from the other peer.
This means that each peer needs four Boolean values for each
connection to keep track of the state.
• am_interested
• am_choking
• peer_interested
• peer_choking
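
These four values map naturally onto a small record per connection. The Python
sketch below is one possible representation; the class name and the two helper
methods are illustrative, and the defaults assume that a new connection starts
out as choking and not interested on both sides.

```python
from dataclasses import dataclass

@dataclass
class ConnectionState:
    """The four flags kept per connection; defaults are the initial state."""
    am_interested: bool = False
    am_choking: bool = True
    peer_interested: bool = False
    peer_choking: bool = True

    def can_download(self) -> bool:
        # We may request blocks only if we want them and the peer allows it.
        return self.am_interested and not self.peer_choking

    def can_upload(self) -> bool:
        # We answer requests only from an interested peer we are not choking.
        return self.peer_interested and not self.am_choking
```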

All connections start out as not interested and choking for both peers.
Clients should keep the am_interested value updated continuously, and report
changes to the other peer. The messages sent after the handshaking are
structured as:
[message length as an integer] [single byte describing message type] [payload]
Keep alive messages are sent at regular intervals, and they are simply a
message with length 0, and no type or payload.
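
Because TCP delivers a byte stream, the receiver has to cut that stream back
into messages using the length prefix. The Python sketch below shows one way
to do this. The function name split_messages is hypothetical, and keep alive
messages are reported with a type of None as a convention of this sketch only.

```python
import struct
from typing import List, Optional, Tuple

Message = Tuple[Optional[int], bytes]   # (type, payload); type None = keep alive

def split_messages(buffer: bytes) -> Tuple[List[Message], bytes]:
    """Extract every complete message from buffer; return them plus leftovers."""
    messages: List[Message] = []
    pos = 0
    while len(buffer) - pos >= 4:
        (length,) = struct.unpack_from(">I", buffer, pos)   # 4-byte big-endian
        if len(buffer) - pos - 4 < length:
            break                                # message not fully received yet
        body = buffer[pos + 4:pos + 4 + length]
        if length == 0:
            messages.append((None, b""))         # keep alive: no type, no payload
        else:
            messages.append((body[0], body[1:]))
        pos += 4 + length
    return messages, buffer[pos:]
```

The leftover bytes are kept and prepended to the next chunk read from the
socket.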
Type 0, 1, 2, 3 are choke, unchoke, interested and not interested
respectively. All of them have length 1 and no payload. These messages simply
describe changes in state.
Type 4 is a have. This message has length = 5, and a payload that is a
single integer, giving the integer index of which piece of the file the peer has
successfully downloaded and verified.
Type 5 is bitfield. This message is only sent directly after handshake. It
contains a bitfield representation of which pieces the peer has. The payload is
of variable length, and consists of a bitmap, where byte 0 corresponds to piece
0-7, byte 1 to piece 8-15 etc. A bit set to 1 represents having the piece. Peers
that have no pieces can neglect to send this message.
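
A bitfield payload can be turned into a set of piece indices as in the Python
sketch below. It assumes that within each byte the highest bit stands for the
lowest piece index; the function name pieces_from_bitfield is illustrative.

```python
from typing import Set

def pieces_from_bitfield(bitfield: bytes, num_pieces: int) -> Set[int]:
    """Return the indices of the pieces marked as present in a bitfield payload."""
    if len(bitfield) * 8 < num_pieces:
        raise ValueError("bitfield is too short for the number of pieces")
    have = set()
    for index in range(num_pieces):
        byte = bitfield[index // 8]      # byte 0 covers pieces 0-7, and so on
        mask = 0x80 >> (index % 8)       # assumption: piece 0 is the high bit
        if byte & mask:
            have.add(index)
    return have
```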

Type 6 is a request. The payload consists of three integers: piece index,
begin and length. The piece index identifies the piece from which the client
wants to download, begin gives the byte offset within the piece, and length
gives the number of bytes the client wants to download. Length is usually a
power of two.
Type 7 is a piece, i.e. a block of data. This message follows a request. The
payload contains the piece index, the begin offset and the data itself that was
requested.
Type 8 is cancel. This message has the same payload as request messages, and
it is used to cancel requests made.
Peers should continuously update their interested status to
neighbours, so that clients know which peers will begin downloading when
unchoked.

6. Vulnerabilities of BitTorrent

6.1 Attacks on BitTorrent

As we have seen so far, BitTorrent is one of the most favoured file transfer
protocols in today’s world. But it has been exposed to various attacks in the
recent past, as attackers exploit its vulnerabilities. Here are some of the
attacks that are commonly seen.

6.1.1 Pollution attack


1. The peers receive the peer list from the tracker.
2. One peer contacts the attacker for a chunk of the file.
3. The attacker sends back a false chunk.
4. This false chunk will fail its hash check and will be discarded.
5. The attacker requests all chunks from the swarm and wastes their upload
bandwidth.

Pollution attacks have become increasingly popular and have been used by
anti-piracy groups. In 2005 HBO used pollution attacks to prevent people from
downloading their show Rome.

6.1.2 DDOS attack


DDOS stands for distributed denial of service. This attack is possible
because the BitTorrent tracker has no mechanism for validating peers. This
means there is no way to trace the culprit in this kind of attack. Attacks
of this kind are also possible because of the modifications that can be made to
the client software.
1. The attacker downloads a large number of torrent files from a
web server.
2. The attacker parses the torrent files with a modified BitTorrent
client and spoofs his IP address and port number with the victim's
as he announces he is joining the swarm.
3. As the tracker receives requests for a list of participating peers
from other clients, it sends the victim's IP address and port number.
4. The peers then attempt to connect to the victim to try and
download a chunk of the file.

6.1.3 Bandwidth Shaping

Many ISPs discourage their users from using BitTorrent. This is because
BitTorrent is usually used to transfer large files, which increases the
traffic over the ISP's network considerably. To avoid such traffic, many ISPs
have started to restrict the traffic caused by BitTorrent. This can be done by
sniffing the packets that pass through and detecting whether they follow the
BitTorrent protocol. ISPs make use of filters to find such packets and block
them from passing through their servers. This has resulted in many file
transfer breakdowns across the world.

6.2 Solutions
Many of the attacks that BitTorrent suffers have been dealt with and some
measures have been taken to avoid such attacks. Here are a few solutions to the
attacks that were discussed above.

6.2.1 Pollution attack


The peers which perform such attacks are identified by tracing their IP
addresses. These addresses are then blacklisted, and connections between them
and other peers are denied. This is done by using software like PeerGuardian or
moBlock, which download the list of blacklisted IP addresses from the Internet.

6.2.2 DDOS attack


The main solution to this kind of attack is to have clients parse the
response from the tracker. In the case where a host (tracker) does not respond to
a peer’s request with a valid BitTorrent protocol message it should be inferred
that this host is not running BitTorrent. The peer should then exclude that
address from its tracker list, or set a high retry interval for that specific tracker.
Another fix would be for web sites hosting torrents to check and report whether
all trackers are active, or even remove the non-responding trackers from the
tracker list in the torrent. Another measure could be to restrict the size of the
tracker list to reduce the effectiveness of such an attack.

6.2.3 Bandwidth Shaping


There are broadly two approaches followed to counter bandwidth shaping.
The first method is to encrypt the packets sent by means of the BitTorrent
protocol. By doing this, the filters that sniff packets will not be able to
detect that such packets belong to the BitTorrent protocol, and so the packets
can pass through the filters. Another approach is to make use of tunnels.
Tunnels are dedicated paths where the filters are avoided by using VPN software
which connects to unfiltered networks. This bypasses the filters, so the
packets can be transmitted across networks.

7. Conclusion

BitTorrent pioneered mesh-based file distribution that effectively utilizes
all the uplinks of participating nodes. Most follow-on research used similar
distributed and randomized algorithms for peer and piece selection, but with
different emphasis or twists. This work takes a different approach to the mesh-
based file distribution problem by considering it as a scheduling problem, and
strives to derive an optimal schedule that could minimize the total elapsed time.
By comparing the total elapsed time of BitTorrent and CSFD in a wide variety
of scenarios, we are able to determine how close BitTorrent is to the theoretical
optimum. In addition, the study of the applicability of BitTorrent to real-time
media streaming applications shows that, with minor modifications, BitTorrent
can serve as an effective media streaming tool as well.

BitTorrent’s application in this information sharing age is almost
priceless. However, it is still not perfected, as it is still prone to malicious
attacks and acts of misuse. Moreover, the lifespan of each torrent is still not
satisfactory, which means that a file can only remain available for distribution
for a limited period of time. Thus, further analysis and a more thorough study
of the protocol will enable one to discover more ways to improve it.
