Transport Layer Identification of P2P Traffic

T. Karagiannis, A. Broido, M. Faloutsos, K. Claffy

Outline
Introduction
Related work
Payload analysis & Limitations
Non-payload identification
Experiments & Evaluation
P2P traffic trends
Conclusions

Characteristics of P2P Traffic

Traffic volume grows rapidly
Frequent upgrades & emergence of new protocols
Traffic is disguised to circumvent firewalls & legal issues
Non-standard, proprietary protocols (poorly documented)
Operate on arbitrary port numbers
Support payload encryption

Identification Methodology
Examining packet payload
Signature-based methodology
Limitations

Identifying at transport layer
Based on flow patterns & P2P behaviors
Advantages

Contributions
Develop a methodology for P2P traffic profiling by identifying flow patterns and behavior characteristics
Evaluate its effectiveness by comparing with payload analysis
Show that P2P traffic is growing by analyzing backbone traces

Previous Work
Detailed characterization of a small subset of P2P protocols & networks
Properties of topology, bandwidth, caching & availability, etc.
Signature-based traffic identification
Traffic estimation of P2P applications with fixed ports

Payload Analysis
M1: Flag a flow as P2P if its src or dst port number matches one of the well-known P2P port numbers.
M2: Flag a flow as P2P if the 16-byte payload of any packet matches one of the signatures, else flag it as non-P2P.
A loose lower bound on P2P volume

M3: Hash the {src, dst} IP pair of a flow flagged as P2P into a table. Flag the flows containing an IP address in the table as possible P2P even if no payload matches.
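
The three steps M1 to M3 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the port set, the signature strings, the `Flow` record, and the two-pass handling of M3 are all assumptions made for the example, since the slides do not reproduce the actual port and signature tables.

```python
from dataclasses import dataclass, field

# Illustrative placeholders only; the slides do not list the real
# port table or the real payload signatures.
P2P_PORTS = {6346, 6881}
SIGNATURES = (b"GNUTELLA", b"\x13BitTorrent")


@dataclass
class Flow:  # hypothetical flow record
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payloads: list = field(default_factory=list)  # captured bytes per packet


def classify_by_payload(flows):
    """Return one label per flow: 'p2p', 'possible-p2p' or 'non-p2p'."""
    labels = []
    p2p_ips = set()
    for f in flows:
        # M1: well-known port match.
        port_hit = f.src_port in P2P_PORTS or f.dst_port in P2P_PORTS
        # M2: signature match within the first 16 payload bytes of any packet.
        sig_hit = any(p[:16].startswith(s) for p in f.payloads for s in SIGNATURES)
        if port_hit or sig_hit:
            labels.append("p2p")
            p2p_ips.update((f.src_ip, f.dst_ip))  # M3: remember both endpoints
        else:
            labels.append("non-p2p")
    # M3: second pass (an assumption), so flows seen before the first
    # match can still be flagged as possible P2P.
    for i, f in enumerate(flows):
        if labels[i] == "non-p2p" and (f.src_ip in p2p_ips or f.dst_ip in p2p_ips):
            labels[i] = "possible-p2p"
    return labels
```

Keeping "possible-p2p" as a separate label preserves the distinction between flows confirmed by port or payload and flows flagged only through an IP address already in the table.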

Limitations
Captured payload size
Only first 16 bytes of payload
Only 4 bytes in older traces

HTTP requests
Encryption
Other P2P protocols
Unidirectional traces

Non-payload Identification
Two main heuristics:
{src, dst} IP pairs that use both TCP and UDP to transfer data
The behavior of peers by studying connection characteristics of {IP, port} pairs

High-level description
Data processing
Build the flow table
Collect information on various characteristics

Identification of potential P2P pairs
Based on the two P2P heuristics

Eliminate false positives
By other heuristics of non-P2P traffic

TCP/UDP Heuristic
Concurrent usage of both TCP & UDP is typical for many P2P protocols
Look for {src, dst} IP pairs that use both TCP & UDP protocols to identify P2P hosts
Other protocols also use TCP & UDP concurrently (DNS, NETBIOS, IRC, gaming, streaming), but on fixed well-known ports

TCP/UDP Heuristic

If a {src, dst} IP pair concurrently uses both TCP and UDP, we consider flows between this pair P2P so long as the src or dst ports are not in the set in Table 3
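
A minimal sketch of this rule, under stated assumptions: flows are plain tuples, an IP pair is treated as unordered, "concurrently" is simplified to "within the same trace", and `EXCLUDED_PORTS` is only a stand-in for Table 3, which is not reproduced in these slides.

```python
from collections import defaultdict

# Stand-in for the excluded ports of Table 3; illustrative values only.
EXCLUDED_PORTS = {53, 137, 6667}


def tcp_udp_p2p_flows(flows):
    """flows: list of (src_ip, dst_ip, proto, src_port, dst_port) tuples.

    Returns the flows flagged as P2P by the TCP/UDP heuristic.
    """
    # Record which transport protocols each unordered IP pair has used.
    protos = defaultdict(set)
    for src, dst, proto, _, _ in flows:
        protos[frozenset((src, dst))].add(proto)

    flagged = []
    for flow in flows:
        src, dst, _, sport, dport = flow
        uses_both = {"tcp", "udp"} <= protos[frozenset((src, dst))]
        if uses_both and sport not in EXCLUDED_PORTS and dport not in EXCLUDED_PORTS:
            flagged.append(flow)
    return flagged
```

A real implementation would likely also bound how far apart in time the TCP and UDP flows may be; the slides do not state such a window, so none is modeled here.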

Connection Pattern Heuristic


P2P: for an {IP, port} pair, N(distinct connected ports) = N(distinct connected IPs)

Web: for a {w, 80} pair, N(distinct connected ports) > N(distinct connected IPs), because a host initiates more than one concurrent connection for parallel downloading
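
The comparison of distinct IPs and distinct ports can be sketched as below. The tuple layout and the `min_peers` cutoff are assumptions for illustration; the slides only state the equality test itself.

```python
from collections import defaultdict


def connection_pattern(flows, min_peers=3):
    """flows: iterable of (src_ip, src_port, dst_ip, dst_port) tuples.

    Returns {(ip, port): bool}, True when the number of distinct peer
    IPs equals the number of distinct peer ports (P2P-like pattern).
    Pairs with fewer than min_peers distinct peers are skipped
    (min_peers is a hypothetical cutoff, not taken from the slides).
    """
    ip_set = defaultdict(set)
    port_set = defaultdict(set)
    for sip, sport, dip, dport in flows:
        # Each flow contributes to the history of both of its endpoints.
        ip_set[(sip, sport)].add(dip)
        port_set[(sip, sport)].add(dport)
        ip_set[(dip, dport)].add(sip)
        port_set[(dip, dport)].add(sport)
    return {
        pair: len(ips) == len(port_set[pair])
        for pair, ips in ip_set.items()
        if len(ips) >= min_peers
    }
```

For example, a peer contacted once by each of three hosts sees three IPs and three ports, while a web server whose three clients each open two parallel connections sees three IPs but six ports.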

False positives
Some heuristics for decreasing false positives
Mail server
DNS
Gaming
Malware
Others

Mail Server
Behavior resembles the pattern flagged by the {IP, port} heuristic
Examine the flows with port numbers 25, 110, 113

DNS
Concurrently uses TCP & UDP at port 53
For flows where src port = dst port and the port number is below 501, both src & dst {IP, port} pairs are considered non-P2P
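
The port-equality rule can be written as a short filter. This is a sketch under the assumption that flows are (src_ip, src_port, dst_ip, dst_port) tuples; the function name and the `port_limit` parameter are hypothetical.

```python
def dns_like_pairs(flows, port_limit=501):
    """flows: iterable of (src_ip, src_port, dst_ip, dst_port) tuples.

    Returns the set of {IP, port} pairs to treat as non-P2P: both
    endpoints of every flow whose source and destination ports are
    equal and below port_limit.
    """
    non_p2p = set()
    for sip, sport, dip, dport in flows:
        if sport == dport and sport < port_limit:
            non_p2p.add((sip, sport))
            non_p2p.add((dip, dport))
    return non_p2p
```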

Gaming & Malware


Many flows to different IPs/ports, carrying the same-sized packet
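
One way to detect this pattern is to count distinct average packet sizes and distinct transfer sizes per {IP, port} pair. The sketch below is illustrative: the tuple layout and both thresholds (`min_flows`, `max_distinct`) are assumptions, not values from the slides.

```python
from collections import defaultdict


def same_size_pairs(flows, min_flows=5, max_distinct=2):
    """flows: iterable of (ip, port, packets, nbytes) tuples, packets > 0.

    Returns the {IP, port} pairs that have many flows but almost no
    variety in average packet size or transfer size.
    """
    stats = defaultdict(lambda: {"n": 0, "avg": set(), "total": set()})
    for ip, port, packets, nbytes in flows:
        s = stats[(ip, port)]
        s["n"] += 1
        s["avg"].add(nbytes // packets)  # integer average packet size
        s["total"].add(nbytes)           # transferred flow size
    return {
        pair
        for pair, s in stats.items()
        if s["n"] >= min_flows
        and len(s["avg"]) <= max_distinct
        and len(s["total"]) <= max_distinct
    }
```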

Other Heuristics
Scans
Count the number of {IP, port} pairs that share a specific IP to eliminate port scans

One-packet pairs
Remove one-packet flows

Other Heuristics
MSN Messenger server
Port 1863
3 distinct dst IPs within the same prefix

Port history
Examine the set of ports connected to an {IP, port} pair
Reject if all ports reflect well-known services
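
The port-history check reduces to a subset test. In this sketch the set of well-known service ports is an illustrative placeholder, and treating an empty history as "do not reject" is an assumption.

```python
# Illustrative placeholder; the slides do not list the ports used.
WELL_KNOWN_SERVICE_PORTS = {25, 53, 80, 110, 443}


def reject_by_port_history(connected_ports):
    """Return True when every port connected to an {IP, port} pair
    belongs to a well-known service, so the pair is rejected as non-P2P.
    An empty history is not rejected (an assumption)."""
    ports = set(connected_ports)
    return bool(ports) and ports <= WELL_KNOWN_SERVICE_PORTS
```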

Final Algorithm
P2PIP: IPs classified as P2P by the TCP/UDP heuristic
P2PPairs: {IP, port} pairs classified as P2P by the {IP, port} pair heuristic
Rejected: rejected pairs
MailServers: rejected IPs
IPPort: {IP, port} pairs not in MailServers or Rejected
IPSet: distinct IPs for a specific pair
PortSet: distinct ports for a specific pair
Avg_pktssizesSet: distinct average packet sizes
Transfer_sizesSet: distinct transferred flow sizes

Fraction of Identified P2P Traffic

False Positives

Robustness

Pros & Cons


Pros
Avoids privacy issues
Works with anonymized IP addresses
Lower storage overhead
Lower processing overhead
Ability to detect unknown protocols
Overcomes encryption

Cons
Cannot analyze a specific protocol

P2P Traffic Trends

Conclusions
Non-payload identification methodology
Ability to identify unknown protocols
Misses 5% of flows compared with payload analysis
8% to 12% false positives

Challenges the claims that P2P traffic is declining

Thanks!