Outline
Introduction
Related work
Payload analysis & Limitations
Non-payload identification
Experiments & Evaluation
P2P traffic trends
Conclusions
Identification Methodology
Examining packet payload
Signature-based methodology
Limitations
Contributions
Develop a methodology for P2P traffic profiling by identifying flow patterns and behavior characteristics
Evaluate its effectiveness by comparing with payload analysis
Demonstrate the growth of P2P traffic by analyzing backbone traces
Previous Work
Detailed characterization of a small subset of P2P protocols & networks
Properties of topology, bandwidth, caching & availability, etc.
Signature-based traffic identification
Traffic estimation of P2P applications with fixed ports
Payload Analysis
M1: Flag a flow as P2P if its src or dst port number matches one of the well-known P2P port numbers.
M2: Flag a flow as P2P if the 16-byte payload of any packet matches one of the signatures; otherwise flag it as non-P2P.
A loose lower bound on P2P volume
M3: Hash the {src, dst} IP pair of each flow flagged as P2P into a table. Flag flows containing an IP address in the table as possible P2P even if no payload matches.
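The three payload-based methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Flow` record, the port set, and the signature list are hypothetical stand-ins (the actual port and signature tables are not reproduced in these slides), and M3 is seeded from M2 matches as an assumption.

```python
from typing import List, NamedTuple


class Flow(NamedTuple):
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    payload: bytes  # captured payload bytes of one packet (may be empty)


# Illustrative values only; the real tables are not given in the slides.
KNOWN_P2P_PORTS = {6346, 6881}
SIGNATURES = (b"GNUTELLA", b"\x13BitTorrent")


def m1(flow: Flow) -> bool:
    """M1: src or dst port matches a well-known P2P port."""
    return flow.src_port in KNOWN_P2P_PORTS or flow.dst_port in KNOWN_P2P_PORTS


def m2(flow: Flow) -> bool:
    """M2: a signature appears within the first 16 payload bytes."""
    head = flow.payload[:16]
    return any(sig in head for sig in SIGNATURES)


def m3(flows: List[Flow]) -> List[bool]:
    """M3: flag any flow touching an IP seen in an M2-flagged flow."""
    p2p_ips = set()
    for f in flows:
        if m2(f):
            p2p_ips.update((f.src_ip, f.dst_ip))
    return [f.src_ip in p2p_ips or f.dst_ip in p2p_ips for f in flows]
```

M3 widens the estimate: a flow with no captured payload is still counted as possible P2P when one of its endpoints already appeared in a signature-matched flow.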
Limitations
Captured payload size
Only the first 16 bytes of payload
Only 4 bytes in older traces
Non-payload Identification
Two main heuristics:
{src, dst} IP pairs that use both TCP and UDP to transfer data
The behavior of peers, studied through the connection characteristics of {IP, port} pairs
High-level description
Data processing
Build the flow table
Collect information on various characteristics
TCP/UDP Heuristic
Concurrent usage of both TCP & UDP is typical for many P2P protocols
Look for {src, dst} IP pairs that use both TCP & UDP to identify P2P hosts
Other protocols also use TCP & UDP concurrently:
DNS, NETBIOS, IRC, gaming, streaming
Fixed well-known ports
TCP/UDP Heuristic
If a {src, dst} IP pair concurrently uses both TCP and UDP, we consider flows between this pair P2P so long as the src or dst ports are not in the set in Table 3
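A rough sketch of this rule, under stated assumptions: flows are plain tuples, the IP pair is treated as unordered, "concurrently" is simplified to "within the same trace interval", and `EXCLUDED_PORTS` is a hypothetical stand-in for the port set of Table 3, which is not reproduced here.

```python
from collections import defaultdict

# Hypothetical stand-in for the excluded port set (Table 3 is not shown here).
EXCLUDED_PORTS = {53, 137, 138, 139}


def flag_tcp_udp_flows(flows):
    """Return one boolean per flow: True if flagged P2P by the TCP/UDP rule.

    flows: list of (src_ip, dst_ip, src_port, dst_port, proto) tuples,
    with proto being "tcp" or "udp".
    """
    # First pass: record which transport protocols each IP pair uses.
    protocols = defaultdict(set)
    for src_ip, dst_ip, _sport, _dport, proto in flows:
        protocols[frozenset((src_ip, dst_ip))].add(proto)

    # Second pass: flag flows of pairs using both protocols,
    # unless either port belongs to the excluded set.
    flags = []
    for src_ip, dst_ip, sport, dport, _proto in flows:
        uses_both = {"tcp", "udp"} <= protocols[frozenset((src_ip, dst_ip))]
        excluded = sport in EXCLUDED_PORTS or dport in EXCLUDED_PORTS
        flags.append(uses_both and not excluded)
    return flags
```

The port exclusion is applied per flow here; applying it per IP pair instead would be an equally plausible reading of the slide.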
Web: for a {w, 80} pair, N(distinct connected ports) exceeds N(distinct connected IPs), because a host initiates more than one concurrent connection for parallel downloading
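The two counts for each {IP, port} pair can be computed as sketched below. The tuple layout and function name are assumptions made for illustration. The interpretation in the usage note, that a P2P pair sees roughly one source port per connected peer, is inferred from the contrast with web and is not stated on this slide.

```python
from collections import defaultdict


def connection_profile(flows):
    """Map each destination {IP, port} pair to (n_distinct_ips, n_distinct_ports).

    flows: iterable of (src_ip, src_port, dst_ip, dst_port) tuples.
    """
    ips = defaultdict(set)
    ports = defaultdict(set)
    for src_ip, src_port, dst_ip, dst_port in flows:
        key = (dst_ip, dst_port)
        ips[key].add(src_ip)      # distinct hosts connecting to this pair
        ports[key].add(src_port)  # distinct source ports they connect from
    return {key: (len(ips[key]), len(ports[key])) for key in ips}
```

For a web server, two clients opening four parallel connections yield a profile of (2, 4), while three peers each opening a single connection to a P2P listener yield (3, 3).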
False positives
Some heuristics for decreasing false positives
Mail server
DNS
Gaming
Malware
Others
Mail Server
Behavior resembles what the {IP, port} heuristic looks for
Examine flows with port numbers 25, 110, 113
DNS
Concurrently uses TCP & UDP at port 53
For flows where src port = dst port and the port number is below 501, both the src & dst {IP, port} pairs are considered non-P2P
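The equal-port rule is simple enough to state directly in code. This is a minimal sketch; the tuple layout and function name are illustrative assumptions.

```python
def non_p2p_pairs_from_equal_ports(flows, port_limit=501):
    """Collect {IP, port} pairs excluded by the equal-port rule.

    flows: iterable of (src_ip, src_port, dst_ip, dst_port) tuples.
    A flow whose src and dst ports are equal and below port_limit marks
    both of its endpoints as non-P2P.
    """
    non_p2p = set()
    for src_ip, src_port, dst_ip, dst_port in flows:
        if src_port == dst_port and src_port < port_limit:
            non_p2p.add((src_ip, src_port))
            non_p2p.add((dst_ip, dst_port))
    return non_p2p
```

A server-to-server DNS exchange on port 53 at both ends is caught by this rule, while a client querying from an ephemeral port is not.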
Other Heuristics
Scans
Count the number of {IP, port} pairs that share a specific IP to eliminate port scans
Other Heuristics
MSN Messenger server
Port 1863
3 distinct dst IPs within the same prefix
Port history
Examine the set of ports connected to an {IP, port} pair
Reject the pair if all ports reflect well-known services
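The port-history check might look like the sketch below. The `WELL_KNOWN_SERVICE_PORTS` set is a hypothetical example, since the slides do not list which ports count as well-known services, and treating an empty history as "do not reject" is likewise an assumption.

```python
# Hypothetical example set; the slides do not enumerate these ports.
WELL_KNOWN_SERVICE_PORTS = {25, 53, 80, 110, 443}


def reject_by_port_history(connected_ports):
    """Return True if an {IP, port} pair should be rejected.

    connected_ports: set of ports that connected to the pair.
    The pair is rejected only when every port in its history
    belongs to a well-known service.
    """
    if not connected_ports:
        return False  # no history, nothing to reject on
    return all(port in WELL_KNOWN_SERVICE_PORTS for port in connected_ports)
```

A single port outside the set is enough to keep the pair as a P2P candidate.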
Final Algorithm
P2PIP: IPs classified as P2P by the TCP/UDP heuristic
P2PPairs: {IP, port} pairs classified as P2P by the {IP, port} pair heuristic
Rejected: rejected pairs
MailServers: rejected IPs
IPPort: {IP, port} pairs not in MailServers or Rejected
IPSet: distinct IPs for a specific pair
PortSet: distinct ports for a specific pair
Avg_pktssizesSet: distinct average packet sizes
Transfer_sizesSet: distinct transferred flow sizes
Final Algorithm
False Positives
Robustness
Cons
Inability to analyze specific protocols
Conclusions
Non-payload identification methodology
Able to identify unknown protocols
Misses 5% of flows compared with payload analysis
8%~12% false positives
Thanks!