CS 4700 / CS 5700 Network Fundamentals Lecture 18: P2P and BitTorrent (Lars Ulrich Hates You) Revised 8/5/13 Traditional Internet Services Model 2 Client-server Many clients, 1 (or more) server(s) Web servers, DNS, file downloads, video streaming Problems Scalability: how many users can a server support? What happens when user traffic overload servers? Limited resources (bandwidth, CPU, storage)
Reliability: if # of servers is small, what happens when they break, fail, get disconnected, are mismanaged by humans? Efficiency: if your users are spread across the entire globe, how do you make sure you answer their requests quickly? The Alternative: Peer-to-Peer 3 A simple idea Users bring their own resources to the table A cooperative model: clients = peers = servers The benefits Scalability: # of servers grows with users BYOR:
bring your own resources (storage, CPU, B/W) Reliability: load spread across many peers Probability of them all failing is very low Efficiency: peers are distributed Peers can try and get service from nearby peers The Peer-to-Peer Challenge 4 What are the key components for leveraging P2P? New reliability challenges
Communication: how do peers talk to each other Service/data location: how do peers know who to talk to Network reachability, i.e. dealing with NATs Dealing with churn, i.e. short peer uptimes What about security? Malicious peers and cheating The Sybil attack 5 Outline Unstructured P2P BitTorrent Basics TP: Micro Transport Protocol
Cheating on BitTorrent Centralized Approach 6 The original: Napster Centralized index server(s) 1999-2001 Shawn Fanning, Sean Parker Invented at NEU Specialized in MP3s (but not for long) Supported all queries What caused its downfall?
Not scalable Centralization of liability Napster Architecture 7 Napster Central Server B and C have the file Log-in, uploadfor list of Search files Gangnam Style A B G E C
F D Centralized != Scalable? 8 Another centralized protocol: Maze Highly active network in China / Asia Over 2 million users, more than 13 TB transferred/ day Central index servers run out of PKU Survives because RIAA/MPAA doesnt exist in China Why is this interesting? Shows centralized systems can work Of
course have to be smart about it Central servers see everything Quite useful for research / measurement studies Maze Architecture 9 Incentive system Encourage people to upload Assess the trustworthyness of files Maze Central Server B G
E C A Traffic Logs Who downloaded Who uploaded How much data F D Colluding Users 10 Why and How of collusion The Sybil Attack
Collusion gets you points in Maze (incentive system) Spawn fake users/identities for free Collusion detectors (ICDCS 2007) Duplicate traffic across links Pair-wise mutual upload behavior Peer-to-IP ratio of clients Traffic concentration Duplicate transfer graph: 100 links w/ highest duplicate Unstructured P2P Applications 11 Centralized systems have single points of failure Response: fully unstructured P2P
No central server, peers only connect to each other Queries sent as controlled flood Later systems are hierarchical for performance reasons Limitations Bootstrapping: how to join without central knowledge? Floods of traffic = high network overhead Probabilistic: can only search a small portion of the system Uncommon files are easily lost Gnutella 12 First massively popular unstructured P2P application
Justin Frankel, Nullsoft, 2000 AOL was not happy at all Original design: flat network Join via bootstrap node Connect to random set of existing hosts Resolve queries by localized flooding Time to live fields limit hops Recent incarnations use hierarchical structure Problems High bandwidth costs in control messages Flood of queries took up all avail b/w for dialup users File Search via Flooding in Gnutella 13 What if the
file is rare or far away? Redundanc y Traffic Overhead Peer Lifetimes 14 Study of host uptime and application uptime (MMCN 2002) 17,000+ Gnutella peers for 60 hours 7,000 Napster peers for 25 hours Percentage of Hosts Host Uptime (Minutes) Resilience to Failures and Attacks 15
Previous studies (Barabasi) show interesting dichotomy of resilience for scale-free networks Resilient to random failures, but not attacks Heres what it looks like for Gnutella 1771 Peers in Feb, 2001 After random top 4% of peers are removed After 30% of peers removed Hierarchical P2P Networks 16 FastTrack network (Kazaa, Grokster, Morpheus, Gnutella++) supernode
Improves scalability Limits flooding Still no guarantees of performance What if a supernode leaves the Kazaa 17 Very popular from its inception Hierarchical flooding helps improve scale Large shift to broadband helped quite a bit as well Based in Europe, more relaxed copyright laws New problem: poison attacks Mainly used by RIAA-like organizations
Create many Sybils that distribute popular content Files are corrupted, truncated, scrambled In some cases, audio/video about copyright infringement Quite effective in dissuading downloaders Data Poisoning on Kazaa 18 Why is poisoning effective? (IPTPS 2006) People dont check their songs! Apparently not easy to detect file pollution! Metadata Down.Quality Incomplete Noise Shuffle
Distribution of Poisoned Files 19 Why are poisoned files so widely distributed? Slackness, even when users are asked to check files Skype: P2P VoIP 20 P2P client supporting VoIP, video, and text based conversation, buddy lists, etc. Each user registers with a central server Based on Kazaa network (FastTrack)
Overlay P2P network consisting of ordinary and Super Nodes (SN) Ordinary node connects to network through a Super Node User information propagated in a decentralized fashion Uses a variant of STUN to identify the type of NAT and firewall Whats New About Skype 21 MSN, Yahoo, GoogleTalk all provide similar functionality But generally rely on centralized servers So why peer-to-peer for Skype? One reason: cost If redirect VoIP through peers, can leverage geographic distribution i.e. traffic to a phone in Berlin goes to peer in Berlin,
thus becomes a local call Another reason: NAT traversal Choose peers to do P2P rendezvous of NATed clients 22 Outline Unstructured P2P BitTorrent Basics TP: Micro Transport Protocol Cheating on BitTorrent What is BitTorrent 23 Designed for fast, efficient content distribution
Ideal for large files, e.g. movies, DVDs, ISOs, etc. Uses P2P file swarming Not a full fledged P2P system Does not support searching for files File swarms must be located out-of-band Trackers acts a centralized swarm coordinators Fully P2P, trackerless torrents are now possible Insanely popular BitTorrent Overview 24 Tracker
Swarm Seeder Leecher .torrent File 25 Contains all meta-data related to a torrent BitTorrent breaks files into pieces File name(s), sizes Torrent hash: hash of the whole file URL of tracker(s) 64 KB 1 MB per piece .torrent contains the size and SHA-1 hash of each piece
Basically, a .torrent tells you Everything about a given file Where to go to start downloading Torrent Sites 26 Just standard web servers Some also host trackers Many famous ones Allow users to upload .torrent files Search, ratings, comments, etc. Mostly because they host illegal
content Legitimate .torrents Linux distros World of Warcraft patches Torrent Trackers Tracker 27 Really, just a highly specialized webserver BitTorrent protocol is built on top of HTTP Keeps a database of swarms Swarms identified by torrent hash State of each peer in each swarm
IP address, port, peer ID, TTL Status: leeching or seeding Optional: upload/download stats (to track fairness) Returns a random list of peers to new leechers Peer Selection 28 Tracker provides each client with a list of peers Which peers are best? Truthful (not cheating) Fastest bandwidth Option 1: learn dynamically
Try downloading from many peers Keep only the best peers Strategy used by BitTorrent Option 2: use external information E.g. Some torrent clients prefer peers in the Sharing Pieces 29 Initial Seeder 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7 8 Leecher Seeder 1 2 3 4 5 6 7 8 Leecher Seeder The Beauty of BitTorrent 30
More leechers = more replicas of pieces More replicas = faster downloads Multiple, redundant sources for each piece Even while downloading, leechers take load off the seed(s) Great for content distribution Cost is shared among the swarm Typical Swarm Behavior 31 Sub-Pieces and Pipelining 32 Each piece is broken into sub-pieces
~16 KB in size TCP Pipelining For performance, you want long lived TCP connections (to get out of slow start) Peers generally request 5 sub-pieces at a time When one finished, immediately request another Dont start a new piece until previous is complete Prioritizes complete pieces Piece Selection 33
Piece download order is critical Worst-case scenario: all leeches have identical pieces Nobody can share anything :( Worst-case scenario: the initial seed disappears If a piece is missing from the swarm, the torrent is broken What is the best strategy for selecting pieces? Trick question It depends on how many pieces you already have Download Phases 34
0% Bootstrap: random selection % Downloaded 100% Steady-state: rarest piece first Initially, you have no pieces to trade Essentially, beg for free pieces at random Ensures that common pieces are saved for last Endgame
Simultaneously request final pieces from multiple peers Cancel connections to slow peers Ensures that final pieces arrive quickly Upload and Download Control 35 How does each peer decide who to trade with? Incentive mechanism Based on tit-for-tat, game theory If you give a piece to me, Ill give a piece to you If you screw me over, you get nothing Two mechanisms: choking and optimistic unchoke
A Bit of Game Theory 36 Iterated prisoners dilemma Very simple game, two players, multiple rounds Maps well to trading pieces in BitTorrent Both players agree: +2 points each One player defects: +5 for defector, +0 to other Both players defect: +0 for each Both peers trade, they both get useful data If both peers do nothing, they both get nothing If one peer defects, he gets a free piece, other peer gets nothing What is the best strategy for this game?
Tit-for-Tat 37 Best general strategy for iterated prisoners dilemma Meaning: Equivalent Retaliation Rules 1. Initially: cooperate 2. If opponent cooperates, cooperate next round 3. If opponent defects, defect next round Roun d Points 1 Cooperat e Cooperat
e +2 / +2 2 Cooperat e Defect +0 / +5 3 Defect Cooperat e +5 / +0 4 Cooperat e Cooperat e
+2 / +2 5 Cooperat Defect +0 / +5 Choking 38 Choke is a temporary refusal to upload Tit-for-tat: choke free riders Cap the number of simultaneous uploads Too many connections congests your network Periodically unchoke to test the network connection Choked
peer might have better bandwidth Optimistic Unchoke 39 Each peer has one optimistic unchoke slot Uploads to one random peer Peer rotates every 30 seconds Reasons for optimistic unchoke Help to bootstrap peers without pieces Discover new peers with fast connections BitTorrent Protocol Fundamentals 40 4
1 2 3 Leecher BitTorrent divides time into rounds Each round, decide who to upload to/download from Rounds are typically 30 seconds Each connection to a peer is controlled by four states Leecher Interested / uninterested do I want a piece from you? Choked / unchoked am I currently downloading from you? Connections are bidirectional You decide interest/choking on each peer
Each peer decides interest/chocking on you Connection States Error states. Most peers are d or D. No need to connect with uninteresting peers. 41 Connection Download control should be D interested and unchoked closed. and unchoked K uninterested d interested and choked
S snubbed (no data received in 60 seconds) F piece(s) failed to hash Upload control u interested and choked U interested and unchoked O optimistic unchoke ? uninterested and unchoked
Connection information More on this More this h used UDP on holelater punching P connection uses TP next week How was this peer located? I incoming connection E/e Using protocol encryption H DHT (distributed hash table)
L local peer discovery (multicast) X peer exchange Upload-Only Mode 42 Once a peer completes a torrent, it becomes a seed No downloads, no tit-for-tat Who to upload to first? BitTorrent policy
Upload to the fastest known peer Why? Faster uploads = more available pieces More available pieces helps the swarm Trackerless Torrents 43 New versions of BitTorrent have the ability to locate swarms without a tracker Based on a P2P overlay Distributed hash table (DHT) Recall: peers located via DHT are given H state More on this next week 44
Outline Unstructured P2P BitTorrent Basics TP: Micro Transport Protocol Cheating on BitTorrent BitTorrent and TCP 45 BitTorrent accounts for 35-70% of all Internet traffic Thus, BitTorrents behavior impacts everyone BitTorrents use of TCP causes problems Long lived, BitTorrent TCP flows are elephants Ramp up past slow start, dominate router queues
Many applications are mice, get trampled by elephants Short lived flows (e.g. HTTP traffic) Delay sensitive apps (i.e. VoIP, SSH, online games) Have you ever tried using SSH while using BitTorrent? Making BitTorrent Play Nice 46 Key issue: long-lived TCP flows are aggressive TCP is constantly probing for more bandwidth TCP induces queuing delay in the network Does BitTorrent really need to be so aggressive?
BitTorrent is not delay sensitive Do you care if your download takes a few minutes longer? BitTorrent is low-priority background traffic You probably want to do other things on the Internet while BitTorrent is downloading Micro Transport Protocol (TP) 47 Designed by BitTorrent, Inc. UDP-based transport protocol Uses LEDBAT principals Duplicates many TCP features
Window based sending, advertised windows Sequence numbers (packet based, not byte based) Reliable, in-order packet delivery Today: widely adopted by BitTorrent clients and open-sourced TP and LEDBAT 48 TP is based on IETF LEDBAT standard (RFC 6817) Low Extra Delay Background Transport Low delay congestion control algorithm Seeks to use all available bandwidth without increasing queuing delay on the path Goal: fast transfer of bulk data in the background
Use all available bandwidth (fast transfer speed) but, do not starve other applications Background data transfer is not delay sensitive Backoff gracefully and give bandwidth to delay sensitive applications LEDBAT Details 49 Delay-based congestion control protocol Constraint: be less aggressive than TCP Similar algorithm to TCP Vegas
Measure one-way delay, reduce rate when delay increases React early to congestion and slow down Do not induce queuing delay in the network LEDBAT is a scavenger cc protocol Scavenge unused bandwidth for file transfer but dont take bandwidth from other flows TP UDP Like TCP flags: SYN=4, Random UDP header, TP Header FIN=1, RST=3, number, gives you DATA=0, STATE=2 uniquely 50 ports identifies each 0 4 (ACK)
8 16 31 connection Destination Port Source Port Checksum Payload Length Connection ID Type Ver. Extension Timestamp (microseconds) Version = Timestamp Difference (microseconds) Like TCP options Advertised Window (bytes) 1 Sequence Number Ack Number Seq. and Ack. Advertised
Many fields are like TCP numbers like window, like TCP new fields areTCP Important the timestamps Timestamps and Delay 51 Timestamps used to measure one-way delay Timestamp: time at which packet was sent Timestamp Difference: sent time received time DATA t0 0 Send at time t0
ACK Received at time t0+100ms t1 100m s Question: why use one-way delay instead of RTT? Sender knows Time difference Queues on Internet paths are not symmetric Delay ondelay one-way = path doesnt impact inserted into path the reverse the forward TP tries to keep one-way delay
~100ms TP Congestion Controller Estimate the baseline delay on CCONTROL_TARGET = 100ms the path 52 base_delay = min([list of time difference samples from the last 2 Is delay below our target (positive minutes]) value), our_delay = last_time_diff_sample base_delay or above our target (negative value) Convert units Time difference Current delay on off_target = CCONTROL_TARGET our_delay from most recent from time to the path above packets ACK the
baseline delay_factorFinally, = off_target / CCONTROL_TARGET adjust the window window_factor = oustanding_packets size (may be + or / max_window adjustment) scaled_gain = MAX_CWND_INCR_PER_RTT * delay_factor * window_factor max_window = max_window + scaled_gain More TP Details 53 Delay-based mechanism replaces slow start and additive increase What if a packet drops?
What if off_target is a large negative number? max_window = max_window * 0.5 (just like TCP) max_window = 1 packet (dont starve the connection) Error handling in TP : Uses RTO like Tahoe to retransmit lost packets Discussion 54 In this case, developing a new transport protocol was (arguably) the right decision BitTorrent generates huge amounts of traffic
Whole Internet benefits if BitTorrent is more friendly However, inventing new protocols is hard TP reimplements most of TCP RTO Early version of TP performed much worse than TCP Lots estimation, Nagles algorithm, etc. of bugs related to packet pacing and sizing Takeaway: develop new transport protocols only if absolutely necessary 55 Outline
Unstructured P2P BitTorrent Basics TP: Micro Transport Protocol Cheating on BitTorrent Incentives to Upload 56 Every round, a BitTorrent client calculates the number of pieces received from each peer Assumption The peers who gave the most will receive pieces in the next round These decisions are made by the unchoker Peers will give as many pieces as possible each round
Based on bandwidth constraints, etc. Can an attacker abuse this assumption? Unchoker Example 57 Round t Round t + 1 13 10 10 10 4 12 10 7 9 15 10
Abusing the Unchocker 58 What if you really want to download from someone? Round t Round t + 1 13 10 10 4 12 7 9 15 20 11 Sendjust a Send lot of enough data, get data,
1stget place 4th place 10 10 10 Sybil Attack 59 Round t 13 10 12 Divide Only resources receive across 10 3 pieces fake peers Round t + 1 10
Implements the come in last strategy Essentially, an unfair unchoker Faster than stock BitTorrent For the Tyrant user Problem with BitTyrant Tragedy of the commons BitTyrant performs well if most peers are honest As more peers use BitTyrant, performace suffers If all users used BitTyrant, torrents wouldnt work at all PropShare Unchoker 61
Goal: modify BitTorrents incentive mechanisms to mitigate come in last and Sybil attacks Levin et al. 2008 Propose PropShare unchoker PropShare clients allocate upload bandwidth proportionally across all peers There is no longer a top four Can you cheat vs. PropShare? PropShare Unchoker 62 Round t Round t + 1 13 13/70 * upload_cap
Round t Round t + 1 13 13/90 10 10/90 4 4/90 12 12/90 7 7/90 9 9/90 15
15/90 20 20/90 Total = 90 PropShare Resiliency to BitTyrant 64 Round t Round t + 1 13 13/81 10 10/81 4 4/81
12 Download always proportional to 12/81 7 upload 7/81 9 No way to game the system 9/81 15 15/81 11 11/81 Total = 81 PropShare Resiliency to Sybils 65 Round t Round t + 1 42 42/42 Total = 42 14 PropShare is Sybil
resistant 14/42 14 14/42 14 14/42 Total = 42 Total Capacity = 42 Unchoker Summary 66 BitTyrant and PropShare are both faster than stock BitTorrent But for different reasons PropShare performs comparably to BitTyrant
PropShare does not suffer from a tragedy of the commons i.e. its safe for all peers to use PropShare Not true for BitTyrant Abusing Optimistic Unchoking 67 So far, assumed peers all have pieces to trade What about peers that have nothing? Thus, all peers are interesting The bootstrap mechanism is supposed to help them Optimistic unchoke: reserve some bandwidth to
give free pieces away (presumably to new peers) BitThief (Locher et al. 2006) Abuses optimistic unchoke, uploads nothing BitThief Details 68 Large-view exploit The swarm is (potentially) huge BitThief client tries to get optimistic unchoke from many, many peers Will only receive one free piece from each Since there is no reciprocal upload
But in aggregate, this is enough to finish download How to deal with this? Enlist the help of peers Have them verify that a given client uploads Encrypted Pieces 69 Seeder 1 2 1 2 Leecher BitThief cant leave, encrypted data is useless 1
2 BitThief Abusing the Endgame 70 Rare pieces are valuable Make you popular, many people want to trade with you More trading partners = faster downloads Selective piece revelation You cant advertise pieces you dont have Peers could detect this
But you can hide information about the pieces you have Why is this useful? Pieces sent at time t impact your popularity at time t+1 Sending common pieces first, monopolize rare pieces Strategic Piece Revelation 71 1 2 1 2 3 4 Leecher 3 4 1 2 3 4 Leecher Conclusions 72
BitTorrent is an extremely efficient tool for content distribution Strong incentive system based on game theory Most popular file sharing client since 2001 More active users than YouTube and Facebook combined However, BitTorrent is a large system with many different mechanisms Ample room to modify the client, alter behavior Cheating can happen, not all strategies are fair
Using quick tens and ones to subtract single-digit numbers from multiples of 10 within 100 (such as 20, 30, 40, 50, etc.) Using number bonds to subtract from multiples of 10. There will be a Topic Quiz NEXT Monday, September...
Fulton State Hospital (FSH) is a forensic psychiatric inpatient facility. It is a government funded, not-for-profit organization. Current daily census is about 340, but capacity is 376 (Missouri Department of Mental Health (DMH), 2014).
BTEC Level 2 Diploma What is a BTEC Level 2 Diploma? BTEC is a qualification, which is entirely assessed through coursework and practical based learning (NO EXAMS!). Where possible learning will take place outside the classroom! This is most suitable...
The unattached gingiva is usually about 1 mm wide and forms the soft wall of the gingival sulcus. Gingivae Interdental gingiva is known as the gingival papilla. Gingival groove is a shallow groove that runs parallel to the margin of...
The Nuts and Bolts of First-Principles Simulation Durham, 6th-13th December 2001 Overview Start here Basis set - plane waves, the discerning choice! Grids, grids everywhere… Unit cells - when is a crystal not a crystal…? K-points and symmetry The Starting...
Choose 5 poems from the anthology that you have a lot to say about. Themes. ... Dharker was born in Pakistan but raised in Glasgow and so has the insight that living between two cultures might bring. ... Shelley writes...
Ready to download the document? Go ahead and hit continue!